optimus
Data prep library
A Python library that provides a simple API for data preparation and analysis on various big-data engines
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
1k stars
37 watching
232 forks
Language: Python
Last commit: 17 days ago
Linked from 4 awesome lists
Topics: big-data-cleaning, bigdata, cudf, dask, dask-cudf, data-analysis, data-cleaner, data-cleaning, data-cleansing, data-exploration, data-extraction, data-preparation, data-profiling, data-science, data-transformation, data-wrangling, machine-learning, pyspark, spark
Related projects:
Repository | Description | Stars |
---|---|---|
sfu-db/dataprep | A Python library for rapidly collecting, cleaning, and visualizing data with minimal code | 2,068 |
ibm/data-prep-kit | A toolkit for streamlining data preparation for developers building large language model applications | 290 |
vagmcs/optimus | A mathematical optimization library for Scala | 141 |
iceye-ltd/icecube | A Python library designed to organize SAR images and annotations for supervised machine learning applications | 82 |
tum-i4/oedipus | A framework that uses machine learning to uncover metadata from obfuscated programs | 11 |
pytorch/data | A PyTorch project providing data loading utilities and scalable dataloading solutions | 1,133 |
zygmuntz/kaggle-merck | Provides tools to prepare and process data for the Merck challenge at Kaggle | 10 |
primlabs/bucket | A library providing a simple storage solution using stable memory, allowing canisters to store data without GC costs and upgradeability | 31 |
msamogh/nonechucks | Library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 377 |
dropbox/pyhive | Provides interfaces to connect and interact with data sources like Hive and Presto using Python | 1,671 |
catalyst-cooperative/pudl | Provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists | 481 |
opendatacube/datacube-core | A Python-based platform for integrated analysis of gridded data drawn from decades of Earth observation satellite data | 514 |
maximtrp/scikit-posthocs | Provides tools for conducting pairwise multiple comparison tests in statistical data analysis | 348 |
pydap/pydap | A Python library for accessing and manipulating scientific data over the internet using the OPeNDAP protocol | 139 |
ekami/torchlite | A high-level library that simplifies machine learning tasks by abstracting away repetitive code | 32 |