optimus
Data prep library
A Python library that provides a simple API for data preparation and analysis on various big-data engines
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
1k stars
38 watching
232 forks
Language: Python
last commit: about 2 months ago
Linked from 4 awesome lists
Topics: big-data-cleaning, bigdata, cudf, dask, dask-cudf, data-analysis, data-cleaner, data-cleaning, data-cleansing, data-exploration, data-extraction, data-preparation, data-profiling, data-science, data-transformation, data-wrangling, machine-learning, pyspark, spark
Related projects:
Repository | Description | Stars |
---|---|---|
sfu-db/dataprep | A Python library for rapidly collecting, cleaning, and visualizing data with minimal code | 2,088 |
ibm/data-prep-kit | A toolkit for streamlining data preparation for developers building large language model applications | 363 |
vagmcs/optimus | A mathematical optimization library written in Scala, supporting linear and quadratic programming with various solver options. | 141 |
iceye-ltd/icecube | A Python library designed to organize SAR images and annotations for supervised machine learning applications. | 81 |
tum-i4/oedipus | A framework that uses machine learning to uncover metadata from obfuscated programs | 11 |
pytorch/data | Provides scalable, performant data loading solutions and utilities to be shared by PyTorch domain libraries | 1,149 |
zygmuntz/kaggle-merck | Provides tools to prepare and process data for the Merck challenge at Kaggle | 10 |
primlabs/bucket | A library providing a simple storage solution using stable memory, allowing canisters to store data without garbage-collection costs while remaining upgradeable. | 31 |
msamogh/nonechucks | Library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 378 |
dropbox/pyhive | Provides interfaces to connect and interact with data sources like Hive and Presto using Python. | 1,676 |
catalyst-cooperative/pudl | Provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists. | 492 |
opendatacube/datacube-core | A Python-based platform for integrated gridded data analysis from decades of Earth observation satellite data | 518 |
maximtrp/scikit-posthocs | Provides tools for conducting pairwise multiple comparisons tests in statistical data analysis | 354 |
pydap/pydap | A Python library for accessing and manipulating scientific data over the internet using the OPeNDAP protocol. | 139 |
ekami/torchlite | A high-level library that simplifies machine learning tasks by abstracting away repetitive code | 32 |