optimus

Data prep library

A Python library that provides a simple API for data preparation and analysis on various big-data engines

Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

GitHub

1k stars
37 watching
232 forks
Language: Python
last commit: 17 days ago
Linked from 4 awesome lists

big-data-cleaning, bigdata, cudf, dask, dask-cudf, data-analysis, data-cleaner, data-cleaning, data-cleansing, data-exploration, data-extraction, data-preparation, data-profiling, data-science, data-transformation, data-wrangling, machine-learning, pyspark, spark

Related projects:

Repository | Description | Stars
sfu-db/dataprep | A Python library for rapidly collecting, cleaning, and visualizing data with minimal code | 2,068
ibm/data-prep-kit | A toolkit for streamlining data preparation for developers building large language model applications | 290
vagmcs/optimus | A mathematical optimization library for Scala | 141
iceye-ltd/icecube | A Python library designed to organize SAR images and annotations for supervised machine learning applications | 82
tum-i4/oedipus | A framework that uses machine learning to uncover metadata from obfuscated programs | 11
pytorch/data | A PyTorch project providing data loading utilities and scalable dataloading solutions | 1,133
zygmuntz/kaggle-merck | Provides tools to prepare and process data for the Merck challenge at Kaggle | 10
primlabs/bucket | A library providing a simple storage solution using stable memory, allowing canisters to store data without GC costs and upgradeability | 31
msamogh/nonechucks | Library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 377
dropbox/pyhive | Provides interfaces to connect and interact with data sources like Hive and Presto using Python | 1,671
catalyst-cooperative/pudl | Provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists | 481
opendatacube/datacube-core | A Python-based platform for integrated gridded data analysis from decades of Earth observation satellite data | 514
maximtrp/scikit-posthocs | Provides tools for conducting pairwise multiple comparisons tests in statistical data analysis | 348
pydap/pydap | A Python library for accessing and manipulating scientific data over the internet using the OPeNDAP protocol | 139
ekami/torchlite | High-level library to simplify machine learning tasks by abstracting repetitive code | 32