optimus
Data prep library
A Python library that provides a simple API for data preparation and analysis on various big-data engines
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
1k stars
38 watching
232 forks
Language: Python
Last commit: 3 months ago
Linked from 4 awesome lists
big-data-cleaning, bigdata, cudf, dask, dask-cudf, data-analysis, data-cleaner, data-cleaning, data-cleansing, data-exploration, data-extraction, data-preparation, data-profiling, data-science, data-transformation, data-wrangling, machine-learning, pyspark, spark
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A Python library for rapidly collecting, cleaning, and visualizing data with minimal code | 2,088 |
| | A toolkit for streamlining data preparation for developers building large language model applications | 363 |
| | A mathematical optimization library written in Scala, supporting linear and quadratic programming with various solver options | 141 |
| | A Python library designed to organize SAR images and annotations for supervised machine learning applications | 81 |
| | A framework that uses machine learning to uncover metadata from obfuscated programs | 11 |
| | Provides scalable, performant data loading solutions and utilities to be shared by PyTorch domain libraries | 1,149 |
| | Provides tools to prepare and process data for the Merck challenge at Kaggle | 10 |
| | A library providing a simple storage solution using stable memory, allowing canisters to store data without GC costs and upgradeability | 31 |
| | Library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 378 |
| | Provides interfaces to connect and interact with data sources like Hive and Presto using Python | 1,676 |
| | Provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists | 492 |
| | A Python-based platform for integrated gridded data analysis from decades of Earth observation satellite data | 518 |
| | Provides tools for conducting pairwise multiple comparisons tests in statistical data analysis | 354 |
| | A Python library for accessing and manipulating scientific data over the internet using the OPeNDAP protocol | 139 |
| | High-level library to simplify machine learning tasks by abstracting repetitive code | 32 |