dirty_cat

Categorical encoder library

A Python library for handling and encoding dirty categorical data in machine learning

Machine learning on dirty tabular data (legacy clone of skrub)
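
To make "encoding dirty categorical data" concrete, the sketch below shows one common approach: represent each raw category by its character n-gram similarity to a few reference categories, so that typos and spelling variants land close to the clean value. This is a standard-library illustration only and does not use the dirty_cat API. The function names (`char_ngrams`, `ngram_similarity`, `similarity_encode`), the choice of 3-grams with Jaccard similarity, and the sample job titles are all assumptions made for the example.

```python
# Illustrative sketch only: a stdlib n-gram similarity encoding.
# It is NOT dirty_cat's implementation or API; all names are hypothetical.

def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.strip().lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as a row of similarities to the reference categories."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical clean categories and dirty observations.
prototypes = ["police officer", "fire fighter"]
values = ["Police Officer", "police oficer", "firefighter"]

encoded = similarity_encode(values, prototypes)
for value, row in zip(values, encoded):
    print(value, [round(score, 2) for score in row])
```

Unlike one-hot encoding, which would treat "police oficer" as an unrelated new category, each row here is a small numeric vector that stays close to the vector of the category it most resembles.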

GitHub

Stars: 17
Watching: 0
Forks: 4
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists


Related projects:

Repository | Description | Stars
alfred82santa/dirty-models | A Python library that provides a way to easily create and manage data models without modifying the original data. | 10
davified/clean-code-ml | Adapting clean code principles to machine learning and data science in Python | 714
neuraxio/kata-clean-machine-learning-from-dirty-code | Converting dirty machine learning code into clean, modular, and reusable components using the Pipe and Filter Design Pattern for Machine Learning. | 18
msamogh/nonechucks | Library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 378
cgnorthcutt/rankpruning | An algorithm and package for handling noisy labels in binary classification problems | 82
tidalcycles/clean-samples | Provides pre-cleaned and documented audio samples for musical experimentation | 45
kastnerkyle/kaggle-dogs-vs-cats | A Python implementation of a machine learning solution for classifying images as dogs or cats from the Kaggle competition. | 66
dizballanze/django-eraserhead | Tool to optimize database usage in Django by identifying and suggesting the removal of unused fields. | 196
kthyeon/fine_official | Implementation of a method to improve machine learning models trained with noisy labels by selecting and collaborating with high-quality samples | 39
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336
pytorch/data | Provides scalable, performant data loading solutions and utilities to be shared by PyTorch domain libraries | 1,149
databasecleaner/database_cleaner-mongoid | A tool for cleaning up data in MongoDB databases. | 9
ayush1997/visualize_ml | A Python package for data analysis and visualization in machine learning | 198
hcguersoy/cleanreg | Removes unnecessary image manifests from a Docker Registry | 56
scour-project/scour | An SVG optimizer/cleaner tool that reduces the size of vector graphics by removing unnecessary data. | 785