awesome-python-data-science
Data Science Toolbox
A curated list of data science software in Python
Probably the best curated list of data science software in Python.
3k stars
59 watching
345 forks
last commit: about 2 months ago
Linked from 4 awesome lists
awesome, awesome-list, awesome-python, data-analysis, data-science, data-visualization, deep-learning, machine-learning, python, scikit-learn, statistics
Awesome Python Data Science / Machine Learning / General Purpose Machine Learning | |||
scikit-learn | Machine learning in Python | ||
PyCaret | 8,955 | 13 days ago | An open-source, low-code machine learning library in Python |
Shogun | 3,034 | 11 months ago | Machine learning toolbox |
xLearn | 3,087 | about 1 year ago | High Performance, Easy-to-use, and Scalable Machine Learning Package |
cuML | 4,238 | 7 days ago | RAPIDS Machine Learning Library |
modAL | 2,228 | 9 months ago | Modular active learning framework for Python3 |
Sparkit-learn | 1,154 | almost 4 years ago | PySpark + scikit-learn = Sparkit-learn |
mlpack | 5,113 | 8 days ago | A scalable C++ machine learning library (Python bindings) |
dlib | 13,561 | 29 days ago | Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings) |
MLxtend | 4,907 | 7 days ago | Extension and helper modules for Python's data analysis and machine learning libraries |
hyperlearn | 1,842 | about 1 month ago | Sklearn and Statsmodels rewritten to be 50%+ faster, with 50%+ less RAM usage and GPU support |
Reproducible Experiment Platform (REP) | 689 | 4 months ago | Machine Learning toolbox for Humans |
scikit-multilearn | 921 | 10 months ago | Multi-label classification for Python |
seqlearn | 688 | over 1 year ago | Sequence classification toolkit for Python |
pystruct | 665 | about 3 years ago | Simple structured learning framework for Python |
sklearn-expertsys | 489 | over 7 years ago | Highly interpretable classifiers for scikit learn |
RuleFit | 411 | about 1 year ago | Implementation of the RuleFit algorithm |
metric-learn | 1,399 | 4 months ago | Metric learning algorithms in Python |
pyGAM | 875 | 5 months ago | Generalized Additive Models in Python |
causalml | 5,095 | 13 days ago | Uplift modeling and causal inference with machine learning algorithms |
Awesome Python Data Science / Machine Learning / Gradient Boosting | |||
XGBoost | 26,299 | 6 days ago | Scalable, Portable, and Distributed Gradient Boosting |
LightGBM | 16,694 | 6 days ago | A fast, distributed, high-performance gradient boosting framework |
CatBoost | 8,088 | 6 days ago | An open-source gradient boosting on decision trees library |
ThunderGBM | 693 | 10 months ago | Fast GBDTs and Random Forests on GPUs |
NGBoost | 1,654 | 24 days ago | Natural Gradient Boosting for Probabilistic Prediction |
TensorFlow Decision Forests | 660 | 10 days ago | A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras |
Awesome Python Data Science / Machine Learning / Ensemble Methods | |||
ML-Ensemble | High performance ensemble learning | ||
Stacking | 221 | almost 7 years ago | Simple and useful stacking library written in Python |
stacked_generalization | 117 | over 5 years ago | Library for machine learning stacking generalization |
vecstack | 685 | 3 months ago | Python package for stacking (machine learning technique) |
Awesome Python Data Science / Machine Learning / Imbalanced Datasets | |||
imbalanced-learn | 6,847 | about 2 months ago | Module to perform under-sampling and over-sampling with various techniques |
imbalanced-algorithms | 235 | almost 3 years ago | Python-based implementations of algorithms for learning on imbalanced data |
Awesome Python Data Science / Machine Learning / Random Forests | |||
rpforest | 223 | almost 5 years ago | A forest of random projection trees |
sklearn-random-bits-forest | 9 | over 8 years ago | Wrapper of the Random Bits Forest program written by Wang et al. (2016) |
rgf_python | 378 | almost 3 years ago | Python Wrapper of Regularized Greedy Forest |
Awesome Python Data Science / Machine Learning / Kernel Methods | |||
pyFM | 922 | about 4 years ago | Factorization machines in Python |
fastFM | 1,075 | over 2 years ago | A library for Factorization Machines |
tffm | 780 | almost 3 years ago | TensorFlow implementation of an arbitrary order Factorization Machine |
liquidSVM | 66 | almost 5 years ago | An implementation of SVMs |
scikit-rvm | 231 | over 7 years ago | Relevance Vector Machine implementation using the scikit-learn API |
ThunderSVM | 1,573 | 8 months ago | A fast SVM Library on GPUs and CPUs |
Awesome Python Data Science / Deep Learning / PyTorch | |||
PyTorch | 83,959 | 6 days ago | Tensors and Dynamic neural networks in Python with strong GPU acceleration |
pytorch-lightning | 28,402 | 3 days ago | PyTorch Lightning is just organized PyTorch |
ignite | 4,526 | 14 days ago | High-level library to help with training neural networks in PyTorch |
skorch | 5,881 | 16 days ago | A scikit-learn compatible neural network library that wraps PyTorch |
Catalyst | 3,295 | 8 months ago | High-level utils for PyTorch DL & RL research |
ChemicalX | 714 | about 1 year ago | A PyTorch-based deep learning library for drug pair scoring |
Awesome Python Data Science / Deep Learning / TensorFlow | |||
TensorFlow | 186,382 | 6 days ago | Computation using data flow graphs for scalable machine learning by Google |
TensorLayer | 7,334 | almost 2 years ago | Deep Learning and Reinforcement Learning Library for Researchers and Engineers |
TFLearn | 9,619 | 7 months ago | Deep learning library featuring a higher-level API for TensorFlow |
Sonnet | 9,776 | 7 days ago | TensorFlow-based neural network library |
tensorpack | 6,303 | over 1 year ago | A Neural Net Training Interface on TensorFlow |
Polyaxon | 3,571 | 7 days ago | A platform that helps you build, manage and monitor deep learning models |
tfdeploy | 353 | 9 months ago | Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy |
tensorflow-upstream | 688 | 6 days ago | TensorFlow ROCm port |
TensorFlow Fold | 1,827 | over 3 years ago | Deep learning with dynamic computation graphs in TensorFlow |
TensorLight | 11 | about 2 years ago | A high-level framework for TensorFlow |
Mesh TensorFlow | 1,592 | about 1 year ago | Model Parallelism Made Easier |
Ludwig | 11,189 | 24 days ago | A toolbox that allows one to train and test deep learning models without the need to write code |
Keras | A high-level neural networks API running on top of TensorFlow | ||
keras-contrib | 1,581 | about 2 years ago | Keras community contributions |
Hyperas | 2,178 | almost 2 years ago | Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization |
Elephas | 1,574 | over 1 year ago | Distributed Deep learning with Keras & Spark |
qkeras | 540 | 29 days ago | A quantization deep learning library |
Awesome Python Data Science / Deep Learning / MXNet | |||
MXNet | 20,781 | about 1 year ago | Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler |
Gluon | 2,299 | over 5 years ago | A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet) |
Xfer | 252 | over 1 year ago | Transfer Learning library for Deep Neural Networks |
MXNet | 28 | almost 5 years ago | HIP Port of MXNet |
Awesome Python Data Science / Deep Learning / JAX | |||
JAX | 30,499 | 6 days ago | Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more |
FLAX | 6,132 | 7 days ago | A neural network library for JAX that is designed for flexibility |
Optax | 1,697 | 9 days ago | A gradient processing and optimization library for JAX |
Awesome Python Data Science / Deep Learning / Others | |||
transformers | 135,022 | 6 days ago | State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX |
Tangent | 2,315 | about 2 years ago | Source-to-Source Debuggable Derivatives in Pure Python |
autograd | 7,017 | 10 days ago | Efficiently computes derivatives of numpy code |
Caffe | 34,125 | 4 months ago | A fast open framework for deep learning |
nnabla | 2,728 | 6 days ago | Neural Network Libraries by Sony |
Awesome Python Data Science / Automated Machine Learning | |||
auto-sklearn | 7,632 | 6 days ago | An AutoML toolkit and a drop-in replacement for a scikit-learn estimator |
Auto-PyTorch | 2,376 | 8 months ago | Automatic architecture search and hyperparameter optimization for PyTorch |
AutoKeras | 9,154 | 16 days ago | AutoML library for deep learning |
AutoGluon | 8,039 | 6 days ago | AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data |
TPOT | 9,736 | 4 months ago | AutoML tool that optimizes machine learning pipelines using genetic programming |
MLBox | 1,500 | over 1 year ago | A powerful Automated Machine Learning python library |
Awesome Python Data Science / Natural Language Processing | |||
torchtext | 3,514 | 6 days ago | Data loaders and abstractions for text and NLP |
gluon-nlp | 2,557 | about 1 year ago | NLP made easy |
KerasNLP | 797 | 6 days ago | Modular Natural Language Processing workflows with Keras |
spaCy | Industrial-Strength Natural Language Processing | ||
NLTK | 13,620 | 10 days ago | Modules, data sets, and tutorials supporting research and development in Natural Language Processing |
CLTK | 839 | 3 months ago | The Classical Language Toolkit |
gensim | Topic Modelling for Humans | ||
pyMorfologik | 18 | over 9 years ago | Python binding for Morfologik |
skift | 234 | over 2 years ago | Scikit-learn wrappers for Python fastText |
Phonemizer | 1,231 | about 2 months ago | Simple text-to-phonemes converter for multiple languages |
flair | 13,939 | 6 days ago | Very simple framework for state-of-the-art NLP |
Awesome Python Data Science / Computer Audition | |||
torchaudio | 2,538 | 6 days ago | An audio library for PyTorch |
librosa | 7,171 | about 1 month ago | Python library for audio and music analysis |
Yaafe | 244 | over 3 years ago | Audio features extraction |
aubio | 3,314 | 4 months ago | A library for audio and music analysis |
Essentia | 2,858 | 29 days ago | Library for audio and music analysis, description, and synthesis |
LibXtract | 227 | over 4 years ago | A simple, portable, lightweight library of audio feature extraction functions |
Marsyas | 406 | over 1 year ago | Music Analysis, Retrieval, and Synthesis for Audio Signals |
muda | 233 | over 3 years ago | A library for augmenting annotated audio data |
madmom | 1,347 | 3 months ago | Python audio and music signal processing library |
Awesome Python Data Science / Computer Vision | |||
torchvision | 16,251 | 6 days ago | Datasets, Transforms, and Models specific to Computer Vision |
PyTorch3D | 8,806 | 15 days ago | PyTorch3D is FAIR's library of reusable components for deep learning with 3D data |
gluon-cv | 5,833 | 7 months ago | Provides implementations of the state-of-the-art deep learning models in computer vision |
KerasCV | 1,010 | 20 days ago | Industry-strength Computer Vision workflows with Keras |
OpenCV | 79,147 | 5 days ago | Open Source Computer Vision Library |
Decord | 1,891 | 4 months ago | An efficient video loader for deep learning with smart shuffling that's super easy to digest |
MMEngine | 1,179 | 15 days ago | OpenMMLab Foundational Library for Training Deep Learning Models |
scikit-image | 6,089 | 7 days ago | Image Processing SciKit (Toolbox for SciPy) |
imgaug | 14,417 | 4 months ago | Image augmentation for machine learning experiments |
imgaug_extension | Additional augmentations for imgaug | ||
Augmentor | 5,073 | 8 months ago | Image augmentation library in Python for machine learning |
albumentations | 14,254 | 9 days ago | Fast image augmentation library and easy-to-use wrapper around other libraries |
LAVIS | 9,926 | about 1 month ago | A One-stop Library for Language-Vision Intelligence |
Awesome Python Data Science / Time Series | |||
sktime | 7,943 | 6 days ago | A unified framework for machine learning with time series |
skforecast | 1,156 | 4 days ago | Time series forecasting with machine learning models |
darts | 8,087 | 5 days ago | A Python library for easy manipulation and forecasting of time series |
statsforecast | 3,990 | 10 days ago | Lightning fast forecasting with statistical and econometric models |
mlforecast | 899 | 7 days ago | Scalable machine learning-based time series forecasting |
neuralforecast | 3,101 | 9 days ago | Scalable machine learning-based time series forecasting |
tslearn | 2,910 | 5 months ago | Machine learning toolkit dedicated to time-series data |
tick | 491 | 3 months ago | Module for statistical learning, with a particular emphasis on time-dependent modeling |
greykite | 1,813 | 5 months ago | A flexible, intuitive, and fast forecasting library |
Prophet | 18,514 | 24 days ago | Automatic Forecasting Procedure |
PyFlux | 2,111 | about 1 year ago | Open source time series library for Python |
bayesloop | 153 | 7 months ago | Probabilistic programming framework that facilitates objective model selection for time-varying parameter models |
luminol | 1,189 | over 1 year ago | Anomaly Detection and Correlation library |
dateutil | Powerful extensions to the standard datetime module | ||
maya | 3,409 | 4 months ago | Makes it very easy to parse a string and change timezones |
Chaos Genius | 733 | 2 months ago | ML powered analytics engine for outlier/anomaly detection and root cause analysis |
Awesome Python Data Science / Reinforcement Learning | |||
Gymnasium | 7,374 | 7 days ago | An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym) |
PettingZoo | 2,627 | 9 days ago | An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities |
MAgent2 | 229 | 17 days ago | An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments |
Stable Baselines3 | 9,144 | 13 days ago | A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines |
Shimmy | 138 | about 1 month ago | An API conversion tool for popular external reinforcement learning environments |
EnvPool | 1,094 | 3 months ago | C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments |
RLlib | Scalable Reinforcement Learning | ||
Tianshou | 7,968 | 26 days ago | An elegant PyTorch deep reinforcement learning library |
Acme | 3,515 | 22 days ago | A library of reinforcement learning components and agents |
Catalyst-RL | 46 | about 3 years ago | PyTorch framework for RL research |
d3rlpy | 1,327 | 13 days ago | An offline deep reinforcement learning library |
DI-engine | 3,088 | 16 days ago | OpenDILab Decision AI Engine |
TF-Agents | 2,799 | about 1 month ago | A library for Reinforcement Learning in TensorFlow |
TensorForce | 3,296 | 4 months ago | A TensorFlow library for applied reinforcement learning |
TRFL | 3,134 | almost 2 years ago | TensorFlow Reinforcement Learning |
Dopamine | 10,569 | 17 days ago | A research framework for fast prototyping of reinforcement learning algorithms |
keras-rl | 5,526 | about 1 year ago | Deep Reinforcement Learning for Keras |
garage | 1,880 | over 1 year ago | A toolkit for reproducible reinforcement learning research |
Horizon | 3,575 | 9 days ago | A platform for Applied Reinforcement Learning |
rlpyt | 2,232 | almost 4 years ago | Reinforcement Learning in PyTorch |
cleanrl | 5,683 | 7 days ago | High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG) |
Machin | 401 | over 3 years ago | A reinforcement learning library designed for PyTorch |
SKRL | 560 | 16 days ago | Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym |
Imitation | 1,327 | 4 months ago | Clean PyTorch implementations of imitation and reward learning algorithms |
Awesome Python Data Science / Graph Machine Learning | |||
pytorch_geometric | 21,382 | 6 days ago | Geometric Deep Learning Extension Library for PyTorch |
pytorch_geometric_temporal | 2,669 | about 1 month ago | Temporal Extension Library for PyTorch Geometric |
PyTorch Geometric Signed Directed | 128 | 4 months ago | A signed/directed graph neural network extension library for PyTorch Geometric |
dgl | 13,548 | about 1 month ago | Python package built to ease deep learning on graph, on top of existing DL frameworks |
Spektral | 2,371 | 10 months ago | Deep learning on graphs |
StellarGraph | 2,948 | 8 months ago | Machine Learning on Graphs |
Graph Nets | 5,360 | almost 2 years ago | Build Graph Nets in TensorFlow |
TensorFlow GNN | 1,362 | 7 days ago | A library to build Graph Neural Networks on the TensorFlow platform |
PyTorch-BigGraph | 3,383 | 9 months ago | Generate embeddings from large-scale graph-structured data |
Auto Graph Learning | 1,088 | 3 months ago | An autoML framework & toolkit for machine learning on graphs |
Karate Club | 2,163 | 4 months ago | An unsupervised machine learning library for graph-structured data |
Little Ball of Fur | 703 | 10 months ago | A library for sampling graph structured data |
GreatX | 83 | about 1 month ago | A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG) |
Jraph | 1,375 | 8 months ago | A Graph Neural Network Library in JAX |
Awesome Python Data Science / Learning-to-Rank & Recommender Systems | |||
LightFM | 4,773 | 4 months ago | A Python implementation of LightFM, a hybrid recommendation algorithm |
Spotlight | Deep recommender models using PyTorch | ||
Surprise | 6,413 | 5 months ago | A Python scikit for building and analyzing recommender systems |
RecBole | 3,450 | 3 months ago | A unified, comprehensive and efficient recommendation library |
allRank | 871 | 4 months ago | allRank is a framework for training learning-to-rank neural models based on PyTorch |
TensorFlow Recommenders | 1,849 | 12 days ago | A library for building recommender system models using TensorFlow |
TensorFlow Ranking | 2,743 | 8 months ago | Learning to Rank in TensorFlow |
Awesome Python Data Science / Probabilistic Graphical Models | |||
pomegranate | 3,376 | about 1 month ago | Probabilistic and graphical models for Python |
pgmpy | 2,748 | 7 days ago | A Python library for working with Probabilistic Graphical Models |
pyAgrum | A GRaphical Universal Modeler | ||
Awesome Python Data Science / Probabilistic Methods | |||
pyro | 8,556 | 19 days ago | A flexible, scalable deep probabilistic programming library built on PyTorch |
PyMC | 8,722 | 3 days ago | Bayesian Stochastic Modelling in Python |
ZhuSuan | Bayesian Deep Learning | ||
GPflow | Gaussian processes in TensorFlow | ||
InferPy | 147 | 4 months ago | Deep Probabilistic Modelling Made Easy |
PyStan | 342 | 5 months ago | Bayesian inference using the No-U-Turn sampler (Python interface) |
sklearn-bayes | 514 | about 3 years ago | Python package for Bayesian Machine Learning with scikit-learn API |
skpro | 249 | 7 days ago | Supervised domain-agnostic prediction framework for probabilistic modelling |
PyVarInf | 359 | about 5 years ago | Bayesian Deep Learning methods with Variational Inference for PyTorch |
emcee | 1,470 | 18 days ago | The Python ensemble sampling toolkit for affine-invariant MCMC |
hsmmlearn | 80 | about 3 years ago | A library for hidden semi-Markov models with explicit durations |
pyhsmm | 550 | about 2 years ago | Bayesian inference in HSMMs and HMMs |
GPyTorch | 3,580 | 20 days ago | A highly efficient and modular implementation of Gaussian Processes in PyTorch |
sklearn-crfsuite | 426 | about 1 year ago | A scikit-learn-inspired API for CRFsuite |
Awesome Python Data Science / Model Explanation | |||
dalex | 1,375 | about 2 months ago | moDel Agnostic Language for Exploration and explanation |
Shapley | 218 | over 1 year ago | A data-driven framework to quantify the value of classifiers in a machine learning ensemble |
Alibi | 2,414 | 4 months ago | Algorithms for monitoring and explaining machine learning models |
anchor | 798 | over 2 years ago | Code for "High-Precision Model-Agnostic Explanations" paper |
aequitas | 694 | 2 months ago | Bias and Fairness Audit Toolkit |
Contrastive Explanation | 45 | almost 2 years ago | Contrastive Explanation (Foil Trees) |
yellowbrick | 4,293 | about 2 months ago | Visual analysis and diagnostic tools to facilitate machine learning model selection |
scikit-plot | 2,427 | 3 months ago | An intuitive library to add plotting functionality to scikit-learn objects |
shap | 22,876 | 12 days ago | A unified approach to explain the output of any machine learning model |
ELI5 | 2,757 | over 2 years ago | A library for debugging/inspecting machine learning classifiers and explaining their predictions |
Lime | 11,615 | 4 months ago | Explaining the predictions of any machine learning classifier |
FairML | 360 | over 3 years ago | A Python toolbox for auditing machine learning models for bias |
L2X | 124 | over 3 years ago | Code for replicating the experiments in the paper |
PDPbox | 845 | 3 months ago | Partial dependence plot toolbox |
PyCEbox | 165 | over 4 years ago | Python Individual Conditional Expectation Plot Toolbox |
Skater | Python Library for Model Interpretation | ||
model-analysis | 1,258 | 16 days ago | Model analysis tools for TensorFlow |
themis-ml | 124 | about 4 years ago | A library that implements fairness-aware machine learning algorithms |
treeinterpreter | 744 | over 1 year ago | Interpreting scikit-learn's decision tree and random forest predictions |
AI Explainability 360 | 1,633 | 4 months ago | Interpretability and explainability of data and machine learning models |
Auralisation | 42 | over 7 years ago | Auralisation of learned features in CNN (for audio) |
CapsNet-Visualization | 394 | about 3 years ago | A visualization of the CapsNet layers to better understand how it works |
lucid | 4,673 | almost 2 years ago | A collection of infrastructure and tools for research in neural network interpretability |
Netron | 28,134 | 6 days ago | Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks) |
FlashLight | Visualization Tool for your NeuralNetwork | ||
tensorboard-pytorch | 7,870 | 3 months ago | Tensorboard for PyTorch (and chainer, mxnet, numpy, ...) |
Awesome Python Data Science / Genetic Programming | |||
gplearn | 1,615 | 12 months ago | Genetic Programming in Python |
PyGAD | 1,884 | 2 months ago | Genetic Algorithm in Python |
DEAP | 5,852 | 8 days ago | Distributed Evolutionary Algorithms in Python |
karoo_gp | 161 | about 2 years ago | A Genetic Programming platform for Python with GPU support |
monkeys | 122 | over 6 years ago | A strongly-typed genetic programming framework for Python |
sklearn-genetic | 323 | 10 months ago | Genetic feature selection module for scikit-learn |
Awesome Python Data Science / Optimization | |||
Optuna | 10,910 | 6 days ago | A hyperparameter optimization framework |
pymoo | 2,285 | 3 months ago | Multi-objective Optimization in Python |
pycma | 1,109 | about 1 month ago | Python implementation of CMA-ES |
Spearmint | 1,547 | almost 5 years ago | Bayesian optimization |
BoTorch | 3,102 | 6 days ago | Bayesian optimization in PyTorch |
scikit-opt | 5,282 | 5 months ago | Heuristic Algorithms for optimization |
sklearn-genetic-opt | 314 | about 1 month ago | Hyperparameters tuning and feature selection using evolutionary algorithms |
SMAC3 | 1,085 | 23 days ago | Sequential Model-based Algorithm Configuration |
Optunity | 416 | 12 months ago | A library containing various optimizers for hyperparameter tuning |
hyperopt | 7,258 | 24 days ago | Distributed Asynchronous Hyperparameter Optimization in Python |
hyperopt-sklearn | 1,588 | 5 months ago | Hyper-parameter optimization for sklearn |
sklearn-deap | 771 | 10 months ago | Use evolutionary algorithms instead of gridsearch in scikit-learn |
sigopt_sklearn | 75 | about 1 year ago | SigOpt wrappers for scikit-learn methods |
Bayesian Optimization | 7,919 | about 1 month ago | A Python implementation of global optimization with Gaussian processes |
SafeOpt | 141 | about 2 years ago | Safe Bayesian Optimization |
scikit-optimize | 2,744 | 9 months ago | Sequential model-based optimization with a scipy.optimize interface |
Solid | 576 | over 5 years ago | A comprehensive gradient-free optimization framework written in Python |
PySwarms | 1,283 | 4 months ago | A research toolkit for particle swarm optimization in Python |
Platypus | 573 | about 2 months ago | A Free and Open Source Python Library for Multiobjective Optimization |
GPflowOpt | 270 | almost 4 years ago | Bayesian Optimization using GPflow |
POT | 2,431 | 14 days ago | Python Optimal Transport library |
Talos | 1,625 | 7 months ago | Hyperparameter Optimization for Keras Models |
nlopt | 1,892 | 7 days ago | Library for nonlinear optimization (global and local, constrained or unconstrained) |
OR-Tools | An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi | ||
Awesome Python Data Science / Feature Engineering / General | |||
Featuretools | 7,270 | 8 days ago | Automated feature engineering |
Feature Engine | 1,926 | 13 days ago | Feature engineering package with sklearn-like functionality |
OpenFE | 782 | 6 months ago | Automated feature generation with expert-level performance |
skl-groups | 41 | over 8 years ago | A scikit-learn addon to operate on set/"group"-based features |
Feature Forge | 382 | almost 7 years ago | A set of tools for creating and testing machine learning features |
few | 51 | over 4 years ago | A feature engineering wrapper for sklearn |
scikit-mdr | 126 | over 1 year ago | A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction |
tsfresh | 8,435 | 7 days ago | Automatic extraction of relevant features from time series |
dirty_cat | 16 | over 1 year ago | Machine learning on dirty tabular data (especially: string-based variables for classification and regression) |
NitroFE | 106 | over 2 years ago | Moving window features |
sk-transformer | 8 | 10 days ago | A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps |
Awesome Python Data Science / Feature Engineering / Feature Selection | |||
scikit-feature | 1,509 | 4 months ago | Feature selection repository in Python |
boruta_py | 1,511 | 3 months ago | Implementations of the Boruta all-relevant feature selection method |
BoostARoota | 219 | over 3 years ago | A fast xgboost feature selection algorithm |
scikit-rebate | 409 | almost 2 years ago | A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning |
zoofs | 243 | 4 months ago | A feature selection library based on evolutionary algorithms |
Awesome Python Data Science / Visualization / General Purposes | |||
Matplotlib | 20,294 | 6 days ago | Plotting with Python |
seaborn | 12,575 | 3 months ago | Statistical data visualization using matplotlib |
prettyplotlib | 1,692 | almost 6 years ago | Painlessly create beautiful matplotlib plots |
python-ternary | 733 | 5 months ago | Ternary plotting library for Python with matplotlib |
missingno | 3,961 | 6 months ago | Missing data visualization module for Python |
chartify | 3,535 | about 1 month ago | Python library that makes it easy for data scientists to create charts |
physt | 134 | about 1 month ago | Improved histograms |
Awesome Python Data Science / Visualization / Interactive plots | |||
animatplot | 412 | 3 months ago | A python package for animating plots built on matplotlib |
plotly | A Python library that makes interactive and publication-quality graphs | ||
Bokeh | 19,372 | 8 days ago | Interactive Web Plotting for Python |
Altair | Declarative statistical visualization library for Python. Supports many data transformations within the code that creates a graph | ||
bqplot | 3,627 | 23 days ago | Plotting library for IPython/Jupyter notebooks |
pyecharts | 14,903 | 15 days ago | Migrated from Echarts, a charting and visualization library, to Python's interactive visual drawing library |
Awesome Python Data Science / Visualization / Map | |||
folium | Makes it easy to visualize data on an interactive open street map | ||
geemap | 3,473 | 7 days ago | Python package for interactive mapping with Google Earth Engine (GEE) |
Awesome Python Data Science / Visualization / Automatic Plotting | |||
HoloViews | 2,707 | 5 days ago | Stop plotting your data - annotate your data and let it visualize itself |
AutoViz | 1,729 | 5 months ago | Visualize data automatically with 1 line of code (ideal for machine learning) |
SweetViz | 2,949 | 4 months ago | Visualize and compare datasets, target values and associations, with one line of code |
Awesome Python Data Science / Visualization / NLP | |||
pyLDAvis | 1,805 | 4 months ago | Visualize interactive topic model |
Awesome Python Data Science / Deployment | |||
fastapi | A modern, fast (high-performance) web framework for building APIs with Python | ||
streamlit | Makes it easy to deploy machine learning models | ||
streamsync | 1,328 | 6 days ago | No-code in the front, Python in the back. An open-source framework for creating data apps |
gradio | 33,962 | 6 days ago | Create UIs for your machine learning model in Python in 3 minutes |
Vizro | 2,707 | 6 days ago | A toolkit for creating modular data visualization applications |
datapane | A collection of APIs to turn scripts and notebooks into interactive reports | ||
binder | Enables sharing and executing Jupyter Notebooks | ||
Awesome Python Data Science / Statistics | |||
pandas_summary | 504 | 28 days ago | Extension to pandas dataframes describe function |
Pandas Profiling | 12,536 | 8 days ago | Create HTML profiling reports from pandas DataFrame objects |
statsmodels | 10,151 | 7 days ago | Statistical modeling and econometrics in Python |
stockstats | 1,303 | 11 months ago | Supplies a wrapper, StockDataFrame, based on pandas.DataFrame with inline stock statistics/indicators support |
weightedcalcs | 105 | 11 days ago | A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more |
scikit-posthocs | 348 | 26 days ago | Pairwise Multiple Comparisons Post-hoc Tests |
Alphalens | 3,384 | 9 months ago | Performance analysis of predictive (alpha) stock factors |
Awesome Python Data Science / Data Manipulation / Data Frames | |||
pandas | Powerful Python data analysis toolkit | ||
polars | 30,400 | 4 days ago | A fast multi-threaded, hybrid-out-of-core DataFrame library |
Arctic | 3,055 | 8 months ago | High-performance datastore for time series and tick data |
datatable | 1,817 | 28 days ago | Data.table for Python |
pandas_profiling | 12,536 | 8 days ago | Create HTML profiling reports from pandas DataFrame objects |
cuDF | 8,448 | 4 days ago | GPU DataFrame Library |
blaze | 3,187 | about 1 year ago | NumPy and pandas interface to Big Data |
pandasql | 1,342 | 4 months ago | Allows you to query pandas DataFrames using SQL syntax |
pandas-gbq | 448 | 9 days ago | pandas Google Big Query |
xpandas | 26 | over 2 years ago | Universal 1d/2d data containers with Transformers functionality for data analysis |
pysparkling | 262 | 3 months ago | A pure Python implementation of Apache Spark's RDD and DStream interfaces |
modin | 9,892 | 2 months ago | Speed up your pandas workflows by changing a single line of code |
swifter | 2,540 | 8 months ago | A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner |
pandas-log | 214 | over 3 years ago | A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues |
vaex | 8,297 | about 1 month ago | Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second |
xarray | 3,619 | 6 days ago | Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines |
Awesome Python Data Science / Data Manipulation / Pipelines | |||
pdpipe | 716 | 21 days ago | Easy pipelines for pandas DataFrames |
SSPipe | Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch | ||
pandas-ply | 200 | about 9 years ago | Functional data manipulation for pandas |
Dplython | 764 | almost 8 years ago | Dplyr for Python |
sklearn-pandas | 2,814 | over 1 year ago | pandas integration with sklearn |
Dataset | 201 | 23 days ago | Helps you conveniently work with random or sequential batches of your data and define data processing |
pyjanitor | 1,364 | 6 days ago | Clean APIs for data cleaning |
meza | 416 | 4 months ago | A Python toolkit for processing tabular data |
Prodmodel | 59 | over 2 years ago | Build system for data science pipelines |
dopanda | 473 | about 1 month ago | Hints and tips for using pandas in an analysis environment |
Hamilton | 1,861 | 7 days ago | A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions |
Awesome Python Data Science / Data Manipulation / Data-centric AI | |||
cleanlab | 9,756 | 29 days ago | The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels |
snorkel | 5,809 | 7 months ago | A system for quickly generating training data with weak supervision |
dataprep | 2,068 | 5 months ago | Collect, clean, and visualize your data in Python with a few lines of code |
Awesome Python Data Science / Data Manipulation / Synthetic Data | |||
ydata-synthetic | 1,441 | 13 days ago | A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models |
Awesome Python Data Science / Distributed Computing | |||
Horovod | 14,265 | 3 months ago | Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet |
PySpark | Exposes the Spark programming model to Python | ||
Veles | 906 | about 1 year ago | Distributed machine learning platform |
Jubatus | 707 | over 5 years ago | Framework and Library for Distributed Online Machine Learning |
DMTK | 2,745 | about 6 years ago | Microsoft Distributed Machine Learning Toolkit |
PaddlePaddle | 22,258 | 6 days ago | PArallel Distributed Deep LEarning |
dask-ml | 902 | 4 months ago | Distributed and parallel machine learning |
Distributed | 1,579 | 4 days ago | Distributed computation in Python |
Awesome Python Data Science / Experimentation | |||
mlflow | 18,781 | 6 days ago | Open source platform for the machine learning lifecycle |
Neptune | A lightweight ML experiment tracking, results visualization, and management tool | ||
dvc | 13,899 | 6 days ago | Data Version Control: Git for data and models, ML experiments management |
envd | 2,038 | about 2 months ago | 🏕️ machine learning development environment for data science and AI/ML engineering teams |
Sacred | 4,254 | about 1 month ago | A tool to help you configure, organize, log, and reproduce experiments |
Ax | 2,378 | 6 days ago | Adaptive Experimentation Platform |
Awesome Python Data Science / Data Validation | |||
great_expectations | 9,989 | 4 days ago | Always know what to expect from your data |
pandera | 3,393 | 7 days ago | A lightweight, flexible, and expressive statistical data testing library |
deepchecks | 3,623 | 8 days ago | Validation & testing of ML models and data during model development, deployment, and production |
evidently | 5,391 | 7 days ago | Evaluate and monitor ML models from validation to production |
TensorFlow Data Validation | 765 | 20 days ago | Library for exploring and validating machine learning data |
DataComPy | 485 | 6 days ago | A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy |
Awesome Python Data Science / Evaluation | |||
recmetrics | 569 | 10 months ago | Library of useful metrics and plots for evaluating recommender systems |
Metrics | 1,627 | almost 2 years ago | Machine learning evaluation metrics |
sklearn-evaluation | 3 | almost 2 years ago | Model evaluation made easy: plots, tables, and markdown reports |
AI Fairness 360 | 2,457 | 5 months ago | Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models |
Awesome Python Data Science / Computations | |||
numpy | The fundamental package needed for scientific computing with Python | ||
Dask | 12,593 | 6 days ago | Parallel computing with task scheduling |
bottleneck | 1,073 | about 1 month ago | Fast NumPy array functions written in C |
CuPy | 9,485 | 4 days ago | NumPy-like API accelerated with CUDA |
scikit-tensor | 402 | about 6 years ago | Python library for multilinear algebra and tensor factorizations |
numdifftools | 256 | over 1 year ago | Solve automatic numerical differentiation problems in one or more variables |
quaternion | 612 | 23 days ago | Add built-in support for quaternions to numpy |
adaptive | 1,164 | 10 days ago | Tools for adaptive and parallel sampling of mathematical functions |
NumExpr | 2,238 | 2 months ago | A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results |
Awesome Python Data Science / Web Scraping | |||
BeautifulSoup | The easiest library for beginners to scrape static websites | ||
Scrapy | Fast and extensible scraping library. Lets you write rules and create customized scrapers without touching the core | ||
Selenium | Use the Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way, like a real user | ||
Pattern | 8,750 | 5 months ago | High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization |
twitterscraper | 2,412 | about 2 years ago | Efficient library to scrape Twitter |
Awesome Python Data Science / Spatial Analysis | |||
GeoPandas | 4,519 | 4 days ago | Python tools for geographic data |
PySal | 1,331 | about 1 month ago | Python Spatial Analysis Library |
Awesome Python Data Science / Quantum Computing | |||
qiskit | 5,280 | 3 days ago | Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules |
cirq | 4,282 | 7 days ago | A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits |
PennyLane | 2,355 | 4 days ago | Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations |
QML | 199 | 7 months ago | A Python Toolkit for Quantum Machine Learning |
Awesome Python Data Science / Conversion | |||
sklearn-porter | 1,293 | 5 months ago | Transpile trained scikit-learn estimators to C, Java, JavaScript, and others |
ONNX | 17,938 | 4 days ago | Open Neural Network Exchange |
MMdnn | 5,797 | 6 months ago | A set of tools to help users inter-operate among different deep learning frameworks |
treelite | 738 | 16 days ago | Universal model exchange and serialization format for decision tree forests |