awesome-python-data-science
Data Science Toolbox
A curated list of data science software in Python
Probably the best curated list of data science software in Python.
3k stars
60 watching
346 forks
last commit: about 1 year ago
Linked from 4 awesome lists
awesome, awesome-list, awesome-python, data-analysis, data-science, data-visualization, deep-learning, machine-learning, python, scikit-learn, statistics
Awesome Python Data Science / Machine Learning / General Purpose Machine Learning | |||
| scikit-learn | Machine learning in Python | ||
| PyCaret | 9,026 | 11 months ago | An open-source, low-code machine learning library in Python |
| Shogun | 3,032 | almost 2 years ago | Machine learning toolbox |
| xLearn | 3,087 | about 2 years ago | High Performance, Easy-to-use, and Scalable Machine Learning Package |
| cuML | 4,292 | 11 months ago | RAPIDS Machine Learning Library |
| modAL | 2,239 | over 1 year ago | Modular active learning framework for Python3 |
| Sparkit-learn | 1,154 | almost 5 years ago | PySpark + scikit-learn = Sparkit-learn |
| mlpack | 5,151 | 11 months ago | A scalable C++ machine learning library (Python bindings) |
| dlib | 13,623 | 12 months ago | Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings) |
| MLxtend | 4,926 | 12 months ago | Extension and helper modules for Python's data analysis and machine learning libraries |
| hyperlearn | 1,871 | 12 months ago | 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels |
| Reproducible Experiment Platform (REP) | 689 | over 1 year ago | Machine Learning toolbox for Humans |
| scikit-multilearn | 921 | almost 2 years ago | Multi-label classification for python |
| seqlearn | 691 | over 2 years ago | Sequence classification toolkit for Python |
| pystruct | 664 | about 4 years ago | Simple structured learning framework for Python |
| sklearn-expertsys | 489 | about 8 years ago | Highly interpretable classifiers for scikit learn |
| RuleFit | 411 | about 2 years ago | Implementation of the RuleFit algorithm |
| metric-learn | 1,402 | over 1 year ago | Metric learning algorithms in Python |
| pyGAM | 876 | over 1 year ago | Generalized Additive Models in Python |
| causalml | 5,132 | 11 months ago | Uplift modeling and causal inference with machine learning algorithms |
Awesome Python Data Science / Machine Learning / Gradient Boosting | |||
| XGBoost | 26,396 | 11 months ago | Scalable, Portable, and Distributed Gradient Boosting |
| LightGBM | 16,769 | 11 months ago | A fast, distributed, high-performance gradient boosting |
| CatBoost | 8,139 | 11 months ago | An open-source gradient boosting on decision trees library |
| ThunderGBM | 695 | almost 2 years ago | Fast GBDTs and Random Forests on GPUs |
| NGBoost | 1,663 | about 1 year ago | Natural Gradient Boosting for Probabilistic Prediction |
| TensorFlow Decision Forests | 666 | 12 months ago | A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras |
Awesome Python Data Science / Machine Learning / Ensemble Methods | |||
| ML-Ensemble | High performance ensemble learning | ||
| Stacking | 222 | almost 8 years ago | Simple and useful stacking library written in Python |
| stacked_generalization | 117 | over 6 years ago | Library for machine learning stacking generalization |
| vecstack | 688 | about 1 year ago | Python package for stacking (machine learning technique) |
Awesome Python Data Science / Machine Learning / Imbalanced Datasets | |||
| imbalanced-learn | 6,875 | 11 months ago | Module to perform under-sampling and over-sampling with various techniques |
| imbalanced-algorithms | 235 | almost 4 years ago | Python-based implementations of algorithms for learning on imbalanced data |
Awesome Python Data Science / Machine Learning / Random Forests | |||
| rpforest | 225 | over 5 years ago | A forest of random projection trees |
| sklearn-random-bits-forest | 9 | over 9 years ago | Wrapper of the Random Bits Forest program written by (Wang et al., 2016) |
| rgf_python | 379 | almost 4 years ago | Python Wrapper of Regularized Greedy Forest |
Awesome Python Data Science / Machine Learning / Kernel Methods | |||
| pyFM | 923 | about 5 years ago | Factorization machines in python |
| fastFM | 1,078 | over 3 years ago | A library for Factorization Machines |
| tffm | 780 | almost 4 years ago | TensorFlow implementation of an arbitrary order Factorization Machine |
| liquidSVM | 66 | over 5 years ago | An implementation of SVMs |
| scikit-rvm | 231 | over 8 years ago | Relevance Vector Machine implementation using the scikit-learn API |
| ThunderSVM | 1,571 | over 1 year ago | A fast SVM Library on GPUs and CPUs |
Awesome Python Data Science / Deep Learning / PyTorch | |||
| PyTorch | 84,978 | 11 months ago | Tensors and Dynamic neural networks in Python with strong GPU acceleration |
| pytorch-lightning | 28,636 | 11 months ago | PyTorch Lightning is just organized PyTorch |
| ignite | 4,554 | 11 months ago | High-level library to help with training neural networks in PyTorch |
| skorch | 5,911 | 11 months ago | A scikit-learn compatible neural network library that wraps PyTorch |
| Catalyst | 3,300 | over 1 year ago | High-level utils for PyTorch DL & RL research |
| ChemicalX | 719 | about 2 years ago | A PyTorch-based deep learning library for drug pair scoring |
Awesome Python Data Science / Deep Learning / TensorFlow | |||
| TensorFlow | 186,822 | 11 months ago | Computation using data flow graphs for scalable machine learning by Google |
| TensorLayer | 7,337 | over 2 years ago | Deep Learning and Reinforcement Learning Library for Researcher and Engineer |
| TFLearn | 9,621 | over 1 year ago | Deep learning library featuring a higher-level API for TensorFlow |
| Sonnet | 9,790 | 12 months ago | TensorFlow-based neural network library |
| tensorpack | 6,303 | over 2 years ago | A Neural Net Training Interface on TensorFlow |
| Polyaxon | 3,581 | 11 months ago | A platform that helps you build, manage and monitor deep learning models |
| tfdeploy | 353 | over 1 year ago | Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy |
| tensorflow-upstream | 688 | 11 months ago | TensorFlow ROCm port |
| TensorFlow Fold | 1,826 | over 4 years ago | Deep learning with dynamic computation graphs in TensorFlow |
| TensorLight | 11 | about 3 years ago | A high-level framework for TensorFlow |
| Mesh TensorFlow | 1,597 | almost 2 years ago | Model Parallelism Made Easier |
| Ludwig | 11,236 | 11 months ago | A toolbox that allows one to train and test deep learning models without the need to write code |
| Keras | A high-level neural networks API running on top of TensorFlow | ||
| keras-contrib | 1,579 | about 3 years ago | Keras community contributions |
| Hyperas | 2,179 | almost 3 years ago | Keras + Hyperopt: A straightforward wrapper for a convenient hyperparameter |
| Elephas | 1,574 | over 2 years ago | Distributed Deep learning with Keras & Spark |
| qkeras | 541 | about 1 year ago | A quantization deep learning library |
Awesome Python Data Science / Deep Learning / MXNet | |||
| MXNet | 20,791 | about 2 years ago | Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler |
| Gluon | 2,300 | about 6 years ago | A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet) |
| Xfer | 253 | over 2 years ago | Transfer Learning library for Deep Neural Networks |
| MXNet | 28 | almost 6 years ago | HIP Port of MXNet |
Awesome Python Data Science / Deep Learning / JAX | |||
| JAX | 30,744 | 11 months ago | Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more |
| FLAX | 6,196 | 11 months ago | A neural network library for JAX that is designed for flexibility |
| Optax | 1,730 | 11 months ago | A gradient processing and optimization library for JAX |
Awesome Python Data Science / Deep Learning / Others | |||
| transformers | 136,357 | 11 months ago | State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX |
| Tangent | 2,314 | about 3 years ago | Source-to-Source Debuggable Derivatives in Pure Python |
| autograd | 7,049 | 11 months ago | Efficiently computes derivatives of numpy code |
| Caffe | 34,149 | over 1 year ago | A fast open framework for deep learning |
| nnabla | 2,729 | 12 months ago | Neural Network Libraries by Sony |
Awesome Python Data Science / Automated Machine Learning | |||
| auto-sklearn | 7,667 | 11 months ago | An AutoML toolkit and a drop-in replacement for a scikit-learn estimator |
| Auto-PyTorch | 2,385 | over 1 year ago | Automatic architecture search and hyperparameter optimization for PyTorch |
| AutoKeras | 9,172 | 11 months ago | AutoML library for deep learning |
| AutoGluon | 8,167 | 11 months ago | AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data |
| TPOT | 9,776 | over 1 year ago | AutoML tool that optimizes machine learning pipelines using genetic programming |
| MLBox | 1,500 | about 2 years ago | A powerful Automated Machine Learning python library |
Awesome Python Data Science / Natural Language Processing | |||
| torchtext | 3,524 | 11 months ago | Data loaders and abstractions for text and NLP |
| gluon-nlp | 2,560 | about 2 years ago | NLP made easy |
| KerasNLP | 818 | 11 months ago | Modular Natural Language Processing workflows with Keras |
| spaCy | Industrial-Strength Natural Language Processing | ||
| NLTK | 13,694 | 12 months ago | Modules, data sets, and tutorials supporting research and development in Natural Language Processing |
| CLTK | 843 | 11 months ago | The Classical Language Toolkit |
| gensim | Topic Modelling for Humans | ||
| pyMorfologik | 18 | about 10 years ago | Python binding for |
| skift | 233 | over 3 years ago | Scikit-learn wrappers for Python fastText |
| Phonemizer | 1,249 | about 1 year ago | Simple text-to-phonemes converter for multiple languages |
| flair | 13,990 | 11 months ago | Very simple framework for state-of-the-art NLP |
Awesome Python Data Science / Computer Audition | |||
| torchaudio | 2,561 | 11 months ago | An audio library for PyTorch |
| librosa | 7,237 | 11 months ago | Python library for audio and music analysis |
| Yaafe | 244 | over 4 years ago | Audio features extraction |
| aubio | 3,336 | over 1 year ago | A library for audio and music analysis |
| Essentia | 2,889 | about 1 year ago | Library for audio and music analysis, description, and synthesis |
| LibXtract | 227 | over 5 years ago | A simple, portable, lightweight library of audio feature extraction functions |
| Marsyas | 407 | over 2 years ago | Music Analysis, Retrieval, and Synthesis for Audio Signals |
| muda | 233 | over 4 years ago | A library for augmenting annotated audio data |
| madmom | 1,366 | about 1 year ago | Python audio and music signal processing library |
Awesome Python Data Science / Computer Vision | |||
| torchvision | 16,364 | 11 months ago | Datasets, Transforms, and Models specific to Computer Vision |
| PyTorch3D | 8,889 | 12 months ago | PyTorch3D is FAIR's library of reusable components for deep learning with 3D data |
| gluon-cv | 5,850 | 12 months ago | Provides implementations of the state-of-the-art deep learning models in computer vision |
| KerasCV | 1,013 | 11 months ago | Industry-strength Computer Vision workflows with Keras |
| OpenCV | 79,662 | 11 months ago | Open Source Computer Vision Library |
| Decord | 1,923 | over 1 year ago | An efficient video loader for deep learning with smart shuffling that's super easy to digest |
| MMEngine | 1,196 | 12 months ago | OpenMMLab Foundational Library for Training Deep Learning Models |
| scikit-image | 6,117 | 11 months ago | Image Processing SciKit (Toolbox for SciPy) |
| imgaug | 14,458 | over 1 year ago | Image augmentation for machine learning experiments |
| imgaug_extension | Additional augmentations for imgaug | ||
| Augmentor | 5,084 | over 1 year ago | Image augmentation library in Python for machine learning |
| albumentations | 14,386 | 11 months ago | Fast image augmentation library and easy-to-use wrapper around other libraries |
| LAVIS | 10,058 | 12 months ago | A One-stop Library for Language-Vision Intelligence |
Awesome Python Data Science / Time Series | |||
| sktime | 8,020 | 11 months ago | A unified framework for machine learning with time series |
| skforecast | 1,189 | 11 months ago | Time series forecasting with machine learning models |
| darts | 8,166 | 11 months ago | A python library for easy manipulation and forecasting of time series |
| statsforecast | 4,045 | 11 months ago | Lightning fast forecasting with statistical and econometric models |
| mlforecast | 924 | 11 months ago | Scalable machine learning-based time series forecasting |
| neuralforecast | 3,181 | 11 months ago | Scalable machine learning-based time series forecasting |
| tslearn | 2,924 | over 1 year ago | Machine learning toolkit dedicated to time-series data |
| tick | 495 | 11 months ago | Module for statistical learning, with a particular emphasis on time-dependent modeling |
| greykite | 1,815 | over 1 year ago | A flexible, intuitive, and fast forecasting library |
| Prophet | 18,627 | about 1 year ago | Automatic Forecasting Procedure |
| PyFlux | 2,114 | about 2 years ago | Open source time series library for Python |
| bayesloop | 156 | over 1 year ago | Probabilistic programming framework that facilitates objective model selection for time-varying parameter models |
| luminol | 1,193 | over 2 years ago | Anomaly Detection and Correlation library |
| dateutil | Powerful extensions to the standard datetime module | ||
| maya | 3,414 | over 1 year ago | Makes it very easy to parse a string and change timezones |
| Chaos Genius | 744 | about 1 year ago | ML powered analytics engine for outlier/anomaly detection and root cause analysis |
Awesome Python Data Science / Reinforcement Learning | |||
| Gymnasium | 7,613 | 11 months ago | An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly ) |
| PettingZoo | 2,678 | 11 months ago | An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities |
| MAgent2 | 240 | about 1 year ago | An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments |
| Stable Baselines3 | 9,329 | 11 months ago | A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines |
| Shimmy | 143 | about 1 year ago | An API conversion tool for popular external reinforcement learning environments |
| EnvPool | 1,108 | about 1 year ago | C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments |
| RLlib | Scalable Reinforcement Learning | ||
| Tianshou | 8,069 | 11 months ago | An elegant PyTorch deep reinforcement learning library |
| Acme | 3,542 | about 1 year ago | A library of reinforcement learning components and agents |
| Catalyst-RL | 46 | about 4 years ago | PyTorch framework for RL research |
| d3rlpy | 1,349 | 12 months ago | An offline deep reinforcement learning library |
| DI-engine | 3,143 | 11 months ago | OpenDILab Decision AI Engine |
| TF-Agents | 2,816 | 11 months ago | A library for Reinforcement Learning in TensorFlow |
| TensorForce | 3,299 | over 1 year ago | A TensorFlow library for applied reinforcement learning |
| TRFL | 3,136 | almost 3 years ago | TensorFlow Reinforcement Learning |
| Dopamine | 10,591 | about 1 year ago | A research framework for fast prototyping of reinforcement learning algorithms |
| keras-rl | 5,530 | about 2 years ago | Deep Reinforcement Learning for Keras |
| garage | 1,893 | over 2 years ago | A toolkit for reproducible reinforcement learning research |
| Horizon | 3,575 | 12 months ago | A platform for Applied Reinforcement Learning |
| rlpyt | 2,236 | almost 5 years ago | Reinforcement Learning in PyTorch |
| cleanrl | 5,891 | 12 months ago | High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG) |
| Machin | 402 | about 4 years ago | A reinforcement learning library designed for PyTorch |
| SKRL | 588 | 11 months ago | Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym |
| Imitation | 1,350 | about 1 year ago | Clean PyTorch implementations of imitation and reward learning algorithms |
Awesome Python Data Science / Graph Machine Learning | |||
| pytorch_geometric | 21,597 | 11 months ago | Geometric Deep Learning Extension Library for PyTorch |
| pytorch_geometric_temporal | 2,694 | about 1 year ago | Temporal Extension Library for PyTorch Geometric |
| PyTorch Geometric Signed Directed | 131 | over 1 year ago | A signed/directed graph neural network extension library for PyTorch Geometric |
| dgl | 13,601 | about 1 year ago | Python package built to ease deep learning on graph, on top of existing DL frameworks |
| Spektral | 2,372 | almost 2 years ago | Deep learning on graphs |
| StellarGraph | 2,957 | over 1 year ago | Machine Learning on Graphs |
| Graph Nets | 5,370 | almost 3 years ago | Build Graph Nets in TensorFlow |
| TensorFlow GNN | 1,372 | 11 months ago | A library to build Graph Neural Networks on the TensorFlow platform |
| Auto Graph Learning | 1,094 | about 1 year ago | An autoML framework & toolkit for machine learning on graphs |
| PyTorch-BigGraph | 3,389 | over 1 year ago | Generate embeddings from large-scale graph-structured data |
| Karate Club | 2,178 | over 1 year ago | An unsupervised machine learning library for graph-structured data |
| Little Ball of Fur | 705 | almost 2 years ago | A library for sampling graph structured data |
| GreatX | 85 | about 1 year ago | A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG) |
| Jraph | 1,380 | over 1 year ago | A Graph Neural Network Library in JAX |
Awesome Python Data Science / Learning-to-Rank & Recommender Systems | |||
| LightFM | 4,790 | over 1 year ago | A Python implementation of LightFM, a hybrid recommendation algorithm |
| Spotlight | Deep recommender models using PyTorch | ||
| Surprise | 6,434 | over 1 year ago | A Python scikit for building and analyzing recommender systems |
| RecBole | 3,497 | about 1 year ago | A unified, comprehensive and efficient recommendation library |
| allRank | 886 | about 1 year ago | allRank is a framework for training learning-to-rank neural models based on PyTorch |
| TensorFlow Recommenders | 1,869 | 11 months ago | A library for building recommender system models using TensorFlow |
| TensorFlow Ranking | 2,750 | over 1 year ago | Learning to Rank in TensorFlow |
Awesome Python Data Science / Probabilistic Graphical Models | |||
| pomegranate | 3,389 | about 1 year ago | Probabilistic and graphical models for Python |
| pgmpy | 2,776 | 11 months ago | A python library for working with Probabilistic Graphical Models |
| pyAgrum | A GRaphical Universal Modeler | ||
Awesome Python Data Science / Probabilistic Methods | |||
| pyro | 8,604 | 11 months ago | A flexible, scalable deep probabilistic programming library built on PyTorch |
| PyMC | 8,786 | 11 months ago | Bayesian Stochastic Modelling in Python |
| ZhuSuan | Bayesian Deep Learning | ||
| GPflow | Gaussian processes in TensorFlow | ||
| InferPy | 149 | over 1 year ago | Deep Probabilistic Modelling Made Easy |
| PyStan | 343 | over 1 year ago | Bayesian inference using the No-U-Turn sampler (Python interface) |
| sklearn-bayes | 514 | about 4 years ago | Python package for Bayesian Machine Learning with scikit-learn API |
| skpro | 250 | 11 months ago | Supervised domain-agnostic prediction framework for probabilistic modelling by |
| PyVarInf | 359 | about 6 years ago | Bayesian Deep Learning methods with Variational Inference for PyTorch |
| emcee | 1,478 | 11 months ago | The Python ensemble sampling toolkit for affine-invariant MCMC |
| hsmmlearn | 81 | about 4 years ago | A library for hidden semi-Markov models with explicit durations |
| pyhsmm | 549 | about 3 years ago | Bayesian inference in HSMMs and HMMs |
| GPyTorch | 3,605 | 11 months ago | A highly efficient and modular implementation of Gaussian Processes in PyTorch |
| sklearn-crfsuite | 425 | about 2 years ago | A scikit-learn-inspired API for CRFsuite |
Awesome Python Data Science / Model Explanation | |||
| dalex | 1,390 | about 1 year ago | moDel Agnostic Language for Exploration and explanation |
| Shapley | 219 | over 2 years ago | A data-driven framework to quantify the value of classifiers in a machine learning ensemble |
| Alibi | 2,421 | 11 months ago | Algorithms for monitoring and explaining machine learning models |
| anchor | 798 | over 3 years ago | Code for "High-Precision Model-Agnostic Explanations" paper |
| aequitas | 701 | about 1 year ago | Bias and Fairness Audit Toolkit |
| Contrastive Explanation | 45 | almost 3 years ago | Contrastive Explanation (Foil Trees) |
| yellowbrick | 4,304 | about 1 year ago | Visual analysis and diagnostic tools to facilitate machine learning model selection |
| scikit-plot | 2,432 | about 1 year ago | An intuitive library to add plotting functionality to scikit-learn objects |
| shap | 23,077 | 11 months ago | A unified approach to explain the output of any machine learning model |
| ELI5 | 2,763 | over 3 years ago | A library for debugging/inspecting machine learning classifiers and explaining their predictions |
| Lime | 11,663 | over 1 year ago | Explaining the predictions of any machine learning classifier |
| FairML | 361 | over 4 years ago | FairML is a Python toolbox for auditing machine learning models for bias |
| L2X | 123 | over 4 years ago | Code for replicating the experiments in the paper |
| PDPbox | 846 | about 1 year ago | Partial dependence plot toolbox |
| PyCEbox | 164 | over 5 years ago | Python Individual Conditional Expectation Plot Toolbox |
| Skater | Python Library for Model Interpretation | ||
| model-analysis | 1,258 | 11 months ago | Model analysis tools for TensorFlow |
| themis-ml | 125 | about 5 years ago | A library that implements fairness-aware machine learning algorithms |
| treeinterpreter | 745 | over 2 years ago | Interpreting scikit-learn's decision tree and random forest predictions |
| AI Explainability 360 | 1,641 | over 1 year ago | Interpretability and explainability of data and machine learning models |
| Auralisation | 42 | over 8 years ago | Auralisation of learned features in CNN (for audio) |
| CapsNet-Visualization | 394 | about 4 years ago | A visualization of the CapsNet layers to better understand how it works |
| lucid | 4,678 | over 2 years ago | A collection of infrastructure and tools for research in neural network interpretability |
| Netron | 28,684 | 11 months ago | Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks) |
| FlashLight | Visualization Tool for your NeuralNetwork | ||
| tensorboard-pytorch | 7,887 | 11 months ago | Tensorboard for PyTorch (and chainer, mxnet, numpy, ...) |
Awesome Python Data Science / Genetic Programming | |||
| gplearn | 1,636 | almost 2 years ago | Genetic Programming in Python |
| PyGAD | 1,905 | 11 months ago | Genetic Algorithm in Python |
| DEAP | 5,891 | 12 months ago | Distributed Evolutionary Algorithms in Python |
| karoo_gp | 161 | about 3 years ago | A Genetic Programming platform for Python with GPU support |
| monkeys | 122 | over 7 years ago | A strongly-typed genetic programming framework for Python |
| sklearn-genetic | 323 | almost 2 years ago | Genetic feature selection module for scikit-learn |
Awesome Python Data Science / Optimization | |||
| Optuna | 11,082 | 11 months ago | A hyperparameter optimization framework |
| pymoo | 2,333 | 11 months ago | Multi-objective Optimization in Python |
| pycma | 1,123 | about 1 year ago | Python implementation of CMA-ES |
| Spearmint | 1,550 | almost 6 years ago | Bayesian optimization |
| BoTorch | 3,126 | 11 months ago | Bayesian optimization in PyTorch |
| scikit-opt | 5,316 | over 1 year ago | Heuristic Algorithms for optimization |
| sklearn-genetic-opt | 316 | about 1 year ago | Hyperparameters tuning and feature selection using evolutionary algorithms |
| SMAC3 | 1,093 | 11 months ago | Sequential Model-based Algorithm Configuration |
| Optunity | 417 | almost 2 years ago | A library containing various optimizers for hyperparameter tuning |
| hyperopt | 7,295 | about 1 year ago | Distributed Asynchronous Hyperparameter Optimization in Python |
| hyperopt-sklearn | 1,594 | over 1 year ago | Hyper-parameter optimization for sklearn |
| sklearn-deap | 771 | over 1 year ago | Use evolutionary algorithms instead of gridsearch in scikit-learn |
| sigopt_sklearn | 75 | about 2 years ago | SigOpt wrappers for scikit-learn methods |
| Bayesian Optimization | 7,978 | 11 months ago | A Python implementation of global optimization with gaussian processes |
| SafeOpt | 141 | almost 3 years ago | Safe Bayesian Optimization |
| scikit-optimize | 2,748 | over 1 year ago | Sequential model-based optimization with a interface |
| Solid | 575 | over 6 years ago | A comprehensive gradient-free optimization framework written in Python |
| PySwarms | 1,295 | about 1 year ago | A research toolkit for particle swarm optimization in Python |
| Platypus | 579 | about 1 year ago | A Free and Open Source Python Library for Multiobjective Optimization |
| GPflowOpt | 270 | almost 5 years ago | Bayesian Optimization using GPflow |
| POT | 2,454 | 11 months ago | Python Optimal Transport library |
| Talos | 1,626 | over 1 year ago | Hyperparameter Optimization for Keras Models |
| nlopt | 1,908 | 11 months ago | Library for nonlinear optimization (global and local, constrained or unconstrained) |
| OR-Tools | An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi | ||
Awesome Python Data Science / Feature Engineering / General | |||
| Featuretools | 7,304 | 11 months ago | Automated feature engineering |
| Feature Engine | 1,956 | 12 months ago | Feature engineering package with sklearn-like functionality |
| OpenFE | 806 | over 1 year ago | Automated feature generation with expert-level performance |
| skl-groups | 41 | about 9 years ago | A scikit-learn addon to operate on set/"group"-based features |
| Feature Forge | 382 | almost 8 years ago | A set of tools for creating and testing machine learning features |
| few | 51 | over 5 years ago | A feature engineering wrapper for sklearn |
| scikit-mdr | 126 | over 2 years ago | A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction |
| tsfresh | 8,486 | 12 months ago | Automatic extraction of relevant features from time series |
| dirty_cat | 17 | 11 months ago | Machine learning on dirty tabular data (especially: string-based variables for classification and regression) |
| NitroFE | 106 | over 3 years ago | Moving window features |
| sk-transformer | 10 | 11 months ago | A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps |
Awesome Python Data Science / Feature Engineering / Feature Selection | |||
| scikit-feature | 1,513 | over 1 year ago | Feature selection repository in Python |
| boruta_py | 1,529 | about 1 year ago | Implementations of the Boruta all-relevant feature selection method |
| BoostARoota | 219 | over 4 years ago | A fast xgboost feature selection algorithm |
| scikit-rebate | 413 | over 2 years ago | A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning |
| zoofs | 245 | 11 months ago | A feature selection library based on evolutionary algorithms |
Awesome Python Data Science / Visualization / General Purposes | |||
| Matplotlib | 20,443 | 11 months ago | Plotting with Python |
| seaborn | 12,669 | 11 months ago | Statistical data visualization using matplotlib |
| prettyplotlib | 1,695 | almost 7 years ago | Painlessly create beautiful matplotlib plots |
| python-ternary | 744 | over 1 year ago | Ternary plotting library for Python with matplotlib |
| missingno | 3,987 | over 1 year ago | Missing data visualization module for Python |
| chartify | 3,546 | about 1 year ago | Python library that makes it easy for data scientists to create charts |
| physt | 134 | about 1 year ago | Improved histograms |
Awesome Python Data Science / Visualization / Interactive plots | |||
| animatplot | 412 | about 1 year ago | A python package for animating plots built on matplotlib |
| plotly | A Python library that makes interactive and publication-quality graphs | ||
| Bokeh | 19,453 | 11 months ago | Interactive Web Plotting for Python |
| Altair | Declarative statistical visualization library for Python. Supports many data transformations within the code used to create a graph | ||
| bqplot | 3,634 | 11 months ago | Plotting library for IPython/Jupyter notebooks |
| pyecharts | 14,975 | about 1 year ago | Migrated from , a charting and visualization library, to Python's interactive visual drawing library |
Awesome Python Data Science / Visualization / Map | |||
| folium | Makes it easy to visualize data on an interactive open street map | ||
| geemap | 3,515 | 11 months ago | Python package for interactive mapping with Google Earth Engine (GEE) |
Awesome Python Data Science / Visualization / Automatic Plotting | |||
| HoloViews | 2,719 | 11 months ago | Stop plotting your data - annotate your data and let it visualize itself |
| AutoViz | 1,749 | over 1 year ago | Visualize data automatically with 1 line of code (ideal for machine learning) |
| SweetViz | 2,965 | over 1 year ago | Visualize and compare datasets, target values and associations, with one line of code |
Awesome Python Data Science / Visualization / NLP | |||
| pyLDAvis | 1,810 | over 1 year ago | Visualize interactive topic model |
Awesome Python Data Science / Deployment | |||
| fastapi | A modern, fast (high-performance) web framework for building APIs with Python | ||
| streamlit | Makes it easy to deploy machine learning models | ||
| streamsync | 1,340 | 11 months ago | No-code in the front, Python in the back. An open-source framework for creating data apps |
| gradio | 34,557 | 11 months ago | Create UIs for your machine learning model in Python in 3 minutes |
| Vizro | 2,736 | 11 months ago | A toolkit for creating modular data visualization applications |
| datapane | A collection of APIs to turn scripts and notebooks into interactive reports | ||
| binder | Enables sharing and executing Jupyter Notebooks | ||
Awesome Python Data Science / Statistics | |||
| pandas_summary | 510 | about 1 year ago | Extension to pandas dataframes describe function |
| Pandas Profiling | 12,602 | 11 months ago | Create HTML profiling reports from pandas DataFrame objects |
| statsmodels | 10,245 | 11 months ago | Statistical modeling and econometrics in Python |
| stockstats | 1,312 | almost 2 years ago | Supply a wrapper based on the with inline stock statistics/indicators support |
| weightedcalcs | 107 | 12 months ago | A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more |
| scikit-posthocs | 354 | 11 months ago | Pairwise Multiple Comparisons Post-hoc Tests |
| Alphalens | 3,433 | over 1 year ago | Performance analysis of predictive (alpha) stock factors |
Awesome Python Data Science / Data Manipulation / Data Frames | |||
| pandas | Powerful Python data analysis toolkit | ||
| polars | 30,943 | 11 months ago | A fast multi-threaded, hybrid-out-of-core DataFrame library |
| Arctic | 3,059 | over 1 year ago | High-performance datastore for time series and tick data |
| datatable | 1,821 | about 1 year ago | Data.table for Python |
| pandas_profiling | 12,602 | 11 months ago | Create HTML profiling reports from pandas DataFrame objects |
| cuDF | 8,534 | 11 months ago | GPU DataFrame Library |
| blaze | 3,185 | about 2 years ago | NumPy and pandas interface to Big Data |
| pandasql | 1,345 | over 1 year ago | Allows you to query pandas DataFrames using SQL syntax |
| pandas-gbq | 451 | 11 months ago | pandas integration with Google BigQuery |
| xpandas | 26 | over 3 years ago | Universal 1d/2d data containers with Transformers functionality for data analysis |
| pysparkling | 262 | about 1 year ago | A pure Python implementation of Apache Spark's RDD and DStream interfaces |
| modin | 9,942 | 11 months ago | Speed up your pandas workflows by changing a single line of code |
| swifter | 2,552 | over 1 year ago | A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner |
| pandas-log | 214 | over 4 years ago | A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues |
| vaex | 8,315 | about 1 year ago | Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second |
| xarray | 3,660 | 11 months ago | Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines |
Awesome Python Data Science / Data Manipulation / Pipelines | |||
| pdpipe | 718 | about 1 year ago | Easy pipelines for pandas DataFrames |
| SSPipe | Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch | ||
| pandas-ply | 199 | about 10 years ago | Functional data manipulation for pandas |
| Dplython | 764 | almost 9 years ago | Dplyr for Python |
| sklearn-pandas | 2,815 | over 2 years ago | pandas integration with sklearn |
| Dataset | 202 | 11 months ago | Helps you conveniently work with random or sequential batches of your data and define data processing |
| pyjanitor | 1,371 | 11 months ago | Clean APIs for data cleaning |
| meza | 417 | over 1 year ago | A Python toolkit for processing tabular data |
| Prodmodel | 58 | over 3 years ago | Build system for data science pipelines |
| dopanda | 475 | 11 months ago | Hints and tips for using pandas in an analysis environment |
| Hamilton | 1,900 | 11 months ago | A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions |
Awesome Python Data Science / Data Manipulation / Data-centric AI | |||
| cleanlab | 9,820 | 11 months ago | The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels |
| snorkel | 5,826 | over 1 year ago | A system for quickly generating training data with weak supervision |
| dataprep | 2,088 | over 1 year ago | Collect, clean, and visualize your data in Python with a few lines of code |
Awesome Python Data Science / Data Manipulation / Synthetic Data | |||
| ydata-synthetic | 1,456 | 11 months ago | A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models |
Awesome Python Data Science / Distributed Computing | |||
| Horovod | 14,305 | 11 months ago | Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet |
| PySpark | Exposes the Spark programming model to Python | ||
| Veles | 905 | almost 2 years ago | Distributed machine learning platform |
| Jubatus | 707 | over 6 years ago | Framework and Library for Distributed Online Machine Learning |
| DMTK | 2,748 | about 7 years ago | Microsoft Distributed Machine Learning Toolkit |
| PaddlePaddle | 22,340 | 11 months ago | PArallel Distributed Deep LEarning |
| dask-ml | 907 | 12 months ago | Distributed and parallel machine learning |
| Distributed | 1,582 | 11 months ago | Distributed computation in Python |
Awesome Python Data Science / Experimentation | |||
| mlflow | 19,021 | 11 months ago | Open source platform for the machine learning lifecycle |
| Neptune | A lightweight ML experiment tracking, results visualization, and management tool | ||
| dvc | 14,016 | 11 months ago | Data Version Control | Git for Data & Models | ML Experiments Management |
| envd | 2,061 | about 1 year ago | 🏕️ machine learning development environment for data science and AI/ML engineering teams |
| Sacred | 4,266 | 12 months ago | A tool to help you configure, organize, log, and reproduce experiments |
| Ax | 2,392 | 11 months ago | Adaptive Experimentation Platform |
Awesome Python Data Science / Data Validation | |||
| great_expectations | 10,054 | 11 months ago | Always know what to expect from your data |
| pandera | 3,472 | 11 months ago | A lightweight, flexible, and expressive statistical data testing library |
| deepchecks | 3,650 | 11 months ago | Validation & testing of ML models and data during model development, deployment, and production |
| evidently | 5,519 | 11 months ago | Evaluate and monitor ML models from validation to production |
| TensorFlow Data Validation | 766 | 12 months ago | Library for exploring and validating machine learning data |
| DataComPy | 487 | 11 months ago | A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy |
Awesome Python Data Science / Evaluation | |||
| recmetrics | 571 | almost 2 years ago | Library of useful metrics and plots for evaluating recommender systems |
| Metrics | 1,632 | almost 3 years ago | Machine learning evaluation metrics |
| sklearn-evaluation | 3 | almost 3 years ago | Model evaluation made easy: plots, tables, and markdown reports |
| AI Fairness 360 | 2,483 | 11 months ago | Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models |
Awesome Python Data Science / Computations | |||
| numpy | The fundamental package needed for scientific computing with Python | ||
| Dask | 12,691 | 11 months ago | Parallel computing with task scheduling |
| bottleneck | 1,077 | about 1 year ago | Fast NumPy array functions written in C |
| CuPy | 9,586 | 11 months ago | NumPy-like API accelerated with CUDA |
| scikit-tensor | 403 | about 7 years ago | Python library for multilinear algebra and tensor factorizations |
| numdifftools | 258 | over 2 years ago | Solve automatic numerical differentiation problems in one or more variables |
| quaternion | 614 | about 1 year ago | Add built-in support for quaternions to numpy |
| adaptive | 1,168 | 11 months ago | Tools for adaptive and parallel sampling of mathematical functions |
| NumExpr | 2,255 | 12 months ago | A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results |
Awesome Python Data Science / Web Scraping | |||
| BeautifulSoup | The easiest library for beginners to scrape static websites | ||
| Scrapy | Fast and extensible scraping library. Lets you write rules and create customized scrapers without touching the core | ||
| Selenium | Use the Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way, like a real user | ||
| Pattern | 8,758 | over 1 year ago | High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization |
| twitterscraper | 2,414 | about 3 years ago | Efficient library to scrape Twitter |
Awesome Python Data Science / Spatial Analysis | |||
| GeoPandas | 4,559 | 11 months ago | Python tools for geographic data |
| PySal | 1,346 | 12 months ago | Python Spatial Analysis Library |
Awesome Python Data Science / Quantum Computing | |||
| qiskit | 5,404 | 11 months ago | Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules |
| cirq | 4,347 | 11 months ago | A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits |
| PennyLane | 2,409 | 11 months ago | Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations |
| QML | 199 | 11 months ago | A Python Toolkit for Quantum Machine Learning |
Awesome Python Data Science / Conversion | |||
| sklearn-porter | 1,294 | over 1 year ago | Transpile trained scikit-learn estimators to C, Java, JavaScript, and others |
| ONNX | 18,098 | 11 months ago | Open Neural Network Exchange |
| MMdnn | 5,802 | over 1 year ago | A set of tools to help users inter-operate among different deep learning frameworks |
| treelite | 742 | 12 months ago | Universal model exchange and serialization format for decision tree forests |