awesome-python-data-science

Data Science Toolbox

A curated list of data science software in Python

Probably the best curated list of data science software in Python.

GitHub: 3k stars, 59 watching, 345 forks; last commit about 2 months ago; linked from 4 awesome lists

Topics: awesome, awesome-list, awesome-python, data-analysis, data-science, data-visualization, deep-learning, machine-learning, python, scikit-learn, statistics

Awesome Python Data Science / Machine Learning / General Purpose Machine Learning

scikit-learn Machine learning in Python
PyCaret 8,955 13 days ago An open-source, low-code machine learning library in Python
Shogun 3,034 11 months ago Machine learning toolbox
xLearn 3,087 about 1 year ago High Performance, Easy-to-use, and Scalable Machine Learning Package
cuML 4,238 7 days ago RAPIDS Machine Learning Library
modAL 2,228 9 months ago Modular active learning framework for Python3
Sparkit-learn 1,154 almost 4 years ago PySpark + scikit-learn = Sparkit-learn
mlpack 5,113 8 days ago A scalable C++ machine learning library (Python bindings)
dlib 13,561 29 days ago Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings)
MLxtend 4,907 7 days ago Extension and helper modules for Python's data analysis and machine learning libraries
hyperlearn 1,842 about 1 month ago 50%+ faster, 50%+ less RAM usage, GPU-supported rewrite of Sklearn and Statsmodels
Reproducible Experiment Platform (REP) 689 4 months ago Machine Learning toolbox for Humans
scikit-multilearn 921 10 months ago Multi-label classification for Python
seqlearn 688 over 1 year ago Sequence classification toolkit for Python
pystruct 665 about 3 years ago Simple structured learning framework for Python
sklearn-expertsys 489 over 7 years ago Highly interpretable classifiers for scikit-learn
RuleFit 411 about 1 year ago Implementation of the RuleFit algorithm
metric-learn 1,399 4 months ago Metric learning algorithms in Python
pyGAM 875 5 months ago Generalized Additive Models in Python
causalml 5,095 13 days ago Uplift modeling and causal inference with machine learning algorithms

Awesome Python Data Science / Machine Learning / Gradient Boosting

XGBoost 26,299 6 days ago Scalable, Portable, and Distributed Gradient Boosting
LightGBM 16,694 6 days ago A fast, distributed, high-performance gradient boosting framework
CatBoost 8,088 6 days ago An open-source gradient boosting on decision trees library
ThunderGBM 693 10 months ago Fast GBDTs and Random Forests on GPUs
NGBoost 1,654 24 days ago Natural Gradient Boosting for Probabilistic Prediction
TensorFlow Decision Forests 660 10 days ago A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras
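XGBoost, LightGBM, CatBoost, and the other libraries above implement gradient boosting: each new tree is fitted to the negative gradient of the loss with respect to the current predictions, which for squared loss is simply the residual. The following is a minimal pure-Python sketch of that loop, assuming squared loss, a single numeric feature, and depth-1 trees (stumps). `fit_stump` and `gradient_boost` are hypothetical helpers, not any listed library's API, and the libraries themselves add many refinements (deeper trees, regularization, efficient split finding) omitted here.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump (one split) on 1-D inputs.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= t else right_mean


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # initial constant prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual y - prediction
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


x = list(range(10))
y = [0.0] * 5 + [10.0] * 5
model = gradient_boost(x, y)
print(round(model(0), 3), round(model(9), 3))  # 0.026 9.974
```

On this step-shaped toy data each round shrinks the remaining error by a factor of `1 - learning_rate`, so a smaller learning rate needs more rounds to fit the same data.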

Awesome Python Data Science / Machine Learning / Ensemble Methods

ML-Ensemble High performance ensemble learning
Stacking 221 almost 7 years ago Simple and useful stacking library written in Python
stacked_generalization 117 over 5 years ago Library for machine learning stacking generalization
vecstack 685 3 months ago Python package for stacking (machine learning technique)

Awesome Python Data Science / Machine Learning / Imbalanced Datasets

imbalanced-learn 6,847 about 2 months ago Module to perform under-sampling and over-sampling with various techniques
imbalanced-algorithms 235 almost 3 years ago Python-based implementations of algorithms for learning on imbalanced data
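imbalanced-learn provides many resampling strategies; the simplest, random over-sampling, duplicates minority-class rows until the classes are balanced. A minimal stdlib sketch, assuming plain lists for features and labels; `random_oversample` is a hypothetical helper, not imbalanced-learn's API:

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the rows that belong to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw (target - n) extra rows with replacement
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({0: 4, 1: 4})
```

Resampling like this belongs on the training split only; applying it before a train/test split leaks duplicated rows into the evaluation data.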

Awesome Python Data Science / Machine Learning / Random Forests

rpforest 223 almost 5 years ago A forest of random projection trees
sklearn-random-bits-forest 9 over 8 years ago Wrapper of the Random Bits Forest program written by (Wang et al., 2016)
rgf_python 378 almost 3 years ago Python Wrapper of Regularized Greedy Forest

Awesome Python Data Science / Machine Learning / Kernel Methods

pyFM 922 about 4 years ago Factorization machines in Python
fastFM 1,075 over 2 years ago A library for Factorization Machines
tffm 780 almost 3 years ago TensorFlow implementation of an arbitrary order Factorization Machine
liquidSVM 66 almost 5 years ago An implementation of SVMs
scikit-rvm 231 over 7 years ago Relevance Vector Machine implementation using the scikit-learn API
ThunderSVM 1,573 8 months ago A fast SVM Library on GPUs and CPUs
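pyFM, fastFM, and tffm implement factorization machines. A second-order FM predicts ŷ = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, where each feature i has a k-dimensional latent vector v_i; the pairwise sum can be rewritten so that it costs O(k·n) instead of O(k·n²). The plain-Python sketch below, not tied to any listed library, checks that both forms agree; the random parameters are arbitrary.

```python
import random


def fm_predict_naive(x, w0, w, V):
    """Second-order factorization machine, pairwise terms summed directly: O(k * n^2)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # <v_i, v_j> replaces a free weight w_ij for each feature pair
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same model via the O(k * n) identity
    sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


rng = random.Random(42)
n, k = 6, 3  # number of features, latent dimension
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
w = [rng.gauss(0.0, 1.0) for _ in range(n)]
V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
print(fm_predict_naive(x, 0.5, w, V), fm_predict_fast(x, 0.5, w, V))  # equal up to rounding
```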

Awesome Python Data Science / Deep Learning / PyTorch

PyTorch 83,959 6 days ago Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-lightning 28,402 3 days ago PyTorch Lightning is just organized PyTorch
ignite 4,526 14 days ago High-level library to help with training neural networks in PyTorch
skorch 5,881 16 days ago A scikit-learn compatible neural network library that wraps PyTorch
Catalyst 3,295 8 months ago High-level utils for PyTorch DL & RL research
ChemicalX 714 about 1 year ago A PyTorch-based deep learning library for drug pair scoring

Awesome Python Data Science / Deep Learning / TensorFlow

TensorFlow 186,382 6 days ago Computation using data flow graphs for scalable machine learning by Google
TensorLayer 7,334 almost 2 years ago Deep Learning and Reinforcement Learning Library for Researchers and Engineers
TFLearn 9,619 7 months ago Deep learning library featuring a higher-level API for TensorFlow
Sonnet 9,776 7 days ago TensorFlow-based neural network library
tensorpack 6,303 over 1 year ago A Neural Net Training Interface on TensorFlow
Polyaxon 3,571 7 days ago A platform that helps you build, manage and monitor deep learning models
tfdeploy 353 9 months ago Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy
tensorflow-upstream 688 6 days ago TensorFlow ROCm port
TensorFlow Fold 1,827 over 3 years ago Deep learning with dynamic computation graphs in TensorFlow
TensorLight 11 about 2 years ago A high-level framework for TensorFlow
Mesh TensorFlow 1,592 about 1 year ago Model Parallelism Made Easier
Ludwig 11,189 24 days ago A toolbox that allows one to train and test deep learning models without the need to write code
Keras A high-level neural networks API running on top of TensorFlow
keras-contrib 1,581 about 2 years ago Keras community contributions
Hyperas 2,178 almost 2 years ago Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization
Elephas 1,574 over 1 year ago Distributed Deep learning with Keras & Spark
qkeras 540 29 days ago A quantization deep learning library

Awesome Python Data Science / Deep Learning / MXNet

MXNet 20,781 about 1 year ago Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler
Gluon 2,299 over 5 years ago A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet)
Xfer 252 over 1 year ago Transfer Learning library for Deep Neural Networks
MXNet 28 almost 5 years ago HIP Port of MXNet

Awesome Python Data Science / Deep Learning / JAX

JAX 30,499 6 days ago Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
FLAX 6,132 7 days ago A neural network library for JAX that is designed for flexibility
Optax 1,697 9 days ago A gradient processing and optimization library for JAX

Awesome Python Data Science / Deep Learning / Others

transformers 135,022 6 days ago State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Tangent 2,315 about 2 years ago Source-to-Source Debuggable Derivatives in Pure Python
autograd 7,017 10 days ago Efficiently computes derivatives of numpy code
Caffe 34,125 4 months ago A fast open framework for deep learning
nnabla 2,728 6 days ago Neural Network Libraries by Sony

Awesome Python Data Science / Automated Machine Learning

auto-sklearn 7,632 6 days ago An AutoML toolkit and a drop-in replacement for a scikit-learn estimator
Auto-PyTorch 2,376 8 months ago Automatic architecture search and hyperparameter optimization for PyTorch
AutoKeras 9,154 16 days ago AutoML library for deep learning
AutoGluon 8,039 6 days ago AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data
TPOT 9,736 4 months ago AutoML tool that optimizes machine learning pipelines using genetic programming
MLBox 1,500 over 1 year ago A powerful Automated Machine Learning Python library

Awesome Python Data Science / Natural Language Processing

torchtext 3,514 6 days ago Data loaders and abstractions for text and NLP
gluon-nlp 2,557 about 1 year ago NLP made easy
KerasNLP 797 6 days ago Modular Natural Language Processing workflows with Keras
spaCy Industrial-Strength Natural Language Processing
NLTK 13,620 10 days ago Modules, data sets, and tutorials supporting research and development in Natural Language Processing
CLTK 839 3 months ago The Classical Language Toolkit
gensim Topic Modelling for Humans
pyMorfologik 18 over 9 years ago Python binding for
skift 234 over 2 years ago Scikit-learn wrappers for Python fastText
Phonemizer 1,231 about 2 months ago Simple text-to-phonemes converter for multiple languages
flair 13,939 6 days ago Very simple framework for state-of-the-art NLP

Awesome Python Data Science / Computer Audition

torchaudio 2,538 6 days ago An audio library for PyTorch
librosa 7,171 about 1 month ago Python library for audio and music analysis
Yaafe 244 over 3 years ago Audio features extraction
aubio 3,314 4 months ago A library for audio and music analysis
Essentia 2,858 29 days ago Library for audio and music analysis, description, and synthesis
LibXtract 227 over 4 years ago A simple, portable, lightweight library of audio feature extraction functions
Marsyas 406 over 1 year ago Music Analysis, Retrieval, and Synthesis for Audio Signals
muda 233 over 3 years ago A library for augmenting annotated audio data
madmom 1,347 3 months ago Python audio and music signal processing library

Awesome Python Data Science / Computer Vision

torchvision 16,251 6 days ago Datasets, Transforms, and Models specific to Computer Vision
PyTorch3D 8,806 15 days ago PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
gluon-cv 5,833 7 months ago Provides implementations of the state-of-the-art deep learning models in computer vision
KerasCV 1,010 20 days ago Industry-strength Computer Vision workflows with Keras
OpenCV 79,147 5 days ago Open Source Computer Vision Library
Decord 1,891 4 months ago An efficient video loader for deep learning with smart shuffling that's super easy to digest
MMEngine 1,179 15 days ago OpenMMLab Foundational Library for Training Deep Learning Models
scikit-image 6,089 7 days ago Image Processing SciKit (Toolbox for SciPy)
imgaug 14,417 4 months ago Image augmentation for machine learning experiments
imgaug_extension Additional augmentations for imgaug
Augmentor 5,073 8 months ago Image augmentation library in Python for machine learning
albumentations 14,254 9 days ago Fast image augmentation library and easy-to-use wrapper around other libraries
LAVIS 9,926 about 1 month ago A One-stop Library for Language-Vision Intelligence

Awesome Python Data Science / Time Series

sktime 7,943 6 days ago A unified framework for machine learning with time series
skforecast 1,156 4 days ago Time series forecasting with machine learning models
darts 8,087 5 days ago A python library for easy manipulation and forecasting of time series
statsforecast 3,990 10 days ago Lightning fast forecasting with statistical and econometric models
mlforecast 899 7 days ago Scalable machine learning-based time series forecasting
neuralforecast 3,101 9 days ago Scalable time series forecasting with neural network models
tslearn 2,910 5 months ago Machine learning toolkit dedicated to time-series data
tick 491 3 months ago Module for statistical learning, with a particular emphasis on time-dependent modeling
greykite 1,813 5 months ago A flexible, intuitive, and fast forecasting library
Prophet 18,514 24 days ago Automatic Forecasting Procedure
PyFlux 2,111 about 1 year ago Open source time series library for Python
bayesloop 153 7 months ago Probabilistic programming framework that facilitates objective model selection for time-varying parameter models
luminol 1,189 over 1 year ago Anomaly Detection and Correlation library
dateutil Powerful extensions to the standard datetime module
maya 3,409 4 months ago Makes it very easy to parse strings and to change timezones
Chaos Genius 733 2 months ago ML powered analytics engine for outlier/anomaly detection and root cause analysis

Awesome Python Data Science / Reinforcement Learning

Gymnasium 7,374 7 days ago An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly )
PettingZoo 2,627 9 days ago An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
MAgent2 229 17 days ago An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments
Stable Baselines3 9,144 13 days ago A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines
Shimmy 138 about 1 month ago An API conversion tool for popular external reinforcement learning environments
EnvPool 1,094 3 months ago C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments
RLlib Scalable Reinforcement Learning
Tianshou 7,968 26 days ago An elegant PyTorch deep reinforcement learning library
Acme 3,515 22 days ago A library of reinforcement learning components and agents
Catalyst-RL 46 about 3 years ago PyTorch framework for RL research
d3rlpy 1,327 13 days ago An offline deep reinforcement learning library
DI-engine 3,088 16 days ago OpenDILab Decision AI Engine
TF-Agents 2,799 about 1 month ago A library for Reinforcement Learning in TensorFlow
TensorForce 3,296 4 months ago A TensorFlow library for applied reinforcement learning
TRFL 3,134 almost 2 years ago TensorFlow Reinforcement Learning
Dopamine 10,569 17 days ago A research framework for fast prototyping of reinforcement learning algorithms
keras-rl 5,526 about 1 year ago Deep Reinforcement Learning for Keras
garage 1,880 over 1 year ago A toolkit for reproducible reinforcement learning research
Horizon 3,575 9 days ago A platform for Applied Reinforcement Learning
rlpyt 2,232 almost 4 years ago Reinforcement Learning in PyTorch
cleanrl 5,683 7 days ago High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Machin 401 over 3 years ago A reinforcement learning library designed for PyTorch
SKRL 560 16 days ago Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym
Imitation 1,327 4 months ago Clean PyTorch implementations of imitation and reward learning algorithms

Awesome Python Data Science / Graph Machine Learning

pytorch_geometric 21,382 6 days ago Geometric Deep Learning Extension Library for PyTorch
pytorch_geometric_temporal 2,669 about 1 month ago Temporal Extension Library for PyTorch Geometric
PyTorch Geometric Signed Directed 128 4 months ago A signed/directed graph neural network extension library for PyTorch Geometric
dgl 13,548 about 1 month ago Python package built to ease deep learning on graphs, on top of existing DL frameworks
Spektral 2,371 10 months ago Deep learning on graphs
StellarGraph 2,948 8 months ago Machine Learning on Graphs
Graph Nets 5,360 almost 2 years ago Build Graph Nets in TensorFlow
TensorFlow GNN 1,362 7 days ago A library to build Graph Neural Networks on the TensorFlow platform
PyTorch-BigGraph 3,383 9 months ago Generate embeddings from large-scale graph-structured data
Auto Graph Learning 1,088 3 months ago An AutoML framework & toolkit for machine learning on graphs
Karate Club 2,163 4 months ago An unsupervised machine learning library for graph-structured data
Little Ball of Fur 703 10 months ago A library for sampling graph structured data
GreatX 83 about 1 month ago A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG)
Jraph 1,375 8 months ago A Graph Neural Network Library in JAX

Awesome Python Data Science / Learning-to-Rank & Recommender Systems

LightFM 4,773 4 months ago A Python implementation of LightFM, a hybrid recommendation algorithm
Spotlight Deep recommender models using PyTorch
Surprise 6,413 5 months ago A Python scikit for building and analyzing recommender systems
RecBole 3,450 3 months ago A unified, comprehensive and efficient recommendation library
allRank 871 4 months ago allRank is a framework for training learning-to-rank neural models based on PyTorch
TensorFlow Recommenders 1,849 12 days ago A library for building recommender system models using TensorFlow
TensorFlow Ranking 2,743 8 months ago Learning to Rank in TensorFlow
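Learning-to-rank models, such as those trained with allRank or TensorFlow Ranking, are commonly evaluated with NDCG@k. A minimal sketch, assuming graded integer relevance labels and the exponential gain 2^rel - 1 (some implementations use the label itself as the gain, which gives different numbers); `dcg_at_k` and `ndcg_at_k` are hypothetical helpers:

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items, using gain 2^rel - 1."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos is 0-based, so the discount is log2(rank + 1)
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance labels of the documents, in the order a model ranked them
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6))
```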

Awesome Python Data Science / Probabilistic Graphical Models

pomegranate 3,376 about 1 month ago Probabilistic and graphical models for Python
pgmpy 2,748 7 days ago A Python library for working with Probabilistic Graphical Models
pyAgrum A GRaphical Universal Modeler

Awesome Python Data Science / Probabilistic Methods

pyro 8,556 19 days ago A flexible, scalable deep probabilistic programming library built on PyTorch
PyMC 8,722 3 days ago Bayesian Stochastic Modelling in Python
ZhuSuan Bayesian Deep Learning
GPflow Gaussian processes in TensorFlow
InferPy 147 4 months ago Deep Probabilistic Modelling Made Easy
PyStan 342 5 months ago Bayesian inference using the No-U-Turn sampler (Python interface)
sklearn-bayes 514 about 3 years ago Python package for Bayesian Machine Learning with scikit-learn API
skpro 249 7 days ago Supervised domain-agnostic prediction framework for probabilistic modelling by
PyVarInf 359 about 5 years ago Bayesian Deep Learning methods with Variational Inference for PyTorch
emcee 1,470 18 days ago The Python ensemble sampling toolkit for affine-invariant MCMC
hsmmlearn 80 about 3 years ago A library for hidden semi-Markov models with explicit durations
pyhsmm 550 about 2 years ago Bayesian inference in HSMMs and HMMs
GPyTorch 3,580 20 days ago A highly efficient and modular implementation of Gaussian Processes in PyTorch
sklearn-crfsuite 426 about 1 year ago A scikit-learn-inspired API for CRFsuite
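emcee, PyMC, PyStan, and related packages draw posterior samples with Markov chain Monte Carlo. emcee uses an affine-invariant ensemble sampler and PyStan uses the No-U-Turn sampler; the sketch below shows the simpler random-walk Metropolis algorithm instead, to illustrate the accept/reject idea in pure Python. `metropolis_hastings` is a hypothetical helper, and the step size, burn-in, and sample count are arbitrary choices.

```python
import math
import random


def metropolis_hastings(log_prob, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target given its log density."""
    rng = random.Random(seed)
    x, lp = x0, log_prob(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_new = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), computed in log space;
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
        samples.append(x)
    return samples


# Standard normal target, known only up to a normalizing constant
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
kept = samples[1000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(round(mean, 2), round(var, 2))  # should be close to 0 and 1
```

The target only needs to be known up to a normalizing constant, because the acceptance test uses the difference of log densities.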

Awesome Python Data Science / Model Explanation

dalex 1,375 about 2 months ago moDel Agnostic Language for Exploration and explanation
Shapley 218 over 1 year ago A data-driven framework to quantify the value of classifiers in a machine learning ensemble
Alibi 2,414 4 months ago Algorithms for monitoring and explaining machine learning models
anchor 798 over 2 years ago Code for "High-Precision Model-Agnostic Explanations" paper
aequitas 694 2 months ago Bias and Fairness Audit Toolkit
Contrastive Explanation 45 almost 2 years ago Contrastive Explanation (Foil Trees)
yellowbrick 4,293 about 2 months ago Visual analysis and diagnostic tools to facilitate machine learning model selection
scikit-plot 2,427 3 months ago An intuitive library to add plotting functionality to scikit-learn objects
shap 22,876 12 days ago A unified approach to explain the output of any machine learning model
ELI5 2,757 over 2 years ago A library for debugging/inspecting machine learning classifiers and explaining their predictions
Lime 11,615 4 months ago Explaining the predictions of any machine learning classifier
FairML 360 over 3 years ago A Python toolbox for auditing machine learning models for bias
L2X 124 over 3 years ago Code for replicating the experiments in the paper
PDPbox 845 3 months ago Partial dependence plot toolbox
PyCEbox 165 over 4 years ago Python Individual Conditional Expectation Plot Toolbox
Skater Python Library for Model Interpretation
model-analysis 1,258 16 days ago Model analysis tools for TensorFlow
themis-ml 124 about 4 years ago A library that implements fairness-aware machine learning algorithms
treeinterpreter 744 over 1 year ago Interpreting scikit-learn's decision tree and random forest predictions
AI Explainability 360 1,633 4 months ago Interpretability and explainability of data and machine learning models
Auralisation 42 over 7 years ago Auralisation of learned features in CNN (for audio)
CapsNet-Visualization 394 about 3 years ago A visualization of the CapsNet layers to better understand how it works
lucid 4,673 almost 2 years ago A collection of infrastructure and tools for research in neural network interpretability
Netron 28,134 6 days ago Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks)
FlashLight Visualization tool for your neural network
tensorboard-pytorch 7,870 3 months ago Tensorboard for PyTorch (and chainer, mxnet, numpy, ...)
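shap and several other tools above attribute a prediction to input features using Shapley values from cooperative game theory. For a handful of features these can be computed exactly by enumerating every coalition, as in the sketch below. The toy model, the instance `x`, and the choice to replace "absent" features with a baseline of 0 are assumptions for illustration; shap itself relies on background data and model-specific or approximate algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (cost grows as 2^n)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! of each coalition S that excludes p
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                S = frozenset(coalition)
                total += weight * (value(S | {p}) - value(S))
        phi[p] = total
    return phi


# Toy model and instance; absent features are set to a baseline of 0 (an assumption)
x = {"x1": 1.0, "x2": 1.0}


def model(f):
    return 2 * f["x1"] + 3 * f["x2"] + f["x1"] * f["x2"]


def value(S):
    return model({name: (x[name] if name in S else 0.0) for name in x})


phi = shapley_values(list(x), value)
print(phi)  # {'x1': 2.5, 'x2': 3.5}
```

The interaction term `x1 * x2` is split evenly between the two features, and the attributions sum to the difference between the prediction and the baseline prediction (6 - 0 here).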

Awesome Python Data Science / Genetic Programming

gplearn 1,615 12 months ago Genetic Programming in Python
PyGAD 1,884 2 months ago Genetic Algorithm in Python
DEAP 5,852 8 days ago Distributed Evolutionary Algorithms in Python
karoo_gp 161 about 2 years ago A Genetic Programming platform for Python with GPU support
monkeys 122 over 6 years ago A strongly-typed genetic programming framework for Python
sklearn-genetic 323 10 months ago Genetic feature selection module for scikit-learn
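PyGAD, DEAP, and the other evolutionary libraries above build on the same loop: evaluate a population, select parents, recombine, mutate, repeat. The sketch below is a minimal generational genetic algorithm on the toy OneMax problem (maximize the number of 1 bits), using tournament selection, one-point crossover, bit-flip mutation, and elitism. The operators and parameter values are illustrative choices, and `genetic_algorithm` is a hypothetical helper, not PyGAD's or DEAP's API.

```python
import random


def onemax(bits):
    """Toy fitness: the number of 1s in the bit string."""
    return sum(bits)


def genetic_algorithm(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=onemax, reverse=True)
        history.append(onemax(pop[0]))
        next_pop = [pop[0][:]]  # elitism: keep the current best unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fittest of 3 random individuals becomes a parent
            p1 = max(rng.sample(pop, 3), key=onemax)
            p2 = max(rng.sample(pop, 3), key=onemax)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, independently per gene
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=onemax), history


best, history = genetic_algorithm()
print(onemax(best), history[:5])
```

Because the best individual is copied forward unchanged, the best fitness recorded in `history` never decreases from one generation to the next.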

Awesome Python Data Science / Optimization

Optuna 10,910 6 days ago A hyperparameter optimization framework
pymoo 2,285 3 months ago Multi-objective Optimization in Python
pycma 1,109 about 1 month ago Python implementation of CMA-ES
Spearmint 1,547 almost 5 years ago Bayesian optimization
BoTorch 3,102 6 days ago Bayesian optimization in PyTorch
scikit-opt 5,282 5 months ago Heuristic Algorithms for optimization
sklearn-genetic-opt 314 about 1 month ago Hyperparameters tuning and feature selection using evolutionary algorithms
SMAC3 1,085 23 days ago Sequential Model-based Algorithm Configuration
Optunity 416 12 months ago A library containing various optimizers for hyperparameter tuning
hyperopt 7,258 24 days ago Distributed Asynchronous Hyperparameter Optimization in Python
hyperopt-sklearn 1,588 5 months ago Hyper-parameter optimization for sklearn
sklearn-deap 771 10 months ago Use evolutionary algorithms instead of gridsearch in scikit-learn
sigopt_sklearn 75 about 1 year ago SigOpt wrappers for scikit-learn methods
Bayesian Optimization 7,919 about 1 month ago A Python implementation of global optimization with Gaussian processes
SafeOpt 141 about 2 years ago Safe Bayesian Optimization
scikit-optimize 2,744 9 months ago Sequential model-based optimization with a interface
Solid 576 over 5 years ago A comprehensive gradient-free optimization framework written in Python
PySwarms 1,283 4 months ago A research toolkit for particle swarm optimization in Python
Platypus 573 about 2 months ago A Free and Open Source Python Library for Multiobjective Optimization
GPflowOpt 270 almost 4 years ago Bayesian Optimization using GPflow
POT 2,431 14 days ago Python Optimal Transport library
Talos 1,625 7 months ago Hyperparameter Optimization for Keras Models
nlopt 1,892 7 days ago Library for nonlinear optimization (global and local, constrained or unconstrained)
OR-Tools An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi
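Optuna, hyperopt, scikit-optimize, and similar tools propose hyperparameters with model-based or evolutionary strategies; plain random search is the usual baseline they are compared against. A minimal sketch, assuming a hypothetical search-space format (a list of choices, or a `("log", low, high)` / `("float", low, high)` range) and a hypothetical `toy_loss` standing in for a real validation score:

```python
import math
import random


def random_search(objective, space, n_trials=200, seed=0):
    """Minimize objective(params) by sampling every parameter independently.

    space maps a name to a list of choices, ("log", low, high) for a
    log-uniform range, or ("float", low, high) for a uniform range.
    """
    rng = random.Random(seed)
    best_params, best_value = None, math.inf
    for _ in range(n_trials):
        params = {}
        for name, spec in space.items():
            if isinstance(spec, list):
                params[name] = rng.choice(spec)
            elif spec[0] == "log":
                # Sample uniformly in log space, e.g. for learning rates
                params[name] = math.exp(rng.uniform(math.log(spec[1]), math.log(spec[2])))
            else:
                params[name] = rng.uniform(spec[1], spec[2])
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_loss(p):
    # Hypothetical validation loss, lowest near lr = 1e-2 and depth = 6
    return (math.log10(p["lr"]) + 2) ** 2 + 0.01 * (p["depth"] - 6) ** 2


space = {"lr": ("log", 1e-4, 1.0), "depth": [2, 4, 6, 8]}
best_params, best_value = random_search(toy_loss, space)
print(best_params, best_value)
```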

Awesome Python Data Science / Feature Engineering / General

Featuretools 7,270 8 days ago Automated feature engineering
Feature Engine 1,926 13 days ago Feature engineering package with sklearn-like functionality
OpenFE 782 6 months ago Automated feature generation with expert-level performance
skl-groups 41 over 8 years ago A scikit-learn addon to operate on set/"group"-based features
Feature Forge 382 almost 7 years ago A set of tools for creating and testing machine learning features
few 51 over 4 years ago A feature engineering wrapper for sklearn
scikit-mdr 126 over 1 year ago A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction
tsfresh 8,435 7 days ago Automatic extraction of relevant features from time series
dirty_cat 16 over 1 year ago Machine learning on dirty tabular data (especially: string-based variables for classification and regression)
NitroFE 106 over 2 years ago Moving window features
sk-transformer 8 10 days ago A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps
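Moving-window features, such as those NitroFE produces, include trailing averages and lagged values. A minimal pure-Python sketch; `rolling_mean` and `lag` are hypothetical helpers, and `None` marks positions where not enough history exists yet:

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing moving average; None until `window` observations are available.

    Keeps a running sum, so very long float series can accumulate small rounding error.
    """
    out, buf, total = [], deque(), 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the observation that left the window
        out.append(total / window if len(buf) == window else None)
    return out


def lag(values, k):
    """Value observed k steps earlier; None where no earlier value exists."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


prices = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rolling_mean(prices, 3))  # [None, None, 2.0, 3.0, 4.0]
print(lag(prices, 1))           # [None, 1.0, 2.0, 3.0, 4.0]
```

In pandas the same columns come from `Series.rolling(window).mean()` and `Series.shift(k)`, with `NaN` in place of `None`.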

Awesome Python Data Science / Feature Engineering / Feature Selection

scikit-feature 1,509 4 months ago Feature selection repository in Python
boruta_py 1,511 3 months ago Implementations of the Boruta all-relevant feature selection method
BoostARoota 219 over 3 years ago A fast xgboost feature selection algorithm
scikit-rebate 409 almost 2 years ago A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning
zoofs 243 4 months ago A feature selection library based on evolutionary algorithms

Awesome Python Data Science / Visualization / General Purpose

Matplotlib 20,294 6 days ago Plotting with Python
seaborn 12,575 3 months ago Statistical data visualization using matplotlib
prettyplotlib 1,692 almost 6 years ago Painlessly create beautiful matplotlib plots
python-ternary 733 5 months ago Ternary plotting library for Python with matplotlib
missingno 3,961 6 months ago Missing data visualization module for Python
chartify 3,535 about 1 month ago Python library that makes it easy for data scientists to create charts
physt 134 about 1 month ago Improved histograms

Awesome Python Data Science / Visualization / Interactive plots

animatplot 412 3 months ago A python package for animating plots built on matplotlib
plotly A Python library that makes interactive and publication-quality graphs
Bokeh 19,372 8 days ago Interactive Web Plotting for Python
Altair Declarative statistical visualization library for Python. Can easily perform many data transformations within the code to create graphs
bqplot 3,627 23 days ago Plotting library for IPython/Jupyter notebooks
pyecharts 14,903 15 days ago Migrated from , a charting and visualization library, to Python's interactive visual drawing library

Awesome Python Data Science / Visualization / Map

folium Makes it easy to visualize data on an interactive open street map
geemap 3,473 7 days ago Python package for interactive mapping with Google Earth Engine (GEE)

Awesome Python Data Science / Visualization / Automatic Plotting

HoloViews 2,707 5 days ago Stop plotting your data - annotate your data and let it visualize itself
AutoViz 1,729 5 months ago Visualize data automatically with 1 line of code (ideal for machine learning)
SweetViz 2,949 4 months ago Visualize and compare datasets, target values and associations, with one line of code

Awesome Python Data Science / Visualization / NLP

pyLDAvis 1,805 4 months ago Interactive topic model visualization

Awesome Python Data Science / Deployment

fastapi Modern, fast (high-performance) web framework for building APIs with Python
streamlit Makes it easy to deploy machine learning models
streamsync 1,328 6 days ago No-code in the front, Python in the back. An open-source framework for creating data apps
gradio 33,962 6 days ago Create UIs for your machine learning model in Python in 3 minutes
Vizro 2,707 6 days ago A toolkit for creating modular data visualization applications
datapane A collection of APIs to turn scripts and notebooks into interactive reports
binder Enables sharing and executing Jupyter Notebooks

Awesome Python Data Science / Statistics

pandas_summary 504 28 days ago Extension to the pandas DataFrame describe function
Pandas Profiling 12,536 8 days ago Create HTML profiling reports from pandas DataFrame objects
statsmodels 10,151 7 days ago Statistical modeling and econometrics in Python
stockstats 1,303 11 months ago Supply a wrapper based on the with inline stock statistics/indicators support
weightedcalcs 105 11 days ago A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more
scikit-posthocs 348 26 days ago Pairwise Multiple Comparisons Post-hoc Tests
Alphalens 3,384 9 months ago Performance analysis of predictive (alpha) stock factors
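weightedcalcs computes weighted statistics on pandas data. The definitions behind two of them can be sketched in plain Python; this sketch returns the lower weighted median, one of several conventions, and `weighted_mean` and `weighted_median` are hypothetical helpers rather than weightedcalcs' API:

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight reaches half the total."""
    half = sum(weights) / 2
    cumulative = 0.0
    for v, w in sorted(zip(values, weights)):
        cumulative += w
        if cumulative >= half:
            return v
    raise ValueError("values must be non-empty")


values = [1, 2, 3, 4]
weights = [1, 1, 1, 5]
print(weighted_mean(values, weights), weighted_median(values, weights))  # 3.25 4
```

With equal weights this returns 2 for `[1, 2, 3, 4]` rather than the interpolated median 2.5, which is the trade-off for always returning an observed value.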

Awesome Python Data Science / Data Manipulation / Data Frames

pandas Powerful Python data analysis toolkit
polars 30,400 4 days ago A fast multi-threaded, hybrid-out-of-core DataFrame library
Arctic 3,055 8 months ago High-performance datastore for time series and tick data
datatable 1,817 28 days ago Data.table for Python
pandas_profiling 12,536 8 days ago Create HTML profiling reports from pandas DataFrame objects
cuDF 8,448 4 days ago GPU DataFrame Library
blaze 3,187 about 1 year ago NumPy and pandas interface to Big Data
pandasql 1,342 4 months ago Allows you to query pandas DataFrames using SQL syntax
pandas-gbq 448 9 days ago pandas Google Big Query
xpandas 26 over 2 years ago Universal 1d/2d data containers with Transformers functionality for data analysis by
pysparkling 262 3 months ago A pure Python implementation of Apache Spark's RDD and DStream interfaces
modin 9,892 2 months ago Speed up your pandas workflows by changing a single line of code
swifter 2,540 8 months ago A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner
pandas-log 214 over 3 years ago A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues
vaex 8,297 about 1 month ago Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second
xarray 3,619 6 days ago Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines

Awesome Python Data Science / Data Manipulation / Pipelines

pdpipe 716 21 days ago Easy pipelines for pandas DataFrames
SSPipe Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch
pandas-ply 200 about 9 years ago Functional data manipulation for pandas
Dplython 764 almost 8 years ago Dplyr for Python
sklearn-pandas 2,814 over 1 year ago pandas integration with sklearn
Dataset 201 23 days ago Helps you conveniently work with random or sequential batches of your data and define data processing
pyjanitor 1,364 6 days ago Clean APIs for data cleaning
meza 416 4 months ago A Python toolkit for processing tabular data
Prodmodel 59 over 2 years ago Build system for data science pipelines
dopanda 473 about 1 month ago Hints and tips for using pandas in an analysis environment
Hamilton 1,861 7 days ago A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions

Awesome Python Data Science / Data Manipulation / Data-centric AI

cleanlab 9,756 29 days ago The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels
snorkel 5,809 7 months ago A system for quickly generating training data with weak supervision
dataprep 2,068 5 months ago Collect, clean, and visualize your data in Python with a few lines of code

Awesome Python Data Science / Data Manipulation / Synthetic Data

ydata-synthetic 1,441 13 days ago A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models

Awesome Python Data Science / Distributed Computing

Horovod 14,265 3 months ago Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
PySpark Exposes the Spark programming model to Python
Veles 906 about 1 year ago Distributed machine learning platform
Jubatus 707 over 5 years ago Framework and Library for Distributed Online Machine Learning
DMTK 2,745 about 6 years ago Microsoft Distributed Machine Learning Toolkit
PaddlePaddle 22,258 6 days ago PArallel Distributed Deep LEarning
dask-ml 902 4 months ago Distributed and parallel machine learning
Distributed 1,579 4 days ago Distributed computation in Python

Awesome Python Data Science / Experimentation

mlflow 18,781 6 days ago Open source platform for the machine learning lifecycle
Neptune A lightweight ML experiment tracking, results visualization, and management tool
dvc 13,899 6 days ago Data Version Control | Git for Data & Models | ML Experiments Management
envd 2,038 about 2 months ago 🏕️ machine learning development environment for data science and AI/ML engineering teams
Sacred 4,254 about 1 month ago A tool to help you configure, organize, log, and reproduce experiments
Ax 2,378 6 days ago Adaptive Experimentation Platform

Awesome Python Data Science / Data Validation

great_expectations 9,989 4 days ago Always know what to expect from your data
pandera 3,393 7 days ago A lightweight, flexible, and expressive statistical data testing library
deepchecks 3,623 8 days ago Validation & testing of ML models and data during model development, deployment, and production
evidently 5,391 7 days ago Evaluate and monitor ML models from validation to production
TensorFlow Data Validation 765 20 days ago Library for exploring and validating machine learning data
DataComPy 485 6 days ago A library to compare pandas, Polars, and Spark DataFrames. It provides statistics and lets users adjust for match accuracy

Awesome Python Data Science / Evaluation

recmetrics 569 10 months ago Library of useful metrics and plots for evaluating recommender systems
Metrics 1,627 almost 2 years ago Machine learning evaluation metrics
sklearn-evaluation 3 almost 2 years ago Model evaluation made easy: plots, tables, and markdown reports
AI Fairness 360 2,457 5 months ago Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models

Awesome Python Data Science / Computations

numpy The fundamental package needed for scientific computing with Python
Dask 12,593 6 days ago Parallel computing with task scheduling
bottleneck 1,073 about 1 month ago Fast NumPy array functions written in C
CuPy 9,485 4 days ago NumPy-like API accelerated with CUDA
scikit-tensor 402 about 6 years ago Python library for multilinear algebra and tensor factorizations
numdifftools 256 over 1 year ago Solve automatic numerical differentiation problems in one or more variables
quaternion 612 23 days ago Add built-in support for quaternions to numpy
adaptive 1,164 10 days ago Tools for adaptive and parallel sampling of mathematical functions
NumExpr 2,238 2 months ago A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results
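The adaptive entry above refers to sampling a function adaptively, which means placing more evaluations where the function changes quickly. The following stdlib-only sketch shows one simple way to do this in 1-D. It repeatedly bisects the interval with the largest "loss". The loss used here is an assumption: the Euclidean length of the segment joining an interval's endpoints. This is a simplified illustration, not the adaptive library's API or its exact algorithm. `adaptive_sample` is a hypothetical name.

```python
import heapq
import math
from typing import Callable, List, Tuple


def adaptive_sample(f: Callable[[float], float], a: float, b: float,
                    n_points: int) -> List[Tuple[float, float]]:
    """Sample f on [a, b], spending more points where the curve changes fastest.

    Assumed loss of an interval: Euclidean length of the chord between its
    endpoints, so steep regions get subdivided before flat ones.
    """
    if not a < b:
        raise ValueError("require a < b")
    if n_points < 2:
        raise ValueError("need at least two points")
    samples = {a: f(a), b: f(b)}

    def loss(x0: float, x1: float) -> float:
        return math.hypot(x1 - x0, samples[x1] - samples[x0])

    # heapq is a min-heap, so store negative losses to pop the largest first.
    heap = [(-loss(a, b), a, b)]
    while len(samples) < n_points:
        _, x0, x1 = heapq.heappop(heap)
        mid = (x0 + x1) / 2
        samples[mid] = f(mid)
        heapq.heappush(heap, (-loss(x0, mid), x0, mid))
        heapq.heappush(heap, (-loss(mid, x1), mid, x1))
    return sorted(samples.items())


# A steep step near x = 0: most of the 50 points should cluster there.
samples = adaptive_sample(lambda x: math.tanh(20 * x), -1.0, 1.0, 50)
```

A uniform grid would spend most of its evaluations on the flat tails. The chord-length loss instead spreads points roughly evenly along the curve, so the steep region near zero ends up densely sampled.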

Awesome Python Data Science / Web Scraping

BeautifulSoup The easiest library for beginners to scrape static websites
Scrapy Fast and extensible scraping library. Lets you write rules and create customized scrapers without touching the core
Selenium Use the Selenium Python API to access all the functionality of Selenium WebDriver in an intuitive way, like a real user
Pattern 8,750 5 months ago High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also includes NLP, machine learning algorithms, and visualization
twitterscraper 2,412 about 2 years ago Efficient library to scrape Twitter

Awesome Python Data Science / Spatial Analysis

GeoPandas 4,519 4 days ago Python tools for geographic data
PySAL 1,331 about 1 month ago Python Spatial Analysis Library

Awesome Python Data Science / Quantum Computing

qiskit 5,280 3 days ago Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules
cirq 4,282 7 days ago A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits
PennyLane 2,355 4 days ago Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations
QML 199 7 months ago A Python Toolkit for Quantum Machine Learning

Awesome Python Data Science / Conversion

sklearn-porter 1,293 5 months ago Transpile trained scikit-learn estimators to C, Java, JavaScript, and others
ONNX 17,938 4 days ago Open Neural Network Exchange
MMdnn 5,797 6 months ago A set of tools to help users inter-operate among different deep learning frameworks
treelite 738 16 days ago Universal model exchange and serialization format for decision tree forests
