awesome-python-data-science

Data Science Toolbox

A curated list of data science software in Python

Probably the best curated list of data science software in Python.

GitHub

3k stars
60 watching
346 forks
last commit: 8 months ago
Linked from 4 awesome lists

awesome, awesome-list, awesome-python, data-analysis, data-science, data-visualization, deep-learning, machine-learning, python, scikit-learn, statistics

Awesome Python Data Science / Machine Learning / General Purpose Machine Learning

scikit-learn Machine learning in Python
PyCaret 9,026 5 months ago An open-source, low-code machine learning library in Python
Shogun 3,032 over 1 year ago Machine learning toolbox
xLearn 3,087 over 1 year ago High Performance, Easy-to-use, and Scalable Machine Learning Package
cuML 4,292 5 months ago RAPIDS Machine Learning Library
modAL 2,239 about 1 year ago Modular active learning framework for Python3
Sparkit-learn 1,154 over 4 years ago PySpark + scikit-learn = Sparkit-learn
mlpack 5,151 5 months ago A scalable C++ machine learning library (Python bindings)
dlib 13,623 6 months ago Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings)
MLxtend 4,926 6 months ago Extension and helper modules for Python's data analysis and machine learning libraries
hyperlearn 1,871 6 months ago 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels
Reproducible Experiment Platform (REP) 689 10 months ago Machine Learning toolbox for Humans
scikit-multilearn 921 over 1 year ago Multi-label classification for python
seqlearn 691 about 2 years ago Sequence classification toolkit for Python
pystruct 664 over 3 years ago Simple structured learning framework for Python
sklearn-expertsys 489 almost 8 years ago Highly interpretable classifiers for scikit learn
RuleFit 411 over 1 year ago Implementation of the RuleFit algorithm
metric-learn 1,402 10 months ago Metric learning algorithms in Python
pyGAM 876 11 months ago Generalized Additive Models in Python
causalml 5,132 6 months ago Uplift modeling and causal inference with machine learning algorithms

Awesome Python Data Science / Machine Learning / Gradient Boosting

XGBoost 26,396 5 months ago Scalable, Portable, and Distributed Gradient Boosting
LightGBM 16,769 5 months ago A fast, distributed, high-performance gradient boosting framework
CatBoost 8,139 5 months ago An open-source gradient boosting on decision trees library
ThunderGBM 695 over 1 year ago Fast GBDTs and Random Forests on GPUs
NGBoost 1,663 7 months ago Natural Gradient Boosting for Probabilistic Prediction
TensorFlow Decision Forests 666 6 months ago A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras

Awesome Python Data Science / Machine Learning / Ensemble Methods

ML-Ensemble High performance ensemble learning
Stacking 222 over 7 years ago Simple and useful stacking library written in Python
stacked_generalization 117 about 6 years ago Library for machine learning stacking generalization
vecstack 688 9 months ago Python package for stacking (machine learning technique)

Awesome Python Data Science / Machine Learning / Imbalanced Datasets

imbalanced-learn 6,875 5 months ago Module to perform under-sampling and over-sampling with various techniques
imbalanced-algorithms 235 over 3 years ago Python-based implementations of algorithms for learning on imbalanced data

Awesome Python Data Science / Machine Learning / Random Forests

rpforest 225 over 5 years ago A forest of random projection trees
sklearn-random-bits-forest 9 almost 9 years ago Wrapper of the Random Bits Forest program written by (Wang et al., 2016)
rgf_python 379 over 3 years ago Python Wrapper of Regularized Greedy Forest

Awesome Python Data Science / Machine Learning / Kernel Methods

pyFM 923 over 4 years ago Factorization machines in python
fastFM 1,078 almost 3 years ago A library for Factorization Machines
tffm 780 over 3 years ago TensorFlow implementation of an arbitrary order Factorization Machine
liquidSVM 66 over 5 years ago An implementation of SVMs
scikit-rvm 231 about 8 years ago Relevance Vector Machine implementation using the scikit-learn API
ThunderSVM 1,571 about 1 year ago A fast SVM Library on GPUs and CPUs

Awesome Python Data Science / Deep Learning / PyTorch

PyTorch 84,978 5 months ago Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-lightning 28,636 5 months ago PyTorch Lightning is just organized PyTorch
ignite 4,554 5 months ago High-level library to help with training neural networks in PyTorch
skorch 5,911 5 months ago A scikit-learn compatible neural network library that wraps PyTorch
Catalyst 3,300 about 1 year ago High-level utils for PyTorch DL & RL research
ChemicalX 719 over 1 year ago A PyTorch-based deep learning library for drug pair scoring

Awesome Python Data Science / Deep Learning / TensorFlow

TensorFlow 186,822 5 months ago Computation using data flow graphs for scalable machine learning by Google
TensorLayer 7,337 over 2 years ago Deep Learning and Reinforcement Learning Library for Researcher and Engineer
TFLearn 9,621 about 1 year ago Deep learning library featuring a higher-level API for TensorFlow
Sonnet 9,790 6 months ago TensorFlow-based neural network library
tensorpack 6,303 almost 2 years ago A Neural Net Training Interface on TensorFlow
Polyaxon 3,581 5 months ago A platform that helps you build, manage and monitor deep learning models
tfdeploy 353 about 1 year ago Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy
tensorflow-upstream 688 5 months ago TensorFlow ROCm port
TensorFlow Fold 1,826 almost 4 years ago Deep learning with dynamic computation graphs in TensorFlow
TensorLight 11 over 2 years ago A high-level framework for TensorFlow
Mesh TensorFlow 1,597 over 1 year ago Model Parallelism Made Easier
Ludwig 11,236 6 months ago A toolbox that allows one to train and test deep learning models without the need to write code
Keras A high-level neural networks API running on top of TensorFlow
keras-contrib 1,579 over 2 years ago Keras community contributions
Hyperas 2,179 over 2 years ago Keras + Hyperopt: A straightforward wrapper for a convenient hyperparameter
Elephas 1,574 about 2 years ago Distributed Deep learning with Keras & Spark
qkeras 541 7 months ago A quantization deep learning library

Awesome Python Data Science / Deep Learning / MXNet

MXNet 20,791 over 1 year ago Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler
Gluon 2,300 almost 6 years ago A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet)
Xfer 253 almost 2 years ago Transfer Learning library for Deep Neural Networks
MXNet 28 over 5 years ago HIP Port of MXNet

Awesome Python Data Science / Deep Learning / JAX

JAX 30,744 5 months ago Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
FLAX 6,196 5 months ago A neural network library for JAX that is designed for flexibility
Optax 1,730 5 months ago A gradient processing and optimization library for JAX

Awesome Python Data Science / Deep Learning / Others

transformers 136,357 5 months ago State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX
Tangent 2,314 over 2 years ago Source-to-Source Debuggable Derivatives in Pure Python
autograd 7,049 5 months ago Efficiently computes derivatives of numpy code
Caffe 34,149 10 months ago A fast open framework for deep learning
nnabla 2,729 6 months ago Neural Network Libraries by Sony

Awesome Python Data Science / Automated Machine Learning

auto-sklearn 7,667 6 months ago An AutoML toolkit and a drop-in replacement for a scikit-learn estimator
Auto-PyTorch 2,385 about 1 year ago Automatic architecture search and hyperparameter optimization for PyTorch
AutoKeras 9,172 5 months ago AutoML library for deep learning
AutoGluon 8,167 5 months ago AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data
TPOT 9,776 10 months ago AutoML tool that optimizes machine learning pipelines using genetic programming
MLBox 1,500 almost 2 years ago A powerful Automated Machine Learning python library

Awesome Python Data Science / Natural Language Processing

torchtext 3,524 5 months ago Data loaders and abstractions for text and NLP
gluon-nlp 2,560 over 1 year ago NLP made easy
KerasNLP 818 5 months ago Modular Natural Language Processing workflows with Keras
spaCy Industrial-Strength Natural Language Processing
NLTK 13,694 6 months ago Modules, data sets, and tutorials supporting research and development in Natural Language Processing
CLTK 843 6 months ago The Classical Language Toolkit
gensim Topic Modelling for Humans
pyMorfologik 18 almost 10 years ago Python binding for
skift 233 almost 3 years ago Scikit-learn wrappers for Python fastText
Phonemizer 1,249 8 months ago Simple text-to-phonemes converter for multiple languages
flair 13,990 5 months ago Very simple framework for state-of-the-art NLP

Awesome Python Data Science / Computer Audition

torchaudio 2,561 5 months ago An audio library for PyTorch
librosa 7,237 6 months ago Python library for audio and music analysis
Yaafe 244 almost 4 years ago Audio features extraction
aubio 3,336 10 months ago A library for audio and music analysis
Essentia 2,889 7 months ago Library for audio and music analysis, description, and synthesis
LibXtract 227 about 5 years ago A simple, portable, lightweight library of audio feature extraction functions
Marsyas 407 about 2 years ago Music Analysis, Retrieval, and Synthesis for Audio Signals
muda 233 about 4 years ago A library for augmenting annotated audio data
madmom 1,366 9 months ago Python audio and music signal processing library

Awesome Python Data Science / Computer Vision

torchvision 16,364 5 months ago Datasets, Transforms, and Models specific to Computer Vision
PyTorch3D 8,889 6 months ago PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
gluon-cv 5,850 6 months ago Provides implementations of the state-of-the-art deep learning models in computer vision
KerasCV 1,013 6 months ago Industry-strength Computer Vision workflows with Keras
OpenCV 79,662 5 months ago Open Source Computer Vision Library
Decord 1,923 10 months ago An efficient video loader for deep learning with smart shuffling that's super easy to digest
MMEngine 1,196 7 months ago OpenMMLab Foundational Library for Training Deep Learning Models
scikit-image 6,117 5 months ago Image Processing SciKit (Toolbox for SciPy)
imgaug 14,458 10 months ago Image augmentation for machine learning experiments
imgaug_extension Additional augmentations for imgaug
Augmentor 5,084 about 1 year ago Image augmentation library in Python for machine learning
albumentations 14,386 5 months ago Fast image augmentation library and easy-to-use wrapper around other libraries
LAVIS 10,058 6 months ago A One-stop Library for Language-Vision Intelligence

Awesome Python Data Science / Time Series

sktime 8,020 5 months ago A unified framework for machine learning with time series
skforecast 1,189 5 months ago Time series forecasting with machine learning models
darts 8,166 5 months ago A python library for easy manipulation and forecasting of time series
statsforecast 4,045 5 months ago Lightning fast forecasting with statistical and econometric models
mlforecast 924 5 months ago Scalable machine learning-based time series forecasting
neuralforecast 3,181 5 months ago Scalable machine learning-based time series forecasting
tslearn 2,924 11 months ago Machine learning toolkit dedicated to time-series data
tick 495 6 months ago Module for statistical learning, with a particular emphasis on time-dependent modeling
greykite 1,815 11 months ago A flexible, intuitive, and fast forecasting library
Prophet 18,627 7 months ago Automatic Forecasting Procedure
PyFlux 2,114 over 1 year ago Open source time series library for Python
bayesloop 156 about 1 year ago Probabilistic programming framework that facilitates objective model selection for time-varying parameter models
luminol 1,193 about 2 years ago Anomaly Detection and Correlation library
dateutil Powerful extensions to the standard datetime module
maya 3,414 10 months ago Makes it easy to parse a string and change timezones
Chaos Genius 744 8 months ago ML powered analytics engine for outlier/anomaly detection and root cause analysis

Awesome Python Data Science / Reinforcement Learning

Gymnasium 7,613 5 months ago An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly )
PettingZoo 2,678 6 months ago An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
MAgent2 240 7 months ago An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments
Stable Baselines3 9,329 6 months ago A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines
Shimmy 143 8 months ago An API conversion tool for popular external reinforcement learning environments
EnvPool 1,108 9 months ago C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments
RLlib Scalable Reinforcement Learning
Tianshou 8,069 5 months ago An elegant PyTorch deep reinforcement learning library
Acme 3,542 7 months ago A library of reinforcement learning components and agents
Catalyst-RL 46 over 3 years ago PyTorch framework for RL research
d3rlpy 1,349 6 months ago An offline deep reinforcement learning library
DI-engine 3,143 5 months ago OpenDILab Decision AI Engine
TF-Agents 2,816 5 months ago A library for Reinforcement Learning in TensorFlow
TensorForce 3,299 10 months ago A TensorFlow library for applied reinforcement learning
TRFL 3,136 over 2 years ago TensorFlow Reinforcement Learning
Dopamine 10,591 7 months ago A research framework for fast prototyping of reinforcement learning algorithms
keras-rl 5,530 over 1 year ago Deep Reinforcement Learning for Keras
garage 1,893 about 2 years ago A toolkit for reproducible reinforcement learning research
Horizon 3,575 6 months ago A platform for Applied Reinforcement Learning
rlpyt 2,236 over 4 years ago Reinforcement Learning in PyTorch
cleanrl 5,891 6 months ago High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Machin 402 almost 4 years ago A reinforcement library designed for pytorch
SKRL 588 5 months ago Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym
Imitation 1,350 10 months ago Clean PyTorch implementations of imitation and reward learning algorithms

Awesome Python Data Science / Graph Machine Learning

pytorch_geometric 21,597 5 months ago Geometric Deep Learning Extension Library for PyTorch
pytorch_geometric_temporal 2,694 7 months ago Temporal Extension Library for PyTorch Geometric
PyTorch Geometric Signed Directed 131 10 months ago A signed/directed graph neural network extension library for PyTorch Geometric
dgl 13,601 7 months ago Python package built to ease deep learning on graph, on top of existing DL frameworks
Spektral 2,372 over 1 year ago Deep learning on graphs
StellarGraph 2,957 about 1 year ago Machine Learning on Graphs
Graph Nets 5,370 over 2 years ago Build Graph Nets in Tensorflow
TensorFlow GNN 1,372 5 months ago A library to build Graph Neural Networks on the TensorFlow platform
PyTorch-BigGraph 3,389 about 1 year ago Generate embeddings from large-scale graph-structured data
Auto Graph Learning 1,094 10 months ago An autoML framework & toolkit for machine learning on graphs
Karate Club 2,178 10 months ago An unsupervised machine learning library for graph-structured data
Little Ball of Fur 705 over 1 year ago A library for sampling graph structured data
GreatX 85 7 months ago A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG)
Jraph 1,380 about 1 year ago A Graph Neural Network Library in Jax

Awesome Python Data Science / Learning-to-Rank & Recommender Systems

LightFM 4,790 10 months ago A Python implementation of LightFM, a hybrid recommendation algorithm
Spotlight Deep recommender models using PyTorch
Surprise 6,434 11 months ago A Python scikit for building and analyzing recommender systems
RecBole 3,497 9 months ago A unified, comprehensive and efficient recommendation library
allRank 886 10 months ago allRank is a framework for training learning-to-rank neural models based on PyTorch
TensorFlow Recommenders 1,869 6 months ago A library for building recommender system models using TensorFlow
TensorFlow Ranking 2,750 about 1 year ago Learning to Rank in TensorFlow

Awesome Python Data Science / Probabilistic Graphical Models

pomegranate 3,389 7 months ago Probabilistic and graphical models for Python
pgmpy 2,776 5 months ago A python library for working with Probabilistic Graphical Models
pyAgrum A GRaphical Universal Modeler

Awesome Python Data Science / Probabilistic Methods

pyro 8,604 6 months ago A flexible, scalable deep probabilistic programming library built on PyTorch
PyMC 8,786 5 months ago Bayesian Stochastic Modelling in Python
ZhuSuan Bayesian Deep Learning
GPflow Gaussian processes in TensorFlow
InferPy 149 10 months ago Deep Probabilistic Modelling Made Easy
PyStan 343 11 months ago Bayesian inference using the No-U-Turn sampler (Python interface)
sklearn-bayes 514 over 3 years ago Python package for Bayesian Machine Learning with scikit-learn API
skpro 250 5 months ago Supervised domain-agnostic prediction framework for probabilistic modelling by
PyVarInf 359 over 5 years ago Bayesian Deep Learning methods with Variational Inference for PyTorch
emcee 1,478 6 months ago The Python ensemble sampling toolkit for affine-invariant MCMC
hsmmlearn 81 almost 4 years ago A library for hidden semi-Markov models with explicit durations
pyhsmm 549 over 2 years ago Bayesian inference in HSMMs and HMMs
GPyTorch 3,605 5 months ago A highly efficient and modular implementation of Gaussian Processes in PyTorch
sklearn-crfsuite 425 over 1 year ago A scikit-learn-inspired API for CRFsuite

Awesome Python Data Science / Model Explanation

dalex 1,390 8 months ago moDel Agnostic Language for Exploration and explanation
Shapley 219 almost 2 years ago A data-driven framework to quantify the value of classifiers in a machine learning ensemble
Alibi 2,421 5 months ago Algorithms for monitoring and explaining machine learning models
anchor 798 almost 3 years ago Code for "High-Precision Model-Agnostic Explanations" paper
aequitas 701 8 months ago Bias and Fairness Audit Toolkit
Contrastive Explanation 45 over 2 years ago Contrastive Explanation (Foil Trees)
yellowbrick 4,304 8 months ago Visual analysis and diagnostic tools to facilitate machine learning model selection
scikit-plot 2,432 9 months ago An intuitive library to add plotting functionality to scikit-learn objects
shap 23,077 5 months ago A unified approach to explain the output of any machine learning model
ELI5 2,763 about 3 years ago A library for debugging/inspecting machine learning classifiers and explaining their predictions
Lime 11,663 10 months ago Explaining the predictions of any machine learning classifier
FairML 361 about 4 years ago A Python toolbox for auditing machine learning models for bias
L2X 123 about 4 years ago Code for replicating the experiments in the paper
PDPbox 846 9 months ago Partial dependence plot toolbox
PyCEbox 164 almost 5 years ago Python Individual Conditional Expectation Plot Toolbox
Skater Python Library for Model Interpretation
model-analysis 1,258 6 months ago Model analysis tools for TensorFlow
themis-ml 125 over 4 years ago A library that implements fairness-aware machine learning algorithms
treeinterpreter 745 almost 2 years ago Interpreting scikit-learn's decision tree and random forest predictions
AI Explainability 360 1,641 10 months ago Interpretability and explainability of data and machine learning models
Auralisation 42 about 8 years ago Auralisation of learned features in CNN (for audio)
CapsNet-Visualization 394 over 3 years ago A visualization of the CapsNet layers to better understand how it works
lucid 4,678 over 2 years ago A collection of infrastructure and tools for research in neural network interpretability
Netron 28,684 5 months ago Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks)
FlashLight Visualization Tool for your NeuralNetwork
tensorboard-pytorch 7,887 5 months ago Tensorboard for PyTorch (and chainer, mxnet, numpy, ...)

Awesome Python Data Science / Genetic Programming

gplearn 1,636 over 1 year ago Genetic Programming in Python
PyGAD 1,905 5 months ago Genetic Algorithm in Python
DEAP 5,891 6 months ago Distributed Evolutionary Algorithms in Python
karoo_gp 161 over 2 years ago A Genetic Programming platform for Python with GPU support
monkeys 122 almost 7 years ago A strongly-typed genetic programming framework for Python
sklearn-genetic 323 over 1 year ago Genetic feature selection module for scikit-learn

Awesome Python Data Science / Optimization

Optuna 11,082 5 months ago A hyperparameter optimization framework
pymoo 2,333 6 months ago Multi-objective Optimization in Python
pycma 1,123 8 months ago Python implementation of CMA-ES
Spearmint 1,550 over 5 years ago Bayesian optimization
BoTorch 3,126 5 months ago Bayesian optimization in PyTorch
scikit-opt 5,316 11 months ago Heuristic Algorithms for optimization
sklearn-genetic-opt 316 7 months ago Hyperparameters tuning and feature selection using evolutionary algorithms
SMAC3 1,093 5 months ago Sequential Model-based Algorithm Configuration
Optunity 417 over 1 year ago A library containing various optimizers for hyperparameter tuning
hyperopt 7,295 7 months ago Distributed Asynchronous Hyperparameter Optimization in Python
hyperopt-sklearn 1,594 11 months ago Hyper-parameter optimization for sklearn
sklearn-deap 771 over 1 year ago Use evolutionary algorithms instead of gridsearch in scikit-learn
sigopt_sklearn 75 almost 2 years ago SigOpt wrappers for scikit-learn methods
Bayesian Optimization 7,978 5 months ago A Python implementation of global optimization with gaussian processes
SafeOpt 141 over 2 years ago Safe Bayesian Optimization
scikit-optimize 2,748 about 1 year ago Sequential model-based optimization with a interface
Solid 575 almost 6 years ago A comprehensive gradient-free optimization framework written in Python
PySwarms 1,295 10 months ago A research toolkit for particle swarm optimization in Python
Platypus 579 8 months ago A Free and Open Source Python Library for Multiobjective Optimization
GPflowOpt 270 over 4 years ago Bayesian Optimization using GPflow
POT 2,454 5 months ago Python Optimal Transport library
Talos 1,626 about 1 year ago Hyperparameter Optimization for Keras Models
nlopt 1,908 5 months ago Library for nonlinear optimization (global and local, constrained or unconstrained)
OR-Tools An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi

Awesome Python Data Science / Feature Engineering / General

Featuretools 7,304 5 months ago Automated feature engineering
Feature Engine 1,956 7 months ago Feature engineering package with sklearn-like functionality
OpenFE 806 12 months ago Automated feature generation with expert-level performance
skl-groups 41 almost 9 years ago A scikit-learn addon to operate on set/"group"-based features
Feature Forge 382 over 7 years ago A set of tools for creating and testing machine learning features
few 51 almost 5 years ago A feature engineering wrapper for sklearn
scikit-mdr 126 almost 2 years ago A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction
tsfresh 8,486 6 months ago Automatic extraction of relevant features from time series
dirty_cat 17 6 months ago Machine learning on dirty tabular data (especially string-based variables for classification and regression)
NitroFE 106 about 3 years ago Moving window features
sk-transformer 10 6 months ago A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps

Awesome Python Data Science / Feature Engineering / Feature Selection

scikit-feature 1,513 11 months ago Feature selection repository in Python
boruta_py 1,529 9 months ago Implementations of the Boruta all-relevant feature selection method
BoostARoota 219 about 4 years ago A fast xgboost feature selection algorithm
scikit-rebate 413 over 2 years ago A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning
zoofs 245 6 months ago A feature selection library based on evolutionary algorithms

Awesome Python Data Science / Visualization / General Purposes

Matplotlib 20,443 5 months ago Plotting with Python
seaborn 12,669 5 months ago Statistical data visualization using matplotlib
prettyplotlib 1,695 over 6 years ago Painlessly create beautiful matplotlib plots
python-ternary 744 11 months ago Ternary plotting library for Python with matplotlib
missingno 3,987 about 1 year ago Missing data visualization module for Python
chartify 3,546 7 months ago Python library that makes it easy for data scientists to create charts
physt 134 7 months ago Improved histograms

Awesome Python Data Science / Visualization / Interactive plots

animatplot 412 9 months ago A python package for animating plots built on matplotlib
plotly A Python library that makes interactive and publication-quality graphs
Bokeh 19,453 5 months ago Interactive Web Plotting for Python
Altair Declarative statistical visualization library for Python. Supports many data transformations directly in the code that creates a graph
bqplot 3,634 5 months ago Plotting library for IPython/Jupyter notebooks
pyecharts 14,975 7 months ago Migrated from , a charting and visualization library, to Python's interactive visual drawing library

Awesome Python Data Science / Visualization / Map

folium Makes it easy to visualize data on an interactive open street map
geemap 3,515 5 months ago Python package for interactive mapping with Google Earth Engine (GEE)

Awesome Python Data Science / Visualization / Automatic Plotting

HoloViews 2,719 5 months ago Stop plotting your data - annotate your data and let it visualize itself
AutoViz 1,749 12 months ago Visualize data automatically with 1 line of code (ideal for machine learning)
SweetViz 2,965 10 months ago Visualize and compare datasets, target values and associations, with one line of code

Awesome Python Data Science / Visualization / NLP

pyLDAvis 1,810 11 months ago Visualize interactive topic models

Awesome Python Data Science / Deployment

fastapi A modern, fast (high-performance) web framework for building APIs with Python
streamlit Makes it easy to deploy machine learning models
streamsync 1,340 5 months ago No-code in the front, Python in the back. An open-source framework for creating data apps
gradio 34,557 5 months ago Create UIs for your machine learning model in Python in 3 minutes
Vizro 2,736 5 months ago A toolkit for creating modular data visualization applications
datapane A collection of APIs to turn scripts and notebooks into interactive reports
binder Enables sharing and executing Jupyter Notebooks

Awesome Python Data Science / Statistics

pandas_summary 510 7 months ago Extension to pandas dataframes describe function
Pandas Profiling 12,602 5 months ago Create HTML profiling reports from pandas DataFrame objects
statsmodels 10,245 5 months ago Statistical modeling and econometrics in Python
stockstats 1,312 over 1 year ago Supply a wrapper based on the with inline stock statistics/indicators support
weightedcalcs 107 6 months ago A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more
scikit-posthocs 354 6 months ago Pairwise Multiple Comparisons Post-hoc Tests
Alphalens 3,433 over 1 year ago Performance analysis of predictive (alpha) stock factors

Awesome Python Data Science / Data Manipulation / Data Frames

pandas Powerful Python data analysis toolkit
polars 30,943 5 months ago A fast multi-threaded, hybrid-out-of-core DataFrame library
Arctic 3,059 about 1 year ago High-performance datastore for time series and tick data
datatable 1,821 7 months ago Data.table for Python
pandas_profiling 12,602 5 months ago Create HTML profiling reports from pandas DataFrame objects
cuDF 8,534 5 months ago GPU DataFrame Library
blaze 3,185 over 1 year ago NumPy and pandas interface to Big Data
pandasql 1,345 10 months ago Allows you to query pandas DataFrames using SQL syntax
pandas-gbq 451 5 months ago pandas Google Big Query
xpandas 26 almost 3 years ago Universal 1d/2d data containers with Transformers functionality for data analysis by
pysparkling 262 9 months ago A pure Python implementation of Apache Spark's RDD and DStream interfaces
modin 9,942 6 months ago Speed up your pandas workflows by changing a single line of code
swifter 2,552 about 1 year ago A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner
pandas-log 214 almost 4 years ago A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues
vaex 8,315 8 months ago Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second
xarray 3,660 5 months ago Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines

Awesome Python Data Science / Data Manipulation / Pipelines

pdpipe 718 7 months ago Easy pipelines for pandas DataFrames
SSPipe Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch
pandas-ply 199 over 9 years ago Functional data manipulation for pandas
Dplython 764 over 8 years ago Dplyr for Python
sklearn-pandas 2,815 almost 2 years ago pandas integration with sklearn
Dataset 202 6 months ago Helps you conveniently work with random or sequential batches of your data and define data processing
pyjanitor 1,371 5 months ago Clean APIs for data cleaning
meza 417 10 months ago A Python toolkit for processing tabular data
Prodmodel 58 almost 3 years ago Build system for data science pipelines
dopanda 475 6 months ago Hints and tips for using pandas in an analysis environment
Hamilton 1,900 5 months ago A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions

Awesome Python Data Science / Data Manipulation / Data-centric AI

cleanlab 9,820 5 months ago The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels
snorkel 5,826 about 1 year ago A system for quickly generating training data with weak supervision
dataprep 2,088 11 months ago Collect, clean, and visualize your data in Python with a few lines of code

Awesome Python Data Science / Data Manipulation / Synthetic Data

ydata-synthetic 1,456 5 months ago A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models

Awesome Python Data Science / Distributed Computing

Horovod 14,305 5 months ago Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
PySpark Exposes the Spark programming model to Python
Veles 905 over 1 year ago Distributed machine learning platform
Jubatus 707 about 6 years ago Framework and Library for Distributed Online Machine Learning
DMTK 2,748 over 6 years ago Microsoft Distributed Machine Learning Toolkit
PaddlePaddle 22,340 5 months ago PArallel Distributed Deep LEarning
dask-ml 907 6 months ago Distributed and parallel machine learning
Distributed 1,582 5 months ago Distributed computation in Python

Awesome Python Data Science / Experimentation

mlflow 19,021 5 months ago Open source platform for the machine learning lifecycle
Neptune A lightweight ML experiment tracking, results visualization, and management tool
dvc 14,016 5 months ago Data Version Control | Git for Data & Models | ML Experiments Management
envd 2,061 8 months ago 🏕️ machine learning development environment for data science and AI/ML engineering teams
Sacred 4,266 6 months ago A tool to help you configure, organize, log, and reproduce experiments
Ax 2,392 5 months ago Adaptive Experimentation Platform

Awesome Python Data Science / Data Validation

great_expectations 10,054 5 months ago Always know what to expect from your data
pandera 3,472 5 months ago A lightweight, flexible, and expressive statistical data testing library
deepchecks 3,650 5 months ago Validation & testing of ML models and data during model development, deployment, and production
evidently 5,519 5 months ago Evaluate and monitor ML models from validation to production
TensorFlow Data Validation 766 6 months ago Library for exploring and validating machine learning data
DataComPy 487 5 months ago A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy

Awesome Python Data Science / Evaluation

recmetrics 571 over 1 year ago Library of useful metrics and plots for evaluating recommender systems
Metrics 1,632 over 2 years ago Machine learning evaluation metrics
sklearn-evaluation 3 over 2 years ago Model evaluation made easy: plots, tables, and markdown reports
AI Fairness 360 2,483 5 months ago Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models

Awesome Python Data Science / Computations

numpy The fundamental package needed for scientific computing with Python
Dask 12,691 5 months ago Parallel computing with task scheduling
bottleneck 1,077 7 months ago Fast NumPy array functions written in C
CuPy 9,586 5 months ago NumPy-like API accelerated with CUDA
scikit-tensor 403 over 6 years ago Python library for multilinear algebra and tensor factorizations
numdifftools 258 almost 2 years ago Solve automatic numerical differentiation problems in one or more variables
quaternion 614 7 months ago Add built-in support for quaternions to numpy
adaptive 1,168 5 months ago Tools for adaptive and parallel sampling of mathematical functions
NumExpr 2,255 6 months ago A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results

Awesome Python Data Science / Web Scraping

BeautifulSoup : The easiest library to scrape static websites for beginners
Scrapy : Fast and extensible scraping library. You can write rules and create customized scrapers without touching the core
Selenium : Use the Selenium Python API to access all the functionality of Selenium WebDriver in an intuitive way, like a real user
Pattern 8,758 12 months ago : High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization
twitterscraper 2,414 over 2 years ago : Efficient library to scrape Twitter

Awesome Python Data Science / Spatial Analysis

GeoPandas 4,559 5 months ago Python tools for geographic data
PySal 1,346 6 months ago Python Spatial Analysis Library

Awesome Python Data Science / Quantum Computing

qiskit 5,404 5 months ago Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules
cirq 4,347 5 months ago A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits
PennyLane 2,409 5 months ago Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations
QML 199 6 months ago A Python Toolkit for Quantum Machine Learning

Awesome Python Data Science / Conversion

sklearn-porter 1,294 11 months ago Transpile trained scikit-learn estimators to C, Java, JavaScript, and others
ONNX 18,098 5 months ago Open Neural Network Exchange
MMdnn 5,802 12 months ago A set of tools to help users inter-operate among different deep learning frameworks
treelite 742 6 months ago Universal model exchange and serialization format for decision tree forests
