awesome-python-data-science

Data Science Toolbox

A curated list of data science software in Python

Probably the best curated list of data science software in Python.

GitHub

3k stars
60 watching
346 forks
last commit: 4 months ago
Linked from 4 awesome lists

awesome, awesome-list, awesome-python, data-analysis, data-science, data-visualization, deep-learning, machine-learning, python, scikit-learn, statistics

Awesome Python Data Science / Machine Learning / General Purpose Machine Learning

scikit-learn Machine learning in Python
PyCaret 9,026 about 1 month ago An open-source, low-code machine learning library in Python
Shogun 3,032 about 1 year ago Machine learning toolbox
xLearn 3,087 over 1 year ago High Performance, Easy-to-use, and Scalable Machine Learning Package
cuML 4,292 about 1 month ago RAPIDS Machine Learning Library
modAL 2,239 11 months ago Modular active learning framework for Python3
Sparkit-learn 1,154 about 4 years ago PySpark + scikit-learn = Sparkit-learn
mlpack 5,151 about 1 month ago A scalable C++ machine learning library (Python bindings)
dlib 13,623 2 months ago Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings)
MLxtend 4,926 2 months ago Extension and helper modules for Python's data analysis and machine learning libraries
hyperlearn 1,871 2 months ago 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels
Reproducible Experiment Platform (REP) 689 6 months ago Machine Learning toolbox for Humans
scikit-multilearn 921 12 months ago Multi-label classification for Python
seqlearn 691 almost 2 years ago Sequence classification toolkit for Python
pystruct 664 over 3 years ago Simple structured learning framework for Python
sklearn-expertsys 489 over 7 years ago Highly interpretable classifiers for scikit learn
RuleFit 411 over 1 year ago Implementation of the RuleFit algorithm
metric-learn 1,402 6 months ago Metric learning algorithms in Python
pyGAM 876 7 months ago Generalized Additive Models in Python
causalml 5,132 about 1 month ago Uplift modeling and causal inference with machine learning algorithms

Awesome Python Data Science / Machine Learning / Gradient Boosting

XGBoost 26,396 about 1 month ago Scalable, Portable, and Distributed Gradient Boosting
LightGBM 16,769 about 1 month ago A fast, distributed, high-performance gradient boosting framework
CatBoost 8,139 about 1 month ago An open-source gradient boosting on decision trees library
ThunderGBM 695 12 months ago Fast GBDTs and Random Forests on GPUs
NGBoost 1,663 3 months ago Natural Gradient Boosting for Probabilistic Prediction
TensorFlow Decision Forests 666 about 2 months ago A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras

Awesome Python Data Science / Machine Learning / Ensemble Methods

ML-Ensemble High performance ensemble learning
Stacking 222 about 7 years ago Simple and useful stacking library written in Python
stacked_generalization 117 over 5 years ago Library for machine learning stacking generalization
vecstack 688 5 months ago Python package for stacking (machine learning technique)

Awesome Python Data Science / Machine Learning / Imbalanced Datasets

imbalanced-learn 6,875 about 1 month ago Module to perform under-sampling and over-sampling with various techniques
imbalanced-algorithms 235 almost 3 years ago Python-based implementations of algorithms for learning on imbalanced data

Awesome Python Data Science / Machine Learning / Random Forests

rpforest 225 almost 5 years ago A forest of random projection trees
sklearn-random-bits-forest 9 over 8 years ago Wrapper of the Random Bits Forest program written by (Wang et al., 2016)
rgf_python 379 about 3 years ago Python Wrapper of Regularized Greedy Forest

Awesome Python Data Science / Machine Learning / Kernel Methods

pyFM 923 over 4 years ago Factorization machines in Python
fastFM 1,078 over 2 years ago A library for Factorization Machines
tffm 780 about 3 years ago TensorFlow implementation of an arbitrary order Factorization Machine
liquidSVM 66 almost 5 years ago An implementation of SVMs
scikit-rvm 231 over 7 years ago Relevance Vector Machine implementation using the scikit-learn API
ThunderSVM 1,571 10 months ago A fast SVM Library on GPUs and CPUs

Awesome Python Data Science / Deep Learning / PyTorch

PyTorch 84,978 about 1 month ago Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-lightning 28,636 about 1 month ago PyTorch Lightning is just organized PyTorch
ignite 4,554 about 1 month ago High-level library to help with training neural networks in PyTorch
skorch 5,911 about 1 month ago A scikit-learn compatible neural network library that wraps PyTorch
Catalyst 3,300 10 months ago High-level utils for PyTorch DL & RL research
ChemicalX 719 over 1 year ago A PyTorch-based deep learning library for drug pair scoring

Awesome Python Data Science / Deep Learning / TensorFlow

TensorFlow 186,822 about 1 month ago Computation using data flow graphs for scalable machine learning by Google
TensorLayer 7,337 almost 2 years ago Deep Learning and Reinforcement Learning Library for Researcher and Engineer
TFLearn 9,621 9 months ago Deep learning library featuring a higher-level API for TensorFlow
Sonnet 9,790 2 months ago TensorFlow-based neural network library
tensorpack 6,303 over 1 year ago A Neural Net Training Interface on TensorFlow
Polyaxon 3,581 about 1 month ago A platform that helps you build, manage and monitor deep learning models
tfdeploy 353 11 months ago Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy
tensorflow-upstream 688 about 1 month ago TensorFlow ROCm port
TensorFlow Fold 1,826 over 3 years ago Deep learning with dynamic computation graphs in TensorFlow
TensorLight 11 over 2 years ago A high-level framework for TensorFlow
Mesh TensorFlow 1,597 about 1 year ago Model Parallelism Made Easier
Ludwig 11,236 about 2 months ago A toolbox that allows one to train and test deep learning models without the need to write code
Keras A high-level neural networks API running on top of TensorFlow
keras-contrib 1,579 about 2 years ago Keras community contributions
Hyperas 2,179 about 2 years ago Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization
Elephas 1,574 over 1 year ago Distributed Deep learning with Keras & Spark
qkeras 541 3 months ago A quantization deep learning library

Awesome Python Data Science / Deep Learning / MXNet

MXNet 20,791 about 1 year ago Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler
Gluon 2,300 over 5 years ago A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet)
Xfer 253 over 1 year ago Transfer Learning library for Deep Neural Networks
MXNet 28 about 5 years ago HIP Port of MXNet

Awesome Python Data Science / Deep Learning / JAX

JAX 30,744 about 1 month ago Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
FLAX 6,196 about 1 month ago A neural network library for JAX that is designed for flexibility
Optax 1,730 about 1 month ago A gradient processing and optimization library for JAX

Awesome Python Data Science / Deep Learning / Others

transformers 136,357 about 1 month ago State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Tangent 2,314 over 2 years ago Source-to-Source Debuggable Derivatives in Pure Python
autograd 7,049 about 1 month ago Efficiently computes derivatives of numpy code
Caffe 34,149 6 months ago A fast open framework for deep learning
nnabla 2,729 2 months ago Neural Network Libraries by Sony

Awesome Python Data Science / Automated Machine Learning

auto-sklearn 7,667 about 2 months ago An AutoML toolkit and a drop-in replacement for a scikit-learn estimator
Auto-PyTorch 2,385 9 months ago Automatic architecture search and hyperparameter optimization for PyTorch
AutoKeras 9,172 about 1 month ago AutoML library for deep learning
AutoGluon 8,167 about 1 month ago AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data
TPOT 9,776 6 months ago AutoML tool that optimizes machine learning pipelines using genetic programming
MLBox 1,500 over 1 year ago A powerful Automated Machine Learning Python library

Awesome Python Data Science / Natural Language Processing

torchtext 3,524 about 1 month ago Data loaders and abstractions for text and NLP
gluon-nlp 2,560 over 1 year ago NLP made easy
KerasNLP 818 about 1 month ago Modular Natural Language Processing workflows with Keras
spaCy Industrial-Strength Natural Language Processing
NLTK 13,694 2 months ago Modules, data sets, and tutorials supporting research and development in Natural Language Processing
CLTK 843 about 2 months ago The Classical Language Toolkit
gensim Topic Modelling for Humans
pyMorfologik 18 over 9 years ago Python binding for
skift 233 over 2 years ago Scikit-learn wrappers for Python fastText
Phonemizer 1,249 4 months ago Simple text-to-phonemes converter for multiple languages
flair 13,990 about 1 month ago Very simple framework for state-of-the-art NLP

Awesome Python Data Science / Computer Audition

torchaudio 2,561 about 1 month ago An audio library for PyTorch
librosa 7,237 about 2 months ago Python library for audio and music analysis
Yaafe 244 over 3 years ago Audio features extraction
aubio 3,336 6 months ago A library for audio and music analysis
Essentia 2,889 3 months ago Library for audio and music analysis, description, and synthesis
LibXtract 227 almost 5 years ago A simple, portable, lightweight library of audio feature extraction functions
Marsyas 407 over 1 year ago Music Analysis, Retrieval, and Synthesis for Audio Signals
muda 233 over 3 years ago A library for augmenting annotated audio data
madmom 1,366 5 months ago Python audio and music signal processing library

Awesome Python Data Science / Computer Vision

torchvision 16,364 about 1 month ago Datasets, Transforms, and Models specific to Computer Vision
PyTorch3D 8,889 about 2 months ago PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
gluon-cv 5,850 about 2 months ago Provides implementations of the state-of-the-art deep learning models in computer vision
KerasCV 1,013 about 2 months ago Industry-strength Computer Vision workflows with Keras
OpenCV 79,662 about 1 month ago Open Source Computer Vision Library
Decord 1,923 6 months ago An efficient video loader for deep learning with smart shuffling that's super easy to digest
MMEngine 1,196 2 months ago OpenMMLab Foundational Library for Training Deep Learning Models
scikit-image 6,117 about 1 month ago Image Processing SciKit (Toolbox for SciPy)
imgaug 14,458 6 months ago Image augmentation for machine learning experiments
imgaug_extension Additional augmentations for imgaug
Augmentor 5,084 10 months ago Image augmentation library in Python for machine learning
albumentations 14,386 about 1 month ago Fast image augmentation library and easy-to-use wrapper around other libraries
LAVIS 10,058 2 months ago A One-stop Library for Language-Vision Intelligence

Awesome Python Data Science / Time Series

sktime 8,020 about 1 month ago A unified framework for machine learning with time series
skforecast 1,189 about 1 month ago Time series forecasting with machine learning models
darts 8,166 about 1 month ago A Python library for easy manipulation and forecasting of time series
statsforecast 4,045 about 1 month ago Lightning fast forecasting with statistical and econometric models
mlforecast 924 about 1 month ago Scalable machine learning-based time series forecasting
neuralforecast 3,181 about 1 month ago Scalable machine learning-based time series forecasting
tslearn 2,924 7 months ago Machine learning toolkit dedicated to time-series data
tick 495 about 2 months ago Module for statistical learning, with a particular emphasis on time-dependent modeling
greykite 1,815 7 months ago A flexible, intuitive, and fast forecasting library
Prophet 18,627 3 months ago Automatic Forecasting Procedure
PyFlux 2,114 about 1 year ago Open source time series library for Python
bayesloop 156 9 months ago Probabilistic programming framework that facilitates objective model selection for time-varying parameter models
luminol 1,193 over 1 year ago Anomaly Detection and Correlation library
dateutil Powerful extensions to the standard datetime module
maya 3,414 6 months ago Makes it very easy to parse a string and change timezones
Chaos Genius 744 4 months ago ML powered analytics engine for outlier/anomaly detection and root cause analysis

Awesome Python Data Science / Reinforcement Learning

Gymnasium 7,613 about 1 month ago An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly )
PettingZoo 2,678 about 1 month ago An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
MAgent2 240 2 months ago An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments
Stable Baselines3 9,329 about 2 months ago A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines
Shimmy 143 3 months ago An API conversion tool for popular external reinforcement learning environments
EnvPool 1,108 5 months ago C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments
RLlib Scalable Reinforcement Learning
Tianshou 8,069 about 1 month ago An elegant PyTorch deep reinforcement learning library
Acme 3,542 3 months ago A library of reinforcement learning components and agents
Catalyst-RL 46 over 3 years ago PyTorch framework for RL research
d3rlpy 1,349 about 2 months ago An offline deep reinforcement learning library
DI-engine 3,143 about 1 month ago OpenDILab Decision AI Engine
TF-Agents 2,816 about 1 month ago A library for Reinforcement Learning in TensorFlow
TensorForce 3,299 6 months ago A TensorFlow library for applied reinforcement learning
TRFL 3,136 about 2 years ago TensorFlow Reinforcement Learning
Dopamine 10,591 2 months ago A research framework for fast prototyping of reinforcement learning algorithms
keras-rl 5,530 over 1 year ago Deep Reinforcement Learning for Keras
garage 1,893 over 1 year ago A toolkit for reproducible reinforcement learning research
Horizon 3,575 about 2 months ago A platform for Applied Reinforcement Learning
rlpyt 2,236 about 4 years ago Reinforcement Learning in PyTorch
cleanrl 5,891 2 months ago High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Machin 402 over 3 years ago A reinforcement learning library designed for PyTorch
SKRL 588 about 1 month ago Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym
Imitation 1,350 5 months ago Clean PyTorch implementations of imitation and reward learning algorithms

Awesome Python Data Science / Graph Machine Learning

pytorch_geometric 21,597 about 1 month ago Geometric Deep Learning Extension Library for PyTorch
pytorch_geometric_temporal 2,694 3 months ago Temporal Extension Library for PyTorch Geometric
PyTorch Geometric Signed Directed 131 6 months ago A signed/directed graph neural network extension library for PyTorch Geometric
dgl 13,601 3 months ago Python package built to ease deep learning on graph, on top of existing DL frameworks
Spektral 2,372 12 months ago Deep learning on graphs
StellarGraph 2,957 9 months ago Machine Learning on Graphs
Graph Nets 5,370 about 2 years ago Build Graph Nets in TensorFlow
TensorFlow GNN 1,372 about 1 month ago A library to build Graph Neural Networks on the TensorFlow platform
PyTorch-BigGraph 3,389 11 months ago Generate embeddings from large-scale graph-structured data
Auto Graph Learning 1,094 5 months ago An autoML framework & toolkit for machine learning on graphs
Karate Club 2,178 6 months ago An unsupervised machine learning library for graph-structured data
Little Ball of Fur 705 12 months ago A library for sampling graph structured data
GreatX 85 3 months ago A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG)
Jraph 1,380 10 months ago A Graph Neural Network Library in JAX

Awesome Python Data Science / Learning-to-Rank & Recommender Systems

LightFM 4,790 6 months ago A Python implementation of LightFM, a hybrid recommendation algorithm
Spotlight Deep recommender models using PyTorch
Surprise 6,434 7 months ago A Python scikit for building and analyzing recommender systems
RecBole 3,497 5 months ago A unified, comprehensive and efficient recommendation library
allRank 886 5 months ago allRank is a framework for training learning-to-rank neural models based on PyTorch
TensorFlow Recommenders 1,869 about 1 month ago A library for building recommender system models using TensorFlow
TensorFlow Ranking 2,750 10 months ago Learning to Rank in TensorFlow

Awesome Python Data Science / Probabilistic Graphical Models

pomegranate 3,389 3 months ago Probabilistic and graphical models for Python
pgmpy 2,776 about 1 month ago A Python library for working with Probabilistic Graphical Models
pyAgrum A GRaphical Universal Modeler

Awesome Python Data Science / Probabilistic Methods

pyro 8,604 about 2 months ago A flexible, scalable deep probabilistic programming library built on PyTorch
PyMC 8,786 about 1 month ago Bayesian Stochastic Modelling in Python
ZhuSuan Bayesian Deep Learning
GPflow Gaussian processes in TensorFlow
InferPy 149 6 months ago Deep Probabilistic Modelling Made Easy
PyStan 343 7 months ago Bayesian inference using the No-U-Turn sampler (Python interface)
sklearn-bayes 514 over 3 years ago Python package for Bayesian Machine Learning with scikit-learn API
skpro 250 about 1 month ago Supervised domain-agnostic prediction framework for probabilistic modelling by
PyVarInf 359 over 5 years ago Bayesian Deep Learning methods with Variational Inference for PyTorch
emcee 1,478 about 1 month ago The Python ensemble sampling toolkit for affine-invariant MCMC
hsmmlearn 81 over 3 years ago A library for hidden semi-Markov models with explicit durations
pyhsmm 549 about 2 years ago Bayesian inference in HSMMs and HMMs
GPyTorch 3,605 about 1 month ago A highly efficient and modular implementation of Gaussian Processes in PyTorch
sklearn-crfsuite 425 over 1 year ago A scikit-learn-inspired API for CRFsuite

Awesome Python Data Science / Model Explanation

dalex 1,390 4 months ago moDel Agnostic Language for Exploration and explanation
Shapley 219 over 1 year ago A data-driven framework to quantify the value of classifiers in a machine learning ensemble
Alibi 2,421 about 1 month ago Algorithms for monitoring and explaining machine learning models
anchor 798 over 2 years ago Code for "High-Precision Model-Agnostic Explanations" paper
aequitas 701 4 months ago Bias and Fairness Audit Toolkit
Contrastive Explanation 45 almost 2 years ago Contrastive Explanation (Foil Trees)
yellowbrick 4,304 4 months ago Visual analysis and diagnostic tools to facilitate machine learning model selection
scikit-plot 2,432 5 months ago An intuitive library to add plotting functionality to scikit-learn objects
shap 23,077 about 1 month ago A unified approach to explain the output of any machine learning model
ELI5 2,763 over 2 years ago A library for debugging/inspecting machine learning classifiers and explaining their predictions
Lime 11,663 6 months ago Explaining the predictions of any machine learning classifier
FairML 361 over 3 years ago FairML is a Python toolbox for auditing machine learning models for bias
L2X 123 over 3 years ago Code for replicating the experiments in the paper
PDPbox 846 5 months ago Partial dependence plot toolbox
PyCEbox 164 over 4 years ago Python Individual Conditional Expectation Plot Toolbox
Skater Python Library for Model Interpretation
model-analysis 1,258 about 2 months ago Model analysis tools for TensorFlow
themis-ml 125 about 4 years ago A library that implements fairness-aware machine learning algorithms
treeinterpreter 745 over 1 year ago Interpreting scikit-learn's decision tree and random forest predictions
AI Explainability 360 1,641 6 months ago Interpretability and explainability of data and machine learning models
Auralisation 42 almost 8 years ago Auralisation of learned features in CNN (for audio)
CapsNet-Visualization 394 over 3 years ago A visualization of the CapsNet layers to better understand how it works
lucid 4,678 almost 2 years ago A collection of infrastructure and tools for research in neural network interpretability
Netron 28,684 about 1 month ago Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks)
FlashLight Visualization Tool for your NeuralNetwork
tensorboard-pytorch 7,887 about 1 month ago Tensorboard for PyTorch (and chainer, mxnet, numpy, ...)

Awesome Python Data Science / Genetic Programming

gplearn 1,636 about 1 year ago Genetic Programming in Python
PyGAD 1,905 about 1 month ago Genetic Algorithm in Python
DEAP 5,891 about 2 months ago Distributed Evolutionary Algorithms in Python
karoo_gp 161 about 2 years ago A Genetic Programming platform for Python with GPU support
monkeys 122 over 6 years ago A strongly-typed genetic programming framework for Python
sklearn-genetic 323 12 months ago Genetic feature selection module for scikit-learn

Awesome Python Data Science / Optimization

Optuna 11,082 about 1 month ago A hyperparameter optimization framework
pymoo 2,333 about 2 months ago Multi-objective Optimization in Python
pycma 1,123 3 months ago Python implementation of CMA-ES
Spearmint 1,550 about 5 years ago Bayesian optimization
BoTorch 3,126 about 1 month ago Bayesian optimization in PyTorch
scikit-opt 5,316 7 months ago Heuristic Algorithms for optimization
sklearn-genetic-opt 316 3 months ago Hyperparameters tuning and feature selection using evolutionary algorithms
SMAC3 1,093 about 1 month ago Sequential Model-based Algorithm Configuration
Optunity 417 about 1 year ago A library containing various optimizers for hyperparameter tuning
hyperopt 7,295 3 months ago Distributed Asynchronous Hyperparameter Optimization in Python
hyperopt-sklearn 1,594 7 months ago Hyper-parameter optimization for sklearn
sklearn-deap 771 11 months ago Use evolutionary algorithms instead of gridsearch in scikit-learn
sigopt_sklearn 75 over 1 year ago SigOpt wrappers for scikit-learn methods
Bayesian Optimization 7,978 about 1 month ago A Python implementation of global optimization with gaussian processes
SafeOpt 141 about 2 years ago Safe Bayesian Optimization
scikit-optimize 2,748 11 months ago Sequential model-based optimization with a interface
Solid 575 over 5 years ago A comprehensive gradient-free optimization framework written in Python
PySwarms 1,295 5 months ago A research toolkit for particle swarm optimization in Python
Platypus 579 3 months ago A Free and Open Source Python Library for Multiobjective Optimization
GPflowOpt 270 about 4 years ago Bayesian Optimization using GPflow
POT 2,454 about 1 month ago Python Optimal Transport library
Talos 1,626 9 months ago Hyperparameter Optimization for Keras Models
nlopt 1,908 about 1 month ago Library for nonlinear optimization (global and local, constrained or unconstrained)
OR-Tools An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi

Awesome Python Data Science / Feature Engineering / General

Featuretools 7,304 about 1 month ago Automated feature engineering
Feature Engine 1,956 2 months ago Feature engineering package with sklearn-like functionality
OpenFE 806 8 months ago Automated feature generation with expert-level performance
skl-groups 41 over 8 years ago A scikit-learn addon to operate on set/"group"-based features
Feature Forge 382 about 7 years ago A set of tools for creating and testing machine learning features
few 51 over 4 years ago A feature engineering wrapper for sklearn
scikit-mdr 126 over 1 year ago A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction
tsfresh 8,486 about 2 months ago Automatic extraction of relevant features from time series
dirty_cat 17 about 1 month ago Machine learning on dirty tabular data (especially: string-based variables for classification and regression)
NitroFE 106 over 2 years ago Moving window features
sk-transformer 10 about 2 months ago A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps

Awesome Python Data Science / Feature Engineering / Feature Selection

scikit-feature 1,513 6 months ago Feature selection repository in Python
boruta_py 1,529 5 months ago Implementations of the Boruta all-relevant feature selection method
BoostARoota 219 almost 4 years ago A fast xgboost feature selection algorithm
scikit-rebate 413 almost 2 years ago A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning
zoofs 245 about 2 months ago A feature selection library based on evolutionary algorithms

Awesome Python Data Science / Visualization / General Purposes

Matplotlib 20,443 about 1 month ago Plotting with Python
seaborn 12,669 about 1 month ago Statistical data visualization using matplotlib
prettyplotlib 1,695 almost 6 years ago Painlessly create beautiful matplotlib plots
python-ternary 744 7 months ago Ternary plotting library for Python with matplotlib
missingno 3,987 8 months ago Missing data visualization module for Python
chartify 3,546 3 months ago Python library that makes it easy for data scientists to create charts
physt 134 3 months ago Improved histograms

Awesome Python Data Science / Visualization / Interactive plots

animatplot 412 5 months ago A Python package for animating plots built on matplotlib
plotly A Python library that makes interactive and publication-quality graphs
Bokeh 19,453 about 1 month ago Interactive Web Plotting for Python
Altair Declarative statistical visualization library for Python. Supports many data transformations within the same code that creates the graph
bqplot 3,634 about 1 month ago Plotting library for IPython/Jupyter notebooks
pyecharts 14,975 2 months ago Migrated from , a charting and visualization library, to Python's interactive visual drawing library

Awesome Python Data Science / Visualization / Map

folium Makes it easy to visualize data on an interactive open street map
geemap 3,515 about 1 month ago Python package for interactive mapping with Google Earth Engine (GEE)

Awesome Python Data Science / Visualization / Automatic Plotting

HoloViews 2,719 about 1 month ago Stop plotting your data - annotate your data and let it visualize itself
AutoViz 1,749 7 months ago Visualize data automatically with 1 line of code (ideal for machine learning)
SweetViz 2,965 5 months ago Visualize and compare datasets, target values and associations, with one line of code

Awesome Python Data Science / Visualization / NLP

pyLDAvis 1,810 6 months ago Visualize interactive topic models

Awesome Python Data Science / Deployment

fastapi A modern, fast (high-performance) web framework for building APIs with Python
streamlit Makes it easy to deploy machine learning models
streamsync 1,340 about 1 month ago No-code in the front, Python in the back. An open-source framework for creating data apps
gradio 34,557 about 1 month ago Create UIs for your machine learning model in Python in 3 minutes
Vizro 2,736 about 1 month ago A toolkit for creating modular data visualization applications
datapane A collection of APIs to turn scripts and notebooks into interactive reports
binder Enables sharing and executing Jupyter Notebooks

Awesome Python Data Science / Statistics

pandas_summary 510 3 months ago Extension to pandas dataframes describe function
Pandas Profiling 12,602 about 1 month ago Create HTML profiling reports from pandas DataFrame objects
statsmodels 10,245 about 1 month ago Statistical modeling and econometrics in Python
stockstats 1,312 about 1 year ago Supply a wrapper based on the with inline stock statistics/indicators support
weightedcalcs 107 2 months ago A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more
scikit-posthocs 354 about 1 month ago Pairwise Multiple Comparisons Post-hoc Tests
Alphalens 3,433 11 months ago Performance analysis of predictive (alpha) stock factors

Awesome Python Data Science / Data Manipulation / Data Frames

pandas Powerful Python data analysis toolkit
polars 30,943 about 1 month ago A fast multi-threaded, hybrid-out-of-core DataFrame library
Arctic 3,059 9 months ago High-performance datastore for time series and tick data
datatable 1,821 3 months ago Data.table for Python
pandas_profiling 12,602 about 1 month ago Create HTML profiling reports from pandas DataFrame objects
cuDF 8,534 about 1 month ago GPU DataFrame Library
blaze 3,185 over 1 year ago NumPy and pandas interface to Big Data
pandasql 1,345 6 months ago Allows you to query pandas DataFrames using SQL syntax
pandas-gbq 451 about 1 month ago pandas Google Big Query
xpandas 26 over 2 years ago Universal 1d/2d data containers with Transformers functionality for data analysis by
pysparkling 262 5 months ago A pure Python implementation of Apache Spark's RDD and DStream interfaces
modin 9,942 about 2 months ago Speed up your pandas workflows by changing a single line of code
swifter 2,552 10 months ago A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner
pandas-log 214 over 3 years ago A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues
vaex 8,315 3 months ago Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second
xarray 3,660 about 1 month ago Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines

Awesome Python Data Science / Data Manipulation / Pipelines

pdpipe 718 3 months ago Easy pipelines for pandas DataFrames
SSPipe Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch
pandas-ply 199 over 9 years ago Functional data manipulation for pandas
Dplython 764 about 8 years ago Dplyr for Python
sklearn-pandas 2,815 over 1 year ago pandas integration with sklearn
Dataset 202 about 2 months ago Helps you conveniently work with random or sequential batches of your data and define data processing
pyjanitor 1,371 about 1 month ago Clean APIs for data cleaning
meza 417 6 months ago A Python toolkit for processing tabular data
Prodmodel 58 over 2 years ago Build system for data science pipelines
dopanda 475 about 2 months ago Hints and tips for using pandas in an analysis environment
Hamilton 1,900 about 1 month ago A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions

Awesome Python Data Science / Data Manipulation / Data-centric AI

cleanlab 9,820 about 1 month ago The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels
snorkel 5,826 9 months ago A system for quickly generating training data with weak supervision
dataprep 2,088 7 months ago Collect, clean, and visualize your data in Python with a few lines of code

Awesome Python Data Science / Data Manipulation / Synthetic Data

ydata-synthetic 1,456 about 1 month ago A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models

Awesome Python Data Science / Distributed Computing

Horovod 14,305 about 1 month ago Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
PySpark Exposes the Spark programming model to Python
Veles 905 about 1 year ago Distributed machine learning platform
Jubatus 707 over 5 years ago Framework and Library for Distributed Online Machine Learning
DMTK 2,748 over 6 years ago Microsoft Distributed Machine Learning Toolkit
PaddlePaddle 22,340 about 1 month ago PArallel Distributed Deep LEarning
dask-ml 907 about 2 months ago Distributed and parallel machine learning
Distributed 1,582 about 1 month ago Distributed computation in Python

Awesome Python Data Science / Experimentation

mlflow 19,021 about 1 month ago Open source platform for the machine learning lifecycle
Neptune A lightweight ML experiment tracking, results visualization, and management tool
dvc 14,016 about 1 month ago Data Version Control | Git for Data & Models | ML Experiments Management
envd 2,061 3 months ago 🏕️ machine learning development environment for data science and AI/ML engineering teams
Sacred 4,266 about 2 months ago A tool to help you configure, organize, log, and reproduce experiments
Ax 2,392 about 1 month ago Adaptive Experimentation Platform

Awesome Python Data Science / Data Validation

great_expectations 10,054 about 1 month ago Always know what to expect from your data
pandera 3,472 about 1 month ago A lightweight, flexible, and expressive statistical data testing library
deepchecks 3,650 about 1 month ago Validation & testing of ML models and data during model development, deployment, and production
evidently 5,519 about 1 month ago Evaluate and monitor ML models from validation to production
TensorFlow Data Validation 766 about 2 months ago Library for exploring and validating machine learning data
DataComPy 487 about 1 month ago A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy

Awesome Python Data Science / Evaluation

recmetrics 571 about 1 year ago Library of useful metrics and plots for evaluating recommender systems
Metrics 1,632 about 2 years ago Machine learning evaluation metrics
sklearn-evaluation 3 about 2 years ago Model evaluation made easy: plots, tables, and markdown reports
AI Fairness 360 2,483 about 1 month ago Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models

Awesome Python Data Science / Computations

numpy The fundamental package needed for scientific computing with Python
Dask 12,691 about 1 month ago Parallel computing with task scheduling
bottleneck 1,077 3 months ago Fast NumPy array functions written in C
CuPy 9,586 about 1 month ago NumPy-like API accelerated with CUDA
scikit-tensor 403 over 6 years ago Python library for multilinear algebra and tensor factorizations
numdifftools 258 over 1 year ago Solve automatic numerical differentiation problems in one or more variables
quaternion 614 3 months ago Add built-in support for quaternions to numpy
adaptive 1,168 about 1 month ago Tools for adaptive and parallel sampling of mathematical functions
NumExpr 2,255 about 2 months ago A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results

Awesome Python Data Science / Web Scraping

BeautifulSoup : The easiest library to scrape static websites for beginners
Scrapy : Fast and extensible scraping library. Can write rules and create customized scraper without touching the core
Selenium : Use Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way like a real user
Pattern 8,758 7 months ago : High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization
twitterscraper 2,414 over 2 years ago : Efficient library to scrape Twitter

Awesome Python Data Science / Spatial Analysis

GeoPandas 4,559 about 1 month ago Python tools for geographic data
PySal 1,346 2 months ago Python Spatial Analysis Library

Awesome Python Data Science / Quantum Computing

qiskit 5,404 about 1 month ago Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules
cirq 4,347 about 1 month ago A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits
PennyLane 2,409 about 1 month ago Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations
QML 199 about 1 month ago A Python Toolkit for Quantum Machine Learning

Awesome Python Data Science / Conversion

sklearn-porter 1,294 7 months ago Transpile trained scikit-learn estimators to C, Java, JavaScript, and others
ONNX 18,098 about 1 month ago Open Neural Network Exchange
MMdnn 5,802 8 months ago A set of tools to help users inter-operate among different deep learning frameworks
treelite 742 about 2 months ago Universal model exchange and serialization format for decision tree forests
