awesome-production-machine-learning

ML deployment toolkit

A curated collection of tools and libraries for deploying, monitoring, and maintaining machine learning models in production environments.

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

GitHub

18k stars

407 watching

2k forks

last commit: over 1 year ago

Linked from 7 awesome lists

awesome, awesome-list, data-mining, deep-learning, explainability, interpretability, large-scale-machine-learning, large-scale-ml, machine-learning, machine-learning-operations, ml-operations, ml-ops, mlops, privacy-preserving, privacy-preserving-machine-learning, privacy-preserving-ml, production-machine-learning, production-ml, responsible-ai

ethicalml.github.io/awesome-production-machine-learning

Awesome Production Machine Learning / 10 Min Video Overview
10 minute video			This provides an overview of the motivations for machine learning operations as well as a high-level overview of some of the tools in this repo. It covers an updated 2024 version of the state of MLOps

Awesome Production Machine Learning / Want to receive recurrent updates on this repo and other advancements?
Machine Learning Engineer			You can join the newsletter. Join over 10,000 ML professionals and enthusiasts who receive weekly curated articles & tutorials on production Machine Learning

Awesome Artificial Intelligence Regulation	1,277	over 1 year ago	Also check out the Awesome Artificial Intelligence Regulation list, where we aim to map the landscape of "Frameworks", "Codes of Ethics", "Guidelines", "Regulations", etc. related to Artificial Intelligence
EthicalML/awesome-artificial-intelligence-guidelines	1,277	over 1 year ago
Main Content / Adversarial Robustness
AdvBox	1,389	about 3 years ago	A toolbox to generate adversarial examples that fool neural networks in PaddlePaddle, PyTorch, Caffe2, MxNet, Keras, and TensorFlow. Advbox can also benchmark the robustness of machine learning models
Adversarial DNN Playground	130	almost 3 years ago	think , but for Adversarial Examples! A visualization tool designed for learning and teaching - the attack library is limited in size, but it has a nice front-end to it with buttons you can press!
AdverTorch	1,311	over 2 years ago	Library for adversarial attacks and defenses, built specifically for PyTorch
ART	4,945	about 1 year ago	ART (Adversarial Robustness Toolbox) provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference
Artificial Adversary	399	about 4 years ago	Airbnb's library to generate text that reads the same to a human but passes adversarial classifiers
Counterfit	818	over 2 years ago	Counterfit is a command-line tool and generic automation layer for assessing the security of machine learning systems
Factool	839	over 1 year ago	Factool is a tool-augmented framework for detecting factual errors in texts generated by large language models
Foolbox	2,798	almost 2 years ago	Foolbox is a Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
MIA	139	over 3 years ago	A library for running membership inference attacks (MIA) against machine learning models
NeMo Guardrails	4,263	about 1 year ago	NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems
OpenAttack	699	over 2 years ago	OpenAttack is a Python-based textual adversarial attack toolkit, which handles the whole process of textual adversarial attacking, including preprocessing text, accessing the victim model, generating adversarial examples and evaluation
Main Content / Agentic Workflow
Agents	4,210	about 1 year ago	Agents allows users to build AI-driven server programs that can see, hear, and speak in realtime
AgentScope	5,501	about 1 year ago	AgentScope is a multi-agent platform designed to empower developers to build multi-agent applications with large-scale models
AutoGen	35,795	about 1 year ago	AutoGen is an open-source framework for building AI agent systems
Chidori	1,282	over 1 year ago	Chidori is a reactive runtime that supports building robust AI agents using languages like Node.js, Python, and Rust, with a focus on reactivity and observability in agent workflows
CrewAI	22,550	about 1 year ago	CrewAI is a cutting-edge framework for orchestrating role-playing, autonomous AI agents
LangGraph	7,250	about 1 year ago	LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows
Modelscope-Agent	2,779	over 1 year ago	Modelscope-Agent is a customizable and scalable agent framework
OpenAGI	1,992	over 1 year ago	OpenAGI is used as the agent creation package to build agents for AIOS
Swarm	16,939	over 1 year ago	Swarm is an educational framework exploring ergonomic, lightweight multi-agent orchestration
Swarms	1,904	about 1 year ago	Swarms is an enterprise grade and production ready multi-agent collaboration framework that enables you to orchestrate many agents to work collaboratively at scale to automate real-world activities
Main Content / AutoML
AutoGluon	8,167	over 1 year ago	Automated feature, model, and hyperparameter selection for tabular, image, and text data on top of popular machine learning libraries (Scikit-Learn, LightGBM, CatBoost, PyTorch, MXNet)
Autokeras	9,172	about 1 year ago	AutoML library for Keras based on
auto-sklearn	7,667	over 1 year ago	Framework to automate algorithm and hyperparameter tuning for sklearn
Feature Engine	1,956	over 1 year ago	Feature-engine is a Python library that contains several transformers to engineer features for use in machine learning models
Featuretools	7,304	about 1 year ago	An open source framework for automated feature engineering
FLAML	3,968	about 1 year ago	FLAML is a fast library for automated machine learning & tuning
go-featureprocessing	121	over 1 year ago	A feature pre-processing framework in Go that matches functionality of sklearn
HEBO	3,306	over 1 year ago	Set of open-source hyperparameter optimization frameworks, including the winning submission to the tested on hyperparameter tuning tasks
Katib	1,521	over 1 year ago	A Kubernetes-based system for Hyperparameter Tuning and Neural Architecture Search
keras-tuner	2,860	over 1 year ago	Keras Tuner is an easy-to-use, distributable hyperparameter optimisation framework that solves the pain points of performing a hyperparameter search. Keras Tuner makes it easy to define a search space and leverage included algorithms to find the best hyperparameter values
Neural Architecture Search with Controller RNN	433	over 4 years ago	Basic implementation of Controller RNN from and
Neural Network Intelligence	14,076	over 1 year ago	NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments
Optuna	11,082	about 1 year ago	Optuna is an automatic hyperparameter optimisation software framework, particularly designed for machine learning
OSS Vizier	1,494	about 1 year ago	OSS Vizier is a Python-based service for black-box optimisation and research, one of the first hyperparameter tuning services designed to work at scale
sklearn-deap	771	about 2 years ago	Use evolutionary algorithms instead of gridsearch in scikit-learn
TPOT	9,776	over 1 year ago	Automation of sklearn pipeline creation (including feature selection, pre-processor, etc.)
tsfresh	8,486	over 1 year ago	Automatic extraction of relevant features from time series
Upgini	321	over 1 year ago	Free automated data & feature enrichment library for machine learning: automatically searches through thousands of ready-to-use features from public and community shared data sources and enriches your training dataset with only the accuracy improving features
Main Content / Computation Load Distribution
Apache Beam	7,911	about 1 year ago	Apache Beam is a unified programming model for Batch and Streaming
Bagua	876	over 1 year ago	Bagua is a performant and flexible distributed training framework for PyTorch, providing a faster alternative to PyTorch DDP and Horovod. It supports advanced distributed training algorithms such as quantization and decentralization
Colossal-AI	38,907	over 1 year ago	A unified deep learning system for big model era, which helps users to efficiently and quickly deploy large AI model training and inference
Dask	12,691	about 1 year ago	Distributed parallel processing framework for Pandas and NumPy computations
DEAP	5,891	over 1 year ago	A novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent. It works in perfect harmony with parallelisation mechanisms such as multiprocessing and SCOOP
DeepSpeed	35,863	about 1 year ago	A deep learning optimization library (lightweight PyTorch wrapper) that makes distributed training easy, efficient, and effective
DLRover	1,302	about 1 year ago	DLRover makes the distributed training of large AI models easy, stable, fast and green
einops	8,574	over 1 year ago	Flexible and powerful tensor operations for readable and reliable code
Fiber	1,042	almost 3 years ago	Distributed computing library for modern computer clusters from Uber
Flashlight	5,300	over 1 year ago	A fast, flexible machine learning library written entirely in C++ from Facebook AI Research and the creators of Torch, TensorFlow, Eigen and Deep Speech
Hivemind	2,078	about 1 year ago	Decentralized deep learning in PyTorch
Horovod	14,305	about 1 year ago	Uber's distributed training framework for TensorFlow, Keras, and PyTorch
Liger Kernel	3,840	about 1 year ago	Liger Kernel is a collection of Triton kernels designed specifically for LLM training
LightGBM	16,769	about 1 year ago	LightGBM is a gradient boosting framework that uses tree based learning algorithms
PaddlePaddle	22,340	about 1 year ago	PaddlePaddle is a framework to perform large-scale deep network training, using data sources distributed across hundreds of nodes
PyTorch Lightning	28,636	about 1 year ago	PyTorch Lightning pretrains, finetunes and deploys AI models on multiple GPUs, TPUs with zero code changes
PyWren	848	about 2 years ago	Answers the question of the "cloud button" for Python function execution. It's a framework that abstracts AWS Lambda to enable data scientists to execute any Python function
Ray	34,412	about 1 year ago	Ray is a flexible, high-performance distributed execution framework for machine learning
TensorFlowOnSpark	3,875	over 2 years ago	TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters
Vespa	5,881	about 1 year ago	Vespa is an engine for low-latency computation over large data sets
Main Content / Data Labelling and Synthesis
Argilla	4,126	about 1 year ago	Argilla helps domain experts and data teams to build better NLP datasets in less time
Baal	876	over 1 year ago	Baal is an active learning library that supports both industrial applications and research usecases
brat rapid annotation tool	1,831	over 1 year ago	Web-based text annotation tool for Named-Entity-Recognition tasks
cleanlab	9,820	over 1 year ago	Python library for data-centric AI. Can automatically: find mislabeled data, detect outliers, estimate consensus + annotator-quality for multi-annotator datasets, suggest which data is best to (re)label next
COCO Annotator	2,125	over 1 year ago	Web-based image segmentation tool for object detection, localization and keypoints
CVAT	12,821	about 1 year ago	CVAT (Computer Vision Annotation Tool) is OpenCV's web-based annotation tool for both videos and images for computer vision algorithms
Doccano	9,645	over 1 year ago	Open source text annotation tools for humans, providing functionality for sentiment analysis, named entity recognition, and machine translation
Gretel Synthetics	602	over 1 year ago	Gretel Synthetics is a synthetic data generator for structured and unstructured text, featuring differentially private learning
ImageTagger	267	over 1 year ago	Image labelling tool with support for collaboration, supporting bounding box, polygon, line, point labelling, label export, etc
ImgLab	989	about 2 years ago	Image annotation tool for bounding boxes with auto-suggestion and extensibility for plugins
Label Studio	19,798	about 1 year ago	Multi-domain data labeling and annotation tool with standardized output format
makesense.ai	3,195	over 1 year ago	Free to use online tool for labelling photos. Prepared labels can be downloaded in one of multiple supported formats
MedTagger	119	over 3 years ago	A collaborative framework for annotating medical datasets using crowdsourcing
modAL	2,239	about 2 years ago	modAL is an active learning framework designed with modularity, flexibility and extensibility in mind
NeMo Curator	672	about 1 year ago	NeMo Curator is a GPU-accelerated framework for efficient large language model data curation
OpenLabeling	932	over 3 years ago	Open source tool for labelling images with support for labels, edges, as well as image resizing and zooming in
PixelAnnotationTool	1,410	over 3 years ago	Image annotation tool with ability to "colour" on the images to select labels for segmentation. Process is semi-automated with the
refinery	1,405	over 1 year ago	The data scientist's open-source choice to scale, assess and maintain natural language data
Rubrix	4,126	about 1 year ago	Open-source tool for tracking, exploring, and labeling data for AI projects
SDV	2,416	about 1 year ago	Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset
Semantic Segmentation Editor	1,827	over 1 year ago	Hitachi's Open source tool for labelling camera and LIDAR data
Snorkel	5,826	almost 2 years ago	Snorkel is a system for quickly generating training data with weak supervision
Superintendent	189	over 3 years ago	superintendent provides an ipywidget-based interactive labelling tool for your data
YData Synthetic	1,456	over 1 year ago	YData Synthetic is a package to generate synthetic tabular and time-series data leveraging the state of the art generative models
Main Content / Data Pipeline
Apache Airflow	37,580	about 1 year ago	Data Pipeline framework built in Python, including scheduler, DAG definition and a UI for visualisation
Apache Nifi	4,955	about 1 year ago	Apache NiFi was made for dataflow. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic
Apache Oozie	717	over 1 year ago	Workflow scheduler for Hadoop jobs
Argo Workflows	15,155	about 1 year ago	Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition)
Azkaban	4,481	over 1 year ago	Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy to use web user interface to maintain and track your workflows
BatchFlow	202	over 1 year ago	BatchFlow helps data scientists conveniently work with random or sequential batches of their data and define data processing and machine learning workflows for large datasets
Bonobo	1,589	almost 3 years ago	ETL framework for Python 3.5+ with focus on simple atomic operations working concurrently on rows of data
Chronos	4,389	over 3 years ago	More of a job scheduler for Mesos than ETL pipeline
Couler	919	over 1 year ago	Unified interface for constructing and managing machine learning workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow
DataTrove	2,103	over 1 year ago	DataTrove is a library to process, filter and deduplicate text data at a very large scale
D6tflow	953	over 2 years ago	A Python library for building complex data science workflows
DALL·E Flow	2,837	almost 3 years ago	DALL·E Flow is an interactive workflow for generating high-definition images from text prompt
Dagster	12,055	about 1 year ago	A data orchestrator for machine learning, analytics, and ETL
DBND	260	over 1 year ago	DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes
DBT	10,071	about 1 year ago	ETL tool for running transformations inside data warehouses
Flyte	5,850	about 1 year ago	Lyft’s Cloud Native Machine Learning and Data Processing Platform
Genie	1,723	over 1 year ago	Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems
Gokart	319	over 1 year ago	Wrapper of the data pipeline Luigi
Hamilton	1,900	about 1 year ago	Hamilton is a micro-orchestration framework for defining dataflows. Runs anywhere python runs (e.g. jupyter, fastAPI, spark, ray, dask). Brings software engineering best practices without you knowing it. Use it to define feature engineering transforms, end-to-end model pipelines, and LLM workflows. It complements macro-orchestration systems (e.g. kedro, luigi, airflow, dbt, etc.) as it replaces the code within those macro tasks. Comes with a self-hostable UI that captures lineage & provenance, execution telemetry & data summaries, and builds a self-populating catalog; usable in development as well as production
Instill VDP	2,181	about 1 year ago	Instill VDP (Versatile Data Pipeline) aims to streamline the data processing pipelines from inception to completion
Instructor	8,551	about 1 year ago	Instructor makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models
Kedro	10,050	over 1 year ago	Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned
Luigi	17,950	over 1 year ago	Luigi is a Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc
Metaflow	8,341	about 1 year ago	A framework for data scientists to easily build and manage real-life data science projects
Neuraxle	610	almost 3 years ago	A framework for building neat pipelines, providing the right abstractions to chain your data transformation and prediction steps with data streaming, as well as doing hyperparameter searches (AutoML)
Pachyderm	6,191	about 1 year ago	Open source distributed processing framework built on Kubernetes, focused mainly on dynamic building of production machine learning pipelines
PipelineX	226	over 2 years ago	Based on Kedro and MLflow. Full comparison is found
Ploomber	3,530	over 1 year ago	The fastest way to build data pipelines. Develop iteratively, deploy anywhere
Prefect Core	17,771	about 1 year ago	Workflow management system that makes it easy to take your data pipelines and add semantics like retries, logging, dynamic mapping, caching, failure notifications, and more
Snakemake	2,327	about 1 year ago	Workflow management system for reproducible and scalable data analyses
Sycamore	401	about 1 year ago	Sycamore is an open source, AI-powered document processing engine for ETL, RAG, LLM-based applications, and analytics on unstructured data
Towhee	3,255	over 1 year ago	General-purpose machine learning pipeline for generating embedding vectors using one or many ML models
unstructured	9,452	about 1 year ago	unstructured streamlines and optimizes the data processing workflow for LLMs, ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more
ZenML	4,261	about 1 year ago	ZenML is an extensible, open-source MLOps framework to create reproducible ML pipelines with a focus on automated metadata tracking, caching, and many integrations to other tools
Main Content / DS Notebook
Apache Zeppelin	6,422	over 1 year ago	Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more
H2O Flow	134	over 1 year ago	Jupyter notebook-like interface for H2O to create, save and re-use "flows"
Jupyter Notebooks	11,823	over 1 year ago	Web-based Python sandbox environments for reproducible development
ML Workspace	3,446	over 1 year ago	All-in-one web IDE for machine learning and data science. Combines Jupyter, VS Code, TensorFlow, and many other tools/libraries into one Docker image
.NET Interactive	2,931	over 1 year ago	.NET Interactive takes the power of .NET and embeds it into your interactive experiences
Papermill	6,029	over 1 year ago	Papermill is a library for parameterizing notebooks and executing them like Python scripts
Polynote	4,542	over 1 year ago	Polynote is an experimental polyglot notebook environment. Currently, it supports Scala and Python (with or without Spark), SQL, and Vega
RMarkdown	2,890	over 1 year ago	The rmarkdown package is a next generation implementation of R Markdown based on Pandoc
Stencila	803	about 1 year ago	Stencila is a platform for creating, collaborating on, and sharing data-driven content that is transparent and reproducible
Voilà	5,508	over 1 year ago	Voilà turns Jupyter notebooks into standalone web applications that can e.g. be used as dashboards
Main Content / Data Storage Optimisation
AIStore	1,315	about 1 year ago	AIStore is a lightweight object storage system with the capability to linearly scale out with each added storage node and a special focus on petascale deep learning
Alluxio	6,880	over 1 year ago	A virtual distributed storage system that bridges the gap between computation frameworks and storage systems
Apache Arrow	14,728	about 1 year ago	In-memory columnar representation of data compatible with Pandas, Hadoop-based systems, etc
Apache Druid	13,548	about 1 year ago	A high performance real-time analytics database. Check this for introduction
Apache Hudi	5,498	about 1 year ago	Hudi is a transactional data lake platform that brings core warehouse and database functionality directly to a data lake. Hudi is great for streaming workloads, and also allows creation of efficient incremental batch pipelines. Supports popular query engines including Spark, Flink, Presto, Trino, Hive, etc
Apache Iceberg	6,621	about 1 year ago	Iceberg is an ACID-compliant, high-performance format built for huge analytic tables (containing tens of petabytes of data), and it brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time
Apache Ignite	4,834	about 1 year ago	A memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads delivering in-memory speeds at petabyte scale
Apache Parquet	2,665	over 1 year ago	On-disk columnar representation of data compatible with Pandas, Hadoop-based systems, etc
Apache Pinot	5,562	about 1 year ago	A realtime distributed OLAP datastore. Comparison of the open source OLAP systems for big data: ClickHouse, Druid, and Pinot is found
BayesDB	923	over 2 years ago	A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself
Casibase	2,885	over 1 year ago	Casibase is a LangChain-like RAG (Retrieval-Augmented Generation) knowledge database with web UI and Enterprise SSO
Chroma	15,865	about 1 year ago	Chroma is an AI-native embedding database
ClickHouse	38,076	about 1 year ago	ClickHouse is an open source column oriented database management system
Delta Lake	7,677	about 1 year ago	Delta Lake is a storage layer that brings scalable, ACID transactions to Apache Spark and other big-data engines
EdgeDB	13,207	about 1 year ago	NoSQL interface for Postgres that allows object-style interaction with stored data
GPTCache	7,293	over 1 year ago	GPTCache is a library for creating semantic cache for large language model queries
HopsFS	309	over 1 year ago	HDFS-compatible file system with scale-out strongly consistent metadata
InfluxDB	29,126	about 1 year ago	Scalable datastore for metrics, events, and real-time analytics
Milvus	31,283	about 1 year ago	Milvus is a cloud-native, open-source vector database built to manage embedding vectors generated by machine learning models and neural networks
Marqo	4,679	about 1 year ago	Marqo is an end-to-end vector search engine
pgvector	13,027	over 1 year ago	pgvector helps with vector similarity search for Postgres
PostgresML	6,070	over 1 year ago	PostgresML is a machine learning extension for PostgreSQL that enables you to perform training and inference on text and tabular data using SQL queries
Safetensors	2,953	over 1 year ago	Simple, safe way to store and distribute tensors
TimescaleDB	18,066	about 1 year ago	An open-source time-series SQL database optimized for fast ingest and complex queries, packaged as a PostgreSQL extension
Weaviate	11,812	about 1 year ago	A low-latency vector search engine (GraphQL, RESTful) with out-of-the-box support for different media types. Modules include Semantic Search, Q&A, Classification, Customizable Models (PyTorch/TensorFlow/Keras), and more
Zarr	1,549	about 1 year ago	Python implementation of chunked, compressed, N-dimensional arrays designed for use in parallel computing
Main Content / Data Stream Processing
Apache Flink	24,261	about 1 year ago	Open source stream processing framework with powerful stream and batch processing capabilities
Apache Kafka	29,060	about 1 year ago	Kafka client library for building applications and microservices where the input and output are stored in Kafka clusters
Apache Samza	817	over 1 year ago	Distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management
Apache Spark	40,170	about 1 year ago	Micro-batch processing for streams using the Apache Spark framework as a backend, supporting stateful exactly-once semantics
Brooklin	931	almost 2 years ago	Distributed system for streaming data between heterogeneous source and destination systems at scale
Bytewax	1,585	over 1 year ago	Flexible Python-centric stateful stream processing framework built on top of Rust engine
FastStream	3,241	about 1 year ago	A modern broker-agnostic streaming Python framework supporting Apache Kafka, RabbitMQ and NATS protocols, inspired by FastAPI and easily integratable with other web frameworks
Faust	6,751	over 1 year ago	Streaming library built on top of Python's asyncio library, using an async Kafka client and inspired by the Kafka Streams library
TensorStore	1,362	about 1 year ago	Library for reading and writing large multi-dimensional arrays
RobustBench	682	over 1 year ago	A standardized adversarial robustness benchmark maintained by some of the leading names in adversarial ML, with a specific focus on defenses
Main Content / Deployment and Serving
AirLLM	5,446	over 1 year ago	AirLLM optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning
Apache PredictionIO	12,541	about 5 years ago	An open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task
Backprop	243	almost 5 years ago	Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models
BentoML	7,222	about 1 year ago	BentoML is an open source framework for high performance ML model serving
Cortex	8,026	over 1 year ago	Cortex is an open source platform for deploying machine learning models—trained with any framework—as production web services. No DevOps required
DeepDetect	2,520	over 1 year ago	Machine Learning production server for TensorFlow, XGBoost and Caffe models written in C++ and maintained by Jolibrain
DeepSparse	3,052	over 1 year ago	DeepSparse is a sparsity-aware deep learning inference runtime for CPUs
exo	17,369	about 1 year ago	exo helps you run your AI cluster at home with everyday devices
Hydrosphere Serving	271	over 1 year ago	Hydrosphere Serving is a cluster for deploying and versioning your machine learning models in production
Intel® Extension for Transformers	2,145	over 1 year ago	An Innovative Transformer-based Toolkit to Accelerate GenAI/LLM Everywhere
Inference	1,401	about 1 year ago	A fast, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models. With Inference, you can deploy models such as YOLOv5, YOLOv8, CLIP, SAM, and CogVLM on your own hardware using Docker
Infinity	1,586	over 1 year ago	Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models and CLIP
IPEX-LLM	6,801	about 1 year ago	IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency
Jina	21,180	over 1 year ago	Jina builds multimodal AI services and pipelines that communicate via gRPC, HTTP, and WebSockets, then scales them up and deploys to production
KsanaLLM	295	over 1 year ago	KsanaLLM is a high performance and easy-to-use engine for LLM inference and serving
KServe	3,716	about 1 year ago	KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative ML
KTransformers	771	over 1 year ago	KTransformers is a flexible framework for experiencing cutting-edge LLM inference optimizations
Lepton AI	2,669	about 1 year ago	LeptonAI Python library allows you to build an AI service from Python code with ease
LightLLM	2,691	about 1 year ago	LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance
LocalAI	27,060	about 1 year ago	LocalAI is a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing
m2cgen	2,826	over 1 year ago	A lightweight library that transpiles trained classic machine learning models into native code in C, Java, Go, R, PHP, Dart, Haskell, Rust and many other programming languages
MindsDB	26,915	about 1 year ago	MindsDB is the platform to create, serve, and fine-tune models in real-time from your database, vector store, and application data
MLRun	1,458	about 1 year ago	MLRun is an open MLOps framework for quickly building and managing continuous ML and generative AI applications across their lifecycle
MLServer	737	about 1 year ago	An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
Mosec	802	about 1 year ago	A rust-powered and multi-stage pipelined model server which offers dynamic batching and more. Super easy to implement and deploy as micro-services
Nuclio	5,339	about 1 year ago	A high-performance "serverless" framework focused on data, I/O, and compute-intensive workloads. It is well integrated with popular data science tools, such as Jupyter and Kubeflow; supports a variety of data and streaming sources; and supports execution over CPUs and GPUs
OpenDiT	1,819	over 1 year ago	OpenDiT is an open-source project that provides a high-performance implementation of Diffusion Transformer(DiT), specifically designed to enhance the efficiency of training and inference for DiT applications, including text-to-video generation and text-to-image generation
OpenLLM	10,234	about 1 year ago	OpenLLM allows developers to run any open-source LLMs (Llama 3.1, Qwen2, Phi3 and more) or custom models as OpenAI-compatible APIs with a single command
OpenScoring	580	over 1 year ago	REST web service for the true real-time scoring (< 1 ms) of Scikit-Learn, R and Apache Spark models
OpenVINO	7,439	about 1 year ago	OpenVINO is an open-source toolkit for optimizing and deploying AI inference
PowerInfer	8,011	over 1 year ago	PowerInfer is a CPU/GPU LLM inference engine leveraging activation locality for your device
Prompt2Model	1,975	almost 2 years ago	Prompt2Model is a system that takes a natural language task description (like the prompts used for LLMs such as ChatGPT) and trains a small special-purpose model that is conducive to deployment
Redis-AI	828	almost 2 years ago	A Redis module for serving tensors and executing deep learning models. Expect changes in the API and internals
Seldon Core	4,409	about 1 year ago	Open source platform for deploying and monitoring machine learning models in Kubernetes
SkyPilot	6,905	about 1 year ago	SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution
skops	457	about 1 year ago	skops is a Python library helping you share your scikit-learn based models and put them in production
SparseML	2,083	over 1 year ago	SparseML is an open-source model optimization toolkit that enables you to create inference-optimized sparse models using pruning, quantization, and distillation algorithms
S-LoRA	1,766	about 2 years ago	Serving Thousands of Concurrent LoRA Adapters
Tempo	117	almost 4 years ago	Open source SDK that provides a unified interface to multiple MLOps projects that enable data scientists to deploy and productionise machine learning systems
Tensorflow Serving	6,195	about 1 year ago	High-performance framework for serving TensorFlow models over gRPC, able to handle 100k requests per second per core
text-generation-inference	9,456	about 1 year ago	Large Language Model Text Generation Inference
TorchServe	4,238	over 1 year ago	TorchServe is a flexible and easy to use tool for serving PyTorch models
Triton Inference Server	8,460	about 1 year ago	Triton is a high performance open source serving software to deploy AI models from any framework on GPU & CPU while maximizing utilization
UnionML	336	over 2 years ago	UnionML is an open source MLOps framework that aims to reduce the boilerplate and friction that comes with building models and deploying them to production
Vercel AI	10,554	about 1 year ago	Vercel AI is a TypeScript toolkit designed to help you build AI-powered applications using popular frameworks like Next.js, React, Svelte, Vue and runtimes like Node.js
vLLM	31,982	about 1 year ago	vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs
Main Content / Evaluation and Monitoring
AlpacaEval	1,568	over 1 year ago	AlpacaEval is an automatic evaluator for instruction-following language models
ARES	499	over 1 year ago	ARES is a framework for automatically evaluating Retrieval-Augmented Generation (RAG) models
AutoML Benchmark	413	about 1 year ago	AutoML Benchmark is a framework for evaluating and comparing open-source AutoML systems
Banana-lyzer	274	over 1 year ago	Banana-lyzer is an open-source AI Agent evaluation framework and dataset for web tasks with Playwright
Code Generation LM Evaluation Harness	846	over 1 year ago	Code Generation LM Evaluation Harness is a framework for the evaluation of code generation models
continuous-eval	455	over 1 year ago	continuous-eval is a framework for data-driven evaluation of LLM-powered applications
Deepchecks	3,650	about 1 year ago	Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to test your data and models from research to production thoroughly
DeepEval	4,003	about 1 year ago	DeepEval is a simple-to-use, open-source evaluation framework for LLM applications
EvalAI	1,779	over 1 year ago	EvalAI is an open-source platform for evaluating and comparing AI algorithms at scale
Evals	15,168	over 1 year ago	Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks
EvalScope	308	about 1 year ago	EvalScope is a streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Evaluate	2,063	over 1 year ago	Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized
Evalverse	221	over 1 year ago	Evalverse is a framework to effortlessly evaluate and report LLMs with no-code requests and comprehensive reports
Evidently	5,519	about 1 year ago	Evidently is an open-source framework to evaluate, test and monitor ML and LLM-powered systems
FlagEval	307	over 1 year ago	FlagEval is an open-source evaluation toolkit as well as an open platform for evaluation of large models
FMBench	210	about 1 year ago	FMBench is a tool for running performance benchmarks for any Foundation Model (FM) deployed on any AWS Generative AI service, be it Amazon SageMaker, Amazon Bedrock, Amazon EKS, or Amazon EC2
Giskard	4,125	about 1 year ago	Giskard is an evaluation & testing framework for LLMs & ML models
HarmBench	366	over 1 year ago	HarmBench is a fast and scalable framework for evaluating automated red teaming methods and LLM attacks/defenses
Helicone	2,163	about 1 year ago	Helicone is an observability platform for LLMs
HELM	1,981	about 1 year ago	HELM (Holistic Evaluation of Language Models) provides tools for the holistic evaluation of language models, including standardized datasets, a unified API for various models, diverse metrics, robustness, and fairness perturbations, a prompt construction framework, and a proxy server for unified model access
Inspect	669	about 1 year ago	Inspect is a framework for large language model evaluations
InterCode	198	almost 2 years ago	InterCode is a lightweight, flexible, and easy-to-use framework for designing interactive code environments to evaluate language agents that can code
Langfuse	7,123	about 1 year ago	Langfuse is an observability & analytics solution for LLM-based applications
LangTest	506	about 1 year ago	LangTest is a comprehensive evaluation toolkit for NLP models
Language Model Evaluation Harness	7,200	about 1 year ago	Language Model Evaluation Harness is a framework to test generative language models on a large number of different evaluation tasks
LightEval	879	about 1 year ago	LightEval is a lightweight LLM evaluation suite
LLMonitor	1,108	about 1 year ago	LLMonitor provides observability & analytics for AI apps and agents
LLMPerf	678	over 1 year ago	LLMPerf is a tool for evaluating the performance of LLM APIs
LLM AutoEval	566	almost 2 years ago	LLM AutoEval simplifies the process of evaluating LLMs using a convenient Colab notebook
lmms-eval	2,164	about 1 year ago	lmms-eval is an evaluation framework meticulously crafted for consistent and efficient evaluation of large multimodal models (LMMs)
MLPerf Inference	1,256	about 1 year ago	MLPerf Inference is a benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios
mltrace	468	over 3 years ago	mltrace is a lightweight, open-source Python tool to get "bolt-on" observability in ML pipelines
MTEB	2,021	about 1 year ago	Massive Text Embedding Benchmark (MTEB) is a comprehensive benchmark of text embeddings
NannyML	1,998	over 1 year ago	NannyML is a library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance
OLMo-Eval	326	over 1 year ago	OLMo-Eval is an evaluation suite for evaluating open language models
OpenCompass	4,295	about 1 year ago	OpenCompass is an LLM evaluation platform, supporting a wide range of models (LLaMA, LLaMA2, ChatGLM2, ChatGPT, Claude, etc.) over 50+ datasets
Opik	2,588	about 1 year ago	Opik is an open-source platform for evaluating, testing and monitoring LLM applications
Optimum-Benchmark	274	about 1 year ago	A unified multi-backend utility for benchmarking Transformers and Diffusers with support for Optimum's arsenal of hardware optimizations/quantization schemes
PhaseLLM	451	over 1 year ago	PhaseLLM is a large language model evaluation and workflow framework
Phoenix	4,271	about 1 year ago	Phoenix is an open-source AI observability platform designed for experimentation, evaluation, and troubleshooting
PromptBench	2,487	over 1 year ago	PromptBench is a unified evaluation framework for large language models
Prometheus-Eval	820	over 1 year ago	Prometheus-Eval is a collection of tools for training, evaluating, and using language models specialized in evaluating other language models
Ragas	7,598	about 1 year ago	Ragas is a framework to evaluate RAG pipelines
RAGChecker			RAGChecker is an advanced automatic evaluation framework designed to assess and diagnose Retrieval-Augmented Generation (RAG) systems
Rageval	141	over 1 year ago	Rageval is a tool to evaluate RAG systems
RefChecker	325	over 1 year ago	RefChecker provides a standardized assessment framework to identify subtle hallucinations present in the outputs of large language models (LLMs)
RewardBench	459	over 1 year ago	RewardBench is a benchmark designed to evaluate the capabilities and safety of reward models
TensorFlow Model Analysis	1,258	over 1 year ago	TensorFlow Model Analysis (TFMA) is a library for evaluating TensorFlow models on large amounts of data in a distributed manner, using the same metrics defined in their trainer
Tonic Validate	271	over 1 year ago	Tonic Validate is a high-performance evaluation framework for LLM/RAG outputs
TruLens	2,233	about 1 year ago	TruLens provides a set of tools for evaluating and tracking LLM experiments
TrustLLM	491	over 1 year ago	TrustLLM is a comprehensive framework to evaluate the trustworthiness of large language models, which includes principles, surveys, and benchmarks
UpTrain	2,218	over 1 year ago	UpTrain is an open-source tool for evaluating LLM applications
VBench	643	about 1 year ago	VBench is a comprehensive benchmark suite for video generative models
VLMEvalKit	1,514	about 1 year ago	VLMEvalKit is an open-source evaluation toolkit of large vision-language models (LVLMs)
Main Content / Explainability and Fairness
Aequitas	701	over 1 year ago	An open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive risk-assessment tools
AI Explainability 360	1,641	over 1 year ago	Interpretability and explainability of data and machine learning models including a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics
AI Fairness 360	2,483	over 1 year ago	A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models
Alibi	2,421	about 1 year ago	Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The initial focus of the library is on black-box, instance-based model explanations
anchor	798	over 3 years ago	Code for the paper introducing anchors: a model-agnostic system that explains the behaviour of complex models with high-precision rules
captum	4,982	about 1 year ago	Model interpretability and understanding library for PyTorch developed by Facebook. It contains general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models
DeepLIFT	837	almost 4 years ago	Codebase that contains the methods from the DeepLIFT paper. The repository also links to materials from the 15-minute talk given at ICML
DeepVis Toolbox	4,025	about 6 years ago	This is the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimisation. The toolbox and methods are described in an informal write-up and, more formally, in an accompanying paper
ELI5	2,763	almost 4 years ago	"Explain Like I'm 5" is a Python package which helps to debug machine learning classifiers and explain their predictions
FACETS	7,357	almost 3 years ago	Facets contains two robust visualizations to aid in understanding and analyzing machine learning datasets. Get a sense of the shape of each feature of your dataset using Facets Overview, or explore individual observations using Facets Dive
Fairlearn	1,974	about 1 year ago	Fairlearn is a Python toolkit to assess and mitigate unfairness in machine learning models
FairML	361	almost 5 years ago	FairML is a Python toolbox for auditing machine learning models for bias
Fairness Comparison	159	about 3 years ago	This repository is meant to facilitate the benchmarking of fairness-aware machine learning algorithms, based on an accompanying paper
Fairness Indicators	343	about 1 year ago	The tool supports teams in evaluating, improving, and comparing models for fairness concerns in partnership with the broader TensorFlow toolkit
iNNvestigate	1,271	about 2 years ago	An open-source library for visually analyzing Keras models with a range of attribution methods
Integrated-Gradients	604	about 4 years ago	This repository provides code for implementing integrated gradients for networks with image inputs
InterpretML	6,324	about 1 year ago	InterpretML is an open-source package for training interpretable models and explaining blackbox systems
keras-vis	2,985	about 4 years ago	keras-vis is a high-level toolkit for visualizing and debugging your trained Keras neural net models. Currently supported visualizations include: Activation maximization, Saliency maps, Class activation maps
Lightly	3,204	about 1 year ago	A Python framework for self-supervised learning on images. The learned representations can be used to analyze the distribution in unlabeled data and rebalance datasets
Lightwood	453	over 1 year ago	A PyTorch-based framework that breaks down machine learning problems into smaller blocks that can be glued together seamlessly, with the objective of building predictive models with one line of code
LIME	11,663	over 1 year ago	Local Interpretable Model-agnostic Explanations for machine learning models
LOFO Importance	821	about 2 years ago	LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and evaluating the performance of the model, with a validation scheme of choice, based on the chosen metric
mljar-supervised	3,081	over 1 year ago	A Python package for AutoML on tabular data with feature engineering, hyper-parameters tuning, explanations and automatic documentation
SHAP	23,077	about 1 year ago	SHapley Additive exPlanations is a unified approach to explain the output of any machine learning model
SHAPash	2,749	over 1 year ago	Shapash is a Python library that provides several types of visualization that display explicit labels that everyone can understand
themis-ml	125	over 5 years ago	themis-ml is a Python library built on top of pandas and sklearn that implements fairness-aware machine learning algorithms
Themis	103	over 5 years ago	Themis is a testing-based approach for measuring discrimination in a software system
Transformer Debugger	4,047	almost 2 years ago	Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team with the goal of supporting investigations into specific behaviors of small language models
TreeInterpreter	745	over 2 years ago	Package for interpreting scikit-learn's decision tree and random forest predictions. Allows decomposing each prediction into bias and feature contribution components
WhatIf	928	over 1 year ago	An easy-to-use interface for expanding understanding of a black-box classification or regression ML model
woe	256	over 6 years ago	Tools for WoE Transformation mostly used in ScoreCard Model for credit rating
Main Content / Feature Store
Butterfree	288	over 1 year ago	A tool for building feature stores which allows you to transform your raw data into beautiful features
FEAST	5,669	about 1 year ago	Feast (Feature Store) is an open source feature store for machine learning. Feast is the fastest path to manage existing infrastructure to productionize analytic data for model training and online inference
Feathr	1,985	almost 2 years ago	A scalable, unified data and AI engineering platform for enterprise
Featureform	1,822	about 1 year ago	A virtual feature store. Plug-&-play with your existing infra. Data Scientist approved. Discovery, Governance, Lineage, & Collaboration just a pip install away. Supports pandas, Python, Spark, SQL + integrations with major cloud vendors
Hopsworks Feature Store	1,177	over 1 year ago	Offline/Online Feature Store for ML
Main Content / Industry-strength AD
adtk	1,108	over 1 year ago	A Python toolkit for rule-based/unsupervised anomaly detection in time series
Alibi Detect	2,262	about 1 year ago	alibi-detect is a Python package focused on outlier, adversarial and concept drift detection
Darts	8,166	about 1 year ago	Darts is a library for user-friendly forecasting and anomaly detection on time series
Deequ	3,324	over 1 year ago	A library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets
Deep Anomaly Detection with Outlier Exposure	548	over 4 years ago	Outlier Exposure (OE) is a method for improving anomaly detection performance in deep learning models
PyOD	8,653	over 1 year ago	A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
SUOD	382	about 2 years ago	SUOD (Scalable Unsupervised Outlier Detection) is an acceleration system for large-scale anomaly/outlier detection
TextAttack	3,015	over 1 year ago	TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in NLP
TFDV	766	over 1 year ago	TFDV (TensorFlow Data Validation) is a library for exploring and validating machine learning data
TODS	1,484	over 2 years ago	TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data
Main Content / Industry Strength CV
Deep Lake	8,237	about 1 year ago	Deep Lake is a data infrastructure optimized for computer vision
Detectron2	30,778	over 1 year ago	Detectron2 is Facebook AI Research's next generation library that provides state-of-the-art detection and segmentation algorithms
iGibson	676	over 1 year ago	iGibson is a simulation environment providing fast visual rendering and physics simulation based on Bullet
JDiffusion	243	over 1 year ago	JDiffusion is a diffusion model library for generating images or videos based on Diffusers and Jittor
KerasCV	1,013	over 1 year ago	KerasCV is a library of modular computer vision oriented Keras components
LAVIS	10,058	over 1 year ago	LAVIS is a deep learning library for LAnguage-and-VISion intelligence research and applications
libcom	558	about 1 year ago	libcom is an image composition toolbox
MMDetection	29,808	over 1 year ago	MMDetection is an open source object detection toolbox based on PyTorch
SCEPTER	438	over 1 year ago	SCEPTER is an open-source code repository dedicated to generative training, fine-tuning, and inference, encompassing a suite of downstream tasks such as image generation, transfer, editing
SuperGradients	4,625	over 1 year ago	SuperGradients is an open-source library for training PyTorch-based computer vision models
supervision	24,444	about 1 year ago	Supervision is a Python library designed for efficient computer vision pipeline management, providing tools for annotation, visualization, and monitoring of models
VideoSys	1,819	over 1 year ago	VideoSys supports many diffusion models with our various acceleration techniques, enabling these models to run faster and consume less memory
VISSL	3,260	about 2 years ago	VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images
Main Content / Industry Strength NLP
aisuite	7,661	over 1 year ago	aisuite is a simple, unified interface to multiple generative AI providers
Align-Anything	270	about 1 year ago	Align-Anything aims to align any modality large models (any-to-any models), including LLMs, VLMs, and others, with human intentions and values
Blackstone	641	over 1 year ago	Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales' research lab, ICLR&D
BERTopic	6,246	about 1 year ago	BERTopic is a topic modeling technique that leverages transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions
Burr	1,368	about 1 year ago	Burr helps you develop applications that make decisions (chatbot, agent, simulation). It comes with production-ready features (telemetry, persistence, deployment, etc.) and the open-source, free, and local-first Burr UI
Coqui STT	2,302	about 2 years ago	Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models
CodeTF	1,461	almost 2 years ago	CodeTF is a one-stop Python transformer-based library for code large language models (Code LLMs) and code intelligence. It provides a seamless interface for training and inference on code intelligence tasks such as code summarization, translation, and code generation
CTRL	1,872	over 4 years ago	A Conditional Transformer Language Model for Controllable Generation released by Salesforce
dspy	20,235	about 1 year ago	A framework for programming with foundation models
Dust	994	about 1 year ago	Dust assists in the design and deployment of large language model apps
ESPnet	8,596	about 1 year ago	ESPnet is an end-to-end speech processing toolkit
Facebook's XLM	2,893	about 3 years ago	PyTorch original implementation of Cross-lingual Language Model Pretraining which includes BERT, XLM, NMT, XNLI, PKM, etc
FastChat	37,269	over 1 year ago	FastChat is an open platform for training, serving, and evaluating large language model based chatbots
Flair	13,990	about 1 year ago	Simple framework for state-of-the-art NLP developed by Zalando which builds directly on PyTorch
FlexGen	9,236	over 1 year ago	FlexGen is a high-throughput generation engine for running large language models with limited GPU memory
Gensim	15,735	over 1 year ago	Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora
GluonNLP	2,560	over 2 years ago	GluonNLP is a toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your Natural Language Processing (NLP) research
Grover	918	almost 3 years ago	Grover is a model for Neural Fake News -- both generation and detection. However, it probably can also be used for other generation tasks
h2oGPT	11,491	over 1 year ago	h2oGPT is an open source generative AI project that gives organizations the power to own large language models while preserving their data ownership
Haystack	18,094	about 1 year ago	Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-3 and alike). Haystack offers production-ready tools to quickly build ChatGPT-like question answering, semantic search, text generation, and more
Interactive Composition Explorer	537	over 1 year ago	ICE is a Python library and trace visualizer for language model programs
Kashgari	2,395	over 1 year ago	Kashgari is a simple and powerful NLP transfer learning framework for building state-of-the-art models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks
Lamini	2,519	about 1 year ago	Lamini is an LLM engine for rapidly customizing models
LangChain	96,146	about 1 year ago	LangChain assists in building applications with LLMs through composability
LlamaIndex	37,371	about 1 year ago	LlamaIndex (GPT Index) is a data framework for your LLM application
LLaMA	56,832	over 1 year ago	LLaMA is intended as a minimal, hackable and readable example to load LLaMA (arXiv) models and run inference
LLaMA2-Accessory	2,732	almost 2 years ago	LLaMA2-Accessory is an open-source toolkit for pretraining, finetuning and deployment of Large Language Models (LLMs) and multimodal LLMs
LMFlow	8,312	about 1 year ago	LMFlow is an extensible, convenient, and efficient toolbox for finetuning large machine learning models
Megatron-LM	10,804	about 1 year ago	Megatron-LM is a highly optimized and efficient library for training large language models
MLC LLM	19,396	about 1 year ago	MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases
Ollama	102,617	about 1 year ago	Get up and running with large language models, locally
PaddleNLP	12,224	about 1 year ago	PaddleNLP is a Large Language Model (LLM) development suite based on the PaddlePaddle deep learning framework, supporting efficient large model training, lossless compression, and high-performance inference on various hardware devices
Semantic Kernel	22,277	about 1 year ago	Semantic Kernel is an SDK that integrates Large Language Models (LLMs) like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java. Semantic Kernel achieves this by allowing you to define plugins that can be chained together in just a few lines of code
sense2vec	1,630	almost 2 years ago	A PyTorch library for training and using sense2vec models, which use the same approach as word2vec but also leverage part-of-speech attributes for each token, making them "meaning-aware"
Sentence Transformers	15,556	about 1 year ago	Sentence Transformers provides an easy method to compute dense vector representations for sentences, paragraphs, and images
SpaCy	30,459	about 1 year ago	spaCy is a library for advanced Natural Language Processing in Python and Cython
SWIFT	4,659	about 1 year ago	SWIFT is a scalable lightweight infrastructure for deep learning model fine-tuning
Tensorflow Lingvo	2,820	over 1 year ago	A framework for building neural networks in TensorFlow, particularly sequence models
Tensorflow Text	1,239	about 1 year ago	TensorFlow Text provides a collection of text related classes and ops ready to use with TensorFlow 2.0
Transformers	136,357	about 1 year ago	Huggingface's library of state-of-the-art pretrained models for Natural Language Processing (NLP)
trlX	4,537	about 2 years ago	trlX is a distributed training framework designed from the ground up to focus on fine-tuning large language models with reinforcement learning using either a provided reward function or a reward-labeled dataset
Main Content / Industry Strength RecSys
EasyRec	1,814	over 1 year ago	EasyRec is a framework for large scale recommendation algorithms
Gorse	8,652	over 1 year ago	Gorse aims to be a universal open-source recommender system that can be quickly introduced into a wide variety of online services
Implicit	3,581	over 1 year ago	Implicit provides fast Python implementations of several different popular recommendation algorithms for implicit feedback datasets
LightFM	4,790	over 1 year ago	LightFM is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback
NVTabular	1,057	over 1 year ago	NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte scale datasets and train deep learning (DL) based recommender systems
Merlin	787	over 1 year ago	NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production
Recommenders	19,418	over 1 year ago	Recommenders contains benchmark and best practices for building recommendation systems, provided as Jupyter notebooks
Surprise	6,434	over 1 year ago	Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data
YouTokenToMe	959	almost 2 years ago	YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)
Main Content / Industry Strength RL
Acme	3,542	over 1 year ago	Acme is a library of reinforcement learning (RL) building blocks that strives to expose simple, efficient, and readable agents
AI-Optimizer	4,848	over 2 years ago	AI-Optimizer is a next-generation deep reinforcement learning suite, providing rich algorithm libraries ranging from model-free to model-based RL algorithms, from single-agent to multi-agent algorithms. Moreover, AI-Optimizer contains a flexible and easy-to-use distributed training framework for efficient policy training
ALF	306	about 1 year ago	ALF is a reinforcement learning framework emphasizing flexibility and ease of implementing complex algorithms involving many different components
AlpacaFarm	786	over 1 year ago	AlpacaFarm is a simulation framework for methods that learn from human feedback
CityLearn	480	over 1 year ago	CityLearn is an open source OpenAI Gym environment for the implementation of Multi-Agent Reinforcement Learning (RL) for building energy coordination and demand response in cities
CleanRL	5,891	over 1 year ago	CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementation with research-friendly features. The implementation is clean and simple, yet we can scale it to run thousands of experiments using AWS Batch
CompilerGym	917	over 1 year ago	CompilerGym is a library of easy to use and performant reinforcement learning environments for compiler tasks
d3rlpy	1,349	over 1 year ago	d3rlpy is an offline deep reinforcement learning library for practitioners and researchers
D4RL	1,371	over 1 year ago	D4RL is an open-source benchmark for offline reinforcement learning
DIAMBRA	317	almost 2 years ago	DIAMBRA Arena is a software package featuring a collection of high-quality environments for Reinforcement Learning research and experimentation
Dopamine	10,591	over 1 year ago	Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research)
EvoTorch	1,026	over 1 year ago	EvoTorch is an open source evolutionary computation library developed at NNAISENSE, built on top of PyTorch
FinRL	10,240	about 1 year ago	FinRL is the first open-source framework to demonstrate the great potential of financial reinforcement learning
garage	1,893	almost 3 years ago	garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit
Gymnasium	7,613	about 1 year ago	Gymnasium is an open source Python library for developing and comparing reinforcement learning algorithms by providing a standard API to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API
Gymnasium-Robotics	585	over 1 year ago	Gymnasium-Robotics contains a collection of Reinforcement Learning robotic environments that use the Gymnasium API. The environments run with the MuJoCo physics engine and the maintained mujoco python bindings
Jumanji	657	over 1 year ago	Jumanji is a suite of Reinforcement Learning (RL) environments written in JAX providing clean, hardware-accelerated environments for industry-driven research
MALib	507	about 2 years ago	MALib is a parallel framework of population-based learning nested with reinforcement learning methods. MALib provides higher-level abstractions of MARL training paradigms, which enable efficient code reuse and flexible deployments on different distributed computing paradigms
MARLlib	960	over 1 year ago	MARLlib is a comprehensive Multi-Agent Reinforcement Learning algorithm library based on RLlib. It provides the MARL research community with a unified platform for building, training, and evaluating MARL algorithms
Mava	749	about 1 year ago	Mava is a framework for distributed multi-agent reinforcement learning in JAX
Melting Pot	637	about 1 year ago	Melting Pot is a suite of test scenarios for multi-agent reinforcement learning
MetaDrive	823	about 1 year ago	MetaDrive is a driving simulator that composes diverse driving scenarios for generalizable RL
Minigrid	2,139	about 1 year ago	The Minigrid library contains a collection of discrete grid-world environments to conduct research on Reinforcement Learning. The environments follow the Gymnasium standard API and they are designed to be lightweight, fast, and easily customizable
MiniHack	486	over 1 year ago	MiniHack is a sandbox framework for easily designing rich and diverse environments for Reinforcement Learning
MiniWorld	712	over 1 year ago	MiniWorld is a minimalistic 3D interior environment simulator for reinforcement learning & robotics research
ML-Agents	17,334	about 1 year ago	ML-Agents is an open-source project that enables games and simulations to serve as environments for training reinforcement learning intelligent agents
MushroomRL	824	over 1 year ago	MushroomRL is a Python reinforcement learning (RL) library whose modularity makes it easy to use well-known Python libraries for tensor computation (e.g. PyTorch, Tensorflow) and RL benchmarks (e.g. OpenAI Gym, PyBullet, Deepmind Control Suite)
OmniSafe	954	over 1 year ago	OmniSafe is an infrastructural framework designed to accelerate safe reinforcement learning (RL) research
Overcooked-AI	726	about 1 year ago	Overcooked-AI is a benchmark environment for fully cooperative human-AI task performance, based on the wildly popular video game Overcooked
PARL	3,296	over 1 year ago	PARL is a flexible and highly efficient reinforcement learning framework
PettingZoo	2,678	over 1 year ago	PettingZoo is a Python library for conducting research in multi-agent reinforcement learning, akin to a multi-agent version of Gymnasium
RLeXplore	373	over 1 year ago	RLeXplore provides stable baselines of exploration methods in reinforcement learning
RLMeta	283	about 3 years ago	RLMeta is a flexible lightweight research framework for Distributed Reinforcement Learning based on PyTorch and moolib
Safety-Gymnasium	410	almost 2 years ago	Safety-Gymnasium is a highly scalable and customizable safe reinforcement learning environment library
skrl	588	about 1 year ago	skrl is an open-source modular library for Reinforcement Learning written in Python (using PyTorch) and designed with a focus on readability, simplicity, and transparency of algorithm implementation
Stable Baselines	9,329	over 1 year ago	A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
SuperSuit	457	over 1 year ago	SuperSuit introduces a collection of small functions which can wrap reinforcement learning environments to do preprocessing ('microwrappers')
TF-Agents	2,816	about 1 year ago	A reliable, scalable and easy to use TensorFlow library for contextual bandits and reinforcement learning
TRL	10,308	about 1 year ago	Train transformer language models with reinforcement learning
veRL	427	about 1 year ago	veRL (HybridFlow) is a flexible, efficient and industrial-level RL(HF) training framework designed for LLMs
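Several entries above (Gymnasium, Minigrid, PettingZoo) are built around a standard environment API. As a rough illustration of that interaction loop, here is a minimal sketch in plain Python. `CoinFlipEnv` and `run_episode` are hypothetical names made up for this example and are not part of any listed library; only the shape of `reset` (returning an observation and an info dict) and `step` (returning observation, reward, terminated, truncated, info) is meant to mirror the Gymnasium convention.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment mimicking the Gymnasium reset/step signatures."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._steps = 0
        self._coin = 0

    def reset(self, seed=None):
        # Gymnasium-style reset returns (observation, info).
        if seed is not None:
            self._rng.seed(seed)
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin, {}

    def step(self, action):
        # Gymnasium-style step returns
        # (observation, reward, terminated, truncated, info).
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        self._coin = self._rng.randint(0, 1)
        terminated = False  # this toy task has no terminal state
        truncated = self._steps >= self.max_steps  # time limit reached
        return self._coin, reward, terminated, truncated, {}


def run_episode(env, policy, seed=0):
    """Run one episode with the given policy and return the total reward."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

Because the learning code only touches `reset` and `step`, the same `run_episode` loop would work with any environment that follows this convention, which is the point of a shared API.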
Main Content / Industry Strength Visualisation
Apache ECharts	60,918	about 1 year ago	Apache ECharts is a powerful, interactive charting and data visualization library for browser
Apache Superset	63,320	about 1 year ago	A modern, enterprise-ready business intelligence web application
Bokeh	19,453	about 1 year ago	Bokeh is an interactive visualization library for Python that enables beautiful and meaningful visual presentation of data in modern web browsers
Geoplotlib	1,029	over 3 years ago	geoplotlib is a python toolbox for visualizing geographical data and making maps
ggplot2	6,560	about 1 year ago	An implementation of the grammar of graphics for R
gradio	34,557	about 1 year ago	Quickly create and share demos of models - by only writing Python. Debug models interactively in your browser, get feedback from collaborators, and generate public links without deploying anything
Kangas	1,045	over 1 year ago	Kangas is a tool for exploring, analyzing, and visualizing large-scale multimedia data. It provides a straightforward Python API for logging large tables of data, along with an intuitive visual interface for performing complex queries against your dataset
matplotlib	20,443	about 1 year ago	A Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms
Missingno	3,987	almost 2 years ago	missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset
Netron	28,684	about 1 year ago	Netron is a viewer for neural network, deep learning and machine learning models
PDPBox	846	over 1 year ago	This repository is inspired by ICEbox. The goal is to visualize the impact of certain features towards model prediction for any supervised learning algorithm
Perspective	8,669	over 1 year ago	Streaming pivot visualization via WebAssembly
Pixiedust	1,041	about 5 years ago	PixieDust is a productivity tool for Python or Scala notebooks, which lets a developer encapsulate business logic into something easy for your customers to consume
Plotly	16,444	about 1 year ago	An interactive, open source, and browser-based graphing library for Python
PyCEbox	164	almost 6 years ago	Python Individual Conditional Expectation Plot Toolbox
pygal	2,673	over 1 year ago	pygal is a dynamic SVG charting library written in Python
Redash	26,572	about 1 year ago	Redash is an open source visualisation framework that is built to allow easy access to big datasets leveraging multiple backends
seaborn	12,669	about 1 year ago	Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics
Spotlight	1,134	over 1 year ago	Spotlight helps you to identify critical data segments and model failure modes. It enables you to build and maintain reliable machine learning models by curating high-quality datasets
Streamlit	36,168	about 1 year ago	Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts. It supports hot-reloading, so your app updates live as you edit and save your file
tensorboardX	7,887	about 1 year ago	Write TensorBoard events with a simple function call
TensorBoard	6,745	about 1 year ago	TensorBoard is a visualization toolkit for machine learning experimentation that makes it easy to host, track, and share ML experiments
Transformer Explainer	3,604	over 1 year ago	Transformer Explainer is an interactive visualization tool designed to help anyone learn how Transformer-based models like GPT work
Vega-Altair	9,441	over 1 year ago	Vega-Altair is a declarative statistical visualization library for Python
ydata-profiling	12,602	about 1 year ago	ydata-profiling provides a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution
Main Content / Metadata Management
Amundsen	4,455	over 1 year ago	Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data
Apache Atlas	1,850	about 1 year ago	Apache Atlas framework is an extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem
DataHub	10,046	about 1 year ago	DataHub is LinkedIn's generalized metadata search & discovery tool
Marquez	1,800	about 1 year ago	Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem's metadata
Metacat	1,616	about 1 year ago	Metacat is a unified metadata exploration API service. Metacat focusses on solving these problems: 1) federated views of metadata systems; 2) arbitrary metadata storage about data sets; 3) metadata discovery
ML Metadata	629	over 1 year ago	a library for recording and retrieving metadata associated with ML developer and data scientist workflows
Model Card Toolkit	427	over 2 years ago	Model Card Toolkit is a toolkit that streamlines and automates the generation of model cards
TensorFlow Metadata	107	over 1 year ago	TensorFlow Metadata provides standard representations for metadata that are useful when training machine learning models with TensorFlow
Main Content / Model, Data and Experiment Tracking
AI2 Tango	533	almost 2 years ago	AI2 Tango replaces messy directories and spreadsheets full of file versions by organizing experiments into discrete steps that can be cached and reused throughout the lifetime of a research project
Aim	5,261	about 1 year ago	A super-easy way to record, search and compare AI experiments
Catalyst	3,300	almost 2 years ago	High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation and code/ideas reusing
ClearML	5,740	about 1 year ago	Auto-Magical Experiment Manager & Version Control for AI (previously Trains)
CodaLab	158	over 1 year ago	CodaLab Worksheets is a collaborative platform for reproducible research that allows researchers to run, manage, and share their experiments in the cloud. It helps researchers ensure that their runs are reproducible and consistent
Deepkit	367	almost 3 years ago	An open-source platform and cross-platform desktop application to execute, track, and debug modern machine learning experiments
Dolt	18,052	about 1 year ago	Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository
DVC	14,016	over 1 year ago	DVC (Data Version Control) is a version control tool that works alongside Git to manage versions of data and models
Flor	152	about 1 year ago	Easy to use logger and automatic version controller made for data scientists who write ML code
Guild AI	872	over 2 years ago	Open source toolkit that automates and optimizes machine learning experiments
Hangar	204	over 5 years ago	Version control for tensor data, git-like semantics on numerical data with high speed and efficiency
Keepsake	1,649	about 1 year ago	Version control for machine learning
lakeFS	4,496	about 1 year ago	Repeatable, atomic and versioned data lake on top of object storage
MLflow	19,021	about 1 year ago	Open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment
ModelDB	1,707	over 1 year ago	An open-source system to version machine learning models, including their ingredients (code, data, config, and environment), and to track ML metadata across the model lifecycle
ModelStore	379	almost 2 years ago	An open-source Python library that allows you to version, export, and save a machine learning model to your cloud storage provider
Neptune	590	about 1 year ago	Neptune is a scalable experiment tracker for teams that train foundation models
ormb	465	about 2 years ago	Docker for Your ML/DL Models Based on OCI Artifacts
Polyaxon	3,581	over 1 year ago	A platform for reproducible and scalable machine learning and deep learning on Kubernetes
Quilt	1,328	about 1 year ago	Versioning, reproducibility and deployment of data and models
Sacred	4,266	over 1 year ago	Tool to help you configure, organize, log and reproduce machine learning experiments
Studio	380	over 1 year ago	Model management framework which minimizes the overhead involved with scheduling, running, monitoring and managing artifacts of your machine learning experiments
TerminusDB	2,798	over 1 year ago	A graph database management system that stores data like git
Weights & Biases	9,270	about 1 year ago	Weights & Biases is a tool for machine learning experiment tracking, dataset versioning, hyperparameter search, visualization, and collaboration
Main Content / Model Storage Optimisation
AutoAWQ	1,827	about 1 year ago	AutoAWQ is an easy-to-use package for 4-bit quantized models
AutoGPTQ	4,560	over 1 year ago	An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm
AWQ	2,593	about 1 year ago	Activation-aware Weight Quantization for LLM Compression and Acceleration
GGML	11,362	about 1 year ago	GGML is a high-performance tensor library for machine learning that enables efficient inference on CPUs, particularly optimized for large language models
GPTQ	1,964	almost 2 years ago	Accurate Post-training Quantization of Generative Pretrained Transformers
MMdnn	5,802	almost 2 years ago	MMdnn is a comprehensive cross-framework tool from Microsoft that facilitates model conversion, visualization, and deployment across various deep learning frameworks
neural-compressor	2,257	about 1 year ago	Intel® Neural Compressor aims to provide popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search on mainstream frameworks
NNEF			Neural Network Exchange Format (NNEF) is an open standard for representing neural network models to enable interoperability and portability across different machine learning frameworks and platforms
ONNX	18,098	over 1 year ago	ONNX (Open Neural Network Exchange) is an open-source format designed to facilitate interoperability and portability of machine learning models across different frameworks and platforms
PFA			PFA (Portable Format for Analytics) format is a standard for representing and exchanging predictive models and analytics workflows in a portable, JSON-based format
PMML			PMML (Predictive Model Markup Language) is an XML-based standard for representing and sharing predictive models between different applications
Quanto	847	over 1 year ago	Quanto aims to simplify quantizing deep learning models
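Many of the entries above reduce model size by quantization. As a rough illustration of the basic idea only, here is a minimal sketch of symmetric round-to-nearest quantization in plain Python. This is not the GPTQ or AWQ algorithm, which choose quantized values far more carefully; the function names `quantize` and `dequantize` are hypothetical and do not come from any listed package.

```python
def quantize(weights, bits=8):
    """Symmetric round-to-nearest quantization of a list of floats.

    Returns the integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize(weights, bits=8)
restored = dequantize(codes, scale)
# Each restored weight differs from the original by at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Fewer bits mean a coarser step size (`scale`) and therefore a larger worst-case rounding error, which is the trade-off the more advanced methods try to manage.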
Main Content / Neural Search and Retrieval
Annoy	13,321	over 1 year ago	Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point
AutoRAG	2,960	about 1 year ago	AutoRAG is a RAG AutoML tool that automatically finds an optimal RAG pipeline for your data
BeyondLLM	267	over 1 year ago	Beyond LLM offers an all-in-one toolkit for experimentation, evaluation, and deployment of RAG systems, simplifying the process with automated integration, customizable evaluation metrics, and support for various LLMs tailored to specific needs, ultimately aiming to reduce LLM hallucination risks and enhance reliability
CLIP-as-service	12,497	about 2 years ago	CLIP-as-service is a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions
Cognita	3,401	about 1 year ago	Cognita is a RAG framework for building modular and production-ready applications
DocArray	2,998	over 1 year ago	DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API
Faiss	31,920	about 1 year ago	Faiss is a library for efficient similarity search and clustering of dense vectors
fastRAG	1,392	over 1 year ago	fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval
Finetuner	1,482	about 2 years ago	Finetuner provides an effective way to improve performance on neural search tasks
GraphRAG	20,636	about 1 year ago	GraphRAG is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs
HippoRAG	1,456	over 1 year ago	HippoRAG is a novel retrieval augmented generation (RAG) framework inspired by the neurobiology of human long-term memory that enables LLMs to continuously integrate knowledge across external documents
LightRAG	11,616	about 1 year ago	A simple and fast retrieval-augmented generation framework
llmware	8,303	over 1 year ago	llmware provides a unified framework for building LLM-based applications (e.g., RAG, Agents), using small, specialized models that can be deployed privately, integrated with enterprise knowledge sources safely and securely, and cost-effectively tuned and adapted for any business process
Mem0	23,331	about 1 year ago	Mem0 enhances AI assistants and agents with an intelligent memory layer, enabling personalized AI interactions
MindSQL	246	over 1 year ago	MindSQL is a Python RAG library to streamline the interaction between users and their databases using just a few lines of code
NGT	1,272	about 1 year ago	NGT provides commands and a library for performing high-speed approximate nearest neighbor searches against a large volume of data in high dimensional vector data space
NMSLIB	3,429	over 1 year ago	Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces
Qdrant	21,001	about 1 year ago	An open source vector similarity search engine with extended filtering support
R2R	4,010	about 1 year ago	R2R (RAG to Riches) is a comprehensive platform for building, deploying, and scaling RAG applications with hybrid search, multimodal support, and advanced observability
RAGFlow	25,479	about 1 year ago	RAGFlow is a RAG engine based on deep document understanding
RAGxplorer	1,093	almost 2 years ago	RAGxplorer is a tool to build RAG visualisations
Rule-based Retrieval	229	over 1 year ago	Rule-based Retrieval enables users to create and manage RAG applications with advanced filtering capabilities
Vanna	12,311	over 1 year ago	Vanna is a RAG framework for SQL generation and related functionality
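Libraries such as Annoy, Faiss, NGT and NMSLIB speed up similarity search over vectors, often by approximating it. As a point of reference, here is a minimal sketch of the exact brute-force search they improve on, using cosine similarity in plain Python. The function names are hypothetical and do not belong to any listed library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return the k (index, score) pairs with the highest cosine similarity."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
nearest = top_k([1.0, 0.1], vectors, k=2)
```

This scan compares the query against every stored vector, so its cost grows linearly with the collection size; index-based libraries trade some exactness or memory for much faster lookups.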
Main Content / Optimized Computation
Adapters	2,600	over 1 year ago	Adapters is a unified library for parameter-efficient and modular transfer learning
AutoTrain Advanced	4,151	over 1 year ago	AutoTrain Advanced is a no-code solution that allows you to train machine learning models in just a few clicks
BindsNET	1,517	over 1 year ago	BindsNET is a spiking neural network simulation library geared towards the development of biologically inspired algorithms for machine learning
BitBLAS	445	about 1 year ago	BitBLAS is a library to support mixed-precision BLAS operations on GPUs
bitsandbytes	6,409	about 1 year ago	Bitsandbytes library is a lightweight Python wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and 8 & 4-bit quantization functions
BrainCog	467	about 1 year ago	BrainCog (Brain-inspired Cognitive Intelligence Engine) is a brain-inspired spiking neural network based platform for Brain-inspired Artificial Intelligence and simulating brains at multiple scales
Composer	5,190	over 1 year ago	Composer is a PyTorch library that enables you to train neural networks faster, at lower cost, and to higher accuracy
CuDF	8,534	about 1 year ago	Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data
CuML	4,292	about 1 year ago	cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions that share compatible APIs with other RAPIDS projects
CuPy	9,586	about 1 year ago	An implementation of NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it
Flax	6,196	about 1 year ago	A neural network library and ecosystem for JAX designed for flexibility
H2O-3	6,950	about 1 year ago	Fast scalable Machine Learning platform for smarter applications: Deep Learning, Gradient Boosting & XGBoost, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), etc
Jax	30,744	about 1 year ago	Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Kompute	2,036	over 1 year ago	Blazing fast, lightweight and mobile phone-enabled Vulkan compute framework optimized for advanced GPU data processing use cases
MLX	17,878	about 1 year ago	MLX is an array framework for machine learning on Apple silicon
Modin	9,942	over 1 year ago	Speed up your Pandas workflows by changing a single line of code
Nevergrad	3,980	over 1 year ago	Nevergrad is a gradient-free optimisation platform
Norse	685	over 1 year ago	Norse aims to exploit the advantages of bio-inspired neural components, which are sparse and event-driven - a fundamental difference from artificial neural networks
Numba	10,053	about 1 year ago	A compiler for Python array and numerical functions
NumpyGroupies	197	over 1 year ago	Optimised tools for group-indexing operations: aggregated sum and more
OpenFlamingo	3,781	over 1 year ago	OpenFlamingo is an open-source framework for training large multimodal models
Optimum	2,618	about 1 year ago	Optimum is an extension of Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware while keeping things easy to use
PEFT	16,699	about 1 year ago	Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters
PyTorch	84,978	about 1 year ago	PyTorch is a library to develop and train neural network based deep learning models
scikit-learn	60,451	about 1 year ago	Scikit-learn is a powerful machine learning library that provides a wide variety of modules for data access, data preparation and statistical model building
SetFit	2,267	over 1 year ago	SetFit is an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers
snnTorch	1,383	over 1 year ago	snnTorch is a deep and online learning library with spiking neural networks
Sonnet	9,790	over 1 year ago	Sonnet is a library built on top of TensorFlow 2 designed to provide simple, composable abstractions for machine learning research
Tensor2Tensor	15,648	almost 3 years ago	Tensor2Tensor is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
TensorFlow	186,822	about 1 year ago	TensorFlow is a leading library designed for developing and deploying state-of-the-art machine learning applications
ThunderKittens	1,746	about 1 year ago	ThunderKittens is a framework to make it easy to write fast deep learning kernels in CUDA
torchkeras	1,822	over 1 year ago	The torchkeras library is a simple tool for training neural networks in PyTorch in a Keras style
TorchOpt	554	over 1 year ago	TorchOpt is an efficient library for differentiable optimization built upon PyTorch
Vaex	8,315	over 1 year ago	Vaex is a high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted)
Vowpal Wabbit	8,495	over 1 year ago	Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning
Weld	2,996	over 3 years ago	High-performance runtime for data analytics applications
XGBoost	26,396	about 1 year ago	XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable
yellowbrick	4,304	over 1 year ago	yellowbrick is a suite of matplotlib-based model evaluation plots for scikit-learn and other machine learning libraries
Main Content / Privacy and Security
BastionLab	170	over 2 years ago	BastionLab is a framework for confidential data science collaboration. It uses Confidential Computing, Access control data science, and Differential Privacy to enable data scientists to remotely perform data exploration, statistics, and training on confidential data while ensuring maximal privacy for data owners
Concrete-ML	1,045	about 1 year ago	Concrete-ML is a Privacy-Preserving Machine Learning (PPML) open-source set of tools built on top of The Concrete Framework by Zama. It aims to simplify the use of fully homomorphic encryption (FHE) for data scientists to help them automatically turn machine learning models into their homomorphic equivalent
Fedlearner	891	over 1 year ago	Fedlearner is a collaborative machine learning framework that enables joint modeling of data distributed between institutions
FATE	5,750	over 1 year ago	FATE (Federated AI Technology Enabler) is the world's first industrial grade federated learning open source framework to enable enterprises and institutions to collaborate on data while protecting data security and privacy
FedML	4,205	over 1 year ago	FedML provides a research and production integrated edge-cloud platform for Federated/Distributed Machine Learning anywhere at any scale
Flower	5,219	about 1 year ago	Flower is a Federated Learning Framework with a unified approach. It enables the federation of any ML workload, with any ML framework, and any programming language
Google's Differential Privacy	3,091	over 1 year ago	This is a C++ library of ε-differentially private algorithms, which can be used to produce aggregate statistics over numeric data sets containing private or sensitive information
Guardrails	4,254	over 1 year ago	Guardrails is a package that lets a user add structure, type and quality guarantees to the outputs of large language models
Intel Homomorphic Encryption Backend	222	about 3 years ago	The Intel HE transformer for nGraph is a Homomorphic Encryption (HE) backend to the Intel nGraph Compiler, Intel's graph compiler for Artificial Neural Networks
Microsoft SEAL	3,647	over 1 year ago	Microsoft SEAL is an easy-to-use open-source (MIT licensed) homomorphic encryption library developed by the Cryptography Research group at Microsoft
OpenFL	738	about 1 year ago	OpenFL is a Python framework for Federated Learning. OpenFL is designed to be a tool for data scientists. OpenFL is developed by Intel Internet of Things Group (IOTG) and Intel Labs
PySyft	9,557	over 1 year ago	A Python library for secure, private Deep Learning. PySyft decouples private data from model training, using Multi-Party Computation (MPC) within PyTorch
Rosetta	566	almost 4 years ago	A privacy-preserving framework based on TensorFlow with customized backend Operations using Multi-Party Computation (MPC). Rosetta reuses the TensorFlow APIs and allows original TensorFlow code to be made privacy-preserving with minimal changes
Substra	274	over 1 year ago	Substra is an open-source framework for privacy-preserving, traceable and collaborative Machine Learning
Tensorflow Privacy	1,947	over 1 year ago	A Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy
TF Encrypted	1,213	over 1 year ago	A Framework for Confidential Machine Learning on Encrypted Data in TensorFlow
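Several entries above rely on differential privacy to release aggregate statistics over sensitive data. As a rough illustration of the idea, here is a minimal sketch of the Laplace mechanism for a private count in plain Python. The function names are hypothetical and are not the API of any listed library, and this floating-point version is for illustration only; production libraries take extra care when sampling noise.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent Exponential(1) samples follows a
    standard Laplace distribution, which is then multiplied by the scale.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(values, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 52, 67]
# Smaller epsilon means more noise and stronger privacy.
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random(0))
```

The noise scale is `sensitivity / epsilon`, so halving epsilon doubles the expected noise; a very large epsilon returns almost exactly the true count and offers almost no privacy.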
Main Content / Training Orchestration
Accelerate	8,056	about 1 year ago	Accelerate abstracts exactly and only the boilerplate code related to multi-GPU/TPU/mixed-precision and leaves the rest of your code unchanged
Axolotl	8,090	about 1 year ago	Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures
CML	4,046	about 1 year ago	Continuous Machine Learning (CML) is an open-source library for implementing continuous integration & delivery (CI/CD) in machine learning projects
CoreNet	6,997	over 1 year ago	CoreNet is a deep neural network toolkit that allows researchers and engineers to train standard and novel small and large-scale models for a variety of tasks, including foundation models (e.g., CLIP and LLM), object classification, object detection, and semantic segmentation
Determined	3,056	about 1 year ago	Deep learning training platform with integrated support for distributed training, hyperparameter tuning, and model management (supports Tensorflow and Pytorch)
envd	2,061	over 1 year ago	Machine learning development environment for data science and AI/ML engineering teams
Fabrik	1,127	about 5 years ago	Fabrik is an online collaborative platform to build, visualize and train deep learning models via a simple drag-and-drop interface
Hopsworks	1,177	over 1 year ago	Hopsworks is a data-intensive platform for the design and operation of machine learning pipelines that includes a Feature Store
Ludwig	11,236	over 1 year ago	Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks
Kubeflow	14,472	over 1 year ago	A cloud-native platform for machine learning based on Google’s internal machine learning pipelines
MFTCoder	647	about 1 year ago	MFTCoder is an open-source project of CodeFuse for accurate and efficient Multi-task Fine-tuning (MFT) on Large Language Models (LLMs), especially on Code-LLMs (large language models for code tasks)
MLeap	1,506	over 1 year ago	Standardisation of pipeline and model serialization for Spark, Tensorflow and sklearn
Nanotron	1,332	about 1 year ago	Nanotron provides distributed primitives to train a variety of models efficiently using 3D parallelism
NeMo	12,438	about 1 year ago	NVIDIA NeMo is a scalable and cloud-native generative AI framework built for researchers and PyTorch developers working on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV) domains. It is designed to help you efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints
Nos	636	almost 2 years ago	Nos is an open-source platform to efficiently run AI workloads on Kubernetes, increasing GPU utilization and reducing infrastructure and operational costs
NVIDIA TensorRT	10,926	about 1 year ago	TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators
Open Platform for AI	2,644	almost 2 years ago	Platform that provides complete AI model training and resource management capabilities
Prime	559	about 1 year ago	Prime is a framework for efficient, globally distributed training of AI models over the internet
PyCaret	9,026	about 1 year ago	A low-code library for training and deploying models (scikit-learn, XGBoost, LightGBM, spaCy)
Sematic	976	about 1 year ago	Platform to build resource-intensive pipelines with simple Python
Skaffold	15,098	about 1 year ago	Skaffold is a command line tool that facilitates continuous development for Kubernetes applications. You can iterate on your application source code locally then deploy to local or remote Kubernetes clusters
Streaming	1,171	about 1 year ago	A Data Streaming Library for Efficient Neural Network Training
TFX	2,121	about 1 year ago	Tensorflow Extended (TFX) is a production oriented configuration framework for ML based on TensorFlow, incl. monitoring and model version management
torchdistill	1,409	about 1 year ago	torchdistill offers various state-of-the-art knowledge distillation methods and enables you to design (new) experiments simply by editing a declarative yaml config file instead of Python code
veScale	679	over 1 year ago	veScale is a PyTorch native LLM training framework
