| Awesome Software Engineering for Machine Learning / Broad Overviews |
| AI Engineering: 11 Foundational Practices |  |  | ⭐ |
| Best Practices for Machine Learning Applications |  |  |  |
| Engineering Best Practices for Machine Learning |  |  | ⭐ |
| Hidden Technical Debt in Machine Learning Systems |  |  | 🎓 ⭐ |
| Rules of Machine Learning: Best Practices for ML Engineering |  |  | ⭐ |
| Software Engineering for Machine Learning: A Case Study |  |  | 🎓 ⭐ |
| Awesome Software Engineering for Machine Learning / Data Management |
| A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective (2019) |  |  | 🎓 |
| Automating Large-Scale Data Quality Verification |  |  | 🎓 |
| Data management challenges in production machine learning |  |  |  |
| Data Validation for Machine Learning |  |  | 🎓 |
| How to organize data labelling for ML |  |  |  |
| The curse of big data labeling and three ways to solve it |  |  |  |
| The Data Linter: Lightweight, Automated Sanity Checking for ML Data Sets |  |  | 🎓 |
| The ultimate guide to data labeling for ML |  |  |  |
| Awesome Software Engineering for Machine Learning / Model Training |
| 10 Best Practices for Deep Learning |  |  |  |
| Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement |  |  | 🎓 |
| Fairness On The Ground: Applying Algorithmic Fairness Approaches To Production Systems |  |  | 🎓 |
| How do you manage your Machine Learning Experiments? |  |  |  |
| Machine Learning Testing: Survey, Landscapes and Horizons |  |  | 🎓 |
| Nitpicking Machine Learning Technical Debt |  |  |  |
| On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach |  |  | 🎓 ⭐ |
| On human intellect and machine failures: Troubleshooting integrative machine learning systems |  |  | 🎓 |
| Pitfalls and Best Practices in Algorithm Configuration |  |  | 🎓 |
| Pitfalls of supervised feature selection |  |  | 🎓 |
| Preparing and Architecting for Machine Learning |  |  |  |
| Preliminary Systematic Literature Review of Machine Learning System Development Process |  |  | 🎓 |
| Software development best practices in a deep learning environment |  |  |  |
| Testing and Debugging in Machine Learning |  |  |  |
| What Went Wrong and Why? Diagnosing Situated Interaction Failures in the Wild |  |  | 🎓 |
| Awesome Software Engineering for Machine Learning / Deployment and Operation |
| Best Practices in Machine Learning Infrastructure |  |  |  |
| Building Continuous Integration Services for Machine Learning |  |  | 🎓 |
| Continuous Delivery for Machine Learning |  |  | ⭐ |
| Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform |  |  | 🎓 |
| Fairness Indicators: Scalable Infrastructure for Fair ML Systems |  |  | 🎓 |
| Machine Learning Logistics |  |  |  |
| Machine learning: Moving from experiments to production |  |  |  |
| ML Ops: Machine Learning as an Engineering Discipline |  |  |  |
| Model Governance: Reducing the Anarchy of Production ML |  |  | 🎓 |
| ModelOps: Cloud-based lifecycle management for reliable and trusted AI |  |  |  |
| Operational Machine Learning |  |  |  |
| Scaling Machine Learning as a Service |  |  | 🎓 |
| TFX: A TensorFlow-based Production-Scale ML Platform |  |  | 🎓 |
| The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction |  |  | 🎓 |
| Underspecification Presents Challenges for Credibility in Modern Machine Learning |  |  | 🎓 |
| Versioning for end-to-end machine learning pipelines |  |  | 🎓 |
| Awesome Software Engineering for Machine Learning / Social Aspects |
| Data Scientists in Software Teams: State of the Art and Challenges |  |  | 🎓 |
| Machine Learning Interviews | 9,227 | over 2 years ago |  |
| Managing Machine Learning Projects |  |  |  |
| Principled Machine Learning: Practices and Tools for Efficient Collaboration |  |  |  |
| Awesome Software Engineering for Machine Learning / Governance |
| A Human-Centered Interpretability Framework Based on Weight of Evidence |  |  | 🎓 |
| An Architectural Risk Analysis Of Machine Learning Systems |  |  |  |
| Beyond Debiasing |  |  |  |
| Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing |  |  | 🎓 |
| Inherent trade-offs in the fair determination of risk scores |  |  | 🎓 |
| Responsible AI practices |  |  | ⭐ |
| Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims |  |  |  |
| Understanding Software-2.0 |  |  | 🎓 |
| Awesome Software Engineering for Machine Learning / Tooling |
| Aim |  |  | Aim is an open source experiment tracking tool |
| Airflow |  |  | Programmatically author, schedule and monitor workflows |
| Alibi Detect | 2,262 | 11 months ago | Python library focused on outlier, adversarial and drift detection |
| Archai | 468 | about 1 year ago | Neural architecture search |
| Data Version Control (DVC) |  |  | DVC is a data and ML experiments management tool |
| Facets Overview / Facets Dive |  |  | Robust visualizations to aid in understanding machine learning datasets |
| FairLearn |  |  | A toolkit to assess and improve the fairness of machine learning models |
| Git Large File Storage (LFS) |  |  | Replaces large files such as datasets with text pointers inside Git |
| Great Expectations | 10,054 | 11 months ago | Data validation and testing with integration in pipelines |
| HParams | 126 | 12 months ago | A thoughtful approach to configuration management for machine learning projects |
| Kubeflow |  |  | A platform for data scientists who want to build and experiment with ML pipelines |
| Label Studio | 19,798 | 11 months ago | A multi-type data labeling and annotation tool with standardized output format |
| LiFT | 167 | over 2 years ago | The LinkedIn Fairness Toolkit |
| MLflow |  |  | Manage the ML lifecycle, including experimentation, deployment, and a central model registry (a minimal tracking sketch follows this table) |
| Model Card Toolkit | 427 | over 2 years ago | Streamlines and automates the generation of model cards for model documentation |
| Neptune.ai |  |  | Experiment tracking tool bringing organization and collaboration to data science projects |
| Neuraxle | 610 | over 2 years ago | Sklearn-like framework for hyperparameter tuning and AutoML in deep learning projects |
| OpenML |  |  | An inclusive movement to build an open, organized, online ecosystem for machine learning |
| PyTorch Lightning | 28,636 | 11 months ago | The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate |
| REVISE: REvealing VIsual biaSEs | 111 | about 3 years ago | Automatically detect bias in visual data sets |
| Robustness Metrics | 466 | over 1 year ago | Lightweight modules to evaluate the robustness of classification models |
| Seldon Core | 4,409 | 11 months ago | An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models on Kubernetes |
| Spark Machine Learning |  |  | Spark's ML library consisting of common learning algorithms and utilities |
| TensorBoard |  |  | TensorFlow's Visualization Toolkit |
| TensorFlow Extended (TFX) |  |  | An end-to-end platform for deploying production ML pipelines |
| TensorFlow Data Validation (TFDV) | 766 | 12 months ago | Library for exploring and validating machine learning data. Similar to Great Expectations, but for TensorFlow data |
| Weights & Biases |  |  | Experiment tracking, model optimization, and dataset versioning |
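
Several of the tools above (Aim, MLflow, Neptune.ai, Weights & Biases) revolve around experiment tracking: recording the parameters, metrics, and artifacts of each training run so results stay comparable and reproducible. As a rough illustration of what that looks like in practice, here is a minimal sketch using MLflow's Python tracking API; the experiment name, hyperparameters, and metric value are placeholders, not taken from any resource in this list.

```python
# Minimal experiment-tracking sketch with MLflow (illustrative values only).
import mlflow

# Group runs under a named experiment; "demo-classifier" is a placeholder name.
mlflow.set_experiment("demo-classifier")

with mlflow.start_run():
    # Record the hyperparameters used for this run...
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 100)
    # ...and the resulting evaluation metric, so runs can be compared in the MLflow UI.
    mlflow.log_metric("val_accuracy", 0.93)
```

The other tracking tools listed expose analogous logging APIs; the common idea is that run metadata is captured systematically rather than in ad-hoc notes or spreadsheets.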