Awesome Software Engineering for Machine Learning / Broad Overviews |
| AI Engineering: 11 Foundational Practices | | | ⭐ |
| Best Practices for Machine Learning Applications | | | |
| Engineering Best Practices for Machine Learning | | | ⭐ |
| Hidden Technical Debt in Machine Learning Systems | | | 🎓⭐ |
| Rules of Machine Learning: Best Practices for ML Engineering | | | ⭐ |
| Software Engineering for Machine Learning: A Case Study | | | 🎓⭐ |
Awesome Software Engineering for Machine Learning / Data Management |
| A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective | | | 🎓 |
| Automating Large-Scale Data Quality Verification | | | 🎓 |
| Data management challenges in production machine learning | | | |
| Data Validation for Machine Learning | | | 🎓 (TFDV sketch follows this section) |
| How to organize data labelling for ML | | | |
| The curse of big data labeling and three ways to solve it | | | |
| The Data Linter: Lightweight, Automated Sanity Checking for ML Data Sets | | | 🎓 |
| The ultimate guide to data labeling for ML | | | |
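
The validation entries above (notably Data Validation for Machine Learning, the paper behind TensorFlow Data Validation) share one workflow: summarize the training data, infer a schema, then check every new batch against it. Below is a minimal sketch of that loop using the `tensorflow_data_validation` package; the CSV paths are placeholders, not files from any of the papers.

```python
import tensorflow_data_validation as tfdv

# Summarize the training data and infer a schema (feature types, domains, presence).
train_stats = tfdv.generate_statistics_from_csv(data_location="train.csv")
schema = tfdv.infer_schema(statistics=train_stats)

# Check a new batch (e.g., serving traffic) against the inferred schema.
serving_stats = tfdv.generate_statistics_from_csv(data_location="serving.csv")
anomalies = tfdv.validate_statistics(statistics=serving_stats, schema=schema)
print(anomalies)  # an Anomalies proto listing any schema violations found
```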
Awesome Software Engineering for Machine Learning / Model Training |
| 10 Best Practices for Deep Learning | | | |
| Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement | | | 🎓 |
| Fairness On The Ground: Applying Algorithmic Fairness Approaches To Production Systems | | | 🎓 |
| How do you manage your Machine Learning Experiments? | | | |
| Machine Learning Testing: Survey, Landscapes and Horizons | | | 🎓 |
| Nitpicking Machine Learning Technical Debt | | | |
| On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach | | | 🎓⭐ |
| On human intellect and machine failures: Troubleshooting integrative machine learning systems | | | 🎓 |
| Pitfalls and Best Practices in Algorithm Configuration | | | 🎓 |
| Pitfalls of supervised feature selection | | | 🎓 |
| Preparing and Architecting for Machine Learning | | | |
| Preliminary Systematic Literature Review of Machine Learning System Development Process | | | 🎓 |
| Software development best practices in a deep learning environment | | | |
| Testing and Debugging in Machine Learning | | | |
| What Went Wrong and Why? Diagnosing Situated Interaction Failures in the Wild | | | 🎓 |
Awesome Software Engineering for Machine Learning / Deployment and Operation |
| Best Practices in Machine Learning Infrastructure | | | |
| Building Continuous Integration Services for Machine Learning | | | 🎓 |
| Continuous Delivery for Machine Learning | | | ⭐ |
| Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform | | | 🎓 |
| Fairness Indicators: Scalable Infrastructure for Fair ML Systems | | | 🎓 |
| Machine Learning Logistics | | | |
| Machine learning: Moving from experiments to production | | | |
| ML Ops: Machine Learning as an Engineering Discipline | | | |
| Model Governance: Reducing the Anarchy of Production ML | | | 🎓 |
| ModelOps: Cloud-based lifecycle management for reliable and trusted AI | | | |
| Operational Machine Learning | | | |
| Scaling Machine Learning as a Service | | | 🎓 |
| TFX: A TensorFlow-based Production-Scale ML Platform | | | 🎓 |
| The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | | | 🎓 |
| Underspecification Presents Challenges for Credibility in Modern Machine Learning | | | 🎓 |
| Versioning for end-to-end machine learning pipelines | | | 🎓 |
Awesome Software Engineering for Machine Learning / Social Aspects |
| Data Scientists in Software Teams: State of the Art and Challenges | | | 🎓 |
| Machine Learning Interviews | 9,227 | over 2 years ago | |
| Managing Machine Learning Projects | | | |
| Principled Machine Learning: Practices and Tools for Efficient Collaboration | | | |
Awesome Software Engineering for Machine Learning / Governance |
| A Human-Centered Interpretability Framework Based on Weight of Evidence | | | 🎓 |
| An Architectural Risk Analysis Of Machine Learning Systems | | | |
| Beyond Debiasing | | | |
| Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing | | | 🎓 |
| Inherent trade-offs in the fair determination of risk scores | | | 🎓 |
| Responsible AI practices | | | ⭐ |
| Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims | | | |
| Understanding Software-2.0 | | | 🎓 |
Awesome Software Engineering for Machine Learning / Tooling |
| Aim | | | Aim is an open source experiment tracking tool |
| Airflow | | | Programmatically author, schedule and monitor workflows |
| Alibi Detect | 2,262 | 10 months ago | Python library focused on outlier, adversarial and drift detection |
| Archai | 468 | about 1 year ago | Neural architecture search |
| Data Version Control (DVC) | | | DVC is a data and ML experiments management tool |
| Facets Overview / Facets Dive | | | Robust visualizations to aid in understanding machine learning datasets |
| FairLearn | | | A toolkit to assess and improve the fairness of machine learning models |
| Git Large File Storage (LFS) | | | Replaces large files such as datasets with text pointers inside Git |
| Great Expectations | 10,054 | 10 months ago | Data validation and testing with integration in pipelines |
| HParams | 126 | 11 months ago | A thoughtful approach to configuration management for machine learning projects |
| Kubeflow | | | A platform for data scientists who want to build and experiment with ML pipelines |
| Label Studio | 19,798 | 10 months ago | A multi-type data labeling and annotation tool with standardized output format |
| LiFT | 167 | over 2 years ago | The LinkedIn Fairness Toolkit |
| MLflow | | | Manage the ML lifecycle, including experimentation, deployment, and a central model registry (usage sketch below this table) |
| Model Card Toolkit | 427 | about 2 years ago | Streamlines and automates the generation of model cards; for model documentation |
| Neptune.ai | | | Experiment tracking tool bringing organization and collaboration to data science projects |
| Neuraxle | 610 | over 2 years ago | Sklearn-like framework for hyperparameter tuning and AutoML in deep learning projects |
| OpenML | | | An inclusive movement to build an open, organized, online ecosystem for machine learning |
| PyTorch Lightning | 28,636 | 10 months ago | The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate |
| REVISE: REvealing VIsual biaSEs | 111 | about 3 years ago | Automatically detect bias in visual data sets |
| Robustness Metrics | 466 | about 1 year ago | Lightweight modules to evaluate the robustness of classification models |
| Seldon Core | 4,409 | 11 months ago | An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models on Kubernetes |
| Spark Machine Learning | | | Spark's ML library consisting of common learning algorithms and utilities |
| TensorBoard | | | TensorFlow's Visualization Toolkit |
| TensorFlow Extended (TFX) | | | An end-to-end platform for deploying production ML pipelines |
| TensorFlow Data Validation (TFDV) | 766 | 11 months ago | Library for exploring and validating machine learning data. Similar to Great Expectations, but for TensorFlow data |
| Weights & Biases | | | Experiment tracking, model optimization, and dataset versioning |
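
Several tools in this table (MLflow, Neptune.ai, Weights & Biases, Aim) center on experiment tracking: recording the parameters and metrics of every run so results stay reproducible and comparable. Below is a minimal sketch against MLflow's Python API; the experiment name, parameter, and metric values are illustrative only.

```python
import mlflow

# Group runs under a named experiment (created on first use).
mlflow.set_experiment("demo-experiment")

# One run per training attempt: log the configuration next to the result.
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)   # hyperparameter for this run
    mlflow.log_metric("val_accuracy", 0.93)   # metric to compare across runs
```

By default MLflow writes to a local `mlruns/` directory; `mlflow.set_tracking_uri` can point logging at a shared tracking server instead.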