kedro

Data pipeline toolkit

A toolbox for production-ready data science pipelines with software engineering best practices for reproducibility and modularity

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

GitHub

Stars: 10k
Watching: 108
Forks: 905
Language: Python
Last commit: 6 days ago
Linked from 5 awesome lists

experiment-tracking, hacktoberfest, kedro, machine-learning, machine-learning-engineering, mlops, pipeline, python

Related projects:

Repository | Description | Stars
harisekhon/devops-python-tools | A collection of 80+ CLI tools for DevOps, Cloud, Big Data, and Python development | 773
gradio-app/gradio | Enables rapid creation and deployment of web applications for machine learning models and functions using Python | 33,962
jakevdp/pythondatasciencehandbook | An online guide and set of executable Jupyter notebooks providing an introduction to core libraries for data science in Python | 43,214
donnemartin/data-science-ipython-notebooks | A comprehensive collection of data science and machine learning notebooks using Python and various deep learning frameworks | 27,470
openmined/pysyft | Enables data scientists to perform analysis on private data without accessing the underlying data, using a secure and decentralized server architecture | 9,516
pypi/warehouse | A software system that powers the package registry for Python packages | 3,601
sdv-dev/sdv | A library for generating synthetic tabular data based on real-world patterns | 2,380
pachyderm/pachyderm | Automates data transformations with versioning and lineage tracking for scalable data pipelines | 6,179
mito-ds/mito | A Jupyter Notebook add-on for spreadsheet-like editing and automation of pandas DataFrames | 2,297
unstructured-io/unstructured | A toolkit for building custom machine learning pipelines from unstructured data | 9,144
jupyterlab/jupyterlab | An extensible environment for interactive and reproducible computing using the Jupyter Notebook architecture | 14,183
ahkarami/deep-learning-in-production | A collection of notes and references on deploying deep learning models in production environments | 4,306
ploomber/ploomber | A platform for building and deploying data pipelines using Python, with features for caching, automation, and modularization | 3,510
pandas-dev/pandas | A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis | 43,807
rdkit/rdkit | A comprehensive software suite for cheminformatics and machine learning tasks | 2,673