kedro

Data pipeline toolkit

A toolbox for building production-ready data science pipelines, applying software engineering best practices for reproducibility and modularity.

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
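To make the idea of a modular, reproducible pipeline concrete, here is a minimal sketch in plain Python. It does not use Kedro's API: the `Node` type, the `run_pipeline` helper, and the dataset names are all hypothetical, chosen only to illustrate steps that declare named inputs and outputs so that execution order follows from the data dependencies rather than from the order in which the steps are listed.

```python
from typing import Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """One pipeline step: a pure function plus the names of its inputs and output."""
    func: Callable
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, object]) -> Dict[str, object]:
    """Run every node once, each as soon as all of its inputs are available."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            missing = sorted({i for n in pending for i in n.inputs if i not in data})
            raise ValueError(f"unresolvable inputs: {missing}")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data


def clean(rows):
    # Drop missing values.
    return [r for r in rows if r is not None]


def mean(rows):
    return sum(rows) / len(rows)


# Nodes are listed out of order on purpose; dependencies decide the run order.
nodes = [
    Node(mean, ["clean_rows"], "average"),
    Node(clean, ["raw_rows"], "clean_rows"),
]
result = run_pipeline(nodes, {"raw_rows": [1, None, 2, 3]})
print(result["average"])
```

Because each step is a plain function with explicit inputs and outputs, it can be tested on its own and reused in other pipelines, which is the kind of modularity the description refers to.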

GitHub

10k stars
110 watching
909 forks
Language: Python
Last commit: about 1 month ago
Linked from 5 awesome lists

experiment-tracking, hacktoberfest, kedro, machine-learning, machine-learning-engineering, mlops, pipeline, python

Related projects:

Repository | Description | Stars
harisekhon/devops-python-tools | Tools for managing and automating DevOps tasks, data processing, and cloud infrastructure using Python | 783
gradio-app/gradio | Enables rapid creation and deployment of web applications for machine learning models and functions using Python | 34,557
jakevdp/pythondatasciencehandbook | An online guide and set of executable Jupyter notebooks providing an introduction to core libraries for data science in Python | 43,422
donnemartin/data-science-ipython-notebooks | A comprehensive collection of data science and machine learning notebooks using Python and various deep learning frameworks | 27,601
openmined/pysyft | Enables data scientists to perform analysis on private data without accessing the underlying data, using a secure and decentralized server architecture | 9,557
pypi/warehouse | The software behind the Python Package Index | 3,617
sdv-dev/sdv | A library for generating synthetic tabular data based on real-world patterns | 2,416
pachyderm/pachyderm | Automates data transformations with versioning and lineage tracking for scalable data pipelines | 6,191
mito-ds/mito | A Jupyter Notebook add-on for spreadsheet-like editing and automation of Pandas dataframes | 2,318
unstructured-io/unstructured | A toolkit for building custom machine learning pipelines from unstructured data | 9,452
jupyterlab/jupyterlab | An extensible environment for interactive and reproducible computing using the Jupyter Notebook architecture | 14,263
ahkarami/deep-learning-in-production | A collection of notes and references on deploying deep learning models in production environments | 4,313
ploomber/ploomber | A platform for building and deploying data pipelines using Python, with features for caching, automation, and modularization | 3,530
pandas-dev/pandas | A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis | 44,052
rdkit/rdkit | A comprehensive software suite for cheminformatics and machine learning tasks | 2,712