kedro

Data pipeline toolkit

A toolbox for building production-ready data science pipelines, applying software engineering best practices for reproducibility and modularity.

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

GitHub

10k stars

110 watching

909 forks

Language: Python

Last commit: over 1 year ago

Linked from 5 awesome lists

experiment-tracking, hacktoberfest, kedro, machine-learning, machine-learning-engineering, mlops, pipeline, python

kedro.org

Related projects:

Repository	Description	Stars
harisekhon/devops-python-tools	Tools for managing and automating DevOps tasks, data processing, and cloud infrastructure using Python.	783
gradio-app/gradio	Enables rapid creation and deployment of web applications for machine learning models and functions using Python	34,557
jakevdp/pythondatasciencehandbook	An online guide and set of executable Jupyter notebooks providing an introduction to core libraries for data science in Python.	43,422
donnemartin/data-science-ipython-notebooks	A comprehensive collection of data science and machine learning notebooks using Python and various deep learning frameworks.	27,601
openmined/pysyft	Enables data scientists to perform analysis on private data without accessing the underlying data, using a secure and decentralized server architecture.	9,557
pypi/warehouse	The software behind the Python Package Index.	3,617
sdv-dev/sdv	A library for generating synthetic tabular data based on real-world patterns	2,416
pachyderm/pachyderm	Automates data transformations with versioning and lineage tracking for scalable data pipelines	6,191
mito-ds/mito	A Jupyter Notebook add-on for spreadsheet-like editing and automation of Pandas dataframes	2,318
unstructured-io/unstructured	A toolkit for building custom machine learning pipelines from unstructured data	9,452
jupyterlab/jupyterlab	An extensible environment for interactive and reproducible computing using the Jupyter Notebook architecture	14,263
ahkarami/deep-learning-in-production	A collection of notes and references on deploying deep learning models in production environments	4,313
ploomber/ploomber	A platform for building and deploying data pipelines using Python, with features for caching, automation, and modularization.	3,530
pandas-dev/pandas	A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis.	44,052
rdkit/rdkit	A comprehensive software suite for cheminformatics and machine learning tasks	2,712