kedro
Data pipeline toolkit
A toolbox for building production-ready data science pipelines, applying software engineering best practices for reproducibility and modularity.
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
10k stars
110 watching
909 forks
Language: Python
Last commit: about 1 month ago
Linked from 5 awesome lists
Topics: experiment-tracking, hacktoberfest, kedro, machine-learning, machine-learning-engineering, mlops, pipeline, python
Related projects:
| Repository | Description | Stars |
|---|---|---|
| harisekhon/devops-python-tools | Tools for managing and automating DevOps tasks, data processing, and cloud infrastructure using Python | 783 |
| gradio-app/gradio | Enables rapid creation and deployment of web applications for machine learning models and functions using Python | 34,557 |
| jakevdp/pythondatasciencehandbook | An online guide and set of executable Jupyter notebooks providing an introduction to core libraries for data science in Python | 43,422 |
| donnemartin/data-science-ipython-notebooks | A comprehensive collection of data science and machine learning notebooks using Python and various deep learning frameworks | 27,601 |
| openmined/pysyft | Enables data scientists to perform analysis on private data without accessing the underlying data, using a secure and decentralized server architecture | 9,557 |
| pypi/warehouse | The software behind the Python Package Index | 3,617 |
| sdv-dev/sdv | A library for generating synthetic tabular data based on real-world patterns | 2,416 |
| pachyderm/pachyderm | Automates data transformations with versioning and lineage tracking for scalable data pipelines | 6,191 |
| mito-ds/mito | A Jupyter Notebook add-on for spreadsheet-like editing and automation of Pandas dataframes | 2,318 |
| unstructured-io/unstructured | A toolkit for building custom machine learning pipelines from unstructured data | 9,452 |
| jupyterlab/jupyterlab | An extensible environment for interactive and reproducible computing using the Jupyter Notebook architecture | 14,263 |
| ahkarami/deep-learning-in-production | A collection of notes and references on deploying deep learning models in production environments | 4,313 |
| ploomber/ploomber | A platform for building and deploying data pipelines using Python, with features for caching, automation, and modularization | 3,530 |
| pandas-dev/pandas | A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis | 44,052 |
| rdkit/rdkit | A comprehensive software suite for cheminformatics and machine learning tasks | 2,712 |