SmartPipeline
Data pipeline framework
A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency
A framework for rapid development of robust data pipelines following a simple design pattern
25 stars
2 watching
3 forks
Language: Python
last commit: 11 months ago
Linked from 1 awesome list
data-analysis, data-analytics, data-mining, data-pipelines, data-processing, data-science, dataops, design-patterns, etl, machine-learning, mlops, pipeline, pipeline-framework, pipelines, reproducibility, task-queue, workflow
Related projects:
Repository | Description | Stars |
---|---|---|
huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,103 |
vectaport/flowgraph | A software framework for building scalable, asynchronous data pipelines with explicit back-pressure management and logging capabilities. | 60 |
pdpipe/pdpipe | Provides a set of pre-defined data processing pipelines for pandas DataFrames. | 718 |
log2timeline/dftimewolf | A framework for orchestrating data collection, processing, and export | 299 |
ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89 |
databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments. | 901 |
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56 |
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 184 |
mara/mara-pipelines | A lightweight ETL framework providing a simple way to define and execute data transformation pipelines using declarative Python code. | 2,082 |
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
galaxyproject/galaxy | A platform for data-intensive scientific analysis and workflow management | 1,431 |
m3dev/gokart | A framework that solves common problems in machine learning pipeline development and provides an environment for reproducibility and team collaboration. | 319 |
paysure/orinoco | A functional composable pipeline framework for Python that separates business logic from implementation. | 11 |
symphony09/ograph | A framework for building data pipelines with concurrent execution and dependency management | 33 |
calebwin/pipelines | A language and runtime for crafting massively parallel data pipelines | 375 |