SmartPipeline
Data pipeline framework
A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency
A framework for rapid development of robust data pipelines following a simple design pattern
23 stars
2 watching
3 forks
Language: Python
Last commit: 9 months ago
Linked from 1 awesome list
data-analysis, data-analytics, data-mining, data-pipelines, data-processing, data-science, dataops, design-patterns, etl, machine-learning, mlops, pipeline, pipeline-framework, pipelines, reproducibility, task-queue, workflow
Related projects:
Repository | Description | Stars |
---|---|---|
huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,043 |
vectaport/flowgraph | A software framework for building scalable, asynchronous data pipelines with explicit back-pressure management and logging capabilities | 60 |
pdpipe/pdpipe | A tool for creating and managing data pipelines with pandas DataFrames | 716 |
log2timeline/dftimewolf | A framework for orchestrating data collection, processing, and export | 296 |
ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89 |
databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments | 901 |
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56 |
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 183 |
mara/mara-pipelines | A lightweight ETL framework providing a simple way to define and execute data transformation pipelines using declarative Python code | 2,081 |
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
galaxyproject/galaxy | An integrated framework for data-intensive scientific analysis and workflow management | 1,410 |
m3dev/gokart | A framework that solves common problems in machine learning pipeline development and provides an environment for reproducibility and team collaboration | 318 |
paysure/orinoco | A functional composable pipeline framework for Python that separates business logic from implementation | 11 |
symphony09/ograph | A framework for building data pipelines with concurrent execution and dependency management | 32 |
calebwin/pipelines | A language and runtime for crafting massively parallel data pipelines | 374 |