dud
Data pipeline manager
A lightweight tool for managing and versioning large data alongside source code in data pipelines
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
184 stars
8 watching
8 forks
Language: Go
last commit: about 2 months ago
Linked from 1 awesome list
Topics: data-engineering, data-pipelines, data-science, dataset, dvcs, machine-learning, mlops
Related projects:
Repository | Description | Stars |
---|---|---|
prodmodel/prodmodel | A tool for managing data science pipelines by automating build, testing, and deployment processes while ensuring correctness and performance. | 58 |
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 931 |
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
hyfather/pipeline | A package implementing pipelines using goroutines to manage concurrency in Go applications. | 56 |
fluidattacks/makes | A framework for building and managing CI/CD pipelines and application environments with cryptographically signed dependencies. | 461 |
galaxyproject/galaxy | A platform for data-intensive scientific analysis and workflow management | 1,431 |
ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89 |
samapriya/planet-gee-pipeline-cli | A command-line tool for automating data processing and uploads from Planet's API to Google Earth Engine. | 42 |
pdpipe/pdpipe | Provides a set of pre-defined data processing pipelines for pandas DataFrames. | 718 |
apache/streampipes | A toolbox for industrial data analytics and stream processing | 614 |
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56 |
montilab/pipeliner | A framework for defining and automating bioinformatics pipelines using Nextflow. | 44 |
calebwin/pipelines | A language and runtime for crafting massively parallel data pipelines | 375 |
ssadedin/bpipe | A tool for running and managing bioinformatics pipelines by abstracting away low-level details and providing features such as dependency tracking, transactional management, and parallelism. | 233 |
giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency | 25 |