dud

Data pipeline manager

A lightweight CLI tool for managing and versioning large data alongside source code and for building data pipelines.

GitHub

184 stars
8 watching
8 forks
Language: Go
Last commit: about 2 months ago
Linked from 1 awesome list

Topics: data-engineering, data-pipelines, data-science, dataset, dvcs, machine-learning, mlops

Related projects:

Repository | Description | Stars
prodmodel/prodmodel | A tool for managing data science pipelines by automating build, testing, and deployment processes while ensuring correctness and performance | 58
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 931
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1
hyfather/pipeline | A package implementing pipelines using goroutines to manage concurrency in Go applications | 56
fluidattacks/makes | A framework for building and managing CI/CD pipelines and application environments with cryptographically signed dependencies | 461
galaxyproject/galaxy | A platform for data-intensive scientific analysis and workflow management | 1,431
ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89
samapriya/planet-gee-pipeline-cli | A command-line tool for automating data processing and uploads from Planet's API to Google Earth Engine | 42
pdpipe/pdpipe | Provides a set of pre-defined data processing pipelines for pandas DataFrames | 718
apache/streampipes | A toolbox for industrial data analytics and stream processing | 614
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56
montilab/pipeliner | A framework for defining and automating bioinformatics pipelines using Nextflow | 44
calebwin/pipelines | A language and runtime for crafting massively parallel data pipelines | 375
ssadedin/bpipe | A tool for running and managing bioinformatics pipelines by abstracting away low-level details and providing features such as dependency tracking, transactional management, and parallelism | 233
giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency | 25