dud

Data pipeline manager

A lightweight CLI tool for managing and versioning large data alongside source code, and for building data pipelines.

GitHub

184 stars

8 watching

8 forks

Language: Go

Last commit: over 1 year ago

Linked from 1 awesome list

Topics: data-engineering, data-pipelines, data-science, dataset, dvcs, machine-learning, mlops

kevin-hanselman.github.io/dud/

Backlinks from these awesome lists:

kelvins/awesome-mlops

Related projects:

Repository	Description	Stars
prodmodel/prodmodel	A tool for managing data science pipelines by automating build, testing, and deployment processes while ensuring correctness and performance.	58
linkedin/brooklin	A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale	931
johnsonc/lambdo	A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines	1
hyfather/pipeline	A package implementing pipelines using goroutines to manage concurrency in Go applications.	56
fluidattacks/makes	A framework for building and managing CI/CD pipelines and application environments with cryptographic signed dependencies.	461
galaxyproject/galaxy	A platform for data-intensive scientific analysis and workflow management	1,431
ypares/porcupine	A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments	89
samapriya/planet-gee-pipeline-cli	A command-line tool for automating data processing and uploads from Planet's API to Google Earth Engine.	42
pdpipe/pdpipe	Provides a set of pre-defined data processing pipelines for pandas DataFrames.	718
apache/streampipes	A toolbox for industrial data analytics and stream processing	614
druths/xp	A tool for creating flexible and self-documenting data science pipelines	56
montilab/pipeliner	A framework for defining and automating bioinformatics pipelines using Nextflow.	44
calebwin/pipelines	A language and runtime for crafting massively parallel data pipelines	375
ssadedin/bpipe	A tool for running and managing bioinformatics pipelines by abstracting away low-level details and providing features such as dependency tracking, transactional management, and parallelism.	233
giacbrd/smartpipeline	A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency	25