brooklin

Data pipeline manager

A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale

An extensible distributed system for reliable nearline data streaming at scale

GitHub

920 stars
41 watching
137 forks
Language: Java
Last commit: 6 months ago
Linked from 3 awesome lists

Topics: change-data-capture, data-streaming, distributed-systems, java, kafka, kafka-mirror-maker, linkedin, scalability

Related projects:

Repository | Description | Stars
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 183
apache/streampipes | A toolbox for industrial data analytics and stream processing | 605
ssadedin/bpipe | A tool for running and managing bioinformatics pipelines by abstracting away low-level details and providing features such as dependency tracking, transactional management, and parallelism | 230
hyfather/pipeline | A package implementing pipelines using goroutines to manage concurrency in Go applications | 58
galaxyproject/galaxy | An integrated framework for data-intensive scientific analysis and workflow management | 1,410
bjpop/rubra | A bioinformatics pipeline system that supports running workflow stages on a distributed compute cluster | 38
montilab/pipeliner | A framework for defining and automating bioinformatics pipelines using Nextflow | 44
prodmodel/prodmodel | A tool for managing data science pipelines by automating build, testing, and deployment processes while ensuring correctness and performance | 59
netflix/suro | A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events | 794
nlguillemot/pipelineset | A utility for managing and reloading graphics pipeline states in Direct3D 12 | 15
samapriya/planet-gee-pipeline-cli | A command-line tool for automating data processing and uploads from Planet's API to Google Earth Engine | 42
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 479
fluidattacks/makes | A framework for building and managing CI/CD pipelines and application environments with cryptographically signed dependencies | 453
vincentclaes/datajob | Automates end-to-end machine learning pipeline deployment with AWS services | 110
natcap/taskgraph | A Python library for managing and optimizing computational workflows with parallel processing and data reuse | 21