suro

Data pipeline service

A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events.

Netflix's distributed data pipeline

GitHub

794 stars
513 watching
171 forks
Language: Java
Last commit: over 1 year ago
Linked from 5 awesome lists


Related projects:

| Repository | Description | Stars |
|---|---|---|
| netflix/turbine | A Java-based system for aggregating and streaming real-time data from various sources | 835 |
| netflix/servo | Provides a simple interface to expose and publish Java application metrics using JMX | 1,417 |
| apache/streampipes | A toolbox for industrial data analytics and stream processing | 605 |
| netflix/staash | A tool to abstract storage details and automate common data access patterns for developers working with relational technologies | 209 |
| netflix/genie | An orchestration service that simplifies the process of running Big Data queries by automating the configuration and execution of complex jobs | 1,716 |
| linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 920 |
| netflix/hollow | A high-performance in-memory dataset dissemination toolset for scalable read-only access to data from a single producer | 1,206 |
| datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57 |
| apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 479 |
| samapriya/planet-gee-pipeline-cli | A command-line tool for automating data processing and uploads from Planet's API to Google Earth Engine | 42 |
| netflix/evcache | A distributed in-memory caching solution designed to store frequently used data for short-term use cases | 2,058 |
| netflix-skunkworks/cloudaux | Provides a unified interface to various cloud providers | 76 |
| raystack/firehose | Delivers real-time streaming data to various destinations | 325 |
| gazette/core | Enables teams to build platforms mixing SQL, batch, and real-time streaming processing paradigms | 718 |
| streamsets/datacollector-oss | A continuous big data ingestion platform that enables easy creation of data pipelines for various data sources and destinations | 90 |