tez

data pipeline engine

A low-level execution engine on which higher-level frameworks can build flexible data processing pipelines

Apache Tez

GitHub

479 stars
34 watching
423 forks
Language: Java
Last commit: 14 days ago
Linked from 1 awesome list

apache, big-data, hadoop, java, tez

Backlinks from these awesome lists:

Related projects:

Repository | Description | Stars
apache/streampipes | A toolbox for industrial data analytics and stream processing | 605
apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 820
tenzir/tenzir | A data pipeline engine designed to manage and process large volumes of security telemetry data at scale | 645
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,513
apache/pig | Enables data processing and transformation in large files using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks | 681
datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 920
databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments | 901
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1
apache/datafusion-ballista | A distributed SQL query engine built on Apache Arrow and Rust, designed to provide efficient columnar processing and low memory usage | 1,544
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 183
netflix/suro | A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events | 794
apache/rocketmq-connect | A tool for streaming data between Apache RocketMQ and other systems | 122
apache/datasketches-java | A software library of stochastic streaming algorithms, providing efficient data processing and analysis tools | 896