tez

data pipeline engine

A low-level engine that higher-level frameworks can use to build flexible data processing pipelines

Apache Tez

GitHub

482 stars
34 watching
424 forks
Language: Java
Last commit: about 1 month ago
Linked from 1 awesome list

apache, big-data, hadoop, java, tez

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| apache/streampipes | A toolbox for industrial data analytics and stream processing | 614 |
| apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 817 |
| tenzir/tenzir | A data pipeline engine designed to manage and process large volumes of security telemetry data at scale | 651 |
| apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,170 |
| apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,548 |
| apache/pig | Enables data processing and transformation in large files using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks | 682 |
| datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57 |
| linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 931 |
| databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments | 901 |
| johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
| apache/datafusion-ballista | Distributed query engine for Apache DataFusion applications | 1,580 |
| kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 184 |
| netflix/suro | A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events | 794 |
| apache/rocketmq-connect | A tool for streaming data between Apache RocketMQ and other systems | 122 |
| apache/datasketches-java | A software library of stochastic streaming algorithms, providing efficient data processing and analysis tools | 899 |