tez
data pipeline engine
A low-level execution engine for building flexible data processing pipelines, designed to sit beneath higher-level frameworks
Apache Tez
479 stars
34 watching
423 forks
Language: Java
last commit: 14 days ago
Linked from 1 awesome list
apache, big-data, hadoop, java, tez
Related projects:
Repository | Description | Stars |
---|---|---|
apache/streampipes | A toolbox for industrial data analytics and stream processing | 605 |
apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 820 |
tenzir/tenzir | A data pipeline engine designed to manage and process large volumes of security telemetry data at scale | 645 |
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916 |
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,513 |
apache/pig | A platform for processing and transforming large files using a high-level language, with compile-time optimizations for efficient execution on distributed computing frameworks | 681 |
datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57 |
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 920 |
databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments | 901 |
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
apache/datafusion-ballista | A distributed SQL query engine built on Apache Arrow and Rust, designed to provide efficient columnar processing and low memory usage | 1,544 |
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 183 |
netflix/suro | A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events | 794 |
apache/rocketmq-connect | A tool for streaming data between Apache RocketMQ and other systems | 122 |
apache/datasketches-java | A software library of stochastic streaming algorithms, providing efficient data processing and analysis tools | 896 |