tez
data pipeline engine
A low-level engine for building flexible data processing pipelines, intended as a foundation for higher-level frameworks
Apache Tez
482 stars
34 watching
424 forks
Language: Java
last commit: about 1 month ago
Linked from 1 awesome list
Topics: apache, big-data, hadoop, java, tez
Related projects:
Repository | Description | Stars |
---|---|---|
apache/streampipes | A toolbox for industrial data analytics and stream processing | 614 |
apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 817 |
tenzir/tenzir | A data pipeline engine designed to manage and process large volumes of security telemetry data at scale | 651 |
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,170 |
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,548 |
apache/pig | A platform for processing and transforming large data files using a high-level language, with compile-time optimizations for efficient execution on distributed computing frameworks | 682 |
datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57 |
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 931 |
databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments | 901 |
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
apache/datafusion-ballista | Distributed query engine for Apache DataFusion applications | 1,580 |
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 184 |
netflix/suro | A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events | 794 |
apache/rocketmq-connect | A tool for streaming data between Apache RocketMQ and other systems | 122 |
apache/datasketches-java | A software library of stochastic streaming algorithms, providing efficient data processing and analysis tools | 899 |