samza
Data processor
A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees
Mirror of Apache Samza
820 stars
58 watching
334 forks
Language: Java
last commit: about 1 month ago
Linked from 3 awesome lists
big-data, samza, scala
Related projects:
Repository | Description | Stars |
---|---|---|
apache/samza-hello-samza | Provides a starter project to run and develop Apache Samza jobs in a local YARN cluster. | 111 |
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916 |
apache/pig | Enables data processing and transformation in large files using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks. | 681 |
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 479 |
reubano/meza | A lightweight toolkit for processing tabular data with a focus on functional programming and PyPy compatibility. | 416 |
apache/rocketmq-streams | Provides a lightweight stream processing framework | 172 |
olacabs/fabric | A real-time stream processing framework designed to handle high-volume event ingestion and complex data processing tasks with guaranteed availability and scalability. | 55 |
knowledgeonwebscale/streamingmassif | A Java-based platform for efficient processing of data streams by performing cascading reasoning and complex event processing. | 9 |
nathanmarz/cascalog | A library for data processing and querying on large datasets without the need for Hadoop expertise | 1,376 |
romseygeek/samza-luwak | An experimental framework that integrates Luwak and Samza to enable scalable streaming search functionality | 99 |
bkirwi/coast | A streaming data processing framework with strong ordering and exactly-once semantics | 60 |
weblyzard/streaming-sparql | Provides robust, incremental processing of streaming results from SPARQL servers. | 6 |
internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11 |
apache/druid | A high-performance real-time analytics database for fast queries and ingestion | 13,513 |
apache/datasketches-java | A software library of stochastic streaming algorithms, providing efficient data processing and analysis tools | 896 |