samza

Data processor

A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees

Mirror of Apache Samza

GitHub

820 stars
58 watching
334 forks
Language: Java
Last commit: about 1 month ago
Linked from 3 awesome lists

Tags: big-data, samza, scala

Related projects:

Repository | Description | Stars
apache/samza-hello-samza | A starter project for developing and running Apache Samza jobs in a local YARN cluster. | 111
apache/spark | An analytics engine designed for large-scale data processing and analysis. | 39,916
apache/pig | Enables data processing and transformation on large files using a high-level language, with compile-time optimizations for efficient execution on distributed computing frameworks. | 681
apache/tez | A low-level engine on which higher-level frameworks can build flexible data processing pipelines. | 479
reubano/meza | A lightweight toolkit for processing tabular data, with a focus on functional programming and PyPy compatibility. | 416
apache/rocketmq-streams | A lightweight stream processing framework. | 172
olacabs/fabric | A real-time stream processing framework designed to handle high-volume event ingestion and complex data processing tasks with guaranteed availability and scalability. | 55
knowledgeonwebscale/streamingmassif | A Java-based platform for efficient processing of data streams through cascading reasoning and complex event processing. | 9
nathanmarz/cascalog | A library for data processing and querying on large datasets without the need for Hadoop expertise. | 1,376
romseygeek/samza-luwak | An experimental framework that integrates Luwak and Samza to enable scalable streaming search. | 99
bkirwi/coast | A streaming data processing framework with strong ordering and exactly-once semantics. | 60
weblyzard/streaming-sparql | Provides robust, incremental processing of streaming results from SPARQL servers. | 6
internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data. | 11
apache/druid | A high-performance real-time analytics database for fast queries and ingestion. | 13,513
apache/datasketches-java | A software library of stochastic streaming algorithms, providing efficient data processing and analysis tools. | 896