samza

Data processor

A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees

Mirror of Apache Samza

GitHub

817 stars
58 watching
336 forks
Language: Java
Last commit: about 2 months ago
Linked from 3 awesome lists

big-data, samza, scala

Related projects:

Repository | Description | Stars
apache/samza-hello-samza | Provides a starter project to run and develop Apache Samza jobs in a local YARN cluster | 111
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,170
apache/pig | Enables data processing and transformation in large files using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks | 682
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 482
reubano/meza | A lightweight toolkit for processing tabular data with a focus on functional programming and PyPy compatibility | 417
apache/rocketmq-streams | Provides a lightweight stream processing framework | 172
olacabs/fabric | A real-time stream processing framework designed to handle high-volume event ingestion and complex data processing tasks with guaranteed availability and scalability | 55
knowledgeonwebscale/streamingmassif | A Java-based platform for efficient processing of data streams by performing cascading reasoning and complex event processing | 10
nathanmarz/cascalog | A library for data processing and querying on large datasets without the need for Hadoop expertise | 1,375
romseygeek/samza-luwak | An experimental framework that integrates Luwak and Samza to enable scalable streaming search functionality | 99
bkirwi/coast | A streaming data processing framework with strong ordering and exactly-once semantics | 60
weblyzard/streaming-sparql | Provides robust, incremental processing of streaming results from SPARQL servers | 6
internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,548
apache/datasketches-java | A software library of stochastic streaming algorithms, providing efficient data processing and analysis tools | 899