samza

Data processor

A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees

Mirror of Apache Samza

GitHub

817 stars

58 watching

336 forks

Language: Java

Last commit: over 1 year ago

Linked from 3 awesome lists

big-data samza scala

Related projects:

Repository	Description	Stars
apache/samza-hello-samza	Provides a starter project to run and develop Apache Samza jobs in a local YARN cluster.	111
apache/spark	An analytics engine designed to handle large-scale data processing and analysis	40,170
apache/pig	Enables data processing and transformation in large files using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks.	682
apache/tez	A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks	482
reubano/meza	A lightweight toolkit for processing tabular data with a focus on functional programming and PyPy compatibility.	417
apache/rocketmq-streams	Provides a lightweight stream processing framework	172
olacabs/fabric	A real-time stream processing framework designed to handle high-volume event ingestion and complex data processing tasks with guaranteed availability and scalability.	55
knowledgeonwebscale/streamingmassif	A Java-based platform for efficient processing of data streams by performing cascading reasoning and complex event processing.	10
nathanmarz/cascalog	A library for data processing and querying on large datasets without the need for Hadoop expertise	1,375
romseygeek/samza-luwak	An experimental framework that integrates Luwak and Samza to enable scalable streaming search functionality	99
bkirwi/coast	A streaming data processing framework with strong ordering and exactly-once semantics	60
weblyzard/streaming-sparql	Provides a robust, incremental processing of streaming results from SPARQL servers.	6
internetarchive/sparkling	A data processing library built on top of Apache Spark to handle temporal web data	11
apache/druid	A high-performance real-time analytics database for fast queries and ingestion	13,548
apache/datasketches-java	A software library of stochastic streaming algorithms, providing efficient data processing and analysis tools	899