datasketches-java

Data Processing Library

A software library of stochastic streaming algorithms, a.k.a. sketches, for efficient data processing and analysis.

GitHub

896 stars
58 watching
209 forks
Language: Java
Last commit: 17 days ago
Linked from 1 awesome list

datasketches

Related projects:

Repository | Description | Stars
apache/streampipes | A toolbox for industrial data analytics and stream processing | 605
apache/systemds | An end-to-end data science platform that covers data integration, machine learning model training, and deployment | 1,035
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916
netflix/staash | A tool to abstract storage details and automate common data access patterns for developers working with relational technologies | 209
svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262
apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 820
yoshuawuyts/normcore | A JavaScript library that enables the creation of stable, decentralized data streams using hypercore | 28
skyhacks/nerds | An API that provides random data from various nerdy franchises | 109
datastax/spark-cassandra-connector | A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis | 1,943
deepak-malik/data-structures-in-java | A collection of Java implementations of various data structures and algorithms used in computer science | 145
evilsoft/crocks | A collection of well-known Algebraic Data Types and their associated helper functions for functional programming in JavaScript | 1,592
jason-kerney/peelandslice.java | A Java implementation of a self-contained, serverless, and zero-configuration data processing framework | 1
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 479
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,513
joshsh/ripple | A programming language and runtime environment for creating data-driven programs with a focus on Linked Data and RDF data sources | 101