hudi

Data lake management

A platform for storing and managing big data in cloud storage, enabling incremental processing and optimized querying of large datasets

Upserts, Deletes And Incremental Processing on Big Data.

GitHub

5k stars

1k watching

2k forks

Language: Java

Last commit: over 1 year ago

Linked from 2 awesome lists

apacheflink, apachehudi, apachespark, bigdata, data-integration, datalake, hudi, incremental-processing, stream-processing

hudi.apache.org/

Related projects:

Repository	Description	Stars
apache/hive	A software project that enables data warehousing and management of large datasets using SQL	5,577
apache/kylin	An OLAP engine designed to handle Big Data with sub-second query latency and seamless integration with BI tools.	3,661
apache/incubator-hugegraph	A fast and scalable graph database for storing and querying billions of vertices and edges.	2,663
apache/iotdb	A time-series data management system for industrial IoT applications	5,651
apache/shardingsphere	A distributed SQL query and transaction engine for sharding, scaling, encryption, and more on any database	20,034
apache/ignite	A distributed, in-memory database system for high-performance computing and data processing	4,834
juicedata/juicefs	A distributed POSIX file system designed for cloud-native environments, providing high performance and compatibility with various storage engines.	11,030
intel-bigdata/hibench	A set of benchmarking tools to evaluate big data frameworks' performance and resource utilization	1,463
apache/hbase	A distributed, versioned, column-oriented store designed to scale and manage large amounts of structured data	5,246
pulumi/examples	Demonstrates building and deploying cloud applications and infrastructure across multiple clouds and programming languages using Pulumi.	2,414
hi-primus/optimus	A Python library that provides a simple API for data preparation and analysis on various big-data engines	1,486
apache/iceberg	Enables reliable and simple access to huge analytic tables across multiple engines	6,621
apache/flink	An open-source stream processing framework with powerful capabilities for handling high-throughput and low-latency data streams in various programming languages	24,261
volfpeter/fastapi-htmx-tailwind-example	An IoT dashboard application showcasing the integration of FastAPI, HTMX, TailwindCSS, and MongoDB for an interactive frontend experience.	45
apache/pinot	A distributed real-time analytics system with low latency	5,562