hudi

Data lake management

A platform for storing and managing big data in cloud storage, enabling incremental processing and optimized querying of large datasets

Upserts, Deletes And Incremental Processing on Big Data.

GitHub

5k stars
1k watching
2k forks
Language: Java
Last commit: about 1 month ago
Linked from 2 awesome lists

Topics: apacheflink, apachehudi, apachespark, bigdata, data-integration, datalake, hudi, incremental-processing, stream-processing

Related projects:

Repository | Description | Stars
apache/hive | A software project that enables data warehousing and management of large datasets using SQL | 5,577
apache/kylin | An OLAP engine designed to handle Big Data with sub-second query latency and seamless integration with BI tools | 3,661
apache/incubator-hugegraph | A fast and scalable graph database for storing and querying billions of vertices and edges | 2,663
apache/iotdb | A time-series data management system for industrial IoT applications | 5,651
apache/shardingsphere | A distributed SQL query and transaction engine for sharding, scaling, encryption, and more on any database | 20,034
apache/ignite | A distributed, in-memory database system for high-performance computing and data processing | 4,834
juicedata/juicefs | A distributed POSIX file system designed for cloud-native environments, providing high performance and compatibility with various storage engines | 11,030
intel-bigdata/hibench | A set of benchmarking tools to evaluate big data frameworks' performance and resource utilization | 1,463
apache/hbase | A distributed, versioned, column-oriented store designed to scale and manage large amounts of structured data | 5,246
pulumi/examples | Demonstrates building and deploying cloud applications and infrastructure across multiple clouds and programming languages using Pulumi | 2,414
hi-primus/optimus | A Python library that provides a simple API for data preparation and analysis on various big-data engines | 1,486
apache/iceberg | Enables reliable and simple access to huge analytic tables across multiple engines | 6,621
apache/flink | An open-source stream processing framework with powerful capabilities for handling high-throughput and low-latency data streams in various programming languages | 24,261
volfpeter/fastapi-htmx-tailwind-example | An IoT dashboard application showcasing the integration of FastAPI, HTMX, TailwindCSS, and MongoDB for an interactive frontend experience | 45
apache/pinot | A distributed real-time analytics system with low latency | 5,562