hudi

Data manager

Manages large analytical datasets on distributed storage systems by enabling incremental processing and snapshot isolation.

Upserts, Deletes And Incremental Processing on Big Data.

GitHub

5k stars
1k watching
2k forks
Language: Java
Last commit: about 19 hours ago
Linked from 2 awesome lists

apacheflink, apachehudi, apachespark, bigdata, data-integration, datalake, hudi, incremental-processing, stream-processing

Related projects:

Repository | Description | Stars
apache/hive | A software project that enables data warehousing and management of large datasets using SQL | 5,561
apache/kylin | An OLAP engine designed to handle Big Data with sub-second query latency and seamless integration with BI tools. | 3,655
apache/incubator-hugegraph | A graph database designed to handle large-scale data storage and querying | 2,655
apache/iotdb | A time-series data management system for industrial IoT applications | 5,625
apache/shardingsphere | A distributed SQL query and transaction engine for sharding, scaling, encryption, and more on any database | 19,985
apache/ignite | A distributed, in-memory database system for high-performance computing and data processing | 4,822
juicedata/juicefs | A distributed POSIX file system designed for cloud-native environments, providing high performance and compatibility with various storage engines. | 10,948
intel-bigdata/hibench | A set of benchmarking tools to evaluate big data frameworks' performance and resource utilization | 1,458
apache/hbase | A distributed, versioned column-oriented store modelled after Google Bigtable | 5,233
pulumi/examples | Demonstrates building and deploying cloud applications and infrastructure across multiple clouds and programming languages using Pulumi. | 2,394
hi-primus/optimus | A Python library that provides a simple API for data preparation and analysis on various big-data engines | 1,481
apache/iceberg | Enables reliable and simple access to huge analytic tables across multiple engines | 6,494
apache/flink | An open-source stream processing framework with powerful capabilities for handling high-throughput and low-latency data streams in various programming languages | 24,156
volfpeter/fastapi-htmx-tailwind-example | An IoT dashboard application showcasing the integration of FastAPI, HTMX, TailwindCSS, and MongoDB for an interactive frontend experience. | 37
apache/pinot | A distributed real-time analytics system with low latency | 5,523