hudi
Data lake management
A platform for storing and managing big data in cloud storage, enabling incremental processing and optimized querying of large datasets
Upserts, Deletes And Incremental Processing on Big Data.
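The tagline names three operations. Below is a toy, in-memory sketch of what they mean. It is loosely modeled on Hudi's concepts: a record key, an ordering ("precombine") field, a timeline of commit instants, and incremental pulls that start after a given instant. It is not Hudi's API. `ToyHudiTable`, `beginCommit`, `upsert`, `delete` and `changesSince` are hypothetical names. The sketch also assumes that the ordering field is compared against the stored record on every upsert. In real Hudi this behavior depends on the configured payload or merge mode.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy, in-memory model of keyed upserts, deletes and incremental pulls.
// Hypothetical names throughout; this is NOT Hudi's API.
final class ToyHudiTable {

    // One stored record version.
    static final class Row {
        final String key;    // record key: identifies the logical record
        final long ts;       // ordering field: decides which version wins
        final String value;  // payload
        final long commit;   // instant of the commit that wrote this version

        Row(String key, long ts, String value, long commit) {
            this.key = key;
            this.ts = ts;
            this.value = value;
            this.commit = commit;
        }

        @Override
        public String toString() {
            return key + "=" + value + " (ts=" + ts + ", commit=" + commit + ")";
        }
    }

    private final Map<String, Row> rows = new LinkedHashMap<>();
    private long lastCommit = 0;

    // Opens a new commit on a monotonically increasing toy timeline.
    long beginCommit() {
        return ++lastCommit;
    }

    // Upsert: insert an unseen key, or replace an existing one only when the
    // incoming ordering value is not older than the stored one.
    void upsert(long commit, String key, long ts, String value) {
        Row existing = rows.get(key);
        if (existing == null || ts >= existing.ts) {
            rows.put(key, new Row(key, ts, value, commit));
        }
    }

    // Delete: drop the key from the current snapshot (no tombstone is kept).
    void delete(String key) {
        rows.remove(key);
    }

    // Incremental pull: latest versions written by commits after sinceCommit.
    List<Row> changesSince(long sinceCommit) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows.values()) {
            if (r.commit > sinceCommit) {
                out.add(r);
            }
        }
        return out;
    }

    Row get(String key) {
        return rows.get(key);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        ToyHudiTable table = new ToyHudiTable();
        long c1 = table.beginCommit();
        table.upsert(c1, "user-1", 100L, "alice@old");
        table.upsert(c1, "user-2", 100L, "bob");
        long c2 = table.beginCommit();
        table.upsert(c2, "user-1", 200L, "alice@new"); // newer ordering value: replaces
        table.upsert(c2, "user-2", 50L, "bob-stale");  // older ordering value: ignored
        // A downstream job that already consumed c1 only needs c2's changes.
        System.out.println(table.changesSince(c1));
        // prints [user-1=alice@new (ts=200, commit=2)]
    }
}
```

This sketch keeps everything in one map and makes deleted keys simply disappear from incremental pulls. Hudi instead persists data as files in cloud storage and tracks commits on a timeline, so consumers can read only what changed since their last checkpoint.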
5k stars
1k watching
2k forks
Language: Java
last commit: about 1 month ago
Linked from 2 awesome lists
apacheflink, apachehudi, apachespark, bigdata, data-integration, datalake, hudi, incremental-processing, stream-processing
Related projects:
Repository | Description | Stars |
---|---|---|
apache/hive | A software project that enables data warehousing and management of large datasets using SQL | 5,577 |
apache/kylin | An OLAP engine designed to handle Big Data with sub-second query latency and seamless integration with BI tools. | 3,661 |
apache/incubator-hugegraph | A fast and scalable graph database for storing and querying billions of vertices and edges. | 2,663 |
apache/iotdb | A time-series data management system for industrial IoT applications | 5,651 |
apache/shardingsphere | A distributed SQL query and transaction engine for sharding, scaling, encryption, and more on any database | 20,034 |
apache/ignite | A distributed, in-memory database system for high-performance computing and data processing | 4,834 |
juicedata/juicefs | A distributed POSIX file system designed for cloud-native environments, providing high performance and compatibility with various storage engines. | 11,030 |
intel-bigdata/hibench | A set of benchmarking tools to evaluate big data frameworks' performance and resource utilization | 1,463 |
apache/hbase | A distributed, versioned, column-oriented store designed to scale and manage large amounts of structured data | 5,246 |
pulumi/examples | Demonstrates building and deploying cloud applications and infrastructure across multiple clouds and programming languages using Pulumi. | 2,414 |
hi-primus/optimus | A Python library that provides a simple API for data preparation and analysis on various big-data engines | 1,486 |
apache/iceberg | Enables reliable and simple access to huge analytic tables across multiple engines | 6,621 |
apache/flink | An open-source stream processing framework with powerful capabilities for handling high-throughput and low-latency data streams in various programming languages | 24,261 |
volfpeter/fastapi-htmx-tailwind-example | An IoT dashboard application showcasing the integration of FastAPI, HTMX, TailwindCSS, and MongoDB for an interactive frontend experience. | 45 |
apache/pinot | A distributed real-time analytics system with low latency | 5,562 |