delta
Storage framework
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
8k stars
217 watching
2k forks
Language: Scala
Last commit: 6 days ago
Linked from 4 awesome lists
acid, analytics, big-data, delta-lake, spark
Related projects:
Repository | Description | Stars |
---|---|---|
delta-io/delta-rs | A Rust library that provides low-level APIs and operations for working with the Delta Lake data storage format | 2,324 |
apache/kyuubi | An Apache project providing a distributed and multi-tenant gateway to enable serverless SQL on data warehouses and lakehouses | 2,105 |
delta-incubator/deltaray | A Python library that provides a Delta Lake table reader for the Ray open-source ML toolkit | 43 |
treeverse/lakefs | A tool for managing data lakes and versioning data transformations | 4,458 |
pypi/warehouse | A software system that powers the package registry for Python packages | 3,601 |
databricks/koalas | A Python package that allows users to work with pandas DataFrames on top of Apache Spark | 3,336 |
deltares/pyflwdir | A Python package for fast and efficient hydrological and topographic data processing | 75 |
juicedata/juicefs | A distributed POSIX file system designed for cloud-native environments, providing high performance and compatibility with various storage engines. | 10,904 |
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916 |
deltares/wflow.jl | A Julia package for simulating hydrological processes in various configurations | 120 |
helgeho/archivespark | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats. | 145 |
internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11 |
dandavison/delta | A tool for efficiently viewing and navigating version control output | 24,394 |
dropbox/pyhive | Provides interfaces to connect and interact with data sources like Hive and Presto using Python. | 1,671 |
deltares/ribasim | A water resources modeling framework built in Julia that simulates the behavior of complex hydrological systems. | 42 |