lakeFS

Data lake manager

A tool for managing data lakes and versioning data transformations

lakeFS - Data version control for your data lake | Git for data

GitHub

4k stars
44 watching
360 forks
Language: Go
Last commit: 1 day ago
Linked from 6 awesome lists

Topics: apache-spark, apache-sparksql, aws-s3, azure-blob-storage, azure-storage, data-engineering, data-lake, data-quality, data-version-control, data-versioning, datalake, datalakes, git-for-data, go, golang, google-cloud-storage, hadoop-filesystem, lakefs, object-storage

Related projects:

Repository | Description | Stars
juicedata/juicefs | A distributed POSIX file system designed for cloud-native environments, providing high performance and compatibility with various storage engines. | 11,030
delta-io/delta | An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs | 7,677
airbytehq/airbyte | A platform for building data integration pipelines between various data sources and destinations | 16,441
dotnet/efcore | A modern object-database mapper for .NET, supporting LINQ queries and schema migrations. | 13,838
cubefs/cubefs | A cloud-native file storage system designed to support large-scale data centers and hybrid cloud infrastructures | 4,724
netdata/netdata | A high-performance observability platform designed to simplify modern infrastructure monitoring and provide real-time insights into systems and applications. | 72,607
dolthub/dolt | A system that integrates version control with SQL databases, allowing developers to track changes and collaborate on database schema and data. | 18,052
databendlabs/databend | A high-performance, scalable data warehouse built on Rust, offering blazing-fast query execution and real-time analytics capabilities. | 7,978
microsoft/vfsforgit | A Windows-based virtual file system that optimizes Git performance by caching and managing files on demand. | 5,991
cloudquery/cloudquery | An open-source ELT framework that enables data movement between any source and destination using high-performance data ingestion and processing | 5,913
teevity/ice | An AWS usage and cost management tool that aggregates data from billing files to provide insights and enable informed decision-making for cloud resource allocation and reservations. | 2,861
git-lfs/git-lfs | Manages large files in version control systems like Git | 13,096
openmined/pysyft | Enables data scientists to perform analysis on private data without accessing the underlying data, using a secure and decentralized server architecture. | 9,557
rowyio/rowy | A web-based platform for managing data in Firestore using a spreadsheet-like interface and automating workflows with cloud functions. | 6,233
foundation/foundation-sites | A comprehensive front-end framework for building responsive sites and apps on various devices | 29,671