lakeFS

Data lake manager

A tool for managing data lakes and versioning data transformations

lakeFS - Data version control for your data lake | Git for data

GitHub

4k stars
44 watching
355 forks
Language: Go
Last commit: 4 days ago
Linked from 6 awesome lists

Topics: apache-spark, apache-sparksql, aws-s3, azure-blob-storage, azure-storage, data-engineering, data-lake, data-quality, data-version-control, data-versioning, datalake, datalakes, git-for-data, go, golang, google-cloud-storage, hadoop-filesystem, lakefs, object-storage

Related projects:

Repository | Description | Stars
juicedata/juicefs | A distributed POSIX file system designed for cloud-native environments, providing high performance and compatibility with various storage engines. | 10,904
delta-io/delta | An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs | 7,593
airbytehq/airbyte | A platform for building data integration pipelines between various data sources and destinations | 16,184
dotnet/efcore | A modern object-database mapper for .NET, supporting LINQ queries and schema migrations. | 13,787
cubefs/cubefs | A cloud-native distributed storage system that enables scalable and high-performance data storage for various applications. | 4,672
netdata/netdata | An observability platform designed to monitor and analyze systems in real-time with automated anomaly detection and root cause analysis. | 72,075
dolthub/dolt | A system that integrates version control with SQL databases, allowing developers to track changes and collaborate on database schema and data. | 17,965
databendlabs/databend | An open-source cloud-based data warehouse built on Rust with a focus on high-performance analytics and scalable storage | 7,856
microsoft/vfsforgit | A Windows-based virtual file system that optimizes Git performance by caching and managing files on demand. | 5,984
cloudquery/cloudquery | An open-source ELT framework that enables data movement between any source and destination using high-performance data ingestion and processing | 5,877
teevity/ice | An AWS usage and cost management tool that aggregates data from billing files to provide insights and enable informed decision-making for cloud resource allocation and reservations. | 2,856
git-lfs/git-lfs | Manages large files in version control systems like Git | 12,998
openmined/pysyft | Enables data scientists to perform analysis on private data without accessing the underlying data, using a secure and decentralized server architecture. | 9,516
rowyio/rowy | A low-code platform for managing Firestore databases and building cloud functions workflows on the web. | 6,171
foundation/foundation-sites | A comprehensive front-end framework for building responsive sites and apps on various devices | 29,666