OpenLineage
Data lineage framework
An open standard for collecting and managing metadata about data lineage in distributed computing environments
2k stars
46 watching
311 forks
Language: Java
Last commit: about 12 hours ago
Related projects:
Repository | Description | Stars |
---|---|---|
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 184 |
open-sdg/open-sdg | A platform for collecting and disseminating data for global sustainability indicators | 62 |
log2timeline/dftimewolf | A framework for orchestrating data collection, processing, and export | 299 |
googlecloudplatform/dataflowtemplates | A collection of pre-implemented data pipelines using Google Cloud Dataflow and Apache Beam | 1,169 |
intentmedia/mario | A library that enables the definition of complex data pipelines in a functional, typesafe, and efficient way using a declarative syntax | 139 |
datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57 |
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 482 |
stripe/openapi | Defines an OpenAPI specification for the Stripe API | 397 |
opendatalab/mllm-dataengine | Automates data generation and model training for improving MLLM capabilities | 39 |
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 931 |
grai-io/grai-core | A tool to manage data lineage and validation across multiple databases and data sources | 301 |
huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,103 |
apache/streampipes | A toolbox for industrial data analytics and stream processing | 614 |
pdpipe/pdpipe | Provides a set of pre-defined data processing pipelines for pandas DataFrames | 718 |
joboccara/pipes | A header-only C++14 library for building expressive data pipelines using a chainable interface | 808 |