OpenLineage

Data lineage framework

An open standard for collecting and managing metadata about data lineage in distributed computing environments

An Open Standard for lineage metadata collection

GitHub

2k stars
46 watching
311 forks
Language: Java
Last commit: about 12 hours ago

Related projects:

Repository | Description | Stars
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 184
open-sdg/open-sdg | A platform for collecting and disseminating data for global sustainability indicators | 62
log2timeline/dftimewolf | A framework for orchestrating data collection, processing, and export | 299
googlecloudplatform/dataflowtemplates | A collection of pre-implemented data pipelines using Google Cloud Dataflow and Apache Beam | 1,169
intentmedia/mario | A library that enables the definition of complex data pipelines in a functional, typesafe, and efficient way using a declarative syntax | 139
datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 482
stripe/openapi | Defines an OpenAPI specification for the Stripe API | 397
opendatalab/mllm-dataengine | Automates data generation and model training for improving MLLM capabilities | 39
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 931
grai-io/grai-core | A tool to manage data lineage and validation across multiple databases and data sources | 301
huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,103
apache/streampipes | A toolbox for industrial data analytics and stream processing | 614
pdpipe/pdpipe | Provides a set of pre-defined data processing pipelines for pandas DataFrames | 718
joboccara/pipes | A header-only C++14 library for building expressive data pipelines using a chainable interface | 808