dagster

Data pipeline orchestrator

An orchestration platform for data pipelines and assets, providing a declarative programming model and integrated lineage and observability.

An orchestration platform for the development, production, and observation of data assets.

GitHub

12k stars
124 watching
2k forks
Language: Python
Last commit: about 1 month ago
Linked from 10 awesome lists

analytics, dagster, data-engineering, data-integration, data-orchestrator, data-pipelines, data-science, etl, metadata, mlops, orchestration, python, scheduler, workflow, workflow-automation

Related projects:

Repository | Description | Stars
moby/datakit | A tool to orchestrate applications using a version-controlled dataflow | 1,083
databand-ai/dbnd | An agile pipeline framework for data engineering teams to track and orchestrate their data processes. | 260
pipefunc/pipefunc | Automates and simplifies the creation of function pipelines for efficient execution of scientific workflows. | 230
it4innovations/hyperloom | A platform for defining and executing scientific pipelines in distributed environments using C++ and Python. | 16
streamsets/datacollector-oss | A continuous big data ingestion platform that enables easy creation of data pipelines for various data sources and destinations. | 90
apache/airflow | A platform to programmatically author, schedule and monitor complex workflows | 37,580
huawei/containerops | An orchestration platform for automating DevOps workflows by combining tools and services into a single, GUI-based solution | 339
dagworks-inc/hamilton | Helps define and manage data transformations with a modular, self-documenting, and portable framework for directed acyclic graphs (DAGs) of data transformations. | 1,900
synacker/daggy | A utility and developer library for data streams catching and aggregation | 154
danielgerlag/conductor | A distributed workflow management system that coordinates services and scripts into complex workflows. | 538
galaxyproject/galaxy | A platform for data-intensive scientific analysis and workflow management | 1,431
dataman-cloud/swan | A Mesos scheduler that enables deployment and management of long-running applications with high availability and scalability. | 408
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 184
couler-proj/couler | Provides a unified interface for constructing and managing workflows across different workflow engines. | 919
apache/streampipes | A toolbox for industrial data analytics and stream processing | 614