awesome-pipeline

Workflow toolkit collection

A curated collection of workflow toolkits for managing complex processes and data pipelines.

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

GitHub

6k stars
231 watching
626 forks
last commit: about 1 month ago
Linked from 10 awesome lists

awesome-list, workflow

Awesome Pipeline / Pipeline frameworks & libraries

ActionChain A workflow system for simple linear success/failure workflows
Adage 56 almost 2 years ago Small package to describe workflows that are not completely known at definition time
AiiDA 440 about 1 month ago Workflow manager with a strong focus on provenance, performance and extensibility
Airflow 37,580 about 1 month ago Python-based workflow system created by Airbnb
Anduril Component-based workflow framework for scientific data analysis
Antha High-level language for biology
Argo Workflows Container-native workflow engine for orchestrating parallel data processing, ML, or CI jobs on Kubernetes
Autosubmit An open source Python experiment and workflow manager used to manage complex workflows on Cloud and HPC platforms
AWE 69 about 4 years ago Workflow and resource management system with CWL support
Balsam 77 11 months ago Python-based high throughput task and workflow engine
Bds Scripting language for data pipelines
Beam Unified programming model for batch and streaming data-parallel processing pipelines
BioMake 103 about 1 year ago GNU-Make-like utility for managing builds and complex workflows
BioQueue 29 almost 2 years ago Explicit framework with web monitoring and resource estimation
Bioshake 55 over 5 years ago Haskell DSL built on shake with strong typing and EDAM support
Bistro 48 8 months ago Library to build and execute typed scientific workflows
Bpipe 233 about 1 month ago Tool for running and managing bioinformatics pipelines
Briefly 106 over 6 years ago Python Meta-programming Library for Job Flow Control
Burr 1,368 about 1 month ago Lightweight Python-based graph orchestrator (supports loops and conditional branching, not just DAGs)
Cluster Flow Command-line tool which uses common cluster managers to run bioinformatics pipelines
Clusterjob 20 10 months ago Automated reproducibility, and hassle-free submission of computational jobs to clusters
Compi Application framework for portable computational pipelines
Compss Programming model for distributed infrastructures
Conan2 3 over 10 years ago Light-weight workflow management application
Consecution 169 almost 4 years ago A Python pipeline abstraction inspired by Apache Storm topologies
Cosmos Python library for massively parallel workflows
Couler 919 3 months ago Unified interface for constructing and managing workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow
Covalent 785 about 2 months ago Workflow orchestration toolkit for high-performance and quantum computing research and development
Cromwell 1,004 about 1 month ago Workflow Management System geared towards scientific workflows from the Broad Institute
Cuneiform 232 over 1 year ago Advanced functional workflow language and framework, implemented in Erlang
Cylc A workflow engine for cycling systems, originally developed for operational environmental forecasting
Dagobah 757 over 5 years ago Simple DAG-based job scheduler in Python
Dagr 69 over 2 years ago A Scala-based DSL and framework for writing and executing bioinformatics pipelines as Directed Acyclic Graphs
Dagster 12,055 about 1 month ago Python-based API for defining DAGs that interfaces with popular workflow managers for building data applications
DataJoint An open-source relational framework for scientific data pipelines
Dask 12,691 about 1 month ago Dask is a flexible parallel computing library for analytics
Dbt Framework for writing analytics workflows entirely in SQL. Covers the T part of ETL, with a focus on analytics engineering
Dockerflow 97 about 7 years ago Workflow runner that uses Dataflow to run a series of tasks in Docker
Drake 1,480 almost 3 years ago Robust DSL akin to Make, implemented in Clojure
Drake R package 1,343 about 1 month ago Reproducibility and high-performance computing with an easy R-focused interface. Unrelated to the Clojure-based Drake. Succeeded by Targets
Dray 383 almost 5 years ago An engine for managing the execution of container-based workflows
ecFlow 41 about 1 month ago Workflow manager
eHive 53 3 months ago System for creating and running pipelines on a distributed compute resource
Fission Workflows 371 almost 2 years ago A fast, lightweight workflow engine for serverless/FaaS functions
Flex 56 almost 4 years ago Language agnostic framework for building flexible data science pipelines (Python/Shell/Gnuplot)
Flowr 84 almost 4 years ago Robust and efficient workflows using a simple language agnostic approach (R package)
Gc3pie 44 about 2 years ago Python libraries and tools for running applications on diverse Grids and clusters
Guix Workflow Language A workflow management language extension for GNU Guix
Gwf 31 about 1 month ago Make-like utility for submitting workflows via qsub
Hamilton 1,900 about 1 month ago A Python micro-framework for describing dataflows; runs anywhere Python runs
Hera 621 about 1 month ago Hera is an Argo Python SDK. Hera aims to make construction and submission of various Argo Project resources easy and accessible to everyone! Hera abstracts away low-level setup details while still maintaining a consistent vocabulary with Argo
HyperLoom 16 over 2 years ago Platform for defining and executing workflow pipelines in large-scale distributed environments
HyperQueue 292 about 1 month ago HPC-focused task scheduler that automatically assigns tasks to Slurm/PBS allocations and submits them for the user
Joblib Set of tools to provide lightweight pipelining in Python
Jug A task Based parallelization framework for Python
Kedro 10,050 about 1 month ago Workflow development tool that helps you build data pipelines
Kestra 14,708 about 1 month ago Open source data orchestration and scheduling platform with declarative syntax
Ketrew 77 almost 7 years ago Embedded DSL in the OCaml language alongside a client-server management application
Koheesio (https://github.com/Nike-Inc/koheesio) Python framework for building efficient data pipelines
Kronos 19 about 8 years ago Workflow assembler for cancer genome analytics and informatics
Kubeflow Pipelines Framework for building and deploying portable, scalable machine learning workflows using Docker containers and Argo Workflows
Loom 29 about 5 years ago Tool for running bioinformatics workflows locally or in the cloud
Longbow Job proxying tool for biomolecular simulations
Luigi 17,950 about 1 month ago Python module that helps you build complex pipelines of batch jobs
Maestro 139 about 1 month ago YAML based HPC workflow execution tool
Makeflow Workflow engine for executing large complex workflows on clusters
makepipe 31 about 2 years ago An R package which provides a set of simple tools for transforming an existing workflow into a self-documenting pipeline with very minimal upfront costs
Mara 2,082 about 1 year ago A lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow
Mario 139 almost 7 years ago Scala library for defining data pipelines
Martian A language and framework for developing and executing complex computational pipelines
MD Studio 12 about 5 years ago Microservice based workflow engine
MetaFlow Open-source framework from Netflix for DAG generation, aimed at data scientists. Python and R APIs
Mistral 291 about 1 month ago Python-based workflow engine by the OpenStack project
Moa 23 about 10 years ago Lightweight workflows in bioinformatics
Nextflow Flow-based computational toolkit for reproducible and scalable bioinformatics pipelines
nFlow 206 about 2 months ago Embeddable JVM-based workflow engine with high availability, fault tolerance, and support for multiple databases. Additional libraries are provided for visualization and REST API
NiPype 750 about 1 month ago Workflows and interfaces for neuroimaging packages
OpenGE 26 over 11 years ago Accelerated framework for manipulating and interpreting high-throughput sequencing data
Pachyderm Distributed and reproducible data pipelining and data management, built on the container ecosystem
Parsl Productive parallel programming, for creating parallel programs composed of Python functions and external components
PipeFunc 230 about 1 month ago Lightweight function pipeline (DAG) creation in pure Python for scientific workflows
PipEngine 20 over 7 years ago Ruby based launcher for complex biological pipelines
Pinball 1,046 about 5 years ago Python based workflow engine by Pinterest
Popper 304 almost 3 years ago YAML based container-native workflow engine supporting Docker, Singularity, Vagrant VMs with Docker daemon in VM, and local host
Porcupine 89 almost 3 years ago Haskell workflow tool to express and compose tasks (optionally cached) whose datasources and sinks are known ahead of time and rebindable, and which can expose arbitrary sets of parameters to the outside world
Prefect Python based workflow engine powering Prefect
Pydra 123 about 1 month ago Lightweight, DAG-based Python dataflow engine for reproducible and scalable scientific pipelines
PyFlow 146 over 4 years ago Lightweight parallel task engine
pyperator 60 over 7 years ago Simple push-based python workflow framework using asyncio, supporting recursive networks
pyppl 105 5 months ago A python lightweight pipeline framework
pypyr Automation task-runner for sequential steps defined in a pipeline yaml, with AWS and Slack plug-ins
pytask 115 about 1 month ago A workflow management system that facilitates reproducible data analyses
Pwrake 57 about 5 years ago Parallel workflow extension for Rake
Qdo Lightweight high-throughput queuing system for workflows with many small tasks to perform
Qsubsec 10 almost 2 years ago Simple tokenised template system for SGE
Rabix 106 almost 6 years ago Python-based workflow toolkit based on the Common Workflow Language and Docker
Rain 749 almost 2 years ago Framework for large distributed task-based pipelines, written in Rust with Python API
Ray 34,412 about 1 month ago Flexible, high-performance distributed Python execution framework
Redun 537 5 months ago Yet another redundant workflow engine
Reflow 965 about 1 year ago Language and runtime for distributed, incremental data processing in the cloud
Remake 340 over 6 years ago Make-like declarative workflows in R
Rmake Wrapper for the creation of Makefiles, enabling massive parallelization
Rubra 38 over 9 years ago Pipeline system for bioinformatics workflows
Ruffus Computation Pipeline library for Python
Ruigi 42 over 5 years ago Pipeline tool for R, inspired by Luigi
Sake Self-documenting build automation tool
SciLuigi 335 about 1 month ago Helper library for writing flexible scientific workflows in Luigi
SciPipe Library for writing Scientific Workflows in Go
Signac Lightweight, but scalable framework for file-driven workflows to be run locally and on HPC systems
Scoop 642 almost 2 years ago Scalable Concurrent Operations in Python
Seqtools 49 9 months ago Python library for lazy evaluation of pipelined transformations on indexable containers
SmartPipeline 25 11 months ago A framework for rapid development of robust data pipelines following a simple design pattern
Snakemake Tool for running and managing bioinformatics pipelines
Spiff 1,713 about 2 months ago Based on the Workflow Patterns initiative and implemented in Python
Stolos 130 over 6 years ago Directed Acyclic Graph task dependency scheduler that simplifies distributed pipelines
Steppy 134 about 6 years ago Lightweight, open-source Python 3 library for fast and reproducible experimentation. (This repository has been archived by the owner on Jun 22, 2022.)
Stpipe File processing pipelines as a Python library
StreamFlow 54 about 1 month ago Container native workflow management system focused on hybrid workflows
StreamPipes A self-service IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams
Sundial Jobsystem on AWS ECS or AWS Batch managing dependencies and scheduling
Suro 794 almost 2 years ago Java-based distributed pipeline from Netflix
Swift Fast easy parallel scripting - on multicores, clusters, clouds and supercomputers
TAF 3 2 months ago R package to organize reproducible scientific workflows
Targets 949 about 1 month ago Dynamic, function-oriented Make-like reproducible pipelines at scale in R
TaskGraph 22 7 months ago A library to help manage complicated computational software pipelines consisting of long running individual tasks
Tibanna 70 about 2 months ago Tool that helps you run genomic pipelines on Amazon cloud
Toil 901 about 1 month ago Distributed pipeline workflow manager (mostly for genomics)
Yap Extensible parallel framework, written in Python using OpenMPI libraries
Yapp 62 about 2 years ago A C++ parallel pipeline library for stream processing
Wallaroo Framework for streaming data applications and algorithms that react to real-time events
WorldMake Easy Collaborative Reproducible Computing
Zenaton Workflow engine for orchestrating jobs, data and events across your applications and third party services
ZenML Extensible open-source MLOps framework to create reproducible pipelines for data scientists

Awesome Pipeline / Workflow platforms

ActivePapers Computational science made reproducible and publishable
Active Workflow 845 almost 2 years ago Polyglot workflows without leaving the comfort of your technology stack
Anvi’o A community and framework centered around metagenomics, designed to facilitate reproducible exploration and visualization of data
Apache Airavata Framework for executing and managing computational workflows on distributed computing resources
Arteria Event-driven automation for sequencing centers. Initiates workflows based on events
Arvados A container based workflow platform
Biokepler Bioinformatics Scientific Workflow for Distributed Analysis of Large-Scale Biological Data (inactive since 10/2019)
Butler Framework for running scientific workflows on public and academic clouds
Chipster Open source platform for data analysis
Clubber Cluster Load Balancer for Bioinformatics e-Resources
Digdag Workflow manager designed for simplicity, extensibility and collaboration
Domino 155 6 months ago User friendly and open source visual workflow management platform
Fireworks 367 6 months ago Centralized workflow server for dynamic workflows of high-throughput computations
Flojoy 210 about 2 months ago Open source visual Python scripting for test, measurement, and robotics control
Flyte 5,850 about 1 month ago Container-native, type-safe workflow and pipelines platform for large scale processing and ML
Galaxy Powerful workflow system which can be used on the command line or with the GUI
Geoweaver 82 about 2 months ago In-browser tool for data processing workflows with high-performance server support, featuring code history and workflow orchestration
Kepler Kepler scientific workflow application from University of California
KNIME Analytics Platform General-purpose platform with many specialized domain extensions
Kubeflow Toolkit for making deployments of machine learning workflows on Kubernetes simple, portable and scalable
NextflowWorkbench Integrated development environment for Nextflow, Docker and Reusable Workflows
omega|ml DataOps Platform 96 about 1 month ago Data & model pipeline deployment for humans - integrated, scalable, extensible
OpenMOLE Workflow Management System for exploration of models and parameter optimization
Ophidia Data-analytics platform with declarative workflows of distributed operations
Orchest 4,091 over 1 year ago An IDE for Data Science
Pegasus Workflow Management System
Piper 491 over 1 year ago Distributed workflow engine designed to be dead simple
Polyaxon 3,581 about 1 month ago A platform for machine learning experimentation workflow
Reana 127 about 1 month ago Platform for reusable research data analyses developed by CERN
Sushi 25 about 1 month ago Supporting User for SHell script Integration
Yabi Online research environment for grid, HPC and cloud computing
Taverna Domain independent workflow system
Temporal Highly scalable developer oriented engine
Windmill 11,216 about 1 month ago Developer platform and workflow engine to turn scripts into internal tools
VisTrails Scientific workflow and provenance management system
Wings Semantic workflow system utilizing Pegasus as execution system
Watchdog 13 4 months ago Workflow management system for the automated and distributed analysis of large-scale experimental data
FlowHub FlowHub is a new workflow cloud platform

Awesome Pipeline / Workflow languages

Common Workflow Language 1,456 about 1 month ago
Cloudgene Workflow Language
OpenMOLE DSL
Workflow Description Language 780 4 months ago
Yet Another Workflow Language
Pipelines 375 about 5 years ago

Awesome Pipeline / Workflow standardization initiatives

Workflow 4 Ever Initiative
Workflow 4 Ever workflow research object model
Workflow Patterns Initiative
Workflow Patterns Library
ResearchObject.org

Awesome Pipeline / ETL & Data orchestration

DataLad git and git-annex based data version control system with lightweight provenance capture/re-execution support
DVC Data version control system for ML projects with lightweight pipeline support
lakeFS 4,496 about 1 month ago Repeatable, atomic and versioned data lake on top of object storage
Nessie 1,064 about 1 month ago Provides Git-like capability & version control for Iceberg Tables, Delta Lake Tables & SQL Views

Awesome Pipeline / Literate programming (aka interactive notebooks)

Beaker Notebook-style development environment
Binder Turn a GitHub repo into a collection of interactive notebooks powered by Jupyter and Kubernetes
IPython A rich architecture for interactive computing
Jupyter Language-agnostic notebook literate programming environment
Org Mode GNU Emacs major mode for computational notebooks, literate programming, and much more
Pathomx Interactive data workflows built on Python
Polynote 4,542 about 1 month ago A better notebook for Scala (and more). Built by Netflix
Ploomber 3,530 4 months ago Consolidate your notebooks and scripts in a reproducible pipeline using a file
R Notebooks R Markdown notebook literate programming environment
RedPoint Notebooks Web-native computational notebook for programmers supporting multiple languages, APIs and webhooks
SoS Readable, interactive, cross-platform and cross-language data science workflow system
Zeppelin Web-based notebook that enables interactive data analytics

Awesome Pipeline / Extract, transform, load (ETL)

Cadence 8,375 about 1 month ago Distributed, scalable, durable, and highly available orchestration engine developed by Uber
Dataform 860 about 1 month ago Dataform is a framework for managing SQL based operations in your data warehouse
Hevo Hevo is a Fully Automated, No-code Data Pipeline Platform that supports 150+ ready-to-use integrations across Databases, SaaS Applications, Cloud Storage, SDKs, and Streaming Services
Kiba ETL A data processing & ETL framework for Ruby
LinkedPipes ETL Linked Data publishing and consumption ETL tool
Pentaho Kettle A platform that delivers powerful ETL capabilities, using a groundbreaking, metadata-driven approach
Substation 332 about 1 month ago Substation is a cloud native data pipeline and transformation toolkit written in Go

Awesome Pipeline / Continuous Delivery workflows

Argo 15,155 about 1 month ago Get stuff done with container-native workflows for Kubernetes
CDS 4,620 about 1 month ago A pipeline based Continuous Delivery Service written in Golang

Awesome Pipeline / Build automation tools

Bazel Build software just as engineers do at Google
doit 1,893 7 months ago Highly generalized task-management and automation in Python
Gradle Unified cross-platform builds
Just 22,560 about 1 month ago Command and recipe runner similar to Make, built in Rust
Make The GNU Make build system
Prodmodel 58 over 2 years ago Build system for data science pipelines
Scons Python library focused on C/C++ builds
Shake 772 9 months ago Define robust build systems akin to GNU Make using Haskell

Awesome Pipeline / Automated workflow composition

APE 18 3 months ago A tool for the automated exploration of possible computational workflows based on semantic annotations

Awesome Pipeline / Other projects

HPC Grid Runner
NiFi Powerful and scalable directed graphs of data routing, transformation, and system mediation logic
noWorkflow 122 about 2 months ago Supporting infrastructure to run scientific experiments without a scientific workflow management system, and still get things like provenance
Reprozip Simplifies the process of creating reproducible experiments from command-line executions
Awesome streaming 2,720 2 months ago Curated list of awesome streaming frameworks, applications
Awesome ETL 3,304 6 months ago Curated list of notable ETL (extract, transform, load) frameworks, libraries and software
Awesome workflow engines 6,519 2 months ago Curated list of awesome open source workflow engines
Computational Data Analysis Workflow Systems 1,456 about 1 month ago
