awesome-pipeline

Pipeline toolkit collection

A curated list of workflow toolkits and libraries for managing complex computational pipelines.

A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

GitHub

6k stars
231 watching
625 forks
last commit: about 1 month ago
Linked from 10 awesome lists

awesome-list, workflow

Awesome Pipeline / Pipeline frameworks & libraries

ActionChain A workflow system for simple linear success/failure workflows
Adage 55 almost 2 years ago Small package to describe workflows that are not completely known at definition time
AiiDA 436 15 days ago workflow manager with a strong focus on provenance, performance and extensibility
Airflow 37,120 6 days ago Python-based workflow system created by Airbnb
Anduril Component-based workflow framework for scientific data analysis
Antha High-level language for biology
Argo Workflows Container-native workflow engine for orchestrating parallel data processing, ML, or CI jobs on Kubernetes
Autosubmit An open source Python experiment and workflow manager used to manage complex workflows on Cloud and HPC platforms
AWE 69 about 4 years ago Workflow and resource management system with CWL support
Balsam 77 9 months ago Python-based high throughput task and workflow engine
Bds Scripting language for data pipelines
Beam Unified programming model for batch and streaming data-parallel processing pipelines
BioMake 103 about 1 year ago GNU-Make-like utility for managing builds and complex workflows
BioQueue 29 over 1 year ago Explicit framework with web monitoring and resource estimation
Bioshake 55 over 5 years ago Haskell DSL built on shake with strong typing and EDAM support
Bistro 47 6 months ago Library to build and execute typed scientific workflows
Bpipe 230 23 days ago Tool for running and managing bioinformatics pipelines
Briefly 105 about 6 years ago Python Meta-programming Library for Job Flow Control
Cluster Flow Command-line tool which uses common cluster managers to run bioinformatics pipelines
Clusterjob 19 8 months ago Automated reproducibility, and hassle-free submission of computational jobs to clusters
Compi Application framework for portable computational pipelines
Compss Programming model for distributed infrastructures
Conan2 3 over 10 years ago Light-weight workflow management application
Consecution 168 over 3 years ago A Python pipeline abstraction inspired by Apache Storm topologies
Cosmos Python library for massively parallel workflows
Couler 915 about 1 month ago Unified interface for constructing and managing workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow
Covalent 775 about 1 month ago Workflow orchestration toolkit for high-performance and quantum computing research and development
Cromwell 997 6 days ago Workflow Management System geared towards scientific workflows from the Broad Institute
Cuneiform 232 over 1 year ago Advanced functional workflow language and framework, implemented in Erlang
Cylc A workflow engine for cycling systems, originally developed for operational environmental forecasting
Dagobah 755 over 5 years ago Simple DAG-based job scheduler in Python
Dagr 69 over 2 years ago A Scala-based DSL and framework for writing and executing bioinformatics pipelines as Directed Acyclic Graphs
Dagster 11,699 6 days ago Python-based API for defining DAGs that interfaces with popular workflow managers for building data applications
DataJoint an open-source relational framework for scientific data pipelines
Dask 12,593 6 days ago Dask is a flexible parallel computing library for analytics
Dbt Framework for writing analytics workflows entirely in SQL. The T part of ETL, focuses on analytics engineering
Dockerflow 97 about 7 years ago Workflow runner that uses Dataflow to run a series of tasks in Docker
Drake 1,482 over 2 years ago Robust DSL akin to Make, implemented in Clojure
Drake R package 1,341 4 months ago Reproducibility and high-performance computing with an easy R-focused interface. Unrelated to the Clojure-based Drake listed above. Succeeded by Targets
Dray 383 almost 5 years ago An engine for managing the execution of container-based workflows
ecFlow 40 6 days ago Workflow manager
eHive 52 about 1 month ago System for creating and running pipelines on a distributed compute resource
Fission Workflows 371 over 1 year ago A fast, lightweight workflow engine for serverless/FaaS functions
Flex 56 almost 4 years ago Language agnostic framework for building flexible data science pipelines (Python/Shell/Gnuplot)
Flowr 84 over 3 years ago Robust and efficient workflows using a simple language agnostic approach (R package)
Gc3pie 44 almost 2 years ago Python libraries and tools for running applications on diverse Grids and clusters
Guix Workflow Language A workflow management language extension for GNU Guix
Gwf 31 about 1 month ago Make-like utility for submitting workflows via qsub
Hamilton 1,861 7 days ago A python micro-framework for describing dataflows; runs anywhere python runs
Hera 606 6 days ago Hera is an Argo Python SDK. Hera aims to make construction and submission of various Argo Project resources easy and accessible to everyone! Hera abstracts away low-level setup details while still maintaining a consistent vocabulary with Argo
HyperLoom 16 about 2 years ago Platform for defining and executing workflow pipelines in large-scale distributed environments
HyperQueue 278 4 days ago HPC-focused task scheduler that automatically assigns tasks to Slurm/PBS allocations and submits them for the user
Joblib Set of tools to provide lightweight pipelining in Python
Jug A task-based parallelization framework for Python
Kedro 10,004 6 days ago Workflow development tool that helps you build data pipelines
Kestra 12,971 4 days ago Open source data orchestration and scheduling platform with declarative syntax
Ketrew 77 almost 7 years ago Embedded DSL in the OCaml language alongside a client-server management application
Koheesio (https://github.com/Nike-Inc/koheesio) Python framework for building efficient data pipelines
Kronos 19 about 8 years ago Workflow assembler for cancer genome analytics and informatics
Kubeflow Pipelines Framework for building and deploying portable, scalable machine learning workflows using Docker containers and Argo Workflows
Loom 29 almost 5 years ago Tool for running bioinformatics workflows locally or in the cloud
Longbow Job proxying tool for biomolecular simulations
Luigi 17,869 9 days ago Python module that helps you build complex pipelines of batch jobs
Maestro 134 16 days ago YAML based HPC workflow execution tool
Makeflow Workflow engine for executing large complex workflows on clusters
makepipe 30 almost 2 years ago An R package which provides a set of simple tools for transforming an existing workflow into a self-documenting pipeline with very minimal upfront costs
Mara 2,081 11 months ago A lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow
Mario 139 almost 7 years ago Scala library for defining data pipelines
Martian A language and framework for developing and executing complex computational pipelines
MD Studio 12 almost 5 years ago Microservice based workflow engine
MetaFlow Open-sourced framework from Netflix, for DAG generation for data scientists. Python and R APIs
Mistral 288 6 days ago Python-based workflow engine by the OpenStack project
Moa 23 about 10 years ago Lightweight workflows in bioinformatics
Nextflow Flow-based computational toolkit for reproducible and scalable bioinformatics pipelines
nFlow 203 16 days ago Embeddable JVM-based workflow engine with high availability, fault tolerance, and support for multiple databases. Additional libraries are provided for visualization and REST API
NiPype 750 3 days ago Workflows and interfaces for neuroimaging packages
OpenGE 26 over 11 years ago Accelerated framework for manipulating and interpreting high-throughput sequencing data
Pachyderm Distributed and reproducible data pipelining and data management, built on the container ecosystem
Parsl Productive parallel programming, for creating parallel programs composed of Python functions and external components
PipeFunc 215 5 days ago Lightweight function pipeline (DAG) creation in pure Python for scientific workflows
PipEngine 20 about 7 years ago Ruby based launcher for complex biological pipelines
Pinball 1,047 almost 5 years ago Python based workflow engine by Pinterest
Popper 305 over 2 years ago YAML based container-native workflow engine supporting Docker, Singularity, Vagrant VMs with Docker daemon in VM, and local host
Porcupine 89 over 2 years ago Haskell workflow tool to express and compose tasks (optionally cached) whose datasources and sinks are known ahead of time and rebindable, and which can expose arbitrary sets of parameters to the outside world
Prefect Python based workflow engine powering Prefect
Pydra 120 3 days ago Lightweight, DAG-based Python dataflow engine for reproducible and scalable scientific pipelines
PyFlow 146 over 4 years ago Lightweight parallel task engine
pyperator 60 over 7 years ago Simple push-based python workflow framework using asyncio, supporting recursive networks
pyppl 103 3 months ago A python lightweight pipeline framework
pypyr Automation task-runner for sequential steps defined in a pipeline yaml, with AWS and Slack plug-ins
pytask 114 5 days ago A workflow management system that facilitates reproducible data analyses
Pwrake 57 almost 5 years ago Parallel workflow extension for Rake
Qdo Lightweight high-throughput queuing system for workflows with many small tasks to perform
Qsubsec 10 over 1 year ago Simple tokenised template system for SGE
Rabix 106 over 5 years ago Python-based workflow toolkit based on the Common Workflow Language and Docker
Rain 748 over 1 year ago Framework for large distributed task-based pipelines, written in Rust with Python API
Ray 33,994 6 days ago Flexible, high-performance distributed Python execution framework
Redun 522 3 months ago Yet another redundant workflow engine
Reflow 967 about 1 year ago Language and runtime for distributed, incremental data processing in the cloud
Remake 340 over 6 years ago Make-like declarative workflows in R
Rmake Wrapper for the creation of Makefiles, enabling massive parallelization
Rubra 38 over 9 years ago Pipeline system for bioinformatics workflows
Ruffus Computation Pipeline library for Python
Ruigi 42 over 5 years ago Pipeline tool for R, inspired by Luigi
Sake Self-documenting build automation tool
SciLuigi 334 almost 2 years ago Helper library for writing flexible scientific workflows in Luigi
SciPipe Library for writing Scientific Workflows in Go
Signac Lightweight, but scalable framework for file-driven workflows to be run locally and on HPC systems
Scoop 635 over 1 year ago Scalable Concurrent Operations in Python
Seqtools 48 7 months ago Python library for lazy evaluation of pipelined transformations on indexable containers
SmartPipeline 23 9 months ago A framework for rapid development of robust data pipelines following a simple design pattern
Snakemake Tool for running and managing bioinformatics pipelines
Spiff 1,695 about 1 month ago Based on the Workflow Patterns initiative and implemented in Python
Stolos 130 over 6 years ago Directed Acyclic Graph task dependency scheduler that simplifies distributed pipelines
Steppy 134 almost 6 years ago lightweight, open-source, Python 3 library for fast and reproducible experimentation. (This repository has been archived by the owner on Jun 22, 2022.)
Stpipe File processing pipelines as a Python library
StreamFlow 52 6 days ago Container native workflow management system focused on hybrid workflows
StreamPipes A self-service IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams
Sundial Jobsystem on AWS ECS or AWS Batch managing dependencies and scheduling
Suro 794 over 1 year ago Java-based distributed pipeline from Netflix
Swift Fast easy parallel scripting - on multicores, clusters, clouds and supercomputers
TAF 3 8 days ago R package to organize reproducible scientific workflows
Targets 940 3 days ago Dynamic, function-oriented Make-like reproducible pipelines at scale in R
TaskGraph 21 5 months ago A library to help manage complicated computational software pipelines consisting of long running individual tasks
Tibanna 70 4 months ago Tool that helps you run genomic pipelines on Amazon cloud
Toil 901 6 days ago Distributed pipeline workflow manager (mostly for genomics)
Yap Extensible parallel framework, written in Python using OpenMPI libraries
Yapp 61 about 2 years ago A C++ parallel pipeline library for stream processing
Wallaroo Framework for streaming data applications and algorithms that react to real-time events
WorldMake Easy Collaborative Reproducible Computing
Zenaton Workflow engine for orchestrating jobs, data and events across your applications and third party services
ZenML Extensible open-source MLOps framework to create reproducible pipelines for data scientists

Awesome Pipeline / Workflow platforms

ActivePapers Computational science made reproducible and publishable
Active Workflow 836 over 1 year ago Polyglot workflows without leaving the comfort of your technology stack
Anvi’o A community and framework centered around metagenomics, designed to facilitate reproducible exploration and visualization of data
Apache Airavata Framework for executing and managing computational workflows on distributed computing resources
Arteria Event-driven automation for sequencing centers. Initiates workflows based on events
Arvados A container based workflow platform
Biokepler Bioinformatics Scientific Workflow for Distributed Analysis of Large-Scale Biological Data (inactive since 10/2019)
Butler Framework for running scientific workflows on public and academic clouds
Chipster Open source platform for data analysis
Clubber Cluster Load Balancer for Bioinformatics e-Resources
Digdag Workflow manager designed for simplicity, extensibility and collaboration
Domino 149 4 months ago User friendly and open source visual workflow management platform
Fireworks 361 4 months ago Centralized workflow server for dynamic workflows of high-throughput computations
Flojoy 204 3 months ago Open source visual Python scripting for test, measurement, and robotics control
Flyte 5,785 3 days ago Container-native, type-safe workflow and pipelines platform for large scale processing and ML
Galaxy Powerful workflow system which can be used on the command line or with the GUI
Geoweaver 80 8 days ago In-browser tool for data processing workflows with high-performance server support, featuring code history and workflow orchestration
Kepler Kepler scientific workflow application from University of California
KNIME Analytics Platform General-purpose platform with many specialized domain extensions
Kubeflow Toolkit for making deployments of machine learning workflows on Kubernetes simple, portable and scalable
NextflowWorkbench Integrated development environment for Nextflow, Docker and Reusable Workflows
omega|ml DataOps Platform 95 9 days ago Data & model pipeline deployment for humans - integrated, scalable, extensible
OpenMOLE Workflow Management System for exploration of models and parameter optimization
Ophidia Data-analytics platform with declarative workflows of distributed operations
Orchest 4,079 over 1 year ago An IDE for Data Science
Pegasus Workflow Management System
Piper 489 over 1 year ago Distributed workflow engine designed to be dead simple
Polyaxon 3,571 7 days ago A platform for machine learning experimentation workflow
Reana 127 3 days ago Platform for reusable research data analyses developed by CERN
Sushi 24 3 days ago Supporting User for SHell script Integration
Yabi Online research environment for grid, HPC and cloud computing
Taverna Domain independent workflow system
Temporal Highly scalable developer oriented engine
Windmill 10,864 3 days ago Developer platform and workflow engine to turn scripts into internal tools
VisTrails Scientific workflow and provenance management system
Wings Semantic workflow system utilizing Pegasus as execution system
Watchdog 13 about 2 months ago Workflow management system for the automated and distributed analysis of large-scale experimental data
FlowHub FlowHub is a new workflow cloud platform

Awesome Pipeline / Workflow languages

Common Workflow Language 1,455 3 months ago
Cloudgene Workflow Language
OpenMOLE DSL
Workflow Description Language 776 about 2 months ago
Yet Another Workflow Language
Pipelines 374 about 5 years ago

Awesome Pipeline / Workflow standardization initiatives

Workflow 4 Ever Initiative
Workflow 4 Ever workflow research object model
Workflow Patterns Initiative
Workflow Patterns Library
ResearchObject.org

Awesome Pipeline / ETL & Data orchestration

DataLad git and git-annex based data version control system with lightweight provenance capture/re-execution support
DVC Data version control system for ML project with lightweight pipeline support
lakeFS 4,458 4 days ago Repeatable, atomic and versioned data lake on top of object storage
Nessie 1,038 6 days ago Provides Git-like capability & version control for Iceberg Tables, Delta Lake Tables & SQL Views

Awesome Pipeline / Literate programming (aka interactive notebooks)

Beaker Notebook-style development environment
Binder Turn a GitHub repo into a collection of interactive notebooks powered by Jupyter and Kubernetes
IPython A rich architecture for interactive computing
Jupyter Language-agnostic notebook literate programming environment
Org Mode GNU Emacs major mode for computational notebooks, literate programming, and much more
Pathomx Interactive data workflows built on Python
Polynote 4,538 7 days ago A better notebook for Scala (and more). Built by Netflix
Ploomber 3,510 2 months ago Consolidate your notebooks and scripts in a reproducible pipeline using a pipeline.yaml file
R Notebooks R Markdown notebook literate programming environment
RedPoint Notebooks Web-native computational notebook for programmers supporting multiple languages, APIs and webhooks
SoS Readable, interactive, cross-platform and cross-language data science workflow system
Zeppelin Web-based notebook that enables interactive data analytics

Awesome Pipeline / Extract, transform, load (ETL)

Cadence 8,319 3 days ago Distributed, scalable, durable, and highly available orchestration engine developed by Uber
Dataform 850 8 days ago Dataform is a framework for managing SQL based operations in your data warehouse
Hevo Hevo is a Fully Automated, No-code Data Pipeline Platform that supports 150+ ready-to-use integrations across Databases, SaaS Applications, Cloud Storage, SDKs, and Streaming Services
Kiba ETL A data processing & ETL framework for Ruby
LinkedPipes ETL Linked Data publishing and consumption ETL tool
Pentaho Kettle A platform that delivers powerful ETL capabilities, using a groundbreaking, metadata-driven approach
Substation 329 7 days ago Substation is a cloud native data pipeline and transformation toolkit written in Go

Awesome Pipeline / Continuous Delivery workflows

Argo 15,082 7 days ago Get stuff done with container-native workflows for Kubernetes
CDS 4,601 6 days ago A pipeline based Continuous Delivery Service written in Golang

Awesome Pipeline / Build automation tools

Bazel Build software just as engineers do at Google
doit 1,871 5 months ago Highly generalized task-management and automation in Python
Gradle Unified cross-platform builds
Just 21,421 9 days ago Command and recipe runner similar to Make, built in Rust
Make The GNU Make build system
Prodmodel 59 over 2 years ago Build system for data science pipelines
Scons Python library focused on C/C++ builds
Shake 773 7 months ago Define robust build systems akin to GNU Make using Haskell

Awesome Pipeline / Automated workflow composition

APE 17 about 1 month ago A tool for the automated exploration of possible computational workflows based on semantic annotations

Awesome Pipeline / Other projects

HPC Grid Runner
NiFi Powerful and scalable directed graphs of data routing, transformation, and system mediation logic
noWorkflow 120 6 days ago Supporting infrastructure to run scientific experiments without a scientific workflow management system, and still get things like provenance
Reprozip Simplifies the process of creating reproducible experiments from command-line executions
Awesome streaming 2,701 3 days ago Curated list of awesome streaming frameworks, applications
Awesome ETL 3,287 4 months ago Curated list of notable ETL (extract, transform, load) frameworks, libraries and software
Awesome workflow engines 6,440 5 days ago Curated list of awesome open source workflow engines
Computational Data Analysis Workflow Systems 1,455 3 months ago
