awesome-pipeline
Pipeline toolkit collection
A curated list of workflow toolkits and libraries for managing complex computational pipelines.
A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
6k stars
231 watching
625 forks
last commit: about 1 month ago
Linked from 10 awesome lists
awesome-list, workflow
Awesome Pipeline / Pipeline frameworks & libraries | |||
ActionChain | A workflow system for simple linear success/failure workflows | ||
Adage | 55 | almost 2 years ago | Small package to describe workflows that are not completely known at definition time |
AiiDA | 436 | 15 days ago | Workflow manager with a strong focus on provenance, performance and extensibility |
Airflow | 37,120 | 6 days ago | Python-based workflow system created by Airbnb |
Anduril | Component-based workflow framework for scientific data analysis | ||
Antha | High-level language for biology | ||
Argo Workflows | Container-native workflow engine for orchestrating parallel data processing, ML, or CI jobs on Kubernetes | ||
Autosubmit | An open source Python experiment and workflow manager used to manage complex workflows on Cloud and HPC platforms | ||
AWE | 69 | about 4 years ago | Workflow and resource management system with CWL support |
Balsam | 77 | 9 months ago | Python-based high throughput task and workflow engine |
Bds | Scripting language for data pipelines | ||
Beam | Unified programming model for batch and streaming data-parallel processing pipelines | ||
BioMake | 103 | about 1 year ago | GNU-Make-like utility for managing builds and complex workflows |
BioQueue | 29 | over 1 year ago | Explicit framework with web monitoring and resource estimation |
Bioshake | 55 | over 5 years ago | Haskell DSL built on shake with strong typing and EDAM support |
Bistro | 47 | 6 months ago | Library to build and execute typed scientific workflows |
Bpipe | 230 | 23 days ago | Tool for running and managing bioinformatics pipelines |
Briefly | 105 | about 6 years ago | Python Meta-programming Library for Job Flow Control |
Cluster Flow | Command-line tool which uses common cluster managers to run bioinformatics pipelines | ||
Clusterjob | 19 | 8 months ago | Automated reproducibility and hassle-free submission of computational jobs to clusters |
Compi | Application framework for portable computational pipelines | ||
Compss | Programming model for distributed infrastructures | ||
Conan2 | 3 | over 10 years ago | Light-weight workflow management application |
Consecution | 168 | over 3 years ago | A Python pipeline abstraction inspired by Apache Storm topologies |
Cosmos | Python library for massively parallel workflows | ||
Couler | 915 | about 1 month ago | Unified interface for constructing and managing workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow |
Covalent | 775 | about 1 month ago | Workflow orchestration toolkit for high-performance and quantum computing research and development |
Cromwell | 997 | 6 days ago | Workflow Management System geared towards scientific workflows from the Broad Institute |
Cuneiform | 232 | over 1 year ago | Advanced functional workflow language and framework, implemented in Erlang |
Cylc | A workflow engine for cycling systems, originally developed for operational environmental forecasting | ||
Dagobah | 755 | over 5 years ago | Simple DAG-based job scheduler in Python |
Dagr | 69 | over 2 years ago | A Scala-based DSL and framework for writing and executing bioinformatics pipelines as Directed Acyclic Graphs |
Dagster | 11,699 | 6 days ago | Python-based API for defining DAGs that interfaces with popular workflow managers for building data applications |
DataJoint | An open-source relational framework for scientific data pipelines | ||
Dask | 12,593 | 6 days ago | Dask is a flexible parallel computing library for analytics |
Dbt | Framework for writing analytics workflows entirely in SQL. Covers the T part of ETL, with a focus on analytics engineering | ||
Dockerflow | 97 | about 7 years ago | Workflow runner that uses Dataflow to run a series of tasks in Docker |
Drake | 1,482 | over 2 years ago | Robust DSL akin to Make, implemented in Clojure |
Drake R package | 1,341 | 4 months ago | Reproducibility and high-performance computing with an easy R-focused interface. Unrelated to the Clojure-based Drake listed above. Succeeded by Targets |
Dray | 383 | almost 5 years ago | An engine for managing the execution of container-based workflows |
ecFlow | 40 | 6 days ago | Workflow manager |
eHive | 52 | about 1 month ago | System for creating and running pipelines on a distributed compute resource |
Fission Workflows | 371 | over 1 year ago | A fast, lightweight workflow engine for serverless/FaaS functions |
Flex | 56 | almost 4 years ago | Language agnostic framework for building flexible data science pipelines (Python/Shell/Gnuplot) |
Flowr | 84 | over 3 years ago | Robust and efficient workflows using a simple language agnostic approach (R package) |
Gc3pie | 44 | almost 2 years ago | Python libraries and tools for running applications on diverse Grids and clusters |
Guix Workflow Language | A workflow management language extension for GNU Guix | ||
Gwf | 31 | about 1 month ago | Make-like utility for submitting workflows via qsub |
Hamilton | 1,861 | 7 days ago | A Python micro-framework for describing dataflows; runs anywhere Python runs |
Hera | 606 | 6 days ago | Hera is an Argo Python SDK. Hera aims to make construction and submission of various Argo Project resources easy and accessible to everyone! Hera abstracts away low-level setup details while still maintaining a consistent vocabulary with Argo |
HyperLoom | 16 | about 2 years ago | Platform for defining and executing workflow pipelines in large-scale distributed environments |
HyperQueue | 278 | 4 days ago | HPC-focused task scheduler that automatically assigns tasks to Slurm/PBS allocations and submits them for the user |
Joblib | Set of tools to provide lightweight pipelining in Python | ||
Jug | A task-based parallelization framework for Python | ||
Kedro | 10,004 | 6 days ago | Workflow development tool that helps you build data pipelines |
Kestra | 12,971 | 4 days ago | Open source data orchestration and scheduling platform with declarative syntax |
Ketrew | 77 | almost 7 years ago | Embedded DSL in the OCaml language alongside a client-server management application |
Koheesio (https://github.com/Nike-Inc/koheesio) | Python framework for building efficient data pipelines | ||
Kronos | 19 | about 8 years ago | Workflow assembler for cancer genome analytics and informatics |
Kubeflow Pipelines | Framework for building and deploying portable, scalable machine learning workflows using Docker containers and Argo Workflows | ||
Loom | 29 | almost 5 years ago | Tool for running bioinformatics workflows locally or in the cloud |
Longbow | Job proxying tool for biomolecular simulations | ||
Luigi | 17,869 | 9 days ago | Python module that helps you build complex pipelines of batch jobs |
Maestro | 134 | 16 days ago | YAML based HPC workflow execution tool |
Makeflow | Workflow engine for executing large complex workflows on clusters | ||
makepipe | 30 | almost 2 years ago | An R package which provides a set of simple tools for transforming an existing workflow into a self-documenting pipeline with very minimal upfront costs |
Mara | 2,081 | 11 months ago | A lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow |
Mario | 139 | almost 7 years ago | Scala library for defining data pipelines |
Martian | A language and framework for developing and executing complex computational pipelines | ||
MD Studio | 12 | almost 5 years ago | Microservice based workflow engine |
MetaFlow | Open-sourced framework from Netflix, for DAG generation for data scientists. Python and R APIs | ||
Mistral | 288 | 6 days ago | Python-based workflow engine by the OpenStack project |
Moa | 23 | about 10 years ago | Lightweight workflows in bioinformatics |
Nextflow | Flow-based computational toolkit for reproducible and scalable bioinformatics pipelines | ||
nFlow | 203 | 16 days ago | Embeddable JVM-based workflow engine with high availability, fault tolerance, and support for multiple databases. Additional libraries are provided for visualization and REST API |
NiPype | 750 | 3 days ago | Workflows and interfaces for neuroimaging packages |
OpenGE | 26 | over 11 years ago | Accelerated framework for manipulating and interpreting high-throughput sequencing data |
Pachyderm | Distributed and reproducible data pipelining and data management, built on the container ecosystem | ||
Parsl | Productive parallel programming, for creating parallel programs composed of Python functions and external components | ||
PipeFunc | 215 | 5 days ago | Lightweight function pipeline (DAG) creation in pure Python for scientific workflows |
PipEngine | 20 | about 7 years ago | Ruby based launcher for complex biological pipelines |
Pinball | 1,047 | almost 5 years ago | Python based workflow engine by Pinterest |
Popper | 305 | over 2 years ago | YAML based container-native workflow engine supporting Docker, Singularity, Vagrant VMs with Docker daemon in VM, and local host |
Porcupine | 89 | over 2 years ago | Haskell workflow tool to express and compose tasks (optionally cached) whose datasources and sinks are known ahead of time and rebindable, and which can expose arbitrary sets of parameters to the outside world |
Prefect | Python based workflow engine powering Prefect | ||
Pydra | 120 | 3 days ago | Lightweight, DAG-based Python dataflow engine for reproducible and scalable scientific pipelines |
PyFlow | 146 | over 4 years ago | Lightweight parallel task engine |
pyperator | 60 | over 7 years ago | Simple push-based Python workflow framework using asyncio, supporting recursive networks |
pyppl | 103 | 3 months ago | A lightweight Python pipeline framework |
pypyr | Automation task-runner for sequential steps defined in a pipeline YAML, with AWS and Slack plug-ins | ||
pytask | 114 | 5 days ago | A workflow management system that facilitates reproducible data analyses |
Pwrake | 57 | almost 5 years ago | Parallel workflow extension for Rake |
Qdo | Lightweight high-throughput queuing system for workflows with many small tasks to perform | ||
Qsubsec | 10 | over 1 year ago | Simple tokenised template system for SGE |
Rabix | 106 | over 5 years ago | Python-based workflow toolkit based on the Common Workflow Language and Docker |
Rain | 748 | over 1 year ago | Framework for large distributed task-based pipelines, written in Rust with Python API |
Ray | 33,994 | 6 days ago | Flexible, high-performance distributed Python execution framework |
Redun | 522 | 3 months ago | Yet another redundant workflow engine |
Reflow | 967 | about 1 year ago | Language and runtime for distributed, incremental data processing in the cloud |
Remake | 340 | over 6 years ago | Make-like declarative workflows in R |
Rmake | Wrapper for the creation of Makefiles, enabling massive parallelization | ||
Rubra | 38 | over 9 years ago | Pipeline system for bioinformatics workflows |
Ruffus | Computation Pipeline library for Python | ||
Ruigi | 42 | over 5 years ago | Pipeline tool for R, inspired by Luigi |
Sake | Self-documenting build automation tool | ||
SciLuigi | 334 | almost 2 years ago | Helper library for writing flexible scientific workflows in Luigi |
SciPipe | Library for writing Scientific Workflows in Go | ||
Signac | Lightweight, but scalable framework for file-driven workflows to be run locally and on HPC systems | ||
Scoop | 635 | over 1 year ago | Scalable Concurrent Operations in Python |
Seqtools | 48 | 7 months ago | Python library for lazy evaluation of pipelined transformations on indexable containers |
SmartPipeline | 23 | 9 months ago | A framework for rapid development of robust data pipelines following a simple design pattern |
Snakemake | Tool for running and managing bioinformatics pipelines | ||
Spiff | 1,695 | about 1 month ago | Based on the Workflow Patterns initiative and implemented in Python |
Stolos | 130 | over 6 years ago | Directed Acyclic Graph task dependency scheduler that simplifies distributed pipelines |
Steppy | 134 | almost 6 years ago | Lightweight, open-source, Python 3 library for fast and reproducible experimentation. (This repository has been archived by the owner on Jun 22, 2022.) |
Stpipe | File processing pipelines as a Python library | ||
StreamFlow | 52 | 6 days ago | Container native workflow management system focused on hybrid workflows |
StreamPipes | A self-service IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams | ||
Sundial | Job system on AWS ECS or AWS Batch managing dependencies and scheduling | ||
Suro | 794 | over 1 year ago | Java-based distributed pipeline from Netflix |
Swift | Fast easy parallel scripting - on multicores, clusters, clouds and supercomputers | ||
TAF | 3 | 8 days ago | R package to organize reproducible scientific workflows |
Targets | 940 | 3 days ago | Dynamic, function-oriented, Make-like reproducible pipelines at scale in R |
TaskGraph | 21 | 5 months ago | A library to help manage complicated computational software pipelines consisting of long running individual tasks |
Tibanna | 70 | 4 months ago | Tool that helps you run genomic pipelines on Amazon cloud |
Toil | 901 | 6 days ago | Distributed pipeline workflow manager (mostly for genomics) |
Yap | Extensible parallel framework, written in Python using OpenMPI libraries | ||
Yapp | 61 | about 2 years ago | A C++ parallel pipeline library for stream processing |
Wallaroo | Framework for streaming data applications and algorithms that react to real-time events | ||
WorldMake | Easy Collaborative Reproducible Computing | ||
Zenaton | Workflow engine for orchestrating jobs, data and events across your applications and third party services | ||
ZenML | Extensible open-source MLOps framework to create reproducible pipelines for data scientists | ||
Awesome Pipeline / Workflow platforms | |||
ActivePapers | Computational science made reproducible and publishable | ||
Active Workflow | 836 | over 1 year ago | Polyglot workflows without leaving the comfort of your technology stack |
Anvi’o | A community and framework centered around metagenomics, designed to facilitate reproducible exploration and visualization of data | ||
Apache Airavata | Framework for executing and managing computational workflows on distributed computing resources | ||
Arteria | Event-driven automation for sequencing centers. Initiates workflows based on events | ||
Arvados | A container based workflow platform | ||
Biokepler | Bioinformatics Scientific Workflow for Distributed Analysis of Large-Scale Biological Data (inactive since 10/2019) | ||
Butler | Framework for running scientific workflows on public and academic clouds | ||
Chipster | Open source platform for data analysis | ||
Clubber | Cluster Load Balancer for Bioinformatics e-Resources | ||
Digdag | Workflow manager designed for simplicity, extensibility and collaboration | ||
Domino | 149 | 4 months ago | User friendly and open source visual workflow management platform |
Fireworks | 361 | 4 months ago | Centralized workflow server for dynamic workflows of high-throughput computations |
Flojoy | 204 | 3 months ago | Open source visual Python scripting for test, measurement, and robotics control |
Flyte | 5,785 | 3 days ago | Container-native, type-safe workflow and pipelines platform for large scale processing and ML |
Galaxy | Powerful workflow system which can be used on the command line or with the GUI | ||
Geoweaver | 80 | 8 days ago | In-browser tool for data processing workflows with high-performance server support, featuring code history and workflow orchestration |
Kepler | Kepler scientific workflow application from University of California | ||
KNIME Analytics Platform | General-purpose platform with many specialized domain extensions | ||
Kubeflow | Toolkit for making deployments of machine learning workflows on Kubernetes simple, portable and scalable | ||
NextflowWorkbench | Integrated development environment for Nextflow, Docker and Reusable Workflows | ||
omega|ml DataOps Platform | 95 | 9 days ago | Data & model pipeline deployment for humans - integrated, scalable, extensible |
OpenMOLE | Workflow Management System for exploration of models and parameter optimization | ||
Ophidia | Data-analytics platform with declarative workflows of distributed operations | ||
Orchest | 4,079 | over 1 year ago | An IDE for Data Science |
Pegasus | Workflow Management System | ||
Piper | 489 | over 1 year ago | Distributed workflow engine designed to be dead simple |
Polyaxon | 3,571 | 7 days ago | A platform for machine learning experimentation workflow |
Reana | 127 | 3 days ago | Platform for reusable research data analyses developed by CERN |
Sushi | 24 | 3 days ago | Supporting User for SHell script Integration |
Yabi | Online research environment for grid, HPC and cloud computing | ||
Taverna | Domain independent workflow system | ||
Temporal | Highly scalable developer oriented engine | ||
Windmill | 10,864 | 3 days ago | Developer platform and workflow engine to turn scripts into internal tools |
VisTrails | Scientific workflow and provenance management system | ||
Wings | Semantic workflow system utilizing Pegasus as execution system | ||
Watchdog | 13 | about 2 months ago | Workflow management system for the automated and distributed analysis of large-scale experimental data |
FlowHub | FlowHub is a new workflow cloud platform | ||
Awesome Pipeline / Workflow languages | |||
Common Workflow Language | 1,455 | 3 months ago | |
Cloudgene Workflow Language | |||
OpenMOLE DSL | |||
Workflow Description Language | 776 | about 2 months ago | |
Yet Another Workflow Language | |||
Pipelines | 374 | about 5 years ago | |
Awesome Pipeline / Workflow standardization initiatives | |||
Workflow 4 Ever Initiative | |||
Workflow 4 Ever workflow research object model | |||
Workflow Patterns Initiative | |||
Workflow Patterns Library | |||
ResearchObject.org | |||
Awesome Pipeline / ETL & Data orchestration | |||
DataLad | Git and git-annex based data version control system with lightweight provenance capture/re-execution support | ||
DVC | Data version control system for ML projects with lightweight pipeline support | ||
lakeFS | 4,458 | 4 days ago | Repeatable, atomic and versioned data lake on top of object storage |
Nessie | 1,038 | 6 days ago | Provides Git-like capability & version control for Iceberg Tables, Delta Lake Tables & SQL Views |
Awesome Pipeline / Literate programming (aka interactive notebooks) | |||
Beaker | Notebook-style development environment | ||
Binder | Turn a GitHub repo into a collection of interactive notebooks powered by Jupyter and Kubernetes | ||
IPython | A rich architecture for interactive computing | ||
Jupyter | Language-agnostic notebook literate programming environment | ||
Org Mode | GNU Emacs major mode for computational notebooks, literate programming, and much more | ||
Pathomx | Interactive data workflows built on Python | ||
Polynote | 4,538 | 7 days ago | A better notebook for Scala (and more). Built by Netflix |
Ploomber | 3,510 | 2 months ago | Consolidate your notebooks and scripts in a reproducible pipeline using a file |
R Notebooks | R Markdown notebook literate programming environment | ||
RedPoint Notebooks | Web-native computational notebook for programmers supporting multiple languages, APIs and webhooks | ||
SoS | Readable, interactive, cross-platform and cross-language data science workflow system | ||
Zeppelin | Web-based notebook that enables interactive data analytics | ||
Awesome Pipeline / Extract, transform, load (ETL) | |||
Cadence | 8,319 | 3 days ago | Distributed, scalable, durable, and highly available orchestration engine developed by Uber |
Dataform | 850 | 8 days ago | Dataform is a framework for managing SQL based operations in your data warehouse |
Hevo | Hevo is a Fully Automated, No-code Data Pipeline Platform that supports 150+ ready-to-use integrations across Databases, SaaS Applications, Cloud Storage, SDKs, and Streaming Services | ||
Kiba ETL | A data processing & ETL framework for Ruby | ||
LinkedPipes ETL | Linked Data publishing and consumption ETL tool | ||
Pentaho Kettle | A platform that delivers powerful ETL capabilities, using a groundbreaking, metadata-driven approach | ||
Substation | 329 | 7 days ago | Substation is a cloud native data pipeline and transformation toolkit written in Go |
Awesome Pipeline / Continuous Delivery workflows | |||
Argo | 15,082 | 7 days ago | Get stuff done with container-native workflows for Kubernetes |
CDS | 4,601 | 6 days ago | A pipeline based Continuous Delivery Service written in Golang |
Awesome Pipeline / Build automation tools | |||
Bazel | Build software just as engineers do at Google | ||
doit | 1,871 | 5 months ago | Highly generalized task-management and automation in Python |
Gradle | Unified cross-platform builds | ||
Just | 21,421 | 9 days ago | Command and recipe runner similar to Make, built in Rust |
Make | The GNU Make build system | ||
Prodmodel | 59 | over 2 years ago | Build system for data science pipelines |
Scons | Python library focused on C/C++ builds | ||
Shake | 773 | 7 months ago | Define robust build systems akin to GNU Make using Haskell |
Awesome Pipeline / Automated workflow composition | |||
APE | 17 | about 1 month ago | A tool for the automated exploration of possible computational workflows based on semantic annotations |
Awesome Pipeline / Other projects | |||
HPC Grid Runner | |||
NiFi | Powerful and scalable directed graphs of data routing, transformation, and system mediation logic | ||
noWorkflow | 120 | 6 days ago | Supporting infrastructure to run scientific experiments without a scientific workflow management system, and still get things like provenance |
Reprozip | Simplifies the process of creating reproducible experiments from command-line executions | ||
Awesome Pipeline / Related lists | |||
Awesome streaming | 2,701 | 3 days ago | Curated list of awesome streaming frameworks and applications |
Awesome ETL | 3,287 | 4 months ago | Curated list of notable ETL (extract, transform, load) frameworks, libraries and software |
Awesome workflow engines | 6,440 | 5 days ago | Curated list of awesome open source workflow engines |
Computational Data Analysis Workflow Systems | 1,455 | 3 months ago |