porcupine

Data pipeline tool

A tool that makes data manipulation and analysis pipelines flexible, reusable, and reproducible across different environments.

Express parameterizable, composable, and portable data pipelines

GitHub

89 stars
43 watching
11 forks
Language: Haskell
Last commit: over 2 years ago
Linked from 2 awesome lists

Tags: analytics, haskell, reproducible-research, workflows

Related projects:

| Repository | Description | Stars |
|---|---|---|
| pdpipe/pdpipe | A tool for creating and managing data pipelines with pandas DataFrames | 716 |
| druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56 |
| giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency | 23 |
| johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
| kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 183 |
| yougov/mongo-connector | Enables real-time data synchronization between MongoDB and other systems | 1,880 |
| databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments | 901 |
| olirice/flupy | A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of memory | 193 |
| pwwang/pipen | A Python-based workflow automation framework that enables easy creation of data processing pipelines | 103 |
| huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,043 |
| picanumber/yapp | Parallel pipeline library for stream processing | 61 |
| streamsets/datacollector-oss | A continuous big data ingestion platform that enables easy creation of data pipelines for various data sources and destinations | 90 |
| prodmodel/prodmodel | A tool for managing data science pipelines by automating build, testing, and deployment processes while ensuring correctness and performance | 59 |
| mehd-io/pypi-duck-flow | A project to build data pipelines and visualizations for analyzing Python package download data from PyPI | 148 |
| nazar256/parapipe | A non-blocking buffered pipeline library that allows concurrent processing of data while maintaining output order without locks or mutexes | 31 |