porcupine

Data pipeline tool

A tool for building data manipulation and analysis pipelines that are flexible, reusable, and reproducible across environments.

Express parametrable, composable and portable data pipelines

GitHub

89 stars
43 watching
11 forks
Language: Haskell
last commit: almost 3 years ago
Linked from 2 awesome lists

Topics: analytics, haskell, reproducible-research, workflows

Related projects:

Repository | Description | Stars
pdpipe/pdpipe | Provides a set of pre-defined data processing pipelines for pandas DataFrames | 718
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56
giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency | 25
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 184
yougov/mongo-connector | Enables real-time data synchronization between MongoDB and other systems | 1,881
databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments | 901
olirice/flupy | A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of memory | 193
pwwang/pipen | A Python-based workflow automation framework that enables easy creation of data processing pipelines | 105
huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,103
picanumber/yapp | Parallel pipeline library for stream processing | 62
streamsets/datacollector-oss | A continuous big data ingestion platform that enables easy creation of data pipelines for various data sources and destinations | 90
prodmodel/prodmodel | A tool for managing data science pipelines by automating build, testing, and deployment processes while ensuring correctness and performance | 58
mehd-io/pypi-duck-flow | A data engineering project that extracts insights from Python projects using DuckDB and MotherDuck | 173
nazar256/parapipe | A library that provides a concurrent, non-blocking buffered pipeline for structuring and scaling applications | 33