porcupine
Data pipeline tool
A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments
Express parameterizable, composable and portable data pipelines
89 stars
43 watching
11 forks
Language: Haskell
Last commit: over 2 years ago
Linked from 2 awesome lists
analytics, haskell, reproducible-research, workflows
Related projects:
Repository | Description | Stars |
---|---|---|
pdpipe/pdpipe | A tool for creating and managing data pipelines with pandas DataFrames | 716 |
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56 |
giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency | 23 |
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 183 |
yougov/mongo-connector | Enables real-time data synchronization between MongoDB and other systems | 1,880 |
databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments | 901 |
olirice/flupy | A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of memory | 193 |
pwwang/pipen | A Python-based workflow automation framework that enables easy creation of data processing pipelines | 103 |
huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,043 |
picanumber/yapp | Parallel pipeline library for stream processing | 61 |
streamsets/datacollector-oss | A continuous big data ingestion platform that enables easy creation of data pipelines for various data sources and destinations | 90 |
prodmodel/prodmodel | A tool for managing data science pipelines by automating build, testing, and deployment processes while ensuring correctness and performance | 59 |
mehd-io/pypi-duck-flow | A project to build data pipelines and visualizations for analyzing Python package download data from PyPI | 148 |
nazar256/parapipe | A non-blocking buffered pipeline library that allows concurrent processing of data while maintaining output order without locks or mutexes | 31 |