pypi-duck-flow
Data pipeline
A data engineering project that extracts insights from Python projects using DuckDB and MotherDuck.
An end-to-end data engineering project that derives insights from PyPI using Python, DuckDB, MotherDuck, and Evidence.
173 stars
6 watching
27 forks
Language: Python
Last commit: about 2 months ago
Linked from 1 awesome list
dataengineering, duckdb, etl, python
Related projects:
Repository | Description | Stars |
---|---|---|
mehd-io/duckdb-extension-radar | A repository that tracks DuckDB extensions on GitHub, listing each extension's creation date and last-updated date. | 84 |
amphi-ai/amphi-etl | A tool that enables data analysts to create and manage data pipelines with an intuitive interface, generating Python code for deployment anywhere. | 933 |
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56 |
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89 |
giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency | 25 |
markroddy/duckdb-pytables | An extension for DuckDB that allows running SQL queries on arbitrary data sources using Python functions. | 84 |
kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 184 |
olirice/flupy | A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of memory | 193 |
pdpipe/pdpipe | Provides a set of pre-defined data processing pipelines for pandas DataFrames. | 718 |
minyus/pipelinex | A Python package to build and experiment with machine learning pipelines using Kedro, MLflow, and other tools | 226 |
nipype/pydra | A lightweight Python dataflow engine for building and executing directed acyclic graphs (DAGs) in a scalable manner. | 123 |
sebdah/scrapy-mongodb | A MongoDB pipeline extension for Scrapy spiders that enables real-time data insertion and buffering options. | 357 |
databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments. | 901 |
man-group/mdf | A toolkit for expressing programs as directed acyclic graphs and wiring together computations over time-series data. | 169 |