pypi-duck-flow

Data pipeline

A data engineering project that extracts insights about Python packages on PyPI using DuckDB and MotherDuck.

End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence

GitHub

173 stars

6 watching

27 forks

Language: Python

last commit: over 1 year ago

Linked from 1 awesome list

dataengineering, duckdb, etl, python

duckdbstats.com/

Backlinks from these awesome lists:

davidgasquez/awesome-duckdb

Related projects:

Repository	Description	Stars
mehd-io/duckdb-extension-radar	A repository tracking DuckDB extensions on GitHub, listing each extension's creation date and last-updated date.	84
amphi-ai/amphi-etl	A tool that enables data analysts to create and manage data pipelines with an intuitive interface, generating Python code for deployment anywhere.	933
druths/xp	A tool for creating flexible and self-documenting data science pipelines	56
johnsonc/lambdo	A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines	1
ypares/porcupine	A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments	89
giacbrd/smartpipeline	A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency	25
markroddy/duckdb-pytables	An extension for DuckDB that allows running SQL queries on arbitrary data sources using Python functions.	84
kevin-hanselman/dud	A lightweight tool for managing and versioning large data alongside source code in data pipelines	184
olirice/flupy	A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of memory	193
pdpipe/pdpipe	Provides a set of pre-defined data processing pipelines for pandas DataFrames.	718
minyus/pipelinex	A Python package to build and experiment with machine learning pipelines using Kedro, MLflow, and other tools	226
nipype/pydra	A lightweight Python dataflow engine for building and executing directed acyclic graphs (DAGs) in a scalable manner.	123
sebdah/scrapy-mongodb	A MongoDB pipeline extension for Scrapy spiders that enables real-time data insertion and buffering options.	357
databiosphere/toil	A workflow management system designed to efficiently run pipelines in various environments.	901
man-group/mdf	A toolkit for expressing programs as directed acyclic graphs and wiring together computations over time-series data.	169