pypi-duck-flow

Data pipeline

A project that builds data pipelines and visualizations for analyzing Python package download data from PyPI.

End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck, and Evidence.

GitHub

148 stars
5 watching
24 forks
Language: Python
Last commit: 10 days ago
Linked from 1 awesome list

dataengineering duckdb etl python

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| mehd-io/duckdb-extension-radar | Provides information about DuckDB extensions found on GitHub. | 82 |
| amphi-ai/amphi-etl | A Python-based ETL tool for data transformation and pipeline development with low-code interface and native code generation. | 904 |
| druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56 |
| johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
| ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89 |
| giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency | 23 |
| markroddy/duckdb-pytables | An extension for DuckDB that allows running SQL queries on arbitrary data sources using Python functions. | 83 |
| kevin-hanselman/dud | A lightweight tool for managing and versioning large data alongside source code in data pipelines | 183 |
| olirice/flupy | A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of memory | 193 |
| pdpipe/pdpipe | A tool for creating and managing data pipelines with pandas DataFrames | 716 |
| minyus/pipelinex | A Python package to build and experiment with machine learning pipelines using Kedro, MLflow, and other tools | 224 |
| nipype/pydra | A lightweight Python dataflow engine for building and executing directed acyclic graphs (DAGs) in a scalable manner. | 120 |
| sebdah/scrapy-mongodb | A MongoDB pipeline extension for Scrapy spiders that enables real-time data insertion and buffering options. | 357 |
| databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments. | 901 |
| man-group/mdf | A toolkit for expressing programs as directed acyclic graphs and wiring together computations over time-series data. | 169 |