pypi-duck-flow
Data pipeline
A project to build data pipelines and visualizations for analyzing Python package download data from PyPI.
An end-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence.
148 stars
5 watching
24 forks
Language: Python
Last commit: 10 days ago
Linked from 1 awesome list
Topics: dataengineering, duckdb, etl, python
Related projects:
Repository | Description | Stars
---|---|---
mehd-io/duckdb-extension-radar | Provides information about DuckDB extensions found on GitHub. | 82
amphi-ai/amphi-etl | A Python-based ETL tool for data transformation and pipeline development with a low-code interface and native code generation. | 904
druths/xp | A tool for creating flexible and self-documenting data science pipelines. | 56
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines. | 1
ypares/porcupine | A tool that makes data manipulation and analysis pipelines flexible, reusable, and reproducible across different environments. | 89
giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency. | 23
markroddy/duckdb-pytables | An extension for DuckDB that allows running SQL queries on arbitrary data sources using Python functions. | 83
kevin-hanselman/dud | A lightweight tool for managing and versioning large datasets alongside source code in data pipelines. | 183
olirice/flupy | A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of data in memory. | 193
pdpipe/pdpipe | A tool for creating and managing data pipelines with pandas DataFrames. | 716
minyus/pipelinex | A Python package for building and experimenting with machine learning pipelines using Kedro, MLflow, and other tools. | 224
nipype/pydra | A lightweight Python dataflow engine for building and executing directed acyclic graphs (DAGs) in a scalable manner. | 120
sebdah/scrapy-mongodb | A MongoDB pipeline extension for Scrapy spiders that supports real-time data insertion with optional buffering. | 357
databiosphere/toil | A workflow management system designed to run pipelines efficiently in various environments. | 901
man-group/mdf | A toolkit for expressing programs as directed acyclic graphs and wiring together computations over time-series data. | 169 |