butterfree

Feature pipeline builder

A Python library for building data pipelines to create and load features into a feature store using Apache Spark.

A tool for building feature stores.

GitHub

283 stars
186 watching
36 forks
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists

Tags: data-engineering, data-science, etl, etl-framework, feature-store, package, pyspark, python


Related projects:

Repository | Description | Stars
amphi-ai/amphi-etl | A Python-based ETL tool for data transformation and pipeline development with a low-code interface and native code generation. | 904
jazzband/django-pipeline | An asset packaging library for Django that simplifies CSS and JavaScript concatenation and compression. | 1,517
kubeflow-kale/kale | Simplifies the deployment of Kubeflow Pipelines workflows by providing a graphical interface for data scientists to define and deploy pipelines directly from JupyterLab. | 632
druths/xp | A tool for creating flexible and self-documenting data science pipelines. | 56
py-universe/django-rest-cli | A tool that speeds up the development of Django REST APIs by automating repetitive tasks. | 117
zorbash/opus | A framework for building pluggable business logic pipelines with a focus on modular and composable components. | 361
johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines. | 1
minyus/pipelinex | A Python package to build and experiment with machine learning pipelines using Kedro, MLflow, and other tools. | 224
pakoito/rxfunctions | A library for composing and chaining functions on Observables in RxJava to simplify complex data processing pipelines. | 49
jackqqwang/pfedhr | A Python project implementing a novel approach to high-performance feature learning and dimensionality reduction in deep neural networks. | 7
ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments. | 89
bytehub-ai/bytehub | A Python-based feature store library with a simple, scalable, and flexible architecture for storing and managing data for machine learning applications. | 58
datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines. | 57
quixio/quix-streams | A Python framework for real-time data processing on Apache Kafka streams. | 1,190
giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency. | 23