dataform

Data pipeline framework

A framework for managing data operations in BigQuery using SQL and software engineering best practices

Dataform is a framework for managing SQL-based data operations in BigQuery.


856 stars
27 watching
166 forks
Language: TypeScript
last commit: 2 days ago
Linked from 1 awesome list

analytics, business-intelligence, data-engineering, data-pipelines, elt, etl, hacktoberfest


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| sitecore/data-exchange-framework-docs | A documentation project for an ETL tool used in Sitecore to exchange and process data | 1 |
| vectaport/flowgraph | A software framework for building scalable, asynchronous data pipelines with explicit back-pressure management and logging capabilities | 60 |
| huggingface/datatrove | A platform-agnostic data processing framework for large-scale text data pipelines | 2,073 |
| googlecloudplatform/dataflowtemplates | A collection of pre-implemented data pipelines using Google Cloud Dataflow and Apache Beam | 1,164 |
| log2timeline/dftimewolf | A framework for orchestrating data collection, processing, and export | 296 |
| giacbrd/smartpipeline | A framework for designing and executing concurrent data pipelines with a focus on simplicity and efficiency | 23 |
| dagworks-inc/hamilton | Helps define and manage data transformations with a modular, self-documenting, and portable framework for directed acyclic graphs (DAGs) of data transformations | 1,884 |
| dataformsjs/dataformsjs | A minimal JavaScript framework for rapid development of high-quality websites and single-page applications using JSX, Web Components, and templating engines | 191 |
| ph200/cycle-react | An RxJS-based framework for building functional React applications with controlled data flow | 370 |
| microsoft/chart-parts | A React-based data visualization framework that abstracts away common charting complexities | 608 |
| raftlib/raftlib | A C++ library providing a framework for implementing parallel and concurrent data processing pipelines | 953 |
| jexia/semaphore | Builds high-performance data flows that can be exposed through multiple protocols and integrates with existing systems | 94 |
| galaxyproject/galaxy | A platform for data-intensive scientific analysis and workflow management | 1,416 |
| biocorecrg/bionextflow | A collection of reusable modules and sub-workflows for Nextflow pipelines in bioinformatics | 26 |
| datacrypt-project/hitchhiker-tree | A data structure and application framework for building fast, persistent, and scalable databases | 1,190 |