capillaries

Data processor

A distributed batch data processing framework that enables scalable and reliable data transformation, filtering, and aggregation.

61 stars
0 watching
2 forks
Language: Go
Last commit: about 2 months ago
Linked from 1 awesome list

Tags: batch-processing, cassandra, dag, distributed-computing, distributed-systems, go, golang, rabbitmq, relational-algebra, workflow-engine, workflows

Related projects:

| Repository | Description | Stars |
|---|---|---|
| apache/datafusion-python | A Python library that provides a data processing and querying framework using the Apache Arrow in-memory query engine. | 375 |
| cmassiot/upipe | A dataflow framework designed to process multimedia data buffers in a flexible and modular way. | 1 |
| quixio/quix-streams | A Python framework for real-time data processing on Apache Kafka streams | 1,190 |
| whitaker-io/machine | A library for creating data workflows that can be simple or complex, with features like recursion and memoization. | 158 |
| tsherwen/ac_tools | A package of tools and functions for processing and analyzing atmospheric model output and observational data. | 13 |
| kapolos/pramda | A PHP implementation of functional programming concepts to simplify data processing and analysis. | 245 |
| johnsonc/lambdo | A workflow engine for unifying feature engineering and machine learning operations in data analysis pipelines | 1 |
| vertica/distributedr | A high-performance platform for large-scale R data processing and analytics | 163 |
| snoyberg/conduit | A framework for handling and transforming streaming data in a consistent and efficient way | 903 |
| databiosphere/toil | A workflow management system designed to efficiently run pipelines in various environments. | 901 |
| castagna/jena-grande | A collection of utilities and examples for processing RDF data using various big-data technologies. | 24 |
| h2oai/datatable | A Python package for manipulating 2-dimensional tabular data structures with an emphasis on speed and big data support. | 1,817 |
| wallaroolabs/wally | A distributed stream processing framework for real-time data reactions | 1,480 |
| cube2222/jql | A JSON query processor with a custom syntax that simplifies complex queries by breaking them down into step-by-step operations. | 896 |
| mehd-io/pypi-duck-flow | A project to build data pipelines and visualizations for analyzing Python package download data from PyPi. | 148 |