pangool

Data pipeline builder

A Java framework that simplifies Hadoop's MapReduce API for building efficient data processing pipelines

Tuple MapReduce for Hadoop: Hadoop API made easy

GitHub

57 stars
12 watching
13 forks
Language: Java
Last commit: over 2 years ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
darky/rocket-pipes | A TypeScript library that enables the creation of modular, composable, and reusable data processing pipelines | 25
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 480
linkedinattic/datafu | A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions | 584
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56
ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89
samapriya/planet-gee-pipeline-cli | A command-line tool for automating data processing and uploads from Planet's API to Google Earth Engine | 42
netflix/suro | A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events | 794
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 920
kubeflow-kale/kale | Simplifies the deployment of Kubeflow Pipelines workflows by providing a graphical interface for data scientists to define and deploy pipelines directly from JupyterLab | 632
apache/streampipes | A toolbox for industrial data analytics and stream processing | 607
deepak-malik/data-structures-in-java | A collection of Java implementations of various data structures and algorithms used in computer science | 145
pakoito/rxfunctions | A library for composing and chaining functions on Observables in RxJava to simplify complex data processing pipelines | 49
vincentclaes/datajob | Automates end-to-end machine learning pipeline deployment with AWS services | 110
damballa/parkour | A Clojure-based library for writing efficient MapReduce programs on the Hadoop platform | 257
pdpipe/pdpipe | A tool for creating and managing data pipelines with pandas DataFrames | 716