pangool
Data pipeline builder
A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines
Tuple MapReduce for Hadoop: Hadoop API made easy
57 stars
12 watching
13 forks
Language: Java
Last commit: over 2 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
darky/rocket-pipes | A TypeScript library that enables the creation of modular, composable, and reusable data processing pipelines | 25 |
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 480 |
linkedinattic/datafu | A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions | 584 |
druths/xp | A tool for creating flexible and self-documenting data science pipelines | 56 |
ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89 |
samapriya/planet-gee-pipeline-cli | A command-line tool for automating data processing and uploads from Planet's API to Google Earth Engine | 42 |
netflix/suro | A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events | 794 |
linkedin/brooklin | A distributed system for streaming data between heterogeneous systems with high reliability and throughput at scale | 920 |
kubeflow-kale/kale | A tool that simplifies deploying Kubeflow Pipelines workflows by giving data scientists a graphical interface to define and deploy pipelines directly from JupyterLab | 632 |
apache/streampipes | A toolbox for industrial data analytics and stream processing | 607 |
deepak-malik/data-structures-in-java | A collection of Java implementations of various data structures and algorithms used in computer science | 145 |
pakoito/rxfunctions | A library for composing and chaining functions on Observables in RxJava to simplify complex data processing pipelines | 49 |
vincentclaes/datajob | Automates end-to-end machine learning pipeline deployment with AWS services | 110 |
damballa/parkour | A Clojure-based library for writing efficient MapReduce programs on the Hadoop platform | 257 |
pdpipe/pdpipe | A tool for creating and managing data pipelines with pandas DataFrames | 716 |