datafu

Hadoop data processing library

A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions.

Hadoop library for large-scale data processing, now an Apache Incubator project

GitHub

583 stars
75 watching
133 forks
Language: Java
Last commit: over 10 years ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| apache/datafu | A collection of libraries for data mining and statistics in large-scale Hadoop environments | 119 |
| datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57 |
| linkedinattic/cleo | A flexible library for enabling rapid development of typeahead search functionality | 565 |
| linkedinattic/kamikaze | A utility package wrapping set implementations on document lists with compression and set operation support | 22 |
| lacuna/bifurcan | A Java library providing efficient, functional data structures with customizable equality semantics and high performance | 968 |
| apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 482 |
| linkeddata/rdflib.js | A JavaScript library for working with RDF data in various formats and querying RDF stores | 567 |
| dfianthdl/dfhdl | A programming language and library for describing dataflow-based digital hardware in a high-level, object-oriented way | 82 |
| frappe/datatable | A modern JavaScript library for creating interactive and editable tables on the web | 1,050 |
| twitter/scalding | A Scala library for specifying and executing MapReduce jobs in Hadoop | 3,506 |
| rbrahul/gofp | A utility library providing common functions for working with data structures like slices and maps in Go | 148 |
| linkedin/ambry | A distributed object store designed to efficiently store and serve large media objects in web applications | 1,749 |
| mhausenblas/mrlin | Maps RDF data into HBase for scalable storage and processing of Linked Data | 17 |
| alangrafu/lodspeakr | A framework for building Linked Data applications using PHP | 32 |
| intentmedia/mario | A library that enables the definition of complex data pipelines in a functional, typesafe, and efficient way using a declarative syntax | 139 |