datafu

Hadoop data processing library

A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions.

Hadoop library for large-scale data processing, now an Apache Incubator project

584 stars
75 watching
134 forks
Language: Java
Last commit: over 10 years ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| apache/datafu | A collection of libraries for data mining and statistics in large-scale Hadoop environments | 118 |
| datasalt/pangool | A Java framework that simplifies Hadoop's MapReduce API to build efficient data processing pipelines | 57 |
| linkedinattic/cleo | A flexible library for enabling rapid development of typeahead search functionality | 565 |
| linkedinattic/kamikaze | A utility package implementing set operations and compression algorithms for efficient document searching in search engines | 22 |
| lacuna/bifurcan | A Java library providing efficient, functional data structures with customizable equality semantics and high performance | 967 |
| apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 479 |
| linkeddata/rdflib.js | A JavaScript library for working with RDF data in various formats and querying RDF stores | 566 |
| dfianthdl/dfhdl | A programming language and library for describing dataflow-based digital hardware in a high-level, object-oriented way | 80 |
| frappe/datatable | A JavaScript library for displaying and editing tabular data in a modern and interactive way | 1,042 |
| twitter/scalding | A Scala library for specifying and executing MapReduce jobs in Hadoop | 3,500 |
| rbrahul/gofp | A utility library providing common functions for working with data structures like slices and maps in Go | 146 |
| linkedin/ambry | A distributed object store designed to handle large amounts of small and large immutable objects with high availability and low latency | 1,751 |
| mhausenblas/mrlin | Maps RDF data into HBase for scalable storage and processing of Linked Data | 17 |
| alangrafu/lodspeakr | A framework for creating Linked Data applications using PHP | 32 |
| intentmedia/mario | A library that enables the definition of complex data pipelines in a functional, typesafe, and efficient way using a declarative syntax | 139 |