pig

Data processor

Enables processing and transformation of large data sets using a high-level language, Pig Latin, with compile-time optimizations for efficient execution on distributed computing frameworks such as Hadoop.

Mirror of Apache Pig

GitHub

681 stars
79 watching
450 forks
Language: Java
last commit: about 1 month ago
Linked from 1 awesome list

Tags: database, java, pig

Related projects:

Repository | Description | Stars
netflix/pigpen | A Clojure-based implementation of the map-reduce paradigm | 567
apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 819
alienrobotwizard/varaha | A set of Apache Pig scripts and UDFs for machine learning and natural language processing | 53
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,523
apache/impala | A high-performance query engine designed to handle large-scale data processing and analytics | 1,152
apache/opennlp | A machine learning-based toolkit for text processing and analysis | 1,447
apache/tez | A system that enables flexible data processing pipelines using a low-level engine for higher-level frameworks | 480
apache/accumulo | A distributed key/value store with robust data storage and retrieval capabilities | 1,074
tweag/haskellr | An environment for efficient data processing using Haskell or R code | 585
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,002
apache/mahout | An environment for quickly creating scalable machine learning applications | 2,143
alanmarazzi/panthera | A Clojure-based library for working with dataframes and numerical computations using Python libraries | 189
emorynlp/nlp4j | Provides tools and APIs for text processing and analysis on Java-based platforms | 148
apache/datafusion-python | A Python library that provides a data processing and querying framework using the Apache Arrow in-memory query engine | 377
castagna/jena-grande | A collection of utilities and examples for processing RDF data using various big-data technologies | 24