pig

Data processor

Enables processing and transformation of large datasets using a high-level language (Pig Latin) whose scripts are optimized at compile time and run efficiently on distributed computing frameworks.

Mirror of Apache Pig

GitHub

682 stars
79 watching
451 forks
Language: Java
last commit: 4 months ago
Linked from 1 awesome list

database, java, pig

Related projects:

netflix/pigpen (567 stars): A MapReduce framework for Clojure that compiles to Apache Pig or Cascading without requiring prior knowledge of those systems.
apache/samza (817 stars): A distributed stream processing framework for handling high-volume data streams with fault-tolerance and durability guarantees.
alienrobotwizard/varaha (53 stars): A set of Apache Pig scripts and UDFs for machine learning and natural language processing.
apache/druid (13,548 stars): A high-performance, real-time analytics database for fast queries and ingestion.
apache/impala (1,164 stars): A high-performance query engine designed for large-scale data processing and analytics.
apache/opennlp (1,449 stars): A Java toolkit that uses machine learning algorithms for natural language text processing tasks.
apache/tez (482 stars): A framework for building flexible data processing pipelines, serving as a low-level execution engine for higher-level frameworks.
apache/accumulo (1,075 stars): A sorted, distributed key/value store with robust data storage and retrieval capabilities.
tweag/haskellr (587 stars): An environment for data processing that combines Haskell and R code.
apache/spark (40,170 stars): An analytics engine for large-scale data processing and analysis.
apache/mahout (2,145 stars): An environment for quickly building scalable machine learning applications.
alanmarazzi/panthera (189 stars): A Clojure library for working with dataframes and numerical computations through Python libraries.
emorynlp/nlp4j (148 stars): Tools and APIs for text processing and analysis on Java-based platforms.
apache/datafusion-python (385 stars): A Python library for data processing and querying, built on the Apache DataFusion query engine and the Apache Arrow in-memory format.
castagna/jena-grande (24 stars): A collection of utilities and examples for processing RDF data with various big-data technologies.