pig
Data processor
A platform for processing and transforming large data sets using a high-level language (Pig Latin), with scripts compiled and optimized for efficient execution on distributed computing frameworks.
Mirror of Apache Pig
681 stars
79 watching
450 forks
Language: Java
last commit: about 1 month ago
Linked from 1 awesome list
Tags: database, java, pig
Related projects:
Repository | Description | Stars |
---|---|---|
netflix/pigpen | A Clojure-based implementation of the map-reduce paradigm that compiles to Apache Pig | 567 |
apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 819 |
alienrobotwizard/varaha | A set of Apache Pig scripts and UDFs for machine learning and natural language processing | 53 |
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,523 |
apache/impala | A high-performance query engine designed to handle large-scale data processing and analytics | 1,152 |
apache/opennlp | A machine learning-based toolkit for text processing and analysis | 1,447 |
apache/tez | A framework for building flexible data processing pipelines, serving as a low-level execution engine for higher-level frameworks | 480 |
apache/accumulo | A distributed key/value store with robust data storage and retrieval capabilities | 1,074 |
tweag/haskellr | An environment for efficient data processing using Haskell or R code | 585 |
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,002 |
apache/mahout | An environment for quickly creating scalable machine learning applications | 2,143 |
alanmarazzi/panthera | A Clojure-based library for working with dataframes and numerical computations using Python libraries | 189 |
emorynlp/nlp4j | Tools and APIs for text processing and analysis on Java-based platforms | 148 |
apache/datafusion-python | Python bindings for the Apache DataFusion query engine, which processes and queries data in the Apache Arrow in-memory format | 377 |
castagna/jena-grande | A collection of utilities and examples for processing RDF data using various big-data technologies | 24 |