hadoop

Data processing framework

A distributed computing framework that enables the processing and storage of large data sets in a scalable and fault-tolerant manner.

Apache Hadoop

GitHub

15k stars

985 watching

9k forks

Language: Java

Last commit: over 1 year ago

Linked from 3 awesome lists

hadoop.apache.org/

Related projects:

Repository	Description	Stars
apache/hive	A software project that enables data warehousing and management of large datasets using SQL	5,577
apache/hudi	A platform for storing and managing big data in cloud storage, enabling incremental processing and optimized querying of large datasets	5,498
apache/hbase	A distributed, versioned, column-oriented store designed to scale and manage large amounts of structured data	5,246
hopshadoop/hops	A Hadoop distribution with scalable metadata and a highly available YARN architecture	309
linkedinattic/datafu	A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions	583
bwhite/hadoopy	A Python MapReduce library written in Cython for efficient data processing on Hadoop clusters	243
mesos/hadoop	An integration of the Hadoop distributed computing framework with the Mesos cluster management system	176
elastic/elasticsearch-hadoop	Integrates Elasticsearch search and analytics with Hadoop data processing	1,930
esri/gis-tools-for-hadoop	A collection of tools and resources for spatial analysis on big data using Hadoop and ArcGIS Geoprocessing	521
apache/mesos	Provides efficient resource management and distribution across multiple applications on a shared pool of nodes	5,276
apache/tomcat	An implementation of web application server technologies and protocols	7,616
clickhouse/clickhouse	A real-time analytics database engine designed to handle large volumes of data and provide fast querying capabilities	38,076
helgeho/hadoopconcatgz	Provides a custom input format for handling concatenated GZIP files in distributed processing systems like Hadoop	9
apache/kyuubi	An Apache project providing a distributed and multi-tenant gateway to enable serverless SQL on data warehouses and lakehouses	2,116
apache/dubbo-website	Maintains and builds the official documentation website for a popular open-source software framework	474