hadoop

Data processing framework

A distributed computing framework for storing (HDFS) and processing (MapReduce, with resources scheduled by YARN) large data sets across clusters of commodity machines, in a scalable and fault-tolerant manner.
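Hadoop's processing layer follows the MapReduce model. A map step emits key/value pairs. A shuffle step groups and sorts them by key. A reduce step then combines the values for each key. The sketch below is a minimal, single-process illustration of that model in plain Java. It is not Hadoop's API, which uses `org.apache.hadoop.mapreduce.Mapper` and `Reducer` running on a cluster. The class `WordCountSketch` and its methods `map`, `reduce` and `wordCount` are hypothetical names chosen for this example. The sketch does not model distribution, HDFS or fault tolerance.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory illustration of the map -> shuffle -> reduce model.
 * NOT Hadoop's API: no cluster, HDFS, task scheduling, or fault tolerance is modeled.
 */
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) { // skip empty tokens from leading whitespace
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Reduce phase: combine all values emitted for one key.
    static int reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int v : ones) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // Shuffle: group intermediate values by key. A TreeMap keeps keys sorted,
        // mirroring how MapReduce sorts map output by key before the reduce phase.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Apply the reducer once per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The end");
        System.out.println(wordCount(input));
    }
}
```

Running `main` prints `{brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}`. In an actual Hadoop job, many map tasks would run in parallel, each on its own block of input stored in HDFS. Failed tasks would be re-executed on other nodes, and that re-execution is where the fault tolerance comes from.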

Apache Hadoop

GitHub

15k stars
985 watching
9k forks
Language: Java
Last commit: about 2 months ago
Linked from 3 awesome lists


Related projects:

| Repository | Description | Stars |
|---|---|---|
| apache/hive | Data warehouse software for querying and managing large datasets with SQL | 5,577 |
| apache/hudi | A platform for storing and managing large datasets in cloud storage, with incremental processing and optimized querying | 5,498 |
| apache/hbase | A distributed, versioned, column-oriented store designed to scale to and manage large amounts of structured data | 5,246 |
| hopshadoop/hops | A Hadoop distribution with scalable metadata and a highly available YARN architecture | 309 |
| linkedinattic/datafu | A collection of libraries for working with large-scale data in Hadoop, providing incremental processing and user-defined functions | 583 |
| bwhite/hadoopy | A Python MapReduce library, written in Cython, for efficient data processing on Hadoop clusters | 243 |
| mesos/hadoop | An integration of the Hadoop distributed computing framework with the Mesos cluster manager | 176 |
| elastic/elasticsearch-hadoop | Integrates Elasticsearch search and analytics with Hadoop data processing | 1,930 |
| esri/gis-tools-for-hadoop | A collection of tools and resources for spatial analysis of big data using Hadoop and ArcGIS Geoprocessing | 521 |
| apache/mesos | A cluster manager that shares resources efficiently among multiple applications running on a common pool of nodes | 5,276 |
| apache/tomcat | An open-source implementation of the Jakarta Servlet, Jakarta Pages, Jakarta Expression Language, and Jakarta WebSocket specifications | 7,616 |
| clickhouse/clickhouse | A column-oriented, real-time analytics database designed to handle large volumes of data with fast queries | 38,076 |
| helgeho/hadoopconcatgz | A custom input format for reading concatenated GZIP files in distributed processing systems such as Hadoop | 9 |
| apache/kyuubi | A distributed, multi-tenant gateway that provides serverless SQL on data warehouses and lakehouses | 2,116 |
| apache/dubbo-website | Source for the official website and documentation of Apache Dubbo, an RPC and microservice framework | 474 |