hadoop
Data processing framework
A distributed computing framework that enables the processing and storage of large data sets in a scalable and fault-tolerant manner.
Apache Hadoop
15k stars
985 watching
9k forks
Language: Java
Last commit: about 2 months ago
Linked from 3 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
apache/hive | A software project that enables data warehousing and management of large datasets using SQL | 5,577 |
apache/hudi | A platform for storing and managing big data in cloud storage, enabling incremental processing and optimized querying of large datasets | 5,498 |
apache/hbase | A distributed, versioned, column-oriented store designed to scale and manage large amounts of structured data | 5,246 |
hopshadoop/hops | A Hadoop distribution with scalable metadata and a highly available YARN architecture | 309 |
linkedinattic/datafu | A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions | 583 |
bwhite/hadoopy | A Python MapReduce library written in Cython for efficient data processing on Hadoop clusters | 243 |
mesos/hadoop | An integration of the Hadoop distributed computing framework with the Mesos cluster management system | 176 |
elastic/elasticsearch-hadoop | Integrates Elasticsearch search and analytics with Hadoop data processing | 1,930 |
esri/gis-tools-for-hadoop | A collection of tools and resources for spatial analysis on big data using Hadoop and ArcGIS Geoprocessing | 521 |
apache/mesos | Provides efficient resource management and distribution across multiple applications on a shared pool of nodes | 5,276 |
apache/tomcat | An implementation of web application server technologies and protocols | 7,616 |
clickhouse/clickhouse | A real-time analytics database engine designed to handle large volumes of data and provide fast querying capabilities | 38,076 |
helgeho/hadoopconcatgz | Provides a custom input format for handling concatenated GZIP files in distributed processing systems like Hadoop | 9 |
apache/kyuubi | An Apache project providing a distributed and multi-tenant gateway to enable serverless SQL on data warehouses and lakehouses | 2,116 |
apache/dubbo-website | Maintains and builds the official documentation website for the Apache Dubbo framework | 474 |