hadoop
Data processing framework
A distributed computing framework that enables the processing and storage of large data sets in a scalable and fault-tolerant manner.
Apache Hadoop
15k stars
987 watching
9k forks
Language: Java
Last commit: 6 days ago
Linked from 3 awesome lists
Related projects:
Repository | Description | Stars |
---|---|---|
apache/hive | A software project that enables data warehousing and management of large datasets using SQL | 5,554 |
apache/hudi | Manages large analytical datasets on distributed storage systems by enabling incremental processing and snapshot isolation. | 5,429 |
apache/hbase | Provides a distributed, versioned column-oriented data storage system | 5,225 |
hopshadoop/hops | A Hadoop distribution with scalable metadata and a highly available YARN architecture | 308 |
linkedinattic/datafu | A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions. | 584 |
bwhite/hadoopy | A Python MapReduce library written in Cython for efficient data processing on clusters. | 243 |
mesos/hadoop | An integration of the Hadoop distributed computing framework with the Mesos cluster management system | 176 |
elastic/elasticsearch-hadoop | Integrates Elasticsearch search and analytics with Hadoop data processing | 9 |
esri/gis-tools-for-hadoop | A collection of tools and resources for spatial analysis on big data using Hadoop and ArcGIS Geoprocessing | 521 |
apache/mesos | Provides efficient resource management and distribution across multiple applications on a shared pool of nodes. | 5,271 |
apache/tomcat | An implementation of web application server technologies and protocols | 7,571 |
clickhouse/clickhouse | A real-time analytics DBMS with support for distributed query processing and high-performance data analysis. | 37,649 |
helgeho/hadoopconcatgz | Provides a custom input format for handling concatenated GZIP files in distributed processing systems like Hadoop | 9 |
apache/kyuubi | An Apache project providing a distributed and multi-tenant gateway to enable serverless SQL on data warehouses and lakehouses | 2,105 |
apache/dubbo-website | Maintains and builds the official documentation website for Apache Dubbo | 471 |