hadoop

Data processing framework

A distributed computing framework that enables the processing and storage of large data sets in a scalable and fault-tolerant manner.

Apache Hadoop

GitHub

15k stars
987 watching
9k forks
Language: Java
Last commit: 6 days ago
Linked from 3 awesome lists

Related projects:

Repository | Description | Stars
apache/hive | A software project that enables data warehousing and management of large datasets using SQL | 5,554
apache/hudi | Manages large analytical datasets on distributed storage systems by enabling incremental processing and snapshot isolation | 5,429
apache/hbase | Provides a distributed, versioned column-oriented data storage system | 5,225
hopshadoop/hops | A distributed Hadoop distribution with scalable metadata and highly available YARN architecture | 308
linkedinattic/datafu | A collection of libraries for working with large-scale data in Hadoop, providing incremental processing capabilities and user-defined functions | 584
bwhite/hadoopy | A Python MapReduce library written in Cython for efficient data processing on clusters | 243
mesos/hadoop | An integration of the Hadoop distributed computing framework with the Mesos cluster management system | 176
elastic/elasticsearch-hadoop | Integrates Elasticsearch search and analytics with Hadoop data processing | 9
esri/gis-tools-for-hadoop | A collection of tools and resources for spatial analysis on big data using Hadoop and ArcGIS Geoprocessing | 521
apache/mesos | Provides efficient resource management and distribution across multiple applications on a shared pool of nodes | 5,271
apache/tomcat | An implementation of web application server technologies and protocols | 7,571
clickhouse/clickhouse | A real-time analytics DBMS with support for distributed query processing and high-performance data analysis | 37,649
helgeho/hadoopconcatgz | Provides a custom input format for handling concatenated GZIP files in distributed processing systems like Hadoop | 9
apache/kyuubi | An Apache project providing a distributed and multi-tenant gateway to enable serverless SQL on data warehouses and lakehouses | 2,105
apache/dubbo-website | Maintains and builds the official documentation website for a popular open-source software framework | 471