docker-spark-iceberg
Spark environment
A Docker-based environment for running Spark and Iceberg in a quick start scenario.
264 stars
13 watching
139 forks
Language: Jupyter Notebook
Last commit: 2 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An open source library that enables interactive development of applications using remote Spark clusters | 1,334 |
| | A Docker image with Apache Spark pre-installed and configured for easy deployment on YARN clusters. | 765 |
| | A library that parses and queries XML data in Apache Spark | 504 |
| | A library for parsing and querying CSV data with Apache Spark | 1,052 |
| | A tool to simplify running Spark on Kubernetes | 181 |
| | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
| | An environment for experimenting with real-time data processing using Kafka, Spark streaming, and Cassandra | 97 |
| | Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks | 422 |
| | A Spark-based package to apply data fixes using rule-based SQL conditions | 28 |
| | An R interface to Apache Spark for distributed data analysis and machine learning | 955 |
| | An analytics engine designed to handle large-scale data processing and analysis | 40,170 |
| | Enables manipulation of Apache Spark DataFrames using TensorFlow programs | 749 |
| | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra. | 23 |
| | A comprehensive guide to Docker development and deployment | 14 |
| | Automates deployment of an AWS EMR cluster and execution of Spark jobs | 8 |