docker-spark-iceberg

Spark environment

A Docker-based environment for getting started quickly with Spark and Iceberg.

GitHub

256 stars
13 watching
128 forks
Language: Jupyter Notebook
Last commit: 11 days ago

Related projects:

Repository | Description | Stars
jupyter-incubator/sparkmagic | An open source library that enables interactive development of applications using remote Spark clusters | 1,328
sequenceiq/docker-spark | A Docker image with Apache Spark pre-installed and configured for easy deployment on YARN clusters | 765
databricks/spark-xml | A library that parses and queries XML data in Apache Spark | 505
databricks/spark-csv | A library for parsing and querying CSV data with Apache Spark | 1,053
apple/batch-processing-gateway | A tool to simplify running Spark on Kubernetes | 181
svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262
yannael/kafka-sparkstreaming-cassandra | An environment for experimenting with real-time data processing using Kafka, Spark Streaming, and Cassandra | 97
databricks/spark-corenlp | Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks | 422
indix/sparkplug | A Spark-based package to apply data fixes using rule-based SQL conditions | 28
sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916
databricks/tensorframes | Enables manipulation of Apache Spark DataFrames using TensorFlow programs | 749
instaclustr/sample-kafkasparkcassandra | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra | 23
ellerbrock/docker-tutorial | A comprehensive guide to Docker development and deployment | 14
kcrandall/emr_spark_automation | Automates deployment of an AWS EMR cluster and execution of Spark jobs | 8