pysparkling
Dataset processor
A lightweight, pure Python implementation of Apache Spark's RDD and DStream interfaces for improved performance on small datasets.
262 stars
9 watching
45 forks
Language: Python
Last commit: 3 months ago
Linked from 1 awesome list
apache-spark, data-processing, data-science, python
Related projects:
Repository | Description | Stars |
---|---|---|
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916 |
tubular/sparkly | A set of Python libraries and tools to simplify interactions with various data sources using Apache Spark. | 60 |
datastax/spark-cassandra-connector | A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis. | 1,943 |
sparklingpandas/sparklingpandas | Enables distributed data analysis using PySpark and Pandas APIs | 361 |
instaclustr/sample-kafkasparkcassandra | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra. | 23 |
dsgrid/dsgrid | A Python package for managing and analyzing demand-side grid data, models, and queries using Apache Spark | 26 |
internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11 |
sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957 |
mrpowers-io/quinn | PySpark helper functions to maximize developer productivity | 643 |
dmmiller612/sparktorch | A PyTorch implementation on Apache Spark for distributed deep learning model training and inference. | 339 |
brbester/pyciscospark | Provides an interface to the Cisco Spark REST API | 30 |
kevinschaich/pyspark-cheatsheet | A comprehensive reference guide to working with PySpark SQL | 449 |
sparkica/lodgrefine | An extension of Google Refine for working with Linked Open Data | 14 |
pyjanitor-devs/pyjanitor | A Python library providing a clean and expressive API for data cleaning by chaining multiple operations together in a logical order. | 1,364 |
tktech/pysimdjson | Fast JSON parsing for Python, using SIMD instructions when available | 643 |