pysparkling

Dataset processor

A lightweight, pure Python implementation of Apache Spark's RDD and DStream interfaces, aimed at faster processing of small datasets.
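To make the idea of an RDD-style interface in pure Python concrete, here is a minimal sketch. It is not pysparkling's code or API: the class name TinyRDD, its methods, and the num_partitions parameter are hypothetical, chosen only to mirror the general shape of Spark's RDD operations (parallelize, map, filter, collect, reduce). It also evaluates eagerly, unlike Spark's lazy transformations.

```python
from functools import reduce
from itertools import chain


class TinyRDD:
    """Hypothetical, eager stand-in for an RDD (illustration only)."""

    def __init__(self, partitions):
        # Each partition is stored as a plain Python list.
        self._partitions = [list(p) for p in partitions]

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Split the data into contiguous chunks so element order is preserved.
        data = list(data)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls(data[i:i + size] for i in range(0, len(data), size))

    def map(self, fn):
        # Apply fn to every element, partition by partition.
        return TinyRDD([fn(x) for x in p] for p in self._partitions)

    def filter(self, pred):
        # Keep only the elements for which pred returns a truthy value.
        return TinyRDD([x for x in p if pred(x)] for p in self._partitions)

    def collect(self):
        # Flatten all partitions into a single list.
        return list(chain.from_iterable(self._partitions))

    def reduce(self, fn):
        # Combine all elements with a binary function.
        return reduce(fn, self.collect())


rdd = TinyRDD.parallelize(range(6), num_partitions=2)
result = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect()
total = rdd.reduce(lambda a, b: a + b)
print(result)  # [0, 4, 16]
print(total)   # 15
```

Because everything runs in one Python process on in-memory lists, there is no cluster or JVM startup involved, which is the kind of overhead a lightweight implementation can avoid on small inputs. For the real method names and behavior, refer to the project's own documentation.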

262 stars
9 watching
45 forks
Language: Python
Last commit: 3 months ago
Linked from 1 awesome list

apache-spark, data-processing, data-science, python

Related projects:

| Repository | Description | Stars |
|---|---|---|
| apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916 |
| tubular/sparkly | A set of Python libraries and tools to simplify interactions with various data sources using Apache Spark. | 60 |
| datastax/spark-cassandra-connector | A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis. | 1,943 |
| sparklingpandas/sparklingpandas | Enables distributed data analysis using PySpark and Pandas APIs | 361 |
| instaclustr/sample-kafkasparkcassandra | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra. | 23 |
| dsgrid/dsgrid | A Python package for managing and analyzing demand-side grid data, models, and queries using Apache Spark | 26 |
| internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11 |
| sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957 |
| mrpowers-io/quinn | Pyspark helper functions to maximize developer productivity | 643 |
| dmmiller612/sparktorch | A PyTorch implementation on Apache Spark for distributed deep learning model training and inference. | 339 |
| brbester/pyciscospark | Provides an interface to the Cisco Spark REST API | 30 |
| kevinschaich/pyspark-cheatsheet | A comprehensive reference guide to working with PySpark SQL | 449 |
| sparkica/lodgrefine | An extension of Google Refine for working with Linked Open Data | 14 |
| pyjanitor-devs/pyjanitor | A Python library providing a clean and expressive API for data cleaning by chaining multiple operations together in a logical order. | 1,364 |
| tktech/pysimdjson | Fast JSON parsing for Python, using SIMD instructions when available | 643 |