koalas

Data processing library

A Python package that implements the pandas DataFrame API on top of Apache Spark

Koalas: pandas API on Apache Spark

GitHub

3k stars
326 watching
358 forks
Language: Python
Last commit: 8 months ago
Linked from 2 awesome lists

Topics: big-data, data-science, dataframe, mlflow, pandas, pydata, spark

Related projects:

Repository | Description | Stars
spark-notebook/spark-notebook | An interactive web-based editor for exploring and analyzing large datasets using Scala, Apache Spark, and other data science tools | 3,151
blaze/blaze | A Python library that translates familiar NumPy/Pandas-like syntax into database query language | 3,187
databricks/spark-corenlp | Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks | 422
kotlin/kotlin-spark-api | Provides compatibility and extensions between Kotlin and Apache Spark for big data processing | 461
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916
databricks/learning-spark | Examples and tutorials for learning Spark using Java and Scala | 3,890
svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262
pandas-dev/pandas | A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis | 43,807
microsoft/mobius | Provides a C# API for interacting with Apache Spark | 942
awslabs/deequ | A library for testing data quality in large datasets | 3,308
datastax/spark-cassandra-connector | A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis | 1,943
ydataai/ydata-profiling | An exploratory data analysis tool for Pandas and Spark DataFrames | 12,536
jerrylead/sparkinternals | An in-depth analysis of Apache Spark's design and implementation | 5,283
strat0sphere/spark-euca | Provides scripts to deploy multiple big data tools in a managed environment using Eucalyptus and Amazon AWS | 1
johnsnowlabs/spark-nlp | Provides a set of pre-trained models and libraries for natural language processing tasks on top of Apache Spark | 3,871