koalas
Data processing library
A Python package that lets users apply the pandas DataFrame API to data processed on Apache Spark
Koalas: pandas API on Apache Spark
3k stars
326 watching
358 forks
Language: Python
last commit: 8 months ago
Linked from 2 awesome lists
Topics: big-data, data-science, dataframe, mlflow, pandas, pydata, spark
Related projects:
Repository | Description | Stars |
---|---|---|
spark-notebook/spark-notebook | An interactive web-based editor for exploring and analyzing large datasets using Scala, Apache Spark, and other data science tools | 3,151 |
blaze/blaze | A Python library that translates familiar NumPy/pandas-like syntax into database query languages | 3,187 |
databricks/spark-corenlp | Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks | 422 |
kotlin/kotlin-spark-api | Provides Kotlin bindings and extensions for using Apache Spark in big data processing | 461 |
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916 |
databricks/learning-spark | Examples and tutorials for learning Spark using Java and Scala | 3,890 |
svenkreiss/pysparkling | A lightweight, pure-Python implementation of Spark's RDD and DStream interfaces that runs faster on small datasets by avoiding JVM overhead | 262 |
pandas-dev/pandas | A data analysis toolkit for Python that provides flexible, expressive data structures for efficient data manipulation and analysis | 43,807 |
microsoft/mobius | Provides a C# API for interacting with Apache Spark | 942 |
awslabs/deequ | A library for testing data quality in large datasets | 3,308 |
datastax/spark-cassandra-connector | A library that integrates Apache Spark with Apache Cassandra for fast data processing and analysis | 1,943 |
ydataai/ydata-profiling | An exploratory data analysis tool for Pandas and Spark DataFrames | 12,536 |
jerrylead/sparkinternals | An in-depth analysis of Apache Spark's design and implementation | 5,283 |
strat0sphere/spark-euca | Provides scripts to deploy multiple big data tools in a managed environment using Eucalyptus and Amazon AWS | 1 |
johnsnowlabs/spark-nlp | Provides a set of pre-trained models and libraries for natural language processing tasks on top of Apache Spark | 3,871 |