koalas
Data processing library
A Python package that implements the pandas DataFrame API on top of Apache Spark
Koalas: pandas API on Apache Spark
3k stars
323 watching
358 forks
Language: Python
Last commit: about 1 year ago
Linked from 2 awesome lists
Topics: big-data, data-science, dataframe, mlflow, pandas, pydata, spark
Related projects:
| Description | Stars |
|---|---|
| An interactive web-based editor for exploring and analyzing large datasets using Scala, Apache Spark, and other data science tools | 3,155 |
| A Python library that translates familiar NumPy/Pandas-like syntax into database query language | 3,185 |
| Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks | 422 |
| Provides compatibility and extensions between Kotlin and Apache Spark for big data processing | 463 |
| An analytics engine designed to handle large-scale data processing and analysis | 40,170 |
| Examples and tutorials for learning Spark using Java and Scala | 3,892 |
| A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
| A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis | 44,052 |
| Provides a C# API for interacting with Apache Spark | 941 |
| A library for testing data quality in large datasets | 3,324 |
| A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis | 1,944 |
| An exploratory data analysis tool for Pandas and Spark DataFrames | 12,602 |
| An in-depth analysis of Apache Spark's design and implementation | 5,288 |
| Provides scripts to deploy multiple big data tools in a managed environment using Eucalyptus and Amazon AWS | 1 |
| Provides a set of pre-trained models and libraries for natural language processing tasks on top of Apache Spark | 3,889 |