koalas

Data processing library

A Python package that implements the pandas DataFrame API on top of Apache Spark

Koalas: pandas API on Apache Spark

GitHub

3k stars

323 watching

358 forks

Language: Python

Last commit: over 2 years ago

Linked from 2 awesome lists

Tags: big-data, data-science, dataframe, mlflow, pandas, pydata, spark

Related projects:

Repository	Description	Stars
spark-notebook/spark-notebook	An interactive web-based editor for exploring and analyzing large datasets using Scala, Apache Spark, and other data science tools	3,155
blaze/blaze	A Python library that translates familiar NumPy- and pandas-like syntax into database queries	3,185
databricks/spark-corenlp	Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks	422
kotlin/kotlin-spark-api	Provides compatibility and extensions between Kotlin and Apache Spark for big data processing	463
apache/spark	An analytics engine designed to handle large-scale data processing and analysis	40,170
databricks/learning-spark	Examples and tutorials for learning Spark using Java and Scala	3,892
svenkreiss/pysparkling	A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets	262
pandas-dev/pandas	A powerful data analysis toolkit for Python that provides flexible and expressive data structures for efficient data manipulation and analysis	44,052
microsoft/mobius	Provides a C# API for interacting with Apache Spark	941
awslabs/deequ	A library for testing data quality in large datasets	3,324
datastax/spark-cassandra-connector	A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis	1,944
ydataai/ydata-profiling	An exploratory data analysis tool for pandas and Spark DataFrames	12,602
jerrylead/sparkinternals	An in-depth analysis of Apache Spark's design and implementation	5,288
strat0sphere/spark-euca	Provides scripts to deploy multiple big data tools in a managed environment using Eucalyptus and Amazon AWS	1
johnsnowlabs/spark-nlp	Provides a set of pre-trained models and libraries for natural language processing tasks on top of Apache Spark	3,889