datafusion-python

Data processor

A Python library that provides a data processing and querying framework built on the Apache DataFusion query engine, which uses Apache Arrow as its in-memory format.

Apache DataFusion Python Bindings

385 stars

34 watching

80 forks

Language: Python

last commit: over 1 year ago


datafusion.apache.org/python

Related projects:

Repository	Description	Stars
apache/datafusion-ballista-python	Python bindings for Ballista, the distributed query engine built on Apache DataFusion, for analyzing and manipulating large datasets	34
h2oai/datatable	A Python package for manipulating 2-dimensional tabular data structures with an emphasis on speed and big data support.	1,821
olirice/flupy	A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of memory	193
apache/datafusion-ballista	Distributed query engine for Apache DataFusion applications	1,580
apache/spark	An analytics engine designed to handle large-scale data processing and analysis	40,170
apache/pig	Enables data processing and transformation in large files using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks.	682
capillariesio/capillaries	A distributed batch data processing framework that enables scalable and reliable data transformation, filtering, and aggregation.	62
apache/datafusion	A query engine that supports various data formats and allows customization of its functionality.	6,462
pyjanitor-devs/pyjanitor	A Python library providing a clean and expressive API for data cleaning by chaining multiple operations together in a logical order.	1,371
quixio/quix-streams	A Python framework for real-time data processing on Apache Kafka streams	1,246
intake/intake	A package for describing, loading, and processing data in a declarative way	1,015
apache/druid	A high-performance real-time analytics database for fast queries and ingest	13,548
nipype/pydra	A lightweight Python dataflow engine for building and executing directed acyclic graphs (DAGs) in a scalable manner.	123
apache/samza	A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees	817
scriptfusion/porter	A PHP library that enables durable and asynchronous data imports with memory-efficient processing and robust recovery from network errors.	611