datafusion-python

Data processor

A Python library for data processing and querying, built as bindings to Apache DataFusion, a query engine that uses the Apache Arrow in-memory format.

Apache DataFusion Python Bindings

GitHub

385 stars
34 watching
80 forks
Language: Python
Last commit: 5 days ago

Related projects:

Repository | Description | Stars
apache/datafusion-ballista-python | Bindings for using Apache Arrow's query engine in Python to analyze and manipulate large datasets | 34
h2oai/datatable | A Python package for manipulating 2-dimensional tabular data structures with an emphasis on speed and big data support | 1,821
olirice/flupy | A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of memory | 193
apache/datafusion-ballista | Distributed query engine for Apache DataFusion applications | 1,580
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,170
apache/pig | Enables data processing and transformation in large files using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks | 682
capillariesio/capillaries | A distributed batch data processing framework that enables scalable and reliable data transformation, filtering, and aggregation | 62
apache/datafusion | A query engine that supports various data formats and allows customization of its functionality | 6,462
pyjanitor-devs/pyjanitor | A Python library providing a clean and expressive API for data cleaning by chaining multiple operations together in a logical order | 1,371
quixio/quix-streams | A Python framework for real-time data processing on Apache Kafka streams | 1,246
intake/intake | A package for describing, loading, and processing data in a declarative way | 1,015
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,548
nipype/pydra | A lightweight Python dataflow engine for building and executing directed acyclic graphs (DAGs) in a scalable manner | 123
apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 817
scriptfusion/porter | A PHP library that enables durable and asynchronous data imports with memory-efficient processing and robust recovery from network errors | 612