datafusion-python
Data processor
A Python library that provides a data processing and querying framework built on Apache DataFusion, a query engine that uses Apache Arrow as its in-memory format.
Apache DataFusion Python Bindings
385 stars
34 watching
80 forks
Language: Python
last commit: 5 days ago

Related projects:
Repository | Description | Stars |
---|---|---|
apache/datafusion-ballista-python | Python bindings for Ballista, the distributed query engine for Apache DataFusion, for analyzing and manipulating large datasets | 34 |
h2oai/datatable | A Python package for manipulating 2-dimensional tabular data structures with an emphasis on speed and big data support. | 1,821 |
olirice/flupy | A library that provides a fluent interface for processing data pipelines in Python without holding large amounts of data in memory | 193 |
apache/datafusion-ballista | Distributed query engine for Apache DataFusion applications | 1,580 |
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,170 |
apache/pig | Enables data processing and transformation on large datasets using a high-level language with compile-time optimizations for efficient execution on distributed computing frameworks. | 682 |
capillariesio/capillaries | A distributed batch data processing framework that enables scalable and reliable data transformation, filtering, and aggregation. | 62 |
apache/datafusion | An extensible query engine that uses Apache Arrow as its in-memory format, supports various data formats, and allows customization of its functionality. | 6,462 |
pyjanitor-devs/pyjanitor | A Python library providing a clean and expressive API for data cleaning by chaining multiple operations together in a logical order. | 1,371 |
quixio/quix-streams | A Python framework for real-time data processing on Apache Kafka streams | 1,246 |
intake/intake | A package for describing, loading, and processing data in a declarative way | 1,015 |
apache/druid | A high-performance real-time analytics database for fast queries and ingest | 13,548 |
nipype/pydra | A lightweight Python dataflow engine for building and executing directed acyclic graphs (DAGs) in a scalable manner. | 123 |
apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 817 |
scriptfusion/porter | A PHP library that enables durable and asynchronous data imports with memory-efficient processing and robust recovery from network errors. | 612 |