lance

Data storage format

A modern columnar data format for machine learning and large language models.

Modern columnar data format for ML and LLMs, implemented in Rust. Convert from Parquet in two lines of code for 100x faster random access, a vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
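As a rough illustration of the two-line conversion mentioned above, the sketch below wraps it in a helper and then reads rows back by index. It assumes the `pylance` package (imported as `lance`) and `pyarrow` are installed; the helper name `parquet_to_lance`, the file names, and the sample columns are hypothetical and only serve the example.

```python
import tempfile
from pathlib import Path


def parquet_to_lance(parquet_path: str, lance_path: str):
    """Convert Parquet data at parquet_path to a Lance dataset and return it.

    Hypothetical helper for illustration; not part of the Lance API.
    """
    # Imported here so the sketch loads even without the optional packages.
    import lance  # installed with `pip install pylance`
    import pyarrow.dataset as pads

    # The two-line conversion: open the Parquet data, then write it as Lance.
    parquet = pads.dataset(parquet_path, format="parquet")
    return lance.write_dataset(parquet, lance_path)


if __name__ == "__main__":
    import pyarrow as pa
    import pyarrow.parquet as pq

    with tempfile.TemporaryDirectory() as tmp:
        src = str(Path(tmp) / "example.parquet")
        dst = str(Path(tmp) / "example.lance")

        # Write a tiny Parquet file to convert (made-up sample data).
        sample = pa.table({"id": [0, 1, 2, 3], "label": ["a", "b", "c", "d"]})
        pq.write_table(sample, src)

        ds = parquet_to_lance(src, dst)
        print(ds.count_rows())              # number of rows in the new dataset
        print(ds.take([0, 2]).to_pydict())  # random access by row index
```

Here `take` fetches rows by position without scanning the whole dataset, which is the access pattern the description refers to as random access.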

GitHub

4k stars
44 watching
221 forks
Language: Rust
Last commit: 5 days ago
Linked from 1 awesome list

apache-arrow, computer-vision, data-analysis, data-analytics, data-centric, data-format, data-science, dataops, deep-learning, duckdb, embeddings, llms, machine-learning, mlops, python, rust

Related projects:

Repository | Description | Stars
lancedb/lancedb | A serverless vector search and storage database built with Rust, enabling efficient similarity searches across multimodal data. | 4,757
wesm/feather | A binary data frame storage system that enables efficient and interoperable data sharing across multiple programming languages. | 2,742
apache/arrow | A toolkit for efficient data interchange and in-memory analytics in various languages. | 14,590
aksnzhy/xlearn | A high-performance machine learning package with linear models and factorization machines. | 3,087
ml-tooling/opyrator | Automates conversion of machine learning code into production-ready microservices with a web API and GUI. | 3,102
paradedb/pg_analytics | Enables direct querying of large data volumes from Postgres using a high-performance analytical query engine. | 380
mlpack/mlpack | A C++ machine learning library with bindings to other programming languages. | 5,113
root-project/root | A software package for analyzing and visualizing large scientific data sets. | 2,707
ericlbuehler/mistral.rs | A fast and flexible LLM inference platform supporting various models and devices. | 4,466
ponyorm/pony | An object-relational mapper that allows Python developers to write database queries using Python code. | 3,650
qdrant/qdrant | A high-performance vector search engine and database for efficient similarity searches in machine learning applications. | 20,607
vaexio/vaex | A high-performance Python library for streaming and exploring large tabular datasets. | 8,297
postgresml/postgresml | An open-source Postgres extension for machine learning and AI operations directly within the database. | 6,033
infiniflow/infinity | A high-performance database designed to support fast search and retrieval of dense vector, sparse vector, tensor, and full-text data. | 2,641
ayush1997/visualize_ml | A Python package for data analysis and visualization in machine learning. | 200