lance

Data storage format

A modern columnar data format for machine learning and large language models.

A modern columnar data format for ML and LLMs, implemented in Rust. Convert from Parquet in two lines of code for 100x faster random access, a vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
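
As a rough illustration of the two-line Parquet conversion mentioned above, here is a minimal sketch in Python. It assumes the `pylance` and `pyarrow` packages are installed; the helper names `parquet_to_lance` and `take_rows` and all file paths are hypothetical and chosen only for this example.

```python
from typing import List


def parquet_to_lance(parquet_path: str, lance_path: str):
    """Copy a Parquet dataset into a new Lance dataset and return it.

    Hypothetical helper for illustration; not part of the Lance API.
    """
    # Third-party packages, assumed installed (pip install pylance pyarrow).
    # Imported inside the function so defining the helper has no side effects.
    import lance
    import pyarrow.dataset as ds

    # The conversion itself: open the Parquet data, then write it out as Lance.
    parquet = ds.dataset(parquet_path, format="parquet")
    return lance.write_dataset(parquet, lance_path)


def take_rows(lance_path: str, row_indices: List[int]):
    """Fetch specific rows by position from a Lance dataset (random access).

    Hypothetical helper for illustration; returns a pyarrow Table.
    """
    import lance

    return lance.dataset(lance_path).take(row_indices)
```

A call might look like `parquet_to_lance("data.parquet", "data.lance")` followed by `take_rows("data.lance", [0, 42])`, where both paths are placeholders.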

GitHub

4k stars
42 watching
234 forks
Language: Rust
last commit: 1 day ago
Linked from 1 awesome list

Topics: apache-arrow, computer-vision, data-analysis, data-analytics, data-centric, data-format, data-science, dataops, deep-learning, duckdb, embeddings, llms, machine-learning, mlops, python, rust

Related projects:

Repository | Description | Stars
lancedb/lancedb | A serverless vector search and storage database built with Rust, enabling efficient similarity searches across multimodal data. | 4,993
wesm/feather | A binary data frame storage system that enables efficient and interoperable data sharing across multiple programming languages. | 2,742
apache/arrow | A toolkit for efficient data interchange and in-memory analytics in various languages. | 14,728
aksnzhy/xlearn | A high-performance machine learning package with linear models and factorization machines. | 3,087
ml-tooling/opyrator | Automates conversion of machine learning code into production-ready microservices with web API and GUI. | 3,116
paradedb/pg_analytics | Enables direct querying of data lakes from Postgres without moving data to a cloud data warehouse. | 407
mlpack/mlpack | A C++ machine learning library with bindings for multiple programming languages. | 5,151
root-project/root | A software package for analyzing and visualizing large scientific data sets. | 2,736
ericlbuehler/mistral.rs | A high-performance LLM inference framework written in Rust. | 4,677
ponyorm/pony | An object-relational mapper that allows Python developers to write database queries using Python code. | 3,665
qdrant/qdrant | A high-performance vector search engine and database for efficient similarity searches in machine learning applications. | 21,001
vaexio/vaex | A high-performance Python library for streaming and exploring large tabular datasets. | 8,315
postgresml/postgresml | An open-source Postgres extension for machine learning and AI operations directly within the database. | 6,070
infiniflow/infinity | A high-performance database designed to support the fast and efficient search of various data types in AI applications. | 2,780
ayush1997/visualize_ml | A Python package for data analysis and visualization in machine learning. | 198