datachain

Data Warehouse Library

A Python framework for transforming and analyzing unstructured data such as images, audio, video, text, and PDFs.

ETL, Analytics, Versioning for Unstructured Data

GitHub

2k stars
15 watching
94 forks
Language: Python
last commit: about 1 month ago
ai, cv, data-analytics, data-wrangling, embeddings, llm, llm-eval, machine-learning, mlops, multimodal

Related projects:

Repository | Description | Stars
jaimegildesagredo/booby | A Python library for defining and validating data structures with built-in support for complex data models and relationships. | 176
dotchain/dotjs | A distributed, reactive, and functional data structure library for JavaScript | 8
databricks/lilac | A tool to improve data quality and efficiency for large language models | 987
datamol-io/datamol | A Python library for manipulating molecules | 476
h2oai/datatable | A Python package for manipulating 2-dimensional tabular data structures with an emphasis on speed and big data support. | 1,821
dataoneorg/d1_python | A collection of Python libraries and tools for interacting with DataONE repositories | 17
f483/btctxstore | A library to store and retrieve data in Bitcoin transactions using OP_RETURN nulldata outputs. | 10
hellokaton/anima | A minimal Java library for simple database operations with a focus on ease of use and support for multiple databases and relational mappings. | 228
accelerationnet/data-table | Provides a data structure to represent tabular data in Common Lisp, enabling easy interaction with databases and report generation. | 22
ujjwalkarn/datasciencepython | A curated list of tutorials and resources for learning Python for data science, machine learning, and other related topics. | 5,301
whitaker-io/machine | A library for creating data workflows that can be simple or complex, with features like recursion and memoization. | 159
sabiwara/aja | An Elixir standard library extension focused on efficient data structures and manipulation | 213
indy256/codelibrary | A comprehensive collection of algorithms and data structures implemented in multiple programming languages | 1,944
tiledb-inc/tiledb-py | Provides a Python interface to store and manage large datasets in a distributed, columnar storage system. | 190
scalamolecule/molecule | A library for defining and querying complex domain models using a high-level, declarative data access API | 19