datachain
Data Warehouse Library
A Python-based framework for transforming and analyzing unstructured data in formats such as images, audio, video, text, and PDFs.
ETL, Analytics, Versioning for Unstructured Data
2k stars
15 watching
94 forks
Language: Python
Last commit: about 1 month ago
Topics: ai, cv, data-analytics, data-wrangling, embeddings, llm, llm-eval, machine-learning, mlops, multimodal
Related projects:
Repository | Description | Stars |
---|---|---|
jaimegildesagredo/booby | A Python library for defining and validating data structures with built-in support for complex data models and relationships. | 176 |
dotchain/dotjs | A distributed, reactive, and functional data structure library for JavaScript | 8 |
databricks/lilac | A tool to improve data quality and efficiency for large language models | 987 |
datamol-io/datamol | A Python library for manipulating molecules | 476 |
h2oai/datatable | A Python package for manipulating 2-dimensional tabular data structures with an emphasis on speed and big data support. | 1,821 |
dataoneorg/d1_python | A collection of Python libraries and tools for interacting with DataONE repositories | 17 |
f483/btctxstore | A library to store and retrieve data in Bitcoin transactions using OP_RETURN nulldata outputs. | 10 |
hellokaton/anima | A minimal Java library for simple database operations with a focus on ease of use and support for multiple databases and relational mappings. | 228 |
accelerationnet/data-table | Provides a data structure to represent tabular data in Common Lisp, enabling easy interaction with databases and report generation. | 22 |
ujjwalkarn/datasciencepython | A curated list of tutorials and resources for learning Python for data science, machine learning, and other related topics. | 5,301 |
whitaker-io/machine | A library for creating data workflows that can be simple or complex, with features like recursion and memoization. | 159 |
sabiwara/aja | An Elixir standard library extension focused on efficient data structures and manipulation | 213 |
indy256/codelibrary | A comprehensive collection of algorithms and data structures implemented in multiple programming languages | 1,944 |
tiledb-inc/tiledb-py | Provides a Python interface to store and manage large datasets in a distributed, columnar storage system. | 190 |
scalamolecule/molecule | A library for defining and querying complex domain models using a high-level, declarative data access API | 19 |