lilac
Data curator
A tool to improve data quality and efficiency for large language models
Curate better data for LLMs
987 stars
14 watching
93 forks
Language: Python
last commit: 12 months ago
Topics: artificial-intelligence, data-analysis, dataset-analysis, unstructured-data
Related projects:
Repository | Description | Stars |
---|---|---|
| A Python-based framework for transforming and analyzing unstructured data from various formats like images, audio, videos, text, and PDFs. | 2,088 |
| A library of decorators to enforce data type constraints on function attributes | 0 |
| A tool to prettify Haskell data types with Show instances in an easy-to-read format | 243 |
| Library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 378 |
| Automates the tracking of how data is produced and transformed in scientific experiments. | 122 |
| A Python package for data analysis and visualization in machine learning | 198 |
| A collection of curated tools and resources for data journalists to analyze and visualize their data | 43 |
| A tool that simplifies the process of preparing and manipulating natural language processing datasets | 243 |
| Automates extraction of key circuit information from PDF datasheets/documents to build a database of commercial off-the-shelf IP. | 51 |
| Provides tools and methods for collecting, managing, and analyzing meteorological data from various sources | 16 |
| A crowdsourcing platform for digitizing biodiversity data using online volunteers | 17 |
| Recreating data visualizations from the book 'Knowledge is Beautiful' in R | 13 |
| A LiDAR-derived point cloud dataset of one million English buildings linked to energy characteristics | 13 |
| A Python library to manipulate and transform indexable data | 49 |
| A tool for collecting and organizing data from the CARLA simulation environment. | 74 |