lilac
Data curator
A tool to improve data quality and efficiency for large language models
Curate better data for LLMs
969 stars
13 watching
92 forks
Language: Python
last commit: 8 months ago
artificial-intelligence, data-analysis, dataset-analysis, unstructured-data
Related projects:
Repository | Description | Stars |
---|---|---|
iterative/datachain | An AI-data warehouse that transforms and analyzes unstructured data from various formats | 1,990 |
mmaelicke/dtype-decorate | A library of decorators to enforce data type constraints on function attributes | 0 |
cdepillabout/pretty-simple | A tool to prettify Haskell data types with Show instances in an easy-to-read format | 243 |
msamogh/nonechucks | Library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 377 |
gems-uff/noworkflow | Automates the tracking of how data is produced and transformed in scientific experiments | 120 |
ayush1997/visualize_ml | A Python package for data analysis and visualization in machine learning | 200 |
basilesimon/datajournalists-toolbox | A collection of curated tools and resources for data journalists to analyze and visualize their data | 43 |
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets | 243 |
idea-fasoc/datasheet-scrubber | Automates extraction of key circuit information from PDF datasheets/documents to build a database of commercial off-the-shelf IP | 51 |
m3works/metloom | Provides tools and methods for collecting, managing, and analyzing meteorological data from various sources | 16 |
atlasoflivingaustralia/volunteer-portal | A crowdsourcing platform for digitizing biodiversity data using online volunteers | 17 |
moldach/datarbeautiful | Recreating data visualizations from the book 'Knowledge is Beautiful' in R | 13 |
kdmayer/pointer | A LiDAR-derived point cloud dataset of one million English buildings linked to energy characteristics | 13 |
nlgranger/seqtools | A Python library to manipulate and transform indexable data | 48 |
carla-simulator/data-collector | A tool for collecting and organizing data from the CARLA simulation environment | 74 |