lilac

Data curator

A tool to improve data quality and efficiency for large language models

Curate better data for LLMs

GitHub

969 stars
13 watching
92 forks
Language: Python
last commit: 8 months ago
artificial-intelligence, data-analysis, dataset-analysis, unstructured-data

Related projects:

Repository | Description | Stars
iterative/datachain | An AI-data warehouse that transforms and analyzes unstructured data from various formats | 1,990
mmaelicke/dtype-decorate | A library of decorators to enforce data type constraints on function attributes | 0
cdepillabout/pretty-simple | A tool to prettify Haskell data types with Show instances in an easy-to-read format | 243
msamogh/nonechucks | A library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 377
gems-uff/noworkflow | Automates the tracking of how data is produced and transformed in scientific experiments | 120
ayush1997/visualize_ml | A Python package for data analysis and visualization in machine learning | 200
basilesimon/datajournalists-toolbox | A collection of curated tools and resources for data journalists to analyze and visualize their data | 43
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets | 243
idea-fasoc/datasheet-scrubber | Automates extraction of key circuit information from PDF datasheets and documents to build a database of commercial off-the-shelf IP | 51
m3works/metloom | Provides tools and methods for collecting, managing, and analyzing meteorological data from various sources | 16
atlasoflivingaustralia/volunteer-portal | A crowdsourcing platform for digitizing biodiversity data using online volunteers | 17
moldach/datarbeautiful | Recreates data visualizations from the book 'Knowledge is Beautiful' in R | 13
kdmayer/pointer | A LiDAR-derived point cloud dataset of one million English buildings linked to energy characteristics | 13
nlgranger/seqtools | A Python library to manipulate and transform indexable data | 48
carla-simulator/data-collector | A tool for collecting and organizing data from the CARLA simulation environment | 74