lilac

Data curator

A tool to improve data quality and efficiency for large language models

Curate better data for LLMs

GitHub

978 stars
14 watching
91 forks
Language: Python
Last commit: 9 months ago
Topics: artificial-intelligence, data-analysis, dataset-analysis, unstructured-data

Related projects:

Repository | Description | Stars
iterative/datachain | An AI-data warehouse solution that enables efficient processing and analysis of unstructured data from various sources | 2,029
mmaelicke/dtype-decorate | A library of decorators to enforce data type constraints on function attributes | 0
cdepillabout/pretty-simple | A tool to prettify Haskell data types with Show instances in an easy-to-read format | 243
msamogh/nonechucks | A library that provides dynamic data cleaning and filtering capabilities for PyTorch datasets and samplers | 378
gems-uff/noworkflow | Automates the tracking of how data is produced and transformed in scientific experiments | 121
ayush1997/visualize_ml | A Python package for data analysis and visualization in machine learning | 199
basilesimon/datajournalists-toolbox | A collection of curated tools and resources for data journalists to analyze and visualize their data | 43
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets | 243
idea-fasoc/datasheet-scrubber | Automates extraction of key circuit information from PDF datasheets/documents to build a database of commercial off-the-shelf IP | 51
m3works/metloom | Provides tools and methods for collecting, managing, and analyzing meteorological data from various sources | 16
atlasoflivingaustralia/volunteer-portal | A crowdsourcing platform for digitizing biodiversity data using online volunteers | 17
moldach/datarbeautiful | Recreates data visualizations from the book 'Knowledge is Beautiful' in R | 13
kdmayer/pointer | A LiDAR-derived point cloud dataset of one million English buildings linked to energy characteristics | 13
nlgranger/seqtools | A Python library to manipulate and transform indexable data | 48
carla-simulator/data-collector | A tool for collecting and organizing data from the CARLA simulation environment | 74