LLMDataHub

Datasets

A curated collection of high-quality datasets for training large language models.

A quick guide to trending datasets, especially those for instruction fine-tuning

3k stars

50 watching

174 forks

last commit: over 2 years ago

Linked from 1 awesome list

chatbot, chatgpt, dataset, llm

Backlinks from these awesome lists:

hannibal046/awesome-llm

Related projects:

Repository	Description	Stars
mooler0410/llmspracticalguide	A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP	9,551
mlabonne/llm-course	A comprehensive course and resource package on building and deploying Large Language Models (LLMs)	40,053
lm-sys/fastchat	An open platform for training, serving, and evaluating large language models used in chatbots.	37,269
young-geng/easylm	A framework for training and serving large language models using JAX/Flax	2,428
rasbt/llms-from-scratch	Developing and pretraining a GPT-like Large Language Model from scratch	35,405
bobazooba/xllm	A tool for training and fine-tuning large language models using advanced techniques	387
instruction-tuning-with-gpt-4/gpt-4-llm	This project generates instruction-following data using GPT-4 to fine-tune large language models for real-world tasks.	4,244
alpha-vllm/llama2-accessory	An open-source toolkit for pretraining and fine-tuning large language models	2,732
thunlp/plmpapers	Compiles and organizes key papers on pre-trained language models, providing a resource for developers and researchers.	3,331
phoebussi/alpaca-cot	Provides a unified interface for fine-tuning large language models with parameter-efficient methods and instruction collection data	2,640
huggingface/alignment-handbook	Provides recipes and guidelines for training language models to align with human preferences and AI goals	4,800
shm007g/llama-cult-and-more	Provides insights and practical guides for building and using large language models.	427
peremartra/large-language-model-notebooks-course	A practical course teaching large language models and their applications through hands-on projects using OpenAI API and Hugging Face library.	1,338
optimalscale/lmflow	A toolkit for fine-tuning and running inference on large machine learning models	8,312
hiyouga/llama-factory	A tool for efficiently fine-tuning large language models across multiple architectures and methods.	36,219