LLMDataHub

Datasets

A curated collection of high-quality datasets for training large language models.

A quick guide to trending datasets, especially those for instruction fine-tuning

GitHub

3k stars
50 watching
169 forks
last commit: 12 months ago
Linked from 1 awesome list

chatbot, chatgpt, dataset, llm

Related projects:

Repository | Description | Stars
mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP | 9,489
mlabonne/llm-course | A comprehensive course and resource package on building and deploying Large Language Models (LLMs) | 39,120
lm-sys/fastchat | An open platform for training, serving, and evaluating large language models used in chatbots | 36,975
young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,409
rasbt/llms-from-scratch | Developing and pretraining a GPT-like Large Language Model from scratch | 32,908
bobazooba/xllm | A tool for training and fine-tuning large language models using advanced techniques | 380
instruction-tuning-with-gpt-4/gpt-4-llm | This project generates instruction-following data using GPT-4 to fine-tune large language models for real-world tasks | 4,210
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720
thunlp/plmpapers | Compiles and organizes key papers on pre-trained language models, providing a resource for developers and researchers | 3,328
phoebussi/alpaca-cot | Provides a unified interface for fine-tuning large language models with parameter-efficient methods and instruction collection data | 2,619
huggingface/alignment-handbook | Provides training recipes and resources to align language models with human preferences | 4,677
shm007g/llama-cult-and-more | Provides insights and practical guides for building and using large language models | 427
peremartra/large-language-model-notebooks-course | A practical course teaching large language models and their applications through hands-on projects using OpenAI API and Hugging Face library | 1,281
optimalscale/lmflow | A toolkit for finetuning large language models and providing efficient inference capabilities | 8,273
hiyouga/llama-factory | A unified platform for fine-tuning multiple large language models with various training approaches and methods | 34,436