Huatuo-26M

Medical QA dataset

A large-scale medical question-and-answer dataset with over 26 million high-quality pairs, designed for natural language processing and machine learning applications in the medical field.

The largest-scale Chinese medical QA dataset, with 26,000,000 question-answer pairs.

GitHub

226 stars

9 watching

22 forks

last commit: over 2 years ago

Related projects:

Repository	Description	Stars
freedomintelligence/huatuogpt	Develops a large language model for medical consultations by combining distilled and real-world data to improve doctor-patient interactions.	1,093
x-d-lab/sunsimiao	A large-scale Chinese medical language model trained on diverse data sources to provide accurate and reliable medical information.	407
suprityoung/zhongjing	Develops a large language model capable of handling complex medical conversations with high accuracy and professionalism.	324
certainlyio/corona_dataset	A collection of data to train chatbots on COVID-19-related questions.	11
xiaoman-zhang/pmc-vqa	A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions.	180
maluuba/newsqa	Compiles and provides structured access to Maluuba's NewsQA dataset for natural language question answering research.	253
xuefuzhao/instructionwild	Creates a large-scale, user-based instruction dataset for natural language processing research and development.	455
freedomintelligence/allava	A collection of datasets and models designed to support the training of lite vision-language models.	249
2020meai/tcmllm	Develops a large language model to aid in Chinese medicine diagnosis and prescription recommendations.	127
zcyang/imageqa-san	This project provides code for training image question answering models using stacked attention networks and convolutional neural networks.	108
michael-wzhu/promptcblue	A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain.	328
royorel/ffhq-aging-dataset	Provides images of human faces with annotated age, gender, pose, and other attributes for testing age transformation algorithms.	262
chatopera/insuranceqa-corpus-zh	An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks.	1,019
toyhom/chinese-medical-dialogue-data	A collection of medical dialogue data for training conversational AI models.	1,264
eperrier/qdataset	A collection of 52 machine learning datasets for simulating quantum systems with noise and controls.	99