Huatuo-26M

Medical QA dataset

A large-scale Chinese medical question-and-answer dataset with 26 million high-quality question-answer pairs, designed for natural language processing and machine learning applications in the medical field.

The largest-scale Chinese medical QA dataset, with 26,000,000 question-answer pairs.
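This page does not describe how the dataset files are laid out. The following is a minimal sketch that assumes the question-answer pairs are stored as JSON Lines, one object per line, with hypothetical "question" and "answer" fields. Check the actual release for its real format and field names before relying on this. Streaming the pairs one at a time avoids loading all 26 million into memory at once.

```python
import io
import json
from typing import Iterator, TextIO, Tuple


def iter_qa_pairs(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) pairs from a JSON Lines stream.

    Assumption: each non-empty line is a JSON object with hypothetical
    "question" and "answer" keys; the real dataset files may differ.
    """
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        try:
            question, answer = record["question"], record["answer"]
        except KeyError as exc:
            # Report which line lacks a required (assumed) field.
            raise ValueError(f"line {line_no}: missing field {exc}") from None
        yield question, answer


# Usage with placeholder records; for a real file, pass
# open(path, encoding="utf-8") instead of io.StringIO.
sample = io.StringIO(
    '{"question": "question text 1", "answer": "answer text 1"}\n'
    "\n"
    '{"question": "question text 2", "answer": "answer text 2"}\n'
)
pairs = list(iter_qa_pairs(sample))
print(len(pairs))  # 2
```

Raising ValueError with the line number makes a malformed record easy to locate in a very large file, rather than failing with a bare KeyError.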

GitHub

226 stars
9 watching
22 forks
Last commit: 10 months ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| freedomintelligence/huatuogpt | Developing a large language model for medical consultations by combining distilled and real-world data to improve doctor-patient interactions. | 1,093 |
| x-d-lab/sunsimiao | A large-scale Chinese medical language model trained on diverse data sources to provide accurate and reliable medical information. | 407 |
| suprityoung/zhongjing | Develops a large language model capable of handling complex medical conversations with high accuracy and professionalism. | 324 |
| certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions. | 11 |
| xiaoman-zhang/pmc-vqa | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 180 |
| maluuba/newsqa | Compiles and provides structured access to Maluuba's NewsQA dataset for natural language question answering research. | 253 |
| xuefuzhao/instructionwild | Creating a large-scale user-based instruction dataset for natural language processing research and development. | 455 |
| freedomintelligence/allava | A collection of datasets and models designed to support the training of lite vision-language models. | 249 |
| 2020meai/tcmllm | Develops a large language model to aid in Chinese medicine diagnosis and prescription recommendations. | 127 |
| zcyang/imageqa-san | Provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 108 |
| michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain. | 328 |
| royorel/ffhq-aging-dataset | Provides images of human faces with annotated age, gender, pose, and other attributes for testing age transformation algorithms. | 262 |
| chatopera/insuranceqa-corpus-zh | An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks. | 1,019 |
| toyhom/chinese-medical-dialogue-data | A collection of medical dialogue data for training conversational AI models. | 1,264 |
| eperrier/qdataset | A collection of 52 machine learning datasets for simulating quantum systems with noise and controls. | 99 |