Huatuo-26M

Medical QA dataset

A large-scale medical question-and-answer dataset with over 26 million high-quality pairs, designed for natural language processing and machine learning applications in the medical field.

The largest-scale Chinese medical QA dataset, with 26,000,000 question-answer pairs.
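For NLP or machine-learning use, QA pairs like these are commonly stored one record per line. The sketch below reads such records into (question, answer) tuples. It assumes a JSON Lines layout with hypothetical "question" and "answer" fields. This page does not describe the actual release format, so rename the keys to match the files you download.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_qa_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) tuples from JSON Lines text.

    ASSUMPTION: each non-blank line is a JSON object with hypothetical
    "question" and "answer" string fields; this is not a documented
    Huatuo-26M schema.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        yield record["question"], record["answer"]


if __name__ == "__main__":
    # In-memory stand-in for a file opened with encoding="utf-8";
    # the strings are placeholders, not real dataset content.
    sample = [
        '{"question": "示例问题", "answer": "示例回答"}',
        "",
    ]
    for question, answer in iter_qa_pairs(sample):
        print(question, "->", answer)
```

Because the function accepts any iterable of strings, an open file handle can be passed directly. Records are then processed lazily instead of loading all 26 million pairs into memory.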

GitHub

223 stars
9 watching
21 forks
last commit: 8 months ago

Related projects:

Repository | Description | Stars
freedomintelligence/huatuogpt | Developing a large language model for medical consultations by combining distilled and real-world data to improve doctor-patient interactions. | 1,076
x-d-lab/sunsimiao | Develops and provides a reliable Chinese medical language model based on traditional medicine knowledge. | 396
suprityoung/zhongjing | Develops a large language model capable of handling complex medical conversations with high accuracy and professionalism. | 316
certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions. | 11
xiaoman-zhang/pmc-vqa | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 174
maluuba/newsqa | Compiles and provides structured access to Maluuba's NewsQA dataset for natural language question answering research. | 253
xuefuzhao/instructionwild | Creating a large-scale user-based instruction dataset for natural language processing research and development. | 453
freedomintelligence/allava | A collection of datasets and models designed to support the training of lite vision-language models. | 246
2020meai/tcmllm | Develops a large language model to aid in Chinese medicine diagnosis and prescription recommendations. | 118
zcyang/imageqa-san | This project provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 107
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain. | 323
royorel/ffhq-aging-dataset | Provides images of human faces with annotated age, gender, pose, and other attributes for testing age transformation algorithms. | 261
chatopera/insuranceqa-corpus-zh | An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks. | 1,020
toyhom/chinese-medical-dialogue-data | A collection of medical dialogue data for training conversational AI models. | 1,227
eperrier/qdataset | A collection of 52 machine learning datasets for simulating quantum systems with noise and controls. | 98