Huatuo-26M
Medical QA dataset
A large-scale medical question-and-answer dataset with over 26 million high-quality pairs, designed for natural language processing and machine learning applications in the medical field.
The largest-scale Chinese medical QA dataset, with 26,000,000 question-answer pairs.
223 stars
9 watching
21 forks
last commit: 8 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
freedomintelligence/huatuogpt | Develops a large language model for medical consultations by combining distilled and real-world data to improve doctor-patient interactions | 1,076 |
x-d-lab/sunsimiao | Develops and provides a reliable Chinese medical language model based on traditional medicine knowledge | 396 |
suprityoung/zhongjing | Develops a large language model capable of handling complex medical conversations with high accuracy and professionalism. | 316 |
certainlyio/corona_dataset | A collection of data to train chatbots on COVID-19-related questions | 11 |
xiaoman-zhang/pmc-vqa | A medical visual question-answering dataset and toolkit for training models to understand medical images and instructions. | 174 |
maluuba/newsqa | Compiles and provides structured access to Maluuba's NewsQA dataset for natural language question answering research. | 253 |
xuefuzhao/instructionwild | Provides a large-scale user-based instruction dataset for natural language processing research and development | 453 |
freedomintelligence/allava | A collection of datasets and models designed to support the training of lite vision-language models. | 246 |
2020meai/tcmllm | Develops a large language model to aid in Chinese medicine diagnosis and prescription recommendations. | 118 |
zcyang/imageqa-san | Provides code for training image question answering models using stacked attention networks and convolutional neural networks. | 107 |
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain | 323 |
royorel/ffhq-aging-dataset | Provides images of human faces with annotated age, gender, pose, and other attributes for testing age transformation algorithms. | 261 |
chatopera/insuranceqa-corpus-zh | An insurance industry conversation corpus with pre-processed data for natural language processing and question answering tasks. | 1,020 |
toyhom/chinese-medical-dialogue-data | A collection of medical dialogue data for training conversational AI models. | 1,227 |
eperrier/qdataset | A collection of 52 machine learning datasets for simulating quantum systems with noise and controls. | 98 |