M3Exam
Data and code for the paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"
91 stars
9 watching
12 forks
Language: Python
Last commit: over 1 year ago
Topics: ai-education, chatgpt, evaluation, gpt-4, large-language-models, llms, multilingual, multimodal