AGIEval

Exam evaluator

Evaluates foundation models on human-centric tasks with diverse exams and question types

GitHub

714 stars
9 watching
48 forks
Language: Python
Last commit: 6 months ago
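
As a rough illustration of the kind of exam-style evaluation described above, the sketch below scores a model's accuracy on multiple-choice questions. It is not AGIEval's actual API or data format: the JSONL layout, the `ask_model` placeholder, and the file name are all assumptions made for illustration.

```python
# Minimal sketch of exam-style multiple-choice evaluation.
# NOTE: illustrative only; this does not use AGIEval's real data format
# or entry points. `ask_model` is a placeholder for whatever model or
# API endpoint is being evaluated.

import json


def ask_model(prompt: str) -> str:
    """Placeholder: return the model's chosen option letter, e.g. 'B'."""
    raise NotImplementedError


def evaluate(path: str) -> float:
    """Compute accuracy over a JSONL file of multiple-choice questions.

    Each line is assumed (hypothetically) to look like:
    {"question": "...", "options": ["(A) ...", "(B) ..."], "answer": "A"}
    """
    correct = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            prompt = (
                item["question"]
                + "\n"
                + "\n".join(item["options"])
                + "\nAnswer with the option letter."
            )
            # Compare the first letter of the model's reply to the gold answer.
            prediction = ask_model(prompt).strip().upper()[:1]
            correct += prediction == item["answer"].strip().upper()
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    print(f"accuracy: {evaluate('exam_questions.jsonl'):.3f}")
```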

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
| princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models | 85 |
| allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326 |
| krrishdholakia/betterprompt | An API for evaluating the quality of text prompts used in large language models (LLMs) based on perplexity estimation | 43 |
| hkust-nlp/ceval | An evaluation suite providing multiple-choice questions for foundation models in various disciplines, with tools for assessing model performance | 1,650 |
| emrekavur/chaos-evaluation | Evaluates segmentation performance in medical imaging using multiple metrics | 57 |
| cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,779 |
| tatsu-lab/alpaca_eval | An automatic evaluation tool for large language models | 1,568 |
| zhourax/vega | A multimodal task and dataset for assessing vision-language models' ability to handle interleaved image-text inputs | 33 |
| langchain-ai/auto-evaluator | Automated evaluation of language models on question-answering tasks | 749 |
| 1024pix/pix-editor | An online platform for evaluating and certifying digital skills | 6 |
| mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 21 |
| mfaruqui/eval-word-vectors | A set of Python scripts for evaluating word vectors on various tasks and comparing similarity between words | 120 |
| eddieantonio/ocreval | A collection of tools and utilities for evaluating the performance and quality of OCR output | 57 |
| maja42/goval | A Go library for evaluating arbitrary arithmetic, string, and logic expressions, with support for variables and custom functions | 160 |