AGIEval

Exam evaluator

Evaluates foundation models on human-centric tasks drawn from diverse exams and question types.

714 stars

9 watching

48 forks

Language: Python

Last commit: over 1 year ago

Related projects:

Repository	Description	Stars
openai/simple-evals	Evaluates language models using standardized benchmarks and prompting techniques.	2,059
princeton-nlp/charxiv	An evaluation suite for assessing chart understanding in multimodal large language models.	85
allenai/olmo-eval	A framework for evaluating language models on NLP tasks.	326
krrishdholakia/betterprompt	An API for evaluating the quality of text prompts used with large language models (LLMs), based on perplexity estimation.	43
hkust-nlp/ceval	An evaluation suite providing multiple-choice questions for foundation models in various disciplines, with tools for assessing model performance.	1,650
emrekavur/chaos-evaluation	Evaluates segmentation performance in medical imaging using multiple metrics.	57
cloud-cv/evalai	A platform for comparing and evaluating AI and machine learning algorithms at scale.	1,779
tatsu-lab/alpaca_eval	An automatic evaluation tool for large language models.	1,568
zhourax/vega	Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs.	33
langchain-ai/auto-evaluator	Automated evaluation of language models for question answering tasks.	749
1024pix/pix-editor	An online platform offering innovative evaluation and certification of digital skills.	6
mshukor/evalign-icl	Evaluating and improving large multimodal models through in-context learning.	21
mfaruqui/eval-word-vectors	A set of Python scripts for evaluating word vectors on various tasks and comparing similarity between words.	120
eddieantonio/ocreval	A collection of tools and utilities for evaluating the performance and quality of OCR output.	57
maja42/goval	A Go library for evaluating arbitrary arithmetic, string, and logic expressions with support for variables and custom functions.	160