AGIEval
Exam evaluator
Evaluates foundation models on human-centric tasks with diverse exams and question types
714 stars
9 watching
48 forks
Language: Python
last commit: 6 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
princeton-nlp/charxiv | An evaluation suite for assessing chart understanding in multimodal large language models | 85 |
allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326 |
krrishdholakia/betterprompt | An API for evaluating the quality of text prompts used in large language models (LLMs), based on perplexity estimation | 43 |
hkust-nlp/ceval | An evaluation suite of multiple-choice questions across various disciplines for foundation models, with tools for assessing model performance | 1,650 |
emrekavur/chaos-evaluation | Evaluates segmentation performance in medical imaging using multiple metrics | 57 |
cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,779 |
tatsu-lab/alpaca_eval | An automatic evaluation tool for large language models | 1,568 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33 |
langchain-ai/auto-evaluator | Automated evaluation of language models on question-answering tasks | 749 |
1024pix/pix-editor | An online platform for evaluating and certifying digital skills | 6 |
mshukor/evalign-icl | Evaluating and improving large multimodal models through in-context learning | 21 |
mfaruqui/eval-word-vectors | A set of Python scripts for evaluating word vectors on various tasks and comparing similarity between words | 120 |
eddieantonio/ocreval | A collection of tools and utilities for evaluating the performance and quality of OCR output | 57 |
maja42/goval | A Go library for evaluating arbitrary arithmetic, string, and logic expressions, with support for variables and custom functions | 160 |