inspect_ai

Model inspector

A framework for evaluating large language models

Inspect: A framework for large language model evaluations
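Inspect evaluations are defined as Python tasks that bundle a dataset, a solver chain, and a scorer. The sketch below follows the hello-world pattern from the project's documentation; exact names (Task, Sample, generate, exact) reflect the library's Python API in recent releases and should be checked against the version in use.

    from inspect_ai import Task, task
    from inspect_ai.dataset import Sample
    from inspect_ai.scorer import exact
    from inspect_ai.solver import generate

    # A task bundles a dataset, a solver (here: a single generate step),
    # and a scorer that grades the model output against the target.
    @task
    def hello_world():
        return Task(
            dataset=[Sample(input="Just reply with: Hello World", target="Hello World")],
            solver=[generate()],
            scorer=exact(),
        )

Tasks are typically run from the command line, for example: inspect eval hello_world.py --model openai/gpt-4o (the model provider and name here are placeholders).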

GitHub

641 stars
9 watching
124 forks
Language: Python
Last commit: about 19 hours ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 532
allenai/olmo-eval | An evaluation framework for large language models | 316
openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques | 2,000
klen/pylama | Automates code quality checks for Python programs | 1,047
flageval-baai/flageval | An evaluation toolkit and platform for assessing large models in various domains | 306
ruixiangcui/agieval | Evaluates foundation models on human-centric tasks with diverse exams and question types | 712
ilevkivskyi/typing_inspect | Provides utilities for inspecting and analyzing Python types at runtime | 350
johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment | 506
modelscope/evalscope | A framework for efficiently evaluating and benchmarking large models in various domains | 285
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework | 270
openlmlab/gaokao-bench | An evaluation framework using Chinese high school examination questions to assess large language model capabilities | 562
huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance | 2,042
h2oai/mli-resources | Provides tools and techniques for interpreting machine learning models | 483
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 438
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with a mixture-of-experts architecture | 36