inspect_ai

Model inspector

Inspect: A framework for large language model evaluations

GitHub

615 stars
9 watching
114 forks
Language: Python
Last commit: 4 days ago
Linked from 1 awesome list
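
To give a sense of what an evaluation built with Inspect looks like, here is a minimal sketch. It assumes the inspect_ai Python package is installed and that the Task/solver/scorer API matches a recent release (older releases used a `plan=` parameter instead of `solver=`); the toy task name, the one-sample dataset, and the model string are illustrative, not taken from the repository.

```python
# Minimal sketch of an Inspect eval (assumes the inspect_ai package is
# installed; parameter names may differ between versions).
from inspect_ai import Task, task, eval
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate

@task
def addition():
    # A toy one-sample dataset; real evals would load a benchmark dataset.
    return Task(
        dataset=[Sample(input="What is 2 + 2?", target="4")],
        solver=generate(),  # ask the model for a completion
        scorer=match(),     # score by matching the target string
    )

if __name__ == "__main__":
    # The model string is an example; any provider/model Inspect supports works.
    eval(addition(), model="openai/gpt-4o")
```

The same task can also be run from the command line (for example with `inspect eval`), which is the typical workflow the framework documents.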



Related projects:

Repository | Description | Stars
declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 528
allenai/olmo-eval | An evaluation framework for large language models | 310
openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939
klen/pylama | Automates code quality checks for Python programs | 1,050
flageval-baai/flageval | An evaluation toolkit and platform for assessing large models in various domains | 300
ruixiangcui/agieval | Evaluates foundation models on human-centric tasks with diverse exams and question types | 708
ilevkivskyi/typing_inspect | Provides utilities for inspecting and analyzing Python types at runtime | 350
johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment | 501
modelscope/evalscope | A framework for efficient large model evaluation and performance benchmarking | 248
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework | 267
openlmlab/gaokao-bench | An evaluation framework using Chinese high school examination questions to assess large language model capabilities | 551
huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance | 2,034
h2oai/mli-resources | Provides tools and techniques for interpreting machine learning models | 484
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 437
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with a mixture-of-experts architecture | 36