inspect_ai

Model inspector

A framework for evaluating large language models

Inspect: A framework for large language model evaluations

669 stars

9 watching

135 forks

Language: Python

Last commit: about 1 year ago

Linked from 1 awesome list

Screenshot of UKGovernmentBEIS/inspect_ai website

inspect.ai-safety-institute.org.uk/

Backlinks from these awesome lists:

ethicalml/awesome-production-machine-learning

Related projects:

Repository	Description	Stars
declare-lab/instruct-eval	An evaluation framework for large language models trained with instruction tuning methods	535
allenai/olmo-eval	A framework for evaluating language models on NLP tasks	326
openai/simple-evals	Evaluates language models using standardized benchmarks and prompting techniques	2,059
klen/pylama	Automates code quality checks for Python programs	1,049
flageval-baai/flageval	An evaluation toolkit and platform for assessing large models in various domains	307
ruixiangcui/agieval	Evaluates foundation models on human-centric tasks with diverse exams and question types	714
ilevkivskyi/typing_inspect	Provides utilities for inspecting and analyzing Python types at runtime	352
johnsnowlabs/langtest	A tool for testing and evaluating large language models with a focus on AI safety and model assessment	506
modelscope/evalscope	A framework for efficiently evaluating and benchmarking large models	308
open-compass/lawbench	Evaluates the legal knowledge of large language models using a custom benchmarking framework	273
openlmlab/gaokao-bench	An evaluation framework using Chinese high school examination questions to assess large language model capabilities	565
huggingface/evaluate	An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance	2,063
h2oai/mli-resources	Provides tools and techniques for interpreting machine learning models	483
flagai-open/aquila2	Provides pre-trained language models and tools for fine-tuning and evaluation	439
xverse-ai/xverse-moe-a36b	Develops and publishes large multilingual language models with an advanced mixture-of-experts architecture	37