inspect_ai
Model inspector
A framework for evaluating large language models
Inspect: A framework for large language model evaluations
615 stars
9 watching
114 forks
Language: Python
Last commit: 4 days ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 528 |
allenai/olmo-eval | An evaluation framework for large language models. | 310 |
openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests. | 1,939 |
klen/pylama | Automates code quality checks for Python programs | 1,050 |
flageval-baai/flageval | An evaluation toolkit and platform for assessing large models in various domains | 300 |
ruixiangcui/agieval | Evaluates foundation models on human-centric tasks with diverse exams and question types | 708 |
ilevkivskyi/typing_inspect | Provides utilities for inspecting and analyzing Python types at runtime | 350 |
johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment. | 501 |
modelscope/evalscope | A framework for efficient large model evaluation and performance benchmarking. | 248 |
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 267 |
openlmlab/gaokao-bench | An evaluation framework using Chinese high school examination questions to assess large language model capabilities | 551 |
huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance. | 2,034 |
h2oai/mli-resources | Provides tools and techniques for interpreting machine learning models | 484 |
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 437 |
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with an advanced mixture-of-experts architecture. | 36 |