inspect_ai
Model inspector
A framework for evaluating large language models
Inspect: A framework for large language model evaluations
641 stars
9 watching
124 forks
Language: Python
Last commit: about 19 hours ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
declare-lab/instruct-eval | An evaluation framework for instruction-tuned large language models | 532 |
allenai/olmo-eval | An evaluation framework for large language models. | 316 |
openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques. | 2,000 |
klen/pylama | Automates code quality checks for Python programs | 1,047 |
flageval-baai/flageval | An evaluation toolkit and platform for assessing large models in various domains | 306 |
ruixiangcui/agieval | Evaluates foundation models on human-centric tasks with diverse exams and question types | 712 |
ilevkivskyi/typing_inspect | Provides utilities for inspecting and analyzing Python types at runtime | 350 |
johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment. | 506 |
modelscope/evalscope | A framework for efficiently evaluating and benchmarking large models in various domains | 285 |
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 270 |
openlmlab/gaokao-bench | An evaluation framework using Chinese high school examination questions to assess large language model capabilities | 562 |
huggingface/evaluate | An evaluation framework for machine learning models and datasets, providing standardized metrics and tools for comparing model performance. | 2,042 |
h2oai/mli-resources | Provides tools and techniques for interpreting machine learning models | 483 |
flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 438 |
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with a mixture-of-experts architecture. | 36 |