langtest

Model Tester

A tool for testing and evaluating large language models with a focus on AI safety and model assessment.

Deliver safe & effective language models

GitHub

501 stars
10 watching
40 forks
Language: Python
Last commit: 9 days ago
Topics: ai-safety, ai-testing, artificial-intelligence, benchmark-framework, benchmarks, ethics-in-ai, large-language-models, llm, llm-as-evaluator, llm-evaluation-toolkit, llm-test, llm-testing, ml-safety, ml-testing, mlops, model-assessment, nlp, responsible-ai, trustworthy-ai

Related projects:

| Repository | Description | Stars |
|---|---|---|
| howiehwong/trustllm | A toolkit for assessing trustworthiness in large language models | 466 |
| aiplanethub/beyondllm | An open-source toolkit for building and evaluating large language models | 261 |
| declare-lab/instruct-eval | An evaluation framework for large language models trained with instruction tuning methods | 528 |
| neulab/explainaboard | An interactive tool to analyze and compare the performance of natural language processing models | 361 |
| vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
| comet-ml/opik | An end-to-end platform for evaluating and testing large language models | 2,121 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| innogames/ltc | A tool for managing load tests and analyzing performance results | 198 |
| qcri/llmebench | A benchmarking framework for large language models | 80 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,347 |
| openlmlab/gaokao-bench | An evaluation framework using Chinese high school examination questions to assess large language model capabilities | 551 |
| flagai-open/aquila2 | Provides pre-trained language models and tools for fine-tuning and evaluation | 437 |
| bilibili/index-1.9b | A lightweight, multilingual language model with a long context length | 904 |
| 01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,699 |
| ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 315 |