langtest
Model Tester
A tool for testing and evaluating large language models with a focus on AI safety and model assessment.
Deliver safe & effective language models
506 stars
10 watching
41 forks
Language: Python
last commit: 3 months ago
Topics: ai-safety, ai-testing, artificial-intelligence, benchmark-framework, benchmarks, ethics-in-ai, large-language-models, llm, llm-as-evaluator, llm-evaluation-toolkit, llm-test, llm-testing, ml-safety, ml-testing, mlops, model-assessment, nlp, responsible-ai, trustworthy-ai
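For orientation, here is a minimal sketch of how a test harness like langtest is typically driven, based on the Harness entry point described in its documentation. The model name and hub below are illustrative assumptions, not guaranteed values.

```python
# Minimal sketch of running an LLM test suite with langtest.
# Assumes the Harness API from langtest's documentation; the model
# and hub names are placeholders you would swap for your own.
from langtest import Harness

# Configure a harness for a text-classification model hosted on Hugging Face.
harness = Harness(
    task="text-classification",
    model={"model": "lvwerra/distilbert-imdb", "hub": "huggingface"},
)

# Generate test cases, run them against the model, and print a pass/fail report.
harness.generate().run().report()
```

The report summarizes pass rates per test category (e.g., robustness or bias), which is how the tool supports the safety and model-assessment focus described above.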
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A toolkit for assessing trustworthiness in large language models | 491 |
| | An open-source toolkit for building and evaluating large language models | 267 |
| | An evaluation framework for large language models trained with instruction-tuning methods | 535 |
| | An interactive tool to analyze and compare the performance of natural language processing models | 362 |
| | A guide to using pre-trained large language models in source code analysis and generation | 1,789 |
| | A platform for evaluating and testing large language models (LLMs) during development and production | 2,588 |
| | Evaluates and compares the performance of multimodal large language models on various tasks | 56 |
| | A tool for managing load tests and analyzing performance results | 200 |
| | A benchmarking framework for large language models | 81 |
| | A toolset for evaluating and comparing natural language generation models | 1,350 |
| | An evaluation framework using Chinese high school examination questions to assess large language model capabilities | 565 |
| | Provides pre-trained language models and tools for fine-tuning and evaluation | 439 |
| | A lightweight, multilingual language model with a long context length | 920 |
| | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,743 |
| | A benchmark for evaluating large language models' ability to process multimodal input | 322 |