opencompass
LLM evaluator
An LLM evaluation platform supporting various models and datasets
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) across more than 100 datasets.
4k stars
26 watching
437 forks
Language: Python
last commit: 6 days ago
Linked from 1 awesome list
Topics: benchmark, chatgpt, evaluation, large-language-model, llama2, llama3, llm, openai
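At its core, the description above is a models × datasets grid: each model runs on each dataset, and one score is recorded for every pair. The Python below is a minimal, hypothetical sketch of that idea and is not OpenCompass's API. The `Model` and `Dataset` aliases, the function names `exact_match_accuracy` and `evaluate_grid`, the exact-match metric, and the toy models are all assumptions made for illustration. Real platforms add prompt templates, inference backends, and many more metrics.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only (not OpenCompass classes):
# a "model" maps a prompt string to an answer string, and a "dataset"
# is a list of (prompt, reference_answer) pairs.
Model = Callable[[str], str]
Dataset = List[Tuple[str, str]]


def exact_match_accuracy(model: Model, dataset: Dataset) -> float:
    """Fraction of examples whose answer equals the reference after stripping whitespace."""
    if not dataset:
        return 0.0
    correct = sum(
        1 for prompt, reference in dataset
        if model(prompt).strip() == reference.strip()
    )
    return correct / len(dataset)


def evaluate_grid(
    models: Dict[str, Model], datasets: Dict[str, Dataset]
) -> Dict[str, Dict[str, float]]:
    """Score every model on every dataset; returns results[model_name][dataset_name]."""
    return {
        model_name: {
            dataset_name: exact_match_accuracy(model, dataset)
            for dataset_name, dataset in datasets.items()
        }
        for model_name, model in models.items()
    }


if __name__ == "__main__":
    # Toy stand-ins for real LLMs and benchmarks (purely illustrative).
    def echo_model(prompt: str) -> str:
        return prompt

    def arithmetic_model(prompt: str) -> str:
        # Answers "a+b" prompts; returns an empty string for anything else.
        left, sep, right = prompt.partition("+")
        if not sep:
            return ""
        return str(int(left) + int(right))

    toy_datasets = {
        "addition": [("1+1", "2"), ("2+3", "5")],
        "copy": [("hello", "hello"), ("7", "7")],
    }
    results = evaluate_grid(
        {"echo": echo_model, "arithmetic": arithmetic_model}, toy_datasets
    )
    for model_name, scores in results.items():
        print(model_name, scores)
    # prints:
    # echo {'addition': 0.0, 'copy': 1.0}
    # arithmetic {'addition': 1.0, 'copy': 0.0}
```

Because the grid is just a nested mapping, adding a model or dataset means adding one dictionary entry. This mirrors, in spirit only, how an evaluation platform scales to many models and benchmarks.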
Related projects:
Repository | Description | Stars |
---|---|---|
imoneoi/openchat | Fine-tuned language models trained on mixed-quality data | 5,260 |
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343 |
ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,334 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
traceloop/openllmetry | A collection of extensions and instrumentations for monitoring large language models (LLMs) in AI and machine learning applications. | 3,454 |
fittentech/openllama-chinese | A Chinese large language model built from OpenLLaMA and fine-tuned on various datasets for multilingual text generation. | 64 |
openlmlab/openchinesellama | An incrementally pre-trained Chinese large language model based on the LLaMA-7B model | 234 |
brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,440 |
langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 6,537 |
openbmb/toolbench | A platform for training, serving, and evaluating large language models to enable tool-use capabilities | 4,843 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 267 |
thunlp/openprompt | A flexible framework for adapting pre-trained language models to downstream NLP tasks using textual templates | 4,371 |
openai-translator/openai-translator | A multi-platform translator and text processing tool leveraging the ChatGPT API | 23,908 |