opencompass

LLM evaluator

An LLM evaluation platform supporting various models and datasets

OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

GitHub

4k stars
26 watching
457 forks
Language: Python
Last commit: about 24 hours ago
Linked from 1 awesome list

benchmark, chatgpt, evaluation, large-language-model, llama2, llama3, llm, openai

Related projects:

Repository | Description | Stars
imoneoi/openchat | Fine-tuned language models trained on mixed-quality data | 5,273
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 7,200
open-compass/vlmevalkit | An evaluation toolkit for large vision-language models | 1,514
ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,413
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732
traceloop/openllmetry | A set of extensions built on top of OpenTelemetry to provide observability for large language model applications. | 5,188
fittentech/openllama-chinese | A Chinese language large language model built from OpenLLaMA and fine-tuned on various datasets for multilingual text generation. | 65
openlmlab/openchinesellama | An incremental pre-trained Chinese large language model based on the LLaMA-7B model | 234
brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,487
langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 7,123
openbmb/toolbench | A platform for training, serving, and evaluating large language models to enable tool use capability | 4,888
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 273
thunlp/openprompt | A flexible framework for adapting pre-trained language models to downstream NLP tasks using textual templates | 4,394
openai-translator/openai-translator | A multi-platform translator and text processing tool leveraging ChatGPT API | 24,004