opencompass

LLM evaluator

An LLM evaluation platform supporting various models and datasets

OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.

4k stars
26 watching
437 forks
Language: Python
last commit: 6 days ago
Linked from 1 awesome list

benchmark, chatgpt, evaluation, large-language-model, llama2, llama3, llm, openai

Related projects:

Repository | Description | Stars
imoneoi/openchat | Fine-tuned language models trained on mixed-quality data | 5,260
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970
open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets. | 1,343
ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,334
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720
traceloop/openllmetry | A collection of extensions and instrumentations for monitoring large language models (LLMs) in AI and machine learning applications. | 3,454
fittentech/openllama-chinese | A Chinese large language model built from OpenLLaMA and fine-tuned on various datasets for multilingual text generation. | 64
openlmlab/openchinesellama | An incremental pre-trained Chinese large language model based on the LLaMA-7B model | 234
brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,440
langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 6,537
openbmb/toolbench | A platform for training, serving, and evaluating large language models to enable tool-use capability | 4,843
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework. | 267
thunlp/openprompt | A flexible framework for adapting pre-trained language models to downstream NLP tasks using textual templates | 4,371
openai-translator/openai-translator | A multi-platform translator and text-processing tool leveraging the ChatGPT API | 23,908