opencompass
LLM evaluator
An LLM evaluation platform supporting various models and datasets
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
4k stars
26 watching
457 forks
Language: Python
Last commit: about 24 hours ago
Linked from 1 awesome list
benchmark, chatgpt, evaluation, large-language-model, llama2, llama3, llm, openai
Related projects:
Repository | Description | Stars |
---|---|---|
imoneoi/openchat | Fine-tuned language models trained on mixed-quality data | 5,273 |
eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on various evaluation tasks | 7,200 |
open-compass/vlmevalkit | An evaluation toolkit for large vision-language models | 1,514 |
ianarawjo/chainforge | An environment for battle-testing prompts to large language models (LLMs) to evaluate response quality and performance | 2,413 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
traceloop/openllmetry | A set of extensions built on top of OpenTelemetry to provide observability for large language model applications | 5,188 |
fittentech/openllama-chinese | A Chinese-language large language model built from OpenLLaMA and fine-tuned on various datasets for multilingual text generation | 65 |
openlmlab/openchinesellama | An incrementally pre-trained Chinese large language model based on the LLaMA-7B model | 234 |
brexhq/prompt-engineering | A guide for software developers on how to effectively use and build systems around large language models like GPT-4 | 8,487 |
langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools | 7,123 |
openbmb/toolbench | A platform for training, serving, and evaluating large language models to enable tool-use capability | 4,888 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework | 273 |
thunlp/openprompt | A flexible framework for adapting pre-trained language models to downstream NLP tasks using textual templates | 4,394 |
openai-translator/openai-translator | A multi-platform translator and text processing tool leveraging the ChatGPT API | 24,004 |