LLMeBench

LLM benchmarker

A benchmarking framework for large language models

81 stars
13 watching
18 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list

benchmarking, large-language-models, llm, multilingual

Related projects:

Repository | Description | Stars
ray-project/llmperf | A tool for evaluating the performance of large language model APIs | 678
damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 93
bilibili/index-1.9b | A lightweight, multilingual language model with a long context length | 920
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 15
aifeg/benchlmm | An open-source benchmarking framework for evaluating cross-style visual capability of large multimodal models | 84
felixgithub2017/mmcu | Measures the understanding of massive multitask Chinese datasets using large language models | 87
nanbeige/nanbeige | Develops large language models for text understanding and generation tasks | 85
ailab-cvc/seed-bench | A benchmark for evaluating large language models' ability to process multimodal input | 322
xverse-ai/xverse-7b | A multilingual large language model developed by XVERSE Technology Inc. | 50
bytedance/lynx-llm | A framework for training GPT4-style language models with multimodal inputs using large datasets and pre-trained models | 231
tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 259
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73
gmftbygmftby/science-llm | A large-scale language model for the scientific domain, trained on the RedPajama arXiv split | 125
openbmb/bmlist | A curated list of large machine learning models tracked over time | 341
aiplanethub/beyondllm | An open-source toolkit for building and evaluating large language models | 267