lm-evaluation-harness
Evaluation framework
A unified framework for few-shot evaluation of generative language models on a wide range of benchmark tasks.
7k stars
38 watching
2k forks
Language: Python
Last commit: about 1 month ago
Linked from 4 awesome lists
Topics: evaluation-framework, language-model, transformer
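As a quick illustration of what "few-shot evaluation" means in practice, here is a minimal sketch using the harness's Python API (`lm_eval.simple_evaluate`, available in v0.4+); the model checkpoint and task names are illustrative placeholders, not taken from this list.

```python
# Minimal sketch, assuming lm-evaluation-harness v0.4+ (pip install lm-eval).
# The model checkpoint and task names below are placeholder assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF causal LM checkpoint
    tasks=["hellaswag", "arc_easy"],                 # registered benchmark task names
    num_fewshot=5,                                   # in-context examples per prompt
    batch_size=8,
)

# Per-task metrics (e.g. accuracy) are keyed under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```

The same run can also be expressed on the command line (e.g. `lm_eval --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks hellaswag,arc_easy --num_fewshot 5`), which is the entry point the project's own documentation favors.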
Related projects:
| Repository | Description | Stars |
|---|---|---|
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,732 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing large language model applications with objective metrics, test-data generation, and seamless integrations. | 7,598 |
| microsoft/promptbench | A unified framework for evaluating the performance and robustness of large language models across a variety of scenarios. | 2,487 |
| microsoft/lmops | A research initiative focused on fundamental technology for improving the performance and efficiency of large language models. | 3,747 |
| optimalscale/lmflow | A toolkit for fine-tuning and inference of large machine learning models. | 8,312 |
| parisneo/lollms-webui | An all-encompassing tool providing a web interface to access various AI models for tasks such as text generation, image analysis, and music generation. | 4,394 |
| confident-ai/deepeval | A framework for evaluating large language models. | 4,003 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) providing observability, analytics, and management tools. | 7,123 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions. | 20,683 |
| brexhq/prompt-engineering | Guides software developers on effectively using and building systems around large language models such as GPT-4. | 8,487 |
| ianarawjo/chainforge | An environment for battle-testing prompts to large language models (LLMs) and evaluating response quality and performance. | 2,413 |
| mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP. | 9,551 |
| openbmb/toolbench | A platform for training, serving, and evaluating large language models with tool-use capability. | 4,888 |
| openai/evals | A framework for evaluating large language models and systems, with a registry of benchmarks. | 15,168 |
| young-geng/easylm | A framework for training and serving large language models with JAX/Flax. | 2,428 |