lm-evaluation-harness

Evaluation framework

A unified framework for few-shot evaluation of generative language models across a wide range of tasks.
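As a sketch of typical usage: recent releases of the harness install an `lm_eval` console script, and an evaluation run looks roughly like the following (the PyPI package name and exact flags may differ between versions, so check the repository's README):

```shell
# Install the harness (PyPI package assumed to be lm-eval)
pip install lm-eval

# Evaluate a Hugging Face model on HellaSwag with 5 few-shot examples
lm_eval --model hf \
  --model_args pretrained=gpt2 \
  --tasks hellaswag \
  --num_fewshot 5 \
  --batch_size 8
```

The `--tasks` flag accepts a comma-separated list, so several benchmarks can be scored in one run.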

GitHub

7k stars
39 watching
2k forks
Language: Python
last commit: 4 days ago
Linked from 4 awesome lists

Tags: evaluation-framework, language-model, transformer

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,725 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights | 7,421 |
| microsoft/promptbench | A unified framework for evaluating large language models' performance and robustness in various scenarios | 2,480 |
| microsoft/lmops | A research initiative focused on developing fundamental technology to improve the performance and efficiency of large language models | 3,726 |
| optimalscale/lmflow | A toolkit for finetuning large language models and providing efficient inference capabilities | 8,295 |
| parisneo/lollms-webui | An all-encompassing tool providing a web interface to access various AI models for tasks such as text generation, image analysis, music generation, and more | 4,368 |
| confident-ai/deepeval | A framework for evaluating large language models | 3,852 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) providing observability, analytics, and management tools | 6,875 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,477 |
| brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4 | 8,462 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance | 2,396 |
| mooler0410/llmspracticalguide | A curated list of resources to help developers navigate the landscape of large language models and their applications in NLP | 9,526 |
| openbmb/toolbench | A platform for training, serving, and evaluating large language models to enable tool-use capability | 4,854 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks | 15,103 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,416 |