promptbench
Model evaluator
A unified framework for evaluating the performance and robustness of large language models across a variety of scenarios.
2k stars
20 watching
184 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list
Tags: adversarial-attacks, benchmark, chatgpt, evaluation, large-language-models, prompt, prompt-engineering, robustness
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| A utility library for crafting prompts to help Large Language Models generate specific outputs | 2,602 |
| A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks. | 2,718 |
| Provides a unified framework to test generative language models on various evaluation tasks. | 7,200 |
| Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,487 |
| A curated list of papers on prompt-based tuning for pre-trained language models, providing insights and advancements in the field. | 4,112 |
| A framework for analyzing the robustness of large language models against adversarial prompt attacks. | 318 |
| A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks. | 2,899 |
| A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure. | 4,976 |
| A set of tools for testing and evaluating natural language processing models and vector databases. | 2,731 |
| A platform for training, serving, and evaluating large language models, with support for tool-use capabilities. | 4,888 |
| A comprehensive resource for guiding the development and optimization of prompts to use language models effectively in various applications. | 51,082 |
| A tool that uses large language models to extract structured information from unstructured text. | 3,327 |
| A toolkit for fine-tuning and running inference with large machine learning models. | 8,312 |
| An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,413 |
| A Python library for building interactive command line applications with advanced features like syntax highlighting and code completion. | 9,423 |