promptbench

Model evaluator

A unified framework for evaluating the performance and robustness of large language models across a variety of scenarios.

GitHub

2k stars
20 watching
184 forks
Language: Python
Last commit: 3 months ago
Linked from 1 awesome list

Topics: adversarial-attacks, benchmark, chatgpt, evaluation, large-language-models, prompt, prompt-engineering, robustness
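
To give a flavor of how the framework is driven, below is a minimal sketch of a classification evaluation loop modeled on the usage pattern in the promptbench README. The class names (DatasetLoader, LLMModel, Prompt, InputProcess, OutputProcess, Eval), the example checkpoint, and the label-projection helper are assumptions based on that README and may differ between versions.

```python
# Minimal evaluation sketch modeled on promptbench's documented usage;
# class and method names are assumptions and may vary between versions.
import promptbench as pb

# Load a supported benchmark dataset (SST-2 binary sentiment classification).
dataset = pb.DatasetLoader.load_dataset("sst2")

# Wrap a model behind promptbench's unified interface;
# "google/flan-t5-large" is only an example checkpoint.
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10)

# Prompts to evaluate; {content} is substituted with each example's text.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
])

# Hypothetical projection from the model's output string to a label id.
def proj_func(pred):
    return {"positive": 1, "negative": 0}.get(pred.strip().lower(), -1)

for prompt in prompts:
    preds, labels = [], []
    for data in dataset:
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
        labels.append(data["label"])
    # Accuracy for this prompt across the whole dataset.
    score = pb.Eval.compute_cls_accuracy(preds, labels)
    print(f"{score:.3f}  {prompt}")
```

Robustness testing follows the same pattern, with an adversarial attack applied to the prompt before evaluation.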

Related projects:

| Repository | Description | Stars |
|------------|-------------|-------|
| microsoft/prompt-engine | A utility library for crafting prompts to help large language models generate specific outputs. | 2,602 |
| bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks. | 2,718 |
| eleutherai/lm-evaluation-harness | Provides a unified framework for testing generative language models on a wide range of evaluation tasks. | 7,200 |
| brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around large language models like GPT-4. | 8,487 |
| thunlp/promptpapers | A curated list of papers on prompt-based tuning for pre-trained language models. | 4,112 |
| agencyenterprise/promptinject | A framework for analyzing the robustness of large language models to adversarial prompt attacks. | 318 |
| google/big-bench | A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks. | 2,899 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure. | 4,976 |
| hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases. | 2,731 |
| openbmb/toolbench | A platform for training, serving, and evaluating large language models with tool-use capabilities. | 4,888 |
| dair-ai/prompt-engineering-guide | A comprehensive resource for developing and optimizing prompts to use language models effectively across applications. | 51,082 |
| promptslab/promptify | A tool that uses large language models to extract structured information from unstructured text. | 3,327 |
| optimalscale/lmflow | A toolkit for fine-tuning and running inference with large foundation models. | 8,312 |
| ianarawjo/chainforge | A visual environment for battle-testing prompts to LLMs and evaluating response quality and performance. | 2,413 |
| prompt-toolkit/python-prompt-toolkit | A Python library for building interactive command-line applications with features like syntax highlighting and code completion. | 9,423 |