promptbench

Model evaluator

A unified framework for evaluating the performance and robustness of large language models across a variety of scenarios.


Stars: 2k
Watching: 20
Forks: 182
Language: Python
Last commit: 24 days ago
Linked from 1 awesome list

Topics: adversarial-attacks, benchmark, chatgpt, evaluation, large-language-models, prompt, prompt-engineering, robustness

Related projects:

Repository (stars): Description

microsoft/prompt-engine (2,591): A utility library for crafting prompts to help large language models generate specific outputs.
bigscience-workshop/promptsource (2,696): A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks.
eleutherai/lm-evaluation-harness (6,970): A unified framework for testing generative language models on a variety of evaluation tasks.
brexhq/prompt-engineering (8,440): Guides software developers on effectively using and building systems around large language models like GPT-4.
thunlp/promptpapers (4,092): A curated list of papers on prompt-based tuning for pre-trained language models.
agencyenterprise/promptinject (309): A framework for analyzing the robustness of large language models to adversarial prompt attacks.
google/big-bench (2,868): A benchmark that evaluates the capabilities of large language models across a wide range of simulated tasks.
promptfoo/promptfoo (4,754): A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure.
hegelai/prompttools (2,708): A set of tools for testing and evaluating natural language processing models and vector databases.
openbmb/toolbench (4,843): A platform for training, serving, and evaluating large language models with tool-use capability.
dair-ai/prompt-engineering-guide (50,262): A comprehensive resource for designing and optimizing prompts for interacting with language models.
promptslab/promptify (3,266): A tool that uses large language models to extract structured information from unstructured text.
optimalscale/lmflow (8,273): A toolkit for finetuning large language models with efficient inference capabilities.
ianarawjo/chainforge (2,334): An environment for battle-testing prompts to LLMs and evaluating response quality and performance.
prompt-toolkit/python-prompt-toolkit (9,374): A Python library for building interactive command-line applications with features like syntax highlighting and code completion.