promptbench
Model evaluator
A unified framework for evaluating the performance and robustness of large language models across a variety of scenarios.
2k stars
20 watching
182 forks
Language: Python
Last commit: 24 days ago
Linked from 1 awesome list
Tags: adversarial-attacks, benchmark, chatgpt, evaluation, large-language-models, prompt, prompt-engineering, robustness
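To make the tags concrete, here is a minimal sketch of a promptbench evaluation loop, following the DatasetLoader / LLMModel / Prompt pattern shown in the project's README. The model name, decoding settings, and the `to_label` projection function are illustrative assumptions, and exact signatures may differ between releases.

```python
import promptbench as pb
from tqdm import tqdm

# Load a built-in classification dataset (SST-2 sentiment) and a model.
# The model identifier and decoding parameters here are illustrative.
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="google/flan-t5-large",
                    max_new_tokens=10, temperature=0.0001)

# Two candidate prompts; {content} is filled with each example's text.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
    "Determine the sentiment of this sentence as positive or negative: {content}",
])

# Hypothetical projection: map the model's free-text answer to integer labels.
def to_label(raw: str) -> int:
    return 1 if "positive" in raw.lower() else 0

for prompt in prompts:
    preds, labels = [], []
    for example in tqdm(dataset):
        input_text = pb.InputProcess.basic_format(prompt, example)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, to_label))
        labels.append(example["label"])
    # Per-prompt accuracy makes prompt variants directly comparable.
    print(prompt, pb.Eval.compute_cls_accuracy(preds, labels))
```

The adversarial-attacks and robustness tags refer to running the same kind of loop over perturbed prompts to measure how much a model's score degrades.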
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| microsoft/prompt-engine | A utility library for crafting prompts that help large language models generate specific outputs | 2,591 |
| bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts that let large language models generalize to new tasks | 2,696 |
| eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on a wide range of evaluation tasks | 6,970 |
| brexhq/prompt-engineering | A guide for software developers on effectively using and building systems around large language models such as GPT-4 | 8,440 |
| thunlp/promptpapers | A curated list of papers on prompt-based tuning for pre-trained language models | 4,092 |
| agencyenterprise/promptinject | A framework for analyzing the robustness of large language models against adversarial prompt attacks | 309 |
| google/big-bench | A benchmark that evaluates the capabilities of large language models by measuring their performance across many simulated tasks | 2,868 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models to ensure they are reliable and secure | 4,754 |
| hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases | 2,708 |
| openbmb/toolbench | A platform for training, serving, and evaluating large language models with tool-use capabilities | 4,843 |
| dair-ai/prompt-engineering-guide | A comprehensive resource for designing and optimizing prompts for interacting with language models | 50,262 |
| promptslab/promptify | A tool that uses large language models to extract structured information from unstructured text | 3,266 |
| optimalscale/lmflow | A toolkit for fine-tuning large language models and serving them for efficient inference | 8,273 |
| ianarawjo/chainforge | An environment for battle-testing prompts to large language models and evaluating response quality and performance | 2,334 |
| prompt-toolkit/python-prompt-toolkit | A Python library for building interactive command-line applications with features like syntax highlighting and code completion | 9,374 |