promptbench
Model evaluator
A unified framework for evaluating the performance and robustness of large language models across a variety of scenarios.
2k stars
20 watching
184 forks
Language: Python
Last commit: 3 months ago
Linked from 1 awesome list
Topics: adversarial-attacks, benchmark, chatgpt, evaluation, large-language-models, prompt, prompt-engineering, robustness
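The kind of robustness evaluation promptbench automates can be sketched in plain Python. The snippet below is a toy stand-in, not promptbench's actual API: `toy_model`, `leet_attack`, and the two-example dataset are all hypothetical, and the attack is a simple character substitution standing in for the adversarial prompt attacks the framework benchmarks. The idea is to measure task accuracy on clean prompts, perturb each prompt, and measure again.

```python
def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call: labels a review
    # "positive" iff the word "good" appears in the prompt.
    return "positive" if "good" in prompt else "negative"

def leet_attack(prompt: str) -> str:
    # Hypothetical character-substitution attack, standing in for the
    # adversarial prompt perturbations a robustness benchmark applies.
    return prompt.replace("o", "0").replace("e", "3")

def accuracy(model, dataset, attack=None) -> float:
    # Fraction of examples the model labels correctly, optionally
    # after perturbing each prompt with an attack function.
    correct = 0
    for text, label in dataset:
        prompt = f"Classify the sentiment: {text}"
        if attack is not None:
            prompt = attack(prompt)
        correct += int(model(prompt) == label)
    return correct / len(dataset)

dataset = [("a good movie", "positive"), ("a bad movie", "negative")]

clean_acc = accuracy(toy_model, dataset)                      # 1.0
attacked_acc = accuracy(toy_model, dataset, attack=leet_attack)  # 0.5
print(f"clean: {clean_acc}, attacked: {attacked_acc}")
```

The gap between clean and attacked accuracy is the robustness signal; a framework like promptbench runs this loop over real models, standard datasets, and a suite of attack types instead of this single toy pair.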
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| microsoft/prompt-engine | A utility library for crafting prompts that help large language models generate specific outputs | 2,602 |
| bigscience-workshop/promptsource | A toolkit for creating and using natural-language prompts that enable large language models to generalize to new tasks | 2,718 |
| eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on a wide range of evaluation tasks | 7,200 |
| brexhq/prompt-engineering | Guides for software developers on effectively using and building systems around large language models like GPT-4 | 8,487 |
| thunlp/promptpapers | A curated list of papers on prompt-based tuning for pre-trained language models | 4,112 |
| agencyenterprise/promptinject | A framework for analyzing the robustness of large language models against adversarial prompt attacks | 318 |
| google/big-bench | A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks | 2,899 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) for reliability and security | 4,976 |
| hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases | 2,731 |
| openbmb/toolbench | A platform for training, serving, and evaluating large language models with tool-use capability | 4,888 |
| dair-ai/prompt-engineering-guide | A comprehensive resource for developing and optimizing prompts to use language models effectively across applications | 51,082 |
| promptslab/promptify | A tool that uses large language models to extract structured information from unstructured text | 3,327 |
| optimalscale/lmflow | A toolkit for fine-tuning and running inference with large machine learning models | 8,312 |
| ianarawjo/chainforge | An environment for battle-testing prompts to large language models (LLMs) and evaluating response quality | 2,413 |
| prompt-toolkit/python-prompt-toolkit | A Python library for building interactive command-line applications with features like syntax highlighting and code completion | 9,423 |