promptbench
Model evaluator
A unified framework for evaluating the performance and robustness of large language models across a variety of scenarios.
2k stars
20 watching
184 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list
Tags: adversarial-attacks, benchmark, chatgpt, evaluation, large-language-models, prompt, prompt-engineering, robustness
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| A utility library for crafting prompts to help Large Language Models generate specific outputs | 2,602 |
| A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks. | 2,718 |
| Provides a unified framework to test generative language models on various evaluation tasks. | 7,200 |
| Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,487 |
| A curated list of papers on prompt-based tuning for pre-trained language models, providing insights and advancements in the field. | 4,112 |
| A framework for analyzing the robustness of large language models against adversarial prompt attacks. | 318 |
| A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks. | 2,899 |
| A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure. | 4,976 |
| A set of tools for testing and evaluating natural language processing models and vector databases. | 2,731 |
| A platform for training, serving, and evaluating large language models, with support for tool-use capabilities. | 4,888 |
| A comprehensive resource for guiding the development and optimization of prompts to use language models effectively in various applications. | 51,082 |
| A tool that uses large language models to extract structured information from unstructured text. | 3,327 |
| A toolkit for fine-tuning and running inference with large machine learning models. | 8,312 |
| An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,413 |
| A Python library for building interactive command line applications with advanced features like syntax highlighting and code completion. | 9,423 |