promptfoo
LLM tester
A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
5k stars
21 watching
377 forks
Language: TypeScript
Last commit: 4 days ago
Linked from 2 awesome lists
Topics: ci, ci-cd, cicd, evaluation, evaluation-framework, llm, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, pentesting, prompt-engineering, prompt-testing, prompts, rag, red-teaming, testing, vulnerability-scanners
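The project description above mentions "simple declarative configs with command line and CI/CD integration." As an illustration only, below is a minimal sketch of what such a config might look like, modeled on promptfoo's `promptfooconfig.yaml` format; the provider ID, prompt text, and assertion values are placeholder assumptions and are not taken from this page.

```yaml
# promptfooconfig.yaml -- hypothetical minimal example (prompt, provider, and test values are assumptions)
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini   # assumed provider ID; any supported model could be substituted

tests:
  - vars:
      text: "Promptfoo is a tool for testing and evaluating LLM applications."
    assert:
      - type: icontains   # case-insensitive substring check on the model output
        value: "promptfoo"
```

A config like this would typically be evaluated with `npx promptfoo@latest eval` and the results inspected with `npx promptfoo@latest view`; the same eval command can be wired into a CI/CD pipeline so that prompt-quality regressions fail the build.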
Related projects:
Repository | Description | Stars |
---|---|---|
hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases. | 2,708 |
ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,334 |
promptslab/promptify | A tool that uses large language models to extract structured information from unstructured text | 3,266 |
microsoft/prompt-engine | A utility library for crafting prompts to help Large Language Models generate specific outputs | 2,591 |
pathwaycom/llm-app | Pre-built templates for integrating large language models into enterprise applications with real-time data APIs and various data sources. | 4,642 |
explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights | 7,233 |
microsoft/promptbench | A unified framework for evaluating large language models' performance and robustness in various scenarios. | 2,462 |
confident-ai/deepeval | A framework for evaluating large language models | 3,669 |
rafalzawadzki/spellbook-forge | An ExpressJS middleware that allows users to execute LLM prompts stored in a git repository and retrieve results from a chosen model. | 74 |
agencyenterprise/promptinject | A framework for analyzing the robustness of large language models to adversarial prompt attacks | 309 |
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
langgenius/dify | An open-source LLM app development platform that enables users to build and deploy AI-powered applications quickly and efficiently. | 51,873 |
llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 6,651 |
langroid/langroid | A Python framework to build LLM-powered applications by setting up agents with optional components and having them collaboratively solve problems through message exchange | 2,654 |
poyro/poyro | An extension of Vitest for testing LLM applications using local language models | 30 |