promptfoo

LLM tester

A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure

Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
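As a rough illustration of the declarative config style, a minimal promptfooconfig.yaml might look like the sketch below. The model identifiers, variables, and assertion shown are illustrative assumptions; check the promptfoo documentation for the exact provider IDs and assertion types your setup supports.

```yaml
# promptfooconfig.yaml -- minimal sketch; model names and assertion values are illustrative
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  # Each provider is evaluated against every prompt and test case
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      text: "Promptfoo is a tool for testing and evaluating LLM prompts."
    assert:
      # Case-insensitive substring check on the model output
      - type: icontains
        value: "promptfoo"
```

Running `promptfoo eval` (or `npx promptfoo@latest eval`) against a config like this runs every prompt/test combination across every listed provider and reports pass/fail per assertion; the same command can be dropped into a CI/CD job.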

GitHub

5k stars
21 watching
377 forks
Language: TypeScript
Last commit: 4 days ago
Linked from 2 awesome lists

Topics: ci, ci-cd, cicd, evaluation, evaluation-framework, llm, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, pentesting, prompt-engineering, prompt-testing, prompts, rag, red-teaming, testing, vulnerability-scanners

Related projects:

| Repository | Description | Stars |
|---|---|---|
| hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases. | 2,708 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,334 |
| promptslab/promptify | A tool that uses large language models to extract structured information from unstructured text. | 3,266 |
| microsoft/prompt-engine | A utility library for crafting prompts to help Large Language Models generate specific outputs. | 2,591 |
| pathwaycom/llm-app | Pre-built templates for integrating large language models into enterprise applications with real-time data APIs and various data sources. | 4,642 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights. | 7,233 |
| microsoft/promptbench | A unified framework for evaluating large language models' performance and robustness in various scenarios. | 2,462 |
| confident-ai/deepeval | A framework for evaluating large language models. | 3,669 |
| rafalzawadzki/spellbook-forge | An ExpressJS middleware that allows users to execute LLM prompts stored in a git repository and retrieve results from a chosen model. | 74 |
| agencyenterprise/promptinject | A framework for analyzing the robustness of large language models to adversarial prompt attacks. | 309 |
| eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
| langgenius/dify | An open-source LLM app development platform that enables users to build and deploy AI-powered applications quickly and efficiently. | 51,873 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models. | 6,651 |
| langroid/langroid | A Python framework to build LLM-powered applications by setting up agents with optional components and having them collaboratively solve problems through message exchange. | 2,654 |
| poyro/poyro | An extension of Vitest for testing LLM applications using local language models. | 30 |