promptfoo

LLM tester

A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure

Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
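As a rough illustration of the declarative config style, a minimal promptfooconfig.yaml might look like the sketch below. The model identifiers, variables, and assertion shown are illustrative assumptions; check the promptfoo documentation for the exact provider IDs and assertion types your setup supports.

```yaml
# promptfooconfig.yaml -- minimal sketch; model names and assertion values are illustrative
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  # Each provider is evaluated against every prompt and test case
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      text: "Promptfoo is a tool for testing and evaluating LLM prompts."
    assert:
      # Case-insensitive substring check on the model output
      - type: icontains
        value: "promptfoo"
```

Running `promptfoo eval` (or `npx promptfoo@latest eval`) against a config like this runs every prompt/test combination across every listed provider and reports pass/fail per assertion; the same command can be dropped into a CI/CD job.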

GitHub

5k stars
21 watching
377 forks
Language: TypeScript
Last commit: 4 days ago
Linked from 2 awesome lists

Topics: ci, ci-cd, cicd, evaluation, evaluation-framework, llm, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, pentesting, prompt-engineering, prompt-testing, prompts, rag, red-teaming, testing, vulnerability-scanners

Related projects:

| Repository | Description | Stars |
|---|---|---|
| hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases. | 2,708 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,334 |
| promptslab/promptify | A tool that uses large language models to extract structured information from unstructured text. | 3,266 |
| microsoft/prompt-engine | A utility library for crafting prompts to help Large Language Models generate specific outputs. | 2,591 |
| pathwaycom/llm-app | Pre-built templates for integrating large language models into enterprise applications with real-time data APIs and various data sources. | 4,642 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights. | 7,233 |
| microsoft/promptbench | A unified framework for evaluating large language models' performance and robustness in various scenarios. | 2,462 |
| confident-ai/deepeval | A framework for evaluating large language models. | 3,669 |
| rafalzawadzki/spellbook-forge | An ExpressJS middleware that allows users to execute LLM prompts stored in a git repository and retrieve results from a chosen model. | 74 |
| agencyenterprise/promptinject | A framework for analyzing the robustness of large language models to adversarial prompt attacks. | 309 |
| eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
| langgenius/dify | An open-source LLM app development platform that enables users to build and deploy AI-powered applications quickly and efficiently. | 51,873 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models. | 6,651 |
| langroid/langroid | A Python framework to build LLM-powered applications by setting up agents with optional components and having them collaboratively solve problems through message exchange. | 2,654 |
| poyro/poyro | An extension of Vitest for testing LLM applications using local language models. | 30 |