promptfoo

LLM tester

A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure.

Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
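Promptfoo evaluations are driven by a declarative test suite: prompts, providers, and test cases with assertions, defined in a promptfooconfig.yaml file or passed as an object to the Node API. The TypeScript sketch below is a minimal illustration, assuming promptfoo's Node package exposes an evaluate() function as its docs describe; the model IDs and assertion values are placeholders, not recommendations.

```typescript
// Minimal sketch: run a declarative promptfoo test suite through the Node API.
// Assumes `promptfoo` exports an `evaluate` function and that the relevant
// provider API keys (e.g. OPENAI_API_KEY) are set; model IDs are illustrative.
import promptfoo from 'promptfoo';

async function main() {
  const summary = await promptfoo.evaluate({
    // Prompts use {{variable}} templating, filled in from each test case's vars.
    prompts: ['Summarize in one sentence: {{article}}'],
    // The same prompt is run against every provider so outputs can be compared.
    providers: ['openai:gpt-4o-mini', 'anthropic:messages:claude-3-5-sonnet-latest'],
    tests: [
      {
        vars: { article: 'Promptfoo is a tool for testing and red-teaming LLM apps.' },
        assert: [
          // Deterministic check: the output must mention the product name.
          { type: 'icontains', value: 'promptfoo' },
          // Model-graded check: an LLM judge scores the output against a rubric.
          { type: 'llm-rubric', value: 'Is a faithful one-sentence summary' },
        ],
      },
    ],
  });

  // The returned summary includes per-test results and aggregate pass/fail stats.
  console.log(JSON.stringify(summary, null, 2));
}

main().catch(console.error);
```

The same suite expressed as YAML can be run from the command line with `npx promptfoo eval`, which is the entry point CI/CD pipelines typically call.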

GitHub

5k stars
21 watching
405 forks
Language: TypeScript
Last commit: 4 months ago
Linked from 2 awesome lists

Topics: ci, ci-cd, cicd, evaluation, evaluation-framework, llm, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, pentesting, prompt-engineering, prompt-testing, prompts, rag, red-teaming, testing, vulnerability-scanners

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases. | 2,731 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,413 |
| promptslab/promptify | A tool that uses large language models to extract structured information from unstructured text. | 3,327 |
| microsoft/prompt-engine | A utility library for crafting prompts to help Large Language Models generate specific outputs. | 2,602 |
| pathwaycom/llm-app | Pre-built AI application templates for integrating Large Language Models (LLMs) with various data sources for scalable RAG and enterprise search. | 7,426 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations. | 7,598 |
| microsoft/promptbench | A unified framework for evaluating large language models' performance and robustness in various scenarios. | 2,487 |
| confident-ai/deepeval | A framework for evaluating large language models. | 4,003 |
| rafalzawadzki/spellbook-forge | An Express.js middleware that executes LLM prompts stored in a git repository and retrieves results from a chosen model. | 74 |
| agencyenterprise/promptinject | A framework for analyzing the robustness of large language models to adversarial prompt attacks. | 318 |
| eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on a variety of evaluation tasks. | 7,200 |
| langgenius/dify | An open-source LLM app development platform for building and deploying AI-powered applications quickly and efficiently. | 54,931 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models. | 8,303 |
| langroid/langroid | A Python framework for building LLM-powered applications by setting up agents with optional components and having them collaboratively solve problems through message exchange. | 2,795 |
| poyro/poyro | An extension of Vitest for testing LLM applications using local language models. | 31 |