ChainForge

An open-source visual programming environment for battle-testing prompts to Large Language Models (LLMs), used to evaluate response quality and performance.

GitHub

- 2k stars · 30 watching · 179 forks
- Language: TypeScript
- Last commit: 23 days ago
- Topics: ai, evaluation, large-language-models, llmops, llms, prompt-engineering

Related projects:

| Repository | Description | Stars |
|---|---|---|
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure. | 4,754 |
| hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases. | 2,708 |
| eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
| confident-ai/deepeval | A framework for evaluating large language models. | 3,669 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights. | 7,233 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 6,537 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models. | 6,651 |
| open-compass/opencompass | An LLM evaluation platform supporting various models and datasets. | 4,124 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks. | 15,015 |
| scisharp/llamasharp | A C#/.NET library to efficiently run Large Language Models (LLMs) on local devices. | 2,673 |
| brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,440 |
| instructor-ai/instructor | A Python library that provides structured outputs from large language models (LLMs) and facilitates seamless integration with various LLM providers. | 8,163 |
| microsoft/prompt-engine | A utility library for crafting prompts to help Large Language Models generate specific outputs. | 2,591 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs. | 9,192 |
| pair-code/lit | An interactive tool for analyzing and understanding machine learning models. | 3,492 |