ChainForge

An open-source visual programming environment for battle-testing prompts to Large Language Models (LLMs), used to evaluate response quality and performance.

GitHub

- 2k stars · 30 watching · 179 forks
- Language: TypeScript
- Last commit: 23 days ago
- Topics: ai, evaluation, large-language-models, llmops, llms, prompt-engineering

Related projects:

| Repository | Description | Stars |
|---|---|---|
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure. | 4,754 |
| hegelai/prompttools | A set of tools for testing and evaluating natural language processing models and vector databases. | 2,708 |
| eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
| confident-ai/deepeval | A framework for evaluating large language models. | 3,669 |
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights. | 7,233 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 6,537 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models. | 6,651 |
| open-compass/opencompass | An LLM evaluation platform supporting various models and datasets. | 4,124 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks. | 15,015 |
| scisharp/llamasharp | A C#/.NET library to efficiently run Large Language Models (LLMs) on local devices. | 2,673 |
| brexhq/prompt-engineering | Guides software developers on how to effectively use and build systems around Large Language Models like GPT-4. | 8,440 |
| instructor-ai/instructor | A Python library that provides structured outputs from large language models (LLMs) and facilitates seamless integration with various LLM providers. | 8,163 |
| microsoft/prompt-engine | A utility library for crafting prompts to help Large Language Models generate specific outputs. | 2,591 |
| fminference/flexllmgen | Generates large language model outputs in high-throughput mode on single GPUs. | 9,192 |
| pair-code/lit | An interactive tool for analyzing and understanding machine learning models. | 3,492 |