ragas

LLM evaluation tool

A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights

Supercharge Your LLM Application Evaluations 🚀

GitHub

7k stars
37 watching
742 forks
Language: Python
Last commit: 7 days ago
Linked from 2 awesome lists

evaluation, llm, llmops

Related projects:

| Repository | Description | Stars |
|---|---|---|
| confident-ai/deepeval | A framework for evaluating large language models | 3,669 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools | 6,537 |
| eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks | 6,970 |
| evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines | 5,391 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks | 15,015 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance | 2,334 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure | 4,754 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 6,651 |
| pathwaycom/llm-app | Pre-built templates for integrating large language models into enterprise applications with real-time data APIs and various data sources | 4,642 |
| shishirpatil/gorilla | Enables large language models to interact with external APIs using natural language queries | 11,473 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
| relari-ai/continuous-eval | Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics | 446 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,409 |
| mlabonne/llm-course | A comprehensive course and resource package on building and deploying Large Language Models (LLMs) | 39,120 |
| giskard-ai/giskard | Automates detection and evaluation of performance, bias, and security issues in AI applications | 4,071 |