ragas
LLM evaluation toolkit
A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations.
Supercharge Your LLM Application Evaluations 🚀
Stars: 8k
Watchers: 38
Forks: 771
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists
Topics: evaluation, llm, llmops
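A minimal evaluation sketch, based on the ragas quickstart. The column names (`question`, `answer`, `contexts`) and metric imports follow the v0.1-era API and may differ in newer releases; the example data is illustrative only, and scoring calls an LLM judge (OpenAI by default), so an `OPENAI_API_KEY` must be set.

```python
from datasets import Dataset  # Hugging Face Datasets

from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# One RAG interaction: the question asked, the generated answer,
# and the retrieved context chunks the answer was based on.
data = {
    "question": ["When was the Eiffel Tower completed?"],
    "answer": ["The Eiffel Tower was completed in 1889."],
    "contexts": [[
        "The Eiffel Tower is a wrought-iron lattice tower in Paris, "
        "completed in 1889 as the entrance arch to the World's Fair."
    ]],
}
dataset = Dataset.from_dict(data)

# Each metric yields a 0-1 score per sample; evaluate() aggregates them.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)
```

Here `faithfulness` checks that the answer is grounded in the retrieved contexts, while `answer_relevancy` checks that it actually addresses the question, so no hand-written reference answer is required for either metric.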
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| confident-ai/deepeval | A framework for evaluating large language models. | 4,003 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 7,123 |
| eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on a wide range of evaluation tasks. | 7,200 |
| evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines. | 5,519 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks. | 15,168 |
| ianarawjo/chainforge | An environment for battle-testing prompts against Large Language Models (LLMs) to evaluate response quality and performance. | 2,413 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure. | 4,976 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models. | 8,303 |
| pathwaycom/llm-app | Pre-built AI application templates for integrating Large Language Models (LLMs) with various data sources for scalable RAG and enterprise search. | 7,426 |
| shishirpatil/gorilla | Enables large language models to interact with external APIs using natural language queries. | 11,564 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,732 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics. | 455 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax. | 2,428 |
| mlabonne/llm-course | A comprehensive course and resource collection on building and deploying Large Language Models (LLMs). | 40,053 |
| giskard-ai/giskard | Automates the detection of performance, bias, and security issues in AI applications. | 4,125 |