ragas

LLM evaluation toolkit

A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations.
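
ragas exposes those objective metrics through a single evaluate() call over a dataset of question/context/answer records. The sketch below is a minimal, hedged illustration of that flow, assuming a 0.1-era ragas API, Hugging Face `datasets` input, and a default OpenAI-backed judge; column names and metric imports vary across releases, so treat it as orientation rather than a definitive example.

```python
# Minimal sketch of a ragas evaluation run (ragas 0.1-style API; column
# names and the default judge model vary by version and are assumptions here).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One RAG interaction: the question, the retrieved contexts, the generated
# answer, and a reference answer for metrics that need ground truth.
samples = {
    "question": ["When was the Eiffel Tower completed?"],
    "contexts": [["The Eiffel Tower was completed in March 1889."]],
    "answer": ["It was completed in 1889."],
    "ground_truth": ["The Eiffel Tower was completed in 1889."],
}

# evaluate() scores the dataset with the chosen metrics; by default it calls
# an LLM judge, so an API key is assumed to be configured in the environment.
result = evaluate(Dataset.from_dict(samples), metrics=[faithfulness, answer_relevancy])
print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.98}
```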

Supercharge Your LLM Application Evaluations 🚀

GitHub

8k stars
38 watching
771 forks
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists

Topics: evaluation, llm, llmops

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| confident-ai/deepeval | A framework for evaluating large language models | 4,003 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools | 7,123 |
| eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks | 7,200 |
| evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines | 5,519 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks | 15,168 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance | 2,413 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure | 4,976 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 8,303 |
| pathwaycom/llm-app | Provides pre-built AI application templates to integrate Large Language Models (LLMs) with various data sources for scalable RAG and enterprise search | 7,426 |
| shishirpatil/gorilla | Enables large language models to interact with external APIs using natural language queries | 11,564 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
| relari-ai/continuous-eval | Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics | 455 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,428 |
| mlabonne/llm-course | A comprehensive course and resource package on building and deploying Large Language Models (LLMs) | 40,053 |
| giskard-ai/giskard | Automates the detection of performance, bias, and security issues in AI applications | 4,125 |