ragas
LLM evaluation tool
A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights
Supercharge Your LLM Application Evaluations 🚀
7k stars
37 watching
742 forks
Language: Python
last commit: 7 days ago
Linked from 2 awesome lists
Topics: evaluation, llm, llmops
Related projects:
Repository | Description | Stars |
---|---|---|
confident-ai/deepeval | A framework for evaluating large language models | 3,669 |
langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools | 6,537 |
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970 |
evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines | 5,391 |
openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks | 15,015 |
ianarawjo/chainforge | An environment for battle-testing prompts for large language models (LLMs) to evaluate response quality and performance | 2,334 |
promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure | 4,754 |
llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models | 6,651 |
pathwaycom/llm-app | Pre-built templates for integrating large language models into enterprise applications with real-time data APIs and various data sources | 4,642 |
shishirpatil/gorilla | Enables large language models to interact with external APIs using natural language queries | 11,473 |
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models | 2,720 |
relari-ai/continuous-eval | Provides a comprehensive framework for evaluating large language model (LLM) applications and pipelines with customizable metrics | 446 |
young-geng/easylm | A framework for training and serving large language models using JAX/Flax | 2,409 |
mlabonne/llm-course | A comprehensive course and resource package on building and deploying large language models (LLMs) | 39,120 |
giskard-ai/giskard | Automates detection and evaluation of performance, bias, and security issues in AI applications | 4,071 |