ragas
LLM evaluation toolkit
A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations.
Supercharge Your LLM Application Evaluations 🚀
Stars: 8k
Watchers: 38
Forks: 771
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists
Topics: evaluation, llm, llmops
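A minimal evaluation sketch, based on the ragas quickstart. The column names (`question`, `answer`, `contexts`) and metric imports follow the v0.1-era API and may differ in newer releases; the example data is illustrative only, and scoring calls an LLM judge (OpenAI by default), so an `OPENAI_API_KEY` must be set.

```python
from datasets import Dataset  # Hugging Face Datasets

from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# One RAG interaction: the question asked, the generated answer,
# and the retrieved context chunks the answer was based on.
data = {
    "question": ["When was the Eiffel Tower completed?"],
    "answer": ["The Eiffel Tower was completed in 1889."],
    "contexts": [[
        "The Eiffel Tower is a wrought-iron lattice tower in Paris, "
        "completed in 1889 as the entrance arch to the World's Fair."
    ]],
}
dataset = Dataset.from_dict(data)

# Each metric yields a 0-1 score per sample; evaluate() aggregates them.
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)
```

Here `faithfulness` checks that the answer is grounded in the retrieved contexts, while `answer_relevancy` checks that it actually addresses the question, so no hand-written reference answer is required for either metric.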
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| confident-ai/deepeval | A framework for evaluating large language models. | 4,003 |
| langfuse/langfuse | An integrated development platform for large language models (LLMs) that provides observability, analytics, and management tools. | 7,123 |
| eleutherai/lm-evaluation-harness | A unified framework for testing generative language models on a wide range of evaluation tasks. | 7,200 |
| evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines. | 5,519 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks. | 15,168 |
| ianarawjo/chainforge | An environment for battle-testing prompts against Large Language Models (LLMs) to evaluate response quality and performance. | 2,413 |
| promptfoo/promptfoo | A tool for testing and evaluating large language models (LLMs) to ensure they are reliable and secure. | 4,976 |
| llmware-ai/llmware | A framework for building enterprise LLM-based applications using small, specialized models. | 8,303 |
| pathwaycom/llm-app | Pre-built AI application templates for integrating Large Language Models (LLMs) with various data sources for scalable RAG and enterprise search. | 7,426 |
| shishirpatil/gorilla | Enables large language models to interact with external APIs using natural language queries. | 11,564 |
| alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,732 |
| relari-ai/continuous-eval | A comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics. | 455 |
| young-geng/easylm | A framework for training and serving large language models using JAX/Flax. | 2,428 |
| mlabonne/llm-course | A comprehensive course and resource collection on building and deploying Large Language Models (LLMs). | 40,053 |
| giskard-ai/giskard | Automates the detection of performance, bias, and security issues in AI applications. | 4,125 |