deepeval
LLM evaluator
The LLM Evaluation Framework: a framework for evaluating large language models
Stars: 4k
Watching: 21
Forks: 292
Language: Python
Last commit: 9 days ago
Linked from 2 awesome lists
Topics: evaluation-framework, evaluation-metrics, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics
Related projects:
| Repository | Description | Stars |
|---|---|---|
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights | 7,233 |
| evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines | 5,419 |
| eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks | 7,028 |
| meta-llama/llama-stack | Provides a set of standardized APIs and tools to build generative AI applications | 4,591 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance | 2,334 |
| deepset-ai/haystack | An AI orchestration framework to build customizable LLM applications with advanced retrieval methods | 17,817 |
| ludwig-ai/ludwig | A low-code framework for building custom deep learning models and neural networks | 11,189 |
| relari-ai/continuous-eval | Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics | 446 |
| giskard-ai/giskard | Automates detection and evaluation of performance, bias, and security issues in AI applications | 4,071 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks | 15,069 |
| modeltc/lightllm | An LLM inference and serving framework providing a lightweight design, scalability, and high-speed performance for large language models | 2,609 |
| mlabonne/llm-course | A comprehensive course and resource package on building and deploying Large Language Models (LLMs) | 39,120 |
| activeloopai/deeplake | A database for AI that stores and manages various data types used in deep learning applications | 8,188 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 224 |
| ggerganov/llama.cpp | Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks | 68,190 |