deepeval

The LLM Evaluation Framework: a framework for evaluating large language models.
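As a rough illustration of how the framework is typically used, below is a minimal pytest-style test sketch based on deepeval's documented quickstart. The names LLMTestCase, AnswerRelevancyMetric, and assert_test reflect the library's public API as documented, but exact signatures and defaults may vary between versions, and an LLM judge (for example an OpenAI API key) is assumed to be configured in the environment.

# Minimal sketch of a deepeval pytest-style test.
# Assumes deepeval is installed and an LLM judge (e.g. an OpenAI API key)
# is available in the environment; check the repo docs for current details.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # Scores how relevant the actual output is to the input prompt.
    metric = AnswerRelevancyMetric(threshold=0.7)
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # In a real test this output would come from your LLM application.
        actual_output="We offer a 30-day full refund at no extra cost.",
    )
    assert_test(test_case, [metric])

The repository documents running such tests either through pytest or through deepeval's own test-runner CLI; consult the project's README for the current invocation.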

GitHub stats:
4k stars
21 watching
292 forks
Language: Python
Last commit: 9 days ago
Linked from 2 awesome lists

Topics: evaluation-framework, evaluation-metrics, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics

Related projects:

Repository (stars): Description

explodinggradients/ragas (7,233): A toolkit for evaluating and optimizing Large Language Model applications with data-driven insights
evidentlyai/evidently (5,419): An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines
eleutherai/lm-evaluation-harness (7,028): Provides a unified framework to test generative language models on various evaluation tasks
meta-llama/llama-stack (4,591): Provides a set of standardized APIs and tools to build generative AI applications
ianarawjo/chainforge (2,334): An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance
deepset-ai/haystack (17,817): An AI orchestration framework to build customizable LLM applications with advanced retrieval methods
ludwig-ai/ludwig (11,189): A low-code framework for building custom deep learning models and neural networks
relari-ai/continuous-eval (446): Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics
giskard-ai/giskard (4,071): Automates detection and evaluation of performance, bias, and security issues in AI applications
openai/evals (15,069): A framework for evaluating large language models and systems, providing a registry of benchmarks
modeltc/lightllm (2,609): An LLM inference and serving framework providing a lightweight design, scalability, and high-speed performance for large language models
mlabonne/llm-course (39,120): A comprehensive course and resource package on building and deploying Large Language Models (LLMs)
activeloopai/deeplake (8,188): A Database for AI that stores and manages various data types used in deep learning applications
psycoy/mixeval (224): An evaluation suite and dynamic data release platform for large language models
ggerganov/llama.cpp (68,190): Enables efficient inference of large language models using optimized C/C++ implementations and various backend frameworks