deepeval

LLM evaluator

A framework for evaluating large language models

The LLM Evaluation Framework

GitHub

4k stars

23 watching

324 forks

Language: Python

Last commit: 10 months ago

Linked from 2 awesome lists

Topics: evaluation-framework, evaluation-metrics, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics

Screenshot of confident-ai/deepeval website

docs.confident-ai.com/

Related projects:

Repository	Description	Stars
explodinggradients/ragas	A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations	7,598
evidentlyai/evidently	An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines	5,519
eleutherai/lm-evaluation-harness	Provides a unified framework to test generative language models on various evaluation tasks	7,200
meta-llama/llama-stack	Provides pre-packaged building blocks for generative AI applications with standardized APIs and service-oriented design	5,164
ianarawjo/chainforge	An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance	2,413
deepset-ai/haystack	An AI orchestration framework to build customizable LLM applications with advanced retrieval methods	18,094
ludwig-ai/ludwig	A low-code framework for building custom deep learning models and neural networks	11,236
relari-ai/continuous-eval	Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics	455
giskard-ai/giskard	Automates the detection of performance, bias, and security issues in AI applications	4,125
openai/evals	A framework for evaluating large language models and systems, providing a registry of benchmarks	15,168
modeltc/lightllm	A Python-based framework for serving large language models with low latency and high scalability	2,691
mlabonne/llm-course	A comprehensive course and resource package on building and deploying Large Language Models (LLMs)	40,053
activeloopai/deeplake	A Database for AI that stores and manages various data types used in deep learning applications	8,237
psycoy/mixeval	An evaluation suite and dynamic data release platform for large language models	230
ggerganov/llama.cpp	Enables LLM inference with minimal setup and high performance on various hardware platforms	69,185