deepeval

LLM evaluator

A framework for evaluating large language models

The LLM Evaluation Framework

GitHub

4k stars
23 watching
324 forks
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists

Topics: evaluation-framework, evaluation-metrics, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics


Related projects:

| Repository | Description | Stars |
|---|---|---|
| explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations. | 7,598 |
| evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines. | 5,519 |
| eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 7,200 |
| meta-llama/llama-stack | Provides pre-packaged building blocks for generative AI applications with standardized APIs and service-oriented design. | 5,164 |
| ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,413 |
| deepset-ai/haystack | An AI orchestration framework to build customizable LLM applications with advanced retrieval methods. | 18,094 |
| ludwig-ai/ludwig | A low-code framework for building custom deep learning models and neural networks. | 11,236 |
| relari-ai/continuous-eval | Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics. | 455 |
| giskard-ai/giskard | Automates the detection of performance, bias, and security issues in AI applications. | 4,125 |
| openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks. | 15,168 |
| modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability. | 2,691 |
| mlabonne/llm-course | A comprehensive course and resource package on building and deploying Large Language Models (LLMs). | 40,053 |
| activeloopai/deeplake | A database for AI that stores and manages various data types used in deep learning applications. | 8,237 |
| psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models. | 230 |
| ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms. | 69,185 |