deepeval
LLM evaluator
A framework for evaluating large language models
The LLM Evaluation Framework
4k stars
23 watching
324 forks
Language: Python
last commit: about 1 month ago
Linked from 2 awesome lists
evaluation-framework, evaluation-metrics, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics
Related projects:
Repository | Description | Stars |
---|---|---|
explodinggradients/ragas | A toolkit for evaluating and optimizing Large Language Model applications with objective metrics, test data generation, and seamless integrations. | 7,598 |
evidentlyai/evidently | An observability framework for evaluating and monitoring the performance of machine learning models and data pipelines | 5,519 |
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 7,200 |
meta-llama/llama-stack | Provides pre-packaged building blocks for generative AI applications with standardized APIs and service-oriented design. | 5,164 |
ianarawjo/chainforge | An environment for battle-testing prompts to Large Language Models (LLMs) to evaluate response quality and performance. | 2,413 |
deepset-ai/haystack | An AI orchestration framework to build customizable LLM applications with advanced retrieval methods. | 18,094 |
ludwig-ai/ludwig | A low-code framework for building custom deep learning models and neural networks | 11,236 |
relari-ai/continuous-eval | Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics | 455 |
giskard-ai/giskard | Automates the detection of performance, bias, and security issues in AI applications | 4,125 |
openai/evals | A framework for evaluating large language models and systems, providing a registry of benchmarks. | 15,168 |
modeltc/lightllm | A Python-based framework for serving large language models with low latency and high scalability. | 2,691 |
mlabonne/llm-course | A comprehensive course and resource package on building and deploying Large Language Models (LLMs) | 40,053 |
activeloopai/deeplake | A database for AI that stores and manages various data types used in deep learning applications. | 8,237 |
psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 230 |
ggerganov/llama.cpp | Enables LLM inference with minimal setup and high performance on various hardware platforms | 69,185 |