continuous-eval

LLM evaluation framework

Provides a comprehensive framework for evaluating Large Language Model (LLM) applications and pipelines with customizable metrics

Data-Driven Evaluation for LLM-Powered Applications
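As a rough illustration of the kind of customizable, data-driven metric such a framework computes over RAG pipelines, the sketch below implements a simple chunk-level retrieval precision/recall check for one evaluation sample. It is a standalone conceptual example, not continuous-eval's actual API: the ContextPrecisionRecall class and the field names (retrieved_context, ground_truth_context) are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class RAGSample:
    # One evaluation datum: the question, what the retriever returned,
    # and the contexts marked as relevant (field names are illustrative).
    question: str
    retrieved_context: list[str]
    ground_truth_context: list[str]

class ContextPrecisionRecall:
    """Deterministic retrieval metric: chunk-level precision, recall, and F1."""

    def __call__(self, sample: RAGSample) -> dict[str, float]:
        retrieved = set(sample.retrieved_context)
        relevant = set(sample.ground_truth_context)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return {"precision": precision, "recall": recall, "f1": f1}

if __name__ == "__main__":
    sample = RAGSample(
        question="What does continuous-eval measure?",
        retrieved_context=["chunk A", "chunk B"],
        ground_truth_context=["chunk A", "chunk C"],
    )
    print(ContextPrecisionRecall()(sample))
    # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```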

GitHub

455 stars
4 watching
31 forks
Language: Python
Last commit: 5 months ago
Linked from 1 awesome list

Topics: evaluation-framework, evaluation-metrics, information-retrieval, llm-evaluation, llmops, rag, retrieval-augmented-generation

Related projects:

Repository | Description | Stars
mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models | 1,450
aiverify-foundation/llm-evals-catalogue | A collaborative catalogue of LLM evaluation frameworks and papers | 13
h2oai/h2o-llm-eval | An evaluation framework for large language models with an Elo rating system and A/B testing capabilities | 50
wgryc/phasellm | A framework for managing and testing large language models to evaluate their performance and optimize user experiences | 451
allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326
psycoy/mixeval | An evaluation suite and dynamic data release platform for large language models | 230
ai-hypercomputer/maxtext | A high-performance LLM written in Python/JAX for training and inference on Google Cloud TPUs and GPUs | 1,557
aiplanethub/beyondllm | An open-source toolkit for building and evaluating large language models | 267
ray-project/llmperf | A tool for evaluating the performance of large language model APIs | 678
qcri/llmebench | A benchmarking framework for large language models | 81
prometheus-eval/prometheus-eval | An open-source framework that enables language model evaluation using Prometheus and GPT-4 | 820
victordibia/llmx | An API that provides a unified interface to multiple large language models for chat fine-tuning | 79
volcengine/verl | A flexible RL training framework designed for large language models | 427
modelscope/evalscope | A framework for efficiently evaluating and benchmarking large models | 308
mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 566