auto-evaluator

An evaluation tool for question-answering systems: it uses large language models and natural language processing techniques to automatically evaluate LLM QA chains.
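To make the idea concrete, below is a minimal sketch of LLM-graded QA evaluation in the spirit of this tool: a grader model compares a predicted answer against a reference answer and returns a verdict. The prompt wording, model name, and grade labels are illustrative assumptions, not code taken from the auto-evaluator repository.

# Minimal sketch of LLM-graded QA evaluation (assumed prompt, model, and labels;
# not the auto-evaluator project's actual implementation).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GRADER_PROMPT = """You are grading an answer to a question.
Question: {question}
Reference answer: {reference}
Student answer: {prediction}
Reply with exactly one word: CORRECT or INCORRECT."""


def grade_answer(question: str, reference: str, prediction: str) -> str:
    """Return 'CORRECT' or 'INCORRECT' as judged by the grader model."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; any chat model works
        messages=[{
            "role": "user",
            "content": GRADER_PROMPT.format(
                question=question, reference=reference, prediction=prediction
            ),
        }],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper()


if __name__ == "__main__":
    print(grade_answer(
        question="What year was the Eiffel Tower completed?",
        reference="1889",
        prediction="It was completed in 1889.",
    ))  # expected: CORRECT

In practice, a tool like this runs such a grader over a whole set of generated question-answer pairs and aggregates the verdicts into an accuracy score for the QA chain under test.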

GitHub: 1k stars, 8 watching, 95 forks
Language: Python
Last commit: over 1 year ago
Linked from 2 awesome lists


Related projects:

Repository Description Stars
langchain-ai/auto-evaluator Automated evaluation of language models for question answering tasks 744
allenai/olmo-eval An evaluation framework for large language models 310
mlabonne/llm-autoeval A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters 558
krrishdholakia/betterprompt An API for evaluating the quality of text prompts used in large language models (LLMs) based on perplexity estimation 38
tatsu-lab/alpaca_eval An automatic evaluation tool for large language models 1,526
openai/simple-evals A library for evaluating language models using standardized prompts and benchmarking tests 1,939
gomate-community/rageval An evaluation tool for retrieval-augmented generation (RAG) methods 132
declare-lab/instruct-eval An evaluation framework for large language models trained with instruction tuning methods 528
reworkd/bananalyzer A tool to evaluate AI agents on web tasks by dynamically constructing and executing test suites against predefined example websites 267
evolvinglmms-lab/lmms-eval Tools and an evaluation suite for large multimodal models 2,058
obss/jury A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation 188
stanford-futuredata/ares A tool for automatically evaluating RAG models by generating synthetic data and fine-tuning classifiers 483
chenllliang/mmevalpro A benchmarking framework for evaluating large multimodal models with rigorous metrics and an efficient evaluation pipeline 22
allenai/document-qa Tools and a codebase for training neural question-answering models on multiple paragraphs of text data 434
allenai/reward-bench A comprehensive benchmarking framework for evaluating the performance and safety of reward models in reinforcement learning 429