auto-evaluator
QA evaluator
An evaluation tool for LLM question-answering (QA) chains, using large language models and natural language processing techniques
1k stars
8 watching
95 forks
Language: Python
Last commit: almost 2 years ago
Linked from 2 awesome lists
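For context, tools in this space commonly grade a QA chain by sending each question, reference answer, and model answer to a grader LLM and asking for a CORRECT/INCORRECT verdict. The sketch below is a minimal, hypothetical illustration of that pattern, not auto-evaluator's actual API; the names (grade_qa_chain, GRADER_PROMPT, the llm callable) are made up for illustration.

```python
from typing import Callable, Iterable

# Illustrative grading prompt; auto-evaluator's real prompts may differ.
GRADER_PROMPT = """You are grading an answer to a question.
Question: {question}
Reference answer: {reference}
Student answer: {prediction}
Reply with exactly one word: CORRECT or INCORRECT."""


def grade_qa_chain(examples: Iterable[dict], llm: Callable[[str], str]) -> float:
    """Return the fraction of predictions the grader LLM marks CORRECT.

    Each example is a dict with "question", "reference", and "prediction" keys;
    `llm` is any callable that maps a prompt string to a completion string.
    """
    verdicts = []
    for ex in examples:
        reply = llm(GRADER_PROMPT.format(**ex)).strip().upper()
        verdicts.append(reply.startswith("CORRECT"))
    return sum(verdicts) / max(len(verdicts), 1)


if __name__ == "__main__":
    # Stand-in grader so the sketch runs without an API key:
    # it marks a prediction CORRECT only on an exact (case-insensitive) match.
    def dummy_llm(prompt: str) -> str:
        reference = prompt.split("Reference answer: ")[1].splitlines()[0]
        prediction = prompt.split("Student answer: ")[1].splitlines()[0]
        correct = reference.strip().lower() == prediction.strip().lower()
        return "CORRECT" if correct else "INCORRECT"

    examples = [
        {"question": "What is the capital of France?", "reference": "Paris", "prediction": "Paris"},
        {"question": "What is 2 + 2?", "reference": "4", "prediction": "5"},
    ]
    print(grade_qa_chain(examples, dummy_llm))  # -> 0.5
```

In practice the `llm` callable would wrap a real model call; keeping it as a plain function argument keeps the scoring loop independent of any particular LLM provider.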
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | Automated evaluation of language models for question answering tasks | 749 |
| | A framework for evaluating language models on NLP tasks | 326 |
| | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 566 |
| | An API for evaluating the quality of text prompts used in large language models (LLMs) based on perplexity estimation | 43 |
| | An automatic evaluation tool for large language models | 1,568 |
| | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
| | An evaluation tool for retrieval-augmented generation (RAG) methods | 141 |
| | An evaluation framework for large language models trained with instruction tuning methods | 535 |
| | A tool to evaluate AI agents on web tasks by dynamically constructing and executing test suites against predefined example websites | 274 |
| | Tools and an evaluation framework for accelerating the development of large multimodal models by providing an efficient way to assess their performance | 2,164 |
| | A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation | 187 |
| | A tool for automatically evaluating RAG models by generating synthetic data and fine-tuning classifiers | 499 |
| | A benchmarking framework for evaluating large multimodal models with rigorous metrics and an efficient evaluation pipeline | 22 |
| | Tools and codebase for training neural question answering models on multiple paragraphs of text | 435 |
| | A comprehensive benchmarking framework for evaluating the performance and safety of reward models in reinforcement learning | 459 |