auto-evaluator

QA evaluation tool

Automated evaluation of language models for question answering tasks


749 stars
12 watching
100 forks
Language: TypeScript
Last commit: 12 months ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| rlancemartin/auto-evaluator | An evaluation tool for question-answering systems using large language models and natural language processing techniques | 1,065 |
| allenai/document-qa | Tools and codebase for training neural question answering models on multiple paragraphs of text data | 435 |
| retraigo/appraisal | Utilities for transforming and analyzing text data using machine learning algorithms | 5 |
| langchain-ai/langserve | Provides a REST API for deploying and managing LangChain runnables and chains | 1,970 |
| mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 566 |
| allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326 |
| cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,779 |
| open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework | 273 |
| ruixiangcui/agieval | Evaluates foundation models on human-centric tasks with diverse exams and question types | 714 |
| wordweb/langchain-chatglm-and-tigerbot | Develops a knowledge-based question answering application using open-source models like ChatGLM and TigerBot | 105 |
| kevincoble/aitoolbox | A toolbox of AI modules written in Swift for various machine learning tasks and algorithms | 794 |
| openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
| langchain-ai/langgraphjs | A framework for building resilient, stateful applications with LLMs as directed graphs | 742 |
| johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment | 506 |
| tatsu-lab/alpaca_eval | An automatic evaluation tool for large language models | 1,568 |