auto-evaluator

QA evaluation tool

Automated evaluation of language models for question-answering tasks

749 stars

12 watching

100 forks

Language: TypeScript

Last commit: over 1 year ago

Linked from 1 awesome list

autoevaluator.langchain.com/

Backlinks from these awesome lists:

kyrolabs/awesome-langchain

Related projects:

Repository	Description	Stars
rlancemartin/auto-evaluator	An evaluation tool for question-answering systems using large language models and natural language processing techniques	1,065
allenai/document-qa	Tools and codebase for training neural question answering models on multiple paragraphs of text data	435
retraigo/appraisal	Utilities for transforming and analyzing text data using machine learning algorithms	5
langchain-ai/langserve	Provides a REST API for deploying and managing LangChain runnables and chains	1,970
mlabonne/llm-autoeval	A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters	566
allenai/olmo-eval	A framework for evaluating language models on NLP tasks	326
cloud-cv/evalai	A platform for comparing and evaluating AI and machine learning algorithms at scale	1,779
open-compass/lawbench	Evaluates the legal knowledge of large language models using a custom benchmarking framework	273
ruixiangcui/agieval	Evaluates foundation models on human-centric tasks with diverse exams and question types	714
wordweb/langchain-chatglm-and-tigerbot	Develops a knowledge-based question-answering application using open-source models like ChatGLM and TigerBot	105
kevincoble/aitoolbox	A toolbox of AI modules written in Swift for various machine learning tasks and algorithms	794
openai/simple-evals	Evaluates language models using standardized benchmarks and prompting techniques	2,059
langchain-ai/langgraphjs	A framework for building resilient, stateful applications with LLMs as directed graphs	742
johnsnowlabs/langtest	A tool for testing and evaluating large language models with a focus on AI safety and model assessment	506
tatsu-lab/alpaca_eval	An automatic evaluation tool for large language models	1,568