auto-evaluator
QA evaluation tool
Automated evaluation of language models on question-answering tasks
749 stars
12 watching
100 forks
Language: TypeScript
Last commit: 12 months ago
Linked from 1 awesome list
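At a high level, tools in this space run a question-answering system over a dataset, grade each generated answer against a reference answer, and report aggregate accuracy. The TypeScript sketch below illustrates that loop; the names (`QAExample`, `gradeByOverlap`, `evaluate`) and the simple token-overlap grader are hypothetical stand-ins for illustration, not auto-evaluator's actual API, which typically uses an LLM judge rather than string overlap.

```typescript
// Hypothetical sketch of automated QA evaluation: grade each model answer
// against a reference answer and report aggregate accuracy.
// All names here are illustrative, not auto-evaluator's real API.

interface QAExample {
  question: string;
  referenceAnswer: string;
}

interface GradedResult {
  question: string;
  modelAnswer: string;
  correct: boolean;
}

// Stand-in grader: fraction of reference tokens that appear in the model answer.
// A real evaluator would usually ask an LLM judge to compare the two answers.
function gradeByOverlap(modelAnswer: string, reference: string, threshold = 0.5): boolean {
  const tokenize = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const answerTokens = tokenize(modelAnswer);
  const refTokens = tokenize(reference);
  if (refTokens.size === 0) return false;
  let hits = 0;
  for (const token of refTokens) {
    if (answerTokens.has(token)) hits++;
  }
  return hits / refTokens.size >= threshold;
}

// Run an answering function over the dataset and grade every answer.
async function evaluate(
  answerFn: (question: string) => Promise<string>,
  dataset: QAExample[],
): Promise<{ results: GradedResult[]; accuracy: number }> {
  const results: GradedResult[] = [];
  for (const example of dataset) {
    const modelAnswer = await answerFn(example.question);
    results.push({
      question: example.question,
      modelAnswer,
      correct: gradeByOverlap(modelAnswer, example.referenceAnswer),
    });
  }
  const accuracy = results.length === 0
    ? 0
    : results.filter((r) => r.correct).length / results.length;
  return { results, accuracy };
}
```

Swapping `gradeByOverlap` for an LLM-judge call is the usual refinement, and is the approach taken by several of the related projects listed below.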
Related projects:
Repository | Description | Stars |
---|---|---|
rlancemartin/auto-evaluator | An evaluation tool for question-answering systems using large language models and natural language processing techniques | 1,065 |
allenai/document-qa | Tools and codebase for training neural question answering models on multiple paragraphs of text data | 435 |
retraigo/appraisal | Utilities for transforming and analyzing text data using machine learning algorithms | 5 |
langchain-ai/langserve | Provides a REST API for deploying and managing LangChain runnables and chains | 1,970 |
mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 566 |
allenai/olmo-eval | A framework for evaluating language models on NLP tasks | 326 |
cloud-cv/evalai | A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,779 |
open-compass/lawbench | Evaluates the legal knowledge of large language models using a custom benchmarking framework | 273 |
ruixiangcui/agieval | Evaluates foundation models on human-centric tasks with diverse exams and question types | 714 |
wordweb/langchain-chatglm-and-tigerbot | A knowledge-based question-answering application built on open-source models such as ChatGLM and TigerBot | 105 |
kevincoble/aitoolbox | A toolbox of AI modules written in Swift for various machine learning tasks and algorithms | 794 |
openai/simple-evals | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
langchain-ai/langgraphjs | A framework for building resilient, stateful applications with LLMs as directed graphs | 742 |
johnsnowlabs/langtest | A tool for testing and evaluating large language models with a focus on AI safety and model assessment | 506 |
tatsu-lab/alpaca_eval | An automatic evaluation tool for large language models | 1,568 |