bananalyzer
Open source AI Agent evaluation framework for web tasks 🐒🍌
A tool to evaluate AI agents on web tasks by dynamically constructing and executing test suites against predefined example websites.
274 stars
3 watching
21 forks
Language: Python
Last commit: 12 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A comprehensive benchmarking framework for evaluating the performance and safety of reward models in reinforcement learning | 459 |
| | An automatic evaluation tool for large language models | 1,568 |
| | Evaluates language models using standardized benchmarks and prompting techniques | 2,059 |
| | An evaluation tool for question-answering systems using large language models and natural language processing techniques | 1,065 |
| | A process automation tool that allows users to design and execute rule-based automations without writing application code | 1,125 |
| | A framework for evaluating language models on NLP tasks | 326 |
| | An evaluation framework for large language models trained with instruction tuning methods | 535 |
| | A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation | 187 |
| | An evaluation tool for retrieval-augmented generation methods | 141 |
| | Evaluates segmentation performance in medical imaging using multiple metrics | 57 |
| | A tool that uses AI and automation to execute complex tasks and generate code in response to user requests | 869 |
| | A framework for evaluating and diagnosing retrieval-augmented generation systems | 630 |
| | A JavaScript-based tool for automating repetitive tasks in software development | 1 |
| | A tool for automatically evaluating RAG models by generating synthetic data and fine-tuning classifiers | 499 |
| | A performance testing tool for web applications and APIs | 572 |