bananalyzer
Web task evaluator
A tool to evaluate AI agents on web tasks by dynamically constructing and executing test suites against predefined example websites (a minimal sketch of this approach follows the project stats below).
Open-source AI agent evaluation framework for web tasks 🐒🍌
274 stars
3 watching
21 forks
Language: Python
Last commit: 3 months ago
Linked from 1 awesome list
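To make the idea of a dynamically constructed test suite concrete, here is a minimal sketch in Python using pytest. Everything in it (the `examples/` directory layout, the `load_examples` helper, and the `run_agent` hook) is an illustrative assumption for exposition, not bananalyzer's actual API:

```python
# Illustrative sketch only: the examples/ layout and the names below are
# assumptions for exposition, not bananalyzer's real API.
import json
from pathlib import Path

import pytest


def load_examples(root: Path) -> list[dict]:
    # Each JSON file is assumed to describe one predefined website task:
    # {"id": ..., "url": ..., "task": ..., "expected": ...}
    return [json.loads(p.read_text()) for p in sorted(root.glob("*.json"))]


def run_agent(url: str, task: str) -> str:
    # Placeholder for the agent under evaluation; a real harness would
    # drive a browser session against `url` and return the agent's answer.
    raise NotImplementedError


EXAMPLES = load_examples(Path("examples"))


# pytest builds one test per example at collection time, so the suite is
# constructed dynamically from whatever example files are present.
@pytest.mark.parametrize("example", EXAMPLES, ids=lambda e: e.get("id", e["url"]))
def test_agent_on_example(example: dict) -> None:
    assert run_agent(example["url"], example["task"]) == example["expected"]
```

The design point is that the suite's size and contents are determined at collection time by whatever example files exist, so covering a new predefined website means adding a data file rather than writing a new test.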
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | A comprehensive benchmarking framework for evaluating the performance and safety of reward models in reinforcement learning. | 459 |
| | An automatic evaluation tool for large language models. | 1,568 |
| | Evaluates language models using standardized benchmarks and prompting techniques. | 2,059 |
| | An evaluation tool for question-answering systems using large language models and natural language processing techniques. | 1,065 |
| | A process automation tool that allows users to design and execute rule-based automation without writing application code. | 1,125 |
| | A framework for evaluating language models on NLP tasks. | 326 |
| | An evaluation framework for large language models trained with instruction-tuning methods. | 535 |
| | A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation. | 187 |
| | An evaluation tool for retrieval-augmented generation (RAG) methods. | 141 |
| | Evaluates segmentation performance in medical imaging using multiple metrics. | 57 |
| | A tool that utilizes AI and automation to execute complex tasks and generate code in response to user requests. | 869 |
| | A framework for evaluating and diagnosing retrieval-augmented generation systems. | 630 |
| | A JavaScript-based tool for automating repetitive tasks in software development. | 1 |
| | A tool for automatically evaluating RAG models by generating synthetic data and fine-tuning classifiers. | 499 |
| | A performance testing tool for web applications and APIs. | 572 |