AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
2k stars
29 watching
150 forks
Language: Python
last commit: about 2 months ago
Linked from 1 awesome list
chatgpt, gpt-4, llm, llm-agent