THUDM/VisualAgentBench

Towards Large Multimodal Models as Visual Foundation Agents

37 / 100 (Emerging)

This tool helps researchers and AI developers systematically assess how well large multimodal models (LMMs) can act as agents in visual environments. You supply an LMM and run it against a set of diverse visual tasks (such as navigating a virtual world, interacting with a graphical user interface, or designing web elements), and it outputs performance metrics showing the model's success rate on those tasks. It is aimed at AI researchers and practitioners who develop or evaluate LMMs for agentic applications.

258 stars. No commits in the last 6 months.

Use this if you need to benchmark the capabilities of large multimodal models to understand and act within various visual environments, from embodied simulations to web interfaces and visual design tasks.

Not ideal if you are looking for a tool to build or deploy LMM-powered applications directly, as this focuses on model evaluation rather than application development.

Tags: AI model evaluation, Multimodal AI research, Agentic AI systems, Computer vision benchmarking, Embodied AI
Stale (6m), No Package, No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 9 / 25
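(These four sub-scores appear to sum to the overall score shown above: 2 + 10 + 16 + 9 = 37.)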

Stars: 258
Forks: 10
Language: Python
License: Apache-2.0
Last pushed: Apr 24, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/THUDM/VisualAgentBench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
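
For programmatic access from Python, here is a minimal sketch using the requests library against the same endpoint (the response schema isn't documented on this page, so the example just pretty-prints whatever JSON comes back):

import json
import requests

# Same endpoint as the curl example above; the free tier (100 requests/day)
# needs no API key.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/THUDM/VisualAgentBench"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()

# Pretty-print the returned JSON as-is, since the field names aren't listed here.
print(json.dumps(resp.json(), indent=2))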