Tongyi-MAI/MobileWorld
Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments
This project helps researchers and developers evaluate how well autonomous mobile agents perform complex tasks on Android devices. It provides a standardized way to test agents against real-world mobile apps and workflows, including interactions with human users. The output is a performance benchmark showing how well different agents complete tasks in social media, e-commerce, and communication apps, aimed at those developing smarter mobile AI.
Use this if you are developing or researching autonomous AI agents and need a rigorous, reproducible way to benchmark their ability to navigate and complete tasks across various mobile applications, especially those requiring multi-step reasoning or user interaction.
Not ideal if you are a typical mobile app user looking to automate simple tasks on your personal device.
Stars: 152
Forks: 28
Language: Python
License: Apache-2.0
Last pushed: Mar 11, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/Tongyi-MAI/MobileWorld"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000 requests/day.
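For programmatic access, here is a minimal Python sketch that calls the same endpoint using only the standard library. It assumes nothing beyond the endpoint returning JSON; since the payload's field names are not documented here, the snippet pretty-prints whatever comes back rather than assuming specific keys.

import json
import urllib.request

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/agents/Tongyi-MAI/MobileWorld"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Pretty-print the full payload; no assumptions about its schema.
print(json.dumps(data, indent=2))

To use a key for the higher rate limit, consult the API's documentation for how credentials are passed; the authentication mechanism is not specified here, so it is deliberately omitted from the sketch.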
Related agents
OSU-NLP-Group/ScienceAgentBench
[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven...
ml-dev-bench/ml-dev-bench
ML-Dev-Bench is a benchmark for evaluating AI agents against various ML development tasks.
michaelabrt/clarte-benchmark
Paired A/B benchmark suite for Clarté - measures how dependency-graph intelligence affects AI...
zzhiyuann/agent-bench
Benchmarking framework for AI agents — pytest for AI agents. Define tasks in YAML, run against...
MSKazemi/ExaBench-QA
ExaBench-QA is a benchmark and dataset for evaluating role-aware, LLM-based AI agents for...