ltzheng/agent-studio
[ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents
This project provides a comprehensive toolkit for AI researchers and developers working on virtual agents. It supports creating, evaluating, and benchmarking agents that interact with computer software through visual observations (such as screen video) and actions (such as GUI clicks or API calls). You provide the agent code; the toolkit outputs performance metrics and detailed insights into the agent's capabilities.
229 stars. No commits in the last 6 months.
Use this if you are developing or studying general-purpose virtual agents and need a standardized environment, tools, and benchmarks to test their ability to interact with diverse software, from terminal commands to graphical user interfaces.
Not ideal if you are looking for a pre-built agent to solve a specific problem or if you are not involved in the research and development of AI agents.
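To make the evaluation workflow described above concrete, here is a minimal, self-contained sketch of the observe-act-evaluate loop such a toolkit runs. All class and method names below are illustrative placeholders, not agent-studio's actual interface.

# Hypothetical sketch of an agent-evaluation loop; names are placeholders,
# not agent-studio's real API.

class StubEnv:
    """Toy stand-in for a real desktop/OS environment."""

    def __init__(self, steps_to_finish: int = 3):
        self._remaining = steps_to_finish

    def reset(self) -> dict:
        return {"screenshot": None, "instruction": "open the settings panel"}

    def step(self, action: dict) -> tuple[dict, bool]:
        self._remaining -= 1
        return {"screenshot": None, "instruction": ""}, self._remaining <= 0

    def task_succeeded(self) -> bool:
        return self._remaining <= 0


class NoopAgent:
    """Trivial agent; a real agent would map each observation to a GUI action or API call."""

    def act(self, observation: dict) -> dict:
        return {"type": "noop"}


def evaluate(agent, env, max_steps: int = 10) -> bool:
    obs = env.reset()
    for _ in range(max_steps):
        obs, done = env.step(agent.act(obs))
        if done:
            break
    return env.task_succeeded()


if __name__ == "__main__":
    print("task succeeded:", evaluate(NoopAgent(), StubEnv()))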
Stars: 229
Forks: 30
Language: Python
License: AGPL-3.0
Last pushed: Jun 16, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/ltzheng/agent-studio"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
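For scripted access, the same endpoint can be fetched from Python with the requests library; this minimal sketch assumes the response body is JSON.

import requests

# Fetch the quality data for ltzheng/agent-studio from the endpoint above.
# Assumes the endpoint returns JSON; adjust the parsing if the format differs.
URL = "https://pt-edge.onrender.com/api/v1/quality/agents/ltzheng/agent-studio"

response = requests.get(URL, timeout=30)
response.raise_for_status()
print(response.json())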
Higher-rated alternatives
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
agentscope-ai/OpenJudge
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards