yjyddq/RiOSWorld

[NeurIPS 2025] Official repository of RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

Quality score: 32 / 100 (Emerging)

This project helps researchers and developers evaluate the risks posed by multimodal computer-use agents, particularly those that interact with a desktop environment. It takes a computer-use agent (such as an AI assistant that controls a mouse and keyboard) as input and reports how that agent behaves across a range of risky scenarios. The primary users are AI researchers and developers working on agent safety and trustworthiness.

117 stars.

Use this if you are developing or researching multimodal AI agents and need a standardized way to benchmark their safety and identify risky behaviors.

Not ideal if you are an end-user looking for a pre-built safety tool for AI agents, as this is a research and benchmarking framework.

Tags: AI Safety, Agent Benchmarking, Trustworthy AI, AI Risk Assessment, Multimodal Agents
No License, No Package, No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 7 / 25
Community 9 / 25


Stars: 117
Forks: 6
Language: HTML
License: None
Last pushed: Dec 02, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/yjyddq/RiOSWorld"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
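The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body. The response schema is not described on this page, so the sketch returns the decoded data without interpreting any fields.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/agents"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumption: the response body is JSON. No particular keys are assumed.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# print(json.dumps(fetch_quality("yjyddq", "RiOSWorld"), indent=2))
```

With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` in a loop.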