zihao-ai/EARBench
Benchmarking Physical Risk Awareness of Foundation Model-based Embodied AI Agents
EARBench helps AI researchers and developers verify that embodied AI agents (such as robots) can operate safely in real-world environments. Given detailed descriptions of physical scenes and specific tasks, it evaluates how safely and effectively a foundation-model-based agent plans to perform those tasks. It targets teams that build and test AI systems for physical deployment and need to confirm those systems recognize potential risks (a concrete sketch of the flow follows the usage guidance below).
No commits in the last 6 months.
Use this if you are developing or evaluating AI agents that interact with physical environments and need a systematic way to assess their awareness of safety risks during task planning.
Not ideal if you are looking for a tool to control or program robots directly, or if your AI agents do not operate in physical spaces.
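To make the evaluation flow concrete, here is a minimal, hypothetical sketch of the kind of check such a benchmark performs. It does not use EARBench's actual API; the `plan_task` stub and the keyword-based `RISK_CUES` scoring are illustrative stand-ins for a foundation-model planner and the benchmark's risk-awareness evaluation.

```python
# Hypothetical sketch of a physical-risk-awareness check (not EARBench's real API).

SCENE = "A kitchen with a lit gas stove, a wet tile floor, and a knife on the counter."
TASK = "Carry a pot of boiling water from the stove to the sink."

def plan_task(scene: str, task: str) -> str:
    """Stand-in for a foundation-model planner; a real evaluation
    would query an LLM/VLM with the scene and task descriptions."""
    return (
        "1. Turn off the stove burner. "
        "2. Grip the pot handles with both effectors. "
        "3. Move slowly across the wet floor to avoid slipping. "
        "4. Pour the water into the sink."
    )

# Illustrative hazard cues the plan should acknowledge for this scene.
RISK_CUES = ["turn off", "slowly", "wet floor", "slip"]

plan = plan_task(SCENE, TASK)
acknowledged = [cue for cue in RISK_CUES if cue in plan.lower()]
score = len(acknowledged) / len(RISK_CUES)
print(f"Risk cues acknowledged: {acknowledged} (score: {score:.2f})")
```

A plan that ignores the hazards (for example, one that never powers down the burner) would score lower; the real benchmark scores risk awareness across many scenes and tasks rather than with a single keyword check.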
Stars: 23
Forks: 2
Language: Python
License: MIT
Category: (not listed)
Last pushed: Nov 28, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zihao-ai/EARBench"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
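The curl call above is a plain GET request, so it maps directly to any HTTP client. Here is the same call from the Python standard library; the response schema is not documented on this page, so the script simply pretty-prints whatever JSON comes back (how an API key would be supplied is also undocumented and is noted as an assumption below).

```python
# Fetch the quality data for zihao-ai/EARBench from the pt-edge API.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zihao-ai/EARBench"

# No key is needed for up to 100 requests/day. How a key is passed for the
# 1,000/day tier is not documented here; a header or query parameter would
# be an assumption, so this example makes the keyless call only.
req = urllib.request.Request(URL, headers={"Accept": "application/json"})

with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# Schema undocumented on this page; just pretty-print the payload.
print(json.dumps(data, indent=2))
```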
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems