zihao-ai/EARBench
Benchmarking Physical Risk Awareness of Foundation Model-based Embodied AI Agents
EARBench helps AI researchers and developers verify that embodied AI agents (such as robots) can operate safely in real-world environments. Given detailed descriptions of physical scenes and specific tasks, it evaluates how safely and effectively a foundation-model-based agent plans to perform those tasks. It targets teams that build and test AI systems for physical deployment and need to confirm those systems recognize potential risks (a concrete sketch of the flow follows the usage guidance below).
No commits in the last 6 months.
Use this if you are developing or evaluating AI agents that interact with physical environments and need a systematic way to assess their awareness of safety risks during task planning.
Not ideal if you are looking for a tool to control or program robots directly, or if your AI agents do not operate in physical spaces.
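To make the evaluation flow concrete, here is a minimal, hypothetical sketch of the kind of check such a benchmark performs. It does not use EARBench's actual API; the `plan_task` stub and the keyword-based `RISK_CUES` scoring are illustrative stand-ins for a foundation-model planner and the benchmark's risk-awareness evaluation.

```python
# Hypothetical sketch of a physical-risk-awareness check (not EARBench's real API).

SCENE = "A kitchen with a lit gas stove, a wet tile floor, and a knife on the counter."
TASK = "Carry a pot of boiling water from the stove to the sink."

def plan_task(scene: str, task: str) -> str:
    """Stand-in for a foundation-model planner; a real evaluation
    would query an LLM/VLM with the scene and task descriptions."""
    return (
        "1. Turn off the stove burner. "
        "2. Grip the pot handles with both effectors. "
        "3. Move slowly across the wet floor to avoid slipping. "
        "4. Pour the water into the sink."
    )

# Illustrative hazard cues the plan should acknowledge for this scene.
RISK_CUES = ["turn off", "slowly", "wet floor", "slip"]

plan = plan_task(SCENE, TASK)
acknowledged = [cue for cue in RISK_CUES if cue in plan.lower()]
score = len(acknowledged) / len(RISK_CUES)
print(f"Risk cues acknowledged: {acknowledged} (score: {score:.2f})")
```

A plan that ignores the hazards (for example, one that never powers down the burner) would score lower; the real benchmark scores risk awareness across many scenes and tasks rather than with a single keyword check.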
Stars: 23
Forks: 2
Language: Python
License: MIT
Category: (not listed)
Last pushed: Nov 28, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zihao-ai/EARBench"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
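The curl call above is a plain GET request, so it maps directly to any HTTP client. Here is the same call from the Python standard library; the response schema is not documented on this page, so the script simply pretty-prints whatever JSON comes back (how an API key would be supplied is also undocumented and is noted as an assumption below).

```python
# Fetch the quality data for zihao-ai/EARBench from the pt-edge API.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zihao-ai/EARBench"

# No key is needed for up to 100 requests/day. How a key is passed for the
# 1,000/day tier is not documented here; a header or query parameter would
# be an assumption, so this example makes the keyless call only.
req = urllib.request.Request(URL, headers={"Accept": "application/json"})

with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# Schema undocumented on this page; just pretty-print the payload.
print(json.dumps(data, indent=2))
```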
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems