Open-Social-World/EgoNormia
EgoNormia | Benchmarking Physical Social Norm Understanding in VLMs
This tool helps AI researchers and developers assess how well Vision-Language Models (VLMs) understand and reason about physical social norms in real-world scenarios. You provide your VLM, and the benchmark evaluates it on a dataset of social interaction scenarios, producing a score that reflects how well the model interprets these situations. It's aimed at anyone working to improve the social intelligence of AI agents.
No commits in the last 6 months.
Use this if you are developing or evaluating Vision-Language Models and need a standardized way to measure their understanding of human social behavior in physical environments.
Not ideal if you are looking for a tool to develop or train new VLMs, as this is solely for benchmarking existing models.
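To make the benchmarking workflow concrete, here is a minimal sketch of the kind of evaluation loop a benchmark like this runs. It is not EgoNormia's actual API: the Scenario fields, the query_vlm hook, and the multiple-choice accuracy metric are all illustrative assumptions.

# Hypothetical evaluation loop for a VLM norm-understanding benchmark.
# Names and data layout are assumptions, not EgoNormia's real interface.
from dataclasses import dataclass

@dataclass
class Scenario:
    video_path: str        # ego-centric video clip
    question: str          # norm-understanding question about the clip
    choices: list[str]     # candidate actions or justifications
    answer_idx: int        # index of the ground-truth choice

def query_vlm(video_path: str, question: str, choices: list[str]) -> int:
    """Stand-in for a call to your model; should return the chosen index."""
    raise NotImplementedError("wire this up to your VLM's API")

def evaluate(scenarios: list[Scenario]) -> float:
    """Accuracy over all scenarios -- the kind of score the benchmark reports."""
    correct = sum(
        query_vlm(s.video_path, s.question, s.choices) == s.answer_idx
        for s in scenarios
    )
    return correct / len(scenarios)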
Stars: 12
Forks: 1
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jun 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Open-Social-World/EgoNormia"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
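If you'd rather call the endpoint from code than from curl, a minimal Python sketch using the requests library is below. The response schema and the auth header name are assumptions; inspect the returned JSON and the API docs for the real field names.

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/Open-Social-World/EgoNormia")

# Anonymous access allows 100 requests/day; a free key raises that to 1,000.
# The header name below is an assumption -- check the API docs.
headers = {}  # e.g. {"Authorization": "Bearer <your-key>"}

resp = requests.get(URL, headers=headers, timeout=10)
resp.raise_for_status()
data = resp.json()

# Field names are guesses based on the stats shown on this page.
print(data.get("stars"), data.get("forks"), data.get("license"))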
Higher-rated alternatives
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
LarHope/ollama-benchmark
Ollama-based benchmark with detailed input/output tokens-per-second stats; written in Python, with a DeepSeek R1 example.
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL '25 & '24)