nttmdlab-nlp/ToMATO

ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind (AAAI 2025)

28 / 100 (Experimental)

This project provides a benchmark for assessing how well large language models (LLMs) understand and predict the thoughts, beliefs, and intentions of others, a capability known as Theory of Mind. It has LLMs role-play conversations with each other under different knowledge conditions, yielding a dataset that evaluates an LLM's capacity for complex social reasoning. It is aimed at LLM researchers and developers working on advanced AI capabilities.

No commits in the last 6 months.

Use this if you are an AI researcher or developer evaluating the 'Theory of Mind' capabilities of your large language models in realistic, conversational settings.

Not ideal if you are looking for a dataset to fine-tune your LLM, as this benchmark is strictly for evaluation to prevent contamination.

LLM evaluation · AI research · cognitive AI · natural language processing · AI ethics and bias
No License · Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 12 / 25


Stars

19

Forks

3

Language

Python

License

None

Last pushed

Apr 16, 2025

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/nttmdlab-nlp/ToMATO"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
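The same endpoint can be called from Python. A minimal sketch using only the standard library is shown below; the `fetch_quality` helper, the response fields, and the `Authorization` header name for keyed access are assumptions, not documented API behavior — only the URL itself comes from the curl command above.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner, repo):
    """Build the quality-endpoint URL for a given repository."""
    return f"{API_BASE}/llm-tools/{owner}/{repo}"


def fetch_quality(owner, repo, api_key=None):
    """Fetch the quality report as a dict.

    The JSON field names and the header used for the optional API key
    are hypothetical; consult the API's documentation for the real ones.
    """
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        # Hypothetical auth header; the service may expect something else.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the endpoint for this repository.
    print(quality_url("nttmdlab-nlp", "ToMATO"))
```

Anonymous callers are limited to 100 requests/day, so a script polling many repositories would need a free key or client-side caching.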