zhchen18/ToMBench

ToMBench: Benchmarking Theory of Mind in Large Language Models, ACL 2024.

Quality score: 35 / 100 (Emerging)

This project provides a comprehensive benchmark for evaluating how well large language models (LLMs) exhibit human-like social intelligence, often called "Theory of Mind". It helps researchers and AI developers assess an LLM's ability to interpret complex social scenarios, motivations, and non-literal communication. You supply an LLM's responses to various social prompts, and the benchmark quantifies its Theory of Mind capabilities across different tasks and abilities.

No commits in the last 6 months.

Use this if you are developing or evaluating large language models and need a systematic way to measure their social intelligence, particularly their ability to infer mental states, understand emotions, and interpret non-literal communication in diverse real-world social scenarios.

Not ideal if you are looking for a dataset to train an LLM for specific social tasks, as this benchmark is designed purely for evaluation to prevent data contamination.

Tags: AI evaluation · LLM capabilities · social intelligence · cognitive AI · natural language understanding
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 11 / 25

How are scores calculated?

Stars: 66
Forks: 6
Language: Python
License: MIT
Last pushed: Jun 24, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zhchen18/ToMBench"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
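If you prefer Python over curl, the same endpoint can be queried with only the standard library. A minimal sketch, assuming the endpoint returns JSON (the `quality_url` helper and the response field names are illustrative, not documented by the API):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a repository."""
    return f"{API_BASE}/{registry}/{repo}"

url = quality_url("transformers", "zhchen18/ToMBench")

# Uncomment to fetch live data (no API key required, 100 requests/day):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```

The fetch itself is left commented out so the snippet runs offline; swap in your own registry and repo names as needed.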