sileod/llm-theory-of-mind
Testing Theory of Mind (ToM) in language models with epistemic logic
This project helps AI and cognitive-science researchers evaluate how well large language models (LLMs) reason about what other agents know and believe. It generates reasoning problems grounded in epistemic (modal) logic, in the spirit of classic puzzles about knowledge and belief, and poses them to an LLM to test whether it can infer each agent's knowledge state. The output reports how accurately the LLM solves these Theory of Mind challenges.
No commits in the last 6 months.
Use this if you are an AI researcher or cognitive scientist looking to rigorously test the advanced reasoning capabilities and social intelligence of large language models.
Not ideal if you are looking to build applications with LLMs or need to benchmark their performance on standard language tasks like translation or summarization.
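To make the description above concrete, here is a minimal illustrative sketch of the kind of epistemic-logic Theory of Mind item such an evaluation poses to a model. The problem text, field names, and the ask_llm stub are invented for demonstration; they are not this repository's actual data format or interface.

# Illustrative only: the item and the ask_llm stub below are invented for
# demonstration; they do not reflect this repository's actual format or API.

problems = [
    {
        # Classic epistemic-logic setup: agents reason about each other's knowledge.
        "premise": (
            "Alice and Bob each see the other's hat but not their own. "
            "A referee announces that at least one hat is red. "
            "Alice then says she does not know the colour of her own hat."
        ),
        "question": "Does Bob now know the colour of his own hat? Answer yes or no.",
        # Alice's uncertainty implies Bob's hat is red (otherwise she would have
        # known hers was red), so Bob can infer his own hat colour: the answer is yes.
        "gold": "yes",
    },
]

def ask_llm(prompt: str) -> str:
    # Stub answer; replace with a call to the model under evaluation.
    return "yes"

correct = 0
for item in problems:
    answer = ask_llm(item["premise"] + "\n" + item["question"]).strip().lower()
    correct += int(answer.startswith(item["gold"]))

print(f"Theory of Mind accuracy: {correct / len(problems):.2%}")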
Stars: 22
Forks: 4
Language: Python
License: Apache-2.0
Last pushed: Dec 13, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/sileod/llm-theory-of-mind"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
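For scripted access, the same endpoint can be queried from Python. This is a minimal sketch assuming only the URL shown in the curl command above; the response schema is not documented on this page, so the loop simply prints whatever JSON fields the API returns.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/sileod/llm-theory-of-mind"
resp = requests.get(url, timeout=10)
resp.raise_for_status()

# The response fields are not documented here; stars, forks, or last push date
# are plausible keys but not guaranteed, so just print whatever comes back.
for key, value in resp.json().items():
    print(f"{key}: {value}")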
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents