chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
This project helps researchers evaluate an AI model's ability to reason about human mental states, a capacity known as Theory of Mind. It pairs videos and text descriptions of everyday household scenarios with questions that test whether a model can correctly infer an agent's goals and beliefs from its actions. Researchers in AI or cognitive science who focus on machine intelligence and human-computer interaction would use it to benchmark advanced multimodal models.
Use this if you are an AI researcher or cognitive scientist developing or testing AI models that need to understand and predict human intentions and beliefs in complex, real-world interactions.
Not ideal if you need a pre-trained, ready-to-deploy AI for practical applications like customer service bots or predictive analytics, as this is a research benchmark.
Stars: 154
Forks: 19
Language: Python
License: MIT
Category:
Last pushed: Jan 02, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/chuanyangjin/MMToM-QA"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
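For scripted use, the same endpoint can be queried from Python. A minimal sketch, assuming only the URL shown in the curl command above; the response schema is not documented here, so the code prints the raw JSON rather than assuming any fields, and the helper names are illustrative:

```python
# Minimal sketch: fetch repo-quality JSON from the pt-edge API shown above.
# The endpoint URL comes from the curl example; the JSON schema is undocumented
# here, so we decode and print the body without assuming specific fields.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repo slug."""
    return f"{API_BASE}/{repo}"


def fetch_quality(repo: str) -> dict:
    """GET the endpoint and decode the JSON body (100 requests/day keyless)."""
    with urllib.request.urlopen(quality_url(repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(json.dumps(fetch_quality("chuanyangjin/MMToM-QA"), indent=2))
```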
Higher-rated alternatives
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
kyegomez/PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"