JingbiaoMei/ATM-Bench

ATM-Bench: A benchmark for long-term personalized memory QA spanning ~4 years of multimodal data (images, videos, emails). Features referential queries, evidence-grounded answering, and multi-source reasoning. Paper: "According to Me: Long-Term Personalized Referential Memory QA"

26 / 100 (Experimental)

This project is a benchmark for evaluating how well AI systems recall specific, personalized details from a person's digital history. It draws on a mix of personal images, videos, and emails spanning about four years, and lets you ask complex questions that require the AI to find and connect information across these sources. It is designed for researchers and developers working on AI agents that need a long-term, multimodal understanding of an individual's past interactions and experiences.

Use this if you are developing or evaluating AI systems that need to answer personalized questions based on a large, diverse collection of a user's past memories and digital content.

Not ideal if your AI application doesn't require multimodal data, long-term memory recall, or referential questioning based on personal history.

AI-memory-research personal-assistant-development multimodal-AI knowledge-retrieval personalized-AI
No Package, No Dependents
Maintenance 10 / 25
Adoption 5 / 25
Maturity 11 / 25
Community 0 / 25
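
The four category scores above add up to the overall score shown on this page (10 + 5 + 11 + 0 = 26, out of 4 × 25 = 100). The short check below reproduces that arithmetic. It assumes the overall score is a plain, unweighted sum of the categories, which the page does not state explicitly.

```python
# Category scores as listed on this page; each is out of 25.
category_scores = {
    "Maintenance": 10,
    "Adoption": 5,
    "Maturity": 11,
    "Community": 0,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {maximum}")  # 26 / 100
```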


Stars: 9
Forks:
Language: Python
License: MIT
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/JingbiaoMei/ATM-Bench"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
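
The same request can be made from Python using only the standard library. This is a minimal sketch, not a documented client: the helper names `build_url` and `fetch_quality` are hypothetical, the `/{owner}/{repo}` path pattern is inferred from the single curl example above, and the response is assumed (not verified) to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper).

    Assumes the path pattern is BASE_URL/{owner}/{repo}, as in the curl example.
    """
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository (hypothetical helper).

    Assumes the response body is UTF-8 encoded JSON; json.loads raises
    an error if it is not.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("JingbiaoMei", "ATM-Bench")` issues one request against the daily limit. Because the response schema is not described here, inspect the returned object before relying on any particular field.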