JingbiaoMei/ATM-Bench
ATM-Bench: A benchmark for long-term personalized memory QA spanning ~4 years of multimodal data (images, videos, emails). Features referential queries, evidence-grounded answering, and multi-source reasoning. Paper: "According to Me: Long-Term Personalized Referential Memory QA"
This project is a benchmark for evaluating how well AI systems recall specific, personalized details from a person's digital history. It draws on a mix of personal images, videos, and emails spanning roughly four years, and poses complex questions that require the AI to find and connect information across these sources. It is designed for researchers and developers working on AI agents that need a long-term, multimodal understanding of an individual's past interactions and experiences.
Use this if you are developing or evaluating AI systems that need to answer personalized questions based on a large, diverse collection of a user's past memories and digital content.
Not ideal if your AI application doesn't require multimodal data, long-term memory recall, or referential questioning based on personal history.
Stars: 9
Forks: -
Language: Python
License: MIT
Category: -
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/JingbiaoMei/ATM-Bench"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
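For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `quality_url` and `fetch_quality` are hypothetical, the assumption that other repositories follow the same `owner/name` path pattern is untested, and the response format is not documented here, so the code falls back to raw text if the body is not JSON.

```python
import json
import urllib.request

# Base path taken from the curl command above; that it generalizes to
# other repositories is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug."""
    owner, _, name = repo.partition("/")
    if not owner or not name:
        raise ValueError(f"expected 'owner/name', got {repo!r}")
    return f"{BASE_URL}/{owner}/{name}"


def fetch_quality(repo: str, timeout: float = 10.0):
    """Fetch the record for a repository (hypothetical helper).

    Returns parsed JSON when the body parses as JSON, otherwise the
    raw response text, since the response format is an assumption.
    """
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("JingbiaoMei/ATM-Bench")` issues the same GET request as the curl command and counts against the daily request limit.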
Higher-rated alternatives
MemoriLabs/Memori
SQL Native Memory Layer for LLMs, AI Agents & Multi-Agent Systems
volcengine/OpenViking
OpenViking is an open-source context database designed specifically for AI Agents(such as...
mem0ai/mem0
Universal memory layer for AI Agents
zjunlp/LightMem
[ICLR 2026] LightMem: Lightweight and Efficient Memory-Augmented Generation
MemTensor/MemOS
AI memory OS for LLM and Agent systems(moltbot,clawdbot,openclaw), enabling persistent Skill...