bowen-upenn/PersonaMem
[COLM 2025] Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
PersonaMem is a benchmark for evaluating how well large language models (LLMs) understand and adapt to an individual user's evolving preferences across multi-session conversations. Given simulated chat interactions and user profiles as input, it assesses the LLM's ability to generate tailored responses or recommendations. This is useful for anyone designing or deploying chatbots, virtual assistants, or personalized content systems.
Use this if you need to test or improve how well your conversational AI learns individual user preferences and carries them across sessions to deliver more personalized, engaging responses (a minimal evaluation sketch follows below).
Not ideal if you are looking for a dataset to train a general-purpose language model without a specific focus on personalization or multi-session memory.
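To make the task concrete, here is a minimal sketch of the kind of multi-session check PersonaMem performs: the model sees a user's earlier sessions, the user's preference shifts over time, and credit goes only to a response consistent with the latest preference. Everything in it is illustrative; call_model, the session strings, and the A/B scoring are assumptions, not the repository's actual data schema or evaluation harness.

def call_model(prompt: str) -> str:
    """Placeholder for your LLM client; returns the model's raw text answer."""
    raise NotImplementedError("Plug in your model call here.")

# Hypothetical two-session history in which the user's preference changes.
sessions = [
    "Session 1 - User: I love spicy food, especially Sichuan dishes.",
    "Session 2 - User: Spicy food has been upsetting my stomach lately; I'm avoiding it.",
]
query = "User: Can you recommend a restaurant for dinner tonight?"
choices = {
    "A": "A famous Sichuan hot-pot place.",       # matches the outdated preference
    "B": "A mild Cantonese dim sum restaurant.",  # matches the current preference
}
gold = "B"

prompt = (
    "Conversation history across sessions:\n" + "\n".join(sessions)
    + "\n\n" + query + "\nPick the better recommendation (A or B):\n"
    + "\n".join(f"{k}. {v}" for k, v in choices.items())
)

def is_personalized(answer: str) -> bool:
    # The model earns credit only if it tracks the latest stated preference.
    return answer.strip().upper().startswith(gold)

# Usage, once call_model is wired to a real client:
# print(is_personalized(call_model(prompt)))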
Stars: 119
Forks: 7
Language: Python
License: MIT
Category: llm-tools
Last pushed: Feb 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/bowen-upenn/PersonaMem"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
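If you would rather hit the endpoint from Python than curl, a minimal sketch follows. The URL is taken verbatim from the listing above; the assumption that the response is JSON (and whatever fields it contains) is mine, not documented here.

import requests

# Endpoint copied verbatim from the curl example above; treating the
# response as JSON is an assumption, since the listing shows only the URL.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/bowen-upenn/PersonaMem"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly, e.g. on rate limiting (100 requests/day without a key)
print(resp.json())       # inspect whatever fields the API actually returns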
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents