bowen-upenn/PersonaMem
[COLM 2025] Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
PersonaMem is a benchmark for evaluating how well large language models (LLMs) understand and adapt to an individual user's evolving preferences across multi-session conversations. Given simulated chat interactions and user profiles as input, it assesses the LLM's ability to generate tailored responses or recommendations. This is useful for anyone designing or deploying chatbots, virtual assistants, or personalized content systems.
Use this if you need to test or improve how well your conversational AI learns individual user preferences and carries them across sessions to deliver more personalized, engaging responses (a minimal evaluation sketch follows below).
Not ideal if you are looking for a dataset to train a general-purpose language model without a specific focus on personalization or multi-session memory.
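To make the task concrete, here is a minimal sketch of the kind of multi-session check PersonaMem performs: the model sees a user's earlier sessions, the user's preference shifts over time, and credit goes only to a response consistent with the latest preference. Everything in it is illustrative; call_model, the session strings, and the A/B scoring are assumptions, not the repository's actual data schema or evaluation harness.

def call_model(prompt: str) -> str:
    """Placeholder for your LLM client; returns the model's raw text answer."""
    raise NotImplementedError("Plug in your model call here.")

# Hypothetical two-session history in which the user's preference changes.
sessions = [
    "Session 1 - User: I love spicy food, especially Sichuan dishes.",
    "Session 2 - User: Spicy food has been upsetting my stomach lately; I'm avoiding it.",
]
query = "User: Can you recommend a restaurant for dinner tonight?"
choices = {
    "A": "A famous Sichuan hot-pot place.",       # matches the outdated preference
    "B": "A mild Cantonese dim sum restaurant.",  # matches the current preference
}
gold = "B"

prompt = (
    "Conversation history across sessions:\n" + "\n".join(sessions)
    + "\n\n" + query + "\nPick the better recommendation (A or B):\n"
    + "\n".join(f"{k}. {v}" for k, v in choices.items())
)

def is_personalized(answer: str) -> bool:
    # The model earns credit only if it tracks the latest stated preference.
    return answer.strip().upper().startswith(gold)

# Usage, once call_model is wired to a real client:
# print(is_personalized(call_model(prompt)))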
Stars: 119
Forks: 7
Language: Python
License: MIT
Category: llm-tools
Last pushed: Feb 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/bowen-upenn/PersonaMem"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
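If you would rather hit the endpoint from Python than curl, a minimal sketch follows. The URL is taken verbatim from the listing above; the assumption that the response is JSON (and whatever fields it contains) is mine, not documented here.

import requests

# Endpoint copied verbatim from the curl example above; treating the
# response as JSON is an assumption, since the listing shows only the URL.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/bowen-upenn/PersonaMem"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly, e.g. on rate limiting (100 requests/day without a key)
print(resp.json())       # inspect whatever fields the API actually returns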
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents