HUST-AI-HYZ/MemoryAgentBench

Open-source code for the ICLR 2026 paper "Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions"

Quality score: 46 / 100 (Emerging)

This project helps AI developers and researchers evaluate how well their large language model (LLM) agents remember information over extended, multi-turn conversations. It takes an LLM agent and a dataset of questions and scenarios as input, then outputs performance metrics across key memory competencies like accurate retrieval and conflict resolution. This is for anyone building or researching AI assistants that need to maintain context and learn across many interactions.


Use this if you are developing or studying LLM agents and need a standardized way to measure their long-term memory capabilities through realistic, incremental conversations.

Not ideal if you are an end-user looking for an AI agent to solve a specific problem, rather than evaluating the memory performance of such agents.

Tags: LLM-development, AI-agent-evaluation, conversational-AI, natural-language-processing, machine-learning-research
No license · No package · No dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 7 / 25
Community: 19 / 25


Stars: 253
Forks: 41
Language: Python
License: none
Last pushed: Jan 27, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/HUST-AI-HYZ/MemoryAgentBench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
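If you prefer to consume the endpoint programmatically rather than via curl, the sketch below builds the same URL and parses a response. The base URL comes from the curl command above; the response field names (`score`, `tier`, `stars`, `forks`) are assumptions inferred from the card, not a documented schema, so check an actual response before relying on them.

```python
import json
from urllib.parse import quote

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a given GitHub owner/repo."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

url = quality_url("HUST-AI-HYZ", "MemoryAgentBench")

# Hypothetical response body for illustration only -- the real schema
# may differ. In practice you would fetch `url` (e.g. with urllib or
# requests) and parse the JSON the same way.
sample = json.loads('{"score": 46, "tier": "Emerging", "stars": 253, "forks": 41}')
summary = f'{sample["score"]}/100 ({sample["tier"]}), {sample["stars"]} stars'
print(summary)
```

Unauthenticated calls are limited to 100/day, so a client that polls many repos should batch requests or use a free key for the 1,000/day limit.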