MemTensor/HaluMem
HaluMem is the first operation-level hallucination evaluation benchmark tailored to agent memory systems.
This project helps developers and researchers evaluate how well an AI agent's memory system handles factual information and avoids making things up. It provides a benchmark to test if a memory system accurately extracts, updates, and retrieves memories from dialogues, and then uses that information to answer questions without hallucinating. The main output is a detailed breakdown of performance metrics for different memory operations, revealing where a system might be generating incorrect or irrelevant information.
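The per-operation breakdown described above can be illustrated with a small, hypothetical sketch (this is not HaluMem's actual API; the record format and operation names are assumptions): given labeled judgments for each memory operation, compute a hallucination rate per operation.

```python
from collections import Counter

def breakdown(records):
    """Hypothetical aggregation: records is an iterable of
    (operation_name, hallucinated_bool) pairs; returns the
    hallucination rate per operation."""
    totals, bad = Counter(), Counter()
    for op, hallucinated in records:
        totals[op] += 1
        bad[op] += int(hallucinated)
    return {op: bad[op] / totals[op] for op in totals}

# Example with made-up judgments for three memory operations
print(breakdown([
    ("extraction", False), ("extraction", True),
    ("updating", False),
    ("retrieval", True), ("retrieval", True),
]))
```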
Use this if you are developing or studying AI agent memory systems and need to rigorously assess their ability to store and use information truthfully across various operational steps.
Not ideal if you are evaluating the overall end-to-end performance of a large language model and are not specifically focused on the internal mechanisms and hallucination tendencies of its memory component.
Stars: 113
Forks: 13
Language: Python
License: —
Category: —
Last pushed: Jan 08, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MemTensor/HaluMem"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
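The curl call above can also be made from Python. A minimal sketch, assuming the endpoint returns JSON and that an API key (if used) is passed as a bearer token, which is an assumption not stated on this page:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str, api_key=None) -> dict:
    """Fetch the quality record and return the parsed JSON.
    The response schema is not documented here, so we return it raw;
    the Authorization header name is an assumption."""
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed header
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

print(quality_url("MemTensor", "HaluMem"))
```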
Higher-rated alternatives
vectara/hallucination-leaderboard
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
PKU-YuanGroup/Hallucination-Attack
An attack that induces hallucinations in LLMs
amir-hameed-mir/Sirraya_LSD_Code
Layer-wise Semantic Dynamics (LSD) is a model-agnostic framework for hallucination detection in...
NishilBalar/Awesome-LVLM-Hallucination
up-to-date curated list of state-of-the-art Large vision language models hallucinations...
intuit/sac3
Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via...