Hallucination Detection RAG Tools

Tools and systems specifically designed to detect, mitigate, verify, and prevent hallucinations in RAG pipelines through claim extraction, evidence retrieval, and factuality validation. Does NOT include general RAG quality monitoring, broader fact-checking systems outside RAG context, or hallucination research in non-RAG LLM applications.

This category tracks 42 hallucination detection RAG tools. Three score above 50 (Established tier). The highest-rated is onestardao/WFGY at 67/100 with 1,620 stars. One of the top 10 is actively maintained.

Get all 42 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=hallucination-detection-rag&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
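The same request can be made from Python. The sketch below uses only the standard library and reuses the endpoint and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the response schema is not documented in this listing, so the payload is returned as-is without assuming any field names. The default `limit` of 20 mirrors the curl example; whether the API accepts larger values is not stated here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="rag", subcategory="hallucination-detection-rag", limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the decoded JSON payload unchanged.

    The response structure is an unknown here, so no keys are assumed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request. Unauthenticated use counts against the 100 requests/day allowance, so caching the result locally is a reasonable precaution.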

#	Tool	Score	Tier	Stars	Language
1	onestardao/WFGY WFGY: open-source reasoning and debugging infrastructure for RAG and AI...	67	Established	1,620	Jupyter Notebook
2	KRLabsOrg/verbatim-rag Hallucination-prevention RAG system with verbatim span extraction. Ensures...	60	Established	170	Python
3	iMoonLab/Hyper-RAG "Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven...	55	Established	251	Python
4	frmoretto/clarity-gate Stop LLMs from hallucinating your guesses as facts. Clarity Gate is a...	45	Emerging	23	Python
5	project-miracl/nomiracl NoMIRACL: A multilingual hallucination evaluation dataset to evaluate LLM...	45	Emerging	26	Python
6	chensyCN/LogicRAG Source code of LogicRAG at AAAI'26.	43	Emerging	180	Python
7	Betswish/MIRAGE Easy-to-use MIRAGE code for faithful answer attribution in RAG applications....	33	Emerging	26	Python
8	anlp-team/LTI_Neural_Navigator "Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case...	33	Emerging	45	HTML
9	anulum/director-ai Real-time LLM hallucination guardrail — NLI + RAG fact-checking with...	32	Emerging	2	Python
10	rungalileo/hallucination-index Initiative to evaluate and rank the most popular LLMs across common task...	31	Emerging	116	—
11	lechmazur/confabulations Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes...	29	Experimental	243	HTML
12	amitgambhir/rag-auditor Open source RAG evaluation platform — automatically score faithfulness,...	25	Experimental	1	Python
13	tarekmasryo/rag-qa-logs-and-corpus Multi-table RAG QA telemetry + decision-grade RAG Ops notebook for retrieval...	25	Experimental	2	Jupyter Notebook
14	rafay123321/embedding-hallucinations This repo shows how foundational model hallucinates and how we can fix such...	23	Experimental	—	Python
15	PolarisLiu1/LAT Look As You Think: Unifying Reasoning and Visual Evidence Attribution for...	23	Experimental	8	Python
16	renataennes/rag-hallucination-detector RAG pipeline with bilingual EN/PT hallucination detection	22	Experimental	—	Jupyter Notebook
17	scasella/adaptive_rag_rlm A verifiers RLM environment for testing whether adaptive recursive search...	22	Experimental	—	Python
18	TECHKNOWMAD-LABS/ground-truth Hallucination detection for RAG pipelines.	22	Experimental	—	Python
19	aryan-bhadana/rag-debugger A production-style RAG debugger with hybrid retrieval, failure detection,...	22	Experimental	—	Python
20	MukundaKatta/RAGGuard RAG hallucination detection — verify LLM responses are grounded in source...	22	Experimental	—	Python
21	onurcandonmezer/rag-quality-monitor RAG quality monitoring and assurance platform	21	Experimental	—	Python
22	metawake/raglint pytest-native quality checks for RAG systems. Catches hallucinated entities,...	20	Experimental	1	Python
23	emory-irlab/conqret-rag Controversial Questions for Argumentation and Retrieval	19	Experimental	4	Python
24	Kanisha-Shah/Hallucination-Mitigation-Using-RAG A Columbia University capstone project focused on mitigating hallucinations...	19	Experimental	3	—
25	hemanthballa07/HALO-RAG Self-Verification Chains for Hallucination-Free Retrieval-Augmented...	17	Experimental	—	Python
26	kareem2002-k/clara-vs-rag-comparison 🔬 Compare CLaRa (latent compression) vs RAG (prompt stuffing) for document...	17	Experimental	—	Python
27	GreyCatVP/raft-canon Architectural canon for production-grade RAFT / RAG systems: evaluation,...	17	Experimental	—	—
28	nickhuang99/Intent-Aware-RAG Why Pure Vector Search is a "False Proposition" for RAG?	16	Experimental	3	—
29	usal-research/rag_ctxdq Implementation prototype for and executable context-aware data quality assessment	15	Experimental	2	Python
30	bdeva1975/hallucinationbench Detect hallucinations in your RAG pipeline output — in two lines of Python.	15	Experimental	1	Python
31	Padraigobrien08/model-failure-lab Toolkit for discovering, classifying, and debugging failure modes in LLM and...	15	Experimental	1	Python
32	samuel-isr/VeritasRAG A hallucination-resistant Retrieval-Augmented Generation (RAG) system.	14	Experimental	—	Python
33	yuvaraj949/Dynamic-Uncertainty-Aware-Attribution-RAG Token-level hallucination detection for RAG systems using Contextual...	14	Experimental	—	Python
34	alp-oz/cautious-rag A RAG system that knows when not to answer using concentration inequalities	14	Experimental	—	Python
35	Sakshi3027/rag-handbook-qa A production-ready RAG system with citations and hallucination prevention	14	Experimental	1	Python
36	qualigenai/rag-learning Production-ready RAG system with evaluation framework — zero hallucination,...	13	Experimental	—	Python
37	apatni24/VisionQA Context-aware tool for automated BDD test generation and execution using...	13	Experimental	—	Python
38	khaledahmed-Tech/rag-patterns-in-production RAG reliability patterns: failure modes, observability, and quality loops.	13	Experimental	—	—
39	Arnav-Ajay/rag-systems-foundations A systems-level analysis of static RAG pipelines, isolating ingestion,...	13	Experimental	—	—
40	Arnav-Ajay/rag-failure-modes Failure-first analysis of retrieval-augmented and agentic systems, focused...	13	Experimental	—	Python
41	F4biian/HalluRAG Source code of "The HalluRAG Dataset: Detecting Closed-Domain Hallucinations...	13	Experimental	9	Python
42	Tomsawyerhu/LRP4RAG RAG Hallucination Detecting By LRP.	13	Experimental	11	Jupyter Notebook

Comparisons in this category

Hyper-RAG and LTI_Neural_Navigator (55 vs 33)
Hyper-RAG and RAGGuard (55 vs 22)
verbatim-rag and RAGGuard (60 vs 22)