project-miracl/nomiracl
NoMIRACL: a multilingual hallucination evaluation dataset for testing LLM robustness in RAG against first-stage retrieval errors across 18 languages.
This project provides a specialized dataset and starter code for evaluating how well large language models (LLMs) handle situations where the retrieved information is irrelevant to the user's question, especially across multiple languages. Given a query and a set of potentially relevant passages, the evaluation checks whether the LLM correctly abstains from answering when no relevant passage is found. It is aimed at LLM developers and researchers who build and test multilingual RAG (Retrieval-Augmented Generation) systems.
No commits in the last 6 months. Available on PyPI.
Use this if you are building or evaluating an LLM application that needs to reliably determine if information is relevant to a user's query before generating a response, particularly in a multilingual context.
Not ideal if you are looking for a general-purpose LLM evaluation framework or want to test LLM capabilities beyond relevance assessment in a RAG setting.
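The abstention check described above can be sketched in a few lines. This is an illustrative sketch only, not NoMIRACL's actual API: the marker list and function names are assumptions, and real evaluation typically uses a stricter prompt-and-parse protocol.

```python
# Illustrative sketch of the evaluation idea (NOT NoMIRACL's actual API):
# for a query whose retrieved passages are all irrelevant, the model
# "passes" only if it abstains instead of fabricating an answer.

# Hypothetical abstention phrases; a real harness would constrain the
# model's output format rather than match free text.
ABSTAIN_MARKERS = ("i don't know", "no relevant passage", "cannot answer")

def is_abstention(answer: str) -> bool:
    """Heuristically detect whether the model declined to answer."""
    lowered = answer.lower()
    return any(marker in lowered for marker in ABSTAIN_MARKERS)

def score_non_relevant_case(answer: str) -> str:
    """When no relevant passage exists, abstaining is correct;
    producing a substantive answer counts as a hallucination."""
    return "correct-abstain" if is_abstention(answer) else "hallucination"

print(score_non_relevant_case("I don't know; none of the passages answer this."))
print(score_non_relevant_case("The capital is Paris."))
```

The mirror case (relevant passage present, model still abstains) is scored as an error in the same way, which is why the dataset ships both relevant and non-relevant subsets.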
Stars
26
Forks
4
Language
Python
License
Apache-2.0
Category
Last pushed
Nov 29, 2024
Commits (30d)
0
Dependencies
6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/project-miracl/nomiracl"
Open to everyone: 100 requests/day, no key needed. Get a free API key for 1,000 requests/day.
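The same endpoint can be called from Python instead of curl. A minimal sketch, assuming the endpoint returns a JSON object (the response fields are not documented here, and the helper names are illustrative):

```python
# Minimal sketch of calling the quality API shown above from Python.
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def project_url(owner_repo: str) -> str:
    """Build the per-project endpoint URL, e.g. for 'project-miracl/nomiracl'."""
    return f"{BASE}/{owner_repo}"

def fetch_project(owner_repo: str) -> dict:
    """Fetch a project's quality data.

    Assumes the endpoint returns JSON; within the free tier this is
    limited to 100 requests/day without a key.
    """
    with urlopen(project_url(owner_repo)) as resp:
        return json.load(resp)

print(project_url("project-miracl/nomiracl"))
```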
Higher-rated alternatives
onestardao/WFGY
WFGY: open-source reasoning and debugging infrastructure for RAG and AI agents. Includes the...
KRLabsOrg/verbatim-rag
Hallucination-prevention RAG system with verbatim span extraction. Ensures all generated content...
iMoonLab/Hyper-RAG
"Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation"...
frmoretto/clarity-gate
Stop LLMs from hallucinating your guesses as facts. Clarity Gate is a verification protocol for...
chensyCN/LogicRAG
Source code of LogicRAG at AAAI'26.