project-miracl/nomiracl

NoMIRACL: A multilingual hallucination evaluation dataset for testing LLM robustness in RAG against first-stage retrieval errors across 18 languages.

Score: 45 / 100 (Emerging)

This project provides a specialized dataset and starter code for evaluating how well large language models (LLMs) handle retrieved passages that are irrelevant to the user's question, across multiple languages. Given a query and a set of potentially relevant passages, the evaluation checks whether the LLM correctly abstains from answering when no relevant passage is present (a sketch of this pattern follows below). It is aimed at LLM developers and researchers who build and test multilingual RAG (Retrieval-Augmented Generation) systems.
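To make the task concrete, here is a minimal Python sketch of that abstention check. The prompt wording, the call_llm stub, and the sample data are illustrative assumptions, not the project's actual template or API.

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; replace with your model client.
    return "I don't know."

def build_prompt(query: str, passages: list[str]) -> str:
    # Number the retrieved passages and instruct the model to abstain
    # when none of them answers the question.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "If no passage contains the answer, reply exactly: I don't know.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# A query paired with deliberately irrelevant passages: the correct
# behaviour on this kind of sample is to abstain, not to answer.
query = "What year was the Eiffel Tower completed?"
passages = [
    "The Louvre is a museum in Paris.",
    "Mont Blanc is the highest peak in the Alps.",
]

response = call_llm(build_prompt(query, passages))
abstained = "i don't know" in response.lower()
print("abstained:", abstained)  # anything else would count as a hallucination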

No commits in the last 6 months. Available on PyPI.
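Assuming the PyPI package shares the repository name (not verified here), installation would be:

pip install nomiracl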

Use this if you are building or evaluating an LLM application that needs to reliably determine if information is relevant to a user's query before generating a response, particularly in a multilingual context.

Not ideal if you are looking for a general-purpose LLM evaluation framework or want to test LLM capabilities beyond relevance assessment in a RAG setting.

Tags: LLM evaluation, RAG systems, Multilingual AI, Information retrieval, Natural Language Processing
Status: Stale (6 months)
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 25 / 25
Community: 13 / 25


Stars: 26
Forks: 4
Language: Python
License: Apache-2.0
Last pushed: Nov 29, 2024
Commits (30d): 0
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/project-miracl/nomiracl"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
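The same endpoint can be queried from Python; here is a small sketch using the requests library (the response schema is not documented here, so this just prints the raw JSON):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/rag/project-miracl/nomiracl"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on rate limiting or server errors
print(resp.json())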