project-miracl/nomiracl

NoMIRACL: A multilingual hallucination evaluation dataset for testing LLM robustness in RAG against first-stage retrieval errors across 18 languages.

Score: 45 / 100 (Emerging)

This project provides a specialized dataset and starter code for evaluating how well large language models (LLMs) handle retrieved passages that are irrelevant to the user's question, across multiple languages. Given a query and a set of potentially relevant passages, the evaluation checks whether the LLM correctly abstains from answering when no relevant passage is present (a sketch of this pattern follows below). It is aimed at LLM developers and researchers who build and test multilingual RAG (Retrieval-Augmented Generation) systems.
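To make the task concrete, here is a minimal Python sketch of that abstention check. The prompt wording, the call_llm stub, and the sample data are illustrative assumptions, not the project's actual template or API.

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; replace with your model client.
    return "I don't know."

def build_prompt(query: str, passages: list[str]) -> str:
    # Number the retrieved passages and instruct the model to abstain
    # when none of them answers the question.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "If no passage contains the answer, reply exactly: I don't know.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# A query paired with deliberately irrelevant passages: the correct
# behaviour on this kind of sample is to abstain, not to answer.
query = "What year was the Eiffel Tower completed?"
passages = [
    "The Louvre is a museum in Paris.",
    "Mont Blanc is the highest peak in the Alps.",
]

response = call_llm(build_prompt(query, passages))
abstained = "i don't know" in response.lower()
print("abstained:", abstained)  # anything else would count as a hallucination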

No commits in the last 6 months. Available on PyPI.
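Assuming the PyPI package shares the repository name (not verified here), installation would be:

pip install nomiracl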

Use this if you are building or evaluating an LLM application that needs to reliably determine if information is relevant to a user's query before generating a response, particularly in a multilingual context.

Not ideal if you are looking for a general-purpose LLM evaluation framework or want to test LLM capabilities beyond relevance assessment in a RAG setting.

Tags: LLM evaluation, RAG systems, Multilingual AI, Information retrieval, Natural Language Processing
Status: Stale (6 months)
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 25 / 25
Community: 13 / 25


Stars: 26
Forks: 4
Language: Python
License: Apache-2.0
Last pushed: Nov 29, 2024
Commits (30d): 0
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/project-miracl/nomiracl"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
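The same endpoint can be queried from Python; here is a small sketch using the requests library (the response schema is not documented here, so this just prints the raw JSON):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/rag/project-miracl/nomiracl"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on rate limiting or server errors
print(resp.json())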