GiovanniTRA/UDCG
Code and data for the paper: "Redefining Retrieval Evaluation in the Era of LLMs"
This tool helps AI researchers and developers evaluate how effectively a set of retrieved passages helps a large language model (LLM) answer a question. You provide a list of questions, candidate answer passages for each, and relevance labels for the passages. The tool then computes a 'Utility and Distraction-aware Cumulative Gain' (UDCG) score, which reflects the overall quality of the passages for that specific LLM.
Use this if you need to quantitatively measure the quality of retrieved information for your LLM-powered question-answering systems.
Not ideal if you are looking for a tool to generate relevance labels or passages, or if your evaluation doesn't involve language models.
Stars: 12
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Oct 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/GiovanniTRA/UDCG"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
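The same endpoint can also be called programmatically. A minimal Python sketch using only the standard library, assuming the path scheme `/api/v1/quality/<category>/<owner>/<repo>` implied by the example URL and a JSON response body (neither is documented here):

```python
import json
import urllib.request

# Base URL taken from the curl example above; the path layout
# ("/api/v1/quality/<category>/<owner>/<repo>") is an assumption
# inferred from that single example, not from API documentation.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the (assumed) API URL for a repository's quality data."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the quality record; requires network access
    and assumes the endpoint returns JSON."""
    with urllib.request.urlopen(build_quality_url(category, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Reproduces the URL from the curl example above.
    print(build_quality_url("nlp", "GiovanniTRA", "UDCG"))
```

Within the free tier (100 requests/day without a key), a simple loop over several repositories is feasible; heavier use would need the free API key mentioned above.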
Higher-rated alternatives
gunthercox/chatterbot-corpus
A multilingual dialog corpus
EdinburghNLP/awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
jfainberg/self_dialogue_corpus
The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports
jkkummerfeld/irc-disentanglement
Dataset and model for disentangling chat on IRC
Tomiinek/MultiWOZ_Evaluation
Unified MultiWOZ evaluation scripts for the context-to-response task.