MozerWang/Loong
[EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
This project provides a rigorous way to test and compare how well large language models (LLMs) can understand and answer questions based on many lengthy documents, like financial reports, legal cases, or academic papers. It takes in multiple documents, some very long, and evaluates an LLM's ability to locate specific information, compare details, group related facts, or follow complex chains of reasoning across them. This is for researchers and practitioners who use or develop LLMs and need to assess their performance on real-world, complex document analysis tasks.
Use this if you need to evaluate the ability of LLMs to process and extract information from very long and multiple documents across various scenarios and question types.
Not ideal if you want to train LLMs or apply them directly to your own data; the project focuses solely on benchmarking long-context understanding.
Stars: 149
Forks: 11
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/MozerWang/Loong"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
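The same endpoint can be queried programmatically. A minimal Python sketch is below; only the URL pattern comes from the curl example above, so the JSON response schema and its field names are assumptions, not documented behavior.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"


def repo_quality_url(owner: str, name: str) -> str:
    """Build the per-repository API URL (owner/name path segments)."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_repo_quality(owner: str, name: str) -> dict:
    """Fetch quality data for a repository.

    The response is assumed to be JSON; no schema is documented here,
    so inspect the returned dict before relying on specific fields.
    """
    with urllib.request.urlopen(repo_quality_url(owner, name)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_repo_quality("MozerWang", "Loong"))
```

If you fetch more than the free 100 requests/day, pass the API key however the service expects (not documented on this page), and consider caching responses locally.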
Higher-rated alternatives
denser-org/denser-retriever
An enterprise-grade AI retriever designed to streamline AI integration into your applications,...
rayliuca/T-Ragx
Enhancing Translation with RAG-Powered Large Language Models
neuml/rag
🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with...
NovaSearch-Team/RAG-Retrieval
Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.
RulinShao/retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".