MozerWang/Loong

[EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Score: 43 / 100 (Emerging)

This project provides a rigorous way to test and compare how well large language models (LLMs) can understand and answer questions based on many lengthy documents, like financial reports, legal cases, or academic papers. It takes in multiple documents, some very long, and evaluates an LLM's ability to locate specific information, compare details, group related facts, or follow complex chains of reasoning across them. This is for researchers and practitioners who use or develop LLMs and need to assess their performance on real-world, complex document analysis tasks.

Use this if you need to evaluate how well LLMs process and extract information from multiple long documents, across varied scenarios and question types.

Not ideal if you are looking for a tool to train LLMs or to apply them directly to your own data; the project focuses on benchmarking long-context understanding, not deployment.

Tags: LLM evaluation, document analysis, financial reporting, legal research, academic review
No package published · No dependents
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 11 / 25

Stars: 149
Forks: 11
Language: Python
License: Apache-2.0
Last pushed: Dec 22, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/MozerWang/Loong"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
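
For programmatic access, here is a minimal Python sketch using only the standard library. It assumes the endpoint returns JSON; the response schema is not documented here, so the script simply prints whatever payload comes back.

import json
import urllib.request

# Public quality endpoint for MozerWang/Loong (no key needed, up to 100 requests/day).
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/MozerWang/Loong"

# Fetch the report and decode the JSON body.
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The schema is an assumption, so pretty-print the raw payload as-is.
print(json.dumps(data, indent=2))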