MozerWang/Loong
[EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
This project provides a rigorous way to test and compare how well large language models (LLMs) can understand and answer questions based on many lengthy documents, like financial reports, legal cases, or academic papers. It takes in multiple documents, some very long, and evaluates an LLM's ability to locate specific information, compare details, group related facts, or follow complex chains of reasoning across them. This is for researchers and practitioners who use or develop LLMs and need to assess their performance on real-world, complex document analysis tasks.
Use this if you need to evaluate the ability of LLMs to process and extract information from very long and multiple documents across various scenarios and question types.
Not ideal if you want to train LLMs or apply them directly to your own data; the project focuses solely on benchmarking long-context understanding.
Stars: 149
Forks: 11
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/MozerWang/Loong"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
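The same endpoint can be queried programmatically. A minimal Python sketch is below; only the URL pattern comes from the curl example above, so the JSON response schema and its field names are assumptions, not documented behavior.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"


def repo_quality_url(owner: str, name: str) -> str:
    """Build the per-repository API URL (owner/name path segments)."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_repo_quality(owner: str, name: str) -> dict:
    """Fetch quality data for a repository.

    The response is assumed to be JSON; no schema is documented here,
    so inspect the returned dict before relying on specific fields.
    """
    with urllib.request.urlopen(repo_quality_url(owner, name)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_repo_quality("MozerWang", "Loong"))
```

If you fetch more than the free 100 requests/day, pass the API key however the service expects (not documented on this page), and consider caching responses locally.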
Higher-rated alternatives
denser-org/denser-retriever
An enterprise-grade AI retriever designed to streamline AI integration into your applications,...
rayliuca/T-Ragx
Enhancing Translation with RAG-Powered Large Language Models
neuml/rag
🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with...
NovaSearch-Team/RAG-Retrieval
Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.
RulinShao/retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".