megagonlabs/holobench

🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.; ICLR 2025)

27 / 100
Experimental

This project evaluates how well long-context language models can answer database-style questions over very large text collections. You provide a query in a SQL-like format together with a large set of text documents, and the benchmark assesses the model's ability to extract and aggregate information across those documents into an accurate answer. It is aimed at data scientists and AI researchers who build or test large language models for information retrieval and complex reasoning tasks.

No commits in the last 6 months.

Use this if you need to rigorously benchmark the "holistic reasoning" capabilities of long-context large language models against database-style operations on extensive textual data.

Not ideal if you are looking for a tool to perform standard database queries or to simply fine-tune a language model for basic text generation tasks.

AI evaluation · natural language processing · information retrieval · large language models · textual data analysis
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 6 / 25
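
The four subscores above each run 0-25, and their sum matches the 27/100 overall score shown for this repo. A minimal sketch of that aggregation, assuming the overall score is simply the sum of the subscores (an inference from the numbers on this card, not a documented formula):

```python
# Subscores as shown on this card; each is out of 25.
subscores = {
    "Maintenance": 0,
    "Adoption": 5,
    "Maturity": 16,
    "Community": 6,
}

# Assumed aggregation: overall (out of 100) = sum of the four subscores.
overall = sum(subscores.values())
print(overall)  # 27, matching the 27/100 shown above
```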


Stars: 12
Forks: 1
Language: Python
License: BSD-3-Clause
Last pushed: Feb 25, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/megagonlabs/holobench"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.