megagonlabs/holobench

🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.; ICLR 2025)

27 / 100
Experimental

This project evaluates how well long-context language models can answer database-style questions over very large text collections. You provide a query in a SQL-like format together with a large set of text documents, and the benchmark assesses the model's ability to extract and aggregate information across those documents into an accurate answer. It is aimed at data scientists and AI researchers who build or test large language models for information retrieval and complex reasoning tasks.

No commits in the last 6 months.

Use this if you need to rigorously benchmark the "holistic reasoning" capabilities of long-context large language models against database-style operations on extensive textual data.

Not ideal if you are looking for a tool to perform standard database queries or to simply fine-tune a language model for basic text generation tasks.

AI evaluation · natural language processing · information retrieval · large language models · textual data analysis
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 6 / 25
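
The four subscores above each run 0-25, and their sum matches the 27/100 overall score shown for this repo. A minimal sketch of that aggregation, assuming the overall score is simply the sum of the subscores (an inference from the numbers on this card, not a documented formula):

```python
# Subscores as shown on this card; each is out of 25.
subscores = {
    "Maintenance": 0,
    "Adoption": 5,
    "Maturity": 16,
    "Community": 6,
}

# Assumed aggregation: overall (out of 100) = sum of the four subscores.
overall = sum(subscores.values())
print(overall)  # 27, matching the 27/100 shown above
```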


Stars: 12
Forks: 1
Language: Python
License: BSD-3-Clause
Last pushed: Feb 25, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/megagonlabs/holobench"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.