megagonlabs/holobench
🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.; ICLR 2025)
HoloBench evaluates how well long-context language models can answer complex, database-style questions over very large collections of text. You supply a SQL-like query and a large set of text documents, and the benchmark measures the model's ability to extract and aggregate information across those documents into an accurate answer. It is aimed at data scientists and AI researchers who build or test large language models for information retrieval and complex reasoning tasks.
No commits in the last 6 months.
Use this if you need to rigorously benchmark the "holistic reasoning" capabilities of long-context large language models against database-style operations on extensive textual data.
Not ideal if you are looking for a tool to perform standard database queries or to simply fine-tune a language model for basic text generation tasks.
Stars: 12
Forks: 1
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Feb 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/megagonlabs/holobench"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
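The same endpoint can be called from Python with only the standard library. This is a minimal sketch based on the curl example above: the URL pattern is taken from that example, but the `quality_url` and `fetch_quality` helper names and the assumption that the API returns JSON are illustrative, not part of the documented API.

```python
import json
from urllib.request import urlopen

# Base URL taken from the curl example; the path layout appears to be
# /api/v1/quality/<category>/<owner>/<repo>.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a single repository (hypothetical helper)."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the payload; assumes a JSON response (requires network)."""
    with urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reconstructs the exact URL from the curl example above.
    print(quality_url("nlp", "megagonlabs", "holobench"))
```

Without a key this counts against the shared 100 requests/day quota, so cache responses locally if you poll more than a handful of repositories.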
Higher-rated alternatives
- google/langfun: OO for LLMs
- tanaos/artifex: Small language model inference, fine-tuning, and observability. No GPU, no labeled data needed.
- preligens-lab/textnoisr: Add random noise to a text dataset while precisely controlling the quality of the result.
- vulnerability-lookup/VulnTrain: A tool to generate datasets and models from the vulnerability descriptions in @Vulnerability-Lookup.
- masakhane-io/masakhane-mt: Machine translation for Africa