OpenBMB/InfiniteBench

Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

Score: 40 / 100 (Emerging)

InfiniteBench provides a specialized dataset and framework for testing how well large language models handle extremely long inputs of over 100,000 tokens. It takes lengthy documents such as books, code, or dialogues as input and evaluates a model's ability to summarize them, answer questions, debug code, or perform calculations over them. It is aimed primarily at AI researchers and developers who need to understand the long-context limitations of advanced language models.

378 stars. No commits in the last 6 months.

Use this if you are developing or evaluating a large language model and need to thoroughly test its ability to process and reason over very long documents, beyond what traditional benchmarks offer.

Not ideal if you are looking for a benchmark to evaluate standard language model tasks with typical context lengths, or if your primary interest is in fine-tuning existing models for shorter-context applications.

large-language-models natural-language-processing model-evaluation long-context-understanding AI-research
Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 378
Forks: 32
Language: Python
License: MIT
Last pushed: Sep 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OpenBMB/InfiniteBench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
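The curl command above can also be scripted. The sketch below builds the endpoint URL and summarizes a decoded JSON response; the response field names (`repo`, `score`, `tier`) are assumptions inferred from this page, not a documented schema, so the payload here is mocked rather than fetched.

```python
import json

# Assumed base path, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub repository."""
    return f"{BASE}/{owner}/{repo}"

def summarize(payload: dict) -> str:
    """Render a one-line summary from a decoded JSON payload.

    Field names are hypothetical; adjust to the actual API response.
    """
    return f"{payload['repo']}: {payload['score']}/100 ({payload['tier']})"

# Mocked payload standing in for the real API response:
example = json.loads(
    '{"repo": "OpenBMB/InfiniteBench", "score": 40, "tier": "Emerging"}'
)
print(quality_url("OpenBMB", "InfiniteBench"))
print(summarize(example))
```

In a real client you would fetch `quality_url(...)` with any HTTP library and pass the decoded JSON to `summarize`, keeping the free-tier limit of 100 requests/day in mind.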