open-compass/Ada-LEval

The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"

Overall score: 25 / 100 (Experimental)

This tool helps AI researchers and developers systematically evaluate how well large language models (LLMs) handle very long texts. You provide a custom LLM or select a known model, and the tool reports accuracy across a range of text lengths on tasks such as reordering shuffled text segments or selecting the best answer from a long document. It is aimed at professionals building or fine-tuning LLMs who need to measure their model's long-context comprehension.

No commits in the last 6 months.

Use this if you are developing or deploying large language models and need a rigorous, length-adaptable benchmark to measure their ability to process and understand extensive textual inputs.

Not ideal if you are looking for an LLM for general use or a benchmark for short-context tasks, as this focuses specifically on challenging long-context comprehension.

Tags: LLM evaluation, natural language processing, AI model testing, long-context understanding, text comprehension
No License · Stale (6 months) · No Package · No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 7 / 25


Stars: 56
Forks: 3
Language: Python
License: none
Last pushed: May 22, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/open-compass/Ada-LEval"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
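The same endpoint can also be called from Python rather than curl. The sketch below assumes the endpoint returns JSON; the response schema is not documented here, so the `fetch_quality` helper is illustrative rather than a confirmed client for this API:

```python
# Minimal sketch of calling the quality API from Python instead of curl.
# The base URL comes from the curl example above; everything else
# (function names, the assumption of a JSON response) is illustrative.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"


def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """Fetch and decode a quality report (no API key: 100 requests/day)."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL used in the curl example; uncomment the fetch to hit
    # the live endpoint (counts against the daily request quota).
    print(quality_url("transformers", "open-compass", "Ada-LEval"))
    # report = fetch_quality("transformers", "open-compass", "Ada-LEval")
```

The fetch is left commented out in the `__main__` block so that simply running the file does not consume a request from the daily quota.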