LeonEricsson/llmcontext

💢 Pressure testing the context window of open LLMs

27 / 100 · Experimental

This project helps developers and researchers measure how well open-source large language models (LLMs) can find a specific piece of information hidden inside a very long text. You supply an LLM, a long text with a 'needle' fact embedded in it, and a question. The output is a score indicating how accurately the LLM retrieved the 'needle', plus visualizations of performance across different context lengths and needle positions. It is aimed at anyone choosing or working with open-source LLMs for tasks that require long-context understanding.
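In outline, that procedure looks like the Python sketch below. It assumes a generic generate(prompt) -> str callable for the model under test; the helper names, the character-based context lengths, and the substring-match scoring are illustrative simplifications, not this repo's actual code.

# Minimal sketch of a needle-in-a-haystack sweep. All names are
# illustrative assumptions, not this repository's API.
from typing import Callable

NEEDLE = "The best thing to do in San Francisco is eat a sandwich."
QUESTION = "What is the best thing to do in San Francisco?"

def build_haystack(filler: str, needle: str, context_len: int, depth: float) -> str:
    """Truncate `filler` to ~`context_len` characters and insert `needle`
    at a relative `depth` (0.0 = start of the text, 1.0 = end)."""
    haystack = filler[:context_len]
    cut = int(len(haystack) * depth)
    return haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]

def score_answer(answer: str, expected: str = "sandwich") -> int:
    """Crude pass/fail scoring: 1 if the expected fact appears verbatim."""
    return int(expected.lower() in answer.lower())

def run_grid(generate: Callable[[str], str], filler: str) -> dict[tuple[int, float], int]:
    """Sweep context lengths and needle depths; return a score per cell."""
    results = {}
    for context_len in (2_000, 8_000, 32_000):     # lengths in characters, for simplicity
        for depth in (0.0, 0.25, 0.5, 0.75, 1.0):  # where the needle is hidden
            prompt = build_haystack(filler, NEEDLE, context_len, depth)
            prompt += f"\n\nQuestion: {QUESTION}\nAnswer:"
            results[(context_len, depth)] = score_answer(generate(prompt))
    return results

Real harnesses typically measure context length in tokens and grade answers with a judge model rather than a substring match, but the length-by-depth grid is the same structure the performance visualizations are rendered from.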

No commits in the last 6 months.

Use this if you need to evaluate the long-context retrieval capabilities of open-source LLMs before deploying them for information extraction or question-answering on lengthy documents.

Not ideal if you are looking for a general-purpose LLM evaluation tool or if your primary concern is text generation quality rather than precise information retrieval from long contexts.

Tags: LLM evaluation · open-source AI · natural language processing · context window testing · model performance
Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 4 / 25


Stars: 25
Forks: 1
Language: Jupyter Notebook
License: MIT
Last pushed: Aug 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/LeonEricsson/llmcontext"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
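For scripted access, the same endpoint can be read with the Python standard library. The response schema is not documented here, so this sketch only pretty-prints whatever JSON comes back rather than assuming any field names.

# Fetch the same quality data the curl command above returns.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/LeonEricsson/llmcontext"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Inspect the actual payload shape before parsing specific fields.
print(json.dumps(data, indent=2))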