LeonEricsson/llmcontext

💢 Pressure testing the context window of open LLMs

27 / 100 · Experimental

This project helps developers and researchers measure how well open-source large language models (LLMs) can find a specific piece of information hidden inside a very long text. You supply an LLM, a long text with a 'needle' fact embedded in it, and a question. The output is a score indicating how accurately the LLM retrieved the 'needle', plus visualizations of performance across different context lengths and needle positions. It is aimed at anyone choosing or working with open-source LLMs for tasks that require long-context understanding.
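In outline, that procedure looks like the Python sketch below. It assumes a generic generate(prompt) -> str callable for the model under test; the helper names, the character-based context lengths, and the substring-match scoring are illustrative simplifications, not this repo's actual code.

# Minimal sketch of a needle-in-a-haystack sweep. All names are
# illustrative assumptions, not this repository's API.
from typing import Callable

NEEDLE = "The best thing to do in San Francisco is eat a sandwich."
QUESTION = "What is the best thing to do in San Francisco?"

def build_haystack(filler: str, needle: str, context_len: int, depth: float) -> str:
    """Truncate `filler` to ~`context_len` characters and insert `needle`
    at a relative `depth` (0.0 = start of the text, 1.0 = end)."""
    haystack = filler[:context_len]
    cut = int(len(haystack) * depth)
    return haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]

def score_answer(answer: str, expected: str = "sandwich") -> int:
    """Crude pass/fail scoring: 1 if the expected fact appears verbatim."""
    return int(expected.lower() in answer.lower())

def run_grid(generate: Callable[[str], str], filler: str) -> dict[tuple[int, float], int]:
    """Sweep context lengths and needle depths; return a score per cell."""
    results = {}
    for context_len in (2_000, 8_000, 32_000):     # lengths in characters, for simplicity
        for depth in (0.0, 0.25, 0.5, 0.75, 1.0):  # where the needle is hidden
            prompt = build_haystack(filler, NEEDLE, context_len, depth)
            prompt += f"\n\nQuestion: {QUESTION}\nAnswer:"
            results[(context_len, depth)] = score_answer(generate(prompt))
    return results

Real harnesses typically measure context length in tokens and grade answers with a judge model rather than a substring match, but the length-by-depth grid is the same structure the performance visualizations are rendered from.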

No commits in the last 6 months.

Use this if you need to evaluate the long-context retrieval capabilities of open-source LLMs before deploying them for information extraction or question-answering on lengthy documents.

Not ideal if you are looking for a general-purpose LLM evaluation tool or if your primary concern is text generation quality rather than precise information retrieval from long contexts.

Tags: LLM evaluation · open-source AI · natural language processing · context window testing · model performance
Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 4 / 25


Stars: 25
Forks: 1
Language: Jupyter Notebook
License: MIT
Last pushed: Aug 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/LeonEricsson/llmcontext"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
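For scripted access, the same endpoint can be read with the Python standard library. The response schema is not documented here, so this sketch only pretty-prints whatever JSON comes back rather than assuming any field names.

# Fetch the same quality data the curl command above returns.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/LeonEricsson/llmcontext"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Inspect the actual payload shape before parsing specific fields.
print(json.dumps(data, indent=2))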