salesforce/summary-of-a-haystack

Codebase accompanying the Summary of a Haystack paper.

Score: 33 / 100 (Emerging)

This project helps researchers and developers evaluate how well large language models (LLMs) and Retrieval Augmented Generation (RAG) systems can summarize very long documents or conversations. You input large text documents (like news articles or conversation transcripts) and get back automatically generated summaries alongside evaluation scores. It's designed for AI researchers and machine learning engineers who need to benchmark and compare the performance of different summarization methods.

No commits in the last 6 months.

Use this if you are developing or comparing long-context LLMs and RAG systems and need a standardized way to measure their summarization capabilities on complex, lengthy texts.

Not ideal if you are looking for an out-of-the-box summarization tool for general use, as this project focuses on research and evaluation of underlying models.

Tags: AI-research, LLM-benchmarking, NLP-evaluation, generative-AI, information-retrieval
Badges: Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 8 / 25


Stars: 80
Forks: 5
Language: Jupyter Notebook
License: Apache-2.0
Category: rag-qa-systems
Last pushed: Sep 20, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/salesforce/summary-of-a-haystack"

Open to everyone: 100 requests/day with no key needed. Get a free key to raise the limit to 1,000 requests/day.
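The same request can be made from Python with the standard library. This is a minimal sketch: only the endpoint URL above is taken from this page; the shape of the JSON response and any API-key header name are assumptions, so only URL construction is shown as certain.

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data URL for a given GitHub repository."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality data (response schema is undocumented here)."""
    # Works keyless within the 100 requests/day limit; how a key is
    # passed (header vs. query parameter) is not documented on this page.
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


print(quality_url("salesforce", "summary-of-a-haystack"))
```

`fetch_quality` is defined but not called here, so the sketch runs without touching the network.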