megagonlabs/llm-longeval
💵 Code for "Less is More for Long Document Summary Evaluation by LLMs" (Wu*, Iso*, et al.; EACL 2024)
This tool helps researchers, content analysts, and anyone working with large volumes of text efficiently evaluate the quality of AI-generated summaries of long documents. It takes a long source document and its AI-generated summary as input, then produces metrics such as relevance, factual consistency, and faithfulness to assess how well the summary captures the original's essence. This is particularly useful for those who need to gauge the reliability and accuracy of automated summarization without incurring high costs.
No commits in the last 6 months.
Use this if you need to reliably and cost-effectively evaluate the quality of AI-generated summaries for very long reports, articles, or scientific papers.
Not ideal if you are evaluating summaries of short documents, or if you primarily need to compare summarization models using traditional reference-based metrics like ROUGE or BERTScore rather than human-like judgment.
Stars
11
Forks
—
Language
Python
License
BSD-3-Clause
Category
Last pushed
Feb 22, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/megagonlabs/llm-longeval"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
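The same endpoint can be queried from Python. A minimal sketch using only the standard library; the URL structure is taken from the curl example above, while the helper names and the assumption that the endpoint returns JSON are illustrative, not documented here:

```python
import json
import urllib.request

# Base URL from the curl example above; the path parameters appear to be
# a category ("transformers") followed by the repository owner and name.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the API URL for a repository (structure mirrors the curl example)."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch the quality record; assumes the endpoint returns a JSON object."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("transformers", "megagonlabs", "llm-longeval")
```

At the free tier, a client making repeated calls should stay under 100 requests per day (or 1,000 with a key), so caching responses locally is advisable.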