salesforce/factualNLG

Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"


Emerging

This project helps evaluate how accurately large language models (LLMs) summarize content across domains such as news, sales calls, and scientific papers. It provides a benchmark dataset and tools to assess whether an LLM's summary is factually consistent with the original text or whether it introduces incorrect information. Content strategists, researchers, and anyone generating summaries with AI can use it to gauge the reliability of different LLMs.
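The listing does not show the repository's evaluation code. As a rough illustration of what checking factual consistency and comparing LLMs can involve, the sketch below assumes each (document, summary) pair carries a gold label (1 = consistent, 0 = inconsistent), that each model emits one binary judgment per pair, and that models are ranked by balanced accuracy. The function names `balanced_accuracy` and `rank_models`, the label convention, the choice of metric, and the toy data are all hypothetical, not taken from the repository.

```python
from typing import Dict, List, Tuple

# Hypothetical label convention: 1 = summary consistent with the document, 0 = inconsistent.


def balanced_accuracy(labels: List[int], preds: List[int]) -> float:
    """Mean of per-class recall over the inconsistent (0) and consistent (1) classes."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if not idx:
            continue  # skip a class with no gold examples
        correct = sum(1 for i in idx if preds[i] == cls)
        recalls.append(correct / len(idx))
    if not recalls:
        raise ValueError("no labelled examples")
    return sum(recalls) / len(recalls)


def rank_models(
    labels: List[int], predictions_by_model: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Score every model's binary judgments and sort from best to worst."""
    scores = {
        name: balanced_accuracy(labels, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    gold = [1, 1, 0, 0, 0, 1]  # toy gold labels for six summaries
    preds = {
        "model_a": [1, 1, 1, 1, 1, 1],  # always answers "consistent"
        "model_b": [1, 0, 0, 0, 1, 1],
    }
    for name, score in rank_models(gold, preds):
        print(f"{name}: {score:.3f}")
```

Balanced accuracy averages recall over the two classes, so a model that labels every summary as consistent scores 0.5 regardless of how many inconsistent summaries the benchmark contains. With the toy data above, `model_b` ranks first (about 0.667) and `model_a` second (0.500).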

No commits in the last 6 months.

Use this if you need to determine which large language models are best at producing factually accurate summaries for different types of content.

Not ideal if you are looking for a tool that generates summaries rather than one that evaluates the factual accuracy of existing summaries.

content-evaluation AI-summarization factual-verification natural-language-processing text-analysis

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 6 / 25

Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

MadryLab/context-cite

Attribute (or cite) statements generated by LLMs back to in-context information.

microsoft/augmented-interpretable-models

Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.

Trustworthy-ML-Lab/CB-LLMs

[ICLR 25] A novel framework for building intrinsically interpretable LLMs with...

poloclub/LLM-Attributor

LLM Attributor: Attribute LLM's Generated Text to Training Data

THUDM/LongCite

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
