google-deepmind/long-form-factuality

Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".

Quality score: 55/100 (Established)

This project evaluates the factual accuracy of long-form answers from large language models. It takes a set of prompts requiring detailed responses, along with the model's generated answers, and automatically assesses how factually accurate those answers are. It is designed for researchers and developers working on improving the reliability of large language models.
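Under the hood, the accompanying paper scores a long-form answer by splitting it into individual facts, labeling each fact as supported or not supported against search results, and aggregating the labels with an F1@K metric: precision over the labeled facts, recall measured against a target fact count K. A minimal Python sketch of that aggregation step; the function name and the example counts are illustrative, not the repo's own API:

def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    # Precision: fraction of labeled facts that are supported.
    # Recall: supported facts relative to the target count K, capped at 1.
    if supported == 0:
        return 0.0
    precision = supported / (supported + not_supported)
    recall = min(supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)

# Example: 42 supported facts, 8 unsupported, target K = 64.
print(f1_at_k(42, 8, 64))  # ~0.737

The repo's actual pipeline also performs the upstream steps this sketch assumes are done: decomposing the response into atomic facts and checking each one against search results.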

Use this if you are a researcher or developer who needs to benchmark and improve the factual accuracy of long-form text generated by large language models, such as those from OpenAI or Anthropic.

Not ideal if you are looking for a user-friendly application to simply check the factual accuracy of a single piece of AI-generated text without setting up a research pipeline.

Tags: AI research, language model evaluation, fact-checking AI, natural language processing, AI development
No package. No dependents.
Maintenance: 10/25
Adoption: 10/25
Maturity: 16/25
Community: 19/25

Stars: 672
Forks: 82
Language: Python
License: (not listed)
Last pushed: Feb 05, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/google-deepmind/long-form-factuality"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
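For scripted access, here is a minimal Python sketch that fetches the same endpoint shown in the curl command above. It assumes the API returns JSON; the response schema and the mechanism for passing an API key are not documented here:

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/google-deepmind/long-form-factuality")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors, e.g. rate limiting
data = resp.json()       # assumption: the endpoint returns JSON
print(data)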