google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models; original code for the paper "Long-form factuality in large language models".
This project evaluates the factual accuracy of long-form answers from large language models. Given a set of prompts that require detailed responses and the model's generated answers, it automatically assesses how factually accurate those answers are. It is designed for researchers and developers working on improving the reliability of large language models.
Use this if you are a researcher or developer who needs to benchmark and improve the factual accuracy of long-form text generated by large language models, such as those from OpenAI or Anthropic.
Not ideal if you are looking for a user-friendly application to simply check the factual accuracy of a single piece of AI-generated text without setting up a research pipeline.
Stars
672
Forks
82
Language
Python
License
—
Category
Last pushed
Feb 05, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/google-deepmind/long-form-factuality"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
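The curl example above can also be called from code. Below is a minimal Python sketch using only the standard library; the endpoint path comes from the curl command, but the `X-Api-Key` header name is an assumption for the keyed tier and should be checked against the API's own documentation.

```python
# Minimal sketch of fetching repo quality data from the API above.
# The URL path is taken from the curl example; the "X-Api-Key" header
# name is an ASSUMPTION -- verify it against the real API docs.
import json
import urllib.request
from typing import Optional

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_request(repo_slug: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for a repo's quality data.

    repo_slug: e.g. "google-deepmind/long-form-factuality".
    api_key: optional; the free tier (100 requests/day) needs none.
    """
    url = f"{BASE_URL}/{repo_slug}"
    headers = {}
    if api_key:
        headers["X-Api-Key"] = api_key  # assumed header name
    return urllib.request.Request(url, headers=headers)


def fetch_quality(repo_slug: str, api_key: Optional[str] = None) -> dict:
    """Perform the request and decode the JSON response."""
    req = build_request(repo_slug, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())
```

Usage: `fetch_quality("google-deepmind/long-form-factuality")` hits the same URL as the curl example; pass `api_key=` only if you have registered for the higher 1,000/day quota.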
Related models
gnai-creator/aletheion-llm-v2
Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know.
sandylaker/ib-edl
Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025)
nightdessert/Retrieval_Head
Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality".
MLD3/steerability
An open-source evaluation framework for measuring LLM steerability.
kazemihabib/Mitigating-Reasoning-LLM-Social-Bias
A novel approach to mitigating social bias in Large Language Models through a multi-judge...