google-deepmind/long-form-factuality

Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".

Quality score: 55/100 (Established)

This project evaluates the factual accuracy of long-form answers from large language models. It takes a set of prompts requiring detailed responses, along with the model's generated answers, and automatically assesses how factually accurate those answers are. It is designed for researchers and developers working on improving the reliability of large language models.
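Under the hood, the accompanying paper scores a long-form answer by splitting it into individual facts, labeling each fact as supported or not supported against search results, and aggregating the labels with an F1@K metric: precision over the labeled facts, recall measured against a target fact count K. A minimal Python sketch of that aggregation step; the function name and the example counts are illustrative, not the repo's own API:

def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    # Precision: fraction of labeled facts that are supported.
    # Recall: supported facts relative to the target count K, capped at 1.
    if supported == 0:
        return 0.0
    precision = supported / (supported + not_supported)
    recall = min(supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)

# Example: 42 supported facts, 8 unsupported, target K = 64.
print(f1_at_k(42, 8, 64))  # ~0.737

The repo's actual pipeline also performs the upstream steps this sketch assumes are done: decomposing the response into atomic facts and checking each one against search results.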

Use this if you are a researcher or developer who needs to benchmark and improve the factual accuracy of long-form text generated by large language models, such as those from OpenAI or Anthropic.

Not ideal if you are looking for a user-friendly application to simply check the factual accuracy of a single piece of AI-generated text without setting up a research pipeline.

Tags: AI research, language model evaluation, fact-checking AI, natural language processing, AI development
No package. No dependents.
Maintenance: 10/25
Adoption: 10/25
Maturity: 16/25
Community: 19/25

Stars: 672
Forks: 82
Language: Python
License: (not listed)
Last pushed: Feb 05, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/google-deepmind/long-form-factuality"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
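For scripted access, here is a minimal Python sketch that fetches the same endpoint shown in the curl command above. It assumes the API returns JSON; the response schema and the mechanism for passing an API key are not documented here:

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/google-deepmind/long-form-factuality")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors, e.g. rate limiting
data = resp.json()       # assumption: the endpoint returns JSON
print(data)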