fannie1208/FactTest
[ICML2025] "FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees"
This tool helps researchers and AI practitioners systematically assess how truthful a Large Language Model (LLM) is when it generates text. You provide the LLM you want to test and a calibration dataset, and it produces a measure of factual accuracy backed by finite-sample, distribution-free statistical guarantees. It's designed for those who need to rigorously quantify and report the factuality of LLMs.
No commits in the last 6 months.
Use this if you need to scientifically test and report the factuality of an LLM with reliable, statistically sound metrics.
Not ideal if you're looking for a simple, quick way to get a subjective sense of an LLM's general truthfulness without deep statistical analysis.
Stars
9
Forks
1
Language
Python
License
—
Category
Last pushed
May 29, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fannie1208/FactTest"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper...
gnai-creator/aletheion-llm-v2
Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know.
sandylaker/ib-edl
Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025)
nightdessert/Retrieval_Head
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
MLD3/steerability
An open-source evaluation framework for measuring LLM steerability.