RManLuo/llm-facteval
Source code for the paper "Systematic Assessment of Factual Knowledge in Large Language Models" (EMNLP 2023 Findings)
This tool helps researchers and evaluators systematically assess how accurately Large Language Models (LLMs) recall factual information from structured knowledge graphs. You feed it a knowledge graph (a structured database of facts), and it generates questions with expected answers designed to probe an LLM's factual knowledge. The ideal user is an AI researcher or data scientist focused on evaluating LLM performance.
No commits in the last 6 months.
Use this if you need to create precise, fact-based benchmarks to test and compare the factual accuracy of different Large Language Models.
Not ideal if you're looking for a general-purpose tool to improve LLM generation quality or evaluate subjective aspects of LLM responses.
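As a rough illustration of the idea only (not the repo's actual API: the triple format, relation names, and templates below are hypothetical), a fact-probing question and its expected answer can be templated directly from a knowledge-graph triple:

# Illustrative sketch; names and templates are hypothetical, not from llm-facteval itself.
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    relation: str
    obj: str

# One hand-written question template per relation (hypothetical examples).
TEMPLATES = {
    "has_capital": "What is the capital of {subject}?",
    "written_by": "Who wrote {subject}?",
}

def triple_to_qa(triple: Triple) -> tuple[str, str]:
    """Turn a knowledge-graph triple into a (question, expected answer) pair."""
    question = TEMPLATES[triple.relation].format(subject=triple.subject)
    return question, triple.obj

if __name__ == "__main__":
    q, a = triple_to_qa(Triple("France", "has_capital", "Paris"))
    print(q)  # What is the capital of France?
    print(a)  # Paris

An LLM's answer to each generated question is then compared against the expected answer drawn from the graph, which is what makes the benchmark systematic rather than hand-curated.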
Stars: 17
Forks: —
Language: Python
License: —
Category: —
Last pushed: Nov 18, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/RManLuo/llm-facteval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
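If you prefer Python over curl, a minimal sketch using the requests library (the exact shape of the JSON response is an assumption; it is expected to mirror the stats listed above):

import requests

# Same endpoint as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/RManLuo/llm-facteval"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # assumed to contain fields like stars, language, last pushed
print(data)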
Higher-rated alternatives
MadryLab/context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
microsoft/augmented-interpretable-models
Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.
Trustworthy-ML-Lab/CB-LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with...
poloclub/LLM-Attributor
LLM Attributor: Attribute LLM's Generated Text to Training Data
THUDM/LongCite
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA