uqlm and kernel-language-entropy

Both tools take distinct approaches to uncertainty quantification (UQ) for language models: UQLM is a Python package for UQ-based LLM hallucination detection, while kernel-language-entropy provides the code for fine-grained UQ from semantic similarities. They are **complementary** within the broader LLM-reasoning-research landscape, addressing different facets of the same problem.

| | uqlm | kernel-language-entropy |
| --- | --- | --- |
| Overall score | 73 (Verified) | 37 |
| Maintenance | 20/25 | 0/25 |
| Adoption | 10/25 | 7/25 |
| Maturity | 24/25 | 16/25 |
| Community | 19/25 | 14/25 |
| Stars | 1,121 | 36 |
| Forks | 116 | 6 |
| Downloads | | |
| Commits (30d) | 33 | 0 |
| Language | Python | Python |
| License | Apache-2.0 | BSD-3-Clause-Clear |
| Risk flags | None | Stale 6m, No Package, No Dependents |

About uqlm

cvs-health/uqlm

UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.

This tool helps people who use large language models (LLMs) detect when an LLM might be generating incorrect or fabricated information, known as "hallucinations." You provide text prompts to an LLM, and the tool analyzes the responses to give you a confidence score indicating how likely each answer is to be accurate. This is useful for anyone relying on LLM outputs for critical tasks, such as content creators, researchers, or customer service managers.

LLM-reliability content-verification AI-assurance information-quality response-evaluation
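Black-box UQ methods of the kind UQLM packages typically sample several responses to the same prompt and score confidence from their agreement. The sketch below illustrates that idea only; `consistency_score` is a hypothetical helper, not UQLM's API, and real scorers use semantic rather than exact-match clustering.

```python
import math
from collections import Counter

def consistency_score(responses):
    """Confidence from agreement among sampled LLM responses.

    Groups identical (case-folded) responses and returns the
    normalized negentropy of the cluster distribution:
    1.0 means all samples agree, 0.0 means every sample differs.
    """
    counts = Counter(r.strip().lower() for r in responses)
    n = len(responses)
    if len(counts) == 1:
        return 1.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(n)

# All samples agree -> high confidence
print(consistency_score(["Paris", "paris", "Paris"]))  # 1.0
# Every sample differs -> low confidence
print(consistency_score(["Paris", "Lyon", "Nice"]))    # 0.0
```

In practice the clustering step would use semantic equivalence (e.g., bidirectional entailment) instead of string equality, since paraphrases of the same answer should count as agreement.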

About kernel-language-entropy

AlexanderVNikitin/kernel-language-entropy

Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)

This tool helps AI researchers and practitioners evaluate how confident a large language model (LLM) is about its generated responses. It takes multiple sampled outputs from an LLM and derives a fine-grained uncertainty score from the semantic similarities among them. Researchers building or deploying LLMs would use this to understand and improve model reliability.

AI-research LLM-evaluation model-reliability natural-language-processing
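Roughly speaking, kernel language entropy is the von Neumann entropy of a unit-trace semantic-similarity kernel built over sampled responses. A minimal NumPy sketch under that reading follows; the toy kernels are hypothetical stand-ins, as the paper derives the kernel from semantic similarities (e.g., NLI-based graph kernels), not hand-written matrices.

```python
import numpy as np

def von_neumann_entropy(kernel):
    """Von Neumann entropy of a PSD semantic-similarity kernel.

    The kernel is normalized to unit trace (a density matrix rho),
    then VNE = -tr(rho log rho), computed from the eigenvalues.
    Low entropy -> sampled responses semantically agree (confident);
    high entropy -> responses are semantically diverse (uncertain).
    """
    rho = kernel / np.trace(kernel)
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerical zeros
    return float(-np.sum(eigvals * np.log(eigvals)))

# Toy kernels over 3 sampled responses (entry = semantic similarity)
agree = np.ones((3, 3))     # all responses mutually equivalent
disagree = np.eye(3)        # responses mutually unrelated
print(von_neumann_entropy(agree))     # near 0 -> confident
print(von_neumann_entropy(disagree))  # near log(3) -> uncertain
```

Unlike cluster-based semantic entropy, the kernel view grades *partial* similarity between responses, which is what makes the resulting uncertainty estimate fine-grained.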


Scores updated daily from GitHub, PyPI, and npm data.