intuit-ai-research/SPUQ
SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models
This project estimates how confident a Large Language Model (LLM) is in its answers. Given a query and the LLM's response, it outputs a confidence score indicating how likely that answer is to be accurate. This is useful for anyone who needs to trust LLM outputs, such as content moderators, customer service managers, or data analysts relying on LLM-generated insights.
No commits in the last 6 months.
Use this if you need to quickly assess the reliability of individual answers generated by a Large Language Model for critical applications.
Not ideal if you are looking for a tool to improve the LLM's accuracy directly or fine-tune its behavior, as this only provides an evaluation metric.
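The idea behind perturbation-based uncertainty quantification is to vary the input (e.g. rephrase the prompt) and measure how consistently the model answers: high agreement suggests high confidence. Below is a minimal sketch of that idea with a stub model and a toy exact-match agreement metric; all names here are illustrative and do not reflect this repo's actual API.

```python
import itertools

def spuq_confidence(llm, query, perturbations, agree):
    """Toy perturbation-based confidence: query the model once per
    perturbed prompt and return the mean pairwise agreement."""
    answers = [llm(p(query)) for p in perturbations]
    pairs = list(itertools.combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(agree(a, b) for a, b in pairs) / len(pairs)

# Stub "LLM" that answers consistently regardless of phrasing.
def stub_llm(prompt):
    return "Paris" if "capital of France" in prompt else "unsure"

# Illustrative prompt perturbations (real systems might paraphrase
# with another model or vary sampling temperature).
perturbations = [
    lambda q: q,
    lambda q: q + " Please answer briefly.",
    lambda q: "Question: " + q,
]

exact_match = lambda a, b: 1.0 if a == b else 0.0

score = spuq_confidence(stub_llm, "What is the capital of France?",
                        perturbations, exact_match)
print(score)  # 1.0: every perturbed prompt yields the same answer
```

A model whose answers change under perturbation would score below 1.0, signaling lower confidence in that response.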
Stars: 15
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Jun 24, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/intuit-ai-research/SPUQ"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
open-thought/reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Hmbown/Hegelion
Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
LLM360/Reasoning360
A repo for open research on building large reasoning models
TsinghuaC3I/Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
bowang-lab/BioReason
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25