intuit-ai-research/SPUQ
SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models
This project estimates how confident a Large Language Model (LLM) is in its answers. Given a query and the LLM's response, it outputs a confidence score indicating how likely that answer is to be accurate. This is useful for anyone who needs to trust LLM outputs, such as content moderators, customer service managers, or data analysts relying on LLM-generated insights.
No commits in the last 6 months.
Use this if you need to quickly assess the reliability of individual answers generated by a Large Language Model for critical applications.
Not ideal if you are looking for a tool to improve the LLM's accuracy directly or fine-tune its behavior, as this only provides an evaluation metric.
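The idea behind perturbation-based uncertainty quantification is to vary the input (e.g. rephrase the prompt) and measure how consistently the model answers: high agreement suggests high confidence. Below is a minimal sketch of that idea with a stub model and a toy exact-match agreement metric; all names here are illustrative and do not reflect this repo's actual API.

```python
import itertools

def spuq_confidence(llm, query, perturbations, agree):
    """Toy perturbation-based confidence: query the model once per
    perturbed prompt and return the mean pairwise agreement."""
    answers = [llm(p(query)) for p in perturbations]
    pairs = list(itertools.combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(agree(a, b) for a, b in pairs) / len(pairs)

# Stub "LLM" that answers consistently regardless of phrasing.
def stub_llm(prompt):
    return "Paris" if "capital of France" in prompt else "unsure"

# Illustrative prompt perturbations (real systems might paraphrase
# with another model or vary sampling temperature).
perturbations = [
    lambda q: q,
    lambda q: q + " Please answer briefly.",
    lambda q: "Question: " + q,
]

exact_match = lambda a, b: 1.0 if a == b else 0.0

score = spuq_confidence(stub_llm, "What is the capital of France?",
                        perturbations, exact_match)
print(score)  # 1.0: every perturbed prompt yields the same answer
```

A model whose answers change under perturbation would score below 1.0, signaling lower confidence in that response.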
Stars: 15
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Jun 24, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/intuit-ai-research/SPUQ"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
open-thought/reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Hmbown/Hegelion
Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
LLM360/Reasoning360
A repo for open research on building large reasoning models
TsinghuaC3I/Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
bowang-lab/BioReason
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25