ZBox1005/CoT-UQ
[arXiv 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought"
This project evaluates the trustworthiness of answers generated by large language models (LLMs) such as Llama, especially on complex multi-step reasoning tasks. Given a question and the LLM's response, it analyzes the chain-of-thought reasoning steps and produces a score indicating how confident the model is in its own answer. It is aimed at data scientists, AI researchers, and machine learning engineers working with LLMs.
No commits in the last 6 months.
Use this if you need to reliably understand how confident your LLMs are in their answers, particularly for tasks requiring logical reasoning or multi-step problem-solving.
Not ideal if you are looking for a tool to simply generate LLM responses or if your LLM applications do not require quantifying the certainty of the outputs.
Stars: 16
Forks: 1
Language: Python
License: —
Category: —
Last pushed: Apr 03, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ZBox1005/CoT-UQ"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
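The same endpoint can be queried from Python instead of curl. The sketch below is a minimal example assuming only the URL shape shown in the curl command above; the response fields and the `Authorization: Bearer` header used for an API key are assumptions, not documented behavior.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, api_key: str = "") -> dict:
    """Fetch the quality record as a dict; a key (if you have one) raises the daily limit."""
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        # Header name is an assumption; check the provider's docs for the real scheme.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL the curl example hits.
    print(quality_url("ZBox1005", "CoT-UQ"))
```

Without a key this stays within the 100-requests/day anonymous limit; pass `api_key` only if you have registered for the 1,000/day tier.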
Higher-rated alternatives
InternLM/SIM-CoT
[ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"
zhenyi4/codi
Official repository for "CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation"
xf-zhao/LoT
Official implementation of LoT paper: "Enhancing Zero-Shot Chain-of-Thought Reasoning in Large...
nicolay-r/Reasoning-for-Sentiment-Analysis-Framework
The official code for CoT / ZSL reasoning framework 🧠, utilized in paper: "Large Language Models...
FranxYao/FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.