ZBox1005/CoT-UQ

[arXiv 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought"

Score: 19 / 100 (Experimental)

This project helps evaluate the trustworthiness of answers generated by large language models (LLMs) like Llama, especially for complex, multi-step reasoning tasks. Given a question and the LLM's response, it analyzes the reasoning steps and produces a score indicating how confident the model is in its own answer. Data scientists, AI researchers, and machine learning engineers working with LLMs would use this.

No commits in the last 6 months.

Use this if you need a reliable measure of how confident your LLMs are in their answers, particularly for tasks involving logical reasoning or multi-step problem-solving.

Not ideal if you are looking for a tool to simply generate LLM responses or if your LLM applications do not require quantifying the certainty of the outputs.

Tags: AI research · LLM evaluation · uncertainty quantification · natural language processing · model reliability
Badges: No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 5 / 25


Stars: 16
Forks: 1
Language: Python
License: None
Last pushed: Apr 03, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ZBox1005/CoT-UQ"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
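The same report can be fetched from Python. A minimal sketch, assuming only the URL layout shown in the curl example (`/quality/<registry>/<owner>/<repo>`); the response schema is not documented on this page, so the helper just returns the parsed JSON:

```python
import json
import urllib.request

# Base endpoint taken from the curl example above (assumption: the path
# segments are <registry>/<owner>/<repo>).
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry: str, owner: str, repo: str) -> str:
    """Build the report URL: BASE/<registry>/<owner>/<repo>."""
    return f"{BASE}/{registry}/{owner}/{repo}"

def fetch_quality(registry: str, owner: str, repo: str) -> dict:
    """Fetch the quality report as JSON (100 requests/day without a key)."""
    url = quality_url(registry, owner, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `fetch_quality("transformers", "ZBox1005", "CoT-UQ")` requests the same URL as the curl command above.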