ZBox1005/CoT-UQ

[arXiv 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought"

Score: 19 / 100 (Experimental)

This project helps evaluate the trustworthiness of answers generated by large language models (LLMs) like Llama, especially for complex, multi-step reasoning tasks. Given a question and the LLM's response, it analyzes the reasoning steps and produces a score indicating how confident the model is in its own answer. Data scientists, AI researchers, and machine learning engineers working with LLMs would use this.

No commits in the last 6 months.

Use this if you need a reliable measure of how confident your LLMs are in their answers, particularly for tasks involving logical reasoning or multi-step problem-solving.

Not ideal if you are looking for a tool to simply generate LLM responses or if your LLM applications do not require quantifying the certainty of the outputs.

Tags: AI research · LLM evaluation · uncertainty quantification · natural language processing · model reliability
Badges: No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 5 / 25


Stars: 16
Forks: 1
Language: Python
License: None
Last pushed: Apr 03, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ZBox1005/CoT-UQ"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
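The same report can be fetched from Python. A minimal sketch, assuming only the URL layout shown in the curl example (`/quality/<registry>/<owner>/<repo>`); the response schema is not documented on this page, so the helper just returns the parsed JSON:

```python
import json
import urllib.request

# Base endpoint taken from the curl example above (assumption: the path
# segments are <registry>/<owner>/<repo>).
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry: str, owner: str, repo: str) -> str:
    """Build the report URL: BASE/<registry>/<owner>/<repo>."""
    return f"{BASE}/{registry}/{owner}/{repo}"

def fetch_quality(registry: str, owner: str, repo: str) -> dict:
    """Fetch the quality report as JSON (100 requests/day without a key)."""
    url = quality_url(registry, owner, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `fetch_quality("transformers", "ZBox1005", "CoT-UQ")` requests the same URL as the curl command above.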