intuit-ai-research/SPUQ

SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models

Quality score: 36 / 100 (Emerging)

SPUQ estimates how confident a Large Language Model (LLM) is in its answers. Given a query, it perturbs the prompt, samples a response for each perturbed variant, and aggregates the agreement among those responses into a confidence score indicating how likely the answer is to be accurate. This is useful for anyone who needs to gauge whether to trust an LLM's outputs, such as content moderators, customer service managers, or data analysts relying on LLM-generated insights.
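The perturb-sample-aggregate idea can be sketched in a few lines. This is a minimal illustration, not the repository's actual API: `query_llm`, the toy `perturb` rephrasings, and the text-overlap agreement measure are all stand-ins for whatever perturbations and scoring SPUQ itself uses.

```python
# Minimal sketch of perturbation-based confidence scoring.
# query_llm is a hypothetical callable; this is NOT SPUQ's real interface.
from difflib import SequenceMatcher
from itertools import combinations

def perturb(prompt: str) -> list[str]:
    """Toy perturbations: trivial rephrasings of the same question.
    A real implementation might paraphrase or vary sampling temperature."""
    return [prompt, f"Please answer: {prompt}", f"{prompt} Respond concisely."]

def confidence(prompt: str, query_llm) -> float:
    """Sample one answer per perturbed prompt, then score the average
    pairwise textual agreement of the answers (0.0 to 1.0)."""
    answers = [query_llm(p) for p in perturb(prompt)]
    sims = [SequenceMatcher(None, a, b).ratio()
            for a, b in combinations(answers, 2)]
    return sum(sims) / len(sims)

# Demo with a stub "LLM" that always gives the same answer,
# so agreement (and hence confidence) is maximal.
score = confidence("What is 2+2?", lambda p: "4")
```

A model whose answers drift under perturbation would produce lower pairwise agreement and hence a lower score, which is the intuition behind perturbation-based uncertainty quantification.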

No commits in the last 6 months.

Use this if you need to quickly assess the reliability of individual answers generated by a Large Language Model for critical applications.

Not ideal if you are looking for a tool to improve the LLM's accuracy directly or fine-tune its behavior, as this only provides an evaluation metric.

Tags: LLM evaluation · AI quality assurance · content moderation · customer service automation · data analysis
Status: Stale (6 months) · No package published · No dependents

Score breakdown:
- Maintenance: 0 / 25
- Adoption: 6 / 25
- Maturity: 16 / 25
- Community: 14 / 25


Stars: 15
Forks: 3
Language: Python
License: Apache-2.0
Last pushed: Jun 24, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/intuit-ai-research/SPUQ"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
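The same endpoint can be called from Python. The URL pattern below is taken directly from the curl example above; since the response schema is not documented here, the sketch only builds the URL and leaves the actual fetch as an optional helper.

```python
# Build the quality-API URL for a GitHub repo, following the endpoint
# pattern shown in the curl example above. Response fields are not
# documented on this page, so fetch_quality just returns the parsed JSON.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Anonymous access is limited to 100 requests/day per the note above."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

url = quality_url("intuit-ai-research", "SPUQ")
```

Calling `fetch_quality("intuit-ai-research", "SPUQ")` retrieves the same data as the curl command; with a free API key the daily limit rises to 1,000 requests.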