IAAR-Shanghai/xVerify

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Quality score: 40 / 100 (Emerging)

This tool helps researchers, educators, and evaluators quickly and accurately assess the correctness of answers generated by AI reasoning models. It takes the original question, the known correct answer, and the AI's generated reasoning process and final answer as input. It then determines if the AI's answer is correct, even when the formatting or language differs, outputting a judgment of 'Correct' or 'Incorrect'. This is ideal for anyone who needs to systematically evaluate the performance of large language models on objective tasks.


Use this if you need to reliably evaluate the accuracy of AI-generated answers for objective questions, especially when responses include complex reasoning, various mathematical notations, or natural language variations.

Not ideal if your questions are open-ended, subjective, or require nuanced human judgment beyond clear-cut objective correctness.

Tags: AI-model-evaluation, answer-assessment, educational-assessment, mathematics-evaluation, natural-language-processing
No package · No dependents

Score breakdown:
- Maintenance: 6 / 25
- Adoption: 10 / 25
- Maturity: 16 / 25
- Community: 8 / 25


Stars: 144
Forks: 7
Language: Jupyter Notebook
License:
Last pushed: Nov 13, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IAAR-Shanghai/xVerify"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.