asaparov/prontoqa

Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.

41 / 100 · Emerging

This project helps researchers and AI practitioners evaluate how well large language models (LLMs) reason and explain their answers. It generates synthetic question-answering datasets in which the inputs are simple sentences and the outputs include the correct answer along with a step-by-step reasoning chain. Use it if you are an AI researcher or developer working on LLMs and need to rigorously test their deductive reasoning, especially on new, unseen examples.

156 stars. No commits in the last 6 months.

Use this if you need to create controlled datasets to formally analyze the 'chain-of-thought' explanations from large language models and understand their deductive reasoning.

Not ideal if you are looking for a general-purpose dataset for training or fine-tuning language models on a wide variety of real-world tasks, as it is designed specifically for controlled reasoning analysis.

AI-research language-model-evaluation reasoning-assessment NLP-benchmarking chain-of-thought-analysis
Stale (6 months) · No package published · No dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 13 / 25
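
These four components sum to the overall score: 2 + 10 + 16 + 13 = 41 out of 100.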

Stars: 156
Forks: 16
Language: Python
License: Apache-2.0
Last pushed: Sep 09, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/asaparov/prontoqa"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
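
For programmatic access from Python, here is a minimal sketch using only the standard library; it assumes the endpoint returns JSON and simply prints the raw payload, since the response schema is not documented here.

import json
import urllib.request

# Public endpoint from the curl example above; open access allows
# up to 100 requests/day without an API key.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/asaparov/prontoqa"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Print the full payload; inspect it before relying on specific fields.
print(json.dumps(data, indent=2))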