asaparov/prontoqa

Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.

41 / 100 · Emerging

This project helps researchers and AI practitioners evaluate how well large language models (LLMs) reason and explain their answers. It generates synthetic question-answering datasets in which the inputs are simple sentences and the outputs include the correct answer along with a step-by-step reasoning chain. Use it if you are an AI researcher or developer working on LLMs and need to rigorously test their deductive reasoning, especially on new, unseen examples.

156 stars. No commits in the last 6 months.

Use this if you need to create controlled datasets to formally analyze the 'chain-of-thought' explanations from large language models and understand their deductive reasoning.

Not ideal if you are looking for a general-purpose dataset for training or fine-tuning language models on a wide variety of real-world tasks, as it is designed specifically for controlled reasoning analysis.

AI-research language-model-evaluation reasoning-assessment NLP-benchmarking chain-of-thought-analysis
Stale (6 months) · No package published · No dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 13 / 25
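
These four components sum to the overall score: 2 + 10 + 16 + 13 = 41 out of 100.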

Stars: 156
Forks: 16
Language: Python
License: Apache-2.0
Last pushed: Sep 09, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/asaparov/prontoqa"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
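
For programmatic access from Python, here is a minimal sketch using only the standard library; it assumes the endpoint returns JSON and simply prints the raw payload, since the response schema is not documented here.

import json
import urllib.request

# Public endpoint from the curl example above; open access allows
# up to 100 requests/day without an API key.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/asaparov/prontoqa"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Print the full payload; inspect it before relying on specific fields.
print(json.dumps(data, indent=2))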