Yangyi-Chen/CoTConsistency
The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
This dataset provides a benchmark, CURE, for evaluating how well vision-language models can reason and maintain consistency in their explanations. It offers structured data including images, highlighted visual clues, potential inferences, and step-by-step reasoning chains. Researchers and developers working with AI models that interpret images and text can use this to assess and improve their models' explanatory capabilities.
No commits in the last 6 months.
Use this if you are a researcher or AI developer working on vision-language models and need a dataset to measure their reasoning performance and the consistency of their explanations.
Not ideal if you are looking for a general-purpose image annotation tool or a dataset for basic image classification tasks without a focus on complex reasoning chains.
Stars: 34
Forks: 1
Language: —
License: —
Category: —
Last pushed: Sep 16, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Yangyi-Chen/CoTConsistency"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
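For programmatic access, the curl call above can be wrapped in a small client. A minimal Python sketch, assuming the endpoint returns a JSON payload (the exact response schema is not documented here):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    # Build the endpoint URL for a given owner/repo pair.
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    # Fetch and decode the payload; assumes the API responds with JSON.
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("Yangyi-Chen", "CoTConsistency")
    print(json.dumps(data, indent=2))
```

Within the keyless tier, keep calls under 100 per day; add retry/backoff if you poll regularly.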
Higher-rated alternatives
SimonAytes/SoT
Official code repository for Sketch-of-Thought (SoT)
xuyige/SoftCoT
ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint:...
Fr0zenCrane/UniCoT
[ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
logikon-ai/cot-eval
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
nicolay-r/THOR-ECAC
The official fork of THoR Chain-of-Thought framework, enhanced and adapted for Emotion Cause...