logikon-ai/cot-eval

A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.

34 / 100 · Emerging

This tool helps AI researchers and developers systematically assess how effectively large language models (LLMs) use Chain-of-Thought (CoT) reasoning. You provide an LLM, a specific task like logical reasoning, and a CoT prompting strategy. The tool then generates reasoning traces and evaluates the model's performance on both original and perturbed versions of the task, providing metrics on CoT effectiveness and potential data contamination.
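
The two metrics described above can be sketched in a few lines. This is an illustrative example, not the cot-eval API: the function names and the exact scoring formulas are assumptions, but they capture the idea of measuring CoT effectiveness as the accuracy gain from reasoning traces, and flagging possible contamination as an accuracy drop on perturbed task variants.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def cot_effectiveness(base_preds, cot_preds, labels):
    """Accuracy gain when the model answers with a reasoning trace
    versus answering directly (hypothetical metric definition)."""
    return accuracy(cot_preds, labels) - accuracy(base_preds, labels)

def contamination_signal(orig_preds, orig_labels, pert_preds, pert_labels):
    """Accuracy drop from original to perturbed task variants; a large
    drop suggests the model may have memorized the originals."""
    return accuracy(orig_preds, orig_labels) - accuracy(pert_preds, pert_labels)
```

For example, a model that scores 2/3 with CoT but only 1/3 without it gets an effectiveness of +1/3; a model that aces the original items but fails their perturbed counterparts gets a high contamination signal.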

No commits in the last 6 months.

Use this if you are a researcher or AI developer working on large language models and need to rigorously evaluate their reasoning capabilities and identify potential training data contamination.

Not ideal if you are an end-user looking for a ready-to-use LLM application, rather than a framework for model evaluation.

Tags: AI-model-evaluation, LLM-benchmarking, reasoning-assessment, natural-language-processing, AI-research
Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 12 / 25


Stars: 19
Forks: 3
Language: Jupyter Notebook
License: MIT
Last pushed: Feb 06, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/logikon-ai/cot-eval"

Open to everyone: 100 requests/day with no key needed. A free API key raises the limit to 1,000 requests/day.
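
The same request can be made from Python with the standard library. This is a minimal sketch: the URL pattern comes from the curl example above, but the shape of the JSON response is an assumption, so the snippet only builds the URL and decodes whatever comes back.

```python
import json
from urllib.request import urlopen

# Endpoint shown in the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality record (no key needed for
    up to 100 requests/day; response fields are not documented here)."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("logikon-ai", "cot-eval"))
# → https://pt-edge.onrender.com/api/v1/quality/llm-tools/logikon-ai/cot-eval
```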