logikon-ai/cot-eval

A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.

34 / 100 · Emerging

This tool helps AI researchers and developers systematically assess how effectively large language models (LLMs) use Chain-of-Thought (CoT) reasoning. You provide an LLM, a specific task like logical reasoning, and a CoT prompting strategy. The tool then generates reasoning traces and evaluates the model's performance on both original and perturbed versions of the task, providing metrics on CoT effectiveness and potential data contamination.
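
The two metrics described above can be sketched in a few lines. This is an illustrative example, not the cot-eval API: the function names and the exact scoring formulas are assumptions, but they capture the idea of measuring CoT effectiveness as the accuracy gain from reasoning traces, and flagging possible contamination as an accuracy drop on perturbed task variants.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def cot_effectiveness(base_preds, cot_preds, labels):
    """Accuracy gain when the model answers with a reasoning trace
    versus answering directly (hypothetical metric definition)."""
    return accuracy(cot_preds, labels) - accuracy(base_preds, labels)

def contamination_signal(orig_preds, orig_labels, pert_preds, pert_labels):
    """Accuracy drop from original to perturbed task variants; a large
    drop suggests the model may have memorized the originals."""
    return accuracy(orig_preds, orig_labels) - accuracy(pert_preds, pert_labels)
```

For example, a model that scores 2/3 with CoT but only 1/3 without it gets an effectiveness of +1/3; a model that aces the original items but fails their perturbed counterparts gets a high contamination signal.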

No commits in the last 6 months.

Use this if you are a researcher or AI developer working on large language models and need to rigorously evaluate their reasoning capabilities and identify potential training data contamination.

Not ideal if you are an end-user looking for a ready-to-use LLM application, rather than a framework for model evaluation.

Tags: AI-model-evaluation, LLM-benchmarking, reasoning-assessment, natural-language-processing, AI-research
Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 12 / 25


Stars: 19
Forks: 3
Language: Jupyter Notebook
License: MIT
Last pushed: Feb 06, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/logikon-ai/cot-eval"

Open to everyone: 100 requests/day with no key needed. A free API key raises the limit to 1,000 requests/day.
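
The same request can be made from Python with the standard library. This is a minimal sketch: the URL pattern comes from the curl example above, but the shape of the JSON response is an assumption, so the snippet only builds the URL and decodes whatever comes back.

```python
import json
from urllib.request import urlopen

# Endpoint shown in the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality record (no key needed for
    up to 100 requests/day; response fields are not documented here)."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("logikon-ai", "cot-eval"))
# → https://pt-edge.onrender.com/api/v1/quality/llm-tools/logikon-ai/cot-eval
```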