lechmazur/debate

Adversarial multi-turn benchmark for LLM debate quality, using side-swapped matchups and multi-model judging to rank models by debate performance.

Quality score: 18 / 100 (Experimental)

This benchmark helps developers and researchers evaluate how well large language models (LLMs) perform in multi-turn, adversarial debates across diverse topics. It takes a set of LLMs and debate propositions as input and outputs a leaderboard ranking models by their ability to argue, rebut, and stay coherent under pressure. It is aimed at anyone building, deploying, or researching LLMs who needs to understand their argumentative capabilities.

Use this if you need to understand how well an LLM can defend a position, adapt to counterarguments, and maintain a coherent argument over multiple turns in a debate, rather than just providing a single good answer.

Not ideal if you are looking for a simple, one-shot evaluation of an LLM's general knowledge or immediate question-answering ability without the pressure of an adversarial exchange.
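The headline mechanism is worth unpacking: in a side-swapped design, each pair of models debates the same proposition twice, once per side, so neither model benefits from being assigned the easier position. Below is a minimal sketch of that pairing scheme in Python; the function name, model names, and data layout are illustrative assumptions, not the repo's actual code.

from itertools import combinations

def side_swapped_matchups(models, propositions):
    # For every proposition, pair each two models twice, swapping
    # which side (PRO vs. CON) each one argues. Illustrative only;
    # lechmazur/debate's real pairing logic may differ.
    matchups = []
    for prop in propositions:
        for a, b in combinations(models, 2):
            matchups.append({"proposition": prop, "pro": a, "con": b})
            matchups.append({"proposition": prop, "pro": b, "con": a})
    return matchups

# Example: 3 models x 1 proposition -> 3 pairs x 2 sides = 6 debates,
# each later scored by a panel of judge models.
for d in side_swapped_matchups(["model-a", "model-b", "model-c"],
                               ["AI benchmarks should be open source."]):
    print(f'{d["pro"]} (pro) vs {d["con"]} (con): {d["proposition"]}')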

Tags: LLM evaluation, conversational AI, model comparison, argumentation assessment, AI research
No license · No package · No dependents

Score breakdown (the four components sum to the 18 / 100 overall):
Maintenance: 13 / 25
Adoption: 4 / 25
Maturity: 1 / 25
Community: 0 / 25

Stars: 8
Forks: n/a
Language: n/a
License: none
Category: ai-debate-arenas
Last pushed: Mar 23, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lechmazur/debate"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
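For scripted access, the same endpoint can be called from Python. A minimal sketch, assuming the endpoint returns JSON; the response schema is not documented on this page, so the example just pretty-prints whatever comes back.

import json
import urllib.request

# Public quality-score endpoint quoted above; no key is needed at
# the 100-requests/day tier.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lechmazur/debate"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Schema is undocumented here, so just pretty-print the payload.
print(json.dumps(data, indent=2))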