lechmazur/debate
Adversarial multi-turn benchmark for LLM debate quality, using side-swapped matchups and multi-model judging to rank models by debate performance.
This benchmark helps developers and researchers evaluate how well large language models (LLMs) perform in multi-turn, adversarial debates across diverse topics. It takes a set of LLMs and debate propositions as input and outputs a leaderboard that ranks models by their ability to argue, rebut, and stay coherent under pressure. It is aimed at anyone building, deploying, or researching LLMs who needs to understand their argumentative capabilities.
Use this if you need to understand how well an LLM can defend a position, adapt to counterarguments, and maintain a coherent argument over multiple turns in a debate, rather than just providing a single good answer.
Not ideal if you are looking for a simple, one-shot evaluation of an LLM's general knowledge or immediate question-answering ability without the pressure of an adversarial exchange.
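As a rough illustration of the side-swapped matchup idea described above, the Python sketch below pairs every two models on each proposition twice, once per side, so neither model benefits from always arguing Pro or Con. The function name and data layout are hypothetical and not taken from the repository.

from itertools import combinations

def side_swapped_matchups(models, propositions):
    """Generate every model pairing twice per proposition, swapping Pro/Con sides.

    `models` and `propositions` are plain lists of strings; the dict layout
    is illustrative only, not the repository's actual schema.
    """
    matchups = []
    for prop in propositions:
        for a, b in combinations(models, 2):
            matchups.append({"proposition": prop, "pro": a, "con": b})
            matchups.append({"proposition": prop, "pro": b, "con": a})
    return matchups

# Example: 3 models and 2 propositions -> 3 pairings x 2 sides x 2 propositions = 12 debates
print(len(side_swapped_matchups(["m1", "m2", "m3"], ["p1", "p2"])))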
Stars: 8
Forks: —
Language: —
License: —
Category: —
Last pushed: Mar 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lechmazur/debate"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
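A minimal sketch of calling the same endpoint from Python, assuming the response body is JSON; the response schema is not documented here, so the print is just a raw dump:

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lechmazur/debate"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)  # assumption: the endpoint returns a JSON body
print(data)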
Higher-rated alternatives
betagouv/ComparIA
Open source LLM arena created by the French Government
Skytliang/Multi-Agents-Debate
MAD: The first work to explore Multi-Agent Debate with Large Language Models :D
liuxiaotong/ai-dataset-radar
Multi-source async competitive intelligence engine for AI training data ecosystems with...
Arnoldlarry15/ARES-Dashboard
AI Red Team Operations Console
llm-ring/lmring
Open-source, self-hostable LLM arena with model compare, voting, and leaderboards