AI Debate Arenas LLM Tools

Interactive platforms where AI models or humans compete in structured debates with scoring/judging. Includes multi-model comparison tools, live debate staging, and adversarial benchmarking. Does NOT include general competitive game frameworks, coding competitions, or security red-team tools without debate mechanics.

There are 36 AI debate arena tools tracked. One scores above 50 (Established tier). The highest-rated is betagouv/ComparIA at 52/100 with 63 stars.

Get all 36 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=ai-debate-arenas&limit=20"
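
The same request can be made from a script. The following is a minimal sketch using only the Python standard library. The endpoint and the three query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not documented here, so the sketch returns the decoded JSON without assuming any field names. Whether a `limit` above 20 is accepted (for example, to retrieve all 36 projects in one call) is also an assumption that would need checking against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="ai-debate-arenas", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=10):
    """Fetch the dataset and return the decoded JSON payload as-is.

    No key is sent, so this falls under the keyless daily request quota.
    The payload structure is not assumed; inspect it before indexing into it.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `build_url()` with its defaults reproduces the URL from the curl command, and `fetch_projects()` performs the actual network request.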

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.

#	Tool	Score	Tier	Stars	Language
1	betagouv/ComparIA Open source LLM arena created by the French Government	52	Established	63	Jupyter Notebook
2	Skytliang/Multi-Agents-Debate MAD: The first work to explore Multi-Agent Debate with Large Language Models :D	49	Emerging	532	Python
3	liuxiaotong/ai-dataset-radar Multi-source async competitive intelligence engine for AI training data...	48	Emerging	2	Python
4	Arnoldlarry15/ARES-Dashboard AI Red Team Operations Console	44	Emerging	14	TypeScript
5	llm-ring/lmring Open-source, self-hostable LLM arena with model compare, voting, and leaderboards	41	Emerging	8	TypeScript
6	YerbaPage/SWE-Debate SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution	35	Emerging	25	Python
7	khoren93/ai-debates Orchestrate epic battles between 600+ AI models (GPT-5, Gemini 3, DeepSeek...	35	Emerging	11	TypeScript
8	rd-serendipity/ai-debate-arena AI Debate Arena: Streamlit app for AI model debates. Features multi-model...	33	Emerging	7	Python
9	debate/debate-ai.com Debate AI enables debaters in PF, LD & Policy to streamline AI research...	30	Emerging	7	TypeScript
10	jhammant/agent-drift Stress-test AI agents for goal drift and system prompt violations. Inspired...	28	Experimental	5	HTML
11	the-meta-value/The-Perfect-Storm tracking AI system failures with solar weather	26	Experimental	1	TeX
12	aramirez-maza/beto-framework BETO formalizes the ignorance of an AI. Materializes raw semantic intent...	26	Experimental	2	Python
13	lukeslp/consensus Multi-model research debate engine — 8+ language models independently...	25	Experimental	1	HTML
14	SurgeCLI/Surge Surge is a lightweight self-learning AI observability and remediation agent...	23	Experimental	2	Python
15	dinesh-git17/debate-lab Watch AI models debate any topic in real-time. ChatGPT and Grok argue,...	22	Experimental	3	TypeScript
16	13120740298z-lang/AI-Tech-Radar 🤖 Automated AI technology intelligence platform — tracks GitHub AI projects,...	22	Experimental	1	JavaScript
17	qingni/TechSentry 🛡️ AI-powered tech intelligence tool that auto-tracks GitHub repos, Hacker...	22	Experimental	—	Python
18	sanifhimani/llm-colosseum AI models fight each other in a pixel arena every day. They decide what to...	22	Experimental	1	JavaScript
19	armsp/AIFU AI flub ups	21	Experimental	1	JavaScript
20	konradhy/battlearena A proof of concept illustrating how AI can enhance games	21	Experimental	1	TypeScript
21	florykhan/TelusGuardAI AI-powered network impact analyzer. Natural-language queries → multi-agent...	21	Experimental	—	JavaScript
22	netlify/nextjs-sentinel Monitors Next.js releases for relevance to Netlify	21	Experimental	4	TypeScript
23	Firmislabs/ai-inventory Detect AI frameworks, LLM dependencies, and model files in any project. One...	19	Experimental	—	TypeScript
24	lechmazur/debate Adversarial multi-turn benchmark for LLM debate quality, using side-swapped...	18	Experimental	8	—
25	ethicals7s/EchoArena Local LLM debate arena — make two models battle any topic offline, third...	17	Experimental	—	Go
26	Rak2k6/gig-audit Fair Gig Guardian is an AI-powered platform that analyzes gig economy...	14	Experimental	—	TypeScript
27	gregorydouglasquarles/lavaflow-site AI-driven predictive safety ecosystem (qView) and multimedia architecture....	14	Experimental	—	HTML
28	VAMP-NEER/release-radar-ai 🔍 Ultimate Free GitHub Trend Tracker 2026 🚀 | AI-Powered Repo & Dev Team...	14	Experimental	—	—
29	samuel-dobrancin-qa/ai-content-quality-framework Structured framework for evaluating AI content generator quality across six...	14	Experimental	—	—
30	SuperAdam47/GabyT-frontend GabyT is LLM platform with OpenAI	14	Experimental	7	Vue
31	andronov04/aiarena Client-side AI arena for comparing 1000+ models across 68+ providers. No...	14	Experimental	4	TypeScript
32	Swap-24/ARGUS Real-time AI debate arena. Argue live against opponents while an ML pipeline...	13	Experimental	—	JavaScript
33	Gliangquan/awesome-ai-radar Daily curated AI, LLM, and agent project radar from GitHub	13	Experimental	—	JavaScript
34	Privacy-Engineering-CMU/ai-risk-prettified A prettified page for MIT's AI Risk Database	13	Experimental	—	HTML
35	chankeypathak/AuditSync-Pro Gen AI application that automatically compares and analyzes audit reports...	10	Experimental	1	Python
36	ericy-eth/debatr Debatr enables users to quickly generate quality debate speeches tailored to...	10	Experimental	2	JavaScript
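
The Tier column in the table above appears to follow from the Score column. The sketch below encodes one mapping that is consistent with every row listed. Only the "above 50" cutoff for Established is stated in the text. The cutoff of 30 between Emerging and Experimental is an assumption inferred from the rows (the lowest Emerging score shown is 30, the highest Experimental score is 28), and the function name `tier_for_score` is hypothetical.

```python
def tier_for_score(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Established: stated in the listing as "above 50".
    Emerging / Experimental: boundary at 30 is inferred from the table rows,
    not confirmed by the source.
    """
    if score > 50:
        return "Established"
    if score >= 30:  # assumed boundary
        return "Emerging"
    return "Experimental"
```

For example, `tier_for_score(52)` gives `"Established"` for betagouv/ComparIA and `tier_for_score(28)` gives `"Experimental"` for jhammant/agent-drift, matching the table. Scores that do not appear in the listing, such as 29 or exactly 50, fall wherever the assumed boundaries put them.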