lechmazur/elimination_game
A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other.
This project helps AI researchers and developers evaluate how well different large language models (LLMs) handle complex social interactions, strategy, and deception. It takes various LLMs as input and simulates a multi-player 'elimination game' where they communicate, form alliances, and vote each other out. The output includes detailed analytics on conversation logs, voting patterns, and final rankings, revealing how models manage public personas versus hidden agendas.
Use this if you need to benchmark the social reasoning, strategic planning, and deceptive capabilities of LLMs in a dynamic, multi-agent environment.
Not ideal if you are looking to evaluate LLMs purely on factual recall, simple dialogue generation, or task-specific instruction following.
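To make the game loop concrete, here is a minimal Python sketch of one elimination round in the spirit of the description above: every surviving player casts a vote, and the player with the most votes is removed. This is an illustrative sketch only; the Player class and get_vote() stand-in are assumptions, not code or APIs from the repository, where a real agent would prompt an LLM with the public and private conversation history.

# Illustrative sketch only: not code from lechmazur/elimination_game.
# Player and get_vote() are hypothetical stand-ins; a real agent would
# prompt an LLM with the public and private conversation history.
import random
from collections import Counter
from dataclasses import dataclass

@dataclass
class Player:
    name: str

def get_vote(voter: Player, players: list[Player]) -> str:
    # Hypothetical: return the name of the player this agent votes to eliminate.
    return random.choice([p.name for p in players if p.name != voter.name])

def play_round(players: list[Player]) -> list[Player]:
    # Tally one vote per surviving player and remove the most-voted player.
    votes = Counter(get_vote(p, players) for p in players)
    eliminated, count = votes.most_common(1)[0]
    print(f"Eliminated: {eliminated} with {count} vote(s)")
    return [p for p in players if p.name != eliminated]

players = [Player(n) for n in ["model_a", "model_b", "model_c", "model_d"]]
while len(players) > 2:
    players = play_round(players)
print("Finalists:", [p.name for p in players])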
Stars: 302
Forks: 11
Language: —
License: —
Category:
Last pushed: Jan 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/lechmazur/elimination_game"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
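The same data can be fetched programmatically; the sketch below uses only the Python standard library and the endpoint URL shown above. The response schema is not documented here, so the code simply prints the top-level JSON keys rather than assuming any particular field names.

# Fetch the quality data for this repository from the public API (100 requests/day without a key).
import json
from urllib.request import urlopen

url = "https://pt-edge.onrender.com/api/v1/quality/agents/lechmazur/elimination_game"
with urlopen(url, timeout=10) as resp:
    data = json.load(resp)  # schema not documented here; inspect before relying on field names
print(list(data.keys()) if isinstance(data, dict) else data)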
Higher-rated alternatives
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
agentscope-ai/OpenJudge
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards