lechmazur/elimination_game
A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other.
This project helps AI researchers and developers evaluate how well different large language models (LLMs) handle complex social interactions, strategy, and deception. It takes various LLMs as input and simulates a multi-player 'elimination game' where they communicate, form alliances, and vote each other out. The output includes detailed analytics on conversation logs, voting patterns, and final rankings, revealing how models manage public personas versus hidden agendas.
Use this if you need to benchmark the social reasoning, strategic planning, and deceptive capabilities of LLMs in a dynamic, multi-agent environment.
Not ideal if you are looking to evaluate LLMs purely on factual recall, simple dialogue generation, or task-specific instruction following.
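To make the game loop concrete, here is a minimal Python sketch of one elimination round in the spirit of the description above: every surviving player casts a vote, and the player with the most votes is removed. This is an illustrative sketch only; the Player class and get_vote() stand-in are assumptions, not code or APIs from the repository, where a real agent would prompt an LLM with the public and private conversation history.

# Illustrative sketch only: not code from lechmazur/elimination_game.
# Player and get_vote() are hypothetical stand-ins; a real agent would
# prompt an LLM with the public and private conversation history.
import random
from collections import Counter
from dataclasses import dataclass

@dataclass
class Player:
    name: str

def get_vote(voter: Player, players: list[Player]) -> str:
    # Hypothetical: return the name of the player this agent votes to eliminate.
    return random.choice([p.name for p in players if p.name != voter.name])

def play_round(players: list[Player]) -> list[Player]:
    # Tally one vote per surviving player and remove the most-voted player.
    votes = Counter(get_vote(p, players) for p in players)
    eliminated, count = votes.most_common(1)[0]
    print(f"Eliminated: {eliminated} with {count} vote(s)")
    return [p for p in players if p.name != eliminated]

players = [Player(n) for n in ["model_a", "model_b", "model_c", "model_d"]]
while len(players) > 2:
    players = play_round(players)
print("Finalists:", [p.name for p in players])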
Stars: 302
Forks: 11
Language: —
License: —
Category:
Last pushed: Jan 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/lechmazur/elimination_game"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
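The same data can be fetched programmatically; the sketch below uses only the Python standard library and the endpoint URL shown above. The response schema is not documented here, so the code simply prints the top-level JSON keys rather than assuming any particular field names.

# Fetch the quality data for this repository from the public API (100 requests/day without a key).
import json
from urllib.request import urlopen

url = "https://pt-edge.onrender.com/api/v1/quality/agents/lechmazur/elimination_game"
with urlopen(url, timeout=10) as resp:
    data = json.load(resp)  # schema not documented here; inspect before relying on field names
print(list(data.keys()) if isinstance(data, dict) else data)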
Higher-rated alternatives
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
agentscope-ai/OpenJudge
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards