jordan-gibbs/secret-hitler-bench
An LLM benchmark based on the popular social deception game Secret Hitler. Test the intelligence, long-context planning, logical reasoning, and deception capabilities of popular AI models.
This project simulates full 8-player games of Secret Hitler with AI agents driven by large language models, letting you test how well different models perform at deception, strategic thinking, and social deduction. It takes your chosen language models as input and outputs detailed game logs, win rates, and player statistics, all viewable in a live web interface. It is aimed at researchers, AI evaluators, and anyone interested in the social intelligence of AI.
Use this if you want to rigorously evaluate the lying, strategic planning, and social interaction capabilities of various large language models in a complex game setting.
Not ideal if you are looking for a free simulation, as running LLM-powered games can be very expensive.
Stars
8
Forks
—
Language
Python
License
—
Category
—
Last pushed
Mar 23, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jordan-gibbs/secret-hitler-bench"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
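The same endpoint can also be queried from Python. Below is a minimal sketch using only the standard library; the JSON shape of the response is not documented here, so the code makes no assumptions about its fields and simply pretty-prints whatever comes back.

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(repo: str) -> str:
    """Build the quality-data URL for an 'owner/name' repo slug."""
    return f"{API_BASE}/{repo}"

if __name__ == "__main__":
    url = quality_url("jordan-gibbs/secret-hitler-bench")
    # No API key is needed for up to 100 requests/day.
    with urllib.request.urlopen(url) as resp:
        print(json.dumps(json.load(resp), indent=2))
```

With a free key, the limit rises to 1,000 requests/day; how the key is passed (header or query parameter) is not specified in this listing.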
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems