jordan-gibbs/secret-hitler-bench

An LLM benchmark based on the popular social deception game Secret Hitler. It tests the intelligence, long-context planning, logic, and deceptive capabilities of popular AI models.

Quality score: 26 / 100 (Experimental)

This project simulates full 8-player games of Secret Hitler with AI agents driven by large language models. You can test how well different AI models perform at deception, strategic thinking, and social deduction. It takes your chosen language models as input and outputs detailed game logs, win rates, and player statistics, all viewable in a live web interface. It is aimed at researchers, AI evaluators, and anyone interested in the social intelligence of AI.
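Under the standard Secret Hitler rules, an 8-player game has 5 Liberals, 2 Fascists, and Hitler, who plays on the Fascist team. A minimal sketch of that deal is shown here; `deal_roles` and `ROLES_8P` are hypothetical names for illustration, not code taken from the repository.

```python
from __future__ import annotations

import random

# Standard Secret Hitler role counts for an 8-player game:
# 5 Liberals, 2 Fascists and Hitler (Hitler is on the Fascist team).
ROLES_8P = ["Liberal"] * 5 + ["Fascist"] * 2 + ["Hitler"]


def deal_roles(players: list[str], seed: int | None = None) -> dict[str, str]:
    """Hypothetical helper: randomly give each of the 8 players one role."""
    if len(players) != len(ROLES_8P):
        raise ValueError(f"expected {len(ROLES_8P)} players, got {len(players)}")
    rng = random.Random(seed)  # a fixed seed reproduces the same deal
    roles = ROLES_8P.copy()
    rng.shuffle(roles)
    return dict(zip(players, roles))


if __name__ == "__main__":
    seats = [f"model_{i}" for i in range(8)]  # placeholder player names
    for name, role in deal_roles(seats, seed=42).items():
        print(name, role)
```

Seeding the shuffle can make a matchup reproducible, which may help when comparing models under identical role assignments.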

Use this if you want to rigorously evaluate the lying, strategic planning, and social interaction capabilities of various large language models in a complex game setting.
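Win rates are one of the benchmark's headline outputs. A minimal sketch of how they could be aggregated, assuming a hypothetical per-player record (`PlayerResult`) holding the model name, team, and outcome; the repository's actual log format may differ. Splitting by team keeps Liberal play (mostly deduction) separate from Fascist play (mostly deception).

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PlayerResult:
    """Hypothetical record of one player's outcome in one finished game."""
    model: str  # the LLM that controlled this seat
    team: str   # "Liberal" or "Fascist" (Hitler counts as Fascist)
    won: bool


def win_rates(results: list[PlayerResult]) -> dict[tuple[str, str], float]:
    """Fraction of games won for each (model, team) pair."""
    wins: dict[tuple[str, str], int] = defaultdict(int)
    games: dict[tuple[str, str], int] = defaultdict(int)
    for r in results:
        key = (r.model, r.team)
        games[key] += 1
        wins[key] += int(r.won)
    return {key: wins[key] / games[key] for key in games}


if __name__ == "__main__":
    sample = [  # placeholder data, not real benchmark results
        PlayerResult("model-a", "Liberal", True),
        PlayerResult("model-a", "Liberal", False),
        PlayerResult("model-b", "Fascist", True),
    ]
    print(win_rates(sample))
    # {('model-a', 'Liberal'): 0.5, ('model-b', 'Fascist'): 1.0}
```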

Not ideal if you are looking for a free simulation, as running LLM-powered games can be very expensive.
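A back-of-the-envelope estimate shows how the cost scales: games × LLM calls per game × token cost per call. `estimate_cost_usd` and every number in the example are hypothetical placeholders, not measurements from this benchmark or real price quotes.

```python
def estimate_cost_usd(
    games: int,
    calls_per_game: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Rough total API cost, assuming a fixed average token count per call."""
    tokens_cost_per_call = (
        input_tokens_per_call * usd_per_million_input
        + output_tokens_per_call * usd_per_million_output
    )
    return games * calls_per_game * tokens_cost_per_call / 1_000_000


if __name__ == "__main__":
    # Placeholder inputs only: 100 games, 200 calls each, 10k in / 500 out tokens
    print(estimate_cost_usd(100, 200, 10_000, 500, 3.0, 15.0))  # 750.0
```

In a long game the prompt likely grows as the transcript accumulates, so a fixed per-call average is only a rough approximation.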

Tags: AI evaluation, social intelligence, deception research, game theory, LLM testing
No package, no dependents
Maintenance 13 / 25
Adoption 4 / 25
Maturity 9 / 25
Community 0 / 25
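The four category scores, each out of 25, appear to sum to the overall score. This is inferred from the numbers on this page rather than from a published formula:

```python
# Sub-scores as listed on this page; each category is out of 25.
SUBSCORES = {"Maintenance": 13, "Adoption": 4, "Maturity": 9, "Community": 0}

total = sum(SUBSCORES.values())
maximum = 25 * len(SUBSCORES)
print(f"{total} / {maximum}")  # 26 / 100
```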


Stars: 8
Forks: not listed
Language: Python
License: not listed
Last pushed: Mar 23, 2026
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jordan-gibbs/secret-hitler-bench"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
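The same request can be made from Python with only the standard library. This is a sketch that assumes the endpoint returns JSON and that other repositories follow the same `/quality/<category>/<owner>/<repo>` path; `quality_url` and `fetch_quality` are hypothetical helpers. The page does not say how an API key is sent, so the sketch makes anonymous requests only.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "llm-tools") -> str:
    # Assumption: the /<category>/<owner>/<repo> pattern generalizes
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, category: str = "llm-tools") -> dict:
    """Anonymous request (100/day limit); assumes a JSON object response."""
    with urllib.request.urlopen(quality_url(owner, repo, category), timeout=30) as resp:
        return json.load(resp)
```

For example, `fetch_quality("jordan-gibbs", "secret-hitler-bench")` requests the same URL as the curl command.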