khoj-ai/llm-coup
Let LLMs play Coup with each other and see which is best at deception & strategy
This project helps AI researchers and developers evaluate how well different large language models (LLMs) perform in situations requiring deception and complex strategy. By simulating games of Coup between various LLMs, it provides insights into their ability to bluff, strategize, and adapt. You input your chosen LLMs and desired game parameters, and it outputs game logs, results, and performance metrics for each model.
No commits in the last 6 months.
Use this if you need to systematically test and compare the strategic thinking and deceptive capabilities of different large language models in a controlled, game-theory-driven environment.
Not ideal if you're looking to play a game of Coup against an LLM, or if your primary interest is in evaluating LLMs on tasks unrelated to strategic interaction and deception.
Stars: 8
Forks: 4
Language: TypeScript
License: GPL-3.0
Category:
Last pushed: Aug 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/khoj-ai/llm-coup"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
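If you prefer to call the endpoint from code rather than `curl`, a minimal Python sketch is below. It only assumes what the page states: the base URL `https://pt-edge.onrender.com/api/v1/quality` followed by a category and `owner/repo` path; the assumption that the response body is JSON is mine, not documented here.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner_repo: str) -> str:
    """Build the endpoint URL for a repo, e.g. 'khoj-ai/llm-coup'."""
    return f"{API_BASE}/{category}/{owner_repo}"

def fetch_quality(category: str, owner_repo: str) -> dict:
    """Fetch the quality record, assuming a JSON response body.

    No key is needed for up to 100 requests/day.
    """
    with urllib.request.urlopen(quality_url(category, owner_repo)) as resp:
        return json.load(resp)

# Builds the same endpoint as the curl command above.
print(quality_url("ml-frameworks", "khoj-ai/llm-coup"))
```

The URL builder is separated from the fetch so you can reuse it with any HTTP client; swap in an `Authorization` header (or whatever key mechanism the service documents) once you have a free key.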
Higher-rated alternatives
Cloud-CV/EvalAI
:cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating state of the art in AI
fireindark707/Python-Schema-Matching
A python tool using XGboost and sentence-transformers to perform schema matching task on tables.
graphbookai/graphbook
Visual AI development framework for training and inference of ML models, scaling pipelines, and...
visual-layer/fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and...
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.