sgoedecke/ai-poker-arena

Making multiple LLMs play Texas Hold'em against each other

Score: 20 / 100 · Experimental

This tool helps AI researchers and developers evaluate the strategic capabilities of different large language models (LLMs) by having them play Texas Hold'em poker against each other. You supply a set of LLMs, and the system simulates poker games between them, providing insight into which models demonstrate superior strategic decision-making in an adversarial environment. This is ideal for anyone who needs a novel way to benchmark AI model performance beyond traditional static benchmarks or human preference voting.

No commits in the last 6 months.

Use this if you are an AI researcher or developer looking for an objective, adversarial method to compare the strategic reasoning and decision-making of different LLMs.

Not ideal if you need to evaluate LLMs for tasks that don't involve adversarial strategy, like creative writing, summarization, or factual question-answering.

AI-evaluation LLM-benchmarking Adversarial-AI Model-comparison Strategic-AI
No License · Stale (6 mo) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 7 / 25

How are scores calculated?

Stars

11

Forks

1

Language

JavaScript

License

None
Last pushed

Feb 03, 2025

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/sgoedecke/ai-poker-arena"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
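The same endpoint can be called programmatically. Below is a minimal Python sketch using only the standard library; it assumes the endpoint returns JSON (the field names of the response are not documented here, so the example just pretty-prints whatever comes back).

```python
# Minimal sketch of calling the quality API from Python (stdlib only).
# Assumption: the endpoint returns a JSON document; inspect the actual
# response to learn its field names before relying on any of them.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"


def build_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality report for one repository."""
    with urllib.request.urlopen(build_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    report = fetch_quality("sgoedecke", "ai-poker-arena")
    print(json.dumps(report, indent=2))
```

The request is unauthenticated, matching the "no key needed" tier; a free key (per the note above) raises the daily limit.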