superagent-ai/poker-eval
A comprehensive tool for assessing AI agent performance in simulated poker environments
This tool evaluates how different AI models or agents perform in simulated No-Limit Texas Hold'em games. You provide the agents you want to test, and the system simulates thousands of hands, producing detailed performance data such as profit per hand. It's designed for researchers and developers who need to benchmark AI decision-making objectively in complex, uncertain environments.
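To make the profit-per-hand metric concrete, here is a minimal TypeScript sketch of how per-hand results can be aggregated into a mean and a standard error. It is illustrative only: the summarizeProfits function and its handProfits input are hypothetical, not part of poker-eval's actual API.

// Aggregates simulated per-hand profits into summary statistics.
// Hypothetical helper for illustration; not poker-eval's API.
interface ProfitSummary {
  hands: number;
  meanProfitPerHand: number; // average chips won (or lost) per hand
  standardError: number;     // uncertainty of that average
}

function summarizeProfits(handProfits: number[]): ProfitSummary {
  const hands = handProfits.length;
  const mean = handProfits.reduce((sum, p) => sum + p, 0) / hands;
  // Sample variance; requires at least two hands.
  const variance =
    handProfits.reduce((sum, p) => sum + (p - mean) ** 2, 0) / (hands - 1);
  return {
    hands,
    meanProfitPerHand: mean,
    standardError: Math.sqrt(variance / hands),
  };
}

// Example: profits from five simulated hands (positive = agent won chips).
console.log(summarizeProfits([120, -40, 0, -80, 60]));

Because per-hand variance in No-Limit Hold'em is large, the standard error shrinks slowly, which is why a meaningful benchmark needs thousands of simulated hands.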
No commits in the last 6 months. Available on npm.
Use this if you are developing or comparing AI agents and need a standardized, robust way to measure their strategic performance in poker.
Not ideal if you are looking for a poker game simulator for human players or a tool to analyze human poker strategies.
Stars: 21
Forks: 4
Language: TypeScript
License: —
Category: —
Last pushed: Nov 27, 2024
Commits (30d): 0
Dependencies: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/superagent-ai/poker-eval"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
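If you prefer to consume the endpoint from code rather than curl, the sketch below shows the equivalent request in TypeScript. The endpoint returns JSON, but the payload's field names are not documented here, so treat the shape as an assumption and inspect it before depending on it.

// Fetch the same quality data from Node.js 18+ or a browser.
const url =
  "https://pt-edge.onrender.com/api/v1/quality/llm-tools/superagent-ai/poker-eval";

async function fetchToolData(): Promise<unknown> {
  const res = await fetch(url);
  if (!res.ok) {
    // A 429 here likely means the 100 requests/day anonymous limit was hit.
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.json(); // JSON payload; exact shape is undocumented here
}

fetchToolData().then((data) => console.log(data));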
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents