lennart-finke/baba_is_eval

Claude et al. play the brilliant puzzle title "Baba Is You"

Experimental

This project helps evaluate how well language models can reason and strategize in interactive environments by having them play the puzzle game "Baba Is You." It takes text commands from a language model, translates them into game actions, and provides the current game state back as a text matrix. This is for researchers or developers focused on assessing advanced AI capabilities, particularly meta-level reasoning in complex, rule-bending scenarios.
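The loop described above (text command in, game action out, text matrix back) can be illustrated with a minimal sketch. This is not the project's actual code: the names `MOVES`, `parse_commands`, `render_state`, and `step`, the grid encoding, and the simplified movement rule (no pushing, no rule rewriting) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from text commands to (row, col) deltas.
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def parse_commands(text: str) -> List[str]:
    """Extract known move words from a model's free-form reply."""
    tokens = text.lower().replace(",", " ").split()
    return [t for t in tokens if t in MOVES]


def render_state(grid: List[List[str]]) -> str:
    """Render the grid as a text matrix, using '.' for empty cells."""
    return "\n".join(" ".join(cell or "." for cell in row) for row in grid)


def step(grid: List[List[str]], pos: Tuple[int, int], command: str) -> Tuple[int, int]:
    """Move the player one cell if the target is inside the grid and empty.

    Simplification (assumed): real "Baba Is You" rules such as pushing
    objects or rewriting rules are not modelled here.
    """
    dr, dc = MOVES[command]
    r, c = pos
    nr, nc = r + dr, c + dc
    inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
    if inside and grid[nr][nc] == "":
        grid[r][c] = ""
        grid[nr][nc] = "baba"
        return (nr, nc)
    return pos


if __name__ == "__main__":
    grid = [["baba", "", ""], ["", "wall", "flag"]]
    pos = (0, 0)
    for command in parse_commands("right, down"):
        pos = step(grid, pos, command)
    print(render_state(grid))
```

In an evaluation harness of this shape, the string returned by `render_state` would be sent back to the language model as the next observation, and the model's reply would be fed through `parse_commands` again.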

No commits in the last 6 months.

Use this if you are a researcher or AI developer who wants to benchmark language models on their ability to understand and manipulate game rules in a complex, dynamic puzzle environment.

Not ideal if you are looking for a stable, ready-to-use tool for general game playing or if you do not have programming experience to set up and integrate language models.

AI evaluation, language model reasoning, interactive AI, game AI, cognitive science

No License, Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 7 / 25

Community 12 / 25

Language: Python

License: none

Higher-rated alternatives

deus-h/claudeus-wp-mcp

Claudeus WordPress MCP Server

minipuft/claude-prompts

MCP prompt template server: hot-reload, thinking frameworks, quality gates

mkXultra/ai-cli-mcp

MCP server to run Claude, Codex, and Gemini CLI agents in the background from any MCP client.

IMNMV/ClaudeR

Connect RStudio to Claude, Codex, Gemini, and other AI assistants via MCP. Multi-agent...

milisp/awesome-claude-dxt

Awesome Claude Desktop Extensions (dxt) (not only Claude) mcpb
