lennart-finke/baba_is_eval
Claude et al. play the brilliant puzzle title "Baba Is You"
This project evaluates how well language models reason and strategize in interactive environments by having them play the puzzle game "Baba Is You." It takes text commands from a language model, translates them into game actions, and returns the current game state as a text matrix. It is aimed at researchers and developers assessing advanced AI capabilities, particularly meta-level reasoning in complex, rule-bending scenarios.
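The command-to-action loop described above can be illustrated with a small sketch. This is not the project's code: the command vocabulary (`up`, `down`, `left`, `right`), the grid symbols (`#` for wall, `.` for empty, `B` for Baba), and every function name are assumptions made for illustration, and the sketch ignores pushing and rule changes entirely.

```python
from typing import List, Tuple

# Hypothetical command vocabulary; the real project may use different names.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def parse_commands(text: str) -> List[str]:
    """Turn free-form model output into a list of known move names."""
    tokens = text.lower().replace(",", " ").split()
    return [t for t in tokens if t in MOVES]


def step(grid: List[str], pos: Tuple[int, int], move: str) -> Tuple[int, int]:
    """Apply one move; walls ('#') and the grid edge block movement."""
    dr, dc = MOVES[move]
    r, c = pos[0] + dr, pos[1] + dc
    inside = 0 <= r < len(grid) and 0 <= c < len(grid[r])
    if inside and grid[r][c] != "#":
        return (r, c)
    return pos


def render(grid: List[str], pos: Tuple[int, int]) -> str:
    """Return the state as a text matrix with 'B' marking the player."""
    rows = []
    for r, row in enumerate(grid):
        cells = list(row)
        if r == pos[0]:
            cells[pos[1]] = "B"
        rows.append("".join(cells))
    return "\n".join(rows)


def play(grid: List[str], pos: Tuple[int, int], model_output: str):
    """Parse the model's text, apply each move, and return the new state."""
    for move in parse_commands(model_output):
        pos = step(grid, pos, move)
    return pos, render(grid, pos)
```

For example, calling `play(["#####", "#...#", "#...#", "#####"], (1, 1), "right, right, down")` moves the player two cells right and one down, then returns the new position along with the rendered matrix that would be sent back to the model.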
No commits in the last 6 months.
Use this if you are a researcher or AI developer who wants to benchmark language models on their ability to understand and manipulate game rules in a complex, dynamic puzzle environment.
Not ideal if you are looking for a stable, ready-to-use tool for general game playing, or if you lack the programming experience needed to set up and integrate language models.
Stars: 52
Forks: 6
Language: Python
License: not listed
Category: not listed
Last pushed: Jun 30, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mcp/lennart-finke/baba_is_eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
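The same request can be made from Python using only the standard library. The sketch below assumes that the `owner/repo` path pattern in the curl example applies to other repositories as well, and it returns the raw response body as text because the listing does not state the response format. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/mcp"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (path pattern is an assumption)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the body as text.

    The response format is not documented in this listing, so the body is
    decoded as UTF-8 (an assumption) and returned without parsing.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("lennart-finke", "baba_is_eval")` issues the same GET request as the curl command. With the keyless tier limited to 100 requests per day, it is worth caching results rather than fetching on every run.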
Higher-rated alternatives
deus-h/claudeus-wp-mcp
Claudeus WordPress MCP Server
minipuft/claude-prompts
MCP prompt template server: hot-reload, thinking frameworks, quality gates
mkXultra/ai-cli-mcp
MCP server to run Claude, Codex, and Gemini CLI agents in the background from any MCP client.
IMNMV/ClaudeR
Connect RStudio to Claude, Codex, Gemini, and other AI assistants via MCP. Multi-agent...
milisp/awesome-claude-dxt
Awesome Claude Desktop Extensions (dxt) (not only Claude) mcpb