lennart-finke/baba_is_eval

Claude et al. play the brilliant puzzle title "Baba Is You"

29 / 100
Experimental

This project helps evaluate how well language models can reason and strategize in interactive environments by having them play the puzzle game "Baba Is You." It takes text commands from a language model, translates them into game actions, and provides the current game state back as a text matrix. This is for researchers or developers focused on assessing advanced AI capabilities, particularly meta-level reasoning in complex, rule-bending scenarios.
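The command-in, text-matrix-out loop described above can be pictured with a minimal sketch. Everything here is hypothetical and not taken from the project: the `MOVES` vocabulary, the `parse_commands` and `render_state` helpers, and the grid representation are assumptions made for illustration only.

```python
from typing import List, Optional

# Hypothetical command vocabulary; the project's real command set may differ.
MOVES = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def parse_commands(text: str) -> List[str]:
    """Keep only the recognised movement words from a model's reply."""
    tokens = text.lower().replace(",", " ").split()
    return [t for t in tokens if t in MOVES]


def render_state(grid: List[List[Optional[str]]]) -> str:
    """Render a grid of object names as a text matrix ('.' marks an empty cell)."""
    # Pad every cell to the widest label so the columns line up.
    width = max(len(cell or ".") for row in grid for cell in row)
    return "\n".join(
        " ".join((cell or ".").ljust(width) for cell in row).rstrip()
        for row in grid
    )


if __name__ == "__main__":
    print(parse_commands("Right, right, UP then wait"))
    print(render_state([["baba", None], [None, "flag"]]))
```

Filtering the reply down to known words is one possible design choice: it tolerates chatty model output at the cost of silently dropping anything unrecognised.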

No commits in the last 6 months.

Use this if you are a researcher or AI developer who wants to benchmark language models on their ability to understand and manipulate game rules in a complex, dynamic puzzle environment.

Not ideal if you are looking for a stable, ready-to-use tool for general game playing, or if you lack the programming experience needed to set up and integrate language models.

AI evaluation · language model reasoning · interactive AI · game AI · cognitive science
No License · Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 7 / 25
Community 12 / 25


Stars: 52
Forks: 6
Language: Python
License: none
Last pushed: Jun 30, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mcp/lennart-finke/baba_is_eval"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
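The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the URL pattern is copied from the curl command above, while the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed (not confirmed by this page) to be JSON. How an API key would be supplied is not documented here, so the sketch makes an unauthenticated request.

```python
import json
import urllib.request

# Base path taken from the curl example; treating "<owner>/<repo>" as the
# trailing path segments is an assumption generalised from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/mcp"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data; assumes the endpoint returns a JSON body."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request, counted against the daily limit.
    print(fetch_quality("lennart-finke", "baba_is_eval"))
```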