StonyBrookNLP/appworld

🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agents (ACL'24 Best Resource Paper).

65 / 100 (Established)

This project creates a realistic, simulated digital world of common apps and the people who use them. It runs an agent's code against that world, then evaluates how well the agent completed complex tasks. It is aimed at AI researchers and developers who build and test autonomous AI agents.
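The run-then-evaluate loop described above can be pictured with a toy sketch. Everything here (`ToyNotesApp`, `ToyWorld`, `execute`, `evaluate`) is a hypothetical stand-in written for illustration and is not AppWorld's actual API; it only shows the general idea of executing agent-written code against simulated app state and then checking the resulting state.

```python
class ToyNotesApp:
    """Hypothetical in-memory app standing in for a simulated application."""

    def __init__(self):
        self.notes = []

    def add_note(self, text):
        self.notes.append(text)
        return len(self.notes) - 1  # index of the newly added note


class ToyWorld:
    """Hypothetical environment: runs agent code, then checks the final state."""

    def __init__(self, instruction, expected_notes):
        self.instruction = instruction
        self.expected_notes = expected_notes
        self.app = ToyNotesApp()

    def execute(self, agent_code):
        # Run agent-written code with the app exposed as `notes_app`.
        # Note: exec() is NOT a real sandbox; this is for illustration only.
        namespace = {"notes_app": self.app}
        try:
            exec(agent_code, namespace)
            return "ok"
        except Exception as error:
            return f"error: {error!r}"

    def evaluate(self):
        # State-based check: is the app in the state the task asked for?
        return self.app.notes == self.expected_notes


world = ToyWorld("Add a note saying 'buy milk'.", ["buy milk"])
result = world.execute("notes_app.add_note('buy milk')")
passed = world.evaluate()
print(result, passed)
```

Checking the final state rather than the exact code the agent wrote lets different solutions to the same task all count as correct, which is one plausible reason a benchmark of this kind would evaluate outcomes instead of transcripts.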

388 stars. Available on PyPI.

Use this if you need a high-fidelity, controllable environment to benchmark how well your AI agents can interact with software applications and perform coding-related tasks.

Not ideal if you are looking for a simple dataset for natural language understanding or a ready-to-deploy, end-user application.

AI agent development, large language model evaluation, function calling, interactive coding, software testing
Maintenance 10 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 20 / 25


Stars: 388
Forks: 59
Language: Python
License: Apache-2.0
Last pushed: Feb 17, 2026
Commits (30d): 0
Dependencies: 35

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/StonyBrookNLP/appworld"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
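The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the path layout (`category/owner/repo`) is inferred from the single URL above, the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, since its schema is not documented here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    # Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category, owner, repo, timeout=10.0):
    # Assumption: the endpoint returns a JSON body; no field names are assumed.
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("agents", "StonyBrookNLP", "appworld")
print(url)
```

Calling `fetch_quality("agents", "StonyBrookNLP", "appworld")` performs the network request and counts against the daily quota, so it is worth caching the result rather than calling it in a loop.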