lechmazur/pgg_bench
Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies among Large Language Models (LLMs) in a resource-sharing economic scenario. Our experiment extends the classic PGG with a punishment phase, allowing players to penalize free-riders or retaliate against others.
This project helps researchers and strategists understand how different LLMs behave in economic scenarios involving resource sharing and consequences. It simulates a Public Goods Game in which LLMs decide how many tokens to contribute to a common pool, which is then multiplied and redistributed. Afterward, they can punish one another, letting users observe strategies such as cooperation, free-riding, and retaliation among AI agents.
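To make the round structure concrete, here is a minimal sketch of one contribute-and-punish round in Python. The endowment, pool multiplier, and punishment cost/impact ratio are illustrative assumptions, not values taken from pgg_bench, and the function names are hypothetical.

# One PGG round with a punishment phase (illustrative values, not from pgg_bench).
from typing import Dict, List

ENDOWMENT = 20          # tokens each player starts the round with (assumed)
MULTIPLIER = 1.6        # common-pool multiplier (assumed)
PUNISH_COST = 1         # tokens the punisher spends per punishment point (assumed)
PUNISH_IMPACT = 3       # tokens removed from the target per point (assumed)

def play_round(contributions: List[float],
               punishments: List[Dict[int, float]]) -> List[float]:
    """Return each player's payoff after the contribution and punishment phases.

    contributions[i]  -- tokens player i puts into the common pool
    punishments[i][j] -- punishment points player i assigns to player j
    """
    n = len(contributions)
    pool = sum(contributions) * MULTIPLIER
    share = pool / n

    # Contribution phase: keep whatever was not contributed, plus an equal share of the pool.
    payoffs = [ENDOWMENT - c + share for c in contributions]

    # Punishment phase: punishing is costly for the punisher and costlier for the target,
    # so it can be aimed at free-riders or used for retaliation.
    for i, targets in enumerate(punishments):
        for j, points in targets.items():
            payoffs[i] -= points * PUNISH_COST
            payoffs[j] -= points * PUNISH_IMPACT

    return payoffs

# Example: player 2 free-rides; player 0 punishes them in the second phase.
print(play_round([10, 10, 0], [{2: 2}, {}, {}]))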
No commits in the last 6 months.
Use this if you are an AI researcher or strategist needing to evaluate the cooperative and self-interested behaviors of various LLMs in dynamic, resource-sharing environments, especially where punishment mechanisms exist.
Not ideal if you need to build or optimize a specific LLM application for a non-economic task, or if you are looking for a general-purpose LLM benchmarking tool unrelated to multi-agent social dynamics.
Stars: 39
Forks: 2
Language: —
License: —
Category:
Last pushed: Apr 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/lechmazur/pgg_bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
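For scripted access, the same endpoint can be called from Python. This is a minimal sketch assuming the endpoint returns JSON; the response schema is not documented here, so the payload is simply printed.

# Fetch this repo's entry from the public endpoint shown above (no key, within the free rate limit).
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/agents/lechmazur/pgg_bench"
resp = requests.get(url, timeout=10)
resp.raise_for_status()   # fail loudly on HTTP errors
print(resp.json())        # response structure is assumed to be JSON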
Featured in
Higher-rated alternatives
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
agentscope-ai/OpenJudge
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards