lechmazur/pgg_bench
Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies among Large Language Models (LLMs) in a resource-sharing economic scenario. Our experiment extends the classic PGG with a punishment phase, allowing players to penalize free-riders or retaliate against others.
This project helps researchers and strategists understand how different LLMs behave in economic scenarios involving resource sharing and consequences. It simulates a Public Goods Game in which LLMs decide how many tokens to contribute to a common pool, which is then multiplied and redistributed. Afterward, they can punish one another, letting users observe strategies such as cooperation, free-riding, and retaliation among AI agents.
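To make the round structure concrete, here is a minimal sketch of one contribute-and-punish round in Python. The endowment, pool multiplier, and punishment cost/impact ratio are illustrative assumptions, not values taken from pgg_bench, and the function names are hypothetical.

# One PGG round with a punishment phase (illustrative values, not from pgg_bench).
from typing import Dict, List

ENDOWMENT = 20          # tokens each player starts the round with (assumed)
MULTIPLIER = 1.6        # common-pool multiplier (assumed)
PUNISH_COST = 1         # tokens the punisher spends per punishment point (assumed)
PUNISH_IMPACT = 3       # tokens removed from the target per point (assumed)

def play_round(contributions: List[float],
               punishments: List[Dict[int, float]]) -> List[float]:
    """Return each player's payoff after the contribution and punishment phases.

    contributions[i]  -- tokens player i puts into the common pool
    punishments[i][j] -- punishment points player i assigns to player j
    """
    n = len(contributions)
    pool = sum(contributions) * MULTIPLIER
    share = pool / n

    # Contribution phase: keep whatever was not contributed, plus an equal share of the pool.
    payoffs = [ENDOWMENT - c + share for c in contributions]

    # Punishment phase: punishing is costly for the punisher and costlier for the target,
    # so it can be aimed at free-riders or used for retaliation.
    for i, targets in enumerate(punishments):
        for j, points in targets.items():
            payoffs[i] -= points * PUNISH_COST
            payoffs[j] -= points * PUNISH_IMPACT

    return payoffs

# Example: player 2 free-rides; player 0 punishes them in the second phase.
print(play_round([10, 10, 0], [{2: 2}, {}, {}]))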
No commits in the last 6 months.
Use this if you are an AI researcher or strategist needing to evaluate the cooperative and self-interested behaviors of various LLMs in dynamic, resource-sharing environments, especially where punishment mechanisms exist.
Not ideal if you need to build or optimize a specific LLM application for a non-economic task, or if you are looking for a general-purpose LLM benchmarking tool unrelated to multi-agent social dynamics.
Stars: 39
Forks: 2
Language: —
License: —
Category:
Last pushed: Apr 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/lechmazur/pgg_bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
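For scripted access, the same endpoint can be called from Python. This is a minimal sketch assuming the endpoint returns JSON; the response schema is not documented here, so the payload is simply printed.

# Fetch this repo's entry from the public endpoint shown above (no key, within the free rate limit).
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/agents/lechmazur/pgg_bench"
resp = requests.get(url, timeout=10)
resp.raise_for_status()   # fail loudly on HTTP errors
print(resp.json())        # response structure is assumed to be JSON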
Featured in
Higher-rated alternatives
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
agentscope-ai/OpenJudge
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards