ag2ai/Agents_Failure_Attribution
Benchmark for automated failure attribution in agentic systems (🏆 ICML 2025 Spotlight)
This project helps developers of multi-agent systems quickly pinpoint why their AI agents fail. Given the log of a failed multi-agent task, it identifies which specific agent and which step caused the failure, and produces a natural-language explanation. It is aimed at AI system developers, researchers, and engineers working with LLM-based multi-agent architectures.
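The sketch below illustrates that input/output contract. It is a hypothetical stand-in, not the repository's actual API: the function name attribute_failure, the JSON log format, and every field name are illustrative assumptions; consult the repo's README for the real entry point.

import json

def attribute_failure(log_path: str) -> dict:
    """Hypothetical stand-in for the attribution step: scan a
    failed-task log and blame the first step that reports an error."""
    with open(log_path) as f:
        steps = json.load(f)  # assumed format: a list of per-step records
    for i, step in enumerate(steps):
        if step.get("status") == "error":
            return {
                "failing_agent": step.get("agent"),
                "failing_step": i,
                "explanation": step.get("message", "no message recorded"),
            }
    return {"failing_agent": None, "failing_step": None,
            "explanation": "no failing step found"}

# Example: a two-step log where the second agent errors out.
example_log = [
    {"agent": "planner", "status": "ok", "message": "plan produced"},
    {"agent": "coder", "status": "error", "message": "tool call returned 404"},
]
with open("failed_task_log.json", "w") as f:
    json.dump(example_log, f)

print(attribute_failure("failed_task_log.json"))
# -> {'failing_agent': 'coder', 'failing_step': 1, 'explanation': 'tool call returned 404'}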
Use this if you are building or debugging complex multi-agent AI systems and want to automate root-cause identification for task failures.
Not ideal if you are not working with multi-agent systems or are primarily debugging single-agent AI models; the tool is designed specifically for inter-agent failure attribution.
Stars: 349
Forks: 23
Language: Python
License: MIT
Category:
Last pushed: Feb 11, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/ag2ai/Agents_Failure_Attribution"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
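The same endpoint can also be called programmatically. A minimal Python sketch using the requests library, assuming the endpoint returns JSON; the response schema is not documented here, so inspect the payload before relying on specific fields:

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/agents/"
       "ag2ai/Agents_Failure_Attribution")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) early
data = resp.json()       # assumption: the endpoint returns a JSON body
print(data)              # dump the raw payload to discover the schema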
Higher-rated alternatives
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
RouteWorks/RouterArena
RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics,...