ChenWu98/agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
This project helps developers evaluate the security and reliability of multimodal AI agents, especially those that interact with web environments. It takes a multimodal agent's code and web-based task data, then applies crafted adversarial image attacks to measure how easily the agent can be misled into failing its tasks. Researchers and developers building robust AI systems for web automation or general multimodal interaction would use it to stress-test their agents.
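To give a rough sense of what a crafted adversarial image attack involves in general, here is a minimal untargeted PGD sketch against a stand-in image classifier. It is only an illustration of the technique named above, not the repository's actual attack pipeline; the victim model (a torchvision ResNet), the loss, and all hyperparameters are assumptions chosen to keep the example small.

# Generic PGD sketch (illustration only; not the repository's attack).
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def pgd_attack(image, label, eps=8 / 255, alpha=2 / 255, steps=10):
    """Return an adversarial copy of `image` within an L-inf ball of radius eps."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(adv), label)
        grad = torch.autograd.grad(loss, adv)[0]
        # Ascend the loss, then project back into the eps-ball and valid pixel range.
        adv = adv.detach() + alpha * grad.sign()
        adv = image + torch.clamp(adv - image, -eps, eps)
        adv = adv.clamp(0, 1)
    return adv.detach()

# Example usage with a random tensor; a real evaluation would perturb an image
# the web agent actually observes (e.g. a screenshot or product photo).
x = torch.rand(1, 3, 224, 224)
y = torch.tensor([0])
x_adv = pgd_attack(x, y)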
132 stars. No commits in the last 6 months.
Use this if you are a developer or researcher building and evaluating multimodal AI agents and need to understand their vulnerabilities to visual adversarial attacks in web-based scenarios.
Not ideal if you are an end-user of an AI agent and are not involved in its development, testing, or security analysis.
Stars: 132
Forks: 9
Language: Python
License: MIT
Category:
Last pushed: Feb 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ChenWu98/agent-attack"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
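The same record can be fetched from any HTTP client; a minimal Python sketch follows. Only the URL and the rate limits come from this page; the use of the requests library and the assumption that the endpoint returns JSON are mine, and authenticated access with a key is not shown because the auth mechanism is not documented here.

# Fetch the quality record for this repository (keyless access: 100 requests/day).
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ChenWu98/agent-attack"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
record = resp.json()  # assumption: the endpoint returns JSON
print(record)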
Higher-rated alternatives
PacificAI/langtest
Deliver safe & effective language models
microsoft/OpenRCA
[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
Babelscape/ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language...
TrustGen/TrustEval-toolkit
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative...
Trust4AI/ASTRAL
Automated Safety Testing of Large Language Models