camel-ai/crab
🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
CRAB helps AI researchers and developers evaluate how well multimodal language model agents perform across various simulated or real-world environments. You define tasks and agent actions, provide your agent, and the framework generates detailed performance metrics and evaluations. This is ideal for those who build and test advanced AI agents.
Use this if you need a flexible way to benchmark the capabilities of your multimodal language model agents in diverse and complex settings.
Not ideal if you are looking for a pre-built agent to solve a specific problem rather than a tool for agent evaluation.
Stars: 405
Forks: 56
Language: Python
License: —
Category: —
Last pushed: Mar 04, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/camel-ai/crab"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
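Beyond the one-off curl call, the same endpoint can be queried from a script. A minimal sketch in Python, assuming the endpoint returns JSON (the response schema is not documented here, so the payload is printed as-is); the `build_url` and `fetch_quality` helpers are illustrative names, not part of any official client:

```python
"""Sketch of calling the repo-quality API (stdlib only).

Assumptions: the endpoint returns JSON; its schema is unknown,
so the decoded payload is printed verbatim.
"""
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Construct the per-repository endpoint URL."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload for one repository."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Live network call; requires internet access and counts
    # against the 100 requests/day anonymous quota.
    print(json.dumps(fetch_quality("camel-ai", "crab"), indent=2))
```

Swapping in a different `owner`/`repo` pair queries any other listed repository against the same rate limits.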
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)