om-ai-lab/open-agent-leaderboard
Reproducible Language Agent Research
This project helps AI researchers and developers compare the performance of different language agents across various benchmarks and large language models (LLMs). It takes specific agent algorithms (like Chain-of-Thought or ReAct) and LLM choices as inputs, then outputs a clear, fair performance score on datasets like GSM8K or MATH-500. This tool is for anyone developing, evaluating, or selecting advanced language agents for problem-solving tasks.
No commits in the last 6 months.
Use this if you need to rigorously compare how different language agent algorithms and LLMs perform on common reasoning and mathematical tasks.
Not ideal if you are a general user looking for a ready-to-use application rather than a tool for agent research and evaluation.
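To make the comparison concrete, here is a minimal, hypothetical sketch of the kind of evaluation loop a leaderboard like this runs: an agent algorithm paired with an LLM answers benchmark questions, and accuracy is the reported score. The agent interface and dataset format below are illustrative assumptions, not this repository's actual API.

# Illustrative only: a generic agent-evaluation loop, not this repo's code.
def evaluate(agent, dataset):
    # agent: callable mapping a question string to a predicted answer string
    # dataset: list of (question, gold_answer) pairs, e.g. GSM8K test items
    correct = 0
    for question, gold in dataset:
        prediction = agent(question)
        correct += int(prediction.strip() == gold.strip())
    return correct / len(dataset)  # accuracy, the score the leaderboard reports

# Hypothetical usage: compare two agent algorithms on the same data.
# score_cot = evaluate(chain_of_thought_agent, gsm8k_test)
# score_react = evaluate(react_agent, gsm8k_test)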
Stars: 34
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Jun 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/om-ai-lab/open-agent-leaderboard"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
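The same endpoint can be called from Python. This is a minimal sketch using the requests library without an API key (the unauthenticated 100/day tier); the JSON field names in the print statement are assumptions based on the stats shown above, so inspect the actual response to confirm the schema.

# Minimal sketch: fetch this repo's quality data from the API above.
import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/om-ai-lab/open-agent-leaderboard")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()

# Hypothetical field names; check `data` for the real keys.
print(data.get("stars"), data.get("forks"), data.get("last_pushed"))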
Higher-rated alternatives
mitdbg/palimpzest
A System for Optimized Semantic Computation
SamurAIGPT/GPT-Agent
🚀 Introducing 🐪 CAMEL: a game-changing role-playing approach for LLMs and auto-agents like...
bubbuild/republic
Build LLM workflows like normal Python while keeping a full audit trail by default.
lwcsrf/netflux
Minimalist framework for authoring custom agentic applications in python; emphasizes task...
dlMARiA/Syzygy-of-thoughts
Syzygy-of-thoughts