zjunlp/WorfBench
[ICLR 2025] Benchmarking Agentic Workflow Generation
This project helps AI researchers and developers evaluate how well Large Language Models (LLMs) can break down complex problems into structured workflows. It takes a problem description and the workflow generated by an LLM, then assesses the accuracy and efficiency of that generated workflow. It is designed for those who are developing, benchmarking, or researching the capabilities of AI agents and LLMs for planning and reasoning tasks.
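To make the evaluation idea concrete, here is a minimal, illustrative sketch (not WorfBench's actual metric) of one common way such a benchmark can score a generated workflow: represent both the model's workflow and a gold reference as sets of (step, next_step) edges and compute their F1 overlap. The example workflows below are hypothetical.

```python
def f1(predicted: set, gold: set) -> float:
    """F1 overlap between two sets of workflow elements (nodes or edges)."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)           # elements the model got right
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical workflows for a travel-booking task, as (step, next_step) edges.
gold_edges = {("search flights", "compare prices"),
              ("compare prices", "book ticket")}
pred_edges = {("search flights", "compare prices"),
              ("compare prices", "checkout")}

score = f1(pred_edges, gold_edges)  # one of two edges matches -> 0.5
```

A node-level score can be computed the same way by passing sets of step names instead of edges.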
145 stars. No commits in the last 6 months.
Use this if you need to systematically benchmark and compare the workflow generation capabilities of different Large Language Models or AI agents.
Not ideal if you are looking for an off-the-shelf tool to directly generate workflows for business processes without requiring AI research or development expertise.
Stars
145
Forks
8
Language
Python
License
MIT
Category
Last pushed
Feb 19, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/zjunlp/WorfBench"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
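The same request can be made from Python with only the standard library. This is a minimal sketch based on the curl example above; since the response schema is not documented here, it just prints the raw JSON payload.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(fetch_quality("zjunlp", "WorfBench"), indent=2))
```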
Higher-rated alternatives
openai/openai-agents-python
A lightweight, powerful framework for multi-agent workflows
openagents-org/openagents
OpenAgents - AI Agent Networks for Open Collaboration
vamplabAI/sgr-agent-core
Schema-Guided Reasoning (SGR), an agentic system design created by the neuraldeep community
BrainBlend-AI/atomic-agents
Building AI agents, atomically
camel-ai/camel
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents....