modelscope/MCPBench

An evaluation benchmark for MCP servers

Score: 39 / 100 (Emerging)

This tool helps developers and researchers evaluate the performance of Model Context Protocol (MCP) servers, such as those used for web search or database queries. You supply configuration details for the MCP servers you want to test, and the framework outputs metrics such as task-completion accuracy, latency, and token consumption. It is designed for AI practitioners who build or use LLM-powered applications and need to compare server effectiveness.
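As a rough illustration of what those configuration details might look like, here is a minimal sketch expressed as a Python dict. This is purely illustrative: the actual MCPBench config schema is defined in the repository and may differ, and every field name below (servers, name, command, args, tasks, metrics) is an assumption, not the tool's real format.

# Purely illustrative sketch of a benchmark configuration for MCP servers.
# The real MCPBench schema lives in the repository; all field names here
# are assumptions made for illustration only.
example_config = {
    "servers": [
        {
            "name": "web-search-server",   # hypothetical server under test
            "command": "python",           # how the server would be launched
            "args": ["search_server.py"],  # hypothetical entry point
        },
    ],
    "tasks": ["web_search"],               # task suite to benchmark against
    # Metric names taken from the description above.
    "metrics": ["accuracy", "latency", "token_consumption"],
}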

241 stars. No commits in the last 6 months.

Use this if you are an AI developer or researcher who needs to benchmark and compare the performance of MCP servers for tasks like web search or database querying.

Not ideal if you are an end user looking for a ready-made LLM agent; this tool evaluates the underlying servers rather than providing agents to use directly.

Tags: LLM evaluation, AI agent benchmarking, natural language processing, web search, engineering, database query optimization
Badges: Stale (6 months), No Package, No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 11 / 25
(These four component scores sum to the overall 39 / 100.)

Stars: 241
Forks: 15
Language: Python
License: Apache-2.0
Last pushed: Sep 03, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mcp/modelscope/MCPBench"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
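The same request can be made from Python. Below is a minimal sketch using only the standard library, assuming the endpoint returns JSON; the response's field names are not documented here, so the script simply pretty-prints whatever comes back.

import json
import urllib.request

# Endpoint from the curl example above; no API key needed up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/mcp/modelscope/MCPBench"

with urllib.request.urlopen(URL, timeout=30) as resp:
    data = json.load(resp)  # assumes a JSON body, per the API description above

# Pretty-print the quality data (stars, forks, score breakdown, etc.).
print(json.dumps(data, indent=2))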