strands-agents/evals

A comprehensive evaluation framework for AI agents and LLM applications.

Score: 53 / 100 (Established)

This framework helps AI developers and ML engineers assess the performance of their AI agents and large language model (LLM) applications. It takes in test cases (inputs and expected outcomes) along with the agent's responses, then produces detailed reports on how well the agent performs against predefined criteria and simulated real-world scenarios. It is intended for anyone building or improving AI systems who needs to ensure those systems are accurate, reliable, and helpful.

Use this if you are developing AI agents or LLM applications and need a systematic way to measure their output quality, analyze their decision-making process, or simulate user interactions to identify areas for improvement.

Not ideal if you are a business user looking for a no-code solution to evaluate existing off-the-shelf AI products, as this framework requires programming knowledge to set up and integrate.
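
As a rough sketch of the evaluation pattern described above (hypothetical names only; this is not the strands-agents/evals API), a test case pairs an input with an expected outcome, and a report is produced by comparing each agent response against its expected outcome:

from dataclasses import dataclass

# Hypothetical names for illustration only; not the strands-agents/evals API.

@dataclass
class TestCase:
    prompt: str    # input sent to the agent
    expected: str  # expected outcome for that input

def evaluate(cases, agent_responses):
    # Compare each agent response against the expected outcome and
    # build a simple pass/fail report.
    details = []
    for case, response in zip(cases, agent_responses):
        passed = case.expected.strip().lower() in response.strip().lower()
        details.append({"prompt": case.prompt, "expected": case.expected,
                        "actual": response, "passed": passed})
    accuracy = sum(d["passed"] for d in details) / len(details)
    return {"accuracy": accuracy, "details": details}

cases = [TestCase(prompt="What is 2 + 2?", expected="4")]
print(evaluate(cases, agent_responses=["The answer is 4."]))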

Tags: AI-development, LLM-evaluation, AI-testing, agent-performance, ML-operations
No package published · No dependents
Maintenance: 10 / 25
Adoption: 9 / 25
Maturity: 15 / 25
Community: 19 / 25

How are scores calculated? The four component scores above, each out of 25, sum to the overall score: 10 + 9 + 15 + 19 = 53 / 100.

Stars: 82
Forks: 21
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/strands-agents/evals"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
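
The same data can also be fetched from a script. A minimal Python sketch using the requests library, assuming the endpoint returns JSON (the response fields are not documented here, so it just prints the parsed body):

import requests  # third-party HTTP client (pip install requests)

# Fetch the quality report; no API key is needed for up to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/agents/strands-agents/evals"
response = requests.get(url, timeout=10)
response.raise_for_status()

report = response.json()  # assumes a JSON body, per the curl example above
print(report)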