LLM-Evaluation-s-Always-Fatiguing/leaf-playground

A framework for building scenario simulation projects in which both human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.

Score: 23 / 100 · Experimental

Evaluating how well large language models (LLMs) perform in specific scenarios or tasks can be time-consuming and complex. This framework helps you define complex scenarios where human and LLM agents interact, then automatically evaluates the LLMs' actions and visualizes the results. This is ideal for AI researchers, product managers, or developers who need to rigorously test and compare different LLM agent behaviors.

No commits in the last 6 months.

Use this if you need to set up realistic, interactive simulations to benchmark and understand the performance of LLM-based agents, with built-in visualization and automated evaluation.

Not ideal if you are looking for a simple, single-metric evaluation tool for basic LLM prompts, rather than agent behavior in complex, multi-turn scenarios.

LLM evaluation · AI agent simulation · conversational AI testing · LLM application development · human-AI interaction
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 27
Forks:
Language: Python
License: MIT
Last pushed: Jun 18, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/LLM-Evaluation-s-Always-Fatiguing/leaf-playground"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
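The same data can be fetched programmatically. Below is a minimal Python sketch using only the standard library; it assumes the endpoint returns JSON, and the field layout of the response is not documented here, so the snippet simply prints whatever comes back.

import json
import urllib.request

# Fetch the quality record for a repo from the API endpoint shown above.
# Assumption: the endpoint responds with a JSON document.
def fetch_quality(repo: str) -> dict:
    url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/" + repo
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_quality("LLM-Evaluation-s-Always-Fatiguing/leaf-playground")
    print(json.dumps(data, indent=2))  # inspect whatever fields the API returns

Within the free tier (100 requests/day) no authentication is needed; how an API key is supplied for the higher limit is not specified here, so that part is left out of the sketch.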