LLM-Evaluation-s-Always-Fatiguing/leaf-playground
A framework for building scenario simulation projects in which both human and LLM-based agents can participate, with a user-friendly web UI for visualizing simulations and support for automatic evaluation at the agent-action level.
Evaluating how well large language models (LLMs) perform in specific scenarios or tasks can be time-consuming and complex. This framework helps you define complex scenarios where human and LLM agents interact, then automatically evaluates the LLMs' actions and visualizes the results. This is ideal for AI researchers, product managers, or developers who need to rigorously test and compare different LLM agent behaviors.
No commits in the last 6 months.
Use this if you need to set up realistic, interactive simulations to benchmark and understand the performance of LLM-based agents, with built-in visualization and automated evaluation.
Not ideal if you are looking for a simple, single-metric tool for evaluating basic LLM prompts rather than agent behavior in complex, multi-turn scenarios.
Stars: 27
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Jun 18, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/LLM-Evaluation-s-Always-Fatiguing/leaf-playground"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
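If you'd rather query the endpoint from Python than shell out to curl, here is a minimal sketch using the requests library. It only fetches and prints the raw JSON; since the response schema isn't documented here, no field names are assumed.

import requests

# Same endpoint as the curl command above; keyless access is
# rate-limited to 100 requests/day.
url = (
    "https://pt-edge.onrender.com/api/v1/quality/llm-tools/"
    "LLM-Evaluation-s-Always-Fatiguing/leaf-playground"
)

resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) early
print(resp.json())       # print the raw JSON payload as returned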
Higher-rated alternatives
mitdbg/palimpzest
A System for Optimized Semantic Computation
SamurAIGPT/GPT-Agent
🚀 Introducing 🐪 CAMEL: a game-changing role-playing approach for LLMs and auto-agents like...
bubbuild/republic
Build LLM workflows like normal Python while keeping a full audit trail by default.
lwcsrf/netflux
Minimalist framework for authoring custom agentic applications in python; emphasizes task...
dlMARiA/Syzygy-of-thoughts
Syzygy-of-thoughts