justinwetch/SkillEval
A visual workbench for A/B testing AI skills. Upload two skill files, run them through a batch of test prompts, and let an AI judge score the results.
Aimed at prompt engineers and researchers who need to compare two AI 'skills' (prompt instruction files). You upload both skill files plus a batch of test prompts; the system runs each skill against every prompt, and an AI judge scores the outputs, so you can determine objectively which skill performs better for a given task.
Use this if you are developing or refining AI prompts (skills) and need a data-driven way to compare the performance of two different approaches.
Not ideal if you need to evaluate more than two skills at once, or if you require human judges for evaluation.
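The workflow above can be sketched as a simple evaluation loop. This is a minimal illustration, not SkillEval's actual code: `runSkill` and `judgeScore` are hypothetical deterministic stubs standing in for the LLM calls the real tool would make.

```javascript
// Two competing "skills" (prompt instructions) and a batch of test prompts.
const skillA = "Answer tersely.";
const skillB = "Answer with step-by-step reasoning.";
const prompts = ["What is 2+2?", "Summarize this text.", "Name three primes."];

// Stand-in for running a skill through an LLM: just combines skill and prompt.
function runSkill(skill, prompt) {
  return `[${skill}] ${prompt}`;
}

// Stand-in for the AI judge: an arbitrary deterministic score (0-9).
function judgeScore(output) {
  return output.length % 10;
}

// Run both skills over every prompt, total the judge's scores, pick a winner.
function abTest(a, b, testPrompts) {
  let scoreA = 0;
  let scoreB = 0;
  for (const p of testPrompts) {
    scoreA += judgeScore(runSkill(a, p));
    scoreB += judgeScore(runSkill(b, p));
  }
  return { scoreA, scoreB, winner: scoreA >= scoreB ? "A" : "B" };
}

console.log(abTest(skillA, skillB, prompts));
```

In the real tool the judge is itself an LLM prompted with a scoring rubric, which is what makes the comparison repeatable across large prompt batches.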
Stars
21
Forks
1
Language
JavaScript
License
MIT
Category
Last pushed
Mar 12, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/justinwetch/SkillEval"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
memodb-io/Acontext
Agent Skills as a Memory Layer
powroom/flins
Universal skill installer for AI coding agents
supabase/agent-skills
Agent Skills to help developers using AI agents with Supabase
DougTrajano/pydantic-ai-skills
This package implements Agent Skills (https://agentskills.io) support with progressive...
forefy/.context
AI Agent Skills for Smart Contract Auditing to generate triaged, industry grade report findings,...