# Prompt Experimentation Platforms
Tools for systematic A/B testing, comparison, and evaluation of LLM prompts across multiple models and variants. Includes statistical analysis, cost/performance measurement, and playground environments for prompt optimization. Does NOT include prompt templates, prompt collections, general LLM evaluation frameworks, or prompt management without experimentation features.
There are 34 prompt experimentation platform tools tracked. The highest-rated is Mirascope/lilypad at 48/100 with 214 stars.
Fetch the tracked projects as JSON (the `limit` query parameter caps the number of results returned):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=prompt-experimentation-platforms&limit=20"
```
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
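For programmatic use, the endpoint URL can be composed from its query parameters rather than hand-edited. A minimal Python sketch, assuming only the base URL and parameters shown in the curl command above (the response schema is not documented here, so `fetch_projects` returns the decoded JSON as-is for callers to inspect):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def dataset_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Compose the quality-dataset endpoint URL with query parameters."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE}?{urlencode(params)}"

def fetch_projects(url: str):
    """Fetch and decode the JSON payload; the structure is an assumption,
    so inspect it before relying on specific fields."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)

url = dataset_url("prompt-engineering", "prompt-experimentation-platforms", limit=34)
print(url)
```

Setting `limit=34` retrieves every tracked project in this subcategory.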
| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | Mirascope/lilypad | Open-source versioning, tracing, and annotation tooling. | 48 | Emerging |
| 2 | Supervertaler/Supervertaler-Workbench | Open-source, AI-enhanced CAT tool with multi-LLM support, translation... | | Emerging |
| 3 | crjaensch/PromptoLab | A multi-platform app to serve as a prompts catalog, an LLM playground for... | | Emerging |
| 4 | parea-ai/parea-sdk-py | Python SDK for experimenting, testing, evaluating & monitoring LLM-powered... | | Emerging |
| 5 | geeknees/sentinel_rb | SentinelRb is an LLM-driven prompt inspector designed to automatically... | | Experimental |
| 6 | NeuroTinkerLab/synt-e-project | A Python tool to translate natural language requests into efficient,... | | Experimental |
| 7 | tmam-dev/tmam-python-sdk | An open-source LLM engineering platform featuring observability, metrics,... | | Experimental |
| 8 | jeong-se-hun/autotune-skill | Eval-first tuning skill for prompts, docs, skills, and code with guards,... | | Experimental |
| 9 | MukundaKatta/PromptLab | Prompt experimentation workspace: A/B testing prompt variants with... | | Experimental |
| 10 | dbhavery/promptlab | Prompt testing framework: pytest for LLM prompts. Define prompts as YAML,... | | Experimental |
| 11 | magifd2/log_analyzer | A Python-based CLI tool for analyzing large log files (JSONL) with Large... | | Experimental |
| 12 | Personaz1/prompt-qa-lab | Regression and evaluation toolkit for prompt and agent output quality | | Experimental |
| 13 | rldyourmnd/local-llm-prompt-optimizer | Offline prompt A/B testing, scoring & auto-tuning for local LLMs | | Experimental |
| 14 | martinklepsch/llm-web-ui | A web UI for the `llm` command line tool | | Experimental |
| 15 | prompt-foundry/python-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Python | | Experimental |
| 16 | artefactop/promptdev | A prompt evaluation framework that provides comprehensive testing for AI... | | Experimental |
| 17 | mangobanaani/semantic-ui | Minimal web interface for Large Language Models using Semantic Kernel | | Experimental |
| 18 | prompt-foundry/java-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Java. | | Experimental |
| 19 | prompt-foundry/ruby-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Ruby. | | Experimental |
| 20 | Shawn91/promtrix | An intuitive GUI for evaluating and optimizing prompts and LLMs | | Experimental |
| 21 | dakshjain-1616/promptfight | Minimal prompt A/B testing: run two prompts 30 times, get winner + p-value +... | | Experimental |
| 22 | akashjindal423/Promptlab | The open-source prompt engineering workbench. Analyse your LLM prompts... | | Experimental |
| 23 | ashleysally00/promptfoo-quickstart-guide | Quickstart guide for using PromptFoo to evaluate LLM prompts via CLI or Colab. | | Experimental |
| 24 | vesper-astrena/promptlab | Test and compare LLM prompts. Measure response time, tokens, and cost.... | | Experimental |
| 25 | oruizramos/Blender-structured-knowledge-FAQ-retrieval | PromptLab is a Python experimental framework for systematic prompt... | | Experimental |
| 26 | fernandoxx73/department-of-truth | An experimental Python interface testing LLM constraint enforcement. It... | | Experimental |
| 27 | EltonCN/toolpy | Python module made to facilitate the creation of tools using LLMs. | | Experimental |
| 28 | theishanpathak/prompt-tester | Precision API analytics engine developed in Java 17 to track LLM usage... | | Experimental |
| 29 | joncoded/keywords | keying in those words to understand them better (Next.js + Llama LLM + decap CMS) | | Experimental |
| 30 | albipuliga/PromptLab | Manage, test, and compare your prompts with different models. | | Experimental |
| 31 | orange0214/auto-prompt-tuner | A Feedback-Driven LLM Pipeline for Automatic Prompt Optimization | | Experimental |
| 32 | prompt-foundry/kotlin-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Kotlin. | | Experimental |
| 33 | prompt-foundry/dotnet-sdk | The prompt engineering, prompt management, and prompt evaluation tool for C# and .NET | | Experimental |
| 34 | sayheyrey/py-prompt-qa | Python prompt testing script | | Experimental |
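Several entries above (e.g. dakshjain-1616/promptfight) report a winner plus a p-value from repeated prompt runs. A minimal sketch of that kind of statistic, using a two-sided two-proportion z-test over pass/fail eval outcomes; this is a generic illustration, not the method of any listed tool, and the counts in the example are made up:

```python
import math

def two_proportion_pvalue(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """Two-sided two-proportion z-test (normal approximation).

    wins_x = number of runs where variant x passed the eval,
    n_x = total runs for variant x.
    """
    p_a, p_b = wins_a / n_a, wins_b / n_b
    # Pooled pass rate under the null hypothesis (no difference)
    p_pool = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical example: variant A passes 24/30 runs, variant B passes 15/30
p = two_proportion_pvalue(24, 30, 15, 30)
print(round(p, 4))
```

With small run counts (30 per variant is common in these tools), the normal approximation is rough; an exact test is safer near the significance threshold.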