# Prompt Experimentation Platforms
Tools for systematic A/B testing, comparison, and evaluation of LLM prompts across multiple models and variants. Includes statistical analysis, cost/performance measurement, and playground environments for prompt optimization. Does NOT include prompt templates, prompt collections, general LLM evaluation frameworks, or prompt management without experimentation features.
There are 34 prompt experimentation platform tools tracked. The highest-rated is Mirascope/lilypad at 48/100 with 214 stars.
Fetch the tracked projects as JSON (the `limit` query parameter caps the number of results returned):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=prompt-experimentation-platforms&limit=20"
```
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
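For programmatic use, the endpoint URL can be composed from its query parameters rather than hand-edited. A minimal Python sketch, assuming only the base URL and parameters shown in the curl command above (the response schema is not documented here, so `fetch_projects` returns the decoded JSON as-is for callers to inspect):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def dataset_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Compose the quality-dataset endpoint URL with query parameters."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE}?{urlencode(params)}"

def fetch_projects(url: str):
    """Fetch and decode the JSON payload; the structure is an assumption,
    so inspect it before relying on specific fields."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)

url = dataset_url("prompt-engineering", "prompt-experimentation-platforms", limit=34)
print(url)
```

Setting `limit=34` retrieves every tracked project in this subcategory.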
| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | Mirascope/lilypad | Open-source versioning, tracing, and annotation tooling. | 48 | Emerging |
| 2 | Supervertaler/Supervertaler-Workbench | Open-source, AI-enhanced CAT tool with multi-LLM support, translation... | | Emerging |
| 3 | crjaensch/PromptoLab | A multi-platform app to serve as a prompts catalog, an LLM playground for... | | Emerging |
| 4 | parea-ai/parea-sdk-py | Python SDK for experimenting, testing, evaluating & monitoring LLM-powered... | | Emerging |
| 5 | geeknees/sentinel_rb | SentinelRb is an LLM-driven prompt inspector designed to automatically... | | Experimental |
| 6 | NeuroTinkerLab/synt-e-project | A Python tool to translate natural language requests into efficient,... | | Experimental |
| 7 | tmam-dev/tmam-python-sdk | An open-source LLM engineering platform featuring observability, metrics,... | | Experimental |
| 8 | jeong-se-hun/autotune-skill | Eval-first tuning skill for prompts, docs, skills, and code with guards,... | | Experimental |
| 9 | MukundaKatta/PromptLab | Prompt experimentation workspace: A/B testing prompt variants with... | | Experimental |
| 10 | dbhavery/promptlab | Prompt testing framework: pytest for LLM prompts. Define prompts as YAML,... | | Experimental |
| 11 | magifd2/log_analyzer | A Python-based CLI tool for analyzing large log files (JSONL) with Large... | | Experimental |
| 12 | Personaz1/prompt-qa-lab | Regression and evaluation toolkit for prompt and agent output quality | | Experimental |
| 13 | rldyourmnd/local-llm-prompt-optimizer | Offline prompt A/B testing, scoring & auto-tuning for local LLMs | | Experimental |
| 14 | martinklepsch/llm-web-ui | A web UI for the `llm` command line tool | | Experimental |
| 15 | prompt-foundry/python-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Python | | Experimental |
| 16 | artefactop/promptdev | A prompt evaluation framework that provides comprehensive testing for AI... | | Experimental |
| 17 | mangobanaani/semantic-ui | Minimal web interface for Large Language Models using Semantic Kernel | | Experimental |
| 18 | prompt-foundry/java-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Java. | | Experimental |
| 19 | prompt-foundry/ruby-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Ruby. | | Experimental |
| 20 | Shawn91/promtrix | An intuitive GUI for evaluating and optimizing prompts and LLMs | | Experimental |
| 21 | dakshjain-1616/promptfight | Minimal prompt A/B testing: run two prompts 30 times, get winner + p-value +... | | Experimental |
| 22 | akashjindal423/Promptlab | The open-source prompt engineering workbench. Analyse your LLM prompts... | | Experimental |
| 23 | ashleysally00/promptfoo-quickstart-guide | Quickstart guide for using PromptFoo to evaluate LLM prompts via CLI or Colab. | | Experimental |
| 24 | vesper-astrena/promptlab | Test and compare LLM prompts. Measure response time, tokens, and cost.... | | Experimental |
| 25 | oruizramos/Blender-structured-knowledge-FAQ-retrieval | PromptLab is a Python experimental framework for systematic prompt... | | Experimental |
| 26 | fernandoxx73/department-of-truth | An experimental Python interface testing LLM constraint enforcement. It... | | Experimental |
| 27 | EltonCN/toolpy | Python module made to facilitate the creation of tools using LLMs. | | Experimental |
| 28 | theishanpathak/prompt-tester | Precision API analytics engine developed in Java 17 to track LLM usage... | | Experimental |
| 29 | joncoded/keywords | keying in those words to understand them better (Next.js + Llama LLM + decap CMS) | | Experimental |
| 30 | albipuliga/PromptLab | Manage, test, and compare your prompts with different models. | | Experimental |
| 31 | orange0214/auto-prompt-tuner | A Feedback-Driven LLM Pipeline for Automatic Prompt Optimization | | Experimental |
| 32 | prompt-foundry/kotlin-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Kotlin. | | Experimental |
| 33 | prompt-foundry/dotnet-sdk | The prompt engineering, prompt management, and prompt evaluation tool for C# and .NET | | Experimental |
| 34 | sayheyrey/py-prompt-qa | Python prompt testing script | | Experimental |
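Several entries above (e.g. dakshjain-1616/promptfight) report a winner plus a p-value from repeated prompt runs. A minimal sketch of that kind of statistic, using a two-sided two-proportion z-test over pass/fail eval outcomes; this is a generic illustration, not the method of any listed tool, and the counts in the example are made up:

```python
import math

def two_proportion_pvalue(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """Two-sided two-proportion z-test (normal approximation).

    wins_x = number of runs where variant x passed the eval,
    n_x = total runs for variant x.
    """
    p_a, p_b = wins_a / n_a, wins_b / n_b
    # Pooled pass rate under the null hypothesis (no difference)
    p_pool = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical example: variant A passes 24/30 runs, variant B passes 15/30
p = two_proportion_pvalue(24, 30, 15, 30)
print(round(p, 4))
```

With small run counts (30 per variant is common in these tools), the normal approximation is rough; an exact test is safer near the significance threshold.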