Prompt Experimentation Platforms: Prompt Engineering Tools

Tools for systematic A/B testing, comparison, and evaluation of LLM prompts across multiple models and variants. Includes statistical analysis, cost/performance measurement, and playground environments for prompt optimization. Does NOT include prompt templates, prompt collections, general LLM evaluation frameworks, or prompt management without experimentation features.

34 prompt experimentation platform tools are tracked. The highest-rated is Mirascope/lilypad, which scores 48/100 and has 214 stars.

Get all 34 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=prompt-experimentation-platforms&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
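The same request can be made from Python. The following is a minimal sketch using only the standard library; it reuses the URL and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, the shape of the returned JSON is not documented here, and whether a `limit` above 20 is accepted (to retrieve all 34 projects in one call) is an assumption to verify against the API. No API key handling is shown because the page does not say how a key is passed.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    params = {
        "domain": "prompt-engineering",
        "subcategory": "prompt-experimentation-platforms",
        "limit": limit,  # 20 matches the curl example; higher values are untested
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON response (performs a network request)."""
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_projects()
    # The response structure is not documented on this page, so just print it.
    print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it easy to check the request being sent before spending any of the daily request quota.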

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | Mirascope/lilypad | Open-source versioning, tracing, and annotation tooling. | 48 | Emerging |
| 2 | Supervertaler/Supervertaler-Workbench | Open-source, AI-enhanced CAT tool with multi-LLM support, translation... | 48 | Emerging |
| 3 | crjaensch/PromptoLab | A multi-platform app to serve as a prompts catalog, an LLM playground for... | 44 | Emerging |
| 4 | parea-ai/parea-sdk-py | Python SDK for experimenting, testing, evaluating & monitoring LLM-powered... | 37 | Emerging |
| 5 | geeknees/sentinel_rb | SentinelRb is an LLM-driven prompt inspector designed to automatically... | 27 | Experimental |
| 6 | NeuroTinkerLab/synt-e-project | A Python tool to translate natural language requests into efficient,... | 23 | Experimental |
| 7 | tmam-dev/tmam-python-sdk | An open-source LLM engineering platform featuring observability, metrics,... | 23 | Experimental |
| 8 | jeong-se-hun/autotune-skill | Eval-first tuning skill for prompts, docs, skills, and code with guards,... | 22 | Experimental |
| 9 | MukundaKatta/PromptLab | Prompt experimentation workspace: A/B testing prompt variants with... | 22 | Experimental |
| 10 | dbhavery/promptlab | Prompt testing framework: pytest for LLM prompts. Define prompts as YAML,... | 21 | Experimental |
| 11 | magifd2/log_analyzer | A Python-based CLI tool for analyzing large log files (JSONL) with Large... | 21 | Experimental |
| 12 | Personaz1/prompt-qa-lab | Regression and evaluation toolkit for prompt and agent output quality | 21 | Experimental |
| 13 | rldyourmnd/local-llm-prompt-optimizer | Offline prompt A/B testing, scoring & auto-tuning for local LLMs | 20 | Experimental |
| 14 | martinklepsch/llm-web-ui | A web UI for the `llm` command line tool | 20 | Experimental |
| 15 | prompt-foundry/python-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Python | 20 | Experimental |
| 16 | artefactop/promptdev | A prompt evaluation framework that provides comprehensive testing for AI... | 19 | Experimental |
| 17 | mangobanaani/semantic-ui | Minimal web interface for Large Language Models using Semantic Kernel | 18 | Experimental |
| 18 | prompt-foundry/java-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Java. | 17 | Experimental |
| 19 | prompt-foundry/ruby-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Ruby. | 17 | Experimental |
| 20 | Shawn91/promtrix | An intuitive GUI for evaluating and optimizing prompts and LLMs | 17 | Experimental |
| 21 | dakshjain-1616/promptfight | Minimal prompt A/B testing: run two prompts 30 times, get winner + p-value +... | 14 | Experimental |
| 22 | akashjindal423/Promptlab | The open-source prompt engineering workbench. Analyse your LLM prompts... | 14 | Experimental |
| 23 | ashleysally00/promptfoo-quickstart-guide | Quickstart guide for using PromptFoo to evaluate LLM prompts via CLI or Colab. | 14 | Experimental |
| 24 | vesper-astrena/promptlab | Test and compare LLM prompts. Measure response time, tokens, and cost.... | 14 | Experimental |
| 25 | oruizramos/Blender-structured-knowledge-FAQ-retrieval | PromptLab is a Python experimental framework for systematic prompt... | 13 | Experimental |
| 26 | fernandoxx73/department-of-truth | An experimental Python interface testing LLM constraint enforcement. It... | 13 | Experimental |
| 27 | EltonCN/toolpy | Python module made to facilitate the creation of tools using LLMs. | 13 | Experimental |
| 28 | theishanpathak/prompt-tester | Precision API analytics engine developed in Java 17 to track LLM usage... | 13 | Experimental |
| 29 | joncoded/keywords | keying in those words to understand them better (Next.js + Llama LLM + decap CMS) | 13 | Experimental |
| 30 | albipuliga/PromptLab | Manage, test, and compare your prompts with different models. | 13 | Experimental |
| 31 | orange0214/auto-prompt-tuner | A Feedback-Driven LLM Pipeline for Automatic Prompt Optimization | 12 | Experimental |
| 32 | prompt-foundry/kotlin-sdk | The prompt engineering, prompt management, and prompt evaluation tool for Kotlin. | 11 | Experimental |
| 33 | prompt-foundry/dotnet-sdk | The prompt engineering, prompt management, and prompt evaluation tool for C# and .NET | 11 | Experimental |
| 34 | sayheyrey/py-prompt-qa | Python prompt testing script | 10 | Experimental |