hammer-mt/thumb
A simple prompt testing library for LLMs.
This tool helps AI developers and prompt engineers refine their large language model (LLM) prompts. You supply prompt variations and input scenarios, and it generates responses from the LLMs you choose. It then presents a blind-rating interface for those responses, along with performance metrics such as average score, token usage, and cost, so you can pick the most effective prompt.
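A minimal sketch of that workflow, following the usage pattern in the project's README; treat the exact signature of thumb.test as an assumption (the library has not been updated recently), and the prompts and case values here are purely illustrative:

```python
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # thumb calls the OpenAI API under the hood

import thumb

# Two prompt variations to compare; {comedian} is filled in from each test case.
prompt_a = "tell me a joke in the style of {comedian}"
prompt_b = "tell me a family friendly joke in the style of {comedian}"

# Input scenarios to run each prompt variation against.
cases = [{"comedian": "chris rock"}, {"comedian": "ali wong"}]

# Generates responses for every prompt/case combination, then opens the
# blind-rating UI and reports average score, token usage, and cost.
test = thumb.test([prompt_a, prompt_b], cases)
```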
No commits in the last 6 months. Available on PyPI.
Use this if you need to systematically compare different LLM prompts or models for effectiveness, cost, or response quality across various input conditions.
Not ideal if you're not an AI developer or prompt engineer, if you need to test non-LLM models, or if you need complex multi-turn conversational flows beyond a simple system message plus human/assistant turns.
Stars: 28
Forks: 2
Language: Python
License: —
Category: Prompt Engineering
Last pushed: May 24, 2025
Commits (30d): 0
Dependencies: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/hammer-mt/thumb"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
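The same endpoint can be called from Python; a minimal sketch, assuming the response is a JSON payload mirroring the stats shown above (the header name for an API key is not documented here, so the unauthenticated form is shown):

```python
import requests

# Endpoint from the curl example above; no key needed up to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/hammer-mt/thumb"

resp = requests.get(url, timeout=10)
resp.raise_for_status()

stats = resp.json()  # assumed: JSON with the stars, forks, and commit data above
print(stats)
```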
Higher-rated alternatives
dottxt-ai/outlines
Structured Outputs
takashiishida/arxiv-to-prompt
Transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs...
microsoft/promptpex
Test Generation for Prompts
Spr-Aachen/LLM-PromptMaster
A simple LLM-powered chatbot application.
AI-secure/aug-pe
[ICML 2024 Spotlight] Differentially Private Synthetic Data via Foundation Model APIs 2: Text