hammer-mt/thumb

A simple prompt testing library for LLMs.

Quality score: 33 / 100 (Emerging)

This tool helps AI developers and prompt engineers refine their large language model (LLM) prompts. You supply prompt variations and input scenarios (test cases), and it generates a response for each combination. It then presents a user interface for blind rating of the responses, along with performance metrics such as average score, token usage, and cost, so you can pick the most effective prompt.
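In practice a test run looks roughly like the sketch below. This is a minimal illustration built from the description above, not verified against the library: the thumb.test entry point, the {comment} template variable, and the cases/runs parameters are assumptions about the API, and the prompts and test cases are invented for the example.

import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # responses are generated via an LLM provider

import thumb

# Two prompt variants to compare; {comment} stands in for a value
# taken from each test case (assumed templating behavior).
prompt_a = "Write a reply to this customer comment: {comment}"
prompt_b = ("You are a friendly support agent. Write a short, "
            "empathetic reply to this customer comment: {comment}")

# Input scenarios the prompts are tested against.
cases = [
    {"comment": "My order arrived two weeks late."},
    {"comment": "Love the product, but the app keeps crashing."},
]

# Assumed entry point: generate a response for every prompt/case
# combination over several runs, then open the blind-rating UI and
# report average score, token usage, and cost per prompt.
test = thumb.test([prompt_a, prompt_b], cases, runs=5)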

No commits in the last 6 months. Available on PyPI.

Use this if you need to systematically compare different LLM prompts or models for effectiveness, cost, or response quality across various input conditions.

Not ideal if your work doesn't involve prompt engineering, if you need to test non-LLM models, or if you need multi-turn conversational flows more complex than a single system message plus alternating human/assistant turns.

Tags: prompt-engineering, LLM-evaluation, AI-development, model-testing, natural-language-processing
Badges: No License · Stale (6 months)

Score breakdown:
Maintenance 2 / 25
Adoption 7 / 25
Maturity 17 / 25
Community 7 / 25

Stars: 28
Forks: 2
Language: Python
License: None
Last pushed: May 24, 2025
Commits (30d): 0
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/hammer-mt/thumb"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
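For scripted access, the same endpoint can be called from any HTTP client. A minimal Python sketch, assuming only what the curl example above shows (a GET request returning JSON; the response fields and the mechanism for supplying an API key are not documented here):

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "prompt-engineering/hammer-mt/thumb")

resp = requests.get(URL, timeout=10)  # keyless access: up to 100 requests/day
resp.raise_for_status()
print(resp.json())  # assumed JSON body; schema not documented on this page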