hammer-mt/thumb

A simple prompt testing library for LLMs.

Quality score: 33 / 100 (Emerging)

This tool helps AI developers and prompt engineers refine their large language model (LLM) prompts. You supply prompt variations and input scenarios (test cases), and it generates a response for each combination. It then presents a user interface for blind rating of the responses, along with performance metrics such as average score, token usage, and cost, so you can pick the most effective prompt.
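In practice a test run looks roughly like the sketch below. This is a minimal illustration built from the description above, not verified against the library: the thumb.test entry point, the {comment} template variable, and the cases/runs parameters are assumptions about the API, and the prompts and test cases are invented for the example.

import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # responses are generated via an LLM provider

import thumb

# Two prompt variants to compare; {comment} stands in for a value
# taken from each test case (assumed templating behavior).
prompt_a = "Write a reply to this customer comment: {comment}"
prompt_b = ("You are a friendly support agent. Write a short, "
            "empathetic reply to this customer comment: {comment}")

# Input scenarios the prompts are tested against.
cases = [
    {"comment": "My order arrived two weeks late."},
    {"comment": "Love the product, but the app keeps crashing."},
]

# Assumed entry point: generate a response for every prompt/case
# combination over several runs, then open the blind-rating UI and
# report average score, token usage, and cost per prompt.
test = thumb.test([prompt_a, prompt_b], cases, runs=5)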

No commits in the last 6 months. Available on PyPI.

Use this if you need to systematically compare different LLM prompts or models for effectiveness, cost, or response quality across various input conditions.

Not ideal if your work doesn't involve prompt engineering, if you need to test non-LLM models, or if you need multi-turn conversational flows more complex than a single system message plus alternating human/assistant turns.

Tags: prompt-engineering, LLM-evaluation, AI-development, model-testing, natural-language-processing
Badges: No License · Stale (6 months)

Score breakdown:
Maintenance 2 / 25
Adoption 7 / 25
Maturity 17 / 25
Community 7 / 25

Stars: 28
Forks: 2
Language: Python
License: None
Last pushed: May 24, 2025
Commits (30d): 0
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/hammer-mt/thumb"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
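For scripted access, the same endpoint can be called from any HTTP client. A minimal Python sketch, assuming only what the curl example above shows (a GET request returning JSON; the response fields and the mechanism for supplying an API key are not documented here):

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "prompt-engineering/hammer-mt/thumb")

resp = requests.get(URL, timeout=10)  # keyless access: up to 100 requests/day
resp.raise_for_status()
print(resp.json())  # assumed JSON body; schema not documented on this page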