promptpex and promptly
These are complementary tools: PromptPEx provides a framework for *generating* systematic test cases for prompts, while Promptly supplies a curated *collection* of pre-made prompts to evaluate—one automates test creation, the other supplies evaluation material.
About promptpex
microsoft/promptpex
Test Generation for Prompts
PromptPEx helps AI developers ensure their prompts consistently produce the desired output from AI models. It takes a natural language prompt together with its specified rules (such as "output should be JSON"), then automatically generates unit tests that check whether different AI models follow those rules. Developers can use these tests to compare how well various models perform against the same prompt and rules.
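To make the idea concrete, here is a minimal sketch of rule-driven output testing. It is not PromptPEx's actual API; the `generate_rule_tests` and `run_tests` helpers, the rule keywords, and the stubbed model outputs are all hypothetical, illustrating only the concept of turning a stated rule into a check applied across models.

```python
import json

def generate_rule_tests(rules):
    """Turn natural-language rules into predicate checks.
    Hypothetical helper: a real framework like PromptPEx does far more."""
    checks = []
    for rule in rules:
        if "json" in rule.lower():
            def is_json(output):
                try:
                    json.loads(output)
                    return True
                except (ValueError, TypeError):
                    return False
            checks.append((rule, is_json))
    return checks

def run_tests(model_outputs, checks):
    """Apply every generated check to each model's output."""
    return {
        model: {rule: check(output) for rule, check in checks}
        for model, output in model_outputs.items()
    }

# Stubbed outputs standing in for real LLM responses.
outputs = {
    "model-a": '{"answer": 42}',
    "model-b": "answer: 42",
}
checks = generate_rule_tests(["output should be JSON"])
report = run_tests(outputs, checks)
```

The same generated checks run unchanged against every model, which is what makes cross-model comparison possible.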
About promptly
equinor/promptly
A prompt collection for testing and evaluation of LLMs.
This collection provides pre-written prompts for evaluating and testing Large Language Models (LLMs). It lets you put different LLMs through their paces, feeding them specific questions and scenarios so you can assess their performance and responses. Scientific programmers and researchers who work with AI models will find it useful for benchmarking and understanding LLM capabilities.
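A collection like this slots naturally into a simple evaluation loop. The sketch below is a hypothetical harness (the `evaluate` function and stub models are assumptions, not part of Promptly); the collection supplies the prompts, and the loop records each model's response for later review.

```python
def evaluate(models, prompts):
    """Feed each prompt in a collection to each model and record responses.
    Hypothetical harness: Promptly supplies the prompts, not this code."""
    results = []
    for model_name, model_fn in models.items():
        for prompt in prompts:
            results.append({
                "model": model_name,
                "prompt": prompt,
                "response": model_fn(prompt),
            })
    return results

# Stub callables standing in for real LLM endpoints.
models = {
    "echo-model": lambda p: p,
    "shout-model": lambda p: p.upper(),
}
prompts = ["Summarize photosynthesis in one sentence."]
results = evaluate(models, prompts)
```

Recording structured results per (model, prompt) pair is what makes side-by-side benchmarking straightforward afterwards.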