google/litmus
Litmus is a comprehensive LLM testing and evaluation tool for GenAI application development, built around a user-friendly web UI.
Litmus helps developers thoroughly test and evaluate Large Language Model (LLM) applications: you define test cases, run them against your LLM with varied inputs, compare the outputs to expected results, and analyze the responses using AI-powered evaluation metrics.
Use this if you are developing applications powered by Large Language Models and need a robust way to systematically test, evaluate, and monitor their performance and quality.
Not ideal if you are an end-user simply consuming an LLM application and don't need to build or evaluate one yourself.
Stars: 45
Forks: 8
Language: Vue
License: Apache-2.0
Category:
Last pushed: Feb 20, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/google/litmus"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
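The same endpoint can be called from code. A minimal Python sketch using only the standard library; note that the response schema is not documented here, so the keys of the returned JSON object are whatever the API actually provides, not something this example asserts:

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def tool_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given GitHub owner/repo slug."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_tool(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload for one tool.

    The field names in the response are not documented in this
    listing; inspect the payload rather than assuming a schema.
    """
    with urlopen(tool_url(owner, repo)) as resp:
        return json.load(resp)


# Usage (requires network access):
#   data = fetch_tool("google", "litmus")
#   print(json.dumps(data, indent=2))
```

Without an API key this counts against the shared 100-requests/day quota, so cache responses rather than re-fetching in a loop.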
Related tools
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral,...
IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
NatLabRockies/COMPASS
INFRA-COMPASS is a tool that leverages Large Language Models (LLMs) to create and maintain an...