aiverify-foundation/moonshot
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
This tool helps AI developers and compliance teams rigorously test and validate their Large Language Model (LLM) applications. You provide your LLM application, and Moonshot delivers comprehensive reports on its performance, safety, and vulnerabilities. It is ideal for those responsible for ensuring the reliability and trustworthiness of LLM-powered products before deployment.
Use this if you need to systematically evaluate the safety, reliability, and performance of an LLM application or LLM, using both benchmark tests and adversarial 'red team' attacks.
Not ideal if you are looking for a tool to develop or fine-tune LLMs, rather than test existing ones.
Stars: 315
Forks: 60
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/aiverify-foundation/moonshot"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
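Beyond curl, the same endpoint can be queried from a script. The sketch below is a minimal Python example, assuming the endpoint returns JSON; the specific fields in the response (and the mechanism for passing an API key) are not documented here, so inspect the live response before depending on particular keys.

    import requests

    # Directory API endpoint for this repository, as shown above.
    url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/aiverify-foundation/moonshot"

    # Anonymous access is rate-limited to 100 requests/day; a free key raises this
    # to 1,000/day (how the key is supplied is not shown here, so it is omitted).
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()

    # Assumes a JSON body; the exact schema (e.g. star/fork counts) is an assumption.
    data = resp.json()
    print(data)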
Related tools
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents