aiverify-foundation/moonshot
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
This tool helps AI developers and compliance teams rigorously test and validate their Large Language Model (LLM) applications. You provide your LLM application, and Moonshot delivers comprehensive reports on its performance, safety, and vulnerabilities. It is ideal for those responsible for ensuring the reliability and trustworthiness of LLM-powered products before deployment.
Use this if you need to systematically evaluate the safety, reliability, and performance of an LLM application or LLM, using both benchmark tests and adversarial 'red team' attacks.
Not ideal if you are looking for a tool to develop or fine-tune LLMs, rather than test existing ones.
Stars: 315
Forks: 60
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/aiverify-foundation/moonshot"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
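Beyond curl, the same endpoint can be queried from a script. The sketch below is a minimal Python example, assuming the endpoint returns JSON; the specific fields in the response (and the mechanism for passing an API key) are not documented here, so inspect the live response before depending on particular keys.

    import requests

    # Directory API endpoint for this repository, as shown above.
    url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/aiverify-foundation/moonshot"

    # Anonymous access is rate-limited to 100 requests/day; a free key raises this
    # to 1,000/day (how the key is supplied is not shown here, so it is omitted).
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()

    # Assumes a JSON body; the exact schema (e.g. star/fork counts) is an assumption.
    data = resp.json()
    print(data)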
Related tools
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents