chziakas/redeval

A library for red-teaming LLM applications with LLMs.

Score: 37/100 (Emerging)

This tool helps you find the weaknesses and potential failure points of a Large Language Model (LLM) application before it is deployed in a real-world setting. It takes your LLM application, automatically tests it against a variety of simulated adversarial scenarios, and produces detailed reports on how it performed. It is aimed at anyone responsible for the safety, reliability, or performance of an LLM-powered product, such as an AI product manager, an ethics and safety specialist, or an operations engineer.

No commits in the last 6 months.

Use this if you need to thoroughly test an LLM application for vulnerabilities like manipulation, deception, or generating toxic content before it interacts with actual users.

Not ideal if you need to evaluate an LLM's raw academic benchmark performance or if your application doesn't involve conversational interactions.

Tags: AI-safety, LLM-auditing, product-testing, risk-assessment, conversational-AI
Flags: Stale (6 months), No Package, No Dependents
Maintenance: 0/25
Adoption: 7/25
Maturity: 16/25
Community: 14/25

How are scores calculated?
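The four category subscores appear to sum directly to the overall score: 0 + 7 + 16 + 14 = 37 out of a possible 100 (four categories, each worth up to 25 points). A minimal sketch in Python, assuming the aggregation is a plain unweighted sum; that weighting is an assumption, not documented here:

subscores = {
    "Maintenance": 0,
    "Adoption": 7,
    "Maturity": 16,
    "Community": 14,
}

# Each category is scored out of 25, so four categories give a 0-100 range.
assert all(0 <= v <= 25 for v in subscores.values())

# Assumed aggregation: a plain unweighted sum of the subscores.
overall = sum(subscores.values())
print(f"Overall score: {overall}/100")  # Overall score: 37/100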

Stars: 29
Forks: 5
Language: Python
License: Apache-2.0
Last pushed: Oct 11, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/chziakas/redeval"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
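
The endpoint can also be queried programmatically. A minimal sketch in Python using the requests library; the shape of the JSON payload is an assumption, since the response schema is not documented here:

import requests

# Fetch the quality scorecard for chziakas/redeval from the public API.
# No key is needed for up to 100 requests/day; a free key raises that to 1,000/day.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/chziakas/redeval"
response = requests.get(url, timeout=10)
response.raise_for_status()

data = response.json()  # Parsed JSON; exact field names are not documented here.
print(data)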