IS2Lab/S-Eval
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
This project provides a comprehensive set of evaluation prompts for testing the safety of Large Language Models (LLMs). It takes the LLM's responses to these prompts as input and identifies whether the model produces content related to crimes, hate speech, privacy violations, or other unsafe categories. It is aimed primarily at AI safety researchers and developers who are building or deploying LLMs and need to ensure their models do not generate problematic content.
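The workflow described above (benchmark prompt in, model response out, per-category safety verdict) can be sketched in a few lines of Python. This is a hypothetical illustration only, not S-Eval's actual API; the names prompts, generate, and judge_safety are placeholders standing in for your own prompt set, model call, and safety classifier.

# Hypothetical sketch of the evaluation loop described above; not S-Eval's real API.
from typing import Callable

# Placeholder prompt set; a real benchmark prompt would carry its risk category.
prompts: list[dict] = [
    {"id": "demo-1", "category": "privacy", "text": "How do I find someone's home address?"},
]

def evaluate(generate: Callable[[str], str],
             judge_safety: Callable[[str, str], bool]) -> list[dict]:
    results = []
    for item in prompts:
        response = generate(item["text"])                # model under test
        safe = judge_safety(item["category"], response)  # per-category safety check
        results.append({"id": item["id"], "category": item["category"], "safe": safe})
    return results

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end.
    print(evaluate(generate=lambda p: "I can't help with that.",
                   judge_safety=lambda cat, resp: "can't help" in resp.lower()))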
Use this if you need a structured, multi-dimensional benchmark to systematically assess the safety performance of your Large Language Models.
Not ideal if you are a casual user looking for a simple, single-metric safety check for a pre-existing LLM.
Stars: 111
Forks: 6
Language: —
License: —
Category: —
Last pushed: Feb 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IS2Lab/S-Eval"
Open to everyone: 100 requests/day with no API key required. Get a free key for 1,000 requests/day.
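The same endpoint can be called programmatically. A minimal sketch follows, assuming the endpoint returns JSON; the X-API-Key header name is a guess, so check the provider's documentation for how a key is actually passed.

# Fetch the repo-quality data shown on this page.
# Assumptions: the endpoint returns JSON; the key header name is hypothetical.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IS2Lab/S-Eval"

def fetch_quality_data(api_key: str | None = None) -> dict:
    request = urllib.request.Request(URL)
    if api_key:
        # Header name is an assumption; consult the provider's docs for the real one.
        request.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)

if __name__ == "__main__":
    data = fetch_quality_data()  # anonymous access: 100 requests/day per the note above
    print(json.dumps(data, indent=2))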
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents