lechmazur/sycophancy
LLM benchmark and leaderboard for narrator-bias sycophancy, opposite-narrator contradictions, and judgment consistency.
This benchmark evaluates how consistently large language models (LLMs) judge the same scenario when it is presented from opposing first-person perspectives. Each dispute case is shown from a neutral view and from multiple first-person views, some with emotional framing. The output is a leaderboard ranking models by their tendency to agree with both sides of a dispute (or to reject both), helping practitioners gauge a model's narrator bias and judgment stability. It is aimed at AI researchers, product managers, and evaluators who work with LLMs.
Use this if you need to assess the fairness and consistency of an LLM's judgments, especially when the model is exposed to biased or emotionally charged narratives.
Not ideal if you are looking for a general-purpose LLM performance benchmark or if your primary concern is factual accuracy rather than narrator-bias sycophancy.
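To make the scoring idea concrete, here is a minimal Python sketch of an opposite-narrator contradiction metric. It assumes each dispute yields one verdict per narration ("A", "B", or "neither"); the data structure and function names are hypothetical illustrations, not code from the repository.

from dataclasses import dataclass

@dataclass
class DisputeVerdicts:
    # Hypothetical structure: the model's verdict when narrator A tells
    # the story, and its verdict when narrator B tells the same story.
    # Each verdict is "A", "B", or "neither".
    as_told_by_a: str
    as_told_by_b: str

def contradiction_rate(cases: list[DisputeVerdicts]) -> float:
    # Fraction of disputes where the model sided with whichever party
    # happened to be narrating, i.e., agreed with both sides.
    # (A "reject both" rate could be computed analogously from "neither".)
    sycophantic = sum(
        1 for c in cases
        if c.as_told_by_a == "A" and c.as_told_by_b == "B"
    )
    return sycophantic / len(cases) if cases else 0.0

# Example: in 2 of 3 disputes the verdict flips with the narrator.
cases = [
    DisputeVerdicts("A", "B"),  # agreed with both narrators
    DisputeVerdicts("A", "A"),  # consistent verdict
    DisputeVerdicts("A", "B"),  # agreed with both narrators
]
print(contradiction_rate(cases))  # 0.666...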
Stars: 13
Forks: —
Language: —
License: —
Category: —
Last pushed: Mar 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lechmazur/sycophancy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
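For programmatic access, here is a minimal Python sketch of the same request, assuming the endpoint returns JSON (the response's field names are not documented on this page):

import requests  # third-party: pip install requests

# Fetch this repository's quality data from the public endpoint above.
# No key is needed within the 100-requests/day limit.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lechmazur/sycophancy"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())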
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents