LLM Bias Evaluation Tools

Tools and frameworks for detecting, measuring, and auditing biases in large language models across domains like mental health, hiring, news, and stereotypes. Includes bias benchmarks, evaluation metrics, and mitigation techniques. Does NOT include general fairness frameworks, bias in other ML models, or non-LLM applications.

There are 19 LLM bias evaluation tools tracked. One scores above 50 (Established tier). The highest-rated is cvs-health/langfair at 60/100 with 255 stars.

Get all 19 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-bias-evaluation&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
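The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the code only decodes the JSON and makes no assumptions about its fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="llm-bias-evaluation", limit=20):
    """Build the same query string as the curl command (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """GET the URL and decode the JSON body (hypothetical helper).

    The shape of the returned object is an unknown here, so it is
    handed back as-is for the caller to inspect.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs one request, which counts against the 100 requests/day keyless limit; printing the result with `json.dumps(data, indent=2)` is a reasonable first step for discovering the field names.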

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | cvs-health/langfair | LangFair is a Python library for conducting use-case level LLM bias and... | 60 | Established |
| 2 | BetterForAll/HonestyMeter | HonestyMeter: An NLP-powered framework for evaluating objectivity and bias... | 37 | Emerging |
| 3 | bws82/biasclear | Structural bias detection and correction engine built on Persistent... | 34 | Emerging |
| 4 | KID-22/LLM-IR-Bias-Fairness-Survey | This is the repo for the survey of Bias and Fairness in IR with LLMs. | 33 | Emerging |
| 5 | Hanpx20/SafeSwitch | Official code repository for the paper "Internal Activation as the Polar... | 30 | Emerging |
| 6 | faiyazabdullah/TranslationTangles | Uncovering Performance Gaps and Bias Patterns in LLM-Based Translations... | 25 | Experimental |
| 7 | UltraDeep-Tech/lcb-bench | LLM Cognitive Bias Benchmark: 1,500 test cases measuring 30 cognitive biases... | 22 | Experimental |
| 8 | minnesotanlp/cobbler | Code and data for Koo et al's ACL 2024 paper "Benchmarking Cognitive Biases... | 22 | Experimental |
| 9 | zhuohaoyu/KIEval | [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large... | 21 | Experimental |
| 10 | grecosalvatore/StereoBusters-GSI-Detect-Evalita2026 | This repository contains the code of the team StereoBusters for the Evalita... | 18 | Experimental |
| 11 | tddschn/llm-biases | LLM Biases Research | 17 | Experimental |
| 12 | Robert-Morabito/STOP | Repository for the paper STOP! Benchmarking Large Language Models with... | 17 | Experimental |
| 13 | gopi703/cultural-advice-bias | 🌍 Visualize cultural bias in AI therapy advice, revealing how local... | 14 | Experimental |
| 14 | Pikeras72/EQUITIA | Tool for the automatic assessment of biases in LLM models | 13 | Experimental |
| 15 | AndrewHeller17/Effect-of-Emotional-Framing-on-LLM-Performance | Evaluated the impact of emotional prompt framing on LLM reasoning accuracy... | 13 | Experimental |
| 16 | Trust4AI/GUARD-ME | AI-guided Evaluator for Bias Detection using Metamorphic Testing | 13 | Experimental |
| 17 | charlie-campanella/big-city-bias | Code for the paper "Big City Bias: Evaluating the Impact of Metropolitan... | 11 | Experimental |
| 18 | JayanaGunaweera01/EthAIAuditHub | An automated, collaborative ethical bias auditing platform for ML models.... | 11 | Experimental |
| 19 | steinathan/bullshitmeter | This is a super-powered bullshit detector that can measure the amount of... | 10 | Experimental |