LLM Bias Evaluation Tools

Tools and frameworks for detecting, measuring, and auditing biases in large language models across domains like mental health, hiring, news, and stereotypes. Includes bias benchmarks, evaluation metrics, and mitigation techniques. Does NOT include general fairness frameworks, bias in other ML models, or non-LLM applications.

There are 19 LLM bias evaluation tools tracked. One scores above 50 (Established tier). The highest-rated is cvs-health/langfair at 60/100 with 255 stars.

Get all 19 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-bias-evaluation&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
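
The same request can be made from Python. The sketch below uses only the standard library and the endpoint and query parameters from the curl command above. The function names `build_url` and `fetch_projects` are illustrative, not part of any published client. The response schema is not documented here, so the code returns the parsed JSON without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="llm-bias-evaluation", limit=20):
    """Assemble the query URL with the same parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(timeout=10.0):
    """Fetch the listing and return the parsed JSON as-is.

    No API key is sent, so the request counts against the keyless daily quota.
    The structure of the returned object is not assumed here.
    """
    with urlopen(build_url(), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one network request. Inspect the returned object before relying on specific keys.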

#	Tool	Score	Tier	Stars	Language
1	cvs-health/langfair LangFair is a Python library for conducting use-case level LLM bias and...	60	Established	255	Python
2	BetterForAll/HonestyMeter HonestyMeter: An NLP-powered framework for evaluating objectivity and bias...	37	Emerging	26	TypeScript
3	bws82/biasclear Structural bias detection and correction engine built on Persistent...	34	Emerging	1	Python
4	KID-22/LLM-IR-Bias-Fairness-Survey This is the repo for the survey of Bias and Fairness in IR with LLMs.	33	Emerging	59	—
5	Hanpx20/SafeSwitch Official code repository for the paper "Internal Activation as the Polar...	30	Emerging	13	Jupyter Notebook
6	faiyazabdullah/TranslationTangles Uncovering Performance Gaps and Bias Patterns in LLM-Based Translations...	25	Experimental	2	Jupyter Notebook
7	UltraDeep-Tech/lcb-bench LLM Cognitive Bias Benchmark: 1,500 test cases measuring 30 cognitive biases...	22	Experimental	—	Python
8	minnesotanlp/cobbler Code and data for Koo et al's ACL 2024 paper "Benchmarking Cognitive Biases...	22	Experimental	22	Jupyter Notebook
9	zhuohaoyu/KIEval [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large...	21	Experimental	39	Python
10	grecosalvatore/StereoBusters-GSI-Detect-Evalita2026 This repository contains the code of the team StereoBusters for the Evalita...	18	Experimental	1	Jupyter Notebook
11	tddschn/llm-biases LLM Biases Research	17	Experimental	1	—
12	Robert-Morabito/STOP Repository for the paper STOP! Benchmarking Large Language Models with...	17	Experimental	1	Python
13	gopi703/cultural-advice-bias 🌍 Visualize cultural bias in AI therapy advice, revealing how local...	14	Experimental	—	Python
14	Pikeras72/EQUITIA Tool for the automatic assessment of biases in LLM models	13	Experimental	—	Python
15	AndrewHeller17/Effect-of-Emotional-Framing-on-LLM-Performance Evaluated the impact of emotional prompt framing on LLM reasoning accuracy...	13	Experimental	—	Jupyter Notebook
16	Trust4AI/GUARD-ME AI-guided Evaluator for Bias Detection using Metamorphic Testing	13	Experimental	—	TypeScript
17	charlie-campanella/big-city-bias Code for the paper "Big City Bias: Evaluating the Impact of Metropolitan...	11	Experimental	—	TypeScript
18	JayanaGunaweera01/EthAIAuditHub An automated, collaborative ethical bias auditing platform for ML models....	11	Experimental	—	Jupyter Notebook
19	steinathan/bullshitmeter This is a super-powered bullshit detector that can measure the amount of...	10	Experimental	2	Python
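
The summary figures quoted earlier (19 tools, one scoring above 50, cvs-health/langfair on top) can be rechecked against the table. The sketch below copies the repository, score, and tier columns by hand into a list. The helper names are hypothetical, and no tier thresholds are assumed beyond what the table itself shows.

```python
from collections import Counter

# (repository, score, tier) copied from the table above.
ROWS = [
    ("cvs-health/langfair", 60, "Established"),
    ("BetterForAll/HonestyMeter", 37, "Emerging"),
    ("bws82/biasclear", 34, "Emerging"),
    ("KID-22/LLM-IR-Bias-Fairness-Survey", 33, "Emerging"),
    ("Hanpx20/SafeSwitch", 30, "Emerging"),
    ("faiyazabdullah/TranslationTangles", 25, "Experimental"),
    ("UltraDeep-Tech/lcb-bench", 22, "Experimental"),
    ("minnesotanlp/cobbler", 22, "Experimental"),
    ("zhuohaoyu/KIEval", 21, "Experimental"),
    ("grecosalvatore/StereoBusters-GSI-Detect-Evalita2026", 18, "Experimental"),
    ("tddschn/llm-biases", 17, "Experimental"),
    ("Robert-Morabito/STOP", 17, "Experimental"),
    ("gopi703/cultural-advice-bias", 14, "Experimental"),
    ("Pikeras72/EQUITIA", 13, "Experimental"),
    ("AndrewHeller17/Effect-of-Emotional-Framing-on-LLM-Performance", 13, "Experimental"),
    ("Trust4AI/GUARD-ME", 13, "Experimental"),
    ("charlie-campanella/big-city-bias", 11, "Experimental"),
    ("JayanaGunaweera01/EthAIAuditHub", 11, "Experimental"),
    ("steinathan/bullshitmeter", 10, "Experimental"),
]


def tier_counts(rows):
    """Count how many tools fall into each tier."""
    return Counter(tier for _, _, tier in rows)


def scoring_above(rows, threshold):
    """Return the repositories whose score is strictly above the threshold."""
    return [name for name, score, _ in rows if score > threshold]


def top_tool(rows):
    """Return the (repository, score, tier) row with the highest score."""
    return max(rows, key=lambda row: row[1])
```

With these rows, `tier_counts(ROWS)` gives 1 Established, 4 Emerging, and 14 Experimental tools, and `scoring_above(ROWS, 50)` contains only cvs-health/langfair.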