LLM Bias Evaluation Transformer Models

23 LLM bias evaluation models are tracked. One scores above 50 (Established tier). The highest-rated is google-deepmind/long-form-factuality at 55/100 with 672 stars.

Get all 23 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-bias-evaluation&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
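For scripted access, the same request can be built in Python. This is a minimal sketch, not an official client. The base URL and the three query parameters are copied from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the code only decodes it and returns it as-is. The command uses `limit=20` while 23 projects are listed; whether a larger limit or pagination is supported is not stated in the source.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken verbatim from the curl command in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="llm-bias-evaluation", limit=20):
    """Build the request URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the body as JSON.

    The response schema is an unknown here, so the decoded object is
    returned unchanged for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily quota, so it is worth caching the result locally when experimenting.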

| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | google-deepmind/long-form-factuality | Benchmarking long-form factuality in large language models. Original code... | 55 | Established |
| 2 | gnai-creator/aletheion-llm-v2 | Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know. | 38 | Emerging |
| 3 | sandylaker/ib-edl | Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025) | 37 | Emerging |
| 4 | nightdessert/Retrieval_Head | open-source code for paper: Retrieval Head Mechanistically Explains... | 33 | Emerging |
| 5 | MLD3/steerability | An open-source evaluation framework for measuring LLM steerability. | 33 | Emerging |
| 6 | kazemihabib/Mitigating-Reasoning-LLM-Social-Bias | A novel approach to mitigating social bias in Large Language Models through... | 32 | Emerging |
| 7 | EternityYW/BiasEval-LLM-MentalHealth | Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models | 32 | Emerging |
| 8 | aigc-apps/PertEval | [NeurIPS '24 Spotlight] PertEval: Unveiling Real Knowledge Capacity of LLMs... | 31 | Emerging |
| 9 | bowen-upenn/llm_token_bias | [EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet... | 30 | Emerging |
| 10 | chandar-lab/CAIRO | We explain why fairness metrics don't correlate and propose CAIRO to make... | 30 | Emerging |
| 11 | xingbpshen/medical-calibration-fairness-mllm | [MICCAI 2025] The official implementation of the paper "Exposing and... | 25 | Experimental |
| 12 | x-zheng16/CALM | [AAAI 25] CALM: Curiosity-Driven Auditing for LLMs | 25 | Experimental |
| 13 | fannie1208/FactTest | [ICML2025] "FactTest: Factuality Testing in Large Language Models with... | 23 | Experimental |
| 14 | fabthebest/EIC_Framework_Calibration | LLM decision-calibration engine based on Shannon Entropy and semantic... | 21 | Experimental |
| 15 | jwmke/BiasCompass | Using LLMs to detect bias in news articles. | 20 | Experimental |
| 16 | joaoaleite/PASTEL | PASTEL (Prompted weAk Supervision wiTh crEdibility signaLs) is a weakly... | 19 | Experimental |
| 17 | datos-Fundar/sesgos_LLM | How do LLMs "get it wrong"? | 18 | Experimental |
| 18 | brucelyu17/SC-TC-Bench | [FAccT '25] Characterizing Bias: Benchmarking LLMs in Simplified versus... | 17 | Experimental |
| 19 | mtichikawa/llm-bias-detection | Research project detecting and quantifying demographic bias in language models | 14 | Experimental |
| 20 | Wazzabeee/Bias-Mitigation-In-LLM | Research POC on the mitigation of bias in large language models (FLAN-T5 and... | 12 | Experimental |
| 21 | Indiiigo/LLM_rep_review | Systematic Review of the Demographic Representativeness of LLMs | 11 | Experimental |
| 22 | cognitivefactory/llm-bias-analysis | Benchmark tool aimed at evaluating biases of large language models | 11 | Experimental |
| 23 | anoopkdcs/affective_bias_in_plm | Affective Bias in Large Pre-trained Language Models | 11 | Experimental |