LLM Bias Evaluation Transformer Models
There are 23 LLM bias evaluation models tracked. One scores above 50 (Established tier). The highest-rated is google-deepmind/long-form-factuality at 55/100 with 672 stars.
Get the projects as JSON (the request below sets limit=20, so it should return at most 20 of the 23):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-bias-evaluation&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
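The same request can be made from a script. The sketch below is a minimal Python version of the curl call, using only the standard library. The helper names `build_url` and `fetch_json` are hypothetical and chosen for illustration. The response schema is not documented on this page, so the code returns the decoded JSON as-is rather than assuming any field names, and it is an assumption that other `limit` values are accepted by the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="llm-bias-evaluation", limit=20):
    """Assemble the query string used in the curl command (hypothetical helper)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_json(url, timeout=10):
    """GET the URL and decode the body as JSON.

    The shape of the returned data is not specified here, so no fields are assumed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(build_url())` performs the request and counts toward the daily quota, so it is worth caching the result locally rather than refetching on every run.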
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | google-deepmind/long-form-factuality<br>Benchmarking long-form factuality in large language models. Original code... | 55 | Established |
| 2 | gnai-creator/aletheion-llm-v2<br>Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know. | | Emerging |
| 3 | sandylaker/ib-edl<br>Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025) | | Emerging |
| 4 | nightdessert/Retrieval_Head<br>open-source code for paper: Retrieval Head Mechanistically Explains... | | Emerging |
| 5 | MLD3/steerability<br>An open-source evaluation framework for measuring LLM steerability. | | Emerging |
| 6 | kazemihabib/Mitigating-Reasoning-LLM-Social-Bias<br>A novel approach to mitigating social bias in Large Language Models through... | | Emerging |
| 7 | EternityYW/BiasEval-LLM-MentalHealth<br>Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models | | Emerging |
| 8 | aigc-apps/PertEval<br>[NeurIPS '24 Spotlight] PertEval: Unveiling Real Knowledge Capacity of LLMs... | | Emerging |
| 9 | bowen-upenn/llm_token_bias<br>[EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet... | | Emerging |
| 10 | chandar-lab/CAIRO<br>We explain why fairness metrics don't correlate and propose CAIRO to make... | | Emerging |
| 11 | xingbpshen/medical-calibration-fairness-mllm<br>[MICCAI 2025] The official implementation of the paper "Exposing and... | | Experimental |
| 12 | x-zheng16/CALM<br>[AAAI 25] CALM: Curiosity-Driven Auditing for LLMs | | Experimental |
| 13 | fannie1208/FactTest<br>[ICML2025] "FactTest: Factuality Testing in Large Language Models with... | | Experimental |
| 14 | fabthebest/EIC_Framework_Calibration<br>LLM decision-calibration engine based on Shannon Entropy and semantic... | | Experimental |
| 15 | jwmke/BiasCompass<br>Using LLMs to detect bias in news articles. | | Experimental |
| 16 | joaoaleite/PASTEL<br>PASTEL (Prompted weAk Supervision wiTh crEdibility signaLs) is a weakly... | | Experimental |
| 17 | datos-Fundar/sesgos_LLM<br>How do LLMs "get it wrong"? | | Experimental |
| 18 | brucelyu17/SC-TC-Bench<br>[FAccT '25] Characterizing Bias: Benchmarking LLMs in Simplified versus... | | Experimental |
| 19 | mtichikawa/llm-bias-detection<br>Research project detecting and quantifying demographic bias in language models | | Experimental |
| 20 | Wazzabeee/Bias-Mitigation-In-LLM<br>Research POC on the mitigation of bias in large language models (FLAN-T5 and... | | Experimental |
| 21 | Indiiigo/LLM_rep_review<br>Systematic Review of the Demographic Representativeness of LLMs | | Experimental |
| 22 | cognitivefactory/llm-bias-analysis<br>Benchmark tool aimed at evaluating biases of large language models | | Experimental |
| 23 | anoopkdcs/affective_bias_in_plm<br>Affective Bias in Large Pre-trained Language Models | | Experimental |