socialfoundations/folktexts

Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!

Score: 51 / 100 (Established)

This tool helps researchers, data scientists, and analysts evaluate how well large language models (LLMs) predict real-world outcomes from survey data. You feed in an LLM and survey-derived questions, and it outputs statistical metrics on the LLM's uncertainty, calibration, accuracy, and fairness. It is designed for anyone who needs to rigorously test LLM capabilities on human-centric prediction tasks.

Available on PyPI.
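
For a quick start, here is a minimal sketch of an end-to-end benchmarking run. It assumes the PyPI package is named folktexts and roughly follows the API shown in the project's README; treat the class names and signatures below (ACSTaskMetadata, ACSDataset, TransformersLLMClassifier, Benchmark) as assumptions to verify against the current docs.

# pip install folktexts  (assumes the PyPI name matches the repo)
from transformers import AutoModelForCausalLM, AutoTokenizer

from folktexts.acs import ACSDataset, ACSTaskMetadata
from folktexts.benchmark import Benchmark
from folktexts.classifier import TransformersLLMClassifier

# Load any causal LM from the Hugging Face hub; "gpt2" is an
# arbitrary choice for illustration.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Pick a survey-derived prediction task, e.g. income prediction
# from American Community Survey data.
task = ACSTaskMetadata.get_task("ACSIncome")
dataset = ACSDataset.make_from_task(task)

# Wrap the LLM as a classifier and run the benchmark, which reports
# accuracy, calibration, and fairness metrics.
llm_clf = TransformersLLMClassifier(model=model, tokenizer=tokenizer, task=task)
bench = Benchmark(llm_clf=llm_clf, dataset=dataset)
results = bench.run(results_root_dir="results")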

Use this if you are developing or deploying an LLM for tasks like income prediction or demographic analysis, and you need to thoroughly benchmark its statistical reliability and potential biases.

Not ideal if you are looking for a general-purpose LLM fine-tuning library or if your evaluation needs don't involve outcome prediction from structured survey data.

Tags: LLM-evaluation, survey-data-analysis, predictive-modeling, algorithmic-fairness, statistical-benchmarking
Score breakdown (the four components sum to the 51 / 100 overall):

Maintenance: 6 / 25
Adoption: 7 / 25
Maturity: 25 / 25
Community: 13 / 25


Stars: 25
Forks: 4
Language: Jupyter Notebook
License: MIT
Last pushed: Dec 14, 2025
Commits (30d): 0
Dependencies: 14

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/socialfoundations/folktexts"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
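
The same data can be fetched from Python. This sketch assumes only that the endpoint returns JSON; it makes no assumptions about the payload's field names.

import json
from urllib.request import urlopen

# Same endpoint as the curl example above; no API key is needed
# for up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/socialfoundations/folktexts"

with urlopen(URL) as resp:
    data = json.load(resp)  # assumes a JSON response body

# Pretty-print whatever fields the API returns.
print(json.dumps(data, indent=2))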