socialfoundations/folktexts
Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!
This tool helps researchers, data scientists, and analysts evaluate how well large language models (LLMs) predict real-world outcomes from survey data. You feed in an LLM and survey-derived questions, and it outputs statistical metrics on the model's uncertainty, calibration, accuracy, and fairness. It's designed for those who need to rigorously test LLM capabilities on human-centric prediction tasks.
Available on PyPI.
Use this if you are developing or deploying an LLM for tasks like income prediction or demographic analysis, and you need to thoroughly benchmark its statistical reliability and potential biases.
Not ideal if you are looking for a general-purpose LLM fine-tuning library or if your evaluation needs don't involve outcome prediction from structured survey data.
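To make concrete what metrics like calibration and fairness mean in this context, here is a minimal, self-contained sketch of the metric families described above: accuracy, expected calibration error (ECE), and a demographic-parity gap. This is an illustrative re-implementation, not folktexts' own code; all function names here are hypothetical.

```python
# Illustrative sketch of the metric families this tool reports.
# NOT folktexts' own code; names are hypothetical.
import random

def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of examples where thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(y_true, y_score))
    return correct / len(y_true)

def expected_calibration_error(y_true, y_score, n_bins=10):
    """Bin predictions by confidence; weight |accuracy - confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        bins[min(int(s * n_bins), n_bins - 1)].append((y, s))
    ece = 0.0
    for b in bins:
        if b:
            conf = sum(s for _, s in b) / len(b)
            acc = sum(y for y, _ in b) / len(b)
            ece += (len(b) / len(y_true)) * abs(acc - conf)
    return ece

def demographic_parity_gap(y_score, groups, threshold=0.5):
    """Gap in positive-prediction rate across demographic groups."""
    rates = {}
    for s, g in zip(y_score, groups):
        rates.setdefault(g, []).append(s >= threshold)
    pos_rates = [sum(v) / len(v) for v in rates.values()]
    return max(pos_rates) - min(pos_rates)

# Synthetic example data (stands in for survey-derived labels and LLM scores).
random.seed(0)
y_true = [random.randint(0, 1) for _ in range(1000)]
y_score = [0.6 * y + random.uniform(0.0, 0.4) for y in y_true]
groups = [random.choice("AB") for _ in range(1000)]
print(accuracy(y_true, y_score),
      expected_calibration_error(y_true, y_score),
      demographic_parity_gap(y_score, groups))
```

A perfectly calibrated model would have ECE near 0; a large demographic-parity gap flags that one group receives positive predictions far more often than another.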
Stars
25
Forks
4
Language
Jupyter Notebook
License
MIT
Category
Last pushed
Dec 14, 2025
Commits (30d)
0
Dependencies
14
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/socialfoundations/folktexts"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
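The curl call above can also be scripted. A minimal sketch using only the Python standard library (the endpoint path is taken from this page; the response schema is not documented here, so the JSON is returned as-is):

```python
# Fetch repo quality data from the API endpoint shown above.
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the endpoint URL, e.g. for 'socialfoundations/folktexts'."""
    return f"{BASE_URL}/{ecosystem}/{repo}"

def fetch_quality(ecosystem: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body (fields undocumented here)."""
    with urllib.request.urlopen(quality_url(ecosystem, repo)) as resp:
        return json.load(resp)

print(quality_url("transformers", "socialfoundations/folktexts"))
```

Unauthenticated calls are limited to 100 requests/day, so cache responses where possible.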
Related models
PaddlePaddle/PaddleNLP
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
meta-llama/llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started...
arcee-ai/mergekit
Tools for merging pretrained large language models.
changyeyu/LLM-RL-Visualized
100+ original LLM / RL algorithm diagrams, from the author of the book "Large Model Algorithms" (100+ LLM/RL Algorithm Maps)
mindspore-lab/step_into_llm
MindSpore online courses: Step into LLM