socialfoundations/folktexts

Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!

Score: 51 / 100 (Established)

This tool helps researchers, data scientists, and analysts evaluate how well large language models (LLMs) predict real-world outcomes from survey data. You feed in an LLM and survey-derived questions, and it outputs statistical metrics on the LLM's uncertainty, calibration, accuracy, and fairness. It is designed for anyone who needs to rigorously test LLM capabilities on human-centric prediction tasks.

Available on PyPI.
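
For a quick start, here is a minimal sketch of an end-to-end benchmarking run. It assumes the PyPI package is named folktexts and roughly follows the API shown in the project's README; treat the class names and signatures below (ACSTaskMetadata, ACSDataset, TransformersLLMClassifier, Benchmark) as assumptions to verify against the current docs.

# pip install folktexts  (assumes the PyPI name matches the repo)
from transformers import AutoModelForCausalLM, AutoTokenizer

from folktexts.acs import ACSDataset, ACSTaskMetadata
from folktexts.benchmark import Benchmark
from folktexts.classifier import TransformersLLMClassifier

# Load any causal LM from the Hugging Face hub; "gpt2" is an
# arbitrary choice for illustration.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Pick a survey-derived prediction task, e.g. income prediction
# from American Community Survey data.
task = ACSTaskMetadata.get_task("ACSIncome")
dataset = ACSDataset.make_from_task(task)

# Wrap the LLM as a classifier and run the benchmark, which reports
# accuracy, calibration, and fairness metrics.
llm_clf = TransformersLLMClassifier(model=model, tokenizer=tokenizer, task=task)
bench = Benchmark(llm_clf=llm_clf, dataset=dataset)
results = bench.run(results_root_dir="results")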

Use this if you are developing or deploying an LLM for tasks like income prediction or demographic analysis, and you need to thoroughly benchmark its statistical reliability and potential biases.

Not ideal if you are looking for a general-purpose LLM fine-tuning library or if your evaluation needs don't involve outcome prediction from structured survey data.

Tags: LLM-evaluation, survey-data-analysis, predictive-modeling, algorithmic-fairness, statistical-benchmarking
Score breakdown (the four components sum to the 51 / 100 overall):

Maintenance: 6 / 25
Adoption: 7 / 25
Maturity: 25 / 25
Community: 13 / 25


Stars: 25
Forks: 4
Language: Jupyter Notebook
License: MIT
Last pushed: Dec 14, 2025
Commits (30d): 0
Dependencies: 14

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/socialfoundations/folktexts"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
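
The same data can be fetched from Python. This sketch assumes only that the endpoint returns JSON; it makes no assumptions about the payload's field names.

import json
from urllib.request import urlopen

# Same endpoint as the curl example above; no API key is needed
# for up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/socialfoundations/folktexts"

with urlopen(URL) as resp:
    data = json.load(resp)  # assumes a JSON response body

# Pretty-print whatever fields the API returns.
print(json.dumps(data, indent=2))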