rmovva/HypotheSAEs

HypotheSAEs: hypothesizing interpretable relationships in text datasets using sparse autoencoders. https://arxiv.org/abs/2502.04382

64 / 100 (Established)

HypotheSAEs helps researchers and analysts find the patterns in a large text dataset that predict an outcome, such as why certain news headlines get more clicks or which political party a speech belongs to. You supply a collection of texts alongside a target outcome (e.g., engagement metrics, political affiliation), and it outputs human-readable descriptions of the concepts in the text that predict that outcome. It is aimed at anyone working with text data who needs to understand what drives an observed trend or classification.

Available on PyPI.

Use this if you have a dataset of texts and a related variable, and you want to discover specific concepts or themes within those texts that explain why that variable changes or behaves the way it does.

Not ideal if your text data offers no predictive signal for your target variable, or if you need to interpret documents longer than 500 words without first chunking or summarizing them.
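For documents over the 500-word mark, one simple preprocessing option is to split each text into fixed-size word chunks before analysis. The following is a minimal sketch in plain Python, not part of HypotheSAEs itself: the function name `chunk_words` and the whitespace-based word count are assumptions made for illustration, and each chunk would need to inherit its parent document's target value.

```python
def chunk_words(text: str, max_words: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper (not a HypotheSAEs function). Words are counted
    by splitting on whitespace, which is an assumption about how the
    500-word guideline is measured.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # Step through the word list in strides of max_words.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(1200))
    print([len(chunk.split()) for chunk in chunk_words(doc)])  # [500, 500, 200]
```

Fixed-size chunks can cut a sentence in half; splitting on paragraph or sentence boundaries, or summarizing instead, may preserve more meaning at the cost of uneven chunk lengths.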

text-analytics market-research social-science content-strategy political-science
Maintenance 10 / 25
Adoption 9 / 25
Maturity 25 / 25
Community 20 / 25

Stars: 77
Forks: 24
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Mar 08, 2026
Commits (30d): 0
Dependencies: 16

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/rmovva/HypotheSAEs"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
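The same request can be made from Python with the standard library. This is a sketch under stated assumptions: the `category/owner/repo` path layout is inferred from the single URL shown in the curl command, the response is assumed to be JSON (its fields are not documented here, so none are accessed), and the names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segments after it are assumed
# to be category, owner, and repository name, in that order.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response body, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("nlp", "rmovva", "HypotheSAEs")` issues the same GET request as the curl command and counts against the daily request limit.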