JasonKessler/scattertext

Beautiful visualizations of how language differs among document types.

58 / 100
Established

This tool helps you quickly see how language differs across distinct groups of text, such as political speeches, customer reviews, or scientific abstracts. You supply a collection of documents, each labeled with a category (e.g., 'Democrat' or 'Republican'). The output is an interactive scatter plot that highlights the words and phrases most characteristic of each category, so distinguishing terms are easy to spot. It suits researchers, marketers, analysts, or anyone who needs to compare language patterns between two defined text groups.
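To make the idea of "characteristic" terms concrete, here is a minimal standard-library sketch that ranks words by a smoothed log ratio of their relative frequency in one category versus the rest. It is a simplified illustration only and does not use scattertext or claim to reproduce its scoring; the function name `distinctive_terms`, its parameters, and the toy documents are all hypothetical.

```python
import math
import re
from collections import Counter


def distinctive_terms(docs, category, top_n=3, smoothing=1.0):
    """Rank words by smoothed log frequency ratio: `category` vs. all other docs.

    `docs` is a list of (category_label, text) pairs. This is an illustrative
    scoring scheme, not the one scattertext itself uses.
    """
    in_counts, out_counts = Counter(), Counter()
    for label, text in docs:
        tokens = re.findall(r"[a-z']+", text.lower())
        (in_counts if label == category else out_counts).update(tokens)

    vocab = set(in_counts) | set(out_counts)
    # Add-k smoothing keeps the ratio finite for words absent from one side.
    in_total = sum(in_counts.values()) + smoothing * len(vocab)
    out_total = sum(out_counts.values()) + smoothing * len(vocab)

    scores = {}
    for term in vocab:
        p_in = (in_counts[term] + smoothing) / in_total
        p_out = (out_counts[term] + smoothing) / out_total
        scores[term] = math.log(p_in / p_out)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Toy, made-up documents in two categories.
docs = [
    ("Democrat", "health care jobs health care"),
    ("Republican", "tax cuts jobs tax cuts"),
]
top_democrat = distinctive_terms(docs, "Democrat", top_n=2)
top_republican = distinctive_terms(docs, "Republican", top_n=2)
print(top_democrat, top_republican)
```

A word shared equally by both groups ("jobs" here) scores zero and drops out of the top of either ranking, which is the same intuition the scatter plot conveys visually.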

2,330 stars. No commits in the last 6 months. Available on PyPI.

Use this if you have collections of text categorized into two groups and want to visually discover the specific words and phrases that are most distinctive to each group.

Not ideal if you need to analyze unstructured text without predefined categories or want to perform deep, statistical modeling of language similarities without a visual output.

text-analysis content-comparison market-research political-science discourse-analysis
Stale 6m
Maintenance 2 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 21 / 25


Stars: 2,330
Forks: 288
Language: Python
License: Apache-2.0
Last pushed: Apr 29, 2025
Commits (30d): 0
Dependencies: 9

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/JasonKessler/scattertext"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
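The same request can be sketched in Python with only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and treating the response as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout inferred from the one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record; assumes the body is JSON (not confirmed)."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("ml-frameworks", "JasonKessler", "scattertext")
```

With the unauthenticated limit of 100 requests per day, it is worth caching results locally instead of re-fetching the same repository.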