JasonKessler/scattertext
Beautiful visualizations of how language differs among document types.
This tool helps you quickly see how language differs across distinct groups of text, such as political speeches, customer reviews, or scientific abstracts. You supply a collection of documents with their assigned categories (e.g., 'Democrat' or 'Republican'). The output is an interactive scatter plot that highlights the words and phrases most characteristic of each category, so the distinguishing terms are easy to spot. This is ideal for researchers, marketers, analysts, or anyone who needs to compare language patterns between two defined text groups.
2,330 stars. No commits in the last 6 months. Available on PyPI.
Use this if you have collections of text categorized into two groups and want to visually discover the specific words and phrases that are most distinctive to each group.
Not ideal if you need to analyze unstructured text without predefined categories or want to perform deep, statistical modeling of language similarities without a visual output.
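To make the idea of "distinctive terms" concrete, here is a minimal standard-library sketch. It does not use scattertext's API and is not the scoring scattertext applies; it swaps in a simple add-one-smoothed log-odds ratio purely as an illustration. The function names and the tiny sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_odds_scores(docs, category_a, category_b):
    """Score each term by its smoothed log-odds between two categories.

    docs is a list of (category, text) pairs. A positive score means the
    term leans toward category_a; a negative score leans toward category_b.
    """
    counts = {category_a: Counter(), category_b: Counter()}
    for category, text in docs:
        if category in counts:
            counts[category].update(tokenize(text))

    vocab = set(counts[category_a]) | set(counts[category_b])
    total_a = sum(counts[category_a].values())
    total_b = sum(counts[category_b].values())

    scores = {}
    for term in vocab:
        # Add-one smoothing keeps terms that are absent from one category finite.
        p_a = (counts[category_a][term] + 1) / (total_a + len(vocab))
        p_b = (counts[category_b][term] + 1) / (total_b + len(vocab))
        scores[term] = math.log(p_a / p_b)
    return scores


# Hypothetical toy corpus: two categories, two short documents each.
docs = [
    ("democrat", "jobs and health care for the middle class"),
    ("democrat", "health care is a right"),
    ("republican", "cut taxes and reduce spending"),
    ("republican", "lower taxes create jobs"),
]

scores = log_odds_scores(docs, "democrat", "republican")
for term in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(f"{term}: {scores[term]:+.2f}")
```

A scatter plot of per-category frequencies conveys the same information visually: terms far from the diagonal are the ones a score like this ranks highest or lowest.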
Stars: 2,330
Forks: 288
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 29, 2025
Commits (30d): 0
Dependencies: 9
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/JasonKessler/scattertext"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
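The same request can be made from Python with only the standard library. This is a sketch under assumptions: the path layout (category, then owner, then repository) is inferred from the single URL shown above, the helper names are hypothetical, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL] + segments)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the endpoint and return the response body as text.

    The response schema is unknown here, so no parsing is attempted.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("ml-frameworks", "JasonKessler", "scattertext")` should issue the same GET request as the curl command; keep the stated 100 requests/day keyless limit in mind when scripting against it.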
Related frameworks
skrub-data/skrub
Machine learning with dataframes
biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.