JasonKessler/scattertext

Beautiful visualizations of how language differs among document types.


Established

This tool helps you quickly understand the key differences in language used across distinct groups of text, such as political speeches, customer reviews, or scientific abstracts. You supply a collection of documents along with each document's category (e.g., 'Democrat' or 'Republican'). The output is an interactive scatter plot that highlights which words and phrases are most characteristic of each category, so you can easily identify distinguishing terms. It suits researchers, marketers, analysts, or anyone who needs to compare language patterns between two defined groups of text.
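The idea of scoring how strongly each word leans toward one category can be illustrated with a small, dependency-free sketch. It does not call scattertext; it computes a smoothed log-odds ratio per term, which is one common association score and is used here as an assumption, not as the scoring any particular scattertext plot uses. The function names `tokenize` and `log_odds_ratios`, the smoothing constant `alpha`, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes; a stand-in for a real tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def log_odds_ratios(docs_a, docs_b, alpha=0.5):
    """Smoothed log-odds ratio of each term in category A versus category B.

    Positive scores lean toward A, negative toward B. `alpha` is an additive
    smoothing constant (hypothetical default) so unseen terms never give log(0).
    """
    counts_a = Counter(tok for d in docs_a for tok in tokenize(d))
    counts_b = Counter(tok for d in docs_b for tok in tokenize(d))
    vocab = set(counts_a) | set(counts_b)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct terms")
    v = len(vocab)
    total_a = sum(counts_a.values()) + alpha * v  # smoothed token total for A
    total_b = sum(counts_b.values()) + alpha * v  # smoothed token total for B
    scores = {}
    for term in vocab:
        a = counts_a[term] + alpha
        b = counts_b[term] + alpha
        # Odds = (this term's count) / (count of all other terms) in each category.
        scores[term] = math.log(a / (total_a - a)) - math.log(b / (total_b - b))
    return scores


# Hypothetical toy corpus with two categories.
dem = ["we will protect health care for working families",
       "health care is a right for every family"]
rep = ["we will cut taxes and grow small business",
       "lower taxes help small business owners"]

scores = log_odds_ratios(dem, rep)
top_a = sorted(scores, key=scores.get, reverse=True)[:3]  # most category-A terms
top_b = sorted(scores, key=scores.get)[:3]                # most category-B terms
print("Leans Democrat:", top_a)
print("Leans Republican:", top_b)
```

A scatter plot like the one described above would place each term by its frequency in the two categories; a score such as this one can then be used to decide which terms to color or label as distinctive.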

2,330 stars. No commits in the last 6 months. Available on PyPI.

Use this if you have collections of text categorized into two groups and want to visually discover the specific words and phrases that are most distinctive to each group.

Not ideal if you need to analyze unstructured text without predefined categories, or if you want in-depth statistical modeling of language similarities without a visual output.

text-analysis content-comparison market-research political-science discourse-analysis

Stale 6m

Maintenance 2 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 21 / 25


Stars: 2,330

Forks: 288

Language: Python

License: Apache-2.0

Related frameworks

skrub-data/skrub

Machine learning with dataframes

biolab/orange3

🍊 📊 💡 Orange: Interactive data analysis

root-project/root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

cleanlab/cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and...

drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.
