sean-chester/generalised-brown
C++ implementation of Generalised Brown clustering and Python scripts for feature generation
This tool helps researchers and natural language processing practitioners group similar words together based on how they're used in text. You provide a text corpus, and it generates lists of words that belong in the same "cluster." This is useful for understanding word relationships or preparing data for other language models.
No commits in the last 6 months.
Use this if you need word clusters at flexible granularity: once the merge list has been computed, you can choose how many clusters to generate from it without re-clustering the corpus.
Not ideal if you're looking for a simple, out-of-the-box solution that doesn't require compiling C++ code or running Python scripts via the command line.
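The idea of picking a cluster count from a pre-computed merge list can be illustrated with a minimal Python sketch. It assumes a hypothetical merge-list format (an ordered list of word pairs, earliest merge first) and simply replays merges until the requested number of clusters remains. This is not the file format or the exact procedure used by the repository's C++ tool; the function name `clusters_from_merges` and the toy data are invented for illustration.

```python
from collections import defaultdict


def clusters_from_merges(words, merges, k):
    """Replay merges until only k clusters remain, then return the clusters.

    words  : list of distinct words, each starting in its own cluster
    merges : list of (word_a, word_b) pairs, earliest merge first; each pair
             joins the clusters that currently contain the two words
             (hypothetical format, assumed for this sketch)
    k      : desired number of clusters, 1 <= k <= len(words)
    """
    if not 1 <= k <= len(words):
        raise ValueError("k must be between 1 and the number of words")

    # Union-find structure: every word starts as its own representative.
    parent = {w: w for w in words}

    def find(w):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    remaining = len(words)
    for a, b in merges:
        if remaining == k:
            break  # stop replaying once the requested granularity is reached
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
            remaining -= 1

    if remaining > k:
        raise ValueError("merge list is too short to reach k clusters")

    # Collect words by representative and return them in a stable order.
    groups = defaultdict(list)
    for w in words:
        groups[find(w)].append(w)
    return sorted(sorted(g) for g in groups.values())


# Toy data, invented for illustration only.
words = ["cat", "dog", "paris", "london", "run", "walk"]
merges = [
    ("cat", "dog"),
    ("paris", "london"),
    ("run", "walk"),
    ("cat", "run"),
    ("cat", "paris"),
]

print(clusters_from_merges(words, merges, 3))
# [['cat', 'dog'], ['london', 'paris'], ['run', 'walk']]
print(clusters_from_merges(words, merges, 2))
# [['cat', 'dog', 'run', 'walk'], ['london', 'paris']]
```

The same merge list serves every value of `k`, so changing the granularity only means replaying more or fewer merges rather than running the clustering again.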
Stars: 41
Forks: 5
Language: C++
License: not specified
Category: not listed
Last pushed: Apr 08, 2016
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/sean-chester/generalised-brown"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
MIND-Lab/OCTIS
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models...
i-dot-ai/themefinder
A topic modelling Python package for analysing one-to-many question-answer data.
andifunke/topic-labeling
The project proposes a framework to apply topic models on a text-corpus and eventually topic...
bab2min/tomotopy
Python package of Tomoto, the Topic Modeling Tool
bobxwu/TopMost
A Topic Modeling System Toolkit (ACL 2024 Demo)