sean-chester/generalised-brown

C++ implementation of Generalised Brown clustering and python scripts for feature generation

Score: 26 / 100 (Experimental)

This tool helps researchers and natural language processing practitioners group similar words together based on how they're used in text. You provide a text corpus, and it generates lists of words that belong in the same "cluster." This is useful for understanding word relationships or preparing data for other language models.

No commits in the last 6 months.

Use this if you need word clusters at flexible granularity: the merge list is computed once, and you then choose how many clusters to generate from it.
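The idea of choosing a cluster count from a pre-computed merge list can be illustrated with a minimal sketch. It is not this repository's code or file format: the function name `cut_merge_list`, the representation of the merge list as ordered pairs of words, and the sample data are all assumptions made for illustration. The sketch replays merges in order and stops as soon as the requested number of clusters remains.

```python
from typing import Dict, List, Tuple


def cut_merge_list(
    words: List[str],
    merges: List[Tuple[str, str]],
    num_clusters: int,
) -> List[List[str]]:
    """Replay merges in order until only num_clusters clusters remain.

    Hypothetical format: each merge is a pair of words whose current
    clusters are joined. The real tool's merge-list format may differ.
    """
    if not 1 <= num_clusters <= len(words):
        raise ValueError("num_clusters must be between 1 and len(words)")

    # Union-find: every word starts as the root of its own cluster.
    parent: Dict[str, str] = {w: w for w in words}

    def find(w: str) -> str:
        # Walk up to the cluster root, shortening the path as we go.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    remaining = len(words)
    for a, b in merges:
        if remaining <= num_clusters:
            break  # the requested granularity has been reached
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
            remaining -= 1

    # Group words by cluster root, preserving the input word order.
    clusters: Dict[str, List[str]] = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())


# Made-up sample data, for illustration only.
words = ["cat", "dog", "car", "bus", "run"]
merges = [("cat", "dog"), ("car", "bus"), ("cat", "run"), ("cat", "car")]
three = cut_merge_list(words, merges, 3)
two = cut_merge_list(words, merges, 2)
```

Because the merges are only replayed, the same merge list serves any cluster count without re-running the clustering itself. If the list runs out before the requested count is reached, this sketch simply returns the clusters that remain.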

Not ideal if you're looking for a simple, out-of-the-box solution that doesn't require compiling C++ code or running Python scripts via the command line.

natural-language-processing computational-linguistics text-analysis feature-engineering semantic-similarity
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 11 / 25

Stars: 41
Forks: 5
Language: C++
License: none
Last pushed: Apr 08, 2016
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/sean-chester/generalised-brown"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.