jdkato/prose

A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

Archived

/ 100

Emerging

Written in pure Go with no external dependencies, prose implements a modular NLP pipeline (tokenization → POS tagging → NE extraction) with functional options for disabling stages that are not needed. Its sentence segmenter achieves 75% accuracy on the Golden Rules benchmark while executing 4× faster than Stanford CoreNLP, and its POS tagger outperforms NLTK's implementation (96.1% vs 89.3% accuracy) on the Treebank corpus. The tokenizer keeps modern text artifacts such as URLs, mentions, hashtags, and emoticons as distinct tokens.
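To make the functional-options idea concrete, here is a minimal, self-contained sketch of a staged pipeline whose later stages can be switched off. It is not prose's implementation: the type `Doc`, the constructor `NewDoc`, the option names, the regular expression, and the toy tagging and extraction rules are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Doc holds the output of each stage. All names in this sketch are
// hypothetical and do not mirror prose's actual types.
type Doc struct {
	Tokens   []string
	Tags     []string // one coarse tag per token; nil when tagging is disabled
	Entities []string // toy "entities"; nil when extraction is disabled
}

// config records which optional stages should run.
type config struct {
	tag     bool
	extract bool
}

// Option is a functional option that adjusts the pipeline configuration.
type Option func(*config)

// WithTagging enables or disables the (toy) tagging stage.
func WithTagging(on bool) Option { return func(c *config) { c.tag = on } }

// WithExtraction enables or disables the (toy) extraction stage.
func WithExtraction(on bool) Option { return func(c *config) { c.extract = on } }

// tokenRe tries URLs, then mentions and hashtags, before plain words,
// so each of those survives as a single token.
var tokenRe = regexp.MustCompile(`https?://\S+|[@#]\w+|\w+|[^\w\s]`)

// coarseTag is a stand-in for a real POS tagger: it only labels token shapes.
func coarseTag(tok string) string {
	switch {
	case strings.HasPrefix(tok, "http://"), strings.HasPrefix(tok, "https://"):
		return "URL"
	case len(tok) > 1 && strings.HasPrefix(tok, "@"):
		return "MENTION"
	case len(tok) > 1 && strings.HasPrefix(tok, "#"):
		return "HASHTAG"
	default:
		return "OTHER"
	}
}

// NewDoc always tokenizes, then runs each optional stage unless disabled.
func NewDoc(text string, opts ...Option) Doc {
	cfg := config{tag: true, extract: true} // every stage is on by default
	for _, opt := range opts {
		opt(&cfg)
	}
	doc := Doc{Tokens: tokenRe.FindAllString(text, -1)}
	if cfg.tag {
		for _, tok := range doc.Tokens {
			doc.Tags = append(doc.Tags, coarseTag(tok))
		}
	}
	if cfg.extract {
		// Toy rule: treat tokens starting with an ASCII capital as entities.
		for _, tok := range doc.Tokens {
			if tok[0] >= 'A' && tok[0] <= 'Z' {
				doc.Entities = append(doc.Entities, tok)
			}
		}
	}
	return doc
}

func main() {
	text := "ask @gopher about Go at https://go.dev #nlp"
	full := NewDoc(text)
	fmt.Println(full.Tokens, full.Tags, full.Entities)

	// Skip the later stages when only tokens are needed.
	lean := NewDoc(text, WithTagging(false), WithExtraction(false))
	fmt.Println(lean.Tokens, lean.Tags == nil, lean.Entities == nil)
}
```

The design point is that callers who only need tokens pay nothing for the stages they disable, and new stages can be added later without changing the constructor's signature.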

3,069 stars. No commits in the last 6 months.

Archived · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars: 3,069

Forks: 169

Language: Go

License: MIT

Higher-rated alternatives

ikawaha/kagome-dict

Dictionary Library for Kagome v2

aaaton/golem

A lemmatizer implemented in Go

habeanf/yap

Yet Another (natural language) Parser

clipperhouse/uax29

A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.

abadojack/whatlanggo

Natural language detection library for Go
