jdkato/prose
:book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
Archived
Built entirely in pure Go with no external dependencies, prose implements a modular NLP pipeline (tokenization → POS tagging → NE extraction) with functional options to disable stages as needed. Its sentence segmenter achieves 75% accuracy on the Golden Rules benchmark while executing 4× faster than Stanford CoreNLP, and its POS tagger outperforms NLTK's implementation (96.1% vs 89.3% accuracy) on the Treebank corpus. The tokenizer handles modern text artifacts like URLs, mentions, hashtags, and emoticons as distinct tokens.
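The "functional options to disable stages" design can be illustrated with a small standard-library sketch. This is not prose's implementation: `NewDocument`, `WithTagging`, `WithExtraction`, `Token`, and `toyTag` are hypothetical stand-ins chosen for illustration, the tokenizer is a plain whitespace split, and the "tagger" is a toy capitalization rule. Only the shape of the pipeline (tokenize, then tag, then extract, with later stages skippable) is taken from the description above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Token is a hypothetical token record; Tag stays empty when tagging is off.
type Token struct {
	Text string
	Tag  string
}

// Document holds the output of whichever stages actually ran.
type Document struct {
	Tokens   []Token
	Entities []string
}

// config records which optional stages are enabled.
type config struct {
	tag     bool
	extract bool
}

// Option mutates the pipeline configuration (the functional-options pattern).
type Option func(*config)

// WithTagging enables or disables the tagging stage (illustrative name).
func WithTagging(on bool) Option { return func(c *config) { c.tag = on } }

// WithExtraction enables or disables the entity stage (illustrative name).
func WithExtraction(on bool) Option { return func(c *config) { c.extract = on } }

// toyTag is a placeholder tagger: capitalized words get "NNP", others "X".
func toyTag(word string) string {
	r, _ := utf8.DecodeRuneInString(word)
	if unicode.IsUpper(r) {
		return "NNP"
	}
	return "X"
}

// NewDocument runs tokenization, then tagging, then extraction,
// skipping any stage the caller disabled. All stages default to on.
func NewDocument(text string, opts ...Option) *Document {
	cfg := config{tag: true, extract: true}
	for _, opt := range opts {
		opt(&cfg)
	}

	doc := &Document{}
	for _, w := range strings.Fields(text) { // stage 1: whitespace tokenization
		doc.Tokens = append(doc.Tokens, Token{Text: w})
	}
	if cfg.tag { // stage 2: tagging
		for i := range doc.Tokens {
			doc.Tokens[i].Tag = toyTag(doc.Tokens[i].Text)
		}
	}
	if cfg.tag && cfg.extract { // stage 3: extraction needs tags as input
		for _, t := range doc.Tokens {
			if t.Tag == "NNP" {
				doc.Entities = append(doc.Entities, t.Text)
			}
		}
	}
	return doc
}

func main() {
	text := "Go is developed at Google"
	full := NewDocument(text)
	tokensOnly := NewDocument(text, WithTagging(false), WithExtraction(false))
	fmt.Println(len(full.Tokens), full.Entities)
	fmt.Println(len(tokensOnly.Tokens), tokensOnly.Entities)
}
```

Defaulting every stage to enabled and letting callers opt out keeps the common call short, while a caller who only needs tokens avoids paying for the later, more expensive stages.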
3,069 stars. No commits in the last 6 months.
Stars: 3,069
Forks: 169
Language: Go
License: MIT
Last pushed: May 02, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/jdkato/prose"
Open to everyone: 100 requests/day with no key, or 1,000 requests/day with a free key.
Higher-rated alternatives
ikawaha/kagome-dict
Dictionary Library for Kagome v2
aaaton/golem
A lemmatizer implemented in Go
habeanf/yap
Yet Another (natural language) Parser
clipperhouse/uax29
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.
abadojack/whatlanggo
Natural language detection library for Go