clipperhouse/jargon
Tokenizers and lemmatizers for Go
This library processes text to collapse variants of a term, such as 'React.js' and 'REACTJS', into a single canonical form, 'reactjs'. It takes unstructured text, such as job descriptions or articles, and outputs a stream of consistent, standardized terms. It is useful for anyone building search applications, analyzing text data, or enforcing a consistent vocabulary across large text corpora.
113 stars. No commits in the last 6 months.
Use this if you need to clean and standardize technical terms, company names, or other domain-specific jargon in text for consistent searching or analysis.
Not ideal if your primary need is general-purpose linguistic analysis like grammar checking or sentiment analysis, rather than term standardization.
Stars: 113
Forks: 3
Language: Go
License: MIT
Category:
Last pushed: Sep 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/clipperhouse/jargon"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ikawaha/kagome-dict: Dictionary Library for Kagome v2
aaaton/golem: A lemmatizer implemented in Go
habeanf/yap: Yet Another (natural language) Parser
clipperhouse/uax29: A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.
abadojack/whatlanggo: Natural language detection library for Go