ruanchaves/hashformers
Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters and massive LLMs.
This tool splits text that is missing spaces between words, such as the hashtags #weneedanationalpark or #москвасити, into individual, readable words. It takes unsegmented strings as input and returns the segmented phrases. It is aimed at data scientists, social media analysts, and NLP researchers who need to clean and prepare text data, in any language, for further analysis.
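To make the idea of beam-search segmentation concrete, here is a minimal, self-contained sketch. It is not the hashformers implementation or its API: the `TOY_LOGPROB` table, the `UNKNOWN_CHAR_PENALTY` value, and the functions `score_word` and `beam_segment` are all hypothetical, and the toy word scores stand in for the Transformer language model that the real tool would use to rank candidates.

```python
# Toy log-scores standing in for a language model (values invented for illustration).
TOY_LOGPROB = {
    "we": -2.0, "need": -3.0, "a": -1.5, "national": -4.0, "park": -3.5,
    "ice": -3.0, "cold": -3.0,
}
UNKNOWN_CHAR_PENALTY = -10.0  # hypothetical cost per character of an unknown word


def score_word(word):
    """Return a log-score for one candidate word (higher is better)."""
    if word in TOY_LOGPROB:
        return TOY_LOGPROB[word]
    return UNKNOWN_CHAR_PENALTY * len(word)


def beam_segment(text, beam_width=3, max_word_len=10):
    """Segment `text` with a beam-pruned search over split positions."""
    text = text.lstrip("#").lower()
    # beams[i] holds the best (score, words) hypotheses that cover text[:i].
    beams = {0: [(0.0, [])]}
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            for score, words in beams[start]:
                candidates.append((score + score_word(word), words + [word]))
        # Keep only the `beam_width` highest-scoring hypotheses for this prefix.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams[end] = candidates[:beam_width]
    best_score, best_words = beams[len(text)][0]
    return " ".join(best_words)


print(beam_segment("#weneedanationalpark"))  # we need a national park
print(beam_segment("#icecold"))              # ice cold
```

The beam width trades accuracy for speed: a wider beam keeps more partial segmentations alive at each position, at the cost of more scoring calls. With a real language model as the scorer, each of those calls is far more expensive than the dictionary lookup used here.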
Available on PyPI.
Use this if you need to precisely segment text like hashtags or concatenated words at scale, especially when working with various languages or niche vocabularies where pre-built dictionaries are insufficient.
Not ideal if you need very low latency and throughput so high that even small language models are too slow, or if you only have a handful of items to segment.
Stars: 77
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Jan 08, 2026
Commits (30d): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ruanchaves/hashformers"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
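The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the page does not confirm: that the response body is JSON, and that the URL path follows the `category/owner/repo` shape seen in the single curl example. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL in the same shape as the curl example (assumed pattern)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one repository, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "ruanchaves", "hashformers")` requests the same URL as the curl command. Because the keyless tier is limited to 100 requests/day, caching the result locally is a sensible default when polling many repositories.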
Higher-rated alternatives
ThilinaRajapakse/simpletransformers
Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling,...
jsksxs360/How-to-use-Transformers
Quick-start tutorial for the Transformers library
google/deepconsensus
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences...
Denis2054/Transformers-for-NLP-2nd-Edition
Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning,...
abhimishra91/transformers-tutorials
GitHub repo with tutorials to fine-tune transformers for different NLP tasks