ruanchaves/hashformers

Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters and massive LLMs.

Emerging

hashformers splits text that is missing spaces between words, such as #weneedanationalpark or #москвасити, into individual, readable words. It takes unsegmented strings as input and returns the segmented phrases. It is aimed at data scientists, social media analysts, and NLP researchers who need to clean and prepare text data in any language for further analysis.
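To make the idea of beam-search segmentation concrete, here is a minimal, self-contained sketch. It is not the hashformers implementation. The names `TOY_COUNTS`, `word_logprob`, and `segment` are hypothetical, and a tiny hand-made frequency table stands in for the Transformer language model that would score candidates in practice. At each character position the search keeps only the top `beam_width` partial segmentations.

```python
import math

# Hypothetical toy vocabulary with made-up counts. It is an assumption that
# stands in for a Transformer language model used as the scorer.
TOY_COUNTS = {
    "we": 50, "need": 30, "a": 80, "national": 10, "park": 12,
    "nation": 8, "an": 40, "al": 1, "par": 1,
}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in TOY_COUNTS)


def word_logprob(word):
    """Log-probability of a word; unknown words get a steep per-character penalty."""
    if word in TOY_COUNTS:
        return math.log(TOY_COUNTS[word] / TOTAL)
    return -10.0 * len(word)


def segment(text, beam_width=3):
    """Split `text` into words, keeping `beam_width` hypotheses per position."""
    text = text.lstrip("#").lower()
    # beams[i] holds (score, words) hypotheses that cover text[:i].
    beams = [[] for _ in range(len(text) + 1)]
    beams[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        # Try every word that could end at `end`, up to the longest known word.
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            for score, words in beams[start]:
                candidates.append((score + word_logprob(word), words + [word]))
        # Prune: keep only the best-scoring partial segmentations.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams[end] = candidates[:beam_width]
    best_score, best_words = beams[-1][0]
    return " ".join(best_words)


print(segment("#weneedanationalpark"))  # we need a national park
```

A real system would replace `word_logprob` with scores from a language model, which is what lets the approach handle vocabulary that no fixed dictionary covers.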

Available on PyPI.

Use this if you need to precisely segment text like hashtags or concatenated words at scale, especially when working with various languages or niche vocabularies where pre-built dictionaries are insufficient.

Not ideal if you need very low latency or throughput so high that even small language models are too slow, or if you only have a handful of items to segment.

social-media-analysis text-preprocessing natural-language-processing data-cleaning multilingual-text

Maintenance 6 / 25

Adoption 9 / 25

Maturity 25 / 25

Community 8 / 25

Language: Python

License: MIT

Higher-rated alternatives

ThilinaRajapakse/simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling,...

jsksxs360/How-to-use-Transformers

Quick-start tutorial for the Transformers library

google/deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences...

Denis2054/Transformers-for-NLP-2nd-Edition

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning,...

abhimishra91/transformers-tutorials

GitHub repo with tutorials on fine-tuning transformers for different NLP tasks
