gyatso736/-Tibetan-tokenizer-
This Tibetan tokenizer is based on a Bi-LSTM+CRF model and was created to aid researchers in the field of Tibetan natural language processing.
The tool helps researchers in Tibetan natural language processing (NLP) by breaking large volumes of Tibetan text into individual words or tokens. Given a file containing Tibetan text, it outputs a new file with the text segmented into its constituent words. It is designed for scholars and students working with Tibetan language data for computational analysis.
Use this if you need to preprocess large Tibetan text corpora for linguistic analysis, machine translation, or other natural language processing tasks.
Not ideal if you need perfectly accurate segmentation for very obscure or newly coined Tibetan words, as it may have limitations with unfamiliar terms.
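In a Bi-LSTM+CRF segmenter of this kind, the model typically tags each tsheg-delimited syllable with a label such as B/M/E/S (begin/middle/end/single), and consecutive tags are merged back into words. The sketch below shows only that final merge step, with hand-supplied tags standing in for real model output; it is an illustration of the general technique, not code from this repository.

```python
# Minimal sketch of the word-recovery step after BMES tagging.
# The tags here are stand-ins for what a trained Bi-LSTM+CRF
# model would predict for each syllable.
TSHEG = "\u0f0b"  # Tibetan syllable delimiter ་

def syllables(text: str) -> list[str]:
    """Split Tibetan text into tsheg-delimited syllables."""
    return [s for s in text.split(TSHEG) if s]

def merge_bmes(sylls: list[str], tags: list[str]) -> list[str]:
    """Merge syllables back into words using B/M/E/S tags."""
    words, current = [], []
    for syl, tag in zip(sylls, tags):
        current.append(syl)
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(TSHEG.join(current))
            current = []
    if current:  # dangling B/M tags: emit what we have
        words.append(TSHEG.join(current))
    return words
```

For example, `merge_bmes(["བོད", "ཡིག"], ["B", "E"])` joins the two syllables into a single word, while an `S` tag leaves a syllable as a standalone word.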
Stars
10
Forks
1
Language
Python
License
—
Category
NLP
Last pushed
Dec 16, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/gyatso736/-Tibetan-tokenizer-"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
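The same endpoint can be called from Python with only the standard library. The URL pattern is taken from the curl example above; the structure of the JSON response is an assumption, so the sketch only builds the URL and fetches the raw payload.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the API URL for a repo (anonymous tier: 100 requests/day)."""
    return f"{API_BASE}/{category}/{repo}"

def fetch_quality(category: str, repo: str) -> dict:
    """Fetch and decode the JSON payload; field names are not documented here."""
    with urllib.request.urlopen(quality_url(category, repo)) as resp:
        return json.load(resp)

url = quality_url("nlp", "gyatso736/-Tibetan-tokenizer-")
```

With an API key (1,000 requests/day), you would presumably attach it as a header or query parameter; the exact mechanism is not specified on this page.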
Higher-rated alternatives
PyThaiNLP/pythainlp
Thai natural language processing in Python
hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named...
jacksonllee/pycantonese
Cantonese Linguistics and NLP
dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com
hankcs/pyhanlp
Chinese word segmentation