gyatso736/-Tibetan-tokenizer-
This Tibetan tokenizer is based on a Bi-LSTM+CRF model and was created to aid researchers in the field of Tibetan natural language processing.
The tool helps researchers in Tibetan natural language processing (NLP) by breaking large volumes of Tibetan text into individual words or tokens. Given a file containing Tibetan text, it outputs a new file with the text segmented into its constituent words. It is designed for scholars and students working with Tibetan language data for computational analysis.
Use this if you need to preprocess large Tibetan text corpora for linguistic analysis, machine translation, or other natural language processing tasks.
Not ideal if you need perfectly accurate segmentation for very obscure or newly coined Tibetan words, as it may have limitations with unfamiliar terms.
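In a Bi-LSTM+CRF segmenter of this kind, the model typically tags each tsheg-delimited syllable with a label such as B/M/E/S (begin/middle/end/single), and consecutive tags are merged back into words. The sketch below shows only that final merge step, with hand-supplied tags standing in for real model output; it is an illustration of the general technique, not code from this repository.

```python
# Minimal sketch of the word-recovery step after BMES tagging.
# The tags here are stand-ins for what a trained Bi-LSTM+CRF
# model would predict for each syllable.
TSHEG = "\u0f0b"  # Tibetan syllable delimiter ་

def syllables(text: str) -> list[str]:
    """Split Tibetan text into tsheg-delimited syllables."""
    return [s for s in text.split(TSHEG) if s]

def merge_bmes(sylls: list[str], tags: list[str]) -> list[str]:
    """Merge syllables back into words using B/M/E/S tags."""
    words, current = [], []
    for syl, tag in zip(sylls, tags):
        current.append(syl)
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(TSHEG.join(current))
            current = []
    if current:  # dangling B/M tags: emit what we have
        words.append(TSHEG.join(current))
    return words
```

For example, `merge_bmes(["བོད", "ཡིག"], ["B", "E"])` joins the two syllables into a single word, while an `S` tag leaves a syllable as a standalone word.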
Stars
10
Forks
1
Language
Python
License
—
Category
NLP
Last pushed
Dec 16, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/gyatso736/-Tibetan-tokenizer-"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
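The same endpoint can be called from Python with only the standard library. The URL pattern is taken from the curl example above; the structure of the JSON response is an assumption, so the sketch only builds the URL and fetches the raw payload.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the API URL for a repo (anonymous tier: 100 requests/day)."""
    return f"{API_BASE}/{category}/{repo}"

def fetch_quality(category: str, repo: str) -> dict:
    """Fetch and decode the JSON payload; field names are not documented here."""
    with urllib.request.urlopen(quality_url(category, repo)) as resp:
        return json.load(resp)

url = quality_url("nlp", "gyatso736/-Tibetan-tokenizer-")
```

With an API key (1,000 requests/day), you would presumably attach it as a header or query parameter; the exact mechanism is not specified on this page.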
Higher-rated alternatives
PyThaiNLP/pythainlp
Thai natural language processing in Python
hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named...
jacksonllee/pycantonese
Cantonese Linguistics and NLP
dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com
hankcs/pyhanlp
Chinese word segmentation