gyatso736/-Tibetan-tokenizer-

This Tibetan tokenizer is based on the Bi-LSTM+CRF method; it was created to aid researchers in the field of Tibetan natural language processing.

Score: 26 / 100 (Experimental)

This tool helps researchers in Tibetan natural language processing (NLP) by breaking down large volumes of Tibetan text into individual words or tokens. You provide it with a file containing Tibetan text, and it outputs a new file with the text segmented into its constituent words. It's designed for scholars and students working with Tibetan language data for computational analysis.
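Bi-LSTM+CRF segmenters typically label each input unit with a boundary tag and then join units into words from those tags. As an illustrative sketch only (not this repository's actual code), here is how a common BIES scheme (B = begin, I = inside, E = end, S = single-unit word) can be decoded into segmented words; the example syllables and tags are hypothetical:

```python
def segment(units, tags):
    """Join input units (e.g. Tibetan syllables) into words from BIES tags.

    B = word begins, I = inside a word, E = word ends, S = single-unit word.
    """
    words, current = [], []
    for unit, tag in zip(units, tags):
        current.append(unit)
        if tag in ("E", "S"):          # a word boundary closes here
            words.append("".join(current))
            current = []
    if current:                        # tolerate a sequence ending mid-word
        words.append("".join(current))
    return words

# Hypothetical example: four syllables tagged as two two-syllable words.
syllables = ["བཀྲ་", "ཤིས་", "བདེ་", "ལེགས"]
tags = ["B", "E", "B", "E"]
print(segment(syllables, tags))  # two segmented words
```

In a real Bi-LSTM+CRF pipeline, the tags would come from the CRF decoding layer rather than being supplied by hand; the joining step above is the same either way.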

Use this if you need to preprocess large Tibetan text corpora for linguistic analysis, machine translation, or other natural language processing tasks.

Not ideal if you need perfectly accurate segmentation for very obscure or newly coined Tibetan words, as it may have limitations with unfamiliar terms.

Tibetan-language-research natural-language-processing text-segmentation linguistic-analysis
No license · No package · No dependents
Maintenance: 6 / 25
Adoption: 5 / 25
Maturity: 8 / 25
Community: 7 / 25


Stars: 10
Forks: 1
Language: Python
License: none
Last pushed: Dec 16, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/gyatso736/-Tibetan-tokenizer-"
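For scripted access, the same endpoint can be called from Python. This is a minimal sketch that only builds the request URL shown above (with path segments URL-escaped, since the repository name starts and ends with a hyphen); the response format is assumed to be JSON and is not shown here:

```python
from urllib.parse import quote

# Base endpoint as given on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL, escaping each path segment."""
    return f"{BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"

url = quality_url("gyatso736", "-Tibetan-tokenizer-")
print(url)
# Fetch with e.g. urllib.request.urlopen(url) or the curl command above.
```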

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.