OpenPecha/Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

Score: 66 / 100 (Established)

This tool helps researchers, linguists, and anyone else working with Tibetan texts automatically segment raw Tibetan text into individual words. You provide a block of Tibetan text or a document, and it returns the text with word boundaries marked, optionally annotated with grammatical information such as each word's part of speech and root form (lemma). It is intended for anyone who needs to analyze, process, or prepare Tibetan text for further study.

Used by 1 other package. Available on PyPI.

Use this if you need to precisely segment Tibetan text into words, understand their grammatical roles, or process large volumes of text for linguistic analysis or digital archiving.

Not ideal if you're only looking for simple space-based separation or don't need any detailed linguistic analysis of Tibetan text.
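To make the contrast with simple separation concrete, here is a minimal standard-library sketch of delimiter-based splitting. It is not Botok's method and does not call Botok. It only breaks text at the tsheg (U+0F0B), the shad (U+0F0D), and whitespace, which yields syllables rather than words. The helper name `split_syllables` and the sample phrase are illustrative choices, not taken from the project.

```python
import re

TSHEG = "\u0f0b"  # ་  the syllable delimiter in Tibetan script
SHAD = "\u0f0d"   # །  a clause/sentence-level punctuation mark


def split_syllables(text):
    """Naive delimiter-based split (hypothetical helper, not part of Botok).

    Breaks on tsheg, shad, and whitespace, and drops empty pieces.
    The result is a list of syllables, not linguistically valid words.
    """
    parts = re.split("[" + TSHEG + SHAD + r"\s]+", text)
    return [p for p in parts if p]


sample = "བཀྲ་ཤིས་བདེ་ལེགས།"  # the greeting usually romanized "tashi delek"
syllables = split_syllables(sample)
print(len(syllables), syllables)
```

The split returns four syllables for this greeting, while the phrase is commonly analysed as two words of two syllables each. Grouping syllables into words, and attaching part-of-speech or lemma information, is the part that needs a dictionary-backed tokenizer such as the one this listing describes.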

Tibetan language processing, linguistic analysis, text digitization, cultural heritage, NLP for Tibetan
Maintenance 13 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 18 / 25


Stars: 78
Forks: 16
Language: Python
License: Apache-2.0
Last pushed: Mar 16, 2026
Commits (30d): 0
Dependencies: 2
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/OpenPecha/Botok"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
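The same request can be issued from Python. The sketch below uses only the standard library and makes two assumptions that the listing does not confirm: that the three path segments after `/quality/` stand for a category, a repository owner, and a repository name, and that the endpoint responds with JSON. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (segment meanings are an assumption)."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE + "/" + "/".join(segments)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("nlp", "OpenPecha", "Botok")
print(url)
# To perform the request (needs network access and counts against the daily limit):
# data = fetch_quality("nlp", "OpenPecha", "Botok")
```

The network call is left commented out so the snippet runs offline. No field names are shown because the response schema is not documented in this listing.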