loomchild/segment
Program used to split text into segments
This tool helps language professionals, localization managers, and content creators automatically split large blocks of text into smaller, manageable segments, such as individual sentences. You provide your text along with a set of segmentation rules in SRX (Segmentation Rules eXchange) format, and it outputs the text broken into discrete segments, one per line or separated by custom markers. It is designed for anyone who needs to prepare text for machine translation, linguistic analysis, or indexing.
No commits in the last 6 months.
Use this if you need a reliable way to automatically segment plain text based on industry-standard SRX rules, especially for preparing content for translation memory systems or linguistic workflows.
Not ideal if you need to preserve original text formatting (like rich text or XML), require highly specialized segmentation not covered by SRX, or are looking for a GUI-based desktop application.
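As a rough illustration of the rule-driven splitting described above, the sketch below mimics how SRX-style rules work: each rule pairs a pattern for the text before a candidate position with a pattern for the text after it, and marks that position as a break or as a no-break exception. The first rule that matches decides. This is plain Python, not the segment library's API; the `Rule` class, the `segment` function, and the two sample rules are hypothetical and are not taken from any real SRX file.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for one SRX rule (not the segment library's API)."""
    is_break: bool  # True: split here; False: exception that forbids a split
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


# Illustrative rules only. Exceptions come first because the first match wins.
RULES = [
    Rule(False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # no break after abbreviations
    Rule(True, r"[.?!]+", r"\s+"),                 # break after sentence enders
]


def segment(text, rules=RULES):
    """Split text at every position where the first matching rule is a break."""
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for rule in rules:
            before_ok = re.search(f"(?:{rule.before})\\Z", text[:pos])
            after_ok = re.match(rule.after, text[pos:])
            if before_ok and after_ok:
                if rule.is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule decides; skip the remaining rules
    segments.append(text[start:])
    # Trim surrounding whitespace for display and drop empty pieces.
    return [s.strip() for s in segments if s.strip()]


# One segment per line, similar in spirit to the output format described above.
print("\n".join(segment("Dr. Smith arrived. He was late! Why?")))
```

Listing the exception before the general break rule is what keeps "Dr." from ending a segment. The sketch rescans the text at every position, so it is only suitable for short inputs.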
Stars: 28
Forks: 10
Language: Java
License: MIT
Category:
Last pushed: Oct 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/loomchild/segment"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
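For scripted access, the same request could be made from Python. This is a minimal sketch under two assumptions the page does not confirm: that the path follows a `<category>/<owner>/<repo>` layout (inferred from the single URL above) and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the endpoint URL; the path layout is an assumption."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category, repo, timeout=10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "loomchild/segment")` would issue the same GET request as the curl command. Unauthenticated use is subject to the 100 requests/day limit stated above.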
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token and sentence segmentation for the Russian language