wikimedia/sentencex

A sentence segmentation library with wide language support optimized for speed and utility.

61 / 100 (Established)

This tool splits large blocks of text into individual sentences across many languages. You pass in a continuous block of text along with its language, and it returns a list of separate sentences with the original formatting preserved. It is aimed at linguists, researchers, and data scientists working with multilingual text.
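To make the input/output contract above concrete, here is a minimal sketch in Python. It is a toy splitter written for illustration, not sentencex's algorithm or API: the function name `naive_segment` and the boundary rule are assumptions. It reads "preserving original formatting" as meaning that joining the returned sentences gives back the input unchanged.

```python
import re

# Toy illustration only: this is NOT how sentencex works internally.
# A boundary is any run of whitespace that follows '.', '!' or '?'.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_segment(text: str) -> list[str]:
    """Split text into sentences, keeping trailing whitespace on each one."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        end = match.end()
        # Keep the whitespace with the sentence so no characters are lost.
        sentences.append(text[start:end])
        start = end
    if start < len(text):
        # Whatever remains after the last boundary is the final sentence.
        sentences.append(text[start:])
    return sentences


text = "The quick brown fox jumps.  It lands softly!\nWas that all? Yes."
sentences = naive_segment(text)

# Joining the pieces reconstructs the input exactly.
assert "".join(sentences) == text
```

A rule this simple breaks on abbreviations such as "Dr. Smith" and on scripts that use other terminators, which is the kind of language-specific handling a dedicated library is meant to provide.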

115 stars and 80,586 monthly downloads.

Use this if you need a fast and reliable way to break down long texts into sentences for tasks like preparing data for machine translation or text-to-speech systems.

Not ideal if you need every rare edge case handled correctly: this tool prioritizes speed and general accuracy over exhaustive linguistic precision.

natural-language-processing, text-analysis, content-localization, data-preparation, linguistics
No Package, No Dependents
Maintenance 13 / 25
Adoption 20 / 25
Maturity 16 / 25
Community 12 / 25


Stars: 115
Forks: 11
Language: Rust
License: MIT
Last pushed: Mar 16, 2026
Monthly downloads: 80,586
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/wikimedia/sentencex"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
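The same request can be sketched in Python using only the standard library. Two things here are assumptions, since the page shows a single URL and no response sample: that the path generalizes to category/owner/repository, and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl command shown on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (assumed pattern: category/owner/repo)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("nlp", "wikimedia", "sentencex")` issues the same GET request as the curl command. Each call counts against the daily request limit, so caching the result locally is a reasonable choice for repeated lookups.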