reinfer/blingfire-rs
Rust wrapper for the BlingFire tokenization library
This tool helps data analysts and NLP practitioners split raw text into individual words or sentences. You pass in unstructured text and get back the same text with explicit boundaries marked between words or sentences. It is suited to preparing large volumes of text for further analysis or processing.
15 stars and 2,273 monthly downloads. No commits in the last 6 months.
Use this if you need to reliably split natural language text into distinct words or sentences for tasks like information retrieval, text mining, or machine learning input.
Not ideal if you need advanced linguistic parsing, part-of-speech tagging, or handling of highly specialized text formats beyond basic sentence and word segmentation.
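A minimal sketch of consuming the kind of output described above. It does not call the wrapper. It assumes, as the upstream BlingFire library does, that word tokenization yields tokens separated by single spaces and that sentence splitting yields one sentence per line. The helper names `words` and `sentences` and the sample strings are hypothetical, chosen for illustration only.

```rust
/// Split word-tokenized text into tokens.
/// Assumption: tokens are separated by whitespace in the tokenizer output.
fn words(tokenized: &str) -> Vec<&str> {
    tokenized.split_whitespace().collect()
}

/// Split sentence-segmented text into sentences.
/// Assumption: each sentence sits on its own line in the tokenizer output.
fn sentences(segmented: &str) -> Vec<&str> {
    segmented.lines().filter(|s| !s.is_empty()).collect()
}

fn main() {
    // Hypothetical strings standing in for what a tokenizer call would return.
    let word_output = "Cat , sat on the mat .";
    let sentence_output = "Cat sat.\nDog barked.";

    println!("{:?}", words(word_output));
    println!("{:?}", sentences(sentence_output));
}
```

Returning borrowed `&str` slices avoids copying the text, which matters when the input is large.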
Stars: 15
Forks: 3
Language: Rust
License: MIT
Category:
Last pushed: Jun 23, 2020
Monthly downloads: 2,273
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/reinfer/blingfire-rs"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
guillaume-be/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including...
sugarme/tokenizer
NLP tokenizers written in Go language
elixir-nx/tokenizers
Elixir bindings for 🤗 Tokenizers
openscilab/tocount
ToCount: Lightweight Token Estimator
Scurrra/ubpe
Universal (general sequence) Byte-Pair Encoding