kampersanda/tongrams-rs
Rust library providing fast language model queries in compressed space
This tool helps language researchers, computational linguists, and data scientists efficiently store and query very large lists of N-grams (sequences of words) and their frequencies. It takes N-gram frequency files, compresses them significantly, and allows for rapid lookups of any N-gram to retrieve its occurrence count. The target user is anyone who works with extensive textual data and needs to analyze word patterns without consuming vast amounts of memory.
No commits in the last 6 months.
Use this if you are working with massive N-gram datasets and need to store them in a highly compressed format while still performing very fast lookups for specific N-gram frequencies.
Not ideal if you need to calculate N-gram probabilities directly or if your primary goal is to build a language model for text generation rather than frequency lookups.
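The lookup described above can be sketched in a few lines of plain Rust. This is only a toy illustration of mapping an N-gram to its count: it uses an uncompressed `HashMap`, not the compressed structures the library provides. The `NgramCounts` type, its methods, and the tab-separated `<gram>\t<count>` input layout are assumptions made for the example and are not taken from the tongrams-rs API.

```rust
use std::collections::HashMap;

/// Toy, uncompressed N-gram count table.
/// Hypothetical type for illustration; not part of tongrams-rs.
struct NgramCounts {
    counts: HashMap<String, u64>,
}

impl NgramCounts {
    /// Builds the table from lines of the form "<gram>\t<count>"
    /// (an assumed input layout). Lines that do not match are skipped.
    fn from_lines(text: &str) -> Self {
        let mut counts = HashMap::new();
        for line in text.lines() {
            // Split at the last tab so the gram itself may contain spaces.
            if let Some((gram, count)) = line.rsplit_once('\t') {
                if let Ok(c) = count.trim().parse::<u64>() {
                    counts.insert(gram.to_string(), c);
                }
            }
        }
        Self { counts }
    }

    /// Returns the stored count, or None if the gram was never seen.
    fn lookup(&self, gram: &str) -> Option<u64> {
        self.counts.get(gram).copied()
    }
}

fn main() {
    let data = "the\t120\nthe cat\t7\nthe cat sat\t2\n";
    let table = NgramCounts::from_lines(data);
    println!("{:?}", table.lookup("the cat"));
    println!("{:?}", table.lookup("a dog"));
}
```

A hash map like this stores every gram string in full, which is the memory cost that a compressed index is meant to avoid on massive datasets.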
Stars: 25
Forks: 5
Language: Rust
License: MIT
Category:
Last pushed: Oct 01, 2022
Monthly downloads: 5
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/kampersanda/tongrams-rs"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
PyThaiNLP/nlpo3
Thai natural language processing library in Rust, with Python and Node bindings.
forzagreen/n2words
Convert numerical numbers to written numbers, in 52+ languages.
greyblake/whatlang-rs
Natural language detection library for Rust. Try demo online: https://whatlang.org/
wikimedia/sentencex
A sentence segmentation library with wide language support optimized for speed and utility.
pemistahl/lingua-rs
The most accurate natural language detection library for Rust, suitable for short text and...