kampersanda/tongrams-rs

Rust library providing fast language model queries in compressed space

/ 100

Emerging

This tool helps language researchers, computational linguists, and data scientists store and query very large collections of N-grams (sequences of N consecutive words) together with their frequency counts. It takes N-gram frequency files, compresses them into a compact index, and supports fast lookups that return the occurrence count of any stored N-gram. It is aimed at anyone who works with large text corpora and needs to analyze word patterns without consuming vast amounts of memory.
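The load-then-look-up workflow described above can be illustrated with a plain, uncompressed Rust sketch. It is not the tongrams-rs API and does not use the library's compressed data structures. The input line format `w1 w2 ... wN<TAB>count` and the names `NgramCounts`, `from_lines`, and `count` are assumptions made for illustration. The sketch shows one common first step toward a compact representation: replacing each word with an integer ID and storing N-grams as sorted ID sequences that can be binary-searched.

```rust
use std::collections::HashMap;

/// Hypothetical, uncompressed stand-in for an N-gram count index
/// (NOT the tongrams-rs API). Words are mapped to integer IDs and
/// N-grams are stored as sorted ID sequences.
pub struct NgramCounts {
    vocab: HashMap<String, u32>,
    // Sorted by ID sequence so lookups can use binary search.
    entries: Vec<(Vec<u32>, u64)>,
}

impl NgramCounts {
    /// Builds the index from lines of the assumed form "w1 w2 ... wN\tcount".
    /// Lines without a tab or with a non-numeric count are skipped.
    pub fn from_lines<'a, I: IntoIterator<Item = &'a str>>(lines: I) -> Self {
        let mut vocab: HashMap<String, u32> = HashMap::new();
        let mut entries = Vec::new();
        for line in lines {
            let Some((gram, count)) = line.split_once('\t') else { continue };
            let Ok(count) = count.trim().parse::<u64>() else { continue };
            // Assign each previously unseen word the next free integer ID.
            let ids: Vec<u32> = gram
                .split_whitespace()
                .map(|w| {
                    let next_id = vocab.len() as u32;
                    *vocab.entry(w.to_string()).or_insert(next_id)
                })
                .collect();
            if !ids.is_empty() {
                entries.push((ids, count));
            }
        }
        entries.sort();
        Self { vocab, entries }
    }

    /// Returns the stored count of `gram`, or None if any word
    /// or the N-gram itself is not in the index.
    pub fn count(&self, gram: &str) -> Option<u64> {
        let ids: Vec<u32> = gram
            .split_whitespace()
            .map(|w| self.vocab.get(w).copied())
            .collect::<Option<Vec<u32>>>()?;
        self.entries
            .binary_search_by(|(key, _)| key.cmp(&ids))
            .ok()
            .map(|i| self.entries[i].1)
    }
}

fn main() {
    let data = ["the cat\t3", "the dog\t2", "the\t10", "cat sat\t1"];
    let index = NgramCounts::from_lines(data);
    println!("{:?}", index.count("the cat")); // Some(3)
    println!("{:?}", index.count("the bird")); // None
}
```

A compressed index would typically go further and encode the sorted ID sequences and counts with compact integer codes; this sketch keeps them as plain vectors for readability.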

No commits in the last 6 months.

Use this if you are working with massive N-gram datasets and need to store them in a highly compressed format while still performing very fast lookups for specific N-gram frequencies.

Not ideal if you need to calculate N-gram probabilities directly or if your primary goal is to build a language model for text generation rather than frequency lookups.
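Because lookups return raw counts rather than probabilities, a caller who needs a simple probability has to derive it from two lookups. A minimal sketch, assuming the plain maximum-likelihood estimate P(w_n | w_1 ... w_{n-1}) = c(w_1 ... w_n) / c(w_1 ... w_{n-1}) with no smoothing. The `HashMap` stands in for any count lookup, and the function name `mle_prob` is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper: maximum-likelihood estimate
/// P(word | history) = c(history word) / c(history).
/// `counts` stands in for any N-gram count lookup. No smoothing is applied,
/// so an unseen N-gram gets probability 0 and an unseen history returns None.
fn mle_prob(counts: &HashMap<&str, u64>, history: &str, word: &str) -> Option<f64> {
    let hist_count = *counts.get(history)?;
    if hist_count == 0 {
        return None;
    }
    let full = format!("{history} {word}");
    let full_count = counts.get(full.as_str()).copied().unwrap_or(0);
    Some(full_count as f64 / hist_count as f64)
}

fn main() {
    let counts: HashMap<&str, u64> =
        [("the", 10), ("the cat", 3), ("the dog", 2)].into_iter().collect();
    println!("{:?}", mle_prob(&counts, "the", "cat")); // Some(0.3)
    println!("{:?}", mle_prob(&counts, "the", "bird")); // Some(0.0)
    println!("{:?}", mle_prob(&counts, "a", "cat")); // None
}
```

Unsmoothed estimates assign zero probability to any unseen continuation, which is one reason a count store alone is not a full language model for text generation.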

natural-language-processing computational-linguistics text-analytics information-retrieval big-data-text

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 15 / 25


Language: Rust

License: MIT

Higher-rated alternatives

PyThaiNLP/nlpo3

Thai natural language processing library in Rust, with Python and Node bindings.

forzagreen/n2words

Convert numbers to written words in 52+ languages.

greyblake/whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/

wikimedia/sentencex

A sentence segmentation library with wide language support optimized for speed and utility.

pemistahl/lingua-rs

The most accurate natural language detection library for Rust, suitable for short text and...
