kampersanda/tongrams-rs

Rust library providing fast language model queries in compressed space

40 / 100 (Emerging)

This tool helps language researchers, computational linguists, and data scientists efficiently store and query very large lists of N-grams (sequences of words) and their frequencies. It takes N-gram frequency files, compresses them significantly, and allows for rapid lookups of any N-gram to retrieve its occurrence count. The target user is anyone who works with extensive textual data and needs to analyze word patterns without consuming vast amounts of memory.
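To make the store-then-look-up workflow concrete, here is a minimal sketch in plain Rust using only the standard library. It is not the tongrams-rs API and it does no compression. The `NgramCounts` type, its methods, and the tab-separated `<gram>\t<count>` input format are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory N-gram count table.
/// Uncompressed and for illustration only; not the tongrams-rs data structure.
pub struct NgramCounts {
    counts: HashMap<String, u64>,
}

impl NgramCounts {
    /// Parses lines of the assumed form "<gram>\t<count>".
    /// Lines without a tab or with a non-numeric count are skipped.
    pub fn from_lines(text: &str) -> Self {
        let mut counts = HashMap::new();
        for line in text.lines() {
            if let Some((gram, count)) = line.rsplit_once('\t') {
                if let Ok(c) = count.trim().parse::<u64>() {
                    counts.insert(normalize(gram), c);
                }
            }
        }
        NgramCounts { counts }
    }

    /// Returns the stored count for an N-gram, or None if it was never seen.
    pub fn lookup(&self, gram: &str) -> Option<u64> {
        self.counts.get(&normalize(gram)).copied()
    }
}

/// Collapses runs of whitespace so "the  cat" and "the cat" map to the same key.
fn normalize(gram: &str) -> String {
    gram.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

A hash map like this spends memory on every key string, which is the cost a compressed index is meant to avoid on very large N-gram sets.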

No commits in the last 6 months.

Use this if you are working with massive N-gram datasets and need to store them in a highly compressed format while still performing very fast lookups for specific N-gram frequencies.

Not ideal if you need to calculate N-gram probabilities directly or if your primary goal is to build a language model for text generation rather than frequency lookups.
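For readers who do need probabilities, the sketch below shows the kind of extra step that would sit on top of raw frequency lookups: an unsmoothed maximum-likelihood estimate, count(context + word) / count(context). The function name `mle_probability` and the `HashMap` standing in for the count store are hypothetical, and nothing here is provided by tongrams-rs.

```rust
use std::collections::HashMap;

/// Unsmoothed maximum-likelihood estimate of P(word | context),
/// computed as count("context word") / count("context").
/// `counts` is a hypothetical stand-in for any N-gram count lookup.
/// Returns None when the context is missing or has a zero count.
pub fn mle_probability(
    counts: &HashMap<String, u64>,
    context: &str,
    word: &str,
) -> Option<f64> {
    let context_count = *counts.get(context)?;
    if context_count == 0 {
        return None;
    }
    // An unseen continuation is treated as a zero count, not an error.
    let joint = counts
        .get(&format!("{} {}", context, word))
        .copied()
        .unwrap_or(0);
    Some(joint as f64 / context_count as f64)
}
```

Because the estimate is unsmoothed, any continuation that never appears in the data gets probability zero; a real language model would add smoothing or backoff.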

natural-language-processing computational-linguistics text-analytics information-retrieval big-data-text
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 15 / 25


Stars: 25
Forks: 5
Language: Rust
License: MIT
Last pushed: Oct 01, 2022
Monthly downloads: 5
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/kampersanda/tongrams-rs"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.