prigarg/Bigram-Language-Model-from-Scratch

A bigram language model from scratch, with no smoothing and add-one smoothing. Outputs bigram counts, bigram probabilities, and the probability of a test sentence.

Score: 19 / 100 (Experimental)

This tool helps computational linguists, NLP students, or researchers understand how frequently word pairs appear in a large text and predict the likelihood of a sentence. You provide a body of text (your 'training corpus') and a sentence you want to analyze, and it outputs the counts and probabilities of word pairs, plus the overall probability of your test sentence.
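To illustrate what such a tool computes, here is a minimal bigram model in Python with optional add-one smoothing. This is a sketch of the general technique, not the repository's actual code; the function and variable names are illustrative.

```python
from collections import Counter

def train_bigram(corpus_tokens):
    """Count unigrams and adjacent word pairs (bigrams) in a token list."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams

def sentence_probability(sentence_tokens, unigrams, bigrams, vocab_size,
                         smoothing=True):
    """Product of P(w_i | w_{i-1}) over the sentence.

    With add-one smoothing, each bigram count is incremented by 1 and the
    denominator grows by the vocabulary size, so unseen pairs get a small
    nonzero probability instead of zeroing out the whole sentence.
    """
    prob = 1.0
    for w1, w2 in zip(sentence_tokens, sentence_tokens[1:]):
        if smoothing:
            prob *= (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)
        else:
            count = unigrams[w1]
            prob *= bigrams[(w1, w2)] / count if count else 0.0
    return prob
```

For example, training on "the cat sat on the mat" gives an unsmoothed P(cat | the) of 1/2, while add-one smoothing lowers it to 2/7 because probability mass is redistributed to unseen pairs.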

No commits in the last 6 months.

Use this if you need to quickly calculate bigram statistics and sentence probabilities from a text corpus using either basic or 'add-one' smoothing techniques.

Not ideal if you require more advanced language modeling techniques beyond bigrams or need to handle very sparse data more robustly than 'add-one' smoothing allows.

computational-linguistics natural-language-processing text-analysis language-modeling corpus-linguistics
No License · Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 5 / 25


Stars: 15
Forks: 1
Language: Jupyter Notebook
License: None
Last pushed: Jan 12, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/prigarg/Bigram-Language-Model-from-Scratch"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.