vaskonov/burvec

Word Embeddings for Low Resource Languages: The Case of Buryat


Experimental

This project provides a method for building word embeddings for languages with limited digital text, such as Buryat. It takes a small text corpus in the target language and produces word vectors: numerical representations that capture word meaning. It is aimed at researchers and computational linguists who work on preserving and analyzing under-resourced languages.
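To make the "text in, vectors out" idea concrete, here is a minimal sketch of one simple way to derive word vectors from a tiny corpus: counting which words appear near each other and comparing the resulting count vectors by cosine similarity. This is an illustration only and is not taken from the repository, whose actual training method is not described here. The function names `cooccurrence_vectors` and `cosine`, the window size, the whitespace tokenization, and the English placeholder corpus are all assumptions.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions.

    Hypothetical helper: a count-based stand-in for a trained embedding model.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[ctx] for ctx, count in u.items() if ctx in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Placeholder corpus (assumption): a real run would use tokenized text
# in the target language instead of these English sentences.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]

vectors = cooccurrence_vectors(corpus)
similarity = cosine(vectors["cat"], vectors["dog"])
print(f"cat ~ dog: {similarity:.3f}")
```

Count vectors like these need no lemmatizer or other language-specific tooling, which is why they are a reasonable baseline to try first on a small corpus. Predictive models such as word2vec or fastText are common next steps, but whether this project uses them is not stated in this listing.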

No commits in the last 6 months.

Use this if you need to build language processing tools or carry out linguistic analysis for a low-resource language that lacks standard NLP tools such as lemmatizers.

Not ideal if you are working with a widely spoken language that already has extensive natural language processing resources.

Tags: low-resource languages, computational linguistics, linguistic research, natural language processing, endangered language documentation

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 13 / 25

Language: Jupyter Notebook

License: none

Higher-rated alternatives

dselivanov/text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

vzhong/embeddings

Fast, DB Backed pretrained word embeddings for natural language processing.

dccuchile/spanish-word-embeddings

Spanish word embeddings computed with different methods and from different corpora

ncbi-nlp/BioSentVec

BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences

ibrahimsharaf/doc2vec

Long(er) text representation and classification using Doc2Vec embeddings
