avidale/compress-fasttext
Tools for shrinking fastText models (in gensim format)
This project helps Natural Language Processing (NLP) practitioners and researchers shrink their word embedding models without significant loss of accuracy. It takes large fastText word embedding models and outputs compressed versions that are easier to store, share, and use in resource-constrained environments. Data scientists and machine learning engineers working with text data will find this useful.
183 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to reduce the file size of your fastText word embedding models for easier deployment or faster loading, especially when working with many languages or large datasets.
Not ideal if you must stay on an older version of gensim, or if even a minor loss of model accuracy is unacceptable.
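The size reduction comes from techniques such as vocabulary pruning and quantization of the embedding matrix. As a self-contained illustration of the idea (not the library's actual API), here is a toy product quantizer in NumPy: each embedding vector is split into sub-vectors, and each sub-vector is replaced by a one-byte index into a small learned codebook. All names here (`product_quantize`, `reconstruct`) are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical embedding matrix: 1000 words, 8 dimensions.
emb = rng.normal(size=(1000, 8)).astype(np.float32)

def product_quantize(matrix, n_subvectors=2, n_centroids=16, n_iter=10):
    """Split each vector into sub-vectors and replace each sub-vector
    with the index of its nearest centroid (plain Lloyd's k-means)."""
    n, dim = matrix.shape
    sub_dim = dim // n_subvectors
    codebooks, codes = [], []
    for s in range(n_subvectors):
        block = matrix[:, s * sub_dim:(s + 1) * sub_dim]
        # Initialize centroids from random rows of the block.
        centroids = block[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(n_iter):
            # Assign each sub-vector to its nearest centroid.
            dists = ((block[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            # Move each centroid to the mean of its assigned sub-vectors.
            for c in range(n_centroids):
                mask = assign == c
                if mask.any():
                    centroids[c] = block[mask].mean(0)
        codebooks.append(centroids)
        codes.append(assign.astype(np.uint8))  # 16 centroids fit in one byte
    return codebooks, np.stack(codes, axis=1)

def reconstruct(codebooks, codes):
    """Rebuild an approximate embedding matrix from codebooks and codes."""
    return np.concatenate(
        [cb[codes[:, s]] for s, cb in enumerate(codebooks)], axis=1
    )

codebooks, codes = product_quantize(emb)
approx = reconstruct(codebooks, codes)
# Storage drops from sub_dim float32 values per sub-vector to a single byte,
# at the cost of a small reconstruction error.
```

This sketch keeps only the codebooks and the per-word codes, which is where the storage savings come from; the real library combines this style of quantization with pruning of rare n-gram buckets and other tricks.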
Stars: 183
Forks: 12
Language: Jupyter Notebook
License: MIT
Category: NLP
Last pushed: May 03, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/avidale/compress-fasttext"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
dselivanov/text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
vzhong/embeddings
Fast, DB Backed pretrained word embeddings for natural language processing.
dccuchile/spanish-word-embeddings
Spanish word embeddings computed with different methods and from different corpora
ncbi-nlp/BioSentVec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
ibrahimsharaf/doc2vec
:notebook: Long(er) text representation and classification using Doc2Vec embeddings