avidale/compress-fasttext
Tools for shrinking fastText models (in gensim format)
This project helps Natural Language Processing (NLP) practitioners and researchers shrink their word embedding models without significant loss of accuracy. It takes large fastText word embedding models and outputs compressed versions that are easier to store, share, and use in resource-constrained environments. Data scientists and machine learning engineers working with text data will find this useful.
183 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to reduce the file size of your fastText word embedding models for easier deployment or faster loading, especially when working with many languages or large datasets.
Not ideal if you must stay on an older version of gensim, or if even a minor loss of model accuracy is unacceptable.
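The size reduction comes from techniques such as vocabulary pruning and quantization of the embedding matrix. As a self-contained illustration of the idea (not the library's actual API), here is a toy product quantizer in NumPy: each embedding vector is split into sub-vectors, and each sub-vector is replaced by a one-byte index into a small learned codebook. All names here (`product_quantize`, `reconstruct`) are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical embedding matrix: 1000 words, 8 dimensions.
emb = rng.normal(size=(1000, 8)).astype(np.float32)

def product_quantize(matrix, n_subvectors=2, n_centroids=16, n_iter=10):
    """Split each vector into sub-vectors and replace each sub-vector
    with the index of its nearest centroid (plain Lloyd's k-means)."""
    n, dim = matrix.shape
    sub_dim = dim // n_subvectors
    codebooks, codes = [], []
    for s in range(n_subvectors):
        block = matrix[:, s * sub_dim:(s + 1) * sub_dim]
        # Initialize centroids from random rows of the block.
        centroids = block[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(n_iter):
            # Assign each sub-vector to its nearest centroid.
            dists = ((block[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            # Move each centroid to the mean of its assigned sub-vectors.
            for c in range(n_centroids):
                mask = assign == c
                if mask.any():
                    centroids[c] = block[mask].mean(0)
        codebooks.append(centroids)
        codes.append(assign.astype(np.uint8))  # 16 centroids fit in one byte
    return codebooks, np.stack(codes, axis=1)

def reconstruct(codebooks, codes):
    """Rebuild an approximate embedding matrix from codebooks and codes."""
    return np.concatenate(
        [cb[codes[:, s]] for s, cb in enumerate(codebooks)], axis=1
    )

codebooks, codes = product_quantize(emb)
approx = reconstruct(codebooks, codes)
# Storage drops from sub_dim float32 values per sub-vector to a single byte,
# at the cost of a small reconstruction error.
```

This sketch keeps only the codebooks and the per-word codes, which is where the storage savings come from; the real library combines this style of quantization with pruning of rare n-gram buckets and other tricks.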
Stars: 183
Forks: 12
Language: Jupyter Notebook
License: MIT
Category: NLP
Last pushed: May 03, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/avidale/compress-fasttext"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
dselivanov/text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
vzhong/embeddings
Fast, DB Backed pretrained word embeddings for natural language processing.
dccuchile/spanish-word-embeddings
Spanish word embeddings computed with different methods and from different corpora
ncbi-nlp/BioSentVec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
ibrahimsharaf/doc2vec
:notebook: Long(er) text representation and classification using Doc2Vec embeddings