ncbi-nlp/BioSentVec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
This project helps researchers and healthcare professionals analyze large volumes of biomedical text, such as scientific articles and clinical notes. It takes words or sentences from these texts and outputs numerical representations (embeddings) that capture their meaning, so they can be compared and processed computationally. It is aimed at medical researchers, clinicians, and data scientists working with health-related text.
611 stars. No commits in the last 6 months.
Use this if you need to understand the similarity between medical terms, concepts, or entire sentences from biomedical literature and clinical records.
Not ideal if your text is outside the biomedical or clinical domain, as the models are trained specifically on PubMed articles and MIMIC-III clinical notes.
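As a rough illustration of what "similarity" between embeddings means, the sketch below computes cosine similarity between vectors. It does not load the BioSentVec or BioWordVec models; the three short vectors and the `cosine_similarity` helper are made up for illustration, and real embeddings from these models would be much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for sentence embeddings.
# These numbers are invented and are NOT output of any real model.
emb_a = [0.9, 0.1, 0.3]
emb_b = [0.8, 0.2, 0.4]
emb_c = [-0.2, 0.9, -0.5]

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print("a vs b:", sim_ab)
print("a vs c:", sim_ac)
```

With real embeddings, a higher score would suggest that two terms or sentences are closer in meaning; the vectors would come from the pre-trained models rather than from hand-written lists.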
Stars: 611
Forks: 99
Language: Jupyter Notebook
License: —
Category:
Last pushed: Aug 15, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ncbi-nlp/BioSentVec"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
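The same request can be sketched in Python using only the standard library. This is a minimal sketch under assumptions: the path layout (`quality/<category>/<owner>/<repo>`) is inferred from the single curl example above, the response is assumed to be JSON, and the helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout after it
# (category / owner / repo) is an assumption inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the request URL for one repository (hypothetical helper)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(url, timeout=10):
    """Fetch the URL and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_quality_url("nlp", "ncbi-nlp", "BioSentVec")
print(url)
```

Calling `fetch_quality(url)` performs the network request; it is left out of the block so the sketch runs offline and does not count against the daily request limit.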
Higher-rated alternatives
dselivanov/text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
vzhong/embeddings
Fast, DB Backed pretrained word embeddings for natural language processing.
dccuchile/spanish-word-embeddings
Spanish word embeddings computed with different methods and from different corpora
avidale/compress-fasttext
Tools for shrinking fastText models (in gensim format)
ibrahimsharaf/doc2vec
Long(er) text representation and classification using Doc2Vec embeddings