cambridgeltl/sapbert
[NAACL'21 & ACL'21] SapBERT: Self-alignment pretraining for BERT & XL-BEL: Cross-Lingual Biomedical Entity Linking.
This project helps medical and scientific researchers accurately identify and link biomedical terms and concepts across different texts, even if they're phrased differently. You provide a list of biomedical entity names (like 'COVID-19' or 'Coronavirus infection'), and it outputs numerical representations (embeddings) that capture their meaning, making it easier to find related terms or standardize data. This is useful for anyone working with large volumes of biomedical text, such as in clinical research, drug discovery, or medical informatics.
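The names-in, embeddings-out workflow described above can be sketched as follows. This is a minimal illustration, not the project's documented interface. It assumes the Hugging Face transformers and torch packages are installed. The checkpoint name "cambridgeltl/SapBERT-from-PubMedBERT-fulltext", the [CLS]-token pooling, and the helper names embed_names, cosine, and most_similar are assumptions introduced here for illustration.

```python
import math


def embed_names(names, model_name="cambridgeltl/SapBERT-from-PubMedBERT-fulltext"):
    """Return one embedding (a list of floats) per entity name.

    Assumption: the checkpoint name and the [CLS] pooling are illustrative
    choices, not details taken from the listing above.
    """
    # Imported lazily so the similarity helpers below run without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(
        names, padding=True, truncation=True, max_length=25, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (batch, tokens, dim)
    return hidden[:, 0, :].tolist()  # keep the vector at the [CLS] position


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query_vec, names, vectors):
    """Return (name, score) pairs sorted from most to least similar to query_vec."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in zip(names, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

To link a mention such as 'COVID-19', one would embed it together with a list of candidate names using embed_names, then pass the mention's vector and the candidate vectors to most_similar. The top-ranked candidate is the proposed match.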
218 stars. No commits in the last 6 months.
Use this if you need to precisely match and link specific biomedical terms or concepts within and across documents, especially in research or clinical contexts where exact synonyms or slight variations in phrasing are common.
Not ideal if your primary need is general-purpose text analysis outside the biomedical domain, or if you don't require highly specialized entity linking.
Stars: 218
Forks: 39
Language: Python
License: MIT
Category:
Last pushed: Apr 28, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/cambridgeltl/sapbert"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
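The same request can be made from Python using only the standard library. This is a rough sketch under stated assumptions. The listing does not say that the response body is JSON, and it does not say that the path segments mean category, owner, and repository; both are guesses based on the URL above. The helper names quality_url and fetch_quality are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (assumed layout: /<category>/<owner>/<repo>)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """GET the endpoint and decode the body as JSON (the JSON format is an assumption)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_quality("nlp", "cambridgeltl", "sapbert") issues the same GET request as the curl command. With no key, requests count against the 100 requests/day limit.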
Higher-rated alternatives
DerwenAI/pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
Tiiiger/bert_score
BERT score for text generation
BrikerMan/Kashgari
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for...
asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. ...
yohasebe/wp2txt
A command-line tool to extract plain text from Wikipedia dumps with category and section filtering