allenai/scispacy
A full spaCy pipeline and models for scientific/biomedical documents.
This project helps scientists and researchers automatically process specialized scientific and biomedical texts. It takes raw text such as journal articles or research papers, identifies scientific terms and abbreviations together with their definitions, and links them to established medical or biological databases. Biomedical researchers, clinical scientists, and anyone working with large volumes of scientific literature would find this useful.
1,934 stars. Used by 2 other packages. Available on PyPI.
Use this if you need to quickly and accurately identify and categorize scientific entities, abbreviations, or connect terms to medical knowledge bases within large sets of scientific documents.
Not ideal if your documents are not scientific or biomedical in nature, or if you need to analyze text in languages other than English.
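The use case above can be sketched in a few lines of Python. This is a minimal illustration, not the project's reference usage: it assumes `scispacy` and the small `en_core_sci_sm` model have been installed separately, and the helper names `load_pipeline`, `entity_spans`, and `abbreviation_pairs` are hypothetical, introduced only for this example. Linking to knowledge bases is left out because it needs large extra downloads.

```python
from typing import Any, List, Tuple


def load_pipeline(model_name: str = "en_core_sci_sm") -> Any:
    """Load a scispaCy model and attach the abbreviation detector.

    Imports are deferred so the helpers below can be used without spaCy
    installed. Assumes scispacy and the named model package are installed.
    """
    import spacy
    # Importing the class registers the "abbreviation_detector" pipe factory.
    from scispacy.abbreviation import AbbreviationDetector  # noqa: F401

    nlp = spacy.load(model_name)
    nlp.add_pipe("abbreviation_detector")
    return nlp


def entity_spans(doc: Any) -> List[Tuple[str, int, int]]:
    """Return (text, start_char, end_char) for each entity on a spaCy Doc."""
    return [(ent.text, ent.start_char, ent.end_char) for ent in doc.ents]


def abbreviation_pairs(doc: Any) -> List[Tuple[str, str]]:
    """Return (short form, long form) pairs found by the abbreviation detector."""
    return [(abrv.text, abrv._.long_form.text) for abrv in doc._.abbreviations]
```

With the model installed, a typical call would be `doc = load_pipeline()("Spinal and bulbar muscular atrophy (SBMA) is an inherited motor neuron disease.")`, followed by `entity_spans(doc)` and `abbreviation_pairs(doc)`. Keeping the extraction helpers separate from model loading makes them easy to check against any object that exposes the same attributes as a spaCy `Doc`.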
Stars: 1,934
Forks: 249
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 04, 2025
Commits (30d): 0
Dependencies: 10
Reverse dependents: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/allenai/scispacy"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
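The same request can be made from Python using only the standard library. The sketch below rests on two assumptions that the page does not confirm: that the path follows the pattern `/quality/<category>/<owner>/<repo>` (inferred from the single curl example) and that the response body is JSON. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

# Base path taken from the curl example; the path pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch the data for one repository, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "allenai", "scispacy")` issues the same GET request as the curl command. Unauthenticated callers should stay within the stated limit of 100 requests per day.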
Related tools
sloria/TextBlob
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase...
chrismattmann/tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called...
cltk/cltk
The Classical Language Toolkit
wi2trier/cbrkit
Customizable Case-Based Reasoning (CBR) toolkit for Python with a built-in API and CLI.
grid-parity-exchange/Egret
Tools for building power systems optimization problems