salgadev/medical-nlp
Dataset for Natural Language Processing using a corpus of medical transcriptions and custom-generated clinical stop words and vocabulary.
This project provides pre-processed medical text data and specialized vocabulary for anyone building tools to analyze clinical documentation. It takes raw medical transcriptions and curated clinical terms, outputting cleaned datasets ready for training machine learning models that can categorize or understand medical text. It's ideal for data scientists or researchers focusing on healthcare applications.
No commits in the last 6 months.
Use this if you need a ready-made dataset of medical transcriptions, clinical stop words, and an SNMI-based vocabulary to jumpstart your natural language processing project in healthcare.
Not ideal if your project requires highly specialized medical text from a different domain or if you need to build your own custom vocabulary from scratch without external resources.
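As a rough illustration of the cleaning step described above, the sketch below lowercases a transcription, splits it into alphabetic tokens, and drops stop words. The stop-word set and the function name `clean_transcription` are hypothetical placeholders for this example, not the repository's actual list or code. In practice you would load the clinical stop words shipped with the dataset.

```python
import re

# Hypothetical sample of stop words for illustration only;
# the dataset provides its own clinical stop-word list.
CLINICAL_STOP_WORDS = {"patient", "history", "the", "of", "was", "with", "a"}


def clean_transcription(text, stop_words):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


sample = "The patient was admitted with a history of chest pain."
print(clean_transcription(sample, CLINICAL_STOP_WORDS))
# prints ['admitted', 'chest', 'pain']
```

The remaining tokens are the kind of input a text classifier would typically be trained on. A real pipeline would likely also handle numbers, abbreviations, and punctuation inside clinical terms, which this sketch ignores.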
Stars: 98
Forks: 23
Language: —
License: GPL-3.0
Category:
Last pushed: Jul 08, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/salgadev/medical-nlp"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
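The same request can be made from Python using only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that other repositories follow the same `owner/repo` URL pattern. The helper names `quality_url` and `fetch_quality` are made up for this example.

```python
import json
import urllib.request

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner, repo):
    """Build the endpoint URL (assumes the owner/repo pattern generalizes)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10):
    """Fetch the record and parse it, assuming the response body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Show the URL that would be requested; no network call is made here.
print(quality_url("salgadev", "medical-nlp"))
```

Calling `fetch_quality("salgadev", "medical-nlp")` performs the actual request. With the keyless limit of 100 requests/day, it is worth caching the result rather than refetching it on every run.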
Higher-rated alternatives
medspacy/medspacy
Library for clinical NLP with spaCy.
jamesmullenbach/caml-mimic
Multilabel classification of EHR notes
ncbi-nlp/NegBio
High-performance tool for negation and uncertainty detection in radiology reports
bionlplab/radtext
Python Radiology Text Analysis System
ClarityNLP/ClarityNLP
An NLP framework for clinical phenotyping. Docker | Python | Solr | OMOP....