salgadev/medical-nlp

Dataset for Natural Language Processing using a corpus of medical transcriptions and custom-generated clinical stop words and vocabulary.

45 / 100 Emerging

This project provides pre-processed medical text data and specialized vocabulary for anyone building tools to analyze clinical documentation. It takes raw medical transcriptions and curated clinical terms, outputting cleaned datasets ready for training machine learning models that can categorize or understand medical text. It's ideal for data scientists or researchers focusing on healthcare applications.

No commits in the last 6 months.

Use this if you need a ready-made dataset of medical transcriptions, clinical stop words, and an SNMI-based vocabulary to jumpstart your natural language processing project in healthcare.

Not ideal if your project requires text from a specialized medical domain that this corpus does not cover, or if you need to build your own vocabulary from scratch without external resources.

medical-transcriptions clinical-documentation-analysis healthcare-nlp biomedical-text-mining
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 20 / 25

Stars 98
Forks 23
Language
License GPL-3.0
Last pushed Jul 08, 2020
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/salgadev/medical-nlp"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
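The same request can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a documented client. The path layout (category, then owner, then repository) is inferred from the single curl URL above. The JSON response format is assumed, since the page does not describe it. The function names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout that follows it
# (category/owner/repo) is an assumption inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key.

    Assumes the endpoint returns JSON; the response schema is not
    documented on this page, so the parsed value is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make a real request.
    print(build_quality_url("nlp", "salgadev", "medical-nlp"))
```

Because keyless access is limited to 100 requests/day, it may be worth caching the result of `fetch_quality("nlp", "salgadev", "medical-nlp")` locally rather than calling it repeatedly.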