smart-on-fhir/cumulus-etl
Extract FHIR data, Transform with NLP and DEID tools, and then Load FHIR data into a SQL Database for analysis
This tool helps healthcare researchers and analysts prepare large volumes of patient data for clinical investigations. It takes raw Fast Healthcare Interoperability Resources (FHIR) exports from electronic health record systems, then cleans and anonymizes them. The output is structured, de-identified patient data loaded into a SQL database, ready for secure querying and analysis.
Available on PyPI.
Use this if you need to process population-scale clinical data from FHIR exports, de-identify it to protect patient privacy, and apply natural language processing to unstructured clinical notes, with the results available in a SQL-queryable format.
Not ideal if you are working with small datasets, do not require de-identification or advanced NLP on clinical notes, or prefer to keep your data exclusively within the FHIR standard without transformation to a SQL database.
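To make the extract, de-identify, and load flow described above concrete, here is a minimal toy sketch using only the Python standard library. It is not how cumulus-etl is implemented and does not use its API: the salt, the table layout, the helper names (`pseudonymize`, `deidentify_patient`, `load_patients`), and the sample records are all hypothetical, and SQLite stands in for whatever SQL database a real deployment targets.

```python
import hashlib
import json
import sqlite3

# Hypothetical salt for illustration; a real pipeline would manage this secret carefully.
SALT = b"example-salt"


def pseudonymize(value: str) -> str:
    """Replace an identifier with a salted SHA-256 digest (illustrative only)."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()


def deidentify_patient(resource: dict) -> dict:
    """Keep a few coarse fields and drop direct identifiers such as names."""
    return {
        "id": pseudonymize(resource["id"]),
        "gender": resource.get("gender"),
        # Coarsen the birth date to a year; None when the field is absent.
        "birth_year": (resource.get("birthDate") or "")[:4] or None,
    }


def load_patients(ndjson_text: str, conn: sqlite3.Connection) -> int:
    """Parse FHIR NDJSON, de-identify Patient resources, and load them into SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patient "
        "(id TEXT PRIMARY KEY, gender TEXT, birth_year TEXT)"
    )
    count = 0
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        resource = json.loads(line)
        if resource.get("resourceType") != "Patient":
            continue  # this sketch handles Patient resources only
        row = deidentify_patient(resource)
        conn.execute(
            "INSERT OR REPLACE INTO patient VALUES (:id, :gender, :birth_year)", row
        )
        count += 1
    conn.commit()
    return count


# Made-up sample export: one JSON resource per line (NDJSON).
SAMPLE_NDJSON = "\n".join([
    json.dumps({"resourceType": "Patient", "id": "p1", "gender": "female",
                "birthDate": "1980-05-17", "name": [{"family": "Doe"}]}),
    json.dumps({"resourceType": "Patient", "id": "p2", "gender": "male",
                "birthDate": "1975-01-02"}),
])

conn = sqlite3.connect(":memory:")
loaded = load_patients(SAMPLE_NDJSON, conn)
rows = conn.execute(
    "SELECT gender, birth_year FROM patient ORDER BY birth_year"
).fetchall()
```

After the load, the table holds only hashed IDs and coarse attributes, so an ordinary SQL query can run over it without exposing names or exact birth dates. Real de-identification covers far more than this and should not be modeled on the sketch.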
Stars: 22
Forks: 6
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2026
Commits (30d): 0
Dependencies: 17
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/smart-on-fhir/cumulus-etl"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
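The same request can be made from Python. The sketch below assumes that the URL path follows a `quality/<category>/<owner>/<repo>` pattern (inferred only from the single curl example above) and that the response body is JSON; neither is confirmed by this page, and the helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; the segment layout after it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response (assumes the body is JSON)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("nlp", "smart-on-fhir", "cumulus-etl")

# Network call left commented out so the sketch runs offline:
# data = fetch_quality("nlp", "smart-on-fhir", "cumulus-etl")
```

Without a key, the stated limit of 100 requests/day applies, so it is worth caching responses rather than polling.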
Related tools
mirkosertic/FXDesktopSearch
A JavaFX based desktop search application.
opensemanticsearch/open-semantic-search
Open Source research tool to search, browse, analyze and explore large document collections by...
opensemanticsearch/open-semantic-etl
Python based Open Source ETL tools for file crawling, document processing (text extraction,...
opensemanticsearch/open-semantic-entity-search-api
Open Source REST API for named entity extraction, named entity linking, named entity...
opensemanticsearch/open-semantic-search-apps
Python/Django based webapps and web user interfaces for search, structure (meta data management...