The Metastatic streaming engine was developed to identify inpatient encounters at high risk of metastatic disease (cancer that has spread from where it started to a distant part of the body) and to optimize the capture rate. It was deployed at Mount Sinai Hospital in April 2022.
Clinical Documentation Integrity (CDI)
-
At the core of every patient encounter is clinical documentation that accurately reflects the patient’s disease burden and the scope of services provided.
-
Clinical documentation must be:
-
Clear
-
Consistent
-
Complete
-
Precise
-
Reliable
-
Timely
-
Legible
-
-
CDI facilitates the accurate translation of a patient’s clinical status into coded data, which in turn feeds quality reporting, physician report cards, reimbursement, public health data, disease tracking and trending, and medical research.
-
The convergence of clinical care, documentation, and the coding process is crucial for appropriate reimbursement, accurate quality scores, and informed decision-making to support high-quality patient care.
Challenges
-
High volume of clinical notes and the cost of processing
-
Manual chart review of cancer patients to identify new metastatic disease is inefficient due to:
-
time required
-
limited number of patients assessed
-
difficulty identifying these patients prior to treatment
-
-
Information to quickly and accurately identify patients with metastatic disease is typically available only in clinical text documents (particularly radiology reports)
-
The complexity of language expression, together with inconclusive text used to express uncertain or negated conditions, makes the NLP task very challenging 1
-
Building an exhaustive list of terms and rules to model language and extract domain concepts is extremely time-consuming
-
High class-imbalance => low productivity
-
Clinical documentation improvement opportunity based on benchmarking reports
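To make the class-imbalance challenge above concrete, the following minimal sketch estimates how many flagged charts a reviewer would have to open per true metastatic case. The prevalence, sensitivity, and specificity values are hypothetical placeholders chosen for illustration, not figures from this project.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of flagged encounters that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def charts_per_true_case(prevalence, sensitivity, specificity):
    """Expected number of flagged charts reviewed to find one true case."""
    return 1.0 / positive_predictive_value(prevalence, sensitivity, specificity)


if __name__ == "__main__":
    # Hypothetical operating point: the same classifier at two prevalences.
    for prevalence in (0.20, 0.02):
        ppv = positive_predictive_value(prevalence, sensitivity=0.90, specificity=0.90)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  "
              f"charts per true case={1.0 / ppv:.1f}")
```

With the classifier held fixed, a lower prevalence lowers the positive predictive value, so more charts must be reviewed for each true case found.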
Current Approach and Solution
-
Search Algorithms
-
term/string matching and document indexing
-
“metastatic”, “metastasis”, “metastases” and “carcinomatosis”
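As an illustration of the term-matching approach, here is a minimal sketch that searches note text for the four terms listed above and builds a small inverted index. The function names and document ids are hypothetical; this is not the search implementation used in the project.

```python
import re

# The four search terms listed above. Matching is case-insensitive and uses
# word boundaries, so a fused token such as "nonmetastatic" is not matched.
METASTASIS_PATTERN = re.compile(
    r"\b(metastatic|metastasis|metastases|carcinomatosis)\b",
    re.IGNORECASE,
)


def find_metastasis_terms(text):
    """Return (term, start_offset) pairs for every keyword hit in `text`."""
    return [(m.group(0).lower(), m.start()) for m in METASTASIS_PATTERN.finditer(text)]


def index_documents(documents):
    """Build a tiny inverted index: term -> sorted list of document ids."""
    index = {}
    for doc_id, text in documents.items():
        for term, _ in find_metastasis_terms(text):
            index.setdefault(term, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


if __name__ == "__main__":
    # Hypothetical report snippets.
    documents = {
        "rad-1": "Findings consistent with hepatic metastases.",
        "rad-2": "No evidence of metastatic disease.",
    }
    print(index_documents(documents))
```

Note that "rad-2" is a negated mention and is still returned as a hit, which is the kind of limitation that motivates the NLP methods listed in this section.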
-
-
DNNs for medical NLP (Language models: embeddings)
-
BioBERT
-
BiLSTM-CRF
-
Relation Extraction (REX) 2
-
-
Lexicon Mediated Entropy Reduction (LEXIMER) system
-
Medical Language Extraction and Encoding System (MEDLEE)
-
It uses a controlled vocabulary and grammatical rules to translate text to a structured database format
-
Low generalizability 3,4
-
-
Radiology Analysis tool (RADA) 5
-
Mayo Clinic’s Clinical Text Analysis and Knowledge Extraction System (cTAKES)
-
A dictionary-based named-entity recognizer that highlights Unified Medical Language System (UMLS) Metathesaurus terms in text, in addition to other NLP functionalities such as tokenizing, part-of-speech tagging, and parsing 6
-
-
Health Information Text Extraction (HITEx) from Brigham and Women’s Hospital and Harvard Medical School
-
It creates tags for principal diagnoses 7
-
-
Named Entity Recognition (NER) 2
-
Methods
-
dictionary-based method
-
conditional Markov model (CMM) 13
-
Sequence classifier
-
probabilities in CMMs are normalized locally for each state in the sequence
-
-
conditional random field model (CRF)
-
Sequence classifier
-
Probabilities in CRFs are normalized globally for a sequence
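The difference between local and global normalization can be shown on a toy example. The sketch below uses a made-up scoring function and a two-tag scheme (both hypothetical) and enumerates all tag sequences by brute force, which is only feasible for tiny inputs; practical CRF toolkits compute the global normalizer with dynamic programming instead.

```python
import itertools
import math

LABELS = ("O", "OBSERVATION")  # hypothetical two-tag scheme


def score(prev_tag, tag, token):
    """Hypothetical unnormalized log-score for one tagging decision."""
    s = 0.0
    if token in {"mass", "nodule"} and tag == "OBSERVATION":
        s += 2.0  # emission-style feature
    if prev_tag == tag:
        s += 0.5  # transition-style feature
    return s


def cmm_probability(tokens, tags):
    """CMM: normalize over the candidate tags at each position (local)."""
    prob, prev_tag = 1.0, "START"
    for token, tag in zip(tokens, tags):
        local_z = sum(math.exp(score(prev_tag, t, token)) for t in LABELS)
        prob *= math.exp(score(prev_tag, tag, token)) / local_z
        prev_tag = tag
    return prob


def sequence_score(tokens, tags):
    """Sum of per-position scores for one complete tag sequence."""
    total, prev_tag = 0.0, "START"
    for token, tag in zip(tokens, tags):
        total += score(prev_tag, tag, token)
        prev_tag = tag
    return total


def crf_probability(tokens, tags):
    """CRF: normalize once over every possible tag sequence (global)."""
    global_z = sum(
        math.exp(sequence_score(tokens, candidate))
        for candidate in itertools.product(LABELS, repeat=len(tokens))
    )
    return math.exp(sequence_score(tokens, tags)) / global_z
```

Both functions define a valid probability distribution over tag sequences; they differ only in where the normalizing sum is taken.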
-
-
Information model 8,9
-
Structure
-
anatomy: “Right upper lobe”
-
anatomy modifier: “Anterior”
-
observation: “Mass”
-
observation modifier: “Calcified”, “1 cm”
-
uncertainty: “Probably is present”
-
-
This information model has a hierarchical structure
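One way to picture this information model is as a small nested data structure. The sketch below is an assumption about how the five concept types could be arranged (an observation that carries its modifiers, an uncertainty phrase, and an anatomy with its own modifiers); the class and field names are hypothetical, and the source does not specify the exact hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Anatomy:
    term: str
    modifiers: List[str] = field(default_factory=list)


@dataclass
class Observation:
    term: str
    modifiers: List[str] = field(default_factory=list)
    uncertainty: Optional[str] = None
    anatomy: Optional[Anatomy] = None  # assumed nesting, for illustration


def describe(obs):
    """Flatten one observation into a readable one-line summary."""
    parts = []
    if obs.uncertainty:
        parts.append(f"[{obs.uncertainty}]")
    parts.extend(obs.modifiers)
    parts.append(obs.term)
    if obs.anatomy:
        parts.append("in")
        parts.extend(obs.anatomy.modifiers)
        parts.append(obs.anatomy.term)
    return " ".join(parts)


# The example values from the list above, assembled into one finding.
finding = Observation(
    term="Mass",
    modifiers=["Calcified", "1 cm"],
    uncertainty="Probably is present",
    anatomy=Anatomy(term="Right upper lobe", modifiers=["Anterior"]),
)

if __name__ == "__main__":
    print(describe(finding))
```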
-
Annotation tool: eHOST, the Extensible Human Oracle Suite of Tools (open source)
-
Stanford Part of Speech Tagger
-
RadLex: the RadLex lexicon is organized in a hierarchical structure and is available in Web Ontology Language (OWL) format 10
-
cTAKES: the dictionary-based named-entity recognition methodology used in this work 12 (http://www.ohnlp.org/)
-
CMM and CRF training infrastructure from the Stanford Named Entity Recognizer toolkit 11
Optimization opportunity and Goal
-
Opportunity: Identify inpatient encounters with high risk of metastatic disease and optimize the capture rate
-
Goal: Develop an ML-based CDI tool that flags inpatient encounters with high risk of metastatic disease on the day of discharge and sends a notification to the CDI specialist
Expected Impacts
-
Improve coding accuracy
-
Improve reimbursement opportunities
-
Improve the comorbidity score => improve the Elixhauser Comorbidity Index
-
Improve monitoring of Patient Safety Indicators (PSIs)
Proposed Solution
-
This tool automatically screens each patient’s clinical notes (Care Notes and Progress Notes) and reports (Radiology and Pathology) at discharge time for rapid identification of patients with metastatic disease
-
The machine learning information extraction approach provides an effective, automatic way to annotate and extract clinically significant information from a large collection of free text; an ML classifier then uses the extracted information to identify patients at high risk of new metastasis
High Level Operationalization Workflow
Batch Computational Flow
-
We use discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model.
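A sequence classifier emits one tag per token, so a post-processing step is needed to group tagged tokens into phrases. The sketch below assumes the tagger outputs BIO-style tags ("B-" begins an entity, "I-" continues it, "O" is outside); the tag names are hypothetical, and the tagging scheme itself is an assumption rather than something stated for this pipeline.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)  # continue the open span
            continue
        if current_tokens:  # close any span that was open
            entities.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):  # start a new span (tolerates a stray "I-")
            current_type, current_tokens = entity_type, [token]
        else:  # "O": outside any entity
            current_type, current_tokens = None, []
    if current_tokens:  # flush the final span
        entities.append((current_type, " ".join(current_tokens)))
    return entities


if __name__ == "__main__":
    # Hypothetical tagger output for one sentence.
    tokens = ["Probable", "calcified", "mass", "in", "the", "right", "upper", "lobe"]
    tags = ["B-UNCERTAINTY", "B-OBS_MODIFIER", "B-OBSERVATION", "O", "O",
            "B-ANATOMY", "I-ANATOMY", "I-ANATOMY"]
    for entity_type, text in decode_bio(tokens, tags):
        print(entity_type, "->", text)
```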
Feature Engineering Flow
Labeling Logic
Proposed Key Performance Indicators (KPIs)
-
Chart review rate ==> will be captured by REDCap response
-
Query rate
-
Provider response rate ==> will be captured by REDCap response
-
Provider agreement rate ==> will be captured by REDCap response
-
Unable to determine rate
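The KPIs above could be computed from per-encounter review records along the following lines. This is a sketch only: the record field names are hypothetical and do not reflect the actual REDCap instrument, and the denominator chosen for each rate is an assumption that would need to be confirmed with the CDI team.

```python
def safe_rate(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def compute_kpis(records):
    """Compute the five KPIs from a list of per-encounter review records.

    Each record is a dict with hypothetical boolean fields:
    reviewed, queried, provider_responded, provider_agreed, unable_to_determine.
    The denominators below are assumptions, not definitions from the pilot.
    """
    flagged = len(records)
    reviewed = sum(r["reviewed"] for r in records)
    queried = sum(r["queried"] for r in records)
    responded = sum(r["provider_responded"] for r in records)
    agreed = sum(r["provider_agreed"] for r in records)
    unable = sum(r["unable_to_determine"] for r in records)
    return {
        "chart_review_rate": safe_rate(reviewed, flagged),
        "query_rate": safe_rate(queried, reviewed),
        "provider_response_rate": safe_rate(responded, queried),
        "provider_agreement_rate": safe_rate(agreed, responded),
        "unable_to_determine_rate": safe_rate(unable, reviewed),
    }
```

Returning None for an empty denominator keeps a rate that cannot yet be measured distinct from a rate of zero.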
Active Pilot Workflow
There are two types of metastatic patients:
-
documented and captured by 3M software
-
undocumented, to be captured by CDI team review ==> only this category will be sent to REDCap to be scanned by the NLP application
References
-
Chirag M Lakhani, Arjun K Manrai, Jian Yang, Peter M Visscher, Chirag J Patel. HHS Public Access. Physiol Behav 2019;176:139–48. https://doi.org/10.1177/1535370213508172. Automated.
-
Hahn U, Oleynik M. Medical Information Extraction in the Age of Deep Learning. Yearb Med Inform 2020;29:208–20.
-
Hripcsak George, Kuperman Gilad J, Friedman Carol. Extracting findings from narrative reports: software transferability and sources of physician disagreement. Methods Inf Med 1998;37(1):1–7.
-
Elkins Jacob S, Friedman Carol, Boden-Albala Bernadette, Sacco Ralph L, Hripcsak George. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 2000;33(1):1–10.
-
Johnson David B, Taira Ricky K, Cardenas Alfonso F, Aberle Denise R. Extracting information from free text radiology reports. Int J Digit Libr 1997;1(3):297–308.
-
Savova Guergana K, Masanz James J, Ogren Philip V, Zheng Jiaping, Sohn Sunghwan, Kipper-Schuler Karin C, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inf Assoc 2010;17(5):507–13.
-
Goryachev Sergey, Sordo Margarita, Zeng Qing T. A suite of natural language processing tools developed for the I2B2 project. In: Bates David W, editor. Proceedings of the AMIA symposium, vol. 2. Washington DC: American Medical Informatics Association; 2006. p. 931
-
Langlotz Curtis P, Lee Meininger. Enhancing the expressiveness and usability of structured image reporting systems. In: Marc Overhage J, editor. Proceedings of the AMIA symposium. Los Angeles, CA: American Medical Informatics Association; 2000. p. 467
-
Hassanpour S, Langlotz CP. Information extraction from multi-institutional radiology reports. Artif Intell Med 2016;66:29–39.
-
Langlotz Curtis P. RadLex: a new method for indexing online educational materials. Radiographics 2006;26(6):1595–7
-
Finkel Jenny Rose, Grenager Trond, Manning Christopher. Incorporating non-local information into information extraction systems by Gibbs sampling. In: Darwish Kareem, Diab Mona, Habash Nizar, editors. Proceedings of the 43rd annual meeting on association for computational linguistics. Ann Arbor, MI: Association for Computational Linguistics; 2005. p. 363–70
-
Savova Guergana K, Masanz James J, Ogren Philip V, Zheng Jiaping, Sohn Sunghwan, Kipper-Schuler Karin C, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inf Assoc 2010;17(5):507–13
-
Ratnaparkhi Adwait. A maximum entropy model for part-of-speech tagging. In: Brill Eric, Church Kenneth, editors. Proceedings of the conference on empirical methods in natural language processing, vol. 1. Philadelphia, PA: Association for Computational Linguistics; 1996. p. 133–42