CDI Alert System for Metastatic Disease

The Metastatic streaming engine was developed to identify inpatient encounters with a high risk of metastatic disease (cancer that has spread from where it started to a distant part of the body) and to optimize the capture rate. It was deployed at The Mount Sinai Hospital in April 2022.

Clinical Documentation Integrity (CDI)

At the core of every patient encounter is clinical documentation that accurately reflects the patient’s disease burden and the scope of services provided.
Clinical documentation must be:
- Clear
- Consistent
- Complete
- Precise
- Reliable
- Timely
- Legible
CDI facilitates the accurate translation of a patient’s clinical status into coded data, which in turn feeds quality reporting, physician report cards, reimbursement, public health data, disease tracking and trending, and medical research.
The convergence of clinical care, documentation, and the coding process is crucial for appropriate reimbursement, accurate quality scores, and informed decision-making to support high-quality patient care.

Challenges

High volume of clinical notes and the cost of processing
Manual chart review of cancer patients to identify new metastatic disease is inefficient due to:
- time required
- limited number of patients assessed
- difficulty identifying these patients prior to treatment
The information needed to quickly and accurately identify patients with metastatic disease is typically available only in clinical text documents (particularly radiology reports)
The complexity of language and the inconclusive wording used to express uncertain or negated conditions make the NLP task very challenging 1
Building an exhaustive list of terms and rules to model language and extract domain concepts is extremely time-consuming
High class imbalance => low productivity
Clinical documentation improvement opportunity based on benchmarking reports
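
One way to read the class-imbalance challenge above is through positive predictive value: when few encounters are truly metastatic, even a reasonably accurate screen produces mostly false alerts, so reviewer time is spent on negatives. The sketch below only illustrates that arithmetic. The prevalence, sensitivity, and specificity values are hypothetical and do not come from this project.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of flagged encounters that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical operating point: 90% sensitivity and 90% specificity.
    for prevalence in (0.02, 0.10, 0.50):
        ppv = positive_predictive_value(prevalence, 0.90, 0.90)
        print(f"prevalence={prevalence:.2f} -> PPV={ppv:.2f}")
```

With these illustrative numbers, a 2% prevalence means only about 16% of flagged encounters are true positives, while a balanced population would give 90%.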

Current Approach and Solution

Search Algorithms
- term/string matching and document indexing
- “metastatic”, “metastasis”, “metastases” and “carcinomatosis”
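
The term-matching approach above can be sketched in a few lines of Python. This is a minimal illustration, not the deployed search: the four search terms come from the list above, while the case-insensitive word-boundary matching, the function names, and the `notes` mapping are assumptions made for the example.

```python
import re

# Search terms from the list above; matching rules are assumptions of this sketch.
METASTATIC_TERMS = ("metastatic", "metastasis", "metastases", "carcinomatosis")
TERM_PATTERN = re.compile(r"\b(" + "|".join(METASTATIC_TERMS) + r")\b", re.IGNORECASE)


def find_metastatic_terms(note_text):
    """Return (term, start_offset) pairs for every search-term hit in a note."""
    return [(m.group(1).lower(), m.start()) for m in TERM_PATTERN.finditer(note_text)]


def index_notes(notes):
    """Build a small inverted index: term -> sorted list of note ids containing it.

    `notes` is a hypothetical mapping of note id to free text.
    """
    index = {}
    for note_id, text in notes.items():
        for term, _ in find_metastatic_terms(text):
            index.setdefault(term, set()).add(note_id)
    return {term: sorted(ids) for term, ids in index.items()}


if __name__ == "__main__":
    sample_notes = {
        "note-1": "CT chest: findings consistent with metastatic disease.",
        "note-2": "No evidence of metastasis.",
        "note-3": "Unremarkable study.",
    }
    print(index_notes(sample_notes))
```

Note that "note-2" is indexed even though the mention is negated. Plain term matching cannot tell the difference, which is the limitation that motivates negation handling and the model-based approaches listed next.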
Deep neural networks (DNNs) for medical NLP (language models: embeddings)
- BioBERT
- BiLSTM-CRF
- Relation Extraction (REX) 2
Lexicon Mediated Entropy Reduction (LEXIMER) system
Medical Language Extraction and Encoding System (MEDLEE)
- It uses a controlled vocabulary and grammatical rules to translate text to a structured database format
- Low generalizability 3,4
Radiology Analysis tool (RADA) 5
Mayo Clinic’s Clinical Text Analysis and Knowledge Extraction System (cTAKES)
- a dictionary-based named-entity recognizer that highlights Unified Medical Language System (UMLS) Metathesaurus terms in text, alongside other NLP functionality such as tokenizing, part-of-speech tagging, and parsing 6
Health Information Text Extraction (HITEx) from Brigham and Women’s Hospital and Harvard Medical School
- It creates tags for principal diagnoses 7
Named Entity Recognition (NER) 2
- Methods
  - dictionary-based method
  - conditional Markov model (CMM) 13
    - Sequence classifier
    - probabilities in CMMs are normalized locally for each state in the sequence
  - conditional random field model (CRF)
    - Sequence classifier
    - Probabilities in CRFs are normalized globally for a sequence
- Information model 8,9
  - Structure
    - anatomy: “Right upper lobe”
    - anatomy modifier: “Anterior”
    - observation: “Mass”
    - observation modifier: “Calcified”, “1 cm”
    - uncertainty: “Probably is present”
  - This information model has a hierarchical structure
  - The annotation tool => eHOST:

eHOST: the Extensible Human Oracle Suite of Tools, an open-source annotation tool
Stanford Part-of-Speech Tagger
NegEx: https://javadoc.io/static/com.johnsnowlabs.nlp/spark-nlp_2.11/2.3.4/index.html#com.johnsnowlabs.nlp.AnnotatorType$
RadLex: the RadLex lexicon is organized in a hierarchical structure and is available in Web Ontology Language (OWL) format 10
cTAKES: the dictionary-based named-entity recognition methodology used in this work 12 (http://www.ohnlp.org/)
Stanford Named-Entity Recognizer toolkit: CMM and CRF training infrastructure 11
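
The information model described in the NER list above can be represented as a small nested data structure. The sketch below is illustrative only: the class names, field names, and the choice to nest modifiers under their parent concept are assumptions, and the example values are the ones quoted in the list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Anatomy:
    text: str
    modifiers: List[str] = field(default_factory=list)


@dataclass
class Finding:
    """One observation with its modifiers, anatomy, and uncertainty."""
    observation: str
    observation_modifiers: List[str] = field(default_factory=list)
    anatomy: Optional[Anatomy] = None
    uncertainty: Optional[str] = None

    def labeled_spans(self) -> List[Tuple[str, str]]:
        """Flatten the hierarchy into (label, text) pairs, as an annotator would tag them."""
        spans = []
        if self.anatomy is not None:
            spans.append(("anatomy", self.anatomy.text))
            spans.extend(("anatomy modifier", m) for m in self.anatomy.modifiers)
        spans.append(("observation", self.observation))
        spans.extend(("observation modifier", m) for m in self.observation_modifiers)
        if self.uncertainty is not None:
            spans.append(("uncertainty", self.uncertainty))
        return spans


if __name__ == "__main__":
    finding = Finding(
        observation="Mass",
        observation_modifiers=["Calcified", "1 cm"],
        anatomy=Anatomy("Right upper lobe", ["Anterior"]),
        uncertainty="Probably is present",
    )
    for label, text in finding.labeled_spans():
        print(f"{label}: {text}")
```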

Optimization opportunity and Goal

Opportunity: Identify inpatient encounters with a high risk of metastatic disease and optimize the capture rate
Goal: Develop an ML-based CDI tool that flags inpatient encounters with a high risk of metastatic disease on the day of discharge and sends a notification to the CDI specialist

Expected Impacts

Improve coding accuracy
Improve reimbursement opportunities
Improve comorbidity scores => improve the Elixhauser Comorbidity Index
Improve monitoring of Patient Safety Indicators (PSIs)

Proposed Solution

This tool automatically screens a patient’s clinical notes (Care Notes and Progress Notes) and reports (Radiology and Pathology) at discharge time to rapidly identify patients with metastatic disease.
The machine learning information extraction approach provides an effective, automatic method to annotate and extract clinically significant information from a large collection of free text; an ML classifier then uses the extracted information to identify patients at high risk of new metastasis.

High Level Operationalization Workflow

Batch Computational Flow

We use discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model.
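
The two discriminative sequence classifiers named earlier, the CMM and the CRF, differ in where probabilities are normalized. The standard textbook forms are shown below for a token sequence $x$ and label sequence $y = (y_1, \dots, y_T)$. The weight vector $w$ and feature function $f$ are generic notation, not the specific features used in this project.

```latex
% CMM: each transition is normalized locally, over the labels reachable at step t
P_{\mathrm{CMM}}(y \mid x) = \prod_{t=1}^{T}
  \frac{\exp\big(w^{\top} f(y_{t-1}, y_t, x, t)\big)}
       {\sum_{y'} \exp\big(w^{\top} f(y_{t-1}, y', x, t)\big)}

% CRF: a single normalizer Z(x) sums over all complete label sequences
P_{\mathrm{CRF}}(y \mid x) =
  \frac{\exp\big(\sum_{t=1}^{T} w^{\top} f(y_{t-1}, y_t, x, t)\big)}{Z(x)},
\qquad
Z(x) = \sum_{y'} \exp\Big(\sum_{t=1}^{T} w^{\top} f(y'_{t-1}, y'_t, x, t)\Big)
```

Because the CRF normalizes once per sequence rather than once per state, evidence later in a report can change the score of an earlier label, at the cost of more expensive training.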

Feature Engineering Flow

Labeling Logic

Proposed Key Performance Indicators (KPIs)

Chart review rate ==> will be captured by the REDCap response
Query rate
Provider response rate ==> will be captured by the REDCap response
Provider agreement rate ==> will be captured by the REDCap response
Unable-to-determine rate
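
The KPIs above are all rates, so they can be computed from a handful of counts. The source does not define the denominators, so the ones used below are assumptions chosen for illustration (for example, chart review rate as charts reviewed per alert sent). The `PilotCounts` fields and the sample numbers are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass
class PilotCounts:
    # All fields are hypothetical counts for one reporting period.
    alerts_sent: int
    charts_reviewed: int
    queries_issued: int
    provider_responses: int
    provider_agreements: int
    unable_to_determine: int


def _rate(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def compute_kpis(c):
    """Compute the five KPIs under the assumed denominators described above."""
    return {
        "chart_review_rate": _rate(c.charts_reviewed, c.alerts_sent),
        "query_rate": _rate(c.queries_issued, c.charts_reviewed),
        "provider_response_rate": _rate(c.provider_responses, c.queries_issued),
        "provider_agreement_rate": _rate(c.provider_agreements, c.provider_responses),
        "unable_to_determine_rate": _rate(c.unable_to_determine, c.charts_reviewed),
    }


if __name__ == "__main__":
    counts = PilotCounts(200, 150, 60, 45, 36, 15)  # made-up numbers
    for name, value in compute_kpis(counts).items():
        print(f"{name}: {value:.2f}")
```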

Active Pilot Workflow

There are two types of metastatic patients:

Documented: captured by 3M software
Undocumented: to be captured by CDI team review ==> only this category will be sent to REDCap to be scanned by the NLP application
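
The split between the two categories amounts to a simple filter. The sketch below assumes each encounter carries a flag saying whether metastatic disease was already captured by the 3M software. The `Encounter` class, its fields, and the function name are hypothetical, and the actual hand-off to REDCap is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Encounter:
    encounter_id: str
    captured_by_3m: bool  # hypothetical flag: already documented and captured


def select_for_cdi_review(encounters: List[Encounter]) -> List[str]:
    """Return ids of encounters not yet captured, i.e. the category sent on for review."""
    return [e.encounter_id for e in encounters if not e.captured_by_3m]


if __name__ == "__main__":
    batch = [Encounter("E1", True), Encounter("E2", False), Encounter("E3", False)]
    print(select_for_cdi_review(batch))
```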

References

Chirag M Lakhani1 2, Arjun K Manrai1, 3, Jian Yang4, 5, Peter M Visscher#4, 5,*, and Chirag J Patel#1, 1Department BTT. 乳鼠心肌提取 HHS Public Access. Physiol Behav 2019;176:139–48. https://doi.org/10.1177/1535370213508172.Automated.
Hahn U, Oleynik M. Medical Information Extraction in the Age of Deep Learning. Yearb Med Inform 2020;29:208–20.
Hripcsak George, Kuperman Gilad J, Friedman Carol. Extracting findings from narrative reports: software transferability and sources of physician disagreement. Methods Inf Med 1998;37(1):1–7.
Elkins Jacob S, Friedman Carol, Boden-Albala Bernadette, Sacco Ralph L, Hripcsak George. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 2000;33(1):1–10.
Johnson David B, Taira Ricky K, Cardenas Alfonso F, Aberle Denise R. Extracting information from free text radiology reports. Int J Digit Libr 1997;1(3):297–308.
Savova Guergana K, Masanz James J, Ogren Philip V, Zheng Jiaping, Sohn Sunghwan, Kipper-Schuler Karin C, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inf Assoc 2010;17(5):507–13.
Goryachev Sergey, Sordo Margarita, Zeng Qing T. A suite of natural language processing tools developed for the I2B2 project. In: Bates David W, editor. Proceedings of the AMIA symposium, vol. 2. Washington DC: American Medical Informatics Association; 2006. p. 931.
Langlotz Curtis P, Lee Meininger. Enhancing the expressiveness and usability of structured image reporting systems. In: Marc Overhage J, editor. Proceedings of the AMIA symposium. Los Angeles, CA: American Medical Informatics Association; 2000. p. 467.
Hassanpour S, Langlotz CP. Information extraction from multi-institutional radiology reports. Artif Intell Med 2016;66:29–39.
Langlotz Curtis P. RadLex: a new method for indexing online educational materials. Radiographics 2006;26(6):1595–7.
Finkel Jenny Rose, Grenager Trond, Manning Christopher. Incorporating non-local information into information extraction systems by Gibbs sampling. In: Darwish Kareem, Diab Mona, Habash Nizar, editors. Proceedings of the 43rd annual meeting on association for computational linguistics. Ann Arbor, MI: Association for Computational Linguistics; 2005. p. 363–70.
Savova Guergana K, Masanz James J, Ogren Philip V, Zheng Jiaping, Sohn Sunghwan, Kipper-Schuler Karin C, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inf Assoc 2010;17(5):507–13.
Ratnaparkhi Adwait. A maximum entropy model for part-of-speech tagging. In: Brill Eric, Church Kenneth, editors. Proceedings of the conference on empirical methods in natural language processing, vol. 1. Philadelphia, PA: Association for Computational Linguistics; 1996. p. 133–42.