The Metastatic streaming engine was developed to identify inpatient encounters at high risk of metastatic disease (cancer that has spread from where it started to a distant part of the body) and to optimize the capture rate. It was deployed at Mount Sinai Hospital in April 2022.

Clinical Documentation Integrity (CDI)

  • At the core of every patient encounter is clinical documentation that accurately reflects the patient’s disease burden and the scope of services provided.

  • Clinical documentation must be:

    • Clear

    • Consistent

    • Complete

    • Precise

    • Reliable

    • Timely

    • Legible

  • CDI facilitates the accurate translation of a patient’s clinical status into coded data, which in turn drives quality reporting, physician report cards, reimbursement, public health data, disease tracking and trending, and medical research.

  • The convergence of clinical care, documentation, and the coding process is crucial for appropriate reimbursement, accurate quality scores, and informed decision-making to support high-quality patient care.

Challenges

  • High volume of clinical notes and the cost of processing

  • Manual chart review of cancer patients to identify new metastatic disease is inefficient due to

    • time required

    • limited number of patients assessed

    • difficulty identifying these patients prior to treatment

  • Information to quickly and accurately identify patients with metastatic disease is typically available only in clinical text documents (particularly radiology reports)

  • The complexity of clinical language, together with inconclusive wording used to express uncertain or negated conditions, makes the NLP task very challenging 1

  • Building an exhaustive list of terms and rules to model language and extract domain concepts is extremely time-consuming

  • High class imbalance (few metastatic encounters among all encounters) => low reviewer productivity

  • Clinical documentation improvement opportunity based on benchmarking reports
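The link between class imbalance and low productivity in the list above can be made concrete with a short calculation. The sketch below applies Bayes' rule to show how the share of flagged charts that are true metastatic cases drops as prevalence falls; the prevalence, sensitivity, and specificity values are illustrative assumptions, not figures measured on this project.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of flagged encounters that are truly positive (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def charts_per_true_case(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Average number of flagged charts a reviewer opens per true case found."""
    return 1.0 / positive_predictive_value(prevalence, sensitivity, specificity)


if __name__ == "__main__":
    # Illustrative values only: the same classifier at three assumed prevalences.
    for prevalence in (0.20, 0.05, 0.01):
        ppv = positive_predictive_value(prevalence, sensitivity=0.90, specificity=0.90)
        workload = charts_per_true_case(prevalence, sensitivity=0.90, specificity=0.90)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  charts per true case={workload:.1f}")
```

With a fixed classifier, the reviewer workload per true case grows as the positive class gets rarer, which is one way to read "low productivity" here.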

Current Approach and Solution

  • Search Algorithms

    • term/string matching and document indexing

    • Search terms: “metastatic”, “metastasis”, “metastases” and “carcinomatosis”

  • DNNs for medical NLP (Language models: embeddings)

    • BioBERT

    • BiLSTM-CRF

    • Relation Extraction (REX) 2

  • Lexicon Mediated Entropy Reduction (LEXIMER) system

  • Medical Language Extraction and Encoding System (MEDLEE)

    • It uses a controlled vocabulary and grammatical rules to translate text to a structured database format

    • Low generalizability 3,4

  • Radiology Analysis tool (RADA) 5

  • Mayo Clinic’s Clinical Text Analysis and Knowledge Extraction System (cTAKES)

    • A dictionary-based named-entity recognizer that highlights Unified Medical Language System (UMLS) Metathesaurus terms in text, alongside other NLP functionality such as tokenizing, part-of-speech tagging, and parsing 6

  • Health Information Text Extraction (HITEx) from Brigham and Women’s Hospital and Harvard Medical School

    • It creates tags for principal diagnoses 7

  • Named Entity Recognition (NER) 2

    • Methods

      • dictionary-based method

      • conditional Markov model (CMM) 13

        • Sequence classifier

        • probabilities in CMMs are normalized locally for each state in the sequence

      • conditional random field model (CRF)

        • Sequence classifier

        • probabilities in CRFs are normalized globally over the whole sequence

    • Information model 8,9

      • Structure

        • anatomy: “Right upper lobe”

        • anatomy modifier: “Anterior”

        • observation: “Mass”

        • observation modifier: “Calcified”, “1 cm”

        • uncertainty: “Probably is present”

      • This information model has a hierarchical structure

      • Annotation tool: eHOST
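The difference between local normalization (CMM) and global normalization (CRF) noted in the NER methods above can be checked numerically on a toy problem. The sketch below is a brute-force illustration under stated assumptions: the two-label tag set, the start symbol, and all scores are hypothetical, and the exhaustive enumeration is only practical for very short sequences.

```python
import itertools
import math

LABELS = ["O", "OBS"]  # hypothetical tag set: outside / observation
START = "<s>"          # hypothetical start-of-sequence symbol


def step_score(prev, cur, t, emit, trans):
    """Unnormalized score of choosing label `cur` at position t after `prev`."""
    return emit[t][cur] + trans.get((prev, cur), 0.0)


def sequence_score(seq, emit, trans):
    """Sum of step scores along one full label sequence."""
    prev, total = START, 0.0
    for t, cur in enumerate(seq):
        total += step_score(prev, cur, t, emit, trans)
        prev = cur
    return total


def crf_prob(seq, emit, trans):
    """CRF: one normalizer computed over every possible label sequence."""
    z = sum(math.exp(sequence_score(s, emit, trans))
            for s in itertools.product(LABELS, repeat=len(seq)))
    return math.exp(sequence_score(seq, emit, trans)) / z


def cmm_prob(seq, emit, trans):
    """CMM: each step is normalized on its own, given the previous label."""
    prev, prob = START, 1.0
    for t, cur in enumerate(seq):
        z_local = sum(math.exp(step_score(prev, y, t, emit, trans)) for y in LABELS)
        prob *= math.exp(step_score(prev, cur, t, emit, trans)) / z_local
        prev = cur
    return prob


# Hypothetical scores for a two-token input.
EMIT = [{"O": 0.0, "OBS": 1.0}, {"O": 0.5, "OBS": 0.0}]
TRANS = {("OBS", "OBS"): 2.0}

if __name__ == "__main__":
    for seq in itertools.product(LABELS, repeat=2):
        print(seq, round(crf_prob(seq, EMIT, TRANS), 4), round(cmm_prob(seq, EMIT, TRANS), 4))
```

Both models define a valid distribution over label sequences, but with the same scores they assign different probabilities, because the CMM's per-step normalizer depends on the previous label while the CRF's single normalizer does not.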

Optimization opportunity and Goal

  • Opportunity: Identify inpatient encounters with high risk of metastatic disease and optimize the capture rate

  • Goal: Develop an ML-based CDI tool that flags inpatient encounters at high risk of metastatic disease on the day of discharge and sends a notification to the CDI specialist

Expected Impacts

  • Improve coding accuracy

  • Improve reimbursement opportunities

  • Improve comorbidity score => improve Elixhauser Comorbidity Index

  • Improve PSI monitoring

Proposed Solution

  • This tool automatically screens each patient’s clinical notes (Care Notes and Progress Notes) and reports (Radiology and Pathology) at discharge time for rapid identification of patients with metastatic disease

  • The machine learning information extraction approach automatically annotates and extracts clinically significant information from a large collection of free text; an ML classifier then uses that information to identify patients at high risk of new metastasis
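As a point of comparison for the screening step, the term-matching baseline listed under Current Approach can be sketched in a few lines. This is a minimal illustration, not the ML classifier the tool uses; the function names and the note-type keys are hypothetical, and only the four search terms come from this page.

```python
import re

# The four search terms listed under "Search Algorithms"; matching is case-insensitive.
METASTASIS_PATTERN = re.compile(
    r"\b(metastatic|metastasis|metastases|carcinomatosis)\b", re.IGNORECASE
)


def find_term_hits(note_text):
    """Return (matched term, character offset) for each search-term hit in one note."""
    return [(m.group(0).lower(), m.start()) for m in METASTASIS_PATTERN.finditer(note_text)]


def screen_encounter(notes):
    """Map note type -> hits, keeping only the notes that contain at least one term.

    `notes` is a hypothetical dict of note type -> free text for one encounter.
    """
    hits = {}
    for note_type, text in notes.items():
        found = find_term_hits(text)
        if found:
            hits[note_type] = found
    return hits


if __name__ == "__main__":
    example = {
        "Radiology": "Findings consistent with hepatic metastases.",
        "Progress Note": "No evidence of metastatic disease.",
    }
    print(screen_encounter(example))
```

Both example notes match, although the second one negates the finding. Plain term matching cannot tell these apart, which is the uncertainty and negation problem listed under Challenges.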

High Level Operationalization Workflow

Batch Computational Flow

  • We use discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model.
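To illustrate how sequence-classifier output can be organized along the information model, the sketch below groups token-level tags into labeled phrases and files them under the model's five categories. The BIO tagging scheme, the label names, the flat `Finding` container, and the example sentence are assumptions made for illustration; the page does not specify the tag encoding or how the hierarchy is represented.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Flat container for the five information-model categories (hypothetical layout)."""
    anatomy: list = field(default_factory=list)
    anatomy_modifier: list = field(default_factory=list)
    observation: list = field(default_factory=list)
    observation_modifier: list = field(default_factory=list)
    uncertainty: list = field(default_factory=list)


def bio_to_spans(tokens, tags):
    """Collapse BIO tags ("B-label", "I-label", "O") into (label, phrase) pairs."""
    spans, current_label, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new phrase starts: close the open one, if any.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O": the token is outside any phrase.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


def build_finding(spans):
    """File each phrase under the Finding field named by its label."""
    finding = Finding()
    for label, phrase in spans:
        getattr(finding, label).append(phrase)
    return finding


if __name__ == "__main__":
    tokens = ["Calcified", "1", "cm", "mass", "in", "the", "anterior", "right", "upper", "lobe"]
    tags = ["B-observation_modifier", "B-observation_modifier", "I-observation_modifier",
            "B-observation", "O", "O", "B-anatomy_modifier",
            "B-anatomy", "I-anatomy", "I-anatomy"]
    print(build_finding(bio_to_spans(tokens, tags)))
```

In practice the tags would come from the trained sequence classifier rather than being written by hand as they are in this example.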

Feature Engineering Flow

Labeling Logic

Proposed Key Performance Indicators (KPIs)

  • Chart review rate ==> will be captured by REDCap response

  • Query rate

  • Provider response rate ==> will be captured by REDCap response

  • Provider agreement rate ==> will be captured by REDCap response

  • Unable to determine rate
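The KPIs above could be computed from exported review records along the following lines. This is a sketch under loud assumptions: the page does not define the numerators or denominators, so the choices here (for example, query rate as queries per reviewed chart) are guesses, and the record field names are hypothetical rather than an actual REDCap schema.

```python
def rate(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def compute_kpis(records):
    """Compute the five proposed KPIs from a list of per-encounter review records.

    Each record is a dict of booleans with hypothetical keys: reviewed, queried,
    provider_responded, provider_agreed, unable_to_determine.
    Assumed denominators: flagged encounters for chart review rate, reviewed
    charts for query and unable-to-determine rates, queries for response rate,
    and responses for agreement rate.
    """
    flagged = len(records)
    reviewed = sum(r["reviewed"] for r in records)
    queried = sum(r["queried"] for r in records)
    responded = sum(r["provider_responded"] for r in records)
    agreed = sum(r["provider_agreed"] for r in records)
    unable = sum(r["unable_to_determine"] for r in records)
    return {
        "chart_review_rate": rate(reviewed, flagged),
        "query_rate": rate(queried, reviewed),
        "provider_response_rate": rate(responded, queried),
        "provider_agreement_rate": rate(agreed, responded),
        "unable_to_determine_rate": rate(unable, reviewed),
    }


def make_record(reviewed=False, queried=False, responded=False, agreed=False, unable=False):
    """Helper for building example records (hypothetical)."""
    return {
        "reviewed": reviewed,
        "queried": queried,
        "provider_responded": responded,
        "provider_agreed": agreed,
        "unable_to_determine": unable,
    }


if __name__ == "__main__":
    example = [
        make_record(reviewed=True, queried=True, responded=True, agreed=True),
        make_record(reviewed=True, queried=True, responded=True),
        make_record(reviewed=True, unable=True),
        make_record(),
    ]
    print(compute_kpis(example))
```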

Active Pilot Workflow

There are two types of metastatic patients:

  1. documented and captured by 3M software

  2. undocumented, to be captured by CDI team review ==> only this category is sent to REDCap to be scanned by the NLP application

References

  • Chirag M Lakhani1  2, Arjun K Manrai1, 3, Jian Yang4, 5, Peter M Visscher#4, 5,*, and Chirag J Patel#1, 1Department BTT. 乳鼠心肌提取 HHS Public Access. Physiol Behav 2019;176:139–48. https://doi.org/10.1177/1535370213508172.Automated.

  • Hahn U, Oleynik M. Medical Information Extraction in the Age of Deep Learning. Yearb Med Inform 2020;29:208–20.

  • Hripcsak George, Kuperman Gilad J, Friedman Carol. Extracting findings from narrative reports: software transferability and sources of physician disagreement. Methods Inf Med 1998;37(1):1–7.

  • Elkins Jacob S, Friedman Carol, Boden-Albala Bernadette, Sacco Ralph L, Hripcsak George. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 2000;33(1):1–10.

  • Johnson David B, Taira Ricky K, Cardenas Alfonso F, Aberle Denise R. Extracting information from free text radiology reports. Int J Digit Libr 1997;1(3):297–308.

  • Savova Guergana K, Masanz James J, Ogren Philip V, Zheng Jiaping, Sohn Sunghwan, Kipper-Schuler Karin C, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inf Assoc 2010;17(5):507–13.

  • Goryachev Sergey, Sordo Margarita, Zeng Qing T. A suite of natural language processing tools developed for the I2B2 project. In: Bates David W, editor. Proceedings of the AMIA symposium, vol. 2. Washington DC: American Medical Informatics Association; 2006. p. 931

  • Langlotz Curtis P, Lee Meininger. Enhancing the expressiveness and usability of structured image reporting systems. In: Marc Overhage J, editor. Proceedings of the AMIA symposium. Los Angeles, CA: American Medical Informatics Association; 2000. p. 467

  • Hassanpour S, Langlotz CP. Information extraction from multi-institutional radiology reports. Artif Intell Med 2016;66:29–39.

  • Langlotz Curtis P. RadLex: a new method for indexing online educational materials. Radiographics 2006;26(6):1595–7

  • Finkel Jenny Rose, Grenager Trond, Manning Christopher. Incorporating non-local information into information extraction systems by Gibbs sampling. In: Darwish Kareem, Diab Mona, Habash Nizar, editors. Proceedings of the 43rd annual meeting on association for computational linguistics. Ann Arbor, MI: Association for Computational Linguistics; 2005. p. 363–70

  • Savova Guergana K, Masanz James J, Ogren Philip V, Zheng Jiaping, Sohn Sunghwan, Kipper-Schuler Karin C, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inf Assoc 2010;17(5):507–13

  • Ratnaparkhi Adwait. A maximum entropy model for part-of-speech tagging. In: Brill Eric, Church Kenneth, editors. Proceedings of the conference on empirical methods in natural language processing, vol. 1. Philadelphia, PA: Association for Computational Linguistics; 1996. p. 133–42