The paper below was published in the Journal of the American College of Nutrition, 40:1, 3-12.

Prem Timsina, Himanshu N. Joshi, Fu-Yuan Cheng, Ilana Kersch, Sara Wilson, Claudia Colgan, Robert Freeman, David L. Reich, Jeffrey Mechanick, Madhu Mazumdar, Matthew A. Levin & Arash Kia (2021) MUST-Plus: A Machine Learning Classifier That Improves Malnutrition Screening in Acute Care Facilities, Journal of the American College of Nutrition, 40:1, 3-12, DOI: 10.1080/07315724.2020.1774821

Abstract

Objective

Malnutrition among hospital patients, a frequent yet under-diagnosed problem, is associated with adverse impacts on patient outcomes and health care costs. Development of highly accurate malnutrition screening tools is, therefore, essential for its timely detection, for providing nutritional care, and for addressing the concerns related to the suboptimal predictive value of conventional screening tools, such as the Malnutrition Universal Screening Tool (MUST). We aimed to develop a machine learning (ML) based classifier (MUST-Plus) for more accurate prediction of malnutrition.

Method

A retrospective cohort of inpatient data consisting of anthropometric measures, laboratory biochemistry, clinical data, and demographics from adult (≥ 18 years) admissions to a large tertiary health care system between January 2017 and July 2018 was used. The registered dietitian (RD) nutritional assessments were used as the gold standard outcome label. The cohort was randomly split (70:30) into training and test sets. A random forest model was trained using 10-fold cross-validation on the training set, and its predictive performance on the test set was compared to MUST.

Results

In all, 13.3% of admissions were associated with malnutrition in the test cohort. MUST-Plus provided 73.07% (95% confidence interval [CI]: 69.61%–76.33%) sensitivity, 76.89% (95% CI: 75.64%–78.11%) specificity, and 83.5% (95% CI: 82.0%–85.0%) area under the receiver operating characteristic curve (AUC). Compared to the classic MUST, MUST-Plus demonstrated 30% higher sensitivity, 6% higher specificity, and a 17% increase in AUC.

Conclusions

The ML-based MUST-Plus provided superior performance in identifying malnutrition compared to the classic MUST. The tool can be used to improve the operational efficiency of RDs by enabling timely referral of high-risk patients.


Introduction

Malnutrition is a frequently observed condition among hospitalized patients, with a reported rate ranging from 8% to 50% (Citation1–4), depending on the care setting and criteria used. Malnutrition, however, remains under-diagnosed and oftentimes unreported, compromising any estimates of the true incidence and prevalence. Untreated malnutrition contributes to delayed recovery, impaired immune and organ function (Citation5), increased length of hospital stay (Citation6), higher early readmission rates (Citation7, Citation8), increased morbidity and mortality burden (Citation9), and higher health care costs (Citation10–12). Unfortunately, the lack of proportionate capacity and efficiency for malnutrition screening, diagnosis, and clinical management remains a key limitation in hospital settings.

Mandatory screening for malnutrition in hospital settings is a cost-effective step that facilitates triaging of patients, guides nutritional interventions, and minimizes adverse consequences. As recommended by the Joint Commission, comprehensive nutritional assessment, diagnosis, and grading of the severity of malnutrition by registered dietitians (RDs) is performed in patients determined to have nutritional risk by initial screening within 24 hours of admission (Citation5, Citation13, Citation14).

Since 2007, malnutrition has been incorporated as a part of the disease severity component for reimbursement by the Centers for Medicare and Medicaid Services (Citation15). With such value-based care models linking payment and quality, the accurate diagnosis and appropriate clinical management of comorbidities, such as malnutrition, can reduce or avert penalties for hospital-acquired conditions (HACs) and readmission within 30 days (Citation16). As a result, significant interest has been observed in developing better tools for the detection and management of malnutrition in hospital admissions and discharges. Many HACs, such as pressure injuries and nosocomial infections, can be prevented by nutritional interventions resulting from effective nutritional screening protocols (Citation17, Citation18). Consequently, the Academy of Nutrition and Dietetics and the American Society for Parenteral and Enteral Nutrition work group developed a consensus definition and diagnostic criteria for malnutrition (Citation4).

There are several screening tools currently used in hospital settings for triaging patients based on nutritional risk. These include the Malnutrition Universal Screening Tool (MUST) (Citation19), Nutritional Risk Screening 2002 (NRS-2002) (Citation20), and the Malnutrition Screening Tool (MST) (Citation21, Citation22). The MUST computes a score in the range of 0 to 6 based on the following factors: body mass index (BMI), percentage weight loss over a defined time frame, presence of certain acute diseases, amount of nutritional intake, and likelihood of no nutritional intake for greater than 5 days. Patients with scores of 2 or more are at increased risk of malnutrition and nutrition-related complications. However, these and other rule-based screening tools are limited in their applicability for a variety of reasons: methodological flaws in validation studies, non-generalizability to all hospital patient populations, and relatively low performance metrics (Citation21).

Based on this performance gap, a superior data-driven tool for nutritional screening and assessment is needed that is suitable for multi-specialty hospital settings. Rigorously validated predictive algorithms based on machine learning (ML) have been transformative for clinical practice change and cost-effective for population health (Citation23). This is particularly true given the multifactorial, multimorbid, and complex causal nature of the current chronic disease burden. Random forest (RF), an ensemble learning approach (Citation24), has grown in popularity for addressing classification problems in complex, multidimensional health care data, given its robustness to non-linearity and outliers and its ability to rank input features by their importance. RF builds a large number of decision trees, each utilizing a subset of variables and voting for a particular class (Citation24).

The purpose of this study is to generate an ML-derived malnutrition predictive model (MUST-Plus) using a wide range of electronic health record (EHR) data and investigate the hypothesis that this model would provide superior predictive performance to the classic MUST score.

Materials and methods

Institutional Review Board approval was obtained for this retrospective cohort study (IRB #18-00573). Inclusion criteria were adult (age ≥ 18 years) patients admitted to the Mount Sinai Hospital between January 2017 and July 2018 who had a nutritional evaluation performed by a certified RD.

Study cohort

The following data were retrospectively collected from our institutional data warehouse: admission-discharge-transfer events; structured clinical assessments within nursing documentation flowsheets; physiologic data (e.g., vital signs including pulse and respiratory rates); laboratory results; and automated electrocardiogram results.

Outcome

A malnutrition diagnosis was documented if a minimum of two of the following diagnostic criteria were met: inadequate energy (kilocalorie) intake compared to estimated requirements; significant percentage of unintentional body weight loss within one year; and findings of muscle wasting, subcutaneous fat wasting, or fluid accumulation (edema) on physical examination (Citation4). Based on this assessment, the RD assigned a patient to one of three categories: no malnutrition, malnutrition of moderate degree, or severe protein-calorie malnutrition. The latter two were considered malnutrition positive for this study. The resulting binary variable (negative/positive) was used as the label (gold standard) for training the model.
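
To make this labeling step concrete, the sketch below collapses the three RD categories into the binary outcome used for model training. It is a hedged illustration in Python/pandas; the column and category names are assumptions of the sketch, not the institutional data model.

```python
# Illustration only: collapse the three RD assessment categories into the
# binary malnutrition label used as the gold standard. Column and category
# names are assumed for this sketch.
import pandas as pd

rd_assessments = pd.DataFrame({
    "admission_id": [101, 102, 103],
    "rd_category": [
        "No malnutrition",
        "Malnutrition of moderate degree",
        "Severe protein-calorie malnutrition",
    ],
})

# Moderate and severe categories are collapsed into a single positive label.
positive_categories = {
    "Malnutrition of moderate degree",
    "Severe protein-calorie malnutrition",
}
rd_assessments["malnutrition_label"] = (
    rd_assessments["rd_category"].isin(positive_categories).astype(int)
)
print(rd_assessments[["admission_id", "malnutrition_label"]])
```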

Calculation of MUST score

MUST computes a score in the range of 0 to 6 based on the following factors: BMI, percentage weight loss over a defined time frame, presence of certain acute diseases, amount of nutritional intake, and likelihood of no nutritional intake for greater than 5 days (Citation19). Patients with a score of 2 or more are at increased risk of malnutrition and nutrition-related complications.
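
For illustration, the sketch below implements the MUST scoring components as commonly published (BMI band, recent unintentional weight loss, and an acute-disease/no-intake flag). It is a simplified sketch, not the institutional screening implementation used in this study.

```python
def must_score(bmi, pct_weight_loss, acute_and_no_intake_over_5d):
    """Sketch of the classic MUST score (0-6) from its three components.

    Thresholds follow the commonly published rules; this is an illustration,
    not the screening implementation described in the paper.
    """
    # BMI component
    if bmi > 20:
        bmi_score = 0
    elif bmi >= 18.5:
        bmi_score = 1
    else:
        bmi_score = 2

    # Unplanned weight-loss component (percentage over the defined time frame)
    if pct_weight_loss < 5:
        weight_score = 0
    elif pct_weight_loss <= 10:
        weight_score = 1
    else:
        weight_score = 2

    # Acute disease effect: acutely ill with no nutritional intake expected for > 5 days
    acute_score = 2 if acute_and_no_intake_over_5d else 0

    return bmi_score + weight_score + acute_score

# A total score of 2 or more flags increased malnutrition risk.
print(must_score(bmi=17.8, pct_weight_loss=6.0, acute_and_no_intake_over_5d=False))  # -> 3
```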

Training, test, and cross-validation

The cohort data set was split into training (70%) and test (30%) data sets by random sampling. Given the class imbalance (87% negative and 13% positive) resulting from the malnutrition rate of 13.3%, random under-sampling (Citation25) was applied to the training data set to remove instances of the majority class (negative cases) until both classes were equally balanced (50% negative and 50% positive). Figure 1 provides an overview of the process of deriving the training and test data sets. Tenfold cross-validation was used to train the model by using the RF algorithm from the open source Apache Spark project machine learning library (Citation26).
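
A minimal sketch of the split and random under-sampling steps is shown below. The study pipeline used the Apache Spark ML library, whereas this illustration uses pandas and scikit-learn, and the data frame and column names are assumptions.

```python
# Illustrative 70:30 split followed by random under-sampling of the majority
# (non-malnourished) class; the study pipeline used Spark ML.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# `cohort` stands in for one row per admission with a binary outcome label.
cohort = pd.DataFrame({
    "feature_a": rng.normal(size=1000),
    "malnutrition_label": rng.binomial(1, 0.13, size=1000),
})

train, test = train_test_split(cohort, test_size=0.30, random_state=42)

# Keep all positive cases and sample an equal number of negative cases.
positives = train[train["malnutrition_label"] == 1]
negatives = train[train["malnutrition_label"] == 0].sample(
    n=len(positives), random_state=42
)
balanced_train = pd.concat([positives, negatives]).sample(frac=1.0, random_state=42)
print(balanced_train["malnutrition_label"].value_counts())
```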

Figure 1. Steps of creating the study cohort and the outline of model development.

The prevalence of malnutrition in the test cohort was further stratified by major diagnostic category (MDC). MDCs categorize principal diagnoses into 25 mutually exclusive diagnosis areas, each generally corresponding to a single organ system or etiology and associated with a particular medical specialty (Citation27).

Sampling strategy and optimal prediction time

The time of prediction (tp) was defined as the time-point 48 hours prior to the discharge time. For the observational variables, such as vital sign measurements, laboratory measurements, and nursing assessments, time series were created by looking backward up to 5 days from the prediction time tp, with values sampled every 12 hours. The result was a time series V = {Vtp−120, Vtp−108, …, Vtp} for each variable (see Figure 2). Numerical variables with missing values were imputed with the median value of the variable over the entire cohort at the sampling time point. For the categorical variables, a missing category was added to the categorical encoding map. A feature vector was created with 33 variables (378 features).
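
The backward-looking sampling can be sketched as follows. The 12-hourly binning and median imputation mirror the description above, while the column names and the choice of the most recent value per bin are assumptions of this illustration.

```python
# Illustrative construction of the 12-hourly backward-looking time series for
# one observational variable of one admission. Column names and the use of the
# most recent value per 12-hour bin are assumptions of this sketch.
import numpy as np
import pandas as pd

def sample_backward(measurements: pd.DataFrame, tp: pd.Timestamp,
                    value_col: str, hours_back: int = 120, step_h: int = 12):
    """Return values at tp-120h, tp-108h, ..., tp for one variable."""
    measurements = measurements.sort_values("charttime")
    features = {}
    for offset in range(hours_back, -1, -step_h):
        window_end = tp - pd.Timedelta(hours=offset)
        window_start = window_end - pd.Timedelta(hours=step_h)
        in_window = measurements[(measurements["charttime"] > window_start) &
                                 (measurements["charttime"] <= window_end)]
        features[f"{value_col}_tp_minus_{offset}h"] = (
            in_window[value_col].iloc[-1] if len(in_window) else np.nan
        )
    return pd.Series(features)

# Missing numeric values would then be imputed per time point with the
# cohort-wide median, e.g. feature_matrix.fillna(feature_matrix.median()).
```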

Figure 2. Building time-series for the observational variables.

The distribution of key variables was summarized for the overall study cohort and stratified by malnutrition diagnosis status. Standardized mean difference (SMD) was used to estimate the difference in means in units of the pooled standard deviation; unlike a significance test, it is largely unaffected by sample size. An SMD > 0.1 was considered to indicate a meaningful difference in means (Citation28).
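
In its usual two-group form for a continuous variable, the SMD compares the group means in units of the pooled standard deviation:

```latex
\mathrm{SMD} = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\left(s_{1}^{2} + s_{2}^{2}\right)/2}}
```

where x̄1 and x̄2 are the means and s1 and s2 the standard deviations of the malnutrition-positive and malnutrition-negative groups, respectively.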

Feature selection

Recursive feature elimination was used as the feature selection approach. First, based on a review of related studies and clinician feedback, a list of 53 variables (comprising 1155 features) was used to build a basic RF model, denoted here as MUST-Plus. The under-sampled training set was used for model development, and 10-fold cross-validation was used for both feature selection and model training.

An F-score threshold of ≤ 10% was used for removing a feature permanently. Following feature elimination, the final list of 33 variables (comprising 378 features) was used for training the RF model (see Supplementary Table 1). The final model was trained with the following hyperparameters (an illustrative training sketch follows the list):

  • Number of trees to train = 600;
  • Maximum depth of the tree = 11;
  • Maximum number of bins for discretizing continuous features = 30; and
  • Number of features to consider for splits at each tree node = 33%.
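
The study trained the model with the RF implementation in the Apache Spark ML library; the sketch below reproduces a comparable configuration with scikit-learn for illustration only. Spark's maximum-bins setting controls how continuous features are discretized and has no direct scikit-learn counterpart, so it is omitted here, and the data are synthetic stand-ins.

```python
# Illustrative re-creation of the final model configuration with scikit-learn;
# the study used Spark ML, and Spark's maxBins parameter is omitted because
# scikit-learn's random forest does not discretize continuous features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 378))        # stand-in for the 378-feature matrix
y = rng.binomial(1, 0.5, size=500)     # balanced labels after under-sampling

rf = RandomForestClassifier(
    n_estimators=600,    # number of trees to train
    max_depth=11,        # maximum depth of each tree
    max_features=0.33,   # fraction of features considered at each split
    random_state=0,
    n_jobs=-1,
)

# 10-fold cross-validation on the (under-sampled) training set.
auc_scores = cross_val_score(rf, X, y, cv=10, scoring="roc_auc")
print(auc_scores.mean())
```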

Feature importance, also referred to as Gini importance, is a measure of the homogeneity of the labels at the node (Citation26). For each variable, the highest value among its corresponding feature importances was taken as an estimate of variable importance.
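
Because each variable contributes several time-sampled features, the variable-level importance can be obtained by taking the maximum Gini importance over a variable's features, as sketched below; the feature-naming convention is an assumption carried over from the earlier sketch.

```python
# Sketch: roll per-feature Gini importances up to variable level by taking the
# maximum importance over each variable's time-sampled features. The
# "<variable>_tp_minus_<offset>h" naming convention is an assumption.
import pandas as pd

def variable_importance(fitted_rf, feature_names):
    imp = pd.Series(fitted_rf.feature_importances_, index=feature_names)
    # Strip the time-offset suffix to recover the parent variable name.
    variables = imp.index.str.replace(r"_tp_minus_\d+h$", "", regex=True)
    return imp.groupby(variables).max().sort_values(ascending=False)

# Example use: variable_importance(rf, feature_matrix.columns) after fitting `rf`.
```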

Model testing

After identifying the 70% data set as the training set, the remaining 30% of the hospital inpatient data were used exclusively as an independent test set for the RF model. RF model-derived class probabilities (Citation26) of ≥ 0.5 (the default threshold) were classified as positive predictions of malnutrition; lower probabilities were classified as negative. Sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUC ROC), and area under the precision-recall curve (AUC PR), along with their 95% confidence intervals (CIs), were computed to compare the performance of the two screening tools (Citation29). Performance metrics were computed in the R environment (Citation30) by using custom scripts and the R packages PRROC (v. 1.3.1) (Citation31), pROC (v. 1.15) (Citation32), and epiR (v. 1.0.4) (Citation33).
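
The study computed these metrics in R with the packages listed above; the sketch below shows an analogous calculation with scikit-learn. Average precision is used as a close stand-in for AUC PR, and the confidence intervals reported in the study are omitted for brevity.

```python
# Illustrative computation of the reported test-set metrics with scikit-learn;
# the study used custom R scripts with the PRROC, pROC, and epiR packages.
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             confusion_matrix, f1_score)

def evaluate(y_true, y_prob, threshold=0.5):
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),
        "f1": f1_score(y_true, y_pred),
        "auc_roc": roc_auc_score(y_true, y_prob),
        "auc_pr": average_precision_score(y_true, y_prob),
    }

# Toy example with an imbalanced outcome and weakly informative scores.
rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.13, size=2000)
y_prob = np.clip(y_true * 0.4 + rng.uniform(size=2000) * 0.6, 0, 1)
print(evaluate(y_true, y_prob))
```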

Results

Between January 2017 and July 2018, 8479 unique admissions had a formal malnutrition evaluation and diagnosis by an RD. The patient characteristics are shown in Table 1.

Table 1. Characteristics of admissions included in the study overall and by the malnutrition diagnosis status and their respective standardized mean difference.



The data set was split into 3241 admissions for the training cohort and 5238 admissions for the test cohort.

Prevalence of malnutrition

In the test data set, 13.3% of inpatient admissions were associated with malnutrition. Figure 3 shows the prevalence of malnutrition across the different MDCs in the data set. The prevalence of malnutrition varied considerably among the MDCs: 52.3% of patients with hepatobiliary system and pancreatic diseases and 49.5% of patients with myeloproliferative diseases or poorly differentiated neoplasms had malnutrition. Conversely, only 2.5% of admissions for mental diseases and disorders had malnutrition. The high variability in the prevalence of malnutrition across specialties underscores the importance of using a broad range of predictors to capture the multifactorial complexity underlying the causation of malnutrition.

Figure 3. Percent cases with malnutrition in each major diagnostic category (MDC)

Predictive performance of the model

From the test data, 174 admissions were excluded due to non-availability of MUST scores. The comparative predictive performance of the RF-based model (MUST-Plus) and classic MUST (as a reference model) is presented in Table 2.


Table 2. Comparison of predictive performance of MUST-Plus and the classic MUST score (test cohort: 5064 admissions; 13.3% of patients with malnutrition).

Metric                                  MUST-Plus (RF classifier)   MUST score
Sensitivity, % (95% CI)                 73.5 (70.0–76.8)            41.2 (37.4–45.0)
Specificity, % (95% CI)                 76.9 (75.6–78.1)            83.0 (81.9–84.1)
Accuracy, % (95% CI)                    76.4 (75.2–77.6)            77.4 (76.3–78.6)
Positive predictive value, % (95% CI)   32.8 (30.5–35.3)            27.2 (24.5–30.0)
F1 score                                0.45                        0.32
AUC ROC, % (95% CI)                     83.5 (82.0–85.0)            66.2 (64.1–68.4)
AUC PR, % (95% CI)                      44.5 (40.8–48.2)            29.4 (25.9–32.8)

Using the commonly applied MUST score threshold of 2, the sensitivity of prediction was low, and it decreased further as the threshold was increased. Compared to the reference model, the ML-based model demonstrated a significantly higher AUC ROC of 83.5% (95% CI: 82%–85%; p < 0.0001), with sensitivity of 73.07% (95% CI: 69.61%–76.33%) and specificity of 76.89% (95% CI: 75.64%–78.11%). These results demonstrate superior performance of the ML-based model, with 30% higher sensitivity, 6% higher specificity, 17% higher AUC ROC, 40% higher F1 score, and 15% higher AUC PR compared to the MUST score (Table 2 and Figure 4). The AUC ROC summarizes model performance in terms of its ability to rank positive instances higher than negative ones. However, in a scenario of class imbalance with a high proportion of true negatives (as is the case with prediction of malnutrition), the F1 score and AUC PR (precision-recall) are more suitable metrics, as they reflect the model's ability to identify positive instances without regard to the negative ones (Citation34).
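
As a brief worked illustration using the Table 2 operating point, a sensitivity of 73.5% and specificity of 76.9% at a prevalence of 13.3% imply a positive predictive value of roughly 33%, which is exactly the kind of information the ROC curve alone does not convey:

```python
# Worked example using the Table 2 operating point: good sensitivity and
# specificity at 13.3% prevalence still yield a modest positive predictive value.
prevalence = 0.133
sensitivity = 0.735
specificity = 0.769

ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)
print(round(ppv, 3))  # ~0.33: roughly one in three flagged admissions is truly malnourished
```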

Figure 4. ROC and PR curves for malnutrition classification by the MUST score and the MUST-Plus model (performances of MUST and MUST-Plus are shown with red and blue lines, respectively).

Predictors and their importance

The top 20 variables ranked as having the highest Gini importance are summarized in Figure 5. BMI was identified as the most important variable by the RF model, and the anthropometric variables as a group were of core importance. Laboratory variables, such as hemoglobin, albumin, alanine aminotransferase, and partial thromboplastin time, may also be important in representing atypical cases of malnutrition, consistent with its multifactorial etiology.

Figure 5. Model variables ranked by Gini importance.

Discussion

The major purpose of using a malnutrition screening tool is enabling prioritization of patients based on the risk of malnutrition and to identify patients who require further evaluation and appropriate intervention. In this retrospective cohort study, an ML-based screening tool (MUST-Plus) using anthropometric, serum biochemistry, and hospital operational variables derived from the EHR significantly outperformed the classic MUST screening tool in multiple performance metrics without sacrificing specificity.

The major difference between the MUST-Plus approach and the classic MUST tool is that MUST-Plus is a probabilistic model, which generates a quantified estimate of malnutrition risk based on comprehensive data from the preceding 5 days of each patient’s hospital stay, whereas MUST is a rule-based tool built on a limited range of criteria: BMI, weight loss over a defined time frame, presence of certain acute diseases, amount of nutritional intake, and likelihood of no nutritional intake for greater than 5 days. In addition, the subjectivity of the observations underlying MUST can make it vulnerable to low sensitivity. The heterogeneous prevalence of malnutrition across diagnostic categories in hospital inpatients indicates that the use of a broad range of predictors, potentially reflective of the multifactorial etiologies and associations of malnutrition, may be one reason that MUST-Plus performs better than the classic MUST.

Malnutrition is a significant and pervasive problem among hospital patients. The 1974 and 1976 national surveys in surgical and general medical units of U.S. hospitals observed protein-calorie malnutrition as a common finding and prompted calls for new clinical practice approaches for hospitalized patients (Citation35). A complicating phenomenon is that a significant number of patients without malnutrition upon admission experience malnutrition during extended inpatient stays, with the attendant risks for nutrition-related adverse outcomes (Citation36).

Corkins et al. (Citation37) showed that the rate of malnutrition in all U.S. hospital discharges in 2010 was 3.2%, and surprisingly only 13.4% of them received either enteral or parenteral nutrition support during the hospital stay. This finding raises two concerns: that malnutrition screening tools are suboptimal and that hospital protocols for intervention need to be in place for expedient and effective implementation.

Based on the current workflow, patients are screened for nutritional status by the nursing staff using the MUST score within 24 hours of admission and then referred to RDs for assessment and treatment when necessary. Given the relatively low sensitivity of the MUST score observed in our study, the time efficiency of assessment by RDs is adversely affected, resulting in potential delays in assessments and interventions. With MUST-Plus, however, daily assessments of all hospitalized patients with high sensitivity are feasible if the model is deployed as an automated EHR-based screening tool. Integration of the tool into the EHR can potentially reduce the lag time between patient admission, referral for assessment, and management of nutritional status; shorten patient stays; and help the clinical care team ensure delivery of high-quality patient care. These hypotheses will need to be tested in prospective clinical trials as a part of a validation process. Improved detection rates and optimal nutrition care should be seen as an incentive for hospitals in terms of increased cost-efficiency of RD time, reduced rates of HACs, shorter patient stays, and increased rates of reimbursement as a result of quality care delivered to patients.

The model variables of interest and the ability of the tool to address the nutritional paradox

Anthropometrics (weight, height, and BMI) provided the most information gain among the variables studied, and the highest standardized mean difference was associated with BMI. Anthropometrics, however, are not universally valid predictors of malnutrition (Citation38). MUST-Plus also identified hemoglobin, serum albumin, serum creatinine, blood urea nitrogen, and serum alanine aminotransferase (ALT) as important predictors of malnutrition. Interestingly, low albumin is not only a consequence of malnutrition but can also occur as a result of acute and chronic inflammation, chronic and advanced hepatic cirrhosis, albuminuria due to nephrotic syndrome and chronic renal disease, and protein-losing enteropathy due to inflammatory bowel disease (IBD) and celiac disease (Citation39). Serum creatinine and blood urea nitrogen levels are also known to be low in patients with malnutrition of moderate to severe degree, despite a substantial reduction in glomerular filtration rate (Citation40). Elevations in ALT levels are observed in myopathy, alcoholic liver disease (Citation41), and chronic anorexia nervosa with refeeding.

Another interesting finding pertains to the “nutritional paradox” among diagnosed cases of malnutrition, in which many patients have increased or otherwise abnormal adiposity (Citation42) (17.9% overweight and 7.2% obese in our study). This phenomenon will require further study incorporating new diagnostic and treatment protocols (i.e., using the MUST-Plus tool) to better understand the mechanisms underlying adverse outcomes in patients with, for example, obesity, malnutrition, and critical illness (Citation43).

Length of stay also emerged as a predictor in the MUST-Plus model. Previous studies have shown that deterioration in nutritional status occurred after 7 days of hospitalization in more than a third of patients who had normal nutritional status at admission (Citation36). Postsurgical malabsorption, nutritional neglect, and anorexia (due to medications, inflammation, neurological impairment, and other conditions) are common factors associated with longer stays and worsening nutritional status (Citation2). Whereas malnutrition is a well-known predictor of increased length of stay, it is important to understand the bidirectional, complex, and scarcely understood relationship between malnutrition and length of stay (i.e., prolonged hospitalization contributing to malnutrition, malnutrition-related complications prolonging length of stay, or both).

Future of the nutritional screening process

Effective nutritional screening will need to leverage technology and further automation. The MUST-Plus model is feasible as a screening and management tool and provides prediction risk scores at prescribed time intervals. These scores should be conveyed to the RDs in real time (e.g., via the EHR), alerting them to the patient’s nutritional risk and need for intervention. This is in contrast to traditional screening tools (e.g., MUST, MST, NRS-2002), which provide a static or one-time malnutrition risk assessment within the first 24 hours of admission. Unlike the ML model, traditional screening tools do not provide continuing reassessment of malnutrition risk based on the evolution of the patient’s overall medical course. Using the ML model, RDs can assign priority based on the initial risk score and can then continue to reprioritize based on updated risk scores throughout the patient’s hospitalization. The ML model is therefore capable not only of alerting the RD to a high-risk patient to prioritize on the first day of an admission but also of rapidly alerting the RD to a patient who requires malnutrition reassessment during a protracted hospitalization. Traditional one-time screening tools do not have this capacity. Patients with nutritional impairment can thus have a better opportunity for nutritional care during their hospital stay and after discharge.

Use of MUST-Plus not only can improve nutritional screening performance but can also create a unified screening framework that is easily scalable within or across health systems. MUST-Plus also builds capacity for continuous improvement in malnutrition screening that is compatible with current quality improvement frameworks, such as Lean and Six Sigma.

Limitations

The availability of a wide range of data may not be uniform in different hospital settings. Additionally, the frequency of laboratory assessments, formats, and units may vary. These are limitations to implementing MUST-Plus on a larger scale, with a specific concern that malnutrition assessment guidelines followed by RDs may have inter-institutional differences. Finally, creating a data streaming pipeline, a machine learning engine, and building interfaces with the EHR platform require investments in data science and information technology resources for viable implementation of the tool.

Policy implications

Improving the reliability and accuracy in triaging high-risk patients to RDs can help to shift the focus from manpower planning for detection of malnutrition to manpower dedicated for delivery of nutritional care. Additionally, policies can be developed to improve the focus on providing comprehensive and coordinated patient care for malnutrition and other comorbidities in the hospital and after patient discharge.

Conclusion

MUST-Plus, an ML-based screening tool, can provide superior predictive performance compared to screening by the classic MUST score. Its adoption into practice can facilitate triaging patients for assessment by RDs. As a result of its increased reliability and accuracy in malnutrition screening, MUST-Plus can potentially augment RD performance by virtue of increased time efficiency and higher rates of detection of malnutrition.

Model availability statement

The model will be shared in the future via the MLflow tool (https://www.mlflow.org) (Citation44).

Supplemental material

Supplementary Table 1 is available with the online version of this article.

Acknowledgments

We thank the reviewers and editor for their valuable comments.

Disclosure statement

All authors declare that they have no competing interests.

Data availability statement

Raw data were generated at Mount Sinai Health System. Derived data supporting the findings of this study are available from the corresponding author (MM) on request.

Authors’ contribution

Conceived the study and supervision: AK, PT, CC.

Data curation: FC, PT.

Formal analysis: AK, PT, HJ.

Formal validation of data and results: HJ, SW, IK.

Wrote the original draft: HJ, AK, MM.

Edit and comment on the manuscript: ML, RF, CC, DLR, MM, JM.

Additional information

Funding

This work was supported by the Cancer Center Support Grant (CCSG) under grant number P30CA196521-01 (Mazumdar).