A Methodology for Estimating Hospital Intensive Care Unit Length of Stay Using Novel Machine Learning Tools
https://ieeexplore.ieee.org/document/9356186
Author: Roberto Williams Batista
Extract from: "A methodology for clustering health survey participants and predicting hospital length of stay using novel data mining and machine learning tools"
ICU LOS is one of the crucial metrics of hospital performance evaluation (World Health Organization, n.d.). It is the statistical measure of the mean (or median) time a patient spends in the ICU, calculated by subtracting the admission date from the discharge date. The metric can also be applied to other specific areas of the hospital, such as the emergency department, or to the hospital as a whole. Its usage is not new: it appears in records of the American Medical Association from 1907 for diseases such as tuberculosis (American Medical Association, 1907). ICU LOS prediction is used to minimize the idleness of one of the most expensive hospital resources. In 2005, ICU beds represented 15% of total hospital beds in the United States (US), with an occupancy of 68%, and ICU costs reached 0.66% of the country's Gross Domestic Product (GDP). A recent study (Wilcox, Vaughan, Chong, Neumann, & Bell, 2019) reveals ICU cost-effectiveness ratios ranging from $119,635 to $876,539. These figures highlight the importance of accurate LOS prediction, whose potential benefits include ICU bed availability, ICU usage planning for elective surgical procedures, unit transfer projection, tracking of supplies consumption and trends, and workflow feedback with valuable information.
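As a minimal sketch of the calculation described above, the following Python snippet derives the LOS of each stay by subtracting the admission timestamp from the discharge timestamp and then summarizes the stays with the mean and the median. The record layout, the timestamps, and the function names (`los_days`, `summarize_los`) are hypothetical and chosen only for illustration; they do not come from any dataset used in the study.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical ICU stays as (admission, discharge) timestamps; not real data.
STAYS = [
    ("2020-01-01 08:00", "2020-01-03 08:00"),
    ("2020-01-02 12:00", "2020-01-03 00:00"),
    ("2020-01-05 06:00", "2020-01-10 18:00"),
]

FMT = "%Y-%m-%d %H:%M"


def los_days(admission: str, discharge: str) -> float:
    """Return the length of stay in days: discharge minus admission."""
    delta = datetime.strptime(discharge, FMT) - datetime.strptime(admission, FMT)
    if delta.total_seconds() < 0:
        raise ValueError("discharge precedes admission")
    return delta.total_seconds() / 86400.0  # seconds per day


def summarize_los(stays):
    """Return the mean and median LOS (in days) over a list of stays."""
    durations = [los_days(adm, dis) for adm, dis in stays]
    return {"mean": mean(durations), "median": median(durations)}


if __name__ == "__main__":
    print(summarize_los(STAYS))
```

The median is often reported alongside the mean because a few very long stays can pull the mean upward.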
LOS prediction has been studied for a long time, and the approaches have changed significantly with advances in medicine and computing technology. The work of (Gustafson, 1968) compares five methodologies for predicting the LOS of inguinal herniotomy patients. The data, stratified into four LOS categories, came from 39 patients admitted for surgery at Henry Ford Hospital, Detroit, during January and February of 1966. The first prediction method is subjective point estimation by three groups of physicians, who predicted the LOS after evaluating the patient abstract and choosing one of twelve hypothesized LOS values. The second method is multiple linear regression analysis on the patient's medical profile, with each item rated on a discrete scale from 0 to 5. Linear, binary, logarithmic, and interaction-term models were used. The third method uses the mean LOS of all the herniotomy patients discharged from the hospital in 1965.
The fourth method is direct posterior odds estimation, which estimates a subjective probability distribution expressed as odds rather than probabilities. The fifth method applies Bayes' Theorem, which relates the impact of the data complexity to the hypothesized LOS. This work shows the complexity of LOS prediction and the influence of the patient's diseases, the procedures, the level of medical detail, and the human factor on the prediction models. The small patient population used in the study also makes the results difficult to generalize. The authors in (Afrin et al., 2019) used unsupervised and supervised machine learning on the MIMIC-III dataset to predict patient LOS in three classes: Short Stay (< 3 days), Medium Stay (> 3 and < 5 days), and Longer Stay (> 5 days). The study tested different methods, such as K-Nearest Neighbor, Support Vector Machine, Random Forest, and Gradient Boosting, focusing on the age and death outcome of the patients. The prediction accuracy was nearly 54.8% using Random Forest and Logistic Regression. The study (Van Houdenhoven et al., 2007) investigates LOS prediction in a dataset of 518 consecutive patients who underwent elective esophagectomy with reconstruction for carcinoma at the Erasmus University Medical Centre, Rotterdam, The Netherlands. A multivariable linear model was fitted to the natural logarithm of the LOS. The model was constructed in three variations: the first preoperative, the second postoperative, and the third intra-ICU, so that the approach analyzed the patient's LOS at three different moments of the hospital stay. The models reached an R² of 45%, focusing on disease-related items such as the presence of gastroesophageal reflux disease, and respiratory minute volume transthoracic.
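The log-transformed linear model mentioned above can be illustrated with a small sketch. It fits ordinary least squares to ln(LOS) against a single predictor and reports R² on the log scale. The single predictor, the synthetic values, and the function names (`fit_log_linear`, `r_squared`, `predict_los`) are assumptions made for illustration; the cited study used a multivariable model on clinical variables that are not reproduced here.

```python
import math


def fit_log_linear(x, los):
    """Fit ln(LOS) = intercept + slope * x by ordinary least squares."""
    y = [math.log(v) for v in los]  # LOS values must be strictly positive
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, los, intercept, slope):
    """Share of the variance of ln(LOS) explained by the fitted line."""
    y = [math.log(v) for v in los]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def predict_los(x_new, intercept, slope):
    """Back-transform a log-scale prediction to days."""
    return math.exp(intercept + slope * x_new)


if __name__ == "__main__":
    # Hypothetical predictor (e.g. a severity rating) and LOS in days.
    severity = [0, 1, 2, 3, 4, 5]
    los = [2.0, 2.5, 4.1, 5.0, 8.3, 9.0]
    b0, b1 = fit_log_linear(severity, los)
    print("R^2 on the log scale:", r_squared(severity, los, b0, b1))
    print("Predicted LOS at severity 3:", predict_los(3, b0, b1))
```

Modeling the logarithm is a common choice for right-skewed durations such as LOS. Note that the R² here is measured on the log scale, and the back-transformed prediction is not an estimate of the mean LOS in days.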
The authors of (Azari et al., 2012) approached LOS prediction by identifying similar groups through K-Means clustering of disease conditions. Once the groups were identified, different classifiers were applied, such as SVM, JRIP, J48, and Bayesnet, which had the best accuracy, Kappa Statistic, precision, recall, and Area Under the Curve (AUC). In (Clark & Ryan, 2002), the demographic group younger than 55 years old reached the highest accuracy of 69%, individuals between 55 and 70 years old reached 13%, and the group older than 70 years old reached 17%. The results indicate a relationship between age and LOS prediction accuracy. The study performed by (Kulinskaya et al., 2005) uses the dataset from the UK NHS for 1997/98 and 1998/99 to investigate the effects of five key variables: admission method, discharge destination, hospital type, specialty, and NHS region. It also explores the most robust statistical methodology and the consistency of the attributes' significance over the two-year period. The R² found was 11.7%, indicating that the models explain little of the variance.
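Two of the classifier evaluation metrics named above, accuracy and the Kappa statistic, can be computed directly from actual and predicted class labels. The sketch below assumes Cohen's kappa as the Kappa statistic, since the variant is not specified here, and uses hypothetical LOS class labels; the function names are illustrative only.

```python
import math
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of cases where the predicted label equals the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(actual)
    p_observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal label frequencies.
    p_expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    if p_expected == 1.0:
        return math.nan  # undefined when both labelings use a single label
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical labels for six patients; not results from any cited study.
    actual = ["short", "short", "medium", "long", "long", "medium"]
    predicted = ["short", "medium", "medium", "long", "short", "medium"]
    print("accuracy:", accuracy(actual, predicted))
    print("kappa:", cohen_kappa(actual, predicted))
```

Kappa is useful next to accuracy because a classifier can reach high accuracy on imbalanced LOS classes simply by predicting the most frequent class, in which case kappa stays close to zero.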
The work in (Woods et al., 2000) evaluates the predicted and actual LOS in twenty-two Scottish ICUs using the Acute Physiology and Chronic Health Evaluation III (APACHE III) system. The APACHE III system is a disease severity score ranging from 0 to 299, assigned within the first 24 hours of the patient's ICU admission. It was developed from the association between the short-term risk of death and acute changes in the patient's physiologic balance (Knaus et al., 1991). Woods et al. (2000) concluded that the predicted and actual LOS are weakly correlated, and that variations in severity of illness across ICUs cannot explain the differences between the predicted and observed ICU LOS. In the study (Toptas et al., 2018), the factors affecting LOS in the ICU are analyzed from the clinical experience of the authors. The study identifies the laboratory exams that have a positive or negative correlation with LOS. Urea, creatinine, and sodium are positively correlated, while uric acid and hematocrit levels are negatively correlated.
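The strength and sign of the correlations discussed above, whether between predicted and actual LOS or between a laboratory value and LOS, can be quantified with a correlation coefficient. The sketch below assumes the Pearson coefficient, which is not necessarily the measure used in the cited studies, and all values shown are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical predicted and observed ICU LOS in days; not real data.
    predicted_los = [3.1, 4.0, 2.5, 6.2, 5.0]
    observed_los = [2.0, 7.5, 3.0, 4.0, 9.0]
    print("predicted vs observed:", pearson_r(predicted_los, observed_los))
```

A value near +1 or -1 indicates a strong positive or negative linear relationship, and a value near 0 indicates a weak one. The coefficient only captures linear association, so a rank-based measure may be preferable for skewed LOS data.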