Analysis on Benefits and Costs of Machine Learning-Based Early Hospitalization Prediction Academic Project

Posted by Admin: System Admin

Beginner

Abstract

Overcrowding in emergency departments (EDs) has long been a problem worldwide, with serious consequences for patient satisfaction and safety. Typically, overcrowding is caused by long boarding times, during which ED patients wait for inpatient beds. If a patient's hospitalization is predicted early enough in the ED, an inpatient bed can be prepared in advance and the boarding time can be reduced. We design machine learning-based hospitalization predictive models using data on 27,747 patients and compare the experimental results. Five predictive models are designed: 1) logistic regression (LR), 2) XGBoost, 3) NGBoost, 4) support vector machine (SVM), and 5) decision tree (DT) models. Based on the predictive results, we estimate the quantitative effects of hospitalization predictions on EDs and wards. Using data from the ED of a general hospital in South Korea, our experiments show that the ED length of stay (LOS) of a patient can be reduced by 12.3 minutes on average and that the ED can reduce its total length of stay by 340,147 minutes over one year.
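To make the idea of "quantitative effects on EDs and wards" concrete, the following is a minimal sketch of how ED time saved and ward bed-holding time could be tallied from prediction results. It is not the study's actual procedure: the `Visit` record, its field names, and the timing rules (bed preparation starts at prediction time, and a prepared bed sits empty until the disposition decision) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One ED visit (hypothetical record layout, for illustration only)."""
    predicted_admit: bool    # model predicted hospitalization
    actually_admitted: bool  # patient was in fact hospitalized
    lead_minutes: float      # disposition time minus prediction time
    prep_minutes: float      # time needed to prepare an inpatient bed


def estimate_effects(visits):
    """Return (ED minutes saved, ward minutes a bed was held empty).

    Assumed rules:
    - Without a prediction, bed preparation starts at disposition, so an
      admitted patient boards for prep_minutes.
    - With a positive prediction, preparation starts lead_minutes earlier,
      which saves min(prep_minutes, lead_minutes) of boarding time, but
      only if the patient is really admitted.
    - Any positive prediction may leave the bed empty once it is ready,
      for max(0, lead_minutes - prep_minutes), until disposition.
    """
    ed_saved = 0.0
    ward_held = 0.0
    for v in visits:
        if not v.predicted_admit:
            continue  # no early bed request, so no saving and no cost
        ward_held += max(0.0, v.lead_minutes - v.prep_minutes)
        if v.actually_admitted:
            ed_saved += min(v.prep_minutes, v.lead_minutes)
    return ed_saved, ward_held


# Toy data, invented for this example.
example_visits = [
    Visit(True, True, 60.0, 90.0),    # true positive, bed not ready in time
    Visit(True, True, 120.0, 90.0),   # true positive, bed ready 30 min early
    Visit(True, False, 100.0, 90.0),  # false positive, bed held for nothing
    Visit(False, True, 60.0, 90.0),   # missed admission, no effect
]
print(estimate_effects(example_visits))  # (150.0, 40.0)
```

Dividing the first value by the number of visits gives an average saving per patient, which is the kind of figure the abstract reports. The second value is the cost side that has to be weighed against it.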

Existing System & Flaws

Kim et al. [12] predicted the hospitalization of patients visiting the ED and showed which patient characteristics influenced the likelihood of hospitalization. Their predictive model was analyzed mainly in terms of accuracy and the area under the ROC curve (AUC). They found that the older the patient and the more urgent their condition, the more likely they were to be hospitalized. Lucke et al. [13] divided 21,287 ED patients into two groups (>70 years and <70 years old) and predicted hospitalization using logistic regression (LR). They evaluated the predictions using indices such as AUC and positive predictive value. Their study demonstrated that predictive models could help identify patients who were more likely to be hospitalized using readily available information, such as vital signs. Graham et al. [11] used three algorithms, namely LR, the gradient boosting model (GBM), and decision tree (DT), to predict hospitalization for ED patients, analyzing 107,545 patient records. They suggested that, when choosing a predictive model, simplicity and ease of interpretation should take precedence over raw performance.

Some studies have considered neural networks in addition to regression and ensemble-based classifiers for hospitalization prediction. Araz et al. [14] performed hospitalization predictions based on LR, DT, support vector machine (SVM), extreme gradient boosting (XGBoost), random forest (RF), and artificial neural network (ANN) models using data from 118,005 patients. Among these models, XGBoost showed the highest AUC. Hong et al. [8] analyzed LR, XGBoost, and deep neural networks (DNNs). Based on 560,486 patient visits, they analyzed three groups of data: patient severity classification data, clinical data from previous visits, and all the available data from previous and current visits. In this analysis, XGBoost and DNN displayed good AUC values when predicting ED patient hospitalizations.

Golmohammadi [15] presented hospitalization predictions using LR, ANN, and a statistical method that used patterns of similarity in patient characteristics to predict hospitalization. He showed that the overall accuracy of the three models was greater than 80%. Dinh et al. [17] limited the analysis to adult patients aged 16 years or older and included 860,832 patient records. LR was used to predict hospitalizations in order to improve patient flow and aid clinical decision-making in the ED, and the model was evaluated based on AUC. Their study showed that accurate hospitalization predictions for ED patients could be made using initially available patient information, such as age, mode of arrival, and time of arrival. Fenn et al. [18] constructed a predictive model using LightGBM that divided the likelihood of hospitalization into four levels: low, medium, high, and very high. A total of 468,167 patient data points were used. Dividing patients into several categories according to their likelihood of hospitalization allowed medical personnel to respond flexibly to each patient's follow-up process. This model was also evaluated based on AUC.

Goto et al. [19] studied hospitalization predictions for children who visited the ED. Using data from 52,037 children, they applied lasso regression, RF, XGBoost, and DNN to predict two clinical outcomes: critical care and hospitalization. These models were evaluated on sensitivity and specificity; DNN was the best predictor of hospitalization in children. Horng et al. [20] used data from 230,936 ED patients to predict their diseases. In addition to routinely collected data, such as patients' vital signs, they used free-text information on important symptoms to identify patients with sepsis. They employed an SVM to predict sepsis and assessed it based on AUC. The SVM achieved high performance using data routinely available during triage (e.g., reasons for visits and vital signs). Ram et al. [21] conducted a study to predict the number of daily ED visits by asthmatics using Twitter and Google data collected from various regions. They showed that a predictive model using data from a short period of three months could predict ED visits by asthmatics in near real time. The DT and ANN models they used predicted the daily number of ED visits by asthmatics in low-, medium-, and high-volume categories. The models were evaluated in terms of AUC and precision; the predictive accuracy for the medium-volume category obtained using the hybrid ANN and DT models was the highest.

Barak-Corren et al. [7] studied the prediction of hospitalization for pediatric patients who visited the ED. A total of 59,033 patient data points were used, and the predictions were tested using the data that were available within 10, 30, and 60 min after the patients arrived at the ED. The predictions were made using a hybrid model that combined Naive Bayes and LR. They also estimated the potential effects of hospitalization predictions for ED patients: from the perspective of ED patient flow, they derived the time saved in the ED and the time cost in the inpatient ward (i.e., the total time during which empty beds were held for ED patients in the inpatient ward).

Disadvantages

- The existing system does not use an effective machine learning algorithm.
- The existing system does not implement model fitting and evaluation on the datasets.
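The studies surveyed above are compared mainly on AUC, sensitivity, specificity, and positive predictive value. As a minimal sketch of what those metrics measure, the block below computes them in plain Python from true labels (1 for hospitalization, 0 for discharge) and predicted scores. The function names and the toy data are illustrative assumptions, not taken from any of the cited studies.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, and positive predictive value (PPV)
    when every score >= threshold is treated as a predicted admission."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")  # returned when a ratio has an empty denominator
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Toy labels and scores, invented for this example.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(y_true, y_score))
print(threshold_metrics(y_true, y_score))
```

AUC does not depend on a threshold, which is one reason it is the most commonly reported figure in the studies above. Sensitivity, specificity, and PPV all change when the cut-off moves.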

Proposed System & Advantages

Predicting the hospitalization of ED patients is one of the measures taken to reduce the boarding time and facilitate inpatient bed management, staff planning, and specialized workflows within the ED [7]. This study hypothesizes that hospitalization predictions can initiate the preparation of inpatient beds in advance and ultimately help reduce the length of stay (LOS) of ED patients. Therefore, we aim to identify a model that accurately predicts, at an early stage of the ED stay, which patients will be admitted to inpatient beds. We also estimate the quantitative effects of hospitalization predictions on EDs and wards and the extent to which they contribute to reducing the LOS in the ED.

We performed a predictive analysis for a single general hospital. ED patient flows are similar across general hospitals. ED patients typically go through the following steps: ED entrance, triage and initial exam, treatment, disposition, and hospitalization or discharge from the ED [8], [9]. In addition, most EDs obtain similar clinical information from their patients' initial exams [10], [11]. This study uses data recorded from the ED patient flow and initial exams, which are implemented similarly at general hospitals. For these reasons, the machine learning methods and the quantitative effect analysis can be applied to other general hospitals with little complication.

In this system, we use machine learning algorithms to classify ED patient hospitalization as a binary outcome: 1 for hospitalization and 0 for discharge from the ED. Since LR, SVM, and DT have provided good hospitalization predictions [11], [14], we use them in our study as well. Additionally, since XGBoost is known to be superior to other algorithms in terms of generalization performance and accuracy in several fields [29], we also predict hospitalization using XGBoost. Finally, we include NGBoost, a recent algorithm that has not been used for ED hospitalization predictions in other experiments.

Advantages

- LR is an efficient and straightforward method for binary or multiclass classification problems. SVM is a linear supervised learning method for classification that finds the optimal hyperplane separating two classes. It maximizes the margin, that is, the distance between the hyperplane and the closest points of each class, to achieve high classification performance.
- DT is a nonparametric supervised learning method that is used for classification and regression. It applies a simple set of rules to partition the data and repeats the partitioning process to produce predictions. DT can classify data without complicated calculations and can be used for both categorical and continuous variables. It is generally suitable for predicting categorical outcomes.
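The binary setup described above (1 for hospitalization, 0 for discharge) can be illustrated with a small logistic regression trained by batch gradient descent. This is a sketch in plain Python under stated assumptions, not the project's implementation: the two features (age scaled to 0..1 and an urgent-triage flag), the toy data, and the hyperparameters are all invented for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Estimated probability that a patient with features x is hospitalized."""
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)


def fit_logistic_regression(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(X)
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        # Step against the average gradient over all patients.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical features: [age / 100, urgent triage flag].
X_train = [[0.2, 0.0], [0.3, 0.0], [0.4, 0.0],
           [0.7, 1.0], [0.8, 1.0], [0.6, 1.0]]
y_train = [0, 0, 0, 1, 1, 1]  # 1 = hospitalized, 0 = discharged from the ED

weights, bias = fit_logistic_regression(X_train, y_train)
for x in X_train:
    print(x, round(predict_proba(weights, bias, x), 3))
```

In practice, a library implementation would replace the hand-written training loop. The loop is written out here only to show that LR amounts to a weighted sum of the features passed through a sigmoid.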

Software Requirements

- Operating system: Windows 7 Ultimate
- Coding language: Python
- Front end: Python
- Back end: Django ORM
- Designing: HTML, CSS, JavaScript
- Database: MySQL (WAMP Server)

Hardware Requirements

H/W system configuration:

- Processor: Pentium IV
- RAM: 4 GB (min)
- Hard disk: 20 GB
- Keyboard: Standard Windows keyboard
- Mouse: Two- or three-button mouse
- Monitor: SVGA
