Biomedical Data Mining for Information Retrieval. Group of authors.
Discriminant Analysis (DA), Decision Tree (DT), K-Nearest Neighbors (KNN), Naive Bayesian and Support Vector Machine (SVM) models are also applied to predict in-hospital mortality; each obtains its results according to its own principles, as briefly described below.
Discriminant analysis [34] is a statistical tool used to classify individuals into a number of groups. Discriminant Function Analysis (DFA) is used to separate two groups, and Canonical Variates Analysis (CVA) is used to separate more than two groups. A discriminant analysis has two potential goals: finding a predictive equation for classifying new individuals, or interpreting the predictive equation to better understand the relationships that may exist among the variables.
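The two-group case can be illustrated with a minimal sketch of Fisher's linear discriminant on two features. The data points below are made up for illustration, and the chapter does not state which DA variant or software was used, so this is only one possible reading of the method, not the authors' implementation.

```python
def mean_vector(rows):
    """Column-wise mean of a list of 2-feature observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mu):
    """2x2 within-group scatter matrix around the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_two_group_discriminant(group0, group1):
    """Return the discriminant direction w and a midpoint threshold."""
    mu0, mu1 = mean_vector(group0), mean_vector(group1)
    s0, s1 = scatter_matrix(group0, mu0), scatter_matrix(group1, mu1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # w = Sw^{-1} (mu1 - mu0)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold: projection of the midpoint between the two group means.
    mid = [(mu0[k] + mu1[k]) / 2 for k in range(2)]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def classify(x, w, threshold):
    """Assign group 1 if the projected score exceeds the threshold, else group 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0

# Illustrative (made-up) data for two well-separated groups.
group0 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
group1 = [(6.0, 7.0), (7.0, 6.0), (7.0, 8.0), (8.0, 7.0)]
w, threshold = fit_two_group_discriminant(group0, group1)
```

The resulting linear score plays the role of the predictive equation: a new individual is assigned to a group by comparing its score with the threshold.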
A Decision Tree [35] is a tree-like structure used for classification and regression. It is a supervised machine learning algorithm used in decision making. The objective of using a DT is to build a training model that can predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data). To predict a class label for a record, one starts from the root of the tree. The value of the root attribute is compared with the record's attribute. Based on the comparison, one follows the corresponding branch and jumps to the next node.
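The root-to-leaf traversal described above can be sketched as follows. The tree, its feature names (`feature_a`, `feature_b`), thresholds and labels are hypothetical and hand-written for illustration; they are not learned from the chapter's data.

```python
# Hypothetical tree: feature names, thresholds and labels are made up.
tree = {
    "feature": "feature_a", "threshold": 5.0,
    "left": {"label": "survived"},
    "right": {
        "feature": "feature_b", "threshold": 2.0,
        "left": {"label": "survived"},
        "right": {"label": "died"},
    },
}

def predict(node, record):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "label" not in node:
        value = record[node["feature"]]
        # Compare the record's value with the node's threshold and follow the branch.
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["label"]

label_1 = predict(tree, {"feature_a": 3.0, "feature_b": 9.0})  # goes left at the root
label_2 = predict(tree, {"feature_a": 7.0, "feature_b": 4.0})  # goes right, then right
```

Each internal node holds one comparison, so prediction costs at most one comparison per level of the tree.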
Figure 1.3 Convergence characteristics of FA-FLANN based mortality prediction model.
KNN [35] is also a supervised machine learning algorithm used for both classification and regression. It is simple and easy to implement. KNN finds the nearest neighbors by calculating the distance between data points, commonly the Euclidean distance.
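A minimal sketch of KNN classification with the Euclidean distance is given below. The training points, labels and the choice of k = 3 are assumptions made for illustration; the chapter does not report the value of k it used.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: euclidean(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Illustrative (made-up) training data with two classes.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = [0, 0, 0, 1, 1, 1]

pred_low = knn_predict(points, labels, (1.5, 1.5))
pred_high = knn_predict(points, labels, (6.5, 6.5))
```

Choosing an odd k avoids tied votes in two-class problems.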
A Naive Bayes classifier [35] is a probabilistic machine learning model used for classification tasks. Bayes' theorem is given as
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1.6}$$
Using Bayes' theorem, one can find the probability of A occurring, given that B has occurred. Here, B represents the evidence and A represents the hypothesis. The assumption made is that the predictors (features) are independent, that is, the presence of one particular feature does not affect the others. Hence the classifier is called naive.
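As a worked illustration of the independence assumption, the sketch below multiplies per-feature likelihoods by the prior and normalizes by the evidence term. The class names (`H1`, `H0`), feature names and all probabilities are made-up values, not estimates from the chapter's data.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Posterior P(class | evidence) under the naive independence assumption.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature present | class)}}
    observed:    {feature: True if present, False if absent}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, present in observed.items():
            p = likelihoods[cls][feature]
            # Independence: per-feature likelihoods are simply multiplied.
            score *= p if present else (1.0 - p)
        scores[cls] = score
    evidence = sum(scores.values())  # P(B), the normalizing term
    return {cls: s / evidence for cls, s in scores.items()}

# Illustrative (made-up) probabilities.
priors = {"H1": 0.2, "H0": 0.8}
likelihoods = {
    "H1": {"f1": 0.9, "f2": 0.5},
    "H0": {"f1": 0.1, "f2": 0.5},
}
posterior = naive_bayes_posterior(priors, likelihoods, {"f1": True, "f2": True})
```

With these numbers the unnormalized scores are 0.2 × 0.9 × 0.5 = 0.09 for `H1` and 0.8 × 0.1 × 0.5 = 0.04 for `H0`, so the posterior of `H1` is 0.09 / 0.13, roughly 0.69.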
A Support Vector Machine [35] is a supervised machine learning algorithm that aims to find a hyperplane in N-dimensional space that separates the classes. The hyperplane with the maximum margin is chosen. Support vectors are the data points closest to the hyperplane; they influence its position and orientation. Using these support vectors, the margin of the classifier is maximized. Deleting the support vectors will change the position of the hyperplane. These are the points that help build the SVM.
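The roles of the hyperplane, the margin and the support vectors can be illustrated with a small sketch. It does not train an SVM: the weights `w` and bias `b` are assumed to be already given in canonical form (closest points score exactly +1 or -1), and the four data points are made up for illustration.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def support_vectors(w, b, points, tol=1e-9):
    """Points lying exactly on a margin boundary, i.e. |w.x + b| = 1."""
    return [x for x in points
            if abs(abs(decision_value(w, b, x)) - 1.0) <= tol]

# Assumed hyperplane x1 = 2 in canonical form, with made-up points on both sides.
w, b = (1.0, 0.0), -2.0
points = [(3.0, 0.0), (4.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

width = margin_width(w)
sv = support_vectors(w, b, points)
```

Only the points returned by `support_vectors` constrain the hyperplane; moving the other points without crossing the margin leaves it unchanged.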
1.4 Result and Discussion
The results of all the models on the testing set of 1,000 records are shown in Table 1.3.
As Table 1.3 shows, DT outperformed the other five models with an accuracy of 97.95%. The FA-FLANN model ranked second with an accuracy of 87.6%. The DA, KNN and SVM models gave nearly identical results, with accuracies of 86.05%, 86.6% and 86.15%, respectively. The worst result was obtained by the Naive Bayesian model, with an accuracy of 54.80%.
Table 1.3 Comparison of different models during testing.
S. no. | Model name | Testing error (value) | Testing error (%) | Accuracy (%) | Rank |
---|---|---|---|---|---|
1. | FA-FLANN | 0.1240 | 12.40% | 87.60% | 2 |
2. | DA | 0.1395 | 13.95% | 86.05% | 5 |
3. | DT | 0.0205 | 2.05% | 97.95% | 1 |
4. | KNN | 0.1340 | 13.40% | 86.60% | 3 |
5. | Naive Bayesian | 0.4520 | 45.20% | 54.80% | 6 |
6. | SVM | 0.1385 | 13.85% | 86.15% | 4 |
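The accuracy column follows from the testing error as accuracy = 100 × (1 − error). The short sketch below recomputes the accuracies and an ordering by testing error from the error values in Table 1.3; the variable names are illustrative. Sorting by error places KNN (0.1340) ahead of SVM (0.1385).

```python
# Testing errors as listed in Table 1.3.
errors = {
    "FA-FLANN": 0.1240,
    "DA": 0.1395,
    "DT": 0.0205,
    "KNN": 0.1340,
    "Naive Bayesian": 0.4520,
    "SVM": 0.1385,
}

# Accuracy in percent, rounded to two decimals.
accuracy = {model: round(100 * (1 - err), 2) for model, err in errors.items()}

# Models ordered from lowest to highest testing error.
ranked = sorted(errors, key=errors.get)
rank = {model: position + 1 for position, model in enumerate(ranked)}
```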
1.5 Conclusion
In this chapter, different algorithms are presented to predict in-hospital mortality based on the information collected at the hospital during the 48 h of observation. The data are taken from the PhysioNet Challenge 2012. From set A, 4,000 patient records are selected, of which 3,000 are used for training and the remaining 1,000 are kept for testing. Fifteen time series variables are selected out of 41 features for model development. Missing values are handled by imputing zeros. Six different models are developed for mortality prediction and their performance is compared. The comparison shows that the decision tree achieved the best accuracy among the six models used in the simulation study.
1.6 Future Work
Many authors have taken up the PhysioNet Challenge 2012 and published papers reporting improved accuracy. Predicting a patient's mortality in hospital nevertheless remains a challenging task. Research is ongoing to develop further models, other methods of handling missing data, and new strategies for mortality prediction. Other algorithms, such as extreme learning machines, convolutional neural networks and deep learning, can also be applied to this task in future.
References
1. https://en.wikipedia.org/wiki/Health_care
2. Hanson, C. and Marshall, B., Artificial intelligence applications in the intensive care unit. Crit. Care Med., 29, 2, 1–9, 2001.
3. Halpern, N.A. and Pastores, S.M., Critical care medicine in the United States 2000–2005: An analysis of bed numbers, occupancy rates, payer mix and costs. Crit. Care Med., 38, 1, 65–71, 2010.
4. Awad,