3.1 Decision Tree Analyses of Clinical Data
3.1.2 Prediction of Disease Severity in Dengue Patients
3.1.2.6 Severity Prediction Based on Clinical Data Using Only Hospitalized Cases
The second tree (SEVHOSP_TOTAL_71) was constructed using only first-visit data from hospitalized cases, which resulted in a dataset of 71 samples. The pruning confidence was set to the standard value of 25%, and “minimum cases” was set to 4. The overall performance of this tree, which excludes body temperature (TEMP_1), was slightly worse, with a higher sensitivity (83%) but a lower specificity (70%) when the low group is taken as the positive class (Table 3.44; Figure 3.23). The AUC of the ROC curve was 0.80 for both the more severe cases (95%CI: 0.69, 0.90) and the mild cases (95%CI: 0.69, 0.91). The classifier correctly predicted 55 of the 71 cases, with an overall misclassification rate of 22.7%. Interestingly, the tree (Figure 3.22) still used PLT_1 <= 108 (OR: 25.45; 95%CI: 17.41, 33.50) as the main decision criterion to detect low cases. Patients with higher platelet counts were further separated into secondary and primary infections. Patients with DV_IG_G_1 = ‘positive’ (OR: 5.63; 95%CI: 2.21, 9.05), i.e. secondary infections, were sub-grouped first by CT_1st_COLLECTION <= 20.94 (OR: 29.75; 95%CI: 19.14, 40.36) and then by MCHC_1 <= 34.5 (OR: 32.00; 95%CI: 18.00, 46.00) as the last decision node; both criteria indicated classification into the low group. Cases categorized as primary infections (DV_IG_G_1 = ‘negative’) were instead split by diastolic blood pressure: all patients with DIASTOLIC1 <= 67.0 were classified as high, whereas DIASTOLIC1 > 67.0 (OR: 11.38; 95%CI: 1.67, 21.08) was associated with the low group. Patients with the higher diastolic blood pressure were then separated by systolic blood pressure, and SYSTOLICBP_1 <= 119.0 (OR: 14.00; 95%CI: 1.67, 26.33) was an indicator for the low group 12 (Table 3.42; Table 3.43).
12 PLT=platelet count; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; MCHC=mean corpuscular hemoglobin concentration; DIASTOLIC=diastolic blood pressure; SYSTOLICBP=systolic blood pressure; 1=1st visit data.
ROOT: 42 LOW / 29 high
  PLT_1 <= 108: 20 LOW / 1 high
  PLT_1 > 108: 22 low / 28 HIGH
    DV_IG_G_1 = positive: 16 LOW / 9 high
      CT_1ST_COLLECTION <= 20.94: 16 LOW / 3 high
        MCHC_1 <= 34.5: 15 LOW / 0 high
        MCHC_1 > 34.5: 1 low / 3 HIGH
      CT_1ST_COLLECTION > 20.94: 0 low / 6 HIGH
    DV_IG_G_1 = negative: 6 low / 19 HIGH
      DIASTOLIC1 <= 67.0: 0 low / 12 HIGH
      DIASTOLIC1 > 67.0: 6 low / 7 HIGH
        SYSTOLICBP_1 <= 119.0: 6 LOW / 2 high
        SYSTOLICBP_1 > 119.0: 0 low / 5 HIGH
Figure 3.22: SEVHOSP_TOTAL_71: Decision tree for severity prediction calculated on 71 hospitalized patients excluding cytokine data. PLT=platelet count; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection;
CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; MCHC=mean corpuscular hemoglobin concentration; DIASTOLIC=diastolic blood pressure; SYSTOLICBP=systolic blood pressure; 1=1st visit data.
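To make the splits of SEVHOSP_TOTAL_71 concrete, the tree in Figure 3.22 can be written as a set of nested rules. The following is a minimal sketch, not the software that produced the tree: the dictionary keys simply reuse the feature names from the figure, the function name classify_sevhosp_total_71 is hypothetical, and each leaf returns the majority class of its training node.

```python
from typing import Any, Mapping


def classify_sevhosp_total_71(patient: Mapping[str, Any]) -> str:
    """Return 'low' or 'high' by following the splits shown in Figure 3.22.

    `patient` holds first-visit values keyed by the (hypothetical) feature
    names used in the figure; only features on the traversed path are read.
    """
    # Root split: low platelet count -> low (20 low / 1 high)
    if patient["PLT_1"] <= 108:
        return "low"

    # Secondary infection branch (DV_IG_G_1 positive: 16 low / 9 high)
    if patient["DV_IG_G_1"] == "positive":
        if patient["CT_1ST_COLLECTION"] <= 20.94:
            # MCHC_1 <= 34.5: 15 low / 0 high; > 34.5: 1 low / 3 high
            return "low" if patient["MCHC_1"] <= 34.5 else "high"
        # CT_1ST_COLLECTION > 20.94: 0 low / 6 high
        return "high"

    # Primary infection branch (DV_IG_G_1 negative: 6 low / 19 high)
    if patient["DIASTOLIC1"] <= 67.0:
        # 0 low / 12 high
        return "high"
    # SYSTOLICBP_1 <= 119.0: 6 low / 2 high; > 119.0: 0 low / 5 high
    return "low" if patient["SYSTOLICBP_1"] <= 119.0 else "high"


# Hypothetical first-visit values for a primary infection with PLT_1 > 108
example = {"PLT_1": 150, "DV_IG_G_1": "negative",
           "DIASTOLIC1": 70.0, "SYSTOLICBP_1": 110.0}
print(classify_sevhosp_total_71(example))
```

For the example patient the path is PLT_1 > 108, primary infection, DIASTOLIC1 > 67.0 and SYSTOLICBP_1 <= 119.0, so the rule set returns "low". Applied to its own training data, these majority-class leaves misclassify 4 of the 71 cases (three high cases in low leaves and one low case in the MCHC_1 > 34.5 leaf); the 16 misclassifications in Table 3.44 come from 10-fold cross-validation, in which each case is predicted by a tree grown without it.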
Table 3.42: SEVHOSP_TOTAL_71: Decision tree for severity prediction calculated on 71 hospitalized patients excluding cytokine data. Statistical analysis of splitting criteria performed on the whole dataset.
PLT=platelet count; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; MCHC=mean corpuscular hemoglobin concentration; DIASTOLIC=diastolic blood pressure; SYSTOLICBP=systolic blood pressure; 1=1st visit data; RR=relative risk; OR=odds ratio; CI=confidence interval.
Decision node feature      Cut-off value  RR    OR     95% CI (OR)     p value
PLT_1 [*1000/mm3]          <= 108         2.16  25.45  17.41, 33.50    < 0.001
DV_IG_G_1                  = positive     1.65  3.42   0.72, 6.12      0.017
CT_1ST_COLLECTION          <= 20.94       1.74  3.02   -0.78, 6.83     0.11
MCHC_1 [g/dl]              <= 34.5        0.78  0.52   -2.71, 3.75     0.397
DIASTOLIC1 [mmHg]          > 67.0         1.21  1.57   -1.1084, 4.26   0.451
SYSTOLICBP_1 [mmHg]        <= 119.0       0.96  0.90   -1.74, 3.54     0.451
Table 3.43: SEVHOSP_TOTAL_71: Decision tree for severity prediction calculated on 71 hospitalized patients excluding cytokine data. Statistical analysis of splitting criteria performed on each subgroup at the decision nodes. In case of 0 values in the original contingency table, OR calculations were adjusted by adding 1 to each cell of the table (marked +1). PLT=platelet count; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; MCHC=mean corpuscular hemoglobin concentration; DIASTOLIC=diastolic blood pressure; SYSTOLICBP=systolic blood pressure; 1=1st visit data; RR=relative risk; OR=odds ratio; CI=confidence interval.
Decision node feature      Cut-off value  RR    OR     95% CI (OR)     p value
PLT_1 [*1000/mm3]          <= 108         2.16  25.45  17.41, 33.50    < 0.001
DV_IG_G_1                  = positive     2.67  5.63   2.21, 9.05      0.01
CT_1ST_COLLECTION +1       <= 20.94       6.48  29.75  19.14, 40.36    < 0.001
MCHC_1 [g/dl] +1           <= 34.5        2.82  32.00  18.00, 46.00    < 0.001
DIASTOLIC1 [mmHg] +1       > 67.0         6.53  11.38  1.67, 21.08     0.015
SYSTOLICBP_1 [mmHg] +1     <= 119.0       1.28  14.00  1.67, 26.33     0.015
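The relative risks and odds ratios in Table 3.43 can be recomputed from the node counts in Figure 3.22 by forming, at each decision node, a 2×2 table of criterion (met / not met) against class (low / high). The sketch below assumes that orientation, i.e. that RR and OR describe the risk and odds of belonging to the low group when the cut-off condition is met, and it applies the +1 adjustment whenever a cell is zero; the function name rr_and_or is hypothetical. Under these assumptions it reproduces, for example, the PLT_1, DV_IG_G_1, CT_1ST_COLLECTION and DIASTOLIC1 rows of Table 3.43.

```python
def rr_and_or(low_met: int, high_met: int, low_not: int, high_not: int):
    """Relative risk and odds ratio of being 'low' when a cut-off is met.

    Arguments are the four cells of a 2x2 table at one decision node.
    If any cell is zero, 1 is added to every cell (the '+1' adjustment);
    applying it to RR as well as OR is an assumption of this sketch.
    """
    cells = (low_met, high_met, low_not, high_not)
    if 0 in cells:
        low_met, high_met, low_not, high_not = (x + 1 for x in cells)
    rr = (low_met / (low_met + high_met)) / (low_not / (low_not + high_not))
    odds_ratio = (low_met * high_not) / (high_met * low_not)
    return rr, odds_ratio


# Node counts taken from Figure 3.22: (low met, high met, low not met, high not met)
nodes = {
    "PLT_1 <= 108": (20, 1, 22, 28),
    "DV_IG_G_1 = positive": (16, 9, 6, 19),
    "CT_1ST_COLLECTION <= 20.94": (16, 3, 0, 6),
    "DIASTOLIC1 > 67.0": (6, 7, 0, 12),
}
for name, counts in nodes.items():
    rr, odds_ratio = rr_and_or(*counts)
    print(f"{name}: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

For the root split, for instance, the odds ratio is (20 × 28) / (1 × 22) ≈ 25.45 and the relative risk is (20/21) / (22/50) ≈ 2.16, matching the PLT_1 row.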
Table 3.44: SEVHOSP_TOTAL_71: Summary of k-fold (k=10) cross-validation for severity prediction based on 71 hospitalized patients excluding cytokine data.
Overall evaluation (n=71)        Value
Misclassifications               16
Overall error rate               22.679%
SE of error rate                 19.380
Average profit                   0.546
SE of profit                     0.388
AUC high                         0.7991 (95%CI: 0.69, 0.90)
AUC low                          0.7962 (95%CI: 0.69, 0.91)

Confusion matrix      Predicted high   Predicted low   Total
Actual high           20 (70%)         9 (30%)         29
Actual low            7 (17%)          35 (83%)        42
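The summary measures quoted in the text follow from the confusion matrix in Table 3.44. The sketch below takes “low” as the positive class, which is the reading consistent with the sensitivity and specificity quoted in the text; the function name confusion_summary is hypothetical. Note that the pooled error rate (16/71 ≈ 22.5%) differs slightly from the 22.679% in Table 3.44, presumably because the table reports the mean error over the ten folds.

```python
def confusion_summary(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Basic metrics for a 2x2 confusion matrix, positive class = 'low'."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # actual low predicted as low
        "specificity": tn / (tn + fp),   # actual high predicted as high
        "accuracy": (tp + tn) / total,
        "error_rate": (fn + fp) / total,
        "correct": tp + tn,
    }


# Cells from Table 3.44: actual low -> 35 low / 7 high; actual high -> 9 low / 20 high
summary = confusion_summary(tp=35, fn=7, fp=9, tn=20)
for key, value in summary.items():
    print(key, round(value, 3) if isinstance(value, float) else value)
```

With these cells the sketch gives a sensitivity of 35/42 ≈ 0.83, a specificity of 20/29 ≈ 0.69 and 55 correct predictions out of 71.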
Figure 3.23: SEVHOSP_TOTAL_71: Receiver operating characteristics (ROC) curve for severity prediction calculated on 71 hospitalized patients excluding cytokine data.
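The AUC values in Table 3.44 summarize the ROC curve in Figure 3.23. One standard way to read an AUC is as the probability that a randomly chosen case of the positive class receives a higher score than a randomly chosen case of the other class, with ties counted as one half (the Mann–Whitney formulation). The sketch below illustrates this with made-up scores and a hypothetical function name; it is not necessarily how the AUC values or their confidence intervals in Table 3.44 were computed.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up scores; a decision tree would typically score a case by the
# class proportion in its leaf, which produces many ties.
print(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

In this toy example 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9 ≈ 0.89.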