3.1 Decision Tree Analyses of Clinical Data
3.1.2 Prediction of Disease Severity in Dengue Patients
3.1.2.7 Severity Prediction based on Cytokine and Clinical Data
We further examined the influence of cytokines on classifier performance, in particular to assess whether specific cytokines could serve as markers for more severe infections. As a baseline for comparison, we first calculated a decision tree without cytokine data on the 89 dengue-positive patients for whom cytokine data were available; the remaining 36 patients were excluded because they had no cytokine data. The resulting tree (SEVERE_EXCYT_89) (Figure 3.24), calculated excluding TEMP_1 with a pruning confidence of 25% and “minimum cases” set to 10, was identical to the tree (Figure 3.20) calculated on the total of 133 dengue patients. It used PLT_1 <= 108 (OR: 40.53; 95%CI: 32.40, 48.65), CT_1ST_COLLECTION <= 20.9 (OR: 13.53; 95%CI: 5.55, 21.51) and DV_IG_G_1 = ‘positive’ (OR: 9.75; 95%CI: 6.04, 13.46)13 as the classification criteria for more severe infections (Table 3.45; Table 3.46). The tree had a misclassification rate of 20.28% with a sensitivity of 77% and a specificity of 83% (Table 3.47; Figure 3.25). The AUC for the high classification (0.7933; 95%CI: 0.70, 0.88) was minimally higher than the AUC for the low classification (0.7901; 95%CI: 0.69, 0.89), indicating a similar overall performance of 0.79. The average profit of the chosen probabilistic classifier was 0.594.
13 TEMP=body temperature; PLT=platelet count; CT_1st_COLLECTION=viral load whereby a high
ROOT: 33 low / 56 HIGH
    PLT_1 <= 108: 14 LOW / 1 high
    PLT_1 > 108: 19 low / 55 HIGH
        CT_1ST_COLLECTION <= 20.9: 19 low / 33 HIGH
            DV_IG_G_1 = positive: 13 LOW / 6 high
            DV_IG_G_1 = negative: 6 low / 27 HIGH
        CT_1ST_COLLECTION > 20.9: 0 low / 22 HIGH
Figure 3.24: SEVERE_EXCYT_89: Decision tree for severity prediction calculated on 89 patients excluding cytokine data. PLT=platelet count; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; 1=1st visit data.
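The decision path of the tree in Figure 3.24 can be written as a short rule function. The following is a minimal sketch, not the software used for the analysis: the function name and argument names are hypothetical, each leaf is assumed to return its majority class as shown in the figure, and missing values are not handled.

```python
from typing import Literal

Severity = Literal["low", "high"]


def classify_severe_excyt_89(
    plt_1: float,              # platelet count at 1st visit [*1000/mm3]
    ct_1st_collection: float,  # Ct-value at 1st collection (high Ct = low viral load)
    dv_igg_1_positive: bool,   # True if DV_IG_G_1 = positive (secondary infection)
) -> Severity:
    """Walk the SEVERE_EXCYT_89 tree from the root and return the leaf's majority class."""
    if plt_1 <= 108:
        return "low"    # leaf: 14 low / 1 high
    if ct_1st_collection > 20.9:
        return "high"   # leaf: 0 low / 22 high
    if dv_igg_1_positive:
        return "low"    # leaf: 13 low / 6 high
    return "high"       # leaf: 6 low / 27 high


# Hypothetical patient: platelets above the cut-off, Ct-value below 20.9, IgG positive.
print(classify_severe_excyt_89(150, 18.0, True))
```

Writing the tree this way makes the order of the splits explicit: DV_IG_G_1 is only consulted for patients with PLT_1 > 108 and CT_1ST_COLLECTION <= 20.9.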
Table 3.45: SEVERE_EXCYT_89: Decision tree for severity prediction calculated on 89 patients excluding cytokine data. Statistical analysis of splitting criteria performed on the whole dataset.
PLT=platelet count; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; 1=1st visit data; RR=relative risk; OR=odds ratio; CI=confidence interval.
Decision Node Feature   Cut-off value   RR     OR      95% CI (OR)     p value
PLT_1 [*1000/mm3]       <= 108          3.64   40.53   32.40, 48.65    < 0.001
CT_1ST_COLLECTION       <= 20.9         6.05   10.80   6.20, 15.40     < 0.001
DV_IG_G_1               = positive      1.91   2.90    0.43, 5.30      0.026
Table 3.46: SEVERE_EXCYT_89: Decision tree for severity prediction calculated on 89 patients excluding cytokine data. Statistical analysis of splitting criteria performed on each subgroup at the decision nodes. In case of 0 values in the original contingency table, OR calculations were adjusted by adding 1 to each table value (marked +1). PLT=platelet count; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; 1=1st visit data; RR=relative risk; OR=odds ratio; CI=confidence interval.
Decision Node Feature   Cut-off value   RR     OR      95% CI (OR)     p value
PLT_1 [*1000/mm3]       <= 108          3.64   40.53   32.40, 48.65    < 0.001
CT_1ST_COLLECTION +1    <= 20.9         8.89   13.53   5.55, 21.51     < 0.001
DV_IG_G_1               = positive      3.76   9.75    6.04, 13.46     0.001
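The RR and OR values for the PLT_1 and CT_1ST_COLLECTION nodes can be recomputed from the node counts in Figure 3.24. The sketch below is an illustration under stated assumptions, not the original analysis script: the function name is hypothetical, the table orientation (rows: criterion met / not met; columns: low / high) is an assumption, and the add-one adjustment is assumed to apply to both RR and OR. The confidence intervals are not recomputed because the method used for them is not stated.

```python
def risk_measures(met, not_met):
    """Return (RR, OR) for a 2x2 table of (low, high) counts.

    met / not_met: (low, high) counts for patients who meet / do not meet
    the split criterion at a decision node (assumed orientation).
    """
    a, b = met
    c, d = not_met
    if 0 in (a, b, c, d):
        # Add-one adjustment for tables containing a zero cell,
        # as described in the caption of Table 3.46.
        a, b, c, d = a + 1, b + 1, c + 1, d + 1
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio


# Node counts read from Figure 3.24.
plt_rr, plt_or = risk_measures(met=(14, 1), not_met=(19, 55))
ct_rr, ct_or = risk_measures(met=(19, 33), not_met=(0, 22))

print(f"PLT_1 <= 108: RR={plt_rr:.2f}, OR={plt_or:.2f}")
print(f"CT_1ST_COLLECTION <= 20.9: RR={ct_rr:.2f}, OR={ct_or:.2f}")
```

Under these assumptions the sketch gives RR 3.64 and OR 40.53 for PLT_1, and RR 8.89 and OR 13.53 for CT_1ST_COLLECTION, in agreement with the corresponding rows of Table 3.46.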
Table 3.47: SEVERE_EXCYT_89: Summary of K-fold (k=10) cross validation for severity prediction based on 89 patients excluding cytokine data.
Overall Evaluation          Value (n=89)
Total misclassifications    18.0
Overall error rate          20.278%
SE of error rate            8.886
Average profit              0.594
SE of profit                0.178
AUC high                    0.7933 (95%CI: 0.70, 0.88)
AUC low                     0.7901 (95%CI: 0.69, 0.89)

Confusion Matrix            Predicted Class
Actual Class                high          low
high                        46 (83%)      10 (17%)
low                         8 (23%)       25 (77%)
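The per-class rates can be checked against the pooled confusion matrix counts of Table 3.47. This is a minimal sketch with a hypothetical helper function; it pools all 89 patients, so the results may differ slightly from the reported percentages if those are averages over the ten cross-validation folds, which is an assumption here.

```python
def confusion_summary(matrix):
    """Summarise a confusion matrix given as matrix[actual][predicted] -> count."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[cls][cls] for cls in matrix)
    # Fraction of each actual class that was predicted correctly.
    recall = {cls: matrix[cls][cls] / sum(matrix[cls].values()) for cls in matrix}
    return {
        "n": total,
        "misclassified": total - correct,
        "error_rate": (total - correct) / total,
        "recall": recall,
    }


# Counts from the confusion matrix of Table 3.47 (SEVERE_EXCYT_89).
excyt = {
    "high": {"high": 46, "low": 10},
    "low": {"high": 8, "low": 25},
}

summary = confusion_summary(excyt)
print(summary["n"], summary["misclassified"])
print(f"error rate: {summary['error_rate']:.3f}")
print({cls: round(rate, 2) for cls, rate in summary["recall"].items()})
```

The pooled counts give 18 misclassifications out of 89 patients, matching the total in Table 3.47.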
Figure 3.25: SEVERE_EXCYT_89: Receiver operating characteristics (ROC) curve for severity prediction calculated on 89 patients excluding cytokine data.
When the cytokine data were included and the technical tree parameters were left unchanged (minimum cases set to 10, pruning confidence of 25%), the splitting criteria of the tree did not change, but its overall performance decreased, suggesting noise caused by interference between cytokine and clinical data.
Therefore, we constructed a tree that excluded TEMP_1 and CT_1ST_COLLECTION. The resulting tree (SEVERE_INCYTA_89) (Figure 3.26) was similar to the one calculated without the cytokine data (Figure 3.24), with the CT_1ST_COLLECTION split replaced by IP_10_1 (OR: 12.75; 95%CI: 7.98, 17.52)14 (Table 3.48; Table 3.49). Compared with SEVERE_EXCYT_89, the chosen classifier had a higher profit (0.617 versus 0.594) and a higher specificity (86% versus 83%) but a lower sensitivity (74% versus 77%), and the resulting overall error rate was 19.17% (Table 3.50; Figure 3.27). The AUC was 0.79 for both the low and the high classification, with slightly different confidence intervals (low 95%CI: 0.68, 0.89; high 95%CI: 0.69, 0.88).
14 TEMP=body temperature; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; PLT=platelet count; IP_10=interferon-inducible protein 10; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; 1=1st visit data.
ROOT: 33 low / 56 HIGH
    PLT_1 <= 108: 14 LOW / 1 high
    PLT_1 > 108: 19 low / 55 HIGH
        IP_10_1 > 1697.9: 17 low / 22 HIGH
            DV_IG_G_1 = positive: 12 LOW / 4 high
            DV_IG_G_1 = negative: 5 low / 18 HIGH
        IP_10_1 <= 1697.9: 2 low / 33 HIGH
Figure 3.26: SEVERE_INCYTA_89: Decision tree for severity prediction calculated on 89 patients including cytokine data. PLT=platelet count; IP_10=interferon-inducible protein 10; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; 1=1st visit data.
Table 3.48: SEVERE_INCYTA_89: Decision tree for severity prediction calculated on 89 patients including cytokine data. Statistical analysis of splitting criteria performed on the whole dataset.
PLT=platelet count; IP_10=interferon-inducible protein 10; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; 1=1st visit data;
RR=relative risk; OR=odds ratio; CI=confidence interval.
Decision Node Feature   Cut-off value   RR     OR      95% CI (OR)     p value
PLT_1 [*1000/mm3]       <= 108          3.64   40.53   32.40, 48.65    < 0.001
IP_10_1 [pg/ml]         > 1697.9        5.40   11.20   7.97, 14.44     < 0.001
DV_IG_G_1               = positive      1.91   2.87    0.43, 5.30      0.026
Table 3.49: SEVERE_INCYTA_89: Decision tree for severity prediction calculated on 89 patients including cytokine data. Statistical analysis of splitting criteria performed on each subgroup at the decision nodes. PLT=platelet count; IP_10=interferon-inducible protein 10; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; 1=1st visit data;
RR=relative risk; OR=odds ratio; CI=confidence interval.
Decision Node Feature   Cut-off value   RR     OR      95% CI (OR)     p value
PLT_1 [*1000/mm3]       <= 108          3.64   40.53   32.40, 48.65    < 0.001
IP_10_1 [pg/ml]         <= 1697.9       7.63   12.75   7.98, 17.52     < 0.001
DV_IG_G_1               = positive      3.45   10.80   6.30, 15.30     0.003
Table 3.50: SEVERE_INCYTA_89: Summary of K-fold (k=10) cross validation for severity prediction based on 89 patients including cytokine data.
Overall Evaluation          Value (n=89)
Total misclassifications    17.0
Overall error rate          19.167%
SE of error rate            7.686
Average profit              0.617
SE of profit                0.154
AUC high                    0.7865 (95%CI: 0.69, 0.88)
AUC low                     0.7874 (95%CI: 0.68, 0.89)

Confusion Matrix            Predicted Class
Actual Class                high          low
high                        48 (86%)      8 (14%)
low                         9 (26%)       24 (74%)
Figure 3.27: SEVERE_INCYTA_89: Receiver operating characteristics (ROC) curve for severity