Severity Prediction based on Cytokine and Clinical Data but only using

Part of the document Investigating the 2005 Singaporean dengue outbreak (pages 154 - 161)

3.1 Decision Tree Analyses of Clinical Data

3.1.2 Prediction of Disease Severity in Dengue Patients

3.1.2.9 Severity Prediction based on Cytokine and Clinical Data but only using

For the sake of completeness, we separately investigated the influence of the cytokine data on the prediction of severity after excluding the non-hospitalized cases. Such an analysis might identify cytokines that are highly specific for severity, because it classifies two groups within a population (hospitalized patients) that is already more homogeneous than the full set of 125 patients. The analysis was therefore performed on a dataset of 52 cases (36.5% high, 63.5% low), with pruning confidence set to 25% and 'minimum cases' set to 4. The first tree (SEVHOSP_EXCYT_52; Figure 3.32), which excluded TEMP_1, was calculated without the cytokine data. PLT_1<=108 (OR: 15.00; 95%CI: 6.69, 23.31), DV_IG_G_1='positive' (OR: 4.69; 95%CI: 0.77, 8.62), CT_1ST_COLLECTION<=20.94 (OR: 42.00; 95%CI: 28.75, 55.25) and MXD_NO_1<=0.1 (OR: 15.00; 95%CI: 2.65, 27.35) were chosen as splitting criteria (Table 3.55; Table 3.56; footnote 16). The decision tree correctly classified 43 of the 52 cases, with an overall error rate of 16.67% (Table 3.57; Figure 3.33). The sensitivity was 85% and the specificity was 80%. Overall performance was good, with an AUC of 0.80 for the classification of both low (95%CI: 0.69, 0.92) and high (95%CI: 0.67, 0.94) cases.

The probabilistic classifier was chosen at an average profit of 0.667.

16 PLT=platelet count; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a

ROOT: 33 LOW / 19 high
  PLT_1 <= 108: 14 LOW / 0 high
  PLT_1 > 108: 19 low / 19 high
    DV_IG_G_1 = positive: 13 LOW / 6 high
      CT_1ST_COLLECTION <= 20.94: 13 LOW / 1 high
      CT_1ST_COLLECTION > 20.94: 0 low / 5 HIGH
    DV_IG_G_1 = negative: 6 low / 13 HIGH
      MXD_NO_1 <= 0.1: 4 LOW / 1.43 high
      MXD_NO_1 > 0.1: 2 low / 11.57 HIGH

Figure 3.32: SEVHOSP_EXCYT_52: Decision tree for severity prediction calculated on 52 hospitalized patients excluding cytokine data. PLT=platelet count; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; MXD_NO=the percentage of the WBC belonging to monocytes, eosinophils and basophils; 1=1st visit data.
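The splits of Figure 3.32 can be read as a short chain of rules. The following is a minimal sketch of that reading, not the software used for the analysis: the function name and argument names are hypothetical, each leaf is assigned its majority class (the class printed in capitals in the figure), and missing values, which presumably give rise to the fractional leaf counts, are not handled.

```python
def predict_severity_excyt(plt_1, dv_igg_positive, ct_1st_collection, mxd_no_1):
    """Classify one hospitalized patient as 'low' or 'high' severity.

    Hypothetical helper that mirrors the splits of Figure 3.32
    (SEVHOSP_EXCYT_52); each leaf returns its majority class.
    """
    # First split: platelet count at the first visit.
    if plt_1 <= 108:
        return "low"   # leaf: 14 low / 0 high

    # Second split: dengue IgG (positive indicates secondary infection).
    if dv_igg_positive:
        # Third split on the Ct value of the first collection.
        if ct_1st_collection <= 20.94:
            return "low"    # leaf: 13 low / 1 high
        return "high"       # leaf: 0 low / 5 high

    # IgG negative: split on the mixed-cell measurement.
    if mxd_no_1 <= 0.1:
        return "low"        # leaf: 4 low / 1.43 high
    return "high"           # leaf: 2 low / 11.57 high


# Made-up patient values, for illustration only.
print(predict_severity_excyt(plt_1=150, dv_igg_positive=False,
                             ct_1st_collection=18.0, mxd_no_1=0.3))
```

Writing the tree this way makes it explicit that CT_1ST_COLLECTION is consulted only for IgG-positive patients and MXD_NO_1 only for IgG-negative patients.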

Table 3.55: SEVHOSP_EXCYT_52: Decision tree for severity prediction calculated on 52 hospitalized patients excluding cytokine data. Statistical analysis of splitting criteria performed on the whole dataset. In case of 0 values in the original contingency table, OR calculations were adjusted by adding 1 to each table value (marked +1). PLT=platelet count; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; MXD_NO=the percentage of the WBC belonging to monocytes, eosinophils and basophils; 1=1st visit data; RR=relative risk; OR=odds ratio; CI=confidence interval.

| Decision Node Feature | Cut-off value | RR | OR | 95% CI (OR) | p value |
|---|---|---|---|---|---|
| PLT_1 [*1000/mm3] (+1) | <= 108 | 1.88 | 15 | 6.69, 23.31 | 0.001 |
| DV_IG_G_1 | = positive | 1.47 | 2.94 | -0.34, 6.22 | 0.071 |
| CT_1ST_COLLECTION | <= 20.94 | 5.00 | 14.77 | 5.63, 23.91 | 0.007 |
| MXD_NO_1 [*1000 cells/mm3] | <= 0.1 | 1.45 | 3.93 | -1.61, 9.46 | 0.149 |

Table 3.56: SEVHOSP_EXCYT_52: Decision tree for severity prediction calculated on 52 hospitalized patients excluding cytokine data. Statistical analysis of splitting criteria performed on each subgroup at the decision nodes. In case of 0 values in the original contingency table, OR calculations were adjusted by adding 1 to each table value (marked +1). PLT=platelet count; CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; DV_IG_G=indicator for primary/secondary infection whereby a positive result indicates a secondary infection; MXD_NO=the percentage of the WBC belonging to monocytes, eosinophils and basophils; 1=1st visit data; RR=relative risk; OR=odds ratio; CI=confidence interval.

| Decision Node Feature | Cut-off value | RR | OR | 95% CI (OR) | p value |
|---|---|---|---|---|---|
| PLT_1 [*1000/mm3] (+1) | <= 108 | 1.88 | 15 | 6.69, 23.31 | 0.001 |
| DV_IG_G_1 | = positive | 2.17 | 4.69 | 0.77, 8.62 | 0.05 |
| CT_1ST_COLLECTION (+1) | <= 20.94 | 6.13 | 42.00 | 28.75, 55.25 | 0.001 |
| MXD_NO_1 [*1000 cells/mm3] | <= 0.1 | 3.33 | 15.00 | 2.65, 27.35 | 0.015 |
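The OR and RR values reported at the decision nodes can be traced back to the leaf counts of Figure 3.32. The sketch below assumes a 2x2 table per split with 'low' severity as the outcome of interest, and adds 1 to every cell only when a cell is 0, as the table note describes. The function name and the orientation of the table are assumptions chosen because they reproduce the reported point estimates. The confidence intervals are not reproduced, since the method used for them is not stated.

```python
def or_and_rr(low_in, high_in, low_out, high_out):
    """Odds ratio and relative risk of 'low' severity for a binary split.

    low_in / high_in:   cases that satisfy the cut-off
    low_out / high_out: cases that do not satisfy the cut-off
    Assumption: if any cell is 0, 1 is added to every cell (the '+1' rule).
    """
    cells = [low_in, high_in, low_out, high_out]
    if any(c == 0 for c in cells):
        cells = [c + 1 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    relative_risk = (a / (a + b)) / (c / (c + d))
    return odds_ratio, relative_risk


# Counts read from the tree nodes (low, high) on each side of the split.
splits = {
    "PLT_1 <= 108": (14, 0, 19, 19),
    "DV_IG_G_1 = positive": (13, 6, 6, 13),
    "CT_1ST_COLLECTION <= 20.94": (13, 1, 0, 5),
}
for name, counts in splits.items():
    odds_ratio, relative_risk = or_and_rr(*counts)
    print(f"{name}: OR={odds_ratio:.2f}, RR={relative_risk:.3f}")
```

Under these assumptions the sketch gives OR values of 15.00, 4.69 and 42.00 and RR values of about 1.88, 2.17 and 6.13, in line with the corresponding rows of Table 3.56. The MXD_NO_1 row is left out because its leaves contain fractional counts, which this simple 2x2 calculation does not reproduce.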

Table 3.57: SEVHOSP_EXCYT_52: Summary of K-fold (k=10) cross validation for severity prediction based on 52 hospitalized patients excluding cytokine data.

| Overall Evaluation (n=52) | Value |
|---|---|
| Total misclassifications | 9.0 |
| Overall error rate | 16.667% |
| SE of error rate | 12.669 |
| Average profit | 0.667 |
| SE of profit | 0.253 |
| AUC high | 0.8042 (95%CI: 0.67, 0.94) |
| AUC low | 0.8042 (95%CI: 0.69, 0.92) |

Confusion Matrix (row percentages in parentheses)

| Actual Class | Predicted high | Predicted low |
|---|---|---|
| high | 15 (80%) | 4 (20%) |
| low | 5 (15%) | 28 (85%) |
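The per-class rates in the confusion matrix of Table 3.57 follow directly from its four counts. The sketch below is illustrative only: the function name is hypothetical, and 'high' is treated as the positive class, which is an assumption because the text does not say which class sensitivity and specificity refer to.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Rates derived from a 2x2 confusion matrix.

    tp/fn: actual positives predicted positive/negative
    fp/tn: actual negatives predicted positive/negative
    """
    total = tp + fn + fp + tn
    return {
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "pooled_error_rate": (fn + fp) / total,
    }


# Table 3.57 with 'high' as the positive class (assumption).
metrics = confusion_metrics(tp=15, fn=4, fp=5, tn=28)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With 'high' as the positive class, the rates are about 0.79 and 0.85. The 85%/80% pairing quoted in the text matches these figures if 'low' is taken as the positive class instead. The pooled error of 9/52 (about 17.3%) differs slightly from the reported 16.667%, possibly because the reported value is averaged over the ten folds rather than pooled.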

Figure 3.33: SEVHOSP_EXCYT_52: Receiver operating characteristics (ROC) curve for severity prediction calculated on 52 hospitalized patients excluding cytokine data.

Calculating a tree (excluding body temperature) with the same technical parameters and including the cytokine data did not result in an improvement (data not shown).

Therefore, we additionally excluded CT_1ST_COLLECTION and found that including only I_TAC_1 along with IL_10_1 gave the best prediction. Pruning confidence was still set to 25%, whereas 'minimum cases' was set to 10. This resulted in a tree (SEVHOSP_INCYTA_52) (Figure 3.34) that used only PLT_1<=108 (OR: 15.00; 95%CI: 6.69, 23.31) and I_TAC_1<=1255.2 (OR: 26.00; 95%CI: 16.44, 35.56) as decision criteria (Table 3.58; Table 3.59; footnote 17). The misclassification rate was 20.667%, with a sensitivity of 80% and a specificity of 80% (Table 7l; Figure 7h). The AUC of the 'low' ROC curve was 0.84 (95%CI: 0.73, 0.94), and classification of 'high' cases showed an AUC of 0.84 (95%CI: 0.72, 0.96). Furthermore, the classifier chosen at a specific threshold had a lower average profit (0.587, compared with 0.667 for SEVHOSP_EXCYT_52) (Table 3.60; Figure 3.35).

ROOT: 33 LOW / 19 high
  PLT_1 <= 108: 14 LOW / 0 high
  PLT_1 > 108: 19 low / 19 high
    I_TAC_1 > 1255.2: 12.86 LOW / 3.43 high
    I_TAC_1 <= 1255.2: 6.14 low / 15.57 HIGH

Figure 3.34: SEVHOSP_INCYTA_52: Decision tree for severity prediction calculated on 52 hospitalized patients including cytokine data. PLT=platelet count; I_TAC=interferon-inducible T cell α chemoattractant; 1=1st visit data
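A tree can be used as a probabilistic classifier by scoring every case with the fraction of 'high' cases in its leaf, and an AUC can then be computed from the leaf counts alone. The sketch below does this for the three leaves of Figure 3.34 using the rank-based (Mann-Whitney) definition of the AUC. It is an illustration under assumptions: the function name is hypothetical, and the result is a resubstitution estimate on the training counts, not the cross-validated AUC reported for this tree.

```python
def auc_from_leaves(leaves):
    """AUC for 'high' vs 'low' from per-leaf (n_low, n_high) counts.

    Each case is scored by its leaf's fraction of 'high' cases.
    Pairs within the same leaf count as ties (weight 0.5).
    Assumes every leaf has a distinct score; counts may be fractional.
    """
    # Sort leaves from lowest to highest predicted probability of 'high'.
    ranked = sorted(leaves, key=lambda lh: lh[1] / (lh[0] + lh[1]))
    total_low = sum(n_low for n_low, _ in ranked)
    total_high = sum(n_high for _, n_high in ranked)

    concordant = 0.0   # weighted count of correctly ordered (high, low) pairs
    low_below = 0.0    # 'low' cases in leaves with a smaller score
    for n_low, n_high in ranked:
        concordant += n_high * low_below + 0.5 * n_high * n_low
        low_below += n_low
    return concordant / (total_low * total_high)


# Leaves of Figure 3.34 as (low, high) counts.
leaves_incyta = [(14, 0), (12.86, 3.43), (6.14, 15.57)]
print(round(auc_from_leaves(leaves_incyta), 3))
```

On the training counts this gives a value of roughly 0.855, somewhat above the cross-validated AUC of 0.84, which is the expected direction for a resubstitution estimate.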

17 CT_1st_COLLECTION=viral load whereby a high Ct-value indicates a low viral load; PLT=platelet

Table 3.58: SEVHOSP_INCYTA_52: Decision tree for severity prediction calculated on 52 hospitalized patients including cytokine data. Statistical analysis of splitting criteria performed on the whole dataset. In case of 0 values in the original contingency table, OR calculations were adjusted by adding 1 to each table value (marked +1). PLT=platelet count; I_TAC=interferon-inducible T cell α chemoattractant; 1=1st visit data; RR=relative risk; OR=odds ratio; CI=confidence interval.

| Decision Node Feature | Cut-off value | RR | OR | 95% CI (OR) | p value |
|---|---|---|---|---|---|
| PLT_1 [*1000/mm3] (+1) | <= 108 | 1.88 | 15.00 | 6.69, 23.31 | 0.001 |
| I_TAC_1 [pg/ml] (+1) | > 1255.2 | 2.09 | 22.8 | 14.00, 31.64 | < 0.001 |

Table 3.59: SEVHOSP_INCYTA_52: Decision tree for severity prediction calculated on 52 hospitalized patients including cytokine data. Statistical analysis of splitting criteria performed on each subgroup at the decision nodes. In case of 0 values in the original contingency table, OR calculations were adjusted by adding 1 to each table value (marked +1). PLT=platelet count; I_TAC=interferon-inducible T cell α chemoattractant; 1=1st visit data; RR=relative risk; OR=odds ratio; CI=confidence interval.

| Decision Node Feature | Cut-off value | RR | OR | 95% CI (OR) | p value |
|---|---|---|---|---|---|
| PLT_1 [*1000/mm3] (+1) | <= 108 | 1.88 | 15.00 | 6.69, 23.31 | 0.001 |
| I_TAC_1 [pg/ml] (+1) | > 1255.2 | 2.79 | 26.00 | 16.44, 35.56 | < 0.001 |

Table 3.60: SEVHOSP_INCYTA_52: Summary of K-fold (k=10) cross validation for severity prediction based on 52 hospitalized patients including cytokine data.

| Overall Evaluation (n=52) | Value |
|---|---|
| Total misclassifications | 11.0 |
| Overall error rate | 20.667% |
| SE of error rate | 15.299 |
| Average profit | 0.587 |
| SE of profit | 0.306 |
| AUC high | 0.8398 (95%CI: 0.72, 0.96) |
| AUC low | 0.8373 (95%CI: 0.73, 0.94) |

Confusion Matrix (row percentages in parentheses)

| Actual Class | Predicted high | Predicted low |
|---|---|---|
| high | 15 (80%) | 4 (20%) |
| low | 7 (20%) | 26 (80%) |
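The average profit values reported for the two trees (0.667 and 0.587) are numerically consistent with a profit of +1 for each correct classification and -1 for each misclassification, applied to the overall error rates of 16.667% and 20.667%. The sketch below shows that relationship. The +1/-1 profit matrix is an assumption inferred from the numbers, not a setting documented in the text, and the function name is hypothetical.

```python
def average_profit(error_rate, gain_correct=1.0, loss_wrong=-1.0):
    """Expected profit per case for a given error rate.

    Assumes a simple profit matrix: gain_correct for every correct
    classification and loss_wrong for every misclassification.
    """
    return (1.0 - error_rate) * gain_correct + error_rate * loss_wrong


for label, error_rate in [("SEVHOSP_EXCYT_52", 0.16667),
                          ("SEVHOSP_INCYTA_52", 0.20667)]:
    print(f"{label}: average profit = {average_profit(error_rate):.3f}")
```

Under this assumption the average profit is simply 1 minus twice the error rate, so the tree with the lower cross-validated error also has the higher profit.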

Figure 3.35: SEVHOSP_INCYTA_52: Receiver operating characteristics (ROC) curve for severity
