development and validation of risk assessment models for diabetes related complications based on the dcct edic data

Journal of Diabetes and Its Complications xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
Journal of Diabetes and Its Complications
journal homepage: WWW.JDCJOURNAL.COM

Development and validation of risk assessment models for diabetes-related complications based on the DCCT/EDIC data

Vincenzo Lagani a,⁎, Franco Chiarugi a, Shona Thomson b, Jo Fursse c, Edin Lakasing c, Russell W. Jones c, Ioannis Tsamardinos a, d

a Institute of Computer Science, Foundation for Research and Technology–Hellas, Heraklion, Greece
b Herts Valley Clinical Commission Group, Hertfordshire, United Kingdom
c Chorleywood Health Center, Chorleywood, United Kingdom
d Computer Science Department, University of Crete, Heraklion, Greece

Article info
Article history: Received 28 November 2014; Received in revised form 10 February 2015; Accepted March 2015; Available online xxxx
Keywords: Risk assessment models; Risk stratification; Risk factors; Diabetes complications; Risk model external validation

Abstract
Aim: To derive and validate a set of computational models able to assess the risk of developing complications and experiencing adverse events for patients with diabetes. The models are developed on data from the Diabetes Control and Complications Trial (DCCT) and the Epidemiology of Diabetes Interventions and Complications (EDIC) studies, and are validated on an external, retrospectively collected cohort.
Methods: We selected fifty-one clinical parameters measured at baseline during the DCCT as potential risk factors for the following adverse outcomes: Cardiovascular Diseases (CVD), Hypoglycemia, Ketoacidosis, Microalbuminuria, Proteinuria, Neuropathy and Retinopathy. For each outcome we applied a data-mining analysis protocol in order to identify the best-performing signature, i.e., the smallest set of clinical parameters that, considered jointly, are maximally predictive for the selected outcome. The predictive models built on the selected signatures underwent both an
internal validation on the DCCT/EDIC data and an external validation on a retrospective cohort of 393 diabetes patients (49 Type I and 344 Type II) from the Chorleywood Medical Center, UK.
Results: The selected predictive signatures contain five to fifteen risk factors, depending on the specific outcome. Internal validation performances, as measured by the Concordance Index (CI), range from 0.62 to 0.83, indicating good predictive power. The models achieved comparable performances on the Type I and, quite surprisingly, the Type II external cohorts.
Conclusions: Data-mining analyses of the DCCT/EDIC data allow the identification of accurate predictive models for diabetes-related complications. We also present initial evidence that these models can be applied to a more recent, European population.
© 2015 Published by Elsevier Inc.

Conflicts of interest: The authors declare that there are no conflicts of interest.
⁎ Corresponding author. N. Plastira 100, Vassilika Vouton, GR-700 13 Heraklion, Crete, Greece. Tel.: +30 2810 391070; fax: +30 2810 391428. E-mail address: vlagani@ics.forth.gr (V. Lagani).

1. Introduction
Computational models for assessing the risk of diabetes-related complications are becoming increasingly prevalent in diabetes clinical research (Palmer, 2013). Risk assessment models can be defined as mathematical tools that evaluate the risk of experiencing an adverse outcome on the basis of a patient's clinical profile. In clinical practice, these models help clinicians stratify patients according to the severity of their condition and the likely evolution of their clinical trajectory. Moreover, devising risk assessment models usually leads to the identification of novel risk factors associated with a given complication. In turn, this knowledge can provide a better understanding of diabetes pathophysiology (Ajmera, Swat, Laibe, Le, & Chelliah, 2013).
We analyzed the information collected during the Diabetes Control and Complications Trial
(DCCT) (The Diabetes Control and Complications Trial Research Group, 1993) and the Epidemiology of Diabetes Interventions and Complications study (EDIC) (Nathan et al., 2005) to derive risk assessment models for seven different diabetes-related complications and adverse events: Cardiovascular Diseases (CVD), Hypoglycemia, Ketoacidosis, Microalbuminuria, Proteinuria, Neuropathy and Retinopathy. In particular, for each complication we tried to identify the minimal set of clinical parameters that, considered jointly, are maximally predictive. Identifying such minimal sets of risk factors leads to models that are easier to interpret and may provide insight into the mechanisms underlying the disease, while the discarded factors are either irrelevant or redundant given the selected ones.

http://dx.doi.org/10.1016/j.jdiacomp.2015.03.001
1056-8727/© 2015 Published by Elsevier Inc.
Please cite this article as: Lagani, V., et al., Development and validation of risk assessment models for diabetes-related complications based on the DCCT/EDIC data, Journal of Diabetes and Its Complications (2015), http://dx.doi.org/10.1016/j.jdiacomp.2015.03.001

Borrowing a notation commonly used in genomic research (Subramanian & Simon, 2010), hereafter we refer to such parsimonious, predictive sets of risk factors as predictive signatures. During our analyses we employed a complex machine-learning protocol (Lagani & Tsamardinos, 2010) in order to simultaneously (a) identify the predictive signatures, (b) derive the best models over the selected signatures and (c) obtain an unbiased assessment of the models' performance on the DCCT/EDIC data (internal validation). Moreover, we retrospectively collected data from 393 diabetes patients, 49 with Type I and 344 with Type II diabetes, followed at the Chorleywood Medical Center (CHC), United Kingdom (UK), in the period 2004–2014. The models were evaluated on this external cohort in order to assess
their transferability to a population whose characteristics differ from those of the DCCT/EDIC cohort. The results of the validation indicate that models trained on a USA/Canada cohort of diabetes patients enrolled in the 1980s can transfer to a cohort of contemporary European patients. Transferability increases when the models are re-calibrated on the new data while retaining the original predictive signatures. This suggests that, while the effect size of each risk factor may change over time and across geographical areas, factors that were highly predictive in the 1980s can still help clinicians correctly stratify diabetes patients according to their risk.

2. Research design and methods

2.1 DCCT/EDIC data
The DCCT design has been described elsewhere (The Diabetes Control and Complications Trial Research Group, 1993). Briefly, 1441 Type I diabetes patients (13 to 39 years of age) were enrolled in the study from 1983 to 1989 and followed, on average, for 6.5 years. The study was designed as a randomized controlled trial, with patients randomly assigned to conventional or intensive insulin therapy. Two distinct cohorts were enrolled: the primary intervention cohort comprised patients with albumin excretion ≤ 40 mg/24 h, no retinopathy and a diabetes duration of 1 to 5 years, while the secondary intervention cohort comprised subjects with a longer history of diabetes (1 to 15 years), mild to moderate non-proliferative diabetic retinopathy, and albumin excretion rate ≤ 200 mg/24 h. An exhaustive clinical examination was performed at baseline (including medical history, physical examination, electrocardiogram and laboratory analyses), while patients' conditions and risk factors were re-assessed annually, with glycosylated hemoglobin measured quarterly (The DCCT Research Group, 1987). In 1994, 1394 of the original 1441 DCCT patients (97%) agreed to participate in a long-term follow-up, the EDIC study, whose main objective was to collect prospective data on the evolution of macrovascular and microvascular complications (Epidemiology of Diabetes Interventions and Complications (EDIC) Research Group, 1999). The EDIC followed the same methods as the DCCT, with only minor modifications in the measurement schedule: glycosylated hemoglobin was measured annually, while fasting lipid levels and renal function were re-assessed every two years.
For our analyses we selected fifty-one clinical parameters measured at the DCCT baseline (listed in a table in the Supplementary Material). These clinical parameters were selected by a panel of clinical practitioners as those commonly used today in the treatment of diabetes; the remaining parameters were either measured during the DCCT solely for research purposes or are no longer employed in clinical practice. This selection was performed in order to align our results with the procedures followed in modern clinical settings.

2.2 Outcomes definition
We defined seven different outcomes, each one corresponding to a severe diabetes-related complication or adverse event. Several studies (Nathan et al., 2005; The Diabetes Control and Complications Trial Research Group, 1993, 1995a, 1995b, 1995c, 1995d, 1997) have defined and studied similar diabetes-related complications on the DCCT/EDIC data. Whenever possible, we adopted the same definitions used in these previous works.

2.2.1 Cardiovascular disease (CVD)
Following Nathan et al. (2005), we define CVD as the first occurrence of any of the following events: cardiovascular death, acute myocardial infarction, bypass graft/angioplasty, angina pectoris, cardiac arrhythmia, major ECG abnormality, silent myocardial infarction, congestive heart failure, transient ischemic attack, or arterial event requiring surgery. The relatively young age of the subjects included in the DCCT study led to a particularly low incidence of CVD events: only twenty-eight subjects (1.94%) experienced any macro- or microvascular complication. One of the main objectives of the EDIC study was to record and study the incidence of CVD complications in the DCCT cohort after the end of the DCCT follow-up. Accordingly, we defined two distinct outcomes for cardiovascular disease: the first, hereafter named CVD-DCCT, considers only the DCCT follow-up and includes only the CVD events that occurred during the DCCT; the second, CVD-EDIC, considers the combined DCCT and EDIC follow-up and includes the CVD events that occurred in both studies.

2.2.2 Hypoglycemia and ketoacidosis
The Hypoglycemia and Ketoacidosis outcomes were defined as any serious hypoglycemic or ketoacidosis event, respectively, requiring hospitalization, as reported by the patients at each quarterly visit.

2.2.3 Microalbuminuria and proteinuria
Microalbuminuria was defined as an albumin/creatinine ratio (ACR) greater than or equal to 2.5 mg/mmol (men) or 3.5 mg/mmol (women) (The National Collaborating Centre for Chronic Conditions, 2008), or an albumin concentration greater than or equal to 20 mg/l, while Proteinuria was identified by an ACR greater than or equal to 30 mg/mmol or an albumin concentration greater than or equal to 200 mg/l.

2.2.4 Neuropathy
The Neuropathy outcome was defined as the presence of abnormalities in autonomic function. During the DCCT, Neuropathy was diagnosed on the basis of "physical examination and history confirmed by unequivocal abnormality of either nerve conduction or autonomic nervous system" (The Diabetes Control and Complications Trial Research Group, 1995d). In the CHC validation cohort we used an alternative definition based on the presence of bowel/bladder dysfunction or erectile dysfunction.

2.2.5 Retinopathy
The presence and severity of retinopathy were assessed in the DCCT study according to a scale derived from the Early Treatment Diabetic Retinopathy Study (ETDRS) scale (see Tables 1–2 in The
Diabetes Control and Complications Trial Research Group, 1995e). In current UK clinical practice, the UK Retinopathy Severity (UKRS) scale (The Royal College of Ophthalmologists, 2012) is usually employed. We translated the DCCT–ETDRS measurements into UKRS values according to the conversion schema reported in Table 1.1 of the Diabetic Retinopathy Guidelines (The Royal College of Ophthalmologists, 2012) (see also the corresponding table in the Supplementary Material). After the conversion, we adopted an approach similar to that of The Diabetes Control and Complications Trial Research Group (1995e) and defined a "retinopathy event" as any worsening of the retinal condition that lasted at least six months.

2.3 Derivation of the computational models and internal validation
The goals of our analyses are (a) identifying the best predictive signature for each outcome, (b) fitting a computational risk-assessment model over each signature and (c) assessing the predictive performance of these models. The presence of censoring in the DCCT/EDIC data requires specialized methods to achieve these goals. "Censoring" in this context means that the information about the outcome can be partial; in particular, the data used in this work are affected by right-censoring, i.e., for some subjects the exact time-to-event is not known, and the only available information is that they were event-free up to a given point (the follow-up time).
More formally, the baseline visit of the DCCT data can be represented as a dataset $D$ containing $m = 1441$ diabetes patients, where each patient $i$ is represented as a vector of measurements $x_i$ defined over a set of $n = 51$ risk factors $X = \{X_1, \dots, X_j, \dots, X_n\}$. Each outcome $k$ is represented by a set of pairs $O_k = \{(t_i, \delta_i)\}_{i=1}^{m}$, where $\delta_i$ is a binary variable indicating whether subject $i$ experienced the specific event ($\delta_i = 1$) or not ($\delta_i = 0$), while $t_i$ is the recorded time-to-event or follow-up time. The best signature and predictive model for each outcome are denoted as $X_k^* \subseteq X$ and $M_k^*$, respectively.
Survival Max–Min Parents and Children (SMMPC; Lagani & Tsamardinos, 2010), Lasso Cox Regression (Tibshirani, 1997), Bayesian Variable Selection (BVS; Faraggi & Simon, 1998), and Forward and Univariate Selection (Bøvelstad et al., 2007) were employed as feature selection methods for identifying the best-performing signatures. These feature selection methods are based on different theoretical foundations and assumptions; however, they all attempt to identify a set $X^* \subseteq X$ that is highly predictive with respect to the outcome. Notably, while all methods try to keep $X^*$ parsimonious, only SMMPC provides theoretical guarantees of retrieving a minimal-size $X^*$ (Tsamardinos, Brown, & Aliferis, 2006).
Once a signature $X^*$ is identified, predictive models can be fitted over it. Cox regression (Cox, 1972), Ridge Cox regression (Van Houwelingen, Bruinsma, Hart, Van't Veer, & Wessels, 2006), Accelerated Failure Time (AFT) models (Kalbfleisch & Prentice, 1980), Random Survival Forest (RSF; Ishwaran, Kogalur, Eugene, & Blackstone, 2008) and Support Vector Machine Censored Regression (SVCR; Shivaswamy, Chu, & Jansche, 2007) were employed as regression algorithms for model fitting. All regression methods provide models that can calculate a single-point risk estimate for any new subject $x_{m+1}$, in the form $r_{m+1,k} = M_k^*(x_{m+1})$. These estimates can then be used for ranking patients according to their relative risk. In particular, for (Ridge) Cox Regression and AFT models the risk estimates are given by $r_i = \sum_{j} \beta_j x_{ij}$, where the $\beta_j$ are the coefficients estimated by the regression procedure. SVCR and RSF predictions are given by weighted
combinations of kernel-function products and of single survival-tree predictions, respectively.
Each of these feature selection and regression algorithms requires the user to provide one or more "hyper-parameters", i.e., parameters that are not directly estimated from the data and must be specified a priori. For example, the hyper-parameter λ in Lasso Cox Regression regulates the level of shrinkage of the coefficients and, indirectly, the number of variables included in the regression model. SVCR models require the specification of an appropriate kernel function and cost parameter C. The hyper-parameters used for each method are listed in the Supplementary Material.
We employed a complex experimentation protocol in order to (a) find, for each outcome, the best combination of feature selection and regression algorithms, along with their respective optimal hyper-parameters (model selection), and (b) provide an unbiased assessment of the predictive performance of the selected model (internal validation/performance estimation). Model selection was performed through cross validation. In cross validation, the data are partitioned into $N$ separate folds, and each fold is in turn held out for performance estimation (test set) while the rest of the data (training set) is employed for deriving predictive models. When $N$ is equal to the number of samples, the procedure is called leave-one-out. The configuration that obtains the best average performance over the $N$ folds is then applied to the whole dataset, in order to obtain the final predictive signature $X^*$ and the corresponding model $M^*$. The predictive performance of the final models was assessed through nested cross validation (Statnikov, Aliferis, Tsamardinos, Hardin, & Levy, 2005). Nested cross validation is an extension of the standard cross validation procedure in which an inner loop of cross validation is performed within each training set. The inner loop serves for selecting the best combination of algorithms and
hyper-parameters, while the $N$ test sets of the outer cross validation are used exclusively for performance estimation. The procedure provides a vector $P = \{P_1, \dots, P_N\}$ of estimated performances, whose average value $\bar{P}$ is typically taken as a single-point estimate. Notably, nested cross validation estimates are usually conservative (Tsamardinos, Lagani, & Rakhshani, 2014). Figures in the Supplementary Material provide a visual representation of both procedures.
All performances are measured in terms of the Concordance Index (CI; Uno, Cai, Pencina, D'Agostino, & Wei, 2011). The CI metric is specific to right-censored survival data, and it can be interpreted as the probability that the model correctly ranks two randomly selected subjects in accordance with their actual risk of experiencing a given event. Similarly to the Area Under the Receiver Operating Characteristic Curve for binary classification problems (AUC; Fawcett, 2006), a CI equal to one indicates a perfect ranking in terms of relative risk, while a value of 0.5 indicates a random ordering.
In both nested and standard cross validation, the variables of each training set are standardized to have zero mean and unit standard deviation. Test sets are standardized using the mean and standard deviation of the corresponding training set. Moreover, categorical variables are transformed into sets of binary variables, one for each category. In this way the feature selection methods are free to include in each model only the categories that are relevant for the outcome at hand.

2.4 External validation
Validation data were retrospectively collected from 393 diabetes patients who attended the CHC between 2004 and 2014. Forty-nine patients (12.5%) had Type I diabetes, while the remaining ones were diagnosed with Type II diabetes. For each patient and each outcome, we considered the first visit at which the risk factors included in the corresponding predictive signature were measured. Patients
that had already developed a specific complication at the time of the first visit were not used for the validation of the respective predictive model. Missing values were replaced with the mean or mode of the respective predictor, as calculated on the DCCT baseline data. The data collection procedure produced seven distinct datasets, one per outcome, each including between 274 and 343 subjects, with an average follow-up between 37.6 and 69.4 months. A table in the Supplementary Material describes the distribution of the validation data and compares it with the DCCT cohort.

3. Results

3.1 The predictive signatures and their interplay
The final risk assessment models are reported in Table 1. Each model is composed of five to ten risk factors, for a total of twenty-five risk factors included in at least one model. The regression algorithm chosen by the model selection procedure varied across outcomes: Ridge Cox Regression for the CVD-DCCT, Ketoacidosis and Proteinuria outcomes; Accelerated Failure Time models for CVD-EDIC, Neuropathy and Retinopathy; and linear-kernel Support Vector Machines and Random Survival Forest for Hypoglycemia and Microalbuminuria, respectively. The corresponding optimal feature selection methods are reported in a Supplementary Table.

Table 1
Risk assessment models.
Columns (outcome and selected regression algorithm): CVD-DCCT (Cox Regression); CVD-EDIC (Accelerated Failure Model); Hypoglycemia (Support Vector Machine); Ketoacidosis (Ridge Cox Regression); Microalbuminuria (Random Survival Forest); Proteinuria (Ridge Cox Regression); Neuropathy (Accelerated Failure Model); Retinopathy (Accelerated Failure Model); # Models.
Rows (clinical parameters): HbA1c; Marital Status; Albumin-urine value (mg/24 h); Age; Insulin Regime (Strict/Standard control); Retinopathy level (R0, R1, R2, R3); Total Insulin Daily Dosage (Units/Weight); Post-pubescent diabetes duration (in months); Total diabetes duration (in months); Presence of neuropathy; Patient's occupation; Weight (kg); Smoking (never/ex-smoker/current); Patient's body mass index (kg/m2); Patient attempted suicide; Creatinine Clearance (ml/min); Family history of IDDM; Family history of NIDDM; HDL serum cholesterol (mg/dl); Systolic Blood Pressure; Past history of severe hypoglycemia; Cholesterol (mg/dl); Glomerular filtration rate (ml/min); Gender-specific ideal body weight; Hospitalization(s) due to ketoacidosis in past year; # Parameters.
Cell values: 0.204 −0.146 (Never Married) 0.095 (Divorced) 0.128 23.635 0.267 37.859 0.0001328 5.24E-006 (Widowed) |0.088| |0.043| (Married) 0.15 4.812 −80.667 (Married) 0.236 −0.26 (Married) |0.576| 0.178 5.918 11.394 |0.054| (Strict) −0.036 (Strict) 3 380.5 (Strict) |0.091| (R2) 15.707 132.927 16.113 (R2) 0.00013706 5.778 |0.062| 0.1 0.471 0.129 0.032 −2.61E-005 (Manager) |0.086| 2 −431.822 (ex-smoker) 0.096 0.019 1 14.257 −2.07E-005 −120 −0.1 0.162 141.446 10 3 −0.00011212 0.056 (Manager) 0.074 (Clerical) 0.031 (Laborer) −0.088 (Student) 0.1 −0.114 (Never) 0.128 (Current) 1.437 (R0) −0.931 (R2) 5 0.00010094 8.98E-005 1 −0.00013118 0.00011806 10 5
Each row represents a risk factor, while each column reports a single model. The header shows the outcome of interest for each model along with the regression algorithm selected by the model-selection procedure (see the Methods section). Cells report model coefficients, with empty cells indicating risk factors not included in the corresponding model. Categorical risk factors can have multiple coefficients, one for each category included in the model. The semantics of the coefficients depend on the regression algorithm used: log-hazard ratios for Ridge Cox Regression, survival time multipliers for Accelerated Failure Time models and (linear-kernel) Support Vector Machines, and relative variable importance for Random Survival Forest (see text for more details). The signs of the original AFT and SVCR coefficients have been switched so that positive values indicate an increase in risk in all models. Microalbuminuria coefficients are reported as absolute values whose signs do not reflect an increase or decrease in risk.

Table 2
Results of the internal and external validation of the models.
Model name: CVD-DCCT; CVD-EDIC; Hypoglycemia; Ketoacidosis; Microalbuminuria; Proteinuria; Neuropathy; Retinopathy.
Type I Diabetes Internal Validation (# Events; Average CI; Cross-Validation CI Interval; p-value H0: average CI ≤ 0.5): 28 127 408 130 299 44 149 969 0.7257 0.6204 0.6694 0.6745 0.7421 0.8330 0.6661 0.6564 [0.50962–0.8629] [0.5549–0.69224] [0.58766–0.75118] [0.59412–0.75479] [0.6751–0.77652] [0.53521–0.96223] [0.54626–0.74187] [0.60826–0.6745] 0.0001 ≤0.0001 ≤0.0001 ≤0.0001 ≤0.0001 ≤0.0001 ≤0.0001 ≤0.0001
Type I Diabetes External Validation (# Events; CI; CI 95% Confidence Interval; p-value H0: CI ≤ 0.5): 5 6 17 0.6887 0.4862 0.6903 0.8182 0.824 – 0.735 0.7201 [0.4923–0.86207] [0.18084–0.81984] [0.5–0.8691] [0.23077–1] [0.66234–0.96875] – [0.55102–0.90754] [0.58669–0.8745] 0.0932 0.5246 0.0584 0.0367 0.0078 – 0.0429 0.0025
Type II Diabetes External Validation (# Events; CI; CI 95% Confidence Interval; p-value H0: CI ≤ 0.5): 32 33 – 116 28 20 70 0.7143 0.6099 0.7002 [0.62384–0.80563] [0.50211–0.71809] [0.19012–0.97115] – [0.52144–0.62193] [0.53261–0.77125] [0.32132–0.56216] [0.47399–0.6189] <0.0001 0.0165 0.0084 – 0.0058 0.0027 0.8239 0.119 0.5701 0.6569 0.4359 0.5451
External validation was performed separately on Type I and Type II diabetes patients, while internal validation was performed only on Type I patients (as the DCCT study focused exclusively on Type I diabetes). For the internal validation and for each model (rows) we report the total number of events, the predictive performance expressed as the nested-cross-validated Concordance Index (CI), the interval spanned by the CI values obtained in the outer loop of the nested cross validation, and a p-value assessing the null hypothesis that the CI is less than or equal to 0.5, i.e., that the risk stratification provided by the model is no better than random. For the external validations we report the CI values obtained by applying the final models to the external cohorts, along with the 95% confidence interval estimated through bootstrapping. The p-values for the internal evaluation are calculated through a one-tailed t-test, while for the external evaluation they are obtained through a permutation-based test (see text for more details).

Each regression algorithm produces coefficients with a specific interpretation. Ridge Cox Regression coefficients represent a change in the hazard ratio on the logarithmic scale: for a one-standard-deviation increase (i.e., 1.594%, DCCT scale) in glycated hemoglobin (HbA1c), the hazard of a CVD complication becomes $e^{0.204} \approx 1.23$ times higher. AFT and linear-kernel SVCR coefficients act as linear multipliers for the expected time to event: for the same increase in HbA1c, the expected time before developing Neuropathy decreases by 4.812 months. RSF usually provides highly non-linear models, where the effect of each single risk factor can vary depending on the values of the other predictors. Consequently, covariates in an RSF model do not have a univocal coefficient, i.e., it is not generally possible to assess whether a factor has a protective or deleterious effect. However, a method has been developed for estimating Variable
Importance (VIMP) in RSF models, where the VIMP is proportional to the contribution of the variable to the predictive performance of the model (Ishwaran, 2007). The VIMP values for the Microalbuminuria model in Table 1 have been scaled to sum to one, for ease of comparison. Given these different interpretations, it is not possible to compare effect sizes across different models. However, within each model the absolute value of each coefficient is directly proportional to the effect size of the corresponding predictor, and can be used for ranking the factors against each other. We further set the signs of all coefficients such that positive values indicate an increase in risk while negative values indicate a decrease (except for the VIMP values of the RSF model, which are reported in absolute value).

3.2 Internal and external validation
Table 2 reports and contrasts the results of both internal and external validation. For the internal validation, we report the average CI values obtained on the DCCT data through the nested cross validation procedure. These values represent our expectations of the performance that the models should achieve when applied to a validation cohort drawn from the same population as the training data, i.e., a hypothetical validation cohort collected in the same years, in the same geographical area and with characteristics similar to those of the DCCT data (Tsamardinos et al., 2014). The models' results lie in the range [0.6204–0.8330], meaning that we expect all models to provide a relevant improvement over a random ranking of the patients (CI = 0.5). For all models, the CI is statistically significantly greater than 0.5 (p-value ≤ 0.001, as calculated with a one-tailed t-test). For each model, we also report the interval spanned by the CI values calculated over the outer folds of the nested cross validation procedure.
For the external validation, the final models were applied separately to the Type I and Type II diabetes patients of the Chorleywood cohort. The resulting CI values estimate the predictive ability of the models on a UK-based population collected in recent times. Notably, the models perform surprisingly well, with several of them reaching performances statistically significantly different from random guessing. For each model we also report the bootstrapped estimate (Efron & Tibshirani, 1986) of the 95% confidence interval and a permutation-based p-value assessing the null hypothesis H0: CI ≤ 0.5. These permutation p-values are obtained by comparing the observed CI value with the null distribution obtained by randomly permuting the order of the predictions 10,000 times.
For Type I diabetes, several models achieve a relevant and statistically significant predictive performance, particularly the Microalbuminuria, Neuropathy and Retinopathy models. The CVD-DCCT and Hypoglycemia models are also borderline significant. The validation cohorts for the remaining models contain very few events, and the respective results should be interpreted with caution. The external validation on Type II patients brought positive results as well. In particular, both CVD models, as well as the Microalbuminuria and Proteinuria models, achieve statistically significant results on a relatively large number of events. The results of the Hypoglycemia model are barely significant, but it is interesting to note that this model achieves almost identical results in both the Type I and Type II external cohorts. The Neuropathy and Retinopathy models did not prove to be better than random, and the Ketoacidosis model was not applicable to Type II diabetes patients.

3.3 Calibration and re-assessment of the risk models
The effect of risk factors on the probability of developing diabetes-related complications may differ across geographical areas or over time, for several reasons. For example, the association between a given risk factor and the outcome may be (partially) mediated by a third, unknown and unmeasured quantity. If the value of this third quantity
changes across different places, or over time, then the association between the risk factor and the outcome also changes, or even disappears. It is worth underlining that the DCCT and Chorleywood cohorts were collected in different countries, and that the DCCT data collection started in 1983, while the earliest recorded visit in Chorleywood was performed in 2004 (a difference of more than 20 years). Moreover, treatment options for diabetes patients (Franz et al., 2003; Gallen, 2004) and nutritional habits (Kuklina, Carrol, Shaw, & Hirsch, 2013) have undergone considerable changes during this period. This implies that the models derived from the DCCT data may need to be recalibrated or revised in order to provide accurate predictions on the Chorleywood cohorts, since the effects of the risk factors may differ between the two populations.

We follow the approach suggested by Van Houwelingen (2000) for assessing the calibration of the single-point risk estimates $r$ against a known outcome $O = \{(\delta_i, t_i)\}$, where $\delta_i$ is the event indicator and $t_i$ the observed time of patient $i$. The approach consists of fitting a Cox regression model $h(t \mid r) = h_0(t)\exp(\alpha \cdot r)$, where $h(t \mid r)$ is the hazard at time $t$ given $r$, $h_0$ is the baseline hazard function, and $\alpha$ is the single coefficient of the model. A perfectly calibrated model would produce $\alpha \approx 1$, while higher or lower values would indicate an underestimation or overestimation of the actual risk, respectively. Table shows the calibration Cox regression coefficients for each outcome. The best-calibrated models appear to be those for CVD-DCCT, Proteinuria and Retinopathy (the latter on the Type I cohort only), while all the other models seem to provide predictions that are somewhat overly optimistic or pessimistic.

These results suggest that the models should be revised and re-evaluated on the new data in order to provide more accurate predictions. We therefore re-fitted the coefficients of the models on the external cohorts and assessed the predictive performance of the revised models through cross-validation. Specifically, for each outcome and external cohort we performed a ten-fold cross-validation using the same signature, regression method and hyper-parameter configuration selected on the DCCT/EDIC data. For outcomes with fewer than 10 recorded events we employed a leave-one-out cross-validation scheme, and the performance was calculated on all predictions pooled together. This revision procedure assumes that the signatures selected on the data from the DCCT baseline visits retain their predictive power for the Chorleywood cohort as well. The results of model revision are reported in Table. All the models showed at least a slight improvement in terms of average CI, except for the CVD-DCCT and Hypoglycemia models in the Type II diabetes cohort and for Neuropathy in the Type I cohort. Some models achieve a perfect score (CI = 1), although the limited number of events available for these outcomes suggests that these results should be interpreted with caution.

Please cite this article as: Lagani, V., et al., Development and validation of risk assessment models for diabetes-related complications based on the DCCT/EDIC data, Journal of Diabetes and Its Complications (2015), http://dx.doi.org/10.1016/j.jdiacomp.2015.03.001

Table. Results of models' recalibration and re-assessment.
Columns, reported separately for the Type I diabetes and the Type II diabetes revised models: # Events; Calibration α; Average CI; Cross-validation CI interval; p-value (H0: average CI ≤ 0.5).
Models: CVD–DCCT, CVD–EDIC, Hypoglycemia, Ketoacidosis, Microalbuminuria, Proteinuria, Neuropathy, Retinopathy.
Type I diabetes: 5 6 17 0.4989 −0.0011 0.0015 1502.791 0.0975 – 0.0096 0.6625 0.6422 0.7168 1 – 0.6496 0.7521 – – – – – – – [0.5–1] <0.0001 0.2712 0.0284 <0.0001 <0.0001 – 0.2436 0.0039
Type II diabetes: 32 33 – 116 28 20 70 1.2238 0.0013 0.0018 – 0.0368 1.7807 −0.0036 0.0822 0.6757 0.6621 0.5867 – 0.5932 0.7343 0.5285 0.5664 [0.54386–0.92683] [0.54348–0.83871] – – [0.41146–0.67516] [0.55102–0.94286] [0.076923–0.87234] [0.42063–0.76238] 0.003 0.0002 0.3102 – 0.0023 0.0002 0.3833 0.0381
For each outcome and external cohort, the calibration of the corresponding model is assessed (a) by applying the model to the external cohort and (b) by using the resulting vector of risk scores $r_i$ as a predictor in a Cox regression. Cox coefficients close to one indicate well-calibrated models. The predictive capabilities of the selected signatures are then re-assessed using only the external cohort data. Specifically, for each outcome and external cohort, the predictive performance of the selected signature, regression method and hyper-parameter configuration is assessed through ten-fold cross-validation. For each model and each cohort, the number of events, the calibration Cox regression coefficient α, and the cross-validated CI value along with its interval over the cross-validation folds are reported. The statistical significance of the CI values is assessed through a one-tailed t-test. Outcomes with fewer than 10 recorded events were evaluated with a leave-one-out cross-validation scheme, which allows better performance estimation.

4. Discussion

4.1 Main findings

The main contribution of the present work is the derivation of a set of computational models for assessing the risk of developing diabetes-related complications. The models have been derived from the baseline-visit data of the DCCT study and the DCCT/EDIC follow-up information. Furthermore, the derivation of the models led to the identification of the minimal-size, maximally predictive set of features for each considered outcome, out of an initial set of fifty-one clinical parameters measured at the DCCT baseline visit. Table reports the clinical parameters included in each risk assessment model, along with their respective coefficients. Negative coefficients indicate protective factors, while
factors with positive coefficients are associated with increased risk. The level of glycated hemoglobin (HbA1c) proved to be the most relevant risk factor, being included in seven models out of eight. In particular, high values of HbA1c are associated with an increased risk of developing diabetes-related complications. This is fully in line with the current literature (Huang, Liu, Moffet, John, & Karter, 2011; Marcovecchio, Dalton, Chiarelli, & Dunger, 2011; Weber & Schnell, 2009) and in particular with previous studies on the DCCT cohort (The Diabetes Control and Complications Trial Research Group, 1996).

Our analyses also point out the relevance of marital status for predicting the probability of developing diabetes-related complications and adverse events. Being married is associated with a lower risk of experiencing hypoglycemia or retinopathy worsening. The presence of a spouse is known to have a beneficial effect in different pathologies (Chung, Moser, Lennie, & Riegel, 2006; Goodwin, Hunt, Key, & Samet, 1987; Sugarman, Bauer, Barber, Hayes, & Hughes, 1993), and a recent work has shown that, in heart failure patients, this beneficial effect is mediated by medication adherence (Wu et al., 2014). Thus, a possible explanation for our results is that being married increases adherence to medication or diet, which in turn improves the patient's prognosis. For the CVD and Ketoacidosis models, being divorced or widowed, respectively, increases the risk of experiencing an adverse event. In this case marital status may act as a proxy for the patient's age, since both the divorced and the widowed DCCT sub-cohorts are older than the rest.

The baseline value of the urine albumin excretion rate turns out to be predictive of renal complications (i.e., Microalbuminuria and Proteinuria), a result already known in the medical literature (Newman et al., 2005), as well as of cardiovascular diseases and Neuropathy. The CVD-DCCT
and CVD-EDIC models are in agreement with the CVD risk factors previously identified on the DCCT/EDIC data; in particular, all elements in the signature of the DCCT-EDIC model are listed among the clinical characteristics at DCCT baseline that were significantly associated with cardiovascular disease over the course of the DCCT/EDIC Study (Nathan et al., 2005). The predictive signatures of both the CVD-DCCT and CVD-EDIC models closely resemble the results of different studies focusing on identifying relevant risk factors for cardiovascular complications in diabetes patients. In particular, our results are in good agreement with those of the UK Prospective Diabetes Study (UKPDS). The UKPDS was a landmark randomized controlled trial, conducted over a period of 14 years (1977–1991), that involved 5102 patients followed, on average, for 10.7 years. The study showed that strict control of blood glucose and blood pressure can lower the risk of diabetes-related complications in individuals recently diagnosed with Type II diabetes (Turner & Holman, 1996). Several risk assessment models were developed on the basis of the UKPDS data. The first UKPDS model (Stevens, Kothari, Adler, & Stratton, 2001) included Age, Gender, Race, Smoking, HbA1c, Systolic Blood Pressure and the Total Cholesterol/HDL Cholesterol ratio as predictors, and focused on assessing the probability of developing Coronary Heart Disease (CHD). The second version of the model (Clarke et al., 2004) provides seven different mathematical equations for predicting as many diabetes-related complications (stroke, heart failure, fatal or non-fatal MI, other IHD,
amputation, renal failure and blindness) and three different equations for assessing the risk of mortality. This second model is based on the same predictors as the first one, but it also includes information about the patients' medical history (previous occurrences of diabetes-related adverse events) and physiology (Body Mass Index, BMI). The latest version of the UKPDS model was published recently (Hayes, Leal, Gray, Holman, & Clarke, 2013); it slightly modifies the previous versions by including information about micro- or macro-albuminuria, estimated GFR, heart rate, white blood cell count and hemoglobin. Interestingly, both our CVD models include a subset of the UKPDS predictors, namely Age, Smoking and HbA1c. The CVD-DCCT model also includes Systolic Blood Pressure and Weight, both considered in the latest version of the UKPDS engine. Moreover, our CVD models and the UKPDS models fully agree on the direction of effect of the common predictors, i.e., all common predictors act as risk factors, and never as protective factors.

The Hypoglycemia model suggests that being married and having a family history of non-insulin-dependent diabetes have a protective effect against hypoglycemic events, while a past history of severe hypoglycemia, strict glucose control and a high number of insulin units per kg of body weight significantly increase the patient's risk. It is worth noting that the adverse effect of strict glucose control on the probability of experiencing hypoglycemia was one of the main findings of the DCCT study. In particular, strict glucose control is known to lower the risk of several diabetes-related complications, but not that of hypoglycemia (The Diabetes Control and Complications Trial Research Group, 1995a).

The Ketoacidosis model includes several factors, the most relevant ones being (according to the magnitude of their respective coefficients) HbA1c, Total Insulin Dosage, Post-Pubescent diabetes duration, Cholesterol, Hospitalization(s) due to
ketoacidosis in the past year (risk factors) and Gender-specific ideal body weight (protective factor). To the best of our knowledge, this is the first study providing a predictive model for assessing the risk of experiencing ketoacidosis. Studies investigating the association of clinical parameters with ketoacidosis exist (Egger, Davey Smith, Stettler, & Diem, 1997), but they do not provide quantitative models for estimating the risk of ketoacidosis. These studies generally point out that intensified treatment is associated with the probability of experiencing ketoacidosis, which is in agreement with our results.

The two models related to renal complications (Microalbuminuria and Proteinuria) share several predictive factors, whose relevance in the development of renal complications in diabetes patients is already known in the literature and has also been assessed on the DCCT data (Lopes-Virella et al., 2013): HbA1c (The Diabetes Control and Complications Trial Research Group, 1996), Albumin-urine value over 24 h (Newman et al., 2005), Insulin Regime (The Diabetes Control and Complications Trial Research Group, 1995c) and Total diabetes duration. A recent study (Vergouwe et al., 2010) conducted on 1115 Type I diabetes patients also confirms the relevance of HbA1c and the albumin-urine value for predicting progression to microalbuminuria, while another study (Elley et al., 2013), conducted on a large New Zealand cohort (25,736 Type II diabetes patients) and focusing on End-Stage Renal Disease (ESRD), also identifies HbA1c and Total diabetes duration as relevant risk factors.

The Neuropathy and Retinopathy models also share part of their predictors, particularly HbA1c, the Retinopathy level at baseline, and Post-pubescent diabetes duration, all factors that were found to be associated with low peripheral nerve conduction (an indicator of neuropathy) in a study of 456 individuals with Type I diabetes (Charles et al., 2010). The association between HbA1c and Retinopathy
progression has already been studied and established (The Diabetes Control and Complications Trial Research Group, 1995b).

A further relevant contribution of our study is the validation of the models on the retrospective cohort collected at the Chorleywood Health Center. For the Type I diabetes external cohort, four models out of seven achieved statistically significant results (p-value < 0.05), while two models (CVD-DCCT and Hypoglycemia) achieved appreciable CI performance (0.6887 and 0.6903, respectively), with borderline statistical significance. In the case of the Type II diabetes cohort, five models out of seven achieved results statistically significantly better than random guessing (CI > 0.5). The models' transferability generally increases when they are recalibrated on the new data while the original predictive factors are retained. All revised models perform better in terms of CI than the original models, with the exception of Neuropathy for the Type I cohort and CVD-DCCT/Hypoglycemia for the Type II cohort. However, for these models the revised CI values lie within the 95% confidence interval of the CI of the original models. In general, these results support our hypothesis that the predictive signatures selected on the DCCT/EDIC data are able to give accurate predictions on the cohorts collected in Chorleywood.

4.2 Study limitations

The first relevant limitation of this study is the relatively small number of subjects and adverse events in the external validation cohorts. In some cases the scarcity of recorded events did not allow a precise estimation of the models' performance and of the respective confidence intervals. Thus, our results only suggest that our models successfully transfer across populations; more extensive studies on larger cohorts of Type I and Type II diabetes patients are needed in order to gather further evidence. A further limitation concerns the Hypoglycemia and Ketoacidosis models. Accurately evaluating the
probability of experiencing these adverse events would require short-term information about nutrition and physical activity, which is not present in the list of considered predictors. Despite this limitation, both models achieve a good level of predictive performance in both the internal and the external validation.

5. Conclusions

We used the DCCT/EDIC data to derive a set of computational models for assessing the risk of developing diabetes-related complications in diabetes patients. Each model is defined over a parsimonious set of predictors (clinical parameters) with maximal predictive power for its specific outcome. The predictors included in the models are generally in agreement with the current literature regarding risk factors for diabetes-related complications. When applied to a retrospective validation cohort collected in the UK, the models often provide predictions that are significantly better than random, supporting the hypothesis that the models transfer to a population that is geographically distant from, and more recent than, the one originally examined in the DCCT/EDIC studies. Future work will focus on validating the models on larger cohorts of diabetes patients, both Type I and Type II, in order to further strengthen the results presented here.

Acknowledgements

This work was performed in the framework of the FP7 Integrated Project REACTION (Remote Accessibility to Diabetes Management and Therapy in Operational Healthcare Networks), partially funded by the European Commission under Grant Agreement 248590. The work was also partially funded by the EPILOGEAS GSRT ARISTEIA II project, No. 3446.

The Diabetes Control and Complications
Trial (DCCT) and its follow-up, the Epidemiology of Diabetes Interventions and Complications (EDIC) study, were conducted by the DCCT/EDIC Research Group and supported by National Institute of Health grants and contracts and by the General Clinical Research Center Program, NCRR. The data (and samples) from the DCCT/EDIC study were supplied by the NIDDK Central Repositories. This manuscript was not prepared under the auspices of the DCCT/EDIC study and does not represent analyses or conclusions of the DCCT/EDIC study group, the NIDDK Central Repositories, or the NIH. The authors would also like to thank the medical and technical personnel of the Chorleywood Health Center for their indispensable assistance.

Appendix A. Supplementary data

Supplementary data and methods to this article can be found online at http://dx.doi.org/10.1016/j.jdiacomp.2015.03.001.

References

Ajmera, I., Swat, M., Laibe, C., Le Novère, N., & Chelliah, V. (2013). The impact of mathematical modeling on the understanding of diabetes and related complications. CPT: Pharmacometrics & Systems Pharmacology, 2, e54.
Bøvelstad, H. M., Nygård, S., Størvold, H. L., Aldrin, M., Borgan, Ø., Frigessi, A., et al. (2007). Predicting survival from microarray data: A comparative study. Bioinformatics, 23, 2080–2087.
Charles, M., Soedamah-Muthu, S. S., Tesfaye, S., Fuller, J. H., Arezzo, J. C., Chaturvedi, N., et al. (2010). Low peripheral nerve conduction velocities and amplitudes are strongly related to diabetic microvascular complications in type 1 diabetes: The EURODIAB Prospective Complications Study. Diabetes Care, 33(12), 2648–2653.
Chung, M. L., Moser, D. K., Lennie, T. A., & Riegel, B. (2006). Abstract 2509: Spouses enhance medication adherence in patients with heart failure. Circulation, 114(18_MeetingAbstracts), II_518.
Clarke, P. M., Gray, A. M., Briggs, A., Farmer, A. J., Fenn, P., Stevens, R. J., et al. (2004). A model to estimate the lifetime health outcomes of patients with type 2 diabetes: The United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS no. 68). Diabetologia, 47(10), 1747–1759.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34, 187–220.
Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1(1), 54–75.
Egger, M., Davey Smith, G., Stettler, C., & Diem, P. (1997). Risk of adverse effects of intensified treatment in insulin-dependent diabetes mellitus: A meta-analysis. Diabetic Medicine, 14(11), 919–928.
Elley, C. R., Robinson, T., Moyes, S. A., Kenealy, T., Collins, J., Robinson, E., et al. (2013). Derivation and validation of a renal risk score for people with type 2 diabetes. Diabetes Care, 36, 3113–3120.
Epidemiology of Diabetes Interventions and Complications (EDIC) Research Group (1999). Design, implementation, and preliminary results of a long-term follow-up of the Diabetes Control and Complications Trial cohort. Diabetes Care, 22(1), 99–111.
Faraggi, D., & Simon, R. (1998). Bayesian variable selection method for censored survival data. Biometrics, 54, 1475–1485.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874.
Franz, M. J., Warshaw, H., Daly, A. E., Green-Pastors, J., Arnold, M. S., & Bantle, J. (2003). Evolution of diabetes medical nutrition therapy. Postgraduate Medical Journal, 79(927), 30–35.
Gallen, I. (2004). Review: The evolution of insulin treatment in type diabetes: The advent of analogues. The British Journal of Diabetes & Vascular Disease, 4(6), 378–381.
Goodwin, J. S., Hunt, W. C., Key, C. R., & Samet, J. M. (1987). The effect of marital status on stage, treatment, and survival of cancer patients. JAMA, 258, 3125–3130.
Hayes, A. J., Leal, J., Gray, A. M., Holman, R. R., & Clarke, P. M. (2013). UKPDS outcomes model 2: A new version of a model to simulate lifetime health outcomes of patients with type 2 diabetes mellitus using data from the 30 year United Kingdom Prospective Diabetes Study: UKPDS 82. Diabetologia, 56(9), 1925–1933.
Huang, E. S., Liu, J. Y., Moffet, H. H., John, P. M., & Karter, A. J. (2011). Glycemic control, complications, and death in older diabetic patients. Diabetes Care, 34, 1329–1336. http://dx.doi.org/10.2337/dc10-2377
Ishwaran, H. (2007). Variable importance in binary regression trees and forests. Electronic Journal of Statistics, 1, 519–537.
Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. Annals of Applied Statistics, 2(3), 841–860.
Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis of failure time data. New York: John Wiley and Sons.
Kuklina, E. V., Carrol, M. D., Shaw, K. M., & Hirsch, R. (2013). Trends in high LDL cholesterol, cholesterol-lowering medication use, and dietary saturated-fat intake: United States, 1976–2010. Available from: http://www.cdc.gov/nchs/data/databriefs/db117.pdf
Lagani, V., & Tsamardinos, I. (2010). Structure-based variable selection for survival data. Bioinformatics, 26(15), 1887–1894.
Lopes-Virella, M. F., Baker, N. L., Hunt, K. J., Cleary, P. A., Klein, R., & Virella, G. (2013). Baseline markers of inflammation are associated with progression to macroalbuminuria in type 1 diabetic subjects. Diabetes Care, 36, 2317–2323.
Marcovecchio, M. L., Dalton, R. N., Chiarelli, F., & Dunger, D. B. (2011). A1C variability as an independent risk factor for microalbuminuria in young people with type 1 diabetes. Diabetes Care, 34, 1011–1013.
Nathan, D. M., Cleary, P. A., Backlund, J.-Y. C., Genuth, S. M., Lachin, J. M., Orchard, T. J., et al. (2005). Intensive diabetes treatment and cardiovascular disease in patients with type 1 diabetes. The New England Journal of Medicine, 353, 2643–2653.
Newman, D. J., Mattock, M. B., Dawnay, A. B. S., Kerry, S., McGuire, A., Yaqoob, M., et al. (2005). Systematic review on urine albumin testing for early detection of diabetic complications. Health Technology Assessment, 9(30), iii–vi, xiii–163.
Palmer, A. J. (2013). Computer modeling of diabetes and its complications: A report on the fifth Mount Hood challenge meeting. Value in Health, 16, 670–685.
Shivaswamy, P. K., Chu, W. C. W., & Jansche, M. (2007). A support vector approach to censored targets. In Seventh IEEE International Conference on Data Mining (ICDM 2007).
Statnikov, A., Aliferis, C. F., Tsamardinos, I., Hardin, D., & Levy, S. (2005). A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics, 21(5), 631–643.
Stevens, R. J., Kothari, V., Adler, A. I., & Stratton, I. M. (2001). The UKPDS risk engine: A model for the risk of coronary heart disease in type II diabetes (UKPDS 56). Clinical Science (London), 101, 671–679.
Subramanian, J., & Simon, R. (2010). What should physicians look for in evaluating prognostic gene-expression signatures? Nature Reviews Clinical Oncology, 7(6), 327–334. http://dx.doi.org/10.1038/nrclinonc.2010.60
Sugarman, J. R., Bauer, M. C., Barber, E. L., Hayes, J. L., & Hughes, J. W. (1993). Factors associated with failure to complete treatment for diabetic retinopathy among Navajo Indians. Diabetes Care, 16(1), 326–328.
The DCCT Research Group (1987). Feasibility of centralized measurements of glycated hemoglobin in the Diabetes Control and Complications Trial: A multicenter study. Clinical Chemistry, 33(12), 2267–2271.
The Diabetes Control and Complications Trial Research Group (1993). The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. The New England Journal of Medicine, 329(14), 977–986. http://dx.doi.org/10.1056/NEJM199309303291401
The Diabetes Control and Complications Trial Research Group (1995a). Adverse events and their association with treatment regimens in the Diabetes Control and Complications Trial. Diabetes Care, 18, 1415–1427.
The Diabetes Control and Complications Trial Research Group (1995b). The relationship of glycemic exposure (HbA1c) to the risk of development and progression of retinopathy in the Diabetes Control and Complications Trial. Diabetes, 44, 968–983.
The Diabetes Control and Complications Trial Research Group (1995c). Effect of intensive therapy on the development and progression of diabetic nephropathy in the Diabetes Control and Complications Trial. Kidney International, 47, 1703–1720.
The Diabetes Control and Complications Trial Research Group (1995d). The effect of intensive diabetes therapy on the development and progression of neuropathy. Annals of Internal Medicine, 122(8), 561–568.
The Diabetes Control and Complications Trial Research Group (1995e). The effect of intensive diabetes treatment on the progression of diabetic retinopathy in insulin-dependent diabetes mellitus. Archives of Ophthalmology, 113(1), 36–51.
The Diabetes Control and Complications Trial Research Group (1996). The absence of a glycemic threshold for the development of long-term complications: The perspective of the Diabetes Control and Complications Trial. Diabetes, 45(10), 1289–1298.
The Diabetes Control and Complications Trial Research Group (1997). Clustering of long-term complications in families with diabetes in the Diabetes Control and Complications Trial. Diabetes, 46, 1829–1839.
The National Collaborating Centre for Chronic Conditions (2008). Type 2 diabetes: National clinical guideline for management in primary and secondary care (update).
The Royal College of Ophthalmologists (2012). Diabetic retinopathy guidelines. London.
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31–78.
Tsamardinos, I., Lagani, V., & Rakhshani, A. (2014). Performance-estimation properties of cross-validation-based protocols with simultaneous hyper-parameter optimization. In SETN'14: Proceedings of the 8th Hellenic Conference on Artificial Intelligence.
Turner, R. C., & Holman, R. R. (1996). The UK Prospective Diabetes Study. UK Prospective Diabetes Study Group. Annals of Medicine, 439–444.
Uno, H., Cai, T., Pencina, M. J., D'Agostino, R. B., & Wei, L. J. (2011). On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine, 30, 1105–1117.
Van Houwelingen, H. C. (2000). Validation, calibration, revision and combination of prognostic survival models. Statistics in Medicine, 19, 3401–3415.
Van Houwelingen, H. C., Bruinsma, T., Hart, A. A. M., Van't Veer, L. J., & Wessels, L. F. A. (2006). Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine, 25, 3201–3216.
Vergouwe, Y., Soedamah-Muthu, S. S., Zgibor, J., Chaturvedi, N., Forsblom, C., Snell-Bergeon, J. K., et al. (2010). Progression to microalbuminuria in type 1 diabetes: Development and validation of a prediction rule. Diabetologia, 53, 254–262.
Weber, C., & Schnell, O. (2009). The assessment of glycemic variability and its impact on diabetes-related complications: An overview. Diabetes Technology & Therapeutics, 11, 623–633.
Wu, J.-R., Lennie, T. A., Chung, M. L., Frazier, S. K., Dekker, R. L., Biddle, M. J., et al. (2014). Medication adherence mediates the relationship between marital status and cardiac event-free survival in patients with heart failure. Heart & Lung, 41(2), 107–114.
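The internal and external validation in the Results and Discussion summarize each model with the concordance index (CI), a bootstrapped 95% interval (Efron & Tibshirani, 1986) and a permutation-based p-value for H0: CI ≤ 0.5. The following is a minimal Python sketch of these three quantities, not the authors' implementation. It assumes Harrell's form of the CI (a higher risk score means an earlier expected event, and tied times are simply skipped), a plain percentile bootstrap over patients, and hypothetical function names (`harrell_c`, `permutation_p_value`, `bootstrap_ci`).

```python
import random
from typing import List, Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than subject j (tied times are skipped for
    simplicity). The pair is concordant when i also has the higher risk
    score; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def permutation_p_value(times, events, risk, n_perm: int = 1000,
                        seed: int = 0) -> float:
    """One-sided p-value against the null of no association (expected CI 0.5).

    Risk scores are shuffled across subjects; the p-value is the fraction of
    shuffles whose CI is at least the observed one (with a +1 correction).
    """
    rng = random.Random(seed)
    observed = harrell_c(times, events, risk)
    shuffled = list(risk)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if harrell_c(times, events, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(times, events, risk, n_boot: int = 1000, level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the CI, resampling subjects with replacement."""
    rng = random.Random(seed)
    n = len(times)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(harrell_c([times[k] for k in idx],
                                   [events[k] for k in idx],
                                   [risk[k] for k in idx]))
        except ValueError:
            continue  # resample without comparable pairs: skip it
    if not stats:
        raise ValueError("no usable bootstrap resamples")
    stats.sort()
    tail = (1.0 - level) / 2.0
    lo = stats[int(tail * (len(stats) - 1))]
    hi = stats[int((1.0 - tail) * (len(stats) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical toy data: follow-up times, event flags (1 = event), risk scores.
    times = [2.0, 4.0, 5.0, 7.0, 9.0, 11.0]
    events = [1, 1, 0, 1, 0, 0]
    risk = [0.9, 0.7, 0.8, 0.4, 0.2, 0.1]
    print(round(harrell_c(times, events, risk), 3))  # 10 of 11 pairs concordant: 0.909
    print(permutation_p_value(times, events, risk, n_perm=200))
    print(bootstrap_ci(times, events, risk, n_boot=200))
```

With the few events available in some external-cohort outcomes, both the permutation distribution and the bootstrap interval become coarse, which is consistent with the caution expressed in the Study limitations.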

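The calibration check described in the Results, following Van Houwelingen (2000), amounts to fitting a one-covariate Cox model $h(t \mid r) = h_0(t)\exp(\alpha \cdot r)$ in which the covariate is the model's own risk score $r_i$, and then reading off $\alpha$. A minimal pure-Python sketch follows. It assumes Newton-Raphson maximization of the Cox partial likelihood with the Breslow convention for tied event times; the function name `calibration_slope` and the toy data are hypothetical, and an established survival package would normally be used in practice.

```python
import math
from typing import Sequence


def calibration_slope(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float], max_iter: int = 50,
                      tol: float = 1e-10) -> float:
    """Fit h(t | r) = h0(t) * exp(alpha * r) and return alpha.

    Maximizes the Cox partial likelihood with Newton-Raphson, using the
    Breslow convention for tied event times. The risk scores r are the
    predictions of an existing model; alpha close to 1 suggests good
    calibration.
    """
    n = len(times)
    alpha = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # minus its second derivative (observed information)
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: subjects still under observation at subject i's event time.
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(alpha * risk[j])
                    s0 += w
                    s1 += w * risk[j]
                    s2 += w * risk[j] ** 2
            mean = s1 / s0
            score += risk[i] - mean
            info += s2 / s0 - mean ** 2
        if info <= 0.0:
            raise ValueError("zero information: alpha is not identifiable")
        step = score / info
        alpha += step
        if abs(step) < tol:
            return alpha
    raise RuntimeError("Newton-Raphson did not converge; the data may be separable")


if __name__ == "__main__":
    # Hypothetical external-cohort data: times, event flags and model risk scores.
    times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    events = [1, 1, 0, 1, 1, 0, 1, 0]
    risk = [0.8, 0.3, 0.6, 0.9, 0.2, 0.5, 0.4, 0.1]
    print(round(calibration_slope(times, events, risk), 4))
```

Because the log partial likelihood is concave in $\alpha$, Newton-Raphson usually converges in a few iterations. If every event has the highest (or lowest) score in its risk set, the likelihood has no finite maximum and the sketch stops with an error instead of returning a value.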