development in the United States are classified into four phases by the FDA. In each progressive phase, the number of patients, complexity, duration, and costs of the trial increase. The complete process is long; the average interval from drug development to market is approximately 10 years.58 The longest portion of this interval is usually drug testing in clinical trials, which collectively occur over several years. Phase I trials are dosing trials, in which a small number of human subjects receive several doses of the study drug to assess its pharmacokinetics, pharmacodynamics, and side effects in either healthy volunteers or patients with terminal conditions with few remaining treatment options. A phase II trial evaluates drug efficacy, usually in the patient population of ultimate interest, while continuing to monitor safety and side effects in more subjects over a longer period. Finally, phase III trials are rigorously designed RCTs, with strictly defined outcomes or clinical end points, enrolling hundreds to thousands of patients across multiple centers. Phase IV trials are postmarket trials conducted following drug approval by the FDA. These studies, in addition to consumer and clinician reporting of drug-associated safety concerns, allow for ongoing evaluation of rare adverse events associated with the drug and evaluation in new populations.58

Hypothesis Testing and Determining the Study Result

Inference and Estimate of Effect

The results of research studies are judged by their reliability and validity. A trial is reliable if, when repeated under the same circumstances, it yields the same results. A study has internal validity if the results are real and not due to bias, chance, or confounding; it has external validity if its results can be generalized to a broader population. A clinical trial observes the effect of an intervention in a small sample of patients; however, researchers want to generalize these results to the entire (theoretical) population. Statistical analysis allows this generalization to be made. Based on the study design and the distribution of study measurements, researchers choose an appropriate statistical test to compare the study results against the null hypothesis. For binary outcome measures (e.g., mortality), the estimate of effect is usually expressed as a relative risk or risk ratio: the proportion of subjects with the outcome in one group divided by the proportion of subjects with the outcome in the second group. Alternatively, relative odds or an odds ratio may be reported. Odds are calculated as the ratio of the number of events to nonevents, and the odds ratio is the odds of the event in one group divided by the odds of the event in the second group. For rare events, the odds ratio will approximate the relative risk. The odds ratio is amenable to mathematical operations and can be generated from logistic regression analyses, which can include adjustment for confounding factors in calculating the estimate of effect.
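The relationship between these two measures can be illustrated with a small worked example. The Python sketch below uses hypothetical counts (20 deaths among 100 treated patients versus 40 among 100 controls), not data from any trial cited in this chapter, to compute a relative risk and an odds ratio from the same two-by-two data.

```python
# Hypothetical trial results (illustrative counts only):
# 20 of 100 treated patients and 40 of 100 control patients died.
treated_events, treated_total = 20, 100
control_events, control_total = 40, 100

risk_treated = treated_events / treated_total                        # 0.20
risk_control = control_events / control_total                        # 0.40
relative_risk = risk_treated / risk_control                          # 0.50

odds_treated = treated_events / (treated_total - treated_events)     # 20/80 = 0.25
odds_control = control_events / (control_total - control_events)     # 40/60 = 0.667
odds_ratio = odds_treated / odds_control                             # 0.375

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
```

Because the outcome here is common (20% and 40%), the odds ratio (0.38) lies farther from 1.0 than the relative risk (0.50); with rare outcomes the two measures converge, as noted above.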
Whom to Analyze?

All randomized studies should be analyzed based on original group assignment, or intention to treat. By including all patients randomized into each group, analyzed by the treatment that they were intended to receive, all consequences of the treatment and all benefits of the balance achieved by randomization are preserved. It can be tempting to analyze based on actual receipt or completion of treatment, termed per-protocol analysis, which excludes patients who crossed over between treatment groups or excludes patients after randomization for other reasons. While these analyses will reduce dilution and possibly identify a greater treatment effect, they will also lose the benefits of randomization and will introduce selection bias into the result.4 However, the most appropriate approach to analysis depends on the trial. Pragmatic trials, which aim to test the real-world performance of an intervention in a broad population of patients in a clinical setting, differ from explanatory trials, which aim to identify the biological effect of an intervention in a more idealized setting.4 In pragmatic trials in which loss to follow-up and incomplete adherence are an expected part of the intervention, per-protocol analysis may help translate trial results to alternate clinical settings in which differences in adherence, demographics, and other factors may substantially influence the effect of the intervention.59

Subgroup analyses focus on the effects of a treatment within a particular group of study participants, such as women, those within a given age stratum, or members of a specific race. These analyses are typically performed when there is a suspicion, based on observational data or biology, that the treatment effects may differ among groups, also known as effect modification or interaction. Since these analyses involve smaller numbers compared with the whole trial, they are typically underpowered to definitively identify treatment effects. Also, performing multiple additional statistical tests increases the likelihood of a type I error, or identifying an effect by chance simply because so many tests were done. For these reasons, subgroup analyses should be prespecified and limited in number.60

95% Confidence Interval

Next, the authors consider the certainty of the estimate of effect observed in the study. The 95% confidence interval (CI) describes the range of true effect values that would plausibly yield the observed effect in the study. If the estimate of effect in a study is a relative risk reduction of 50%, then a 95% CI of 40% to 60% indicates that the study could reasonably have obtained this result if the true effect was anywhere from 40% to 60%. A higher-powered study will usually achieve a narrower 95% CI, giving more certainty about the true magnitude of effect. When comparing binary outcomes, a CI that crosses 100% (or 1.0) for an odds ratio or relative risk translates to no statistically significant difference between groups. Similarly, for continuous outcomes, a CI for the difference between groups that crosses 0 (i.e., no difference) translates to no statistically significant difference.
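To show how such an interval behaves, the sketch below applies the standard log-transformation (normal approximation) method for a relative risk to the same hypothetical counts used earlier; the counts, and the use of 1.96 as the 95% critical value, are illustrative assumptions rather than output from any study discussed here.

```python
import math

# Hypothetical counts (same illustrative numbers as before): 20/100 vs. 40/100 deaths.
a, n1 = 20, 100   # events and total in the treatment group
c, n2 = 40, 100   # events and total in the control group

rr = (a / n1) / (c / n2)
# Approximate standard error of ln(RR) for two independent proportions.
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)
lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The resulting interval (roughly 0.32 to 0.79) excludes 1.0, so this hypothetical comparison would be statistically significant at the conventional .05 level; a smaller trial with the same event rates would yield a wider interval that might cross 1.0.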
Statistical Analysis and Reporting

P Values

Statistical testing assesses whether the results support rejecting the null hypothesis within some margin of error. Researchers calculate the probability that the observed difference (or one greater) would have occurred by chance if the null hypothesis were true; this probability is the P value. A low probability indicates that the results are unlikely to have occurred by chance and would support rejecting the null hypothesis. Notably, even study results with an imprecise estimate of effect (e.g., those with a very wide 95% CI, or a wide range of true values that could be consistent with the observed results) can meet statistical significance (i.e., P < .05). The method for calculating the P value varies by study design and outcome measure. Parametric tests assume that the outcome measure has a normal distribution, indicating that it can be fully described by its mean and standard deviation. For this kind of outcome, a t-test is used to compare means from one or two samples; the analysis of variance test is used to compare means from more than two samples. Nonparametric tests do not depend on a normal distribution, but because they make fewer assumptions about the distribution of the data, they are typically less powerful. These can be more appropriate for data that are highly skewed (e.g., length of stay, which is typically right skewed, as some patients have very long lengths of stay) or otherwise expected to have a nonnormal distribution. Nonparametric tests include the Wilcoxon rank sum or Mann-Whitney U test (for unpaired comparison of the median in two groups) and the Kruskal-Wallis test (for comparison of medians across multiple groups). Different tests are used for categorical data (e.g., ethnicity, gender, pediatric operational performance category score). The chi-square test compares observed to expected values in a table. Fisher's exact test is similar but calculates a more accurate P value when small cell numbers are present. McNemar's test is used for paired categorical data. For time-to-event data, survival curves are often used to display data. This allows investigators to handle varying times of observation prior to events, as well as partial data from patients who are observed for some time but not observed to have an event during follow-up (censored data). A hazard ratio is typically reported for survival data, describing the ratio of "hazard" rates (moment-to-moment outcome or event rates) between groups.

Additional Sources and Mitigation of Bias

Bias results in differences between study populations that are not due to chance. Some features of design that reduce or eliminate bias, including study population selection, randomization, and blinding, have already been discussed. Some additional features of study analysis can further reduce bias. Bias due to loss of data occurs when data from subjects are eliminated from the final analyses. Protocol violations, postrandomization exclusion, or unequal dropout or loss to follow-up can all result in missing data. Data that are unequally missing between groups, or missing not at random, can introduce bias. For example, if the study treatment was poorly tolerated by a subgroup of the study population, then those patients might be more likely to drop out, and their data would be missing from the final study results. The total amount of missing data can also impact study results. One review of 71 major RCTs in top-tier medical journals identified that 13 of the trials (18%) were missing outcome data on 20% or more of enrolled subjects.61 Imputation, sensitivity analyses, and other advanced statistical methods can be used to explore how much the results might be affected by missing data.

Additional Methods of Exploring Study Results

A clear distinction should be made between relative risk reduction and absolute risk reduction in study results. A reduction in mortality from 60% to 20% and a reduction from 3% to 1% both represent a relative risk reduction (1 minus the risk ratio) of approximately 67%. However, the absolute risk reduction, or the difference in risk between groups, is substantially different: 40% versus 2%. The number needed to treat (NNT) is a related statistic that describes what number of patients would have to be exposed to the intervention to result in one "saved" outcome, equal to 1 divided by the absolute risk reduction. In the examples above, the NNT would be only 2.5 patients for the intervention that reduces mortality from 60% to 20% but would be 50 patients for the intervention that reduces mortality from 3% to 1%.7 The NNT can facilitate comparing results from different studies and consideration of side effects, costs, and other aspects of an intervention. It can also be adjusted for a particular patient's baseline risk compared with the average risk of patients in the study; among patients with twice the baseline risk of the outcome, the NNT would be cut in half.62 These terms and additional common calculations are summarized in eTables 11.1 and 11.2.
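As a worked version of these calculations, the short sketch below reproduces the chapter's two examples (mortality falling from 60% to 20% and from 3% to 1%); the helper function risk_summary is illustrative only, not a standard library routine.

```python
def risk_summary(risk_control, risk_treated):
    """Return relative risk reduction, absolute risk reduction, and NNT."""
    rr = risk_treated / risk_control
    rrr = 1 - rr                        # relative risk reduction
    arr = risk_control - risk_treated   # absolute risk reduction
    nnt = 1 / arr                       # number needed to treat
    return rrr, arr, nnt

# The chapter's two examples: mortality 60% -> 20% and 3% -> 1%.
for rc, rt in [(0.60, 0.20), (0.03, 0.01)]:
    rrr, arr, nnt = risk_summary(rc, rt)
    print(f"{rc:.0%} -> {rt:.0%}: RRR {rrr:.0%}, ARR {arr:.0%}, NNT {nnt:.1f}")
```

Both interventions cut mortality to one-third of baseline, so the relative risk reduction is identical, yet the NNT differs twenty-fold, which is why absolute measures matter for clinical decision-making.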
The fragility index, another method used to describe study results, refers to the number of outcomes (or events) that would have to be changed to nonoutcomes in order to raise the P value from statistically significant (classically, <.05) to nonsignificant, or to raise the likelihood that the observed study results occurred by chance beyond a threshold of acceptability.63-65 Across 43 published RCTs in pediatric critical care, a median of only two event switches would have been required to alter the study results from statistically significant to nonsignificant.66 This methodology can only be applied to studies with a binary outcome and is subject to the same methodologic concerns as the P value itself.67
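A minimal sketch of the fragility index calculation is shown below. It assumes SciPy is available for Fisher's exact test, uses hypothetical counts, and adopts one common convention (switching events to nonevents in the group with more events until the two-sided P value is no longer below .05); published implementations differ in the exact convention used.

```python
from scipy.stats import fisher_exact  # assumes SciPy is installed

def fragility_index(events_a, total_a, events_b, total_b, alpha=0.05):
    """Count how many events in the group with more events must be switched to
    nonevents before the two-sided Fisher exact P value reaches alpha.
    Returns 0 if the comparison is not statistically significant to begin with."""
    # Apply the switches to the group with the larger number of events.
    if events_a < events_b:
        events_a, total_a, events_b, total_b = events_b, total_b, events_a, total_a

    switches = 0
    _, p = fisher_exact([[events_a, total_a - events_a],
                         [events_b, total_b - events_b]])
    while p < alpha and events_a - switches > 0:
        switches += 1
        _, p = fisher_exact([[events_a - switches, total_a - events_a + switches],
                             [events_b, total_b - events_b]])
    return switches

# Hypothetical trial: 10 of 100 events with treatment vs. 25 of 100 with control.
print(fragility_index(10, 100, 25, 100))
```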
Negative Studies

The design and analysis of interventional trials are geared toward testing a null hypothesis. Failing to reject the null hypothesis (i.e., not finding evidence of a difference) is not the same as finding evidence of no difference. Evidence of no difference would be a study in which the estimate of effect was close to unity, with a high degree of confidence in the result (i.e., a narrow CI). While many clinicians refer to a "negative" study as one in which P > .05, this indicates simply a reasonable likelihood that the study result could have arisen by chance. If the estimate of effect favored the intervention but P > .05, the study may be merely underpowered to identify a true effect. Clinicians should not necessarily conclude that such a study supports equal outcomes between treatment and control.

Conclusions

Good trial design is more important than statistical analysis. Once a trial is completed, shortcomings in design cannot be mitigated, whereas statistical analyses can be modified or corrected. The most common shortcomings in trial design are the introduction of, or failure to accommodate, bias and imprecision in estimating the treatment effect, leading to an inability to address the initial study question.

Key References

Koepsell TD, Weiss NS. Randomized trials. In: Epidemiologic Methods: Studying the Occurrence of Illness. 1st ed. New York: Oxford University Press; 2003.
Piantadosi S. Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken, NJ: John Wiley and Sons, Inc; 2005.
Pocock SJ, McMurray JJ, Collier TJ. Making sense of statistics in clinical trial reports: part of a 4-part series on statistics for clinical trials. J Am Coll Cardiol. 2015;66(22):2536-2549.
Pocock SJ, Clayton TC, Stone GW. Design of major randomized trials: part of a 4-part series on statistics for clinical trials. J Am Coll Cardiol. 2015;66(24):2757-2766.

The full reference list for this chapter is available at ExpertConsult.com.

eTABLE 11.1 Two-by-Two Table for Calculations

                          DISEASE OR OUTCOME
    Test or exposure      Present        Absent
    Positive              a              b
    Negative              c              d

eTABLE 11.2 Selected Terms and Definitions

Absolute risk reduction (ARR): The difference in event rates in treated patients compared with control patients. Note that the order is reversed compared with the attributable risk (see below). ARR = [c/(c + d)] - [a/(a + b)]

Ascertainment bias: Observer bias; bias introduced by study staff or investigators knowing or being able to determine treatment group assignment in randomized studies.

Attributable risk (AR): The effect of an exposure on the risk of disease in those exposed compared with those unexposed. AR = (Frequency in exposed group) - (Frequency in unexposed group) = [a/(a + b)] - [c/(c + d)]

Blinding: Obscuring study treatment group assignment from individuals in a trial (patients, research study staff, investigators, and/or other clinical providers).

Confidence interval (CI): The range of values likely to include the true value for the entire population. The standard is 95%, in which 95% of such intervals will contain the true population mean.

Confounding: An effect of a third factor, one associated with both a predictor and an outcome (but not on the causal pathway between the predictor and outcome), that may influence the observed effect of a predictor on an outcome.

Effect modification: An effect of a third factor that influences the magnitude of the observed effect of a predictor on an outcome.

Estimate of effect: The observed effect of an intervention in a particular study, usually presented along with an estimate of a range of effect sizes that would be consistent with the study's result (e.g., a relative risk and its 95% confidence interval).

Explanatory trials: Designed to observe the true biological effect, or efficacy, of an intervention, typically under tightly controlled circumstances.

Intention-to-treat analysis: Data are analyzed according to the groups to which subjects were assigned, regardless of what treatment subjects actually received (analyzed as randomized).

Negative predictive value (NPV): The proportion of people with a negative test who are free of disease. NPV = d/(c + d)

Number needed to treat (NNT): The number of patients needed to treat to achieve one outcome. It is the inverse of the absolute risk reduction. NNT = 1/ARR = 1/{[c/(c + d)] - [a/(a + b)]}

Odds: The ratio of events to nonevents (i.e., chances of something happening divided by chances against something happening). This is not the same as risk (which has a different denominator; see definition below). The odds of getting heads when flipping a coin are 1:1 (one to one).

Odds ratio (OR), or relative odds: The odds of an event in a treated patient versus the odds in a control patient. In case-control studies, relative risk (RR) cannot be calculated because subjects are selected on the basis of outcome, not exposure. For rare outcomes (e.g., <10% of the population), RR can be estimated by OR. OR = (a/c)/(b/d) = ad/bc

Per-protocol analysis: Analysis based on subjects who received the intended intervention or adhered to treatment; this analysis loses the benefits of randomization but may be helpful in pragmatic studies.

Positive predictive value (PPV): The proportion of people with a positive test who have disease. PPV = a/(a + b)

Pragmatic trials: Designed to test the effectiveness of an intervention in a real-world scenario, often involving a clinical environment and a broadly selected population.

P value: The probability that the observed difference, or a larger one, would have been found by chance in a particular study if no effect is truly present.

Sensitivity: The proportion of people with disease who have a positive test. Sensitivity = a/(a + c)

Specificity: The proportion of people free of disease who have a negative test. Specificity = d/(b + d)

Relative risk (RR): The risk of development of disease in the exposed group relative to those who were not exposed (also called risk ratio). RR = (Prevalence in exposed group)/(Prevalence in unexposed group) = [a/(a + b)]/[c/(c + d)]

Relative risk reduction (RRR): Percent reduction in events in treated versus untreated groups. RRR = (1 - [a/(a + b)]/[c/(c + d)]) × 100%

Risk (probability): The ratio of events to all possible events (i.e., the chances of something happening divided by the total number of chances). The risk (probability) of getting heads when flipping a coin is 0.5, or 50%.

Type I error (α): The chance that a difference between the treated and control groups studied is found when, in reality, there is no difference.

Type II error (β): The chance that no difference between the treated and control groups studied is found when, in reality, there is a difference.

Power (1 - β): Statistical power is the ability of an experiment to observe a significant difference between groups when a difference truly exists. Power is equal to 1 minus the type II error (β).

Validity: Internal validity refers to results that are real and not due to bias, chance, or confounding. External validity refers to results that can be generalized to a broader population.
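For readers who prefer code to formulas, the sketch below implements the eTable 11.1 and 11.2 calculations directly from the four cell counts; the function two_by_two_measures and the example counts are illustrative only.

```python
def two_by_two_measures(a, b, c, d):
    """Compute the eTable 11.2 quantities from the eTable 11.1 cells:
    a = exposed/test-positive with the outcome, b = exposed/test-positive without,
    c = unexposed/test-negative with the outcome, d = unexposed/test-negative without."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        "attributable_risk": risk_exposed - risk_unexposed,
        "absolute_risk_reduction": risk_unexposed - risk_exposed,
        "NNT": 1 / (risk_unexposed - risk_exposed),
    }

# Hypothetical counts: 20 treated patients with the outcome and 80 without,
# versus 40 controls with the outcome and 60 without.
print(two_by_two_measures(20, 80, 40, 60))
```

With these hypothetical counts the relative risk is 0.50, the odds ratio 0.375, the absolute risk reduction 20%, and the NNT 5, consistent with the earlier worked example.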
References

1. Piantadosi S. The study cohort; Treatment allocation. In: Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken, NJ: John Wiley and Sons; 2005.
2. Koepsell TD, Weiss NS. Overview of study designs. In: Epidemiologic Methods: Studying the Occurrence of Illness. New York: Oxford University Press; 2003.
3. Pocock SJ, Clayton TC, Stone GW. Design of major randomized trials: part of a 4-part series on statistics for clinical trials. J Am Coll Cardiol. 2015;66:2757-2766.
4. Koepsell TD, Weiss NS. Randomized trials. In: Epidemiologic Methods: Studying the Occurrence of Illness. New York: Oxford University Press; 2003.
5. Johnson N, Lilford RJ, Brazier W. At what level of collective equipoise does a clinical trial become ethical? J Med Ethics. 1991;17:30-34.
6. Doig GS, Simpson F. Efficient literature searching: a core skill for the practice of evidence-based medicine. Intensive Care Med. 2003;29:2119-2127.
7. Pocock SJ, McMurray JJ, Collier TJ. Making sense of statistics in clinical trial reports: part of a 4-part series on statistics for clinical trials. J Am Coll Cardiol. 2015;66:2536-2549.
8. Lin Y, Zhu M, Su Z. The pursuit of balance: an overview of covariate-adaptive randomization techniques in clinical trials. Contemp Clin Trials. 2015;45:21-25.
9. Suresh K. An overview of randomization techniques: an unbiased assessment of outcome in clinical research. J Hum Reprod Sci. 2011;4:8-11.
10. Horng S, Miller FG. Ethical framework for the use of sham procedures in clinical trials. Crit Care Med. 2003;31:S126-S130.
11. Savulescu J, Wartolowska K, Carr A. Randomised placebo-controlled trials of surgery: ethical analysis and guidelines. J Med Ethics. 2016;42:776-783.
12. Menon K, McNally JD, Zimmerman JJ, et al. Primary outcome measures in pediatric septic shock trials: a systematic review. Pediatr Crit Care Med. 2017;18:e146-e154.
13. Quartin AA, Schein RM, Kett DH, Peduzzi PN, for the Department of Veterans Affairs Systemic Sepsis Cooperative Studies Group. Magnitude and duration of the effect of sepsis on survival. JAMA. 1997;277:1058-1063.
14. Kaplan V, Clermont G, Griffin MF, et al. Pneumonia: still the old man's friend? Arch Intern Med. 2003;163:317-323.
15. Herridge MS, Chu LM, Matte A, et al. The RECOVER program: disability risk groups and 1-year outcome after 7 or more days of mechanical ventilation. Am J Respir Crit Care Med. 2016;194:831-844.
16. Volakli EA, Sdougka M, Drossou-Agakidou V, Emporiadou M, Reizoglou M, Giala M. Short-term and long-term mortality following pediatric intensive care. Pediatr Int. 2012;54:248-255.
17. Pinto NP, Rhinesmith EW, Kim TY, Ladner PH, Pollack MM. Long-term function after pediatric critical illness: results from the survivor outcomes study. Pediatr Crit Care Med. 2017;18:e122-e130.
18. Matsumoto N, Hatachi T, Inata Y, Shimizu Y, Takeuchi M. Long-term mortality and functional outcome after prolonged paediatric intensive care unit stay. Eur J Pediatr. 2019;178:155-160.
19. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. Prognosis in acute organ-system failure. Ann Surg. 1985;202:685-693.
20. Raffin TA. Intensive care unit survival of patients with systemic illness. Am Rev Respir Dis. 1989;140:S28-S35.
21. Leteurtre S, Duhamel A, Salleron J, Grandbastien B, Lacroix J, Leclerc F. PELOD-2: an update of the Pediatric Logistic Organ Dysfunction score. Crit Care Med. 2013;41:1761-1773.
22. Matics TJ, Sanchez-Pinto LN. Adaptation and validation of a pediatric Sequential Organ Failure Assessment score and evaluation of the Sepsis-3 definitions in critically ill children. JAMA Pediatr. 2017;171:e172352.
23. Weinstein MC, Stason WB. Foundations of cost-effectiveness analysis for health and medical practices. N Engl J Med. 1977;296:716-721.
24. Predicting outcome in ICU patients. 2nd European Consensus Conference in Intensive Care Medicine. Intensive Care Med. 1994;20:390-397.
25. Yagiela LM, Barbaro RP, Quasney MW, et al. Outcomes and patterns of healthcare utilization after hospitalization for pediatric critical illness due to respiratory failure. Pediatr Crit Care Med. 2019;20:120-127.
26. Oczkowski WJ, Barreca S. The functional independence measure: its use to identify rehabilitation needs in stroke survivors. Arch Phys Med Rehabil. 1993;74:1291-1294.
27. Enright PL, Sherrill DL. Reference equations for the six-minute walk in healthy adults. Am J Respir Crit Care Med. 1998;158:1384-1387.
28. Pollack MM, Holubkov R, Glass P, et al. Functional Status Scale: new pediatric outcome measure. Pediatrics. 2009;124:e18-e28.
29. Ridley SA, Wallace PG. Quality of life after intensive care. Anaesthesia. 1990;45:808-813.
30. Tarlov AR, Ware JE Jr, Greenfield S, Nelson EC, Perrin E, Zubkoff M. The Medical Outcomes Study: an application of methods for monitoring the results of medical care. JAMA. 1989;262:925-930.
31. Visser MC, Fletcher AE, Parr G, Simpson A, Bulpitt CJ. A comparison of three quality of life instruments in subjects with angina pectoris: the Sickness Impact Profile, the Nottingham Health Profile, and the Quality of Well Being Scale. J Clin Epidemiol. 1994;47:157-163.
32. Kaplan RM, Atkins CJ, Timms R. Validity of a quality of well-being scale as an outcome measure in chronic obstructive pulmonary disease. J Chronic Dis. 1984;37:85-95.
33. Chelluri L, Grenvik AN, Silverman M. Intensive care for critically ill elderly: mortality, costs, and quality of life. Review of the literature. Arch Intern Med. 1995;155:1013-1022.
34. Slatyer MA, James OF, Moore PG, Leeder SR. Costs, severity of illness and outcome in intensive care. Anaesth Intensive Care. 1986;14:381-389.
35. Varni JW, Seid M, Kurtin PS. PedsQL 4.0: reliability and validity of the Pediatric Quality of Life Inventory version 4.0 generic core scales in healthy and patient populations. Med Care. 2001;39:800-812.
36. Leteurtre S, Martinot A, Duhamel A, et al. Development of a pediatric multiple organ dysfunction score: use of two strategies. Med Decis Making. 1999;19:399-410.
37. Doughty L, Carcillo JA, Kaplan S, Janosky J. Plasma nitrite and nitrate concentrations and multiple organ failure in pediatric sepsis. Crit Care Med. 1998;26:157-162.
38. Leteurtre S, Martinot A, Duhamel A, et al. Validation of the paediatric logistic organ dysfunction (PELOD) score: prospective, observational, multicentre study. Lancet. 2003;362:192-197.
39. Proulx F, Fayon M, Farrell CA, Lacroix J, Gauthier M. Epidemiology of sepsis and multiple organ dysfunction syndrome in children. Chest. 1996;109:1033-1037.
40. Graciano AL, Balko JA, Rahn DS, Ahmad N, Giroir BP. The Pediatric Multiple Organ Dysfunction Score (P-MODS): development and validation of an objective scale to measure the severity of multiple organ dysfunction in critically ill children. Crit Care Med. 2005;33:1484-1491.
41. Typpo KV, Petersen NJ, Hallman DM, Markovitz BP, Mariscalco MM. Day 1 multiple organ dysfunction syndrome is associated with poor functional outcome and mortality in the pediatric intensive care unit. Pediatr Crit Care Med. 2009;10:562-570.
42. Schlapbach LJ, Straney L, Bellomo R, MacLaren G, Pilcher D. Prognostic accuracy of age-adapted SOFA, SIRS, PELOD-2, and qSOFA for in-hospital mortality among children with suspected infection admitted to the intensive care unit. Intensive Care Med. 2018;44:179-188.