
Statistics for Bioengineering Sciences (Vidakovic, 2011) – book




DOCUMENT INFORMATION

Pages: 759
File size: 12.71 MB

Content

Brani Vidakovic
Statistics for Bioengineering Sciences: With MATLAB and WinBUGS Support

Brani Vidakovic, Department of Biomedical Engineering, Georgia Institute of Technology, 2101 Whitaker Building, 313 Ferst Drive, Atlanta, Georgia 30332-0535, USA. brani@bme.gatech.edu

Series Editors: George Casella, Department of Statistics, University of Florida, Gainesville, FL 32611-8545, USA; Stephen Fienberg, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA; Ingram Olkin, Department of Statistics, Stanford University, Stanford, CA 94305, USA.

ISSN 1431-875X
ISBN 978-1-4614-0393-7
e-ISBN 978-1-4614-0394-4
DOI 10.1007/978-1-4614-0394-4
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2011931859

© Springer Science+Business Media, LLC 2011. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper. Springer is part of Springer Science+Business Media (www.springer.com).

Preface

This text is a result of many semesters of teaching introductory statistical courses to engineering students at Duke University and the Georgia Institute of Technology. Through its scope and depth of coverage, the text addresses the needs of the vibrant and rapidly growing engineering fields, bioengineering and biomedical engineering, while implementing software that engineers are familiar with.

There are many good introductory statistics books for engineers on the market, as well as many good introductory biostatistics books. This text is an attempt to put the two together as a single textbook heavily oriented to computation and hands-on approaches. For example, the aspects of disease and device testing, sensitivity, specificity and ROC curves, epidemiological risk theory, survival analysis, and logistic and Poisson regressions are not typical topics for an introductory engineering statistics text. On the other hand, the books in biostatistics are not particularly challenging for the level of computational sophistication that engineering students possess.

The approach enforced in this text avoids the use of mainstream statistical packages in which the procedures are often black-boxed. Rather, the students are expected to code the procedures on their own. The results may not be as flashy as they would be if the specialized packages were used, but the student will go through the process and understand each step of the program. The computational support for this text is the MATLAB programming environment, since this software is predominant in the engineering communities. For instance, Georgia Tech has developed a practical introductory course in computing for engineers (CS1371 – Computing for Engineers) that relies on MATLAB. Over 1,000 students take this class per semester, as it is a requirement for all engineering students and a prerequisite for many upper-level courses.
In addition to the synergy of engineering and biostatistical approaches, the novelty of this book is in the substantial coverage of Bayesian approaches to statistical inference.

I avoided taking sides on the traditional (classical, frequentist) vs. Bayesian approach; it was my goal to expose students to both approaches. It is undeniable that classical statistics is overwhelmingly used in conducting and reporting inference among practitioners, and that Bayesian statistics is gaining in popularity, acceptance, and usage (FDA, Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials, February 2010). Many examples in this text are solved using both the traditional and Bayesian methods, and the results are compared and commented upon. This diversification is made possible by advances in Bayesian computation and the availability of the free software WinBUGS, which provides painless computational support for Bayesian solutions. WinBUGS and MATLAB communicate well due to the free interface software MATBUGS. The book also relies on the stat toolbox within MATLAB.

The World Wide Web (WWW) facilitates the text. All custom-made MATLAB and WinBUGS programs (compatible with MATLAB 7.12 (2011a) and WinBUGS 1.4.3 or OpenBUGS 3.2.1) as well as data sets used in this book are available on the Web: http://springer.bme.gatech.edu/. To keep the text as lean as possible, solutions and hints to the majority of exercises can be found on the book's Web site.

The computer scripts and examples are an integral part of the text. All MATLAB codes and outputs are shown in blue typewriter font, all WinBUGS programs are given in red-brown typewriter font, and the comments in MATLAB and WinBUGS codes are presented in green typewriter font. Three icons are used to point to data sets, MATLAB codes, and WinBUGS codes, respectively.

The difficulty of the material in the text necessarily varies. More difficult sections that may be omitted in the basic coverage are denoted by a star, ∗. However, it is my experience that advanced undergraduate bioengineering students affiliated with school research labs need and use the "starred" material, such as functional ANOVA, variance stabilizing transforms, and nested experimental designs, to name just a few. Tricky or difficult places are marked with Donald Knuth's "bend."

Each chapter starts with a box titled WHAT IS COVERED IN THIS CHAPTER and ends with chapter exercises, a box called MATLAB AND WINBUGS FILES AND DATA SETS USED IN THIS CHAPTER, and chapter references. The examples are numbered, and the end of each example is marked with an end-of-example symbol.

I am aware that this work is not perfect and that many improvements could be made with respect to both exposition and coverage. Thus, I would welcome any criticism and pointers from readers as to how this book could be improved.

Acknowledgments

I am indebted to many students and colleagues who commented on various drafts of the book. In particular I am grateful to colleagues from the Department of Biomedical Engineering at the Georgia Institute of Technology and Emory University and their undergraduate and graduate advisees/researchers who contributed with real-life examples and exercises from their research labs. Colleagues Tom Bylander of the University of Texas at San Antonio, John H. McDonald of the University of Delaware, and Roger W. Johnson of the South Dakota School of Mines & Technology kindly gave permission to use their data and examples. I also acknowledge Mathworks' statistical gurus Peter Perkins and Tom Lane for many useful conversations over the last several years.
Several MATLAB codes used in this book come from the MATLAB Central File Exchange forum. In particular, I am grateful to Antonio Trujillo-Ortiz and his team (Universidad Autonoma de Baja California) and to Giuseppe Cardillo (Merigen Research) for their excellent contributions.

The book benefited from the input of many diligent students when it was used either as a supplemental reading or later as a draft textbook for a semester-long course at Georgia Tech: BMED2400 Introduction to Bioengineering Statistics. A complete list of students who provided useful comments would be quite long, but the most diligent ones were Erin Hamilton, Kiersten Petersen, David Dreyfus, Jessica Kanter, Radu Reit, Amoreth Gozo, Nader Aboujamous, and Allison Chan.

Springer's team kindly helped along the way. I am grateful to Marc Strauss and Kathryn Schell for their encouragement and support and to Glenn Corey for his knowledgeable copyediting. Finally, it hardly needs stating that the book would have been considerably less fun to write without the unconditional support of my family.

Brani Vidakovic
School of Biomedical Engineering
Georgia Institute of Technology
brani@bme.gatech.edu

Contents

Preface
1. Introduction
   Chapter References
2. The Sample and Its Properties
   2.1 Introduction
   2.2 A MATLAB Session on Univariate Descriptive Statistics
   2.3 Location Measures
   2.4 Variability Measures
   2.5 Displaying Data
   2.6 Multidimensional Samples: Fisher's Iris Data and Body Fat Data
   2.7 Multivariate Samples and Their Summaries*
   2.8 Visualizing Multivariate Data
   2.9 Observations as Time Series
   2.10 About Data Types
   2.11 Exercises
   Chapter References
3. Probability, Conditional Probability, and Bayes' Rule
   3.1 Introduction
   3.2 Events and Probability
   3.3 Odds
   3.4 Venn Diagrams*
   3.5 Counting Principles*
   3.6 Conditional Probability and Independence
       3.6.1 Pairwise and Global Independence
   3.7 Total Probability
   3.8 Bayes' Rule
   3.9 Bayesian Networks*
   3.10 Exercises
   Chapter References
4. Sensitivity, Specificity, and Relatives
   4.1 Introduction
   4.2 Notation
       4.2.1 Conditional Probability Notation
   4.3 Combining Two or More Tests
   4.4 ROC Curves
   4.5 Exercises
   Chapter References
5. Random Variables
   5.1 Introduction
   5.2 Discrete Random Variables
       5.2.1 Jointly Distributed Discrete Random Variables
   5.3 Some Standard Discrete Distributions
       5.3.1 Discrete Uniform Distribution
       5.3.2 Bernoulli and Binomial Distributions
       5.3.3 Hypergeometric Distribution
       5.3.4 Poisson Distribution
       5.3.5 Geometric Distribution
       5.3.6 Negative Binomial Distribution
       5.3.7 Multinomial Distribution
       5.3.8 Quantiles
   5.4 Continuous Random Variables
       5.4.1 Joint Distribution of Two Continuous Random Variables
   5.5 Some Standard Continuous Distributions
       5.5.1 Uniform Distribution
       5.5.2 Exponential Distribution
       5.5.3 Normal Distribution
       5.5.4 Gamma Distribution
       5.5.5 Inverse Gamma Distribution
       5.5.6 Beta Distribution
       5.5.7 Double Exponential Distribution
       5.5.8 Logistic Distribution
       5.5.9 Weibull Distribution
       5.5.10 Pareto Distribution
       5.5.11 Dirichlet Distribution
   5.6 Random Numbers and Probability Tables
   5.7 Transformations of Random Variables*
   5.8 Mixtures*
   5.9 Markov Chains*
   5.10 Exercises
   Chapter References
6. Normal Distribution
   6.1 Introduction
   6.2 Normal Distribution
       6.2.1 Sigma Rules
       6.2.2 Bivariate Normal Distribution*
   6.3 Examples with a Normal Distribution
   6.4 Combining Normal Random Variables
   6.5 Central Limit Theorem
   6.6 Distributions Related to Normal
       6.6.1 Chi-square Distribution
       6.6.2 (Student's) t-Distribution
       6.6.3 Cauchy Distribution
       6.6.4 F-Distribution
       6.6.5 Noncentral χ², t, and F Distributions
       6.6.6 Lognormal Distribution
   6.7 Delta Method and Variance Stabilizing Transformations*
   6.8 Exercises
   Chapter References
7. Point and Interval Estimators
   7.1 Introduction
   7.2 Moment Matching and Maximum Likelihood Estimators
       7.2.1 Unbiasedness and Consistency of Estimators
   7.3 Estimation of a Mean, Variance, and Proportion
       7.3.1 Point Estimation of Mean
       7.3.2 Point Estimation of Variance
       7.3.3 Point Estimation of Population Proportion
   7.4 Confidence Intervals
       7.4.1 Confidence Intervals for the Normal Mean
       7.4.2 Confidence Interval for the Normal Variance
       7.4.3 Confidence Intervals for the Population Proportion
       7.4.4 Confidence Intervals for Proportions When X = 0
       7.4.5 Designing the Sample Size with Confidence Intervals
   7.5 Prediction and Tolerance Intervals*
   7.6 Confidence Intervals for Quantiles*
   7.7 Confidence Intervals for the Poisson Rate*
   7.8 Exercises
   Chapter References
8. Bayesian Approach to Inference
   8.1 Introduction
   8.2 Ingredients for Bayesian Inference
   8.3 Conjugate Priors
   8.4 Point Estimation
   8.5 Prior Elicitation

19 Bayesian Inference Using Gibbs Sampling – BUGS Project

…diag, auto cor) are now active. Return to Update Tool and select the desired number of simulations, say 100,000, in the updates subwindow. Press the update button (Fig. 19.6a). Return to Sample Monitor Tool and check trace for the part of the MC trace for α, history for the complete trace, density for a density estimator of α, etc. For example, pressing the stats button will produce something like the following table:

        mean    sd      MC error  val2.5pc  median  val97.5pc  start  sample
alpha   2.996   0.5583  0.001742  1.941     2.998   4.041      1001   100000

The mean 2.996 is the Bayes estimator (as the mean from the sample from the posterior for α). There are two precision outputs, sd and MC error. The former is an estimator of the standard deviation of the posterior and can be improved by increasing the sample size but not the number of simulations. The latter is the simulation error and can be improved by additional simulations. The 95% credible set (1.941, 4.041) is determined by val2.5pc and val97.5pc, which are the 0.025 and 0.975 (empirical) quantiles from the posterior. The empirical median of the posterior is given by median. The outputs start and sample show the starting index for the simulations (after burn-in) and the available number of simulations.

Fig. 19.6 (a) Select the simulation size and update. (b) After the simulation is done, check the stats node.

For all parameters a comparative table (Fig. 19.6b) is as follows:

        mean    sd      MC error  val2.5pc  median  val97.5pc  start  sample
alpha   2.996   0.5583  0.001742  1.941     2.998   4.041      1001   100000
beta    0.7987  0.3884  0.001205  0.06345   0.7999  1.537      1001   100000
sigma   1.014   0.7215  0.004372  0.4134    0.8266  2.765      1001   100000
tau     1.865   1.533   0.006969  0.1308    1.463   5.852      1001   100000

We recall the least squares estimators from the beginning of this session: α̂ = 3, β̂ = 0.8, and τ̂ = 1.875, and note that their Bayesian counterparts are very close.
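The entries of such a table are ordinary summaries of the simulated chain, so they are easy to reproduce once a trace is available in MATLAB. The following is a minimal sketch, not taken from the book: the chain is simulated here purely as a stand-in for an exported trace of α, and the batch-means MC error is only one of several possible estimators, so it need not match WinBUGS' value exactly.

% Sketch (not from the book): reproducing the "stats" summaries from a chain.
rng(1);
chain = 3 + 0.56 * randn(100000, 1);          % stand-in for an exported trace of alpha

postMean   = mean(chain);                     % Bayes estimator (posterior mean)
postSD     = std(chain);                      % "sd": posterior standard deviation
postMedian = median(chain);                   % "median"
credSet    = quantile(chain, [0.025 0.975]);  % "val2.5pc" and "val97.5pc"

% Simple batch-means estimate of the simulation (MC) error.
nb    = 50;                                   % number of batches
m     = floor(numel(chain) / nb);             % batch length
bmns  = mean(reshape(chain(1:nb*m), m, nb));  % batch means
mcErr = std(bmns) / sqrt(nb);

fprintf('mean %.3f  sd %.4f  MC error %.5f  2.5%% %.3f  median %.3f  97.5%% %.3f\n', ...
        postMean, postSD, mcErr, credSet(1), postMedian, credSet(2));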
Densities (smoothed histograms) and traces for all parameters are given in Fig. 19.7.

Fig. 19.7 Checking (a) density and (b) trace in the Sample Monitor Tool.

If you want to save the trace for α in a file and process it in MATLAB, select coda, and the data window will open along with an information window. Keep the data window active and select Save As from the File menu. Save the αs in alphas.txt, where it will be ready to be imported into MATLAB. Later in this chapter we will discuss the direct interface between WinBUGS and MATLAB, called MATBUGS.

19.3 Built-in Functions and Common Distributions in WinBUGS

This section contains two tables: one with the list of built-in functions and another with the list of available distributions. A first-time WinBUGS user may be disappointed by the selection of built-in functions – the set is minimal but sufficient. The full list of distributions in WinBUGS can be found in Manuals > OpenBUGS User Manual. WinBUGS also allows for the inclusion of distributions for which functions are not built in. Table 19.2 provides a list of important continuous and discrete distributions, with their syntax and parametrizations. WinBUGS has the capability to define custom distributions, both as a likelihood and as a prior, via the so-called zero-tricks (p. 296).

Fig. 19.8 Traces of the four parameters from a simple example: (a) α, (b) β, (c) τ, and (d) σ from WinBUGS. Data are plotted in MATLAB after being exported from WinBUGS.

19.4 MATBUGS: A MATLAB Interface to WinBUGS

There is strong motivation to interface WinBUGS with MATLAB. Cutting and pasting results from WinBUGS is cumbersome if the simulation size is in millions or if the number of simulated parameters is large. Also, the data manipulation and graphical capabilities in WinBUGS are quite rudimentary compared to MATLAB.
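As a small illustration of this point, a trace exported through coda (for example, the alphas.txt file mentioned above) can be post-processed in MATLAB in a few lines. This is a sketch under the assumption that the saved file contains whitespace-separated numeric columns with the sampled values in the last column; the exact layout of a coda export may differ, so adjust the indexing as needed.

% Sketch: post-processing a WinBUGS trace saved as alphas.txt (see text).
% Assumes a plain numeric file whose last column holds the sampled values.
raw    = load('alphas.txt');
alphas = raw(:, end);                      % sampled values of alpha

figure
subplot(1, 2, 1)
plot(alphas)                               % trace plot
xlabel('iteration'), ylabel('\alpha')

subplot(1, 2, 2)
[f, x] = ksdensity(alphas);                % smoothed posterior density
plot(x, f)
xlabel('\alpha'), ylabel('density')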
step(y) if y ≥ 0; otherwise sum(y) i yi trunc(y) greatest integer less than or equal to y MATBUGS is a MATLAB program that communicates with WinBUGS The program matbugs.m was written by Kevin Murphy and his team and can be found at: http://code.google.com/p/matbugs We now demonstrate how to solve Jeremy’s IQ problem in MATLAB by calling WinBUGS First we need to create a simple text file, say, jeremy.txt: model{ for(i in : N) { scores[i] ~ dnorm(theta, tau) } theta ~ dnorm(mu, xi) and then run the MATLAB file: dataStruct = struct( ’N’, 5, ’tau’,1/80, ’xi’,1/120, ’mu’,110, ’scores’,[97 110 117 102 98]); x ∼ dt(mu, tau, k) x ∼ dunif(a, b) x ∼ dweib(v, lambda) Student-t Uniform Weibull Multinomial x[] ∼ dmnorm(mu[], T[,]) p[] ∼ ddirch(alpha[]) Multivariate Student-t x[] ∼ dmt(mu[], T[,], k) Wishart x[,] ∼ dwish(R[,], k) Dirichlet Multivariate normal x ∼ dgamma(a, b) x ∼ dnorm(mu, tau) x ∼ dpar)alpha,c) Gamma Normal Pareto x[] ∼ dmulti(p[], N) x ∼ ddexp(mu, tau) x ∼ dexp(lambda) x ∼ dflat() x ∼ dchisqr(k) WinBUGS code x ∼ dbern(p) x ∼ dbin(p, n) x ∼ dcat(p[]) x ∼ dpois(lambda) x ∼ dbeta(a,b) Chi-square Double exponential Exponential Flat Distribution Bernoulli Binomial Categorical Poisson Beta (2π) −(k+ d)/2 exp{−1/2(x − µ) T(x − µ)}, x ∈ R exp{−λ x }, x, v, λ > 0, + 1k (x − µ) T(x − µ) , x ∈ Rd , k ≥ k/2 (k− p−1)/2 | R | | x| exp{−1/2T r(Rx)} Γ((k+ d)/2) |T |1/2 Γ(k/2) k d/2 πd/2 |T | vλ x τ τ −(k+1)/2 , x ∈ R, k ≥ kπ [1 + k (x − µ) ] , a ≤ x ≤ b b−a v−1 v x i )! xi i p i , i x i = N, < p i < 1, i p i = i xi ! Γ( i α i ) α i −1 , < p i < 1, i p i = i pi i Γ(α i ) − d/2 1/2 d i Γ((k+1)/2) Γ(k/2) ( b a xa−1 Γ(a) x ≥ 0, k > exp{−τ| x − µ|}, x ∈ R, τ > 0, µ ∈ R λ exp{−λ x}, x ≥ 0, λ ≥ constant; not a proper density exp(− bx), x, a, b > τ/(2π) exp{− 2τ (x − µ)2 }, x, µ ∈ R, τ > α cα x−(α+1) , x > c τ x k/1−1 exp{− x/2} , 2k/2 Γ(k/2) Density p x (1 − p)1− x , x = 0, 1; ≤ p ≤ n x n− x , x = 0, , n; ≤ p ≤ x p (1 − p) p[x], x = 1, 2, , dim(p) λx x! 
exp{−λ}, x = 0, 1, 2, , λ > a−1 (1 − x)b−1 , = x ≤ 1, a, b > −1 B(a,b) x Table 19.2 Built-in distributions with WinBUGS names and their parameterizations 742 19 Bayesian Inference Using Gibbs Sampling – BUGS Project 19.4 MATBUGS: A MATLAB Interface to WinBUGS 743 initStruct = struct( ’theta’, 100 ); cd(’C:\MyBugs\matbugs\’) [samples, stats] = matbugs(dataStruct, fullfile(pwd, ’jeremy.txt’), ’init’, initStruct, ’nChains’, 1, ’view’, 0, ’nburnin’, 2000, ’nsamples’, 50000, ’thin’, 1, ’monitorParams’, {’theta’}, ’Bugdir’, ’C:/Program Files/BUGS’); baymean = mean(samples.theta) frmean=mean(dataStruct.scores) figure(1) [p, x] = ksdensity(samples.theta); plot(x, p); 0.12 0.1 0.08 0.06 0.04 0.02 85 90 95 100 105 110 115 120 125 Fig 19.9 Posterior for Jeremy’s data set Data are plotted in MATLAB after being exported from WinBUGS by MATBUGS 744 19 Bayesian Inference Using Gibbs Sampling – BUGS Project 19.5 Exercises 19.1 A Coin and a Die The following WinBUGS code simulates flips of a coin The outcome H is coded by and T by Mimic this code to simulate rolls of a fair die #coin model{ flip ~ dcat(p.coin[]) coin Compute P (10 < X < 16) using (a) exact integration, 10 (b) MATLAB’s expcdf, and (c) WinBUGS 19.4 WinBUGS as a Calculator WinBUGS can approximate definite integrals, solve nonlinear equations, and even find values of definite integrals over random intervals The following WinBUGS program finds an approxiπ mation to sin(x)dx, solves the equation y5 − 2y = 0, and finds the integral R z (1 − z )dz, where R is a beta B e(2, 2) random variable The solution is given by the following code: model{ F(x)
