Applied Bayesian Modelling. Peter Congdon
Copyright © 2003 John Wiley & Sons, Ltd.
ISBN: 0-471-48695-7
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: David J. Balding, Peter Bloomfield, Noel A. C. Cressie,
Nicholas I. Fisher, Iain M. Johnstone, J. B. Kadane, Louise M. Ryan,
David W. Scott, Adrian F. M. Smith, Jozef L. Teugels
Editors Emeriti: Vic Barnett, J. Stuart Hunter and David G. Kendall
A complete list of the titles in this series appears at the end of this volume.
Applied Bayesian Modelling
PETER CONGDON
Queen Mary, University of London, UK
Copyright © 2003 John Wiley & Sons Ltd,
The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England
Telephone (44) 1243 779777
Email (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on www.wileyeurope.com or www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or
transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or
otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of
a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T
4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be
addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate,
Chichester, West Sussex PO19 8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to (44)
1243 770620.
This publication is designed to provide accurate and authoritative information in regard to the subject
matter covered. It is sold on the understanding that the Publisher is not engaged in rendering
professional services. If professional advice or other expert assistance is required, the services of a
competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street,
Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street,
San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr.
12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton,
Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01,
Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road,
Etobicoke, Ontario, Canada M9W 1L1
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
may not be available in electronic books.
Library of Congress Cataloging-in-Publication Data
Congdon, Peter.
Applied Bayesian modelling / Peter Congdon.
p. cm. - (Wiley series in probability and statistics)
Includes bibliographical references and index.
ISBN 0-471-48695-7 (cloth : alk. paper)
1. Bayesian statistical decision theory. 2. Mathematical statistics. I. Title. II. Series.
QA279.5 .C649 2003
519.542-dc21 2002035732
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0 471 48695 7
Typeset in 10/12 pt Times by Kolam Information Services, Pvt. Ltd., Pondicherry, India
Printed and bound in Great Britain by Biddles Ltd, Guildford, Surrey.
This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at
least two trees are planted for each one used for paper production.
Contents
Preface
Chapter 1 The Basis for, and Advantages of, Bayesian Model Estimation via Repeated Sampling
1.1 Introduction
1.2 Gibbs sampling
1.3 Simulating random variables from standard densities
1.4 Monitoring MCMC chains and assessing convergence
1.5 Model assessment and sensitivity
1.6 Review
References
Chapter 2 Hierarchical Mixture Models
2.1 Introduction: Smoothing to the Population
2.2 General issues of model assessment: marginal likelihood and other approaches
2.2.1 Bayes model selection using marginal likelihoods
2.2.2 Obtaining marginal likelihoods in practice
2.2.3 Approximating the posterior
2.2.4 Predictive criteria for model checking and selection
2.2.5 Replicate sampling
2.3 Ensemble estimates: pooling over similar units
2.3.1 Mixtures for Poisson and binomial data
2.3.2 Smoothing methods for continuous data
2.4 Discrete mixtures and Dirichlet processes
2.4.1 Discrete parametric mixtures
2.4.2 DPP priors
2.5 General additive and histogram smoothing priors
2.5.1 Smoothness priors
2.5.2 Histogram smoothing
2.6 Review
References
Exercises
Chapter 3 Regression Models
3.1 Introduction: Bayesian regression
3.1.1 Specifying priors: constraints on parameters
3.1.2 Prior specification: adopting robust or informative priors
3.1.3 Regression models for overdispersed discrete outcomes
3.2 Choice between regression models and sets of predictors in regression
3.2.1 Predictor selection
3.2.2 Cross-validation regression model assessment
3.3 Polytomous and ordinal regression
3.3.1 Multinomial logistic choice models
3.3.2 Nested logit specification
3.3.3 Ordinal outcomes
3.3.4 Link functions
3.4 Regressions with latent mixtures
3.5 General additive models for nonlinear regression effects
3.6 Robust Regression Methods
3.6.1 Binary selection models for robustness
3.6.2 Diagnostics for discordant observations
3.7 Review
References
Exercises
Chapter 4 Analysis of Multi-Level Data
4.1 Introduction
4.2 Multi-level models: univariate continuous and discrete outcomes
4.2.1 Discrete outcomes
4.3 Modelling heteroscedasticity
4.4 Robustness in multi-level modelling
4.5 Multi-level data on multivariate indices
4.6 Small domain estimation
4.7 Review
References
Exercises
Chapter 5 Models for Time Series
5.1 Introduction
5.2 Autoregressive and moving average models under stationarity and non-stationarity
5.2.1 Specifying priors
5.2.2 Further types of time dependence
5.2.3 Formal tests of stationarity in the AR(1) model
5.2.4 Model assessment
5.3 Discrete Outcomes
5.3.1 Auto regression on transformed outcome
5.3.2 INAR models for counts
5.3.3 Continuity parameter models
5.3.4 Multiple discrete outcomes
5.4 Error correction models
5.5 Dynamic linear models and time varying coefficients
5.5.1 State space smoothing
5.6 Stochastic variances and stochastic volatility
5.6.1 ARCH and GARCH models
5.6.2 Stochastic volatility models
5.7 Modelling structural shifts
5.7.1 Binary indicators for mean and variance shifts
5.7.2 Markov mixtures
5.7.3 Switching regressions
5.8 Review
References
Exercises
Chapter 6 Analysis of Panel Data
6.1 Introduction
6.1.1 Two stage models
6.1.2 Fixed vs. random effects
6.1.3 Time dependent effects
6.2 Normal linear panel models and growth curves for metric outcomes
6.2.1 Growth Curve Variability
6.2.2 The linear mixed model
6.2.3 Variable autoregressive parameters
6.3 Longitudinal discrete data: binary, ordinal and multinomial and Poisson panel data
6.3.1 Beta-binomial mixture for panel data
6.4 Panels for forecasting
6.4.1 Demographic data by age and time period
6.5 Missing data in longitudinal studies
6.6 Review
References
Exercises
Chapter 7 Models for Spatial Outcomes and Geographical Association
7.1 Introduction
7.2 Spatial regressions for continuous data with fixed interaction schemes
7.2.1 Joint vs. conditional priors
7.3 Spatial effects for discrete outcomes: ecological analysis involving count data
7.3.1 Alternative spatial priors in disease models
7.3.2 Models recognising discontinuities
7.3.3 Binary Outcomes
7.4 Direct modelling of spatial covariation in regression and interpolation applications
7.4.1 Covariance modelling in regression
7.4.2 Spatial interpolation
7.4.3 Variogram methods
7.4.4 Conditional specification of spatial error
7.5 Spatial heterogeneity: spatial expansion, geographically weighted regression, and multivariate errors
7.5.1 Spatial expansion model
7.5.2 Geographically weighted regression
7.5.3 Varying regressions effects via multivariate priors
7.6 Clustering in relation to known centres
7.6.1 Areas vs. case events as data
7.6.2 Multiple sources
7.7 Spatio-temporal models
7.7.1 Space-time interaction effects
7.7.2 Area Level Trends
7.7.3 Predictor effects in spatio-temporal models
7.7.4 Diffusion processes
7.8 Review
References
Exercises
Chapter 8 Structural Equation and Latent Variable Models
8.1 Introduction
8.1.1 Extensions to other applications
8.1.2 Benefits of Bayesian approach
8.2 Confirmatory factor analysis with a single group
8.3 Latent trait and latent class analysis for discrete outcomes
8.3.1 Latent class models
8.4 Latent variables in panel and clustered data analysis
8.4.1 Latent trait models for continuous data
8.4.2 Latent class models through time
8.4.3 Latent trait models for time varying discrete outcomes
8.4.4 Latent trait models for clustered metric data
8.4.5 Latent trait models for mixed outcomes
8.5 Latent structure analysis for missing data
8.6 Review
References
Exercises
Chapter 9 Survival and Event History Models
9.1 Introduction
9.2 Continuous time functions for survival
9.3 Accelerated hazards
9.4 Discrete time approximations
9.4.1 Discrete time hazards regression
9.4.2 Gamma process priors
9.5 Accounting for frailty in event history and survival models
9.6 Counting process models
9.7 Review
References
Exercises
Chapter 10 Modelling and Establishing Causal Relations: Epidemiological Methods and Models
10.1 Causal processes and establishing causality
10.1.1 Specific methodological issues
10.2 Confounding between disease risk factors
10.2.1 Stratification vs. multivariate methods
10.3 Dose-response relations
10.3.1 Clustering effects and other methodological issues
10.3.2 Background mortality
10.4 Meta-analysis: establishing consistent associations
10.4.1 Priors for study variability
10.4.2 Heterogeneity in patient risk
10.4.3 Multiple treatments
10.4.4 Publication bias
10.5 Review
References
Exercises
Index
Preface
This book follows Bayesian Statistical Modelling (Wiley, 2001) in seeking to make the
Bayesian approach to data analysis and modelling accessible to a wide range of
researchers, students and others involved in applied statistical analysis. Bayesian statistical
analysis as implemented by sampling-based estimation methods has facilitated the
analysis of complex multi-faceted problems which are often difficult to tackle using
'classical' likelihood-based methods.
The preferred tool in this book, as in Bayesian Statistical Modelling, is the package
WINBUGS; this package enables a simplified and flexible approach to modelling in
which specification of the full conditional densities is not necessary and so small changes
in program code can achieve a wide variation in modelling options (so, inter alia,
facilitating sensitivity analysis to likelihood and prior assumptions). As Meyer and Yu
in the Econometrics Journal (2000, pp. 198-215) state, "any modifications of a model
including changes of priors and sampling error distributions are readily realised with
only minor changes of the code." Other sophisticated Bayesian software for MCMC
modelling has been developed in packages such as S-Plus, Minitab and Matlab, but is
likely to require major reprogramming to reflect changes in model assumptions; so my
own preference remains WINBUGS, despite its possibly slower performance and convergence
than tailor-made programs.
There is greater emphasis in the current book on detailed modelling questions such as
model checking and model choice, and the specification of the defining components (in
terms of priors and likelihoods) of model variants. While much analytical thought has
been put into how to choose between two models, say $M_1$ and $M_2$, the process
underlying the specification of the components of each model is subject, especially in
more complex problems, to a range of choices. Despite an intention to highlight these
questions of model specification and discrimination, there remains considerable scope
for the reader to assess sensitivity to alternative priors, and other model components.
My intention is not to provide fully self-contained analyses with no issues still to resolve.
The reader will notice many of the usual 'specimen' data sets (the Scottish lip cancer
and the ship damage data come to mind), as well as some more unfamiliar and larger
data sets. Despite recent advances in computing power and speed which allow
estimation via repeated sampling to become a serious option, a full MCMC analysis
of a large data set, with parallel chains to ensure sample space coverage and enable
convergence to be monitored, is still a time-consuming affair.
Some fairly standard divisions between topics (e.g. time series vs panel data analysis)
have been followed, but there is also an interdisciplinary emphasis which means that
structural equation techniques (traditionally the domain of psychometrics and educational
statistics) receive a chapter, as do the techniques of epidemiology. I seek to review
the main modelling questions and cover recent developments without necessarily going
into the full range of questions in specifying conditional densities or MCMC sampling
[...] programs or results via e-mail at p.congdon@qmul.ac.uk. The WINBUGS programs that support
the examples in the book are made available at ftp://ftp.wiley.co.uk/pub/books/congdon.
Peter Congdon
CHAPTER 1
The Basis for, and Advantages of, Bayesian Model Estimation via Repeated Sampling
... non-standard densities. If the full conditionals are non-standard but of a certain mathematical
form (log-concave), then adaptive rejection sampling (Gilks and Wild, 1992) may be used within the
Gibbs sampling for those parameters. In other cases, alternative schemes based on the
Metropolis-Hastings algorithm may be used to sample from non-standard densities (Morgan, 2000).
The program WINBUGS may be applied ...
... Poisson probability of case i could then be evaluated in terms of that parameter. This type of
approach (n-fold cross-validation) may be computationally expensive except in small samples.
Another option is for a large dataset to be randomly divided into a small number k of groups; then
cross-validation may be applied to each partition of the data, with k − 1 groups as 'training'
sample and the remaining group ...
... domain estimation for survey outcomes (Ghosh and Rao, 1994), and meta-analysis across several
studies (Smith et al., 1995). Unlike classical techniques, the Bayesian method allows model
comparison across non-nested alternatives, and again the recent sampling estimation developments
have facilitated ...
1 See, for instance, Example 2.8 on geriatric patient length of stay.
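The k-group cross-validation scheme mentioned in the extract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's WINBUGS code: the function names (`kfold_partitions`, `poisson_logpmf`, `cv_log_score`), the count data, and the use of the training-sample mean as a plug-in Poisson mean (in place of a full posterior) are all hypothetical choices made for the sketch.

```python
import math
import random


def kfold_partitions(n, k, seed=0):
    """Randomly split indices 0..n-1 into k groups; yield (training, held_out) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    groups = [idx[g::k] for g in range(k)]
    for g in range(k):
        held_out = groups[g]
        # The other k - 1 groups together form the training sample.
        training = [i for h in range(k) if h != g for i in groups[h]]
        yield training, held_out


def poisson_logpmf(y, mu):
    """Log of the Poisson probability of count y given mean mu (mu must be > 0)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def cv_log_score(counts, k, seed=0):
    """Sum of held-out Poisson log probabilities, with the training mean as plug-in."""
    total = 0.0
    for training, held_out in kfold_partitions(len(counts), k, seed):
        mu = sum(counts[i] for i in training) / len(training)
        total += sum(poisson_logpmf(counts[i], mu) for i in held_out)
    return total


counts = [3, 0, 2, 5, 1, 4, 2, 3, 1, 2]  # illustrative counts, not data from the book
score = cv_log_score(counts, k=5)
```

A Bayesian analysis would instead average the held-out probability over posterior draws of the mean obtained from the training groups; the partitioning logic would stay the same.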
... chooses the sampling method, opting for standard Gibbs sampling if conjugacy is identified, and
for adaptive rejection sampling (Gilks and Wild, 1992) for non-conjugate problems with log-concave
sampling densities. For non-conjugate problems without log-concavity, Metropolis-Hastings updating
is used, either slice sampling (Neal, 1997) or adaptive sampling (Gilks et al., 1998). To monitor
parameters (i.e. ...
... BUGS the codings
    t[i] ~ dweib(alpha,lambda)
and
    x[i] ~ dexp(lambda)
    t[i] <- pow(x[i],1/alpha)
generate the same density.
1.3.4 Gamma, chi-square and beta densities
The gamma density is central to the modelling of variances in Bayesian analysis, and as a prior for
the Poisson mean. It has the form
$$f(x) = \frac{b^a}{\Gamma(a)} x^{a-1} \exp(-bx), \quad x > 0$$
with mean $a/b$ and ...
... Example 1.3 below, and in a particular kind of multiple random effects model, the
age-period-cohort model (Knorr-Held and Rainer, 2001). Identifiability issues also occur in
discrete mixture regressions (Chapter 3) and structural equation models (Chapter 8) due to label
switching during the MCMC sampling. Such instances of non-identifiability will show as essentially
nonconvergent parameter series between ...
INTRODUCTION
Bayesian analysis of data in the health, social and physical sciences has been greatly facilitated
in the last decade by advances in computing power and improved scope for estimation via iterative
sampling methods. Yet the Bayesian perspective, which stresses the accumulation of knowledge about
parameters in a synthesis of prior knowledge with the data at hand, has a longer history.
Bayesian ...
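The claim in the extract above that the two Weibull codings generate the same density can be checked by simulation. The sketch below is in Python rather than BUGS and assumes the BUGS parameterisation in which dweib(alpha, lambda) has survivor function exp(-lambda t^alpha). The parameter values, seed, sample size and the helper `survival` are arbitrary choices made for illustration.

```python
import math
import random

alpha, lam = 1.5, 2.0   # illustrative shape and rate, chosen arbitrarily
n = 20000
rng = random.Random(1)

# Second coding: exponential draw followed by the power transform t = x^(1/alpha).
t_transform = [rng.expovariate(lam) ** (1.0 / alpha) for _ in range(n)]

# First coding: direct Weibull draw. Python's weibullvariate takes (scale, shape),
# so the rate lam is converted to the scale lam^(-1/alpha).
scale = lam ** (-1.0 / alpha)
t_direct = [rng.weibullvariate(scale, alpha) for _ in range(n)]


def survival(sample, s):
    """Empirical proportion of draws exceeding s."""
    return sum(t > s for t in sample) / len(sample)


s = 0.5
exact = math.exp(-lam * s ** alpha)   # S(s) = exp(-lambda * s^alpha)
```

Both empirical survivor proportions, `survival(t_transform, s)` and `survival(t_direct, s)`, should agree with `exact` up to Monte Carlo error.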
... number of other densities, and hence ways of sampling from them. The chi-square is also used as
a prior for the variance, and is the same as a gamma density with $a = n/2$, $b = 0.5$. Its
expectation is then $n$, usually interpreted as a degrees of freedom parameter. The density (1.6)
above is sometimes known as a scaled chi-square. The chi-square may also be obtained for $n$ an
integer, by taking $n$ draws $x_1$, ...
... $\theta^{(t+1)}, \theta^{(t+2)}, \ldots$, or from more widely spaced sub-samples $K$ steps apart
$\theta^{(t)}, \theta^{(t+K)}, \theta^{(t+2K)}$. Geweke (1992) developed a t-test applicable to
assessing convergence in runs of sampled parameter values, both in single and multiple chain
situations. Let $\bar{\theta}_a$ be the posterior mean of scalar parameter $\theta$ from the first
$n_a$ iterations in a chain (after burn-in), and $\bar{\theta}_b$ be the mean from the last $n_b$
draws.
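The chi-square relations noted in the extract above lend themselves to a quick numerical check. The Python sketch below draws from a gamma density with a = n/2 and rate b = 0.5, and separately sums n squared standard normal draws. The second construction is the standard result for integer n; the extract is cut off before stating it, so reading it that way is an assumption. The degrees of freedom, seed and number of draws are arbitrary.

```python
import random

n = 4            # degrees of freedom, an arbitrary illustrative value
draws = 20000
rng = random.Random(2)

# Gamma(a = n/2, b = 0.5) with b a rate; gammavariate takes (shape, scale),
# so the scale is 1/b = 2.
via_gamma = [rng.gammavariate(n / 2.0, 2.0) for _ in range(draws)]

# Sum of n squared independent standard normal draws.
via_normals = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(draws)]

mean_gamma = sum(via_gamma) / draws
mean_normals = sum(via_normals) / draws
```

Both sample means should lie close to n, the expectation quoted in the text, with the gap shrinking as the number of draws grows.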