Nonparametric modeling of the effects of air pollution on public health


PENG QIAO
(B.Sc., Peking University, China)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY
NATIONAL UNIVERSITY OF SINGAPORE
2005

ACKNOWLEDGEMENTS

For the completion of this thesis, I would like to express my heartfelt gratitude to my supervisor, Assistant Professor Xia Yingcun, for all his invaluable advice and guidance, endless patience, kindness and encouragement during my time in the Department of Statistics and Applied Probability of the National University of Singapore. I have learned many things from him, especially regarding academic research and character building. I truly appreciate all the time and effort he has spent helping me solve my problems, even when he was in the midst of his own work.

I also wish to express my sincere gratitude and appreciation to my other lecturers, including Professors Bai Zhidong, Chen Zehua and Loh Wei Liem, for imparting knowledge and techniques to me and for their precious guidance and help in my study.

I would like to take this opportunity to record my thanks to my dear parents, who have always supported me with their encouragement and understanding. Special thanks go to all of my friends who have contributed to my thesis in one way or another, for their concern and inspiration in my study and life during the past two years. It was a great experience to share those colorful days with them.

Finally, I would like to attribute the completion of this thesis also to the other members and staff of our department, for their help in various ways and for providing such a pleasant studying and working environment.

Peng Qiao
August 2005

Contents
Acknowledgements
Summary
List of Tables
List of Figures
Chapter 1 Introduction
    1.1 Background on Air Pollution
        1.1.1 Particulate Matter (PM)
        1.1.2 Ozone (O3)
        1.1.3 Sulphur Dioxide (SO2)
        1.1.4 Nitrogen Dioxide (NO2)
        1.1.5 Carbon Monoxide (CO)
    1.2 Quantification of Health Effects
    1.3 Objectives and Organization
Chapter 2 Materials
    2.1 Data Source
    2.2 Data Descriptions
Chapter 3 Methodology
    3.1 Dimension Reduction Through Regression
    3.2 Model Selection Through Cross-Validation
Chapter 4 Simulations
Chapter 5 Results and Discussions
    5.1 Preliminary Analysis
    5.2 Dimension Reduction
    5.3 Model Selection
Chapter 6 Concluding Remarks
Bibliography
Appendix A Conditions for Theorem
Appendix B Time-Series Plots
Appendix C Scatter Plot Matrix with Correlations

SUMMARY

This thesis analyzes the effects of exposure to air pollution on public health across 15 populous cities in the United States, based on daily observations from January 1987 to December 1998. The first step of our analysis is to perform the Effective Dimension Reduction (EDR) procedure, which reduces the complexity resulting from the high dimensionality of the air pollution problem. After obtaining the dimension and the directions of the EDR space for each study city, we compare the cross-validatory (CV) values, which assess models by their forecasting performance, of a Generalized Additive Model (GAM) with those of a general nonparametric regression model. The criterion is to choose the model with the smaller CV-value. Finally, we address one important question: is the commonly used GAM acceptable for quantifying the effects of air pollution on public health?
Our results show that air pollutants (PM10, O3, SO2, NO2 and CO) at current levels, acting together with weather conditions (measured by temperature and humidity), have adverse effects on human health. The hazards most influential on mortality are O3, PM10 and the weather variates. As for model selection, our results suggest that EDR via the rMAVE method proposed by Xia et al. (2002) is necessary for the original pollution data set, and that the general nonparametric regression model incorporating EDR outperforms GAMs. That is, GAMs are not desirable in terms of predictive ability, and they can be improved to better fit the air pollution data. These results represent a starting point for refinement in future analyses of the effects of air pollution on public health. A natural direction for future work is to investigate how to adjust the EDR space so that GAMs can be used properly, gaining better forecasting performance and a deeper understanding of the link between air pollution and mortality.

List of Tables
Table 4.1 Simulation Results of Cross-Validatory Criterion
Table 5.1 Descriptive Characteristics of the 15 Cities
Table 5.2 Estimated EDR Dimensions for the 15 Cities
Table 5.3 Estimated EDR Directions for the 15 Cities
Table 5.4 Results of CV-Value Criterion for the 15 Cities

List of Figures
Figure 2.1 Locations of the Fifteen Study Cities
Figure 5.1 Partial residual plots of GAM (5.5) for Baton Rouge
Figure 5.2 Partial residual plots of GAM (5.5) for Dallas/Fort Worth
Figure 5.3 Partial residual plots of GAM (5.5) for Los Angeles
Figure 5.4 Partial residual plots of GAM (5.5) for San Bernardino
Figure 5.5 Partial residual plots of GAM (5.5) for San Diego

[...]

Appendix A Conditions for Theorem

(A.7) $k$ has compact support, and for all $x_1, x_2 \in \mathbb{R}^1$, $|k(x_1) - k(x_2)| \le c_3 |x_1 - x_2|$.
(A.8) For every $t, s, \tau, t', s', \tau' \in \mathbb{N}$, the joint probability density function of $(X_t, X_s, X_\tau, X_{t'}, X_{s'}, X_{\tau'})$ is bounded.
(A.9) Let $1/p + 1/q = 1$. For some $p > 2$ and $\delta > 0$ such that $\delta < 2/q - 1$, $E|\varepsilon|^{2p(1+\delta)} < \infty$ and $E|g(X_1)|^{2p(1+\delta)} < \infty$.
(A.10) For $\delta$ in condition (A.9) and some $\epsilon > 0$, $\beta_j^{\delta/(1+\delta)} = o(j^{-2+\epsilon})$, where $\beta_j = \sup_{i \in \mathbb{N}} E\{\sup_{A \in \mathcal{F}_{i+j}^{n}(X)} |\Pr(A \mid \mathcal{F}_1^{i}(X)) - \Pr(A)|\}$.
(A.11) Let $j = j(n)$ be a positive integer and $i = i(n)$ be the largest positive integer such that $2ij \le n$, and $\limsup_{n \to \infty} (1 + 6e^{1/2} \beta_j^{1/(1+i)})^{i} < \infty$.
(A.12) For $i = i(n)$ in condition (A.11) and the bandwidth $h$, $\limsup_{n \to \infty} \{i(n) h^{D}\} < \infty$.
(A.13) $n h^{2D} \to \infty$ as $n \to \infty$.
(A.14) For $\mu$ in condition (A.4), $n h^{2D+2\mu} \to 0$ as $n \to \infty$.
(A.15) For $q$, $\delta$ and $\epsilon$ in conditions (A.9) and (A.10), $n^{\epsilon} h^{-2D+\theta} \to 0$ as $n \to \infty$, where $\theta = 4D/(q + q\delta)$.

We briefly explain these conditions in order; they were given in Cheng and Tong (1993). Conditions (A.1) to (A.4) are self-explanatory. Condition (A.5) introduces a weight function $\omega$, whose purpose is to overcome the "infinite integration problem" in the asymptotic expansion encountered by Auestad and Tjøstheim (1990). Conditions (A.6), (A.7), (A.9), (A.13) and (A.14) are standard conditions in nonparametric inference. Condition (A.8) is a mild condition, which is useful when we apply a mixing inequality. Condition (A.10) is a very mild condition, weaker than geometric absolute regularity. Conditions (A.11) and (A.12) were given by Roussas (1988); they may be replaced by other assumptions on the mixing coefficient $\beta$ if other methods are used to show the almost sure convergence of $\hat{f}_n$ and $\hat{g}_n$. Condition (A.15) is necessary for the proposition of Denker and Keller (1983). Note that conditions (A.10) and (A.15) do not contradict each other.

Appendix B Time-Series Plots

In this appendix, we include time-series plots of the monthly averages of death rate, temperature, dew point temperature (a measure of humidity), PM10, O3, SO2, NO2 and CO from January 1987 to December 1998 for Baton Rouge, Dallas/Fort Worth, Los Angeles, San Bernardino and San Diego. We selected these five cities as representatives of all 15 study cities.

[Figure B.1: Time-series plots for Baton Rouge. Panels: Death, Temp, Humi, PM10, O3, SO2, NO2, CO; horizontal axis: Year.]
[Figure B.2: Time-series plots for Dallas/Fort Worth. Panels: Death, Temp, Humi, PM10, O3, SO2, NO2, CO; horizontal axis: Year.]
[Figure B.3: Time-series plots for Los Angeles. Panels: Death, Temp, Humi, PM10, O3, SO2, NO2, CO; horizontal axis: Year.]
[Figure B.4: Time-series plots for San Bernardino. Panels: Death, Temp, Humi, PM10, O3, SO2, NO2, CO; horizontal axis: Year.]
[Figure B.5: Time-series plots for San Diego. Panels: Death, Temp, Humi, PM10, O3, SO2, NO2, CO; horizontal axis: Year.]

Appendix C Scatter Plot Matrix with Correlations

In this appendix, we include scatter plot matrices with correlations among the monthly averages of death count, temperature, dew point temperature (a measure of humidity), PM10, O3, SO2, NO2 and CO from January 1987 to December 1998 for Baton Rouge, Dallas/Fort Worth, Los Angeles, San Bernardino and San Diego. We selected these five cities as representatives of all 15 study cities.

[Figure C.1: Scatter plot matrix with correlations for Baton Rouge. Variables: Death, Temp, Humi, PM10, O3, SO2, NO2, CO.]
[Figure C.2: Scatter plot matrix with correlations for Dallas/Fort Worth. Variables: Death, Temp, Humi, PM10, O3, SO2, NO2, CO.]
[Figure C.3: Scatter plot matrix with correlations for Los Angeles. Variables: Death, Temp, Humi, PM10, O3, SO2, NO2, CO.]
[Figure C.4: Scatter plot matrix with correlations for San Bernardino. Variables: Death, Temp, Humi, PM10, O3, SO2, NO2, CO.]
[Figure C.5: Scatter plot matrix with correlations for San Diego. Variables: Death, Temp, Humi, PM10, O3, SO2, NO2, CO.]
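The figures in Appendices B and C are built from monthly averages of daily series. The following is a minimal NumPy sketch of that preprocessing, assuming the daily observations are stored as rows of an array aligned with a vector of calendar dates. The function names and variable labels (taken from the panel titles) are illustrative, and the data are simulated rather than the actual study series.

```python
import numpy as np

# Hypothetical column labels mirroring the panel titles in the figures.
VARIABLES = ["Death", "Temp", "Humi", "PM10", "O3", "SO2", "NO2", "CO"]

def monthly_means(dates, daily):
    """Average daily rows within each calendar month.

    dates: datetime64[D] array of length n_days; daily: (n_days, n_vars).
    Returns (months, means) with one row of means per calendar month.
    """
    month_of_day = dates.astype("datetime64[M]")
    months, idx = np.unique(month_of_day, return_inverse=True)
    counts = np.bincount(idx)
    sums = np.zeros((len(months), daily.shape[1]))
    np.add.at(sums, idx, daily)  # accumulate each day into its month's row
    return months, sums / counts[:, None]

def correlation_matrix(monthly):
    """Pearson correlations between columns (variables) of the monthly data."""
    return np.corrcoef(monthly, rowvar=False)

# Simulated daily data covering January 1987 to December 1998.
rng = np.random.default_rng(0)
dates = np.arange("1987-01-01", "1999-01-01", dtype="datetime64[D]")
daily = rng.normal(size=(len(dates), len(VARIABLES)))
months, monthly = monthly_means(dates, daily)
corr = correlation_matrix(monthly)
print(len(months), corr.shape)
```

Each off-diagonal entry `corr[a, b]` corresponds to one correlation value printed in a scatter plot matrix panel; with real data, missing days would need to be handled (for example by masking) before averaging.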
[...] heart contractions and lowers the amount of oxygen carried by the blood. It can cause nausea, dizziness and headaches, and is fatal at very high concentrations.

1.2 Quantification of Health Effects

As evidence of the negative impacts of air pollution on public health has accumulated, quantifying these impacts has become an increasingly critical concern. This [...]

[...] sponsored by the Health Effects Institute (HEI), is a systematic investigation of the dependence of mortality rates on air pollution. The database includes cause-specific mortality counts, weather conditions and air pollution data for the 108 largest cities in the United States for the 14-year period from January 1, 1987 to December 31, 2000. The NMMAPS data on mortality, weather, census and air pollution [...]

[...] regression and (2) model selection through a cross-validatory criterion. We introduce them in turn in the following subsections.

3.1 Dimension Reduction Through Regression

The ultimate goal of a multiple regression analysis is to understand how the conditional distribution of a univariate response $Y$ given a vector $X$ of $p$ predictors depends on the value of $X$. If the conditional distribution of $Y|X$ [...]

[...] explore the association between the probability of death and levels of air pollution shortly before death, using mortality counts as the outcome measure. Our study is a time-series analysis. One feature of time-series studies of the health effects of air pollution is that the probability of death is influenced not by a single hazard, but rather by a function of a whole set of [...]
[...] mild conditions and is called the EDR space. Hence, we refer to the column vectors of $B_0$ as the EDR directions, which are unique up to orthogonal transformations. To estimate the EDR space, we need to estimate the directions $B_0$ as well as the dimension $D$. In fact, the directions $B_0$ solve the minimization problem

$$\min_{B:\, B^\top B = I_D} E\{Y - E(Y \mid B^\top X)\}^2.$$

For any orthogonal matrix $B$, the conditional [...]

[...] justification for the use of these models, especially GAMs. In this subsection, we introduce a nonparametric model selection criterion based on cross-validatory (CV) values, which measure the predictive performance of models. In the following discussion, we assume the true dimension of the EDR space is $D$. Model selection can be based on subjective judgements as well as on more objective methods. Often the [...]

[...] The rest of this thesis is organized as follows. In Chapter 2, we describe the sources and characteristics of the U.S. mortality and pollution data under study. Chapter 3 introduces the nonparametric methods used in this study. One component of our approach is the rMAVE dimension reduction method, based on a semi-parametric regression model, to determine the EDR space; the [...]

[...] validation set are well predicted from the other samples in the training set, this indicates that the model will have good forecasting ability for new samples from the same general population. In the simplest case, the validation set contains only one sample; this is the widely used leave-one-out cross-validation. Specifically, consider the general framework of nonparametric regression $Y$ [...]
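To illustrate the leave-one-out idea, here is a minimal sketch that computes a CV value for a Nadaraya-Watson kernel estimate of $g$, assuming the additive-noise form $Y = g(X) + \varepsilon$ and squared prediction error as the loss. The Epanechnikov kernel (chosen because it has compact support and is Lipschitz, in the spirit of condition (A.7)), the function names and the simulated data are assumptions for illustration; the thesis's own CV procedure may differ in its details. Following the criterion of preferring the smaller CV value, the example uses it to pick a bandwidth.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (compact and Lipschitz)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def loo_cv(X, y, h):
    """Leave-one-out CV value of a Nadaraya-Watson fit with bandwidth h.

    X: (n,) or (n, d) predictors; y: (n,) responses. A product kernel is
    used across the d columns. Returns the mean squared leave-one-out
    prediction error, or inf if some point has no neighbours within h.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    diffs = (X[:, None, :] - X[None, :, :]) / h     # pairwise scaled gaps
    W = np.prod(epanechnikov(diffs), axis=2)         # (n, n) kernel weights
    np.fill_diagonal(W, 0.0)                         # drop sample i from its own fit
    denom = W.sum(axis=1)
    if np.any(denom == 0.0):
        return np.inf                                # bandwidth too small
    y_hat = (W @ y) / denom                          # leave-one-out estimate at X_i
    return float(np.mean((y - y_hat) ** 2))

# Simulated example: choose the bandwidth with the smallest CV value.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=200)
y = np.sin(2.0 * x) + 0.1 * rng.normal(size=200)
scores = {h: loo_cv(x, y, h) for h in [0.1, 0.2, 0.4, 0.8, 1.6]}
best_h = min(scores, key=scores.get)
print(best_h, round(scores[best_h], 4))
```

The same `loo_cv` value can be computed for competing specifications of the predictors (for example, raw predictors versus a reduced index), and the specification with the smaller value is preferred.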
[...] Hence, the question arises whether a GAM is valid for time-series air pollution data. To date, however, reports using GAMs to model health impacts have only discussed the estimates, without statistically justifying the use of GAMs. Is there any feasible method to assess how well GAMs fit the associations between mortality rates, air pollutant levels and weather conditions? Is there any [...]

[...] terminology of Cook and Weisberg (1999), Model (3.1) implies that the distribution of $Y|X$ is the same as that of $Y|B_0^\top X$. Therefore, the $p$-dimensional predictor $X$ can be replaced by the $D$-dimensional predictor $B_0^\top X$ without loss of regression information, and this replacement represents a potentially useful reduction in the dimension of the predictor vector. The space spanned by the columns of $B_0$ can [...]
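To make the EDR criterion $\min_{B^\top B = I_D} E\{Y - E(Y \mid B^\top X)\}^2$ concrete, here is a deliberately naive sketch for $p = 2$ and $D = 1$. It replaces $E(Y \mid b^\top X)$ by a leave-one-out kernel estimate and searches a grid of unit vectors $b$. This brute-force search is for illustration only and is not the rMAVE algorithm of Xia et al. (2002) used in the thesis; the Gaussian kernel, the bandwidth, the function names and the simulated single-index data are all assumptions.

```python
import numpy as np

def loo_residual_mse(index, y, h):
    """Sample analogue of E{y - E(y | index)}^2, with E(y | index) replaced by
    a leave-one-out Nadaraya-Watson estimate (Gaussian kernel, bandwidth h)."""
    u = (index[:, None] - index[None, :]) / h
    W = np.exp(-0.5 * u**2)
    np.fill_diagonal(W, 0.0)              # exclude each point from its own fit
    y_hat = (W @ y) / W.sum(axis=1)
    return float(np.mean((y - y_hat) ** 2))

def grid_search_direction(X, y, h, n_angles=180):
    """Brute-force search over unit vectors b = (cos t, sin t), t in [0, pi),
    for p = 2 and D = 1. Directions are identified only up to sign
    (the D = 1 case of 'unique up to orthogonal transformations'),
    so half a circle suffices."""
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    losses = [loo_residual_mse(X @ np.array([np.cos(t), np.sin(t)]), y, h)
              for t in angles]
    t_best = angles[int(np.argmin(losses))]
    return np.array([np.cos(t_best), np.sin(t_best)])

# Simulated single-index data: y depends on X only through b0'X.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
b0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
y = (X @ b0) ** 2 + 0.1 * rng.normal(size=300)
b_hat = grid_search_direction(X, y, h=0.3)
print(b_hat, abs(b_hat @ b0))
```

A grid search scales exponentially with $p$ and $D$, which is why practical methods such as rMAVE estimate the directions iteratively instead.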
