Model and Variable Selection Procedures for Semiparametric Time Series Regression

Hindawi Publishing Corporation
Journal of Probability and Statistics
Volume 2009, Article ID 487194, 37 pages
doi:10.1155/2009/487194

Research Article

Model and Variable Selection Procedures for Semiparametric Time Series Regression

Risa Kato and Takayuki Shiohama
Department of Management Science, Faculty of Engineering, Tokyo University of Science, Kudankita 1-14-6, Chiyoda, Tokyo 102-0073, Japan

Correspondence should be addressed to Takayuki Shiohama, shiohama@ms.kagu.tus.ac.jp

Received 13 March 2009; Accepted 26 June 2009

Recommended by Junbin Gao

Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise in time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedures is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.

Copyright © 2009 R. Kato and T. Shiohama. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Non- and semiparametric regression has become a rapidly developing field of statistics in recent years. Various types of nonlinear models, such as neural networks, kernel methods, spline methods, series estimation, and local linear estimation, have been applied in many fields. Non- and semiparametric methods, unlike parametric methods, make no or only mild assumptions about the trend or seasonal components and are therefore attractive when the data at hand do not meet the criteria for classical time series models. However, the price of this flexibility can be high: when multiple predictor variables are included in the regression equation, nonparametric regression faces the so-called curse of dimensionality. A major problem associated with non- and semiparametric trend estimation involves the selection of a smoothing parameter and the number of basis functions.

Most literature on nonparametric regression with dependent errors focuses on the kernel estimator of the trend function; see, for example, Altman [1], Hart [2], and Herrmann et al. [3]. These results have been extended to the case with long-memory errors by Hall and Hart [4], Ray and Tsay [5], and Beran and Feng [6]. Kernel methods are affected by the so-called boundary effect. A well-known estimator with automatic boundary correction is the local polynomial approach, which is asymptotically equivalent to some kernel estimates. For detailed discussions on local polynomial fitting see, for example, Fan and Gijbels [7] and Fan and Yao [8]. For semiparametric models with serially correlated errors, Gao [9] proposed the semiparametric least-squares estimator (SLSE) for the parametric component and studied its asymptotic properties. You and Chen [10] constructed a semiparametric generalized least-squares estimator (SGLSE) with autoregressive errors. Aneiros-Pérez and Vilar-Fernández [11] constructed an SLSE with correlated errors.
Like parametric regression models, variable selection and the choice of the smoothing parameter for the basis functions are important problems in non- and semiparametric models. It is common practice to include only important variables in the model to enhance predictability. The general approach to finding sensible parameters is to choose an optimal subset determined according to a model selection criterion. Several information criteria for evaluating models constructed by various estimation procedures have been proposed; see, for example, Konishi and Kitagawa [12]. The commonly used criteria are generalized cross-validation, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Although best subset selection is practically useful, these selection procedures ignore stochastic errors inherited between the stages of variable selection. Furthermore, best subset selection lacks stability; see, for example, Breiman [13]. Nonconcave penalized likelihood approaches for selecting significant variables in parametric regression models have been proposed by Fan and Li [14]. This methodology can be extended to semiparametric generalized regression models with dependent errors. One of the advantages of this procedure is the simultaneous selection of variables and estimation of unknown parameters.

The rest of this paper is organized as follows. In Section 2.1 we introduce our semiparametric regression model and explain classical partial ridge regression estimation. Rather than focus on the kernel estimator of the trend function, we use basis functions to fit the trend component of the time series. In Section 2.2, we propose a penalized weighted least-squares approach with information criteria for estimation and variable selection. The estimation algorithm is explained in Section 2.3. In Section 2.4, the GIC proposed by Konishi and Kitagawa [15], the BIC$_m$ proposed by Hastie and Tibshirani [16], and the BIC$_p$ proposed by Konishi et al. [17] are applied to the evaluation of models estimated by penalized weighted least squares. Section 2.5 contains the asymptotic results for the proposed estimators. In Section 3 the performance of these information criteria is evaluated by simulation studies. Section 4 contains the real data analysis. Section 5 concludes our results, and proofs of the theorems are given in the appendix.

2. Estimation Procedures

In this section, we present our semiparametric regression model and estimation procedures.

2.1. The Model and Penalized Estimation

We consider the semiparametric regression model
$$ y_i = \alpha(t_i) + \beta' x_i + \varepsilon_i, \quad i = 1, \ldots, n, \tag{2.1} $$
where $y_i$ is the response variable, $x_i$ is the $d \times 1$ covariate vector at time $i$, $\alpha(t_i)$ is an unspecified baseline function of $t_i$ with $t_i = i/n$, $\beta$ is a vector of unknown regression coefficients, and $\varepsilon_i$ is a Gaussian, zero-mean, covariance stationary process. We assume the following properties for the error terms $\varepsilon_i$ and the vectors of explanatory variables $x_i$.

(A.1) The process $\{\varepsilon_i\}$ is a linear process given by
$$ \varepsilon_i = \sum_{j=0}^{\infty} b_j e_{i-j}, \tag{2.2} $$
where $b_0 = 1$ and $\{e_i\}$ is an i.i.d. Gaussian sequence with $E\{e_i\} = 0$ and $E\{e_i^2\} = \sigma_e^2$.

(A.2) The coefficients $b_j$ satisfy $\sum_{j=0}^{\infty} j\,|b_j| < \infty$ and $\sum_{j=0}^{\infty} b_j z^j \neq 0$ for all $|z| < 1$.

We define $\gamma(k) = \operatorname{cov}(\varepsilon_t, \varepsilon_{t+k}) = E\{\varepsilon_t \varepsilon_{t+k}\}$. The assumption on the covariate variables is as follows.

(B.1) $x_i = (x_{i1}, \ldots, x_{id})' \in \mathbb{R}^d$, and $\{x_{ij}\}$, $j = 1, \ldots, d$, have mean zero and variance 1.
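For concreteness, the following sketch simulates data from the model (2.1)–(2.2). It is only a hypothetical illustration: the sinusoidal trend, the AR(1) error coefficient, and all numerical values are our own choices and are not taken from the paper; an AR(1) error is simply one special case of the linear process (2.2), with $b_j = \phi^j$.

```python
import numpy as np

def simulate_semiparametric_series(n=200, beta=(1.0, -0.5), phi=0.6, sigma_e=0.5, seed=0):
    """Simulate y_i = alpha(t_i) + beta' x_i + eps_i (model (2.1)) with AR(1) errors.

    All parameter values here are illustrative choices, not quantities from
    the paper; the AR(1) error is a special case of the linear process (2.2).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1) / n                    # t_i = i/n
    alpha = np.sin(2 * np.pi * t)                  # hypothetical smooth baseline alpha(t)
    x = rng.standard_normal((n, len(beta)))        # covariates with mean 0, variance 1 (B.1)
    e = rng.normal(scale=sigma_e, size=n)          # i.i.d. Gaussian innovations e_i (A.1)
    eps = np.empty(n)
    eps[0] = e[0]
    for i in range(1, n):                          # eps_i = phi * eps_{i-1} + e_i
        eps[i] = phi * eps[i - 1] + e[i]
    y = alpha + x @ np.array(beta) + eps           # response from model (2.1)
    return t, x, y

t, x, y = simulate_semiparametric_series()
```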
The trend function $\alpha(t_i)$ is expressed as a linear combination of a set of $m$ underlying basis functions:
$$ \alpha(t_i) = \sum_{k=1}^{m} w_k \phi_k(t_i) = w'\phi(t_i), \tag{2.3} $$
where $\phi(t_i) = (\phi_1(t_i), \ldots, \phi_m(t_i))'$ is an $m$-dimensional vector constructed from the basis functions $\{\phi_k(t_i);\ k = 1, \ldots, m\}$, and $w = (w_1, \ldots, w_m)'$ is an unknown parameter vector to be estimated. Examples of basis functions are B-splines, P-splines, and radial basis functions. A P-spline basis is given by
$$ \phi(t_i) = \bigl(1, t_i, \ldots, t_i^p, (t_i - \kappa_1)_+^p, \ldots, (t_i - \kappa_K)_+^p\bigr)', \tag{2.4} $$
where $\{\kappa_k\}$, $k = 1, \ldots, K$, are spline knots. This specification uses the so-called truncated power function basis. The choice of the number of knots $K$ and the knot locations is discussed by Yu and Ruppert [18]. The radial basis function (RBF) emerged as a variant of the artificial neural network in the late 1980s. Nonlinear specifications using RBFs have been widely used in cognitive science, engineering, biology, linguistics, and so on. In RBF modeling, a basis function can take the form
$$ \phi_k(t_i) = \exp\!\left( -\frac{(t_i - \mu_k)^2}{2 s_k^2} \right), \tag{2.5} $$
where $\mu_k$ determines the location and $s_k^2$ determines the width of the basis function.

Having selected appropriate basis functions, the semiparametric regression model (2.1) can be expressed as the linear model
$$ y = X\beta + Bw + \varepsilon, \tag{2.6} $$
where $X = (x_1, \ldots, x_n)'$, $y = (y_1, \ldots, y_n)'$, and $B = (\phi_1, \ldots, \phi_n)'$ with $\phi_i = (\phi_1(i/n), \ldots, \phi_m(i/n))'$. The penalized least-squares estimator is then a minimizer of the function
$$ (y - X\beta - Bw)'(y - X\beta - Bw) + n\xi\, w'Kw, \tag{2.7} $$
where $\xi$ is the smoothing parameter controlling the trade-off between the goodness of fit measured by the least-squares term and the roughness of the estimated function, and $K$ is an appropriate positive semidefinite symmetric matrix. For example, if $K$ satisfies $w'Kw = \int \{\alpha''(u)\}^2\, du$, we have the usual quadratic integral penalty; see, for example, Green and Silverman [19].

By simple calculus, (2.7) is minimized when $\beta$ and $w$ satisfy the block matrix equation
$$ \begin{pmatrix} X'X & X'B \\ B'X & B'B + n\xi K \end{pmatrix} \begin{pmatrix} \beta \\ w \end{pmatrix} = \begin{pmatrix} X'y \\ B'y \end{pmatrix}. \tag{2.8} $$
This equation can be solved without any iteration; see, for example, Green [20]. First, we find $B\hat w = S(y - X\hat\beta)$, where $S = B(B'B + n\xi K)^{-1}B'$ is usually called the smoothing matrix. Substituting $B\hat w$ into (2.6), we obtain
$$ \tilde y = \tilde X \beta + \tilde\varepsilon, \tag{2.9} $$
where $\tilde y = (I - S)y$, $\tilde X = (I - S)X$, and $I$ is the identity matrix of order $n$. Applying least squares to the linear model (2.9), we obtain the semiparametric ordinary least-squares estimator (SOLSE):
$$ \hat\beta_{\mathrm{SOLSE}} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y, \tag{2.10} $$
$$ \hat w_{\mathrm{SOLSE}} = (B'B + n\xi K)^{-1}B'(y - X\hat\beta_{\mathrm{SOLSE}}). \tag{2.11} $$
Speckman [21] studied similar solutions for partial linear models with independent observations.
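A minimal numerical sketch of the partial ridge fit (2.10)–(2.11) with a Gaussian RBF basis (2.5) is given below. The equally spaced centres, the default width, and the identity choice for the penalty matrix $K$ are our own simplifications (the paper allows any positive semidefinite roughness penalty), and the function names are hypothetical. The usage lines reuse the simulated series from the previous sketch.

```python
import numpy as np

def rbf_basis(t, m, spread=None):
    """Design matrix B of Gaussian radial basis functions, as in (2.5).

    Equally spaced centres on [0, 1] and a common width are illustrative
    choices; the paper treats the centres mu_k and widths s_k as given.
    """
    centers = np.linspace(0.0, 1.0, m)
    if spread is None:
        spread = centers[1] - centers[0]           # hypothetical default width
    return np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2.0 * spread ** 2))

def solse(y, X, B, xi, K=None):
    """Semiparametric ordinary least squares, equations (2.10)-(2.11).

    K defaults to the identity (a simple ridge-type penalty on w); the paper
    allows a general positive semidefinite roughness penalty matrix K.
    """
    n, m = B.shape
    if K is None:
        K = np.eye(m)
    S = B @ np.linalg.solve(B.T @ B + n * xi * K, B.T)                 # smoothing matrix S
    X_tilde, y_tilde = X - S @ X, y - S @ y                            # (I - S)X and (I - S)y
    beta = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y_tilde)   # (2.10)
    w = np.linalg.solve(B.T @ B + n * xi * K, B.T @ (y - X @ beta))    # (2.11)
    return beta, w, S

# Using the simulated series from the previous sketch:
B = rbf_basis(t, m=8)
beta_solse, w_solse, S = solse(y, x, B, xi=1e-3)
```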
Since the errors are serially correlated in model (2.1), $\hat\beta_{\mathrm{SOLSE}}$ is not asymptotically efficient. To obtain an asymptotically efficient estimator of $\beta$, we use the prewhitening transformation. Note that the errors $\{\varepsilon_i\}$ in (2.6) are invertible. Let $b(L) = \sum_{j=0}^{\infty} b_j L^j$, where $L$ is the lag operator, and let $a(L) = b(L)^{-1} = a_0 - \sum_{j=1}^{\infty} a_j L^j$ with $a_0 = 1$. Applying $a(L)$ to the model (2.6) and rewriting the corresponding equation, we obtain the new model
$$ y^* = X^*\beta + B^*w + e, \tag{2.12} $$
where $y^* = (y_1^*, \ldots, y_n^*)'$, $X^* = (x_1^*, \ldots, x_n^*)'$, $B^* = (\phi_1^*, \ldots, \phi_n^*)'$, $e = (e_1, \ldots, e_n)'$, and
$$ y_i^* = y_i - \sum_{j=1}^{\infty} a_j y_{i-j}, \qquad x_i^* = x_i - \sum_{j=1}^{\infty} a_j x_{i-j}, \qquad \phi_i^* = \phi_i - \sum_{j=1}^{\infty} a_j \phi_{i-j}. \tag{2.13} $$
The regression errors in (2.12) are i.i.d. Because, in practice, the transformed response $y_i^*$ is unknown, we use a reasonable approximation $\hat y_i^*$ based on the work by Xiao et al. [22] and Aneiros-Pérez and Vilar-Fernández [11]. Under the usual regularity conditions the coefficients $a_j$ decrease geometrically, so, letting $\tau = \tau(n)$ denote a truncation parameter, we may consider the truncated autoregression on $\varepsilon_i$:
$$ \varepsilon_i = \sum_{j=1}^{\tau} a_j \varepsilon_{i-j} + e_i, \tag{2.14} $$
where the $e_i$ are i.i.d. random variables with $E(e_i) = 0$. We make the following assumption about the truncation parameter.

(C.1) The truncation parameter satisfies $\tau = \tau(n) \approx c \log n$ for some $c > 0$.

The expansion rate of the truncation parameter given in (C.1) is also chosen for convenience. Let $T_\tau$ be the $n \times n$ transformation matrix such that $e_\tau = T_\tau \varepsilon$. Then the model (2.12) can be expressed as
$$ T_\tau y = T_\tau X\beta + T_\tau Bw + T_\tau\varepsilon, \tag{2.15} $$
where
$$ T_\tau = \begin{pmatrix}
\delta_{11} & 0 & \cdots & & & & 0\\
\delta_{21} & \delta_{22} & 0 & \cdots & & & 0\\
\vdots & & \ddots & & & & \vdots\\
\delta_{\tau 1} & \cdots & & \delta_{\tau\tau} & 0 & \cdots & 0\\
-a_\tau & \cdots & & -a_1 & 1 & & \\
 & -a_\tau & \cdots & & -a_1 & 1 & \\
 & & \ddots & & & \ddots & \\
0 & & & -a_\tau & \cdots & -a_1 & 1
\end{pmatrix}, \tag{2.16} $$
with $\delta_{11} = \sigma_e/\gamma(0)^{1/2}$, $\delta_{22} = \sigma_e/\{(1 - \rho(1)^2)\gamma(0)\}^{1/2}$, $\delta_{21} = -\rho(1)\,\sigma_e/\{(1 - \rho(1)^2)\gamma(0)\}^{1/2}$, and so on. Here $\rho(h) = \gamma(h)/\gamma(0)$ denotes the lag-$h$ autocorrelation function of $\{\varepsilon_i\}$: the first $\tau$ rows whiten the initial observations using the autocovariances of $\{\varepsilon_i\}$, and each remaining row applies the autoregressive filter $(-a_\tau, \ldots, -a_1, 1)$.

Now our estimation problem for the semiparametric time series regression model can be expressed as the minimization of the function
$$ L(\beta, w) = (y - X\beta - Bw)'V^{-1}(y - X\beta - Bw) + n\xi\, w'Kw, \tag{2.17} $$
where $V^{-1} = \sigma_e^{-2} T_\tau' T_\tau$ and $\hat\sigma_e^2 = n^{-1}\|T_\tau\varepsilon\|^2$. Based on the work by Aneiros-Pérez and Vilar-Fernández [11], an estimator of $T_\tau$ is constructed as follows. We use the residuals $\hat\varepsilon = y - X\hat\beta_{\mathrm{SOLSE}} - B\hat w_{\mathrm{SOLSE}}$ to construct an estimate of $T_\tau$ by applying ordinary least squares to the model
$$ \hat\varepsilon_i = a_1\hat\varepsilon_{i-1} + \cdots + a_\tau\hat\varepsilon_{i-\tau} + \mathrm{residual}_i. \tag{2.18} $$
Define the estimate $\hat a_\tau = (\hat a_1, \hat a_2, \ldots, \hat a_\tau)'$ of $a_\tau = (a_1, a_2, \ldots, a_\tau)'$ by
$$ \hat a_\tau = (\hat E_\tau'\hat E_\tau)^{-1}\hat E_\tau'\hat\varepsilon, \tag{2.19} $$
where $\hat\varepsilon = (\hat\varepsilon_{\tau+1}, \ldots, \hat\varepsilon_n)'$ and $\hat E_\tau$ is the $(n - \tau) \times \tau$ matrix of regressors with typical element $\hat\varepsilon_{i-j}$. Then $\hat T_\tau$ is obtained from $T_\tau$ by replacing $a_j$ with $\hat a_j$, $\sigma_e^2$ with $\hat\sigma_e^2$, and so forth. Applying least squares to the linear model
$$ \hat T_\tau y = \hat T_\tau X\beta + \hat T_\tau Bw + \hat T_\tau\varepsilon, \tag{2.20} $$
we obtain
$$ \hat\beta_{\mathrm{SGLSE}} = (\tilde X_\tau'\tilde X_\tau)^{-1}\tilde X_\tau'\tilde y_\tau, \qquad \hat w_{\mathrm{SGLSE}} = (B_\tau'B_\tau + n\xi K)^{-1}B_\tau'(y_\tau - X_\tau\hat\beta_{\mathrm{SGLSE}}), \tag{2.21} $$
where $\tilde X_\tau = (I - S)X_\tau$ and $\tilde y_\tau = (I - S)y_\tau$, with $y_\tau = \hat T_\tau y$ and $X_\tau = \hat T_\tau X$. The following theorem shows that the loss in efficiency associated with the estimation of the autocorrelation structure is modest in large samples.

Theorem 2.1. Let conditions (A.1), (A.2), (B.1), and (C.1) hold, and assume that $\Sigma_1 = \lim_{n\to\infty} n^{-1}X'V^{-1}X$ is nonsingular. Let $\beta_0$ denote the true value of $\beta$. Then
$$ \sqrt{n}(\tilde\beta - \beta_0) = \sqrt{n}(\hat\beta_{\mathrm{SGLSE}} - \beta_0) + O_p\bigl((\tau/n)^{1/2}\bigr), \tag{2.22} $$
$$ \sqrt{n}(\hat\beta_{\mathrm{SGLSE}} - \beta_0) \xrightarrow{D} N(0, \Sigma_1^{-1}), \tag{2.23} $$
where $\xrightarrow{D}$ denotes convergence in distribution and $\tilde\beta = (\tilde X_\tau'\tilde X_\tau)^{-1}\tilde X_\tau'\tilde y_\tau$ is the infeasible estimator computed with the true transformation matrix $T_\tau$. Assume further that $\Sigma_2 = \lim_{n\to\infty} n^{-1}B'V^{-1}B$ is nonsingular and let $w_0$ denote the true value of $w$. Then
$$ \sqrt{n}(\tilde w - w_0) = \sqrt{n}(\hat w_{\mathrm{SGLSE}} - w_0) + O_p\bigl((\tau/n)^{1/2}\bigr), \tag{2.24} $$
$$ \sqrt{n}(\hat w_{\mathrm{SGLSE}} - w_0) \xrightarrow{D} N(0, \Sigma_2^{-1}), \tag{2.25} $$
where $\tilde w = (B_\tau'B_\tau + n\xi K)^{-1}B_\tau'(y_\tau - X_\tau\tilde\beta)$.
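The feasible GLS step can be sketched as follows: fit the truncated autoregression (2.18)–(2.19) to the SOLSE residuals, prewhiten the data, and re-estimate as in (2.21). In this sketch the whitening simply drops the first $\tau$ observations rather than using the exact initial-condition entries $\delta_{ij}$ of (2.16); that simplification, the function names, and the choice of $\tau$ are ours, so this is only an approximation of the paper's transformation. Here `resid` would be the SOLSE residuals $y - X\hat\beta_{\mathrm{SOLSE}} - B\hat w_{\mathrm{SOLSE}}$.

```python
import numpy as np

def fit_truncated_ar(resid, tau):
    """OLS fit of the truncated autoregression (2.18)/(2.19) to residuals."""
    n = len(resid)
    E = np.column_stack([resid[tau - j: n - j] for j in range(1, tau + 1)])  # lagged residuals
    a_hat, *_ = np.linalg.lstsq(E, resid[tau:], rcond=None)
    return a_hat

def prewhiten(M, a_hat):
    """Apply the autoregressive rows of T_tau: row i becomes M_i - sum_j a_j M_{i-j}.

    Dropping the first tau rows (instead of using the initial-condition
    entries delta_ij of (2.16)) is a simplification of the paper's T_tau.
    """
    tau = len(a_hat)
    M = np.asarray(M, dtype=float)
    Z = M[tau:].copy()
    for j, a in enumerate(a_hat, start=1):
        Z -= a * M[tau - j: len(M) - j]
    return Z

def sglse(y, X, B, K, xi, resid, tau):
    """Feasible semiparametric GLS of (2.21), computed on prewhitened data."""
    a_hat = fit_truncated_ar(resid, tau)
    y_t = prewhiten(y.reshape(-1, 1), a_hat).ravel()
    X_t, B_t = prewhiten(X, a_hat), prewhiten(B, a_hat)
    n = len(y_t)
    S = B_t @ np.linalg.solve(B_t.T @ B_t + n * xi * K, B_t.T)   # smoother on transformed data
    X_tilde, y_tilde = X_t - S @ X_t, y_t - S @ y_t
    beta = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y_tilde)            # beta_SGLSE
    w = np.linalg.solve(B_t.T @ B_t + n * xi * K, B_t.T @ (y_t - X_t @ beta))   # w_SGLSE
    return beta, w, a_hat
```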
2.2. Variable Selection and Penalized Least Squares

Variable and model selection are an indispensable tool for statistical data analysis. However, they have rarely been studied in the semiparametric context. Fan and Li [23] studied penalized weighted least-squares estimation with variable selection in semiparametric models for longitudinal data. In this section, we introduce the penalized weighted least-squares approach. We propose an algorithm for calculating the penalized weighted least-squares estimator of $\theta = (\beta', w')'$ in Section 2.3. In Section 2.4 we present the information criteria for model selection.

From now on, we assume that the matrices $X_\tau$ and $B_\tau$ are standardized so that each column has mean 0 and variance 1. The first term in (2.7) can be regarded as a loss function of $\beta$ and $w$, which we denote by $l(\beta, w)$. Then expression (2.7) can be written as
$$ L(\beta, w) = l(\beta, w) + n\xi\, w'Kw. \tag{2.26} $$
The methodology of the previous section can be applied to variable selection via penalized least squares. A form of penalized weighted least squares is
$$ S(\beta, w) = l(\beta, w) + n\left( \sum_{i=1}^{d} p_{\lambda_1}(|\beta_i|) + \sum_{j=1}^{m} p_{\lambda_2}(|w_j|) \right) + n\xi\, w'Kw, \tag{2.27} $$
where the $p_{\lambda_i}(\cdot)$ are penalty functions and the $\lambda_i$ are regularization parameters which control the model complexity. By minimizing (2.27) with a special construction of the penalty function given in what follows, some coefficients are estimated as 0, which deletes the corresponding variables, whereas others are not. Thus, the procedure selects variables and estimates coefficients simultaneously. The resulting estimate is called a penalized weighted least-squares estimate.

Many penalty functions have been used for penalized least squares and penalized likelihood in various non- and semiparametric models. There are strong connections between penalized weighted least squares and variable selection. Denote by $\theta = (\beta', w')'$ and $z = (z_1, \ldots, z_{d+m})'$ the true parameters and the estimates, respectively. Taking the hard thresholding penalty function
$$ p_\lambda(|\theta|) = \lambda^2 - (|\theta| - \lambda)^2 I(|\theta| < \lambda), \tag{2.28} $$
we obtain the hard thresholding rule
$$ \hat\theta = z\, I(|z| > \lambda). \tag{2.29} $$
The $L_2$ penalty $p_\lambda(|\theta|) = \lambda|\theta|^2$ results in a ridge regression, and the $L_1$ penalty $p_\lambda(|\theta|) = \lambda|\theta|$ yields a soft thresholding rule
$$ \hat\theta = \operatorname{sgn}(z)\,(|z| - \lambda)_+. \tag{2.30} $$
This solution gives the best subset selection via stepwise deletion and addition. Tibshirani [24, 25] proposed the LASSO, which is the penalized least-squares estimate with the $L_1$ penalty, in the general least-squares and likelihood settings.

2.3. An Estimation Algorithm

In this section we describe an algorithm for calculating the penalized least-squares estimator of $\theta = (\beta', w')'$. The estimate of $\theta$ minimizes the penalized sum of squares $L(\theta)$ given by (2.17). First we obtain $\hat\theta_{\mathrm{SOLSE}}$ in Step 1. In Step 2, we estimate $T_\tau$ by using the residuals $\hat\varepsilon$ obtained in Step 1. Then $\hat\theta^{\mathrm{HT}}_{\mathrm{SGLSE}}$ is obtained using $\hat T_\tau$ in Step 3. Here the penalty parameters $\lambda$ and $\xi$ and the number of basis functions $m$ are chosen using the information criteria that will be discussed in Section 2.4.

Step 1. First we obtain $\hat\beta_{\mathrm{SOLSE}}$ and $\hat w_{\mathrm{SOLSE}}$ from (2.10) and (2.11), respectively. Then we have the model
$$ y = X\hat\beta_{\mathrm{SOLSE}} + B\hat w_{\mathrm{SOLSE}} + \hat\varepsilon. \tag{2.31} $$

Step 2. An estimator of $T_\tau$ is constructed following the work of Aneiros-Pérez and Vilar-Fernández [11]. We use the residuals $\hat\varepsilon = y - B\hat w_{\mathrm{SOLSE}} - X\hat\beta_{\mathrm{SOLSE}}$ to construct an estimate of $T_\tau$ by applying ordinary least squares to the model
$$ \hat\varepsilon_i = a_1\hat\varepsilon_{i-1} + \cdots + a_\tau\hat\varepsilon_{i-\tau} + \mathrm{residual}_i. \tag{2.32} $$
The estimator $\hat T_\tau$ is obtained from $T_\tau$ by replacing parameters with their estimates.

Step 3. Our SGLSE of $\theta$ is obtained by using the model
$$ y_\tau = B_\tau w + X_\tau\beta + \varepsilon_\tau, \tag{2.33} $$
where $y_\tau = \hat T_\tau y$, $B_\tau = \hat T_\tau B$, $X_\tau = \hat T_\tau X$, and $\varepsilon_\tau = \hat T_\tau\varepsilon$.

Finding the solution of the penalized least-squares problem (2.27) requires a local quadratic approximation, because the $L_1$ and hard thresholding penalties are irregular at the origin and may not have second derivatives at some points. We follow the methodology of Fan and Li [14]. Suppose that we are given an initial value $\theta^{(0)}$ that is close to the minimizer of (2.27). If $\theta_j^{(0)}$ is very close to 0, then set $\hat\theta_j = 0$. Otherwise the penalty can be locally approximated by a quadratic function as
$$ \bigl[p_{\lambda_j}(|\theta_j|)\bigr]' = p_{\lambda_j}'(|\theta_j|)\operatorname{sgn}(\theta_j) \approx \left\{ \frac{p_{\lambda_j}'(|\theta_j^{(0)}|)}{|\theta_j^{(0)}|} \right\}\theta_j, \quad \text{when } \theta_j^{(0)} \neq 0. \tag{2.34} $$
Therefore, the minimization problem (2.27) can be reduced to a quadratic minimization problem and the Newton–Raphson algorithm can be used. The right-hand side of (2.27) can be locally approximated by
$$
\begin{aligned}
l(\beta_0, w_0) &+ \nabla l_\beta(\beta_0, w_0)'(\beta - \beta_0) + \nabla l_w(\beta_0, w_0)'(w - w_0) \\
&+ \tfrac{1}{2}(\beta - \beta_0)'\nabla^2 l_{\beta\beta}(\beta_0, w_0)(\beta - \beta_0) + \tfrac{1}{2}(w - w_0)'\nabla^2 l_{ww}(\beta_0, w_0)(w - w_0) \\
&+ (\beta - \beta_0)'\nabla^2 l_{\beta w}(\beta_0, w_0)(w - w_0) + n\beta'\Sigma_{\lambda_1}(\beta_0)\beta + n w'\Sigma_{\lambda_2}(w_0)w,
\end{aligned}
\tag{2.35}
$$
where
$$
\begin{gathered}
\nabla l_\beta(\beta_0, w_0) = \frac{\partial l(\beta_0, w_0)}{\partial\beta}, \qquad \nabla l_w(\beta_0, w_0) = \frac{\partial l(\beta_0, w_0)}{\partial w}, \\
\nabla^2 l_{\beta\beta}(\beta_0, w_0) = \frac{\partial^2 l(\beta_0, w_0)}{\partial\beta\,\partial\beta'}, \qquad \nabla^2 l_{ww}(\beta_0, w_0) = \frac{\partial^2 l(\beta_0, w_0)}{\partial w\,\partial w'}, \qquad \nabla^2 l_{\beta w}(\beta_0, w_0) = \frac{\partial^2 l(\beta_0, w_0)}{\partial\beta\,\partial w'}, \\
\Sigma_{\lambda_1}(\beta_0) = \operatorname{diag}\left\{ \frac{p_{\lambda_1}'(|\beta_1^0|)}{|\beta_1^0|}, \ldots, \frac{p_{\lambda_1}'(|\beta_d^0|)}{|\beta_d^0|} \right\}, \qquad \Sigma_{\lambda_2}(w_0) = \operatorname{diag}\left\{ \frac{p_{\lambda_2}'(|w_1^0|)}{|w_1^0|}, \ldots, \frac{p_{\lambda_2}'(|w_m^0|)}{|w_m^0|} \right\}.
\end{gathered}
\tag{2.36}
$$
The solution can be found by iteratively solving the block matrix equation
$$ \begin{pmatrix} X_\tau'X_\tau + n\Sigma_{\lambda_1}(\beta_0) & X_\tau'B_\tau \\ B_\tau'X_\tau & B_\tau'B_\tau + n\xi K + n\Sigma_{\lambda_2}(w_0) \end{pmatrix} \begin{pmatrix} \beta \\ w \end{pmatrix} = \begin{pmatrix} X_\tau'y_\tau \\ B_\tau'y_\tau \end{pmatrix}. \tag{2.37} $$
This gives the estimators
$$ \hat\beta^{\mathrm{HT}}_{\mathrm{SGLSE}} = \bigl(\tilde X_\tau'\tilde X_\tau + n\Sigma_{\lambda_1}(\beta_0)\bigr)^{-1}\tilde X_\tau'\tilde y_\tau, \qquad \hat w^{\mathrm{HT}}_{\mathrm{SGLSE}} = \bigl(B_\tau'B_\tau + n\xi K + n\Sigma_{\lambda_2}(w_0)\bigr)^{-1}B_\tau'\bigl(y_\tau - X_\tau\hat\beta^{\mathrm{HT}}_{\mathrm{SGLSE}}\bigr), \tag{2.38} $$
where $\tilde y_\tau = (I - S_\tau)y_\tau$, $\tilde X_\tau = (I - S_\tau)X_\tau$, and $S_\tau = B_\tau\bigl(B_\tau'B_\tau + n\xi K + n\Sigma_{\lambda_2}(w_0)\bigr)^{-1}B_\tau'$.
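The iteration of Section 2.3 can be sketched as below: the derivative of the hard thresholding penalty (2.28), the diagonal local-quadratic-approximation weights of (2.36), and repeated solution of the block system (2.37) on the prewhitened data. The small constants `eps` and `zero_tol`, the starting point, and the convergence rule are numerical conveniences of this sketch rather than part of the authors' algorithm.

```python
import numpy as np

def hard_penalty_deriv(theta, lam):
    """Derivative of the hard thresholding penalty (2.28): p'_lam(|t|) = 2(lam - |t|)_+."""
    return 2.0 * np.maximum(lam - np.abs(theta), 0.0)

def lqa_weights(theta, lam, eps=1e-8):
    """Diagonal entries p'_lam(|theta_j|)/|theta_j| of Sigma_lambda in (2.36)."""
    return hard_penalty_deriv(theta, lam) / (np.abs(theta) + eps)

def penalized_sglse(y_t, X_t, B_t, K, xi, lam1, lam2, n_iter=50, tol=1e-6, zero_tol=1e-4):
    """Penalized weighted least squares via local quadratic approximation.

    Repeatedly solves the block system (2.37) on the prewhitened data,
    starting from the unpenalized fit.
    """
    n, d = X_t.shape
    m = B_t.shape[1]
    beta, w = np.zeros(d), np.zeros(m)
    for it in range(n_iter):
        if it == 0:
            S1, S2 = np.zeros((d, d)), np.zeros((m, m))       # first pass: unpenalized fit
        else:
            S1 = np.diag(n * lqa_weights(beta, lam1))
            S2 = np.diag(n * lqa_weights(w, lam2))
        top = np.hstack([X_t.T @ X_t + S1, X_t.T @ B_t])
        bottom = np.hstack([B_t.T @ X_t, B_t.T @ B_t + n * xi * K + S2])
        rhs = np.concatenate([X_t.T @ y_t, B_t.T @ y_t])
        theta = np.linalg.solve(np.vstack([top, bottom]), rhs)   # solve (2.37)
        beta_new, w_new = theta[:d], theta[d:]
        converged = np.max(np.abs(np.concatenate([beta_new - beta, w_new - w]))) < tol
        beta, w = beta_new, w_new
        if converged and it > 0:
            break
    beta[np.abs(beta) < zero_tol] = 0.0   # coefficients shrunk towards zero are deleted
    w[np.abs(w) < zero_tol] = 0.0
    return beta, w
```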
2.4. Information Criteria

Selecting suitable values for the penalty parameters and the number of basis functions is crucial to obtaining a good curve fit and variable selection. The estimate of $\theta$ minimizes the penalized sum of squares $L(\theta)$ given by (2.17). In this section, we express the model (2.15) as
$$ y_\tau = A_\tau\theta + e, \tag{2.39} $$
where $A_\tau = (X_\tau, B_\tau)$ and $\theta = (\beta', w')'$. In many applications, the number of basis functions $m$ needs to be large to adequately capture the trend. To determine the number of basis functions, all models with $m \le m_{\max}$ are fitted and the preferred model minimizes a model selection criterion. The Schwarz BIC is given by
$$ \mathrm{BIC} = n\log\bigl(2\pi\hat\sigma_e^2\bigr) + \log n \times (\text{the number of parameters}), \tag{2.40} $$
where $\hat\sigma_e^2$ is the least-squares estimate of $\sigma_e^2$ without a degrees-of-freedom correction. Hastie and Tibshirani [16] used the trace of the smoother matrix as an approximation to the effective number of parameters. By replacing the number of parameters in BIC by $\operatorname{tr} S_\theta$, we formally obtain an information criterion for the basis function Gaussian regression model in the form
$$ \mathrm{BIC}_m = n\log\bigl(2\pi\hat\sigma_e^2\bigr) + \operatorname{tr}(S_\theta)\log n, \tag{2.41} $$
where $\hat\sigma_e^2 = n^{-1}\|y - S_\theta y\|^2$ and
$$ S_\theta = A_\tau\bigl(A_\tau'A_\tau + n\xi K + n\Sigma_\lambda(\theta)\bigr)^{-1}A_\tau'. \tag{2.42} $$
Here $\Sigma_\lambda(\theta)$ is defined by (2.44) in what follows.
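A sketch of computing $\mathrm{BIC}_m$ from (2.41)–(2.42) for a fitted penalized model is given below. Embedding the roughness penalty $K$ in a block that acts only on $w$ is our reading of the notation (the roughness penalty does not involve $\beta$), and the function name is hypothetical; `Sigma_lambda` would be the diagonal local-quadratic-approximation penalty matrix evaluated at the fitted $\theta$.

```python
import numpy as np

def bic_m(y_t, X_t, B_t, K, xi, Sigma_lambda):
    """BIC_m of (2.41) with tr(S_theta) from (2.42) as the effective number of parameters."""
    n, d = X_t.shape
    m = B_t.shape[1]
    A = np.hstack([X_t, B_t])                              # A_tau = (X_tau, B_tau)
    K_full = np.zeros((d + m, d + m))
    K_full[d:, d:] = K                                     # roughness penalty acts on w only
    M = A.T @ A + n * xi * K_full + n * Sigma_lambda
    S_theta = A @ np.linalg.solve(M, A.T)                  # smoother matrix (2.42)
    resid = y_t - S_theta @ y_t
    sigma2 = np.mean(resid ** 2)                           # sigma_e^2 without df correction
    return n * np.log(2.0 * np.pi * sigma2) + np.trace(S_theta) * np.log(n)
```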
The relationship between TFR and FLP is of interest to policy makers, as a positive relationship implies that a rising FLP is associated with an increasing TFR. Usually, FLP covers all women aged 15 to 64. However, TFR is a combination of fertility rates for ages 15–49, so we use the FLP of women aged 15 to 49 instead of women aged 15 to 64. We take the TFR from 1968 to 2007 in Japan. The estimation is a semiparametric regression of $\log(\mathrm{TFR}_i)$ on $\log(\mathrm{FLP}_i)$. As the Equal Employment Act came into force in 1985, we use the interaction variables "dummy for 1968–1984 $\times$ $\log(\mathrm{FLP})$" ($x_{i2}$) and the corresponding variable for 1985–2007 ($x_{i3}$). We also use dummy variables for 1990–1999 and 2000–2007 ($x_{i4}$, $x_{i5}$) and consider the semiparametric model
$$ \log(\mathrm{TFR}_i) = \alpha(t_i) + \beta'x_i + \varepsilon_i, \quad i = 1, \ldots, 40. \tag{4.4} $$
We applied the existing procedure (PLS) and the proposed procedure (PLS-HT) with BIC$_p$. The resulting estimates and standard errors (SE) of $\beta$ are given in Table 11.

[Table 11: Estimated coefficients for Model (4.4), giving PLS estimates and SEs and PLS-HT estimates and SEs for $\log(\mathrm{FLP})$ interacted with the periods 1968–1984 and 1985–2007 and for the period dummy variables; the PLS estimates are $-0.32$, $-0.28$, $0.02$, $-0.04$, and $0.04$, and the PLS-HT estimates are $-0.36$, $-0.31$, $0$, $0$, and $0$.]

Therefore, we obtain the model
$$ y_i = \hat\alpha(t_i) - 0.27\,x_{i1} - 0.20\,x_{i2}, \quad i = 1, \ldots, 40. \tag{4.5} $$
The residual mean square of PLS-HT is $2.24 \times 10^{-2}$ and that of PLS is $2.47 \times 10^{-2}$. The selected number of basis functions is six, with one excluded basis function, and the spread parameter $s$ is estimated as 0.30. Table 11 shows that PLS-HT selects only $\log(\mathrm{FLP}_i)$ for 1968–1984 and for 1985–2007. This indicates a negative correlation between TFR and FLP over 1968–2007, especially for 1968–1984, which means that TFR decreases as FLP increases. We could not see the positive association in the 1980s which has been reported in recent studies; see, for example, Brewster and Rindfuss [27], Ahn and Mira [28], and Engelhardt et al. [29]. We plot the estimated trend curve, residuals, and autocorrelation functions in Figures 7 to 9.

[Figure 7: Plots of estimated curves; the solid line represents $y$, the dotted line the estimated curve of $\hat y$, and the dashed line the estimated curve of $\hat\alpha$.]
[Figure 8: Plot of residuals.]
[Figure 9: ACF plot of residuals.]

5. Concluding Remarks

In this article we have proposed variable and model selection procedures for semiparametric time series regression. We used basis functions to fit the trend component. An algorithm for the estimation procedure is provided and the asymptotic properties are investigated. From the numerical simulations, we have confirmed that the models determined by the proposed procedure are superior to those based on the existing procedure: they reduce the complexity of the models and give good fits by excluding unnecessary basis functions and nuisance explanatory variables.

The development here is limited to the case of Gaussian stationary errors, but it seems likely that our approach can be extended to the case of non-Gaussian, long-range dependent errors, along the lines explored in recent work by Aneiros-Pérez et al. [30]. However, efficient estimation of the regression parameter is an open question in the case of long-range dependence. This is a question we hope to address in future work. We also plan to explore whether the proposed techniques can be extended to cointegrating regression models within an autoregressive distributed lag framework.

Appendix: Proofs

In this appendix we give the proofs of the theorems in Section 2. We use $\|x\|$ to denote the Euclidean norm of $x$. Let $\tilde a_{\tau,n} = (\tilde a_{1,n}, \ldots, \tilde a_{\tau,n})'$ be the infeasible estimator of $a_\tau = (a_1, \ldots, a_\tau)'$ constructed by ordinary least squares from the true errors, that is, $\tilde a_{\tau,n} = (E_\tau'E_\tau)^{-1}E_\tau'\varepsilon$, where $\varepsilon = (\varepsilon_{\tau+1}, \ldots, \varepsilon_n)'$ and $E_\tau$ is the matrix of lagged errors with typical element $\varepsilon_{i-j}$. For ease of notation we set $\tilde a_{j,n} = 0$ for $j > \tau$. We write $\Gamma(k)$ for $\operatorname{cov}(\varepsilon_0, \varepsilon_k)$. Then we can construct the infeasible estimate $\tilde V$ using $\tilde a_{\tau,n}$ and $\Gamma(k)$, $k = 0, \ldots, \tau$. The following lemma states that the infeasible estimators $\tilde\beta$ and $\tilde w$ of Theorem 2.1 have asymptotically normal distributions.

Lemma A.1. Under the assumptions of Theorem 2.1,
$$ \sqrt{n}(\tilde\beta - \beta_0) \xrightarrow{D} N(0, \Sigma_1^{-1}), \tag{A.1} $$
$$ \sqrt{n}(\tilde w - w_0) \xrightarrow{D} N(0, \Sigma_2^{-1}), \tag{A.2} $$
where $\Sigma_1$ and $\Sigma_2$ are defined in Theorem 2.1.

Proof of Lemma A.1. Write $\tilde y = (I - S)y$ and $\tilde X = (I - S)X$, and put $\bar w = (B'B + n\xi K)^{-1}B'(y - X\beta)$, so that $B\bar w = S(y - X\beta)$ and
$$ y - X\beta - Bw = (\tilde y - \tilde X\beta) - B(w - \bar w). \tag{A.3} $$
Hence the criterion $L(\beta, w)$ in (2.17) decomposes as
$$ L(\beta, w) = (\tilde y - \tilde X\beta)'V^{-1}(\tilde y - \tilde X\beta) - 2(w - \bar w)'B'V^{-1}(\tilde y - \tilde X\beta) + (w - \bar w)'B'V^{-1}B(w - \bar w) + n\xi\,w'Kw \equiv I_1(\beta) + I_2(\beta, w) + I_3(w) + I_4(w). \tag{A.4} $$
First consider the asymptotic normality of $\tilde w$, using the model $y = X\beta_0 + Bw_0 + \varepsilon$. Setting $\partial L(\beta, w)/\partial w = 0$ and solving the resulting quadratic problem gives
$$ \tilde w - w_0 = (B'V^{-1}B + n\xi K)^{-1}B'V^{-1}(y - X\beta) + (B'V^{-1}B + n\xi K)^{-1}B'V^{-1}B(w_0 - \bar w) - n\xi(B'V^{-1}B + n\xi K)^{-1}Kw_0 \equiv A_1 + A_2 + A_3. \tag{A.7} $$
Using the expansion $(A + aB)^{-1} = A^{-1} - aA^{-1}BA^{-1} + O(a^2)$, the term $A_1$ equals $(n^{-1}B'V^{-1}B)^{-1}(n^{-1}B'V^{-1}\varepsilon) + O(\xi)$, while $\bar w - w_0 = O_p(n^{-1/2})$, so that if $\xi = O(n^\eta)$ with $\eta < -1/2$ the terms $A_2$ and $A_3$ are of smaller order. Therefore
$$ \tilde w - w_0 = \left(\frac{B'V^{-1}B}{n}\right)^{-1}\frac{B'V^{-1}\varepsilon}{n} + o_p\bigl(n^{-1/2}\bigr), \tag{A.13} $$
and the law of large numbers and the central limit theorem give $\sqrt{n}(\tilde w - w_0) \xrightarrow{D} N(0, \Sigma_2^{-1})$, which is (A.2). Next, setting $\partial L(\beta, w)/\partial\beta = 0$ yields
$$ \tilde\beta = \beta_0 + (X'V^{-1}X)^{-1}X'V^{-1}\varepsilon + (X'V^{-1}X)^{-1}X'V^{-1}B(\bar w - w), \tag{A.16} $$
and substituting $w_0$ for $w$ and using (A.13) and $\bar w - w_0 = O_p(n^{-1/2})$ gives $\tilde\beta = \beta_0 + (X'V^{-1}X)^{-1}X'V^{-1}\varepsilon + o_p(n^{-1/2})$. Again by the law of large numbers and the central limit theorem, $\sqrt{n}(\tilde\beta - \beta_0) \xrightarrow{D} N(0, \Sigma_1^{-1})$, which is (A.1). This completes the proof of the lemma.

Proof of Theorem 2.1. Let $\hat a_{\tau,n} = (\hat a_1, \ldots, \hat a_\tau)'$ be the ordinary least-squares estimate from model (2.18), with $\hat a_{j,n} = 0$ for $j > \tau$. Write the transformed innovation as
$$ \hat e_{i,n} = e_i + S_{i,n} + R_{i,n} + Q_{i,n}, \tag{A.19} $$
where
$$ S_{i,n} = \sum_{j=0}^{\infty} a_j\left\{ (\beta - \hat\beta)'x_{i-j} + (w - \hat w)'\phi_{i-j} \right\}, \quad R_{i,n} = \sum_{j=1}^{\tau} (\hat a_{j,n} - \tilde a_{j,n})\bigl(y_{i-j} - \beta'x_{i-j} - w'\phi_{i-j}\bigr), \quad Q_{i,n} = \sum_{j=1}^{\infty} (\tilde a_{j,n} - a_j)\bigl(y_{i-j} - \beta'x_{i-j} - w'\phi_{i-j}\bigr). $$
By assumptions (A.1) and (A.2), Lemma A.1, and the Cauchy–Schwarz inequality, $|S_{i,n}| = O_p(n^{-1/2})$. By An et al. [31], $\sum_{j=1}^{\tau}(\hat a_{j,n} - \tilde a_{j,n})^2 = o\bigl((\log n / n)^{1/2}\bigr)$ under the assumptions on $\tau(n)$, so that by the Cauchy–Schwarz inequality $|R_{i,n}| = o_p(1)$. By the extended Baxter inequality of Bühlmann [32], $\sum_j |\tilde a_{j,n} - a_j| \le C\sum_{j > \tau}|a_j|$; since $y_{i-j} - \beta'x_{i-j} - w'\phi_{i-j} = \varepsilon_{i-j}$ is a stationary and invertible process whose linear process coefficients satisfy the summability assumptions, $|Q_{i,n}| = o_p(1)$. From this decomposition,
$$ \|y_\tau - X_\tau\beta - B_\tau w\|^2 = \|y^* - X^*\beta - B^*w\|^2 + o_p(1). \tag{A.26} $$
Therefore, to prove the theorem it suffices to show that
$$ n^{-1}\bigl(X'\hat V_\tau^{-1}X - X'V_\tau^{-1}X\bigr) = O_p\bigl((\tau/n)^{1/2}\bigr), \qquad n^{-1}\bigl(B'\hat V_\tau^{-1}B - B'V_\tau^{-1}B\bigr) = O_p\bigl((\tau/n)^{1/2}\bigr), $$
$$ n^{-1/2}X'\bigl(\hat V_\tau^{-1} - V_\tau^{-1}\bigr)\varepsilon = O_p\bigl((\tau/n)^{1/2}\bigr), \qquad n^{-1/2}B'\bigl(\hat V_\tau^{-1} - V_\tau^{-1}\bigr)\varepsilon = O_p\bigl((\tau/n)^{1/2}\bigr). \tag{A.27} $$
These rates follow from (A.26) and the fact that $\|\hat a_\tau - a_\tau\| = O_p\bigl((\tau/n)^{1/2}\bigr)$ (see Xiao et al. [22]), by examining the elements of the transformed data, for example $\hat e_i = e_i + \sum_{j=1}^{\tau}(a_j - \hat a_j)\varepsilon_{i-j}$ and $\hat X_{\tau,ij} = X_{\tau,ij} + \sum_{j=1}^{\tau}(a_j - \hat a_j)X_{i-j,i}$ for $i = \tau+1, \ldots, n$, with similar expressions for $B_\tau$ and for $i = 1, \ldots, \tau$. Using the expansion $(A + aB)^{-1} = A^{-1} - aA^{-1}BA^{-1} + O(a^2)$ together with (A.13) and (A.16), we obtain
$$ \sqrt{n}\bigl(\hat\beta_{\mathrm{SGLSE}} - \beta_0\bigr) = \left(\frac{X'V^{-1}X}{n}\right)^{-1}\frac{X'V^{-1}\varepsilon}{\sqrt{n}} + O_p\bigl((\tau/n)^{1/2}\bigr) = \sqrt{n}\bigl(\tilde\beta - \beta_0\bigr) + O_p\bigl((\tau/n)^{1/2}\bigr), $$
$$ \sqrt{n}\bigl(\hat w_{\mathrm{SGLSE}} - w_0\bigr) = \left(\frac{B'V^{-1}B}{n}\right)^{-1}\frac{B'V^{-1}\varepsilon}{\sqrt{n}} + O_p\bigl((\tau/n)^{1/2}\bigr) = \sqrt{n}\bigl(\tilde w - w_0\bigr) + O_p\bigl((\tau/n)^{1/2}\bigr). \tag{A.30} $$
This completes the proof of Theorem 2.1.

Proof of Theorem 2.2. Write $\alpha_n = n^{-1/2}(1 + a_n)$. It suffices to show that, for any given $\zeta > 0$, there exists a large constant $C$ such that
$$ P\left\{ \inf_{\|u\| = C} S(\theta_0 + \alpha_n u) > S(\theta_0) \right\} \ge 1 - \zeta. \tag{A.31} $$
This implies that, with probability at least $1 - \zeta$, there exists a local minimizer in the ball $\{\theta_0 + \alpha_n u : \|u\| \le C\}$. Define $D_n(u) = S(\theta_0 + \alpha_n u) - S(\theta_0)$. Since $p_{\lambda_{jn}}(0) = 0$ and $p_{\lambda_{jn}}(|\theta_j|)$ is nonnegative,
$$ D_n(u) \ge n^{-1}\bigl\{ l(\theta_0 + \alpha_n u) - l(\theta_0) \bigr\} + \sum_j \bigl\{ p_{\lambda_{jn}}(|\theta_{j0} + \alpha_n u_j|) - p_{\lambda_{jn}}(|\theta_{j0}|) \bigr\} + \xi\bigl\{ (\theta_0 + \alpha_n u)'K(\theta_0 + \alpha_n u) - \theta_0'K\theta_0 \bigr\}, \tag{A.33} $$
where $l(\theta)$ is the first term of (2.7). Expanding the loss term quadratically, its components involve $n^{-1}B'V^{-1}B \to \Sigma_2$, $n^{-1}X'V^{-1}X \to \Sigma_1$, $n^{-1}B'V^{-1}\varepsilon$, and $n^{-1}X'V^{-1}\varepsilon$, which are finite in probability, so the quadratic terms are of order $O_p(C^2\alpha_n^2)$ and the linear terms of order $O_p(Cn^{-1/2}\alpha_n)$. By a Taylor expansion and the Cauchy–Schwarz inequality, the penalty differences are bounded by $\sqrt{m}\,\alpha_{1n}a_{1n}\|u\| + \alpha_{1n}^2 b_{1n}\|u\|^2$ and $\sqrt{d}\,\alpha_{2n}a_{2n}\|u\| + \alpha_{2n}^2 b_{2n}\|u\|^2$. As $b_n \to 0$, the quadratic term of the loss dominates all the other terms when $C$ is taken sufficiently large, so (A.31) holds. This completes the proof of the theorem.

Lemma A.2. Under the conditions of Theorem 2.3, with probability tending to 1, for any given $\beta_1$ and $w_1$ satisfying $\|\beta_1 - \beta_{10}\| = O_p(n^{-1/2})$ and $\|w_1 - w_{10}\| = O_p(n^{-1/2})$, and any constants $C_1$ and $C_2$,
$$ S\bigl\{(\beta_1', 0')', (w_1', 0')'\bigr\} = \min_{\|\beta_2\| \le C_1 n^{-1/2},\ \|w_2\| \le C_2 n^{-1/2}} S\bigl\{(\beta_1', \beta_2')', (w_1', w_2')'\bigr\}. \tag{A.38} $$

Proof. We show that, with probability tending to one as $n \to \infty$, for any $\beta_1$ and $w_1$ as above, $\partial S(\beta, w)/\partial\beta_j$ and $\beta_j$ have the same sign for $\beta_j \in (-C_1 n^{-1/2}, C_1 n^{-1/2})$, $j = S_1 + 1, \ldots, d$, and $\partial S(\beta, w)/\partial w_j$ and $w_j$ have the same sign for $w_j \in (-C_2 n^{-1/2}, C_2 n^{-1/2})$, $j = S_2 + 1, \ldots, m$; the minimum is then attained at $\beta_2 = 0$ and $w_2 = 0$. For $\beta_j \neq 0$ and $j = S_1 + 1, \ldots, d$,
$$ \frac{\partial S(\beta)}{\partial\beta_j} = l_j'(\beta) + n\,p_{\lambda_{j1n}}'(|\beta_j|)\operatorname{sgn}(\beta_j), \tag{A.39} $$
where $l_j'(\beta) = \partial l(\beta)/\partial\beta_j$. By the proof of Theorem 2.1, $l_j'(\beta) = -n\{\xi_{2j} - (\beta - \beta_0)'\Sigma_{1j}\} + O_p(n^{1/2})$, where $\xi_{2j}$ is the $j$th component of $\xi_{2n}$ and $\Sigma_{1j}$ is the $j$th column of $\Sigma_1$. Since $\beta - \beta_0 = O_p(n^{-1/2})$, $n^{-1}l_j'(\beta)$ is of order $O_p(n^{-1/2})$. Therefore
$$ \frac{\partial S(\beta)}{\partial\beta_j} = n\lambda_{j1n}\left\{ \lambda_{j1n}^{-1}p_{\lambda_{j1n}}'(|\beta_j|)\operatorname{sgn}(\beta_j) + O_p\bigl(n^{-1/2}/\lambda_{1n}\bigr) \right\}. \tag{A.41} $$
Because $\liminf_{n\to\infty}\liminf_{\beta_j\to 0+}\lambda_{j1n}^{-1}p_{\lambda_{j1n}}'(|\beta_j|) > 0$ and $n^{-1/2}/\lambda_{j1n} \to 0$, the sign of the derivative is completely determined by that of $\beta_j$. The same argument applies to $w_j$ for $j = S_2 + 1, \ldots, m$, where the derivative contains the additional term $2n\xi(Kw)_j$, which is of smaller order. This completes the proof.

Proof of Theorem 2.3. Part (a) follows directly from Lemma A.2. We now prove part (b). Using an argument similar to the proof of Theorem 2.2, it can be shown that there exist root-$n$ consistent local minimizers $\hat w_1$ and $\hat\beta_1$ of $S\{(w_1', 0')'\}$ and $S\{(\beta_1', 0')'\}$ satisfying the penalized least-squares equations
$$ \frac{\partial S\bigl\{(\hat w_1', 0')'\bigr\}}{\partial w_1} = 0, \qquad \frac{\partial S\bigl\{(\hat\beta_1', 0')'\bigr\}}{\partial\beta_1} = 0. \tag{A.45} $$
Following the proof of Theorem 2.1, these equations can be expanded around the true values $w_{10}$ and $\beta_{10}$, with $\bar\xi_1$ and $\bar\xi_2$ consisting of the first $S_2$ and $S_1$ components of $\xi_{1n}$ and $\xi_{2n}$, and $\bar\Sigma_1$ and $\bar\Sigma_2$ consisting of the corresponding leading rows and columns of $\Sigma_1$ and $\Sigma_2$. By Slutsky's theorem it then follows that
$$ \sqrt{n}\bigl(I_{S_1} + \Sigma_{\lambda_1}(\beta)\bigr)\left\{ (\hat\beta_1 - \beta_{10}) + \bigl(I_{S_1} + \Sigma_{\lambda_1}(\beta)\bigr)^{-1}b_\beta \right\} \xrightarrow{D} N_{S_1}\bigl(0, \Sigma_{11}^{-1}\bigr), $$
$$ \sqrt{n}\bigl(I_{S_2} + \Sigma_{\lambda_2}(w) + \xi K\bigr)\left\{ (\hat w_1 - w_{10}) + \bigl(I_{S_2} + \Sigma_{\lambda_2}(w) + \xi K\bigr)^{-1}b_w \right\} \xrightarrow{D} N_{S_2}\bigl(0, \Sigma_{21}^{-1}\bigr). \tag{A.47} $$
This completes the proof of Theorem 2.3.

Acknowledgments

The authors are grateful to two anonymous referees whose probing questions have led to a substantial improvement of the paper. This research was supported by the Norinchukin Bank and the Nochu Information System endowed chair of Financial Engineering in the Department of Management Science, Tokyo University of Science.

References

[1] N. S. Altman, "Kernel smoothing of data with correlated errors," Journal of the American Statistical Association, vol. 85, pp. 749–759, 1990.
[2] J. D. Hart, "Kernel regression estimation with time series errors," Journal of the Royal Statistical Society: Series B, vol. 53, pp. 173–188, 1991.
[3] E. Herrmann, T. Gasser, and A. Kneip, "Choice of bandwidth for kernel regression when residuals are correlated," Biometrika, vol. 79, pp. 783–795, 1992.
[4] P. Hall and J. D. Hart, "Nonparametric regression with long-range dependence," Stochastic Processes and Their Applications, vol. 36, pp. 339–351, 1990.
[5] B. K. Ray and R. S. Tsay, "Bandwidth selection for kernel regression with long-range dependence," Biometrika, vol. 84, pp. 791–802, 1997.
[6] J. Beran and Y. Feng, "Local polynomial fitting with long-memory, short-memory and antipersistent errors," Annals of the Institute of Statistical Mathematics, vol. 54, no. 2, pp. 291–311, 2002.
[7] J. Fan and I. Gijbels, Local Polynomial Modeling and Its Applications, Chapman and Hall, London, UK, 1996.
[8] J. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, NY, USA, 2005.
[9] J. T. Gao, "Asymptotic theory for partially linear models," Communications in Statistics: Theory and Methods, vol. 22, pp. 3327–3354, 1995.
[10] J. You and G. Chen, "Semiparametric generalized least squares estimation in partially linear regression models with correlated errors," Journal of Statistical Planning and Inference, vol. 137, no. 1, pp. 117–132, 2007.
[11] G. Aneiros-Pérez and J. M. Vilar-Fernández, "Local polynomial estimation in partial linear regression models under dependence," Journal of Statistical Planning and Inference, vol. 52, pp. 2757–2777, 2008.
[12] S. Konishi and G. Kitagawa, Information Criteria and Statistical Modeling, Springer, New York, NY, USA, 2008.
[13] L. Breiman, "Heuristics of instability and stabilization in model selection," The Annals of Statistics, vol. 24, no. 6, pp. 2350–2383, 1996.
[14] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
[15] S. Konishi and G. Kitagawa, "Generalised information criteria in model selection," Biometrika, vol. 83, no. 4, pp. 875–890, 1996.
[16] T. J. Hastie and R. Tibshirani, Generalized Additive Models, Chapman and Hall, London, UK, 1990.
[17] S. Konishi, T. Ando, and S. Imoto, "Bayesian information criteria and smoothing parameter selection in radial basis function networks," Biometrika, vol. 91, no. 1, pp. 27–43, 2004.
[18] Y. Yu and D. Ruppert, "Penalized spline estimation for partially linear single-index models," Journal of the American Statistical Association, vol. 97, no. 460, pp. 1042–1054, 2002.
[19] P. J. Green and B. W. Silverman, Nonparametric Regression and Generalized Linear Models, Chapman and Hall, London, UK, 1994.
[20] J. Green, "Penalized likelihood for generalized semi-parametric regression models," International Statistical Review, vol. 55, pp. 245–259, 1987.
[21] P. Speckman, "Kernel smoothing in partial linear models," Journal of the Royal Statistical Society: Series B, vol. 50, pp. 413–436, 1988.
[22] Z. Xiao, O. B. Linton, R. J. Carroll, and E. Mammen, "More efficient local polynomial estimation in nonparametric regression with autocorrelated errors," Journal of the American Statistical Association, vol. 98, no. 464, pp. 980–992, 2003.
[23] J. Fan and R. Li, "New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis," Journal of the American Statistical Association, vol. 99, no. 467, pp. 710–723, 2004.
[24] R. Tibshirani, "Regression shrinkage and selection via the LASSO," Journal of the Royal Statistical Society: Series B, vol. 58, pp. 267–288, 1996.
[25] R. Tibshirani, "The LASSO method for variable selection in the Cox model," Statistics in Medicine, vol. 16, no. 4, pp. 385–395, 1997.
[26] W. A. Fuller, Introduction to Statistical Time Series, John Wiley & Sons, New York, NY, USA, 2nd edition, 1996.
[27] K. L. Brewster and R. R. Rindfuss, "Fertility and women's employment in industrialized countries," Annual Review of Sociology, vol. 26, pp. 271–286, 2000.
[28] N. Ahn and P. Mira, "A note on the changing relationship between fertility and female employment rates in developed countries," Journal of Population Economics, vol. 15, no. 4, pp. 667–682, 2002.
[29] H. Engelhardt, T. Kögel, and A. Prskawetz, "Fertility and women's employment reconsidered: a macro-level time-series analysis for developed countries, 1960–2000," Population Studies, vol. 58, no. 1, pp. 109–120, 2004.
[30] G. Aneiros-Pérez, W. González-Manteiga, and P. Vieu, "Estimation and testing in a partial linear regression model under long-memory dependence," Bernoulli, vol. 10, no. 1, pp. 49–78, 2004.
[31] H. Z. An, Z. G. Chen, and E. J. Hannan, "Autocorrelation, autoregression and autoregressive approximation," The Annals of Statistics, vol. 10, pp. 926–936, 1982.
[32] P. Bühlmann, "Moving-average representation of autoregressive approximations," Stochastic Processes and Their Applications, vol. 60, no. 2, pp. 331–342, 1995.