proc reg data=one outest=parm3;
   model y = x1 x2;
   by by;
run;

The 100 estimations of the coefficient on variable x1 are then summarized for each of the three error distributions by using PROC UNIVARIATE, as follows:

proc univariate data=parm1;
   var x1;
run;

The following table summarizes the results from the estimations. The true value for the coefficient on x1 is 1.0.

                        Normal                Chi-Squared               Cauchy
   Estimation
   Method         Mean   Std Deviation   Mean   Std Deviation   Mean   Std Deviation
   GME           0.418           0.117  0.626           0.330  0.818            3.36
   GME-NM        0.878           0.116  0.948           0.427  3.03            13.62
   OLS           0.973           0.142  1.023           0.467  5.54            26.83

For normally distributed or nearly normally distributed data, moment-constrained maximum entropy is a good choice. For distributions not well described by a normal distribution, data-constrained maximum entropy is a good choice.

Example 12.2: Unreplicated Factorial Experiments

Factorial experiments are useful for studying the effects of various factors on a response. For a practitioner restricted to OLS regression, replication is required in order to estimate all of the possible main and interaction effects in a factorial experiment and still perform inference on them. Using OLS regression to analyze unreplicated experimental data results in zero degrees of freedom for error in the ANOVA table, because there are as many parameters as observations. This situation leaves the experimenter unable to compute confidence intervals or perform hypothesis tests on the parameter estimates.

Several options are available when replication is impossible. The higher-order interactions can be assumed to have negligible effects, and their degrees of freedom can be pooled to create the error degrees of freedom used to perform inference on the lower-order estimates. Alternatively, if a preliminary experiment is being run, a normal probability plot of all effects can provide insight into which effects are significant and therefore worth focusing on in a later, more complete experiment.

The following example illustrates the probability plot methodology and the PROC ENTROPY alternative. Consider a 2^4 factorial model with no replication. The data are taken from Myers and Montgomery (1995).

data rate;
   do a=-1,1;
      do b=-1,1;
         do c=-1,1;
            do d=-1,1;
               input y @@;
               ab=a*b; ac=a*c; ad=a*d;
               bc=b*c; bd=b*d; cd=c*d;
               abc=a*b*c; abd=a*b*d; acd=a*c*d; bcd=b*c*d;
               abcd=a*b*c*d;
               output;
            end;
         end;
      end;
   end;
datalines;
45 71 48 65 68 60 80 65
43 100 45 104 75 86 70 96
;
run;

Analyze the data by using PROC REG, then output the resulting estimates.
proc reg data=rate outest=regout;
   model y=a b c d ab ac ad bc bd cd abc abd acd bcd abcd;
run;

proc transpose data=regout out=ploteff name=effect prefix=est;
   var a b c d ab ac ad bc bd cd abc abd acd bcd abcd;
run;

Now the normal scores for the estimates can be computed with the RANK procedure as follows:

proc rank data=ploteff normal=blom out=qqplot;
   var est1;
   ranks normalq;
run;

To create the probability plot, simply plot the estimates versus their normal scores by using PROC SGPLOT as follows:

title "Unreplicated Factorial Experiments";
proc sgplot data=qqplot;
   scatter x=est1 y=normalq / markerchar=effect markercharattrs=(size=10pt);
   xaxis label="Estimate";
   yaxis label="Normal Quantile";
run;

Output 12.2.1 Normal Probability Plot of Effects

The plot shown in Output 12.2.1 displays evidence that the a, b, d, ad, and bd estimates do not fit into the purely random normal model, which suggests that they may have a significant effect on the response variable. To verify this, fit a reduced model that contains only these effects.

proc reg data=rate;
   model y=a b d ad bd;
run;

The estimates for the reduced model are shown in Output 12.2.2.

Output 12.2.2 Reduced Model OLS Estimates

                   Unreplicated Factorial Experiments

                          The REG Procedure
                           Model: MODEL1
                       Dependent Variable: y

                        Parameter Estimates

                        Parameter      Standard
   Variable     DF       Estimate         Error    t Value    Pr > |t|
   Intercept     1       70.06250       1.10432      63.44      <.0001
   a             1        7.31250       1.10432       6.62      <.0001
   b             1        4.93750       1.10432       4.47      0.0012
   d             1       10.81250       1.10432       9.79      <.0001
   ad            1        8.31250       1.10432       7.53      <.0001
   bd            1       -9.06250       1.10432      -8.21      <.0001

These results support the probability plot methodology. PROC ENTROPY can directly estimate the full model without having to rely on the probability plot for insight into which effects might be significant. To illustrate this, PROC ENTROPY is run by using default parameter and error supports in the following statements:

proc entropy data=rate;
   model y=a b c d ab ac ad bc bd cd abc abd acd bcd abcd;
run;

The resulting GME-NM estimates are shown in Output 12.2.3. Note that the parameter estimates associated with the a, b, d, ad, and bd effects are all significant.

Output 12.2.3 Full Model Entropy Results

                   Unreplicated Factorial Experiments

                         The ENTROPY Procedure

                       GME-NM Variable Estimates

                                  Approx                 Approx
   Variable      Estimate        Std Err    t Value    Pr > |t|
   a             5.688414         0.7911       7.19      <.0001
   b             2.988032         0.5464       5.47      <.0001
   c             0.234331         0.1379       1.70      0.1086
   d             9.627308         0.9765       9.86      <.0001
   ab            -0.01386         0.0270      -0.51      0.6149
   ac            -0.00054        0.00325      -0.16      0.8712
   ad            6.833076         0.8627       7.92      <.0001
   bc            0.113908         0.0941       1.21      0.2435
   bd            -7.68105         0.9053      -8.48      <.0001
   cd             0.00002       0.000364       0.05      0.9569
   abc           -0.14876         0.1087      -1.37      0.1900
   abd            -0.0399         0.0516      -0.77      0.4509
   acd           0.466938         0.1961       2.38      0.0300
   bcd           0.059581         0.0654       0.91      0.3756
   abcd          0.024785         0.0387       0.64      0.5312
   Intercept     69.87294         1.1403      61.28      <.0001

Example 12.3: Censored Data Models in PROC ENTROPY

Data available to an analyst might sometimes be censored, where only part of the actual series is observed. Consider the case in which only observations greater than some lower bound are recorded, as defined by the following process:

   y = max(Xβ + ε, lb)

Running ordinary least squares estimation on data generated by the preceding process is not optimal because the estimates are likely to be biased and inefficient. One alternative for estimating models with censored data is the tobit estimator.
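For reference, a tobit estimator for a model that is censored below at lb maximizes a log-likelihood of the following standard textbook form, assuming normally distributed errors with standard deviation σ (this expression is given here only as background and is not taken from the ENTROPY documentation):

\log L(\beta,\sigma)
   = \sum_{i:\,y_i > lb}\Bigl[\log\phi\bigl((y_i - x_i'\beta)/\sigma\bigr) - \log\sigma\Bigr]
   + \sum_{i:\,y_i = lb}\log\Phi\bigl((lb - x_i'\beta)/\sigma\bigr)

where φ and Φ denote the standard normal density and cumulative distribution functions.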
This model is supported in the QLIM procedure in SAS/ETS and in the LIFEREG procedure in SAS/STAT. PROC ENTROPY provides another alternative that makes it easy to estimate such a model correctly.

The following DATA step generates censored data in which any negative values of the dependent variable, y, are set to a lower bound of 0.

data cens;
   do t = 1 to 100;
      x1 = 5 * ranuni(456);
      x2 = 10 * ranuni(456);
      y = 4.5 * x1 + 2 * x2 + 15 * rannor(456);
      if( y<0 ) then y = 0;
      output;
   end;
run;

To illustrate the effect of the CENSORED option in PROC ENTROPY, the model is initially estimated without accounting for censoring in the following statements:

title "Censored Data Estimation";
proc entropy data = cens gme primal;
   priors intercept -32 32
          x1 -15 15
          x2 -15 15;
   model y = x1 x2 / esupports = (-25 1 25);
run;

Output 12.3.1 GME Estimates

                      Censored Data Estimation

                        The ENTROPY Procedure

                        GME Variable Estimates

                                  Approx                  Approx
   Variable      Estimate        Std Err     t Value    Pr > |t|
   x1            2.377609       0.000503     4725.98      <.0001
   x2            2.353014       0.000255     9244.87      <.0001
   intercept     5.478121        0.00188     2906.41      <.0001

The previous model is reestimated by using the CENSORED option in the following statements:

proc entropy data = cens gme primal;
   priors intercept -32 32
          x1 -15 15
          x2 -15 15;
   model y = x1 x2 / esupports = (-25 1 25)
                     censored(lb = 0, esupports=(-15 1 15) );
run;

Output 12.3.2 Entropy Estimates

                      Censored Data Estimation

                        The ENTROPY Procedure

                        GME Variable Estimates

                                  Approx                 Approx
   Variable      Estimate        Std Err    t Value    Pr > |t|
   x1            4.429697        0.00690     641.85      <.0001
   x2             1.46858        0.00349     420.61      <.0001
   intercept     8.261412         0.0259     319.51      <.0001

The second set of entropy estimates is much closer to the true parameter values of 4.5 and 2.

Since the tobit model is another alternative for fitting censored data, PROC QLIM is used in the following statements to fit a tobit model to the same data:

proc qlim data=cens;
   model y = x1 x2;
   endogenous y ~ censored(lb=0);
run;

Output 12.3.3 QLIM Estimates

                      Censored Data Estimation

                         The QLIM Procedure

                        Parameter Estimates

                                    Standard                  Approx
   Parameter    DF      Estimate       Error     t Value    Pr > |t|
   Intercept     1      2.979455    3.824252        0.78      0.4359
   x1            1      4.882284    1.019913        4.79      <.0001
   x2            1      1.374006    0.513000        2.68      0.0074
   _Sigma        1     13.723213    1.032911       13.29      <.0001

For these data and this code, PROC ENTROPY produces estimates that are closer to the true parameter values than those computed by PROC QLIM.

Example 12.4: Use of the PDATA= Option

It is sometimes useful to specify priors and supports by using the PDATA= option. This example illustrates how to create a PDATA= data set that contains the priors and support points for use in a subsequent PROC ENTROPY step. In order to have a model to estimate in PROC ENTROPY, you must first have data to analyze. The following DATA step generates the data used in this analysis:

title "Using a PDATA= data set";
data a;
   array x[4];
   do t = 1 to 100;
      ys = -5;
      do k = 1 to 4;
         x[k] = rannor( 55372 );
         ys = ys + x[k] * k;
      end;
      ys = ys + rannor( 55372 );
      output;
   end;
run;

Next you fit these data with some arbitrary parameter support points and priors by using the following PROC ENTROPY statements:

proc entropy data = a gme primal;
   priors x1 -10(2) 30(1)
          x2 -20(3) 30(2)
          x3 -15(4) 30(4)
          x4 -25(3) 30(2)
          intercept -13(4) 30(2);
   model ys = x1 x2 x3 x4 / esupports=(-25 0 25);
run;

These statements produce the output shown in Output 12.4.1.
Output 12.4.1 Output From PROC ENTROPY

                       Using a PDATA= data set

                        The ENTROPY Procedure

                        GME Variable Estimates

                                  Approx                 Approx
   Variable      Estimate        Std Err    t Value    Pr > |t|
   x1            1.195688         0.1078      11.09      <.0001
   x2            1.844903         0.1018      18.12      <.0001
   x3            3.268396         0.1136      28.77      <.0001
   x4            3.908194         0.0934      41.83      <.0001
   intercept     -4.94319         0.1005     -49.21      <.0001

You can estimate the same model by first creating a PDATA= data set, which includes the same information as the PRIORS statement in the preceding PROC ENTROPY step. A data set that defines the supports and priors for the model parameters is shown in the following statements:

data test;
   length Variable $ 12 Equation $ 12;
   input Variable $ Equation $ Nsupport Support Prior;
datalines;
Intercept . 2 -13 0.66667
Intercept . 2  30 0.33333
x1        . 2 -10 0.66667
x1        . 2  30 0.33333
x2        . 2 -20 0.60000
x2        . 2  30 0.40000
x3        . 2 -15 0.50000
x3        . 2  30 0.50000
x4        . 2 -25 0.60000
x4        . 2  30 0.40000
;

Note that the prior weights given in parentheses in the PRIORS statement appear here normalized to sum to 1 for each parameter; for example, the weights 2 and 1 specified for x1 become the priors 0.66667 and 0.33333.

The following statements reestimate the model by using these support points:

proc entropy data=a gme primal pdata=test;
   model ys = x1 x2 x3 x4 / esupports=(-25 0 25);
run;

These statements produce the output shown in Output 12.4.2.

Output 12.4.2 Output From PROC ENTROPY with PDATA= Option

                       Using a PDATA= data set

                        The ENTROPY Procedure

                        GME Variable Estimates

                                  Approx                 Approx
   Variable      Estimate        Std Err    t Value    Pr > |t|
   x1            1.195686         0.1078      11.09      <.0001
   x2            1.844902         0.1018      18.12      <.0001
   x3            3.268395         0.1136      28.77      <.0001
   x4            3.908194         0.0934      41.83      <.0001
   Intercept     -4.94319         0.1005     -49.21      <.0001

These results are essentially identical to the ones produced by the previous PROC ENTROPY step.

Example 12.5: Illustration of ODS Graphics

This example illustrates how to use ODS Graphics in the ENTROPY procedure. It is a continuation of the example in the section “Simple Regression Analysis” on page 662. Graphical displays are requested by specifying the ODS GRAPHICS statement. For information about the graphics available in the ENTROPY procedure, see the section “ODS Graphics” on page 710.

The following statements show how to generate ODS Graphics plots with the ENTROPY procedure. The plots are displayed in Output 12.5.1.

proc entropy data=coleman;
   model test_score = teach_sal prcnt_prof socio_stat teach_score mom_ed;
run;

Output 12.5.1 Model Diagnostics Plots
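The ODS GRAPHICS statement itself is not shown in the preceding step. A minimal sketch of a complete submission, assuming ODS Graphics is not already enabled in the SAS session and that the coleman data set from the earlier section is available, might look like the following:

/* Enable ODS Graphics so that the procedure produces its diagnostic plots */
ods graphics on;

proc entropy data=coleman;
   model test_score = teach_sal prcnt_prof socio_stat teach_score mom_ed;
run;

/* Turn ODS Graphics back off when the plots are no longer needed */
ods graphics off;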