512 ✦ Chapter 9: The COMPUTAB Procedure

title3 'Based on Forecasted Unit Sales';
title4 'All Values Shown';
options linesize=96;
proc computab data=outcome cwidth=12;
   %whatif(mktshr=.02 .07 .15 .25,price=38.00,
           ucost=20.00,taxrate=.48,numshar=15000,overhead=5000);
   %show(mktshr tunits units sales cost ovhd gprof tax pat earn);
run;

Output 9.6.1 PROC COMPUTAB Report That Uses Macro Invocations

Fleet Footwear, Inc.
Marketing Analysis Income Statement
Based on Forecasted Unit Sales
All Values Shown

                                                                           Calculated
                            March         June    September     December        Total
Market Share                 0.02         0.07         0.15         0.25         0.12
Market Forecast         23,663.94    24,169.61    24,675.27    25,180.93    97,689.75
Items Sold                 473.28     1,691.87     3,701.29     6,295.23    12,161.67
Sales                  $17,984.60   $64,291.15  $140,649.03  $239,218.83  $462,143.61
Cost of Goods           $9,465.58   $33,837.45   $74,025.80  $125,904.65  $243,233.48
Overhead                $5,000.00    $7,315.33   $11,133.22   $16,061.71   $39,510.26
Gross Profit            $3,519.02   $23,138.38   $55,490.00   $97,252.47  $179,399.87
Tax                     $1,689.13   $11,106.42   $26,635.20   $46,681.19   $86,111.94
Profit After Tax        $1,829.89   $12,031.96   $28,854.80   $50,571.28   $93,287.93
Earnings per Share          $0.12        $0.80        $1.92        $3.37        $6.22

The following statements produce a similar report for different values of market share and unit costs. The report in Output 9.6.2 displays the values for the market share, market forecast, sales, after-tax profit, and earnings per share.

title3 'Revised';
title4 'Selected Values Shown';
options linesize=96;
proc computab data=outcome cwidth=12;
   %whatif(mktshr=.01 .06 .12 .20,price=38.00,
           ucost=23.00,taxrate=.48,numshar=15000,overhead=5000);
   %show(mktshr tunits sales pat earn);
run;

Output 9.6.2 Report That Uses Macro Invocations for Selected Values

Fleet Footwear, Inc.
Marketing Analysis Income Statement
Revised
Selected Values Shown

                                                                           Calculated
                            March         June    September     December        Total
Market Share                 0.01         0.06         0.12         0.20         0.10
Market Forecast         23,663.94    24,169.61    24,675.27    25,180.93    97,689.75
Sales                   $8,992.30   $55,106.70  $112,519.22  $191,375.06  $367,993.28
Profit After Tax         $-754.21    $7,512.40   $17,804.35   $31,940.30   $56,502.84
Earnings per Share         $-0.05        $0.50        $1.19        $2.13        $3.77

Example 9.7: Cash Flows

The COMPUTAB procedure can be used to model cash flows from one time period to the next. The RETAIN statement is useful for enabling a row or column to contribute one of its values to its successor. Financial functions such as IRR (internal rate of return) and NPV (net present value) can be used on PROC COMPUTAB table values to provide a more comprehensive report. The following statements produce Output 9.7.1:

data cashflow;
   input date date9. netinc depr borrow invest tax div adv ;
   datalines;
30MAR1982 65 42 32 126 43 51 41
30JUN1982 68 47 32 144 45 54 46
30SEP1982 70 49 30 148 46 55 47
30DEC1982 73 49 30 148 48 55 47
;

title1 'Blue Sky Endeavors';
title2 'Financial Summary';
title4 '(Dollar Figures in Thousands)';

proc computab data=cashflow;
   cols qtr1 qtr2 qtr3 qtr4 / 'Quarter' f=7.1;
   col qtr1 / 'One';
   col qtr2 / 'Two';
   col qtr3 / 'Three';
   col qtr4 / 'Four';
   row begcash / 'Beginning Cash';
   row netinc  / 'Income' ' Net income';
   row depr    / 'Depreciation';
   row borrow;
   row subtot1 / 'Subtotal';
   row invest  / 'Expenditures' ' Investment';
   row tax     / 'Taxes';
   row div     / 'Dividend';
   row adv     / 'Advertising';
   row subtot2 / 'Subtotal';
   row cashflow/ skip;
   row irret   / 'Internal Rate' 'of Return' zero=' ';
   rows depr borrow subtot1 tax div adv subtot2 / +3;
   retain cashin -5;
   _col_ = qtr( date );
   rowblock:
      subtot1 = netinc + depr + borrow;
      subtot2 = tax + div + adv;
      begcash = cashin;
      cashflow = begcash + subtot1 - subtot2;
      irret = cashflow;
      cashin = cashflow;
   colblock:
      if begcash then cashin = qtr1;
      if irret then do;
         temp =
            irr( 4, cashin, qtr1, qtr2, qtr3, qtr4 );
         qtr1 = temp;
         qtr2 = 0; qtr3 = 0; qtr4 = 0;
      end;
run;

Output 9.7.1 Report That Uses a RETAIN Statement and the IRR Financial Function

Blue Sky Endeavors
Financial Summary
(Dollar Figures in Thousands)

                      Quarter  Quarter  Quarter  Quarter
                          One      Two    Three     Four
Beginning Cash           -5.0     -1.0      1.0      2.0
Income
 Net income              65.0     68.0     70.0     73.0
   Depreciation          42.0     47.0     49.0     49.0
   BORROW                32.0     32.0     30.0     30.0
   Subtotal             139.0    147.0    149.0    152.0
Expenditures
 Investment             126.0    144.0    148.0    148.0
   Taxes                 43.0     45.0     46.0     48.0
   Dividend              51.0     54.0     55.0     55.0
   Advertising           41.0     46.0     47.0     47.0
   Subtotal             135.0    145.0    148.0    150.0
CASHFLOW                 -1.0      1.0      2.0      4.0
Internal Rate
of Return                20.9

Chapter 10
The COUNTREG Procedure

Contents

Overview: COUNTREG Procedure
Getting Started: COUNTREG Procedure
Syntax: COUNTREG Procedure
   Functional Summary
   PROC COUNTREG Statement
   BOUNDS Statement
   BY Statement
   CLASS Statement
   FREQ Statement
   INIT Statement
   MODEL Statement
   NLOPTIONS Statement
   OUTPUT Statement
   RESTRICT Statement
   WEIGHT Statement
   ZEROMODEL Statement
Details: COUNTREG Procedure
   Specification of Regressors
   Missing Values
   Poisson Regression
   Negative Binomial Regression
   Zero-Inflated Count Regression Overview
   Zero-Inflated Poisson Regression
   Zero-Inflated Negative Binomial Regression
   Computational Resources
   Nonlinear Optimization Options
   Covariance Matrix Types
   Displayed Output
   OUTPUT OUT= Data Set
   OUTEST= Data Set
   ODS Table Names
Examples: COUNTREG Procedure
   Example 10.1: Basic Models
   Example 10.2: ZIP and ZINB Models for Data Exhibiting Extra Zeros
References

Overview: COUNTREG Procedure

The COUNTREG (count regression) procedure analyzes regression models in which the dependent variable takes nonnegative integer or count values. The dependent variable is usually an event count, which refers to the number of times an event occurs.
For example, an event count might represent the number of ship accidents per year for a given fleet. In count regression, the conditional mean $E(y_i \mid x_i)$ of the dependent variable $y_i$ is assumed to be a function of a vector of covariates $x_i$.

The Poisson (log-linear) regression model is the most basic model that explicitly takes into account the nonnegative integer-valued aspect of the outcome. With this model, the probability of an event count is determined by a Poisson distribution, where the conditional mean of the distribution is a function of a vector of covariates. However, the basic Poisson regression model is limited because it forces the conditional mean of the outcome to equal the conditional variance. This assumption is often violated in real-life data. Negative binomial regression is an extension of Poisson regression in which the conditional variance can exceed the conditional mean.

Another common characteristic of count data is that the number of zeros in the sample exceeds the number of zeros predicted by either the Poisson or negative binomial model. Zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models explicitly model the production of zero counts to account for excess zeros and also enable the conditional variance of the outcome to differ from the conditional mean. Under zero-inflated models, additional zeros occur with probability $\varphi_i$, which is determined by a separate model, $\varphi_i = F(z_i'\gamma)$, where $F$ is the normal or logistic distribution function that results in a probit or logistic model, $z_i$ is a set of covariates, and $\gamma$ is the corresponding coefficient vector.
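The effect of zero inflation on the mean and variance can be checked numerically. The following is a minimal Python sketch, not PROC COUNTREG code: the function names (logistic_cdf, poisson_pmf, zip_pmf, moments) are hypothetical, the logistic form of F is assumed, and the values of the Poisson mean and the linear index are made up for illustration.

```python
import math


def logistic_cdf(t):
    """Logistic distribution function F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def poisson_pmf(k, mu):
    """Poisson probability of count k for mean mu > 0 (computed on the log scale)."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def zip_pmf(k, mu, phi):
    """Zero-inflated Poisson: an extra zero occurs with probability phi."""
    base = (1.0 - phi) * poisson_pmf(k, mu)
    return phi + base if k == 0 else base


def moments(pmf, kmax=200):
    """Mean and variance from a pmf truncated at kmax (negligible tail assumed)."""
    probs = [pmf(k) for k in range(kmax)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


# Illustrative values only; they are not estimates from any data set.
mu = 2.0                      # Poisson conditional mean
phi = logistic_cdf(-1.0)      # zero-inflation probability from a made-up index of -1

pois_mean, pois_var = moments(lambda k: poisson_pmf(k, mu))
zip_mean, zip_var = moments(lambda k: zip_pmf(k, mu, phi))

print(f"Poisson: mean={pois_mean:.4f} variance={pois_var:.4f}")
print(f"ZIP:     mean={zip_mean:.4f} variance={zip_var:.4f}")
```

Under these assumed values, the plain Poisson mean and variance coincide, whereas the zero-inflated distribution has a variance larger than its mean and places more probability on zero than the Poisson does.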
PROC COUNTREG supports the following models for count data:

   Poisson regression
   negative binomial regression with quadratic (NEGBIN2) and linear (NEGBIN1) variance functions (Cameron and Trivedi 1986)
   zero-inflated Poisson (ZIP) model (Lambert 1992)
   zero-inflated negative binomial (ZINB) model

In recent years, count data models have been used extensively in economics, political science, and sociology. For example, Hausman, Hall, and Griliches (1984) examine the effects of research and development expenditures on the number of patents received by U.S. companies. Cameron and Trivedi (1986) study factors that affect the number of doctor visits. Greene (1994) studies the number of derogatory reports to a credit reporting agency for a group of credit card applicants. As a final example, Long (1997) analyzes the number of doctoral publications in the final three years of Ph.D. studies.

The COUNTREG procedure uses maximum likelihood estimation. When a model with a dependent count variable is estimated using linear ordinary least squares (OLS) regression, the count nature of the dependent variable is ignored. This can lead to negative predicted counts and to parameter estimates with undesirable properties in terms of statistical efficiency, consistency, and unbiasedness unless the mean of the counts is high, in which case the Gaussian approximation and linear regression might be satisfactory.

Getting Started: COUNTREG Procedure

The COUNTREG procedure is similar in use to other regression model procedures in the SAS System. For example, the following statements are used to estimate a Poisson regression model:

proc countreg data=one ;
   model y = x / dist=poisson ;
run;

The response variable y is numeric and has nonnegative integer values. To allow for variance greater than the mean, specify the DIST=NEGBIN option to fit the negative binomial model instead of the Poisson model.

The following example illustrates the use of PROC COUNTREG.
The data are taken from Long (1997) and can be found in the SAS/ETS Sample Library. This study examines how factors such as gender (fem), marital status (mar), number of young children (kid5), prestige of the graduate program (phd), and number of articles published by a scientist’s mentor (ment) affect the number of articles (art) published by the scientist. The first 10 observations are shown in Figure 10.1.

Figure 10.1 Article Count Data

Obs   art   fem   mar   kid5       phd      ment
  1     3     0     1      2   1.38000    8.0000
  2     0     0     0      0   4.29000    7.0000
  3     4     0     0      0   3.85000   47.0000
  4     1     0     1      1   3.59000   19.0000
  5     1     0     1      0   1.81000    0.0000
  6     1     0     1      1   3.59000    6.0000
  7     0     0     1      1   2.12000   10.0000
  8     0     0     1      0   4.29000    2.0000
  9     3     0     1      2   2.58000    2.0000
 10     3     0     1      1   1.80000    4.0000

The following SAS statements estimate the Poisson regression model:

proc countreg data=long97data;
   model art = fem mar kid5 phd ment / dist=poisson;
run;

The Model Fit Summary, shown in Figure 10.2, lists several details about the model. By default, the COUNTREG procedure uses the Newton-Raphson optimization technique. The maximum log-likelihood value is shown, in addition to two information measures, Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (SBC), which can be used to compare competing Poisson models. Smaller values of these criteria indicate better models.

Figure 10.2 Estimation Summary Table for a Poisson Regression

The COUNTREG Procedure

Model Fit Summary
Dependent Variable            art
Number of Observations        915
Data Set                      WORK.LONG97DATA
Model                         Poisson
Log Likelihood                -1651
Maximum Absolute Gradient     3.5741E-9
Number of Iterations          5
Optimization Method           Newton-Raphson
AIC                           3314
SBC                           3343

The parameter estimates of the model and their standard errors are shown in Figure 10.3. All covariates are significant predictors of the number of articles, except for the prestige of the program (phd), which has a p-value of 0.6271.
Figure 10.3 Parameter Estimates of Poisson Regression

Parameter Estimates
                                 Standard                 Approx
Parameter    DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     0.304617     0.102982       2.96      0.0031
fem           1    -0.224594     0.054614      -4.11      <.0001
mar           1     0.155243     0.061375       2.53      0.0114
kid5          1    -0.184883     0.040127      -4.61      <.0001
phd           1     0.012823     0.026397       0.49      0.6271
ment          1     0.025543     0.002006      12.73      <.0001

The following statements fit the negative binomial model. While the Poisson model requires that the conditional mean and conditional variance be equal, the negative binomial model allows for overdispersion; that is, the conditional variance can exceed the conditional mean.

proc countreg data=long97data;
   model art = fem mar kid5 phd ment / dist=negbin(p=2);
run;

The fit summary is shown in Figure 10.4, and parameter estimates are listed in Figure 10.5.

Figure 10.4 Estimation Summary Table for a Negative Binomial Regression

The COUNTREG Procedure

Model Fit Summary
Dependent Variable            art
Number of Observations        915
Data Set                      WORK.LONG97DATA
Model                         NegBin
Log Likelihood                -1561
Maximum Absolute Gradient     7.0695E-6
Number of Iterations          17
Optimization Method           Newton-Raphson
AIC                           3136
SBC                           3170

Figure 10.5 Parameter Estimates of Negative Binomial Regression

Parameter Estimates
                                 Standard                 Approx
Parameter    DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     0.256144     0.138560       1.85      0.0645
fem           1    -0.216418     0.072672      -2.98      0.0029
mar           1     0.150490     0.082106       1.83      0.0668
kid5          1    -0.176415     0.053060      -3.32      0.0009
phd           1     0.015271     0.036040       0.42      0.6718
ment          1     0.029082     0.003470       8.38      <.0001
_Alpha        1     0.441620     0.052967       8.34      <.0001

The parameter estimate for _Alpha of 0.4416 is an estimate of the dispersion parameter in the negative binomial distribution. A t test for the hypothesis $H_0\colon \alpha = 0$ is provided. It is highly significant, indicating overdispersion ($p < 0.0001$).
The null hypothesis $H_0\colon \alpha = 0$ can also be tested against the alternative $\alpha > 0$ by using the likelihood ratio test, as described by Cameron and Trivedi (1998, pp. 45, 77–78). The likelihood ratio test statistic is equal to $-2(L_P - L_{NB}) = -2(-1651 + 1561) = 180$, where $L_P$ and $L_{NB}$ are the log likelihoods for the Poisson and negative binomial models, respectively. The likelihood ratio test is highly significant, providing strong evidence of overdispersion.

Syntax: COUNTREG Procedure

The COUNTREG procedure is controlled by the following statements: