Recall that, according to the CAPM, αi ≠ 0 indicates that asset i is mispriced, with αi > 0 corresponding to an asset with a price that is too low and αi < 0 corresponding to an asset with a price that is too high; see Section 7.5.
Therefore, a test of the hypothesis αi = 0 in the market model may be used as a test of the hypothesis that the asset is priced correctly; rejection of this hypothesis suggests that the price of the asset is either too low or too high. It is important to keep in mind that such a conclusion is a statement about the market in periods 1, 2, . . . , T and is not necessarily a statement about future prices.
The p-value for such a test is available in the result extracted using the summary function on the results from lm.
Example 8.4 Recall that, for IBM stock, the results of the lm function include the following:
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.000707   0.005358   -0.13      0.9
sp500        0.618789   0.138073    4.48  3.5e-05 ***
The p-value for testing α = 0 is 0.9; therefore, we do not reject the hypothesis that the IBM stock is priced correctly.
To calculate the p-values for testing that α = 0 for each of the stocks with returns included in the big8 variable, we may use the apply function. Note that the [1, 4] element of the $coefficients component of the summary of the results from lm is the p-value for testing α = 0
> summary(lm(ibm~sp500))$coefficients[1, 4]
[1] 0.8955
that is rounded to 0.9 in the output from lm.
Define a function f.alphapval that takes a vector of excess returns as an argument and returns this p-value:
> f.alphapval<-function(y)
+ {summary(lm(y~sp500))$coefficients[1, 4]}
> f.alphapval(ibm)
[1] 0.8955
The p-values for the eight stocks may then be calculated using apply:
> apply(big8, 2, f.alphapval)
AAPL BAX KO CVS XOM IBM JNJ DIS
0.0883 0.9576 0.3675 0.0945 0.7591 0.8955 0.2110 0.1224
Therefore, for these eight stocks, the hypothesis that the stock is priced correctly is never rejected at the 0.05 level. Two stocks have a p-value less than 0.10: Apple, which has α̂ = 0.0154 and a p-value of 0.088, and CVS, which has α̂ = 0.00956 and a p-value of 0.095.
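As an illustration (not taken from the text), the estimates α̂ quoted above may be obtained in the same way as the p-values, by extracting the [1, 1] element of the coefficient table. The helper function f.alphahat below is a hypothetical name, and big8 and sp500 are assumed to be available as in the earlier examples.

> # hypothetical helper; assumes sp500 and big8 exist as in the examples above
> f.alphahat<-function(y)
+ {summary(lm(y~sp500))$coefficients[1, 1]}
> apply(big8, 2, f.alphahat)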
Stock Screening and Multiple Testing
It is tempting to use tests of αj = 0 to screen a large number of stocks, hoping to find a few that are mispriced. However, when testing many hypotheses in this way, it is important to be aware of the multiple testing problem.
Suppose that we are testing m null hypotheses, each of the form H0: αj = 0.
Recall that a p-value has the property that, if the null hypothesis is true, then the probability is approximately 0.05 that the p-value will be less than or equal to 0.05; more generally, the p-value is approximately distributed as a uniform random variable on the interval (0, 1) when the null hypothesis is true.
Therefore, even if αj = 0 for all j = 1, 2, . . . , m, we expect about 5% of the p-values to be less than 0.05. For instance, if we are testing αj = 0 for 100 stocks, even if all stocks are priced correctly, we expect about five significant p-values, defining significance in terms of a 0.05 level, that is, choosing each test to have a probability of Type I error of 0.05. Thus, if a few of the 100 p-values are significant, it may be inappropriate to conclude that those stocks are mispriced.
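A small simulation, not taken from the text, illustrates this point: generating 100 samples for which the null hypothesis of a zero mean is true yields roughly five p-values below 0.05. The object name null.pv is illustrative.

> # hypothetical illustration: 100 tests in which the null hypothesis is true
> set.seed(100)
> null.pv<-replicate(100, t.test(rnorm(60))$p.value)
> sum(null.pv < 0.05)     # roughly 5 "significant" results by chance alone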
A simple way to deal with this issue is to modify the criterion for a significant p-value. Suppose that we take the null hypothesis to be the hypothesis that all αj are 0; that is, consider the null hypothesis
H0: αj = 0,   j = 1, 2, . . . , m,

or, equivalently,

H0: α1 = α2 = · · · = αm = 0.
For this testing problem, a Type I error corresponds to the event of rejecting αj = 0 for any j when, in fact, all αj are 0. We can test this hypothesis using the p-values from the tests of the individual hypotheses αj = 0, which we denote by q1, q2, . . . , qm, respectively.
Suppose we want the level of our test of H0: α1 = α2 = · · · = αm = 0 to be, at most, 0.05. If we reject αj = 0 when qj ≤ c∗, for some threshold c∗, then the probability of a Type I error is

P(q1 ≤ c∗ ∪ q2 ≤ c∗ ∪ · · · ∪ qm ≤ c∗),

calculated under the assumption that

α1 = α2 = · · · = αm = 0.
Exact calculation of this probability requires the joint distribution of (q1, q2, . . . , qm); hence, it is difficult, if not impossible, without making strong assumptions.
However, it is generally possible to bound the probability. Recall that, for two events A and B,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

and, hence,

P(A ∪ B) ≤ P(A) + P(B).
An induction argument can be used to show that, for events A1, A2, . . . , Am,

P(A1 ∪ A2 ∪ · · · ∪ Am) ≤ P(A1) + P(A2) + · · · + P(Am),

a result known as the Bonferroni inequality.
Hence,
P(q1 ≤ c∗ ∪ q2 ≤ c∗ ∪ · · · ∪ qm ≤ c∗) ≤ P(q1 ≤ c∗) + P(q2 ≤ c∗) + · · · + P(qm ≤ c∗).   (8.5)
Using the fact that a p-value has a uniform distribution under the null hypothesis, P(qj ≤ c∗) = c∗. It follows that

P(q1 ≤ c∗ ∪ q2 ≤ c∗ ∪ · · · ∪ qm ≤ c∗) ≤ mc∗.
Therefore, to guarantee that our test has a level less than or equal to 0.05, we can choose c∗ = 0.05/m. Then the probability of concluding that any of the assets is mispriced when all are priced correctly is less than or equal to 0.05. Clearly, the same approach may be used for any desired level.
Hence, to address the multiple-testing problem, we modify the criterion for a significant p-value from 0.05 to 0.05/m, where m is the number of hypotheses being tested; this is known as the Bonferroni method. An equivalent approach is to calculate "adjusted p-values," given by mqj, j = 1, 2, . . . , m; if mqj > 1, we set the adjusted p-value to 1. The adjusted p-values can then be evaluated using the usual criteria; for instance, we can compare the adjusted p-values to 0.05 for a test with level 0.05.
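As a sketch, the Bonferroni-adjusted p-values can be computed directly or with the p.adjust function; here q denotes a hypothetical vector of p-values from the m individual tests.

> # q: a hypothetical vector of p-values from m individual tests
> pmin(length(q)*q, 1)                # adjusted p-values computed directly
> p.adjust(q, method="bonferroni")    # equivalent built-in adjustment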
Example 8.5 Consider stocks for firms represented in the S&P 100 index;
stocks in the S&P 100 index are a subset of those in the S&P 500 index, representing a cross section of large U.S. companies. For each stock, five years of monthly returns were analyzed for the period ending December 31, 2014;
only 96 of the 100 stocks had five years of monthly returns available.
For each of these 96 stocks, the p-value of the test of αj = 0 described earlier was calculated; the results are stored in the variable sp96.pv.
> head(sp96.pv)
[1] 0.0883 0.2450 0.5338 0.9436 0.1488 0.0397
Thirteen of the p-values are less than 0.05, with the smallest at 0.0043.
> sort(sp96.pv)[1:15]
[1] 0.00426 0.00548 0.00930 0.01299 0.01801 0.01960 0.02715 0.02891 0.03139
[10] 0.03458 0.03966 0.04254 0.04811 0.05394 0.05474
For a test with level 0.05, the Bonferroni-corrected criterion is 0.05/96 = 0.00052; all of the p-values exceed this threshold. Thus, although the p-values suggest that some of the stocks might be mispriced, after adjusting for multiple testing, we do not reject the hypothesis that all stocks are priced correctly.
Alternatively, if we compute the adjusted p-values, by multiplying the p-values by 96, we see that the smallest adjusted p-value is 0.41 (96 times
0.0043), leading to the same conclusion.
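These calculations can be reproduced directly from sp96.pv, assuming it is available in the workspace as above; the following commands are a sketch and should reproduce the counts quoted in this example.

> sum(sp96.pv < 0.05)                          # number of p-values below 0.05
> min(p.adjust(sp96.pv, method="bonferroni"))  # smallest adjusted p-value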
False Discovery Rate
An important drawback of the Bonferroni method is that it is generally conservative, in the sense that the actual level of the test is less than 0.05; this is particularly true when m is large, as is often the case when analyzing stock return data. In the present context, this property means that there is a tendency for the procedure to conclude that all stocks are priced correctly even when one or more is mispriced.
An alternative approach to designing tests of many hypotheses is to control the false discovery rate (FDR) rather than the probability of a Type I error. Suppose we conduct a series of tests to determine whether each stock is mispriced, that is, tests of hypotheses of the form αj = 0, and that, based on the procedure used, we conclude that m0 of the stocks are mispriced; that is, m0 of the hypotheses that αj = 0 are rejected. Let m1 denote the number of those rejected hypotheses for which αj is actually 0.
We refer to a rejected hypothesis as a "discovery" and an incorrectly rejected hypothesis as a "false discovery." In the present context, a false discovery occurs if we conclude that a stock is mispriced when it is not. The false discovery proportion is defined as m1/m0, provided that m0 > 0; if m0 = 0, it is taken to be 0.
Note that the false discovery proportion is a random variable; the FDR is the expected value of this random variable. Therefore, the FDR is the expected proportion of rejected null hypotheses that were rejected incorrectly.
It is important to note that although the level of a test and its FDR are related, they are fundamentally different measures. The level of a test of α1 = α2 = · · · = αm = 0 is the probability of rejecting this hypothesis, that is, of concluding that at least one αj is nonzero when all are actually 0. The FDR measures the expected proportion of those cases in which αj = 0 is rejected for which αj is actually 0. Hence, procedures that control the FDR do not control the level of the test. However, the FDR is an intuitively appealing concept in many applications, such as stock screening; furthermore, the procedures that control the FDR have higher power than those based on the Bonferroni correction, so that we are more likely to discover mispriced stocks.
Let qj denote the p-value of the usual test of αj = 0, j = 1, 2, . . . , m.
To control the FDR at F, instead of comparing each qj to a given threshold value, as in the Bonferroni method, we use the following procedure. First, order the p-values and let q(1), q(2), . . . , q(m) denote the ordered values, so that q(1) is the smallest p-value, q(2) is the second smallest, and so on. Then, starting with j = 1 and moving through the list of p-values, we compare q(j) to (j/m)F.
If q(j) > (j/m)F for all j = 1, 2, . . . , m, then we do not reject any of the hypotheses. Otherwise, find the largest j for which q(j) ≤ (j/m)F; denote this value by j∗. Then we reject the hypotheses corresponding to q(1), q(2), . . . , q(j∗). Although this procedure is a bit complicated, fortunately, there is an R function that computes the corresponding adjusted p-values that can be compared to a given threshold in the usual way.
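For concreteness, the step-up rule just described can also be written as a short function; f.fdr.reject below is a hypothetical name used only for illustration, and the built-in p.adjust function used in Example 8.6 provides the equivalent adjusted p-values.

> # hypothetical implementation of the step-up rule described above;
> # returns the indices of the rejected hypotheses
> f.fdr.reject<-function(q, fdr=0.10)
+ {
+   m<-length(q)
+   ord<-order(q)                       # indices of q from smallest to largest
+   ok<-which(q[ord] <= (1:m)/m*fdr)    # positions j with q(j) <= (j/m)F
+   if (length(ok)==0) return(integer(0))
+   ord[1:max(ok)]                      # reject hypotheses for q(1), ..., q(j*)
+ }

For instance, f.fdr.reject(sp96.pv, 0.10) would return the indices of any stocks flagged at an FDR of 0.10.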
Although the conventional choice for the level of a test is 0.05, that is not necessarily the best choice for the FDR. For instance, an FDR of 0.10 or larger may be reasonable. In particular, if the tests of αj = 0 are used to screen stocks for further investigation, a threshold as large as 0.20 may be appropriate.
Example 8.6 Consider stocks for firms represented in the S&P 100 index analyzed in Example 8.5; consider testing αj = 0 for these stocks, controlling the FDR at 0.10.
The p-values for testing αj = 0 for each of the 96 stocks are stored in the variable sp96.pv. To compute the p-values adjusted for controlling the FDR, we use the following command:
> sp96.pv.fdr<-p.adjust(sp96.pv, method="fdr")
> head(sp96.pv)
[1] 0.0883 0.2450 0.5338 0.9436 0.1488 0.0397
> head(sp96.pv.fdr)
[1] 0.403 0.523 0.733 0.971 0.468 0.340
The function p.adjust can perform a number of different adjustments; using the argument method="fdr" specifies the adjustment to control the FDR, as described earlier.
The minimum adjusted p-value is given by
> min(sp96.pv.fdr)
[1] 0.26
Because this value exceeds 0.10, we conclude that all 96 stocks are priced correctly. If the minimum adjusted p-value had not exceeded 0.10, we would reject the hypothesis that αj = 0 for those assets with an adjusted p-value less than or equal to 0.10.
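In that case, the corresponding stocks could be identified with a command along the following lines, assuming sp96.pv.fdr is available as above.

> which(sp96.pv.fdr <= 0.10)    # indices of stocks flagged at an FDR of 0.10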