Operational Risk Modeling Analytics, part 9 (pptx)

[...] truncation, and grouping rarely apply. The data can easily be represented by the relative or cumulative frequencies at each possible observation.

With regard to representing the data, the empirical distribution function will be used for individual data and the histogram will be used for grouped data.

To compare the model to truncated data, we begin by noting that the empirical distribution begins at the truncation point and represents conditional values (that is, the distribution and density functions given that the observation exceeds the truncation point). To make a comparison to the empirical values, the model must also be truncated. Let the truncation point in the data set be \(t\). The modified functions are

\[ F^*(x) = \begin{cases} 0, & x < t, \\ \dfrac{F(x) - F(t)}{1 - F(t)}, & x \ge t, \end{cases} \qquad f^*(x) = \begin{cases} 0, & x < t, \\ \dfrac{f(x)}{1 - F(t)}, & x \ge t. \end{cases} \]

12.3 GRAPHICAL COMPARISON OF THE DENSITY AND DISTRIBUTION FUNCTIONS

The most direct way to see how well the model and data match up is to plot the respective density and distribution functions.

Example 12.1 Consider Data Sets B and C. However, for this example and all that follow, in Data Set B we replace the value at $15,743 by $3,476 (this is to allow the graphs to fit comfortably on a page). These data sets are reproduced here in Tables 12.1 and 12.2. Truncate Data Set B at $50 and Data Set C at $7,500. Estimate the parameter of an exponential model for each data set. Plot the appropriate functions and comment on the quality of the fit of the model. Repeat this for Data Set B censored at $1,000 (without any truncation).

Table 12.1 Data Set B with highest value changed

  $27    $82    $115   $126   $155   $161   $243   $294   $340   $384
  $457   $680   $855   $877   $974   $1193  $1340  $1884  $2558  $3476

For Data Set B, there are 19 observations (the first observation is removed due to truncation). A typical contribution to the likelihood function is \(f(82)/[1 - F(50)]\).
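As an illustration of the truncated likelihood, the following minimal Python sketch estimates the exponential mean for Data Set B truncated at 50. It relies on a standard result that is not derived in the text: for an exponential model each contribution \(f(x)/[1 - F(t)]\) equals \(\theta^{-1} e^{-(x-t)/\theta}\), so the maximum likelihood estimate is the average excess over \(t\). The names `DATA_B` and `truncated_exponential_mle` are illustrative only.

```python
# Data Set B with the largest value replaced (Table 12.1).
DATA_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def truncated_exponential_mle(data, t):
    """Return (theta_hat, n) for an exponential model left-truncated at t.

    Assumption: each contribution f(x) / [1 - F(t)] reduces to
    exp(-(x - t) / theta) / theta, so the log-likelihood is maximized
    at the mean of the excesses x - t.
    """
    kept = [x for x in data if x > t]          # observations above t
    theta_hat = sum(x - t for x in kept) / len(kept)
    return theta_hat, len(kept)


theta_hat, n_kept = truncated_exponential_mle(DATA_B, 50)
print(n_kept, round(theta_hat, 2))
```

With the data of Table 12.1 this keeps 19 observations and gives an estimate of about 802.32.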
The maximum likelihood estimate of the exponential parameter is \(\hat{\theta} = 802.32\). The empirical distribution function starts at 50 and jumps 1/19 at each data point. The distribution function, using a truncation point of 50, is

\[ F^*(x) = \frac{1 - e^{-x/802.32} - (1 - e^{-50/802.32})}{1 - (1 - e^{-50/802.32})} = 1 - e^{-(x-50)/802.32}. \]

Figure 12.1 presents a plot of these two functions. The fit is not as good as we might like because the model understates the distribution function at smaller values of \(x\) and overstates the distribution function at larger values of \(x\). This is not good because it means that tail probabilities are understated.

Fig. 12.1 Model vs. data cdf plot for Data Set B truncated at 50 (exponential fit; F(x) against x; model and empirical curves).

Table 12.2 Data Set C

  Payment range          Number of payments
  0-$7500                 99
  $7500-$17,500           42
  $17,500-$32,500         29
  $32,500-$67,500         28
  $67,500-$125,000        17
  $125,000-$300,000        9
  Over $300,000            3

For Data Set C, the likelihood function uses the truncated values. For example, the contribution to the likelihood function for the first interval is

\[ \left[ \frac{F(17{,}500) - F(7500)}{1 - F(7500)} \right]^{42}. \]

The maximum likelihood estimate is \(\hat{\theta} = 44{,}253\). The height of the first histogram bar is

\[ \frac{42}{128(17{,}500 - 7500)} = 0.0000328, \]

and the last bar is for the interval from $125,000 to $300,000 (a bar cannot be constructed for the interval from $300,000 to infinity). The density function must be truncated at $7,500 and becomes

\[ f^*(x) = \frac{f(x)}{1 - F(7500)} = \frac{44{,}253^{-1} e^{-x/44{,}253}}{e^{-7500/44{,}253}} = \frac{e^{-(x-7500)/44{,}253}}{44{,}253}, \quad x > 7500. \]

The plot of the density function versus the histogram is given in Figure 12.2. The exponential model understates the early probabilities.

Fig. 12.2 Model vs. data density plot for Data Set C truncated at 7,500 (exponential fit; f(x) against x; model and empirical curves).
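The grouped-data estimate for Data Set C has no closed form, so a numerical search is needed. The sketch below is one possible way to reproduce it, not the authors' procedure: it maximizes the truncated grouped log-likelihood with a hand-written golden-section search. The search bracket (1,000 to 500,000), the helper names, and the use of survival functions \(e^{-(x-7500)/\theta}\) in place of distribution functions are all assumptions made for illustration.

```python
import math

T = 7_500  # truncation point for Data Set C

# (lower bound, upper bound, number of payments) above the truncation point
GROUPS_C = [
    (7_500, 17_500, 42),
    (17_500, 32_500, 29),
    (32_500, 67_500, 28),
    (67_500, 125_000, 17),
    (125_000, 300_000, 9),
    (300_000, math.inf, 3),
]


def survival(x, theta):
    """Truncated exponential survival function 1 - F*(x); zero at infinity."""
    return 0.0 if math.isinf(x) else math.exp(-(x - T) / theta)


def grouped_loglik(theta):
    """Sum of n_j * ln[F*(c_j) - F*(c_{j-1})] over the intervals."""
    return sum(n * math.log(survival(a, theta) - survival(b, theta))
               for a, b, n in GROUPS_C)


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] (assumed bracket)."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) > f(d):
            hi = d      # maximum lies in [lo, d]
        else:
            lo = c      # maximum lies in [c, hi]
    return (lo + hi) / 2


theta_c = golden_section_max(grouped_loglik, 1_000.0, 500_000.0)
n_total = sum(n for _, _, n in GROUPS_C)
first_bar = 42 / (n_total * (17_500 - 7_500))   # height of first histogram bar
print(round(theta_c), first_bar)
```

The search should land close to the value 44,253 quoted in the text, and the first bar height is 42/1,280,000, about 0.0000328.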
It is hard to tell from the picture how the curves compare above $125,000.

For Data Set B modified with a limit, the maximum likelihood estimate is \(\hat{\theta} = 718.00\). When constructing the plot, the empirical distribution function must stop at $1,000. The plot appears in Figure 12.3. Once again, the exponential model does not fit well. □

Fig. 12.3 Model vs. data cdf plot for Data Set B censored at 1,000 (exponential fit; F(x) against x).

When the model's distribution function is close to the empirical distribution function, it is difficult to make small distinctions. Among the many ways to amplify those distinctions, two will be presented here. The first is to simply plot the difference of the two functions. That is, if \(F_n(x)\) is the empirical distribution function and \(F^*(x)\) is the model distribution function, plot \(D(x) = F_n(x) - F^*(x)\).

Example 12.2 Plot D(x) for Example 12.1.

For Data Set B truncated at $50, the plot appears in Figure 12.4. The lack of fit for this model is magnified in this plot. There is no corresponding plot for grouped data. For Data Set B censored at $1,000, the plot must again end at that value. It appears in Figure 12.5. The lack of fit continues to be apparent. □

Another way to highlight any differences is the pp plot, which is also called a probability plot. The plot is created by ordering the observations as \(x_1 \le \cdots \le x_n\). A point is then plotted corresponding to each value. The coordinates to plot are \((F_n(x_j), F^*(x_j))\). If the model fits well, the plotted points will be near the 45° line running from (0,0) to (1,1). However, for this to be the case, a different definition of the empirical distribution function is needed. It can be shown that the expected value of the true distribution function evaluated at the jth smallest observation is \(j/(n + 1)\), and therefore the empirical distribution should be that value and not the usual \(j/n\).
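The censored estimate and the difference function can be checked with a short Python sketch. It assumes the standard censored-data likelihood for the exponential model (density for values below the limit, survival probability at the limit for censored values), which gives the estimate as the sum of the limited observations divided by the number of uncensored ones. The function names are illustrative.

```python
import math

DATA_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]
LIMIT = 1_000  # censoring point


def censored_exponential_mle(data, u):
    """Exponential MLE with right censoring at u (assumed closed form):
    total of min(x, u) divided by the number of uncensored values."""
    uncensored = [x for x in data if x < u]
    return sum(min(x, u) for x in data) / len(uncensored)


def difference_at(x, data, theta):
    """D(x) = F_n(x) - F*(x); meaningful only for x below the limit."""
    empirical = sum(1 for v in data if v <= x) / len(data)
    model = 1.0 - math.exp(-x / theta)
    return empirical - model


theta_cens = censored_exponential_mle(DATA_B, LIMIT)
d_161 = difference_at(161, DATA_B, theta_cens)
print(theta_cens, round(d_161, 4))
```

For these data the estimate is 718.0, and the difference at x = 161 is roughly 0.099, a positive value because the model places less probability on small losses than the data do.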
If two observations have the same value, either plot both points (they would have the same "y" value but different "x" values) or plot a single value by averaging the two "x" values.

Fig. 12.4 Model vs. data D(x) plot for Data Set B truncated at 50 (exponential fit; D(x) against x).

Fig. 12.5 Model vs. data D(x) plot for Data Set B censored at 1,000 (exponential fit; D(x) against x).

Example 12.3 Create a pp plot for Example 12.1.

For Data Set B truncated at $50, \(n = 19\) and one of the observed values is \(x = 82\). The empirical value is \(F_n(82) = 1/20 = 0.05\). The other coordinate is

\[ F^*(82) = 1 - e^{-(82-50)/802.32} = 0.0391. \]

One of the plotted points will be (0.05, 0.0391). The complete picture appears in Figure 12.6. From the lower left part of the plot it is clear that the exponential model places less probability on small values than the data call for. A similar plot can be constructed for Data Set B censored at $1,000 and it appears in Figure 12.7. This plot ends at about 0.75 because that is the highest probability observed prior to the censoring point at $1,000. There are no empirical values at higher probabilities. Again, the exponential model tends to underestimate the empirical values. □

Fig. 12.6 pp plot for Data Set B truncated at 50 (exponential fit; F(x) against F_n(x)).

12.4 HYPOTHESIS TESTS

A picture may be worth many words, but sometimes it is best to replace the impressions conveyed by pictures with mathematical demonstrations. One such demonstration is a test of the hypotheses

  H0: The data came from a population with the stated model.
  H1: The data did not come from such a population.

The test statistic is usually a measure of how close the model distribution function is to the empirical distribution function.
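The pp plot coordinates of Example 12.3 can be generated with a few lines of Python. This is a sketch under the assumptions of the example: the exponential estimate 802.32 from Example 12.1, truncation at 50, the \(j/(n+1)\) plotting position, and no tied observations. The function name `pp_points` is hypothetical.

```python
import math

DATA_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]
THETA = 802.32   # exponential estimate quoted in Example 12.1
T = 50           # truncation point


def pp_points(data, t, theta):
    """Return (j / (n + 1), F*(x_j)) for the ordered values above t.

    Assumes distinct observations; ties would need the averaging
    described in the text.
    """
    xs = sorted(x for x in data if x > t)
    n = len(xs)
    return [(j / (n + 1), 1.0 - math.exp(-(x - t) / theta))
            for j, x in enumerate(xs, start=1)]


points = pp_points(DATA_B, T, THETA)
print(points[0])
```

The first pair corresponds to x = 82 and is approximately (0.05, 0.0391); plotting all 19 pairs against the 45° line gives the picture described for Figure 12.6.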
When the null hypothesis completely specifies the model (for example, an exponential distribution with mean $100), critical values are well known. However, it is more often the case that the null hypothesis states the name of the model but not its parameters. When the parameters are estimated from the data, the test statistic tends to be smaller than it would have been had the parameter values been prespecified. That is because the estimation method itself tries to choose parameters that produce a distribution that is close to the data. In that case, the tests become approximate. Because rejection of the null hypothesis occurs for large values of the test statistic, the approximation tends to increase the probability of a Type II error while lowering the probability of a Type I error.³

³ Among the tests presented here, only the chi-square test has a built-in correction for this situation. Modifications for the other tests have been developed, but they will not be presented here.

Fig. 12.7 pp plot for Data Set B censored at 1,000 (exponential fit; F(x) against F_n(x)).

One method of avoiding the approximation is to randomly divide the sample in half. Use one half to estimate the parameters and then use the other half to conduct the hypothesis test. Once the model is selected, the full data set could be used to reestimate the parameters.

12.4.1 Kolmogorov-Smirnov test

Let \(t\) be the left truncation point (\(t = 0\) if there is no truncation) and let \(u\) be the right censoring point (\(u = \infty\) if there is no censoring). Then, the test statistic is

\[ D = \max_{t \le x \le u} |F_n(x) - F^*(x)|. \]

This test should only be used on individual data. This is to ensure that the step function \(F_n(x)\) is well defined. Also, the model distribution function \(F^*(x)\) is assumed to be continuous over the relevant range.

Example 12.4 Calculate D for Example 12.1.

Table 12.3 provides the needed values. Because the empirical distribution function jumps at each data point, the model distribution function must be compared both before and after the jump. The values just before the jump are denoted \(F_n(x-)\) in the table. The maximum is \(D = 0.1340\). For Data Set B censored at $1,000, 15 of the 20 observations are uncensored. Table 12.4 illustrates the needed calculations. The maximum is \(D = 0.0991\). □

Table 12.3 Calculation of D for Example 12.4

  x       F*(x)    F_n(x-)  F_n(x)   Maximum difference
  82      0.0391   0.0000   0.0526   0.0391
  115     0.0778   0.0526   0.1053   0.0275
  126     0.0904   0.1053   0.1579   0.0675
  155     0.1227   0.1579   0.2105   0.0878
  161     0.1292   0.2105   0.2632   0.1340
  243     0.2138   0.2632   0.3158   0.1020
  294     0.2622   0.3158   0.3684   0.1062
  340     0.3033   0.3684   0.4211   0.1178
  384     0.3405   0.4211   0.4737   0.1332
  457     0.3979   0.4737   0.5263   0.1284
  680     0.5440   0.5263   0.5789   0.0349
  855     0.6333   0.5789   0.6316   0.0544
  877     0.6433   0.6316   0.6842   0.0409
  974     0.6839   0.6842   0.7368   0.0529
  1,193   0.7594   0.7368   0.7895   0.0301
  1,340   0.7997   0.7895   0.8421   0.0424
  1,884   0.8983   0.8421   0.8947   0.0562
  2,558   0.9561   0.8947   0.9474   0.0614
  3,476   0.9860   0.9474   1.0000   0.0386

All that remains is to determine the critical value. Commonly used critical values for this test are \(1.22/\sqrt{n}\) for \(\alpha = 0.10\), \(1.36/\sqrt{n}\) for \(\alpha = 0.05\), and \(1.63/\sqrt{n}\) for \(\alpha = 0.01\). When \(u < \infty\), the critical value should be smaller because there is less opportunity for the difference to become large.
Modifications for this phenomenon exist in the literature (see reference [111], for example, which also includes tables of critical values for specific null distribution models), and one such modification is given in reference [100] but will not be introduced here.

Table 12.4 Calculation of D for Example 12.4 with censoring

  x       F*(x)    F_n(x-)  F_n(x)   Maximum difference
  27      0.0369   0.00     0.05     0.0369
  82      0.1079   0.05     0.10     0.0579
  115     0.1480   0.10     0.15     0.0480
  126     0.1610   0.15     0.20     0.0390
  155     0.1942   0.20     0.25     0.0558
  161     0.2009   0.25     0.30     0.0991
  243     0.2871   0.30     0.35     0.0629
  294     0.3360   0.35     0.40     0.0640
  340     0.3772   0.40     0.45     0.0728
  384     0.4142   0.45     0.50     0.0858
  457     0.4709   0.50     0.55     0.0791
  680     0.6121   0.55     0.60     0.0621
  855     0.6960   0.60     0.65     0.0960
  877     0.7052   0.65     0.70     0.0552
  974     0.7425   0.70     0.75     0.0425
  1000    0.7516   0.75     0.75     0.0016

Example 12.5 Complete the Kolmogorov-Smirnov test for Example 12.4.

For Data Set B truncated at $50 the sample size is 19. The critical value at a 5% significance level is \(1.36/\sqrt{19} = 0.3120\). Because 0.1340 < 0.3120, the null hypothesis is not rejected and the exponential distribution is a plausible model. While it is unlikely that the exponential model is appropriate for this population, the sample size is too small to lead to that conclusion. For Data Set B censored at $1,000 the sample size is 20, so the critical value is \(1.36/\sqrt{20} = 0.3041\) and the exponential model is again viewed as being plausible. □

For both this test and the Anderson-Darling test that follows, the critical values are correct only when the null hypothesis completely specifies the model. When the data set is used to estimate parameters for the null hypothesized distribution (as in the example), the correct critical value is smaller. For both tests, the change depends on the particular distribution that is hypothesized and maybe even on the particular true values of the parameters.
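The calculations behind Examples 12.4 and 12.5 can be sketched in Python as follows. The code compares the model distribution function with the empirical step function just before and just after each jump, and at the censoring point when there is one. It assumes distinct observations and takes the estimates 802.32 and 718.00 from Example 12.1 as given; `ks_statistic` is an illustrative name, not a library function.

```python
import math

DATA_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def ks_statistic(uncensored, n, cdf, u=math.inf):
    """D = max |F_n(x) - F*(x)| over t <= x <= u.

    uncensored: observed values below u (assumed distinct),
    n: sample size used for the empirical steps, cdf: model F*.
    """
    xs = sorted(uncensored)
    d = 0.0
    for j, x in enumerate(xs, start=1):
        model = cdf(x)
        # compare before the jump, F_n(x-), and after it, F_n(x)
        d = max(d, abs((j - 1) / n - model), abs(j / n - model))
    if not math.isinf(u):
        # at the censoring point the empirical function stays flat
        d = max(d, abs(len(xs) / n - cdf(u)))
    return d


# Data Set B truncated at 50, exponential with theta = 802.32
trunc = [x for x in DATA_B if x > 50]
d_trunc = ks_statistic(trunc, len(trunc),
                       lambda x: 1.0 - math.exp(-(x - 50) / 802.32))

# Data Set B censored at 1,000, exponential with theta = 718.00
cens = [x for x in DATA_B if x < 1_000]
d_cens = ks_statistic(cens, len(DATA_B),
                      lambda x: 1.0 - math.exp(-x / 718.0), u=1_000)

crit_trunc = 1.36 / math.sqrt(len(trunc))    # 5% level, n = 19
crit_cens = 1.36 / math.sqrt(len(DATA_B))    # 5% level, n = 20
print(round(d_trunc, 4), round(crit_trunc, 4))
print(round(d_cens, 4), round(crit_cens, 4))
```

Both statistics fall below their 5% critical values, in line with the conclusion of Example 12.5; the caveat about estimated parameters still applies to these critical values.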
12.4.2 Anderson-Darling test

This test is similar to the Kolmogorov-Smirnov test, but uses a different measure of the difference between the two distribution functions. The test statistic is

\[ A^2 = n \int_t^u \frac{[F_n(x) - F^*(x)]^2}{F^*(x)[1 - F^*(x)]} f^*(x)\,dx. \]

That is, it is a weighted average of the squared differences between the empirical and model distribution functions. Note that when \(x\) is close to \(t\) or to \(u\) the weights might be very large because of the small value of one of the factors in the denominator. This test statistic tends to place more emphasis on good fit in the tails than in the middle of the distribution. Calculating with this formula appears to be challenging. However, for individual data (so this is another test that does not work for grouped data), the integral simplifies to

\[ A^2 = -nF^*(u) + n\sum_{j=0}^{k} [1 - F_n(y_j)]^2 \{\ln[1 - F^*(y_j)] - \ln[1 - F^*(y_{j+1})]\} + n\sum_{j=1}^{k} F_n(y_j)^2 [\ln F^*(y_{j+1}) - \ln F^*(y_j)], \]

where the unique noncensored data points are \(t = y_0 < y_1 < \cdots < y_k < y_{k+1} = u\). Note that when \(u = \infty\) the last term of the first sum is zero [evaluating the formula as written will ask for ln(0)]. The critical values are 1.933, 2.492, and 3.857 for 10%, 5%, and 1% significance levels, respectively. As with the Kolmogorov-Smirnov test, the critical value should be smaller when \(u < \infty\).

Example 12.6 Perform the Anderson-Darling test for the continuing example.

For Data Set B truncated at $50, there are 19 data points. The calculation is in Table 12.5, where "summand" refers to the sum of the corresponding terms from the two sums. The total is 1.0226 and the test statistic is \(-19(1) + 19(1.0226) = 0.4292\). Because the test statistic is less than the critical value of 2.492, the exponential model is viewed as plausible.

For Data Set B censored at $1,000, the results are in Table 12.6. The total is 0.7602 and the test statistic is \(-20(0.7516) + 20(0.7602) = 0.1713\).
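The simplified sum can be evaluated directly. The Python sketch below follows the formula term by term and skips the last term of the first sum when its weight is zero, which avoids the ln(0) mentioned in the text. It assumes distinct noncensored points, so that \(F_n(y_j) = j/n\), and uses the exponential estimates from Example 12.1; the function name is illustrative.

```python
import math

DATA_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]


def anderson_darling(points, n, t, u, cdf):
    """Evaluate the simplified A^2 sum for individual data.

    points: distinct noncensored values with t < y < u (assumption),
    n: sample size, cdf: model distribution function F*.
    """
    y = [t] + sorted(points) + [u]
    k = len(points)

    def model(x):
        return 1.0 if math.isinf(x) else cdf(x)

    total = 0.0
    for j in range(k + 1):
        fn = j / n                       # F_n(y_j) when there are no ties
        weight = (1.0 - fn) ** 2
        if weight > 0.0:                 # zero weight: skip the ln(0) term
            total += weight * (math.log(1.0 - model(y[j]))
                               - math.log(1.0 - model(y[j + 1])))
        if j >= 1:
            total += fn ** 2 * (math.log(model(y[j + 1]))
                                - math.log(model(y[j])))
    return -n * model(u) + n * total


# Truncated at 50, theta = 802.32
trunc = [x for x in DATA_B if x > 50]
a2_trunc = anderson_darling(trunc, len(trunc), 50, math.inf,
                            lambda x: 1.0 - math.exp(-(x - 50) / 802.32))

# Censored at 1,000, theta = 718.00
cens = [x for x in DATA_B if x < 1_000]
a2_cens = anderson_darling(cens, len(DATA_B), 0, 1_000,
                           lambda x: 1.0 - math.exp(-x / 718.0))
print(round(a2_trunc, 3), round(a2_cens, 3))
```

The results should agree with the values 0.4292 and 0.1713 of Example 12.6 up to the rounding used in the tables.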
Because the test statistic does not exceed the critical value of 2.492, the exponential model is viewed as plausible. □

Table 12.5 Anderson-Darling calculation for Data Set B truncated at 50

  j    y_j    F*(y_j)  F_n(y_j)  Summand
  0    50     0.0000   0.0000    0.0399
  1    82     0.0391   0.0526    0.0388
  2    115    0.0778   0.1053    0.0126
  3    126    0.0904   0.1579    0.0332
  4    155    0.1227   0.2105    0.0070
  5    161    0.1292   0.2632    0.0904
  6    243    0.2138   0.3158    0.0501
  7    294    0.2622   0.3684    0.0426
  8    340    0.3033   0.4211    0.0389
  9    384    0.3405   0.4737    0.0601
  10   457    0.3979   0.5263    0.1490
  11   680    0.5440   0.5789    0.0897
  12   855    0.6333   0.6316    0.0099
  13   877    0.6433   0.6842    0.0407
  14   974    0.6839   0.7368    0.0758
  15   1193   0.7594   0.7895    0.0403
  16   1340   0.7997   0.8421    0.0994
  17   1884   0.8983   0.8947    0.0592
  18   2558   0.9561   0.9474    0.0308
  19   3476   0.9860   1.0000    0.0141
  20   ∞      1.0000   1.0000

Table 12.6 Anderson-Darling calculation for Data Set B censored at 1,000

  j    y_j    F*(y_j)  F_n(y_j)  Summand
  0    0      0.0000   0.00      0.0376
  1    27     0.0369   0.05      0.0718
  2    82     0.1079   0.10      0.0404
  3    115    0.1480   0.15      0.0130
  4    126    0.1610   0.20      0.0334
  5    155    0.1942   0.25      0.0068
  6    161    0.2009   0.30      0.0881
  7    243    0.2871   0.35      0.0493
  8    294    0.3360   0.40      0.0416
  9    340    0.3772   0.45      0.0375
  10   384    0.4142   0.50      0.0575
  11   457    0.4709   0.55      0.1423
  12   680    0.6121   0.60      0.0852
  13   855    0.6960   0.65      0.0093
  14   877    0.7052   0.70      0.0374
  15   974    0.7425   0.75      [...]
  16   1000   0.7516   0.75

12.4.3 Chi-square goodness-of-fit test

Unlike the previous two tests, this test allows for some discretion. It begins with the selection of \(k - 1\) arbitrary values, \(t = c_0 < c_1 < \cdots < c_k = \infty\). Let \(\hat{p}_j = F^*(c_j) - F^*(c_{j-1})\) be the probability that a truncated observation falls in the interval from \(c_{j-1}\) to \(c_j\). Similarly, let [...]

Table 12.9 Data Set C

  Range                 p̂        Expected   Observed   χ²
  $7500-$17,500         0.2023   25.889     42         10.026
  $17,500-$32,500       0.2293   29.356     29          0.004
  $32,500-$67,500       0.3107   39.765     28          3.481
  $67,500-$125,000      0.1874   23.993     17          2.038
  $125,000-$300,000     0.0689    8.824      9          0.003
  $300,000-∞            0.0013    0.172      3         46.360
  Total                 1        128        128        61.913

Table 12.10 Automobile claims by year

  Year       1986   1987   1988   1989   1990   1991
  Exposure   2145   2452   [...]

[...]

                                    K-S*     A-D*     χ²       p-value    Loglikelihood   SBC
  Data Set B truncated at 50
    Exponential                     0.1340   0.4292   1.4034   0.8436     -146.063        -147.535
    Weibull                         0.0887   0.1631   0.3615   0.9481     -145.683        -148.628
  Data Set B censored at 1,000
    Exponential                     0.0991   0.1713   0.5951   0.8976     -113.647        -115.145
    Weibull                         0.0991   0.1712   0.5947   0.7428     -113.647        -116.643
  Data Set C
    Exponential                                       61.913   10^-12     -214.924        -217.350
    Weibull                                           0.3698   0.9464     -202.077        -206.929

  * K-S and A-D refer to the Kolmogorov-Smirnov and Anderson-Darling [...]

[...] for Example 12.14

  No. of claims/day      0    1    2     3    4    5    6   7   8   9+
  Observed no. of days   47   97   109   62   25   16   4   3   2   0

Table 12.15 Chi-square goodness-of-fit test for Example 12.14

  Claims/day   Observed   Expected   Chi square
  0            47         47.8       0.01
  1            97         97.2       0.00
  2            109        98.8       1.06
  3            62         66.9       0.36
  4            25         34.0       2.39
  5            16         13.8       0.34
  6            4          4.7        0.10
  7+           5          1.8        5.66
  Totals       365        365        9.93

[...] 0.1277 By this test the Poisson distribution is an acceptable [...]

                           Fitted distributions
  [...] contracts   Observed   Poisson   Negative binomial   Polya-Aeppli
  0                 99         54.0      95.9                98.7
  1                 65         92.2      75.8                70.6
  2                 57         78.8      50.4                50.2
  3                 35         44.9      31.3                32.6
  4                 20         19.2      18.8                20.0
  5                 10          6.5      11.0                11.7
  6                 4           1.9       6.4                 6.6
  7                 0           0.5       3.7                 3.6
  8                 3           0.1       2.1                 2.0
  9                 4           0.0       1.2                 1.0
  10                0           0.0       0.7                 0.5
  11                1           0.0       0.4                 0.3
  12+               0           0.0       0.5                 0.3
  Parameters                   λ = 1.70805   β = 1.15907, r = 1.47364   λ = [...]
  Chi square, degrees of freedom, p-value, loglikelihood, SBC: [...]

[...] Negative binomial [...]

  Observed   103,704     14,075     1,766     255     45     6     2     0
  Fitted     102,629.6   15,922.0   1,235.1   63.9    2.5    0.1   0.0   0.0     (λ = 0.155140)
  Fitted     103,723.6   13,989.9   1,857.1   245.2   32.3   4.2   0.6   0.1     (β = 0.150232, r = 1.03267)
  Fitted     103,710.0   14,054.7   1,784.9   254.5   40.4   6.9   1.3   0.3     (λ = 0.144667, β = 0.310536)
  1332.3 2 [...]
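The entries of Table 12.9 can be recomputed with a short Python sketch. Because the text defining the test statistic is cut off in this preview, the code assumes the usual Pearson form, the sum over intervals of (expected minus observed) squared divided by expected, with expected counts \(n\hat{p}_j\); the table's own χ² column is consistent with that form. The exponential estimate 44,253 and the truncation point 7,500 come from Example 12.1, and the variable names are illustrative.

```python
import math

THETA = 44_253.0   # exponential estimate for Data Set C (Example 12.1)
T = 7_500          # truncation point
BOUNDS = [7_500, 17_500, 32_500, 67_500, 125_000, 300_000, math.inf]
OBSERVED = [42, 29, 28, 17, 9, 3]


def survival(x):
    """1 - F*(x) for the truncated exponential; exp(-inf) evaluates to 0."""
    return math.exp(-(x - T) / THETA)


n = sum(OBSERVED)
# p_j = F*(c_j) - F*(c_{j-1}), written with survival functions
probs = [survival(a) - survival(b) for a, b in zip(BOUNDS, BOUNDS[1:])]
expected = [n * p for p in probs]
# Assumed Pearson form of the statistic
chi2 = sum((e - o) ** 2 / e for e, o in zip(expected, OBSERVED))

for p, e, o in zip(probs, expected, OBSERVED):
    print(f"{p:.4f}  {e:8.3f}  {o:3d}  {(e - o) ** 2 / e:7.3f}")
print(round(chi2, 2))
```

Almost all of the statistic comes from the last interval, where about 0.17 payments are expected and 3 are observed, which is the tail understatement already visible in the plots.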

Date posted: 09/08/2014, 19:22

Table of contents

  • Operational Risk

    • Part III Statistical methods for calibrating models of operational risk

      • 12 Model selection

        • 12.3 Graphical comparison of the density and distribution functions

        • 12.4 Hypothesis tests

          • 12.4.1 Kolmogorov-Smirnov test

          • 12.4.2 Anderson-Darling test

          • 12.4.3 Chi-square goodness-of-fit test

          • 12.4.4 Likelihood ratio test

        • 12.5 Selecting a model

          • 12.5.1 Introduction

          • 12.5.2 Judgment-based approaches

          • 12.5.3 Score-based approaches

        • 12.6 Exercises

      • 13 Fitting extreme value models

        • 13.1 Introduction

        • 13.2 Parameter estimation

          • 13.2.1 ML estimation from the extreme value distribution

          • 13.2.2 ML estimation from the generalized Pareto distribution

          • 13.2.3 Estimating the Pareto shape parameter

          • 13.2.4 Estimating extreme probabilities

        • 13.3 Model selection

          • 13.3.1 Mean excess plots

      • 14 Fitting copula models

        • 14.1 Introduction

        • 14.2 Maximum likelihood estimation
