7.2.1 The block maxima approach
The focus of EVT is on the modelling and inference of maxima. Assume that a sequence of iid random variables X1, …, Xn over a time span of n periods is given.
Ordinarily, the time span is a calendar period such as a month, quarter, half year, or year, and the observed data within this period is of daily frequency. With respect to EVT, the question arises as to which distribution the maximum of these random variables follows, or, to put it more precisely, is asymptotically best approximated by.
Mn = max{X1, …, Xn}  (7.1)
In principle, if the distribution function F of the Xi is assumed to be known, then the distribution of Mn could be derived as:
P{Mn ≤ z} = P{X1 ≤ z, …, Xn ≤ z}
          = P{X1 ≤ z} × ⋯ × P{Xn ≤ z}
          = {F(z)}^n.  (7.2)
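Equation (7.2) can be checked numerically. The following sketch simulates block maxima of iid standard normal variables and compares the empirical probability P{Mn ≤ z} with F(z)^n; the choice of the normal distribution, the block size, and all variable names are illustrative assumptions, not part of the text.

```python
# Numerical check of equation (7.2): P{M_n <= z} = F(z)^n for iid X_i.
# Illustrative sketch using the standard normal as F.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n, n_sim = 20, 100_000

# Simulate block maxima M_n = max(X_1, ..., X_n) over n_sim blocks.
maxima = rng.standard_normal((n_sim, n)).max(axis=1)

z = 2.5
empirical = np.mean(maxima <= z)   # Monte Carlo estimate of P{M_n <= z}
theoretical = norm.cdf(z) ** n     # F(z)^n from equation (7.2)
print(empirical, theoretical)
```

With 100,000 replications the two values agree to roughly two decimal places.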
In practice, this approach is not feasible for various reasons. First, the distribution function F is, in general, unknown. This could be rectified by either estimating a kernel density or assuming that the Xi are governed by a particular distribution. If this route is taken, the next obstacle is that estimation errors are raised to the power of n, leading to quite divergent results. An alternative route would be to seek a family of distributions Fn that can be used to approximate any kind of F. Therefore, the characteristics and properties of Fn for n → ∞ need to be investigated. However, this asymptotic reasoning would imply that the values of the distribution function for z less than z+ approach zero, where z+ denotes the upper end point of the distribution. Put differently, the mass of the distribution would collapse onto the single point z+, yielding a degenerate limit. This artifact can be circumvented by a linear transformation Mn* of Mn:
Mn* = (Mn − bn) / an,  (7.3)
where an > 0 and bn are sequences of constants. The purpose of these constants is to straighten out Mn, such that the probability mass does not collapse onto a single point.
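A concrete instance of these normalizing sequences may help: for iid exponential variables it is known that an = 1 and bn = log(n) yield the Gumbel distribution as the limit. The following sketch illustrates this; the distribution choice and all names are my own assumptions for illustration.

```python
# Sketch of the normalization in (7.3): for iid Exp(1) variables,
# a_n = 1 and b_n = log(n) give a Gumbel limit G(z) = exp(-exp(-z)).
import numpy as np

rng = np.random.default_rng(1)
n, n_sim = 1000, 50_000
maxima = rng.exponential(size=(n_sim, n)).max(axis=1)

# Normalize: M_n* = (M_n - b_n) / a_n with a_n = 1, b_n = log(n).
m_star = maxima - np.log(n)

# Compare empirical probabilities with the Gumbel CDF at a few points.
for z in (-1.0, 0.0, 1.0, 2.0):
    print(z, np.mean(m_star <= z), np.exp(-np.exp(-z)))
```

The empirical and limiting probabilities agree closely already for n = 1000.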
Under the assumption that the sequences an and bn exist, it can be shown that the probability expression

P{Mn* = (Mn − bn) / an ≤ z} → G(z) for n → ∞  (7.4)
converges to a non-degenerate distribution G(z), which belongs to one of the following distribution families: Gumbel, Fréchet, or Weibull. All three distributions have a location and a scale parameter. The Fréchet and Weibull distributions also have a shape parameter. The three distributions can be subsumed into the generalized extreme value (GEV) distribution,

G(z) = exp{−[1 + ξ((z − μ) / σ)]^(−1/ξ)}.  (7.5)
The GEV is a three-parameter distribution where μ is the location, σ the scale, and ξ the shape parameter. For the limit ξ → 0 the Gumbel distribution is obtained, for ξ > 0 the Fréchet, and for ξ < 0 the Weibull. The Weibull has a finite right end point, whereas z+ is infinite for the other two distributions. The density decays exponentially in the case of the Gumbel and polynomially for the Fréchet distribution. Hence, the characteristics and properties of the GEV can be deduced from the value of the shape parameter.
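In practice the GEV parameters are estimated from a sample of block maxima. The sketch below does this with SciPy's `genextreme`; note that SciPy parameterizes the shape as c = −ξ, the opposite sign to equation (7.5). The simulated data and all variable names are illustrative assumptions.

```python
# Fitting the GEV (7.5) to simulated block maxima with SciPy.
# Caution: scipy.stats.genextreme uses c = -xi (sign flipped vs. the text).
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(7)
# Block maxima of 100 blocks, each containing 250 standard-normal draws.
bm = rng.standard_normal((100, 250)).max(axis=1)

c, loc, scale = genextreme.fit(bm)
xi = -c  # convert SciPy's convention to the shape xi of equation (7.5)
print(xi, loc, scale)
```

Since normal maxima lie in the Gumbel domain of attraction, the fitted ξ should be close to zero.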
7.2.2 The rth largest order models
A problem that arises when the block maxima method is applied to financial series is the lack of a sufficiently long data history. In these instances, the unknown distribution parameters will be estimated with greater uncertainty. To circumvent this problem the rth largest order model has been suggested. Not only is the maximal loss in a given block used as a data point for fitting the GEV, but also the r largest loss observations are employed. The data selection pertinent to the block maxima and the r largest orders is shown in Figure 7.1.
In this figure, 100 losses have been randomly generated and subdivided into 10 blocks of equal size. The maximum losses in each block are indicated by squares.
These data points would be used in the block maxima approach. Further, the second largest losses in each of the 10 blocks are marked by triangles. The sets of these first- and second-order losses would be utilized if one were to fit a second largest order model. The variance of the parameter estimates will certainly be reduced by increasing the sample size. However, one runs the risk of obtaining a biased sample, in the sense that the second highest losses might not qualify as extreme events or be considered as such. Finally, a dashed horizontal line has been drawn in the figure at the ordinate value of 2.5. This threshold value leads directly into the topic of the next subsection.

Figure 7.1 Block maxima, r largest orders, and peaks-over-threshold.
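The data selection shown in Figure 7.1 can be replicated in a few lines. The sketch below simulates 100 losses, splits them into 10 blocks, and extracts the block maxima, the second largest values (r = 2), and the exceedances over the threshold 2.5; the random data and names are my own.

```python
# Recreating the data selection of Figure 7.1: block maxima, r = 2 order
# statistics, and peaks over the threshold 2.5. Illustrative sketch.
import numpy as np

rng = np.random.default_rng(0)
losses = rng.standard_normal(100)
blocks = losses.reshape(10, 10)          # 10 blocks of 10 observations

ordered = np.sort(blocks, axis=1)
block_maxima = ordered[:, -1]            # largest loss per block (squares)
second_largest = ordered[:, -2]          # second largest per block (triangles)

threshold = 2.5
exceedances = losses[losses > threshold] # POT selection (dashed line)
print(block_maxima, second_largest, exceedances)
```

By construction each block maximum dominates the corresponding second-order statistic, which is why the r = 2 sample doubles the data at the cost of less extreme observations.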
7.2.3 The peaks-over-threshold approach
With regard to the application of the block maxima method and the rth largest order model in the context of financial market data, the following issues arise:
•For a given financial instrument a sufficiently long data history is often not available. This will result in wide confidence bands for the unknown distribution parameters. As a consequence, the derived risk measures should be used with great caution.
•Not all observations that should be considered as extremes are exploited in estimating the distribution parameters, although this issue is ameliorated in the case of the rth largest order statistics. Hence, the available information about extreme events is not fully taken into account.
•Given the stylized facts for univariate return series, in particular volatility clustering, data points from tranquil periods could be selected as block maxima even though they do not represent extreme observations. Hence, an estimation bias would result in these instances.
Due to these issues the peaks-over-threshold (POT) method is more widely encountered when financial risks are of interest. As the name of this method suggests, neither the block maxima data points nor the r largest values within a block are considered as extreme observations, but rather all observations above a certain threshold value. This can be summarized for a given threshold u by the following probability expression:
P{X > u + y | X > u} = (1 − F(u + y)) / (1 − F(u)),  y > 0.  (7.6)

It is evident from this equation that when the distribution function F is known, the expression can be solved for the exceedances y > 0. In practice, however, the distribution function F is generally unknown, and hence, similarly to the derivation of the GEV, one needs an approximating distribution for sufficiently large threshold values. It can be shown that the exceedances (X − u) are distributed according to the generalized Pareto distribution (GPD),
H(y) = 1 − (1 + ξy / σ̃)^(−1/ξ),  (7.7)
for y > 0 and σ̃ = σ + ξ(u − μ). This means that if the distribution of the block maxima for an iid sample can be approximated by the GEV, then the exceedances can be approximated by the GPD for a sufficiently large threshold value u. By comparison of the GEV and GPD, it can be concluded that the shape parameter ξ is identical for both and hence independent of the chosen number of block maxima. It can further be deduced that the distribution for ξ < 0 possesses an upper bound of u − σ̃/ξ and is unbounded to the right for parameter values satisfying ξ > 0. In the limit ξ → 0 the GPD converges to an exponential distribution with parameter 1/σ̃.
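The GPD approximation of exceedances can be illustrated by fitting SciPy's `genpareto` (whose shape parameter c corresponds directly to ξ) to the excesses of a heavy-tailed sample over a high quantile. The Student's t data, the 95% threshold, and all names are illustrative assumptions; for a t distribution with ν degrees of freedom the tail shape is ξ = 1/ν.

```python
# Fitting the GPD (7.7) to threshold exceedances; sketch with SciPy,
# whose genpareto shape c equals the xi of the text.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(3)
x = rng.standard_t(df=4, size=100_000)   # heavy-tailed losses, xi ~ 1/4

u = np.quantile(x, 0.95)                 # threshold at the 95% quantile
y = x[x > u] - u                         # exceedances X - u

xi, loc, sigma = genpareto.fit(y, floc=0.0)  # fix location at zero
print(xi, sigma)
```

The fitted shape should be near the theoretical value 0.25, confirming the polynomial tail.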
In practice, a difficulty arises when an adequate threshold has to be selected. If this value is chosen too small, a biased sample results: observations that are too small, and hence not truly extreme, would be selected, such that the GPD approximation would be violated.
In contrast, if u is chosen too large, the sample size might become too small to yield reliable estimates for the unknown distribution parameters. Thus, a trade-off between a biased estimate and a large estimation error is required. An adequate threshold value can be determined graphically by means of a mean residual life (MRL) plot. This kind of plot is based on the expected value of the GPD, E(Y) = σ̃ / (1 − ξ). For thresholds u exceeding an initial threshold u0, the conditional expected values

E(X − u | X > u) = σ_u / (1 − ξ) = (σ_{u0} + ξu) / (1 − ξ)  (7.8)
are plotted against u. This expression is linear with respect to the threshold u. Its empirical counterpart, the MRL plot, consists of the points
{ (u, (1/n_u) Σ_{i=1}^{n_u} (x_i − u)) : u < x_max },  (7.9)

where n_u denotes the number of observations exceeding u and x_max the sample maximum.
Hence, a suitable value for u is given by the point from which onwards the plotted mean excesses are approximately linear in u. Because arithmetic means are depicted in an MRL plot, confidence bands can be calculated according to the normal distribution by virtue of the central limit theorem (CLT).
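The points of equation (7.9) are straightforward to compute. The sketch below evaluates the empirical mean excesses over a grid of candidate thresholds for exponential data, for which the MRL plot should be flat (ξ = 0, mean excess equal to the scale); the data and names are assumptions for illustration.

```python
# Empirical mean residual life points from equation (7.9): for each candidate
# threshold u, the mean excess (1/n_u) * sum(x_i - u) over the exceedances.
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=10_000)  # exponential => flat MRL

thresholds = np.quantile(x, np.linspace(0.5, 0.99, 50))
mean_excess = np.array([np.mean(x[x > u] - u) for u in thresholds])
print(np.column_stack((thresholds, mean_excess))[:5])
```

Plotting `mean_excess` against `thresholds` gives the MRL plot; here the points scatter around the constant 2, the scale of the exponential distribution.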
Most often, the parameters of the GPD are estimated by applying the ML principle, but other methods, such as probability-weighted moments, are also encountered in empirical applications. The advantage of the former method is the possibility of plotting the profile log-likelihood for the shape parameter ξ and computing confidence bands based on this profile. In contrast to the bands derived from the normality assumption for the estimator, these are, in general, asymmetric. Hence, inference with respect to the sign of the shape parameter becomes more precise.
The VaR and ES risk measures can be inferred directly from the GPD as follows:

VaR_α = q_α(F) = u + (σ̃ / ξ)(((1 − α) / F̄(u))^(−ξ) − 1),  (7.10)

ES_α = (1 / (1 − α)) ∫_α^1 q_x(F) dx = VaR_α / (1 − ξ) + (σ̃ − ξu) / (1 − ξ),  (7.11)
where F̄(u) denotes the number of exceedances relative to the sample size. Analogously to the confidence bands for the distribution parameters, intervals for these risk measures can also be derived.
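Equations (7.10) and (7.11) translate directly into code once the GPD has been fitted to the exceedances. The sketch below combines the earlier fitting step with these formulas; the simulated loss series, the 95% threshold, and the variable names are illustrative assumptions.

```python
# VaR and ES at level alpha from a fitted GPD, via equations (7.10), (7.11).
# Fbar_u is the fraction of sample observations exceeding the threshold u.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(9)
x = rng.standard_t(df=4, size=50_000)    # simulated losses

u = np.quantile(x, 0.95)
y = x[x > u] - u
xi, _, sigma = genpareto.fit(y, floc=0.0)
Fbar_u = y.size / x.size                 # fraction of exceedances, here 0.05

alpha = 0.99
var_alpha = u + sigma / xi * (((1 - alpha) / Fbar_u) ** (-xi) - 1)  # (7.10)
es_alpha = var_alpha / (1 - xi) + (sigma - xi * u) / (1 - xi)       # (7.11)
print(var_alpha, es_alpha)
```

As expected, ES exceeds VaR, since it averages the quantiles in the tail beyond the VaR level.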