Introduction

Extreme value analysis has wide applications in many fields, such as civil engineering (Wu and Qiu 2018), climatology (Davison and Smith 1990; Kharin et al. 2013), seismology (Beirlant et al. 2018), hydrology (Katz et al. 2002; Carreau et al. 2017; Bader et al. 2018), insurance (Reiss and Thomas 2007) and finance (Embrechts et al. 1997; Fontnouvelle et al. 2007; Abad and Benito 2013, among others). For instance, the extreme value of vehicle load plays an important role in bridge design and risk assessment (Wu and Qiu 2018). In seismology and climatology, extreme value analysis is used to study earthquakes (Beirlant et al. 2018) and extreme precipitation (Bader et al. 2018). In hydrology, extreme value analysis is an important tool for studying coastal flood risk (Haigh et al. 2010; McMillan et al. 2011). In the field of finance, extreme value modelling is important to quantify large financial losses from different sources of risk: operational, credit and market risk (see Cruz 2002; Moscadelli 2004; Fontnouvelle et al. 2007; Ergün and Jun 2010; Žiković and Aktan 2009; Abad and Benito 2013).

Traditionally, the study of the extreme values of fat-tailed distributions has been carried out through extreme value theory (EVT). EVT comprises two main approaches: the block maxima method (BMM) and the peaks-over-threshold (POT) approach. In the former, the data set is divided into blocks and a generalized extreme value distribution is fitted to the sample of maxima or minima extracted from these blocks. In the POT method, a threshold is determined above which the excesses are fitted with the generalized Pareto distribution (GPD) (Queensley et al. 2019).

Although the BMM and POT approaches should lead asymptotically to the same results, in practice the POT approach provides more suitable extreme quantile estimates owing to its more efficient use of the data on extreme values; see Cunnane (1973) and Madsen et al. (1997a). These authors show that the POT approach performs better than the BMM, independently of the estimation method used. Similar results have been reported by Wang (1991) and Madsen et al. (1997b).

From a theoretical point of view, threshold selection is a critical issue in the framework of the POT approach. The choice of threshold must balance bias against variance. A threshold that is too low is likely to violate the asymptotic basis of the model, which leads to bias, whereas a threshold that is too high generates few excesses, increasing the variance of the estimators (Davison and Smith 1990; Coles 2001; MacDonald et al. 2011; Papalexiou et al. 2013; Wyncoll and Gouldby 2013).

For this reason, within the framework of the POT method, different methods have been developed for selecting a suitable threshold. These methods can be divided into two groups: (i) graphical approaches, based on a visual inspection of plots, such as the mean excess plot (Davison and Smith 1990), the parameter stability plot (Coles 2001) and the Hill plot (Drees et al. 2000), among others, and (ii) numerical approaches (Ferreira et al. 2003; Thompson et al. 2009; Northrop and Coleman 2014; Li et al. 2014; Wadsworth and Tawn 2012; Naveau et al. 2016), which are more objective. Recently, new methods have been developed with the aim of automating some of the existing proposals, especially those based on visual data inspection; see for instance Wu and Qiu (2018), Bader et al. (2018), Caballero-Megido et al. (2018) and Queensley et al. (2019), among others.

The aforementioned papers focus on new methods for selecting the optimal threshold, on the assumption that the estimates of the quantiles of the generalized Pareto distribution are highly sensitive to the threshold choice, in which case such efforts would be fully justified (Langousis et al. 2016). In the field of finance, the existing literature on this issue is quite scarce, especially in the area of market risk management. As far as we know, there are no studies on this subject in this field. To cover this gap, we carry out an empirical analysis with a double aim: first, to analyse the sensitivity of the GPD quantiles to the threshold choice and, second, to study the sensitivity of the market risk measures to this choice. For measuring market risk, we use the value at risk (VaR) and the expected shortfall (ES) measures. Finally, we calculate the market risk capital requirements and evaluate their sensitivity to the threshold choice.

This study is in line with Langousis et al. (2016), who remarked that “the variety of existing methods for threshold chosen, the fundamental differences in their theoretical underpinnings, and their relative performance when dealing with different types of data, make threshold detection an open question that can be addressed solely on the basis of a specific application”. In this regard, we think that the area of market risk management, where daily data and large samples are used, could give rise to different results from those obtained in other areas of science, where the periodicity of the data is annual and the sample sizes are consequently small.

Thus, in this paper, we analyse in detail the case of the S&P500 and later extend the study to a set of 14 assets from alternative markets: seven stock indexes (CAC40, DAX30, FTSE100, HangSeng, IBEX35, Merval and Nikkei), four commodities (Copper, Gold, Crude Oil Brent and Silver) and three exchange rates (£/€, $/€ and ¥/€). The results obtained indicate that there is a large set of thresholds that provide similar GPD quantile estimates and, as a consequence, similar market risk measures. Only for large thresholds, those corresponding to the 98th and 99th percentiles of the return distribution, are some differences found. This means that the choice of threshold in the framework of the POT method may not be relevant in quantifying market risk when we use the VaR and ES measures for this task. With regard to the market risk capital requirements, we find that these charges do not differ much among the thresholds. Nevertheless, if the objective of a financial institution is to minimize these charges, it might be interested in selecting a specific threshold.

The remainder of the paper is organized as follows. In “Methodology” section, we present the methodology used for the study. In “Case study” section, we describe the data and the results obtained for the particular case of the S&P500 index. “Robustness analysis” section presents a robustness analysis. The main conclusions are summarized in “Conclusions” section.

Methodology

Extreme value theory

The Extreme Value Theory (EVT) studies the asymptotic behaviour of extreme values of a random variable. This theory has wide applications in many fields, such as civil engineering (Wu and Qiu 2018), climatology (Davison and Smith 1990; Kharin et al. 2013), seismology (Beirlant et al. 2018), hydrology (Katz et al. 2002; Carreau et al. 2017; Bader et al. 2018), insurance (Reiss and Thomas 2007) and finance (Embrechts et al. 1997), among others.

Within the EVT context, there are two approaches that study extreme events. The first one, based on the Generalized Extreme Value (GEV) distribution, models the distribution of the minimum or maximum realizations and it is known as the Block Maxima (Minima) Method (BMM). The second one is the Peaks Over Threshold (POT) approach based on the Generalized Pareto distribution (GPD) (Pickands 1975) which models the exceedances over a particular threshold. In the next lines, we introduce these approaches.

Fisher–Tippett theorem

Suppose that \(X_{1} , X_{2} , \ldots , X_{n}\) is a sequence of independent and identically distributed random variables with distribution function \(F\left( x \right) = {\text{Pr}}\left( {X_{t} \le x} \right)\), and let \(M_{n} = \max \left( {X_{1} , X_{2} , \ldots , X_{n} } \right)\) denote the maximum of this series. The distribution function (CDF) of \(M_{n}\) is then

$$P\left( {M_{n} \le x} \right) = P\left( {X_{1} \le x, \ldots .,X_{n} \le x} \right) = \mathop \prod \limits_{i = 1}^{n} F\left( x \right) = F^{n} \left( x \right)$$
(1)

Although \(F\left( x \right)\) is unknown, the Fisher–Tippett theorem establishes an asymptotic approximation for \(F^{n} \left( x \right)\). The theorem states that, given sequences of constants \(b_{n} > 0\) and \(a_{n} \in R\), the normalized maximum \(Z_{n} = \frac{{M_{n} - a_{n} }}{{b_{n} }}\) converges to a non-degenerate distribution \(H\), and this distribution is the generalized extreme value (GEV) distribution: \(\mathop {\lim }\limits_{n \to \infty } \Pr \left( {\frac{{M_{n} - a_{n} }}{{b_{n} }} \le x} \right) = H\left( x \right)\).

The algebraic expression for such generalized distribution is as follows:

$${\text{GEV}}_{\xi ,\mu ,\;\sigma } \left( x \right) = e^{{ - \left[ {1 + \xi \frac{{\left( {x - \mu } \right)}}{\sigma }} \right]^{{ - \frac{1}{\xi }}} }}$$
(2)

defined on \(\left( {1 + \frac{{\xi \left( {x - \mu } \right)}}{\sigma }} \right) > 0\), where \(\sigma > 0\) is the scale parameter, \(- \infty < \mu < \infty\) is the location parameter, and \(- \infty <\xi< \infty\) is known as the shape parameter of the GEV distribution and characterizes the tail behaviour of the distribution. This distribution generalizes three types of distributions, depending on the value taken by \(\xi\):

  • Gumbel (\(\xi\) = 0), type I family. It has light tails rather than heavy tails.

    $$\Lambda \left( x \right) = e^{{ - e^{{ - \frac{x - \mu }{\sigma }}} }} \quad \quad \forall x \in \Re$$
  • Fréchet (\(\xi\) > 0), type II family. This distribution is particularly useful for modelling financial returns, as it has very heavy tails.

    $$\Phi_{\xi ,\mu ,\sigma } \left( x \right) = \left\{ {\begin{array}{ll} { 0 \quad x \le \mu } \\ {e^{{ - \left( {\frac{x - \mu }{\sigma }} \right)^{{ - \frac{1}{\xi }}} }} \quad x > \mu } \\ \end{array} } \right.$$
  • Weibull (\(\xi\) < 0), type III family. This distribution is used when the extremes are lighter than those of the normal distribution and is thus not particularly useful for applications involving financial returns.

    $$\Psi_{\xi ,\mu ,\sigma } \left( x \right) = \left\{ {\begin{array}{ll} {e^{{ - \left( { - \frac{x - \mu }{\sigma }} \right)^{{\frac{ - 1}{\xi }}} }} \quad x \le \mu } \\ {\quad \quad 1\quad \quad \quad x > \mu } \\ \end{array} } \right.$$
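To make the BMM concrete, the following minimal Python sketch (assuming scipy is available; the synthetic return series and block size are illustrative choices, not the paper's data) extracts block maxima and fits a GEV distribution by maximum likelihood. Note that scipy.stats.genextreme parameterizes the shape as \(c = -\xi\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=5000)   # synthetic heavy-tailed "daily returns"

# Block maxima method (BMM): split the sample into blocks and keep each block maximum
block_size = 250                            # roughly one trading year per block
n_blocks = returns.size // block_size
block_maxima = returns[: n_blocks * block_size].reshape(n_blocks, block_size).max(axis=1)

# Fit the GEV to the block maxima by maximum likelihood.
# scipy parameterizes the shape as c = -xi, so the GEV shape is xi = -c.
c, mu, sigma = stats.genextreme.fit(block_maxima)
print(f"GEV fit: xi = {-c:.3f}, mu = {mu:.3f}, sigma = {sigma:.3f}")
```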

Peaks over threshold approach (POT)

In general, we are not only interested in the maxima of observations, but also in the behaviour of large observations which exceed a high threshold. One method of extracting extremes from a sample of observations, \(X_{t} , t = 1, 2, \ldots n\) with a distribution function \(F\left( x \right) = {\text{Pr}}\left( {X_{t} \le x} \right)\) is to take the exceedances over a predetermined high threshold \(u\). An exceedance of a threshold \(u\) occurs when \(X_{t} > u\) for any \(t\) in \(t = 1,2, \ldots ,n\). Thus, an excess over \(u\) is defined as \(y = X_{t} - u\). This approach is known as POT.

Although the BMM and POT approaches should lead asymptotically to the same results, in practice the POT provides more suitable extreme quantile estimations due to the more efficient use of the data for the extreme values, see Cunnane (1973) and Madsen et al. (1997a). These studies show that the POT approach performs better than BMM, independently of the estimation method used. Similar results have been reported by Wang (1991), Madsen et al. (1997b), and Tanaka and Takara (2002) among others.

Let \(x_{0}\) be the finite or infinite right endpoint of the distribution \(F\); that is, \(x_{0} = {\text{sup}}\{ x \in R:F\left( x \right) < 1\} \le \infty\). The distribution function of the excesses \(\left( y \right)\) over the threshold \(u\) is given by \(F_{u} \left( y \right) = P\left( {\left( {X - u} \right) \le \left. y \right| X > u} \right)\) for \(0 \le y \le x_{0} - u\). Thus, \(F_{u} \left( y \right)\) is the probability that the value of \(X\) exceeds the threshold \(u\) by no more than an amount \(y\), given that the threshold is exceeded. This probability can be written as

$$F_{u} \left( y \right) = \frac{{F\left( {y + u} \right) - F\left( u \right)}}{1 - F\left( u \right)}$$
(3)

This distribution can be approximated by the generalized Pareto distribution (GPD), which is usually expressed as a two-parameter distribution:

$$G_{\xi , \sigma } \left( y \right) = \left\{ {\begin{array}{ll} {1 - \left( {1 + \frac{\xi }{\sigma }y} \right)^{{ - \frac{1}{\xi }}} \quad if\; \xi \ne 0} \\ {1 - \exp \left( { - \frac{y}{\sigma }} \right)\quad if\; \xi = 0} \\ \end{array} } \right.$$
(4)

where \(\xi\) and \(\sigma > 0\) are the shape and scale parameters, respectively. Note that if the distribution of the normalized maximum \(M_{n}^{*}\) converges to a GEV distribution with shape parameter \(\xi\), then the distribution of the exceedances over a threshold converges to the GPD with the same parameter \(\xi\) (Rodríguez 2017).

Using this approximation, the distribution function of \(X\) is given by \(F\left( x \right) = \left( {1 - F\left( u \right)} \right) F_{u} \left( y \right) + F\left( u \right)\). Replacing \(F_{u} \left( y \right)\) by the GPD and \(F\left( u \right)\) by its empirical estimator \(\left( {n - N_{u} } \right)/n\), where \(n\) is the total number of observations and \(N_{u}\) the number of observations above the threshold \(u\), we have

$$F\left( x \right) = 1 - \frac{{N_{u} }}{n}\left( {1 + \frac{\xi }{\sigma }\left( {x - u} \right)} \right)^{{ - \frac{1}{\xi }}}$$
(5)

For a given probability \(\alpha > F\left( u \right)\), the \(\alpha\) quantile, denoted by \(q_{\alpha }\), is calculated by inverting the tail estimation formula to obtain

$$q_{\alpha } = u + \frac{\sigma }{\xi }\left( {\left( {\frac{n}{{N_{u} }}\left( {1 - \alpha } \right)} \right)^{ - \xi } - 1} \right)$$
(6)
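As an illustration of Eqs. (4)–(6), the sketch below (an assumption-laden example using scipy and synthetic losses, not the paper's S&P500 data) fits a GPD to the excesses over a chosen threshold and inverts the tail estimator to obtain a high quantile.

```python
import numpy as np
from scipy import stats

def gpd_tail_quantile(losses, u, alpha):
    """Fit a GPD to the excesses over u and return q_alpha via Eq. (6)."""
    losses = np.asarray(losses)
    excesses = losses[losses > u] - u
    n, n_u = losses.size, excesses.size
    # scipy's genpareto shape parameter coincides with xi; the location is fixed at 0
    xi, _, sigma = stats.genpareto.fit(excesses, floc=0)
    # Eq. (6): q_alpha = u + (sigma / xi) * ((n / N_u * (1 - alpha)) ** (-xi) - 1)
    return u + (sigma / xi) * ((n / n_u * (1 - alpha)) ** (-xi) - 1)

rng = np.random.default_rng(1)
losses = -rng.standard_t(df=4, size=5534)   # losses = -returns (left tail moved to the right)
u_90 = np.quantile(losses, 0.90)            # e.g. the 90th percentile as threshold
print(gpd_tail_quantile(losses, u_90, alpha=0.99))
```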

The distributional choice is motivated by a theorem (Balkema and de Haan 1974; Pickands 1975) which states that, for a certain class of distributions, the GPD is the limiting distribution for the distribution of the excesses, as the threshold tends to the right endpoint:

$$\mathop {\lim }\limits_{{u \to x_{0} }} \mathop {\sup }\limits_{{0 \le y \le x_{0} - u}} \left| {F_{u} \left( y \right) - {\text{GPD}}_{\xi , \sigma } \left( y \right)} \right| = 0$$

This theorem holds if and only if \(F\) is in the maximum domain of attraction of the generalized extreme value distribution \(H_{\xi }\), \(\left( {F \in {\text{MDA}}\left( {H_{\xi } } \right)} \right)\). That is, if for a given distribution \(F\) an appropriately normalized sample maximum converges to a non-degenerate distribution \(H_{\xi }\), this is equivalent to saying that \(F\) belongs to the MDA of \(H_{\xi }\) for some value of \(\xi\).

The class of distributions \(F\) for which the condition \(F \in {\text{MDA}}\left( {H_{\xi } } \right)\) holds is large; essentially all commonly encountered continuous distributions show this kind of regular behaviour of the sample maximum described above.

Threshold selection method

The approaches developed for selecting a suitable threshold can be divided into two groups: (i) subjective approaches based on graphical analysis, such as the mean excess plot, the parameter stability plot and the Hill plot, among others, and (ii) numerical approaches. In turn, the latter can be divided into several categories: (a) non-parametric approaches; (b) approaches based on goodness-of-fit tests; (c) simple naïve methods; (d) mixture models; (e) computational approaches and (f) other approaches. In the following lines, we describe these methods briefly (see Scarrot and McDonald 2012 and Langousis et al. 2016 for a more detailed review).

(i) Graphical approaches


Owing to its simplicity, the graphical method most commonly used for selecting the threshold is the mean excess plot (MEP), also called the mean residual life plot (MRLP), introduced by Davison and Smith (1990). This instrument is a graphical tool based on the sample mean excess function (SMEF), which is defined as

$${\text{SMEF}}\left( u \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {r_{i} - u} \right){\mathbf{1}}_{{\{ r_{i} > u\} }} }}{{N_{u} }}$$

The sample mean excess function is an estimate of the mean excess function (MEF), \(e\left( u \right) = E\left[ {\left( {X - u} \right){|}X > u} \right]\). For the GPD, the mean excess function is linear in \(u\):

$$e\left( u \right) = \frac{\sigma }{1 - \xi } + \frac{\xi }{1 - \xi }u$$
(7)

This means that, for \(0 < \xi < 1\) and \(\sigma + u\xi > 0\), the mean excess plot should resemble a straight line with a positive slope. Empirical estimates of the sample mean excesses are typically plotted against a range of thresholds, and the general rule is to choose the lowest value of \(u\) above which the plot is approximately linear. An application of this method can be found in Beirlant et al. (2004).
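A minimal sketch of how the SMEF underlying the MEP can be computed over a grid of candidate thresholds is given below (numpy only; the data and threshold grid are illustrative assumptions).

```python
import numpy as np

def sample_mean_excess(x, thresholds):
    """SMEF(u): average of (x - u) over the observations exceeding u."""
    x = np.asarray(x)
    return np.array([(x[x > u] - u).mean() for u in thresholds])

rng = np.random.default_rng(2)
x = np.abs(rng.standard_t(df=4, size=5000))               # illustrative positive "losses"
thresholds = np.quantile(x, np.linspace(0.80, 0.99, 20))  # candidate thresholds
mef = sample_mean_excess(x, thresholds)
# Plotting mef against thresholds yields the mean excess plot; one looks for the
# lowest threshold above which the plot is approximately linear.
```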

The main problem associated with the sample mean excess plot is subjectivity. As Queensley et al. (2019) remark, “judging from where the graph is approximately linear using only the eyeball inspection approach, is a rather subjective choice so that different thresholds may be selected by different viewers of the plot”.

The second type of plot is the parameter stability plot for the shape and scale parameters (Coles 2001), created by fitting the GPD for a range of thresholds. This method involves plotting the estimates \(\hat{\sigma }\) and \(\hat{\xi }\), together with confidence intervals, and selecting the lowest value of \(u\) above which the estimates remain approximately constant (see Coles 2001). This type of plot may present some inconsistencies, showing different flat sections for different ranges of thresholds (Scarrot and McDonald 2012).

Other graphical approaches are those based on quantile plots and on plots comparing the empirical cumulative distribution function with the cumulative GPD. In quantile plots, the proper threshold is selected as the lowest threshold above which the plot shows a linear trend. When comparing the empirical and theoretical distribution functions, the proper threshold is selected as the lowest threshold above which the differences between them seem minimal. The Hill plot, explored by Drees et al. (2000), can also be included in this group: it plots the Hill estimator of the tail index for a set of thresholds, and the optimal threshold is the lowest threshold at which the Hill estimator stabilises. This tool shares many of the benefits and drawbacks of the MEP and has been referred to as the Hill horror plot by Resnick (1997).

(ii) Numerical approaches

The aforementioned approaches are based on judgement (Caballero-Megido et al. 2018), so they can be rather subjective and require substantial expertise to interpret as a method of threshold selection (Davison and Smith 1990; Coles 2001; Solari and Losada 2012). To overcome these limitations, some numerical approaches have been developed which lead to a more objective decision. The numerical approaches are numerous and can be classified into different categories: (a) non-parametric approaches; (b) approaches based on goodness-of-fit tests; (c) simple naïve methods; (d) mixture models; (e) computational approaches and (f) other approaches. In the following lines, we summarize each of these categories.

  (a) Nonparametric methods that are intended to locate the changing point between extreme and nonextreme regions of the data (see e.g. Gerstengarbe and Werner 1989, 1991; Werner and Gerstengarbe 1997; Domonkos and Piotrowicz 1998; Lasch et al. 1999; Cebrián et al. 2003; Cebrián and Abaurrea 2006; Karpouzos et al. 2010, among others).

  (b) Approaches based on goodness-of-fit tests, where the threshold is selected as the lowest level above which the GPD provides an adequate fit to the exceedances. To analyse the goodness of fit of the GPD, the Kolmogorov–Smirnov and Anderson–Darling tests can be used. Applications of this method can be found in Davison and Smith (1990), Dupuis (1999), Choulakian and Stephens (2001), Northrop and Coleman (2014) and Langousis et al. (2016), among others.

    In this category, we also include the method based on the root mean square error (RMSE) proposed by Li et al. (2014). The RMSE measures the difference between the analytical and observed CDFs of the exceedances for different thresholds. The threshold with the lowest RMSE is considered the best one.

  (c) Simple naïve methods. Given the general order statistic convergence properties, various rules of thumb have been derived in the literature, such as simple fixed-quantile rules like the upper 10% rule of DuMouchel (1983). Ferreira et al. (2003) use the square root of the number of data points (n) to specify the number of exceedances (\(N_{u}\)). Ho and Wan (2002) and Omran and McKenzie (2010) use the rule \(N_{u} = \frac{{n^{2/3} }}{{{\text{log}}({\text{log}}\left( n \right))}}\) proposed by Loretan and Philips (1994) to determine the optimal number of exceedances. Neftci (2000), followed by Bekiros and Georgoutsos (2005), proposes estimating the threshold as \(1.176 \sigma_{0}\), where \(\sigma_{0}\) is the standard deviation of the sample. In other studies, these methods are classified as ad hoc methods or rules of thumb (see the sketch after this list).

  (d) Mixture models. These methods are based on mixtures of a GPD for the tail and another distribution for the “bulk”, joined at the threshold (e.g. MacDonald et al. 2011; Wadsworth and Tawn 2012; Naveau et al. 2016). Treating the threshold as a parameter to estimate, these methods can account for the uncertainty arising from threshold selection in inferences. The major drawback of such models is their ad hoc heuristic definitions, whose asymptotic properties are still little understood. They have also not had time to become well established in practice, and currently there is no readily available software implementation to allow practitioners to gain wider experience (Scarrot and McDonald 2012).

  (e) Computational approaches. Other researchers have suggested techniques that provide an optimal trade-off between bias and variance. These methods use bootstrap simulations to numerically determine the optimal threshold by balancing bias against variance. Applications can be found in Danielsson et al. (2001), Drees et al. (2000), Ferreira et al. (2003), Hall (1990) and Beirlant et al. (2004). In general, the restrictive assumptions underlying these approaches hinder their wide applicability.

  (f) Other approaches. Approaches different from the aforementioned are proposed by Dupuis (1999), Thompson et al. (2009) and De Zea Bermudez et al. (2001). See Scarrot and McDonald (2012) for a detailed review of these methods.
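The sketch below illustrates a few of the rules of thumb mentioned in category (c) (the function name and the synthetic inputs are our own; the rules follow the formulas quoted above).

```python
import numpy as np

def naive_thresholds(returns):
    """Thresholds implied by some common rules of thumb (illustrative only)."""
    losses = np.sort(-np.asarray(returns))[::-1]   # losses, largest first
    n = losses.size
    n_exceed = {
        "DuMouchel upper 10%": int(0.10 * n),                      # fixed-quantile rule
        "Ferreira et al. sqrt(n)": int(np.sqrt(n)),                # N_u = sqrt(n)
        "Loretan-Philips": int(n ** (2 / 3) / np.log(np.log(n))),  # N_u = n^(2/3)/log(log n)
    }
    # the threshold is the smallest loss retained by each rule
    thresholds = {name: losses[k - 1] for name, k in n_exceed.items()}
    thresholds["Neftci 1.176*std"] = 1.176 * np.std(returns)       # direct threshold rule
    return thresholds

print(naive_thresholds(np.random.default_rng(3).standard_t(df=4, size=5534)))
```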

Recently, new methods have been developed to automate some of the existing proposals, especially those based on visual data inspection; see for instance Wu and Qiu (2018), Bader et al. (2018), Caballero-Megido et al. (2018), Queensley et al. (2019) and Schneider et al. (2021), among others. Wu and Qiu (2018) propose a method to select the suitable threshold based on multiple criteria decision analysis (MCDA). In MCDA, the Chi-square test, the Kolmogorov–Smirnov (K–S) test and the root mean square error (RMSE) are combined as the test criteria, and the weight of each criterion is calculated using the entropy method. Thus, MCDA can integrate the results obtained from goodness-of-fit tests under different criteria into a comprehensive one, which makes the selection more scientific and objective (Wang et al. 2009). Bader et al. (2018) develop an efficient technique to evaluate and apply the Anderson–Darling test to the sample of exceedances above a fixed threshold. To automate threshold selection, this test is used in conjunction with a recently developed stopping rule that controls the false discovery rate in ordered hypothesis testing. Caballero-Megido et al. (2018) propose a new automated method that mimics the enduringly popular visual inspection method. The purpose of the automated graphic threshold selection (AGTS) method is, in the absence of an a priori threshold value, to guide the choice of threshold, which requires judgement and expertise, making the process simple and approachable whilst being reproducible and less subjective. Queensley et al. (2019) propose an alternative way of selecting the threshold where, instead of choosing individual thresholds in isolation and testing their fit, they use the bootstrap aggregate of these individual thresholds, which are formulated in terms of quantiles. The method incorporates the visual technique and is aimed at reducing the subjectivity associated with relying solely on the eye inspection approach (EIA). Schneider et al. (2021) suggest a couple of automated methods for threshold selection. The first consists in estimating and minimizing the integrated square error (ISE) between the exponential density and its parametric estimator based on the Hill estimator; it relies on the null hypothesis that the log-spacings above a candidate threshold are exponentially distributed. The resulting error function is called the inverse Hill statistic (IHS). This method exhibits high fluctuations for small thresholds, which might make the automated selection of the minimum highly variable; to control this problem, the authors propose a smoothed IHS. The second method consists in looking for the sample fraction (threshold) that minimises the asymptotic mean squared error (AMSE) of the Hill estimator.

Risk measure

According to Jorion (2001), “VaR measure is defined as the worst expected loss over a given horizon under normal market conditions at a given level of confidence”. Thus, VaR is a conditional quantile of the asset return loss distribution.

Let \(X_{1} ,X_{2} , \ldots ,X_{n}\) be independent, identically distributed random variables representing the financial returns, and let \(F\left( x \right) = {\text{Pr}}\left( {\left. {X_{t} \le x} \right|{\Omega }_{t - 1} } \right)\) denote the cumulative distribution function conditional on the information available at \(t - 1\) (\({\Omega }_{t - 1} )\). Assume that {\(X_{t}\)} follows the stochastic process given by

$$X_{t} = \mu_{t} + \tilde{\sigma }_{t} z_{t } \hspace{0.5cm} z_{t} \sim iid\left( {0,1} \right)$$
(8)

where \(\tilde{\sigma }_{t}^{2} = E\left( {\left. {\left( {X_{t} - \mu_{t} } \right)^{2} } \right|{\Omega }_{t - 1} } \right)\) is the conditional variance and \(z_{t}\) has the conditional distribution function \(G\left( z \right) = P\left( {z_{t} < z{{|\Omega }}_{t - 1} } \right)\). The VaR with a given probability \(\alpha\) ∈ (0, 1), denoted by \({\text{VaR}}\left( \alpha \right)\), is defined as the \(\alpha\) quantile of the probability distribution of financial returns: \(F\left( {{\text{VaR}}_{t} \left( \alpha \right)} \right) = \Pr \left( {X_{t} < {\text{VaR}}_{t} \left( \alpha \right)} \right) = \alpha\). In this paper, we use the POT approach to estimate the tail of the distribution of the standardized residuals and then estimate the risk measures. As the GPD is only defined for positive values, we multiply our data by (− 1) and thus move the left tail to the right side. Therefore, the VaR of a portfolio at probability \(\alpha\) is calculated as

$${\text{VaR}}_{t} \left( \alpha \right) = \mu_{t} + \tilde{\sigma }_{t} q_{1 - \alpha }$$
(9)

where \(\mu_{t}\) and \(\tilde{\sigma }_{t}\) represent the conditional mean and the conditional standard deviation of the returns and \(q_{1 - \alpha }\) is the quantile (\(1 - \alpha )\) of the GPD (Eq. 6).

The ES with a given probability \(\alpha\) ∈ (0, 1), denoted by \({\text{ES}}\left( \alpha \right),\) is defined as the average of all losses that are greater than or equal to VaR, i.e. the average loss in the worst \(\alpha\) % cases:

$${\text{ES}}_{t} \left( \alpha \right) = E\left[ {\left. X \right| X \ge {\text{VaR}}\left( \alpha \right)} \right] = \mu_{t} + \tilde{\sigma }_{t} E\left[ {\left. z \right| z \ge q_{1 - \alpha } { }} \right]$$
(10)

It can be demonstrated that the mean of the excess distribution \(F_{{q_{1 - \alpha } }} \left( y \right)\) over the threshold \(q_{1 - \alpha }\) is given by

$$E\left( {\left. z \right|z \ge q_{1 - \alpha } { }} \right) = \frac{{ q_{1 - \alpha } }}{1 - \xi } + \frac{\sigma - \xi u}{{1 - \xi }}$$
(11)

Substituting (11) into (10), we obtain the ES measure under conditional EVT:

$${\text{ES}}_{t} \left( \alpha \right) = E\left[ {\left. X \right| X \ge {\text{VaR}}\left( \alpha \right)} \right] = \mu_{t} + \tilde{\sigma }_{t} \left[ {\frac{{ q_{1 - \alpha } }}{1 - \xi } + \frac{\sigma - \xi u}{{1 - \xi }}} \right]$$
(12)
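The sketch below translates Eqs. (9) and (12) into code (the function and the numerical values are purely illustrative; in the empirical application the conditional moments come from an APARCH-type model and the GPD parameters from the POT fit).

```python
def var_es(mu_t, sigma_t, u, xi, sigma_gpd, q):
    """One-day-ahead VaR and ES from Eqs. (9) and (12) (all inputs are illustrative).

    mu_t, sigma_t : conditional mean and standard deviation of the (negated) returns
    u             : threshold used to fit the GPD
    xi, sigma_gpd : GPD shape and scale parameters
    q             : GPD quantile q_{1-alpha} obtained from Eq. (6)
    """
    var = mu_t + sigma_t * q                                                # Eq. (9)
    es = mu_t + sigma_t * (q / (1 - xi) + (sigma_gpd - xi * u) / (1 - xi))  # Eq. (12)
    return var, es

# hypothetical parameter values, for illustration only
print(var_es(mu_t=0.0, sigma_t=1.2, u=1.5, xi=0.15, sigma_gpd=0.6, q=2.8))
```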

Backtesting

Backtesting VaR

To evaluate the accuracy of the VaR estimates, several tests have been used, all of which are based on the exception indicator variable. An exception occurs when \(r_{t + 1} < {\text{VaR}}_{\alpha }\); in that case, the exception indicator variable \(I_{t + 1}\) is equal to one (and zero otherwise).

To check the accuracy of the VaR estimates, we have used four standard tests: the unconditional coverage (LRuc), independence (LRind), conditional coverage (LRcc) and dynamic quantile (DQ) tests.

Kupiec (1995) shows that if we assume that the probability of obtaining an exception is constant, the number of exceptions \(x = \sum I_{t + 1}\) follows a binomial distribution \(B\left( {N,\alpha } \right)\), where \(N\) represents the number of observations. An accurate measure \({\text{VaR}}_{\alpha }\) should produce an unconditional coverage \(\left( {\hat{\alpha } = \frac{{\sum I_{t + 1} }}{N}} \right)\) equal to \(\alpha\) percent. The unconditional coverage test has a null hypothesis \(\hat{\alpha } = \alpha ,\) with a likelihood ratio statistic:

$${\text{LR}}_{{{\text{uc}}}} = 2\left[ {\log \left( {\hat{\alpha }^{x} \left( {1 - \hat{\alpha }} \right)^{N - x} } \right) - \log \left( {\alpha^{x} \left( {1 - \alpha } \right)^{N - x} } \right)} \right]$$
(13)

which follows an asymptotic \({ }\chi^{2} \left( 1 \right)\) distribution. The conditional coverage test, developed by Christoffersen (1998), jointly examines whether the percentage of exceptions is statistically equal to the expected one \(\left( {\hat{\alpha } = \alpha } \right)\) and whether the exception indicator is serially independent. The likelihood ratio statistic of this test is given by \({\text{LR}}_{{{\text{cc}}}} = {\text{LR}}_{{{\text{uc}}}} + {\text{LR}}_{{{\text{ind}}}}\), which is asymptotically distributed as \({ }\chi^{2} \left( 2 \right)\), where \({\text{LR}}_{{{\text{ind}}}}\) is the likelihood ratio statistic for the hypothesis of serial independence against first-order Markov dependence. Finally, the dynamic quantile test proposed by Engle and Manganelli (2004) examines whether the exception indicator is uncorrelated with any variable that belongs to the information set \({{ \Omega }}_{t - 1}\) available when the VaR is calculated. This test is a Wald test of the hypothesis that all slopes are zero in the regression:

$$I_{t} = \beta_{0} + \mathop \sum \limits_{i = 1}^{p} \beta_{i} I_{t - i} + \mathop \sum \limits_{j = 1}^{q} \mu_{j} X_{t - j}$$
(14)

where \({ }X_{t - j}\) are the explanatory variables contained in \({\Omega }_{t - 1}\). In this test, five lags are introduced as explanatory variables, and the VaR itself is usually included as well, to test whether the probability of an exception depends on the level of the VaR. Under the null hypothesis, the exception indicator cannot be explained by any of these variables.
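As an illustration, a minimal implementation of the Kupiec unconditional coverage statistic of Eq. (13) could look as follows (assuming scipy for the chi-square p-value; the handling of the degenerate case with zero exceptions is omitted).

```python
import numpy as np
from scipy import stats

def kupiec_lruc(exceptions, alpha):
    """Unconditional coverage statistic of Eq. (13).

    exceptions : 0/1 indicator series of VaR exceptions
    alpha      : expected exception probability (e.g. 0.05 for VaR at 95%)
    """
    exceptions = np.asarray(exceptions)
    N, x = exceptions.size, exceptions.sum()
    a_hat = x / N

    def loglik(p):
        return x * np.log(p) + (N - x) * np.log(1 - p)

    lr_uc = 2 * (loglik(a_hat) - loglik(alpha))
    p_value = stats.chi2.sf(lr_uc, df=1)   # asymptotically chi-square with 1 d.o.f.
    return lr_uc, p_value
```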

Backtesting ES

In this paper, we use the McNeil and Frey (2000) test for the conditional expected shortfall. This test is probably the most successful one in the literature. These authors develop a test to verify whether a model provides correct estimates of the conditional expected shortfall. They are interested in the size of the discrepancy between the return \(r_{t + 1}\) and the conditional expected shortfall forecast \({\text{ES}}_{t} \left( \alpha \right)\) in the event of a quantile violation. The authors define the residuals as follows:

$$Y_{t + 1} = \frac{{r_{t + 1} - {\text{ES}}_{t + 1} \left( \alpha \right)}}{{\sigma_{t + 1} }}$$
(15)

Substituting Eqs. (8) and (12) into Eq. (15), we obtain the following expression:

$$y_{t + 1} = z_{t + 1} - E\left( {\left. z \right|z < q_{\alpha } } \right)$$
(16)

It is clear that, under model (8), these residuals are i.i.d. and that, conditional on \(\left\{ {r_{t + 1} < {\text{VaR}}_{t + 1} \left( \alpha \right)} \right\}\), or equivalently \(\left\{ {z_{t + 1} < q_{\alpha } } \right\}\), they have an expected value of zero. Suppose we backtest on the days in the set \(T\). We can form empirical versions of these residuals on the specific days on which violations have occurred, i.e. days on which \(\left\{ {r_{t + 1} < {\text{VaR}}_{t + 1} \left( \alpha \right)} \right\}\). The authors call these residuals exceedance residuals and denote them by \(\{ \hat{y}_{{t + 1}} :t \in T, r_{{t + 1}} {\text{ < VaR}}_{{t + 1}} \left( \alpha \right)\}\), where \(\hat{y}_{t + 1} =\) \(\frac{{r_{t + 1} - \widehat{{{\text{ES}}}}_{t + 1} \left( \alpha \right)}}{{\hat{\sigma }_{t + 1} }}\) and \(\widehat{{{\text{ES}}}}_{t + 1} \left( \alpha \right)\) is an estimate of the conditional expected shortfall.

Under the null hypothesis that we correctly estimate the dynamics of the process (\(\mu_{t + 1}\) and \(\sigma_{t + 1}\)) and the first moment of the truncated innovation distribution \(E\left( {\left. z \right|z < q_{\alpha } } \right)\), these residuals should behave like an i.i.d. sample with mean zero. Thus, to test whether the estimates of the expected shortfall are correct, we must test whether the sample mean of the residuals is equal to zero against the alternative that the mean of \(y\) is negative. Given a sample \(\left\{ {y_{t + 1} } \right\}\) of size \(N\) (where \(N\) is the number of violations in the period \(T\)), the standardized sample mean converges in distribution to a standard normal as \(N\) tends to \(\infty\), by the central limit theorem. In other words, given the population mean \(\mu_{y}\) and standard deviation \(\sigma_{y}\),

$$\sqrt N \left( {\frac{{\overline{y} - \mu_{y} }}{{\sigma_{y} }}} \right) \to N\left( {0, 1} \right)$$
(17)

By applying the central limit theorem, the statistic for testing the null hypothesis is given by

$$t = \frac{{\overline{y}}}{{\frac{{S_{y} }}{\sqrt N }}}\sim t_{N - 1}$$
(18)

where \(\overline{y}\) and \(S_{y}\) are the sample mean and the sample standard deviation, respectively, of the exceedance residuals.
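A sketch of the McNeil and Frey (2000) backtest described above, following Eqs. (15)–(18), is given below (the function name and array-based interface are assumptions; the forecasts would come from the conditional EVT model).

```python
import numpy as np
from scipy import stats

def mcneil_frey_test(returns, var_fcst, es_fcst, sigma_fcst):
    """One-sided t-test on the standardized exceedance residuals (Eqs. 15-18)."""
    returns, var_fcst = np.asarray(returns), np.asarray(var_fcst)
    es_fcst, sigma_fcst = np.asarray(es_fcst), np.asarray(sigma_fcst)
    hit = returns < var_fcst                               # days with a VaR violation
    y = (returns[hit] - es_fcst[hit]) / sigma_fcst[hit]    # exceedance residuals, Eq. (15)
    t_stat = y.mean() / (y.std(ddof=1) / np.sqrt(y.size))  # Eq. (18)
    p_value = stats.t.cdf(t_stat, df=y.size - 1)           # H1: the mean of y is negative
    return t_stat, p_value
```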

Forecasting daily market risk capital charges

The Basel II Accord required financial institutions to meet daily capital requirements based on VaR estimates (BCBS 1996, 2006). It specified that daily capital charges (DCC) must be set at the higher of the previous day’s VaR or the average VaR over the last 60 business days multiplied by a factor \(k\) between 3 and 4, which depends on the number of violations (see Table 1) that occurred in the 250 days prior to the estimation of the capital charges: \({\text{DCC}}_{t} = {\text{sup}}\left\{ { - k \times \overline{{{\text{VaR}}}}_{60} , - {\text{VaR}}_{t - 1} } \right\}\).

Table 1 Basel accord penalty zones

Recently, the Basel Committee on Banking Supervision (BCBS) has promoted a change in international financial regulation. Under the new regulation based on the Basel solvency framework (BCBS 2012, 2016, 2017, 2019), known as Basel III, financial institutions must calculate the market risk capital requirements based on the expected shortfall (ES) measure, replacing the value at risk (VaR) measure.

Following Chang et al. (2019), we evaluate the market risk capital requirement based on the ES measure, which is the market risk benchmark according to Basel III. Thus, the forecast daily market risk capital requirement (DCR) at time \(t\) is calculated as follows:

$${\text{DCR}}_{t} = {\text{sup}}\left\{ { - k \times \overline{{{\text{ES}}}}_{60} , - {\text{ES}}_{t - 1} } \right\}$$
(19)
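A minimal sketch of Eq. (19) is shown below (it assumes the ES forecasts are stored, as in the paper's convention, as negative returns, and that the multiplier k is supplied externally according to Table 1).

```python
import numpy as np

def daily_capital_requirement(es_forecasts, k=3.0):
    """DCR_t = sup{-k * mean(ES over the previous 60 days), -ES_{t-1}}  (Eq. 19)."""
    es = np.asarray(es_forecasts)          # daily ES forecasts, expressed as negative returns
    dcr = np.full(es.size, np.nan)
    for t in range(60, es.size):
        avg_es_60 = es[t - 60:t].mean()    # average ES over the last 60 business days
        dcr[t] = max(-k * avg_es_60, -es[t - 1])
    return dcr
```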

Case study

Dataset overview

The data consist of the S&P500 stock index, extracted from the Thomson-Reuters Eikon database. The index is transformed into returns by taking the logarithmic differences of the daily closing price (in percentage). We use daily data for the period January 3rd, 2000, through December 30th, 2021, giving a sample size of 5534 observations. Figure 1 shows the evolution of the daily index and returns of the S&P500. The index shows a sawtooth profile, alternating periods of upward trend with episodes of sudden decline.

Fig. 1 Evolution of the daily S&P500 index and returns

In addition, we can observe that the fluctuation range of daily returns is not constant, which means that the variance of the returns changes over time. The volatility of the S&P500 was particularly high from 2008 to 2009, coinciding with the period known as the Global Financial Crisis, and in the first quarter of 2020, coinciding with the beginning of the COVID-19 pandemic. The basic descriptive statistics are provided in Table 2. The unconditional mean daily return is very close to zero (0.021%), which is typical of daily returns. The skewness statistic is negative, implying that the distribution of daily returns is skewed to the left. The kurtosis coefficient shows that the distribution has much thicker tails than the normal distribution. Similarly, the Jarque–Bera statistic is statistically significant, rejecting the assumption of normality. All this evidence shows that the empirical distribution of daily returns cannot be fitted by a normal distribution, as it exhibits significant excess kurtosis and asymmetry (fat tails and peakedness).

Table 2 Descriptive statistics

Before continuing, we briefly summarize the steps performed in this study. For each of the selected thresholds, we first evaluate the fit of the GPD; second, we analyse the stability of the GPD parameters; third, we evaluate the sensitivity of the high quantiles of the GPD to the threshold choice (“Fitting the GPD” section). Later, we evaluate the sensitivity of the market risk measures to the threshold choice (“Sensitivity of the risk measures to changes in the threshold” section). Fifth, we assess the accuracy of the estimated risk measures (“Analysing the quality of the risk estimates” section). Finally, for each selected threshold, we calculate the capital charges based on the ES measure (“Analysing the sensitivity of forecasting daily capital charges to the selected threshold” section). The objective is to evaluate how sensitive the capital requirements are to the choice of threshold.

Fitting the GPD

In this section, we fit the GPD to the data for a set of 20 thresholds. The aim is to evaluate the sensitivity of the parameters and quantiles of the generalized Pareto distribution (GPD) to changes in the threshold. The thresholds were chosen from a quantile range between the 80th and the 99th percentiles in 1% increments. As the sample contains 5534 daily returns, the 80th percentile gives 1107 exceedances, whilst the 99th percentile gives 56 exceedances; each one-percentile increase in the threshold reduces the number of exceedances by roughly 55. According to the theory, the distribution of the exceedances, defined as \(F_{n} \left( {X - u} \right)\) for \(X \ge u\), may be approximated by the GPD, denoted by \(G_{\xi , \sigma } \left( y \right)\). Thus, for each threshold, we fit a GPD and check that the sample of excesses above the threshold follows a \(G_{\xi , \sigma } \left( y \right)\). Examples of this fit can be seen in Fig. 2 for the 5534 returns with thresholds set at 0.74% and 2.76%, which give 1107 and 56 exceedances, respectively. Parameters are estimated by maximum likelihood, and the resulting GPD curves are superimposed on the empirical estimate of the distribution function of the exceedances. As we can see, the GPD seems to fit the samples of exceedances quite well. In line with this, the null hypothesis of the Kolmogorov–Smirnov test, used to test whether the sample of exceedances follows a GPD, cannot be rejected in any case (see Table 3).
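The per-threshold fitting and Kolmogorov–Smirnov checking described in this section can be sketched as follows (scipy-based; note that the KS p-value is only approximate here because the GPD parameters are estimated from the same sample).

```python
import numpy as np
from scipy import stats

def fit_gpd_over_thresholds(losses, percentiles):
    """Fit a GPD to the excesses over each candidate threshold and run a KS test."""
    losses = np.asarray(losses)
    results = []
    for p in percentiles:
        u = np.quantile(losses, p)
        excesses = losses[losses > u] - u
        xi, _, sigma = stats.genpareto.fit(excesses, floc=0)
        ks_stat, ks_p = stats.kstest(excesses, "genpareto", args=(xi, 0, sigma))
        results.append({"pct": p, "u": u, "N_u": excesses.size,
                        "xi": xi, "sigma": sigma, "ks_p": ks_p})
    return results

# thresholds from the 80th to the 99th percentile in 1% steps, as in the paper
percentiles = np.arange(0.80, 1.00, 0.01)
```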

Fig. 2 GPD fit. In the left plot, the GPD is fitted to the 1107 exceedances over the threshold of 0.74%; in the right plot, to the 56 exceedances over the threshold of 2.76%

Table 3 Maximum likelihood estimations (GPD)

Let \(u_{1} ,\) \(u_{2} ,\)…, \(u_{n}\) be the set of thresholds selected (\(n = 20)\). For \(j = 1, \ldots ,n\), let \(\hat{\xi }_{{u_{j} }}\) and \(\hat{\sigma }_{{u_{j} }}\) be the estimators of the shape and scale parameters based on the exceedances over the threshold \(u_{j}\). Figure 3 displays the estimation of \(\xi\) and \(\sigma\), respectively, as a function of the threshold \(u\).

Fig. 3 Maximum likelihood estimates of the GPD parameters as a function of the threshold

We observe that, as the threshold increases, the value of \(\xi\) increases. The opposite occurs for the scale parameter: as the threshold increases, the value of \(\sigma\) is reduced. As expected, in both cases the accuracy of the estimates decreases as the threshold increases.

The estimate of the shape parameter, which determines the weight of the tail of the distribution, is very sensitive to changes in the threshold. For instance, the value of \(\xi\) increases by 2233% when the threshold moves from the 80th percentile to the 90th percentile, and by 227% from the 90th percentile to the 99th percentile. The value of the scale parameter is also sensitive to changes in the threshold, although in this case the changes are not as striking. Thus, in accordance with the literature, we find that the parameter estimates are very sensitive to the threshold selected for estimating the GPD. But what about the GPD quantiles? Do they depend on the threshold choice? To answer this question, we analyse the sensitivity of the high quantiles of the generalized Pareto distribution to changes in the threshold. For this analysis, we focus only on the high quantiles (95%, 96%, 97%, 98% and 99%), as they are the only quantiles relevant for quantifying market risk. The quantiles of the GPD are calculated using Eq. (6).

Figure 4 displays these quantiles as a function of the threshold \(u\). What catches our attention is that the lines representing the quantiles as a function of the threshold are essentially flat, which means that the high quantiles of the GPD do not depend on the threshold choice, at least in the range of thresholds considered in this paper. This result is quite striking. Table 4 displays the differences between the \(\alpha\) quantiles obtained for all thresholds considered and those of a benchmark threshold. Panel (a) displays these differences with regard to the threshold corresponding to the 90th percentile, which is the midpoint of the considered range and has been used successfully in many empirical papers on VaR estimation (see Abad and Benito 2013; Benito et al. 2017). Panel (b) displays these differences with regard to the threshold corresponding to the 97th percentile, which is the optimal threshold according to the mean excess plot. In Panel (a), we observe that for a large set of thresholds, from the return corresponding to the 81st percentile to the return corresponding to the 97th percentile, the differences in the quantile estimates do not exceed 7 basis points. Moreover, from the return corresponding to the 85th percentile to the return corresponding to the 94th percentile, the differences in the quantile estimates (95th to 98th) do not exceed 2 basis points. As regards Panel (b), although the differences in the quantile estimates (95th to 98th) are larger than in Panel (a), they do not exceed 8 basis points from the return corresponding to the 80th percentile to the return corresponding to the 99th percentile. Only in the case of the 99th quantile are the differences somewhat higher.

Fig. 4 GPD quantiles at 95%, 96%, 97%, 98% and 99% probability as a function of the threshold

Table 4 Differences in quantiles

As the estimation of market risk depends on the quantiles of the GPD, this preliminary analysis suggests that the choice of threshold in the framework of the POT method may not be very relevant in quantifying market risk.

Sensitivity of the risk measures to changes in the threshold

The analysis presented in the previous section is in accordance with the literature: the estimates of the parameters that describe the generalized Pareto distribution depend significantly on the threshold selected for the estimation. Surprisingly, however, the high quantiles of the GPD remain approximately constant. In this section, we go a step further by assessing to what extent the selection of the threshold affects the quantification of financial risk. With this objective, a set of 20 thresholds has been selected; the parameter estimates corresponding to these thresholds were presented in the previous section.

To quantify the risk, we use VaR and ES measures, which were presented in “Risk measure” section. The expression for these measures is given by

$$VaR_{t} \left( \alpha \right) = \mu_{t} + \tilde{\sigma }_{t} q_{1 - \alpha } \;\;\;\;\;ES_{t} \left( \alpha \right) = \mu_{t} + \tilde{\sigma }_{t} \left[ {\frac{{ q_{1 - \alpha } }}{1 - \xi } + \frac{\sigma - \xi u}{{1 - \xi }}} \right]$$
(cf. Eqs. 9 and 12)

where \(\mu_{t}\) is the conditional mean return, which is assumed constant (\(\mu_{t} = \mu\)); \(\tilde{\sigma }_{t}\) represents the conditional standard deviation of the returns; \(q_{1 - \alpha }\) is the \(1 - \alpha\) quantile of the GPD; and \(\xi\) and \(\sigma\) are the shape and scale parameters of the GPD. For the estimation of the conditional standard deviation of the returns, we use an APARCH model.
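For the conditional moments, a sketch along the following lines could be used (it relies on the arch package and fits an asymmetric GARCH-family model as a simple stand-in for the paper's APARCH specification; the exact model, options and synthetic data are assumptions).

```python
import numpy as np
from arch import arch_model

rng = np.random.default_rng(4)
returns = 0.05 + 1.2 * rng.standard_normal(3000)   # synthetic daily returns in percent

# GJR-GARCH(1,1) with Student-t innovations, used here as a simple stand-in
# for the APARCH specification mentioned in the text
am = arch_model(returns, mean="Constant", vol="GARCH", p=1, o=1, q=1, dist="t")
res = am.fit(disp="off")

fcst = res.forecast(horizon=1)
mu_next = fcst.mean.values[-1, 0]                  # one-day-ahead conditional mean
sigma_next = np.sqrt(fcst.variance.values[-1, 0])  # one-day-ahead conditional std. dev.
# mu_next and sigma_next feed into Eqs. (9) and (12) together with the GPD quantile.
```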

The sample period is divided into a learning sample, from January 3rd, 2000, to December 30th, 2016, and a forecast sample, from January 3rd, 2017, to the end of December 2021. For each day of the forecast period, we generate estimates of the VaR and ES measures. These forecasts are obtained one day ahead at the 95% and 99% confidence levels. Table 5 presents the descriptive statistics of the differences between the market risk estimates obtained from the threshold corresponding to the 90th percentile and the market risk estimates obtained from the remaining selected thresholds. For VaR estimates at the 95% confidence level, from the return corresponding to the 80th percentile to the return corresponding to the 96th percentile, the mean of the differences does not exceed 3 basis points, with a standard deviation between 1 and 3 basis points. For the thresholds corresponding to the 97th to 99th percentiles, the mean of the differences in the VaR estimates at the 95% confidence level increases, ranging between 6 and 9 basis points.

Table 5 Differences between VaR and ES estimates

The standard deviation of these differences also increases, ranging between 4 and 28 basis points. For these thresholds, the minimum difference reaches 45 basis points (99th percentile), whilst the maximum difference reaches 229 basis points (99th percentile). For VaR estimates at the 99% confidence level, we find similar results. For a large set of thresholds (from the 82nd percentile to the 96th percentile), the mean and standard deviation of the differences are very small, not exceeding 5 basis points. Only in the case of the threshold corresponding to the 99th percentile are the differences higher. In summary, we find that, for a large set of thresholds (from the return corresponding to the 80th percentile to that corresponding to the 96th percentile), the quantification of risk obtained from the VaR measure is similar. The same conclusion can be drawn for the ES measure. At the 95% confidence level, and from the 82nd percentile to the 98th percentile, the mean and standard deviation of the differences do not exceed 3 basis points. At the 99% confidence level, the differences are even smaller and do not exceed 1 basis point (from the 80th percentile to the 99th percentile). Thus, we can conclude that, within the selected range, the choice of threshold in the framework of the POT method may not be very relevant in quantifying market risk.

Analysing the quality of the risk estimates

In this section, we are interested in analysing the accuracy of the risk measures (VaR and ES) obtained from conditional EVT. In addition, we analyse whether the quality of these measures depends on the threshold selected for applying EVT. To do so, we use the backtesting techniques presented in “Backtesting” section.

To evaluate the accuracy of the VaR estimates, we have used four standard tests: the unconditional coverage (LRuc), independence (LRind), conditional coverage (LRcc) and dynamic quantile (DQ) tests. The results of these tests are presented in Table 6, along with the number and percentage of exceptions. The first thing that catches our attention in Table 6 is that, for a large set of thresholds (from the 82nd percentile to the 93rd percentile), the number of exceptions is very close to the expected one. In the cases in which the number of exceptions differs from the theoretical one, the differences are very small. Thus, at the 95% confidence level, the percentage of exceptions ranges from 4.61% to 5.56%, corresponding to the 80th and 99th percentiles. At the 99% confidence level, the percentage of exceptions ranges from 1.43% to 1.59%, also very similar to the expected one (1%). To test statistically whether the number of exceptions is equal to the theoretical one, we use the aforementioned tests. We cannot reject the null hypothesis that the VaR estimates are accurate for any of the selected thresholds. To test whether the ES estimates are correct, we use the test proposed by McNeil and Frey (2000). The results of this test are also displayed in Table 6. The null hypothesis that the ES(95%) estimates are correct is rejected for all thresholds at both the 5% and 1% significance levels. However, the hypothesis that the ES(99%) estimates are correct is rejected at the 5% level for all thresholds, but not at the 1% level.

Table 6 Backtesting VaR and ES for S&P500 (2017–2021)

The results presented in this section indicate that the choice of threshold in the framework of the POT method may not be relevant in quantifying market risk when we use the VaR and ES measures for this task.

Analysing the sensitivity of forecasting daily capital charges to the selected threshold

In this section, we carry out an empirical application in which we evaluate the sensitivity of the daily market risk capital requirement (DCR) to the threshold choice. For this purpose, we follow Chang et al. (2019) and calculate the DCR according to Eq. (19). Figure 5 shows the mean of the DCR, calculated on the basis of the ES measure at the 99% confidence level, for each of the selected thresholds.

Fig. 5 Mean market risk capital requirement calculated on the basis of the ES measure

The visual inspection of this figure suggests that there is a large set of thresholds (85th to 95th percentiles) that provide similar results, with some differences observed for the lowest (80th to 84th) and highest (96th to 99th) thresholds. Table 7, which shows the mean, standard deviation and range of the daily capital requirements, confirms these results. For a large set of thresholds (approximately the 80th to the 95th percentile), the differences in the DCR are under 5 basis points, which for an investment portfolio worth 1 million euros amounts to no more than about 500 euros. However, for thresholds outside this range, the differences are somewhat greater.

Table 7 Statistics of the daily capital requirement (ES 99%)

Robustness analysis

In the previous section, we showed that, in the framework of the POT approach, there is a set of thresholds that provide similar market risk estimates. This is because the GPD quantiles are not sensitive to the threshold choice; only for the 99th-percentile threshold are some differences found. To corroborate the validity of this result, in this section we extend the study to a set of 14 assets. In line with the study performed for the S&P500, for each of these assets we select a set of 20 thresholds, from the 80th to the 99th percentile. For each selected threshold, we first apply the conditional POT approach to analyse the sensitivity of the GPD quantiles to the threshold choice. Second, we obtain one-day-ahead VaR and ES forecasts and analyse the differences among them for the set of selected thresholds. Finally, we analyse the sensitivity of the market risk capital charges to the threshold choice.

Before analysing the accuracy of the market risk measures, we evaluate the sensitivity of the high quantiles of the generalized Pareto distribution to changes in the threshold. As in the case of the S&P500, for this analysis we focus only on the high quantiles (95%, 96%, 97%, 98% and 99%), as they are the only quantiles relevant for quantifying market risk. The quantiles of the GPD are calculated using Eq. (6). Figure 6 displays the GPD quantiles as a function of the threshold for all assets considered. Again, what catches our attention is that the lines representing the quantiles as a function of the threshold are quite flat in the threshold range from the 80th to the 96th percentile. Only in the case of the high thresholds, corresponding to the 97th to 99th percentiles, are some differences observed, especially for the threshold corresponding to the 99th percentile. For this threshold, the differences are around 25 basis points, reaching 50 basis points for some assets, such as the Merval index.

Fig. 6 GPD quantiles at 95% (black line), 96% (orange line), 97% (blue line), 98% (brown line) and 99% (green line) probability as a function of the threshold, for all assets considered

After checking that the GPD fits the upper tail of the distribution well for the set of thresholds considered, we calculate the market risk measures at the 95% and 99% confidence levels. To evaluate the accuracy of the VaR estimates, we use the standard tests presented in “Backtesting” section: LRuc, LRind, LRcc and DQ. For each asset, Table 8 displays the number of times that each of these tests is rejected across the 20 selected thresholds.

Table 8 Backtesting VaR and ES

In the footnote to Table 8, we indicate the set of thresholds for which the null hypothesis is rejected. For instance, for the CAC40 at the 95% confidence level, the LRuc test is rejected once, for the threshold corresponding to the 99th percentile. The results obtained for VaR are as follows. According to the LRuc test, for 7 of the 15 assets considered we do not find evidence against the null hypothesis that the VaR(5%) estimates are accurate. This result is independent of the selected threshold, although for certain indexes this hypothesis is rejected by some of the tests performed for the threshold corresponding to the 99th percentile. The results found for VaR at the 99% confidence level are even more conclusive than those at the 95% confidence level. According to the LRuc test, for 14 of the 15 assets considered we do not find evidence against the null hypothesis that the VaR(1%) estimates are accurate. In certain cases, the accuracy tests provide evidence against the null hypothesis; however, in these cases, the rejection does not depend on the selected threshold. For instance, for Copper, the DQ test rejects the null hypothesis for all thresholds. These results suggest that the quantification of risk through the VaR measure does not depend on the threshold selected for this purpose.

To test whether the ES estimates are correct, we use the test proposed by McNeil and Frey (2000). Overall, we do not find evidence against the null hypothesis that the mean of the discrepancy measure is equal to zero, indicating that all the thresholds provide correct ES estimates at both the 95% and 99% confidence levels.

The results presented in this section corroborate those obtained for the S&P500, indicating that the quantification of market risk through the VaR and ES measures does not depend on the threshold selected for applying the POT method.

Finally, for all assets, we calculate the market risk capital requirement on the basis of the ES at the 99% confidence level. Table 9 shows the mean of these requirements. Again, we find that, in general, for all assets there is a wide set of thresholds that give similar results; only the extreme thresholds provide somewhat different capital requirements. If the aim of a financial institution is to minimize the market risk capital charges, the optimal threshold is the one corresponding to the 90th percentile for six of the 14 assets considered. For three indexes (IBEX35, Merval and Nikkei), the highest thresholds (98th and 99th) are the best, whilst for the commodities the threshold that minimises the market risk capital requirement is the lowest (80th).

Table 9 Forecasting daily capital charges based on ES measure (99% confidence level)

Conclusions

The conditional extreme value theory has proven to be one of the most successful approaches to estimating market risk. Its implementation in the framework of the POT model requires choosing a threshold return for fitting the generalized Pareto distribution. Threshold choice involves balancing bias and variance. To determine the optimal threshold, several techniques have been proposed, such as graphical methods, ad hoc methods and methods based on goodness-of-fit tests. However, none of these techniques has been proven to provide better results than the others.

In this paper, we ask whether the threshold choice is relevant in measuring market risk. In other words, we assess to what extent the selection of the threshold is decisive in quantifying market risk. To measure market risk, we have used the value at risk (VaR) and expected shortfall (ES) measures. The study has been carried out for the S&P500 index.

First, we analyse the sensitivity of both the parameter estimates and the GPD quantiles to the threshold choice. The results obtained are as follows. In line with the literature, we find that the parameter estimates are very sensitive to the threshold selected for estimating the GPD. However, the quantiles of the GPD do not change much when the threshold changes, particularly the high quantiles (95th, 96th, 97th, 98th and 99th), which are the relevant ones in risk estimation. Second, for a large set of thresholds (from the 80th percentile to the 96th percentile), the VaR estimates are practically equivalent, and a similar finding holds for the ES measure. In a final application, we calculate the market risk capital requirements on the basis of the ES(99%) estimates. The results reveal that there is a set of thresholds which provide similar capital requirements, with some differences found for the highest percentiles.

The results obtained indicate that, from the market risk management point of view, there is not a single optimal threshold but rather a set of thresholds which provide similar market risk measures. Thus, we can conclude that, in market risk estimation, researchers and practitioners should not focus excessively on the threshold choice, as a wide range of thresholds produces essentially the same risk estimates.

To corroborate these results, we have extended the S&P500 study to a set of 14 assets (stock market indexes, commodities and exchange rates). The results obtained for these assets confirm those obtained for the S&P500.

Finally, although overall the quantification of risk does not depend on the threshold choice, some differences are found for certain thresholds; therefore, financial institutions may be interested in choosing the threshold that minimises the market risk capital requirement.