1 Introduction

Every year, regulated financial institutions (banks, insurance companies, financial advisors, etc.) must assess the amount of capital to retain to cover operational risk losses that might be suffered in the following year. The European Banking Authority [7] defines operational risk as “the risk of losses stemming from inadequate or failed internal processes, people and systems or from external events”. Informally, operational risk is the risk of “things going wrong”. Such capital must be retained by the bank and cannot be used for lending. Consequently, operational risk capital must be sufficient to cover expected and anticipated losses, but should not be substantially over-estimated. Calculating a satisfactory amount is an important part of a bank’s risk control.

This paper concerns a particular problem that often arises when calculating operational risk capital. There are cases when the result of a capital calculation is judged, subjectively, to be ’too large’. These cases arise when the Operational Risk loss data contain very extreme values, and alternative calculations have to be done. There is currently no objective way to decide whether or not the result of a capital calculation is ’too large’, and the purpose of this paper is to present an objective decision process.

1.1 Structure of this Paper

The introduction is followed by a summary of the characteristics of operational risk and problems associated with measuring it. The literature review (Sect. 3) covers prior research on modelling probability distributions that are appropriate to operational risk data.

In the main body of this paper (Sect. 4), a framework is presented to assess whether or not a Value-at-Risk calculation for any appropriate data set is ”excessive”. An objective method to make that assessment is proposed. Validation results are presented in Sect. 5, and the framework is evaluated in Sect. 6.

1.2 Nomenclature

The following terms are used extensively throughout this paper.

  1. OpRisk is an abbreviation for Operational Risk.

  2. ’Operational Risk Data’ (usually abbreviated to ’data’ in this paper) comprises sets of ’fat-tailed’ loss data, each with a date stamp.

  3. VaR is the usual abbreviation for Value-at-Risk. In this paper, VaR refers to the particular instance of Value-at-Risk specified by international financial regulators: Value-at-Risk at 99.9%, with a 1-year time horizon.

  4. ’Loss’ (’Losses’) refers to payments made to customers/clients in respect of Operational Risk events.

  5. ’Tail’ refers to the subset of all losses comprising the largest losses, in our case determined by a percentage.

  6. ’Body’ is the set complement of the Tail losses.

  7. GPD is the Generalised Pareto Distribution, with location, scale and shape parameters \(\mu , \sigma , \xi \) respectively.

  8. ’Annual frequency’ is the number of elements in a data set divided by the number of years spanned by the data.

  9. ’Distribution VaR’ refers to VaR calculated for all data (i.e. not restricted to the body or tail).

  10. ’GoF’ means Goodness-of-Fit in the context of significance testing.

Closely connected are combinations of the above terms, in which ’loss’ or VaR is applied to a distribution tail or body.

2 Operational Risk and Value-at-Risk

The Basel Committee on Banking Supervision (the “Basel Committee”) is the source of regulations that govern the management of operational risk, and has approved a calculation method known as the Advanced Measurement Approach (AMA) [2]. Under AMA regulations, banks can implement their own risk models, subject to broad principles specified by the national financial regulators. The capital calculation is usually done by fitting an appropriate probability distribution to data and calculating a standard risk metric, VaR. VaR can be thought of, informally, as the ’largest loss that could be tolerated’. Formally, it measures the maximum amount of money that could be lost over a given time horizon, at a given probability of loss [15]. The probability set by international financial regulations [2] is 0.001 (usually expressed as ’99.9% confidence’). Operational Risk data sets are usually modelled by ’fat-tailed’ distributions, which are polynomial-like rather than exponential-like for large losses. A common model for calculating VaR is the Loss Distribution Approach [9], abbreviated here to LDA. The LDA is a convolution of two statistical models: one for loss frequency (usually Poisson or Negative Binomial), and one for loss severity.
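As an illustration of the LDA convolution just described, the sketch below simulates many one-year loss totals with a Poisson frequency model and a LogNormal severity model, and reads VaR off as the 99.9% quantile. It is a minimal, illustrative sketch rather than any bank's production model; all parameter values are placeholders.

```python
import numpy as np

def lda_var(severity_sampler, annual_freq, p=0.999, n_sims=100_000, seed=1):
    """Monte Carlo LDA sketch: convolve a Poisson frequency model with a severity
    model and return the p-quantile of the simulated annual losses (VaR)."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(annual_freq, size=n_sims)                 # losses per simulated year
    annual_losses = np.array([severity_sampler(rng, n).sum() for n in counts])
    return np.quantile(annual_losses, p)

# Illustrative severity model: LogNormal with placeholder parameters.
severity = lambda rng, n: rng.lognormal(mean=10.0, sigma=2.0, size=n)
print(lda_var(severity, annual_freq=20))                           # VaR at 99.9%, 1-year horizon
```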

Table 1 illustrates the problem we are trying to solve. It shows best-fit results with corresponding VaR values derived by fitting the distributions shown, all of which are viable candidates. The VaR values vary enormously, from very small to huge. The LogNormal Mixture distribution gives a VaR which exceeds the gross domestic product of the world (84.75 trillion USD in 2020 - https://data.worldbank.org/indicator/NY.GDP.MKTP.CD). Such bizarre results are surprisingly common. The result of the analysis in Mitic [20] is that a VaR value should reflect the overall ’size’ of the data. Specifically, it should not be more than \(7 \frac{1}{3}\) times the annualised data sum. For the data set used in Table 1, the data sum was 2144 mEUR, and the data spanned 10.5 years. The upper limit for its VaR should then be approximately 1500 mEUR, which eliminates the highest estimates in Table 1.
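Written out, the ceiling calculation for the Table 1 data set uses the figures quoted above:

$$\begin{aligned} \textit{VaR}_{max} \approx 7\tfrac{1}{3} \times \frac{2144 \; \text{mEUR}}{10.5 \; \text{years}} \approx 1497 \; \text{mEUR} \approx 1500 \; \text{mEUR} \end{aligned}$$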

Table 1 Operational Risk p-values and Value-at-Risk for candidate fat-tailed distributions. Only one is a Goodness-of-fit fail (p-value \(< 0.05\)), and it is indicated in bold

There are two principal reasons for the discrepancies noted in Table 1: distribution parameter estimation, and the gradient of the derived distribution CDF for large losses. In the former case, parameter estimation can be difficult when there is little variation in loss severity within a large subset of losses. In those circumstances, there can be little change in the maximum likelihood value even if a target parameter changes by a relatively large amount. The maximum likelihood estimation then terminates because it has reached a specified maximum number of iterations, not because it has converged. In particular, estimation of the \(\xi \) parameter for the Generalised Pareto, Gumbel and Frechet distributions is subject to this problem. Usually the Hill estimator is a reliable way to estimate the \(\xi \)-value, provided that there is sufficient data.

In the second case, the CDF gradient for some distributions is very flat for large data values. The CDF then attains the required 0.999 value only at extremely large data values. A particular case is the Generalised Pareto distribution when \(\xi \) is greater than 1. A potential solution to this problem is to reject values in a sample that exceed the maximum observed datum by some predetermined amount; a factor of 10 times the maximum observed datum is one possibility. Without investigation, it is hard to tell which effect operates in any particular case. Parameter estimation for some distributions is known to be reliably convergent (LogNormal, Weibull), whereas for others (Generalised Pareto) it is not. Similarly, the CDF gradient for large losses is known to be near to zero for distributions such as Burr and Generalised Pareto. Further investigation of this topic is merited.
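For reference, the Hill estimator mentioned above can be written in a few lines. The sketch below is the standard textbook form, with the number of upper order statistics k chosen by the user; it is illustrative rather than part of the paper's implementation.

```python
import numpy as np

def hill_estimator(losses, k):
    """Hill estimate of the tail index xi from the k largest observations.
    Assumes positive, fat-tailed data; k must be much smaller than len(losses)."""
    x = np.sort(np.asarray(losses, dtype=float))
    top = x[-k:]                      # the k largest order statistics
    threshold = x[-(k + 1)]           # the (k+1)-th largest, used as the threshold
    return np.mean(np.log(top) - np.log(threshold))

# e.g. xi_hat = hill_estimator(losses, k=int(0.1 * len(losses)))  # a 10% tail
```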

2.1 Theoretical Background: Tail VaR

A distribution tail is generally a principal determinant of VaR. Furthermore, the Pickands-Balkema-deHaan Theorem [25] shows that the distribution of losses above a threshold approaches a GPD as the threshold increases (i.e. as the tail size decreases). Fitting a GPD to tail data is a fundamental part of our framework for assessing whether or not a VaR calculation is “excessive”. The basis of the Pickands-Balkema-deHaan Theorem is the Excess distribution. Given a high-value threshold u, \(F_u(x)\) is the conditional distribution of the excess \(X - u\), given that the random variable X exceeds u. That is:

$$\begin{aligned} F_u(x) = P(X - u \le x \; | \; X > u) \end{aligned}$$
(1)

The Pickands-Balkema-deHaan Theorem states that, for large u, the distribution \(F_u(x)\) is approximated by the GPD, with density and distribution functions f and F respectively, as in Eq. 2. A GPD is characterised by 3 parameters: location (\(\mu \)), scale (\(\sigma \)) and shape (\(\xi \)). Of these, \(\xi \) is the principal determinant of VaR.

$$\begin{aligned} f(x: \mu , \sigma , \xi )&=\Big (\frac{1}{\sigma }\Big ) \Big ( 1 + \frac{\xi (x-\mu )}{\sigma } \Big )^{-1-\frac{1}{\xi }} \quad x\ge \mu , \sigma>0, \xi>0 \nonumber \\ F(x: \mu , \sigma , \xi )&= 1- \Big ( 1 + \frac{\xi (x-\mu )}{\sigma } \Big )^{-\frac{1}{\xi }} \quad x\ge \mu , \sigma>0, \xi >0 \end{aligned}$$
(2)
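For readers reproducing the calculations, Eq. 2 (for \(\xi > 0\)) corresponds to scipy's genpareto distribution with shape c = \(\xi \), loc = \(\mu \) and scale = \(\sigma \). The snippet below checks the correspondence numerically; the parameter values are arbitrary and purely illustrative.

```python
import numpy as np
from scipy.stats import genpareto

mu, sigma, xi = 1000.0, 5e6, 0.8                     # illustrative GPD parameters
x = np.linspace(mu, mu + 1e8, 5)

# Eq. 2 written out directly ...
F_direct = 1.0 - (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)

# ... and via scipy's parameterisation (c = xi, loc = mu, scale = sigma)
F_scipy = genpareto.cdf(x, c=xi, loc=mu, scale=sigma)

assert np.allclose(F_direct, F_scipy)
```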

2.2 Proposed Solution

We assert that the maximum appropriate VaR may be determined using the tail of a data distribution, modelled by a GPD. The proposed solution is to implement a framework that introduces two concepts: a GPD Surface and a Credible Region. A GPD Surface is a 3-D plot with VaR, for a given frequency parameter \(\nu \), derived from a GPD on the vertical axis, and independent GPD variables \(\xi \) and \(\sigma \) on the horizontal axes. Each GPD Surface has a ’near-horizontal’ portion when \(\xi \) is small, with a marked ’upturn’ for larger \(\xi \). Examples are shown in Sect. 5.2. The Credible Region is a subset of the ’near-horizontal’ portion, and its boundary partitions VaR values that are subjectively “too big” from those that are not. The framework comprises the following steps for each value of the frequency parameter \(\nu \).

  • Assume that maximal distribution VaR can be estimated using tail VaR.

  • Model tail VaR using a GPD at frequency \(\nu \).

  • Find a best-fit distribution, F, for the entire data set.

  • Compare the VaR calculated using F to a reference set of data-independent, pre-calculated GPD Surface values derived from a GPD.

  • Formulate a decision rule: “If the distribution VaR exceeds the GPD-derived VaR, reject F and seek an alternative (non-optimal) fit. Otherwise, accept F.”

These steps are explained in the Methodology (Sect. 4). To give a clearer idea of the two essential concepts in the framework, Fig. 1 shows a typical GPD Surface. The Credible Region corresponds approximately to the purple-blue “flat” part of the surface. The sharp surface curvature for large \(\xi \) is also apparent. The orange-red part of the surface represents VaR values that are not “credible”.

Fig. 1

Typical GPD Surface: \(\sigma \in [50000, 150000]\), \(\xi \in [0, 1.5]\), \(\nu = 4\)

3 Literature Review

The literature review concentrates on the parts of Extreme Value Theory (EVT) that are relevant to GPD tail loss modelling. There were two principal approaches to modelling the extremes of a set of losses in the mid-\(20^{th}\) century. The development of the second depends on the first, and both are fundamental to the study of distribution tails.

The Block Maxima approach [13] leads to the Generalised Extreme Value (GEV) distribution, and its three sub-types: Frechet, Gumbel and Weibull. It is so-called because it relates to the distributional properties of the maximum loss in a set of partitions of the data. The groundwork was established earlier. In 1923, [6] calculated an expression for the median of the tail distribution. Also in 1923, [17] calculated an expression for the expected value of the tail distribution. In 1927, [10] advanced the theory with a study of the asymptotic distribution of the tail. A year later, [8] showed that extreme limit distributions of the tail can only be one of three types. That result was not proved rigorously until 1943 [11]. The final result of the mid-\(20^{th}\) century work has become known as the Extremal Types Theorem.

A formal distribution function for the Extreme Value Distribution (EVD) was given by [14], following an earlier basis [18]. [5] (Sect. 3.1.4) provides an outline proof of the form of an EVD.

The second approach, Threshold Exceedances, extends the research described above, and leads to the link between the GPD and tail data. The essential formulation was done in the mid-1970s [1, 25]. The basis of the advance was to derive the distributional form of Eq. 2 from the Excess formulation in Eq. 1.

An alternative approach was to characterise convergence in terms of moments. [4] suggested using the distribution mean and standard deviation as loss scaling constants. Validity conditions were given in [24]. [5] (Sect. 4.2.2) provides an outline proof of convergence, and a formal proof is given by [16]. Coles also has many further references to the development of EVT, and also to its early applications. [27] gives a similar overview, with details on parameter estimation.

3.1 Recent Advances

Before 2023, no direct attempts to characterise VaR in terms of the data from which it was calculated had been made. In 2023, two such studies were published. In Mitic [22], an empirical relationship between VaR and the annual loss sum, S, was established. The relationship was expressed as a very simple linear formula: \(\textit{VaR} \sim \frac{S}{2}\). No indication of a VaR ceiling was given.

An alternative way of estimating a VaR ceiling was used in Mitic [21]. The median of the distribution of the Maximum Order Statistic of a GPD fitted to tail losses is associated with a VaR ceiling via a linear scale factor. Overall, it is easier to apply than the method proposed in this paper, at the expense of a slightly impaired success rate. A third result, Mitic [20], presents a further simple formula for a VaR ceiling, used to determine whether or not a VaR value is “excessive”: \(\textit{VaR} \sim 7 \frac{1}{3} S\). This result has the considerable advantage of simplicity, but again at the expense of an impaired success rate.

4 Methodology

Our ’solution’ to the problem of answering the question “is the calculated VaR too big?” is to compare the calculated VaR to a reference set of data-independent, pre-calculated VaR values derived from a GPD. The latter set is termed a GPD Surface. If the calculated VaR exceeds the GPD-derived VaR, the distribution used to calculate VaR is rejected. Otherwise it is accepted. In order to avoid excessively time-consuming calculations, we propose a framework of pre-prepared GPD Surfaces, each linked to a given annual loss frequency. An empirically-calculated VaR value can then be compared with the GPD Surface (which may have to be derived by interpolation) with the same annual frequency. In doing so, the following assumptions are made.

Assumptions

  1. Tail VaR can be calculated using a GPD.

  2. Tail VaR is representative of VaR for an entire data set (i.e. body VaR is small compared to tail VaR). An outline proof of this assumption for large data sets is given in Appendix A. It depends on the assumptions that the mean and standard deviation of the body are smaller than the mean and standard deviation of the tail.

  3. There is a boundary such that, if VaR exceeds the boundary value, the distribution that was used to calculate the VaR value should be rejected in favour of an alternative distribution that gives a VaR value that does not exceed the boundary.

  4. The boundary may be approximated by a quadratic.

  5. An appropriate model that expresses VaR in terms of the GPD parameters \(\{\mu , \sigma , \xi \}\) can be fitted to empirical data.

4.1 The GPD Surface

We first define the essential terms that underpin the framework that follows.

Definition: GPD Surface

A GPD Surface is a mapping from a set of n pairs \(P^\sigma _\xi = \{(\sigma _i, \xi _i) \;\;| \;\; \xi _i> 0, \sigma _i > 0, \sigma _i \le Q(\xi _i) \}, \; i=1..n\), defined in a region bounded by a quadratic function \(\sigma = Q(\xi ) \), together with constants \(\{\mu , \nu \}\), to a set of VaR values \(V_i\) calculated using GPD parameters \(\{\mu , \sigma _i, \xi _i\}\) and an annual frequency \(\nu \). We denote it by \(\Gamma \).

$$\begin{aligned} \Gamma (\mu , \sigma , \xi , \nu ) = \bigg \{ \{\sigma _i, \xi _i, V_i \} \;\; | \;\; \mu , \nu \bigg \} \quad \sigma _i,\xi _i \in P^\sigma _\xi ; \quad i=1..n \end{aligned}$$
(3)

This definition therefore extends a GPD to a ”pseudo distribution” which has four parameters: the three GPD parameters, plus a frequency parameter. The frequency dependence arises because VaR calculations depend on the data time span. For convenience, we sometimes shorten the notation to \(\Gamma (\nu )\) if an interpolation on \(\nu \) is involved. See Algorithm: GPD-CRED.

Definition: Credible Region

A Credible Region is a subset of a GPD Surface, such that the VaR values that fall within that region satisfy an acceptance criterion that limits the values of \(\sigma \) and \(\xi \) on the GPD Surface to \(\sigma _C\) and \(\xi _C\) respectively. We denote it by \(\Gamma _C(\mu , \nu )\). Note that the term Credible Region does not refer to a term with the same name in Bayesian analysis.

$$\begin{aligned} \Gamma _C(\mu , \sigma , \xi , \nu ) = \bigg \{ \Gamma (\mu , \sigma , \xi , \nu ) \;\; | \;\; \mu , \nu , \sigma \in \sigma _C, \xi \in \xi _C \bigg \} \end{aligned}$$
(4)

The precise way to define the Credible Region will be discussed in Sect. 4.6. In parallel with the abbreviated notation \(\Gamma (\nu )\) for a GPD Surface, we also use the abbreviated notation \(\Gamma _C(\nu )\) for a Credible Region.

4.2 Overall Strategy

The overall strategy is to associate fitted GPD tail parameters with a point on a GPD Surface that is appropriate for the annual frequency of the tail. The VaR derived from a best fit distribution to all the data (not just the tail) can then be compared with a theoretical GPD-derived VaR. The detailed steps are listed below. The actual comparison is done by a calculation of surface curvature. A marked discontinuity in curvature demarcates a region in which a VaR value is of acceptable size from one in which it is not. A marked discontinuity is not so apparent on the GPD-Surfaces, except at very high \(\xi \) values.

There are two parts to the overall calculation. The first (Algorithm GPD-PREP) needs to be done once only, is not data-dependent, and is reused for every data set. In the second (Algorithm GPD-CRED), tail VaR calculated from data is compared with the VaR derived from a GPD Surface and its Credible Region, and a credibility calculation is applied to decide whether or not the VaR calculated from data is ”too big”. An overview of the process is shown in Fig. 2. Each stage is explained in further parts of this section.

Fig. 2

Process flow: One-time surface preparation stage GPD-PREP (in blue), and decision stage GPD-CRED (in yellow)

Algorithm: GPD-PREP

The steps below are data-independent, and apply for a single annual frequency \(\nu \).

  1. Set a value for \(\mu \), which must be less than or equal to the minimum loss in any data set. \(\mu =1000\) is suggested for tail data.

  2. Define a range of \(\sigma \) and \(\xi \) values that are appropriate for a distribution’s GPD tail. We suggest \(\xi \in (0,2.5)\) and \(\sigma \in (5 \times 10^6, 50 \times 10^6)\).

  3. With the frequency \(\nu \), for each GPD parameter combination \(\{\sigma _i, \xi _i\}\), calculate the VaR \(V_i\) using the fixed value of \(\mu \) and a Poisson frequency model with 1 million simulations.

  4. Fit a 2-D surface to the triples \(\{\sigma _i, \xi _i, V_i\}\), with the pairs \(\{\sigma _i, \xi _i\}\) as independent variables and the set \(\{V_i\}\) as the dependent variable.

The result of applying the steps above is a GPD Surface conditioned on an annual frequency \(\nu \). The algorithm is applied to multiple annual frequencies \( \nu = \nu _1, \nu _2, \nu _3,...\), appropriate for the distribution tails commonly encountered. Annual frequencies in the range 1 to 25 are suggested. In this way, a set of GPD Surfaces, each conditioned on an annual frequency, is defined. They serve as reusable reference sets, against which VaR calculated from data can be tested. GPD Surfaces at other frequencies are derived by interpolation or extrapolation.
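A minimal sketch of GPD-PREP is shown below. It assumes the parameter ranges suggested above, uses scipy's genpareto as the severity model, and deliberately uses far fewer Monte Carlo cycles and grid points than the 1 million simulations specified, so that it runs quickly; it is illustrative rather than a reproduction of the paper's Mathematica implementation.

```python
import numpy as np
from scipy.stats import genpareto

def gpd_surface(nu, mu=1000.0, sigmas=None, xis=None, n_sims=20_000, seed=1):
    """GPD-PREP sketch: tabulate 99.9% VaR over a (sigma, xi) grid for a fixed
    location mu and annual frequency nu, using a Poisson frequency model.
    Far fewer grid points and simulations than the paper's 10^6 cycles."""
    rng = np.random.default_rng(seed)
    sigmas = np.linspace(5e6, 50e6, 5) if sigmas is None else sigmas
    xis = np.linspace(0.1, 2.5, 5) if xis is None else xis
    triples = []                                     # (sigma_i, xi_i, V_i)
    for sigma in sigmas:
        for xi in xis:
            counts = rng.poisson(nu, size=n_sims)    # losses per simulated year
            m = counts.max()
            draws = genpareto.rvs(xi, loc=mu, scale=sigma,
                                  size=(n_sims, m), random_state=rng)
            keep = np.arange(m) < counts[:, None]    # mask out unused draws
            annual = (draws * keep).sum(axis=1)      # simulated annual losses
            triples.append((sigma, xi, np.quantile(annual, 0.999)))
    return np.array(triples)

# Example: one reference surface per annual frequency, as in Algorithm GPD-PREP
# surfaces = {nu: gpd_surface(nu) for nu in (1, 2, 5, 10, 25)}
```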

Algorithm: GPD-CRED

In Algorithm GPD-CRED, the GPD Surfaces are used in conjunction with fat-tailed data sets to examine the suitability of data fits for the purpose of VaR calculations. The steps below form the required framework.

  1. Extract the data tail and fit a GPD to it (giving parameters \(\mu \), \(\sigma \) and \(\xi \)), using a Poisson frequency model and 1 million simulations.

  2. Determine the annual data frequency \(\nu \).

  3. Locate a GPD Surface \(\Gamma (\nu )\), either by selecting an existing one, or by interpolation (with respect to frequency) on the set of GPD Surfaces from Algorithm GPD-PREP.

  4. Calculate the surface curvature of \(\Gamma (\nu )\) using \(\sigma \), \(\xi \) and \(\nu \).

  5. Determine a Credible Region \(\Gamma _C(\nu )\) from \(\Gamma (\nu )\).

  6. Determine boundary values \(\sigma _B\) and \(\xi _B\) for \(\sigma \) and \(\xi \) respectively, using \(\Gamma _C(\nu )\) and the parameters \(\sigma \) and \(\xi \).

  7. Compare \(\sigma _B\) with \(\sigma \), and \(\xi _B\) with \(\xi \), in a binary decision process, the outcome of which is to either accept the VaR calculated from data, or reject it.

In Sect. 4.8, a method to validate the decision using distributional statistics of the data is presented. This step is not strictly needed in Algorithm GPD-CRED, but provides a measure of the probability that the decision is correct.
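As an illustration of steps 1 and 2, the sketch below extracts a percentage tail, fits a GPD to it, and computes the annual tail frequency. Fixing the GPD location at the tail minimum is one simple choice made here for the sketch; the paper does not prescribe it.

```python
import numpy as np
from scipy.stats import genpareto

def fit_tail_gpd(losses, years, tail_pct=0.10):
    """GPD-CRED steps 1-2 (sketch): take the largest tail_pct of losses,
    fit a GPD to them, and compute the annual tail frequency nu."""
    x = np.sort(np.asarray(losses, dtype=float))
    k = max(int(np.ceil(tail_pct * len(x))), 2)            # tail size
    tail = x[-k:]
    xi, mu, sigma = genpareto.fit(tail, floc=tail.min())   # location fixed at tail minimum
    nu = k / years                                         # annual tail frequency
    return {"mu": mu, "sigma": sigma, "xi": xi, "nu": nu}
```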

4.3 The GPD Surface Fit

Empirical evidence (see Sect. 5.3) shows that VaR increases exponentially with the GPD \(\xi \) parameter. The VaR dependence on the GPD \(\sigma \) parameter is essentially linear for a distribution body, but is better modelled by an exponential for the tail. Therefore, we propose the following functional form, denoted by \(\widehat{\Gamma }\), for a tail GPD Surface \(\Gamma \). It comprises an exponential of a linear expression in \(\xi \) and \(\sigma \) (a plane), modified by a multiplicative linear term. In Eq. 5, \(\mu \) and \(\nu \) are held constant, with \(\sigma \) and \(\xi \) as variables. The parameters a, b and c are to be determined by a non-linear fit.

$$\begin{aligned} \widehat{\Gamma }(\mu , \sigma , \xi , \nu , a,b,c) = c \xi e^{(a \xi + b \sigma )} \quad | \; \mu ,\nu \quad \{a,b,c \in \mathbb {R}\} \end{aligned}$$
(5)

If GPD parameter values that are appropriate to all of the data (i.e. not just the tail) are selected to define a GPD Surface, a more complex form for the GPD Surface is needed (Eq. 6). The ’all data’ fit, \(\widetilde{\Gamma }\), incorporates the following components (determined empirically):

  • VaR varies approximately exponentially with \(\xi \)

  • VaR varies approximately linearly with \(\sigma \)

  • A ’square root of sigma’ modifier to the exponential term to improve the fit

  • A ’power of \(\xi \)’ additive term to improve the fit further

$$\begin{aligned} \widetilde{\Gamma }(\mu , \sigma , \xi , \nu , a,b,c,d,n) = a \sigma + d \xi ^n + c \sqrt{\sigma } \sigma e^{b \xi } \quad | \; \mu ,\nu \quad \{a,b,c,d,n \in \mathbb {R}\} \end{aligned}$$
(6)

In practice, surface fits tend to underestimate VaR. Despite this shortcoming, they do pass goodness-of-fit tests, and serve the purpose of validating an empirical VaR value. We have noted that the VaR surfaces are very smooth, despite the stochastic calculation involved. That is, the numeric values of the partial derivatives of VaR with respect to parameters \(\sigma \) and \(\xi \) are small. Therefore we do not expect instability in the surface fit process. Nor do we expect convergence to a non-optimal solution.
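The non-linear fit of the form in Eq. 5 can be carried out with standard least-squares tooling. The sketch below assumes surface triples such as those produced by the GPD-PREP sketch above; the starting values and the rescaling of \(\sigma \) to millions (which changes the units of b) are illustrative numerical choices, not part of the paper's method.

```python
import numpy as np
from scipy.optimize import curve_fit

def surface_model(X, a, b, c):
    """Functional form of Eq. 5: VaR = c * xi * exp(a*xi + b*sigma)."""
    sigma, xi = X
    return c * xi * np.exp(a * xi + b * sigma)

def fit_surface(surface):
    """Non-linear fit of Eq. 5 to (sigma_i, xi_i, V_i) triples.
    sigma is rescaled to millions to keep the exponent well conditioned."""
    sigma, xi, var = surface.T
    p0 = (1.0, 0.01, var.mean())                       # illustrative starting values
    params, _ = curve_fit(surface_model, (sigma / 1e6, xi), var,
                          p0=p0, maxfev=20_000)
    return params                                      # a, b, c
```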

4.3.1 Adjustment for small \(\xi \) and \(\sigma \)

The fit from Eq. 6 is relatively poor if \(\sigma \) and \(\xi \) are ’small’ (\(0<\sigma \le 25 \times 10^6\), \(0< \xi <0.5\)). An improvement is to approximate the exponential part in Eq. 5, giving the simpler form \(c \xi (a \xi + b \sigma )\). The fit is greatly improved if an additional additive linear term in \(\sigma \) is included. The following is applicable for a fit to such a restricted region.

$$\begin{aligned} \widehat{\Gamma }(\mu , \sigma , \xi , \nu , a,b,c,d) = c \xi (a \xi + b \sigma ) + d \sigma \quad | \; \mu ,\nu , \quad \{a,b,c,d \in \mathbb {R}\} \end{aligned}$$
(7)

See Sect. 5.3 for illustrations of fits in these cases.

4.4 Goodness-of-Fit Tests

Two GoF tests are used. To test distributional fits to data, the TNA-test [19] is effective for all distributions concerned. This test is a formalisation of a Q-Q plot: a comparison of the empirical quantiles of a distribution with the quantiles of a fitted instance of the distribution. It is robust with respect to both small and large populations. The latter requirement is particularly important, since most data sets used in this analysis are too large for a GoF test such as Kolmogorov-Smirnov or Anderson-Darling to be reliable if all data are used. Appendix B shows an outline of how it operates.

To test equality of surface fits, a \(\chi ^2\) test is used, using an empirical array of k ’observed’ coordinate pairs \(\{p_1, p_2,..., p_k\}\), and corresponding fitted values \(\{\bar{p}_1, \bar{p}_2,..., \bar{p}_k\}\) (the ’expected’ values).
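A minimal sketch of that surface-equality test is given below: the \(\chi ^2\) statistic over the k observed/expected ordinate pairs. The choice of \(k-1\) degrees of freedom is an assumption made here; the paper does not state the degrees of freedom used.

```python
import numpy as np
from scipy.stats import chi2

def surface_chi2(observed, expected):
    """Chi-squared comparison of empirical ('observed') and fitted ('expected')
    surface ordinates; returns the statistic and an approximate p-value."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    stat = np.sum((observed - expected) ** 2 / expected)
    p_value = chi2.sf(stat, df=len(observed) - 1)       # df = k - 1 (assumption)
    return stat, p_value
```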

4.5 Curvature Calculations

The GPD surface shown in Fig. 1 indicates that there is a change in curvature throughout, but that it is very marked at certain values of \(\xi \) and \(\sigma \). Therefore, to try to find a boundary beyond which the VaR is deemed to be “inadmissible”, we look for changes in curvature on the GPD Surface. We concentrate on Gauss Curvature and Mean Curvature. Those curvature metrics apply to a surface in which the dependent variable \(\phi \) (in our case VaR) is a function of \(\xi \) and \(\sigma \). In the case of the GPD, the GPD parameter \(\mu \) is fixed, as is the reference to a fixed frequency \(\nu \). The sticking point with this approach is that the functional form of \(VaR(\sigma , \xi )\) must be known.

Gauss Curvature (\(K_G\)) and Mean Curvature (\(K_M\)) are defined in Eqs. 8 and 9. Subscripts denote partial derivatives. A full discussion of their derivation may be found in, for example, Sochi [28] or Gray et al. [12]. The latter has notes on implementation in Mathematica.

$$\begin{aligned} K_G &= \frac{\hat{\Gamma }_{\sigma \sigma } \hat{\Gamma }_{\xi \xi } - \hat{\Gamma }^{2}_{\sigma \xi } }{(1 + \hat{\Gamma }^{2}_{\sigma } + \hat{\Gamma }^{2}_{\xi } )^{2}} \end{aligned}$$
(8)
$$\begin{aligned} K_M &= \frac{\hat{\Gamma }_{\sigma \sigma } (1 + \hat{\Gamma }^{2}_{\xi }) + \hat{\Gamma }_{\xi \xi } (1 + \hat{\Gamma }^{2}_{\sigma }) - 2 \hat{\Gamma }_{\sigma } \hat{\Gamma }_{\xi } \hat{\Gamma }_{\sigma \xi } }{2 (1 + \hat{\Gamma }^{2}_{\sigma } + \hat{\Gamma }^{2}_{\xi } )^{3/2}} \end{aligned}$$
(9)

We have concentrated on the use of Gauss Curvature, as a map of \(K_G(\sigma ,\xi )\) for the GPD-Surfaces in this analysis corresponds more closely to the perceived Credible Regions. An advantage of using Mathematica is that derivatives of any function \(\phi \) can be formulated dynamically for any functional form for \(\phi \).
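The same dynamic-derivative approach is available outside Mathematica. The sketch below uses SymPy to form the Gauss Curvature (Eq. 8) of the fitted form \(\widehat{\Gamma }\) from Eq. 5 symbolically and then compiles it for numerical evaluation; the symbols a, b, c are placeholders to be replaced by surface-fit results.

```python
import sympy as sp

sigma, xi = sp.symbols("sigma xi", positive=True)
a, b, c = sp.symbols("a b c", positive=True)       # surface-fit parameters (placeholders)
gamma = c * xi * sp.exp(a * xi + b * sigma)        # fitted surface, Eq. 5 form

# First and second partial derivatives of the Monge patch (sigma, xi) -> VaR
g_s, g_x = sp.diff(gamma, sigma), sp.diff(gamma, xi)
g_ss, g_xx = sp.diff(gamma, sigma, 2), sp.diff(gamma, xi, 2)
g_sx = sp.diff(gamma, sigma, xi)

# Gauss Curvature, Eq. 8
K_G = (g_ss * g_xx - g_sx**2) / (1 + g_s**2 + g_x**2) ** 2

# Numerical evaluator for contour plotting once a, b, c are known
K_G_num = sp.lambdify((sigma, xi, a, b, c), K_G, "numpy")
```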

4.6 Credible Region

The credible region is the core of the framework. It is defined in terms of a contour plot of the Gauss Curvature of a GPD-Surface defined by tail data only, with \(\xi \) as the independent variable and \(\sigma \) as the dependent variable. Using tail data only makes it easy to select a contour that serves as a boundary for the credible region. There is some doubt as to which contour to use if all data are used, but for tail data only, the curvature contours are very tightly spaced, and the choice of contour is not an issue. In order to select an appropriate contour, a range of contour counts from 3 to 25 was considered, and contour plots were produced for all surfaces. In all cases it was observed that, as the number of contours increased, they became more tightly packed, particularly in the yellow-green/yellow/yellow-orange regions of the contour plot. Those regions corresponded to approximately the middle set of contours out of the total number set. Therefore we anticipated that the credible region boundary would correspond approximately to \(\frac{1}{2}\) to \(\frac{2}{3}\) of the number of contours set. We acknowledge that the choice of the number of contours is subjective. Having selected a contour, its equation can be calculated. The equation of the boundary contour, \(\sigma = Q(\xi )\), takes the (quadratic) form shown in Eq. 10. This formulation was noted in Sect. 4.1.

$$\begin{aligned} \sigma = Q(\xi ) = A \xi ^2 + B \xi + C, \quad A,B,C \in \mathbb {R}, \quad \xi , \sigma >0 \end{aligned}$$
(10)

To formally define the credible region, we locate the \(\xi \) coordinate, \(\xi _0\), at which the selected contour intersects the \(\xi \) axis (i.e. we solve \(\sigma = 0\) in Eq. 10). \(\xi _0\) is increased by 10%, giving \(\xi _0 \rightarrow \xi _B = 1.1 \xi _0\), to allow for a more generous credible region. That change results in a corresponding \(\sigma \) value on what is, in effect, a virtual contour corresponding to \(\xi _B\): \(\sigma _B = Q(\xi _B)\). This admits fewer false negatives. The credible region is then defined in Eq. 11. It corresponds to the ’flat’ purple-blue region in the example of Fig. 1.

$$\begin{aligned} {\varvec{R}} = \Big \{ (\sigma ,\xi ): \quad 0 < \sigma \le Q(\xi ) \, | \, \xi \in (0,\xi _B] \Big \} \end{aligned}$$
(11)
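A sketch of the boundary construction just described is given below, assuming the quadratic coefficients A, B, C of Eq. 10 have been obtained from the selected contour. Taking the largest positive real root of Q as \(\xi _0\) is an assumption made for the sketch.

```python
import numpy as np

def credible_region_bounds(A, B, C, margin=1.1):
    """Sect. 4.6 sketch: xi_0 is the positive root of Q(xi) = A*xi^2 + B*xi + C = 0,
    xi_B = margin * xi_0 (the 10% allowance), and sigma_B = Q(xi_B)."""
    roots = np.roots([A, B, C])
    # largest positive real root; raises ValueError if none exists
    xi_0 = max(r.real for r in roots if abs(r.imag) < 1e-12 and r.real > 0)
    xi_B = margin * xi_0
    sigma_B = A * xi_B**2 + B * xi_B + C
    return xi_0, xi_B, sigma_B
```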

4.7 Acceptance Criterion and the Decision Process

An Acceptance Criterion is used to decide whether the measured GPD tail parameters \(\sigma ^{\prime }\) and \(\xi ^{\prime }\) are ’acceptable’. If not, the distribution from which the tail was extracted is rejected. There are two parts to the Acceptance Criterion. In the first, \(\sigma ^{\prime }\) is compared with an evaluation of the credible region boundary (Eq. 10) at \(\xi ^{\prime }\). For \(\{\sigma ^{\prime }, \xi ^{\prime }\}\) to be within the credible region, \(\sigma ^{\prime } < Q(\xi ^{\prime })\).

The second part of the Acceptance Criterion is to calculate VaR from the fitted surface, \(V_B = \hat{\Gamma }(\sigma ^{\prime }, \xi ^{\prime })\), and compare the value obtained with a “maximum possible” VaR, \(V_{max}\) = 50 billion. This value is approximately twice as large as the largest seen so far on the ORX database, 27.2 billion. That eliminates VaR values that are ’clearly’ too high. The Acceptance Criterion is the conjunction of the two parts (Eq. 12).

$$\begin{aligned} \text{ ACCEPT }&\quad \text{ If } \quad \Big \{ \sigma ^{\prime }< Q(\xi ^{\prime }) \quad | \quad 0< \xi ^{\prime } \le \xi _B \Big \} \wedge \Big \{ V_B < V_{max} \Big \}; \nonumber \\ \text{ REJECT } \quad&\text{ Otherwise } \end{aligned}$$
(12)
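The Acceptance Criterion reduces to a small decision function. The sketch below assumes the boundary quadratic Q of Eq. 10, the boundary value \(\xi _B\) from Sect. 4.6, and a VaR value read off the fitted surface \(\widehat{\Gamma }\); \(V_{max}\) defaults to the 50 billion figure quoted above.

```python
def accept_var(sigma_f, xi_f, Q_coeffs, xi_B, surface_var, V_max=50e9):
    """Acceptance Criterion (Eq. 12), as a sketch.
    sigma_f, xi_f : GPD parameters fitted to the data tail
    Q_coeffs      : (A, B, C) of the boundary quadratic Q(xi) = A*xi^2 + B*xi + C
    xi_B          : boundary xi value (1.1 * xi_0, Sect. 4.6)
    surface_var   : VaR read off the fitted GPD Surface at (sigma_f, xi_f)
    Returns True for ACCEPT, False for REJECT."""
    A, B, C = Q_coeffs
    Q = A * xi_f**2 + B * xi_f + C
    in_region = (0.0 < xi_f <= xi_B) and (sigma_f < Q)
    return in_region and (surface_var < V_max)
```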

4.8 Validation

Validation is done using a data-based ’sense test’. Its purpose is to test whether or not the decisions made (Sect. 4.7) are reasonable. The data components used are (referring to the entire data set X):

  • Maximum, \(X_{max}\)

  • Mean, \(\bar{X}\)

  • Distribution VaR, \(V_X\)

  • Annual Frequency, \(\nu _X\)

The validity test is, like the Acceptance Criterion, bipartite. In the first part, the ratio \(X_{max}/\bar{X}\) detects huge outliers and rejects them. The coefficient that determines an appropriate limit (with value 30) is conditioned on the first 50% of the data sets, such that ’clearly’ huge VaRs are rejected. In the second part, we assert that \(V_X\) should not be greater than it would be if every one of the \(\nu _X\) draws in a year's random sample from a fitted distribution were an order of magnitude (ten) times the maximum datum, which is expressed as \(10 \times \nu _X \times X_{max} \). The validity condition is the conjunction of these two conditions.

$$\begin{aligned} \text{ VALID } \quad&\text{ If } \quad \Big \{ \frac{X_{max}}{\bar{X}} < 30 \Big \} \wedge \Big \{ V_X \le 10 \,\, \nu _X \,\, X_{max} \Big \}; \nonumber \\ \text{ INVALID }\quad&\text{ Otherwise } \end{aligned}$$
(13)

In formulating this validation criterion, we considered that simplicity was paramount. Therefore we have not used more complicated methods based on approximate sums of GPD random variables, such as those due to Zaliapin et al [30] or van Zyl [31].
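In keeping with that aim of simplicity, the validation 'sense test' of Eq. 13 reduces to a few lines. The sketch below assumes the losses are supplied as a flat array and that the annual frequency is the number of losses divided by the years spanned, as in the Nomenclature.

```python
import numpy as np

def sense_check(losses, years, var_all_data, ratio_limit=30.0, factor=10.0):
    """Validation 'sense test' of Eq. 13: reject data sets with an extreme
    maximum-to-mean ratio, or a VaR above factor * nu * max(loss)."""
    x = np.asarray(losses, dtype=float)
    x_max, x_mean = x.max(), x.mean()
    nu = len(x) / years                           # annual frequency
    outlier_ok = (x_max / x_mean) < ratio_limit   # first condition of Eq. 13
    ceiling_ok = var_all_data <= factor * nu * x_max
    return outlier_ok and ceiling_ok              # True = VALID
```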

5 Results

We first discuss the data used, and then illustrate some of the graphical constructs that are central to our proposed framework. Finally, numerical results are presented.

5.1 Data

Random samples were drawn from distributions that are commonly encountered in operational risk. Having determined suitable parameter ranges, uniform random samples were generated to determine specific parameter values. The sample size was also randomly selected, in the range 100 to 1000. Distributions that have no native implementation in Mathematica were implemented using methods available in R packages. In particular, package GK was used for the G-and-H distribution [26], and package VGAM for the Frechet and Gumbel distributions [29]. In Mathematica, the Burr (Type XII) distribution is known as the Singh Maddala distribution, and the GPD is known as the Pareto Pickands distribution.

Samples were combined in random mixtures in order to make the data more realistic. A best fit distribution for each of the resulting 201 data sets was found, mostly using the in-built non-linear distribution fitting methods available in Mathematica, using maximum likelihood. In cases where a straightforward method to estimate initial parameters was unclear, random parameter values were generated within reasonable ranges, and were tested for optimal GoF using the TNA test [19]. Random generation of parameters in this way has been shown to be a fast and efficient technique in difficult cases, such as the Tukey G-and-H distribution [3]. Table 2 shows the fitted distributions used, with the number of data sets fitted to each of them.

Each data set is tied to a nominal time window \(Y=5\) years, so that the annual frequency, \(\nu \), is the ratio of the number of elements in a data set, \(\#(X)\), and the number of years.

$$\begin{aligned} \nu = \frac{\#(X)}{Y} = \frac{\#(X)}{5} \end{aligned}$$
(14)
Table 2 Best fit distributions, 201 data sets

5.2 GPD-Surface Illustrations

Fig. 3 (parts (a)-(d)) shows four views of the same GPD-Surface. The flat areas for small \(\xi \) and \(\sigma \) are clear, as is the progression to the steepest point (high \(\xi \) and high \(\sigma \)). The thinness of the surface indicates an accurate VaR estimation.

Fig. 3

Views of a GPD-Surface, frequency 25 (parts (a)-(d))

Figure 4 shows a group of layered surfaces, with frequencies 1, 2 and 5. It illustrates the fact that all GPD-Surfaces considered in this analysis have the same shape: a largely flat area leading to an abrupt gradient change for large \(\xi \). The layers do not intersect. The graphic is positioned to show the divergence of the surfaces when \(\xi \) and \(\sigma \) are both large.

Fig. 4

GPD Surfaces: Frequencies 1,2,5, low values for \(\sigma \) and \(\xi \). They are non-intersecting, and all have the same shape profile. The Credible Regions are the violet-blue-cyan portions of the surface

5.3 GPD-Surface Data Fits

Fig. 5 shows a typical surface (in ’rainbow’ colours) with its best-fit parametrically-defined surface of the form \(\widehat{\Gamma }_i(\sigma , \xi ) = c e^{(a \xi + b \sigma )} \sigma \) (in grey/black). When the grey/black regions are visible, there is an over-fit, which is desirable when assessing the maximum ’acceptable’ VaR. The goodness-of-fit test for surface equality gave \(\chi ^2 = 5.48\), with a p-value of 0.963. This result shows that the null hypothesis that the fitted and empirical surfaces have the same distribution is not rejected at the 5% level. In this case the equation of the fitted surface was \(\widehat{\Gamma }_i(\sigma , \xi ) = 11.566 e^{(8.376 \xi + 0.0344 \sigma )} \sigma \quad (\mu =1000; \nu =10)\). Fitted surfaces at other frequencies are similar, and an ’under-fit’ is a common feature. The goodness-of-fit conclusion (that the null hypothesis is not rejected) is also the same for frequencies ranging from 1 to 40.

Figure 6 shows a close-up view of Fig. 5, restricted to a region of small \(\xi \) and small \(\sigma \). See Sect. 4.3.1 for details of the fit. In this case \(0< \sigma \le 25 \times 10^6, 0< \xi < 0.5\). A notable point is that the fitted surface represents greater VaR values than the corresponding empirical values. Consequently, upper VaR limits are effectively more stringent. That is not a problem, as upper bounds are high anyway.

Fig. 5

Surface fit, annual frequency 10. The black/grey regions show where VaR for the fitted surface exceeds the actual VaR (shown in ’rainbow’ colours)

Fig. 6

Surface fit, annual frequency 10. Close-up of the surface in Fig. 5, restricted to the region \(0< \sigma \le 25 \times 10^6, 0< \xi < 0.5\)

5.4 Credible Region Results

Figure 7 (left hand) shows a contour map of curvatures for all data (i.e. body plus tail) for a typical GPD-Surface (in this case for frequency 10), derived using the method of Sect. 4.5. The illustration shows a gradual progression from the blue credible region (small \(\xi \) and \(\sigma \)) into the non-credible region (red). The blue portion of the contour map is the curvature equivalent of a ’flat’ region for a GPD-Surface such as the one in Fig. 1. Typically, the Gauss Curvature and Mean Curvature plots are almost identical. In both cases, a choice of contour that could formally define the credible region (Sect. 4.6) has to be made. We have selected the contour that intersects the \(\sigma =0\) axis at the maximum \(\xi \)-value at which a contour is defined, using 6 contours. The coordinates of that contour can be extracted, and a quadratic \(\sigma = Q(\xi )\) can be fitted to them. The credible region is the region bounded by \(\sigma = Q(\xi )\) and the axes \(\sigma =0\) and \(\xi =0\). This defines a region that excludes only the red portion in Fig. 1, in order to avoid too many false negative assessments.

Figure 7 (right hand) shows the equivalent Gauss Curvature contour map for the tail data only. There is a marked distinction between the two plots. The ’tail’ plot has a sharp boundary for the credible region, resulting from closely-packed contour lines. Furthermore, the close packing of the Gauss contours indicates that the choice of contour to define the credible region is not important in these cases. The numerical results in Sects. 5.6 and 5.7 confirm this view. It also seems reasonable to model the boundary of the credible region by a quadratic.

Fig. 7

Left: Gauss Curvature Contour plot for all data (body plus tail), for a GPD-Surface at frequency 10. Right: Gauss Curvature Contour plot restricted to tail data only, also at frequency 10. The sharp transition from the red to the blue region in the right-hand illustration is a strong indicator of the credible region boundary. Non-red parts indicate the credible region

5.5 Validation Method Results

Generally, the success rate drops as the upper limit for acceptable VaR increases. Overall, the success rate varies between 75% and 85%, and within that range the dependence on the tail length is quite variable. There is an optimal range at a 5–10% tail, and another at about 30%. The latter tail length is more suitable for smaller data sets. In all cases, the success rate varies consistently with the number of contours, but in a way that shows no clear trend.

The principal validation method, a comparison of the maximum datum and the mean of all empirical data (not just the tail), was designed to eliminate cases where the measured VaR (for all data) is ’clearly’ excessive. Such cases frequently have VaR values in the multi-billions, which is not consistent with a maximum loss of the order of even a few billion.

5.6 Validation Details

Overall, 48.3% of data sets were deemed by the decision procedure (Sect. 4.7) to have ’unacceptable’ VaR values. This number is high, and is difficult to verify with data from practice because ’rejections’ are not routinely recorded. However, many high VaR results are rejected informally, so the percentage rejected by the objective decision process is not so surprising. Table 3 shows a detailed analysis of the VaR variation with both the number of contours and the tail length, expressed as a percentage of the total data count. Since small data tails are more important than larger ones (due to the greater applicability of the GPD fit in those regions), we settle on 6 contours as the preferred number to produce optimal percentages of correct validations. Eight per cent of the rejections were due to the criterion \(V_B < V_{max}\) in Eq. 12.

Table 3 Proportion of correct validations if decisions are based on a comparison of GPD-Surfaces, using all data

The surface graphic in Fig. 8 shows the validation success rates plotted against the number of contours in the credible region and the tail length, expressed as a percentage of the total data count. There is an optimal ridge at approximately a 10% tail, and a descent to a worst-case success rate for high tail percentages.

Fig. 8

Validation Success surface, showing optimal ridge at approximately 10% tail, and worst cases for 5 contours and high tail percentages

5.7 Validation Results for a Restricted ’Small \(\sigma , \xi \)’ Region

This restricted region is of particular interest, because we expect ’acceptable’ VaR values to originate from it. ’Small’ values of the GPD parameters \(\sigma \) and \(\xi \) generally indicate that a fitted distribution should not be rejected. Within the region, the relationship between VaR and \(\sigma \) and \(\xi \) is essentially linear. We restrict the credible region, R, to \(0 < \sigma \le 25 \times 10^6\) and \(0< \xi < 0.5\). The validations are restricted to those cases where the fitted GPD \(\sigma \) and \(\xi \) parameters fall in the restricted region. Others are ignored, since the linearised fit in the restricted region is not applicable elsewhere. The results are shown in Table 4. Overall, they are very similar to Table 3, with marginally improved results. Similarly, the validation surface plot looks very similar to Fig. 8.

Table 4 Proportion of correct validations if decisions are based on a comparison of GPD-Surfaces with data in a restricted region, \(0 < \sigma \le 25 \times 10^6\) with \(0< \xi < 0.5\)

Figure 9 shows a surface plot of the data in Table 4. The optimal peak at approximately a 10% tail is clear. There is minimal variation with the number of contours, probably due to the near-planar nature of the surface.

Fig. 9

Validation Success surface for a restricted ’small \(\sigma , \xi \)’ range, showing optimal ridge at approximately 10% tail, and almost no variation with the number of contours

5.7.1 Validation ’Sense Checks’

In practice, the simplest path for practitioners is to apply the ’sense check’, which implements the validation criteria. The only essential requirements are to select the optimal tail percentage, calculate the necessary descriptive statistics (maximum, mean and frequency), calculate VaR (for all the data), and then apply the test in Eq. 13. Subjectively, best fits that are not LogNormal, LogNormal Mixture, or LogNormal-Gamma Mixture distributions might be treated with suspicion. In particular, GPD and Extreme Value distributions often produce ’excessive’ VaR. Table 5 shows some examples of ’sense checks’, with comments to indicate what features in the data a practitioner might look for to make a subjective decision.

Table 5 Sense check

Comments

  1. VaR and maximum are both huge.

  2. VaR and maximum are both low.

  3. VaR is reasonable given the maximum value.

  4. VaR is large, but not unreasonably so, while the maximum is not excessive: doubtful decision.

  5. VaR seems reasonable given the maximum value, but a low mean triggered a likely incorrect decision.

  •    Notes 1, 2 and 3 are subjective ’correct decision’ verdicts

  •    Note 4 is a subjective ’doubtful decision’ verdict

  •    Note 5 is a subjective ’almost certainly wrong decision’ verdict

6 Discussion

An essential component of the VaR calculation is to generate data values greater than the largest observed datum. If VaR is calculated using the empirical data only, the result represents a minimum value, and does not measure the largest “acceptable” value. The values of draws from the fitted distribution that exceed the maximum observed value depend on two factors. The first is the fitted distribution from which the sample was drawn. The second is its curvature for large ordinates. It is not possible to calculate a generally applicable expression for that curvature, since it depends on the parameters of the fitted distribution.

The computations involved in modelling credibility using the method described are considerable, as is the time taken to do them. The overall task can be partitioned into two distinct stages. Producing the GPD Surfaces is a one-off process. Once done, the ordinates on each surface can be stored and used as required. If a high degree of accuracy is needed, each ordinate on each surface takes 2–3 min to produce (using an i7 Windows PC with 48 GB RAM), since each complete evaluation requires in excess of 1 million Monte Carlo cycles. With 12 GPD Surfaces, each with 110 ordinates, the total production time is approximately 44 h of processor time. The second stage, testing a target VaR value against those surfaces, is, happily, quick. The VaR value to be tested still takes 2–3 min, but the subsequent processes (calculating a corresponding surface-based VaR value and comparing it with the target) take a few seconds. However, the details of the decision process are complicated, and are unlikely to be easily understood by non-technical risk managers. Therefore we would recommend that the entire calculation process is embedded in a ’black-box’ application (written in C++ or similar), accompanied by a detailed explanation of how it works. The VaR surfaces can be held in a database.

With the passage of time, operational risk loss severity, and therefore VaR values, would be expected to rise in the same way that prices, wages etc. might change. In practice, operational risk losses have been remarkably stable over the past 12 years, although financial crime has been an increasing problem in recent years. We would expect the proposed credibility method to be stable with respect to such changes, provided that the reference data (i.e. the VaR surfaces) are still commensurate with the VaR levels encountered in practice. VaR surfaces can be recalibrated periodically (perhaps every two or three years) by repeating the entire GPD-PREP process in Sect. 4.2 with updated reference samples.

7 Conclusion

Practitioners who encounter a calculated VaR value that they consider ”unacceptably large” are faced with a dilemma. A huge capital requirement is unsustainable. It inhibits lending, which is how a retail bank makes money. A subjective decision to discard the offending distribution and seek an alternative has to be made. The key points to consider are:

  • Is the VaR value consistent with the loss sum? A large VaR should follow from a large loss sum, but not from a small loss sum.

  • Is the VaR value consistent with previous similar calculations?

  • Has there been a significant change in the data since previous calculations were done?

  • Can distribution changes be made, such that the VaR value becomes acceptable?

The analysis presented in this paper is an attempt to answer these questions in an objective way. The decision rule in Eq. 12 should not be seen as firm. Rather, it should be regarded as a guideline.