1 Introduction

Since its introduction in the 1980s, Value-at-Risk (VaR) has been widely accepted as a measure of market risk by both industry and regulators on account of its ease of calculation and intuitive interpretation. In its most basic form, VaR gives the worst loss that is not exceeded at a given confidence level over a specific horizon. The main drawback of VaR, other than its being a single number, is that there is no single accepted way of calculating it. Different models can lead to different VaRs, and this can be very costly to financial institutions: if VaR is overestimated, the institution ties up capital that it could use elsewhere for a higher return; if it is underestimated, the firm is severely exposed to market downturns because it has not set aside the correct amount of capital. The financial crisis of 2007–2008 illustrated the drawbacks of the VaR methodology in stark terms, and this has resulted in debate amongst academics, regulators and market practitioners. As part of this debate, a measure related to VaR, the expected shortfall (ES), is now given more prominence under Basel III.

Given the underlying nature of equity returns, forecasting volatility is critical to the success of VaR models (Siu 2018; Chiou et al. 2009, amongst others). The volatility clustering resulting from infrequent large jumps has been modelled using the GARCH-type process of Bollerslev (1986). This basic GARCH model led to the development of more advanced models such as EGARCH and NGARCH, which are explicitly able to incorporate the skewness and excess kurtosis observed in equity returns.

To calculate VaR and ES, GARCH models need to be combined with more flexible distributions. One such approach has been the use of the family of t-distributions. Student t-distributions have played a particularly significant role in financial research as models for heavy-tailed phenomena such as financial market data. However, the Student t-distribution allows for heavier tails than the normal but assumes that the distribution is symmetric around zero. Huang and Lin (2014) compare the forecasting performance of several VaR models. Lin et al. (2006) use historical simulation to estimate portfolio VaR. Baixaali and Alvarez (2006) consider the impact of excess kurtosis on VaR. Angelidis et al. (2007) examine different weighting schemes for robust VaR estimation. Wong et al. (2012) model tail risk beyond VaR. The comparison in these studies focuses on the difference between the normal distribution and the Student t-distribution. Mogel and Auer (2018) employ the Student t-distribution and extreme value theory (EVT) to compute VaR and compare them with historical simulation and other approaches. Their results suggest that historical simulation outperforms the EVT-based approach.

The Student t-distribution can accommodate excess kurtosis in the conditional distribution but not skewness. Hansen (1994) was the first to propose a generalization of the Student t-distribution that allows skewness in the conditional distribution of financial returns to be modelled.

In this study we compare the performance of seven different t-distributions. The first is the standardized t-distribution (ST) used by Bollerslev (1987). The second is the Twin t-distribution (TT) of Baker and Jackson (2014). This distribution is heavy-tailed like the ST distribution but closer to the normal in the central part of the curve. The third is the Generalized asymmetric t-distribution (GAT) of Baker (2016), which generalizes the t-distribution through two types of skewness. The fourth and fifth distributions are the Asymmetric exponential power distribution (AEP) of Zhu and Zinde-Walsh (2009) and its special case, the Skewed exponential power distribution (SEP). The sixth and seventh distributions are the Asymmetric Student t-distribution (AST) and the Skewed Student t-distribution (SST), respectively, of Zhu and Galbraith (2010).

Our analysis focuses on five major stock indices, namely S&P500, FTSE100, NASDAQ100, NIKKEI225 and DAX30, for the period 1995–2014. Calculation of the 1-day-ahead ES follows a two-stage procedure. In the first step, an asymmetric GARCH-type volatility model is fitted to the historical data by maximum likelihood estimation. From this model, the so-called standardized residuals are extracted, and the model is used to calculate 1-step predictions of the conditional mean and the conditional standard deviation. In the second step, various long-tailed and asymmetric distributions are fitted to the standardized residuals, and their quantiles are calculated with the estimated distribution parameters. Finally, the 1-day-ahead conditional expected shortfall \(ES_{t + 1}\) is calculated.

When the variance is time varying, going from the 1-day-ahead to the h-days-ahead expected shortfall is not straightforward. Under GARCH, simple scaling by the horizon h is not valid because the variance mean reverts. Additionally, the returns over the next h days are not normally distributed. To overcome this difficulty, we use Monte Carlo simulation to generate the returns h days ahead and calculate VaR and ES from them.

We find that, overall, EGARCH(1,1) provides the best fit for volatility for the indices considered in this study. We find substantial evidence of an improvement in our results when EGARCH(1,1) is combined with GAT and when it is combined with TT. When we compare the GAT distribution proposed by Baker (2016) with the AST distribution proposed by Zhu and Galbraith (2010), we find that GAT outperforms AST by providing a better fit to financial returns and a more accurate forecast of the ES, which is in line with the reported asymmetry and significant excess kurtosis of the empirical distribution of financial returns (Abad et al. 2014). The longer-horizon ES forecasts are estimated using Monte Carlo simulation with GAT, AEP, SEP, AST, SST, ST and TT as the standardized distributions of returns for five of the world's major stock indices (S&P500, FTSE100, NASDAQ100, NIKKEI225 and DAX30).

The contribution of this paper is as follows. First, our study provides further support for the usefulness and superiority of fat-tailed distributions, especially asymmetric ones, in the major stock markets. Second, it proposes the use of fat-tailed distributions to measure financial risk over a longer horizon. In contrast to the current literature, which mainly focuses on the 1-day-ahead ES, our approach considers the usefulness of fat-tailed distributions for the calculation of ES beyond 1 day. To the best of our knowledge, our research is the first to consider two new distributions and compare them with other, earlier distributions for ES calculation.

The remainder of this paper is organized as follows: Sect. 2 presents the methodological framework. Results are discussed in Sect. 3. Section 4 concludes.

2 Methodological framework

Since its inception in the 1980s, VaR and its extension the ES have been the market risk measure of choice both for industry and regulators. To calculate market risk, we follow the risk measure of Dowd et al. (2008) and define \(M_{\varphi }\) as follows:

$$M_{\varphi } = \mathop \int \limits_{0}^{1} \varphi \left( p \right)q_{p} dp$$
(1)

where \(q_{p}\) is the p loss quantile, \(\varphi \left( p \right)\) is a weighting function defined over the full range of cumulative probabilities \(p \in \left[ {0,1} \right]\) and \(M_{\varphi }\) is the class of quantile-based risk measures.

As noted by Dowd et al. (2008), VaR and ES constitute two well-known members of this class. The VaR at confidence level \(\alpha\), with \(R_{t}\) the index return in period t and \({{\Psi }}_{t - 1}\) the information available at time \(t - 1\), is defined as follows:

$$VaR_{\alpha } = q_{\alpha } \left( {R_{t} |\Psi _{t - 1} } \right)$$
(2)

Moreover, each individual risk measure is characterised by its individual weighting function \(\varphi \left( p \right)\). The weighting function for VaR is a Dirac delta function that gives the outcome \(\left( {p = \alpha } \right)\) an infinite weight and zero weight for every other outcome.

The ES at confidence level \(\alpha\) is the average of the worst \(1 - \alpha\) share of losses, which is defined as follows:

$$ES_{\alpha } = \frac{1}{1 - \alpha }\mathop \int \limits_{\alpha }^{1} q_{p} dp$$
(3)

The weighting function for ES gives all tail quantiles the same weight of \(1/\left( {1 - \alpha } \right)\) and the non-tail quantiles zero weight.
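As a minimal illustration of Eqs. (2) and (3), the sketch below computes an empirical VaR and ES from a sample of losses. The function name, the sample and the convention of taking VaR as the smallest of the worst \(1 - \alpha\) share of losses are assumptions made for the example, not part of the methodology above.

```python
def var_es_from_losses(losses, alpha):
    """Empirical VaR and ES at confidence level alpha from a sample of losses.

    The tail is the worst (1 - alpha) share of the sample. VaR is taken as
    the smallest loss in that tail (one of several quantile conventions) and
    ES is the equally weighted average of the tail losses.
    """
    ordered = sorted(losses)
    m = max(1, round((1.0 - alpha) * len(ordered)))  # number of tail losses
    tail = ordered[-m:]
    return tail[0], sum(tail) / m


# Hypothetical losses 1, 2, ..., 100
losses = [float(x) for x in range(1, 101)]
var_95, es_95 = var_es_from_losses(losses, 0.95)
print(var_95, es_95)
```

Because every tail loss receives the same weight, ES is never smaller than VaR at the same confidence level.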

We define an asset’s return process at time t as follows:

$$R_{t} = \mu_{t} + \sigma_{t} z_{t}$$
(4)

where \(\sigma_{t}\) is the conditional volatility, \(\mu_{t}\) is the conditional mean of returns and \(z_{t}\) is an independent and identically distributed random variable that follows one of the alternative t-distributions.

The key challenge in calculating VaR and other market risk measures is the modelling and estimation of the conditional volatility that incorporates the observed characteristics of share price and index returns such as volatility clustering, asymmetry and long memory. Since its introduction by Bollerslev (1986), the GARCH approach to modelling volatility has become popular, resulting in a wide range of alternative GARCH specifications being proposed.

2.1 VaR and ES calculation over single period

Following Christoffersen (2012) the calculation of VaR and ES follows a two-stage procedure:

  1. A GARCH-type volatility model is fitted to the historical data by maximum likelihood estimation (ML). From this model, the so-called standardized residuals are extracted. The GARCH-type model is used to calculate 1-step predictions of conditional mean \(\left( {\mu_{t + 1} } \right)\) and conditional standard deviation \(\left( {\sigma_{t + 1} } \right)\).

  2. Various long-tailed and asymmetric distributions are fitted to the standardized residuals to calculate \(F^{ - 1} \left( p \right)\) with the estimated parameters of the distributions. Finally, the 1-day-ahead conditional \(VaR_{t + 1}^{p}\) and conditional \(ES_{t + 1}^{p}\) are calculated based on the following formulae:

    $$VaR_{t + 1}^{p} = - \mu_{t + 1} - \sigma_{t + 1} F^{ - 1} \left( p \right)$$
    (5)
    $$ES_{t + 1}^{p} = - E_{t} \left[ {R_{t + 1} |R_{t + 1} < - VaR_{t + 1}^{p} } \right]$$
    (6)
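The two stages can be sketched as follows. This is only an illustration under stated assumptions: a plain GARCH(1,1) recursion with given (not estimated) parameters stands in for the fitted GARCH-type model, and the empirical quantile and tail mean of the standardized residuals stand in for \(F^{ - 1} \left( p \right)\) and the tail expectation of a fitted distribution. All names and numbers are hypothetical.

```python
import math


def garch11_filter(returns, mu, omega, a, b):
    """Run a GARCH(1,1) variance recursion with given (assumed) parameters.

    Returns the standardized residuals z_t and the one-step-ahead
    volatility forecast sigma_{t+1}.
    """
    var_t = omega / (1.0 - a - b)  # start at the unconditional variance
    z = []
    for r in returns:
        eps = r - mu
        z.append(eps / math.sqrt(var_t))
        var_t = omega + a * eps ** 2 + b * var_t  # variance for the next period
    return z, math.sqrt(var_t)


def one_step_var_es(mu_next, sigma_next, z_quantile, z_tail_mean):
    """Eq. (5), and Eq. (6) written in terms of the standardized tail mean."""
    var = -mu_next - sigma_next * z_quantile
    es = -mu_next - sigma_next * z_tail_mean
    return var, es


# Hypothetical daily returns in percent
returns = [0.3, -1.2, 0.8, -0.4, 2.1, -2.5, 0.1, 0.6, -0.9, 1.4,
           -0.2, 0.5, -3.0, 1.1, 0.0, -0.7, 0.9, -1.5, 0.4, 0.2]
z, sigma_next = garch11_filter(returns, mu=0.0, omega=0.05, a=0.08, b=0.90)

# Stand-in for stage two: empirical 10% quantile and tail mean of z
p = 0.10
z_sorted = sorted(z)
m = max(1, round(p * len(z_sorted)))
z_q = z_sorted[m - 1]
z_tail = sum(z_sorted[:m]) / m
var_next, es_next = one_step_var_es(0.0, sigma_next, z_q, z_tail)
print(var_next, es_next)
```

In the paper's setting the last step would use the quantile and tail expectation of one of the seven fitted distributions instead of the empirical values.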

2.1.1 Standardized t-distribution

Bollerslev (1987) used the standardized t-distribution with \(v\) > 2 degrees of freedom. Its density is:

$$f_{t} \left( {z,v} \right) = \frac{{\varGamma \left( {\left( {v + 1} \right)/2} \right)}}{{\varGamma \left( {v/2} \right)\sqrt {\pi \left( {v - 2} \right)} }} \left( {1 + \frac{{z^{2} }}{v - 2}} \right)^{{ - \left( {\frac{1 + v}{2}} \right)}}$$
(7)

where \(\varGamma \left( v \right) = \int_{0}^{\infty } {e^{ - x} } x^{v - 1} dx\) is the gamma function and \(v\) is the parameter that describes the thickness of the tails. The corresponding conditional \(VaR_{t + 1}^{p}\), with \(t_{p}^{ - 1}\) as the pth quantile of the Student t-distribution, and conditional \(ES_{t + 1}^{p}\) are:

$$\begin{aligned} VaR_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} \sqrt {\frac{v - 2}{v}} t_{p}^{ - 1} \left( v \right) \\ ES_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} ES_{t\left( v \right)} \left( p \right) \\ \end{aligned}$$
(8)

where, with \(\tilde{t}_{p} = \sqrt {\frac{v - 2}{v}} t_{p}^{ - 1} \left( v \right)\) denoting the pth quantile of the standardized distribution,

$$\begin{aligned} & ES_{t\left( v \right)} \left( p \right) = \frac{C\left( v \right)}{p}\left[ {\left[ {1 + \frac{1}{v - 2}\tilde{t}_{p}^{2} } \right]^{{\frac{1 - v}{2}}} \frac{v - 2}{1 - v}} \right]\\ &\quad {\text{with}}\, C\left( v \right) = \frac{{\varGamma \left( {\left( {v + 1} \right)/2} \right)}}{{\varGamma \left( {v/2} \right)\sqrt {\pi \left( {v - 2} \right)} }} \end{aligned}$$

The main drawback of the student t-distribution is that it is symmetrical while financial time series can be skewed.
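The closed-form tail mean can be checked against numerical integration of the density in Eq. (7). The sketch below does this for a hypothetical tail parameter and a hypothetical standardized quantile q; the closed form used in the code is the one obtained by integrating \(z f_{t} \left( {z,v} \right)\) directly over \(\left( { - \infty ,q} \right)\), and the Simpson rule and truncation point are illustrative choices.

```python
import math


def st_const(v):
    """C(v) for the standardized Student t density, v > 2."""
    return math.gamma((v + 1) / 2) / (math.gamma(v / 2) * math.sqrt(math.pi * (v - 2)))


def st_pdf(z, v):
    """Standardized (unit-variance) Student t density, Eq. (7)."""
    return st_const(v) * (1 + z * z / (v - 2)) ** (-(v + 1) / 2)


def st_tail_mean(q, p, v):
    """Closed-form E[z | z < q] for the standardized t, where P(z < q) = p."""
    return (st_const(v) / p) * (1 + q * q / (v - 2)) ** ((1 - v) / 2) * (v - 2) / (1 - v)


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


v, q = 6.0, -2.0   # hypothetical tail parameter and standardized quantile
lower = -200.0     # truncation point for the numerical integrals
p = simpson(lambda z: st_pdf(z, v), lower, q)                  # P(z < q)
es_numeric = simpson(lambda z: z * st_pdf(z, v), lower, q) / p
es_closed = st_tail_mean(q, p, v)

mu_next, sigma_next = 0.05, 1.3   # hypothetical one-step forecasts
var_next = -mu_next - sigma_next * q
es_next = -mu_next - sigma_next * es_closed
print(p, es_numeric, es_closed, var_next, es_next)
```

Here q plays the role of \(\sqrt {\left( {v - 2} \right)/v} \,t_{p}^{ - 1} \left( v \right)\) in Eq. (8), so the last two lines reproduce the scaling by \(\mu_{t + 1}\) and \(\sigma_{t + 1}\).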

2.1.2 Twin t-distribution (TT)

Baker and Jackson (2014) applied Johnson’s transformation to statistical modelling and constructed a new long-tailed distribution that resembles the t-distribution. This t-like distribution is useful for fitting data, as it is closer to the normal in the body of the distribution but has the same power-law tail behaviour.

The probability density function is:

$$f\left( {x|v} \right) = \frac{{2^{5/2} \varGamma \left( {v/4 + 3/2} \right)}}{{\sqrt {\pi v} \varGamma \left( {v/4} \right)\left( {v + 1} \right)}}\left( {x^{2} /v + \sqrt {1 + \left( {x^{2} /v} \right)^{2} } } \right)^{{ - \left( {v + 1} \right)/2}}$$
(9)

As \(v \to \infty\) the distribution becomes standard normal. The distribution function for \(x > 0\) is:

$$F_{TT} \left( x \right) = \frac{1}{2} + \frac{{2^{3/2} x\left( {S + C} \right)^{{ - \left( {v + 1} \right)/2}} }}{{\sqrt v \left( {v + 1} \right)B\left( {v/4,3/2} \right)}} + \left( {\frac{1}{2}} \right)I\left( {1 - \left( {S + C} \right)^{ - 2} ;3/2,v/4 } \right)$$
(10)

where \(S = \frac{{x^{2} }}{v}, C = \sqrt {1 + S^{2} }\), B is the beta function and I the regularized incomplete beta function.

Conditional \(VaR_{t + 1}^{p}\) and \(ES_{t + 1}^{p}\) of TT are:

$$\begin{aligned} VaR_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} VaR_{TT} \left( {p\left| v \right.} \right) \\ ES_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} ES_{TT} \left( {p\left| v \right.} \right) \\ \end{aligned}$$
(11)

where

$$VaR_{TT} \left( {p\left| v \right.} \right) = F_{TT}^{ - 1} \left( {p\left| v \right.} \right)$$

\(F_{TT }^{ - 1}\) is the inverse of cdf \(F_{TT}\).

$$ES_{TT} \left( {p\left| v \right.} \right) = - E_{t} \left[ {R_{t + 1} |R_{t + 1} < - VaR_{TT} \left( {p\left| v \right.} \right)} \right]$$
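When no closed form is at hand, \(F_{TT}^{ - 1}\) and the tail expectation can be approximated numerically from the density. The sketch below is one way to do so, assuming the density of Eq. (9) written with \(S = x^{2} /v\) and \(C = \sqrt {1 + S^{2} }\); the grid limits, step size, tail parameter and probability level are hypothetical choices.

```python
import math


def tt_pdf(x, v):
    """Twin-t density with S = x^2/v and C = sqrt(1 + S^2)."""
    s = x * x / v
    k = (2 ** 2.5 * math.gamma(v / 4 + 1.5)
         / (math.sqrt(math.pi * v) * math.gamma(v / 4) * (v + 1)))
    return k * (s + math.sqrt(1 + s * s)) ** (-(v + 1) / 2)


def total_mass(pdf, lo, hi, n):
    """Trapezoid-rule integral of pdf over [lo, hi]."""
    h = (hi - lo) / n
    inner = sum(pdf(lo + i * h) for i in range(1, n))
    return h * (0.5 * pdf(lo) + inner + 0.5 * pdf(hi))


def grid_quantile_and_tail_mean(pdf, p, lo=-60.0, hi=60.0, n=120000):
    """Approximate F^{-1}(p) and E[X | X < F^{-1}(p)] on a uniform grid."""
    h = (hi - lo) / n
    cdf, first_moment = 0.0, 0.0
    x_prev, f_prev = lo, pdf(lo)
    for i in range(1, n + 1):
        x = lo + i * h
        f = pdf(x)
        d_cdf = 0.5 * (f_prev + f) * h
        if cdf + d_cdf >= p:
            # x_prev is the last grid point whose cdf is still below p
            return x_prev, first_moment / cdf
        cdf += d_cdf
        first_moment += 0.5 * (x_prev * f_prev + x * f) * h
        x_prev, f_prev = x, f
    raise ValueError("probability level not reached on the grid")


v, p = 5.0, 0.01   # hypothetical tail parameter and probability level
mass = total_mass(lambda x: tt_pdf(x, v), -60.0, 60.0, 120000)
q_tt, tail_mean_tt = grid_quantile_and_tail_mean(lambda x: tt_pdf(x, v), p)
print(mass, q_tt, tail_mean_tt)
```

The pair `q_tt`, `tail_mean_tt` would then be scaled by \(\sigma_{t + 1}\) and shifted by \(\mu_{t + 1}\) as in Eq. (11).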

2.1.3 Generalized asymmetric t-distribution (GAT)

A 6-parameter asymmetric fat-tailed distribution (GAT) is proposed by Baker (2016). The pdf of the GAT is:

$$\begin{aligned} & f_{GAT} \left( {x\left| {\mu ,\phi ,\alpha ,r,c,v} \right.} \right) \\ & \quad = \frac{{\alpha \left( {1 + r^{2} } \right)}}{r\phi }\frac{{\left\{ {\left( {cg\left( {\left( {x - \mu } \right)/\phi } \right)} \right)^{\alpha r} + \left( {cg\left( {\left( {x - \mu } \right)/\phi } \right)} \right)^{ - \alpha /r} } \right\}^{ - v/\alpha } }}{{B\left( {\frac{v/\alpha }{{1 + r^{2} }},\frac{{r^{2} v/\alpha }}{{1 + r^{2} }}} \right)}}\left( {1 + \left( {\left( {x - \mu } \right)/\phi } \right)^{2} } \right)^{ - 1/2} \\ \end{aligned}$$
(12)

where \(g\left( y \right) = y + \sqrt {1 + y^{2} }\) (the form that appears in \(q\left( x \right)\) of Eq. 13), B is the beta function, ν > 0 controls tail power, µ is a centre of location (not necessarily the mean), ϕ > 0 is a measure of scale (but not the variance, which may not exist), r > 0 controls tail power asymmetry, c > 0 controls the scale asymmetry, and α > 0 controls how early ‘tail behaviour’ is apparent.

The cdf of the GAT distribution is:

$$F_{GAT} \left( {x\left| {\mu ,\phi ,\alpha ,r,c,v} \right.} \right) = B\left( {\frac{v}{{\alpha \left( {1 + r^{2} } \right)}},\frac{{vr^{2} }}{{\alpha \left( {1 + r^{2} } \right)}};q\left( x \right)} \right)$$
(13)

where

$$\begin{aligned} & q\left( x \right) \\ & \quad = \frac{1}{{1 + c^{{ - \alpha \left( {1 + r^{2} } \right)/r}} \left\{ {\frac{{\left( {x - \mu } \right)}}{\phi } + \sqrt {1 + \frac{{\left( {x - \mu } \right)^{2} }}{{\phi^{2} }}} } \right\}^{{ - \alpha \left( {1 + r^{2} } \right)/r}} }} \\ \end{aligned}$$

Conditional \(VaR_{t + 1}^{p}\) and \(ES_{t + 1}^{p}\) of GAT are:

$$\begin{aligned} VaR_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} VaR_{GAT} \left( {p\left| {\mu ,\phi ,\alpha ,r,c,v} \right.} \right) \\ ES_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} ES_{GAT} \left( {p\left| {\mu ,\phi ,\alpha ,r,c,v} \right.} \right) \\ \end{aligned}$$
(14)

where

$$VaR_{GAT} \left( {p\left| {\mu ,\phi ,\alpha ,r,c,v} \right.} \right) = F_{GAT}^{ - 1} \left( {p\left| {\mu ,\phi ,\alpha ,r,c,v} \right.} \right)$$

and \(F_{GAT}^{ - 1}\) is the inverse of cdf \(F_{GAT}\).

$$ES_{GAT} \left( {p\left| {\mu ,\phi ,\alpha ,r,c,v} \right.} \right) = - E_{t} \left[ {R_{t + 1} |R_{t + 1} < - VaR_{GAT} \left( {p\left| {\mu ,\phi ,\alpha ,r,c,v} \right.} \right)} \right]$$
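A short numerical check of the GAT density may help in reading Eq. (12). The sketch assumes \(g\left( y \right) = y + \sqrt {1 + y^{2} }\), which equals \(\exp \left( {\sinh^{ - 1} y} \right)\) and is evaluated in that form for numerical stability; the parameter values and integration range are hypothetical.

```python
import math


def gat_pdf(x, mu, phi, alpha, r, c, v):
    """GAT density, Eq. (12), assuming g(y) = y + sqrt(1 + y^2) = exp(asinh(y))."""
    y = (x - mu) / phi
    cg = c * math.exp(math.asinh(y))
    a = v / (alpha * (1 + r * r))
    b = v * r * r / (alpha * (1 + r * r))
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    core = (cg ** (alpha * r) + cg ** (-alpha / r)) ** (-v / alpha)
    return alpha * (1 + r * r) / (r * phi) * core / beta / math.sqrt(1 + y * y)


def trapezoid(f, lo, hi, n):
    """Trapezoid-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


# Hypothetical parameter values; r > 1 gives the left tail the lower power
params = dict(mu=0.0, phi=1.0, alpha=1.5, r=1.2, c=1.0, v=4.0)
mass = trapezoid(lambda x: gat_pdf(x, **params), -300.0, 300.0, 120000)
left = gat_pdf(-10.0, **params)
right = gat_pdf(10.0, **params)
print(mass, left, right)
```

With these values the density integrates to approximately one and is visibly heavier on the left than on the right, which is the tail power asymmetry governed by r.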

2.1.4 The asymmetric exponential power distribution (AEP)

The asymmetric exponential power distribution is proposed by Zhu and Zinde-Walsh (2009).

$$f_{AEP} \left( {x\left| \beta \right.} \right) = \left\{ {\begin{array}{*{20}l} {\left( {\frac{\alpha }{{\alpha^{*} }}} \right)\frac{1}{\sigma }K_{EP} \left( {d_{1} } \right)exp\left( { - \frac{1}{{d_{1} }}\left| {\frac{x - \mu }{{2\alpha^{*} \sigma }}} \right|^{{d_{1} }} } \right), } \hfill & {x \le \mu } \hfill \\ {\left( {\frac{1 - \alpha }{{1 - \alpha^{*} }}} \right)\frac{1}{\sigma }K_{EP} \left( {d_{2} } \right)exp\left( { - \frac{1}{{d_{2} }}\left| {\frac{x - \mu }{{2\left( {1 - \alpha^{*} } \right)\sigma }}} \right|^{{d_{2} }} } \right),} \hfill & {x > \mu } \hfill \\ \end{array} } \right.$$
(15)

where \(\beta = \left( {\alpha ,d_{1} ,d_{2} ,\mu ,\sigma } \right)\) is the parameter vector, \(\mu \in R\) and \(\sigma > 0\) are the location and scale parameters respectively, \(\alpha \in \left( {0,1} \right)\) is the skewness parameter, \(d_{1} > 0\) and \(d_{2} > 0\) are the left and right tail parameters respectively, and \(K_{EP} \left( d \right)\) is the normalizing constant:

$$K_{EP} \left( d \right) \equiv \frac{1}{{\left[ {2d^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 d}}\right.\kern-0pt} \!\lower0.7ex\hbox{$d$}}}} \varGamma \left( {1 + {\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 d}}\right.\kern-0pt} \!\lower0.7ex\hbox{$d$}}} \right)} \right]}}$$

and \(\alpha^{*}\) is:

$$\alpha^{*} = \alpha K_{EP} \left( {d_{1} } \right)/\left[ {\alpha K_{EP} \left( {d_{1} } \right) + \left( {1 - \alpha } \right)K_{EP} \left( {d_{2} } \right)} \right]$$

Note that:

$$\left( {\frac{\alpha }{{\alpha^{*} }}} \right)K_{EP} \left( {d_{1} } \right) = \left( {\frac{1 - \alpha }{{1 - \alpha^{*} }}} \right)K_{EP} \left( {d_{2} } \right) = \alpha K_{EP} \left( {d_{1} } \right) + \left( {1 - \alpha } \right)K_{EP} \left( {d_{2} } \right)$$

The AEP density function is continuous at every point and unimodal with mode at \(\mu\). The parameter \(\alpha^{*}\) in the AEP density provides scale adjustments to the left and right parts of the density, respectively, to ensure continuity of the density under changes of the shape parameters \(\left( {\alpha ,d_{1} ,d_{2} } \right)\).
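The role of \(\alpha^{*}\) can be verified numerically. The following sketch evaluates the AEP density of Eq. (15) for hypothetical parameter values and checks that the two branches meet at the mode and that the mass to the left of \(\mu\) equals \(\alpha\); the integration range and step are illustrative.

```python
import math


def k_ep(d):
    """Normalizing constant K_EP(d) = 1 / (2 d^(1/d) Gamma(1 + 1/d))."""
    return 1.0 / (2.0 * d ** (1.0 / d) * math.gamma(1.0 + 1.0 / d))


def alpha_star(alpha, d1, d2):
    """Scale-adjusted skewness parameter alpha*."""
    num = alpha * k_ep(d1)
    return num / (num + (1.0 - alpha) * k_ep(d2))


def aep_pdf(x, alpha, d1, d2, mu=0.0, sigma=1.0):
    """AEP density, Eq. (15)."""
    a_star = alpha_star(alpha, d1, d2)
    if x <= mu:
        scale, d, weight = 2.0 * a_star * sigma, d1, alpha / a_star
    else:
        scale, d, weight = 2.0 * (1.0 - a_star) * sigma, d2, (1.0 - alpha) / (1.0 - a_star)
    return weight / sigma * k_ep(d) * math.exp(-abs((x - mu) / scale) ** d / d)


def trapezoid(f, lo, hi, n=40000):
    """Trapezoid-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


alpha, d1, d2 = 0.4, 1.5, 2.5   # hypothetical parameter values
left_mass = trapezoid(lambda x: aep_pdf(x, alpha, d1, d2), -40.0, 0.0)
right_mass = trapezoid(lambda x: aep_pdf(x, alpha, d1, d2), 0.0, 40.0)
at_mode = aep_pdf(0.0, alpha, d1, d2)
just_right = aep_pdf(1e-12, alpha, d1, d2)
print(left_mass, right_mass, at_mode, just_right)
```

For \(d = 2\) the constant \(K_{EP} \left( 2 \right)\) reduces to \(1/\sqrt {2\pi }\), the normal case.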

The VaR and ES are computed analytically for the AEP distribution in Zhu and Galbraith (2011).

Conditional \(VaR_{t + 1}^{p}\) and conditional \(ES_{t + 1}^{p}\) of AEP are:

$$\begin{aligned} VaR_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} VaR_{AEP} \left( {p\left| {\alpha ,d_{1} ,d_{2} } \right.} \right) \\ ES_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} ES_{AEP} \left( {p\left| {\alpha ,d_{1} ,d_{2} } \right.} \right) \\ \end{aligned}$$
(16)

where

$$VaR_{AEP} \left( {p\left| {\alpha ,d_{1} ,d_{2} } \right.} \right) = \left\{ {\begin{array}{*{20}l} { - 2\alpha^{*} \left[ {d_{1} Q^{ - 1} \left( {\frac{p}{\alpha },\frac{1}{{d_{1} }}} \right)} \right]^{{\frac{1}{{d_{1} }}}} ,} \hfill & {p \le \alpha } \hfill \\ {2\left( {1 - \alpha^{*} } \right)\left[ {d_{2} Q^{ - 1} \left( {\frac{1 - p}{1 - \alpha },\frac{1}{{d_{2} }}} \right)} \right]^{{1/d_{2} }} ,} \hfill & {p > \alpha } \hfill \\ \end{array} } \right.$$

\(Q\left( {\alpha ,x} \right)\) denotes the regularized complementary incomplete gamma function:

$$Q\left( {\alpha ,x} \right) = \mathop \int \limits_{x}^{\infty } t^{\alpha - 1} exp\left( { - t} \right)dt/\varGamma \left( \alpha \right)$$

\(Q^{ - 1}\) denotes the inverse of \(Q\left( {\alpha ,x} \right)\) and \(\varGamma\) is the gamma function. The corresponding ES is:

$$\begin{aligned} & ES_{AEP} \left( {p\left| {\alpha ,d_{1} ,d_{2} } \right.} \right) \\ & \quad = - \frac{{2\alpha^{*} }}{p}\mathop \int \limits_{0}^{p} \left[ {d_{1} Q^{ - 1} \left( {\frac{p}{\alpha },\frac{1}{{d_{1} }}} \right)} \right]^{{\frac{1}{{d_{1} }}}} dp + \frac{{2\left( {1 - \alpha^{*} } \right)}}{p}\mathop \int \limits_{0}^{p} \left[ {d_{2} Q^{ - 1} \left( {\frac{1 - p}{1 - \alpha },\frac{1}{{d_{2} }}} \right)} \right]^{{\frac{1}{{d_{2} }}}} dp \\ \end{aligned}$$
(17)

2.1.5 Skewed exponential power distribution (SEP)

The skewed exponential power distribution is the special case of the AEP proposed by Zhu and Zinde-Walsh (2009). If \(d_{2} = d_{1} = d\), which implies \(\alpha = \alpha^{*}\), the AEP reduces to the SEP:

$$f_{SEP} \left( {x\left| \beta \right.} \right) = \left\{ {\begin{array}{*{20}l} {\frac{1}{\sigma }K_{EP} \left( d \right)exp\left( { - \frac{1}{d}\left| {\frac{x - \mu }{2\alpha \sigma }} \right|^{d} } \right),} \hfill & {x \le \mu } \hfill \\ {\frac{1}{\sigma }K_{EP} \left( d \right)exp\left( { - \frac{1}{d}\left| {\frac{x - \mu }{{2\left( {1 - \alpha } \right)\sigma }}} \right|^{d} } \right),} \hfill & { x > \mu } \hfill \\ \end{array} } \right.$$
(18)

The SEP density is skewed to the right for \(\alpha < 1/2\) and to the left for \(\alpha > 1/2\).

Conditional \(VaR_{t + 1}^{p}\) and \(ES_{t + 1}^{p}\) of SEP are:

$$\begin{aligned} VaR_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} VaR_{SEP} \left( {p\left| {\alpha ,d} \right.} \right) \\ ES_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} ES_{SEP} \left( {p\left| {\alpha ,d} \right.} \right) \\ \end{aligned}$$
(19)

where

$$\begin{aligned} VaR_{SEP} \left( {p\left| {\alpha ,d} \right.} \right) & = \left\{ {\begin{array}{*{20}l} { - 2\alpha \left[ {dQ^{ - 1} \left( {\frac{p}{\alpha },\frac{1}{d}} \right)} \right]^{{\frac{1}{d}}} , } \hfill & {p \le \alpha } \hfill \\ {2\left( {1 - \alpha } \right)\left[ {dQ^{ - 1} \left( {\frac{1 - p}{1 - \alpha },\frac{1}{d}} \right)} \right]^{1/d} ,} \hfill & {p > \alpha } \hfill \\ \end{array} } \right. \\ ES_{SEP} \left( {p\left| {\alpha ,d} \right.} \right) & = - \frac{2\alpha }{p}\mathop \int \limits_{0}^{p} \left[ {dQ^{ - 1} \left( {\frac{p}{\alpha },\frac{1}{d}} \right)} \right]^{1/d} dp + \frac{{2\left( {1 - \alpha } \right)}}{p}\mathop \int \limits_{0}^{p} \left[ {dQ^{ - 1} \left( {\frac{1 - p}{1 - \alpha },\frac{1}{d}} \right)} \right]^{1/d} dp \\ \end{aligned}$$

2.1.6 Asymmetric student t-distribution (AST)

The AST was proposed by Zhu and Galbraith (2010). Its density function is defined as:

$$f_{AST} \left( {x\left| \beta \right.} \right) = \left\{ {\begin{array}{*{20}l} {\left( {\frac{\alpha }{{\alpha^{*} }}} \right)K\left( {v_{1} } \right)\left[ {1 + \frac{1}{{v_{1} }}\left( {\frac{x}{{2\alpha^{*} }}} \right)^{2} } \right]^{{ - \frac{{v_{1} + 1}}{2}}} ,} \hfill & { x \le 0} \hfill \\ {\left( {\frac{1 - \alpha }{{1 - \alpha^{*} }}} \right)K\left( {v_{2} } \right)\left[ {1 + \frac{1}{{v_{2} }}\left( {\frac{x}{{2\left( {1 - \alpha^{*} } \right)}}} \right)^{2} } \right]^{{ - \frac{{v_{2} + 1}}{2}}} ,} \hfill & {x > 0} \hfill \\ \end{array} } \right.$$
(20)

where \(\alpha \in \left( {0,1} \right)\) is the skewness parameter, \(v_{1} > 0\) and \(v_{2} > 0\) are the left and right tail parameters respectively, and

$$K\left( v \right) \equiv \varGamma \left( {\left( {v + 1} \right)/2} \right)/\left[ {\sqrt {\pi v} \varGamma \left( {v/2} \right)} \right]$$

where Γ \(\left( . \right)\) is the gamma function and \(\alpha^{*}\) is:

$$\alpha^{*} = \alpha K\left( {v_{1} } \right)/\left[ {\alpha K\left( {v_{1} } \right) + \left( {1 - \alpha } \right)K\left( {v_{2} } \right)} \right]$$

Denoting by µ and σ the location (centre) and scale parameters, respectively, the general form of the AST density is expressed as \(\frac{1}{\sigma }f_{AST} \left( {\frac{x - \mu }{\sigma };\alpha ,v_{1} ,v_{2} } \right)\).

Note that

$$\left( {\frac{\alpha }{{\alpha^{*} }}} \right)K\left( {v_{1} } \right) = \left( {\frac{1 - \alpha }{{1 - \alpha^{*} }}} \right)K\left( {v_{2} } \right) = \alpha K\left( {v_{1} } \right) + \left( {1 - \alpha } \right)K\left( {v_{2} } \right) \equiv {\text{B}}$$

Conditional \(VaR_{t + 1}^{p}\) and \(ES_{t + 1}^{p}\) of AST are:

$$\begin{aligned} VaR_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} VaR_{AST} \left( {p\left| {\alpha ,v_{1} ,v_{2} } \right.} \right) \\ ES_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} ES_{AST} \left( {p\left| {\alpha ,v_{1} ,v_{2} } \right.} \right) \\ \end{aligned}$$
(21)

where

$$\begin{aligned} & VaR_{AST} \left( {p\left| {\alpha ,v_{1} ,v_{2} } \right.} \right) \\ & \quad = 2\alpha^{*} S_{{v_{1} }}^{ - 1} \left( {\frac{{min\left( {p,\alpha } \right)}}{2\alpha }} \right) + 2\left( {1 - \alpha^{*} } \right)S_{{v_{2} }}^{ - 1} \left( {\frac{{max\left( {p,\alpha } \right) + 1 - 2\alpha }}{{2\left( {1 - \alpha } \right)}}} \right) \\ \end{aligned}$$

where \(S_{v} \left( . \right)\) is the cumulative distribution function of the standard student t-distribution with ν degrees of freedom and \(S_{v}^{ - 1}\) is its inverse.

$$\begin{aligned} & ES_{AST} \left( {p\left| {\alpha ,v_{1} ,v_{2} } \right.} \right) \\ & \quad = - \frac{4B}{p}\left\{ {\frac{{\left( {\alpha^{*} } \right)^{2 } v_{1} }}{{v_{1} - 1}}\left( {1 + \frac{1}{{v_{1} }}\left[ {\frac{{min\left( {q - \mu ,0} \right)}}{{2\alpha^{*} }}} \right]^{2} } \right)^{{\frac{{1 - v_{1} }}{2}}} - \frac{{\left( {1 - \alpha^{*} } \right)^{2 } v_{2} }}{{v_{2} - 1}}\left[ {1 - \left( {1 + \frac{1}{{v_{2} }}\left[ {\frac{{max\left( {q - \mu ,0} \right)}}{{2\left( {1 - \alpha^{*} } \right)}}} \right]^{2} } \right)^{{\frac{{1 - v_{2} }}{2}}} } \right]} \right\} \\ \end{aligned}$$

where \(q = VaR_{AST} \equiv F_{AST}^{ - 1}\).
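The AST building blocks can be checked in the same spirit. The sketch below assumes the standard form (location 0, scale 1) with \(K\left( v \right) = \varGamma \left( {\left( {v + 1} \right)/2} \right)/\left[ {\sqrt {\pi v} \varGamma \left( {v/2} \right)} \right]\) and a right-branch scale of \(2\left( {1 - \alpha^{*} } \right)\). It compares the left-tail mean obtained by direct integration of the density with a numerical integral, for a hypothetical quantile \(q \le 0\); all parameter values are illustrative.

```python
import math


def k_t(v):
    """K(v) = Gamma((v + 1)/2) / (sqrt(pi v) Gamma(v/2))."""
    return math.gamma((v + 1) / 2) / (math.sqrt(math.pi * v) * math.gamma(v / 2))


def ast_constants(alpha, v1, v2):
    """Return (alpha*, B) with B = alpha K(v1) + (1 - alpha) K(v2)."""
    b = alpha * k_t(v1) + (1 - alpha) * k_t(v2)
    return alpha * k_t(v1) / b, b


def make_ast_pdf(alpha, v1, v2):
    """Standard AST density (location 0, scale 1) as a closure."""
    a_star, b = ast_constants(alpha, v1, v2)

    def pdf(x):
        if x <= 0:
            return b * (1 + (x / (2 * a_star)) ** 2 / v1) ** (-(v1 + 1) / 2)
        return b * (1 + (x / (2 * (1 - a_star))) ** 2 / v2) ** (-(v2 + 1) / 2)

    return pdf


def ast_left_tail_mean(q, p, alpha, v1, v2):
    """E[X | X < q] for q <= 0 and v1 > 1, where p = P(X < q)."""
    a_star, b = ast_constants(alpha, v1, v2)
    u = 1 + (q / (2 * a_star)) ** 2 / v1
    return -(4 * b / p) * a_star ** 2 * v1 / (v1 - 1) * u ** ((1 - v1) / 2)


def trapezoid(f, lo, hi, n=80000):
    """Trapezoid-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


alpha, v1, v2, q = 0.45, 4.0, 6.0, -2.0   # hypothetical values
pdf = make_ast_pdf(alpha, v1, v2)
left_mass = trapezoid(pdf, -400.0, 0.0)              # should be close to alpha
p = trapezoid(pdf, -400.0, q)                        # P(X < q)
es_numeric = trapezoid(lambda x: x * pdf(x), -400.0, q) / p
es_closed = ast_left_tail_mean(q, p, alpha, v1, v2)
print(left_mass, p, es_numeric, es_closed)
```

Only the lower-tail case is covered here, since risk levels such as p = 0.01 lie well below typical values of \(\alpha\).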

2.1.7 Skewed student t-distribution (SST)

By letting \(v_{2} = v_{1} = v\), so that \(\alpha^{*} = \alpha\), in the AST of Zhu and Galbraith (2010), we obtain a new parameterization of the skewed Student t-distribution (SST):

$$f_{SST} \left( {x\left| \beta \right.} \right) = \left\{ {\begin{array}{*{20}l} {\frac{1}{\sigma }K\left( v \right)\left[ {1 + \frac{1}{v}\left( {\frac{x - \mu }{2\alpha \sigma }} \right)^{2} } \right]^{{ - \frac{v + 1}{2}}} , } \hfill & {x \le \mu } \hfill \\ {\frac{1}{\sigma }K\left( v \right)\left[ {1 + \frac{1}{v}\left( {\frac{x - \mu }{{2\left( {1 - \alpha } \right)\sigma }}} \right)^{2} } \right]^{{ - \frac{v + 1}{2}}} ,} \hfill & {x > \mu } \hfill \\ \end{array} } \right.$$
(22)

Conditional \(VaR_{t + 1}^{p}\) and \(ES_{t + 1}^{p}\) of SST are:

$$\begin{aligned} VaR_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} VaR_{SST} \left( {p\left| {\alpha ,v} \right.} \right) \\ ES_{t + 1}^{p} & = - \mu_{t + 1} - \sigma_{t + 1} ES_{SST} \left( {p\left| {\alpha ,v} \right.} \right) \\ \end{aligned}$$
(23)

where

$$\begin{aligned} & VaR_{SST} \left( {p\left| {\alpha ,v} \right.} \right) \\ & \quad = 2\alpha^{*} S_{v}^{ - 1} \left( {\frac{{min\left( {p,\alpha } \right)}}{2\alpha }} \right) + 2\left( {1 - \alpha^{*} } \right)S_{v}^{ - 1} \left( {\frac{{max\left( {p,\alpha } \right) + 1 - 2\alpha }}{{2\left( {1 - \alpha } \right)}}} \right) \\ \end{aligned}$$

where \(S_{v} \left( . \right)\) is the cumulative distribution function of the standard student t-distribution with ν degrees of freedom and \(S_{v}^{ - 1}\) is its inverse.

$$ES_{SST} \left( {p\left| {\alpha ,v} \right.} \right) = - \frac{4B}{p}\left\{ {\begin{array}{*{20}c} {\frac{{\left( {\alpha^{*} } \right)^{2 } v}}{v - 1}\left( {1 + \frac{1}{v}\left[ {\frac{{min\left( {q - \mu ,0} \right)}}{{2\alpha^{*} }}} \right]^{2} } \right)^{{\frac{1 - v}{2}}} } \\ { - \frac{{\left( {1 - \alpha^{*} } \right)^{2 } v}}{v - 1}\left[ {1 - \left( {1 + \frac{1}{v}\left[ {\frac{{max\left( {q - \mu ,0} \right)}}{{2\left( {1 - \alpha^{*} } \right)}}} \right]^{2} } \right)^{{\frac{1 - v}{2}}} } \right]} \\ \end{array} } \right\}$$

where \(q = VaR_{SST} \equiv F_{SST}^{ - 1}\).

2.2 Term structure of risk: VaR and ES calculation over multi-period

To date, the majority of studies have focused on single-day market risk estimation. For longer horizons, the most popular method is the square-root rule, which is applied over short time horizons. If we consider the simple case of a normal distribution with a constant variance \(\sigma_{PF}^{2}\), then under the square-root rule the VaR and ES for returns over the next \(h\) days, calculated on day \(t\), are:

$$\begin{aligned} VaR_{t + 1,h}^{p} & = - \sqrt h \sigma_{PF} \varPhi_{p}^{ - 1} = \sqrt h VaR_{t + 1}^{p} \\ ES_{t + 1,h}^{p} & = \sqrt h \sigma_{PF} \frac{{\phi \left( {\varPhi_{p}^{ - 1} } \right)}}{p} = \sqrt h ES_{t + 1}^{p} \\ \end{aligned}$$
(24)
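As a quick numerical illustration of Eq. (24), the sketch below evaluates the square-root rule with the standard normal quantile and density from Python's `statistics.NormalDist`. The daily volatility of 1 (in percent), the 10-day horizon and p = 0.01 are hypothetical inputs.

```python
import math
from statistics import NormalDist


def sqrt_rule_var_es(sigma, h, p):
    """h-day VaR and ES under normality with constant daily volatility sigma."""
    std_normal = NormalDist()
    z_p = std_normal.inv_cdf(p)                     # Phi^{-1}_p, negative for small p
    var_h = -math.sqrt(h) * sigma * z_p
    es_h = math.sqrt(h) * sigma * std_normal.pdf(z_p) / p
    return var_h, es_h


var_10, es_10 = sqrt_rule_var_es(sigma=1.0, h=10, p=0.01)
print(round(var_10, 4), round(es_10, 4))
```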

However, given the dynamic nature of variance, moving from one period ahead to h days ahead is not straightforward: the variance modelled by GARCH processes is mean reverting, so simple scaling does not apply, and the returns over the next h days are not normally distributed. This drawback means that Monte Carlo simulation needs to be used to calculate VaR and ES over a multi-period horizon. We follow Christoffersen (2012) in simulating the index returns, having first estimated the underlying GARCH model parameters. Further details on the simulation methodology can be found in Christoffersen (2012). Based on the simulated returns over h days \(\left\{ \check{R}_{i,t + 1:t + h} \right\}_{i = 1}^{MC}\), the VaR and ES over period h are:

$$\begin{aligned} VaR_{t + 1:t + h}^{p} & = - Percentile\left\{{\left\{ {\check{R}}_{i,t + 1:t + h} \right\}_{i = 1}^{MC} ,100p}\right\} \\ ES_{t + 1:t + h}^{p} & = - \frac{1}{p \cdot MC}\mathop \sum \limits_{i = 1}^{MC} {\check{R}}_{i,t + 1:t + h} \cdot 1\left({\check{R}}_{i,t + 1:t + h} < - VaR_{t + 1:t + h}^{p} \right) \\\end{aligned}$$
(25)

where \(1\left( \cdot \right)\) takes the value 1 if the argument is true and zero otherwise and MC denotes the number of draws.
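A minimal simulation sketch of Eq. (25) is given below. It is not the model estimated in this paper: a GARCH(1,1) recursion with normal shocks and hypothetical parameters replaces the EGARCH model with fat-tailed innovations, purely to show how the simulated h-day returns are turned into VaR and ES.

```python
import math
import random


def simulate_h_day_returns(h, n_paths, omega, a, b, sigma2_next, seed=0):
    """Simulate cumulative h-day returns from a GARCH(1,1) with normal shocks."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_paths):
        var_t, total = sigma2_next, 0.0
        for _ in range(h):
            r = math.sqrt(var_t) * rng.gauss(0.0, 1.0)
            total += r
            var_t = omega + a * r * r + b * var_t   # update the variance along the path
        totals.append(total)
    return totals


def mc_var_es(sim_returns, p):
    """VaR as minus the 100p-th percentile; ES as in Eq. (25)."""
    ordered = sorted(sim_returns)
    mc = len(ordered)
    k = max(1, round(p * mc))
    var = -ordered[k]
    tail_sum = sum(r for r in ordered if r < -var)
    es = -tail_sum / (p * mc)
    return var, es


# Hypothetical parameters; returns are in percent, daily unconditional variance 1
sims = simulate_h_day_returns(h=10, n_paths=10000, omega=0.02, a=0.08, b=0.90,
                              sigma2_next=1.0, seed=42)
var_h, es_h = mc_var_es(sims, p=0.01)
print(var_h, es_h)
```

Replacing `rng.gauss` with draws from a fitted standardized distribution would bring the sketch closer to the procedure described above.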

2.3 Back-testing risk models

2.3.1 Bootstrap test for the expected shortfall

To evaluate ES we first use the McNeil and Frey (2000) test for zero unconditional mean. The test focuses on the discrepancy between the observed return and the ES forecast for the periods in which the return exceeds the VaR forecast; the assessment of ES forecasts is therefore not independent of the VaR forecasts. McNeil and Frey (2000) define the residuals as:

$$\begin{aligned} R_{t + 1} & = \frac{{X_{t + 1} - ES_{t}^{q} \left( {X_{t + 1} } \right)}}{{\sigma_{t + 1} }} \\ R_{t + 1} & = \frac{{\mu_{t + 1} + \sigma_{t + 1} Z_{t + 1} - \left( {\mu_{t + 1} + \sigma_{t + 1} ES_{t}^{q} \left( Z \right)} \right)}}{{\sigma_{t + 1} }} \\ \end{aligned}$$
(26)
$$R_{t + 1} = Z_{t + 1} - E_{t} \left[ {Z\left| {Z > z_{q} } \right.} \right]$$
(27)

According to McNeil and Frey (2000), these residuals are iid and, conditional on \(X_{t + 1} > x_{q}\) or equivalently \(Z_{t + 1} > z_{q}\), with \(z_{q}\) being the q-quantile of Z, have expected value zero. Based on our stock price data and our estimates of expected shortfall, we can construct the corresponding residuals on days when a violation occurs. McNeil and Frey (2000) call these residuals exceedance residuals and denote them by:

$$r = \left\{ {r_{t + 1} ;for\,t\,such\,that x_{t + 1} > \hat{x}_{q} } \right\}$$
(28)

where

$$r_{t + 1} = \frac{{x_{t + 1} - ES_{t}^{q} \left( {X_{t + 1} } \right)}}{{\hat{\sigma }_{t + 1} }}$$

Under the null hypothesis that we estimate \(\mu_{t + 1}\), \(\sigma_{t + 1}\) and the expected shortfall correctly, these residuals should behave like an iid sample from a random variable with mean zero; the alternative hypothesis is that the residuals have a mean greater than zero (McNeil and Frey 2000).
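One way to carry out such a test is a one-sided bootstrap of the t-statistic of the exceedance residuals. The sketch below is an illustration under that assumption; the number of bootstrap replications, the seed and the two residual samples are hypothetical.

```python
import math
import random


def t_stat(xs):
    """t-statistic of the sample mean against zero."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean / (sd / math.sqrt(n))


def bootstrap_pvalue(residuals, n_boot=2000, seed=0):
    """One-sided bootstrap p-value for H0: mean = 0 against H1: mean > 0."""
    rng = random.Random(seed)
    t_obs = t_stat(residuals)
    mean = sum(residuals) / len(residuals)
    centred = [r - mean for r in residuals]   # impose the null on the resampling pool
    count = valid = 0
    for _ in range(n_boot):
        sample = [rng.choice(centred) for _ in centred]
        if max(sample) == min(sample):
            continue                          # degenerate resample, t-statistic undefined
        valid += 1
        if t_stat(sample) >= t_obs:
            count += 1
    return count / valid


# Hypothetical exceedance residuals: one sample with a clearly positive mean
# (ES systematically underestimated) and one centred on zero
resid_biased = [0.5 + 0.1 * ((i % 5) - 2) for i in range(50)]
resid_centred = [0.1 * ((i % 5) - 2) for i in range(50)]
p_biased = bootstrap_pvalue(resid_biased)
p_centred = bootstrap_pvalue(resid_centred)
print(p_biased, p_centred)
```

A small p-value leads to rejection of the null in favour of a positive mean, that is, of systematically underestimated ES.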

2.3.2 MAE for back-testing ES

We also evaluate the expected shortfall as a measure of downside risk based on the mean absolute error, defined for model j as

$$MAE_{j} \left( q \right) = \frac{1}{N - 1}\mathop \sum \limits_{i = 1}^{N - 1} \left| {R_{t + i} - ES_{t + i} \left( q \right)} \right|$$
(29)

where \(ES_{t + i} \left( q \right)\) is the expected shortfall forecast, \(R_{t + i}\) are the observed returns and N is the number of observations. The model with the minimum MAE value is preferred to the other models.
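The MAE comparison of Eq. (29) amounts to a few lines of code. In the sketch below the return series and the two competing ES forecast series are hypothetical and are assumed to be expressed on the same scale and with the same sign convention.

```python
def mean_absolute_error(returns, es_forecasts):
    """Average absolute gap between observed returns and ES forecasts."""
    if len(returns) != len(es_forecasts):
        raise ValueError("series must have the same length")
    gaps = [abs(r - es) for r, es in zip(returns, es_forecasts)]
    return sum(gaps) / len(gaps)


# Hypothetical observed returns and ES forecasts from two competing models
returns = [-1.0, 0.5, -2.0]
es_model_a = [-2.0, -2.0, -2.0]
es_model_b = [-3.0, -3.0, -3.0]
mae_a = mean_absolute_error(returns, es_model_a)
mae_b = mean_absolute_error(returns, es_model_b)
preferred = "A" if mae_a < mae_b else "B"
print(mae_a, mae_b, preferred)
```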

3 Empirical results

3.1 Data and preliminary analysis

The data for this study comprise five global stock indices: S&P500, FTSE100, NASDAQ100 (the 100 largest non-financial companies listed on the NASDAQ exchange), NIKKEI225 and DAX30. All data are obtained from Datastream. For all the indices, the sample comprises 18 years of daily observations from 1995 to 2013, with a total of 4698 daily observations. The continuously compounded returns are calculated as the logarithmic difference of daily closing prices multiplied by 100.

The summary statistics are presented in Table 1. The skewness is negative for all return series, indicating asymmetry in the distribution of returns. A negatively skewed distribution, or one skewed to the left, has a long left tail: all our data series are characterized by many small gains and a few extreme losses. The kurtosis of each series is greater than 3, which reflects fat tails. We reject the null hypothesis of a normal distribution, as the p value of the Jarque–Bera (1980) test is less than 0.05 for all return series. The Ljung–Box (1978) Q-statistics reported in Table 1 for both returns and squared returns also reject the null hypothesis of no autocorrelation through 20 lags at the 5% significance level for all data series.
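The return definition and the moment-based statistics discussed above can be sketched as follows. The prices and the small sample are hypothetical; the Jarque–Bera statistic is computed in its usual form \(JB = \frac{n}{6}\left( {S^{2} + \left( {K - 3} \right)^{2} /4} \right)\) from the sample skewness S and kurtosis K.

```python
import math


def log_returns(prices):
    """Continuously compounded returns in percent."""
    return [100.0 * (math.log(b) - math.log(a)) for a, b in zip(prices, prices[1:])]


def jarque_bera(xs):
    """Sample skewness, kurtosis and the Jarque-Bera statistic."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb


prices = [100.0, 101.0, 99.0, 102.0]      # hypothetical closing prices
rets = log_returns(prices)
skew, kurt, jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(rets, skew, kurt, jb)
```

Under normality the statistic is asymptotically chi-squared with two degrees of freedom, so large values lead to rejection.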

Table 1 Summary descriptive statistics

3.2 Parameter estimation of distributions of return

Specifically, we estimate the parameters of the following seven models: the Standardized t-distribution (ST), the Twin t-distribution (TT) of Baker and Jackson (2014), the Generalized asymmetric t-distribution (GAT) of Baker (2016), the Asymmetric exponential power distribution (AEP) of Zhu and Zinde-Walsh (2009), the Skewed exponential power distribution (SEP), the Asymmetric Student t-distribution (AST) of Zhu and Galbraith (2010) and its special case, the Skewed Student t-distribution (SST). The estimation procedure is as follows:

  • Given the ith candidate t-distribution with parameters \({{\Theta }}_{\left( i \right)}\), \(1 \le i \le 7\), we identify the underlying GARCH process using the likelihood method. We then use the estimated GARCH process to generate the conditional volatility (\(\hat{\sigma }_{t}\)) and the conditional mean (\(\hat{\mu }_{t}\)).

  • We obtain the standardized residuals \(\hat{z}_{t}\) as a function of \(\hat{\mu }_{t}\) and \(\hat{\sigma }_{t}\).

  • We define \(F_{\left( i \right)} \left( \cdot \right)\) as the CDF of the ith candidate t-distribution and \(Q\left( \cdot \right)\) as the quantile function of \(\hat{z}_{t}\). The desired \({{\Theta }}_{\left( i \right)}^{*}\) should satisfy

    $${{\Theta }}_{\left( i \right)}^{*} = \mathop {\arg \inf }\limits_{{{\Theta }}_{\left( i \right)} } \left| Q\left( \hat{z}_{t} \right) - F_{\left( i \right)}^{ - 1} \left( \hat{z}_{t} ;{{\Theta }}_{\left( i \right)} \right) \right|,\quad 1 \le i \le 7$$
    (30)
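The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the estimation code behind the tables: the filter assumes a plain GARCH(1,1) with a constant mean, the distance in Eq. (30) is read as the mean absolute gap between empirical and model quantiles at plotting positions (k + 0.5)/n, and the search runs over a finite list of candidate parameters. The logistic quantile function is only a stand-in with a closed form; the t-type candidates would supply their own inverse CDFs.

```python
import math


def garch11_filter(returns, mu, omega, alpha, beta):
    """Conditional volatilities and standardized residuals (assumed GARCH(1,1))."""
    eps = [r - mu for r in returns]
    var = sum(e * e for e in eps) / len(eps)  # start at the sample variance
    sigmas, z = [], []
    for e in eps:
        sigma = math.sqrt(var)
        sigmas.append(sigma)
        z.append(e / sigma)
        var = omega + alpha * e * e + beta * var  # next-period variance
    return sigmas, z


def quantile_distance(z, inv_cdf, theta):
    """Mean absolute gap between empirical and model quantiles."""
    zs = sorted(z)
    n = len(zs)
    return sum(abs(zs[k] - inv_cdf((k + 0.5) / n, theta)) for k in range(n)) / n


def fit_by_quantile_matching(z, inv_cdf, candidates):
    """Pick the candidate parameter with the smallest quantile distance."""
    return min(candidates, key=lambda theta: quantile_distance(z, inv_cdf, theta))


def logistic_inv_cdf(p, scale):
    """Stand-in quantile function (logistic), used only for demonstration."""
    return scale * math.log(p / (1.0 - p))
```

In practice the grid search would be replaced by a numerical optimizer over the full parameter vector of each candidate distribution.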

The parameter estimates are reported in Tables 2, 3, 4, 5 and 6, together with the log-likelihood, AIC and BIC values. The bold AIC and BIC values in each table mark the three best models for the specific data set. Per the AIC and BIC values in Table 2, the best-fitting models for the S&P500 data are the Generalized asymmetric t-distribution (GAT), the Student t-distribution (ST) and the Twin t-distribution (TT). These models have the lowest AIC and BIC, while the asymmetric t-distribution (AST) and the skewed exponential power distribution (SEP) have the highest AIC and BIC values respectively.
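For reference, the two criteria follow directly from the maximized log-likelihood. The sketch below uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the sample size; the function names and the input layout are assumptions.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def rank_by_aic(fits):
    """Sort model names from best (lowest AIC) to worst.

    `fits` maps a model name to a (log-likelihood, number of parameters) pair.
    """
    return sorted(fits, key=lambda name: aic(*fits[name]))
```

Lower values indicate a better trade-off between fit and complexity, which is why the three lowest AIC and BIC values are highlighted in each table.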

Table 2 Estimated parameters and goodness of fit tests for S&P500 for the period 1995–2013
Table 3 Estimated parameters and goodness of fit tests for FTSE100 for the period 1995–2013
Table 4 Estimated parameters and goodness of fit tests for NASDAQ100 for the period 1995–2013
Table 5 Estimated parameters and goodness of fit tests for NIKKEI225 for the period 1995–2013
Table 6 Estimated parameters and goodness of fit tests for DAX30 for the period 1995–2013
Table 7 Back-testing results for 1-day ahead ES for international indices
Table 8 Back-testing results for 5-days and 10-days ahead ES using MAE for international indices

Examining the AIC and BIC values in Tables 3, 4, 5 and 6 for the FTSE100, NASDAQ100, NIKKEI225 and DAX30 indices, we observe that the Generalized asymmetric t-distribution (GAT; see Footnotes 1 and 2), the Student t-distribution (ST) and the Twin t-distribution (TT) are the top three models respectively, except for FTSE100, where AEP is the second best model. When we compare the GAT and AST models, we find that GAT significantly outperforms AST. The AEP model, as an alternative to AST and GAT, performs better than AST but underperforms GAT. The best-fitting models for all our index data sets are therefore our two new distributions, GAT and TT, and the Standardized Student t-distribution. Overall, the GAT distribution is the best model; unlike the Standardized t-distribution, it supports asymmetry. Neither the Asymmetric exponential power distribution of Zhu and Zinde-Walsh (2009) nor the asymmetric t-distribution of Zhu and Galbraith (2010) provides the best fit to the data. Per AIC and BIC, the new Twin t-distribution also performs better than the Asymmetric t-distribution and the Exponential power distribution for all data sets. To summarize, based on the AIC and BIC criteria, GAT, TT and ST provide a better fit than AST and AEP and their skewed versions.

3.3 One-day ahead expected shortfall back-testing

To test the validity of the different t-distributional assumptions on the five indices, VaR and ES are calculated at levels ranging from 5% to 0.5%. Because the existing literature has demonstrated that VaR has serious drawbacks, we evaluate only the ES models to assess market risk. As stated earlier, the competing risk models are ST, TT, GAT, AEP, SEP, AST and SST, with TT and GAT used for the first time in the literature to calculate market risk as measured by VaR or ES.
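To make the quantities concrete, the sketch below computes a left-tail VaR and ES from a sample of standardized residuals and rescales them with a conditional mean and volatility. It uses the empirical tail as a stand-in for the parametric tails of the seven candidate distributions, and the location-scale form of the forecast is an assumption for illustration; all names are hypothetical.

```python
def empirical_var_es(z, q):
    """Left-tail VaR and ES of a sample at tail probability q (e.g. 0.05)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    zs = sorted(z)
    k = max(1, int(q * len(zs)))  # number of observations in the tail
    tail = zs[:k]
    var_q = tail[-1]              # q-quantile of the sample
    es_q = sum(tail) / k          # mean of the observations at or below it
    return var_q, es_q


def es_forecast(mu, sigma, es_z):
    """Rescale a standardized ES with the conditional mean and volatility."""
    return mu + sigma * es_z
```

For a given level, ES is always at least as extreme as VaR because it averages over the whole tail beyond the VaR quantile.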

To evaluate the ES forecasts, we first use the bootstrap test of McNeil and Frey (2000). High p values from this test speak in favour of a model, while low p values speak against it. The results indicate that AST, AEP and GAT have the highest p values, which are well above 0.01. McNeil and Frey (2000) report that an assumption of normality always fails the test, with p values much less than 0.01 in all cases. In our case, the p values for all distributions are much higher than 0.01 in almost all cases. The results, shown in Table 7, support the use of asymmetric distributions for risk analysis. However, this test provides little information on the relative performance of the methods, which motivates the use of an additional approach to evaluating ES forecast accuracy.

We compare the ES forecasts using the MAE, which measures the difference between the actual and the expected losses when a violation occurs. The calculated mean absolute errors and mean squared errors are small enough to suggest that the best-fitting models are reasonable.

Table 7 contains the performance results for all the models and indices, with each panel reporting the results for one index across the seven models. From Panel A, we see that for the S&P500 the GAT, TT and SST models provide the best fit at the 5% level. At the 2.5% level, the results again indicate that GAT, TT and SST provide the best fit. However, at the 1% and 0.5% levels the SST model provides the best fit, outperforming the other models, and SEP is the second best model.

Panel B contains the results for the FTSE100 index. Up to the 2.5% level, the GAT, ST and TT models outperform all other models, whereas at the 1% and 0.5% levels GAT and TT are the best performers. Panel C contains the NASDAQ back-testing results: at 5% GAT provides the best fit, while SEP and AEP are the second and third best models. At 2.5%, 1% and 0.5%, GAT, SEP and AEP are again the best performers, although SEP marginally outperforms GAT and AEP. Moreover, AST performs better than TT, SST and ST. Panel D presents the results for NIKKEI225, where our proposed GAT model outperforms all other models at 5%. At 2.5% only SEP outperforms GAT, but at 1% and 0.5% both SEP and AEP perform better than GAT. Panel E reports the results for DAX30: GAT, TT and SEP are the best performing models at 5% and 2.5%; at 1%, TT, SEP and SST are the best performers; and at 0.5%, SEP, SST and AEP perform better than GAT and TT.

To summarize our key results:

  (i) The GAT and TT models are among the top three models at the 5% and 2.5% confidence levels in almost all cases.

  (ii) The AST model has the highest MAE values for almost all data sets and significance levels, except for NASDAQ in Panel C.

  (iii) The skewed version of the AST model (SST) is among the models with the highest MAE values, except for S&P500 in Panel A, where it is the third best model after GAT and TT.

  (iv) The AEP model, as an alternative to the asymmetric t-distributions, performs better than AST, but the GAT model clearly outperforms AEP in most cases.

  (v) The skewed version of the AEP model (SEP) performs better than the skewed version of the AST model (SST) in most cases. For NASDAQ, NIKKEI225 and DAX30 it is among the top three models.

  (vi) The MAE results yield different model rankings at the same confidence level. However, in most cases GAT remains among the top three models.

  (vii) These results give a strong indication that the new parameterization of the generalized asymmetric distribution provides a valuable improvement. When we compare the ES back-tests for the two asymmetric t-distributions, the MAE values of GAT are significantly lower than those of AST. These results strongly suggest further research into the use of asymmetric t-distributions for ES estimation.

Based on the ES back-tests conducted, we conclude that the GAT model of Baker (2016) outperforms the competing AST model of Zhu and Galbraith (2010) by a significant margin. The AEP model, as an alternative to the asymmetric t-distribution, also underperforms the GAT model (see Footnote 3).

3.4 Multi-period horizon ES back-testing

Table 8 contains the back-testing results at the 5-day and 10-day horizons for each of the indices across all of the models. At the 5-day horizon, we find that for the S&P500 the SST model outperforms all other models up to the 1% level. However, at the 0.5% level the GAT model outperforms all other models. SST and GAT are among the top three models at all significance levels, while TT is among the top two models at the 2.5% and 1% significance levels.

From Panel B, we see that for the FTSE100 index SST is the best performer up to the 2.5% level, followed by GAT at the 1% and 0.5% levels. Looking at Panels C and D, we see that for the NASDAQ100 and NIKKEI225 the GAT, SST and TT models are the best performers at all significance levels. At 5% and 2.5% SST outperforms GAT and TT, whereas at 1% and 0.5% GAT has a smaller MAE than SST and TT. In Panel E, the results for DAX30 indicate that GAT, TT and AEP are the top three models at all significance levels.

With regard to the 10-day horizon, the results in Table 8 are clear-cut: for the S&P500, FTSE100, NIKKEI225 and DAX30, GAT provides the best result across all confidence levels, while TT provides the best fit for the remaining index, NASDAQ100. Our results for the predicted ES at the 5-day and 10-day horizons can be summarized as follows:

  (i) At the 5-day horizon the results are mixed, with SST being the best performer up to the 1% level in the majority of cases. However, at the 0.5% confidence level GAT is the best performer. Overall, GAT remains among the top three models based on the lowest MAE value.

  (ii) When we extend the horizon to 10 days, the MAE values clearly suggest GAT as the best model for almost all data sets.

  (iii) Both AEP and SEP forecast ES very poorly at the 5-day and 10-day horizons across the various significance levels.

  (iv) The AST model has the highest MAE value in most cases at both the 5-day and 10-day horizons.

  (v) We can infer that the results of the ES models are not similar across different time horizons. However, the satisfactory predictions of GAT are in accordance with the findings of the 1-day ahead ES evaluation. As in the 1-day ahead case, the GAT model outperforms the AST model, which clearly supports the use of the GAT distribution for risk forecasting.
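Multi-day ES forecasts such as the 5-day and 10-day figures discussed above require a volatility forecast over the whole horizon. As a hedged illustration only, and not necessarily the approach used for Table 8, the sketch below aggregates the conditional variance over h days under an assumed GARCH(1,1) with serially uncorrelated innovations, using the recursion \(E_{t} [\sigma_{t + k + 1}^{2} ] = \omega + (\alpha + \beta )E_{t} [\sigma_{t + k}^{2} ]\). The function name and arguments are hypothetical.

```python
def garch11_horizon_variance(sigma2_next, omega, alpha, beta, horizon):
    """Variance of the cumulative h-day return under an assumed GARCH(1,1).

    `sigma2_next` is the one-step-ahead conditional variance.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least one day")
    persistence = alpha + beta
    total = 0.0
    var_k = sigma2_next
    for _ in range(horizon):
        total += var_k                         # add day-k expected variance
        var_k = omega + persistence * var_k    # roll the forecast forward
    return total
```

The square root of the returned value plays the role of the h-day volatility when rescaling a standardized ES to the longer horizon.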

4 Concluding remarks

The recent crisis has highlighted the weaknesses of VaR as a measure of market risk. This has resulted in the related, superior measure ES being given more prominence under Basel III (Basel Committee on Banking Supervision 2013, 2017). Previous studies have focused on VaR, and more specifically on single-day VaR. This study has sought to complement earlier studies by extending market risk measurement to ES over multi-day horizons, using seven different models that incorporate the observed empirical characteristics of equity returns, as noted by Kellner and Rosch (2016), who recommend that only models which allow for heavy tails and/or skewness can accurately estimate both VaR and ES.

In this study we make a number of contributions. First, when seven different models based on alternative t-distributions were fitted to the standardized residuals, we found that our two newly proposed models, the Generalized asymmetric t-distribution (GAT) of Baker (2016) and the Twin t-distribution (TT) of Baker and Jackson (2014), provided the best fit, with GAT being the best model overall. Moreover, surprisingly, the Standardized t-distribution outperformed many of the more complex t-distributions.

Second, the performance of the ES models depends on the market and the confidence level, particularly at the 1-day and 5-day horizons. This result indicates that for short horizons, risk managers and regulators should use a variety of models, check the accuracy of each model for each index and constantly re-assess the validity of each model. For longer horizons we find that our newly proposed GAT model outperformed all the other models considered in this study. This indicates that for longer horizons, risk managers should focus on a single model rather than a number of alternative models.

Third, complex models do not always lead to the best fits or back-testing results. For example, in many cases the Standardized t-distribution outperforms the more complex Asymmetric exponential power distribution (AEP) of Zhu and Zinde-Walsh (2009). These findings are further reinforced by the outperformance of our simpler GAT and TT distributions across different horizons, confidence levels and markets.

Finally, the back-testing results indicate a wide variation in ES values across different models and indices. Given that VaR and ES values form the basis of regulatory capital allocation, it is imperative that the most accurate model, with the lowest estimated VaR and ES, is used by both regulators and managers, as the wrong model may mean either that capital is not used efficiently or that insufficient capital is set aside. In this regard, our GAT model provides a reliable alternative to many of the existing models in that it is the best performing model overall across different confidence levels, horizons and indices.