1 Introduction

The volatility of financial returns is a pivotal topic in empirical finance and econometrics research, with crucial implications for portfolio optimization and risk management practices. Robust modeling and reliable forecasting of the volatility of financial instruments have been central objectives of financial economics applications in business operations, given that volatility is a fundamental input to the estimation and decision processes of any non-financial corporation on derivatives pricing, portfolio immunization, investment diversification, firm valuation and funding choices. Several studies introduce non-parametric estimators of realized volatility using high-frequency market data. Andersen and Bollerslev (1998), Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002) were the first to formalize econometrically the realized variance with quadratic variation-like measures, while Barndorff-Nielsen et al. (2008, 2009) focused on the realized kernel, a realized measure that is more robust to noise.

A large body of empirical research focuses on modeling and forecasting realized volatility, and various studies combine it with the conditional variance of returns. Engle (2002b) proposed the GARCH-X process, in which the realized measure enters the conditional variance equation as an exogenous variable. Corsi et al. (2008) suggested the HAR-GARCH formulation for modeling the volatility of realized volatility.

Hansen et al. (2012) introduced the Realized GARCH model, which corresponds most closely to the HEAVY framework of Shephard and Sheppard (2010). The latter jointly estimates conditional variances based on both daily (squared returns) and intra-daily (the realized kernel as a measure of ex-post volatility) data, so that the system of equations adapts to information arrival more rapidly than the classic daily GARCH process. One of its advantages is robustness to certain forms of structural breaks, especially during crisis periods, since the mean reversion and short-run momentum effects deliver better performance around volatility level shifts and more reliable forecasts. Borovkova and Mahakena (2015) employed a HEAVY specification with a skewed-t error distribution, while Huang et al. (2016) incorporated the HAR structure of the realized measure into the GARCH conditional variance specification in order to capture the long memory of the volatility dynamics.

This paper extends the HEAVY model by enriching the bivariate system with asymmetries and power transformations, through the structure of Ding et al. (1993). Among others, Pérez et al. (2009, see the references therein for more details) show that an asymmetric response of volatility to positive and negative returns shows up in non-zero cross-correlations between original returns and future powers of absolute returns. One of our main findings is that each of the two powered conditional variances is significantly affected by the first lags of both power transformed variables, that is, squared negative returns and the realized kernel (or, for the latter, its negative signed values).

We analyze the various specifications in depth and investigate their performance over six stock indices. We also take long memory into account (either fractionally integrated or hyperbolic), by employing the framework of Davidson (2004) (see also Schoffer 2003; Dark 2005, 2010). We find that a fractionally integrated formulation better fits the squared returns, whereas a hyperbolic type of memory is more suitable for the realized measure. The long memory feature reinforces our main argument that the lagged power transformations of both aforementioned variables move the dynamics of the two powered conditional variances. The fractionally integrated (asymmetric power) model for the returns equation pools information across both low-frequency and high-frequency volatility indicators. Similarly, the more richly parametrized hyperbolic process for the realized kernel equation is bolstered with low-frequency information as well, since the lagged powered squared negative returns improve the forecasting performance of the model. Finally, in the presence of structural breaks, which are apparent in the two power transformed volatility measures, we re-estimate the bivariate system including dummy variables and present the time-varying behavior of the parameters. Focusing on the recent global financial crisis, we observe that the parameter values increase with the crisis break.

With the advent of the crisis, when volatilities increased sharply and persistently with crucial systemic risk externalities, we witnessed a resurgence of regulators’ and academics’ interest in meaningful volatility estimates, while, at the same time, practitioners remained alert to improving the relevant volatility frameworks on a day-to-day basis. Financial economics scholars focused on volatility as a potent catalyst of the systemic risk build-up that policymakers tried to limit. We demarcate this study from the extant finance literature by extending the benchmark HEAVY model with asymmetries, power transformations and long memory, providing a well-defined framework that adequately fits the volatility process. The bivariate system of two volatility equations we establish is ready to use not only for stock market returns, but also in multiple financial economics applications of business operations research, such as bond investing, foreign exchange trading and commodities hedging, which are core daily functions in the treasuries of most non-financial corporations, besides the financial services sector.

Overall, our proposed volatility modeling framework improves the HEAVY model, with major implications for market practitioners and policymakers in forecasting the second moment of financial returns. Volatility modeling and forecasting are essential for asset allocation, pricing and risk management hedging strategies. A reliable volatility forecast, exploiting the high-frequency domain in full, is an input of paramount importance for derivatives pricing, effective cross-hedging, Value-at-Risk measurement, investment allocation and portfolio optimization across different asset classes and financial instruments. Moreover, the robust volatility modeling approach we introduce provides a tool of utmost significance not only for market players but also for policymakers, whose mandate includes continuous oversight duties and prudential regulation practices. In this vein, it is imperative for the authorities to account for the volatility of financial markets across every aspect of the financial system’s policy responses, both post-crisis through stabilization policy reactions and pre-crisis through proactive assessment of financial risks.

The remainder of the paper is structured as follows. In Sect. 2, we detail the HEAVY formulation and our first extension, which allows for asymmetries and power transformations. Section 3 describes the data and Sect. 4 presents the results for the asymmetric power specification. The next section studies the long memory process and discusses the relevant empirical findings. In Sect. 6, we estimate multiple-step-ahead forecasts to measure the out-of-sample performance of the various specifications. The following section takes into consideration the presence of structural breaks. In Sect. 8, we discuss the dynamic conditional correlations estimation procedure for the extended HEAVY models. Finally, Sect. 9 concludes the analysis.

2 The framework

The benchmark HEAVY model of Shephard and Sheppard (2010) can be extended in many directions. We allow for power transformations, leverage effects and long memory (see Sects. 4 and 5 below) in the conditional variance process; that is, we estimate the benchmark specification of Shephard and Sheppard (2010) enriched with these three key features, to further improve volatility modeling and forecasting.

2.1 The HEAVY model

The HEAVY model uses two variables: the close-to-close stock returns (\(r_{t}\)) and the realized measure of variation based on high frequency data, \( RM_{t}\). We first form the signed square rooted (SSR) realized measure as follows: \(\widetilde{RM_{t}}=\)sign\((r_{t})\sqrt{RM_{t}}\), where sign\((r_{t})=1\), if \(r_{t}\geqslant 0\) and sign\((r_{t})=-1\), if \( r_{t}<0\).

We assume that the returns and the SSR realized measure are characterized by the following relations:

$$\begin{aligned} r_{t}=e_{rt}\sigma _{rt},\quad \widetilde{RM_{t}}=e_{Rt}\sigma _{Rt}, \end{aligned}$$
(1)

where the stochastic term \(e_{it}\) is independent and identically distributed (i.i.d), \(i=r,R\); \(\sigma _{it}\) is positive with probability one for all t and it is a measurable function of \(\mathcal {F} _{t-1}^{(XF)}\), which is the filtration generated by all available information through time \(t-1\). We will use \(\mathcal {F}_{t-1}^{(HF)}\) (\(X=H\)) for the high frequency past data, i.e., for the case of the realized measure, or \(\mathcal {F}_{t-1}^{(LoF)}\) (\(X=Lo\)) for the low frequency past data, i.e., for the case of the close-to-close returns. Hereafter, for notational convenience we will drop the superscript XF.

In the HEAVY/GARCH model \(e_{it}\) has zero mean and unit variance. Therefore, the two series have zero conditional means, and their conditional variances are given by

$$\begin{aligned} \mathbb {E}\left( r_{t}^{2}\left| \mathcal {F}_{t-1}\right. \right) =\sigma _{rt}^{2},\;\; \text {and}\;\;\mathbb {E}\left( \widetilde{RM_{t}}^{2}\left| \mathcal {F} _{t-1}\right. \right) =\mathbb {E}(RM_{t}\left| \mathcal {F}_{t-1}\right. )=\sigma _{Rt}^{2}\text {,} \end{aligned}$$
(2)

where \(\mathbb {E}(\cdot )\) denotes the expectation operator.

2.2 Asymmetric power specification

The asymmetric power (AP) specification for the HEAVY(1, 1) model consists of the following equations (in what follows for notational simplicity we will drop the order of the model if it is (1, 1)):

$$\begin{aligned} (1-\beta _{i}L)\left( \sigma _{it}^{2}\right) ^{\frac{\delta _{ii}}{2}}&=\omega _{i}+(\alpha _{ir}+\gamma _{ir}s_{t-1})L\left( r_{t}^{2}\right) ^{\frac{\delta _{ir}}{2}}\\&\quad +(\alpha _{iR}+\gamma _{iR}s_{t-1})L(RM_{t})^{\frac{\delta _{iR}}{2}},\quad \text {for}\;i=r,R, \end{aligned}$$
(3)

where L is the lag operator, \(\delta _{ij} \in \mathbb {R}_{>0}\) (the set of the positive real numbers) are the power parameters, and \(s_{t}=0.5[1-\) sign\((r_{t})]\), that is, \(s_{t}=1\) if \(r_{t}<0\) and 0 otherwise; \(\gamma _{ii}\) and \(\gamma _{ij}\) (\(i\ne j\)) are the own and cross leverage parameters, respectively; positive \(\gamma _{ii}, \gamma _{ij}\) mean a larger contribution of negative ‘shocks’ to the volatility process (in our long memory AP specification we will replace \(\alpha _{ii}+\gamma _{ii}s_{t-1}\) by \(\alpha _{ii}(1+ \gamma _{ii}s_{t-1})\); see Sect. 5 below, and, in particular, Eq. (4)). In this specification the powered conditional variance, \( (\sigma _{it}^{2})^{\delta _{ii}/2}\), is a linear function of the lagged values of the power transformed squared returns and realized measure.

We distinguish between three asymmetric cases: double asymmetry (DA: \(\gamma _{ij}\ne 0\) for all i, j), own asymmetry (OA: \(\gamma _{ii}\ne 0\), \(\gamma _{ij}=0\) for \(i\ne j\)) and cross asymmetry (CA: \(\gamma _{ii}=0\), \(\gamma _{ij}\ne 0\) for \(i\ne j\)).

The \(\alpha _{iR}\) and \(\gamma _{iR}\) are called the (four) Heavy parameters (own when \(i=R\) and cross when \(i\ne R\)); they capture the impact of the realized measure on the two conditional variances. Similarly, the \(\alpha _{ir}\) and \(\gamma _{ir}\) (four in total) are called the Garch parameters (own when \(i=r\) and cross when \(i\ne r\)); they capture the influence of the squared returns on the two conditional variances. When all four Heavy parameters are zero, the AP-HEAVY model reduces to a bivariate AP-GARCH process (see, for example, Conrad and Karanasos 2010). If, on the other hand, all four Garch parameters are zero, we obtain the AP-HEAVY specification in which the only regressor is the first lag of the powered \(RM_{t}\). Finally, we should mention that all the parameters in this bivariate system should take non-negative values (see, for example, Conrad and Karanasos 2010).
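To fix ideas, the following minimal sketch (in Python, with illustrative parameter and function names that are not part of the model's notation) filters one equation of the bivariate system in Eq. (3), showing how the powered conditional variance responds to its own lag, to the lagged powered squared returns and realized measure, and to the asymmetry indicator \(s_{t-1}\):

```python
import numpy as np

def ap_heavy_filter(r, rm, omega, beta, alpha_r, gamma_r, alpha_R, gamma_R,
                    delta_own, delta_r, delta_R, sigma0):
    """One equation of the AP-HEAVY(1, 1) system in Eq. (3): the powered
    conditional variance h_t = (sigma_t^2)^(delta_own/2) is a linear function
    of its own lag and of the lagged powered squared returns and realized measure."""
    T = len(r)
    h = np.empty(T)
    h[0] = sigma0 ** delta_own            # initial value from a guess of sigma_0
    for t in range(1, T):
        s = 1.0 if r[t - 1] < 0 else 0.0  # asymmetry indicator s_{t-1}
        h[t] = (omega + beta * h[t - 1]
                + (alpha_r + gamma_r * s) * (r[t - 1] ** 2) ** (delta_r / 2)
                + (alpha_R + gamma_R * s) * rm[t - 1] ** (delta_R / 2))
    return h ** (2.0 / delta_own)         # recover sigma_t^2 from its power transform
```

Setting the power terms to 2 and the asymmetry and Garch terms to zero recovers the benchmark recursions shown below.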

To sum up, the benchmark HEAVY(1, 1) model, to be extended in this study, is characterized by two conditional variance equations:

$$\begin{aligned} (1-\beta _{r}L)\sigma _{rt}^{2}&=\omega _{r}+\alpha _{rR}L(RM_{t}), \\ (1-\beta _{R}L)\sigma _{Rt}^{2}&=\omega _{R}+\alpha _{RR}L(RM_{t}). \end{aligned}$$

Equation (3) gives the general formulation of our first HEAVY extension for both variables, adding asymmetries and power transformations. We first estimate both conditional variance equations in the general form, with all Heavy, Garch and asymmetry parameters as in Eq. (3); whenever a parameter is insignificant, we exclude it, which yields a statistically preferred reduced form for each volatility process. For example, in the estimation of the returns and realized measure conditional variances, the own and cross Garch parameters (\(\alpha _{rr}\) and \( \alpha _{Rr}\), respectively) prove insignificant and are therefore excluded (see Sect. 4, Table 2, Panels A and B), leading to the statistically preferred formulations for the returns and the realized measure.

We conclude this section by highlighting that the estimation of our extended asymmetric power specification (see Sect. 4) shows that both powered conditional variances are significantly affected by the first lag of the power transformed negative returns. Therefore, the conclusion that the realized measure of variation does all the work in moving around the conditional variance of stock returns does not hold for the more richly parametrized model.

3 Data description

The HEAVY framework is estimated for the returns and realized volatilities of six stock indices. According to the analysis in Shephard and Sheppard (2010), the HEAVY formulation considerably improves volatility modeling by allowing for momentum and mean reversion effects and by adjusting quickly to structural breaks in volatility. We extend the benchmark specification of Shephard and Sheppard (2010) by adding power transformed conditional variances, leverage effects and long memory (see Sect. 5 below) to the volatility process. Moreover, in order to identify the possible effects of the recent global financial crisis on the volatility process and to take into account the structural breaks in the two powered series (squared returns and realized measure), in Sect. 7 we incorporate dummies in our empirical investigation. The analysis with the structural breaks can be considered as an alternative to the long memory investigation.

3.1 Oxford-Man Institute’s library

We use daily data for six market indices extracted from the Oxford-Man Institute’s (OMI) realized library version 0.2 of Heber et al. (2009): S&P 500 from the US (SP), Nikkei 225 from Japan (NIKKEI), TSE from Canada, FTSE 100 from the UK (FTSE), DAX from Germany and Eustoxx 50 from the Eurozone (EUSTOXX). Our sample covers the period from 03/01/2000 to 30/11/2017 for most indices; for the Canadian stock market index TSE the data begin in 2002. The OMI’s realized library includes daily stock market returns and several realized volatility measures calculated on high-frequency data from the Reuters DataScope Tick History database. The data are first cleaned and then used in the realized measures calculations. According to the library’s documentation, the data cleaning consists of deleting records outside the time interval when the stock exchange is open. Some minor manual changes are also needed when results are unreliable due to the rebasing of indices. We use the daily closing prices, \(P_{t}^{C}\), to form the daily returns as follows: \(r_{t}=\ln (P_{t}^{C})-\ln (P_{t-1}^{C})\), and we draw two realized measures from the library: the realized kernel and the 5-min realized variance. The estimation results using the two alternative measures are very similar, so we present only those based on the realized kernels (the results for the realized variances are available upon request).

3.2 Realized measures

The library’s realized measures are calculated as described in Shephard and Sheppard (2010). The realized kernel, which we present in our analysis here, is chosen as a measure more robust to noise; its exact calculation with a Parzen weight function is as follows: \( RK_{t}=\sum \nolimits _{h=-H}^{H}k(h/(H+1))\gamma _{h}\), where k(x) is the Parzen kernel function and \(\gamma _{h}=\sum \nolimits _{j=\mid h\mid +1}^{n}x_{j,t}x_{j-\mid h\mid ,t}\); \(x_{j,t}=X_{t_{j,t}}-X_{t_{j-1,t}}\) are the 5-min intra-daily returns, where \(X_{t_{j,t}}\) are the intra-daily prices and \(t_{j,t}\) are the times of trades on the t-th day. Shephard and Sheppard (2010) state that they select the bandwidth H as in Barndorff-Nielsen et al. (2009).

The 5-min realized variance, \(RV_{t}\), which we also employ as an alternative realized measure, is calculated as \(RV_{t}=\sum x_{j,t}^{2}\) (results are not reported but are available upon request). Heber et al. (2009) additionally implement a subsampling procedure on the data, to the most feasible level, in order to mitigate the stock market noise effects. The subsampling involves averaging across many realized variance estimates from different data subsets (see also the references in Shephard and Sheppard 2010 for surveys of realized measures, noise effects and subsampling procedures).
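As a concrete illustration of these two measures, the sketch below (illustrative function names; the library's own implementation additionally applies the bandwidth selection of Barndorff-Nielsen et al. 2009 and the subsampling step just described) computes the 5-min realized variance and the Parzen realized kernel for a single trading day:

```python
import numpy as np

def parzen(x):
    """Parzen weight function k(x) used in the realized kernel."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x ** 2 + 6 * x ** 3
    if x <= 1.0:
        return 2 * (1 - x) ** 3
    return 0.0

def realized_measures(intraday_prices, H):
    """Realized variance RV_t and realized kernel RK_t from one day of
    intra-daily (e.g. 5-min) prices X_{t_j,t}, with kernel bandwidth H."""
    x = np.diff(np.log(intraday_prices))          # intra-daily returns x_{j,t}
    rv = np.sum(x ** 2)                           # 5-min realized variance
    def gamma(h):                                 # realized autocovariance gamma_h
        h = abs(h)
        return np.sum(x[h:] * x[:len(x) - h])
    rk = sum(parzen(h / (H + 1)) * gamma(h) for h in range(-H, H + 1))
    return rv, rk
```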

Table 1 presents the six stock indices extracted from the database and provides volatility estimates for each index's squared returns and realized kernel series over the respective sample period (see also the SP series graphs in the “Appendix”, Figs. 3, 4 and 5). We calculate the standard deviation of each series and the annualized volatility, defined as the square root of the mean of 252 times the squared return or the realized kernel. The standard deviations are always lower than the annualized volatilities. The realized kernels have lower annualized volatilities and standard deviations than the squared returns, since they ignore the overnight effects and are affected by less noise; the returns represent the close-to-close yield and the realized kernels the open-to-close variation. The annualized volatility of the realized measure is between 12 and 21%, while the squared returns show figures from 17 to 24%.

Table 1 Data description
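For reference, the annualized volatility reported in Table 1 can be reproduced from the daily series as in the following sketch (illustrative function name):

```python
import numpy as np

def annualized_vol(daily_variance_series):
    """Annualized volatility: the square root of the mean of 252 times the
    daily squared return (or the daily realized kernel), as defined above."""
    return np.sqrt(np.mean(252 * np.asarray(daily_variance_series)))

# usage: annualized_vol(r ** 2) for squared returns, annualized_vol(rk) for realized kernels
```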

4 Estimation results

Building upon the GARCH-X process of Engle (2002b), which includes realized measures as exogenous regressors in the conditional variance equation, Han and Kristensen (2014) and Han (2015) studied the asymptotic properties of this specification with a fractionally integrated (nonstationary) process included as a covariate. Moreover, Pedersen and Rahbek (2019) developed likelihood-ratio tests for the significance of the nonstationary covariate in the above-mentioned model, while Halunga and Orme (2009) provided asymmetry and nonlinearity tests. Lastly, Nakatani and Teräsvirta (2009) and Pedersen (2017) focused on the multivariate case, the so-called extended constant conditional correlation model, which allows for volatility spillovers, and developed inference and testing for the quasi-maximum likelihood estimator (QMLE) of the parameters (see also Ling and McAleer 2003 for the asymptotic theory of vector ARMA-GARCH processes). Within the HEAVY framework we first estimate the benchmark formulation as in Shephard and Sheppard (2010), that is, without asymmetries and power transformations, obtaining very similar results (available upon request).

Table 2 presents the estimation results for the chosen asymmetric power specifications. Wald and t tests are used to test the significance of the Heavy and Garch parameters, rejecting the null hypothesis at the 10% level in all cases. We should highlight that, since all the parameters take non-negative values, we use one-sided tests. Following Pedersen and Rahbek (2019), we first test for ARCH effects and, after rejecting the conditional homoscedasticity hypothesis, we apply one-sided significance tests on the covariates added to the estimated GARCH-X processes.

For the returns, we statistically prefer the double asymmetric power (DAP) specification since the own estimated power term is \(1.18\le \delta _{rr} \le 1.59\) in all cases (see Table 2, Panel A; see also the Wald tests for the power terms, Table 8, Panel A, in the “Appendix”, where the hypotheses of \(\delta _{rr}=1\) and \(\delta _{rr}=2\) are rejected in most cases).

Since estimating both power terms simultaneously did not achieve convergence, we estimated them separately with a two-stage procedure: in the first stage we estimate a univariate asymmetric power specification for the realized measure, that is, without the effect from the returns, and in the second stage we use the estimated powers in the returns equations. This sequential procedure produces power term values close to the ones estimated for the realized measure in the respective DAP specification (compare the values of \(\delta _{rR}\) with those of \(\delta _{RR}\) in Table 2, Panels A and B).

The Heavy asymmetry parameter, \(\gamma _{rR}\), is significant and ranges from around 0.05 (min. value) to 0.16 (max. value). Although \(\alpha _{rr}\) is insignificant and excluded in all cases, the own asymmetry parameter (\(\gamma _{rr}\)) is significant, with \( \gamma _{rr}\in [0.07,0.14]\). In other words, the lagged values of both powered variables, that is, the negative signed realized measure and the squared negative returns, drive the power transformed conditional variance of the returns. Moreover, the momentum parameter, \( \beta _{r}\), is estimated at around 0.87–0.91. All six indices generate very similar DAP specifications.

Table 2 The DAP-HEAVY model. Estimation results

Similarly, for the realized measure the most preferred specification is the DAP one, as the estimated power is \(\delta _{RR}\in [1.05,1.28]\) in all cases (see Table 2, Panel B). The Wald tests of the power terms (see Panel B in Table 8) mostly reject the hypotheses of \(\delta _{RR}=1\) and \(\delta _{RR}=2\). The \(\delta _{Rr}\) is the power term of the returns estimated separately using the univariate specification, as discussed previously. Both Heavy parameters, \(\alpha _{RR}\) and \(\gamma _{RR}\), are significant: \(\alpha _{RR}\) is around 0.24 (min. value) to 0.31 (max. value), while \(\gamma _{RR}\) is between 0.001 and 0.07. Moreover, the cross asymmetry Garch parameter is always significant, with \( \gamma _{Rr}\in [0.10,0.33]\). This means that the power transformed conditional variance of \(\widetilde{RM}_{t}\) is significantly affected by the lagged values of both powered variables: squared negative returns and realized measure. Lastly, the momentum parameter, \(\beta _{R}\), is estimated at around 0.64–0.69.

Overall, our results show strong Heavy effects (captured by the \(\gamma _{rR}\) , \(\alpha _{RR}\) and \(\gamma _{RR}\) parameters), as well as asymmetric Garch influences (as the estimated \(\gamma _{rr}\) and \(\gamma _{Rr}\) are significant).

5 Long memory

In this section we extend the DAP-HEAVY framework by incorporating long memory.

5.1 Hyperbolic specification

First, we present the most general hyperbolic (HY) specification (see, for example, Davidson 2004; Dark 2005, 2010; and Schoffer 2003, in the context of a univariate GARCH model):

$$\begin{aligned} (1-\beta _{r}L)\left[ (\sigma _{rt}^{2})^{\frac{\delta _{rr}}{2}}-\omega _{r}\right]&=A_{r}(L)(1+\gamma _{rr}s_{t})\left( r_{t}^{2}\right) ^{\frac{\delta _{rr}}{2}}+(\alpha _{rR}+\gamma _{rR}s_{t-1})L(RM_{t})^{\frac{\delta _{rR}}{2}}, \\ (1-\beta _{R}L)\left[ (\sigma _{Rt}^{2})^{\frac{\delta _{RR}}{2}}-\omega _{R}\right]&=A_{R}(L)(1+\gamma _{RR}s_{t})(RM_{t})^{\frac{\delta _{RR}}{2}}+(\alpha _{Rr}+\gamma _{Rr}s_{t-1})L\left( r_{t}^{2}\right) ^{\frac{\delta _{Rr}}{2}}, \end{aligned}$$
(4)

with

$$\begin{aligned} A_{i}(L)=(1-\beta _{i}L)-(1-\phi _{i}L)[(1-\zeta _{i})+\zeta _{i}(1-L)^{d_{i}}], \quad i=r,R, \end{aligned}$$

where \(|\phi _{i}|<1\); \(d_{i}\) is the fractional differencing parameter, with \(0\le d_{i}\le 1\); and \(\zeta _{i}\) is the amplitude or hyperbolic parameter, with \(0\le \zeta _{i}\le 1\). In other words, we have three long memory parameters, \(\phi _{i}\), \(\zeta _{i}\) and \(d_{i}\), so the Heavy parameters are now six in total, and similarly there are six Garch parameters.

If \(\zeta _{i}=0\) and \(\phi _{i}-\beta _{i}=\alpha _{ii}\), the HYDAP specifications reduce to the DAP ones (see Eq. (3)), since in this case \(A_{i}(L)=\alpha _{ii}L\).

The HY specification also nests the fractionally integrated (FI) one (see, for example, Baillie et al. 1996; Tse 1998; Karanasos et al. 2004; Conrad and Karanasos 2006) by imposing the restriction \(\zeta _{i}=1\). In this case \(A_{i}(L)\) in Eq. (4) becomes

$$\begin{aligned} A_{i}(L)=(1-\beta _{i}L)-(1-\phi _{i}L)(1-L)^{d_{i}}. \end{aligned}$$
(5)

Finally, note that the sufficient conditions of Dark (2005, 2010) for the non-negativity of the conditional variance of a HYAPARCH (\(1,d_{i},1\)) specification are: \(\omega _{i}>0, \beta _{i}-\zeta _{i}d_{i}\le \phi _{i}\le \frac{2-d_{i}}{3}\) and \(\zeta _{i}d_{i}(\phi _{i}-\frac{1-d_{i}}{2} )\le \beta _{i}(\phi _{i}-\beta _{i}+\zeta _{i}d_{i}), i=r,R\) (see also Conrad 2010). When \(\zeta _{i}=1\) they reduce to the ones for the FIGARCH (\( 1,d_{i},1\)) model (see Bollerslev and Mikkelsen 1996).
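In practice, the fractional and hyperbolic filters can be handled through the (truncated) binomial expansion \((1-L)^{d_{i}}=\sum _{k\ge 0}\pi _{k}L^{k}\), with \(\pi _{0}=1\) and \(\pi _{k}=\pi _{k-1}(k-1-d_{i})/k\). The sketch below (illustrative names, a minimal implementation under this truncation) computes the lag weights of \(A_{i}(L)\) and checks the DAP nesting condition stated above:

```python
import numpy as np

def frac_diff_weights(d, K):
    """Coefficients pi_k of (1 - L)^d = sum_k pi_k L^k, truncated at lag K."""
    pi = np.empty(K + 1)
    pi[0] = 1.0
    for k in range(1, K + 1):
        pi[k] = pi[k - 1] * (k - 1 - d) / k
    return pi

def hyperbolic_filter_weights(beta, phi, zeta, d, K):
    """Truncated lag weights of A_i(L) = (1 - beta L) - (1 - phi L)[(1 - zeta) + zeta (1 - L)^d]."""
    g = zeta * frac_diff_weights(d, K)                    # zeta (1 - L)^d
    g[0] += 1.0 - zeta                                    # plus (1 - zeta)
    pg = np.concatenate(([g[0]], g[1:] - phi * g[:-1]))   # (1 - phi L) g(L)
    a = -pg                                               # minus (1 - phi L) g(L)
    a[0] += 1.0                                           # plus 1 ...
    a[1] -= beta                                          # ... minus beta L, from (1 - beta L)
    return a                                              # a[0] = 0, since A_i(0) = 0

# DAP nesting: with zeta = 0 and phi - beta = alpha_ii, A_i(L) = alpha_ii L
print(hyperbolic_filter_weights(beta=0.6, phi=0.9, zeta=0.0, d=0.5, K=5))  # [0, 0.3, 0, 0, 0, 0]
```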

5.2 Estimated model

We further extend the HEAVY framework by incorporating long memory. For the returns the chosen specification is the FIOAP, whereas for the realized measure we select the HYDAP one. In all cases the power terms of the covariates are presented as fixed parameters since they are estimated separately using univariate models.

In the FIOAP specification for the returns (see Table 3, Panel A) \(\delta _{rr}\) is around 1.27–1.57 and \(d_{r}\) close to 0.50 (around 0.42–0.49). In most cases the Wald tests (see Table 9) reject the null hypotheses of \(d_{r}=0\) or 1 and \(\delta _{rr}=1\) or 2. The other two long memory parameters, \(\phi _{r}\) and the hyperbolic one, \(\zeta _{r}\), were insignificant and, therefore, they were excluded. The own asymmetry parameter, \(\gamma _{rr}\), is significant and around 0.26–0.60. The Heavy parameter, \(\alpha _{rR}\), is significant as well and with estimated values between 0.04 and 0.15. In other words, the lagged values of both powered variables (the squared returns and the realized measure) drive the model of the power transformed conditional variance of returns.

Table 3 The HYDAP-HEAVY model. Estimation results

In the HYDAP specification for the realized measure (see Table 3, Panel B) \( \delta _{RR}\) is around 1.13–1.43. There is also strong evidence of hyperbolic memory as not only \(d_{R}\) but also \(\zeta _{R}\) is significant, with estimated values 0.57–0.68 and 0.86–0.92, respectively, while the Wald tests always reject the null of either a FIDAP (\(H_{0}: \zeta _{R}=1\)) or a DAP formulation (\(H_{0}: \zeta _{R}=0\)). The own and the cross (Garch) asymmetric parameters, \(\gamma _{RR}\in [0.06,0.24]\) and \( \gamma _{Rr}\in [0.08,0.22]\), are also significant. This means that the power transformed conditional variance of \(\widetilde{RM}_{t}\) is significantly affected by the lagged values of both powered variables: realized measure and squared negative returns.

All in all, our long memory extension of the asymmetric power specification shows once more that both powered conditional variances are notably affected by the first lags of both power transformed variables. Intriguingly, this result stands in sharp contrast to the benchmark HEAVY model, where the intra-daily realized measure is not affected by squared daily returns and the conditional variance of daily returns is determined only by the lagged realized measure and the lagged returns variance, since the asymmetries from negative returns are completely neglected.

6 Forecasting performance

Following the estimation of all the proposed extensions of the HEAVY system of equations, we perform multistep-ahead out-of-sample forecasting in order to compare the forecasting accuracy of the different specifications proposed in this study. We re-estimate the benchmark model, the DAP process and its long memory extension on the shortened sample from 3/1/2000 to 30/12/2016 (4,248 observations: in-sample estimation) and keep the remaining 231 observations, from 3/1/2017 to 30/11/2017, for out-of-sample comparison purposes. For each specification estimated on the shortened sample, we compute the 1-, 5-, 20-, 120- and 231-step-ahead forecasts of the (power transformed) conditional variances and calculate the standard measure of forecasting performance, the Root Mean Square Error (RMSE), over the out-of-sample observations up to 30/11/2017.
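To illustrate how the m-step-ahead forecasts are iterated, the following minimal sketch (illustrative names, restricted to the benchmark HEAVY equations of Sect. 2.2 for clarity) builds the forecast path of the returns conditional variance, replacing future realized measures by their own forecasts:

```python
def heavy_forecast(sigma2_r, sigma2_R, rm_last, omega_r, alpha_rR, beta_r,
                   omega_R, alpha_RR, beta_R, m):
    """Iterated m-step-ahead forecasts of the benchmark HEAVY system: from step 2
    onwards the unobserved realized measure is replaced by its own forecast,
    E_t[RM_{t+j}] = E_t[sigma^2_{R,t+j}]."""
    f_r, f_R, rm = sigma2_r, sigma2_R, rm_last
    path = []
    for _ in range(m):
        f_r = omega_r + alpha_rR * rm + beta_r * f_r   # returns equation
        f_R = omega_R + alpha_RR * rm + beta_R * f_R   # realized measure equation
        rm = f_R                                       # feed the forecast forward
        path.append(f_r)
    return path                                        # 1- to m-step-ahead forecasts
```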

The results, presented in Table 4 for the SP index (similar forecasting results for the other five indices are available upon request), clearly show the preference for our extensions over the benchmark model across all time horizons. Regarding the returns equation (see Panel A), the FIOAP is the best performing specification for the 1-day-ahead variance forecast, while from the one-week up to the eleven-month horizon the DAP formulation dominates the alternative extensions with the lowest RMSE. In the realized measure equation, the DAP specification gives the best 1-step-ahead forecasting performance (see Panel B); for the remaining horizons the preferred formulation is the hyperbolic one.

Overall, the extensions of the HEAVY specification proposed in our study perform significantly better than the benchmark one over both short- and long-term horizons.

Table 4 Root mean square error (RMSE) of m-step ahead forecasts for SP as a Ratio of the benchmark model

6.1 Value-at-Risk application

The forecasting performance of the proposed models can be further examined in a real-world business operation. Value-at-Risk (VaR) is a daily metric for market risk measurement, defined as the potential loss in the value of a portfolio, over a pre-defined holding period, for a given confidence level. The most important input in the VaR calculation is the 1-day volatility forecast of the risk factor relevant to the trading portfolio under scope. We apply our conditional variance forecasts directly to a long position in a portfolio of 10,000 S&P 500 index contracts starting from 30/12/2016. We calculate 231 daily VaR values from 3/1/2017 to 30/11/2017, using the 1-day conditional variance forecasts of each model for the returns and the realized measure (6 models in total). Assuming that the conditional mean return is zero and that the returns follow the normal distribution, we first calculate the 1-day VaR at the 99% and 95% confidence levels. Following the parametric approach to VaR calculation, we multiply the daily portfolio value by the 1-day conditional volatility forecast (the square root of the conditional variance forecast) and by the left-tail quantile of the normal distribution at the respective confidence level (the z-scores for the 99% and 95% confidence levels are 2.326 and 1.645, respectively). Secondly, we calculate the daily realized return of the portfolio (gains and losses) and, thirdly, we perform the backtesting exercise, comparing the 231 realized returns to the respective 1-day VaR at the 99% and 95% confidence levels. In the cases where the realized loss exceeds the respective day’s VaR value, we record an exception in the backtesting procedure, meaning that the VaR metric fails to cover the loss of the specific day’s portfolio value.
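The VaR calculation and backtesting just described reduce to a few lines; the sketch below (illustrative names) takes the 1-day volatility forecasts of any of the six models and counts the backtesting exceptions:

```python
import numpy as np

def var_backtest(portfolio_value, vol_forecasts, realized_returns, z):
    """Parametric 1-day VaR and backtesting as described above: VaR_t is the
    portfolio value times the 1-day volatility forecast times the normal
    quantile (z = 2.326 for 99%, z = 1.645 for 95%); an exception is recorded
    whenever the realized daily loss exceeds that day's VaR."""
    var = np.asarray(portfolio_value) * np.asarray(vol_forecasts) * z
    pnl = np.asarray(portfolio_value) * np.asarray(realized_returns)  # daily gains/losses
    exceptions = int(np.sum(-pnl > var))                              # losses not covered by VaR
    return var, exceptions
```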

Table 5 reports the backtesting results, which prove the superiority of the long memory asymmetric power HEAVY models in forecasting the 1-day-ahead volatility, since they give fewer exceptions than the VaR estimates of the benchmark models at both confidence levels applied. The backtesting exercise checks whether the number of exceptions is, first, in line with the selected confidence level (the 99% and 95% confidence levels allow for 1 and 5 exceptions, respectively, every 100 days) and, second, as low as possible. A higher number of exceptions translates into higher market risk capital requirements for financial institutions; this is how regulators heavily penalize banks’ internal VaR models that fail to cover trading losses.

Table 5 VaR backtesting for the SP portfolio. No. of exceptions of the 99% and 95% VaR

7 Structural breaks

In this section, we investigate the impact of structural changes (detected in the two time series used) on the estimated Heavy and Garch parameters. The time-varying behavior of these parameters can be significant, especially around a financial crisis break, and is indicative of the crisis effects on the volatility pattern. As an alternative to the long memory specification, we incorporate structural break dummies in the DAP specification. We identify the structural breaks in the two volatility series for SP, focusing mainly on the recent global financial crisis, and study their impact on the HEAVY framework. We test for structural breaks by employing the methodology of Bai and Perron (1998, 2003a, b), which addresses the problem of testing for multiple structural changes in a least squares context and under very general conditions on the data and the errors. In addition to testing for the presence of breaks, these statistics identify the number and location of multiple breaks. We therefore identify the structural breaks in the two powered series (power transformations of squared returns and the realized measure) with the Bai and Perron methodology (see Table 6 and Figs. 1, 2) and use them to build the slope dummies for the various parameters. A break date corresponding to the 2007/2008 financial crisis is detected, so that we can focus on the crisis effect; we also detect one break date before and one after the crisis.

Table 6 The break dates for SP
Fig. 1 SP power transformed squared returns with breaks

Fig. 2 SP power transformed realized kernel with breaks

We present the estimation results for the SP index in Table 7 (similar results for the other five indices are available upon request), where we use the three breaks of the power transformed squared returns series: (1) 02/04/2003: pre-crisis break, (2) 31/10/2007: crisis break and (3) 30/11/2011: post-crisis break. The three dummy variables, which multiply the respective Heavy and Garch parameters to construct the slope dummies, are defined as follows: \(D_{i,t}=0\) if \(t<T_{i}\) and \(D_{i,t}=1\) if \( t\geqslant T_{i}\), for \(i=1,2,3\), the three break dates. In the returns equation, the own asymmetry (Garch) parameter, \(\gamma _{rr}\), receives a decreasing impact (\(-\,0.04\)) from the pre-crisis break, while the cross asymmetry (Heavy) parameter is increased by the crisis dummy (\(+\,0.05\)) and decreased by the post-crisis dummy (\(-\,0.04\)). Regarding the realized measure equation, the Heavy impact, as captured by the own asymmetry \(\gamma _{RR}\), rises with the crisis break, and the Heavy parameter \(\alpha _{RR}\) falls post-crisis, whereas the Garch asymmetric influence (captured by \( \gamma _{Rr} \)) falls after the pre-crisis break.
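For completeness, a minimal sketch (illustrative, using the three SP break dates above) of how the step dummies entering the slope-dummy terms can be constructed:

```python
import pandas as pd

dates = pd.date_range("2000-01-03", "2017-11-30", freq="B")   # business-day index (illustrative)
break_dates = {"D1": "2003-04-02",    # pre-crisis break
               "D2": "2007-10-31",    # crisis break
               "D3": "2011-11-30"}    # post-crisis break
# step dummies D_{i,t} = 1{t >= T_i}; the slope dummies multiply them with the
# corresponding Heavy/Garch regressor in Eq. (3)
D = pd.DataFrame({name: (dates >= pd.Timestamp(T)).astype(int)
                  for name, T in break_dates.items()}, index=dates)
```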

Overall, our finding is that the dummy parameters corresponding to the 2003 and 2011 breaks are negative, whereas the ones for the 2007/2008 crisis are positive.

Table 7 The DAP-HEAVY model with structural breaks for SP. Estimation results

8 Dynamic correlations

Lastly, we estimated the bivariate system of the extended HEAVY models (Eqs. (3)–(4)) with four alternative correlation models: the Constant Conditional Correlations (CCC) model of Bollerslev (1990), the Dynamic Conditional Correlations (DCC) specification of Engle (2002a), the Asymmetric DCC process (see Cappiello et al. 2006) and the Dynamic Equicorrelations (DECO) model of Engle and Kelly (2012).

The conditional covariance matrix for the 2-dimensional vector \(\mathbf {r} _{t}=[r_{t},\widetilde{RM}_{t}]^{\prime }\), \(\mathbb {V}ar(\mathbf {r} _{t}\left| \mathcal {F}_{t-1}\right. )=\mathbf {H}_{t}\), is given by:

$$\begin{aligned} \mathbf {H}_{t}=\left[ \begin{array}{cc} \sigma _{rt}^{2} &{}\quad \sigma _{rR,t} \\ \sigma _{rR,t} &{}\quad \sigma _{Rt}^{2} \end{array} \right] , \end{aligned}$$

where \(\sigma _{it}^{2},i=r,R\), are the conditional variances (see Eq. (2)), and \(\sigma _{rR,t}\) denotes the conditional covariance, that is, \(\sigma _{rR,t}=Cov(r_{t},\widetilde{RM}_{t}\left| \mathcal {F}_{t-1}\right. )\). \(\mathbf {H}_{t}\) can be written as:

$$\begin{aligned} \mathbf {H}_{t}=\mathbf {D}_{t}\mathbf {R}_{t}\mathbf {D}_{t}, \end{aligned}$$

where \(\mathbf {D}_{t}\) is the diagonal matrix of conditional standard deviations, that is \( \mathbf {D}_{t}=[dg(\mathbf {H}_{t})]^{\frac{1}{2}}\), and \(\mathbf {R}_{t}\) is the conditional correlation matrix with unit diagonal elements and off-diagonal element given by \(\rho _{rR,t}=\sigma _{rR,t}/(\sigma _{rt} \sigma _{Rt})\). In our HEAVY model in Eqs. (3)–(4) we initially assumed that the conditional covariances and dynamic correlations are zero: \(\rho _{rR,t}=\sigma _{rR,t}=0\) for all t. This implies that \( \mathbf {R}_{t}=\mathbf {I}_{2}\) (the identity matrix) and \(\mathbf {H}_{t}\) is a diagonal matrix (\(\mathbf {H}_{t}=\mathbf {D}_{t}^{2}\)). Allowing for non-zero conditional correlations does not alter our results, because the estimation of the various non-zero correlation models (the four alternative specifications, namely the CCC, DCC, ADCC and DECO) is a two-step procedure: in the first step the parameters in the \(\mathbf {D}_{t}\) matrix are estimated (using Eqs. (3)–(4)), while the second step consists of estimating the (off-diagonal) parameters in \(\mathbf {R}_{t}\). To see this more explicitly, we present the quasi-log-likelihood function (QL). First, note that \(\mathbf {r}_{t}\) can be written as \(\mathbf {r}_{t}=\mathbf {D}_{t}\mathbf {e}_{t}\) or, equivalently, \(\mathbf {e}_{t}=\mathbf {D}_{t}^{-1} \mathbf {r}_{t}\), where \(\mathbf {e}_{t}=[e_{rt},e_{Rt}]^{\prime }\). Then QL is given by

$$\begin{aligned} QL&=QL_{1}+QL_{2} \\&=\underset{QL_{1}}{\underbrace{-\sum \limits _{t=1}^{T}\left( n\log (2\pi )+2\log \left| \mathbf {D}_{t}\right| + \mathbf {r}_{t}^{\prime }\mathbf {D}_{t}^{-2}\mathbf {r}_{t}\right) }}\;\underset{QL_{2}}{\underbrace{-\sum \limits _{t=1}^{T}\left( \log \left| \mathbf {R}_{t}\right| +\mathbf {e}_{t}^{\prime }\mathbf {R}_{t}^{-1}\mathbf {e}_{t}- \mathbf {e}_{t}^{\prime }\mathbf {e}_{t}\right) }}. \end{aligned}$$

Thus, in the first step the parameters of the bivariate (HY)DAP-HEAVY process (see again Eqs. (3)–(4)) are estimated using \(QL_{1}\), and in the second step we estimate the off-diagonal element in \(\mathbf {R}_{t}\) using the standardized residuals \(\widehat{\mathbf {e}}_{t}= \widehat{\mathbf {D}}_{t}^{-1}\mathbf {r}_{t}\) in \(QL_{2}\). In all cases, the three alternative dynamic models (DCC, ADCC and DECO) estimate average conditional correlations between the two volatility measures of around 0.80–0.90, similar to the constant correlation values that we obtain from the CCC model.
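As a simple illustration of the second step, in the CCC case the constant correlation can be estimated directly from the standardized residuals of the first step, as in the following sketch (illustrative names):

```python
import numpy as np

def second_step_ccc(r, rm_tilde, sigma_r, sigma_R):
    """Second step for the CCC case of Bollerslev (1990): given the conditional
    standard deviations fitted in the first step (QL_1), the constant correlation
    is estimated by the sample correlation of the standardized residuals e_t = D_t^{-1} r_t."""
    e = np.column_stack([np.asarray(r) / np.asarray(sigma_r),
                         np.asarray(rm_tilde) / np.asarray(sigma_R)])
    return np.corrcoef(e.T)[0, 1]
```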

All in all, the conditional correlations extension provides identical results for the conditional variance equations and estimates similar correlation levels across all indices (results not reported but available upon request).

9 Conclusions

Our study examined the HEAVY model and extended it by taking into consideration leverage, power transformations and long memory characteristics. For the realized measure our empirical results favor the most general hyperbolic asymmetric power specification, where the lags of both powered variables, squared negative returns and the realized kernel, move the dynamics of the power transformed conditional variance of the realized measure. Similarly, modeling the returns with an asymmetric power process, we found that not only the powered realized measure but also the power transformed squared negative returns help in forecasting the conditional variance of the returns.

The long memory (hyperbolic or fractionally integrated) of volatility, its asymmetric response to negative and positive shocks and its power transformations ensure the superiority of our contribution, which can be implemented in the areas of asset allocation and portfolio selection, as well as in several risk management practices. Further, for the U.S. stock index we demonstrated the forecasting superiority of our extensions over the benchmark HEAVY model through out-of-sample forecasting across multiple short- and long-term horizons. Finally, the detection of structural breaks and the inclusion of break dummies in the asymmetric power formulation capture the time-varying pattern of the parameters, as the break corresponding to the financial crisis of 2007/2008, in particular, increases the values of the parameters.

Our empirical findings on the nexus between low-frequency daily squared returns and high-frequency intra-daily realized measures provide a volatility forecasting framework with major implications for policymakers and market practitioners, from investors, risk and portfolio managers up to chief financial officers, leaving ample room for future research on further HEAVY model extensions. Accordingly, policymakers and market players can use our HEAVY framework to monitor and forecast financial volatility patterns when devising policy responses, enforcing the financial system’s regulations, and deciding on asset allocation, hedging strategies and investment projects. Future research should focus on extending the multivariate HEAVY formulation of Noureldin et al. (2012) with long memory, asymmetries and power transformations, as in the recent study of Dark (2018), who uses a long memory multivariate GARCH model, or Opschoor et al. (2018) within the Generalised Autoregressive Score (GAS) process of Creal et al. (2013).