1 Introduction

Combining forecasts, for instance through a hybrid approach, has long been known to improve forecast accuracy. Because each model whose forecasts are to be combined may consider different predictors and make different assumptions about the underlying data generating process (DGP) and distributions, averaging the individual forecasts broadens the embedded information set and may also offset individual model biases; see, e.g., Zhang (2003) for additional reasons. In a one-dimensional (univariate) setting the hybridization of forecasts obtained from time series models/methods has been well researched; see, e.g., Aijaz and Agarwal (2020), Aydin and Isci Güneri (2015), Pai and Lin (2005), and Valenzuela et al. (2008) among others.

Recently, however, the enhanced availability of large databases with many time series variables as potential predictors has stimulated interest in high-dimensional forecasting. Excellent reviews of the state-of-the-art in economics and finance are given by Fan et al. (2011) and Lee (2011). Furthermore, the introductory section of the paper by Uematsu and Tanaka (2019) provides an extensive survey of the current literature on high-dimensional forecasting and variable selection. Along this direction, conditional quantile averaging procedures in conjunction with dimension reduction methods have been considered. Examples include quantile forecasting of the S&P 500 equity risk premium (see, e.g., Konzen and Ziegelmann, 2016; Lima and Meng, 2017; Meligkotsidou et al., 2014; De Gooijer and Zerom, 2020), inflation forecasting (Garcia et al., 2017; De Gooijer and Zerom, 2019), quantile forecasting of macroeconomic time series (Manzan, 2015; Jiang et al., 2018), and realized volatility forecasting (Meligkotsidou et al., 2019).

Most of these studies are limited by the assumption that the high-dimensional data set comes from a linear DGP, and forecasts are obtained from finite-dimensional parametric models. One exception is the study by De Gooijer and Zerom (2020). Using a large data set of predictors, involving both macroeconomic predictors and technical indicators, these authors showed that combining quantile forecasts from parametric and semiparametric methods (called hybrid quantile averaging) can be useful in practice. Semiparametric models are infinite-dimensional. As a result, the quantile forecasts are less prone to model misspecification than forecasts obtained from parametric models. In terms of quantile forecast performance, the hybrid method works well in identifying relevant predictors and, more importantly, results in improved combined one-step ahead forecasts over alternative (non-hybrid) conditional quantile methods.

While the empirical findings by De Gooijer and Zerom (2020) are interesting, they are sample-specific, which makes it difficult to generalize the results to novel situations. In addition, the focus on one-step ahead out-of-sample prediction is somewhat restrictive. Multi-step ahead out-of-sample quantile forecasting results can provide more insight into the relative performance of the hybrid quantile averaging method over a longer time period. Unfortunately, a general theoretical comparison of quantile forecasts obtained from hybrid and non-hybrid methods is not feasible due to complicated interactions of nonlinear parameter estimation methods, sparse modelling, and correlated forecasts. Hence, further insights into the performance of the hybrid quantile averaging method can only be obtained via a Monte Carlo simulation study. In the first half of this paper, we provide such a study.

In the second half of the paper, we evaluate the out-of-sample multi-step ahead forecasting performance of the hybrid conditional quantile method and five alternative, non-hybrid, forecasting methods via an empirical application. More specifically, we report out-of-sample conditional quantile forecasts for the risk premium of the monthly S&P 500 index using a large data set of macroeconomic predictors. A simple equal-weighted combination of parametric and semiparametric conditional quantile forecasts is adopted as a benchmark. Our main finding is that hybridization can be an effective way to improve quantile forecasts as compared to non-hybrid methods.

The rest of this paper unfolds as follows. First, for ease of reference, Sect. 2 summarizes the main features of the semiparametric and hybrid conditional quantile averaging methods. Section 3 describes the six quantile forecasting methods used in the Monte Carlo experiment. Section 4 introduces the simulation design, in which the exogenous predictors enter via semiparametric and parametric GARCH model specifications; this section also discusses the evaluation of quantile forecasts. Section 5 contains the simulation results, while Sect. 6 presents the empirical results. Finally, Sect. 7 contains some concluding remarks.

2 Conditional Quantile Averaging

2.1 Some Notation

To simplify presentation, we only discuss the case of making \(h=1\) step ahead quantile forecasts, where h denotes the forecast horizon. The extension to multi-horizon \((h>1)\) conditional forecasting is straightforward. Let \(\{Y_{t}\}^{n}_{t=1}\) be a set of observations obtained from a strictly stationary time series process \(\{Y_{t}, t \in {\mathbb {Z}}\}\) that depends on \(q_{y} \ge 1\) past values of \(Y_{t}\), and on a \(q_{z}\)-dimensional vector \({\mathbf {Z}}_{t}=(Z_{1,t},\ldots ,Z_{q_{z},t})^{\text {{T}}}\) that consists of exogenous, possibly lagged, stationary time series. Let \({\mathbf {X}}_{t}= ({\mathbf {Y}}^{\text {{T}}}_{t-1},{\mathbf {Z}}^{\text {{T}}}_{t})^{\text {{T}}} \in {\mathbb {R}}^{q}\) where \({\mathbf {Y}}_{t-1} = (Y_{t-1}, \ldots , Y_{t-q_{y}})^{\text {{T}}}\), and \(q = q_{y}+q_{z}\). Given the observed data set \(\{({\mathbf {X}}_{t}, Y_{t})\}^{n}_{t=1}\), the one-step ahead out-of-sample \(\tau \)th \((0<\tau <1)\) conditional quantile of the unobserved random variable \(Y_{n+1}\) given \({\mathbf {X}}_{t}= {\mathbf {X}}_{n}\) will be denoted by \(Q_{Y_{n+1}}(\tau |{\mathbf {X}}_{n})\).

For convenience of later analysis, we define the associated process \(\{({\mathbf {X}}_{t},Y^{*}_{t})\}\in {\mathbb {R}}^{q}\times {\mathbb {R}}\) where the components of the predictor vector \({\mathbf {X}}_{t}=(X_{1,t},\ldots ,X_{q,t})^{\text {{T}}}\) are given by

$$\begin{aligned} X_{j,t}= {\left\{ \begin{array}{ll}Z_{j,t}, &{} j=1, \ldots , q_{z}, \\ Y_{t+(j-q_{z}-1)}, &{} j=q_{z}+1, \ldots , q, \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} Y^{*}_{t}=Y_{t+q_{y}}, \quad t=1, \ldots , N, \quad (N\equiv n-q_{y}). \end{aligned}$$

Hence, we obtain \(Q_{Y_{n+1}}(\tau |{\mathbf {X}}_{n})\) directly via \(Q_{Y^{*}_{t}}(\tau |{\mathbf {X}}_{n-q_{y}+1}) =\inf \{y:F_{Y^{*}_{t}}(y|{\mathbf {X}}_{n-q_{y}+1}) \ge \tau \}\), where \(F_{Y^{*}_{t}}(\cdot |{\mathbf {x}})\) is the conditional distribution of \(Y^{*}_{t}\) given \({\mathbf {X}}_{t}={\mathbf {x}}\). Furthermore, for a given quantile level \(\tau \), we define

$$\begin{aligned} {\mathcal {M}}_{\tau } = \{j:Q_{Y^{*}_{t}}(\tau |{\mathbf {x}}) ~ \hbox {functionally depends on} ~ x_{j} \} \end{aligned}$$
(1)

as the set of relevant predictors that truly influence \(Q_{Y^{*}_{t}}(\tau |{\mathbf {x}})\). We assume sparsity, i.e., \(|{\mathcal {M}}_{\tau }|\ll q\), which means that only a small subset of the predictors influences \(Q_{Y^{*}_{t}}(\tau |{\mathbf {x}})\).
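
To make the index bookkeeping concrete, the following R sketch constructs the pair \(({\mathbf {X}}_{t},Y^{*}_{t})\) from a series and a set of exogenous predictors; the series, dimensions, and seed are placeholders, not the values used later in the paper.

```r
## Minimal base-R sketch (placeholder data): build the predictor matrix X and the
## response Y*_t of Sect. 2.1 from a series y and exogenous predictors Z.
set.seed(1)
n <- 400; q_y <- 2; q_z <- 3
y <- as.numeric(arima.sim(list(ar = 0.5), n))   # placeholder series {Y_t}
Z <- matrix(rnorm(n * q_z), n, q_z)             # placeholder exogenous predictors Z_{j,t}

N <- n - q_y                                    # N = n - q_y
## X_{j,t} = Z_{j,t} for j = 1,...,q_z and X_{j,t} = Y_{t+(j-q_z-1)} for j = q_z+1,...,q
X <- cbind(Z[1:N, ],
           sapply(0:(q_y - 1), function(k) y[(1 + k):(N + k)]))
ystar <- y[(q_y + 1):n]                         # Y*_t = Y_{t+q_y}, t = 1,...,N
```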

2.2 Semiparametric (SP) Conditional Quantile Averaging

For q large (many predictors), the semiparametric (SP) quantile prediction averaging method consists of the following three steps.

  1.

    Using gradient boosting, obtain \({\widehat{\theta }}_{0,j}(\tau |x_j)\) as a nonparametric estimate of a marginal conditional quantile, defined as

    $$\begin{aligned} \theta _{j}(\tau |x_{j}) = \inf \{y:F_{Y^{*}_{t}}(y|X_{j,t}=x_{j})\ge \tau \},\quad (j=1,\ldots ,q). \end{aligned}$$

    Next, approximate \(Q_{Y^{*}_{t}}(\tau |{\mathbf {x}})\) by a linear combination of \({\widehat{\theta }}_{0,j}(\tau |x_j)\)’s. That is

    $$\begin{aligned} {\widetilde{Q}}_{Y^{*}_{t}}(\tau |{\mathbf {x}}) = \gamma _{0}(\tau ) + \smash {\sum ^{q}_{j=1}} \gamma _{0,j} (\tau ) {\widehat{\theta }}_{0,j}(\tau |x_{j}), \end{aligned}$$
    (2)

    where \(\gamma _{0,j}(\tau )\) are weights depending on the quantile level \(\tau \).

  2.

    Using \({\widehat{\theta }}_{0,j}(\tau |X_{j,t})\) as regressors and applying penalized quantile regression, obtain the set \(\widehat{\widetilde{{\mathcal {M}}}}_{\tau }=\{j:{\widehat{\gamma }}_{0,j}(\tau )\ne 0\}\). This is an estimate of the set of relevant one-step ahead predictors \(\widetilde{{\mathcal {M}}}_{\tau }=\{j:\gamma _{j}(\tau )\ne 0\}\) identified by the averaging model in (2). Then, compute an estimate \(\widehat{\varvec{\gamma }}(\tau )=({\widehat{\gamma }}_{1}(\tau ),\ldots ,{\widehat{\gamma }}_{q}(\tau ))^{\text {{T}}}\) of the \(q\times 1\) parameter vector of optimal weights \(\varvec{\gamma }(\tau )=(\gamma _{1}(\tau ),\ldots ,\gamma _{q}(\tau ))^{\text {{T}}}\) by considering the following weighted \(L_{1}\)-penalized quantile estimator

    $$\begin{aligned} \widehat{\varvec{\gamma }}(\tau ) = \arg \min _{\varvec{\gamma }(\tau )} \Big \{ \sum ^{N}_{t=1}\rho _{\tau }\big (Y^{*}_{t}-\gamma _{0}(\tau )-\widehat{\varvec{\theta }}^{\text {{T}}}_{0,t}\varvec{\gamma }(\tau )\big ) +\lambda \sum ^{q}_{j=1}w_{j}|\gamma _{j}(\tau )|\Big \}, \end{aligned}$$
    (3)

    where \(\rho _{\tau }(z)=\{\tau -{\mathbb {I}}(z<0)\}z\) is the quantile ‘tick’ loss function, \({\mathbb {I}}(\cdot )\) the indicator function, \(\lambda >0\) is a tuning (or penalization) parameter, \(w_{j}=|{\widehat{\gamma }}^{*}_{j}|^{-c}\), \(c>0\), and \({\widehat{\gamma }}^{*}_{j}\) is an initial parameter estimate. Observe that for \(w_{j}=1\), \(\forall j\), (3) becomes the usual LASSO (L) penalty. For the adaptive LASSO (aL) penalty function used in this paper, the typical choice is given by \(w_{j}=(|{\widehat{\gamma }}_{j}^{\text {{(L)}}}|+1/N)^{-1}\).

  3.

    Finally, using all the ingredients given above, compute the one-step ahead penalized averaged (PA) and aL-based quantile forecast of \(Y_{n+1}\), i.e.,

    $$\begin{aligned} \widehat{{\widetilde{Q}}}_{Y^{*}_{t}}^{\text {{(PA-aL)}}}(\tau |{\mathbf {X}}_{n-q_{y}+1})= {\widehat{\gamma }}_{0}(\tau ) + \sum _{u\in \widehat{\widetilde{{\mathcal {M}}}}^{(\text {{aL}})}_{\lambda ,\tau } } {\widehat{\gamma }}^{\text {{(aL)}}}_{u}(\tau ){\widehat{\theta }}^{\text {{(aL)}}}_{u}(\tau |X_{u,n-q_{y}+1}), \end{aligned}$$
    (4)

    where \(\widehat{\widetilde{{\mathcal {M}}}}^{(\text {{aL}})}_{\lambda ,\tau }\) is an estimate of the set of nonzero quantile predictors \(\widetilde{{\mathcal {M}}}^{\text {{(aL)}}}_{\lambda ,\tau \!}=\!\{u:\theta _{u}(\tau |X_{u,n-q_{y}+1})\ne 0\}\) associated with \(\widehat{{\widetilde{Q}}}_{Y^{*}_{t}}^{\text {{(aL)}}}(\tau |{\mathbf {X}}_{n-q_{y}+1})\), and where \({\widehat{\gamma }}^{\text {{(aL)}}}_{u}(\tau )\) is the uth aL-based quantile regression estimate, or weight.

We adopt a prediction-based criterion for the selection of \(\lambda \). For a given \(\lambda \), let \(\widehat{\varvec{\gamma }}_{\lambda }(\tau )=\big ({\widehat{\gamma }}_{1,\lambda }(\tau ),\ldots ,{\widehat{\gamma }}_{q,\lambda }(\tau )\big )^{\text {{T}}}\) be the penalized estimates and \(|\widehat{\widetilde{{\mathcal {M}}}}_{\lambda ,\tau }|\) be the corresponding number of non-zero estimates. We choose \(\lambda \) by minimizing the following high-dimensional criterion

$$\begin{aligned} \hbox {QBIC}_{\tau }(\lambda ) = \log \Big (\sum ^{N}_{t=1}\rho _{\tau }\big (Y^{*}_{t}-{\widehat{\gamma }}_{0,\lambda }(\tau ) - \widehat{\varvec{\theta }}_{0,t}^{\text {{T}}}\widehat{\varvec{\gamma }}_\lambda (\tau ) \big ) \Big ) + |\widehat{\widetilde{{\mathcal {M}}}}_{\lambda , \tau }|\frac{\log N}{2N}C_{N}, \end{aligned}$$
(5)

where \(C_{N}=\log (q)\log (\log (N))/\log (N)\).
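
The following R sketch illustrates steps 1–2 and the QBIC-based choice of \(\lambda \), using the gbm and quantreg packages. It is a minimal sketch under several assumptions: the objects X and ystar are as constructed in Sect. 2.1, all tuning values (number of trees, \(\lambda \) grid, initial \(\lambda \)) are illustrative, and the adaptive weights are imposed by rescaling the regressors so that the plain LASSO option of quantreg can be reused; it is not the exact implementation used in the paper.

```r
## Hedged sketch of steps 1-2 and the QBIC rule (5), using the gbm and quantreg packages.
library(gbm); library(quantreg)

tau <- 0.05; N <- nrow(X); q <- ncol(X)

## Step 1: gradient-boosted marginal conditional quantiles theta_hat_{0,j}(tau | x_j)
theta_hat <- sapply(1:q, function(j) {
  dj  <- data.frame(y = ystar, x = X[, j])
  fit <- gbm(y ~ x, data = dj, distribution = list(name = "quantile", alpha = tau),
             n.trees = 200, interaction.depth = 1, shrinkage = 0.05, verbose = FALSE)
  predict(fit, newdata = dj, n.trees = 200)
})

## Step 2: adaptive-LASSO penalized quantile regression of Y*_t on the theta_hat's.
## The adaptive weights are imposed by rescaling column j by 1/w_j, so that a common
## lambda in rq(..., method = "lasso") acts as lambda * w_j on the original scale.
gam_L <- coef(rq(ystar ~ theta_hat, tau = tau, method = "lasso", lambda = 1))[-1]
w     <- 1 / (abs(gam_L) + 1 / N)               # aL weights, cf. Sect. 2.2
Xw    <- sweep(theta_hat, 2, w, "/")

qbic <- function(lam) {                         # criterion (5)
  fit  <- rq(ystar ~ Xw, tau = tau, method = "lasso", lambda = lam)
  res  <- ystar - cbind(1, Xw) %*% coef(fit)
  loss <- sum(res * (tau - (res < 0)))          # tick loss
  k    <- sum(abs(coef(fit)[-1]) > 1e-6)        # number of selected predictors
  C_N  <- log(q) * log(log(N)) / log(N)
  log(loss) + k * log(N) / (2 * N) * C_N
}
lambdas  <- exp(seq(log(0.1), log(50), length.out = 25))
lam_star <- lambdas[which.min(sapply(lambdas, qbic))]
```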

2.3 Hybrid (H) Conditional Quantile Averaging

In addition to the forecasting approach of Sect. 2.2, there are also situations where a marginal conditional quantile forecast of \(Y_{t+1}\) can be computed from a fully specified parametric DGP. Combining such a parametric quantile with the marginal semiparametric conditional quantile \({\widetilde{Q}}_{Y^{*}_{t}}(\tau |{\mathbf {x}})\) in (2) is the first step in a so-called hybrid (H) quantile averaging method. To formalize, we assume that a parametric model is available with the corresponding estimated marginal quantile functions \(\{{\widehat{\theta }}_{1}(\tau |x_{1,t}),\ldots , {\widehat{\theta }}_{q^{*}}(\tau |x_{q^{*},t})\}\) \((q^{*}\ge 1)\). Then, we define a hybrid extension of \({\widetilde{Q}}_{Y^{*}_{t}}(\tau |{\mathbf {x}})\) by

$$\begin{aligned} {\widetilde{Q}}^{\text {{(H)}}}_{Y^{*}_{t}}(\tau |{\mathbf {x}}) = \gamma _{0}(\tau )+ \underbrace{\sum ^{q}_{j=1}\gamma _{0,j}(\tau ) {\widehat{\theta }}_{0, j}(\tau |x_{j})}_\text {semiparametric} +\underbrace{\sum ^{q^{*}}_{j=1}\gamma _{j}(\tau ) {\widehat{\theta }} _{j}(\tau |x_{j})}_\text {parametric}, \end{aligned}$$
(6)

where \(\{\gamma _{0,j}(\tau )\}^{q}_{j=1}\) and \(\{\gamma _{j}(\tau )\}^{q^{*}}_{j=1}\) are sets of weights depending on the quantile level \(\tau \), which we summarize by the set \(\{\gamma _{u}^{\text {{(H)}}}(\tau )\}^{q + q^{*}}_{u=1}\) with \(q + q^{*}\equiv q_{y}+q_{z}+q^{*}\). The design matrix \({\mathbf {X}}_{t}\) consists of the set of potential predictors:

$$\begin{aligned} \{{\widehat{\theta }}_{0, 1}(\tau |X_{1,t}), \ldots ,{\widehat{\theta }}_{0, q}(\tau |X_{q,t}), {\widehat{\theta }}_{1}(\tau |X_{1,t}),\ldots , {\widehat{\theta }}_{q^{*}}(\tau |X_{q^{*},t}) \}; \end{aligned}$$
(7)

its uth component will be denoted by \({\widehat{\theta }}^{\text {{(H)}}}_{u}(\tau |X_{u,t})\) \((u=1,\ldots ,(q+q^{*}))\).

Similar to (3), let \(\{{\widehat{\gamma }}_{u}^{{\hbox {(H-aL)}}}(\tau )\}^{q+q^{*}}_{u=1}\) denote the set of hybrid quantile regression estimates of \(\{\gamma _{u}^{\text {{(H)}}}(\tau )\}^{q+q^{*}}_{u=1}\). Here, we select the tuning parameter \(\lambda \) by minimizing the hybrid version of (5), and for the weights \(w_{u}\) we use the aL regularization technique. Let \(\widehat{\widetilde{{\mathcal {M}}}}^{\hbox {{(H-aL)}}}_{\tau }\) denote an estimate of the resulting set of selected nonzero quantile predictors \(\widetilde{{\mathcal {M}}}^{\text {{(H)}}}_{\tau }=\{u:\theta _{u}(\tau )\ne 0\}\) associated with \({\widetilde{Q}}^{\text {{(H)}}}_{Y^{*}_{t}}(\tau |{\mathbf {X}}_{n-q_{y}+1})\). Then, similar to (4) the one-step ahead hybrid and aL-based quantile forecast of \(Y_{n+1}\) is given by

$$\begin{aligned} \widehat{{\widetilde{Q}}}^{\text {{(H-aL)}}}_{Y^{*}_{t}}(\tau |{\mathbf {X}}_{n-q_{y}+1})={\widehat{\gamma }}_{0}(\tau )+ \sum _{u \in \widehat{\widetilde{{\mathcal {M}}}}^{\text {{(H-aL)}}}_{\tau }}{\widehat{\gamma }}_{u}^{\text {{(H-aL)}}}(\tau ) {\widehat{\theta }}_{u}^{\text {{(H)}}}(\tau |X_{u,n-q_{y}+1}). \end{aligned}$$
(8)
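
Assuming the semiparametric marginal quantiles (theta_hat) of the previous sketch and a matrix of parametric marginal quantiles are available, the hybrid step only enlarges the design matrix as in (7) and reuses the penalized fit of Sect. 2.2. A minimal sketch follows; theta_par is a placeholder for the \(q^{*}\) parametric pilots, and \(\lambda \) should in practice be re-selected by the QBIC rule on the hybrid design.

```r
## Hedged sketch of the hybrid step; theta_hat, ystar, tau, N, lam_star as above.
library(quantreg)

theta_par <- matrix(rnorm(N * 2), N, 2)          # placeholder parametric marginal quantiles
theta_H   <- cbind(theta_hat, theta_par)         # (q + q*) potential regressors, cf. (7)

gam_L <- coef(rq(ystar ~ theta_H, tau = tau, method = "lasso", lambda = 1))[-1]
w     <- 1 / (abs(gam_L) + 1 / N)
XwH   <- sweep(theta_H, 2, w, "/")
## lambda should be re-selected via QBIC on the hybrid design; lam_star is reused
## here only to keep the sketch short
fit_H <- rq(ystar ~ XwH, tau = tau, method = "lasso", lambda = lam_star)

## One-step ahead H-aL quantile forecast (8); illustratively evaluated at the last row
x_new      <- c(1, tail(XwH, 1))
q_forecast <- sum(coef(fit_H) * x_new)
```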

3 Forecasting Procedures

In Sect. 5, we report recursive prediction results for the following six forecasting methods.

  (1)

    Equal Weighting (EW): \({\widehat{Q}}_{\tau ,t}^{{\hbox {(EW)}}}=(1/q)\sum ^{q}_{j=1}{\widehat{\theta }}_{j,t}(\tau |x_{j,t})\), where \({\widehat{\theta }}_{j,t}(\tau |x_{j,t})\) is the conditional quantile estimate of \(\theta _{j}(\tau |x_{j})\), and \(q=q_{y}+q_{z}\).Footnote 1 So, the final EW estimator does not discard irrelevant predictors (a one-line code sketch of this estimator follows at the end of this section).

  (2)

    Penalized Quantile Averaging with L-based penalty (PA-L): Forecasts are based on weighted averaging similar to the EW method, but the weights are data driven. Thus, PA-L only considers the second component of (6) with \(q^{*}\) marginal conditional quantiles.

  (3)

    Penalized Averaging of nonparametric quantiles with aL-based penalty (PA-aL): The conditional quantile forecast is defined by (4), and the set of relevant predictors consists of q marginal conditional quantiles. Thus, PA-aL only considers the first component of (6).

  (4)

    Penalized Linear Quantile Regression (P-Lin-QR) with L-based penalty: This is the well-known quantile estimation method of Koenker (2005). It does not allow for nonlinearities in the predictors. In this case the forecast results are based on a set of q marginal conditional quantiles.

  (5)

    Additive QR (Ad-QR): This method is a generalization of method (3) where each additively entered predictor has a non-parametric (possibly nonlinear) effect. Unlike methods (2)–(4), we do not conduct predictor selection for Ad-QR. Instead, we use the predictor selection results of method (3) and implement a low dimensional additive model.

  (6)

    Hybrid with aL-based penalty (H-aL): The conditional quantile forecast is defined by (8). In this case the forecasting results are based on a set with \((q+q^{*})\) marginal conditional quantiles. In Sect. 4, we extend the set of potential predictors with another set of marginal conditional quantiles representing a parametric approximation for the GARCH-type model under study.

Note that methods (1)–(3) and (6) are logically connected in the sense that they are based on the concept of averaging forecasts, assigning (equal or varying) weights to all predictors. Forecasting methods (4) and (5) are popular approaches in the quantile regression literature. Unlike methods (4) and (5), penalized averaging does not assume that \(Q_{Y_{n+h}}(\tau |{\mathbf {X}}_{n})\) has an additive structure.
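
As announced in method (1), the EW benchmark is a plain average of the estimated marginal conditional quantiles; in R it reduces to a single line (theta_hat as in Sect. 2.2):

```r
## The EW benchmark of method (1): an unweighted average of the q estimated marginal
## conditional quantiles; no predictor is discarded.
Q_EW <- rowMeans(theta_hat)     # \hat{Q}^{(EW)}_{tau,t}, one value per time point
```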

4 Simulation Design

In Sect. 5, we evaluate and compare the multi-step out-of-sample forecasting performance of the quantile prediction methods using a recursive forecasting strategy with an increasing (expanding) window for parameter re-estimation. More concretely, the first \(h=1\) conditional quantile forecast is based on parameter estimates obtained from the observations \(\{Y_{t}\}^{300}_{t=1}\), the second one on parameter estimates obtained from \(\{Y_{t}\}^{301}_{t=1}\), and so on; the last one is based on parameter estimates obtained from \(\{Y_{t}\}^{399}_{t=1}\). Thus, in total there are \(N^{*}=100\) one-step ahead out-of-sample conditional quantile forecasts, where at each step of the recursive forecasting strategy the available in-sample size N is updated with one observation, going from \(N=300\) to \(N=399\). No lagged values of \(Y_{t}\) are included in the set of predictors, i.e., \(q_{y}=0\). This implies that at each step of the recursive strategy \(n\equiv N\) and \(q\equiv q_{z}\). For \(h>1\), the recursive scheme starts at observation number \(N-h+1\) and ends at \(N-h+N^{*}\).
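
A generic sketch of this expanding-window scheme is given below; forecast_fun is a hypothetical wrapper that re-estimates the chosen quantile method on the data up to the forecast origin and returns the \(\tau \)th h-step ahead quantile forecast, and y, Z, and tau are assumed available.

```r
## Generic expanding-window recursion for the h-step ahead quantile forecasts.
N0 <- 300; Nstar <- 100; h <- 1
q_fc <- numeric(Nstar)
for (i in seq_len(Nstar)) {
  origin  <- N0 - h + i                     # h = 1: origins 300,...,399; h > 1 shifts back
  q_fc[i] <- forecast_fun(y = y[1:origin], Z = Z[1:origin, , drop = FALSE],
                          tau = tau, h = h) # re-estimate, then forecast Y_{origin+h}
}
```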

4.1 GARCH Specifications

Semiparametric: For the SP design, we consider an absolute value GARCH (avGARCH) model (Taylor 1986; Schwert 1990) of order (1, q) with exogenous variables \(\{Z_{j,t}\}^{q}_{j=1}\), i.e.

$$\begin{aligned} Y_{t+1}&=\sigma _{t+1}F^{-1}_{\nu }(V_{t+1}), \end{aligned}$$
(9)
$$\begin{aligned} \sigma _{t+1}&= \alpha _{0} + \sum ^{q}_{j=1}g_{j}(Z_{j,t})+\alpha _{1}|Y_{t}|+\beta \sigma _{t}, \end{aligned}$$
(10)

where \(g_{j}(\cdot )\) is an unknown real-valued function. In (9), we assume that \(\{V_{t+1}\}\) are i.i.d. random variables from U(0, 1) and \(F_{\nu }^{-1}(\cdot )\) is the inverse of the cumulative distribution function (CDF) of the Student \(t_{\nu }\) distribution with \(\nu =4\) degrees of freedom. So, the model has heavier tails than a traditional GARCH model with Gaussian innovations.

Assuming that the root of \(1-\beta z=0\) lies outside the unit circle, \(\sigma _{t+1}\) can be written in an infinite-dimensional ARCH(\(\infty \)) representation. Under some regularity conditions (Xiao and Koenker, 2009) the coefficients of this model decrease geometrically. Let p \((p\ll n-h)\) denote a truncation parameter which depends on the number of in-sample observations. Then, the finite-order approximation of the ARCH\((\infty )\) model is given by

$$\begin{aligned} \sigma _{t+1}&\approx a_{0} + \sum ^{q}_{j=1} f_{j}(Z_{j,t})+\sum ^{p}_{u=1}a_{u}|Y_{t+1-u}|, \end{aligned}$$
(11)

where \(a_{0}=\alpha _{0}/(1-\beta )\), \(f_{j}(\cdot )=g_{j}(\cdot )/(1-\beta )\) \((j=1,\ldots ,q)\), and \(a_{u}=\alpha _{1}\beta ^{u-1}\) \((u=1,\ldots ,p)\). To ensure the identifiability of \(\sigma _{t+1}\), we normalize (11) so that \(a_{0}=1\).

The truncation parameter p is small relative to the sample size n, but large enough to avoid bias of the parameter estimates. Xiao and Koenker (2009) use \(p = 3n^{1/4}\) in their study of conditional quantile estimators for GARCH models, which in our case gives \(p = 12\) or 13 with sample sizes \(n = 300,\ldots , 399\). However, being conservative, we set \(p=15\) and \(q=28\) throughout the simulation study. Thus, the design matrix \({\mathbf {X}}_{t}\) for the SP approach of Sect. 2.2 consists of \(d=p+q=43\) predictors. Moreover, we consider the following set of parameter values: \(\alpha _{0} = 0.1\), \(\alpha _{1}=0.3\) and \(\beta =0.5\).

Using (11), the \(\tau \)th conditional quantile of \(Y_{t+1}\) given \({\mathbf {X}}_{t}={\mathbf {x}}_{t}\) is obtained by

$$\begin{aligned} \theta ^{\text {{(SP)}}}(\tau |{\mathbf {X}}_{t}) \approx a_{0}(\tau ) + \sum ^{q}_{j=1}f_{j,\tau }(Z_{j,t}) +\sum ^{p}_{u=1} a_{u}(\tau )|Y_{t+1-u}|, \end{aligned}$$
(12)

where \(a_{u}(\tau )=a_{u}F_{\nu }^{-1}(\tau )\) \((u=0,1,\ldots ,p)\), and \(f_{j,\tau }(Z_{j,t})=f_{j}(Z_{j,t})F_{\nu }^{-1}(\tau )\) \((j=1,\ldots ,q)\). For the additive functions, we set \(f_{j}(z_{j}) = 0.05z^{2}_{j}\) \((j=1,2)\) and \(f_{3}(z_{3})=\cdots =f_{28}(z_{28})\equiv 0\). The exogenous predictors \(Z_{j,t}\) \((j=1,2)\) are generated by the following AR(1) process

$$\begin{aligned} Z_{j,t} = 0.5+0.3Z_{j,t-1}+\varepsilon _{t}, \quad \varepsilon _{t}{\mathop {\sim }\limits ^{\hbox {{i.i.d.}}}} {\mathcal {N}}(0,1). \end{aligned}$$
(13)

The remaining 26 exogenous predictors are treated as noncontributory to the conditional quantile. These predictors are generated to have a compound symmetry covariance structure: \(Z_{u,t} = (V_{u,t} + sW_{t})/(1 + s)\) \((u=3,\ldots ,28)\), where the \(V_{u,t}\)'s and the common factor \(W_{t}\) are i.i.d. random variables from U(0, 1), and \(s\ge 0\). When \(s=0\) the predictors \(Z_{u,t}\) are independent, whereas when \(s\ne 0\) they are dependent with correlation coefficient \(r\equiv \text{ Corr }(Z_{u,t},Z_{u',t})=s^{2}/(1+s^{2})\) \((u\ne u'=3,\ldots ,28)\). We report quantile evaluation results for the cases \(r=0\) \((s=0)\) and \(r=0.8\) \((s=2)\).
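
The following R sketch generates one replication of this semiparametric design; it is a sketch of the DGP only, not of the estimation steps, and the common factor \(W_{t}\) is used so that the stated correlation \(r=s^{2}/(1+s^{2})\) obtains.

```r
## Sketch of one replication of the semiparametric DGP (9)-(10) and (13).
set.seed(123)
n <- 400; q <- 28; s <- 2                     # s = 0 gives r = 0, s = 2 gives r = 0.8
alpha0 <- 0.1; alpha1 <- 0.3; beta <- 0.5; nu <- 4

Z <- matrix(NA_real_, n, q)
for (j in 1:2) {                              # AR(1) predictors of (13)
  Z[1, j] <- 0.5 / (1 - 0.3)
  for (t in 2:n) Z[t, j] <- 0.5 + 0.3 * Z[t - 1, j] + rnorm(1)
}
W <- runif(n)                                 # common U(0,1) factor per time point
Z[, 3:q] <- (matrix(runif(n * (q - 2)), n, q - 2) + s * W) / (1 + s)

g <- function(z) 0.05 * (1 - beta) * z^2      # g_j = (1 - beta) f_j, f_j(z) = 0.05 z^2, j = 1, 2
y <- sig <- numeric(n)
sig[1] <- 1; y[1] <- sig[1] * rt(1, df = nu)  # rt(1, 4) = F_nu^{-1}(V), V ~ U(0,1)
for (t in 1:(n - 1)) {                        # avGARCH recursion (10) and returns (9)
  sig[t + 1] <- alpha0 + g(Z[t, 1]) + g(Z[t, 2]) + alpha1 * abs(y[t]) + beta * sig[t]
  y[t + 1]   <- sig[t + 1] * rt(1, df = nu)
}
```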

In each step of the recursive forecast strategy, the one-step ahead quantile forecast is given by

$$\begin{aligned} {\widehat{Q}}^{\text {{(SP)}}}_{Y^{*}_{t}}(\tau |{\mathbf {X}}_{N-p+1})={\widehat{\beta }}_{0}(\tau )+ \sum _{j \in \widehat{\widetilde{{\mathcal {M}}}}_{\tau }}{\widehat{\beta }}_{j}(\tau ){\widehat{\theta }}^{{\hbox {(SP)}}}_{j}(\tau |X_{j,N-p+1}), \end{aligned}$$
(14)

where \({\widehat{\theta }}^{{\hbox {(SP)}}}_{j}(\tau |X_{j,N-p+1})\) are estimates of the recursively obtained marginal conditional quantiles \(\theta ^{{\hbox {(SP)}}}_{j}(\tau |X_{j,N- p+1})\).

Parametric: For the parametric (P) model, we adopt the following avGARCH(1, 1) process

$$\begin{aligned} Y_{t+1}&=\sigma _{j,t+1}\varepsilon _{t+1},\quad \varepsilon _{t+1}{\mathop {\sim }\limits ^{\hbox {{i.i.d.}}}} {\mathcal {N}}(0,1), \end{aligned}$$
(15)
$$\begin{aligned} \sigma _{j,t+1}&= \xi _{0} + \phi _{j}Z_{j,t}+\xi _{1}|Y_{t}|+\xi _{2}\sigma _{j,t}, \quad (j=1,\ldots ,q), \end{aligned}$$
(16)

where \(Z_{1,t}\) and \(Z_{2,t}\) are generated as in (13), and where the remaining \(Z_{j,t}\)’s are again i.i.d. U(0, 1) random variables.

Using the apARCH model specification in the R-rugarch package, we estimate the parameters in (15)–(16) for \(j=1,\ldots ,q\). The conditional quantiles are given by

$$\begin{aligned} {\widehat{\theta }}_{j}^{\text {{(P)}}}(\tau |Z_{j,t})={\widehat{\sigma }}_{j,t+1}\Phi ^{-1}(\tau ), \end{aligned}$$
(17)

where \(\Phi ^{-1}(\tau )\) is the \(\tau \)th quantile of the CDF of \(\varepsilon _{t}\), and \({\widehat{\sigma }}_{j,t+1}\) is an estimate of \(\sigma _{j,t+1}\). These marginal parametric quantile estimates and the marginal semiparametric conditional quantiles \({\widehat{\theta }}^{\hbox {{(SP)}}}(\tau |Z_{j,t})\) serve as “pilots” of the H-aL conditional quantile estimates. Note that in the case of the hybrid forecasting method, the design matrix \({\mathbf {X}}_{t}\) consists of \(q+d=28+43=71\) potential regressors, i.e.

$$\begin{aligned} \{{\widehat{\theta }}_{1}^{\hbox {{(P)}}}(\tau |Z_{1,t}),\ldots , {\widehat{\theta }}_{q}^{\hbox {{(P)}}}(\tau |Z_{q,t}), {\widehat{\theta }}^{\hbox {{(SP)}}}_{1}(\tau |X_{1,t}), \ldots ,{\widehat{\theta }}^{\hbox {{(SP)}}}_{d}(\tau |X_{d,t}) \}. \end{aligned}$$
(18)

For instance, with \({\widehat{\theta }}^{{\hbox {(H)}}}_{u}(\tau |\cdot )\) \((u=1,\ldots ,q+d)\) denoting the uth component of the set (18), the recursively obtained one-step ahead H-aL quantile forecasts of \(Y_{301},\ldots ,Y_{400}\) follow from (8) with \(n\equiv N=300,\ldots ,399\), respectively.
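
A hedged sketch of how the parametric pilot quantiles (17) can be obtained with the rugarch package is given below, using the simulated y and Z above. The apARCH power and asymmetry parameters are fixed so that the specification reduces to the avGARCH(1, 1) in (15)–(16); the parameter names (delta, gamma1) reflect our understanding of rugarch's apARCH parameterization, and the one-step alignment of \({\widehat{\sigma }}_{j,t+1}\) with the response is omitted for brevity.

```r
## Hedged sketch of the parametric pilot quantiles (17) via rugarch.
library(rugarch)

theta_par <- sapply(1:q, function(j) {
  spec <- ugarchspec(
    variance.model = list(model = "apARCH", garchOrder = c(1, 1),
                          external.regressors = Z[, j, drop = FALSE]),
    mean.model     = list(armaOrder = c(0, 0), include.mean = FALSE),
    distribution.model = "norm",
    fixed.pars = list(delta = 1, gamma1 = 0))   # power 1, no asymmetry: avGARCH(1,1)
  fit <- ugarchfit(spec, data = y, solver = "hybrid")
  as.numeric(sigma(fit)) * qnorm(tau)           # in-sample pilots, cf. (17)
})
```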

4.2 Evaluating Predicted Quantiles

4.2.1 Expected loss

Recall that \(q_{y}=0\). Then, let \(\{e^{(\cdot )}_{t,h}=Y_{n+h}-{\widehat{Q}}^{(\cdot )}_{Y^{*}_{t}}(\tau |{\mathbf {X}}_{n+h})\}^{N^{*}}_{t=1}\) be the set of h-step ahead out-of-sample quantile prediction errors (QPEs) obtained by the recursive forecasting strategy, where the superscript \((\cdot )\) refers to one of the six methods discussed in Sect. 3, \(Y^{*}_{t}=Y_{t+h+p-1}\), and \(N^{*}=100\) for all values of h. As a measure of performance, we calculate an average of the tick loss function \(\rho _{\tau }(\cdot )\) of QPE values, i.e.

$$\begin{aligned} {\widehat{L}}_{\tau ,h}^{(\cdot )} = \frac{1}{N^{*}}\sum ^{N^{*}}_{t=1}\rho _{\tau }\big (e^{(\cdot )}_{t,h}\big ). \end{aligned}$$
(19)

This average is an estimate of the expected loss \(L^{(\cdot )}_{\tau ,h}={\mathbb {E}}[\rho _{\tau }\big (Y_{n+h}-{Q}^{(\cdot )}_{Y^{*}_{t}}(\tau |{\mathbf {X}}_{n+h})\big )|{\mathcal {F}}_{t}]\) where \({\mathcal {F}}_{t}\) is the sigma-algebra generated by the available information up to time t. For each recursively obtained quantile forecast, \({\widehat{L}}_{\tau ,h}^{(\cdot )}\) weights the difference between the observed value \(Y_{n+h}\) and the forecasted quantile \({\widehat{Q}}^{(\cdot )}_{Y^{*}_{t}}(\tau |{\mathbf {X}}_{n+h})\) by \((1-\tau )\) when \(Y_{n+h}\) is lower than the \(\tau \)th quantile, and by \(\tau \) when \(Y_{n+h}\) exceeds the quantile. In this sense, (19) is a natural way to evaluate quantile forecasts.
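
In code, (19) is a one-liner; y_out and q_fc below denote hypothetical vectors of realizations and h-step ahead quantile forecasts from one of the six methods.

```r
## The average tick loss (19).
tick_loss <- function(e, tau) (tau - (e < 0)) * e   # rho_tau(z) = {tau - I(z < 0)} z
e     <- y_out - q_fc                               # quantile prediction errors (QPEs)
L_hat <- mean(tick_loss(e, tau))                    # estimate of the expected loss
```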

4.2.2 Equal forecast accuracy

We assess the difference between two conditional quantile methods A and B via a Diebold–Mariano (DM) type test statistic. To this end, let \(e^{\text {{(A)}}}_{t,h}\) and \(e^{\text {{(B)}}}_{t,h}\) be the associated h-step ahead QPEs. For a fixed quantile level \(\tau \) and fixed forecast horizon h, the corresponding loss differential is defined as

$$\begin{aligned} D^{\text {{(A,B)}}}_{t,\tau ,h} =\rho _{\tau }(e^{\text {{(A)}}}_{t,h}) -\rho _{\tau }(e^{\text {{(B)}}}_{t,h}). \end{aligned}$$
(20)

The null hypothesis that method A produces conditional quantile forecasts that are as accurate as those of method B can be tested using the test statistic \(\text{ DM}^{\text {{(A,B)}}}_{\tau ,h}\equiv {\overline{D}}^{\text {{(A,B)}}}_{\tau ,h}/\{\text{ Var }({\overline{D}}^{\text {{(A,B)}}}_{\tau ,h})\}^{1/2}\), which is asymptotically \({\mathcal {N}}(0,1)\) distributed under the null, where \({\overline{D}}^{\text {{(A,B)}}}_{\tau ,h}\) is the average over t of \(D^{\text {{(A,B)}}}_{t,\tau ,h}\) at forecast horizon h. We specify a one-sided alternative hypothesis, so that rejection of the null indicates that method B is more accurate than method A. For \(h=1\), a consistent estimator of \(\text{ Var }({\overline{D}}^{\text {{(A,B)}}}_{\tau ,h })\) is given by the sample variance of \(D^{\text {{(A,B)}}}_{t,\tau ,h }\) divided by \(N^{*}\). For \(h>1\), we use the modified DM test proposed by Harvey et al. (1989).Footnote 2
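
A sketch of the DM-type test, including a Harvey–Leybourne–Newbold style small-sample correction for \(h>1\) as we understand it, is given below (tick_loss as in the previous sketch); the exact variance estimator used in the paper may differ.

```r
## Hedged sketch of the DM-type test of equal quantile forecast accuracy;
## e_A and e_B are the QPE series of methods A and B.
dm_quantile <- function(e_A, e_B, tau, h) {
  d    <- tick_loss(e_A, tau) - tick_loss(e_B, tau)   # loss differential (20)
  n    <- length(d); dbar <- mean(d); dc <- d - dbar
  gam  <- sapply(0:(h - 1), function(k)               # autocovariances up to lag h - 1
    sum(dc[(k + 1):n] * dc[1:(n - k)]) / n)
  v    <- (gam[1] + 2 * sum(gam[-1])) / n             # long-run variance of dbar
  stat <- sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n) * dbar / sqrt(v)
  1 - pt(stat, df = n - 1)                            # one-sided p-value: small values favour B
}
```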

4.2.3 Encompassing

The topic of combining conditional quantile forecasts can also be investigated via the principle of “encompassing” (ENC). To this end, consider the set of h-step ahead quantile forecasts \(\{\widehat{{\varvec{Q}}}_{t,h}(\tau )\equiv (1,{\widehat{Q}}^{{\hbox {(A)}}}_{t,h}(\tau ),{\widehat{Q}}^{{\hbox {(B)}}}_{t,h}(\tau ))^{\text {{T}}}\}^{n-h}_{t=1}\), a \((3\times 1)\) vector, where \({\widehat{Q}}^{{\hbox {(A)}}}_{t,h}(\tau )\) and \({\widehat{Q}}^{{\hbox {(B)}}}_{t,h}(\tau )\) are two alternative conditional h-step ahead quantile forecasts, such that \(\text{ Corr }\big ({\widehat{Q}}^{{\hbox {(A)}}}_{t,h}(\tau ),{\widehat{Q}}^{{\hbox {(B)}}}_{t,h}(\tau )\big )\ne \pm 1\). Suppose there exists a compact class of weights \(\Omega \subseteq {\mathbb {R}}^{3}\). Let \({\widehat{Q}}^{(c)}_{t,h}(\tau )\!=\! \varvec{\omega }^{\text {{T}}}_{h}\widehat{{\varvec{Q}}}_{t,h}(\tau )\) with \(\varvec{\omega }_{h}\!=\!(\omega ^{(0)}_{h},\omega ^{(1)}_{h}, \omega ^{(2)}_{h})^{\text {{T}}}\in \Omega \) denote a linear quantile forecast combination (c). Then, for fixed values of \(\tau \) and h, forecast \({\widehat{Q}}^{{\hbox {(A)}}}_{t,h}(\tau )\) is said to encompass forecast \({\widehat{Q}}^{{\hbox {(B)}}}_{t,h}(\tau )\) conditionally at time t with respect to the quantile tick loss function \(\rho _{\tau }(\cdot )\) if and only if \({\mathbb {E}}[\rho _{\tau }(Y_{t+h}-{\widehat{Q}}^{{\hbox {(A)}}}_{t,h}(\tau ))|{\mathcal {F}}_{t}]\le {\mathbb {E}}[\rho _{\tau }(Y_{t+h}- {\widehat{Q}}^{(c)}_{t,h}(\tau ))|{\mathcal {F}}_{t}]\) for all \(\varvec{\omega }_{h} \in \Omega \). Testing this inequality for all \(\varvec{\omega }_{h}\in \Omega \) is, however, infeasible. Instead, let \(\varvec{\widetilde{\omega }}=(\widetilde{\omega }^{\tiny {(0)}},\widetilde{\omega }^{\tiny {(1)}},\widetilde{\omega }^{\tiny {(2)}})^{\text {{T}}}\in \Omega \) denote the h-step ahead optimal forecast combination weights which, conditionally at time t, minimize the expected loss

$$\begin{aligned} \varvec{{\widetilde{\omega }}}_{h}(\tau ) = \arg \min _{\varvec{\omega }_{h}\in \Omega }{\mathbb {E}}[\rho _{\tau }\big (Y_{t+h}-{\widehat{Q}}^{(c)}_{t,h}(\tau )\big )|{\mathcal {F}}_{t}]. \end{aligned}$$
(21)

We consider conducting two separate tests: \({\mathbb {H}}_{1,0}:{\varvec{\widetilde{\omega }}}= (0,1,0)^{\text {{T}}}\) versus \({\mathbb {H}}_{1,a}:{\varvec{\widetilde{\omega }}} \ne (0,1,0)^{\text {{T}}}\), and \( {\mathbb {H}}_{2,0}:{\varvec{\widetilde{\omega }}} = (0,0,1)^{\text {{T}}}\) versus \({\mathbb {H}}_{2,a}:{\varvec{\widetilde{\omega }}} \ne (0,0,1)^{\text {{T}}}\), where for ease of notation we dropped the dependence on h. This testing framework corresponds, respectively, to testing whether \({\widehat{Q}}^{{\hbox {(A)}}}_{t,h}(\tau )\) encompasses \({\widehat{Q}}^{{\hbox {(B)}}}_{t,h}(\tau )\), and whether \({\widehat{Q}}^{{\hbox {(B)}}}_{t,h}(\tau )\) encompasses \({\widehat{Q}}^{{\hbox {(A)}}}_{t,h}(\tau )\), conditionally.

A root-n-consistent estimator of (21) is given by

$$\begin{aligned} \widetilde{\varvec{\omega }}^{\text {{T}}}_{n,h}(\tau )&\equiv \big ({\widetilde{\omega }}^{\tiny {(0)}}_{n,h}(\tau ),{\widetilde{\omega }}^{\tiny {(1)}}_{n,h}(\tau ),{\widetilde{\omega }}^{\tiny {(2)}}_{n,h}(\tau )\big ) \nonumber \\&= \arg \min _{\varvec{\omega }\in \Omega }\frac{1}{n-h}\sum ^{n-h}_{t=1} \rho _{\tau }\big (Y_{t+h}-(\omega ^{(0)}+\omega ^{(1)}{\widehat{Q}}^{{\hbox {(A)}}}_{t,h}(\tau )+\omega ^{(2)}{\widehat{Q}}^{{\hbox {(B)}}}_{t,h}(\tau ))\big ). \end{aligned}$$
(22)

Under a suitable weak dependence condition (and additional regularity conditions), it can be shown (Koenker, 2005) that within the context of conditional quantile autoregression and for fixed values of \(\tau \) and h, \(\sqrt{n}\big (\widetilde{\varvec{\omega }}_{n,h}(\tau )-\varvec{{\widetilde{\omega }}}_{h}(\tau )\big ) \overset{d}{\longrightarrow }{\mathcal {N}}({\mathbf {0}},\varvec{\Sigma }_{h})\), with \(\varvec{\Sigma }_{h}=\tau (1-\tau )\varvec{\Sigma }^{-1}_{1,h}\varvec{\Sigma }_{0,h}\varvec{\Sigma }_{1,h}^{-1}\) where \(\varvec{\Sigma }_{0,h}={\mathbb {E}}[\widehat{{\varvec{Q}}}_{t,h}(\tau )\widehat{{\varvec{Q}}}^{\text {{T}}}_{t,h}(\tau )|{\mathcal {F}}_{t}]\), \(\varvec{\Sigma }_{1,h}={\mathbb {E}}[f_{t,h}(\varvec{\widetilde{\omega }}^{\text {{T}}}\widehat{{\varvec{Q}}}_{t,h}(\tau )|{\mathcal {F}}_{t}) \widehat{{\varvec{Q}}}_{t,h}(\tau )\widehat{{\varvec{Q}}}^{\text {{T}}}_{t,h}(\tau )]\), and \(f_{t,h}(\cdot |{\mathcal {F}}_{t})\) is the conditional density of \(Y_{t+h}\), here evaluated at \(\varvec{\widetilde{\omega }}^{\text {{T}}}\widehat{{\varvec{Q}}}_{t,h}(\tau )\).

Following Fuertes and Olmo (2013), we propose the following two Wald-type test statistics for testing \({\mathbb {H}}_{1,0}\) and \({\mathbb {H}}_{2,0}\):

$$\begin{aligned} \hbox {ENC}^{(1)}_{\tau ,h} =&n\big (\widetilde{\varvec{\omega }}_{n}^{\text {{T}}}\!-\!(0,1,0)\big ) \widehat{\varvec{\Sigma }}^{-1}_{h}\big (\widetilde{\varvec{\omega }}_{n}\!-\!(0,1,0)^{\text {{T}}}\big ), \end{aligned}$$
(23)
$$\begin{aligned} \text {and}\quad \hbox {ENC}^{(2)}_{\tau ,h} =&n\big (\widetilde{\varvec{\omega }}_{n}^{\text {{T}}}\!-\!(0,0,1)\big ) \widehat{\varvec{\Sigma }}^{-1}_{h}\big (\widetilde{\varvec{\omega }}_{n}\!-\!(0,0,1)^{\text {{T}}}\big ), \end{aligned}$$
(24)

where \(\widehat{\varvec{\Sigma }}_{h}\equiv \tau (1-\tau )\widehat{\varvec{\Sigma }}^{-1}_{1,h}\widehat{\varvec{\Sigma }}_{0,h} \widehat{\varvec{\Sigma }}^{-1}_{1,h}\) is a consistent estimator of the \(3\times 3\) covariance matrix \(\varvec{\Sigma }_{h}\) with

$$\begin{aligned} \widehat{\varvec{\Sigma }}_{0,h}\!=\!\frac{1}{n-h}\sum ^{n-h}_{t=1}\widehat{{\varvec{Q}}}_{t,h}(\tau )\widehat{{\varvec{Q}}}^{\text {{T}}}_{t,h}(\tau ),\, \widehat{\varvec{\Sigma }}_{1,h}\!=\!\frac{1}{2(n-h)h_{n}}\sum ^{n-h}_{t=1} {\mathbb {I}}(|Y_{t+h}- {\widehat{Q}}^{(c)}_{t,h}(\tau )|\le h_{n})\widehat{{\varvec{Q}}}_{t,h}(\tau )\widehat{{\varvec{Q}}}^{\text {{T}}}_{t,h}(\tau ). \end{aligned}$$

The kernel-type matrix \(\widehat{\varvec{\Sigma }}_{1,h}\) builds upon a method proposed by Powell (1991), with \(h_{n} = \nu n^{-1/3}\) \((\nu >0)\) a bandwidth parameter satisfying \(h_{n}\rightarrow 0\) and \(nh^{2}_{n}\rightarrow \infty \) as \(n\rightarrow \infty \).Footnote 3 For a fixed value of the forecast horizon h, it follows from (White, 2001, Thm. 4.3) that under \({\mathbb {H}}_{i,0}\), and as \(n\rightarrow \infty \), ENC\(^{(i)}_{\tau ,h}\overset{d}{\longrightarrow }\chi ^{2}_{3}\) \((i=1,2)\). Under \({\mathbb {H}}_{i,a}\), and as \(n\rightarrow \infty \), ENC\(^{(i)}_{\tau ,h}\rightarrow +\infty \) \((i=1,2)\).
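
The two ENC tests can be sketched as follows: the combination weights are estimated by a quantile regression of the realizations on the two competing forecast series (hypothetical vectors qA and qB), and the covariance matrix is built from the estimators displayed above; the bandwidth constant (\(\nu =1\)) and numerical details are illustrative.

```r
## Hedged sketch of the ENC Wald tests (23)-(24).
library(quantreg)

enc_test <- function(y_out, qA, qB, tau, null = c(0, 1, 0)) {
  n    <- length(y_out)                                 # number of forecast pairs (n - h)
  w    <- coef(rq(y_out ~ qA + qB, tau = tau))          # estimated combination weights
  Qmat <- cbind(1, qA, qB)
  S0   <- crossprod(Qmat) / n                           # Sigma_hat_{0,h}
  hn   <- n^(-1/3)                                      # Powell-type bandwidth, nu = 1
  u    <- as.numeric(y_out - Qmat %*% w)
  S1   <- crossprod(Qmat * (abs(u) <= hn)) / (2 * n * hn)   # Sigma_hat_{1,h}
  Sig  <- tau * (1 - tau) * solve(S1) %*% S0 %*% solve(S1)
  stat <- drop(n * t(w - null) %*% solve(Sig) %*% (w - null))
  c(stat = stat, p.value = 1 - pchisq(stat, df = 3))
}
## enc_test(y_out, qA, qB, tau, null = c(0, 1, 0))   # does forecast A encompass B?
## enc_test(y_out, qA, qB, tau, null = c(0, 0, 1))   # does forecast B encompass A?
```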

Table 1 Sample mean and sample median (in parentheses) of the ratios of tick loss functions \({\widehat{L}}_{\tau ,h}^{\text {{(Method)}}}/{\widehat{L}}_{\tau ,h}^{\text {{(EW)}}}\) for forecast horizons \(h=1\) and \(h=6\), and averaged over 50 predictions

5 Monte Carlo Results

QPE Loss Values

Table 1 presents the sample mean and the sample median values of the ratio \({\widehat{L}}_{\tau ,h}^{\text {{(Method)}}}/{\widehat{L}}_{\tau ,h}^{\text {{(EW)}}}\) in the left tail \((\tau =0.05\) and \(\tau =0.1\)) and the right tail \((\tau =0.9\) and \(\tau =0.95\)) of the h-step ahead distribution of the QPE values (\(h=1\) and \(h=6\)) for independent (\(r=0\)) and dependent (\(r=0.8\)) exogenous predictors. The main findings are summarized as follows.

  1.

    For both \(r=0\) and \(r=0.8\), the estimated QPE loss ratios are virtually the same for \(h=1\) and \(h=6\).

  2.

    Relative to the EW method, the H-aL method beats all other methods, achieving the lowest QPE loss ratios irrespective of the level of dependence r between the predictors. Compared with the other methods, the reduction in loss is least pronounced for the Ad-QR method.

  3.

    As revealed by the loss ratios, for \(h=1\) the forecasting performance of all methods differs between the right tail and the left tail of the distribution of the QPE values. These differences are no longer present for \(h=6\).

  4.

    For all methods the QPE loss ratios are much smaller at the boundaries \(\tau =0.05\) and \(\tau =0.95\) than at, respectively, \(\tau =0.1\) and \(\tau =0.9\). With the exception of H-aL, the differences in QPE loss ratios are small for all other quantile forecasting methods at \(\tau =0.5\) (not shown here). In this case the set of exogenous predictors \(\{Z_{j,t}\}\) consists of 28 i.i.d. U(0,1) random variables.

Table 2 Averages of computed p-values of the DM\(^{\text {{(EW, Method)}}}_{\tau ,h}\) test statistic; \(r=0\)

Equal Forecast Accuracy

Table 2 shows averages (based on 50 replications) of computed p-values of the DM\(^{\text {{(EW, Method)}}}_{\tau ,h}\) test statistic for \(r=0\), with \(h=1\) and \(h=6\). Except for the PA-L method at \(\tau =0.9\) and \(h=6\), all null hypotheses are rejected at a 5% nominal significance level, implying that each of these conditional quantile methods yields statistically more accurate quantile forecasts than the EW method. Overall, the PA-aL method has the lowest average p-value, but the differences with the P-Lin-QR, Ad-QR, and H-aL methods are minimal. The DM\(^{\text {{(EW, Method)}}}_{\tau ,h}\) test results are generally the same for \(r=0.8\) (not shown here); hence the dependence between the exogenous predictors has hardly any impact on the reported p-values.

Encompassing

Table 3 shows the average p-values of the ENC\(^{(i)}_{\tau ,h}\) \((i=1,2)\) test statistics for \(h=1\) and \(h=6\). Here we make pairwise comparisons between quantile forecasts obtained by methods (1)–(5) relative to quantile forecasts obtained by the H-aL method. Two general results are evident from the table.

  1.

    In the left tail of the quantile forecast distribution neither \({\mathbb {H}}_{1,0}\) nor \({\mathbb {H}}_{2,0}\) is rejected at a 1% nominal significance level for \(h=1\) and \(h=6\). This indicates that quantile forecasts of the H-aL method encompass those of methods (1)–(5). Overall, the strongest evidence for not rejecting \({\mathbb {H}}_{i,0}\) \((i=1,2)\) seems to occur for quantile forecasts obtained by the PA-L method.

  2.

    Very different conclusions emerge for the right tail of the quantile forecast distribution. In particular, at a 5% nominal significance level both ENC\(^{(i)}_{\tau ,h}\) tests plainly reject \({\mathbb {H}}_{i,0}\) \((i=1,2)\) for \(h=1\) and \(h=6\), indicating that quantile forecasts obtained by the H-aL method do not encompass quantile forecasts obtained from methods (1)–(5).

6 Empirical Application

6.1 Data and Forecast Procedure

In this section we make pairwise comparisons between quantile forecasts obtained by methods (2)–(6) relative to quantile forecasts obtained by the EW method. The time series of interest is the risk premium of the monthly S&P 500 index, denoted by \(R_{t}\). The set of potential exogenous predictors consists of \(q_{z}=14\) macroeconomic variables covering the time period 1951:01–2016:12 (792 observations).Footnote 4 Specifically, we consider the following predictors: dividend-price ratio (DP); dividend yield (DY); earnings-price ratio (EP); dividend-payout ratio (DE); equity risk premium volatility (RVOL); book-to-market ratio (BM); net equity expansion (NTIS); treasury bill rate (TBL); long-term yield (LTY); long-term return (LTR); term spread (TMS); default yield spread (DFY); default return spread (DFR); and inflation (INFL). These series have been the subject of conditional quantile prediction studies by Lima and Meng (2017), Meligkotsidou et al. (2014, 2019), and De Gooijer and Zerom (2020), among others.

Table 3 Averages of computed p-values of the ENC\(^{(i)}_{\tau ,h}\) test statistics \((i=1,2)\); \(r=0\)

To obtain estimates of the marginal parametric conditional quantiles of the \(R_{t}\) series, we adopt a so-called time-varying mean model with exponential GARCH-Z volatility, i.e.,

$$ {\left\{ \begin{array}{ll} \begin{aligned}&R_{t+1} = \beta _{0,j}+\beta _{1,j}Z_{j,t}+\varepsilon _{j,t+1}, \quad \varepsilon _{j,t+1}{\mathop {\sim }\limits ^{\hbox {{i.i.d.}}}} {\mathcal {N}}(0,\sigma ^{2}_{j,t+1}), \\ &\log (\sigma ^{2}_{j,t+1}) = \delta _{0,j}+\delta _{1,j}Z_{j,t} +\delta _{2,j}\log (\sigma ^{2}_{j,t})+ \delta _{3,j}\Big |\frac{\varepsilon _{j,t}}{\sigma _{j,t}}\Big |+\delta _{4,j}\frac{\varepsilon _{j,t}}{\sigma _{j,t}},\quad (j=1,\ldots ,q_{z}).\end{aligned} \end{array}\right. } $$
(25)
Table 4 Pairwise comparison of quantile forecasting methods for the risk premium of the monthly S&P 500 index

This model specification is widely used in the literature on the predictability of the S&P 500 risk premium; see, e.g., Cenesizoglu and Timmermann (2012). Thus, with the 14 semiparametric marginal quantiles, the complete portfolio of predictors for the hybrid quantile averaging method consists of 28 potential variables.
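
A hedged sketch of how the marginal parametric quantiles implied by (25) can be computed with rugarch is given below; R denotes the risk-premium series, Zmacro a hypothetical \(n\times 14\) matrix holding the predictors, and the one-period lag alignment between \(Z_{j,t}\) and \(R_{t+1}\) is omitted for brevity.

```r
## Hedged sketch of the marginal parametric quantiles implied by (25) via rugarch.
library(rugarch)

egarch_pilot <- function(R, z, tau) {
  spec <- ugarchspec(
    mean.model     = list(armaOrder = c(0, 0), include.mean = TRUE,
                          external.regressors = matrix(z, ncol = 1)),
    variance.model = list(model = "eGARCH", garchOrder = c(1, 1),
                          external.regressors = matrix(z, ncol = 1)),
    distribution.model = "norm")
  fit <- ugarchfit(spec, data = R)
  as.numeric(fitted(fit)) + as.numeric(sigma(fit)) * qnorm(tau)   # mu_t + sigma_t * z_tau
}
theta_P <- sapply(1:ncol(Zmacro), function(j) egarch_pilot(R, Zmacro[, j], tau))
```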

We use a recursive forecasting strategy similar to the one discussed in Sect. 4. For \(h=1\), the first forecast origin is 1964:12 and hence the first in-sample (estimation) period runs from 1951:01 to 1964:12 \((N=168)\). The second in-sample period covers 1951:01–1965:01 \((N=169)\), and the last in-sample period covers 1951:01–2016:11 \((N=791)\). Thus, in total there are \(N^{*}=624\) one-step ahead out-of-sample conditional quantile forecasts. For \(h=6\), the first forecast origin is 1964:07, and the last forecast origin is 2016:06.

6.2 Empirical Results

Table 4, top panel, presents sample means (taken over \(N^{*}=624\) values) of the ratios of the tick loss functions \({\widehat{L}}_{\tau ,h}^{\text {{(Method)}}}/{\widehat{L}}_{\tau ,h}^{\text {{(EW)}}}\) for \(\tau =0.05,\, 0.1,\, 0.9\), and 0.95, and \(h=1\) and \(h=6\). With the exception of Ad-QR, the values of the estimated QPE loss ratios are all below one, irrespective of the forecast horizon h. This indicates that there are improvements in conditional quantile forecasts over out-of-sample quantile forecasts obtained by the EW method. Overall, the best forecast performance seems to occur for P-Lin-QR and H-aL. Interestingly, for \(\tau =0.05\) and \(\tau =0.95\) the P-Lin-QR method has slightly lower QPE loss ratios than the H-aL method, irrespective of the value of h. In this respect, recall that the P-Lin-QR method is best suited when the conditional quantile is a linear function of the predictors. So the choice of \(\tau \) can affect the ranking of P-Lin-QR and H-aL.

The bottom panel of Table 4 shows p-values of the DM\(^{\text {{(EW, Method)}}}_{\tau ,h}\) test statistic. Note that the null hypothesis of equal forecast accuracy is not rejected for PA-aL and Ad-QR for all values of \(\tau \) and h. We also see that the p-values of DM\(^{\text {{(EW, PA-L)}}}_{\tau ,h}\) for \(h=1,\, 6\), and \(\tau =0.9\) are close to one, indicating that there is no significant difference between the forecast performance of EW and PA-L. Overall, the top and bottom panels confirm the good performance of the H-aL method as opposed to the EW method of combining quantile forecasts.

Table 5 shows that the gain in forecasting performance of the H-aL method mainly comes from including the predictor variables DFR and RVOL in the parametric component. By contrast, the semiparametric component includes a variety of predictor variables that receive sizable weights. Finally, as revealed by selection frequencies below 5%, there is evidence that the variables DE, BM, and LTY hardly contribute to the predictability of \(R_{t}\).

Table 5 Frequency of predictors which are selected in at least 5% of the times by each component (Comp) of the H-aL method for forecast horizon \(h=1\)

7 Conclusion

In the first half of the paper, we have explored the multi-step ahead quantile forecasting performance of a hybrid method versus quantile forecast results obtained by five alternative, non-hybrid methods, via a Monte Carlo study. The study yields two main findings. First, relative to quantile forecasts obtained from simple averaging of quantile forecasts (i.e., the EW method), the hybrid H-aL method is more accurate, achieving the lowest QPE loss ratios irrespective of the level of dependence between the predictors. This evidence is strongest in the tails of the forecast distribution of the response variable, and most likely due to the benefits of including information from multiple predictor variables. Second, quantile forecasts obtained by the H-aL method do not encompass quantile forecasts from the five competitors in the right tail of the quantile forecast distribution.

The second half of the paper provided an empirical application to forecasting the risk premium of the monthly S&P 500 index. The results are very similar to those of the Monte Carlo simulation study. In particular, our findings strongly suggest that it is possible to use pre-selected macroeconomic predictors to produce better out-of-sample quantile forecasts than the EW combination approach, which uses fixed, time-invariant weights for the conditional quantile forecasts; the H-aL method, on the other hand, uses data-driven rather than fixed weights. As such, we showed that variable selection and data-driven weights can jointly achieve forecast efficiency gains in the tails of the return distribution. We also provided some insight as to where the gains of the hybrid method come from.