1 Introduction

The estimation and forecasting of covariance matrices are common problems in many fields, such as finance, biostatistics, signal processing and geostatistics. This is a difficult task because covariance matrices lie on a curved manifold (Marron and Dryden 2021; Han and Park 2022). Any estimation setting for covariance matrices must therefore account for the fact that the resulting matrices have to be at least positive semidefinite (PSD).

In finance, estimating and predicting covariance matrices are of major interest because of the crucial role of asset return covariance matrices in portfolio optimization. Several methods aim at predicting the conditional volatility of returns in the multivariate framework while guaranteeing positive semidefiniteness of the covariance matrix. For example, Bollerslev (1990) and Engle (2002) provide multivariate versions of autoregressive conditional heteroskedasticity models by imposing separate processes for the variances and the correlations in the Constant and Dynamic Conditional Correlation (CCC and DCC) models. Subsequently, a non-parametric measure of volatility, proposed by Andersen et al. (2001, 2003) and Barndorff-Nielsen and Shephard (2002a, 2002b), approximates the volatility of stock returns through high-frequency data and, in the multivariate framework, guarantees a positive definite and symmetric estimate of the true covariance matrix. The most attractive feature of realized volatility is that any time series model can be applied to estimate its relationship with explanatory variables and make predictions. Nevertheless, the predicted matrices must still be positive semidefinite.

Generally, two approaches can be adopted to ensure that predicted realized covariance matrices are symmetric and at least positive semidefinite: a constrained optimization in the predictive model or a matrix parametrization. As pointed out in Pinheiro and Bates (1996), the former usually leads to sub-optimal solutions and makes the statistical properties of the constrained predictions difficult to characterize. Therefore, most of the literature on the prediction of realized covariances has focused on matrix parametrizations and, consequently, on modelling the parameters obtained by these transformations. However, most of these studies apply a single method without discussing possible differences in predictive accuracy due to the choice of the parametrization. This paper aims to fill this gap in the literature by providing a critical analysis of the unconstrained methods in a predictive setting, both through simulation and through an empirical application, highlighting the strengths and weaknesses of the parametrizations.

There are several ways to parametrize the covariance matrix, each of which has distinctive properties. For example, Halbleib-Chiriac and Voev (2011) use the Cholesky decomposition to guarantee that the matrices predicted through the VARFIMA model are positive definite. The same happens in Christiansen et al. (2012), Čech and Baruník (2017) and in the application of neural networks to realized covariances in Bucci (2020). Conversely, Bauer and Vorkink (2011) and Asai et al. (2022) propose transforming the realized covariances through the matrix logarithm and applying the matrix exponential function to guarantee positive definite predictions. As shown in Heiden (2015), both methods present some pitfalls that should be accounted for in the predictive setting. For instance, in the Cholesky decomposition, the ordering of the elements in the original matrix yields different decompositions. Moreover, both approaches lack interpretability of the parameters estimated in the predictive model and may produce biased forecasts due to the nonlinear transformation of the covariance matrix.

For these reasons, the forecasting accuracy of a model for predicting realized covariances strongly depends on the parametrization of the realized covariance matrix. Since the predictive model is usually applied to the parameters of the covariance matrix transformation, any pitfall in these approaches may lead to biased forecasts and consequently poor predictive accuracy. The goal of this paper is then to investigate whether there are differences in the predictive accuracy of the covariance matrix by comparing four different parametrizations. Specifically, we parametrize the realized covariance matrices, model the related parameters and apply the inverse transform to obtain positive semidefinite matrices (most of the methods compared here also guarantee positive definiteness). As suggested by Pinheiro and Bates (1996), we apply the Cholesky decomposition and the matrix logarithm transformation to the covariance matrix. Both approaches can be easily implemented and guarantee symmetric positive definite predicted matrices. We also implement a positive semidefinite matrix approximation like the one introduced in Higham (1988), whose application to realized covariance matrices has been proposed in Barndorff-Nielsen and Shephard (2004) and implemented in Fan et al. (2012). This allows computing the symmetric positive semidefinite matrix that is nearest, in the Frobenius norm, to the estimated/predicted covariance matrix. The main advantage of this approach is that the time series model can be applied directly to the realized covariances, since their predictions are a posteriori approximated by a positive semidefinite matrix. Thus, the interpretability of coefficients in time series models remains untouched, as does the unbiasedness of the predictions whenever the approximation is not needed. Finally, we use the parametrization recently suggested by Archakov and Hansen (2021). This approach is based on the matrix logarithm transformation of the correlation matrix, offering a new tool for the regularization of large covariance matrices by imposing a structure on the off-diagonal elements. To the best of our knowledge, this is the first study that compares parametrizations in terms of the accuracy of the predicted covariance matrices. Moreover, the parametrization proposed by Archakov and Hansen (2021) is applied here for the first time to predict realized covariances of asset returns.

To understand possible differences in the predictions obtained from these parametrizations, we analyse the forecasting accuracy using both simulated and real data. Following Andersen et al. (2003) and Halbleib-Chiriac and Voev (2011), all the predictions are produced from a rather generic model. In particular, we fit a Vector Autoregressive (VAR) model either to the (unique) parameters of the parametrizations or, as in the case of the positive semidefinite matrix approximant, directly to the realized covariances. The accuracy of the predictions is then evaluated by applying robust multivariate loss functions (Pástor and Veronesi 2012) in a Model Confidence Set procedure (see Hansen et al. 2011). In this context, we also propose the use of the Procrustes covariance distance which, as discussed in Marron and Dryden (2021), is less prone to swelling than the Frobenius distance and can be applied to non-full-rank covariance matrices such as realized covariance matrices in high-dimensional allocation problems (Reiss and Winkelmann 2021).

The remainder of the paper is organized as follows. In Sect. 2, we introduce the realized covariance matrix and discuss the different parametrizations used to obtain positive semidefinite predictions. The predictive ability of the parametrizations is evaluated through a simulation study in Sect. 3, where the data are simulated both from a stochastic volatility model and from a DCC model. An application to real data is then presented in Sect. 4, while Sect. 5 concludes the paper with a discussion of further developments.

2 Realized covariance matrix parametrization

The covariance matrix of asset returns plays a key role in several financial applications, such as portfolio optimization, risk management and option pricing. Throughout the last three decades, several approaches have been proposed to estimate this quantity (Engle and Kroner 1995; Engle 2002). Many of them (Andersen et al. 2003; Barndorff-Nielsen and Shephard 2004) are built upon a non-parametric measure of the latent volatility, i.e. the realized covariance, which can be computed as

$$\begin{aligned} RC_{t} = \sum _{\tau =1}^{N_t}r_{\tau }r_{\tau }' \end{aligned}$$
(1)

where \(r_{\tau }\) is the n-dimensional vector of intra-day returns sampled at high frequency, \(\tau = 1,\ldots , N_t\), with \(N_t\) the number of high-frequency observations in the t-th reference period.
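As an illustration of Eq. (1), the following minimal Python sketch computes a realized covariance matrix from a block of intra-day returns; the array shapes and variable names are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def realized_covariance(intraday_returns: np.ndarray) -> np.ndarray:
    """intraday_returns: (N_t, n) array of n-dimensional intra-day return vectors.
    Returns the n x n matrix RC_t = sum_tau r_tau r_tau' of Eq. (1)."""
    return intraday_returns.T @ intraday_returns

# Example: 13 thirty-minute returns for n = 3 assets in one reference period
rng = np.random.default_rng(0)
r = rng.normal(scale=0.001, size=(13, 3))
RC_t = realized_covariance(r)
print(np.all(np.linalg.eigvalsh(RC_t) >= 0))   # PSD by construction
```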

From the theoretical point of view, it is known that \(RC_{t}\) is an unconditional non-parametric estimator of the (ex-post) quadratic covariation matrix on a reference period t. Nonetheless, \(RC_{t}\) measures are also useful in gauging the conditional covariance matrix. In fact, as shown in Barndorff-Nielsen and Shephard (2002a) and Andersen et al. (2001), using high-frequency data, one can also obtain an estimate of the matrix of quadratic covariation that differs from the true conditional covariance matrix by a zero-mean error.

Since \(RC_{t}\) changes over time, describing and predicting its temporal dynamics is fundamental to financial decision-making. Nevertheless, the predicted covariance matrices should be guaranteed to be at least positive semidefinite, which still makes the predictive setting difficult. To overcome this problem, several parametrizations of the covariance matrices have been proposed (see Halbleib-Chiriac and Voev 2011; Bauer and Vorkink 2011).

In order to fully understand the methods here implemented, we need to specify some useful notation. The operator \(\text {vech} \left( \cdot \right) \) denotes the vectorization operator of the \({\tilde{n}} = n (n+1)/2\) lower elements of an \(n \times n\) symmetric matrix. As in Archakov and Hansen (2021), we use the operator \(\text {vecl}\left( \cdot \right) \) to denote the vectorization operator of the lower off-diagonal elements of the argument. The operator \(\text {diag}\left( \cdot \right) \) works in a twofold way. On the one hand, when applied to an n-dimensional vector it denotes an \(n \times n\) diagonal matrix whose diagonal entries are the elements of the vector. On the other hand, when applied to an \(n \times n\) square matrix, it extracts the diagonal elements of it and stacks them in an \(n \times 1\) vector. The operator \(\text {expm}(\cdot )\) returns the matrix exponential function of a symmetric matrix, as further described in Sect. 2.2, while \(\text {logm}(\cdot )\) gives the matrix logarithm transformation function of a square and symmetric matrix.
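For concreteness, the operators introduced above can be realized with standard linear-algebra routines; the sketch below, relying on NumPy and SciPy, is one possible implementation and the function names simply mirror the notation.

```python
import numpy as np
from scipy.linalg import expm, logm   # matrix exponential and matrix logarithm

def vech(S):
    """Stack the n(n+1)/2 lower-triangular elements (diagonal included) of a symmetric matrix."""
    i, j = np.tril_indices(S.shape[0])
    return S[i, j]

def vecl(S):
    """Stack the n(n-1)/2 strictly lower off-diagonal elements of a square matrix."""
    i, j = np.tril_indices(S.shape[0], k=-1)
    return S[i, j]

v = np.array([1.0, 2.0, 3.0])
D = np.diag(v)                        # diag(vector): builds a diagonal matrix
d = np.diag(D)                        # diag(matrix): extracts the diagonal as a vector
A = np.array([[1.0, 0.2], [0.2, 0.5]])
print(np.allclose(logm(expm(A)), A))  # expm and logm are mutual inverses on symmetric matrices
```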

Since the \(n \times n\) covariance matrix, \(RC_t\), is symmetric, only \({\tilde{n}}\) parameters are necessary for its representation. By denoting with \(\psi _t\) the set of parameters at time t that identifies \(RC_t\), a general parametrization of \(RC_t\) can be written as

$$\begin{aligned} RC_t = L_t(\psi _t)'L_t(\psi _t) \end{aligned}$$
(2)

where \(L_t(\psi _t)\) is an \(n \times n\) full rank matrix derived from the \({\tilde{n}}\)-dimensional vector of unconstrained parameters \(\psi _t\). The choice of \(L_t(\psi _t)\) in Eq. (2) leads to different parametrizations of \(RC_t\). In general, the parametrization should guarantee that:

  1. any nonsingular covariance matrix, \(\Sigma _t\), maps to a unique vector \(\psi _t = v(\Sigma _t) \in {\mathbb {R}}^{{\tilde{n}}}\);

  2. any vector, \(\psi _t \in {\mathbb {R}}^{{\tilde{n}}}\), maps to a unique covariance matrix \(\Sigma _t = v^{-1}(\psi _t)\); therefore, the parametrization can be inverted and this inversion is unique;

  3. the parametrization, \(\psi _t = v(\Sigma _t)\), is ‘invariant’ to the ordering of the columns that define \(\Sigma _t\);

  4. the elements of the parameters \(\psi _t\) can be easily interpreted.

Here, we consider three classes of unconstrained parametrizations that respect most of the above properties: one relying on the Cholesky decomposition of \(RC_t\), another on its spectral decomposition, and a third on the matrix logarithm transformation of the correlation matrix.

2.1 Cholesky decomposition

A computationally simple way to parametrize \(RC_t\) is to assume that the parameters in \(\psi _t\) in Eq. (2) are the non-zero elements of an upper triangular matrix \(L_t\), such that

$$\begin{aligned} RC_t = L_t'L_t \end{aligned}$$
(3)

is the Cholesky decomposition of \(RC_t\). The nonzero elements \(l_{ij,t}\) of the lower triangular factor \(L_t'\) can be calculated recursively by

$$\begin{aligned} l_{ij,t} = {\left\{ \begin{array}{ll} \frac{1}{l_{jj,t}}\left( c_{ij,t} - \sum _{k=1}^{j-1}l_{ik,t}l_{jk,t}\right) , &{} i > j\\ \sqrt{c_{ij,t} - \sum _{k=1}^{j-1}l_{jk, t}^2}, &{} i = j \end{array}\right. } \end{aligned}$$
(4)

where \(c_{ij,t}\) is the ij-th element of \(RC_t\). Once the h-step-ahead predictions of \(L_t\), i.e. \({\hat{L}}_{t+h}\), are obtained through a given time series model, the predicted covariance matrix is recovered by inverting the Cholesky decomposition in Eq. (3), \({\widehat{RC}}_{t+h} = {\hat{L}}_{t+h}'{\hat{L}}_{t+h}.\)
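A minimal sketch of the Cholesky parametrization and its inversion is reported below; it uses NumPy's lower-triangular factor (i.e. \(L_t'\) in the notation above), and the function names are illustrative.

```python
import numpy as np

def chol_parametrize(RC):
    """psi_t: the non-zero elements of the triangular Cholesky factor of RC."""
    L_low = np.linalg.cholesky(RC)        # lower-triangular factor, i.e. L_t' in the text
    i, j = np.tril_indices(RC.shape[0])
    return L_low[i, j]

def chol_unparametrize(psi, n):
    """Rebuild a positive (semi)definite matrix from (predicted) Cholesky parameters."""
    L_low = np.zeros((n, n))
    i, j = np.tril_indices(n)
    L_low[i, j] = psi
    return L_low @ L_low.T                # equals L_t' L_t of Eq. (3)

RC = np.array([[4.0, 1.2], [1.2, 3.0]])
print(np.allclose(chol_unparametrize(chol_parametrize(RC), 2), RC))  # exact round trip
```

Note that any real vector \(\psi\) maps back to a positive semidefinite matrix, which is what makes the parametrization unconstrained.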

The major pitfall of this parametrization, already assessed in Pinheiro and Bates (1996) and Heiden (2015), is that it is not invariant to the ordering of the variables, thus violating the aforementioned property 3. In fact, since the calculation of the Cholesky factors is recursive, the ordering of the elements in the original matrix, \(RC_t\), influences the definition of the factors. This means that up to \(n!\) different sets of parameters, \(\psi _t\), may represent the same \(RC_t\). In terms of predictions, different orderings may therefore yield different results, especially when the dimension of \(RC_t\) is high.

The Cholesky decomposition does not respect property 4 either since, except for the first Cholesky factor, the relationship between the parameters and the elements of \(RC_t\) is difficult to interpret. This lack of a direct relationship between \(\psi _t\) and the elements of \(RC_t\) makes it difficult, for example, to interpret the coefficients in a time series model.

Furthermore, the prediction errors on the Cholesky factors propagate quadratically to the forecasts of the original elements of \(RC_t\), making the predictions possibly biased, as shown in Halbleib-Chiriac and Voev (2011). In fact, the forecasts of the parameters \(\psi _t\) from a Vector Autoregressive model are unbiased (see Dufour 1985). However, in the construction of \({\widehat{RC}}_{t+h}\), the prediction error for each element of the covariance matrix, \(e_{ij, t+h} = RC_{ij, t+h} - {\widehat{RC}}_{ij, t+h}\), no longer has mean zero and depends on the variance \(\Omega _h\) of the forecast error of the Cholesky factors, \(u_{\psi , t+h} = \psi _{t+h} - {\hat{\psi }}_{t+h}\). Specifically, each element of the predicted covariance matrix \({\widehat{RC}}_{ij, t+h}\) is biased by \(E\left[ e_{ij, t+h}\right] \equiv \sigma _{ij, h}^* \ne 0\), where \(\sigma _{ij, h}^*\) is obtained from the elements of \(\Omega _h\). For this reason, Halbleib-Chiriac and Voev (2011) propose a bias correction that strongly depends on the second moment of the forecasting error, which is model-dependent. Nevertheless, the authors conclude that the bias correction is difficult to justify empirically. Therefore, we do not consider this correction in our paper.

2.2 Matrix logarithm function

Another possible way to parametrize and predict \(RC_t\) is based on its matrix logarithm and on the matrix exponential function of the predicted parameters \(\psi _{t+h}\). The matrix exponential function is defined through a power series expansion of a square matrix \(A_t\), such that

$$\begin{aligned} RC_t = \text {expm}(A_t) = \sum _{m = 0}^{\infty }\frac{1}{m!}A_t^m \end{aligned}$$
(5)

where \(\text {expm} = e^{(\cdot )}\) is the matrix exponential function as defined in Chiu et al. (1996).

Conversely, given the square and symmetric matrix \(RC_t\), its logarithm \(A_t\) is a square and symmetric matrix that satisfies the condition \(\text {expm}\left( A_t\right) =RC_t\). It should be remembered that a real matrix has a real logarithm if and only if it is invertible and each Jordan block belonging to a negative eigenvalue occurs an even number of times.

To obtain the logarithm transformation of \(RC_t\), a spectral decomposition is necessary. Because \(RC_t\) is positive definite by construction, it has n strictly positive eigenvalues \(\lambda _t\). Letting \(U_t\) denote the orthogonal matrix of orthonormal eigenvectors of \(RC_t\) and \(\Lambda _t = \text {diag}(\lambda _t)\) the diagonal matrix with the eigenvalues on its diagonal, we can write

$$\begin{aligned} RC_t = U_t \Lambda _t U_t'. \end{aligned}$$
(6)

This is a reformulation of Eq. (2) by simply imposing

$$\begin{aligned} L_t = \Lambda _t^{1/2}U_t' \end{aligned}$$
(7)

where \(\Lambda _t^{1/2}\) denotes the diagonal matrix with \(\left[ \Lambda _t^{1/2}\right] _{ii} = \sqrt{\left[ \Lambda _t\right] _{ii}}\). Given the spectral decomposition of \(RC_t\), its matrix logarithm function can be defined as

$$\begin{aligned} A_t = \text {logm} (RC_t) = U_t \log (\Lambda _t) U_t' \end{aligned}$$
(8)

where \(\log (\Lambda _t) = \text {diag}\left( \log \lambda _t\right) \) is the \(n \times n\) diagonal matrix whose diagonal entries are the logarithms of the eigenvalues. The matrix \(A_t\) can take any value in the space of \(n \times n\) symmetric matrices. Letting \(\psi _t = \text {vech}(A_t)\) collect the lower triangular elements of \(\text {logm}(RC_t)\), we obtain the matrix logarithm parametrization of \(RC_t\). As for the Cholesky decomposition, \(\psi _t\) is predicted h steps ahead. Then, the inverse of the vech operator is used to form the symmetric predicted logarithm matrix, \({\hat{A}}_{t+h}\), and the prediction of \(RC_t\) is obtained through its matrix exponential transformation, \({\widehat{RC}}_{t+h} = \text {expm} \left( {\hat{A}}_{t+h}\right) \).
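The following sketch implements the matrix-logarithm parametrization through the spectral decomposition of Eqs. (6)-(8) and its inversion via the matrix exponential; it is one possible illustration, not the authors' code.

```python
import numpy as np

def logm_parametrize(RC):
    lam, U = np.linalg.eigh(RC)                # spectral decomposition, Eq. (6)
    A = U @ np.diag(np.log(lam)) @ U.T         # A_t = logm(RC_t), Eq. (8)
    i, j = np.tril_indices(RC.shape[0])
    return A[i, j]                             # psi_t = vech(A_t)

def logm_unparametrize(psi, n):
    A = np.zeros((n, n))
    i, j = np.tril_indices(n)
    A[i, j] = psi
    A = A + A.T - np.diag(np.diag(A))          # rebuild the symmetric log-matrix
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(np.exp(lam)) @ U.T      # expm(A_t): positive definite by construction

RC = np.array([[4.0, 1.2], [1.2, 3.0]])
print(np.allclose(logm_unparametrize(logm_parametrize(RC), 2), RC))
```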

Differently from the Cholesky decomposition, this parametrization is unique as it defines a one-to-one mapping between \(\psi _t\) and \(RC_t\); indeed, it satisfies the first three properties reported before. Nevertheless, the vector of parameters \(\psi _t\) does not refer directly to the elements of the original matrix \(RC_t\), hence the fourth property cannot be satisfied.

As for the Cholesky decomposition, the predictions of \(RC_t\) will be biased since the predictive model is implemented in the log-volatility space and, by Jensen’s inequality, \(E\left( {\widehat{RC}}_{t+h}\right) \ne \text {expm} \left[ E\left( \psi _{t+h}\right) \right] \). Once again, the solution proposed by Bauer and Vorkink (2011) is to correct the predictions through a data-driven approach. Still, their correction is based on the unrealistic assumption that \(RC_t\) is observed and, more worryingly, it may lead to predicted matrices that are no longer positive semidefinite.

2.3 Positive semidefinite matrix approximant

In addition to the previous parametrizations of \(RC_t\), we implement the positive semidefinite approximant proposed by Higham (1988) and Rousseeuw and Molenberghs (1993), and used on realized covariance matrices in Fan et al. (2012). This approach computes the symmetric positive semidefinite matrix that is nearest, in the Frobenius norm, to an arbitrary matrix. This means that the realized covariances can be modelled directly and that the predicted matrices are a posteriori approximated by the nearest positive semidefinite matrix. Therefore, the interpretability of the coefficients of the predictive model remains unchanged, and the forecasts remain unbiased, at least when the approximation is not needed.

When a matrix \(Z_t\) is symmetric but not positive semidefinite, some eigenvalues in the spectral decomposition as in Eq. (6) are negative. Higham (1988) demonstrates that a PSD approximant of \(Z_t\) can be obtained by simply replacing the negative eigenvalues with zeroes. In such a case, the computation of the unique approximation of \(Z_t\) is straightforward. Given the spectral decomposition of \(Z_t = U_t \Lambda _t U_t'\), with \(\Lambda _t = \text {diag}(\lambda _t)\), where \(\lambda _t = \left\{ \lambda _{1,t}, \ldots , \lambda _{n,t}\right\} \) is the vector of eigenvalues, its PSD approximant, \(S_t\), can be computed as follows:

$$\begin{aligned} S_t = U_t \text {diag}(\lambda _t^*)U_t' \end{aligned}$$
(9)

where \(\lambda _t^* = \left\{ \lambda _{1,t}^*,\ldots , \lambda _{n,t}^*\right\} \) with \(\lambda _{i,t}^*\), for \(i = 1, \ldots , n\), is defined as follows

$$\begin{aligned} \lambda _{i,t}^* = {\left\{ \begin{array}{ll} \lambda _{i,t}, &{} \lambda _{i,t} \ge 0\\ 0, &{} \lambda _{i,t} <0. \end{array}\right. } \end{aligned}$$
(10)

Alternatively, the approximant of \(Z_t\) can be computed using the iterative algorithm introduced by Higham (1986) based on matrix inversions. Here, we do not consider this approach and rely only on the approximant presented in Eqs. (9) and (10).
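A compact sketch of the approximant in Eqs. (9)-(10) is given below: negative eigenvalues of a (symmetrized) predicted matrix are clipped to zero. The explicit symmetrization step is an implementation detail added here for robustness.

```python
import numpy as np

def nearest_psd(Z):
    """Nearest symmetric PSD matrix to Z in the Frobenius norm (Higham 1988)."""
    Z_sym = (Z + Z.T) / 2                      # make sure the prediction is symmetric
    lam, U = np.linalg.eigh(Z_sym)
    lam_star = np.clip(lam, 0.0, None)         # Eq. (10): negative eigenvalues set to zero
    return U @ np.diag(lam_star) @ U.T         # Eq. (9)

Z = np.array([[1.0, 2.0], [2.0, 1.0]])         # symmetric but indefinite (eigenvalues 3 and -1)
print(np.linalg.eigvalsh(nearest_psd(Z)))      # all eigenvalues are now non-negative
```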

From this specification, it follows that we can directly model and predict the \({\tilde{n}}\)-dimensional vectorization of the realized covariance matrices without any a priori parametrization of the matrices themselves and without worrying about positive semidefiniteness. Once the forecasts of the realized covariances are obtained from a specific model, we approximate the predicted matrix by its positive semidefinite version via Eq. (9). This also allows the practitioner to directly interpret the effect of possible explanatory variables on the realized covariances and avoids any shortcoming of the method in terms of the properties defined in Sect. 2. Differently from the aforementioned approaches, the predictions of the realized covariances remain unbiased and no bias correction needs to be applied to them.

2.4 Matrix correlation parametrization

Finally, we implement the parametrization recently introduced by Archakov and Hansen (2021) that allows us to obtain positive definite forecasts of \(RC_t\) from the matrix logarithm transformation of the correlation matrix. Their approach relies on the decomposition of the covariance matrix into standard deviations and correlations, such that

$$\begin{aligned} RC_t = D_t C_t D_t \end{aligned}$$
(11)

where \(C_t\) is the correlation matrix and \(D_t\) is the diagonal matrix with the standard deviations of the asset returns on its diagonal; see also Bollerslev (1990). Eq. (11) can be rewritten in terms of Eq. (2) with \(L_t = C_t^{1/2}D_t'\). Archakov and Hansen (2021) propose to parametrize \(RC_t\) by the n log-volatilities, \(d_t = \log (\text {diag}(D_t))\), and by the \(n (n-1)/2\) off-diagonal elements of the matrix logarithm of \(C_t\), \(G_t = \text {logm} (C_t)\). Letting \(\gamma _t = \text {vecl}(G_t)\) denote the lower off-diagonal elements of \(G_t\), the parameters that define \(RC_t\) are then \(\psi _t = \left( d_t, \gamma _t\right) \). As in the Constant/Dynamic Conditional Correlation models of Bollerslev (1990) and Engle (2002), this approach allows variances and correlations to be modelled separately. This may be crucial in the modelling of high-dimensional covariance matrices, where a simple model may be imposed for the correlations and more complex models may be adopted for the variances.

Once the predictions \(\psi _{t+h}\) are obtained, reconstructing \(C_{t+h}\) and then \(RC_{t+h}\) is not straightforward. To formalize the inverse mapping that yields positive definite predicted matrices from \(d_{t+h}\) and \(\gamma _{t+h}\), Archakov and Hansen (2021) introduce a new operator such that, given an \(n \times n\) matrix, B, and a vector \(\omega \in {\mathbb {R}}^n\), \(B[\omega ]\) is the matrix whose diagonal has been replaced by \(\omega \). This operator is used in Theorem 1 of their paper. According to this theorem, for any real symmetric matrix \(B \in {\mathbb {R}}^{n \times n}\), there is a unique vector \(\omega ^* \in {\mathbb {R}}^n\) such that \(\text {expm} \left( B[\omega ^*]\right) \) is a correlation matrix. Obtaining \(\omega ^*\) requires that the diagonal elements of \(\text {expm} \left( B[\omega ^*]\right) = e^{B[\omega ^*]}\) are all equal to one. To this end, the authors propose the following iterative procedure. Consider the sequence

$$\begin{aligned} \omega _{(k+1)} = \omega _{(k)} - \log \text {diag} \left( e^{B\left[ \omega _{(k)}\right] }\right) , \quad k = 0, 1, 2, \ldots \end{aligned}$$
(12)

with an arbitrary initial vector \(\omega _{(0)} \in {\mathbb {R}}^n\); it follows that \(\omega _{(k)} \rightarrow \omega ^*\), where \(\omega ^*\) is the solution of the aforementioned theorem. The authors find that this algorithm converges very fast, even for high-dimensional matrices. This procedure makes it possible to map the values of \({\hat{\gamma }}_{t+h}\) to \({\hat{C}}_{t+h}\) by simply computing \({\hat{C}}_{t+h} = \text {expm} \left( {\hat{G}}_{t+h}[\omega ^*]\right) = e^{{\hat{G}}_{t+h}[\omega ^*]}\), where \({\hat{G}}_{t+h}\) is the matrix with off-diagonal elements equal to \({\hat{\gamma }}_{t+h}\) and diagonal elements equal to \(\omega ^*\). The prediction \({\widehat{RC}}_{t+h}\) is then obtained in combination with the predicted diagonal matrix of volatilities, \({\hat{D}}_{t+h} = \text {diag}\left( e^{{\hat{d}}_{t+h}}\right) \), such that \({\widehat{RC}}_{t+h} = {\hat{D}}_{t+h} {\hat{C}}_{t+h} {\hat{D}}_{t+h}\).
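A sketch of the inverse mapping, including the fixed-point iteration of Eq. (12), is given below; the tolerance, the maximum number of iterations and the zero starting value are illustrative choices of this sketch.

```python
import numpy as np
from scipy.linalg import expm

def inverse_corr_logm(gamma, n, tol=1e-12, max_iter=1000):
    """Map the off-diagonal elements gamma of logm(C) back to a correlation matrix C."""
    G = np.zeros((n, n))
    i, j = np.tril_indices(n, k=-1)
    G[i, j] = gamma
    G = G + G.T                                # symmetric; the diagonal omega* is found below
    omega = np.zeros(n)                        # arbitrary starting value omega_(0)
    for _ in range(max_iter):
        np.fill_diagonal(G, omega)
        step = np.log(np.diag(expm(G)))        # Eq. (12)
        omega = omega - step
        if np.max(np.abs(step)) < tol:
            break
    np.fill_diagonal(G, omega)
    return expm(G)

def reconstruct_RC(d, gamma):
    """Rebuild RC from predicted log-volatilities d and correlation parameters gamma, Eq. (11)."""
    D = np.diag(np.exp(d))
    return D @ inverse_corr_logm(gamma, d.shape[0]) @ D

# Round trip on an example correlation matrix
C0 = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]])
lam, U = np.linalg.eigh(C0)
gamma0 = (U @ np.diag(np.log(lam)) @ U.T)[np.tril_indices(3, k=-1)]   # vecl(logm(C0))
print(np.allclose(inverse_corr_logm(gamma0, 3), C0, atol=1e-8))
```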

As further discussed in Archakov and Hansen (2021), this parametrization satisfies all the properties introduced before. Additionally, it permits modelling variances and correlations separately, which is usually a desirable feature in the modelling of asset return covariances (Heiden 2015). However, the practitioner should account for the fact that the procedure in Eq. (12) must be repeated for each prediction, which may entail a higher computational cost in high-dimensional problems.

We list the parametrizations of \(RC_t\) considered in this paper in Table 1.

Table 1 Summary of parametrizations

3 Monte Carlo simulations

In this section, we examine the performance of each approach with simulated data. Specifically, we first simulate the time series of high-frequency prices from the stochastic volatility (SV) model proposed by Zhang et al. (2005) and then obtain the daily \(RC_t\) matrices from the high-frequency returns as in Eq. (1). For comparison purposes, we also simulate directly the return covariance matrices from a DCC model.

3.1 Simulation from a stochastic volatility model

The data-generating process for each element of the \(n \times 1\) vector of prices, \(P_t = (p_{1,t}, \ldots , p_{n,t})'\), is the stochastic volatility model by Heston (1993), such that

$$\begin{aligned} dp_{i,t}&= (\mu - v_{i,t}/2)dt + \sigma _{i,t}dB_t \end{aligned}$$
(13)
$$\begin{aligned} dv_{i,t}&= \kappa (\alpha - v_{i,t})dt + \gamma v_{i,t}^{1/2}dW_t, \quad i=1, \ldots , n \end{aligned}$$
(14)

where \(v_{i,t} = \sigma _{i,t}^2\) is the stochastic variance of the i-th asset. As in Zhang et al. (2005), we assume that the parameters are constant through time and common across the simulated assets. In particular, following the simulation set-up of Zhang et al. (2005), we set \(\mu = 0.05\), \(\kappa = 5\), \(\alpha = 0.04\), \(\gamma = 0.5\) and \(\rho = -0.5\), where \(\rho \) is the correlation coefficient between the two Brownian motions, B and W. To investigate the predictive accuracy of the procedures for different numbers of assets, we simulate three samples of prices with n equal to 5, 10 and 15. Assuming that the trading day consists of 6.5 h, we compute \(T = 3000\) daily realized covariance matrices as in Eq. (1), roughly the same amount of data available in the empirical application, from 39,000 prices sampled at a 30-min frequency.
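For illustration, a hedged Euler-Maruyama sketch of the dynamics in Eqs. (13)-(14) for a single asset is reported below, using the parameter values above. The step size, the full-truncation handling of the variance and the construction of correlated shocks are implementation choices of this sketch, not taken from the authors' simulation code.

```python
import numpy as np

mu, kappa, alpha, gamma, rho = 0.05, 5.0, 0.04, 0.5, -0.5
n_days, steps_per_day = 3000, 13                 # 6.5 trading hours sampled every 30 minutes
dt = 1.0 / (252 * steps_per_day)                 # assumption: time measured in trading years

rng = np.random.default_rng(1)
N = n_days * steps_per_day
p = np.empty(N + 1); v = np.empty(N + 1)
p[0], v[0] = np.log(100.0), alpha                # start log-price and variance at sensible values

for k in range(N):
    z1, z2 = rng.standard_normal(2)
    dB = np.sqrt(dt) * z1
    dW = np.sqrt(dt) * (rho * z1 + np.sqrt(1 - rho**2) * z2)   # corr(B, W) = rho
    v_pos = max(v[k], 0.0)                                     # full truncation keeps the variance usable
    p[k + 1] = p[k] + (mu - v_pos / 2) * dt + np.sqrt(v_pos) * dB
    v[k + 1] = v[k] + kappa * (alpha - v_pos) * dt + gamma * np.sqrt(v_pos) * dW

intraday_returns = np.diff(p).reshape(n_days, steps_per_day)   # one row of 30-minute returns per day
```

Stacking n such (cross-correlated) paths and applying Eq. (1) day by day yields the simulated series of realized covariance matrices.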

3.1.1 Out-of-sample evaluation of simulated data from a stochastic volatility model

We produce out-of-sample predictions by fitting a general VAR model to the elements of \(\psi _t\) obtained with the different parametrizations. This means that we apply the VAR to predict the elements of \(\psi _t\) and reconstruct the guaranteed positive semidefinite predicted matrix from the predicted \(\psi _{t+h}\), where h is the forecast horizon. The order of the VAR is automatically selected in the training set by information criteria with a maximum lag order of 5. We then analyse the differences in predictive accuracy among the parametrizations and compare them with the predictions from a VAR applied directly to the realized covariances. Considering that, for a growing number of assets, the dimension of the coefficient matrix in a VAR can become very large, we have also analysed the predictions from a penalized VAR, as introduced in Davis et al. (2016). In addition, we have considered a random walk as a further competing model; however, its performance is much worse than that of the considered models and we do not report it in our results.
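The forecasting pipeline can be summarized by the following sketch, which parametrizes each matrix (here via the Cholesky factors, for concreteness), fits a VAR with the lag order selected by AIC up to 5, and maps the h-step-ahead forecast back to a PSD matrix. The use of statsmodels and the helper names are assumptions of this illustration.

```python
import numpy as np
from statsmodels.tsa.api import VAR

def forecast_rc(rc_series, h):
    """Forecast RC_{T+h} from a list of T covariance matrices via Cholesky parameters and a VAR."""
    n = rc_series[0].shape[0]
    il, jl = np.tril_indices(n)
    psi = np.array([np.linalg.cholesky(RC)[il, jl] for RC in rc_series])   # T x n(n+1)/2
    order = VAR(psi).fit(maxlags=5, ic='aic').k_ar                         # lag selection on psi_t
    res = VAR(psi).fit(max(order, 1))                                      # guard against order zero
    psi_hat = res.forecast(psi[-res.k_ar:], steps=h)[-1]                   # h-step-ahead parameters
    L_hat = np.zeros((n, n))
    L_hat[il, jl] = psi_hat
    return L_hat @ L_hat.T                                                 # PSD by construction

# Example on a fake series of realized covariance matrices
rng = np.random.default_rng(2)
rc_series = []
for _ in range(300):
    R = rng.normal(scale=0.01, size=(13, 3))     # fake 30-minute returns, n = 3 assets
    rc_series.append(R.T @ R)
print(np.all(np.linalg.eigvalsh(forecast_rc(rc_series, h=5)) > -1e-12))
```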

Fig. 1 Volatility plots for \(n = 15\) assets simulated from a SV model

We divide the whole sample into two subsamples: the in-sample one contains 2/3 of the total observations, i.e. \(T_1 = 2000\), while the out-of-sample one contains the remaining 1/3, hence \(T_2 = 1000\). The simulated volatilities for \(n = 15\) stocks are depicted in Fig. 1. Since possible biases in the predictive setting propagate further in a multistep-ahead procedure due to the recursive mechanism, we consider both one- and multistep-ahead predictions. Supposing that the realized covariance matrices are sampled at daily frequency, we analyse three forecast horizons: short-term (\(h = 1\), one day ahead), mid-term (\(h = 5\), one week ahead), and long-term (\(h = 22\), one month ahead). Out-of-sample forecasts are obtained from a rolling window with a size of 2000, where the parameters are re-estimated at each step. The size of the window is fixed, while \(T_2\) changes with h as follows: \(T_2 = 1000\) for \(h = 1\), \(T_2 = 996\) for \(h = 5\) and \(T_2= 979\) for \(h = 22\). The ordering of the assets in the construction of the covariance matrix, which is relevant for the Cholesky decomposition, is random.

To assess the predictive accuracy of the parametrizations, we evaluate the Frobenius and the Procrustes (Dryden and Mardia 2016) distances between the true and predicted covariance matrices at time t. In general, for a pair of covariance matrices \(S_1\) and \(S_2\), the Procrustes distance is the minimum, over rotations \(\Gamma \), of the Frobenius (Euclidean) distance between their matrix square roots,

$$\begin{aligned} D(X_1, X_2) = \min _{\Gamma }\left\| X_2 - X_1 \Gamma \right\| _F \end{aligned}$$
(15)

where \(\left\| \cdot \right\| _F\) is the Frobenius norm, \(\Gamma \) is an \(n \times n\) rotation matrix and \(X_i\), \(i=1,2\), are non-negative definite matrix square roots such that \(S_i=X_i X_i'.\)

As discussed in Laurent et al. (2013), the Frobenius distance is robust to the use of a volatility proxy, as in the case of \(RC_t\). On the other hand, the Procrustes distance is less prone to swelling than the Frobenius distance (Marron and Dryden 2021) and can be applied to non-full-rank covariance matrices such as realized covariance matrices in high-dimensional allocation problems (Reiss and Winkelmann 2021).
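For reference, both distances can be computed in closed form: the minimization over rotations in Eq. (15) reduces to the nuclear norm of \(X_1'X_2\). The sketch below optimizes over all orthogonal matrices (reflections included), a common simplification, and uses the symmetric PSD square roots; the function names are illustrative.

```python
import numpy as np

def sqrtm_psd(S):
    """Symmetric PSD square root via the spectral decomposition."""
    lam, U = np.linalg.eigh(S)
    return U @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ U.T

def procrustes_distance(S1, S2):
    X1, X2 = sqrtm_psd(S1), sqrtm_psd(S2)
    sing = np.linalg.svd(X1.T @ X2, compute_uv=False)
    d2 = np.trace(S1) + np.trace(S2) - 2.0 * sing.sum()   # min over rotations of ||X2 - X1 G||_F^2
    return np.sqrt(max(d2, 0.0))

def frobenius_distance(S1, S2):
    return np.linalg.norm(S2 - S1, ord='fro')

S1 = np.array([[4.0, 1.0], [1.0, 3.0]])
S2 = np.array([[3.5, 0.8], [0.8, 2.9]])
print(procrustes_distance(S1, S2), frobenius_distance(S1, S2))
```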

In this paper, we use the average of these loss functions over the out-of-sample observations to analyse the predictions in the rolling window setting and, as a descriptive statistic, we also report the number of times the VAR applied to \(\text {vech}(RC_t)\) fails to predict a PSD matrix, denoted by \(T_2^*\). Moreover, the loss functions have been used for multiple comparisons via the model confidence set (MCS) methodology introduced by Hansen et al. (2011). For a given confidence level, this procedure identifies the set of models with the best out-of-sample forecasts. Specifically, we sequentially test the equal predictive ability of the considered models. The performance is measured pairwise via the loss function difference, \(\eta _{ij,t} = L(RC_{t},H_{i,t}) - L(RC_{t},H_{j,t})\), for all \(i,j=1,2,\ldots ,M\) with \(i\ne j\), where M is the number of compared predictive models (in our case \(M = 4\)) and \(L(\cdot )\) is either the Frobenius loss function or the Procrustes metric between the observed realized covariance matrix and its prediction from model i or j, \(H_{\cdot ,t}\). Assuming that \(\eta _{ij,t}\) is stationary, the null hypothesis takes the form

$$\begin{aligned} H_{0}:E(\eta _{ij,t})=0. \end{aligned}$$
(16)

When the null hypothesis is rejected, we discard the model with the largest loss function. Hence, the set of superior models (SSM) is composed of all non-discarded models. The MCS p-values are calculated as the proportion of times each model is included in the SSM across 10,000 stationary block bootstrap replications.
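As a simplified illustration of the bootstrap test underlying the MCS, the sketch below computes a p-value for the null in Eq. (16) for a single pair of models using a stationary (geometric-block) bootstrap of the mean loss differential; the full procedure of Hansen et al. (2011) additionally eliminates models sequentially. Block length and replication count are illustrative choices.

```python
import numpy as np

def stationary_bootstrap_pvalue(eta, n_boot=10_000, mean_block=20, seed=0):
    """Two-sided bootstrap p-value for H0: E[eta_t] = 0, eta_t a loss differential."""
    rng = np.random.default_rng(seed)
    T = eta.shape[0]
    t_obs = np.abs(eta.mean())
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = np.empty(T, dtype=int)
        idx[0] = rng.integers(T)
        for s in range(1, T):
            # with probability 1/mean_block start a new block, otherwise continue the current one
            idx[s] = rng.integers(T) if rng.random() < 1.0 / mean_block else (idx[s - 1] + 1) % T
        t_boot[b] = np.abs(eta[idx].mean() - eta.mean())   # centred bootstrap statistic
    return float((t_boot >= t_obs).mean())

# Example: differentials with zero mean should not reject H0
rng = np.random.default_rng(1)
eta = rng.normal(scale=0.1, size=500)
print(stationary_bootstrap_pvalue(eta, n_boot=500))
```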

3.1.2 Results for data simulated from a stochastic volatility model

In this section, we discuss the prediction results from fitting the VAR model to the elements of \(\psi _t\). An analysis of the information criteria (Akaike, Bayesian and Hannan-Quinn) for each parametrization over \(T_1\) suggests that, independently of the number of assets, a first-order VAR is the preferable specification to represent the dynamics of the series in the training set. The average loss functions and the MCS p-values for \(h = 1, 5, 22\) are reported in Tables 2, 3 and 4.

When the forecast horizon is equal to 1 (Table 2), the only predictions that always belong to the SSM are those from the method proposed by Archakov and Hansen (2021). Specifically, this method has the lowest average Frobenius distance for \(n = 5\) and 10, and the lowest Procrustes metric for all the simulated samples. The only method comparable with Corr is the Cholesky decomposition, which also exhibits a lower Frobenius distance for \(n = 15\). It is worth noticing that directly modelling the \(RC_t\) series or using the approximant of Higham (1988) leads to an average Frobenius distance that is almost double that of the Cholesky decomposition for \(n=15\).

In the cases of \(h=5\) and \(h=22\), the method of Archakov and Hansen (2021) exhibits the lowest loss functions for all the samples. Indeed, this is the only method included in the SSM for \(p_{\textsc {MCS}} > 0.75\). Once again, the Cholesky decomposition is the only parametrization with average loss functions similar to Corr.

Conditioning on h, the number of non-PSD predictions obtained by directly modelling the \(RC_t\) series (\(T_2^*\)) increases with the number of assets. On the other hand, conditioning on n, \(T_2^*\) decreases for larger values of h. This suggests that, as the forecast horizon grows, the recursive predictions tend towards the average realized covariance matrix, which is generally positive definite. Interestingly, a parametrization of the covariance matrix always leads to improved predictions with respect to a linear model applied directly to the covariance time series, even when \(T_2^* = 0\). Finally, for \(n = 15\), we have also estimated a sparse VAR (Davis et al. 2016) in which the penalization parameter is tuned at each step of the rolling procedure. The results show that the performance of the different parametrizations under this model is essentially the same as under the general VAR model, and they are thus not reported here.

Table 2 Average loss functions for a different number of assets and \(h = 1\) in the simulation study from a SV model
Table 3 Average loss functions for a different number of assets and \(h = 5\) in the simulation study from a SV model
Table 4 Average loss functions for a different number of assets and \(h = 22\) in the simulation study from a SV model

3.2 Simulation from a DCC model

In order to analyse the forecasting accuracy of the parametrizations with a different simulation setting that produces known covariance matrices, we also simulate the return covariance matrices from a DCC model. With this simulation setting, we obtain time series of volatilities with lower dispersion around the mean and no seasonality, see Fig. 2. In this way, we investigate the robustness of the results obtained in the previous section when the data-generating process changes. The steps of the simulation process from a DCC model are described in Appendix A.

This simulation setting yields fully observed return covariance matrices, allowing us to apply any parametrization and multivariate time series model to make predictions. Considering that the generating process is based on GARCH(1,1)-type dynamics, out-of-sample predictions are obtained by fitting a VAR(1) model.

3.2.1 Results for data simulated from a DCC

The simulated sample is divided as in Sect. 3.1.1, thus we have a training sample of \(T_1 = 2000\) observations and a testing sample of \(T_2 = 1000\) observations. The simulated volatilities for \(n = 15\) stocks are depicted in Fig. 2. The order of the assets in the construction of the return covariance matrix, relevant for the Cholesky decomposition, is random. It is worth noticing that the simulation design is similar to that suggested by Archakov and Hansen (2021). Accordingly, this may positively influence the accuracy of their parametrization. The evaluation of the out-of-sample forecasts is performed through the methods introduced in Sect. 3.1.1.

Fig. 2 Volatility plots for \(n = 15\) assets simulated from a DCC model

Tables 8, 9, and 10 in Appendix B report the average of both loss functions for each parametrization, along with the MCS p-values, for \(h = 1, 5, 22\), respectively. Overall, the results confirm the superiority of the method proposed by Archakov and Hansen (2021) and of the Cholesky decomposition, as already emerged from Tables 2, 3 and 4 in the application to data simulated from a stochastic volatility model.

The first evidence from the tables is that the VAR(1) fails to predict positive semidefinite matrices independently of the number of assets and the forecasting horizon, underlining how crucial a parametrization is in a forecasting approach. Not surprisingly, the number of times (\(T_2^{*}\)) a VAR(1) fails to produce PSD predicted matrices increases with the number of assets.

For any combination of n and h, the differences between the parametrizations are relatively small, while a clear difference emerges with respect to the approach in which no parametrization is performed and to the approximant of Higham (1988). These differences widen considerably for \(n = 10\) and \(n = 15\). In general, the parametrization proposed by Archakov and Hansen (2021) outperforms the compared methodologies in terms of the average of both loss functions. When looking at the average Frobenius distance, the Cholesky decomposition obtains the overall best performance, regardless of the number of assets, only for \(h = 1\). Still, its performance is closely followed by the parametrization proposed by Archakov and Hansen (2021), which is also the only parametrization always included in the SSM with \(p_{\textsc {MCS}} > 0.75\) in terms of the Procrustes measure. The matrix logarithm transformation seems to be a valid tool to parametrize the covariance matrix, since it performs similarly to the approach proposed by Archakov and Hansen (2021); indeed, it also belongs to the SSM for \(h = 22\) and \(n = 5\). Furthermore, the Cholesky decomposition is strongly outperformed in terms of average loss functions for \(n = 15\) when the forecast horizon differs from \(h = 1\).

4 Empirical evidence

4.1 Forecasting US stocks realized covariances

The dataset used to compare the forecasting accuracy of the parametrizations comprises the time series of hourly open spot prices of the 10 largest (as of November 2021) US stocks by market capitalization belonging to the S&P 500 for the period August 31, 2010–November 30, 2021, sourced from Bloomberg (19,766 hourly observations). The choice of estimating daily \(RC_t\) covariance matrices from hourly data is in line with De Pooter et al. (2008), who show that optimal intra-day frequencies correspond to 30 min or one hour. Table 5 reports the complete list of stock names and tickers. Daily realized covariance matrices are computed as in Eq. (1), for a total of \(T=2833\) daily observations. The covariance stationarity of each series is confirmed by Augmented Dickey-Fuller (ADF), ADF-GLS, Phillips-Perron (PP), and KPSS tests for unit roots. The realized volatilities of the \(n = 10\) stocks are depicted in Fig. 3.

Table 5 List of the 10 US stocks used in this study
Fig. 3 Daily realized volatility plots

In order to evaluate the forecasting accuracy of the parametrizations, we use almost 1/3 of the total observations (\(T_2 = 943, 939, 922\) respectively for \(h = 1, 5, 22\)), from March 6, 2018 to November 30, 2021, for the out-of-sample analysis. To avoid any look-ahead bias or distortion in the predictions due to a poor sampling scheme, we produce one- and multistep-ahead forecasts with a rolling window of 1890 observations, re-estimating the parameters at each step.

We select the number of lags to be used in the predictive VAR through information criteria applied to the training sample. All the criteria for all the parametrizations suggest a single lag; hence we make predictions of the dependent variables, \(\psi _t\), from a VAR(1).

4.2 Forecasting results for daily realized covariances

In this section, we evaluate out-of-sample forecasts through the methods introduced in Sect. 3.1.1 and report the average loss functions in Table 6. As in the case of simulated data, the Cholesky decomposition has the lowest Frobenius loss for \(h = 1\). Nevertheless, in the comparison of the predictions for \(h = 1\), this does not lead to statistically significant differences, since all the models, except the matrix logarithm transformation, belong to the SSM. For one-step-ahead predictions, the approach proposed by Archakov and Hansen (2021) has the lowest average Procrustes metric. The dominance of the Cholesky decomposition and the Corr approach is also evident in multistep-ahead predictions. In fact, for \(h = 5\), these parametrizations outperform the competing ones and are both included in the SSM when looking at the Frobenius distance. Surprisingly, the PSD approximant has the lowest Procrustes measure for \(h = 5\). As already observed for the simulated data, the differences in predictive accuracy between these parametrizations and the simple VAR(1) or the a posteriori PSD approximation increase with the number of steps ahead. For \(h = 22\), the lowest losses are observed for the method of Archakov and Hansen (2021). Intuitively, the performance of the no-parametrization approach and of the approximant of Higham (1988) is mainly influenced by the poor accuracy for the observations belonging to \(T_2^*\). In Appendix A, we also report heatmaps of the Root Mean Square Error (RMSE) calculated between the observed and predicted elements of the covariance matrix for the different parametrizations and \(h=1\). The heatmaps show that Cholesky and Corr outperform the other methods for almost all the elements of the lower triangular part of the covariance matrix.

Table 6 Average loss functions for daily empirical data in different forecast horizons
Table 7 Average loss functions for weekly empirical data in different forecast horizons

4.3 Forecasting results for weekly realized covariances

For robustness purposes, we also use the hourly data presented in Sect. 4.1 to compute weekly realized covariances. In particular, weekly realized covariance matrices are computed as in Eq. (1), for a total of \(T=595\) observations. To allow an out-of-sample evaluation, one-third of the total observations (\(T_2 = 200, 196, 179\) respectively for \(h = 1, 5, 22\)) is used as a testing sample. As in the case of daily realized covariances, we produce the forecasts from a rolling window of 395 weekly observations and re-estimate the parameters of the VAR at each step. We have performed unit root tests to confirm the covariance stationarity of the time series.

The average loss functions for weekly \(RC_t\) are reported in Table 7. The findings largely overlap with those of the previous sections. In fact, the parametrization introduced by Archakov and Hansen (2021) outperforms all the alternatives in terms of predictive accuracy. Nevertheless, the difference with respect to the Cholesky decomposition and the matrix logarithm transformation in terms of the Frobenius distance is not significant for \(h = 5\), since they are all included in the SSM with a \(p_{\textsc {MCS}}\) greater than 0.75.

5 Conclusions

Predicting return covariance matrices is of particular importance to the fields of risk management, portfolio management and asset pricing. However, modelling the dynamics of a covariance matrix is a challenging task as the chosen model must guarantee the positive semi-definiteness of the predicted covariance matrices.

In this paper, by using simulated and real data, we have discussed the performance of different parametrization methods which allow the predicted covariance matrices to be at least positive semidefinite. Our findings show that different parametrizations can lead to different results in terms of forecasting accuracy of asset return covariance matrices. In particular, we have found that the Cholesky decomposition and the new parametrization proposed by Archakov and Hansen (2021) provide the lowest loss functions and that the difference between these methods is often not statistically significant. In contrast, directly modelling the return covariances or using the a posteriori approximant of Higham (1988) produces the poorest predictions. This evidence is mainly due to the predictive errors made on non-positive semidefinite observations. Interestingly, we have found that a parametrization of the covariance matrix always leads to improved predictions with respect to a linear model applied directly to the covariance time series, even when the latter does not fail to predict PSD matrices. This finding, which is worth further investigation in future work, clearly emerges in multistep-ahead forecasts, where non-PSD matrices are used recursively to produce successive predictions.

Between the two best-performing methods, possible advantages and disadvantages also emerge. Although computationally more demanding, the one proposed by Archakov and Hansen (2021) respects all the properties that a parametrization should have. Moreover, it allows different models to be implemented for variances and correlations. In the ever-evolving context of time series modelling, this can be crucial for the practitioner to analyse possible nonlinear relations or to investigate the role of explanatory variables on variances and/or correlations. Conversely, the Cholesky decomposition does not satisfy all the properties of a parametrization; in particular, its predictions strongly rely on the ordering of the series in the construction of the covariance matrix. Nevertheless, it is the simplest method among those we consider in this analysis and, unsurprisingly, it is the most popular method so far implemented in the prediction of realized covariances.

The analysis presented in this paper has been conducted under simple hypotheses about the price process. However, raw high-frequency data typically deviate from this ideal setting since, for example, jumps and market microstructure noise are stylized features of financial data. Studying the performance of the parametrization methods under more complicated settings will be a task for future work.

Finally, it should be noted that the estimated VARs, especially for \(n=15\), involve a large number of dependent variables. To deal with possible problems related to the high number of coefficients, researchers typically impose prior shrinkage on the coefficients. In our simulation study, for the case of \(n = 15\) assets, we have compared unrestricted formulations with \(L_1\)-penalized versions of the model. However, investigating the use of other approaches in a high-dimensional setting, such as the Stochastic Search Variable Selection model (George et al. 2008) and Bayesian Compressed regression (Guhaniyogi and Dunson 2015), could be worth considering in future work.