Composite forecasting of vast-dimensional realized covariance matrices using factor state-space models

We propose a dynamic factor state-space model for the prediction of high-dimensional realized covariance matrices of asset returns. Using a block LDL decomposition of the joint covariance matrix of assets and factors, we express the realized covariance matrix of the individual assets in the form of an approximate factor model. We model the individual parts, i.e., the factor and residual covariances as well as the factor loadings, independently via a tractable state-space approach. This results in closed-form Matrix-F predictive densities for the distinct covariance elements and Student's t predictive densities for the factor loadings. In an out-of-sample forecasting and portfolio selection exercise we compare the performance of the proposed factor model under different specifications of the residual dynamics. These include block-diagonal residuals based on the GICS sector classifications, strict diagonality assumptions, and combinations of both obtained via linear shrinkage. We find that the proposed model performs very well in an empirical application to realized covariance matrices for 225 NYSE-traded stocks using the well-known Fama–French factors and sector-specific factors represented by exchange-traded funds.


Introduction
Correspondence: Jan Patrick Hartkopf, jan.hartkopf@gmail.com, Cologne Graduate School, University of Cologne, Universitätsstr. 22a, 50937 Cologne, Germany.

Modeling and forecasting covariance matrices of financial assets is essential in various fields such as option pricing, risk management and portfolio allocation. Numerous approaches have been studied in the past decades, mostly based on daily asset return data. Due to the increasing availability of intraday asset return data it is nowadays possible to compute accurate nonparametric ex-post estimates of daily integrated covariance matrices of asset returns, called realized covariances. These are known to incorporate much more information about daily conditional covariances than daily return series, and come with the advantage that time-series models can be applied to them directly. 1 However, this involves two key challenges: first, the need to ensure positive definiteness of the covariance matrix forecasts, and second, the need to cope with the parameter proliferation arising from the fact that the number of objects to be modeled is proportional to the square of the number of assets. Both aspects become even more pronounced in the vast-dimensional applications considered in this paper.
Nevertheless, the direct time-series modeling of realized covariance matrices has emerged as an important new strand of literature, pioneered by the studies of Gourieroux et al. (2009), Noureldin et al. (2012), Golosnoy et al. (2012), and Jin and Maheu (2013), who focused on capturing the time-series behavior of realized covariance matrices through matrix-variate distributions like the Wishart and inverse Wishart. In addition, Bauer and Vorkink (2011) and Chiriac and Voev (2011) focused on decompositions of realized covariances in order to apply standard time-series models for the prediction of future realizations. More recent contributions examine additional aspects of modeling realized covariances by using more complex distributions, such as mixtures of inverse Wisharts or the Matrix-F distribution, that allow for fat tails in the covariance (Jin and Maheu 2016; Opschoor et al. 2017; Opschoor and Lucas 2019; Vasallo et al. 2021; Zhou et al. 2019). As a further example, the Riesz distribution allows for incorporating liquidity differences in the intraday returns and reflecting tail heterogeneity (Gribisch and Hartkopf 2022; Blasques et al. 2021). The benefits of LASSO (least absolute shrinkage and selection operator) techniques for predictive accuracy are examined in Callot et al. (2017); the incorporation of (multiplicative) long-term components is studied by, e.g., Bauwens et al. (2017). However, most of the proposed models are limited to low-dimensional applications, say with 30 assets or less.
In this paper we contribute to another branch of realized covariance modeling, i.e., the application of factor structures for the assets' covariance matrices to deal with empirically more realistic scenarios in increasing dimensions. Prominent examples of the factor approach are the models proposed by Tao et al. (2011), Asai and McAleer (2015), Jin et al. (2019), Sheppard and Xu (2019), Gribisch et al. (2020) and Brito et al. (2018). While the first three approaches extract implicit factors from realized covariance data via an eigenanalysis and build dynamic time-series models thereon, the remaining three rely on economically motivated observed risk factors, as in the Fama and French (1993) three-factor model.
The use of observed risk factors is particularly appealing as it allows for time-variation in both the factor loadings and the residual components, which yields flexibility gains when modeling the complex structures driving the evolution of realized covariances. We follow this route and propose a flexible factor state-space model for the prediction of realized covariance matrices of asset returns based on observed risk factors, capable of handling dynamic systems of vast dimensions (say, 200 assets and more).
Our method relies on a factor decomposition of the realized asset covariance matrix obtained from a joint construction of the realized measure for the factors and assets. After observing the covariances of both the assets and the factors, we construct realized factor loadings and realized residual components from standard matrix decompositions, resulting in a time-series for each of the individual parts. We then propose independent state-space models applied to the time-series of the factors, factor loadings and residual components, akin to an approximate factor model, taking the 'true' integrated factor and residual covariance matrices and integrated loadings as latent state variables which are observed through their noisy realized counterparts.
For the factor and residual covariance matrices we combine independent Wishart measurement densities with parsimoniously parameterized matrix-variate Beta transitions in the vein of Windle and Carvalho (2014). Besides yielding positive definite covariance forecasts by construction, this specification has the advantage of being tractable in the sense that the predictive distribution of the data, and hence the likelihood function, is available in closed form. As a further contribution, we derive additional useful properties of the covariance model at hand. For the factor loadings we rely on well-established models from the TVP-VAR literature with stochastic volatility, estimated by maximum likelihood (ML) techniques based on Kalman-like filtering.
Although a separate modeling approach has certain drawbacks, as it ignores data-imposed dependencies in the measurements, it comes with the huge advantage of reducing model complexity, allowing us to efficiently handle vast-dimensional portfolios. In contrast to other state-space approaches that require high computational effort, our approach enables parameter estimation and prediction on a minute time scale. Based on the individual models, predictions for each component are readily made completely in parallel and are combined afterward to obtain a composite forecast of the full asset covariance matrix. This makes the model particularly appealing for practitioners.
In order to investigate whether the loss of information arising from the composite nature of our proposed model has any influence on its predictive accuracy, we later include the Factor HEAVY (high-frequency-based volatility) model of Sheppard and Xu (2019) in the set of competing models. The Factor HEAVY model relies on a GARCH-type modeling of the observed factors' and assets' covariances based on a joint Wishart distribution combined with a strict factor structure (i.e., a sparse diagonal residual component). The diagonal residual assumption is crucial for their model as it enables a neat factorization into (conditionally) independent low-dimensional components, allowing for straightforward estimation by quasi-maximum likelihood (QML). However, it imposes severe restrictions on the residual component which can be unrealistic in vast-dimensional applications. Our modeling approach allows us to relax this assumption without further effort.
The proposed composite factor state-space (CFSS) 2 model relies on the same factor decomposition as used in Brito et al. (2018). They combine the LASSO approach of Callot et al. (2017) for the factor and residual covariances with linear heterogeneous autoregressive (HAR, Corsi 2009) processes for the factor loadings. Their approach has two major drawbacks in comparison with the CFSS model. First, since their model specification is prone to over-parameterization, they have to restrict the dynamics of the residual component to be driven by past residual variances only, neglecting possibly important information when forecasting the assets' covariance matrices. Second, and perhaps more importantly, they operate with transformed time-series for the covariance parts based on the matrix logarithm and use the matrix exponential function to ensure positive definiteness of the predictions, which induces a bias in the forecasts. This bias might be negligible for short-term forecasts; however, it becomes more pronounced over longer horizons.
In the empirical part, we apply the CFSS model to daily covariance matrices for the returns of 225 NYSE-traded stocks. In an extensive out-of-sample analysis we compare several model specifications based on varying sets of observed risk factors and different residual structures, as well as different restrictions imposed on the dynamics of the realized covariance components. We illustrate the predictive performance of our approach relative to several competing models. As performance measures we consider the accuracy of the (co)variance predictions, the ability to produce predictions for the global minimum-variance portfolio under different side constraints, and the ability to produce predictions for the mean-variance portfolio using momentum signals. Our out-of-sample results show that the CFSS model performs favorably in almost all dimensions relative to its competitors.
The remainder of the paper is organized as follows: Sect. 2 introduces the proposed CFSS model. Maximum likelihood estimation, parameter restrictions and the construction of composite forecasts are discussed in Sect. 3. Section 4 describes the data set. Section 5 discusses implementation details and presents the out-of-sample results. Finally, Sect. 6 concludes. Additional derivations for the model (parts) at hand are provided in the Appendix; additional in- and out-of-sample results are deferred to the Supplementary Appendix.

Factor decomposition of realized covariances
Consider a positive definite, symmetric, consistent estimate $C_t$ of the joint latent integrated covariance matrix $\Sigma_t$, observed for a panel of $p$ logarithmic asset prices and $q$ observed factor prices at trading days $t = 1, \ldots, T$. Let $C_t$ be partitioned as
$$C_t = \begin{pmatrix} C^f_t & C^{fr}_t \\ C^{rf}_t & C^r_t \end{pmatrix}, \qquad (1)$$
where $C^f_t$ constitutes the $q \times q$ realized covariance matrix for the factors, $C^r_t$ denotes the $p \times p$ realized covariance matrix for the assets and $C^{rf}_t = (C^{fr}_t)'$ denotes the $p \times q$ matrix of the realized covariances between the factors and assets.
In the present paper our particular interest lies in predicting the realized covariance matrix of the asset returns $C^r_t$. To this end, we exploit a factor decomposition relating the assets' realized covariance to the risk factors via
$$C^r_t = B_t C^f_t B_t' + C^e_t, \qquad (2)$$
where the matrices $B_t$ ($p \times q$) and $C^e_t$ ($p \times p$) are uniquely obtained from a block LDL factorization of the block matrix representation in Eq. (1) as the regression coefficient matrix and the resulting Schur complement of $C^f_t$ in $C_t$, i.e.,
$$B_t = C^{rf}_t (C^f_t)^{-1}, \qquad C^e_t = C^r_t - C^{rf}_t (C^f_t)^{-1} C^{fr}_t. \qquad (3)$$
The matrices $B_t$ and $C^e_t$ can be interpreted as realized factor loadings and as the realized residual covariance matrix, approximating the integrated loadings $\mathcal{B}_t$ and the integrated residual covariance $\Sigma^e_t$, respectively. Since we observe the complete covariance $C_t$ at each point in time, we can use Eq. (3) to determine distinct time-series for $C^f_t$, $C^e_t$ and $B_t$, such that these can be modeled and predicted separately. Afterward, they can be recomposed to obtain a forecast for the major quantity of interest, the full covariance of the assets $C^r_t$. Specifications for the dynamics of the individual parts are presented in the following section.
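The block LDL step described above amounts to a regression of the assets on the factors at the covariance level. The following sketch (function name and the ordering convention, factors first, are illustrative assumptions) computes the realized loadings and the residual Schur complement from a joint realized covariance matrix:

```python
import numpy as np

def factor_decompose(C, q):
    """Split a joint (q+p)x(q+p) realized covariance C into the factor
    covariance C_f, realized loadings B and residual covariance C_e via
    one block LDL step. Assumes the first q rows/columns are the factors."""
    C_f = C[:q, :q]                    # q x q factor covariance
    C_rf = C[q:, :q]                   # p x q asset-factor covariances
    C_r = C[q:, q:]                    # p x p asset covariance
    B = C_rf @ np.linalg.inv(C_f)      # realized loadings (regression coefficients)
    C_e = C_r - B @ C_f @ B.T          # Schur complement of C_f in C
    return C_f, B, C_e
```

By construction the decomposition reproduces the asset block exactly, $C^r_t = B_t C^f_t B_t' + C^e_t$, and $C^e_t$ inherits positive definiteness from $C_t$.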

Model equations
We take the integrated components $\Sigma^f_t$, $\Sigma^e_t$ and $\mathcal{B}_t$ as latent state variables observed through their noisy realized counterparts $C^f_t$, $C^e_t$ and $B_t$, and model them by a state-space approach with independent measurement densities $f(C^f_t \mid \Sigma^f_t)$, $f(C^e_t \mid \Sigma^e_t)$ and $f(B_t \mid \mathcal{B}_t)$, and corresponding transition equations.
Before specifying the respective measurements for the individual parts, we make two additional assumptions on $B_t$ and $C^e_t$. First, we assume row-wise independence of the betas, i.e., $\mathrm{Cov}(b_{it}, b_{jt}) = 0$ for all $i \neq j$, $i, j = 1, \ldots, p$, where $b_{it}$ is a vector containing the entries of the $i$th row of $B_t$. This assumption is comparably mild as, e.g., Brito et al. (2018) assume element-wise independence for the entries in $B_t$. Second, we impose the following block structure for the residual covariance
$$C^e_t = \begin{pmatrix} C^e_{1t} & * & \cdots & * \\ * & C^e_{2t} & & \vdots \\ \vdots & & \ddots & * \\ * & \cdots & * & C^e_{St} \end{pmatrix}, \qquad (4)$$
where the $S$ diagonal blocks are of size $p_i \times p_i$, $i = 1, \ldots, S$. The asterisk '$*$' indicates residual covariances which are economically insignificant and are thus ignored. 3 The structure in Eq. (4) is supported by empirical evidence in many studies when the number of factors is chosen large enough; see, e.g., Fan et al. (2016), Aït-Sahalia and Xiu (2017), Brito et al. (2018), Gribisch et al. (2020). Expecting the significant entries to cluster in blocks around the diagonal might appear arbitrary. However, by rearranging the rows and columns of $C^r_t$ such a block structure can almost always be achieved (at least approximately). As it is now common in the literature to model $\Sigma^e_t$ as a sparse matrix (e.g., Sheppard and Xu 2019 assume $\Sigma^e_t$ to be strictly diagonal and only use residual variance information in a one-factor setting), we impose a block-diagonal structure and assume that the integrated residual covariance $\Sigma^e_t$ is sufficiently measured by the marginals $f(C^e_{1t} \mid \Sigma^e_{1t}), \ldots, f(C^e_{St} \mid \Sigma^e_{St})$. Next, we specify the measurement distributions for the different components, which read as follows:
$$C^f_t \mid \Sigma^f_t \sim W_q\big(n^f, \Sigma^f_t / n^f\big), \quad C^e_{it} \mid \Sigma^e_{it} \sim W_{p_i}\big(n^e_i, \Sigma^e_{it} / n^e_i\big), \quad b_{it} \mid \beta_{it}, \Sigma^b_{it} \sim N_q\big(\beta_{it}, \Sigma^b_{it}\big), \qquad (5)$$
where $W_m(d, s)$ denotes an $m$-dimensional Wishart distribution with $d > m - 1$ degrees of freedom and scale matrix $s$, and $N_m(a, b)$ denotes an $m$-dimensional Normal distribution with mean $a$ and covariance $b$.
The latter choice is motivated by the fact that similar (marginal) distributions result from the assumption of a joint conditional Wishart distribution for $C_t$ in (1), which can be seen as a natural candidate for the measurement (cf. Gribisch et al. 2020; see Philipov and Glickman 2006; Golosnoy et al. 2012; Noureldin et al. 2012; Bauwens et al. 2016, for applications of the Wishart to realized covariances). In order to complete the composite factor state-space (CFSS) model, we specify the transition densities for the latent time-varying integrated factor and residual covariances following Windle and Carvalho (2014), combining the measurements in Eq. (5) with a generalized Matrix-Beta type-I transition for the integrated precisions, i.e., the inverses of the integrated covariances $\Sigma^f_t$ and $\{\Sigma^e_{it}\}$, which implies a shifted Matrix-F transition for $\Sigma^f_t$ and $\{\Sigma^e_{it}\}$ themselves (see "Appendix A.2" for a derivation). In the vein of Windle and Carvalho (2014) we refer to this model as the Uhlig Extension (UE), as it depicts a generalization of the process originally proposed in Uhlig (1994, 1997). The UE model comes with three major advantages. First, irrespective of the dimension of the underlying covariance parts, it is parsimoniously parameterized with only three parameters. Parsimony has proven to be particularly advantageous in predicting high-dimensional covariance matrices (see, e.g., Bauwens et al. 2016). Second, it yields positive definite (p.d.) forecasts for $C^f_t$ and $C^e_t$ by construction. Hence, there is no need to resort to any transformation like the Cholesky decomposition, which depends on the ordering of assets in $C_t$, or the matrix-logarithmic transformation (Bauer and Vorkink 2011), which may induce severe biases in the predictions $C^r_{t+1}$ through the exponential re-transformation.
Third, and most important, for the UE model we can exploit the conjugacy between the Wishart and the Matrix-Beta distribution to obtain closed-form filtering formulas for tracking the latent states of the system, allowing the model parameters to be estimated by maximum likelihood in one step. (Details are given in the following section.) In the UE model for the factor covariance matrix, the $q \times q$ symmetric, p.d. integrated precision $\Lambda^f_t = (\Sigma^f_t)^{-1}$ evolves according to the multiplicative transition
$$\Lambda^f_{t+1} = \frac{1}{\lambda^f} \, \mathcal{U}(\Lambda^f_t)' \, \Psi^f_t \, \mathcal{U}(\Lambda^f_t), \qquad \Psi^f_t \sim \mathcal{B}^{I}_q\big(k^f/2, \, n^f/2\big), \qquad (6)$$
where $\mathcal{U}(\cdot)$ denotes the upper Cholesky factor, the $\Psi^f_t$ are iid Matrix-Beta type-I shocks and $0 < \lambda^f < 1$ is a smoothing parameter, with conditional expectation given by
$$\mathrm{E}\big[\Lambda^f_{t+1} \mid \Lambda^f_t\big] = \frac{k^f}{\lambda^f (k^f + n^f)} \, \Lambda^f_t. \qquad (7)$$
The transitions for the residual precisions $\{(\Sigma^e_{it})^{-1}\}$ are specified analogously with parameters $k^e_i$, $n^e_i$ and $\lambda^e_i$. For the factor loadings $\{b_{it}\}$ we use a local-level version of the time-varying parameter VAR with stochastic volatility proposed in Moura and Noriller (2019). This model assumes the latent $\beta$s to follow heteroscedastic random walks and uses a Wishart process for modeling the volatility of the system. The latent $q \times 1$ vectors of factor loadings $\beta_{it}$, $i = 1, \ldots, p$, evolve as driftless random walk processes of the form
$$\beta_{it} = \beta_{it-1} + \eta_{it}, \qquad \eta_{it} \sim N_q\big(0, \, \sigma_i^{-1} \Sigma^b_{it}\big), \qquad (8)$$
where the transition is allowed to be affected by multivariate stochastic volatility shocks via $\Sigma^b_{it}$, scaled by $1/\sigma_i$. Similar to the UE model, the evolution of $\Lambda^b_{it} = (\Sigma^b_{it})^{-1}$ is specified by the multiplicative transition
$$\Lambda^b_{it} = \frac{1}{\lambda^b_i} \, \mathcal{U}(\Lambda^b_{it-1})' \, \Psi^b_{it} \, \mathcal{U}(\Lambda^b_{it-1}), \qquad (9)$$
with fixed initial condition. Here, the $\Psi^b_{it}$ are $q \times q$ iid singular Beta type-I shocks to $\Lambda^b_{it}$, with a single d.o.f. parameter $k^b_i$ governing the time-variation in the precision, given a predetermined value of $0 < \lambda^b_i < 1$. Again, exploiting the conjugacy between the Normal, Wishart and Matrix-Beta distributions, analytical expressions for the filtering distribution of the loadings and the likelihood function can be found, enabling fast and straightforward estimation of the unknown model parameters. It is worth noting that for $k^b_i \to \infty$ the nonlinear state-space model in Eqs. (5), (8) and (9) collapses to a homoscedastic version (Kim 2014; Moura and Noriller 2019). For $1/\sigma_i \to 0$ the latent factor loadings are time-invariant.
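The multiplicative precision transition of the UE model can be simulated using the classical construction of a Matrix-Beta type-I draw from two independent Wisharts. The following sketch assumes the parameterization $\Lambda_{t+1} = \mathcal{U}(\Lambda_t)' \Psi_t \mathcal{U}(\Lambda_t) / \lambda$ with $\Psi_t \sim \mathcal{B}^I_q(k/2, n/2)$ as reconstructed above; function and variable names are illustrative:

```python
import numpy as np

def ue_precision_step(Lam, n, k, lam, rng):
    """One UE precision transition (sketch of our reading of the model).
    A matrix-Beta(k/2, n/2) shock Psi is built from A ~ W_q(k, I) and
    Bw ~ W_q(n, I) as Psi = L^{-1} A L^{-T}, with L the Cholesky factor
    of A + Bw; the shock is then conjugated with the Cholesky factor of
    the current precision and rescaled by 1/lam."""
    q = Lam.shape[0]
    U = np.linalg.cholesky(Lam).T          # upper factor: Lam = U'U
    A = rng.standard_normal((q, k))
    A = A @ A.T                            # ~ W_q(k, I_q), integer d.o.f.
    Bw = rng.standard_normal((q, n))
    Bw = Bw @ Bw.T                         # ~ W_q(n, I_q)
    L = np.linalg.cholesky(A + Bw)
    Psi = np.linalg.solve(L, np.linalg.solve(L, A).T)   # L^{-1} A L^{-T}
    return (U.T @ Psi @ U) / lam
```

Since $\mathrm{E}[\Psi] = \tfrac{k}{k+n} I_q$, averaging many simulated steps started at the identity (with $\lambda = 1$) should recover $\tfrac{k}{k+n} I_q$, matching the conditional expectation stated in the text.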

Predictive distributions and likelihood estimation
This section describes the ML estimation of the CFSS model. The unknown model parameters are estimated by independently maximizing the distinct components of the conditional log-likelihood
$$\log L(\theta) = \log L^f(\theta^f) + \sum_{i=1}^{S} \log L^e_i(\theta^e_i) + \sum_{j=1}^{p} \log L^b_j(\theta^b_j),$$
where $L^f$, $L^e_i$ and $L^b_j$ denote the likelihood contributions of the factors, residual blocks and factor loadings, with parameters $\theta^f = (n^f, k^f, \lambda^f)$, $\theta^e_i = (n^e_i, k^e_i, \lambda^e_i)$ and $\theta^b_j = (\sigma_j, k^b_j, \lambda^b_j)$, respectively. 6 The notation $A_{s:\tau}$ is used to denote the collection $\{A_s, \ldots, A_\tau\}$.
The measurement distributions in Eq. (5) combined with the corresponding transition equations (6) to (9) constitute independent nonlinear state-space models for the realized quantities $C^f_t$, $\{C^e_{it}\}$ and $\{b_{jt}\}$. By exploiting the conjugacy between the Normal, Wishart and Matrix-Beta distributions, Windle and Carvalho (2014) and Moura and Noriller (2019) derived several useful propositions for the UE model and the TVP-VAR, including the forward filtering distributions for the latent precision matrices and factor loadings, which turn out to be Wishart and multivariate t, respectively. In addition, Windle and Carvalho (2014) propose a backward sampling scheme to efficiently draw the latent sequences $\Sigma^f_{1:T}$ and $\Sigma^e_{i,1:T}$ ($i = 1, \ldots, S$) in a single sweep within a Bayesian MCMC sampler. In the following we focus on their results enabling fast and easy-to-implement ML estimation of the model parameters, and the prediction of future realizations of $C^r_t$ in Eq. (2). Let $C^f_{1:t}$ capture the information on the realized factor covariance matrices from period 1 up to $t$. With the transition in Eq. (6) and the corresponding initial condition, the predictive distribution $f(C^f_{t+1} \mid C^f_{1:t})$ in Eq. (11) is of Matrix-F type (see also Opschoor et al. 2017). The conditional first moment that is used to generate one-step forecasts for $C^f_{t+1}$ is proportional to the filtered scale matrix, 7
$$\mathrm{E}\big[C^f_{t+1} \mid C^f_{1:t}\big] \propto S^f_t, \qquad S^f_t = \lambda^f S^f_{t-1} + n^f C^f_t, \qquad (12)$$
where solving the recursion for $S^f_t$ yields
$$S^f_t = n^f \sum_{j=0}^{t-1} (\lambda^f)^j \, C^f_{t-j} + (\lambda^f)^t S^f_0, \qquad (13)$$
implying that the forecast of $C^f_{t+1}$ will be a scaled, geometrically weighted sum of the previous observations. From Eq. (13) one can see that $\lambda^f$ controls the degree of smoothing of the observations. This is crucial when forming estimates and one-step-ahead predictions (cf. Windle and Carvalho 2014). 8 Based on Eq. (11), the likelihood function for the factor part obtains as the product of the one-step predictive densities, $L^f = \prod_t f(C^f_{t+1} \mid C^f_{1:t})$. 9 Analogous results hold for the predictive distributions $f(C^e_{it+1} \mid C^e_{i,1:t})$ and likelihoods $L^e_i$ of the distinct residual covariance blocks, and are left out here for reasons of space.
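The filtering recursion for the scale matrix and the resulting geometrically weighted forecast can be sketched as follows (function names are illustrative; the proportionality constant of the predictive mean, which depends on the d.o.f. parameters, is omitted):

```python
import numpy as np

def ue_forecast_scale(C_list, n, lam, S0):
    """Run the UE scale recursion S_t = lam * S_{t-1} + n * C_t over a list
    of realized covariance matrices; the one-step predictive mean is
    proportional to the final S_t."""
    S = S0.copy()
    for C in C_list:
        S = lam * S + n * C
    return S

def ewma_forecast(C_list, lam):
    """One-step forecast as an exponentially weighted moving average of past
    realized covariances, the form the predictive mean takes under the
    restriction (R1) discussed below."""
    weights = [(1 - lam) * lam**j for j in range(len(C_list))]
    return sum(w * C for w, C in zip(weights, reversed(C_list)))
```

Unrolling the recursion confirms the closed form of Eq. (13): $S^f_t$ equals $n^f$ times the geometrically weighted sum of past observations plus the geometrically discounted initial condition.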
7 A derivation of the conditional second moment, i.e., $\mathrm{Cov}[\mathrm{vec}(C^f_{t+1})]$, is given in "Appendix A.1". 8 In Sect. 3.2 we propose several restrictions on the smoothing parameter, directly relating it to the shape of the predictive distribution. 9 The likelihood function is based on $S^f_0$, which could already be challenging to estimate in small to medium dimensional applications. To circumvent this problem, we apply the approach proposed by Windle and Carvalho (2014), i.e., we set aside an initial stretch of observations to form $S^f_0$ and estimate the remaining model parameters on the rest of the sample.

Now, let $b_{i,1:t}$ capture the information on the realized factor loadings from period 1 up to $t$, for $i = 1, \ldots, p$. The predictive density of $b_{it+1}$, Eq. (14), is that of a scaled multivariate t distribution (see Moura and Noriller 2019, Corollary 1), 10 with location given by the filtered mean of the loadings and scale determined by the filtered quantities $S^b_{it}$ and $N_{it}$. The initial conditions are fixed parameters. 11 The predictive expectation of the factor loadings is given by
$$\mathrm{E}\big[b_{it+1} \mid b_{i,1:t}\big] = \hat{\beta}_{it}, \qquad (19)$$
the filtered mean of $\beta_{it}$, as implied by the random walk transition in Eq. (8). Again, the likelihood function $L^b_i$ is obtained as the product of the predictive densities in Eq. (14).

Parameter restrictions
Our proposed CFSS model as defined by Eqs. (5) to (9) includes $3 \times (1 + S + p)$ parameters, the majority of which relates to the covariance dynamics of the factor loadings. In order to further increase the parsimony of our approach we impose commonly used restrictions on the d.o.f. parameters of the loading processes and on the smoothing parameters.
In initial investigations of the realized factor loadings in scenarios with different risk factors, we found the estimates of $k^b_i$ to be similar across the asset dimension $p$. Moreover, we found that minor changes in the d.o.f. parameters have no noteworthy effect on the predictive ability regarding mean forecasts for $b_{it+1}$. Hence, we preset $k^b_i$ in the remainder of this paper. Similar restrictions are indeed imposed by Uhlig (1997) and Kim (2014) in the context of constant-coefficient VARs (see also Moura and Noriller 2019). Kim (2014) suggests choosing values $k^b_i \in [10, 25]$, which mirrors our findings for $\hat{k}^b_i$. Consequently, we adopt this range in the empirical analysis. 12 Given preset values of $k^b_i$, it seems reasonable to also tie the smoothing parameters to the d.o.f. parameters rather than estimating them freely. In addition, we consider three different restrictions on the smoothing parameters $\lambda^f$ and $\{\lambda^e_i\}$. They are all considered as distinct model variants and their predictive performance is analyzed in the empirical application below. Each of them has been found to be valuable on its own. Note that all restrictions imply $\lambda^f, \{\lambda^e_i\} \in (0, 1)$ and that they can straightforwardly be imposed during the estimation of the degree-of-freedom parameters. Here they are exemplified for the factor part; similar restrictions are later imposed on the residual blocks as well.

10 Note a typo in the definition of the scale of the multivariate t in Moura and Noriller (2019). The error has been fixed here. 11 We set $N_{i1} = 1$ and $S^b_{i1} = I_q$.
The first restriction was originally proposed by Windle and Carvalho (2014) and is given by
$$\lambda^f = \frac{k^f - q - 1}{n^f + k^f - q - 1}. \qquad \text{(R1)}$$
It implies that the one-step-ahead forecast for the realized factor (residual) covariance based on (12) is obtained as the exponentially weighted moving average (EWMA)
$$\hat{C}^f_{t+1} = (1 - \lambda^f) \sum_{j \geq 0} (\lambda^f)^j \, C^f_{t-j},$$
which is known to deliver decent (short-term) predictive performance. 13 Notably, this restriction also results in a martingale for the evolution of the latent integrated factor (residual) covariance matrices, i.e., $\mathrm{E}[\Sigma^f_{t+1} \mid \Sigma^f_t] = \Sigma^f_t$. The second restriction is chosen to induce a random walk behavior for the latent integrated precision matrices in Eq. (6),
$$\lambda^f = \frac{k^f}{n^f + k^f}, \qquad \text{(R2)}$$
so that $\mathrm{E}[\Lambda^f_{t+1} \mid \Lambda^f_t] = \Lambda^f_t$. A similar restriction has been imposed by Uhlig (1997) in the context of singular Matrix-Beta transitions for the latent precision. In the context of stochastic covariance modeling for daily return series, Moura et al. (2020) found (R2) to perform well in out-of-sample portfolio allocations. 12 The exact restrictions for $k^b_i$ are postponed to Sect. 5. 13 Note that under restriction (R1) the prediction for $C^f_{t+1}$ is similar to the one implied by the RiskMetrics methodology (Morgan 1996). However, in the latter the smoothing parameter is set arbitrarily (to 0.94 for daily data), whilst in the former $\lambda$ is determined by the distributional shape mirrored in the degree-of-freedom parameters $n^f$ and $k^f$. The third restriction traces back to the work of Shephard (1994), where in a univariate modeling framework akin to Uhlig (1997) the smoothing parameter is chosen as the geometric mean of the latent beta shocks in the precisions' transition equations. Shephard (1994) argues that this restriction rules out the case of the precision collapsing to zero as $t \to \infty$. Here, we propose a multivariate extension of this restriction, which takes the form
$$\lambda^f = \exp\left\{ \frac{1}{q} \left[ \psi_q\big(k^f/2\big) - \psi_q\big((n^f + k^f)/2\big) \right] \right\}, \qquad \text{(R3)}$$
where $\psi_q(\cdot)$ denotes the multivariate digamma function (cf. Abramowitz and Stegun 1972; Gupta and Nagar 2000). Restriction (R3) results in a random walk for the log-determinant process of the integrated precision matrices.
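The three restrictions map the d.o.f. parameters into a smoothing parameter in $(0, 1)$. The sketch below uses closed forms that are our reconstruction, consistent with the stated properties (EWMA forecast, precision random walk, log-determinant random walk); the multivariate digamma is approximated by a central difference of the log-gamma to stay dependency-free:

```python
import numpy as np
from math import lgamma

def _digamma(x, h=1e-5):
    # digamma via central difference of log-gamma; accurate enough here
    return (lgamma(x + h) - lgamma(x - h)) / (2 * h)

def mv_digamma(a, q):
    """Multivariate digamma: sum_{i=1}^q digamma(a - (i - 1) / 2)."""
    return sum(_digamma(a - 0.5 * i) for i in range(q))

def smoothing_lambda(n, k, q, restriction):
    """Smoothing parameter implied by (R1)-(R3) as reconstructed here;
    the paper's exact parameterization may differ."""
    if restriction == "R1":   # EWMA forecast / covariance martingale
        return (k - q - 1) / (n + k - q - 1)
    if restriction == "R2":   # random walk for the latent precision
        return k / (n + k)
    if restriction == "R3":   # random walk for log|precision|
        return float(np.exp((mv_digamma(k / 2, q) - mv_digamma((n + k) / 2, q)) / q))
    raise ValueError(restriction)
```

For any admissible $(n, k, q)$ all three values lie strictly between zero and one, and (R1) is always below (R2), in line with the ordering visible in the contour plots discussed next.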
Figure 1 visualizes the restrictions on the smoothing parameter for an exemplary scenario of 25 assets, which corresponds to the average sector size in the empirical application. Panels (a)-(c) show contour plots of (R1), (R2) and (R3) as functions of the d.o.f. parameters: all three restrictions approach one (zero) for $n$ fixed at its lower bound and $k \to \infty$ ($k$ fixed at its lower bound and $n \to \infty$). Hence, the less fat-tailed and noisier the observations are, the more the model smooths, and vice versa. Panels (d)-(f) show contour plots of the differences between the ML estimate for $\lambda$ (when fixing $n$ and $k$) and the respective restrictions (R1), (R2) and (R3). The black circles mark the corresponding unrestricted ML estimates of the d.o.f. parameters, i.e., $\hat{n} \approx 89$, $\hat{k} \approx 175$. The unrestricted MLE for the smoothing parameter is $\hat{\lambda} = 0.63$ with a standard deviation of 0.017. We can see that (R1) and (R3) tend to be smaller than the MLE, whereas (R2) exceeds the MLE slightly, though all differences lie within one standard deviation. The performance of (R1) and (R3) deteriorates if both d.o.f. tend to their lower bounds. However, it is common in practice to impose these restrictions when estimating the d.o.f. in order to reduce estimation uncertainty.
Imposing the restrictions stated above reduces the number of parameters to be estimated to 2 × (1 + S) + p. This further increases the parsimony but preserves the flexibility of the model at hand and allows for very fast estimation of the unknown model parameters based on numerical optimization routines. 14

Composite forecasting
Exploiting the decomposition of the realized asset covariance in Eq. (2), we write the forecasting equation for the realized asset covariance $C^r_{t+1}$ by combining the distinct forecasts for $C^f_{t+1}$, $\{C^e_{it+1}\}$ and $\{b_{it+1}\}$, i.e.,
$$\hat{C}^r_{t+1} = \hat{B}_{t+1} \hat{C}^f_{t+1} \hat{B}_{t+1}' + \hat{C}^e_{t+1}, \qquad (20)$$
where $\hat{B}_{t+1}$, $\hat{C}^f_{t+1}$ and $\hat{C}^e_{t+1}$ denote the conditional expectations given information up to $t$, as obtained from the respective predictive moments in (12) and (19). Since the predictions of the factor and residual covariances are p.d., the forecast of the asset covariance will be p.d. by construction as well.
The composite forecasting procedure in Eq. (20) relies on a simple plug-in approach and hence ignores nonlinearities in the term $B_{t+1} C^f_{t+1} B_{t+1}'$ when predicting $C^r_{t+1}$. As pointed out by an anonymous referee, a more general predictor could be obtained by combining the separate forecasts through the optimization of some, possibly strictly consistent, loss function for the assessment of forecasting performance. For example, the two components on the RHS of Eq. (20) could be rescaled by means of scalar coefficients estimated by minimizing the Frobenius norm of the in-sample forecast errors. However, we do not consider such a calibrated predictor in the present paper. 15
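The plug-in recomposition of Eq. (20) can be sketched as follows, assembling the residual forecast block-diagonally from the per-sector predictions (function names are illustrative):

```python
import numpy as np

def block_diag(blocks):
    """Assemble a block-diagonal matrix from a list of square blocks."""
    p = sum(b.shape[0] for b in blocks)
    out = np.zeros((p, p))
    i = 0
    for b in blocks:
        m = b.shape[0]
        out[i:i + m, i:i + m] = b
        i += m
    return out

def composite_forecast(B_hat, Cf_hat, Ce_blocks):
    """Plug-in composite forecast of the asset covariance:
    C_r = B Cf B' + C_e, as in Eq. (20)."""
    return B_hat @ Cf_hat @ B_hat.T + block_diag(Ce_blocks)
```

Positive definiteness of the result follows directly: $\hat{B}\hat{C}^f\hat{B}'$ is positive semi-definite whenever $\hat{C}^f$ is p.d., and adding the p.d. block-diagonal residual forecast keeps the sum p.d.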

Data
The given data set includes and expands the 60-dimensional data of Gribisch et al. (2020). It consists of 1510 daily observations for 225 stocks traded at the New York Stock Exchange and 12 risk factors, observed between January 3, 2007, and December 31, 2012, comprising a total of 28,203 time series of realized variances and covariances. The stocks are selected by liquidity from the S&P 500 index and are sorted by their sector and industry classification according to the Global Industry Classification Standard (GICS). For the observed factors we use the market, high-minus-low book-to-market (HML) and small-minus-big market capitalization (SMB) factors as in the Fama and French (1993) three-factor model. Additionally, we consider the sector-specific Spyder Exchange-Traded Funds (SPDR ETFs) for the nine sectors covered by the 225 stocks (Fan et al. 2016; Aït-Sahalia and Xiu 2017; Gribisch et al. 2020). A brief summary of the sector sizes along with the sector-specific ETF tickers and some descriptive statistics is given in Table 1.
The daily realized covariance matrices $C_t$ are computed using the composite realized kernel method of Lunde et al. (2016) based on 5-minute returns for the Fama-French factors and 1-minute returns for the assets as well as the SPDR ETFs (see Barndorff-Nielsen et al. 2011; Lunde et al. 2016, for further details). Given the joint realized covariance matrices for the assets and factors, we compute the realized residual covariance matrices $C^e_t$ and the realized factor loadings $B_t$ according to Eq. (3), which represent estimates of the integrated residual covariance matrices and loadings, for various sets of factors. Figure 2 shows time-series plots of the realized variances for the three Fama-French factors, as well as sector-wise averages of realized asset variances, realized factor loadings and realized residual variances for the industrial, health care and financial sectors.
In Fig. 3 we analyze the sparsity pattern of the realized residual covariance matrices. Following Aït-Sahalia and Xiu (2017) and Brito et al. (2018) we determine the economically significant entries of the residual correlation, i.e., we flag each entry with a minimum absolute correlation of 0.15 for at least 1/3 of the sample period. Panel (a) displays the results for the realized asset covariance matrices. Panels (b)-(d) display the results for the realized residual components after removing the common covariation driven by the market factor only, the three Fama-French factors, and the Fama-French factors plus nine sector-specific factors based on the ETFs, respectively. The analysis shows that using only one or three factors still yields very dense residual correlation patterns, indicating not only a strong rejection of the strict diagonality assumption, but also of the block-diagonality based on industry sectors (highlighted as black rectangles). However, by inclusion of the sector ETFs almost all of the significant inter-sectoral and most of the intra-sectoral correlations vanish, leaving behind a somewhat mixed pattern. This could indicate that a block-diagonal specification might model irrelevant correlations whereas a strict diagonal specification still ignores important information.
Both the diagonal and block-diagonal sparsity restrictions are imposed ex ante and are not changed during the analysis. In order to investigate whether the sparsity pattern is time-varying we plot the residual pattern for the 12 factor case at different points in time in panel (e). The results indicate that the pattern is rather constant over time.
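The screening rule used for Fig. 3 (flag an entry if its absolute correlation exceeds 0.15 in at least one third of the sample) reduces to a simple time-average of exceedance indicators; a minimal sketch, with illustrative names:

```python
import numpy as np

def significant_entries(corr_series, threshold=0.15, min_frac=1 / 3):
    """Flag residual correlations exceeding `threshold` in absolute value
    for at least `min_frac` of the sample. `corr_series` has shape
    (T, p, p); returns a boolean (p, p) significance mask."""
    exceed = np.abs(corr_series) >= threshold   # (T, p, p) exceedance indicators
    return exceed.mean(axis=0) >= min_frac      # fraction of days per entry
```

Applying the mask at different sub-periods, as in panel (e), gives a direct check of whether the sparsity pattern is stable over time.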

Out-of-sample forecasting analysis
This section reports the out-of-sample forecasting results. Selected in-sample parameter estimation results for the proposed CFSS model and robustness checks are presented in the Supplementary Appendix.

Implementation
We analyze the out-of-sample performance of three different factor structures: a 1-factor model with the market factor only (1F), a 3-factor model based on the Fama-French factors (3F), and a 12-factor model including the Fama-French factors plus the nine sector-specific ETFs (12F). For both the realized factor and residual covariance matrices we fit the unrestricted UE model (R0) and compare its performance to the restricted models (R1), (R2) and (R3) within the CFSS framework. The d.o.f. parameters k_{b_i} for the stochastic volatility process of the (latent) betas are fixed to their cross-sectional median values based on the in-sample estimates. For completeness, we consider diagonal (D) and block-diagonal residual assumptions based on the GICS classification (S) for each of the factor structures, although this is only (approximately) supported under the 12-factor specification. The diagonal specification sets S = p and p_i = 1 for all i; the block-diagonal specification sets S = 9 with sector sizes as given in Table 1.

Motivated by the findings of Fig. 3, we also consider post-prediction shrinkage of the sector blocks toward their diagonal. We use a linear shrinkage (LS) approach akin to Ledoit and Wolf (2004). The LS prediction reads as

\hat{C}^{LS}_{e_i,t+h} = (1 - \alpha_i)\,\hat{C}_{e_i,t+h} + \alpha_i\,\big(I_{p_i} \circ \hat{C}_{e_i,t+h}\big),

where \circ denotes the Hadamard product (see Lütkepohl 1996, p. 3). The optimal shrinkage intensity \alpha_i is found by minimizing the expected Frobenius distance between the in-sample LS prediction and the realized residual blocks.

The first three years of the data set are used as in-sample period and the remainder as out-of-sample period. We employ a rolling window of three years to obtain one-day (h = 1), one-week (h = 5), two-week (h = 10) and one-month (h = 22) ahead predictions. The daily forecasts are obtained by re-estimating each model on a daily basis and forming predictions via Eq. (20).
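The block-toward-diagonal shrinkage and a search for its intensity can be sketched as follows. This is a minimal sketch assuming lists of predicted and realized residual blocks; the function names and the grid search (as a stand-in for the paper's minimizer of the expected Frobenius distance) are our own:

```python
import numpy as np

def ls_shrink(block_pred, alpha):
    """Linearly shrink a predicted residual sector block toward its
    diagonal: C_LS = (1 - alpha) * C_hat + alpha * (I o C_hat),
    where 'o' is the Hadamard product with the identity."""
    diag_part = np.diag(np.diag(block_pred))
    return (1.0 - alpha) * block_pred + alpha * diag_part

def optimal_alpha(preds, realized):
    """Grid-search the shrinkage intensity minimizing the average squared
    Frobenius distance between shrunk in-sample predictions and the
    realized residual blocks (illustrative stand-in for the paper's
    optimal intensity)."""
    grid = np.linspace(0.0, 1.0, 101)
    losses = [np.mean([np.linalg.norm(ls_shrink(P, a) - R, 'fro') ** 2
                       for P, R in zip(preds, realized)]) for a in grid]
    return grid[int(np.argmin(losses))]
```

When the realized blocks are (nearly) diagonal, the estimated intensity moves toward one, recovering the strictly diagonal specification as a limiting case.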
For the multi-step ahead forecasts we rely on a direct forecasting approach, i.e., we use a horizon-specific estimated model in which the dependent variable is the multi-period ahead value being predicted (Marcellino et al. 2006). To construct the multi-period realized measures we aggregate the realized components of C^r_t using non-overlapping h-day windows.
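The aggregation into non-overlapping h-day windows can be sketched as follows; the function name and the (T, p, p) array layout are our own conventions:

```python
import numpy as np

def aggregate_h_day(cov_series, h):
    """Aggregate a series of daily realized covariance matrices into
    non-overlapping h-day sums, as used to build the multi-period
    targets for the direct forecasting approach."""
    cov_series = np.asarray(cov_series)
    T = cov_series.shape[0]
    n_blocks = T // h                    # drop an incomplete trailing window
    trimmed = cov_series[:n_blocks * h]
    return trimmed.reshape(n_blocks, h, *cov_series.shape[1:]).sum(axis=1)
```

Summing the daily matrices preserves positive semi-definiteness of the aggregated realized measures.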
The sample period includes the financial crisis of 2008 as well as the flash crashes of 2010 and 2011. These events lead to outliers in the (co)variance time-series which could distort parameter estimates and affect out-of-sample predictions. In order to mitigate the effect of these comparably extreme events it is by now conventional to perform a slight ex-post cleaning of the realized covariance matrices (Callot et al. 2017; Brito et al. 2018). To clean the estimation sample we rely on the method proposed by Callot et al. (2017): each day on which at least 25% of the unique elements of the realized covariance matrix exceed their sample mean up to that day by more than four standard errors is flagged for censoring. The flagged matrices are then replaced by the sample average of their ten nearest non-flagged neighbors. In total 22 covariance matrices are flagged, corresponding to only 1.46% of the whole sample.
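The flagging step of this cleaning rule can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the running standard deviation is used as a simple stand-in for the standard error, and the `burn_in` parameter (to stabilize the running moments) is our own assumption:

```python
import numpy as np

def flag_outlier_days(cov_series, frac=0.25, n_se=4.0, burn_in=50):
    """Flag days on which at least `frac` of the unique covariance
    elements exceed their running sample mean by more than `n_se`
    running standard deviations (a Callot et al. 2017 style rule)."""
    cov_series = np.asarray(cov_series)
    T, p, _ = cov_series.shape
    iu = np.triu_indices(p)                      # unique elements of a symmetric matrix
    X = cov_series[:, iu[0], iu[1]]              # shape (T, p*(p+1)/2)
    flags = np.zeros(T, dtype=bool)
    for t in range(burn_in, T):
        mu = X[:t].mean(axis=0)
        sd = X[:t].std(axis=0)
        flags[t] = np.mean(X[t] > mu + n_se * sd) >= frac
    return flags
```

Flagged matrices would then be replaced by the average of their nearest non-flagged neighbors, as described above.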
It is worth noting that the fraction of flagged matrices can vary with the particular realized covariance estimator used. More sophisticated and noise-robust realized covariance estimators could mitigate the need for ex-post data cleaning (see, e.g., Fan and Kim 2018). A thorough investigation of this issue, however, goes beyond the scope of the paper.

Competing models
In order to compare the out-of-sample forecasting performance of the proposed composite factor state-space model we consider five alternative state-of-the-art forecasting approaches as benchmarks: (1) the Factor LASSO approach of Brito et al. (2018), (2) the Factor HEAVY (FHEAVY) model of Sheppard and Xu (2019), (3) the Realized consistent DCC (Re-cDCC) model of Bauwens et al. (2016), (4) the Exponentially Weighted Moving Average (EWMA; Morgan 1996), and (5) the random walk (RW) model. Note that the direct forecasting approach used for the multi-step predictions is known to be more robust than the iterated approach, as it is less prone to error propagation, which is especially relevant under model misspecification (Andersen et al. 2003; Chiriac and Voev 2011; Bollerslev et al. 2018; Luo and Chen 2020).
While the Factor LASSO and HEAVY models can be seen as natural competitors, the (non-factor) Re-cDCC is known to be a hard-to-beat benchmark when predicting realized asset covariance matrices. The EWMA and RW models represent standard industry practice. Models (1) and (2) use the decomposition in Eq. (20) to predict C^r_{t+1}, while models (3) to (5) are applied directly to the series of realized asset covariance matrices.
The LASSO (and also the EWMA and RW) approach can be implemented as described in Brito et al. (2018). To implement the HEAVY and Re-cDCC models, however, minor restrictions are required for the high-dimensional asset (and factor) setting considered here. For the factor part in the multi-factor HEAVY model we implement a scalar version of the CAW model (see Golosnoy et al. 2012) and apply a covariance targeting approach (see, e.g., Noureldin et al. 2012) to get rid of the constant part. For the Re-cDCC model we consider the scalar specification for the underlying correlation dynamics. Potential unreliability in the parameter estimation, arising from numerically computing determinants and inverting high-dimensional matrices, is circumvented by further restricting the correlation part using the dynamic equicorrelation (DECO) approach, in which both the determinant and the inverse of the correlation matrix are given by closed-form expressions (see also Engle and Kelly 2012). The resulting model (sRe-cDECO hereafter) is advocated by Bauwens et al. (2016) as a valuable competitor when handling vast-dimensional systems.
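The closed-form expressions exploited by DECO are easy to verify numerically. The sketch below (function name and array conventions are our own) computes the determinant and inverse of a p-dimensional equicorrelation matrix:

```python
import numpy as np

def deco_det_inv(rho, p):
    """Closed-form determinant and inverse of the p-dimensional
    equicorrelation matrix R = (1 - rho) * I + rho * ones((p, p)),
    as exploited by the DECO model (Engle and Kelly 2012):
    det(R) = (1 - rho)^(p-1) * (1 + (p - 1) * rho),
    R^{-1} = [I - rho / (1 + (p - 1) * rho) * ones] / (1 - rho)."""
    det = (1.0 - rho) ** (p - 1) * (1.0 + (p - 1) * rho)
    inv = (np.eye(p) - rho / (1.0 + (p - 1) * rho) * np.ones((p, p))) / (1.0 - rho)
    return det, inv
```

Because no numerical factorization of a p-dimensional matrix is needed, likelihood evaluation remains cheap and stable even for several hundred assets.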

Statistical forecast evaluation
We assess the accuracy of the direct h-step-ahead point forecasts with two types of statistical loss functions. First, we consider the root mean squared error (RMSE) based on the Frobenius norm of the forecasting error, obtained by comparing the forecast \hat{C}^r_{t+h} with the ex-post observed value C^r_{t+h}. This RMSE is given by

\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\Big[\sum_{i=1}^{p}(\hat{c}_{it+h}-c_{it+h})^{2} + 2\sum_{i<j}(\hat{c}_{ijt+h}-c_{ijt+h})^{2}\Big]},

where c_{it+h} and c_{ijt+h} denote the realized variance of asset i and the realized covariance between assets i and j, respectively, \hat{c}_{it+h} and \hat{c}_{ijt+h} the corresponding forecasts, and T the number of forecasting periods. In order to see whether a model performs differently w.r.t. the different elements of the covariance matrix we follow Gribisch et al. (2020) and disentangle the RMSE from above into the two loss functions

\mathrm{RMSE}_v = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{p}(\hat{c}_{it+h}-c_{it+h})^{2}}, \qquad \mathrm{RMSE}_c = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\sum_{i<j}(\hat{c}_{ijt+h}-c_{ijt+h})^{2}},

where RMSE_v uses the variances only and RMSE_c considers the covariances. In addition to the RMSE losses we consider the commonly used QLIKE loss function,

\mathrm{QLIKE} = \frac{1}{T}\sum_{t=1}^{T}\Big[\log\big|\hat{C}^r_{t+h}\big| + \mathrm{tr}\big((\hat{C}^r_{t+h})^{-1} C^r_{t+h}\big)\Big].

While the RMSE losses are symmetric, the QLIKE loss is asymmetric, penalizing under-predictions more heavily (cf. Luo and Chen 2020). Both loss functions are known to be robust to noisy (co)variance proxies (Patton 2011; Laurent et al. 2013; Sheppard and Xu 2019). Note that, since our focus lies on forecasting the realized measures, we ignore the HEAVY-P equations which link the realized covariance forecast to the conditional (latent) return covariance (see Sheppard and Xu 2019).
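The RMSE, RMSE_v, RMSE_c and QLIKE losses can be sketched as follows; the (T, p, p) array layout and function names are our own, and the RMSE uses the squared Frobenius norm of the forecast error:

```python
import numpy as np

def rmse_losses(forecasts, realized):
    """RMSE based on the Frobenius norm of the forecast error, split into
    a variance-only part (RMSE_v) and a covariance-only part (RMSE_c).
    forecasts, realized : arrays of shape (T, p, p)."""
    err = np.asarray(forecasts) - np.asarray(realized)
    T, p, _ = err.shape
    iu = np.triu_indices(p, k=1)
    rmse = np.sqrt(np.mean([np.linalg.norm(e, 'fro') ** 2 for e in err]))
    rmse_v = np.sqrt(np.mean(err[:, range(p), range(p)] ** 2))
    rmse_c = np.sqrt(np.mean(err[:, iu[0], iu[1]] ** 2))
    return rmse, rmse_v, rmse_c

def qlike(forecasts, realized):
    """QLIKE loss: log|C_hat| + tr(C_hat^{-1} C), averaged over time."""
    vals = []
    for F, C in zip(forecasts, realized):
        sign, logdet = np.linalg.slogdet(F)
        vals.append(logdet + np.trace(np.linalg.solve(F, C)))
    return float(np.mean(vals))
```

Using `slogdet` and `solve` avoids explicitly forming the inverse and keeps the QLIKE evaluation numerically stable in high dimensions.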
In order to evaluate the statistical significance of differences in the RMSE, RMSE_v, RMSE_c and QLIKE losses across models, we employ the Model Confidence Set (MCS) approach of Hansen et al. (2011). The MCS is computed at the 75% confidence level using a block bootstrap with block length T^{1/3} and 10,000 bootstrap replications.

Table 2 reports the RMSE and QLIKE results for the short-, mid- and long-term out-of-sample forecasts. We first compare the performances of the different CFSS model specifications. For one-step and five-step ahead forecasts the 12-factor CFSS model clearly outperforms the single- and three-factor settings across all loss functions, regardless of the residual specification considered (though the losses suggest that the block-diagonal specifications S and LS do increase the forecast precision). The biweekly and monthly results reveal that the three-factor models with diagonal and linear-shrinkage-based residuals perform best in predicting long-term covariance matrices, especially for the covariance elements as indicated by RMSE_c. In terms of QLIKE loss, however, including sector-ETF-based factors still yields gains in forecast precision. Furthermore, we see that imposing restrictions on the smoothing parameter helps to increase predictive accuracy: while restriction (R2) convinces solely for the QLIKE measure, restriction (R1) performs well overall.

We now turn to a comparison with the nine competing models. Most striking is that for the 'full' RMSE loss all competitors are significantly outperformed by the CFSS model specifications. In particular, for daily and weekly horizons solely the UE-R1-LS specification belongs to the 75% MCS.19 Similar results are found for the QLIKE loss, where the UE-R2-LS specification significantly outperforms all competitors for daily forecasts. For multi-step ahead forecasts, however, the sRe-cDECO model and further CFSS specifications perform equally well.
The EWMA and RW models are significantly outperformed in any case. Further results, including additional loss functions and investigations of the model behavior under different market conditions (i.e., calm and volatile periods), are provided in the Supplementary Appendix. These robustness checks confirm the above findings.

Portfolio construction
In this section we turn to an economic application in out-of-sample portfolio construction. We analyze the forecasting performance of the CFSS model and its competitors based on several variations of the global minimum variance (GMV) and mean-variance optimal (MV) portfolios in the vein of Lunde et al. (2016) and Moura et al. (2020), imposing short-selling and boundary constraints on the weights to ensure stable and well-diversified portfolios.
For a given asset covariance forecast \hat{C}^r_{t+h} computed in period t, the GMV portfolio solves the minimization problem

\hat{w}_{t+h} = \arg\min_{w_{t+h}} \; w_{t+h}'\,\hat{C}^r_{t+h}\,w_{t+h} \quad \text{s.t.} \quad w_{t+h}'\iota = 1, \;\; \|w_{t+h}\|_1 \le 1 + 2s, \;\; \|w_{t+h}\|_\infty \le u, \qquad (25)

where w_{t+h} is the p × 1 vector of portfolio weights, ι is a p × 1 vector of ones, and \|w_{t+h}\|_1 = \sum_{i=1}^{p}|w_{it+h}| and \|w_{t+h}\|_\infty = \max_{1\le i\le p}|w_{it+h}| denote the L_1 and L_∞ norms of the weight vector, respectively. Here, s ∈ [0, 1] denotes the percentage of wealth that is allowed to be held short, and u > 0 bounds every individual position (see Fan et al. 2012; Lunde et al. 2016, for more detailed discussions). Since the constrained minimization problem in Eq. (25) has no closed-form solution, we rely on the CVX package for MATLAB of Grant and Boyd (2014) to solve for \hat{w}_{t+h}.20 To find the MV portfolio, Eq. (25) is augmented by the additional constraint w_{t+h}'m = μ_target, where m is a signal variable and μ_target is the target return. The signal is constructed using the momentum factor of Jegadeesh and Titman (1993): for each of the p stocks the individual momentum m_i is computed as the geometric average of the previous 252 returns, excluding the 22 most recent returns. Collecting the individual momenta in a vector yields the signal m. The target return is computed as the arithmetic average of the momentum of the stocks in the top momentum quintile (see also Engle et al. 2019; Moura et al. 2020).
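The momentum signal and target return can be sketched as follows. The function names, the use of log gross returns for the geometric average, and the (T, p) array layout are our own illustrative choices:

```python
import numpy as np

def momentum_signal(returns, lookback=252, skip=22):
    """Jegadeesh-Titman style momentum: for each stock, the geometric
    average of the previous `lookback` simple returns, excluding the
    `skip` most recent ones.  `returns` has shape (T, p)."""
    window = np.asarray(returns)[-lookback:-skip]        # drop the 22 most recent days
    return np.exp(np.mean(np.log1p(window), axis=0)) - 1.0

def target_return(m, quintile=0.2):
    """Arithmetic average of momentum over the top-quintile stocks."""
    k = max(1, int(np.ceil(quintile * m.size)))
    top = np.sort(m)[-k:]
    return float(np.mean(top))
```

The resulting vector m and scalar μ_target then enter the MV problem as the additional linear constraint described above.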
For assessing the relative capabilities of the competing models for optimal portfolio allocation, we calculate their out-of-sample portfolio returns r_{p,t} = \hat{w}_t' r_t and report five measures: the portfolio standard deviation

\hat{\sigma}_p = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(r_{p,t} - \bar{r}_p)^2}, \qquad (26)

the Sharpe ratio

\mathrm{SR} = \bar{r}_p / \hat{\sigma}_p, \qquad (27)

the average portfolio concentration

\mathrm{CO} = \frac{1}{T}\sum_{t=1}^{T}\Big(\sum_{i=1}^{p}\hat{w}_{it}^{2}\Big)^{1/2}, \qquad (28)

the average total short positions

\mathrm{SP} = \frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{p}|\hat{w}_{it}|\,\mathbb{1}\{\hat{w}_{it} < 0\}, \qquad (29)

and the average turnover rate

\mathrm{TO} = \frac{1}{T}\sum_{t=2}^{T}\sum_{i=1}^{p}\Big|\hat{w}_{it} - \hat{w}_{it-1}\frac{1 + r_{it-1}}{1 + r_{p,t-1}}\Big|. \qquad (30)

The reported portfolio standard deviations and Sharpe ratios are annualized by multiplication with \sqrt{252/h}. Statistical significance of differences in portfolio standard deviations is assessed by the MCS approach using the squared demeaned portfolio returns as loss series. For a more detailed discussion of the measures in Eqs. (28)-(30) the reader is referred to Hautsch et al. (2011), Bauwens et al. (2016) and Bollerslev et al. (2018).

Note that by dropping the norm constraints the minimization problem has the unique solution \hat{w}_{t+h} = (\hat{C}^r_{t+h})^{-1}\iota \,/\, \big(\iota'(\hat{C}^r_{t+h})^{-1}\iota\big), given \hat{C}^r_{t+h} is positive definite. In large-scale applications as considered here the covariance prediction may be ill conditioned, possibly resulting in unrealistically volatile weights due to the inversion of \hat{C}^r_{t+h}. For the factor model predictions based on Eq. (20) this problem is mitigated: due to the (block-)diagonality of the residual covariance prediction, the p-dimensional inverse problem collapses to max{q, p_1, ..., p_s} dimensions by making use of the Woodbury matrix inversion lemma.

In the present paper we follow Brito et al. (2018) and Lunde et al. (2016) and consider three restricted portfolio construction problems: a 150/50 portfolio (s = 0.50), a 130/30 portfolio (s = 0.30) and a long-only portfolio (s = 0.00). The latter portfolios are constructed using a boundary value of u = 0.10 for each weight, such that at least ten assets are included. In addition, unrestricted (G)MV portfolios are considered. We focus on daily allocations and consider weekly horizons as a robustness check.
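The dimension reduction via the Woodbury lemma can be illustrated for an unconstrained GMV allocation under a factor covariance prediction C = B C_f B' + D with diagonal D. This is a minimal sketch with illustrative names; the paper's predictions allow block-diagonal residuals as well:

```python
import numpy as np

def gmv_weights_woodbury(B, Cf, d):
    """Unconstrained GMV weights w = C^{-1} iota / (iota' C^{-1} iota)
    for C = B Cf B' + diag(d), inverting only a q-dimensional matrix via
    C^{-1} = D^{-1} - D^{-1} B (Cf^{-1} + B' D^{-1} B)^{-1} B' D^{-1}."""
    Dinv = np.diag(1.0 / d)
    core = np.linalg.inv(np.linalg.inv(Cf) + B.T @ Dinv @ B)  # only q x q inverses
    Cinv = Dinv - Dinv @ B @ core @ B.T @ Dinv
    iota = np.ones(B.shape[0])
    w = Cinv @ iota
    return w / (iota @ w)
```

For p = 225 assets and q = 12 factors this replaces a 225-dimensional inversion by 12-dimensional ones (plus trivial diagonal inverses), which is both faster and numerically more stable.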
The daily GMV portfolio allocation results are collected in Table 3. For the unrestricted portfolio the EWMA has the lowest standard deviation. However, the 1F-S and 3F-S CFSS models cannot be outperformed significantly while yielding higher Sharpe ratios and more diversified portfolios with less extreme short positions. Under the 50% short-selling constraint the performance of the 12F- and LS-based residual models improves, with comparably lower turnover for the CFSS models. Under the stronger (30% and 0%) short-selling constraints the portfolio standard deviations become hardly distinguishable; comparing the CFSS specifications to the LASSO and HEAVY competitors, however, we find better SR and TO results for the CFSS. Overall, the lowest turnover is obtained with the EWMA, while the highest turnover is obtained for the random walk, followed by the Factor LASSO specifications. The weekly portfolio results, summarized in Table 4, indicate similar allocations. The lowest standard deviations are now obtained by the sRe-cDECO and EWMA. However, the CFSS with block-diagonal residuals belongs to the MCS in every scenario, whereas the Factor HEAVY is only included for long-only portfolios.
The daily MV portfolio allocation results are collected in Table 5. The results are qualitatively similar to those obtained in the GMV exercise. For the unrestricted allocations the EWMA shows the lowest standard deviation. The second-best models included in the MCS are the CFSS 1F-R1-S, 1F-R2-S and 3F-R2-S specifications, respectively. Compared to the EWMA, all three have higher SR as well as lower CO and SP. Imposing short-selling constraints improves the performance of the 12F CFSS specifications, which attain lower standard deviations and turnover values than the 1F and 3F specifications. As a robustness check, Table 6 shows weekly allocation results, which do not differ substantially from the daily allocations.

Conclusion
In this paper we propose a composite factor state-space (CFSS) model for the prediction of vast-dimensional realized covariance matrices of asset returns. The model exploits an observed factor structure of the realized covariances of asset returns. Its components are obtained from a block LDL decomposition of the joint realized covariance matrix of the observed risk factors and assets. This yields individual time-series for the realized factor covariances, the realized residual covariances and the matrices of realized factor loadings. An assumption of independence admits a straightforward factorization of the likelihood function, reducing model complexity and allowing for fast parameter estimation and prediction of asset covariance matrices even in vast dimensions.
We apply the factor model to a data set of daily realized covariance matrices for 225 NYSE-traded stocks. The observed risk factors we consider are the CAPM market factor, the Fama and French (1993) HML and SMB factors as well as sector-specific ETFs. It turns out that including all of these factors is critical for justifying an approximate factor structure based on sector blocks for the residual components.
In an extensive out-of-sample application, including point forecasts and predictions of global minimum variance and mean-variance optimal portfolios, the CFSS model shows superior forecasting performance and outperforms several competing models at almost all forecast horizons. We conclude that the CFSS approach is a valuable tool for modeling and forecasting time-series of vast-dimensional realized covariance matrices.