1 Introduction

Estimation and prediction of covariance are fundamental to financial practices such as asset allocation and risk management; however, the covariance of financial-asset returns cannot be observed directly. Most existing models can be divided into two categories; examples include the multivariate GARCH model, which treats past observations as measurable, and the multivariate stochastic volatility model, which treats the covariance matrix as if it consists of stochastic values. At present, covariance estimation and modeling approaches involve the construction of a realized covariance matrix that accurately estimates a covariance matrix over a fixed period constructed with high-frequency data. For example, Noureldin et al. (2012) incorporates ex-post covariance matrix estimates from high-frequency data into multivariate GARCH models. Similarly, the extension of realized stochastic volatility models in such a situation was proposed in Shirota et al. (2017) and Yamauchi and Omori (2020).

Consistent estimation of large matrices based on high-dimensional data usually requires some sparsity, which may result from appropriate formulation of some low-dimensional structures within the high-dimensional data (Kim et al. 2018; Tao et al. 2013). Hence, Tao et al. (2013) and Kim et al. (2016, 2018) proposed procedures that require sparsity in the covariance matrix, which itself must be estimated. In Brownlees et al. (2018), realized networks that apply a graphical lasso (Friedman et al. 2008) to impose sparsity upon the precision matrix were proposed. The graphical lasso assumes that data follow a multivariate normal distribution. In particular, a realized network estimator can construct a partial correlation network structure using the estimated precision matrix.

To predict covariance, the Wishart autoregressive (WAR) model proposed in Gourieroux et al. (2009) guarantees the positive-definiteness of the predicted covariance without restrictions upon the parameters. In Golosnoy et al. (2012), a conditional autoregressive Wishart (CAW) model was proposed as a generalization of the WAR model. The CAW model can predict the covariance matrix, which cannot be directly observed using a realized covariance matrix.

Like the WAR model, the CAW model can guarantee the symmetry and positive-definiteness of the predicted covariance matrix without imposing any restrictions upon its parameters. The CAW model can be estimated using the maximum likelihood method, and has the further advantage of easy generalization. Golosnoy et al. (2012) combined the CAW model with the mixed-data-sampling (MIDAS, Engle et al. 2013) and heterogeneous-autoregressive (HAR, Corsi 2009) specifications to capture the long-term-dependence structure of the covariance matrix of the return rate. Additionally, Anatolyev and Kobotaev (2018) proposed a conditional-threshold autoregressive Wishart (CTAW) model, which introduced a leverage effect into the CAW model. Various extended and restricted versions of this model were also proposed in the same paper.

The CAW and CTAW models performed parameter estimation using the maximum-likelihood method, but when the number of issues increases, the number of parameters to be estimated becomes huge, making estimation difficult (the so-called curse of dimensionality). In fact, Gribisch and Stollenwerk (2020) pointed out that empirical applications do not exceed ten dimensions, because the application to higher dimensional covariance matrices is too complicated. Therefore, it is necessary to select only a few stocks to predict the covariance matrix. In this paper, we propose a new framework for covariance prediction; in particular, the realized kernel (Barndorff-Nielsen et al. 2011) is estimated from the obtained high-frequency return data. The graphical lasso method for estimating the covariance and precision matrices from a sample-covariance matrix is applied using the estimated realized kernel as an input. We construct an optimal portfolio by selecting stocks that maximize the Sharpe ratio or minimize the partial correlation among the graphical networks constructed using the graphical lasso. Return-rate covariance prediction is performed using multiple CAW-type models based on the selected stocks. The estimated powers of the models are compared using the out-of-sample technique, and the model with the highest predictive power is selected. The Stein loss function is used here, because it is robust in the sense that it does not change the ranking of competing forecasts for \(S_t\), when using an unbiased proxy of \(S_t\) as the target variable when evaluating their losses (Patton and Sheppard 2009), we use it. In this study, the model-confidence-set (MCS) method of Hansen and Lunde (2005) is used to select the model. Our framework is schematically summarized in Fig. 1, which is roughly divided into three steps following the estimation of the multivariate realized kernel:

  1. Step 1:

    Visualize the realized covariance matrix using the graphical lasso.

  2. Step 2:

    Construct an optimal portfolio by selecting stocks that maximize the Sharpe ratio among the estimated efficient frontiers and minimize the partial correlation among the graphical networks.

  3. Step 3:

    Forecast the realized covariance matrix using the estimated parameters of the CAW and CTAW models. We eventually obtain the Stein loss function of the forecasts and compare the models’ prediction performances via the MCS procedure.

Fig. 1
figure 1

Schematic summary of our framework

The structure of this paper is as follows: Sect. 2 describes the basic settings of this study and the multivariate realized kernel. Section 3 introduces the graphical lasso and Sect. 4 introduces the CAW-type model. In Sect. 5, empirical analysis is performed and the results are shown. Finally, in Sect. 6, the empirical results of this paper are briefly summarized and future works are presented.

2 Multivariate realized kernel

2.1 The asset-price process

Let \(X_t=(X_t^1,\ldots ,X_t^d)\) denote a d-dimensional log price vector of d assets at time \(t\in [0,1]\). Here, it is assumed that the stochastic fluctuation of the efficient log price within a day is expressed as follows:

$$\begin{aligned} X_t=\int _0^t \mu _s {\text {d}}s+\int _0^t \sigma _s {\text {d}}B_s, \quad t\in [0,1], \end{aligned}$$

where \(B=(B_s)_{s\in [0,1]}\) denotes d-dimensional Brownian motion. \(X=(X_t)_{t\in [0,1]}\) is assumed to be a d-dimensional Brownian semi-martingale defined on a filtered probability space

\((\varOmega ,{\mathcal {F}},({\mathcal {F}}_t)_{t\in [0,1]},P)\) (\(X\in \mathcal {BSM}\)). The drift \(\mu =(\mu _s)_{s\in [0,1]}\) is a d-dimensional \({\mathcal {F}}_t\)-predictable process, and the spot-covolatility process \(\sigma =(\sigma _s)_{s\in [0,1]}\) is \({\mathcal {F}}_t\)-predictable. The present paper, like other papers on the subject of realized covariance estimation, does not consider jumps in the \(X_t\) process.

We assume that the log prices \(X_t^i\) of all assets \(i=1,\dots ,d\) are directly observed at the same time grid \(T=\{t_1,t_2,\dots ,t_n\}\), where \(t_0=0<t_1<\dots <t_n=1\). In this case, the realized covariance matrix \(\varSigma _{RC}\) is:

$$\begin{aligned} \varSigma _{RC} = \sum _{\ell =1}^n (X_{t_\ell }^i - X_{t_{\ell -1}}^i)(X_{t_\ell }^j - X_{t_{\ell -1}}^j)^{\prime }. \end{aligned}$$

Under a high-frequency setting, such that \(\text {max}_{\ell = 1,\dots , n} \{t_{\ell } - t_{\ell -1}\} \rightarrow 0\) as \(n\rightarrow \infty \), this matrix converges in probability to the integrated covariance \(\varSigma ^*\):

$$\begin{aligned} \varSigma _{RC} \overset{p}{\rightarrow } \int _0^1 \varSigma _t {\text {d}}t = \varSigma ^*, \end{aligned}$$

where \(\overset{p}{\rightarrow }\) denotes the convergence in probability.

2.2 Multivariate realized kernel

Barndorff-Nielsen et al. (2011) proposed a multivariate realized kernel that is typically used to estimate covariance from high-frequency data. The multivariate realized kernel achieves the following:

  1. 1.

    mitigates microstructure effects,

  2. 2.

    guarantees that the covariance estimators are positive-semidefiniteness, and

  3. 3.

    removes the effects of non-synchronous trading.

On the real market, trading orders are made per asset. Hence, time stamps are recorded at unequal intervals. We used the multivariate realized kernel to solve the problems caused by such a plurality of unevenly arranged price-time series.

The following d-dimensional log price vector was considered in a financial market:

$$\begin{aligned} Y=(Y^{1},Y^{2},\ldots ,Y^{d}). \end{aligned}$$

We assumed that these prices are observed irregularly and non-synchronously for the interval [0, T]. For simplicity, we set \(T=1\), and we write:

$$\begin{aligned} Y^{i}\left( t_{\ell }^{i} \right) , \quad \ell = 1,2,\ldots ,N^{i}\left( 1 \right) , \quad i = 1,2,\ldots ,d, \end{aligned}$$

where \(N^{i}\left( t \right) \) indicates the number of samples of asset i observed up to time t and Y is derived from the efficient price X. If U is the microstructure noise, the efficient and observed prices have the relationship \(Y = X + U\).

Barndorff-Nielsen et al. (2011) proposed a refresh time \(\tau _{\ell }\) for such irregular and non-synchronous price vectors, which is defined as follows:

$$\begin{aligned} \left\{\begin{array}{l} \tau _1 = \max \{t_1^1,\dots ,t_1^d\},\\ \tau _{\ell +1} = \max \{t_{N_{\tau _{\ell }}^{1}+1}^{1},\dots ,t_{N_{\tau _{\ell }}^{d} + 1}^{d} \}, \end{array} \right. \end{aligned}$$

where \(\tau _1\) is the first time at which all of these asset prices are updated and \(\tau _2\) is the time of the next update.

The asymptotic theory in Barndorff-Nielsen et al. (2011) requires averaging m prices at the beginning and end of the day to obtain a consistent estimator. Here, \(n-1+2m=N(n, m\in {\mathbb {N}})\). We set the price vector in a day as \(Y_0,\dots ,Y_n \)(where \(Y_0\) denotes the price at the beginning of a day and \(Y_n\) denotes price at the day’s end). Here, we set:

$$\begin{aligned} \left\{\begin{array}{l} Y_0 = \frac{1}{m}\sum _{j=1}^m Y(\tau _j),\\ Y_j = Y(\tau _{j+m}), \qquad (j=1,\dots ,n-1),\\ Y_n = \frac{1}{m}\sum _{j=1}^m Y(\tau _{N-m+j}).\\ \end{array} \right. \end{aligned}$$

By taking m, which is much smaller than n, but still moderately large, these observations are averaged in error and have no large error, and the efficient price can be recorded. Using this price series, the following high-frequency-return vector \(y_{\ell }\) is calculated by:

$$\begin{aligned} y_{\ell } = Y_{\ell } - Y_{\ell -1},\quad \ell = 1,\dots , n. \end{aligned}$$

According to Barndorff-Nielsen et al. (2011), the optimal selection for m is \(m=2\).

If synchronized, the high-frequency-return vectors \(\{y_{\ell }\}\) are obtained, and we can define the multivariate realized kernel as follows:

$$\begin{aligned} K\left( {Y} \right) = \sum \limits _{h = - n}^{n}k\left( \frac{h}{H} \right) \varGamma _{h}, \end{aligned}$$

where

$$\begin{aligned} {\left\{ \begin{array}{ll} \varGamma _{h} = \sum \limits _{\ell = h + 1}^{n}y_{\ell }y_{\ell - h}^{\prime }, &{} \mathrm {for}\,\;h \ge 0, \\ \varGamma _{h} = \varGamma _{- h}^{\prime }, &{} \mathrm {for}\,\;h < 0. \end{array}\right. } \end{aligned}$$

Here, \(\varGamma _{h}\) is the hth realized autocovariance, \(k:{\mathbb {R}} \rightarrow {\mathbb {R}}\) is a non-stochastic weighting function which is also a kernel function, and H is a bandwidth. The kernel function uses the Parzen kernel window:

$$\begin{aligned} k(x) = \left\{ \begin{array}{ll} 1 - 6x^{2} + 6x^{3}, &{} 0 \le x \le \frac{1}{2}, \\ 2(1-x)^{3}, &{} \frac{1}{2} \le x \le 1, \\ 0, &{} x > 1, \end{array} \right. \end{aligned}$$

where the kernel function is an even function.Footnote 1

3 Graphical lasso

In this paper, the covariance matrix used for portfolio construction is estimated using a graphical lasso with the realized kernel as input.

Here, the log return vector is observed according to the multivariate normal distribution \(N(\varvec{\mu }, \varSigma )\). The partial correlation is calculated from the precision matrix \(\varOmega =(\varSigma )^{-1}\) as follows:

$$\begin{aligned} \rho ^{ij}=\frac{-\varOmega ^{ij}}{\sqrt{\varOmega ^{ii} \varOmega ^{jj}}}, \end{aligned}$$
(1)

where \(\varOmega ^{ij}\) denotes the (ij) element of the matrix \(\varOmega \). The partial correlation structure between assets can be detected by estimating the precision matrix of the log return vector.

To estimate the precision matrix, Brownlees et al. (2018) proposed the following realized network estimator:

$$\begin{aligned} \varOmega _{\lambda } = \mathop {\text {arg min}}\limits _{\varOmega }\left\{ \mathrm {tr}\left( \varOmega {\hat{\varSigma }}\right) - \log \det (\varOmega )+\lambda \sum _{i\ne j}|\varOmega ^{ij}|\right\} , \end{aligned}$$
(2)

where \(\lambda >0\) is the tuning parameter, \({\hat{\varSigma }}\) is an estimator of the integrated covariance matrix, and \(\varOmega \) is a d-dimensional positive-definite matrix. Equation (2) is a penalized log-likelihood function with a penalty equal to the sum of the absolute values of the non-diagonal elements in the estimator. It is an important feature of the absolute value penalty for \(\lambda >0\) that some elements of the estimator are estimated to be zero. This optimization can be attained via the graphical lasso proposed in Friedman et al. (2008), which can simultaneously estimate a covariance and precision matrices.

We must select the tuning parameter \(\lambda \) in Eq. (2) to solve the optimization problem. According to Yuan and Lin (2007), we choose the \(\lambda \) that minimizes the following equation defined in BIC:

$$\begin{aligned} \mathrm {BIC}(\lambda ) = n \{-\log \det ({\hat{\varOmega }}_{\lambda })+\mathrm {tr}({\hat{\varOmega }}_{\lambda }{\hat{\varSigma }}) \}+\log n\sum _{i\le j}1_{\{{\hat{\varOmega }}_{\lambda }^{ij}\ne 0\}}. \end{aligned}$$

4 Class of CAW models

4.1 The CAW model

We consider the d-dimensional realized covariance matrix \(R_t\) for \(t = 1,\dots , T\). When the history up to time \(t-1\) is given in \(F_{t-1} = \{R_{t-1},R_{t-2},\dots \}\), Golosnoy et al. (2012) assumed that \(R_t|F_{t-1}\) follows a d-dimensional Wishart distribution \(W_d(K, S_t/K)\). Here, K is the degree of freedom and \(S_t\) is the symmetric positive-definite scale matrix. Under this assumption, the following CAW model is proposed as a realized covariance prediction model:

$$\begin{aligned} S_t = CC^{\prime } + \sum _{i=1}^p B_{i}S_{t-i}B_{i}^{\prime } + \sum _{i=1}^q A_{i}R_{t-i}A_{i}^{\prime }, \end{aligned}$$

where C is the d-dimensional lower triangular matrix and \(A_i\) and \(B_i\) are square matrices. Due to the nature of the Wishart distribution, \(S_t=E[R_t|F_{t-1}]\) holds. A sufficient condition for identification is that all diagonal elements of the matrix C and the first diagonal elements of matrices \(A_i\) and \(B_i\) are positive. The CAW(pq) is parameterized by \(K, C, A_i,\) and \(B_i\), and the number of parameters is \((p+q)d^2+d(d+1)/2+1\).

In the diagonal CAW(pq) model (which is a restricted version of the CAW model), the parameter matrices \(A_i\) and \(B_i\) are diagonal. The number of parameters in the diagonal CAW(pq) model is \((p+q)d+d(d+1)/2+1\).

The CAW model, the extended-CAW model, and the CTAW model described later are all defined under the assumption that \(R_t\) follows the d-dimensional Wishart distribution \(W_d(K, S_t/K)\). Therefore, the maximum-likelihood method can be used to estimate the model, and if \(\psi \) is the set of parameters including the degree of freedom K and the parameter matrices, then the log likelihood of any CAW-class model is given by:

$$\begin{aligned} {{{\mathcal {L}}}}(\psi ) =&\sum _{t=1}^T \biggr \{-\frac{Kd}{2}\log 2-\frac{d(d-1)}{4}\log \pi -\sum _{i=1}^d \log \varGamma \left( \frac{K+1-i}{2}\right) \\&\quad -\frac{K}{2}\log |\frac{S_t}{K}|+\left( \frac{K-d-1}{2}\right) \log |R_t|-\frac{1}{2}\mathrm {tr}(KS_t^{-1}R_t)\biggl \}, \end{aligned}$$

where, \(\varGamma (\cdot )\) is the gamma function.

4.2 Extended-CAW models

To capture the long-term movement of covariation in financial-asset returns, Golosnoy et al. (2012) extended the CAW model into the MIDAS-CAW and HAR-CAW models.

4.2.1 MIDAS-CAW model

The GARCH models with short- and long-term components can accurately represent the complex-dependence structure in volatility (Engle and Lee 1999). The concept of MIDAS is to decompose fluctuations in volatility and covariance into short- and long-term components. Therefore, the scale matrix \(S_t\) is replaced with the long-term component \(M_t\) for the daily realized covariance matrix \(R_t\) and decomposed into the short-term component \(S_t^*\) as follows:

$$\begin{aligned} S_t&= C_t S_t^* C_t^{\prime },\\ M_t&= C_t C_t^{\prime }, \end{aligned}$$

where \(C_t\) is the lower triangular Cholesky factor. Furthermore, the short-term component \(S_t^*\) is modeled as:

$$\begin{aligned} S_t^* = I_d + \sum _{i=1}^p B_i(S_{t-i}^*-I_d)B_i^{\prime }+\sum _{i=1}^q A_i(C_{t-i}^{-1}R_{t-i}(C_{t-i}^{\prime })^{-1}-I_d)A_i^{\prime }. \end{aligned}$$

The long-term component \(M_t\) is modeled as the weighted sum of the L lags of the realized covariance matrix \({\overline{R}}_{t, l}^{(m)}\) calculated over the range of m trading days:

$$\begin{aligned} M_t&= {\overline{C}}\,{\overline{C}}^{\prime } + \theta \sum _{\ell =1}^L \phi _{\ell } (\omega ) {\overline{R}}_{t, l}^{(m)},\\ {\overline{R}}_{t, l}^{(m)}&= \sum _{\tau = t-m\ell }^{t-m(\ell -1)-1} R_{\tau }, \end{aligned}$$

where \({\overline{C}}\) is the lower triangular matrix, \(\theta (\ge 0)\) is a slope parameter, and \(\phi _{\ell }(\omega )\) is a scalar-value function of the weights. In the MIDAS-CAW(pq) model, a slope parameter \(\theta \) and a weight function argument \(\omega \) are added to the parameters of the CAW(pq) model as parameters to be estimated. The long-term component’s period m, and lag L, as well as the weighting function \(\phi \) are given. In the empirical analysis by Golosnoy et al. (2012), a weighted sum for a year is calculated as \(m = 20\) and \(L=12\). Moreover, beta weights are used in the weight function as follows:

$$\begin{aligned} \phi _{\ell }(\omega ) = \frac{(1-\frac{\ell }{L})^{\omega -1}}{\sum _{j=1}^L (1-\frac{j}{L})^{\omega -1}}. \end{aligned}$$

4.2.2 HAR-CAW model

The HAR model is a special case of MIDAS regression, and responds to long-term dependence on daily volatility using a hierarchical autoregressive specification that includes volatility via daily, weekly, and monthly lags. Golosnoy et al. (2012) proposed the following HAR-CAW model:

$$\begin{aligned} \begin{aligned} S_t&= CC^{\prime }+AR_{t-1}A^{\prime }+A^{(w)}{\overline{R}}_{t-1}^{(w)}A^{(w)^{\prime }}\\&\quad +A^{(bw)}{\overline{R}}_{t-1}^{(bw)}A^{(bw)^{\prime }}+A^{(m)}{\overline{R}}_{t-1}^{(m)}A^{(m)^{\prime }}, \end{aligned} \end{aligned}$$

where w indicates weekly periods (5 days), bw indicates biweekly periods (10 days), and m indicates monthly periods (20 days) in \(x\in \{w, bw, m\}\); \({\overline{R}}_{t-1}^{(x)}\) is the realized covariance matrix calculated over the period corresponding to x; and A and \(A^{(x)}\) are d-dimensional parameter matrices. The number of parameters needed for estimation in the HAR-CAW model is \(4d^2 + d(d+1)/2+1\).

4.3 The CTAW model

In many stocks and other financial markets, it has been observed that a fall in financial-asset prices will more strongly affect volatility than a rise in financial-asset prices; this phenomenon is called the leverage effect. In Anatolyev and Kobotaev (2018), the conditional-threshold autoregressive Wishart (CTAW) model was proposed by introducing a leverage effect into the CAW model. Similar to the CAW model, when the history up to time \(t-1\) is given in \(F_{t-1} = \{R_{t-1},R_{t-2},\dots \}\), \(R_t|F_{t-1}\) is assumed to follow the d-dimensional Wishart distribution \(W_d(K, S_t/K)\). Then, the CTAW model can be defined as:

$$\begin{aligned} \begin{aligned} S_t = CC^{\prime }&+ \sum _{i=1}^p \left( B_i+\sum _{j=1}^d H_{i, j}I_{j, t-i}\right) S_{t-i}\left( B_i+\sum _{j=1}^dH_{i, j}I_{j, t-i}\right) ^{\prime }\\&+ \sum _{i=1}^q\left( A_i+\sum _{j=1}^d G_{i, j}I_{j, t-i}\right) R_{t-i}\left( A_i+\sum _{j=1}^d G_{i, j}I_{j, t-i}\right) ^{\prime }, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} I_{j, t} = {\left\{ \begin{array}{ll} 1 \,\, (\text{ when } \text{ the } \text{ price } \text{ of } \text{ the } \text{ stock } \text{ j } \text{ went } \text{ down } \text{ in } \text{ the } \text{ previous } \text{ periods}),\\ 0 \,\, (\text{ otherwise}), \end{array}\right. } \end{aligned}$$

C is a d-dimensional lower triangular matrix, and \(A_i\) and \(B_i\) are d-dimensional square matrices. Moreover, \(H_{i, j}\) and \(G_{i, j}\) are d-dimensional square matrices in which neither the elements of j rows, nor those of j columns, are equal to 0. The number of parameters that we should estimate in the CTAW(pq) model is \((2d-1)(p+q)d+(p+q)d^2+d(d+1)/2+1\). This includes the addition of \((2d-1)(p+q)d\) parameters to the baseline CAW model.

4.4 Extended and restricted CTAW models

The baseline-CTAW model has a large number of parameters and may overfit the sample data. Therefore, a restricted approach to the CTAW model was proposed in Anatolyev and Kobotaev (2018) and the authors of that paper extended it to include the MIDAS regression technique. The limits of the CTAW model can be compounded, with 3 classes for Off-Diagonal/Diagonal/Diagonal switches, 2 classes for Non-A-flexible/A-flexible, 2 classes for non-MCTAW/MCTAW. In other words, 12 models are provided including the baseline-CTAW model, and all of these models are extended by MIDAS.

4.4.1 Diagonal CTAW model and diagonal-switching CTAW model

Here, the diagonal and diagonal-switching CTAW models are proposed. In the diagonal CTAW model, \(A_i, B_i, G_{i,j}\) and \(H_{i,j}\) are restricted to the diagonal matrix. In other words, \(G_{i,j}\) and \(H_{i,j}\) are all 0 except for the (jj) component. In the diagonal-switching CTAW model, only \(G_{i, j}\) and \(H_{i,j}\) are restricted to the diagonal matrix. The number of parameters to be estimated is \(2(p+q)d+d+1\) for the diagonal CTAW(pq) model and \((p+q)d+(p+q)d^2+d(d+1)/2+1\) for the diagonal-switching CTAW(pq) model.

4.4.2 A-flexible CTAW model

In the A-flexible CTAW model, the coefficient of the model’s past predicted value \(S_{t-i}\) has no dependence upon the leverage effect, which appears only in the terms of the past-realized covariance matrix \(R_{t-i}\), as in the following equation:

$$\begin{aligned} \begin{aligned} S_t&= CC^{\prime } + \sum _{i=1}^p B_iS_{t-i}B_i^{\prime } \\&\quad + \sum _{i=1}^q \left( A_i+\sum _{j=1}^d G_{i,j}I_{j,t-i}\right) R_{t-i}\left( A_i+\sum _{j=1}^dG_{i,j}I_{j,t-i}\right) ^{\prime }. \end{aligned} \end{aligned}$$

The A-flexible specification is motivated by the fact that the realized covariance matrix \(R_t\) is observed and can be substituted for news concerning the number of assets. The leverage effect should affect the coefficient of \(R_t\) as investors react to this news. On the other hand, \(S_t\) is unobservable and investors cannot keep up with the changes; therefore, the leverage effect does not need to impact the \(S_t\) coefficient. The number of parameters to be estimated in the A-flexible CTAW(pq) model is \((2d-1)qn+(p+q)d^2+d(d+1)/2+1\).

4.4.3 Market-CTAW model

In the baseline-CTAW model, changes in the volatility dynamics of all assets are caused by changes in the directions of each asset’s price fluctuations. On the other hand, in the market-CTAW (MCTAW) model, changes in the volatility dynamics of all assets are caused by changes in the direction of market-representative indicators. The MCTAW(pq) model is given by:

$$\begin{aligned} \begin{aligned} S_t&= CC^{\prime } + \sum _{i=1}^p (B_i+H_{i,m}I_{m,t-i})S_{t-i}(B_i+H_{i,m}I_{m,t-i})^{\prime }\\&\quad + \sum _{i=1}^q (A_i+G_{i,m}I_{m,t-i})R_{t-i}(A_i+G_{i,m}I_{m,t-i})^{\prime }, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} I_{m,t} = {\left\{ \begin{array}{ll} 1 \, (\text{ when } \text{ the } \text{ price } \text{ of } \text{ market } \text{ index } \text{ m } \text{ went } \text{ down } \text{ in } \text{ the } \text{ previous } \text{ periods}),\\ 0 \, (\text{ otherwise}), \end{array}\right. } \end{aligned}$$

and \(H_{i,m}\) and \(G_{i,m}\) are square matrices. The number of parameters to be estimated in the MCTAW(pq) model is \(2(p+q)d^2+d(d+1)/2+1\).

5 Empirical application

In this section, we propose two methods for selecting stocks and perform an empirical analysis with the stocks selected. The two methods that we propose are the Sharpe ratio and partial correlation techniques. Specifically, the precision matrix is estimated using a graphical lasso, and then, the portfolio is constructed. The covariance matrix, which is composed of stocks included in the portfolio, is predicted using the CAW model to compare the predictive power. A direct comparison between the partial correlation and the Sharpe ratio methods is left for future work.

5.1 Data

Sixty-five stocks traded on the Nikkei 225 between September 24, 2015 and December 29, 2017 are considered. The dataset was obtained from the Nikkei NEEDS-TICK data. Table 1 shows the tickers, company name, and sectors of used stocks. The symbols for each company are taken from overseas markets. For companies that are not listed on overseas markets, symbols are created and applied as appropriate. The number of samples is 558. In these data, the first 308 days are used as training data for model estimation and the latter 250 days are used as test data for a model-prediction comparison. Figure 2a, b shows the estimated realized variance and realized covariance, respectively, and the colored range is the sample period. In this paper, the integrated inverse-covariance matrices were estimated for each trading day over the full period. The integrated covariance estimator \({\hat{\varSigma }}\) used as the input for realized network estimation is the multivariate realized kernel obtained by the high-frequency return vector.Footnote 2

The setting for the multivariate realized kernel is explained in the empirical analysis. The jittering lag m is set to \(m=2\) according to Barndorff-Nielsen et al. (2011). We use the Parzen window type of non-flat top as a kernel function. The bandwidth H is estimated by a plug-in method, and the optimal bandwidth is estimated daily based on real data, but has median and mode values of 16, a minimum value of 9, and a maximum value of 26.

Table 1 Nikkei tickers, company names, and sectors
Fig. 2
figure 2

Realized variance and covariance

5.2 Asset selection

In this subsection, we introduce two methods for selecting the stocks to be predicted by the CAW model using a graphical lasso. The first is stock selection based on the Sharpe ratio, and the second is based on partial correlation. In both methods, we estimate the precision and covariance matrices using the graphical lasso; the realized kernel is the sample-covariance matrix.

5.2.1 The Sharpe ratio method

We estimate the efficient frontier using a mean-variance approach based on estimated covariance. We select stocks to make up the portfolio, such that the Sharpe ratio is maximized.

The Sharpe ratio indicates the return on the volatility ratio proposed by Sharpe (1994) and can be expressed as:

$$\begin{aligned} h(x) = \frac{\mu ^{\prime }x - r_f}{\sqrt{x^{\prime }\varSigma x}}, \end{aligned}$$

where \(r_f\) is the risk-free rate. \(x=(x_1,\dots ,x_d)'\) indicates the proportion of total funds invested in each asset and \(E[x] = x_1\mu _1+\dots +x_d\mu _d=\mu ^{\prime }x\) with \(\mu = (\mu _1,\dots ,\mu _d)'\). \(\varSigma \) is the d-dimensional covariance matrix. The portfolio that maximizes the Sharpe ratio can be obtained by solving the following optimization problem:

$$\begin{aligned}&\text {max}_x \quad \frac{\mu ^{\prime }x - r_f}{\sqrt{x^{\prime }\varSigma x}}, \nonumber \\&\text {subject to} \quad \sum _{i = 1}^d x_i =1, \quad 0\le x_i \le 1. \end{aligned}$$
(3)

When \(\mu ^{\prime }x-r_f>0\), the optimization problem (3) can be rewritten as follows with \(t = \frac{1}{\mu ^{\prime }x-r_f}, y = tx\):

$$\begin{aligned}&\text {min}_y \quad y^{\prime }\varSigma y,\\&\text {subject to} \quad \sum _{i = 1}^d y_i =t, \qquad t>0,\\&0\le y_i \le t, \quad \mu ^{\prime }y - r_ft =1.\\ \end{aligned}$$

If only one solution (yt) is obtained, we can construct the portfolio weights as \(x^* = y/t\).

In this paper, we use a function in the MATLAB Financial Toolbox to obtain the portfolio weights, such that the Sharpe ratio is maximized. Then, we adopt stocks with non-zero estimated portfolio weights.Footnote 3 The expected return is the mean of the daily log return in the sample term. The covariance matrix was estimated using a graphical lasso. We used the average of the realized kernel integrated over the sample period as a sample-covariance matrix for the graphical lasso. We also used the risk-free rate as \(r_f = 0\). Figure 3 shows the estimated efficient frontier, the point at which the Sharpe ratio is maximized, and the scatterplot of each asset. In addition, the blue line shows the estimated efficient frontier with a risk-free rate. The selected stocks are Kirin (KNBWY), Otsuka Holdings (OTSKY), Komatsu (KMTUY), Seiko Epson (SEKEY), Tokyo Electron (TOELY), Docomo (DCM), and SoftBank Group (SFTBY).

Fig. 3
figure 3

Estimated efficient frontier. The Sharpe ratio is maximum at the green point on the curve, and each point denotes an asset

5.2.2 The partial correlation method

We form the partial correlation network from an estimated precision matrix using the graphical lasso. The partial correlation is calculated via Eq. (1). Figure 4 illustrates the partial correlation network, with each node indicating a company and the thickness of the edge indicating the partial correlation strength. In other words, the edge becomes thicker as the partial correlation decreases. There is no partial correlation between nodes that are not connected by edges.

In this paper, companies with a small partial correlation are assembled in a portfolio, since when building a portfolio, reducing the correlation coefficient between assets in turn reduces risk. In Fig. 4, companies with a small partial correlation are gathered around Mitsui & Co (MITSY). Therefore, we select MITSY, Toyota Motor (TM), Subaru (SBR), Canon (CAJ), Tokyo Electron (TOELY), Resona Bank (RESONA), and Daiichi Life(DAIICH) as seven companies.

Fig. 4
figure 4

The partial correlation network structure

5.3 Model estimation method

We estimate the CAW and CTAW models with the lag orders \((p,q) = (0,1), (1,1), (2,1), (1,2), (2,2), (3,2), (2,3), (3,3)\) for unrestricted, restricted, and extended models. In terms of parameter constraints, we estimate \(K'>0\) with \(K = K'+d\) to ensure that the degree of freedom K of the Wishart distribution is significantly greater than the number of stocks d.

According to Anatolyev and Kobotaev (2018) and Golosnoy et al. (2012), we perform continuous estimation from lower to higher order models. First, for the lag order (0, 1) model, some different initial-parameter sequences are prepared in which each component of the parameter matrix is 0. Then, a numerical maximum-likelihood method using the BFGS method is performed for each initial parameter. Of the estimated parameters, we select that which maximizes the log-likelihood function as a parameter of the model with lag order (0, 1). The selected parameters are used as initial parameters for the next lag order. This continuous estimation is performed until the model with lag order (3, 3) is estimated. Note that since there are two models with lag orders one below in the models with the lag orders (2, 2) and (3, 3), we prepare three initial-parameter sequences for each lag order (i.e., six). Then, of the six estimated parameter sequences, that with the maximum likelihood is adopted. This method not only searches the area surrounding the optimal solution of the low order model, but also searches other areas in which the optimal solution may exist using independent initial-parameter values.

5.4 Estimation results

We estimate all models including the restricted and extended-CAW models and the CTAW model. As mentioned above, the sample period is 308 days and the test period is 250 days.

Table 2 shows the results of the best order based on the BIC of the non-MIDAS and MIDAS models. For stock selection based on the Sharpe ratio, the diagonal CAW model is found to be optimal for both the non-MIDAS and MIDAS types. Similarly, for stock selection by partial correlation, the non-MIDAS-type and MIDAS-type diagonal CAW models are selected. Moreover, the model of order (1, 1) is optimal in many cases. This result is consistent with the analysis in Hansen and Lunde (2005), which found that the model with lag order (1, 1) tended to be selected in the empirical analysis of the GARCH model.

Table 2 The best order and BIC of each model

5.5 Prediction method and result

We predict the covariance matrix one period ahead using the estimated model; then, we calculate the value of the error function using the test data and predicted value. In this paper, the BEKK-GARCH model is used as a benchmark. After that, the MCS proposed by Hansen et al. (2011) is employed to compare the model’s prediction performances. The MCS is a set of models that is constructed to include the best model at a given significance level. The MCS process is applied using the block-bootstrap method with length 25.Footnote 4 In this paper, we use the Stein loss (James and Stein 1961) as a loss function:

$$\begin{aligned} L_s(Y_t,{\hat{Y}}_t) = \mathrm {tr}[{\hat{Y}}_t^{-1}Y_t]-\log |{\hat{Y}}_t^{-1}Y_t|-d. \end{aligned}$$

First, the MCS process is executed for each model in turn to find the optimal order; then, the MCS process is executed for all models. In this paper, we set the significance level to \(20\%\).

Tables 3 and 4 show the order and error mean of the model with the best prediction performance in each case. Of all the models selected by the Sharpe ratio, the MIDAS-diagonal CAW (3, 2) model is the best. The partial correlation method, however, selects the MIDAS-CAW (0, 1) model as best. From Table 3, the mean value of loss \(L_s\) in all models is smaller in MIDAS-type than in non-MIDAS-type models, and the prediction performance is improved. Additionally, relatively high-order non-MIDAS-type models are selected, but most MIDAS-type models are low order. Moreover, most MIDAS-type models selected have a smaller number of parameters N, than do non-MIDAS-type models. For each CAW-type and CTAW-type model, we can see that the prediction performances of all extended and restricted models are better than those of the original CAW and CTAW models. However, the CAW-type model was judged to offer better prediction performance than the CTAW-type model for the data used in this study. Table 4 presents the same facts as Table 3, but the optimal order in the MIDAS-type model tends to be (0, 1) more than that in the Sharpe ratio method.

Table 3 The mean values of Stein loss and the best order, as obtained by MCS using the Sharpe ratio method
Table 4 The mean values of Stein loss and the best order, as obtained by MCS using the partial correlation method

Tables 5 and 6 presented the top 10 models selected by MCS from all cases. Here again, as before, the MIDAS-type model with a small number of parameters is highly ranked. The HAR-CAW model ranks high in stock selection by partial correlation. Moreover, in the partial correlation method, as in Table 4, the model of order (0, 1) tends to be higher. Figure 5a shows the realized variance and prediction of SFTBY and Fig. 5b shows the realized covariance and prediction of SFTBY-KNBWY, as determined by an out-of-sample technique using the MIDAS-diagonal CAW (3, 2) model and the Sharpe ratio method.

Table 5 Top 10 models according to MCS by the Sharpe ratio method
Table 6 Top 10 models according to MCS by the partial correlation method
Fig. 5
figure 5

Realized variance and covariance and each prediction

6 Conclusion

In this paper, we proposed two new frameworks for predicting covariance matrices using the graphical lasso and the CAW model. The first framework is stock selection based on the Sharpe ratio. The covariance matrix was estimated from the realized kernel using the graphical lasso, and the efficient frontier was determined via the estimated covariance matrix. Stocks were selected to maximize the Sharpe ratio among the estimated efficient frontiers. The second stock selection method was based on partial correlation. As with the first method, the precision matrix was estimated using a graphical lasso to construct a partial correlation network, which was used to select stocks. The covariance matrix was determined by the CAW-type covariance prediction model using the realized kernel constructed with the stocks selected by both methods.

As a final note, let us discuss future works. In this paper, we selected only useful stocks and predicted the covariance, because the CAW model is difficult to apply to a high-dimensional covariance matrix. However, our proposed framework cannot predict the covariance matrix of a portfolio composed of a large number of stocks; thus, it is conceivable to extend the model to predict high-dimensional covariance matrices. Additionally, a model with a long-term dependency structure must be proposed in the future.