1 Introduction

Artificial intelligence (AI) offers new approaches to modelling and forecasting real-time data. In the context of financial data analysis, one of the most relevant AI methods is machine learning (ML). Generally, machine learning includes methods that help computer systems automatically improve their performance with experience. These data-driven, self-adaptive techniques require very few assumptions about the models used for the investigated data. In recent years, several ML methods have been successfully used for forecasting purposes. One of them is the support vector machine (SVM) method proposed by Vapnik [1]. This method, applied to solve both classification and regression problems, is designed to have good generalisability and an overall stable behaviour, implying good out-of-sample performance.

The literature on SVM has been systematically expanding, both in the area of methodology and practical applications. In particular, new methodological approaches, including some modifications of the original SVM models or specific SVM-based hybrid models, have been proposed (e.g., [2,3,4,5,6,7]).

Originally, the SVM method was developed to solve classification problems; later, however, it was extended to the domain of regression problems [8]. In the literature, the term SVM is typically applied in the context of classification problems, while the term support vector regression (SVR) is used to describe regression with support vector methods.

The econometric literature extensively discusses the empirical properties of financial time series, which include volatility clustering, weak autocorrelation of returns, an asymmetric impact of positive and negative shocks on conditional volatility (the so-called leverage effect), long memory, strong dependencies between the returns of various financial instruments, and some characteristics of return distributions, such as fat tails, leptokurtosis and asymmetry. Many empirical studies have shown that the dynamics of financial processes can be nonlinear (see, e.g., [9,10,11,12,13,14]; and references therein). Support vector methods, like other nonparametric statistical analysis techniques, tend to be useful tools for nonlinear forecasting, as they do not presume the linearity of the data-generating process but let the data speak for themselves. They have been successfully applied to forecast financial time series, such as stock indices [3, 15,16,17,18,19], stock prices [20,21,22,23], volatility indices [24], derivatives [25,26,27], exchange rates [15, 28,29,30,31,32], exchange-traded funds [33, 34] and corporate bonds [35].

In the literature, volatility models are usually constructed on the basis of closing price data only. However, databases also usually contain daily low and high prices. These values are derived from intraday prices and are very important for measuring price changes during the day. It has been shown that the use of daily low and high prices leads to more accurate estimates and forecasts of variances (see, e.g., [36,37,38,39,40]), covariances (see, e.g., [41,42,43,44]) and value-at-risk measures (see, e.g., [45, 46]). Moreover, in contrast to very-high-frequency data, the application of low and high prices does not entail a large computational burden. For these reasons, the use of these prices in forecasting is very important from a practical viewpoint.

The main motivation of our paper is to combine two approaches that are gaining importance and popularity, namely, the application of SVR and the use of low and high prices, to create a new procedure for forecasting the covariance matrix of returns based on daily prices. Modelling and forecasting covariance matrices are vital because financial institutions and investors usually hold portfolios of assets. In the construction, valuation and management of portfolios of financial instruments, knowledge about the relationships between assets is as important as knowledge about the dynamics of returns and volatilities. Forecasting the covariance matrix is also crucial in applications such as risk management, option pricing and hedging strategies. Such applications require a multivariate approach, whereas most SVR studies in finance are univariate. In contrast, in this study, SVR was applied to forecast not only the variances of financial returns but also the covariances. Modelling and forecasting the covariance matrix are much more demanding tasks because the matrix constructed from forecasts of variances and covariances obtained by disjoint models is not guaranteed to be positive definite.

We apply range-based variances and covariances of returns, which are formulated on the basis of low and high prices. Our approach is nonparametric, which makes it more general than the parametric approaches proposed in the papers discussed below. Chou et al. [41] combined the conditional autoregressive range (CARR) model by Chou [36] with the dynamic conditional correlation (DCC) model by Engle [47] to propose the range-based DCC model. Fiszeder et al. [44] proposed the DCC model constructed using the range generalised autoregressive conditional heteroskedasticity (R-GARCH) model by Molnár [39]. Fiszeder and Fałdziński [43] suggested the DCC model formulated using the CARR model and the range-based estimator of the covariance of returns. The model introduced by Fiszeder [42] was based on the BEKK model by Engle and Kroner [48] and the use of range-based estimators of the variances and covariances of returns. All the above papers are methodologically different from the approach considered in this paper because we do not use any parametric range-based volatility models.

The paper makes four contributions:

  • First, we propose a new method for dynamic modelling and forecasting covariance matrices based on SVR. This approach guarantees the positive definiteness of the forecasted covariance matrices and is flexible, as it can be applied to different dependence patterns. At the beginning, we decompose the range-based covariance matrices of returns into the Cholesky factors, and then we forecast the univariate series of the entries of the Cholesky factors using the SVR model. Afterwards, we reconstruct the covariance matrix from these forecasts as a result of the reverse operation of the Cholesky decomposition. We use the range-based variances and covariances; the proposed approach, however, is quite general and can be applied to other proxies of covariance matrices formulated on the basis of daily data (e.g., squared returns and products of returns) or intraday data (e.g., realised variances and covariances).

  • Second, we provide empirical evidence that the forecasts of both the whole covariance matrix and each single covariance obtained by the proposed procedure are more accurate than those obtained by the DCC model. This model is a natural benchmark because it is one of the most popular multivariate volatility models and can even be applied to very large portfolios. To the best of our knowledge, this is the first attempt in the literature to use SVR to forecast covariance matrices.

  • Third, we demonstrate that the variance forecasts based on the proposed procedure are more precise than the forecasts from the univariate GARCH model [49]. It has already been shown in the literature that variance forecasts based on the SVR model can be more accurate than the forecasts calculated from the GARCH model (see, e.g., [15, 28, 31, 50]); however, range data have not been used in such applications so far.

  • Fourth, we show that the forecasting advantage of the proposed method over the DCC model is greater during periods of high market volatility and strong dependence between assets. This conclusion is important since such periods are often associated with market turmoil and high market uncertainty, i.e., when forecasting is the most difficult and accurate forecasts matter most. Such a finding for the range-based estimators has not been formulated in the literature so far.

The rest of the paper is organised as follows. Section 2 provides an outline of the SVR model, describes the range-based covariance estimator and, most importantly, introduces the proposed method for covariance matrix forecasting. Section 3 presents the empirical research aimed at assessing the performance of the proposed procedure, the data applied, and a detailed description of the study and its results. Section 4 provides the conclusions.

2 Theoretical background

2.1 SVR model

Let us assume the following regression model:

$$ y=r\left(\mathbf{x}\right)+\delta $$
(1)

where r(x) is the regression function, y is the dependent variable, x is the vector of regressors and δ is additive zero-mean noise with variance σ². On the basis of a training dataset {(xt, yt)}, t = 1, …, T, we want to approximate the unknown regression function by a function f(x) that has a deviation of at most ε from the outputs yt and is as flat as possible [51]. In SVR, the input x is first mapped onto a high-dimensional feature space using a fixed (nonlinear) mapping, and then a linear model is constructed in this feature space:

$$ f\left(\mathbf{x}\right)={\sum}_{i=1}^d{\omega}_i{\varphi}_i\left(\mathbf{x}\right)+b $$
(2)

where d is the dimension of the feature space, φi(x) denotes (nonlinear) transformations, ωi are the coefficients and b is the bias term [52]. It should be noted that the dimension of the feature space determines the capacity of the SVR model to approximate a smooth input-output mapping; higher values of the dimension d lead to a more accurate approximation [15].

According to Eq. (2), to derive the function f(x), one must estimate ω = (ω1, ω2, …, ωd)′ and b. To measure the estimation quality, Vapnik [1] proposed the ε-insensitive loss function:

$$ {L}_{\varepsilon}\left(y,f\left(\mathbf{x}\right)\right)=\left\{\begin{array}{c}0,\kern5.25em \mid y-f\left(\mathbf{x}\right)\mid \le \varepsilon, \\ {}\mid y-f\left(\mathbf{x}\right)\mid -\varepsilon, \kern0.75em \mathrm{otherwise}\end{array}\right. $$
(3)

which means that errors below ε are not penalised. SVR performs linear regression in the d-dimensional feature space using the ε-insensitive loss function and, at the same time, tries to reduce model complexity by minimising ‖ω‖² = ω′ω. The optimal regression function is given by the minimum of the functional:

$$ \Phi \left(\boldsymbol{\upomega}, \boldsymbol{\upxi} \right)=\frac{1}{2}{\left\Vert \boldsymbol{\upomega} \right\Vert}^2+C{\sum}_{t=1}^T\left({\xi}_t+{\xi}_t^{\ast}\right) $$
(4)

where C is a pre-specified positive value and ξt and \( {\xi}_t^{\ast } \) are nonnegative slack variables representing the upper and lower constraints, respectively, on the outputs of the system; i.e.,

$$ {y}_t-f\left({\mathbf{x}}_{\boldsymbol{t}}\right)\le \varepsilon +{\xi}_t^{\ast } $$
(5)
$$ f\left({\mathbf{x}}_{\boldsymbol{t}}\right)-{y}_t\le \varepsilon +{\xi}_t $$
(6)

for all t = 1, 2, …, T. The parameter C controls the penalty imposed on observations that lie outside the ε-margin and, consequently, helps to prevent overfitting. Both the ε and the C parameters of SVR must be determined by the user.
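For illustration, the following minimal sketch (in Python with scikit-learn, used here as an illustrative stand-in for any SVR implementation; the toy data and names are ours) implements the ε-insensitive loss of Eq. (3) and fits an ε-SVR with user-chosen C and ε:

```python
import numpy as np
from sklearn.svm import SVR

def eps_insensitive_loss(y, f_x, eps):
    """Eq. (3): zero inside the eps-tube, linear outside it."""
    return np.maximum(np.abs(y - f_x) - eps, 0.0)

# Hypothetical toy data: a noisy linear signal
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 0.8 * X[:, 0] + 0.05 * rng.standard_normal(200)

# Both meta-parameters are chosen by the user (cf. Subsection 3.2)
model = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)

loss = eps_insensitive_loss(y, model.predict(X), eps=0.1)
print("mean eps-insensitive loss:", loss.mean())
print("support vectors (points on or outside the tube):", len(model.support_))
```

Observations lying strictly inside the ε-tube do not become support vectors, which is why a larger ε yields a sparser model.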

The optimisation problem described above can be transformed into a dual problem, for which the solution is given by:

$$ f\left(\mathbf{x}\right)={\sum}_{t=1}^{T_{SV}}\left({\alpha}_t-{\alpha}_t^{\ast}\right)K\left({\mathbf{x}}_t,\mathbf{x}\right)\kern1.25em \mathrm{s}.\mathrm{t}.\kern0.75em 0\le {\alpha}_t\le C,0\le {\alpha}_t^{\ast}\le C $$
(7)

where αt and \( {\alpha}_t^{\ast } \) are Lagrange multipliers, TSV is the number of support vectors and K is the kernel function of the form:

$$ K\left({\mathbf{x}}_t,\mathbf{x}\right)={\sum}_{i=1}^d{\varphi}_i\left(\mathbf{x}\right){\varphi}_i\left({\mathbf{x}}_t\right) $$
(8)

The dual problem can be solved more easily than the primal problem. The use of the kernel function avoids the need to explicitly compute the functional form of φi, which greatly reduces the computational complexity in the high-dimensional feature space. Instead, the kernel function K computes the inner product φ(xt)′φ(x), where φ(x) = (φ1(x), φ2(x), …, φd(x))′ [31].

In practice, the most popular kernel functions are the following:

  • Linear (dot product): K(xt, x) = xt′x,

  • Gaussian: K(xt, x) = exp(−‖xt − x‖2),

  • Polynomial: K(xt, x) = (1 + xt′x)p; p = 2, 3, …

The application of the linear kernel leads to linear SVR, while the Gaussian and polynomial kernels allow nonlinear SVR to be performed.
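Written out in code, the three kernels are one-liners; a minimal sketch in numpy (the Gaussian kernel is given in the unit-width form shown above, which implementations typically generalise with a scale parameter):

```python
import numpy as np

def linear_kernel(xt, x):
    return xt @ x                            # dot product

def gaussian_kernel(xt, x):
    return np.exp(-np.sum((xt - x) ** 2))    # exp(-||xt - x||^2)

def polynomial_kernel(xt, x, p=2):
    return (1.0 + xt @ x) ** p               # p = 2, 3, ...
```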

2.2 Range-based covariance estimator

We apply the estimator of covariance of returns calculated on the basis of low and high prices (see [53,54,55]). This estimator has an advantage over that based on only the closing prices because it uses information about the price changes during the day. It is given by:

$$ \operatorname{cov}\left(X,Y\right)=0.5\left[\operatorname{var}\left(X+Y\right)-\operatorname{var}(X)-\operatorname{var}(Y)\right] $$
(9)

where variances var(X + Y), var(X), var(Y) are estimated using low and high prices.

The Parkinson [56] estimator of variance can be used to calculate all the variances in Eq. (9). It is expressed as:

$$ {\operatorname{var}}_{Pt}={\left[\ln \left({H}_t/{L}_t\right)\right]}^2/\left(4\ln 2\right) $$
(10)

where Ht and Lt are the daily high and low prices, respectively. The Parkinson estimator was advocated by Brunetti and Lildholdt [53] and Brandt and Diebold [55]; however, other range-based variance estimators can also be applied.
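Eq. (10) can be implemented in one line; a minimal sketch assuming the daily highs and lows are stored as numpy arrays (the function and argument names are ours):

```python
import numpy as np

def parkinson_variance(high, low):
    """Eq. (10): range-based daily variance estimator from high/low prices."""
    return np.log(high / low) ** 2 / (4.0 * np.log(2.0))
```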

Equation (9) can be applied when the range of the sum of the variables X and Y is known; in practice, however, this range is not easy to obtain. It can be computed from tick-by-tick prices, but such data are difficult to obtain for many financial assets. For foreign exchange rates, by contrast, the range in question can be calculated easily. Let us consider two exchange rates of currencies x and y in terms of currency z, denoted by x/z and y/z, respectively. In the absence of triangular arbitrage opportunities, the return of the cross rate can be written as:

$$ \Delta \ln \left(x/y\right)=\Delta \ln \left(x/z\right)-\Delta \ln \left(y/z\right) $$
(11)

Then, the range-based estimator of covariance for the currency pairs can be represented as:

$$ \operatorname{cov}\left(\Delta \ln \left(x/z\right),\Delta \ln \left(y/z\right)\right)=0.5\left[\operatorname{var}\left(\Delta \ln \left(x/z\right)\right)+\operatorname{var}\left(\Delta \ln \left(y/z\right)\right)-\operatorname{var}\left(\Delta \ln \left(x/y\right)\right)\right] $$
(12)

This approach was used by some authors to analyse covariance of returns (see, e.g., [53, 57, 58]). Such an estimator was also employed for the construction of multivariate GARCH models by Fiszeder [42] for the BEKK model and by Fiszeder and Fałdziński [43] for the DCC model.
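A minimal sketch of Eq. (12), combining the Parkinson estimator with the triangular no-arbitrage relation; the daily high/low series of the two rates and of the cross rate are assumed given, and all names are ours:

```python
import numpy as np

def parkinson_variance(high, low):
    """Eq. (10), as in the previous sketch."""
    return np.log(high / low) ** 2 / (4.0 * np.log(2.0))

def range_based_covariance(hi_xz, lo_xz, hi_yz, lo_yz, hi_xy, lo_xy):
    """Eq. (12): covariance of the returns of x/z and y/z from the daily
    ranges of the two rates and of the cross rate x/y."""
    return 0.5 * (parkinson_variance(hi_xz, lo_xz)
                  + parkinson_variance(hi_yz, lo_yz)
                  - parkinson_variance(hi_xy, lo_xy))
```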

The estimator of covariance based on low and high prices is less efficient than the most common estimator based on intraday prices, i.e., realised covariance, although it is more robust to microstructure noise, which can bias intraday-based estimators [55]. Chou et al. [41] and Martens and van Dijk [59] showed how the bias of the range-based covariance estimator can be eliminated. Compared with the estimator calculated on the basis of closing prices, the estimator calculated on the basis of low and high prices is much more efficient. Monte Carlo simulations have indicated that when the Parkinson estimator is applied to all the variances in Eq. (9), the range-based covariance estimator is approximately five times more efficient (see, e.g., [53, 55]).

2.3 Forecasting the range-based covariance matrix using SVR

In this subsection, we introduce a methodology for dynamic modelling and forecasting of covariance matrices based on SVR models using the Cholesky decomposition. The approach guarantees the positive definiteness of the forecasted covariance matrices and is flexible, as it can be applied to different dependence patterns.

It should be noted that the matrix constructed from the variance and covariance forecasts obtained from the disjoint application of the forecasting models is not guaranteed to be positive definite. In this paper, we apply the Cholesky decomposition to preserve the positive definiteness of the forecasted covariance matrices. The Cholesky decomposition, also known as the Cholesky factorisation, decomposes a symmetric positive-definite matrix A into the product of a unique upper triangular matrix U with real and positive diagonal entries and its transpose U′; i.e., A = U′U. The matrix U is known as the Cholesky factor of A and can be interpreted as a square root of A. The motivation for modelling and forecasting the Cholesky factors, instead of the elements of the range-based covariance matrix of returns directly, is that no restrictions need to be imposed on the forecasts to ensure positive definiteness. The idea of using the Cholesky factorisation in financial modelling is not new. For example, Tsay [60] applied it to re-parameterise the conditional covariance matrix of returns in the multivariate GARCH model. The idea of using the Cholesky decomposition of the realised covariance matrix in modelling and forecasting was put forward by Andersen et al. [58] and first implemented in empirical studies by Chiriac and Voev [61].
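A small numpy illustration of this convention may be helpful: numpy returns the lower-triangular factor L with A = LL′, so the upper-triangular factor used here is U = L′ (the example matrix is hypothetical):

```python
import numpy as np

A = np.array([[4.0, 1.2, 0.8],
              [1.2, 3.0, 0.6],
              [0.8, 0.6, 2.5]])      # hypothetical covariance matrix

L = np.linalg.cholesky(A)            # lower triangular, A = L @ L.T
U = L.T                              # upper-triangular Cholesky factor
assert np.allclose(U.T @ U, A)       # A = U'U, as in the text
```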

To forecast the covariance matrix for the forecast horizon τ, we follow five steps:

  1. Step 1. We calculate the N × N range-based covariance matrices of returns Gt, t = 1, 2, …, T, where T is the time-series length. The range-based variances of the returns are the diagonal entries of these matrices, while the range-based covariances based on Eq. (9) are the off-diagonal entries. To estimate the range-based covariance matrices for the currency pairs, we use the estimator of the covariance of the returns given by Eq. (12) and the Parkinson estimator of the variance expressed in Eq. (10).

  2. Step 2. The matrices Gt (t = 1, 2, …, T) are decomposed using the Cholesky decomposition into the form Gt = Pt′Pt.

  3. Step 3. For each entry \( {p}_t^{ij} \) (1 ≤ i ≤ j ≤ N) of the Cholesky factor Pt, we construct and train an autoregressive SVR model of the form:

     $$ {p}_{t+\tau}^{ij}=f\left({p}_t^{ij},{p}_{t-1}^{ij},\dots, {p}_{t-l+1}^{ij}\right) $$
     (13)

     where l is the lag length. The SVR model (13) is estimated separately for each pair (i, j) on the basis of the univariate series \( {p}_t^{ij} \) (t = 1, 2, …, T).

  4. Step 4. We forecast the Cholesky factor \( {\boldsymbol{P}}_{T+\tau }=\left[{p}_{T+\tau}^{ij}\right] \). To this end, the forecasts \( {\hat{p}}_{T+\tau}^{ij} \) are calculated using the models trained in Step 3.

  5. Step 5. The forecast of the covariance matrix is reconstructed by reversing the Cholesky decomposition; i.e., \( {\hat{\boldsymbol{G}}}_{T+\tau }={\hat{\boldsymbol{P}}}_{T+\tau}^{\prime }{\hat{\boldsymbol{P}}}_{T+\tau } \), where \( {\hat{\boldsymbol{P}}}_{T+\tau } \) is the forecast of the Cholesky factor.

The outline of the proposed algorithm is depicted in Fig. 1, and a minimal code sketch of the pipeline is given after the figure.

Fig. 1 Outline of the algorithm for covariance matrix forecasting
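To make the procedure concrete, the following minimal sketch implements Steps 2–5 in Python with scikit-learn (the empirical study in Section 3 used MATLAB's fitrsvm; scikit-learn serves as an illustrative stand-in, and the function name, hyperparameter values and commented usage are our assumptions). Step 1 is assumed to have produced an array of positive-definite range-based covariance matrices:

```python
import numpy as np
from sklearn.svm import SVR

def forecast_covariance_matrix(G_series, lag=15, tau=1, C=1.0, epsilon=0.01):
    """Steps 2-5: Cholesky-decompose the range-based covariance matrices,
    fit one autoregressive SVR per Cholesky entry (Eq. (13)), forecast the
    entries and reconstruct the matrix. G_series has shape (T, N, N) and
    must hold symmetric positive-definite matrices."""
    T, N, _ = G_series.shape

    # Step 2: G_t = P_t' P_t with upper-triangular P_t.
    # numpy returns the lower factor L (G = L L'), so P = L'.
    P = np.array([np.linalg.cholesky(G).T for G in G_series])

    P_hat = np.zeros((N, N))
    n_samples = T - lag - tau + 1
    for i in range(N):
        for j in range(i, N):          # upper-triangular entries, i <= j
            p = P[:, i, j]             # univariate series p_t^{ij}
            # Step 3: lagged design matrix for Eq. (13); the paper also
            # standardises each series first (omitted here for brevity)
            X = np.column_stack([p[k:k + n_samples] for k in range(lag)])
            y = p[lag + tau - 1:]
            model = SVR(kernel="linear", C=C, epsilon=epsilon).fit(X, y)
            # Step 4: forecast p_{T+tau}^{ij} from the last `lag` values
            P_hat[i, j] = model.predict(p[-lag:].reshape(1, -1))[0]

    # Step 5: reverse the decomposition; P_hat' P_hat is positive
    # semidefinite by construction (definite when P_hat has full rank)
    return P_hat.T @ P_hat

# Hypothetical usage: G_series of shape (527, 3, 3) from Step 1
# G_hat = forecast_covariance_matrix(G_series, lag=15, tau=1)
```

Fitting one univariate SVR per Cholesky entry keeps the computation parallel across the N(N + 1)/2 series, while Step 5 guarantees a well-defined covariance matrix regardless of the individual forecast errors.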

2.4 Alternative forecasting methods

Modelling a covariance matrix is a challenging task for two reasons. First, the chosen model must guarantee the positive definiteness of the estimated and forecasted covariance matrices. Second, to limit the inflation of the number of estimated parameters and the computational difficulties, severe restrictions on the model dynamics must be imposed. Many methods have been proposed in the literature to model conditional covariance matrices. Two of the most popular approaches are (1) modelling and forecasting realised covariance matrices and (2) applying multivariate GARCH models (see, e.g., [62]). The first approach relies on intraday data to calculate nonparametric measures of variances and covariances, such as realised variances and covariances. Models based on these measures provide more precise estimates and forecasts than models based on daily closing prices; however, high-frequency data are not commonly available and are significantly more expensive than daily data. In this paper, we apply daily low and high prices, which are usually available alongside closing prices but contain much more information about volatility and the relationships between returns. Such prices also have further advantages over intraday data: wider availability, lower acquisition costs, considerably lower database requirements and greater robustness to some microstructure effects. Furthermore, the direct application of intraday data entails some problems, such as daily cyclical fluctuations, strong autocorrelation and the significant impact of macroeconomic news releases on quoted prices. The goal of this study is to create a forecasting procedure based on daily data, which is why we do not consider models for realised covariance matrices.

The second most popular approach is to apply multivariate GARCH models. These are parametric models where, by definition, the structure of dependencies between variables is restricted to a specific analytical form. When dealing with large portfolios, many multivariate GARCH models present unsatisfactory performance or problems with estimation. Therefore, we choose for our forecasting study the dynamic conditional correlation (DCC) model by Engle [47], which has important advantages, such as the positive definiteness of the conditional covariance matrices and the ability to describe time-varying conditional correlations and covariances in a parsimonious way. Furthermore, the parameters of the DCC model can be estimated in two stages by the quasi-maximum likelihood method, which makes this approach relatively simple and possible to apply even to very large portfolios. The DCC model is one of the most popular multivariate GARCH models used to describe financial time series. Moreover, many papers, such as [63,64,65,66], show that it is very difficult for other multivariate GARCH models to outperform the DCC model.

There are also alternative methods to forecast conditional covariance matrices that do not fall within the two approaches above. The SVR-based method that we propose in this paper is one of them. There are several benefits of applying SVR models to forecast time series. First, SVR shares the general advantages of machine learning methods. It is widely claimed that machine learning offers a more general approach than parametric models (cf. [67, 68]). It is capable of approximating nonlinear functions based on noisy and nonstationary data. Machine learning concentrates on prediction by using general-purpose learning algorithms to find patterns in often rich and unwieldy data. This approach makes minimal assumptions about the data-generating systems; it can be effective even when the data are gathered without a carefully controlled experimental design and in the presence of complicated nonlinear interactions. Moreover, it can be particularly helpful when the number of input variables exceeds the number of subjects. It should also be noted that empirical studies in the literature are promising, since they confirm that specific ML methods can outperform econometric models (cf. [69]). In particular, one can find the first attempts to forecast conditional covariance matrices by applying ML methods [70, 71]. In both cited papers, artificial neural networks (ANNs) were used. However, in contrast to our method, both studies fall within the first approach mentioned above, since they model realised covariance matrices.

The most important ML forecasting methods include artificial neural networks, support vector machines, random forests, the nearest neighbours algorithm, Bayesian regression, kernel ridge regression and generalised linear models. We decided to apply the SVR model due to its promising properties. It has been shown that SVR combines the training efficiency and simplicity of linear algorithms with the accuracy of the best nonlinear techniques. In many practical applications, this approach can tolerate high-dimensional or incomplete data and is robust to outliers [28, 72]. One of the main advantages of SVR is that its computational complexity does not depend on the dimensionality of the input space. Additionally, it has excellent generalisation capability and high prediction accuracy [73].

Many studies have confirmed that SVR can produce more accurate forecasts than other machine learning techniques, including ANNs (see, e.g., [28, 74, 75]); however, some authors have claimed that the superiority of SVR over ANNs is not obvious and can depend on the type of neural network [76, 77]. Moreover, unlike neural network training, which requires nonlinear optimisation with the risk of getting stuck in local minima, the SVR solution is always unique and globally optimal [75]. It has also been shown that SVR has advantages over artificial neural networks for real-world data of limited size: (i) fewer calibration samples are required to obtain a desired model performance, (ii) SVR is less sensitive to sampling variations in small datasets and (iii) cross-validation is an approximately unbiased option for evaluating the true SVR model performance even for small datasets [78].

3 Application of the SVR models for forecasting exchange rates

In the empirical study, we applied the proposed methodology for dynamic modelling and forecasting covariance matrices based on SVR models to exchange rates from the forex market. We assessed the accuracy of several SVR models with different lags and kernels and compared this approach with the DCC model.

3.1 Data applied

The three most heavily traded currency pairs in the forex market, namely, EUR/USD, USD/JPY and GBP/USD, were investigated. The analysis exploits the triangular no-arbitrage relation between these currencies to calculate the covariances of the returns (cf. Eq. (12)).

First, we examined the data for the 11-year period from 2 January 2006 to 30 December 2016 (2853 returns). The descriptive statistics for the returns, squared returns and products of the returns are presented in Table 1. The daily returns were calculated as rt = 100 ln(pt/pt − 1), where pt is the closing price at time t.

Table 1 Summary statistics of the daily currency pairs

The variability of the returns, measured by the standard deviation, was quite similar for all the currency pairs; however, there were significant differences in the skewness and kurtosis of the distributions. Owing to perturbations caused by the 2016 Brexit referendum, the distribution of returns was more leptokurtic, and the minimum return was significantly lower for GBP/USD than for the remaining pairs. A weak autocorrelation was present in the returns of the GBP/USD rate. The autocorrelation of the squared returns and products of the returns was much stronger than the autocorrelation of the returns. Moreover, there were considerably higher deviations from the normal distribution for the squared returns and the products of the returns. This finding means that modelling the covariance matrices of the returns is a much more demanding task than modelling the returns.

3.2 Description of the models and procedures

We applied autoregressive SVR models (cf. Eq. (13)) with different lags and two kernels: a linear kernel (which leads to linear SVR) and a Gaussian kernel (which leads to nonlinear SVR). We tried several lag values; however, we present only the results for lags l = 1 and l = 15. Lag l = 1 leads to the simplest specification, in which only one lagged variable, i.e., \( {p}_t^{ij} \), is used as the regressor. Our calculations showed that larger lags may lead to more accurate forecasts; however, this effect ceased to be visible for l > 15. According to the results, lag l = 15 seemed to be optimal when considering both the accuracy of the forecasts and the computation time.

Finally, we considered four specifications of the SVR models:

  1. SVR with a linear kernel and l = 1 (hereinafter SVR_lin_1),

  2. SVR with a linear kernel and l = 15 (hereinafter SVR_lin_15),

  3. SVR with a Gaussian kernel and l = 1 (hereinafter SVR_Gauss_1),

  4. SVR with a Gaussian kernel and l = 15 (hereinafter SVR_Gauss_15).

All the above models were repeatedly constructed on the basis of a rolling sample with a fixed size of T = 527 (i.e., the number of observations in the first 2-year period, from 3 January 2006 to 31 December 2007, which was used as the initial sample). In cases where the range-based covariance matrices of the returns Gt were not positive definite, a simple eigenvalue-based correction was applied (see, e.g., [79, 80]); this procedure did not significantly affect the dynamic dependencies between the covariance matrices. According to the methodology described in Subsection 2.3, the SVR models were constructed for the series \( {p}_t^{ij} \) (t = 1, 2, …, T) obtained from the Cholesky decomposition. The decomposed series were standardised, i.e., centred by subtracting the mean and divided by the standard deviation.

As described in Subsection 2.1, the values of the ε and C parameters (also called meta-parameters) must be determined to create the SVR models. There are competing proposals in the literature on how to tune these parameters (see, e.g., [50, 52, 81,82,83]; and references therein); however, previous studies did not demonstrate a clear superiority of any of them. Therefore, we applied two tuning methods in our study. In the first method, we determined the parameters using the default settings in MATLAB (in our study, we used the function fitrsvm in MATLAB (R2015b) to perform SVR); i.e., C = 1 for the linear kernel and \( C=\frac{Iqr(Y)}{1.349} \) for the Gaussian kernel, where Iqr(Y) is the interquartile range of the response variable Y. For both kernels, the default value of the ε parameter was \( \varepsilon =\frac{Iqr(Y)}{13.49} \). The main advantage of this method is its simplicity and speed.

The second method we applied was the grid search technique. This method consists of constructing many SVR models for different values of the parameters and selecting the optimal model on the basis of a validation set. We performed a grid search over the C and ε parameters by considering the values C = 2−5, 2−4, …, 24, 25 and ε = 2−5, 2−4, …, 24, 25. To select the optimal parameters, we applied a 10-fold cross-validation procedure. In this approach, the investigated sample was randomly partitioned into 10 equal-sized subsamples. Nine of these subsamples were used to construct the SVR model, while the remaining one was used to validate it; to this end, the mean squared error (MSE) was computed on the observations in the validation subsample. This procedure was repeated 10 times (once for each of the 10 subsamples used as the validation set), and the average of the 10 obtained MSEs was calculated. Finally, the parameters that led to the smallest MSE were considered optimal. It should be noted that this approach can be very time consuming. However, this burden can be reduced because it is reasonable to assume that the optimal parameters for consecutive rolling samples should be very similar (as these samples differ in only 1 of 527 observations), which means that there is no need to re-tune the parameters for each sample. In our study, we re-ran the grid search to tune the parameters every 100 days.
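As an illustration of the second tuning method, the sketch below runs the 2−5, …, 25 grid with 10-fold cross-validation in Python with scikit-learn (a stand-in for the MATLAB routine actually used; the function name and the shuffled fold split are our assumptions):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, KFold

def tune_svr_parameters(X, y, random_state=0):
    """Grid search over C and epsilon in {2^-5, ..., 2^5}; the pair with
    the smallest 10-fold cross-validated MSE is considered optimal."""
    param_grid = {"C": 2.0 ** np.arange(-5, 6),
                  "epsilon": 2.0 ** np.arange(-5, 6)}
    folds = KFold(n_splits=10, shuffle=True, random_state=random_state)
    search = GridSearchCV(SVR(kernel="linear"), param_grid, cv=folds,
                          scoring="neg_mean_squared_error")
    search.fit(X, y)
    return search.best_params_
```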

Our results showed that the grid search method produced better parameter values than the MATLAB defaults (according to both the in-sample MSE calculated by cross-validation and the out-of-sample prediction errors). Therefore, in the next subsection, we present only the results for the SVR models with parameters tuned using the grid search technique.

3.3 Forecasting performance

For all the considered models, 1-day-ahead forecasts of the covariance matrices \( {\hat{\boldsymbol{G}}}_{T+\tau } \) (i.e., τ = 1) were calculated for the 9-year period from 2 January 2008 to 30 December 2016 (2336 forecasts). The considered period was relatively long, and it covered both turbulent periods, such as the global financial crisis of 2008, the European sovereign debt crisis and the 2016 Brexit referendum, and tranquil periods; therefore, the results should be robust to the state of the global economy.

As a proxy of the daily covariance for the evaluation of forecasts, the sum of products of intraday returns (the realised covariance) was employed, while as a proxy for the daily variance, the sum of squared intraday returns (the realised variance) was used. This is a commonly accepted approach in the literature (see, e.g., [84,85,86]). One major problem of using such data is the choice of the appropriate frequency of observations (see, e.g., [87]). In this study, 15-min returns were applied; however, the main results did not change for the 5- or 30-min returns. It should be noted that we used intraday data only to evaluate the forecasts; we did not use them to construct the models or to calculate the forecasts.
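For clarity, the evaluation proxies can be written in a few lines; a minimal sketch assuming the 15-min returns of a given day are stacked in a K × N array (the names are ours):

```python
import numpy as np

def realised_covariance_matrix(intraday_returns):
    """Daily realised covariance matrix: the sum over the day of the outer
    products of the intraday return vectors. The diagonal holds the realised
    variances (sums of squared intraday returns).
    intraday_returns: array of shape (K, N), K 15-min returns on N assets."""
    return intraday_returns.T @ intraday_returns
```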

First, following Laurent et al. [88], we evaluated the forecasts of the whole conditional covariance matrix using the squared Frobenius loss function given by:

$$ LF=\frac{1}{m}{\sum}_{t=1}^m\mathrm{Tr}\left[{\left({\hat{\boldsymbol{G}}}_t-{\boldsymbol{G}}_{tR}\right)}^{\prime}\left({\hat{\boldsymbol{G}}}_t-{\boldsymbol{G}}_{tR}\right)\right] $$
(14)

where m is the number of forecasts, Tr denotes the trace of a matrix, \( {\hat{\boldsymbol{G}}}_t \) is a forecast of the conditional covariance matrix and GtR is the realised covariance matrix (with the realised variances on the diagonal and the realised covariances as the off-diagonal entries).
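Because Tr(D′D) is the sum of the squared elements of D, Eq. (14) reduces to the mean of the squared entrywise differences; a minimal sketch:

```python
import numpy as np

def squared_frobenius_loss(G_hat, G_realised):
    """Eq. (14): G_hat and G_realised have shape (m, N, N)."""
    diff = G_hat - G_realised
    return np.mean(np.sum(diff ** 2, axis=(1, 2)))
```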

The proposed method of covariance matrix forecasting using SVR models utilises the Cholesky decomposition. Unfortunately, the forecasting performance of the procedure may be sensitive to the order of the variables in the covariance matrix because each permutation of the variables in the original matrix yields a different decomposition and different factors. Therefore, we considered all possible permutations of the analysed currency pairs (permutation 1: EUR/USD, USD/JPY and GBP/USD; permutation 2: EUR/USD, GBP/USD and USD/JPY; permutation 3: USD/JPY, EUR/USD and GBP/USD; permutation 4: USD/JPY, GBP/USD and EUR/USD; permutation 5: GBP/USD, EUR/USD and USD/JPY; and permutation 6: GBP/USD, USD/JPY and EUR/USD).

To evaluate the statistical significance of the results, we applied the model confidence set (MCS) of Hansen et al. [89]. The objective of the MCS procedure is to determine the set that contains the best model(s) from a given collection of models, selected with a given level of confidence in terms of a user-specified criterion. In our analysis, a criterion based on the squared Frobenius loss function was applied. The values of the squared Frobenius loss function and the corresponding p values (calculated by the bootstrap method) of the MCS test are presented in Table 2. The table shows that the forecasts from the SVR_lin_15 model were significantly more accurate than those from the other SVR models and the DCC model for all permutations. This finding means that the order of the variables in the covariance matrix did not affect the forecasting superiority of the SVR_lin_15 model.

Table 2 Evaluation results of the covariance matrix forecasts based on the squared Frobenius loss function

The results of the analysis of the whole covariance matrix did not show whether the superiority of one model was due to more accurate forecasts of the variances, the covariances or both. This question is important since, in some applications, only the volatility of financial processes is used, whereas in other cases, the relationship between processes plays a key role. Therefore, from a practical point of view, it is advisable to analyse the forecasts of the variances and covariances separately. To this end, the mean squared error (MSE), the mean absolute error (MAE) and the coefficient of determination from the Mincer-Zarnowitz regression were calculated. These criteria are often used to evaluate volatility forecasts in empirical studies (see, e.g., [90, 91]). We also tried other loss functions, and they yielded similar results. The statistical significance of the results was again verified by the MCS test. To save space, we present only the results for permutation 2, which was the worst permutation for the SVR_lin_15 model according to the squared Frobenius loss function. The other permutations also led to more accurate forecasts of the separate series of variances and covariances. The results for the forecasts of the covariances are presented in Table 3.

Table 3 Evaluation results of the covariance forecasts based on the MSE, MAE and R2 criteria

According to the results of the MCS test, only the SVR_lin_15 model belonged to the MCS. The forecasting superiority of the SVR_lin_15 model did not depend on the type of loss function; all the considered criteria indicated that the covariance forecasts based on this model were the most accurate.

We also compared the forecasts of the variances. To this end, as the benchmark we applied the univariate GARCH models previously used in the first stage of the DCC model. The obtained results are presented in Table 4.

Table 4 Evaluation results of the variance forecasts based on the MSE, MAE and R2 criteria

The results for variance forecasting were not unequivocal but still indicated the advantage of the SVR_lin_15 model. Based on the MSE measure, the forecasts from this model were the most accurate for all the series, with the single exception of the USD/JPY pair (for which only the GARCH model was included in the MCS). Additionally, for the EUR/USD pair, two models (the GARCH model and the SVR_lin_15 model) belonged to the MCS, and there was no evidence to reject the null hypothesis of equal predictive ability for these models. However, under the MAE criterion, which is less sensitive to outliers than the MSE measure, the SVR_lin_15 model was the best forecasting model for all the currency pairs. The superiority of the SVR_lin_15 model was also confirmed by the R2 criterion.

For both the covariances and variances, the differences between the MAE values across the currency pairs were much smaller than those for the MSE and R2 criteria. These differences were associated with the presence of numerous outliers in the currency pair returns, which have a much stronger impact on the latter measures.

Our research showed the superiority of the linear kernel over the nonlinear (Gaussian) kernel, which suggests that the autoregressive relations in each forecasted series were linear or almost linear. It can be concluded that the applied linear SVR models have a form similar to that of the ARCH model; however, it should be noted that they are applied not directly to the raw series of the variances and covariances but to the series transformed by the Cholesky decomposition. The same analogy easily explains the superiority of the SVR_lin_15 model over the SVR_lin_1 model. Many empirical studies have shown that the conditional variance (covariance) usually appears to be a function of many lagged past squared errors (products of errors), which is why a more parsimonious parametrisation in the form of the GARCH model is frequently used. Additionally, our calculations showed that, in the case of linear SVR, longer lags led to more accurate forecasts; however, this effect ceased to be clearly visible for l > 15.

3.4 Influence of market conditions on the superiority of forecasts

Two recent studies suggest that the application of low and high prices in volatility models leads to the largest improvement in the estimation and forecasting of volatility during turbulent periods [38, 39]. For this reason, in this section, we examine whether market conditions, i.e., market volatility and dependencies between assets, can affect the accuracy of the proposed forecasting procedure based on the SVR model.

To this end, we applied a quantile regression model and tested whether extreme improvements in the forecasts can be explained by the level of market volatility (for variance forecasts) or dependence (for covariance forecasts) on the previous day. Let dvar,T and dcov,T denote the loss differentials defined for the variance and covariance forecasts, respectively, as:

$$ {d}_{\operatorname{var},T}={\left({\operatorname{var}}_{DCC,T}-{\operatorname{var}}_{R,T}\right)}^2-{\left({\operatorname{var}}_{SVR\_\mathrm{lin}\_15,T}-{\operatorname{var}}_{R,T}\right)}^2 $$
(15)
$$ {d}_{\operatorname{cov},T}={\left(\left|{\operatorname{cov}}_{DCC,T}\right|-\left|{\operatorname{cov}}_{R,T}\right|\right)}^2-{\left(\left|{\operatorname{cov}}_{SVR\_\mathrm{lin}\_15,T}\right|-\left|{\operatorname{cov}}_{R,T}\right|\right)}^2 $$
(16)

where varDCC,T and varSVR _ lin _ 15,T are the forecasts of the conditional variances based on the DCC and SVR_lin_15 models, respectively, and varR,T is the realised variance; covDCC,T and covSVR _ lin _ 15,T are the forecasts of the conditional covariances based on the DCC and SVR_lin_15 models, respectively, and covR,T is the realised covariance. When dvar,T and dcov,T are positive, the forecasts based on the SVR_lin_15 model are more accurate than the forecasts from the DCC model. The loss differentials described in Eqs. (15)–(16) are based on the MSE loss function, but very similar formulas can be written for the MAE loss function.

The linear regression model for the τ-th quantile can be specified for the variance and covariance as follows:

$$ {d}_{\operatorname{var},T}={\beta}_0\left(\tau \right)+{\beta}_1\left(\tau \right)\ {\operatorname{var}}_{R,T-1}+{\varepsilon}_{\operatorname{var},T}\left(\tau \right) $$
(17)
$$ {d}_{\operatorname{cov},T}={\alpha}_0\left(\tau \right)+{\alpha}_1\left(\tau \right)\left|{\operatorname{cov}}_{R,T-1}\right|+{\varepsilon}_{\operatorname{cov},T}\left(\tau \right) $$
(18)
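A minimal sketch of the loss differential of Eq. (15) and the quantile regression of Eq. (17), using statsmodels for the quantile regression (the function names and the commented usage are ours; the covariance case of Eqs. (16) and (18) is analogous, with absolute values):

```python
import numpy as np
import statsmodels.api as sm

def variance_loss_differential(var_dcc, var_svr, var_realised):
    """Eq. (15): positive values mean the SVR_lin_15 forecast beat the DCC."""
    return (var_dcc - var_realised) ** 2 - (var_svr - var_realised) ** 2

def quantile_regression(d, var_realised_lagged, q=0.9):
    """Eq. (17): regress the loss differential on the previous day's
    realised variance at the q-th quantile."""
    X = sm.add_constant(var_realised_lagged)   # [1, var_{R,T-1}]
    return sm.QuantReg(d, X).fit(q=q)

# Hypothetical usage: d_var and var_R are aligned daily series
# result = quantile_regression(d_var[1:], var_R[:-1], q=0.9)
# print(result.params, result.pvalues)
```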

The parameter estimation results for the above quantile regression models are presented in Tables 5 and 6. We report the results based on the 90th percentile because we are interested in analysing large forecast improvements; however, very similar results were achieved for other high quantiles (e.g., the 75th and 95th percentiles).

Table 5 The parameter estimation results for the 90th percentile regression of the loss differential dvar,T on the lagged realised variance
Table 6 The parameter estimation results for the 90th percentile regression of the loss differential dcov,T on the lagged realised covariance

The estimates of the coefficient β1 were positive and significant for all the currency pairs, which means that a greater forecast improvement of the SVR_lin_15 model over the DCC model was observed when the realised variance was large. This conclusion is important since high market volatility is associated with turbulent periods and high market uncertainty, i.e., when forecasting is the most difficult and accurate forecasts matter most.

The results reported in Table 6 show that the estimates of the coefficient of interest α1 were positive and highly significant for all the analysed relations, which means that the forecasting advantage of the SVR_lin_15 model over the DCC model was greater when the dependence between the currency pairs was large. This conclusion is also important and, in particular, corroborates the previous results concerning the impact of the realised variance, since strong relations between assets often exist during turbulent periods (see, e.g., [92]). Such a finding for the range-based estimators has not been formulated in the literature so far.

We have presented the results only for the loss differentials based on the MSE loss function; however, the main results did not change for the MAE loss function.

4 Conclusions

We have proposed a methodology for dynamic modelling and forecasting covariance matrices based on SVR, which is our main contribution. The procedure guarantees the positive definiteness of the forecasted covariance matrices and is flexible, as it can be applied to different dependence patterns. The range-based covariance matrices of returns are decomposed into the Cholesky factors, and then SVR models are applied to forecast the elements of these factors. Afterwards, the forecast of the covariance matrix of returns is reconstructed from the forecasts of these elements as a result of the reverse operation of the Cholesky decomposition.

The procedure is based on the decomposed range-based covariance matrices; however, the proposed approach is quite general and can be applied to other proxies of the covariance matrices formulated on the basis of daily data (e.g., squared returns and products of returns) or intraday data (e.g., realised variances and covariances).

The proposed procedure was applied to analyse the most heavily traded currency pairs in the forex market: EUR/USD, USD/JPY and GBP/USD. Our second primary contribution was to show that the forecasts of each separate covariance and the whole covariance matrix obtained by the SVR models were more accurate than those obtained by the competing benchmark multivariate GARCH model. Moreover, the variance forecasts based on the proposed procedure were more precise than those from the univariate GARCH model. It should be emphasised that the advantage of the suggested procedure was higher during turbulent periods, i.e., when forecasting is the most difficult and accurate forecasts matter most. Furthermore, we showed that the order of the variables in the covariance matrix, which yields different Cholesky decomposition results, did not affect the forecasting superiority of the SVR model. The main conclusions of the study were also robust to the forecast evaluation criterion employed.

We applied the DCC model as a benchmark because it is one of the most popular multivariate volatility models; moreover, the estimation of its parameters is relatively simple, and it can be applied even to very large portfolios. The comparison could also be performed with other, more or less complex, models; however, the search for such models was not the purpose of this investigation. Similarly, other variants of the SVR models, or even other machine learning techniques, could be considered in future research. The procedure proposed in this paper proved an effective approach to forecasting covariance matrices; nevertheless, there are some potential directions for improving it. For example, one could apply other kernel functions or other methods for tuning the meta-parameters. These issues were not the primary objective of this work but can be investigated in future studies.