1 Introduction

Given the recent technological revolution, virtual currencies have gained tremendous popularity with individual investors and have increasingly positioned themselves as one of the most attractive assets for institutional investors. A cryptocurrency is a digital asset secured by cryptography and recorded on a blockchain (Nakamoto, 2008). Because these assets are issued and transacted over decentralised networks, they are theoretically immune to interference or manipulation by any authority or government. In particular, this design makes them difficult to counterfeit or to double-spend.

Cryptocurrencies have an advantage over fiat money in portability, since they can be used in international operations with great speed and at low cost. In addition, they tend to resist inflation, so they have been considered safe-haven assets. One of their main characteristics is their variability over short periods and their atypical behaviour relative to traditional markets. In this study, we focus on modelling this challenging volatility behaviour.

Forecasting the dynamics of a financial instrument is one of the essential ingredients for optimising investment portfolios. Here, we address the scenario of a uniform portfolio under both univariate and multivariate estimation of the variance, or portfolio risk. In neither case is it necessary to forecast the returns; only the volatility is needed to assess the portfolio's risk. Likewise, special attention is given to the transaction volume, assuming that the number of times an asset is bought or sold affects its price.

This work aims to apply statistical and deep learning techniques to forecast the volatility of nine risky cryptocurrencies, considering price and transaction volume at high frequency during the onset of the COVID-19 pandemic. Thus, we want to answer the following question: does building portfolios with the information from statistical, deep learning, and hybrid models, where volume is included in the estimation, represent a significant improvement over naive methods in periods of high volatility?

Specifically, an exploratory study of cryptocurrency assets in terms of their volatility over high-frequency periods is carried out. In particular, we apply asymmetric generalised autoregressive conditional heteroskedasticity (GARCH) models: the exponential eGARCH model (Nelson, 1991) and the gjrGARCH model (Glosten et al., 1993) are proposed to model the volatility of cryptocurrencies. It is of interest to include volume as an exogenous variable in the volatility models and examine the corresponding information. In addition, a relevant point is the estimation of multivariate DCC-GARCH volatility models (Engle, 2002) to forecast the covariance matrix. In the context of deep learning, long short-term memory (LSTM) models are studied on the time series of volatilities. In this way, hybrid models of the LSTM–GARCH type are compared, adding the parameters of the eGARCH and gjrGARCH models. In the final stage, investment strategies are constructed: (1) uniform portfolios with univariate forecasts and (2) uniform portfolios with multivariate forecasts of the covariance matrix. Finally, the performance of the models is analysed through the variance of both portfolios using statistical, computational, and financial metrics. Specifically, the heteroscedastic absolute error (HAE), the heteroscedastic squared error (HSE), the Sharpe ratio (SR), and the value at risk (VaR) are considered, and the accuracy of the predictions is compared through the Diebold–Mariano (DM) test. In the case of the SR, we take the stablecoin Tether as the risk-free asset for the performance benchmark.

On the one hand, an extensive literature has shown evidence that GARCH models capture the heteroskedastic effects of financial return time series (Cont, 2001; Taylor, 1994; Ghysels et al., 1996). For their part, the asymmetric eGARCH and gjrGARCH models are more stylised proposals to model the asymmetric response of volatility to negative shocks (Nelson, 1991; Glosten et al., 1993; Hentschel, 1995; Harvey & Shephard, 1996). In the multivariate case, DCC-GARCH has been the most parsimonious proposal to model the covariance matrix due to its small number of parameters (Engle, 2002). In the case of neural network models, recurrent networks are able to capture temporal dependencies, while their LSTM successors solve the stability problems that arise during optimisation. Thus, combining both families of models can enhance the predictive capabilities of the individual models in forecasting the volatility of cryptocurrencies (Kim & Won, 2018; Makridakis et al., 2018, 2020; Lahmiri & Bekiros, 2019; Kristjanpoller & Minutolo, 2018; D'Amato et al., 2022).

In this work, hybrid LSTM–GARCH models are used to measure the effect of volatility forecasts in the construction of high-frequency cryptocurrency investment portfolios. The hybrid model is proposed to capture short-term dynamics and obtain predictions that meet the needs of this market. In addition, transaction volume has been included as an exogenous variable. Its behaviour and the relationship it presents in terms of the performance of the forecasts are explored. The results are evaluated using metrics that capture the heteroscedasticity, return, and risk of the associated portfolio and the DM test to determine if the models are significantly different in accuracy.

There are several contributions in this study. We forecast high-frequency volatility in cryptocurrency markets using hybrid deep-learning models combined with stylised GARCH models. One main differentiating factor is that we discuss the forecasting performance of univariate and multivariate uniform portfolios through the VaR and SR metrics; we thus avoid relying only on statistical tests and computational error metrics to discriminate the best models. Another contribution is considering a highly capitalised selection of cryptocurrencies and comparing them against a stablecoin. We then exemplify the scope of the study on the portfolio allocation problem between risky assets and the cryptocurrency analogue of a risk-free asset: Tether. We frame our analysis in the turbulent time at the beginning of the COVID-19 pandemic, with its high systemic risk component. Moreover, we apply a sliding-window strategy to avoid biases due to a particularly calm or turbulent period of the cryptocurrency market. Our methodology can be applied in general to highly non-linear time series and has forecasting implications for simple allocation strategies.

In summary, the present work seeks to contrast different univariate and multivariate volatility forecast models through uniform-variance portfolios and to evaluate their performance through different metrics, with a naive model as a reference. We connect with recent evidence in the literature showing that machine learning models do not always deliver the best results when forecasting financial time series (Makridakis et al., 2018). Finally, the analysis is framed at the beginning of the COVID-19 pandemic, as it is a period of high volatility and represents a systemic risk scenario for international finances.

Our analysis has implications for the management of investment portfolios, where it is crucial to estimate the variance of the assets in both a univariate and a multivariate manner and, in turn, to obtain an allocation of assets that minimises exposure to risk and the probability of loss. Our results suggest using simple learning models to improve heteroskedastic errors, drastically reducing computation time compared with more elaborate neural network models and thus allowing rapid risk management in high-frequency scenarios, such as a hypothetical investment fund seeking to diversify into cryptocurrencies.

In Sect. 2, a literature review of volatility forecasts and cryptocurrency markets is performed, as well as the recent application of the DM test. Section 3 introduces the deep learning models used in this study. Section 4 presents the models of the GARCH family. Section 5 presents the elements of portfolio theory and its metrics considered in this study. Section 6 describes the data used and performs a preliminary exploratory analysis. The methodology followed to perform the volatility forecasts through the different models is detailed in Sect. 7. Section 8 shows the main results and discusses their implications regarding the selected metrics. Finally, in Sect. 9, the main findings are summarised, and future lines of research are proposed.

2 Literature Review

GARCH volatility models were initially introduced in Bollerslev (1986). Engle presents an emblematic financial application that predicts volatility through a GARCH(1,1) model for a portfolio composed of the Nasdaq, the Dow Jones and the ten-year Treasury bond (Engle, 2001), taking the VaR as a reference to evaluate the proposed solution. Likewise, a class of multivariate GARCH models with the dynamic conditional correlation (DCC) property was proposed by the same author (Engle, 2002); these models are denoted DCC-GARCH and have fewer parameters than other multivariate implementations of the same type. On the other hand, a GARCH model was extended in Lamoureux and Lastrapes (1990) to consider transaction volume as an exogenous variable, which showed that adding this variable improves the explanatory power of the model. This relationship was also explored in the futures market (Najand & Yung, 1991). In addition, the effects of the volatility index (VIX) of the Chicago Board Options Exchange (CBOE) and of the transaction volume were compared when modelling them as exogenous variables through a GARCH model (Kambouroudis & McMillan, 2016). In this work, as in the studies mentioned above, the family of GARCH(1,1) models is considered. This choice is based on the discussion of Bollerslev (1987), where the author shows evidence that a GARCH(1,1) model is more parsimonious than a higher-order autoregressive conditional heteroskedasticity (ARCH) model. In this sense, we seek to implement a model with few parameters that correctly captures the dynamics of volatility in financial time series.

Notably, on the topic of cryptocurrencies, the study of Chu et al. (2017) is one of the first to use different variants of the GARCH model to model the daily returns of the seven cryptocurrencies with the highest market capitalisation. In the same sense, previous studies have modelled Bitcoin and found evidence of it exhibiting characteristics between gold and the dollar when applying asymmetric GARCH models (Dyhrberg, 2016). However, the authors in Baur et al. (2018) replicate the previous work and reach the opposite conclusion; that is, the dynamics of Bitcoin are very different from those of gold and the dollar.

In the area of machine learning, LSTM has been implemented to predict directional movements in the shares of the S&P 500 and, with this information, to propose an investment strategy, arguing that it is possible to obtain meaningful information even in markets with statistical noise (Fischer & Krauss, 2018). Likewise, there are forecasting competitions where statistical and machine learning methods are contrasted on macroeconomic variables, finding that in most cases the classical statistical methods outperform the learning methods (Makridakis et al., 2018). In cryptocurrencies, recent work has implemented deep learning models to predict the direction of Bitcoin, taking into account non-traditional variables and using a non-parametric method of variable selection (García-Medina & Luu Duc Huynh, 2021). Recently, the authors of Banik et al. (2022) built a decision support system (DSS) to forecast the Indian stock market based on the LSTM model. However, none of these studies have explicitly paid attention to volatility.

A line of work that combines the study of volatility through GARCH and learning models has led to the proposal of hybrid models to improve the accuracy of volatility forecasts. In this direction, the study (Kim & Won, 2018) mixes LSTM models with different GARCH models to predict the volatility of the KOSPI 200 index (Korea Composite Stock Price Index). A similar methodology was implemented in Quintero Valencia et al. (2019) to forecast the volatility of the dollar to Colombian peso exchange rate, where a combination of interest rates and commodity prices are added as exogenous variables.

In this work, the DM test is used to compare forecast accuracy. This test was proposed by Diebold and Mariano (1995) with the characteristic that forecast errors can be non-Gaussian, non-zero mean, serially correlated, and contemporaneously correlated. Initially, the authors illustrate the practical use of the test with an application to forecasting the three-month change in the nominal Dollar/Dutch Guilder exchange rate. The test was incorporated as standard practice for testing models in econometrics, a practice Diebold (2015) criticises. He emphasises that the DM test should be interpreted as comparing forecasts, not as comparing econometric models. He also pointed out the power loss of pseudo-out-of-sample model comparisons and the limited and unclear benefits of following that path. In the context of forecasting financial series using machine learning, the authors of Gu et al. (2020) modified the DM test. They intended to compare the cross-sectional average of the prediction errors of each model instead of comparing errors between individual returns, thereby avoiding a violation of the condition of weak error dependence. Mensi et al. (2019) study the structural break of the price returns of Bitcoin and Ethereum under different models of the GARCH family, compared using the DM test. Similarly, the authors of Catania et al. (2019) study Bitcoin, Litecoin, Ripple, and Ethereum under several multivariate vector autoregressive models with different forms of time variation, where they also apply the DM test to contrast results. On the other hand, the test has also begun to be used in high-frequency forecasts of cryptocurrencies (Peng et al., 2018) to evaluate hybrid models, such as combinations of support vector regression (SVR) and GARCH models. In the same spirit, it has been decided to use the DM test in this study. F.X. Diebold's discussion (Diebold, 2015) should prevent us from taking the test results as confirmatory rather than indicative of the superiority of one forecast over another.

3 LSTM Deep Learning Models

Machine learning is a field of study in artificial intelligence where algorithms are developed so that computers can learn without being explicitly programmed. Deep learning methods have gained significant importance in recent years, and the number of people or organisations that develop them has increased. In addition, the computational resources to perform this type of processing have become more accessible. In the case of financial series, statistical and econometric methods often present difficulties when modelling non-stationary variables or those with complex dependencies (Dixon et al., 2015). Fortunately, deep learning techniques can identify and deal with these types of complex structures (Fischer & Krauss, 2018).

Artificial neural networks (ANNs) are part of machine learning models. Different mathematical disciplines guide this research, and these models can be understood as function approximation machines designed to achieve statistical generalisation (Goodfellow et al., 2016). In general, these models are based on a set of units called neurons that interact with each other, sharing information between them. An artificial neuron or unit receives an input vector x that represents the input data or the output of the other connected neurons. This information is weighted by multiplying it by a vector of weights w that the algorithm estimates during the learning process. In this way, an output value or signal f(x) is generated through a transformation function \(g(\cdot )\) known as the activation function. This process can be represented in a simplified way through the following expression:

$$\begin{aligned} f(x)=g(x\cdot w) + b, \end{aligned}$$
(1)

where b is the bias term. Multilayer perceptrons (MLPs) are the simplest ANNs and are usually used as a vanilla model. The architecture of an MLP consists of an input layer, a hidden layer and an output layer. In this naive model, each node is a neuron.
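As an illustration, the forward pass of a single unit in Eq. (1) can be sketched in a few lines of NumPy; the function name, input, and weights below are arbitrary values chosen for demonstration:

```python
import numpy as np

def neuron(x, w, b, g=np.tanh):
    """Single artificial unit, as in Eq. (1): f(x) = g(x . w) + b."""
    return g(x @ w) + b

# Illustrative input vector, weights, and bias (not estimated values).
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
out = neuron(x, w, b=0.05)
print(out)
```

In a trained network, w and b would be obtained by the learning algorithm rather than fixed by hand.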

Recurrent neural networks (RNNs) are a more elaborate class of networks specialising in representing and modelling sequential data. This type of network uses its hidden layers to summarise the historical information of a data sequence. RNNs implement cycles that allow information to persist over time. An RNN unit can be represented by an input \(x_t\) that generates a value \(h_t\); in its structure, the output of the state at time \(t-1\) is the input for the state at time t, and the output of the state at time t is the input for the state at time \(t+1\).

The LSTM model was proposed by Hochreiter and Schmidhuber (1997) and is designed to address the problem of long-term dependencies. This model implements gates that specify the amount of information that passes in each time state, such that the network discriminates the information that is relevant for the next state. It thereby solves some difficulties encountered in the RNN training process when calculating the derivatives involved in the optimisation. Unlike a conventional RNN, each neuron has different layers that help to filter the information. Specifically, an LSTM unit consists of a memory gate (\(c_t\)), an input gate (\(i_t\)), a forget gate (\(g_t\)), and an output gate (\(o_t\)). In Fig. 1, the general structure of the mechanism of an LSTM unit is shown. This figure shows, at time t, the input \(x_t\), the hidden state \(h_t\), and \(\tilde{c}_t\), which determines the amount of information accepted or discarded in the state cell. The internal process of an LSTM unit can be described mathematically through the following expressions:

$$\begin{aligned} g_t= & {} \sigma (U_g x_t + W_g h_{t-1} + b_g) \end{aligned}$$
(2)
$$\begin{aligned} i_t= & {} \sigma (U_i x_t + W_i h_{t-1} + b_i) \end{aligned}$$
(3)
$$\begin{aligned} \tilde{c}_t= & {} tanh(U_c x_t + W_c h_{t-1}+b_c) \end{aligned}$$
(4)
$$\begin{aligned} c_t= & {} g_t * c_{t-1} + i_t * \tilde{c}_t \end{aligned}$$
(5)
$$\begin{aligned} o_t= & {} \sigma (U_o x_t + W_o h_{t-1} + b_o) \end{aligned}$$
(6)
$$\begin{aligned} h_t= & {} o_t * tanh(c_t) \end{aligned}$$
(7)

where U and W are weight matrices, b is the bias term, and the symbol * denotes element-wise multiplication (Kim & Won, 2018).
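The updates in Eqs. (2)–(7) can be sketched directly in NumPy. The code below is a minimal single-step LSTM cell with randomly initialised parameters for illustration only; in practice a deep learning framework would train these weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM update following Eqs. (2)-(7); P holds the U, W, b parameters."""
    g = sigmoid(P["Ug"] @ x_t + P["Wg"] @ h_prev + P["bg"])        # forget gate
    i = sigmoid(P["Ui"] @ x_t + P["Wi"] @ h_prev + P["bi"])        # input gate
    c_tilde = np.tanh(P["Uc"] @ x_t + P["Wc"] @ h_prev + P["bc"])  # candidate state
    c = g * c_prev + i * c_tilde                                   # memory cell
    o = sigmoid(P["Uo"] @ x_t + P["Wo"] @ h_prev + P["bo"])        # output gate
    h = o * np.tanh(c)                                             # hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_h = 3, 4
P = {}
for k in "gico":                       # forget, input, candidate, output
    P["U" + k] = rng.normal(size=(n_h, n_in))
    P["W" + k] = rng.normal(size=(n_h, n_h))
    P["b" + k] = np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), P)
print(h.shape, c.shape)
```

Since \(h_t = o_t * tanh(c_t)\) with \(o_t \in (0,1)\), each component of the hidden state is bounded in magnitude by one.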

Fig. 1

General structure of an LSTM unit (García-Medina & Luu Duc Huynh, 2021). The memory gate is represented as \(c_t\), the input gate is denoted by \(i_t\), the forget gate is indicated as \(g_t\), and the output gate as \(o_t\)

4 GARCH Models

The GARCH model introduced by Bollerslev (1986) can be understood as a combination of the long-term average value, the volatility information of the previous periods, and the fitted variance of the model. Explicitly, the conditional variance of a GARCH(q, p) model is expressed as

$$\begin{aligned} \sigma _t^2 = \alpha _0 + \sum _{i=1}^q \alpha _i u^2_{t-i} + \sum _{j=1}^p \beta _j \sigma _{t-j}^2, \end{aligned}$$
(8)

where \(u_t\) are the residuals, \(\alpha _0>0\), \(\alpha _i \ge 0\) must be met for \(i=1, \dots ,q\) and \(\beta _j \ge 0\) for \(j=1, \dots ,p\).
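As a minimal sketch, the recursion in Eq. (8) for a GARCH(1,1) can be written as follows. The parameter values and residual series are illustrative only; in practice the parameters are estimated by maximum likelihood (the study itself uses R's rugarch for this):

```python
import numpy as np

def garch_variance(u, alpha0, alpha1, beta1):
    """GARCH(1,1) conditional variance, Eq. (8):
    sigma_t^2 = alpha0 + alpha1 * u_{t-1}^2 + beta1 * sigma_{t-1}^2."""
    sigma2 = np.empty_like(u)
    # Start at the unconditional variance alpha0 / (1 - alpha1 - beta1).
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)
    for t in range(1, len(u)):
        sigma2[t] = alpha0 + alpha1 * u[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(1)
u = rng.standard_t(df=5, size=500) * 0.01   # stand-in heavy-tailed residuals
s2 = garch_variance(u, alpha0=1e-6, alpha1=0.08, beta1=0.90)
print(s2[:3])
```

The positivity constraints on \(\alpha _0\), \(\alpha _i\), and \(\beta _j\) guarantee that every element of the resulting variance path is strictly positive.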

In GARCH models, positive and negative shocks have a symmetric effect on the conditional variance. However, empirical evidence shows that negative returns have a more significant impact on volatility than positive returns of similar magnitude (Jondeau et al., 2007). The exponential GARCH model, or eGARCH, was introduced (Nelson, 1991) to improve two aspects of the GARCH model. First, by modelling the logarithm of the variance, it does not need to restrict the parameters \(\alpha \) and \(\beta \) to guarantee a positive variance. Second, it explicitly includes an asymmetric volatility response to positive and negative news. In the eGARCH model, \(\sigma ^2\) depends on both the size and the sign of these changes:

$$\begin{aligned} log(\sigma ^2_t) = \alpha _0 + \sum _{i=1}^q \alpha _i g(z_{t-i}) + \sum _{j=1}^p \beta _j log(\sigma _{t-j}^2), \end{aligned}$$
(9)

where \(g(z_t) = \psi z_t + \gamma [|z_t|-\mathbb {E}(|z_t|)]\), in which \(\psi \) and \(\gamma \) are real constants. Both \(z_t\), and \(|z_t|-\mathbb {E}(|z_t|)\) are independent and identically distributed zero-mean sequences, so \(\mathbb {E}[g(z_t)]=0\). The asymmetry is modelled through the function \(g(z_t)\), which has the following form:

$$\begin{aligned} g(z_t)=\left\{ \begin{matrix} (\psi + \gamma )z_t - \gamma \mathbb {E}(|z_t|) &{} if\,z_t \ge 0\\ (\psi - \gamma )z_t - \gamma \mathbb {E}(|z_t|) &{} if\,z_t < 0 \end{matrix}\right. \end{aligned}$$

On the other hand, the gjrGARCH model introduced by Glosten et al. (1993) is an extension of the GARCH model that includes an additional term to account for possible asymmetries. This model alternatively captures the empirically observed fact that negative changes at time \(t-1\) have a more substantial impact on the variance at time t than positive changes. The variance in the gjrGARCH(q, p) model is defined by

$$\begin{aligned} \sigma _t^2 = \alpha _0 + \sum _{i=1}^q (\alpha _i+\gamma _i I_{t-i}) u^2_{t-i} + \sum _{j=1}^p \beta _j \sigma _{t-j}^2 \end{aligned}$$
(10)

for \(\alpha _0>0\), \(\alpha _i \ge 0\), \(\alpha _i +\gamma _i \ge 0\) and \(\beta _j \ge 0\) for \(i=1, \dots ,q\) and \(j=1, \dots ,p\), where

$$\begin{aligned} I_t=\left\{ \begin{matrix} 0 &{} if\,u_t\ge 0 \\ 1 &{} if\,u_t< 0 \end{matrix}\right. \end{aligned}$$
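The gjrGARCH(1,1) recursion can be sketched analogously to the symmetric case: the indicator activates the extra \(\gamma \) term only after negative shocks. Parameters and residuals below are illustrative, not estimated values:

```python
import numpy as np

def gjr_variance(u, alpha0, alpha1, gamma1, beta1):
    """gjrGARCH(1,1), Eq. (10): the gamma1 term is active only when the
    previous shock was negative (indicator I = 1 for u_{t-1} < 0)."""
    sigma2 = np.empty_like(u)
    sigma2[0] = np.var(u)   # simple start value
    for t in range(1, len(u)):
        I = 1.0 if u[t - 1] < 0 else 0.0
        sigma2[t] = (alpha0 + (alpha1 + gamma1 * I) * u[t - 1] ** 2
                     + beta1 * sigma2[t - 1])
    return sigma2

rng = np.random.default_rng(2)
u = rng.normal(0, 0.01, size=300)   # stand-in residual series
s2 = gjr_variance(u, alpha0=1e-6, alpha1=0.03, gamma1=0.10, beta1=0.90)
print(s2[-1])
```

With \(\gamma _1 > 0\), a negative shock of a given magnitude raises next-period variance more than a positive shock of the same size, reproducing the leverage effect described above.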

On the other hand, GARCH models can be modified to introduce exogenous variables that influence the volatility of returns. In the case of the GARCH(q, p) volatility model, these variables are added by introducing an extra term as follows (Lamoureux & Lastrapes, 1990):

$$\begin{aligned} \sigma _t^2 = \alpha _0 + \sum _{i=1}^q \alpha _i u^2_{t-i} + \sum _{j=1}^p \beta _j \sigma _{t-j}^2 + \delta V_t, \end{aligned}$$
(11)

where \(V_t\) is the exogenous variable, with the condition that \(\delta \ge 0\).
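A minimal sketch of Eq. (11): the exogenous term \(\delta V_t\) shifts the conditional variance upwards whenever \(\delta \ge 0\) and \(V_t \ge 0\). The series below are synthetic stand-ins for residuals and volume:

```python
import numpy as np

def garchx_variance(u, V, alpha0, alpha1, beta1, delta):
    """GARCH(1,1) with an exogenous regressor V_t (e.g. volume), Eq. (11)."""
    sigma2 = np.empty_like(u)
    sigma2[0] = np.var(u)
    for t in range(1, len(u)):
        sigma2[t] = (alpha0 + alpha1 * u[t - 1] ** 2
                     + beta1 * sigma2[t - 1] + delta * V[t])
    return sigma2

rng = np.random.default_rng(3)
u = rng.normal(0, 0.01, size=300)
V = np.abs(rng.normal(0, 0.5, size=300))   # stand-in non-negative volume measure
s2 = garchx_variance(u, V, alpha0=1e-6, alpha1=0.05, beta1=0.90, delta=1e-5)
print(s2[-1])
```

Setting \(\delta = 0\) recovers the plain GARCH(1,1) path, which is never above the path with \(\delta > 0\) here.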

In the multivariate case, we seek to estimate a covariance matrix (Jondeau et al., 2007; Bauwens et al., 2006). Let us consider a vector of random variables \(x_t = (x_{1,t}, x_{2,t}, \dots , x_{n,t})'\) whose joint distribution is given by:

$$\begin{aligned} x_t= & {} \mu _t(\theta ) + u_t \end{aligned}$$
(12)
$$\begin{aligned} u_t= & {} \Sigma _t^{1/2}(\theta )z_t, \end{aligned}$$
(13)

where \(\mu _t(\theta )\) denotes the \(n\times 1\) conditional mean vector, \(\Sigma _t(\theta )\) is the \(n\times n\) conditional covariance matrix of the error term \(u_t\), and \(\theta \) is the vector of unknown parameters. Let \(D_t\) be a diagonal matrix of size \(n\times n\) with conditional variances \(\sigma _i^2\) on the diagonal. Thus, the conditional correlation matrix of \(u_t\) can be represented as

$$\begin{aligned} R_t = D_t^{-1/2}\Sigma _t D_t^{-1/2}=\{\rho _t\}_{ij}. \end{aligned}$$
(14)

To estimate \(R_t\), the DCC-GARCH model was introduced in Engle (2002). In this model, the dynamic correlation matrix \(R_t\) is represented as follows:

$$\begin{aligned} R_t= & {} diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2}, \end{aligned}$$
(15)
$$\begin{aligned} Q_t= & {} (1-\delta _1 - \delta _2) \bar{Q} + \delta _1 (v_t v_t^\top ) + \delta _2 Q_{t-1}, \end{aligned}$$
(16)

where \(\bar{Q}\) is the unconditional covariance matrix of the standardized residuals \(v_t=\{u_{i,t}/\sigma _{i,t}\}_{i=1,\dots ,n}\) (Silvennoinen & Teräsvirta, 2009) and \(diag(Q_t)\) is the diagonal matrix of \(Q_t\). The matrix \(\bar{Q}\) can be estimated through the sample average \(\frac{1}{T}\sum _{t=1}^T \hat{v}_t\hat{v}_t'\), while the parameters \(\delta _1\) and \(\delta _2\) satisfy \(0\le \delta _1, \delta _2\le 1\) and \(\delta _1 + \delta _2\le 1\).
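One step of the DCC recursion in Eqs. (15)–(16) can be sketched as follows; \(\bar{Q}\), the standardized residuals, and the \(\delta \) parameters below are illustrative values, not estimates:

```python
import numpy as np

def dcc_step(Q_prev, v_t, Q_bar, d1, d2):
    """One DCC update: Q_t from Eq. (16), then R_t from Eq. (15)
    by rescaling Q_t to unit diagonal."""
    Q = (1 - d1 - d2) * Q_bar + d1 * np.outer(v_t, v_t) + d2 * Q_prev
    d = 1.0 / np.sqrt(np.diag(Q))
    R = Q * np.outer(d, d)     # diag(Q)^{-1/2} Q diag(Q)^{-1/2}
    return Q, R

n = 3
Q_bar = np.full((n, n), 0.3) + 0.7 * np.eye(n)   # illustrative unconditional matrix
v_t = np.array([0.5, -1.2, 0.3])                 # illustrative standardized residuals
Q, R = dcc_step(Q_bar.copy(), v_t, Q_bar, d1=0.05, d2=0.90)
print(np.round(R, 3))
```

Because \(Q_t\) is a convex combination of positive semidefinite matrices, the rescaled \(R_t\) is a valid correlation matrix with unit diagonal.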

5 Investment Portfolio

In 1952, Markowitz introduced the concept of the efficient frontier. This was the first mathematical formulation of portfolio optimisation. The idea is that investors see an expected return as desirable, while the variance of the returns is undesirable. What Markowitz demonstrated is the existence of a set of optimal portfolios that maximise the expected return for a given level of risk. This is what he called the efficient frontier (Roncalli, 2013).

Let the vector of returns be \(\textbf{r}=(r_1, \dots , r_p)\), where each element of the vector is the logarithmic return of the p instruments in the portfolio. Additionally, let \(\textbf{w}=(w_1, \dots , w_p)\) be the weights of each asset in the investment portfolio. The return of the portfolio is expressed as

$$\begin{aligned} R=\sum _{i=1}^p w_i r_i, \end{aligned}$$
(17)

On the other hand, portfolio risk is traditionally defined as the variance of the returns:

$$\begin{aligned} \sigma ^2 = \mathbf {w'\Sigma w}, \end{aligned}$$
(18)

where \(\Sigma \) is the covariance matrix of the returns.

In the case of equally weighted portfolios or uniform portfolios, the weights are estimated using the formula

$$\begin{aligned} w^*_i = \frac{1}{p}. \end{aligned}$$
(19)

Likewise, for the case of a portfolio of minimum variance, the weights are chosen through the expression

$$\begin{aligned} w^*= \text {argmin}_w \{w'\Sigma w\} \end{aligned}$$
(20)

In the scenario in which the weights take both positive and negative values, the following analytical solution can be obtained:

$$\begin{aligned} w^*= \frac{\Sigma ^{-1} 1}{1^\top \Sigma ^{-1} 1}. \end{aligned}$$
(21)

There are variants of this solution depending on whether leverage (negative weights) or upper or lower weight restrictions are considered. Here, we only consider the budget restriction: \(\sum _{i=1}^n w_i = 1\). In both scenarios, the uniform portfolio and the minimum-variance portfolio, once the portfolio weights are determined, the variance of the portfolio is obtained through Eq. 18. This last quantity is modelled and predicted in the following sections.
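As a worked example of Eqs. (18), (19), and (21), the following sketch computes the variance of a uniform portfolio and of the analytical minimum-variance portfolio for an illustrative covariance matrix:

```python
import numpy as np

# Illustrative covariance matrix of three assets (not estimated from data).
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
p = Sigma.shape[0]

w_uniform = np.full(p, 1.0 / p)            # uniform weights, Eq. (19)
ones = np.ones(p)
w_min = np.linalg.solve(Sigma, ones)       # Sigma^{-1} 1
w_min /= ones @ w_min                      # normalise: Eq. (21)

var_uniform = w_uniform @ Sigma @ w_uniform   # portfolio variance, Eq. (18)
var_min = w_min @ Sigma @ w_min
print(var_uniform, var_min)
```

By construction the minimum-variance portfolio can do no worse than the uniform one, since the uniform weights are one feasible point of the optimisation in Eq. (20).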

On the other hand, the SR was developed by the Nobel Prize winner William F. Sharpe and is used to compare a portfolio's return with the risk-free rate, taking into account its volatility (Sharpe, 1994). The SR is defined as

$$\begin{aligned} SR = \frac{R - R_f}{\sigma }, \end{aligned}$$
(22)

where \(R_f\) is the risk-free rate and \(\sigma \) is the standard deviation of the portfolio returns. The higher this value, the better the trade-off between the return and the risk associated with the investment.

The VaR is a probabilistic method that measures the potential loss of portfolio value over a given period of time. Specifically, the VaR at level \(\alpha \) is the percentage loss in portfolio value such that the actual loss is equal to or greater than it only \(\alpha \) percent of the time. In our case, we follow the parametric approach to calculate the VaR. In general, the VaR is given by

$$\begin{aligned} \text {VaR}^{\alpha } = -\sigma _{t} \Phi ^{-1}(\alpha ), \end{aligned}$$
(23)

where \(\sigma _{t}\) is the square root of the estimated portfolio variance and \(\Phi ^{-1}(\alpha )\) is the quantile function (inverse cumulative distribution function) evaluated at \(\alpha \).
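A small numerical sketch of the SR and the parametric VaR of Eq. (23), using the standard-deviation form of the Sharpe ratio and purely illustrative portfolio figures (the Gaussian quantile comes from Python's standard library):

```python
from math import sqrt
from statistics import NormalDist

R, R_f = 0.012, 0.001    # illustrative portfolio and risk-free returns
sigma2 = 0.0004          # illustrative forecast portfolio variance
sigma = sqrt(sigma2)

sr = (R - R_f) / sigma                        # Sharpe ratio
var_5 = -sigma * NormalDist().inv_cdf(0.05)   # parametric VaR at alpha = 5%, Eq. (23)
print(round(sr, 3), round(var_5, 4))
```

With \(\Phi ^{-1}(0.05) \approx -1.645\), the 5% VaR is about 1.64 standard deviations of the forecast distribution.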

6 Data

The prices and transaction volume of the \(p=10\) leading cryptocurrencies in terms of capitalisation (at the date of capture) are obtained through the application programming interface (API) of CoinMarketCap. The analysis period is from 01/01/2020 to 06/30/2020, with a frequency of 5 min. The cryptocurrencies considered are listed in Table 1.

Table 1 Name, symbol, release year, and description of the analysed cryptocurrencies

The data are transformed to obtain the logarithmic returns of the ten price and volume series. In the case of missing data, linear interpolation is applied. Next, the average returns per hour and the realized variance (the variance of the intra-hour data) are computed. In this way, time series of hourly average returns and of realized volatilities are obtained for both the price and the transaction volume, giving a total of \(n=4,368\) observations at hourly frequency. In Fig. 2a, the average hourly price returns are plotted for each of the currencies studied. At the beginning of 2020, the currencies presented low volatility, while in mid-March, there was a structural change in the returns, which can be inferred from their increase in volatility. We can observe a similar behaviour for the volume returns shown in Fig. 2b.
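The hourly aggregation described above can be sketched with NumPy. The 5-minute returns below are synthetic stand-ins for the CoinMarketCap data (12 observations per hour over the 4,368 hours of the sample):

```python
import numpy as np

rng = np.random.default_rng(7)
r5 = rng.normal(0, 0.002, size=12 * 4368)   # stand-in 5-minute log returns

blocks = r5.reshape(-1, 12)                 # one row per hour
hourly_mean = blocks.mean(axis=1)           # average hourly return
realized_var = blocks.var(axis=1)           # realized variance from intra-hour data
print(hourly_mean.shape, realized_var.shape)
```

With real timestamped data, the same aggregation would be done by grouping on the hour rather than by reshaping, but the computed quantities are identical.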

Fig. 2

a Hourly average logarithmic return of price. b Hourly average logarithmic return of volume. In both cases, the colour for each cryptocurrency is indicated in the legend

In Tables 2 and 3, the main descriptive statistics of the price and volume returns are presented, respectively: the mean (\(\mu \)), standard deviation (\(\sigma \)), skewness (s), and kurtosis (\(\tau \)), together with the Phillips–Perron (PP) statistic to test for unit roots. In addition, the Ljung–Box (LB) and ARCH LM (LM) statistics have been included in the case of price returns to assess the suitability of a heteroscedastic model.

Table 2 Descriptive statistics of logarithmic returns of prices: mean (\(\mu \)), standard deviation (\(\sigma \)), skew (s), kurtosis (\(\tau \))
Table 3 Descriptive statistics of logarithmic returns of volumes: mean (\(\mu \)), standard deviation (\(\sigma \)), skew (s), and kurtosis (\(\tau \))

The PP test rejects the null hypothesis of a unit root at \(\alpha = 0.01\) in all the time series considered. The LB and LM tests evaluate the dependence of the second moment with a time lag; their null hypothesis is that an ARCH process adequately fits the data, and it is not rejected for any cryptocurrency.

As a complementary part of the exploratory analysis, the partial autocorrelation function (PACF) is computed to determine the relevance of applying heteroscedastic GARCH models. In Fig. 3, the corresponding correlogram is plotted for the square of the price returns. It can be observed that the values do not die out quickly for any of the cryptocurrencies considered in this study. In sum, this exploratory analysis shows preliminary evidence of heteroscedasticity in cryptocurrency returns, which allows us to proceed with the methodology described in the following section.

Fig. 3

PACF of squared logarithmic returns of prices from one to 36 lags

7 Methodology

The models described in Table 4 are implemented. In each scenario, a sliding window of 504 hours is used, with a shift of 24 hours between windows. In this way, each model is applied to \(M=161\) windows to forecast the volatility over the hourly horizon \(h = \{1,2,\dots ,24\}\). For each value of h, the forecasts of the M windows are generated; therefore, an average volatility prediction over times \(T+1, \dots , T+24\) under different market conditions is computed. Averaging the predictions avoids the bias of selecting a specific time window; instead, it is intended to show average results that capture the periods of high and low volatility as a whole. The specific methodology followed to implement each model (see Table 4) is described below.
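The index bookkeeping of this rolling scheme can be sketched in a few lines; with the figures stated above it reproduces exactly \(M=161\) windows of 504 hours shifted by 24 hours, each followed by a 24-hour forecast horizon:

```python
# Rolling-window indices: 504-hour estimation windows shifted by 24 hours,
# each followed by a 24-hour forecast horizon, over 4,368 hourly observations.
n_obs, win, step, horizon = 4368, 504, 24, 24
windows = [(start, start + win)
           for start in range(0, n_obs - win - horizon + 1, step)]
print(len(windows))            # number of windows M
print(windows[0], windows[-1])
```

The last window ends at observation 4,344, leaving exactly the final 24 hours of the sample as its forecast horizon.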

Table 4 Identifier (ID), type of model (Model), and specification of the implementation (description)

7.1 Naive Model

An estimate of the future volatility is constructed using the average of the realized variance over the time window. In particular, with this procedure, the future estimates for the equal-weight portfolio are the same for all prediction horizons \(h=1,\dots ,24\).

7.2 GARCH Models

In the case of the eGARCH, eGARCH-Vol, gjrGARCH, and gjrGARCH-Vol models, the volatility is predicted directly on the returns of the cryptocurrency portfolio with uniform weights. For the models that include volume, a volume-returns index is constructed by taking the average of the individual volume returns. This index represents the volume return of the uniform portfolio and is included as an exogenous variable in the corresponding models.

In the case of the DCC-eGARCH and DCC-eGARCH-Vol models, it is necessary to estimate the portfolio’s volatility in two steps. First, the covariance matrix is estimated, where an eGARCH model is specified for each variable, and the corresponding DCC model is estimated with it. In the case of the model with volume, the volume return is included in each individual specification of the eGARCH model for all variables. In the second step, the variance of the portfolio is computed through Eq. 18 considering uniform weights.

The models are estimated at order (1,1) in all cases. In addition, Student's t-distribution is assumed for the residuals, and the mean model is specified as constant. For parameter estimation, the rugarch and rmgarch libraries in R are used.
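To make the variance recursion concrete, the following sketch filters a return series through an eGARCH(1,1) equation. It assumes Gaussian innovations for the constant \(E|z|=\sqrt{2/\pi }\) (the paper assumes Student's t residuals, which changes this constant), and the parameter names are illustrative:

```python
import math

def egarch_filter(returns, omega, alpha, gamma, beta, sigma2_0):
    """eGARCH(1,1) recursion:
    ln(sigma2_t) = omega + alpha*(|z_t| - E|z|) + gamma*z_t + beta*ln(sigma2_{t-1}),
    with z_t = r_t / sigma_{t-1}. E|z| is taken under normality here."""
    e_abs_z = math.sqrt(2.0 / math.pi)  # changes under Student's t innovations
    sigma2 = [sigma2_0]
    for r in returns:
        z = r / math.sqrt(sigma2[-1])
        log_s2 = omega + alpha * (abs(z) - e_abs_z) + gamma * z + beta * math.log(sigma2[-1])
        sigma2.append(math.exp(log_s2))
    return sigma2[1:]
```

The asymmetry enters through gamma: negative shocks (z < 0) raise the log-variance differently from positive ones of the same size.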

7.3 Deep Learning Models

The MLP and LSTM deep learning models are trained on the variance of each asset, with the volatility data structured in a supervised learning format. Given the series of volatilities \(x_1,\dots ,x_n\), we create a matrix X that serves as the independent variable of the model and a variable y as the dependent variable, that is, the variable to be predicted. Let n be the size of each realized variance series and m the size of the samples in X. To obtain the volatility forecasts \(h=\{1,\dots ,24\}\) steps ahead, the following matrices are constructed:

$$\begin{aligned} X= \begin{bmatrix} x_1 & x_2 & \dots & x_m \\ x_2 & x_3 & \dots & x_{m+1}\\ \vdots & & & \vdots \\ x_{n-m+1} & x_{n-m+2} & \dots & x_n \end{bmatrix} \quad y=\begin{bmatrix} x_{m+1} & \dots & x_{m+24} \\ x_{m+2} & \dots & x_{m+25}\\ \vdots & & \vdots \\ x_{n+1} & \dots & x_{n+24} \end{bmatrix} \end{aligned}$$
(24)

where the samples have size \(m=72\). With the \(n=504\) observations, the matrices X and y are divided into training, validation, and testing subsets of sizes 383, 48, and 1, respectively.
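The construction of X and y in Eq. 24 amounts to a standard lag embedding; a hedged pure-Python sketch:

```python
def make_supervised(series, m=72, horizon=24):
    """Lay out a series as (samples, m) inputs X and (samples, horizon)
    targets y, mirroring the matrices of Eq. 24."""
    X, y = [], []
    for i in range(len(series) - m - horizon + 1):
        X.append(series[i:i + m])          # m lagged values as features
        y.append(series[i + m:i + m + horizon])  # next `horizon` values as targets
    return X, y
```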

Likewise, the input data are scaled to [0, 1] to streamline the learning process. During training and validation, the hyperparameters are optimised over a grid; specifically, the batch sizes \(\{24, 48, 72\}\) are considered, with the sigmoid activation function. The combination of parameters that minimises the mean squared error on the validation set is selected. These optimal parameters are used on the test set, and 10 runs of the model are performed to avoid biases due to its stochastic nature. Finally, the average of the ten predictions is taken as the estimate of the volatility at each step of the horizon \(h=1,\dots ,24\).
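The [0, 1] scaling is a standard min-max transform; a small sketch (the bounds are returned as well, since the predictions presumably have to be mapped back to the original scale):

```python
def minmax_scale(values):
    """Scale values into [0, 1]; return the scaled data and (lo, hi)
    so the transform can be inverted on the model's predictions."""
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, (lo, hi)

def minmax_invert(scaled, bounds):
    """Map [0, 1] values back to the original scale."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]
```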

In the case of the LSTM-Vol models, the procedure is similar to that described above; the key difference is that the volume returns index is included as a feature of the model. Explicitly, this is done by extending the number of columns of the matrix X in Eq. 24. For both LSTM and LSTM-Vol, the procedure is carried out with the TensorFlow and Keras libraries in Python. Figure 4a, b presents the architectures of the LSTM and LSTM-Vol models, showing explicitly how the volume index is introduced into the deep learning model. The MLP model is used as a vanilla approach, consisting simply of an input layer, a hidden dense layer of 10 neurons, and an output layer for the predictions.
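Widening the input with the volume index corresponds to adding a second feature channel to each sample; a minimal sketch of this layout, assuming the usual (samples, timesteps, features) convention of recurrent layers:

```python
def stack_features(price_windows, volume_windows):
    """Combine per-sample price-variance and volume-index windows into a
    (samples, timesteps, features) layout, the usual LSTM input shape."""
    return [
        [[p, v] for p, v in zip(p_row, v_row)]
        for p_row, v_row in zip(price_windows, volume_windows)
    ]
```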

7.4 LSTM–GARCH Models

In the case of the LSTM-eGARCH and LSTM-gjrGARCH hybrid models, the methodology is similar to that described for the LSTM models. The main difference is that the parameters estimated by the eGARCH and gjrGARCH models for a uniform portfolio are now included as additional features. The coefficients added to the hybrid models are described in Table 5; the model includes them by extending the number of columns of the matrix X in Eq. 24. In this case, the GARCH parameters are not scaled as is done in the preprocessing of the returns. To estimate the models, the R and Python libraries mentioned in the previous sections are used. Figure 4c presents the architecture of the LSTM-eGARCH and LSTM-eGARCH-Vol models, showing explicitly the procedure to include the GARCH parameters. In practical terms, the input layer receives the four features given in Table 5 plus the historical volatility values, as in the other simple LSTM models.
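Since the GARCH coefficients are fixed within a given window, including them amounts to appending the same static values to every sample of that window; an illustrative sketch (the parameter tuple is hypothetical):

```python
def append_garch_params(X, params):
    """Append a window's estimated GARCH coefficients (e.g. omega, alpha,
    gamma, beta) as extra static features of every sample in X."""
    return [row + list(params) for row in X]
```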

Table 5 Asymmetric GARCH model coefficients added as features of the LSTM learning model (see section 4 for further details)
Fig. 4
figure 4

Architectures used to predict the volatility over times \(T+1,\dots ,T+24\). Here, \(\sigma _{r_t}\) represents the realized variance at time t, and \(\sigma _{T+h}\) represents the predicted volatility at \(T+h\). a LSTM models. b LSTM-Vol models. c LSTM–GARCH models. From bottom to top, the flow is the same in all cases: first the input layer, then several hidden layers, and at the top the output layer

7.5 Metrics

To compare the performance of each of the models presented in the previous section, the metrics HAE, HSE, SR, and VaR are evaluated, and the DM test is applied to determine if there are significant differences in the prediction accuracies of the different models. The HAE and HSE are defined at the prediction horizon h and window T by the following expressions:

$$\begin{aligned} HAE(h,T) = \left\Vert 1- \frac{\sigma ^2_{predicted}(T+h)}{\sigma ^2_{realized}(T+h)}\right\Vert , \end{aligned}$$
(25)
$$\begin{aligned} HSE(h,T) = \left( 1- \frac{\sigma ^2_{predicted}(T+h)}{\sigma ^2_{realized}(T+h)}\right) ^2, \end{aligned}$$
(26)

where \(\sigma ^2_{predicted}(T+h)\) is the estimated variance at time \(T+h\), and \(\sigma ^2_{realized}(T+h)\) is the realized variance at time \(T+h\). Different values of T represent different time windows or financial scenarios. The HAE and HSE are designed to capture the heteroscedastic effect of the time series; therefore, it is more natural and fair to use these versions, since our object of study is the volatility forecast.
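Eqs. 25 and 26 translate directly into code; a minimal sketch:

```python
def hae(pred_var, real_var):
    """Heteroscedasticity-adjusted absolute error (Eq. 25)."""
    return abs(1.0 - pred_var / real_var)

def hse(pred_var, real_var):
    """Heteroscedasticity-adjusted squared error (Eq. 26)."""
    return (1.0 - pred_var / real_var) ** 2
```

Because the error is relative to the realized variance, an underprediction of the same proportion is penalised equally in calm and turbulent windows.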

The SR and VaR are defined in the portfolio section. Expressed in terms of the prediction horizon h and window T, they take the following forms:

$$\begin{aligned} SR(h,T) = \frac{R(h,T) - R_f(h,T)}{\sigma _{predicted}^2(h,T)}, \end{aligned}$$
(27)
$$\begin{aligned} VaR^{\alpha }(h,T) = -\sigma _{predicted}(h,T)\, \Phi ^{-1}(\alpha ), \end{aligned}$$
(28)

where \(\sigma _{predicted}\) is computed by Eq. 18 in the particular case of multivariate DCC-GARCH type models. These financial metrics are included to compare the performance of the forecasts in a possible application, such as the allocation of capital in investment portfolios.
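Eqs. 27 and 28 can be sketched as follows. The VaR function follows Eq. 28 literally; note that sign conventions for VaR differ across texts:

```python
from statistics import NormalDist

def sharpe_ratio(ret, rf, pred_var):
    """Eq. 27: excess return over the predicted variance,
    as written in the text."""
    return (ret - rf) / pred_var

def value_at_risk(pred_vol, alpha=0.05):
    """Eq. 28: parametric VaR under normality,
    -sigma * Phi^{-1}(alpha)."""
    return -pred_vol * NormalDist().inv_cdf(alpha)
```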

On the other hand, suppose that we have two forecasts \(y^{(1)}_{T+h},y^{(2)}_{T+h}\) at horizon h and want to know which has better predictive accuracy. The objective is to select the forecast with the smallest error measure and to determine whether the difference is significant or simply due to the specific choice of data in the sample. Given the actual time series \(y_{T+h}\) and the estimates \(\hat{y}^{(i)}_{T+h}\), the prediction error is

$$\begin{aligned} e^{(i)}_{T+h}=\hat{y}^{(i)}_{T+h} - y_{T+h}, \quad i=1,2 \end{aligned}$$
(29)

In this way, the loss associated with forecast i is a function of its forecast error, denoted by \(g(e^{(i)}_{T+h})\). The loss differential between the two forecasts is defined as

$$\begin{aligned} d_{T+h}=g(e^{(1)}_{T+h})-g(e^{(2)}_{T+h}). \end{aligned}$$
(30)

Thus, two forecasts have the same precision at horizon h if and only if the expected value of \(d_{T+h}\) is zero for all T. This condition is the null hypothesis in the DM statistic, which is expressed at horizon h by the formula (Diebold & Mariano, 1995)

$$\begin{aligned} DM(h) = \frac{{\bar{d}}}{\sqrt{2\pi \hat{f}_d(0)/M}}, \end{aligned}$$
(31)

where \({\bar{d}}=\frac{1}{M}\sum _{T=1}^{M}(g(e^{(1)}_{T+h})-g(e^{(2)}_{T+h}))\) and \(\hat{f}_d(0)\) is a consistent estimator of \(f_d(0)\), which denotes the spectral density of the loss differential at frequency 0.
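A simplified version of the DM statistic can be sketched as below. For brevity, the plain sample variance of the loss differential stands in for the spectral-density estimate \(2\pi \hat{f}_d(0)\) used in Eq. 31; this shortcut is only adequate when the differentials are approximately uncorrelated, as at horizon h = 1:

```python
from math import sqrt

def dm_statistic(errors1, errors2, loss=lambda e: e * e):
    """Simplified Diebold-Mariano statistic: mean loss differential
    over its standard error, with the sample variance of d replacing
    the long-run (spectral) variance of the full test."""
    d = [loss(e1) - loss(e2) for e1, e2 in zip(errors1, errors2)]
    m = len(d)
    d_bar = sum(d) / m
    var_d = sum((x - d_bar) ** 2 for x in d) / m
    return d_bar / sqrt(var_d / m)
```

A large positive value indicates that forecast 1 incurs larger losses than forecast 2; the statistic is compared against standard normal quantiles.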

8 Results

This section describes the results in terms of the metrics specified in the methodology. The forecasts of each of the models proposed in Table 4 are compared with the variance of a uniform portfolio, denoted as the realized variance. The uniform portfolio is built with the cryptocurrencies described in Table 1, except Tether, which is taken as the risk-free rate \(R_f\) in the SR metric. For the calculation of the VaR, the inverse distribution function is evaluated at the 5% quantile. In the DM test, the null hypothesis is that there is no difference in volatility prediction accuracy between one model and another.

Estimating the GARCH family models takes 5854.06 s (approximately 1 h 37 min) on an Intel® Core™ i5-8250U CPU @ 1.60 GHz with four cores. The deep learning models take 228,910 s (2 days, 15 h, 35 min, and 10 s) to run all 24,150 configurations (5 models × 161 windows × 3 batch values × 10 realizations) in parallel on an Intel® Xeon® Gold 6254 CPU @ 3.10 GHz with eight cores. Figure 5 shows two typical learning curves, for the MLP and LSTM-gjrGARCH models, at window 161 and batch size 72. The number of epochs differs between them because training is stopped once a local minimum is reached, to avoid overfitting.

Fig. 5
figure 5

Typical learning curves at window 161 and batch size 72. a MLP. b LSTM-gjrGARCH

Figure 6 shows the volatility estimate of each of the models considered throughout the time windows analysed. For reasons of space, results are offered only for the representative horizons \(h=\{1,6,12,24\}\). The forecasts are averages over the \(M=161\) time windows. For comparison, the y-axis is logarithmically scaled, and the realized variance of a uniform portfolio is included. It can be observed that the deep learning models recover the dynamics of the realized variance. However, the realized values are of a higher order of magnitude than the forecasts of the volatility models. This can be observed in the predictions for horizon \(T+24\), shown in Fig. 6d, where a peak in the forecasts of the deep learning models differs markedly from those of the GARCH family. Likewise, the models group naturally according to their type. For example, the naive model usually exhibits smooth behaviour over time but of greater magnitude than the other models; it does not recover all the changes in volatility, which is expected given its nature. The univariate models of the GARCH family present similar dynamics but with values below the realized variance in most windows. Similarly, the multivariate GARCH models generally show a common dynamic, although of an even smaller magnitude, because they are associated with a minimum-variance portfolio. Note that the models that include learning, both pure and hybrid, are the closest to the realized variance in magnitude. In general, most models recover the dynamics globally: when there is high or low variance, the models capture it, although usually to a lesser extent.

Fig. 6
figure 6

Prediction of volatility for each time window. a horizon \(T+1\). b horizon \(T+6\). c horizon \(T+12\). d horizon \(T+24\). The ID Model of the sidebar is described in Table 4. The realized variance is represented by a gray dashed line

Figure 7 shows the behaviour of the metrics through the \(M=161\) time windows at horizon \(T+1\). In specific periods the metrics explode, mainly at times of highest volatility; for example, atypical values are observed around the financial turbulence of March 2020, following the pandemic announcement by the World Health Organization (WHO). Because of these outliers, instead of averaging the metrics over all the windows, we take their median as the value that characterises the performance of the models over time. Thus, we denote by MHAE, MHSE, MSR, and MVaR the medians over the \(M=161\) time windows of the HAE, HSE, SR, and VaR metrics, respectively.

Fig. 7
figure 7

Metrics through the \(M=161\) time windows at \(T+1\). a HAE. b HSE. c SR. d VaR

Table 6 shows the results for the MHAE at the selected prediction horizons. The naive model yields the least favourable results. The other models present errors of the same magnitude, although the MLP model shows the lowest errors at \(T+1\), \(T+6\) and \(T+12\), while the DCC-eGARCH model presents the lowest MHAE at \(T+24\). Notably, the errors do not increase significantly with the prediction horizon for any model. Table 7 presents the corresponding MHSE results. Under this metric, the naive model again performs worst, by one order of magnitude. The deep learning models exhibit lower errors than the GARCH models at \(T+1\), \(T+6\) and \(T+12\). At all prediction horizons, the MLP model yields the best results, although comparable to the other deep learning models.

Table 6 MHAE
Table 7 MHSE

The performance of each portfolio with respect to the MSR is shown in Table 8. At \(T+1\) and \(T+6\), the naive model presents the highest SR, which is desirable for this metric; nevertheless, the values at these prediction horizons are negative. In contrast, at horizons \(T+12\) and \(T+24\), the best result is obtained by the DCC-eGARCH-Vol model, with positive values. Hence, the results imply that an investment strategy with a longer time horizon generates a higher Sharpe ratio most of the time. They also imply that the uniform portfolio performs better or worse than investing only in Tether, depending on the expected variance at different forecasting horizons. A potential investor should therefore hold the selected cryptocurrencies at long horizons and Tether at short horizons, considering our high-frequency framework.

Table 8 MSR

Similarly, the results for the MVaR are shown in Table 9. Note that all values are negative by definition. In this case, the best results are consistently obtained by the univariate eGARCH-Vol models at all horizons considered; they are an order of magnitude higher than the naive model and considerably lower than the deep learning models. The results are remarkable since we can infer that including the transaction volume reduces the portfolio's losses at a 5% tolerance. In other words, including volume information in the volatility forecast is essential for better management of a hypothetical uniformly allocated portfolio.

Table 9 MVaR

Next, the comparison of the residuals according to the DM statistic is presented. The null hypothesis of this test is that Model 2 (M2) has greater precision than Model 1 (M1); the alternative is that Model 1 has greater accuracy than Model 2. The null hypothesis is rejected if the p-value is less than \(\alpha =0.1\). Tables 10, 11, 12, and 13 show the results for horizons \(T+1\), \(T+6\), \(T+12\), and \(T+24\), respectively.

Table 10 Comparison of residuals of the different models with the Diebold–Mariano test at horizon \(T+1\)
Table 11 Comparison of residuals of the different models with the Diebold–Mariano test at horizon \(T+6\)
Table 12 Comparison of residuals of the different models with the Diebold–Mariano test at horizon \(T+12\)
Table 13 Comparison of residuals of the different models with the Diebold–Mariano test at horizon \(T+24\)

In these tables, the rows represent Model 1 and the columns Model 2. In the row/column comparison, a cell is highlighted if Model 1 has greater forecasting accuracy than Model 2. Table 10 shows that, in the estimation at \(T+1\), the DCC-eGARCH (E6) and all the deep learning models (E8-E12) do not reject the null hypothesis when compared with any other model. Table 11 shows that, at horizon \(T+6\), the MLP, LSTM-Vol, and LSTM-gjrGARCH are the only models for which the null hypothesis is not rejected with respect to all the others. For \(T+12\), Table 12 shows no significant evidence that any model surpasses those in the deep learning class (E8-E12) according to the DM test. Similar behaviour is found at horizon \(T+24\) (see Table 13). Note that, at this horizon, the naive model rejects the null hypothesis of better accuracy with respect to the GARCH family models. In general, the GARCH family models are not significantly more accurate than the deep learning models. Even among the latter, we have no evidence to prefer one over another; that is, they are statistically equivalent in accuracy.

One possible application of variance forecasting is asset allocation in an investment portfolio. For this purpose, we construct a minimum-variance portfolio using Eq. 21 and the forecasts of the covariance matrix of the DCC-GARCH models. Figure 8 shows the change in portfolio allocations for each time window at the forecast horizon \(T+1\). Note that the only restriction on the weights is that their sum must equal one. Therefore, allocations with negative weights are expected in the portfolios associated with the DCC-GARCH and DCC-GARCH-Vol models. Our results reveal the transition from a long to a short position in BTC, which coincides with the declaration of a pandemic by the WHO. Moreover, BTC was allocated the highest proportion of capital before March 2020; after that, the allocation of assets became more diversified. Hence, we observe investors' typical tendency to diversify in times of high volatility and financial turbulence.
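Assuming Eq. 21 is the standard unconstrained minimum-variance solution \(w = \Sigma ^{-1}\mathbf {1} / (\mathbf {1}'\Sigma ^{-1}\mathbf {1})\), the weights can be sketched as below; only the budget constraint is imposed, which is why short (negative) positions can appear:

```python
def min_variance_weights(cov):
    """Unconstrained minimum-variance weights w = S^{-1}1 / (1'S^{-1}1).
    Only sum(w) = 1 is imposed, so weights may be negative."""
    n = len(cov)
    # Solve cov @ x = ones via Gauss-Jordan elimination with partial pivoting.
    a = [row[:] + [1.0] for row in cov]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    x = [a[i][n] / a[i][i] for i in range(n)]
    total = sum(x)  # normalise so the budget constraint holds
    return [xi / total for xi in x]
```

With a diagonal covariance, the weight of each asset is inversely proportional to its variance, matching the intuition that the low-variance asset dominates the portfolio.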

Fig. 8
figure 8

Allocation weights per instrument in the minimum-variance portfolio at horizon \(T+1\) through the analysed windows. a DCC-eGARCH predictions. b DCC-eGARCH-Vol predictions. The budget restriction \(\sum _{i=1}^n w_i = 1\) is considered in all cases

The performance of the forecasts is in line with Kim and Won (2018) for the volatility forecast of the KOSPI 200 stock index, although their models are neither compared with vanilla MLP models nor applied to cryptocurrencies. The recent competition (Makridakis et al., 2020) is favourable to hybrid models but considers neither cryptocurrencies nor price volatility; in fact, its previous edition (Makridakis et al., 2018) concluded that statistical models outperformed learning models on macroeconomic time series. Lahmiri and Bekiros (2019) predict the prices of Bitcoin, Digital Cash, and Ripple using chaotic neural networks, but they neither compare with traditional and naive methods nor analyse volatility. Kristjanpoller and Minutolo (2018) is closer to our object of study: they analyse hybrid models to forecast the volatility of Bitcoin and find favourable results for the Artificial Neural Network-Generalised AutoRegressive Conditional Heteroskedasticity (ANN-GARCH) model. Although their hybrid approach is slightly different and we analyse a portfolio instead, the results coincide in favouring deep learning models, though not the vanilla MLP models as in our case. Moreover, D'Amato et al. (2022) propose the Jordan Neural Network to forecast the volatility of Bitcoin, Ripple, and Ethereum, finding positive results with respect to simpler models, although they do not consider the vanilla MLP. In sum, it can be inferred that model performance is closely linked to the object of study: cryptocurrencies are much more volatile than traditional markets, which may explain why hybrid models do not work as expected.

9 Conclusions

We explored different models for volatility forecasting in the cryptocurrency market. The study period covered the financial turbulence of March 2020, when the World Health Organization (WHO) declared that the sanitary situation derived from the COVID-19 virus was a pandemic. In this sense, the cryptocurrency market, like traditional markets, experienced high volatility, representing an atypical period of systemic risk and, therefore, an increase in the complexity of the dependencies to be modelled.

Surprisingly, the median of the heteroscedastic metrics HAE and HSE reveals better performance for the vanilla MLP model, yet comparable to the simple and hybrid deep learning models LSTM and LSTM–GARCH. Particularly at \(T+1\), a clear distinction in performance can be seen between the GARCH family models and the deep learning models; in general, the latter outperform the former under these metrics.

Further, the performance of the forecasts is evaluated in the context of univariate and multivariate uniform variance portfolios through the SR and VaR. Here, we consider the stablecoin Tether as the risk-free asset and set a loss tolerance of \(\alpha =0.05\). Interestingly, we find that the median SR is negative at horizons T+1 and T+6 and positive at horizons T+12 and T+24. The naive model gives the best results at T+1 and T+6, while the DCC-eGARCH-Vol model maximises the SR at T+12 and T+24. Hence, the results imply that an investment strategy with a longer time horizon generates a higher Sharpe ratio; conversely, investors should allocate capital to the risk-free asset Tether at short horizons.

In the case of the VaR, the lowest losses are found when forecasting with the eGARCH-Vol models at the selected horizons. Therefore, including transaction volume information helps reduce portfolio losses. Nevertheless, we must be cautious with these two metrics, since we are using in-sample data, and comparisons between models are only helpful for evaluating which best reproduces the desired characteristics.

We have analysed the error residuals according to the DM statistics for the selected horizons. In general, GARCH family models are not found to be significantly more accurate than deep learning models. Even among the latter, we have no evidence to lean towards one or the other; that is, they are statistically equivalent in accuracy. In sum, the DM test favours the predictions obtained by the LSTM models for most of the prediction horizons.

One of the applications of this work is in the context of asset allocation in a portfolio of minimum variance, or minimum investment risk. We illustrate the T+1 case, where a large proportion of capital is assigned to BTC before the pandemic declaration, while from March onwards the portfolio becomes more diversified in the predictions of both the DCC-eGARCH and DCC-eGARCH-Vol models. In short, there is a clear phase transition from a period highly dominated by BTC to a period where investors prefer to diversify their portfolios, with a radical change from long to short positions in BTC. Thus, this type of forecast could serve as a thermometer for financial turmoil in general. The allocation change of BTC is an unexpected finding with interesting risk management implications that should be studied more carefully.

The evidence of our work leans towards a vanilla learning model as the preferred choice to model the volatility of high-frequency cryptocurrencies. The MLP outperforms the more stylised statistical models of the GARCH type. Even more surprisingly, the LSTM and hybrid models do not show significant improvement over the vanilla neural network. Further, including transaction volume improves results by reducing the VaR. This unexpected result suggests that the larger number of parameters estimated by the more elaborate deep learning models destabilises the solutions and increases the prediction error. We therefore suggest using simple learning models to forecast the volatility of highly non-linear time series: the MLP outperforms the more stylised models, such as the GARCH family, while not being as computationally expensive as the LSTM models at the same level of accuracy.

Future work can be extended in different directions. Within finance, it would be interesting to analyse portfolio optimisation techniques by including additional constraints in a multistage manner and to explore risk metrics that consider more general return distributions. Regarding time series, the GARCH family models can be explored exhaustively using other specifications. In machine learning, there is a wide field of work: regarding model selection, it would be desirable to test different architectures and hyperparameters, considering a larger amount of information, both in the input sequence length and in the dataset in general. It would also be interesting to integrate data preprocessing, feature selection, and classification strategies, as proposed in Pustokhina et al. (2021), to improve the methodology of our study. Moreover, implementing hybrid models that include dynamic parameters of the GARCH models is of great relevance for tackling real applications. Finally, we are interested in exploring more flexible deep learning models for the covariance matrix in the multivariate context.