1 Introduction

Since the creation of Bitcoin (Nakamoto 2009), cryptocurrencies have emerged as a new asset class that has seen extraordinary returns over the past decade. As a result, the global crypto market capitalisation exploded to a record of more than $2.6 trillion on October 30, 2021, roughly equivalent to twice the nominal GDP of Canada. This rapid growth of the cryptocurrency market has revealed the increasingly important value of digital currencies as an electronic payment system and an attractive financial asset. Thus, accurate forecasts for the price and return of cryptocurrencies are essential for determining digital currency price trends and making informed decisions regarding asset allocation and risk management strategies.

Since the global financial crisis of 2008, the financial sector has seen tight regulatory measures put in place. For instance, a stricter capital requirement has been enforced by the Basel III international regulatory framework for banks, and improved risk management systems have been developed. However, the introduction of decentralized cryptocurrencies as a purely peer-to-peer version of electronic cash, allowing online payments to be sent directly from one party to another without using a financial institution, has posed a new challenge to the international financial system.

Cryptocurrencies, unlike traditional currencies, are based on cryptographic proof, which provides many advantages over traditional payment methods, such as high liquidity, lower transaction costs, and anonymity. Each transaction of the electronic coin is defined by a chain of digital signatures: each owner transfers the coin to the next by digitally signing a hash of the previous transaction and the next owner's public key and appending these to the end of the coin (Nakamoto 2009).

Among asset classes, Bitcoin, the world's first cryptocurrency, has had one of the most volatile trading histories, with strong price movements. Some analysts have even drawn parallels between Bitcoin and the Dutch Tulipmania of the 17th century and the South Sea Bubble of the 18th century to indicate excessive greed and speculation in the crypto market. Notwithstanding the risks (including the collapse in November 2022 of FTX, a Bahamas-based crypto exchange), interest in Bitcoin and other cryptocurrencies has risen considerably in recent years as they provide high returns, thus attracting more and more new retail and institutional investors. With major exchanges, for example, the Chicago Mercantile Exchange and the Chicago Board Options Exchange, trading Bitcoin futures, central banks have been debating whether or not to regulate cryptocurrencies, given the numerous technical and legal issues involved.

All these issues have given momentum to the analysis of cryptocurrencies. For instance, Mikhaylov (2020) examined the prospects and risks of cryptocurrency as a financial system element while also providing a theoretical foundation for predicting crypto market prices. The author explored the theoretical basis of digital currencies, assessing the chronology of blockchain technology and cryptocurrency development, analysing the current market state, and evaluating the potential threats and prospects for cryptocurrency within the global financial system. A series of academic studies have investigated models for predicting the price (e.g. Singh et al. 2023a; Rathore et al. 2022; Jaquart et al. 2021; Pabuçcu et al. 2020; Adcock and Gradojevic 2019) and volatility of cryptocurrencies (e.g. Kim et al. 2021; Ardia et al. 2019; Borri 2019; Caporale and Zekokh 2019; Katsiampa 2019; Feng et al. 2018; Chu et al. 2017; Gronwald 2014; Singh et al. 2023a). Some studies have used the GARCH family models to estimate the volatility of digital currencies (e.g. Kim et al. 2021; Katsiampa 2019; Chu et al. 2017; Gronwald 2014). Other researchers have instead adopted extreme value theory (EVT) and machine learning approaches for estimating risk measures such as Value-at-Risk (VaR) and Expected Shortfall (ES) for cryptocurrencies (e.g. Lahmiri and Bekiros 2019; Tiwari et al. 2019; Feng et al. 2018; Peng et al. 2018). We discuss all these studies in more detail in the review of related literature.

The aim of this work is to jointly predict conditional quantiles and tail expectations for the returns of the most popular cryptocurrencies, namely Bitcoin, Ethereum, Ripple, Dogecoin and Litecoin, using financial and macroeconomic indicators as explanatory variables. Methodologically, we first fit a Monotone Composite Quantile Regression Neural Network model (MCQRNN) for each of the considered cryptocurrencies to estimate a one-step ahead prediction of VaR and ES using a rolling window technique. The superior set of models is then chosen using the Model Confidence Set procedure. We also compare the performance of different MCQRNN models against two benchmark models—Historical simulation (HS) and the GARCH model.

We contribute to the extant literature in several ways. First, we provide forecasts of large negative and positive returns (i.e. located on the left and right tails of the distributions) for the major leading digital currencies, since we consider long and short trading positions. While the literature has generally provided forecasts for Bitcoin prices (Basher and Sadorsky 2022; Rathore et al. 2022; Jaquart et al. 2021; Kim et al. 2021; Pabuçcu et al. 2020; Adcock and Gradojevic 2019), less attention has been paid to extreme return forecasts (Borri 2019; Caporale and Zekokh 2019; Ardia et al. 2019; Feng et al. 2018); thus, our work broadens the scope by focusing on tail risk predictions for a comprehensive set of cryptocurrencies using a novel approach.

Second, we apply, for the first time, the MCQRNN model proposed by Cannon (2018) to the case of digital currencies to predict their returns on the two tails. Ignoring the quantile crossing problem may lead to unrealistic distribution forecasting of response variables. The MCQRNN allows us to impose the monotonicity constraints, thus solving the quantile crossing problem. Our research shows that the MCQRNN model can produce quantile forecasts that are more accurate, robust, and realistic.

Third, we introduce a set of explanatory variables to check whether their inclusion improves the forecasting performance. Similar to the studies of Basher and Sadorsky (2022); Wang et al. (2022); Bakas et al. (2022), we aim to forecast tail risk (VaR and Expected Shortfall) not just for Bitcoin but also for four additional cryptocurrencies using exogenous (macroeconomic and financial) variables. Our results indicate that some macroeconomic and financial variables are useful in forecasting the tail risk of cryptocurrencies. In particular, the VIX index, the yield spread and inflation expectations tend to sharpen the predictions of tail returns. Our findings appear robust across different significance levels and benchmark models.

Our analysis is important since the highly volatile behaviour of cryptocurrencies, with the consequent ups and downs in returns, necessitates a high level of risk management from institutional investors and individuals who include these new digital assets in their portfolios. In addition, accurate predictions would help policy authorities to make more informed decisions. We also contribute to the growing literature on risk forecasting by employing a method that estimates VaR at multiple quantile levels simultaneously, treating the quantile levels as a monotone covariate and thereby ensuring no quantile crossing.

This paper is organised as follows: Sect. 2 reviews the related work in volatility modelling of cryptocurrencies, Sect. 3 presents our proposed models, Sect. 4 shows the empirical analysis, and Sect. 5 concludes.

2 Literature review

Many studies have examined the price dynamics and volatility of cryptocurrencies. We have grouped the analyses into three main strands. The first strand discusses studies on modelling and forecasting volatility, including a group of analyses using the GARCH family models and studies considering other parametric methods. The second strand of the literature concerns those analyses that predict cryptocurrency volatility and tail risks using non-parametric models, such as machine learning. Finally, the third strand focuses on studies that use various parametric and non-parametric methods to predict cryptocurrency price movements.

The literature for modelling the volatility of financial time series and estimating VaR and ES started from the seminal work by Bollerslev (1986), which introduced the Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) model as an extension of the original ARCH model proposed by Engle (1982). Since then, many additional specifications have been developed for the GARCH model: the Student’s t-GARCH model of Bollerslev (1987), the exponential GARCH (EGARCH) model of Nelson (1991), the GJR-GARCH model of Glosten et al. (1993), the threshold GARCH (TGARCH) model of Zakoian (1994) are just a few examples.

Regarding the volatility of cryptocurrencies, Gronwald (2014) showed that an autoregressive jump-intensity GARCH fits Bitcoin’s data more accurately than a standard GARCH model. Chu et al. (2017) modelled seven cryptocurrencies with 12 GARCH specifications having different distributions; an IGARCH (1, 1) model with normal innovations generated the smallest values of information criteria like AIC, BIC, HQC and the Consistent AIC (CAIC) for Bitcoin, Dash, Litecoin, Maidsafecoin and Monero. In contrast, the GJR-GARCH (1, 1) and GARCH (1, 1) models with normal innovations yielded the smallest values for Dogecoin and Ripple, respectively. Extending the study by Ardia et al. (2019) to test the presence of regime changes in the GARCH volatility dynamics of cryptocurrencies, Caporale and Zekokh (2019) documented that using standard GARCH models may yield incorrect VaR and ES predictions. Instead, adopting ES and joint loss functions within the Model Confidence Set procedure advanced by Hansen et al. (2011) allows the selection of the best model or set of models for modelling volatility of Bitcoin, Ethereum, Ripple, and Litecoin.

Some studies employed exogenous variables for studying Bitcoin volatility. Wang et al. (2022) used 17 macroeconomic variables and 18 technical indicators to forecast the realized volatility of Bitcoin. Their result showed that technical indicators have more forecasting capability for Bitcoin volatility during the low volatility state. Applying different models (Autoregressive model, Principal component model, Partial least squares model, Shrinkage method, and Forecast combination), they showed that shrinkage methods, including Elastic net and LASSO, can significantly improve the accuracy of Bitcoin volatility forecasting. Bakas et al. (2022) applied the dynamic Bayesian model to identify the main drivers of Bitcoin volatility and showed that Google trends, total circulation of Bitcoins, US consumer confidence, and the S&P 500 index are the most important factors for Bitcoin volatility. Their results further indicated that Bitcoin Google trends have a positive impact, while total circulation of Bitcoins, US consumer confidence, and the S&P 500 index have a negative impact on Bitcoin volatility. Borri (2019) used CoVaR to estimate the conditional tail-risk within the cryptocurrency market (Bitcoin, Ethereum, Ripple, and Litecoin). The study found that cryptocurrencies are highly exposed to tail risks within crypto markets, but they are not exposed to tail risks concerning other global assets, including equities or gold. The author reported, in fact, that gold is poorly correlated with cryptocurrencies. A similar result was also documented by Lawuobahsumo et al. (2022). Borri (2019) further indicated that the VIX index has positive and significant \(q\)-quantile slope coefficients on Bitcoin and Ripple, meaning that a substantial increase in the VIX index leads to a significant drop in these cryptocurrencies.

In order to improve forecasting beyond the GARCH framework for cryptocurrencies, scholars have applied several other models that provide better performance. Feng et al. (2018) used an EVT-based method to evaluate the extreme characteristics of seven representative cryptocurrencies and to measure conditional VaR and conditional ES. The authors show that cryptocurrencies are independent of four selected stock indices in both the left tail and the cross tail. This means that cryptocurrencies can act as a safe haven for stock indices and, like gold, serve as a diversifier for the stock market, although they cannot act as a tail hedging instrument in the way gold does. Katsiampa (2019) used extreme value theory to investigate the tail behaviour of Bitcoin, Ethereum, Ripple, Bitcoin Cash, and Litecoin returns, estimating VaR and ES as tail risk metrics. The main finding is that the conditional variances of all five cryptocurrencies are significantly affected by both previous squared errors and past conditional volatility.

The second strand of the literature focuses on non-parametric techniques to forecast price volatility. Unlike traditional time-series models, machine learning and deep learning models perform better at analyzing non-linear multivariate data while remaining robust to noisy observations. Kim et al. (2021) examined the volatility of nine cryptocurrencies based on their market capitalization using a Bayesian Stochastic Volatility (SV) model and several GARCH models. They reported that the SV model performed better than the GARCH family models when dealing with extremely volatile financial data. Based on forecasting errors, they further noted that the SV model was more accurate than the GARCH model for longer time horizons. Also, Tiwari et al. (2019) compared GARCH and SV models, discovering that the latter performed better for both Bitcoin and Litecoin. Furthermore, their results revealed that the Stochastic Volatility-t model performs best for Bitcoin while GARCH-t is best for Litecoin.

Lahmiri and Bekiros (2019) used the long short-term memory (LSTM) neural network and found that its predictive accuracy is significantly higher than that of generalized regression neural networks, their benchmark system. Peng et al. (2018) applied the standard GARCH model along with a machine learning approach to volatility estimation, employing Support Vector Regression (SVR) to estimate the mean and volatility equations and comparing the results to GARCH family models. The authors used the Diebold-Mariano test and Hansen's Model Confidence Set to assess the models' prediction performance for low- and high-frequency data. According to their findings, SVR-GARCH models outperform GARCH, EGARCH, and GJR-GARCH models with Normal, Student's t, and Skewed Student's t distributions.

The third strand of the literature comprises studies that adopt non-parametric and parametric methods to forecast prices and/or returns. For instance, Adcock and Gradojevic (2019) utilize the feedforward neural network to produce point and density forecasts of Bitcoin returns. They argue that returns are characterized by predictive non-linear trends reflecting the speculative nature of cryptocurrency trading. Pabuçcu et al. (2020) use Support Vector Machines, the Artificial Neural Network, the Naïve Bayes, and the Random Forest machine learning models to forecast the movements of Bitcoin prices with a high degree of accuracy and compare their performance against the logistic regression model as a benchmark. They show that the Random Forest model performs better than all the other models, including the benchmark. Basher and Sadorsky (2022) use interest rates, inflation, and market volatility as macroeconomic variables and MA50, WAD, and MACDSignal as indicator variables for forecasting Bitcoin prices. Using tree-based machine learning classifiers against traditional logit models, they report that random forests predict Bitcoin and gold price directions better than logit models. Further results show that MA50, WAD, MACDSignal, the oil volatility index and ten-year bond yields are relevant variables for predicting the Bitcoin price direction.

Jaquart et al. (2021) used machine learning models to predict short-term price movements of the Bitcoin market, drawing on predictors that include gold, oil, and minute-level data for the total return variants of the indices MSCI World, S&P 500, and VIX. Comparing model predictions on the basis of accuracy scores, the gradient boosting classifier and recurrent neural networks were the best-performing methods across all prediction horizons. Rathore et al. (2022) used the Fbprophet model to predict the Bitcoin closing price, opening price, day high, day low, day volume, and market capitalization on a particular day. This method relies on deep learning algorithms and various machine learning concepts, which can find hidden patterns in data, combine them, and make considerably more accurate predictions. Fleischer et al. (2022) apply the LSTM model to learn the patterns within cryptocurrency closing prices and to predict future prices. Using the root-mean-squared error to compare the predictive ability of alternative models, the LSTM model significantly outperformed the ARIMA model due to its ability to handle trends and seasonality. Similarly, Singh et al. (2023a) employ the generalized grey model EGM (1, 1, \(\alpha\), \(\theta\)) to predict the closing prices of Bitcoin, Bionic, Cardano, Dogecoin, Ethereum, and Ripple. They compare the accuracy of the generalized model (EGM (1, 1, \(\alpha\), \(\theta\))) with the classical model (EGM (1, 1)), linear regression, and exponential regression. The research findings show that the generalized model (EGM (1, 1, \(\alpha\), \(\theta\))) generally outperforms the classical model in forecast accuracy. Their work shows the importance of considering various forecasting methods when making investment decisions and selecting the best-performing model. Due to the highly volatile nature of cryptocurrencies, their study emphasizes the need for dependable and accurate prediction methods to guide investment decisions in the cryptocurrency market.

Contrary to Rathore et al. (2022); Jaquart et al. (2021); Pabuçcu et al. (2020); Adcock and Gradojevic (2019); Fleischer et al. (2022); Singh et al. (2023a), our aim is to forecast cryptocurrencies' tail risk, similar to Borri (2019); Basher and Sadorsky (2022); Wang et al. (2022); Bakas et al. (2022), using exogenous macroeconomic and financial market variables. Our method allows for estimating VaR and ES jointly. The novelty of our work is the ability of our model to estimate multiple quantiles simultaneously while treating the quantile levels as a monotone covariate and ensuring no quantile crossing. In addition, we assess whether the hypothesis that macroeconomic and financial variables can improve the predictions of extreme cryptocurrency returns holds.

3 Models

Let \(P_t\) denote the time-t price of a given cryptocurrency so that the time-t log-return is \(y_t=\log (P_t)-\log (P_{t-1})\). For \(h\ge 1\), we denote by \(Y_{t+h}=\sum _{j=1}^{h} y_{t+j}\) the h-period return.

In the case of a long position in one of the cryptocurrencies, VaR, denoted by \(Q_{t,h}(\tau )\), is the \(\tau\)-conditional quantile of \(Y_{t+h}\) at time t, meaning that \(\tau =P\left( Y_{t+h} \le Q_{t,h}(\tau ) | \Omega _{t}\right)\), where \(\Omega _{t}\) denotes the information set available at time t. We also consider the following tail expectation, known as expected shortfall (ES):

$$\begin{aligned} ES_{t,h}(\tau )=\mathbb {E}_{t}\left[ Y_{t+h} | Y_{t+h} \le Q_{t,h}(\tau ) \right] , \end{aligned}$$

where \(\mathbb {E}_t(\cdot )\) denotes expectation conditional on \(\Omega _{t}\). In the case of short positions, instead, we compute the two risk measures on \(-Y_{t+h}\), or, equivalently, look at the right tail of \(Y_{t+h}\).

3.1 Quantile regression neural networks

Taylor (2000) proposed the Quantile Regression Neural Network (QRNN), a nonlinear nonparametric method combining quantile regression and neural networks. In the QRNN model, the \(\tau\)-quantile is specified as

$$\begin{aligned} Q_{t,h}(\tau ) = \sum _{i=1}^I H_{i,t}(\tau ) \exp (w_i(\tau ))+b(\tau ) \end{aligned}$$
(1)

where \(w_i(\tau )\), \(i=1,\ldots ,I\), are the output-layer weights (with I denoting the number of hidden nodes), \(b(\tau )\) is the intercept parameter of the output layer, and the hidden-layer outputs \(H_{i,t}(\tau )\) are given by

$$\begin{aligned} H_{i,t}(\tau )= f\left( \sum _{j=1}^p w_{i,j}^{(H)}(\tau ) x_{j,t}+ b_{i}^{(H)}(\tau ) \right) . \end{aligned}$$
(2)

In (2), \(w_{i,j}^{(H)}(\tau )\) is an \(I\times p\) parameter matrix, \(x_{j,t}\) represents the j-th explanatory variable at time t and \(f(x)=\tanh (x/2)\).

The parameters \(w_{i,j}^{(H)}(\tau )\), \(b_{i}^{(H)}(\tau )\), \(w_i(\tau )\), and \(b(\tau )\) are estimated by minimizing the asymmetric absolute value error function given as

$$\begin{aligned} E_\tau =\frac{1}{T}\sum _{t=1}^T \rho _\tau ( Y_{t+h} - Q_{t,h}(\tau ) ). \end{aligned}$$
(3)

where \(\rho _{\tau }(u)=u\cdot (\tau -\mathbb {I}(u<0))\) and \(\mathbb {I}(\cdot )\) is the indicator function. The fundamental quantity of interest in this context is the conditional quantile associated with the quantile probability \(\tau\) (\(0<\tau <1\)). The asymmetric absolute value function gives different weights to positive and negative deviations.

Since the derivative of (3) is not defined at the origin, Chen (2007) and Cannon (2011) suggest replacing, for small values of \(\epsilon\), the quantile regression error function by

$$\begin{aligned} \rho _\tau ^{(A)}(u)= {\left\{ \begin{array}{ll} \tau \varphi (u)&{} \hbox { if \,}\ u \ge 0 \\ (1-\tau ) \varphi (u)&{} \hbox { if \,}\ u < 0 \end{array}\right. } \end{aligned}$$
(4)

where

$$\begin{aligned} \varphi (u) = {\left\{ \begin{array}{ll} \frac{u^2}{2\epsilon }&{} \hbox { if \,}\ |u| \le \epsilon \\ |u|-\frac{\epsilon }{2} &{} \hbox { if \,}\ |u| > \epsilon \end{array}\right. } \end{aligned}$$

is the Huber norm, which transitions smoothly from squared to absolute error and thereby ensures differentiability. As \(\epsilon \rightarrow 0\), the approximated error function converges to the exact quantile regression error function. The QRNN optimization method uses a quasi-Newton algorithm to minimize \(E_\tau\), see Amalia et al. (2018).
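To make these loss functions concrete, the following minimal sketch (illustrative Python, not the authors' code) implements the exact pinball loss in (3) and its Huber-smoothed approximation in (4); the function names and toy inputs are ours.

```python
import numpy as np

def pinball_loss(u, tau):
    """Exact asymmetric absolute-value (pinball) loss rho_tau(u), as in (3)."""
    return u * (tau - (u < 0).astype(float))

def huber_pinball_loss(u, tau, eps=1e-4):
    """Huber-smoothed pinball loss (4): quadratic within eps of the origin, linear beyond."""
    phi = np.where(np.abs(u) <= eps, u ** 2 / (2 * eps), np.abs(u) - eps / 2)  # Huber norm
    return np.where(u >= 0, tau * phi, (1 - tau) * phi)

# As eps -> 0 the smoothed loss converges to the exact pinball loss.
u = np.linspace(-0.05, 0.05, 7)
print(pinball_loss(u, tau=0.05))
print(huber_pinball_loss(u, tau=0.05, eps=1e-8))
```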

Regularization terms can be added to the error function to penalize the magnitude of the parameters, thereby limiting the nonlinear modeling capacity of the model. The penalized loss function has the form

$$\begin{aligned} {\widetilde{E}}^{(A)}_\tau =\frac{1}{T}\sum _{t=1}^T \rho ^{(A)}_\tau ( Y_{t+h} - Q_{t,h}(\tau ) ) +\lambda ^{H}\frac{1}{I\cdot p} \sum _{i=1}^I \sum _{j=1}^p\left( w_{i,j}^{(H)}(\tau ) \right) ^2+ \lambda \sum _{i=1}^I\left( w_{i}(\tau ) \right) ^2 \end{aligned}$$
(5)

where \(\lambda ^{H} \ge 0\) and \(\lambda \ge 0\). These hyperparameters control the magnitudes of \(w^{(H)}\) and w and are typically set by minimizing the out-of-sample generalization error. As suggested by Cannon (2018), \(\lambda\) was set to zero in our estimation.

It is possible to minimize quantile crossing in the standard QRNN model by selecting appropriate weight penalties, but multiple quantiles guaranteed to be non-crossing cannot be estimated simultaneously.

3.2 Monotone composite quantile regression neural network

The Monotone Composite Quantile Regression Neural Network (MCQRNN) method was first proposed by Cannon (2018). The MCQRNN model is more robust than the traditional QRNN model, especially for non-normal error distributions. The main advantage of MCQRNN, in addition to non-crossing, is that the multiple quantile functions estimated simultaneously better approximate the actual conditional quantile functions.

MCQRNN is created by combining elements from the traditional QRNN model, the monotone multi-layer perceptron, the composite quantile regression neural network (CQRNN), the expectile regression neural network, and the generalized additive neural network. The MCQRNN model is the first neural network-based implementation of quantile regression that estimates multiple non-crossing, nonlinear conditional quantile functions while improving regression quantile estimation accuracy (Cannon 2018). The MCQRNN of Cannon (2018) is obtained by considering \(K>1\) quantiles at the same time and treating the quantile levels as a monotone covariate. To be more precise, in the CQRNN model, the function to be minimised (ignoring for simplicity the penalty terms) is

$$\begin{aligned} {\widetilde{E}}^{(A)}_{C\tau }=\frac{1}{U}\sum _{k=1}^K\sum _{t=1}^T \rho ^{(A)}_{\tau _k}( Y_{t+h} - Q_{t,h}(\tau _k) ), \end{aligned}$$
(6)

where \(U=T\cdot K\) and C stands for composite. The vector \((\tau _1,\ldots ,\tau _K)\) is first added to the model as a covariate. To this end, we create the covariate vector \(x_{0,u}^{(S)}\), \(u=1,\ldots ,U\), by repeating each \(\tau _k\) T times and stacking. Here (S) denotes stacked data. Next, we concatenate the newly created vector with K copies of the original \(T\times p\) covariate matrix, \(\varvec{X}\). At the same time, we stack K copies of the response variable to obtain \(\varvec{Y}^{(S)}\). The new covariate matrix and response variable we consider in the MCQRNN model are

$$\begin{aligned} \varvec{X}^{(S)} = \begin{bmatrix} \tau _1&{} x_{1,1}&{} \cdots &{} x_{p,1}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \tau _1&{} x_{1,T}&{} \cdots &{} x_{p,T}\\ \tau _2&{} x_{1,1}&{} \cdots &{} x_{p,1}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \tau _2&{} x_{1,T}&{} \cdots &{} x_{p,T}\\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ \tau _K&{} x_{1,1}&{} \cdots &{} x_{p,1}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \tau _K&{} x_{1,T}&{} \cdots &{} x_{p,T}\\ \end{bmatrix}, \quad \varvec{Y}^{(S)} = \begin{bmatrix} Y_{1+h}\\ \vdots \\ Y_{T+h}\\ Y_{1+h}\\ \vdots \\ Y_{T+h}\\ \vdots \\ Y_{1+h}\\ \vdots \\ Y_{T+h}\\ \end{bmatrix} \end{aligned}$$
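The construction of the stacked data can be sketched in a few lines (illustrative numpy code, not the authors' implementation; the function name and toy inputs are ours):

```python
import numpy as np

def stack_for_mcqrnn(X, Y, taus):
    """Build X^(S) and Y^(S): prepend each tau_k (repeated T times) as covariate 0
    and stack K copies of the original covariates and responses."""
    T = X.shape[0]
    K = len(taus)
    tau_col = np.repeat(np.asarray(taus, dtype=float), T).reshape(-1, 1)  # (T*K, 1)
    X_stacked = np.hstack([tau_col, np.tile(X, (K, 1))])                  # (T*K, 1+p)
    Y_stacked = np.tile(Y, K)                                             # (T*K,)
    return X_stacked, Y_stacked

# Toy example with T=3 observations, p=2 covariates and K=2 quantile levels.
X = np.arange(6, dtype=float).reshape(3, 2)
Y = np.array([0.01, -0.02, 0.03])
Xs, Ys = stack_for_mcqrnn(X, Y, taus=[0.01, 0.05])
print(Xs.shape, Ys.shape)  # (6, 3) (6,)
```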

Using the stacked data above, we minimise the function

$$\begin{aligned} {\widetilde{E}}^{(A,S)}_{C\tau }=\frac{1}{U} \sum _{u=1}^U \rho ^{(A)}_{\tau (u)}\left( Y^{(S)}_{u} - Q^{(S)}_{u,h}(\tau (u)) \right) , \end{aligned}$$
(7)

where \(\tau (u)=x_{0,u}^{(S)}\) for \(u=1,\ldots ,U\). Penalty terms can be added as in (5). Finally, the vector \(x_{0}^{(S)}\) is treated as a monotone covariate by replacing its associated coefficient \(w_{i,0}^{(H)}\) in (2) with \(\exp \left( w_{i,0}^{(H)}\right)\). Since \(f(\cdot )\) is an increasing function, we are guaranteed that quantiles do not cross, in the sense that if \(\tau _l<\tau _k\) then \(Q_{t,h}(\tau _l)<Q_{t,h}(\tau _k)\). Hence, we can estimate ES using the following approximation (see for instance Khalaf et al. 2021):

$$\begin{aligned} ES_{t,h}(\tau )\simeq \frac{1}{K}\sum _{j=1}^{K} Q_{t,h}\left( j\frac{\tau }{K}\right) \end{aligned}$$
(8)
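A minimal sketch of this ES approximation, assuming only a generic quantile function (illustrative code, not the authors' implementation), is:

```python
import numpy as np

def es_from_quantiles(quantile_fn, tau, K=8):
    """Approximate left-tail ES at level tau as the average of the K quantiles
    at levels j*tau/K, j=1,...,K, as in (8)."""
    levels = tau * np.arange(1, K + 1) / K
    return float(np.mean([quantile_fn(lev) for lev in levels]))

# Toy check on simulated standard normal "returns".
rng = np.random.default_rng(0)
sample = rng.standard_normal(100_000)
es_5pct = es_from_quantiles(lambda lev: np.quantile(sample, lev), tau=0.05, K=8)
print(round(es_5pct, 2))  # roughly -1.97; the exact normal 5% ES is about -2.06
```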

3.3 Assessing quantile and tail expectation forecasts

We use the Model Confidence Set (MCS) procedure of Hansen et al. (2011) to classify the models based on their out-of-sample performance. The MCS procedure is based on an optimality criterion so that the resulting set \(M^*\) (Superior Set Models, SSM) contains the best model with a given confidence level \(1-\alpha\).

It uses the idea of sequential testing, for which the generic set \(M^0\), containing \(m_0\) competing models, gets reduced in the number of elements by an elimination rule if the Equal Predictive Ability (EPA) null hypothesis is rejected. The procedure is iterated until the EPA hypothesis is not rejected for all the models left in the set, constituting the optimal model confidence set \(M^*_{1-\alpha }\).

Let \(l_{i,t}\) be a loss function associated with model i at time t. ES is not elicitable on its own, but it is jointly elicitable with VaR given a suitable class of scoring functions. We jointly assess quantile and ES forecasts considering the following functional form for the loss function proposed by Fissler et al. (2015):

$$\begin{aligned} l_{i,t,h} =&\, \rho _{\tau }\left( Y_{t+h} - Q_{t,h}^i(\tau ) \right) -\tau Y_{t+h} \\ &+ \frac{ES_{t,h}^i(\tau )}{ 1 + \exp \left( ES_{t,h}^i(\tau ) \right) }\left( ES_{t,h}^i(\tau ) - Q_{t,h}^i(\tau ) + \mathbb {I}(Y_{t+h} \le Q_{t,h}^i(\tau )) \, \frac{Q_{t,h}^i(\tau ) - Y_{t+h}}{\tau }\right) \\ &+ \log \left( \frac{2}{1+ \exp \left( ES_{t,h}^i(\tau ) \right) }\right) . \end{aligned}$$
(9)

This loss function enables us to compare forecasts from different methods, with the best method having the lowest value.
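The following sketch (illustrative Python, not the authors' code) evaluates the joint VaR-ES loss in (9) for arrays of realised returns and forecasts; the variable and function names are ours.

```python
import numpy as np

def pinball(u, tau):
    return u * (tau - (u < 0).astype(float))

def joint_var_es_loss(y, q, es, tau):
    """Per-period joint VaR-ES loss (9); lower average values mean better forecasts."""
    hit = (y <= q).astype(float)
    weight = es / (1.0 + np.exp(es))
    return (pinball(y - q, tau) - tau * y
            + weight * (es - q + hit * (q - y) / tau)
            + np.log(2.0 / (1.0 + np.exp(es))))

# Toy comparison of two constant forecast sets on simulated returns.
rng = np.random.default_rng(1)
y = 0.04 * rng.standard_normal(500)
n = len(y)
loss_a = joint_var_es_loss(y, np.full(n, -0.066), np.full(n, -0.082), tau=0.05).mean()
loss_b = joint_var_es_loss(y, np.full(n, -0.02), np.full(n, -0.03), tau=0.05).mean()
print(round(loss_a, 4), round(loss_b, 4))  # the better-calibrated forecasts should score lower
```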

For simplicity, we remove the dependence of the loss function on the horizon h and define the relative performance variable between models i and j as

$$\begin{aligned} d_{ij, t} = l_{i,t} - l_{j,t}\quad \forall i, j \in M^0\quad t=1, \dots , n. \end{aligned}$$
(10)

and the simple average loss of model i relative to the other models \(j \in M\) at time t as

$$\begin{aligned} d_{i\cdot , t} = (m-1)^{-1} \sum _{j\in M\setminus {i} } d_{ij,t}. \end{aligned}$$
(11)

For the elimination of inferior elements within the set \(M^0\), two alternative sets of hypotheses are available to test the EPA:

$$\begin{aligned} {\left\{ \begin{array}{ll} H_0 :\mathbb {E}(d_{ij})=0 \quad \forall i, j= 1, \dots , m, \quad \text {against}\\ H_1 :\mathbb {E}(d_{ij})\ne 0 \end{array}\right. } \end{aligned}$$
(12)

or

$$\begin{aligned} {\left\{ \begin{array}{ll} H_0 :\mathbb {E}(d_{i\cdot })=0 \quad \forall i= 1, \dots , m, \quad \text {against}\\ H_1 :\mathbb {E}(d_{i\cdot })\ne 0 \end{array}\right. } \end{aligned}$$
(13)

Two statistics are then constructed to test the hypotheses:

$$\begin{aligned} T_{ij} = \frac{\bar{d}_{ij} }{\sqrt{var(d_{ij})}}\quad T_{i\cdot } = \frac{\bar{d}_{i\cdot }}{\sqrt{var(d_{i\cdot })}} \end{aligned}$$
(14)

where \(\bar{d}_{ij}\) is the average loss differential between models i and j, and \(\bar{d}_{i\cdot }\) represents the average loss of the \(i\)-th model relative to the average losses across the models belonging to the set M:

$$\begin{aligned} \bar{d}_{ij} = n^{-1} \sum _{t=1}^{n} d_{ij, t} \quad \bar{d}_{i\cdot } = (m-1)^{-1} \sum _{j\in M\setminus {i}} \bar{d}_{ij} \end{aligned}$$
(15)

The standard errors are obtained via a block bootstrap, where the block length p is the maximum number of significant parameters obtained by fitting an AR(p) process to the \(d_{ij}\) terms.

The two hypotheses from (12) and (13) can be tested using

$$\begin{aligned} T_{R} = \max _{i,j \in M} T_{ij} \quad T_{max} = \max _{i \in M} T_{i\cdot }. \end{aligned}$$
(16)

Because the distributions of the two test statistics under the null are not known, they are simulated using the bootstrap.

Bernardi and Catania (2016) summarised the algorithm for the procedure as follows:

1. Set \(M = M^0\).

2. Compute the test statistic under the null EPA hypothesis. If the null is not rejected, set \(M^*_{1-\alpha } = M\) and terminate the algorithm. If it is rejected, use the elimination rule to determine the worst model.

3. Discard the worst model and repeat step 2.

The elimination rule defines a sequence of sets \(M = M^0 \supset M_1 \supset \dots \supset M_m\), where \(M_i = (e_{M_i}, \dots , e_{M_m})\), each of which has a p value associated with the EPA test. Let \(P_{H_0, M_i}\) be the p value associated with the null hypothesis \(H_{0, M_i}\). The MCS p value for model \(e_{M_j} \in M\) is defined as \({\hat{p}}_{e_{M_j}} = \max _{i\le j} P_{H_0, M_i}\).
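As an illustration of this sequential testing logic (not the implementation used in the paper), the following heavily simplified sketch applies the \(T_{max}\) statistic with an i.i.d. bootstrap rather than the block bootstrap described above, and measures each model's loss relative to the set average; the function simple_mcs and its defaults are ours.

```python
import numpy as np

def simple_mcs(losses, alpha=0.10, n_boot=1000, seed=0):
    """Sequentially eliminate the worst model until equal predictive ability
    is no longer rejected; returns the indices of the surviving models."""
    rng = np.random.default_rng(seed)
    models = list(range(losses.shape[1]))
    while len(models) > 1:
        sub = losses[:, models]                        # (n, k) per-period losses
        d = sub - sub.mean(axis=1, keepdims=True)      # loss relative to the set average
        d_bar = d.mean(axis=0)
        idx = rng.integers(0, len(d), size=(n_boot, len(d)))
        boot_means = d[idx].mean(axis=1)               # bootstrap distribution of the means
        se = boot_means.std(axis=0)
        t_stat = d_bar / se
        t_max = t_stat.max()
        boot_t_max = ((boot_means - d_bar) / se).max(axis=1)
        p_value = (boot_t_max >= t_max).mean()
        if p_value >= alpha:                           # EPA not rejected: stop
            break
        models.pop(int(np.argmax(t_stat)))             # eliminate the worst model
    return models
```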

4 Empirical analysis

4.1 Data description

To jointly compute VaR and ES, we consider daily returns for the five most popular cryptocurrencies, namely Bitcoin (BTC), Ethereum (ETH), Ripple (XRP), Litecoin (LTC) and Dogecoin (DOGE). The period of the analysis goes from February 10, 2015, to September 27, 2021, for a total of 1500 daily observations. We set \(\tau =1\%\) and \(\tau =5\%\), in the case of both a long and a short position in a given cryptocurrency. We also set \(K=8\) so that ES is obtained as the average of 8 quantiles. Finally, we use a rolling window of 1000 observations, fitting the models and making predictions on each window. Hence, for each model, we end up with 500 out-of-sample forecasts for the quantiles and tail expectations. We consider a one-day (\(h = 1\)) and a one-week (\(h = 5\)) forecast horizon for the two risk measures of interest. To further compare the performance of our model, we use two benchmarks: Historical simulation and a popular conditional volatility model. The first method does not make any assumption about the return distribution and is simple to implement since it estimates VaR and ES using their empirical counterparts. The second benchmark is the ARMA(1,1)-GARCH(1,1) model with normal innovations.
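For concreteness, the following sketch (illustrative Python, not the code used in the study) reproduces the rolling Historical-simulation benchmark under the scheme just described; the function names and the simulated returns are ours, and for short positions the same functions would be applied to the negated returns.

```python
import numpy as np

def hs_var_es(window, tau):
    """Empirical VaR (tau-quantile) and ES (mean return beyond VaR) from a window of returns."""
    var = np.quantile(window, tau)
    es = window[window <= var].mean()
    return var, es

def rolling_hs_forecasts(returns, window_size=1000, tau=0.05):
    """One-step-ahead HS forecasts: each window of past returns yields the (VaR, ES)
    pair used as the forecast for the next day."""
    forecasts = []
    for t in range(window_size, len(returns)):
        forecasts.append(hs_var_es(returns[t - window_size:t], tau))
    return np.array(forecasts)

# With 1500 observations and a 1000-day window this yields 500 out-of-sample forecasts.
rng = np.random.default_rng(2)
fake_returns = 0.04 * rng.standard_normal(1500)
print(rolling_hs_forecasts(fake_returns).shape)  # (500, 2)
```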

Table 1 Explanatory variables

Table 1 reports the explanatory variables and their various transformations used in our study, and Fig. 1 shows the correlation of all variables.

In particular, we consider explanatory business cycle variables such as interest rates, inflation, and market volatility to forecast extremely large cryptocurrency returns in line with Basher and Sadorsky (2022). We further include the dollar-effective exchange rate, the NASDAQ index and the prices of gold and crude oil. The US dollar-effective exchange rate (DXY) is a trade-weighted index that tracks the price of the US dollar against the main foreign currencies and is a significant indicator of international competitiveness (Algieri 2014). The NASDAQ Composite Index includes almost all equities listed on the NASDAQ stock exchange. Gold and crude oil prices reflect the developments of the main precious metal and energy commodity markets. Interest rates are measured using the term spread, which is calculated as the spread between 10-Year Treasury Constant Maturity and 3-Month Treasury Constant Maturity, and using the US generic government 10-year yield (Algieri et al. 2017). Inflation variables comprise the expected five-year inflation and breakeven inflation. The latter is computed as the difference between the nominal yield on a fixed-rate 10-Year Treasury bond and the real yield on the inflation-linked investment with the same maturity. Market volatility is captured by the VIX index derived from the volatility implied from the prices of options on the S&P 500 index. This index is a sentiment variable that reflects the equity market's perception of risk and tends to be lower (higher) in bull (bear) markets (Algieri and Leccadito 2021; Jaquart et al. 2021; Adcock and Gradojevic 2019). All data have been extracted from the Bloomberg database.

Figure 1 shows that there exists a high positive correlation among cryptocurrencies. This high positive correlation implies that cryptocurrencies are exposed to the same extreme factors and that contagion spreads rapidly among them during extreme events. This means that there exist systematic risks in the cryptocurrency market. Moreover, Bitcoin and Litecoin have the highest dependence. We also observe a positive but weak correlation between the selected cryptocurrencies and the NASDAQ Composite Index and gold, and a negative but weak correlation with the VIX and the US Dollar Index. The correlations in Fig. 1 also confirm the high negative dependence between the VIX and the NASDAQ Composite Index. Figures 2, 3 and 4 present the plots of the cryptocurrency log-returns and of the economic and financial data used as predictor variables, respectively.

The main descriptive statistics for our data are displayed in Table 2. Specifically, Table 2 shows that the average returns are higher for Ethereum, Dogecoin and Bitcoin. The Dogecoin series has the highest positive skewness (suggesting frequent small losses and a few extreme gains) compared to the other cryptocurrencies. Bitcoin returns are, instead, negatively skewed, indicating frequent small gains and a few extreme losses. The standard deviations of the returns of the five cryptocurrencies range between 4.3 for Bitcoin and 7.9 for Dogecoin. None of the cryptocurrency series displays a bell-shaped Gaussian distribution, as suggested by the zero p values of the Jarque-Bera test. Indeed, all five return series show excess kurtosis, especially Dogecoin.

Table 2 Descriptive statistics
Fig. 1 Correlation plot of variables. Note: the lighter the color, the closer the correlation is to zero. Descriptions of the variables are given in Table 1

Fig. 2 Plot of cryptocurrencies log returns. Note: Data spans from February 10, 2015, to September 27, 2021

Fig. 3 Plot of economic predictor variables. Note: Excluding the US Dollar Index, for which we use log return, we consider first differences for the remaining variables

Fig. 4 Plot of financial predictor variables. Note: We consider log returns for all financial variables in the plot

We have used the MCQRNN model to jointly predict conditional quantiles and tail expectations for the returns of the five cryptocurrencies. We have constructed six MCQRNN models and compared them to both benchmark models. The specifics of the models are in Table 3. In detail, we consider three different sets of explanatory variables and either two or three hidden layers in the MCQRNN model. Results for the HS benchmark are presented in Tables 4, 5, 6 and 7, whereas those for the ARMA(1,1)-GARCH(1,1) model are displayed in Tables 8, 9, 10 and 11.

Table 3 MCQRNN models

Specifically, we report the Score Ratio, i.e. the ratio between the average loss function (9) for model i and the one for the benchmark model,

$$\begin{aligned} {\text{Score Ratio}}_i=\frac{ \sum _{t=1}^{T_h} l_{i,t,h} }{ \sum _{t=1}^{T_h} l_{\text{Benchmark},t,h} } \end{aligned}$$

where \(T_h\) is the number of predictions we can make for horizon h. If a model has a Score Ratio lower than one, then it produces better VaR and ES forecasts for the cumulated return \(Y_{t+h}\) than the benchmark model. We also test whether a given model produces significantly better forecasts than the benchmark model by testing the null hypothesis of equal predictability with the test of Diebold and Mariano (2002). Hence, we report p values for the test of equal predictability between model i and the benchmark model. p values smaller than a prespecified significance level (e.g. 5%) suggest that model i is more accurate than the benchmark model in jointly forecasting VaR and ES for cryptocurrencies. Finally, we use the MCS procedure to assess the accuracy of the considered models. We report under the column ‘SSM Y/N’ whether model i belongs (Y) or not (N) to the Superior Set Models.
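The following minimal sketch (illustrative only, not the authors' code) computes the Score Ratio and a basic Diebold-Mariano statistic from per-period losses; it omits the HAC variance correction typically used in the DM test, and the function names are ours.

```python
import numpy as np
from scipy import stats

def score_ratio(loss_model, loss_bench):
    """Ratio of summed losses: values below one favour the candidate model."""
    return loss_model.sum() / loss_bench.sum()

def diebold_mariano(loss_model, loss_bench):
    """Basic DM statistic on the loss differential; small p values indicate
    that the candidate model beats the benchmark (one-sided test)."""
    d = loss_model - loss_bench
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))  # no HAC correction in this sketch
    return dm, stats.norm.cdf(dm)
```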

4.2 Discussion of results

In Tables 4, 5, 6 and 7, we report the performance of our models against the Historical simulation method, used as the first benchmark for estimating VaR and ES. Our results show that the MCQRNN models perform better than the HS method. In all cases (1% and 5% quantile levels) and for both long and short positions, the benchmark model does not enter the Superior Set Models for the one-step-ahead joint VaR and ES forecasts. The superior forecasting performance of our models is also confirmed by the score ratios and the reported p values of the Diebold and Mariano (2002) test of equal predictability between model i and the HS benchmark, which are close to zero for all models i. We observe similar results for the five-step-ahead predictions, with a few exceptions. The benchmark model enters the Superior Set Models for Dogecoin in the short position at 1% (see panel 2a of Table 5), suggesting equal predictability between the MCQRNN and HS models, as confirmed by the DM p value close to 1. Also, in panel 5a of Table 7 for Litecoin, the Superior Set Models include the benchmark model.

Table 4 Joint 1%-VaR and 1%-ES forecast, long position, benchmark: historical simulation
Table 5 Joint 1%-VaR and 1%-ES forecast, short position, benchmark: historical simulation
Table 6 Joint 5%-VaR and 5%-ES forecast, long position, benchmark: historical simulation
Table 7 Joint 5%-VaR and 5%-ES forecast, short position, benchmark: historical simulation

Considering the ARMA(1,1)-GARCH(1,1) model as the second benchmark, the results show that the MCQRNN models have superior performance in most cases when predicting VaR and ES for both long and short positions in crypto trading (Tables 8, 9, 10, 11). For the one-day-ahead forecasts, the only case in which the benchmark model was not eliminated from the Superior Set Models is the 1% short-position forecast for Dogecoin (Table 9, panel 2b). Nevertheless, the score ratios and the Diebold and Mariano test p values indicate that the MCQRNN models have better predictive ability than the ARMA(1,1)-GARCH(1,1) benchmark.

The superior predictive performance of the MCQRNN models holds, with a few exceptions, for the five-day-ahead forecasts at 1% and 5% for long and short positions. We observe that the benchmark model is not eliminated from the Superior Set Models for Dogecoin, Ethereum, Ripple, and Litecoin. We document high score ratios and high p values for the Diebold and Mariano test (panel 2b of Table 8 and panels 2b–2e of Table 9). This result implies equal predictability among the models for jointly forecasting the 1%-VaR and ES of long and short positions. For Ethereum, as shown in panel 2c of Table 10, the benchmark model performs better than the MCQRNN models, with score ratios greater than 1 and Diebold and Mariano test p values equal to 1.

In a nutshell, the MCQRNN models augmented with macroeconomic and financial variables can improve the predictions of extreme cryptocurrency returns for both long and short trading positions. The best-performing models are M3 and M4, with two and three layers, respectively, which include the VIX index, the Treasury yield spread, the 5-Year Forward Inflation Expectation Rate and the 10-Year Breakeven Inflation Rate.

Table 8 Joint 1%-VaR and 1%-ES forecast, long position, benchmark: ARMA-GARCH (1, 1)
Table 9 Joint 1%-VaR and 1%-ES forecast, short position, benchmark: ARMA-GARCH (1, 1)
Table 10 Joint 5%-VaR and 5%-ES forecast, long position, benchmark: ARMA-GARCH (1, 1)
Table 11 Joint 5%-VaR and 5%-ES forecast, short position, benchmark: ARMA-GARCH (1, 1)

Our findings differ from Adcock and Gradojevic (2019), who reported that the VIX index failed to improve their model's performance for predicting returns, but are consistent with previous research, according to which the VIX is an important variable for predicting the Bitcoin price direction and tail risks (Borri 2019; Basher and Sadorsky 2022; Wang et al. 2022). However, our results hold not only for Bitcoin but also for Ethereum, Ripple and Litecoin. Moreover, differently from the extant empirical literature, we point to the role of expected inflation as a possible variable to add when predicting extreme returns of cryptocurrencies.

4.3 Policy implications

The above findings have important implications for cryptocurrency market participants. Our results suggest that traders and investors can use macroeconomic and financial market indicators such as the 5-Year Forward Inflation Expectation Rate, the 10-Year Breakeven Inflation Rate, the spread between the 10-Year and 3-Month Treasury Constant Maturity rates, and the VIX index to forecast cryptocurrency tail risk. However, they should pay attention to the combination of variables included in the model. We recommend that traders and investors train the model on an array of indicators before deciding which ones to enter into the final model. Because forecasting tail risk is relevant for investment decisions, decision-makers should consider various forecasting methods before selecting the appropriate one. Given the superior predictive ability of the MCQRNN model relative to the widely used Historical simulation and the standard ARMA-GARCH model, institutional investors, regulators, and academics can use this model as an additional tool for forecasting cryptocurrencies' tail risk. These results, together with increased transparency of market operations (Mikhaylov et al. 2023), may guide policymakers in making informed risk management decisions and investors in choosing what other assets to include in their portfolios along with cryptocurrencies.

5 Conclusion

This study has employed the Model Confidence Set technique to determine the optimal models for predicting the returns of the major cryptocurrencies, namely Bitcoin, Ethereum, Ripple, Litecoin and Dogecoin. Our goal has been to jointly forecast VaR and Expected Shortfall using Monotone Composite Quantile Regression Neural Network models. First, we constructed six MCQRNN models for the considered cryptocurrencies, and then we compared their performance against the traditional Historical simulation method and the ARMA(1,1)-GARCH(1,1) benchmark models. Our results show that the MCQRNN models outperform the benchmarks for all the investigated cryptocurrencies. Neither the Historical simulation method nor the ARMA(1,1)-GARCH(1,1) model enters the superior set of models for the one-day-ahead predictions of cryptocurrency returns. This finding holds for \(\tau =1\%\) and \(\tau =5\%\), both in the case of long and short positions in cryptocurrencies. The exception is the short position in Dogecoin when \(\tau =1\%\).

As a robustness check, we have forecast the five-day-ahead returns for the five cryptocurrencies using the same models. For all \(\tau\)-quantiles and trading positions in Bitcoin, the MCQRNN models turn out to be superior to the benchmarks. Furthermore, for all trading positions in Dogecoin, the EPA hypothesis was not rejected for all the considered MCQRNN models, both with \(\tau =1\%\) and \(\tau =5\%\).

Our results, thus, show that MCQRNN models augmented with economic and financial data allow us to improve the forecasting of cryptocurrencies' extreme returns and to hedge the risks of investing in this new type of asset. In particular, the VIX index, the Treasury yield spread, the 5-Year Forward Inflation Expectation Rate and the 10-Year Breakeven Inflation Rate are relevant when predicting tail risks. Notwithstanding the limitations due to the peculiar nature of cryptocurrencies, which exhibit attributes of both commodities and money, our methodology could be applied to other areas, including forecasting for the e-commerce sector (Moiseev et al. 2023) and CO2 emissions (Algieri et al. 2023; Singh et al. 2022).