1 Introduction

With the creation of multiple financial indices and the increasing number of companies listing their shares on stock exchanges, the demand for forecasting models to quantify the risk inherent in each of these instruments has grown considerably. Financial firms and traders are frequently concerned about the risks linked to the increasing volatility of stocks such as Tesla, Inc. (TSLA), listed on Nasdaq, whose volatility is predicted in this paper for illustrative purposes, given the high volatility present in this stock. Tesla shares rose strongly during the COVID-19 pandemic. However, the month-on-month decrease in the stock is deeper than the Bitcoin (BTC) price drop. The problem is speculated to be associated with the CEO’s sale of a portion of his stake to help fund new acquisitions. Investors are now questioning whether the sale has been finalized, which is causing high volatility in the stock.

On the other hand, accurate volatility prediction is a crucial part of risk management, most notably in asset allocation across diverse investment portfolios to adequately hedge the underlying risk. High volatility indicates either that the market is at risk, or that security values are unreliable and capital markets are not performing well enough. Gaining a more accurate understanding of volatility, predicting it accurately and controlling portfolio exposure and its impact are essential to effective trading.

Different authors have studied volatility forecasting using statistical and machine learning models. For example, some have opted to use the wavelet transform support vector machine (WSVM), which outperforms solutions obtained from the standard SVM (Tang et al., 2009). Comparative studies on different versions of the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model, such as Cholesky GARCH (CGARCH), Integrated GARCH (IGARCH) and Fractionally Integrated GARCH (FIGARCH), have been developed to detect which one delivers the best performance (Kang et al., 2009). Vector autoregressive models such as the VARIMA and VGARCH models, along with aggregated causality factors, have also been used in volatility forecasting (Hafner, 2009). Particle filtering (PF) techniques have also been applied to volatility forecasting for high frequency data, which contains patterns that are difficult to capture with classical statistical models (Gaoyu et al., 2009). In addition, techniques based on artificial neural networks together with classical time series models have been used in volatility and electricity demand forecasting to improve prediction accuracy (Hyup Roh, 2007; Luzia et al., 2023).

The main problem with volatility prediction lies in the fact that strong nonlinear patterns may be present in the associated time series. Therefore, linear models such as ARIMA or Exponential Smoothing (ES), which are widely used in statistics, for example to predict the earnings before interest, taxes, depreciation and amortization (EBITDA) index, mid- to long-term electric energy consumption and exchange rates (de Oliveira & Cyrino Oliveira, 2018; Maria & Eva, n.d.; Rubio et al., 2021), among others, are unable to predict volatility with high efficiency. These types of models are defined in terms of linear filters that consider a finite number of historical observations and residuals to fit linear patterns in a time series. An alternative way to improve goodness-of-fit is to use moving averages or the discrete Fourier transform as pre-processing (Al Wadi et al., 2010; Alshammari et al., 2020).

Another option is to use machine learning (ML) or deep learning (DL) techniques which are able to capture strong fluctuations in a time series or have important properties associated with long term time dependencies, such as support vector regression (SVR) and long short-term memory (LSTM) models, respectively (Bathla, 2020; Chniti et al., 2017; Guo et al., 2019). A principal concern associated with the use of these approaches is the high computational cost required to train these models. GARCH stochastic processes have shown great effectiveness in predicting time series with implicit heteroscedasticity, such as volatility, and their evaluation does not require large computational resources. Therefore, the use of this technique on its own or in combination with another method will provide predictions with adequate scores for volatility forecasting.

In this paper, a hybridization technique based on the wavelet transform, using the ARIMA and GARCH models, is proposed. This technique uses selected models for each of the time series obtained from the decomposition into high and low frequency signals. Time series with high frequency are predicted using GARCH and those with low frequency are predicted using ARIMA or wavelet ARIMA. In this way, advantages offered by each technique when predicting time series volatility are maximized. A comparative study between this type of hybridization and other related techniques is performed to identify which approach provides the best combination of these predictive models.

The primary objective of this work is to demonstrate that the model, which incorporates wavelet transform, ARIMA, and GARCH, with the application of these models to both approximate and detailed components, yields superior predictions compared to other hybrid models utilized for volatility forecasting. To achieve this goal, various iterations of hybrid models, integrating wavelet transform and the incorporation of predicted residuals, are assessed. This assessment aims to establish that the suggested model provides the most favourable outcomes in terms of error rates and goodness of fit.

Having described the background of this research, the organization of the remaining sections is as follows: Sect. 2 reviews the literature associated with the problem addressed in this work. Section 3 formally describes the predictive models used for volatility forecasting and the associated definitions. Section 4 introduces the stochastic volatility model used as a benchmark. Section 5 presents the methodology proposed for the optimal combination of the ARIMA and GARCH models. Section 6 introduces the data set used for time series volatility estimation and its descriptive analysis. Section 7 presents the experimental results obtained and the related discussion. Section 8 presents conclusions and future and ongoing work.

2 Literature review

As an alternative to ARIMA, the wavelet ARIMA model has been used by different authors for volatility forecasting with diverse approaches. For example, it has been shown that wavelet ARIMA provides better predictions than ARIMA for forecasting Amman stocks; in that case, ARIMA was applied to each of the Daubechies wavelets obtained from the decomposition (Al Wadi et al., 2010). Other authors use, for example, the maximal overlap discrete wavelet transform (MODWT) and hybrid ARIMA models with a feedforward neural network (FNN) for volatility forecasting. In this case, it was found that the hybrid ARIMA-FNN model provides better predictions in terms of goodness-of-fit. ARIMA-FNN first uses ARIMA to fit linear patterns in the time series, and the residuals are then predicted using the FNN model. In this fashion, the FNN adjusts the prediction errors obtained from ARIMA (Xiao et al., 2014).

Although the wavelet transform is crucial to improve volatility predictions obtained from ARIMA, there are some strong fluctuations that cannot be captured due to its linear structure. Some authors have opted for machine learning and deep learning techniques to solve this problem. For example, using ML models such as neural networks and random forests for volatility prediction of the Dow Jones Industrial Average index stocks yields considerable predictive gains, since these models can effectively capture multiple highly nonlinear patterns in a time series, which cannot be fitted by models such as the heterogeneous autoregressive (HAR) model (Christensen et al., 2021). Support vector machine models dominated the area of machine learning from the 1990s until around 2010, displacing techniques based on artificial neural networks. After that, deep learning-based techniques began to dominate, thanks to the current availability of computing power, advances in computer architectures, and the accumulation of big data. However, the SVM model has a great advantage over alternative models, as it is able to capture long-memory nonlinear patterns in time series such as the volatility of the S&P 500 index; it is additionally effective for processing high-dimensional data. SVM has been compared with classical models used in econometrics, for example multiple versions of the GARCH model, obtaining comparable or better predictions with SVM than with the GARCH models (Gavrishchaka & Banerjee, 2006).

On the other hand, some authors prefer to use deep learning techniques such as long short-term memory (LSTM) for volatility prediction. It has been found that these techniques provide predictions similar or, in some cases, superior to those provided by the support vector regression (SVR) model, which is highly suggested in the literature for this type of forecasting, as it is effective when it comes to adjusting marked patterns of heteroskedasticity. The use of LSTM is highlighted by the fact that it can be trained using graphics cards to reduce computational cost (Liu, 2019). Recent research has also considered the use of stacked machine learning models based on artificial neural networks to forecast the volatility of the S&P 500 index. In this case, forecasts obtained through Gradient Descent Boosting (GDC), Random Forest (RF) and Support Vector Machine (SVM) models are evaluated under an activation function to obtain a single stacked forecast. This model has proven to be competitive with the classical models used individually (Ramos-Pérez et al., 2019).

Another alternative used to forecast realized volatility is to apply the hybrid ARIMA-GARCH model to each signal obtained from the MODWT decomposition. The final forecast is obtained by adding the forecasts from each MODWT wave component. The authors highlight the advantages of using MODWT as a pre-processor for hybrid models such as ARIMA-GARCH to predict the volatility of the Saudi Arabia stock market (Alshammari et al., 2020).

Several successful applications of wavelet transform-based methods have been reported in finance. One such example is enhancing stock index prediction accuracy using a gated recurrent unit (GRU) neural network in conjunction with adaptive noise decomposition (CEEMDAN-wavelet). The results demonstrate that the GRU-CEEMDAN-wavelet approach yields lower errors compared to ARIMA and the individual models (Qi et al., 2023). On the other hand, Li and Tang (2020) proposed the WT-FCD-MLGRU model to enhance the accuracy of forecasting financial time series linked to stock indices. This model integrates the wavelet transform, filter cycle decomposition, and multilag neural networks. Empirical analysis reveals that the WT-FCD-MLGRU model outperforms alternative approaches in forecast precision, showing the lowest error when predicting stock indices, as compared to conventional models such as ARIMA and the enhanced SVR machine learning model. Wang and Guo (2020) introduce the hybrid model DWT-ARIMA-GSXGB. The authors use the discrete wavelet transform to partition the dataset into approximation and error components. Subsequently, ARIMA is applied to the approximate partial data, while the enhanced XGBoost model (GSXGB) is used for the error data. Through experimental comparisons with stock market data, it was found that the DWT-ARIMA-GSXGB model exhibits lower errors when contrasted with four prediction models: ARIMA, XGBoost, GSXGB, and DWT-ARIMA-XGBoost. A hybrid model which integrates the Empirical Wavelet Transform (EWT) with the Improved Bee Colony Algorithm (ABC), the Extreme Learning Machine (ELM) neural network, and ARIMA has also been proposed (Yu et al., 2020). EWT is employed to decompose and clean the data, removing noise and making it suitable for forecasting. Subsequently, the ELM optimized using GPS-EO-ABC and ARIMA are separately applied to generate diverse prediction outcomes, which are then combined with weighting. The optimized ELM demonstrated superior accuracy and stability when compared to the original ELM, ABC-ELM, LSTM, and ANN. This hybrid approach proves to be effective not only in prediction but also in noise reduction and outlier correction within the data.

Each of the studies mentioned previously has proposed novel techniques for volatility forecasting, with a common purpose: to find a technique that yields accurate and trustworthy predictions for this type of data, which contains a large amount of noise and strong behaviour that is difficult to fit, due to different external factors and current market conditions. The literature review did not reveal the existence of a model that combines the wavelet transform, ARIMA, and GARCH for volatility prediction. In the proposed approach, ARIMA and GARCH are specifically employed for the approximate and detailed components derived from the discrete wavelet transform.

3 Forecasting models

3.1 ARIMA model

ARIMA models, popularized by Box and Jenkins, are a flexible and powerful statistical tool for predictive modelling with time series data (Asteriou & Hall, 2016). Mainly, ARIMA models approximate future values of a time series as a linear function of past observations and white noise terms. The model consists of three components: differencing to remove non-stationarity, an autoregressive (AR) component and a moving average (MA) component (Montgomery et al., n.d.).

To define non-stationarity, the backshift operator, \(B\), is introduced. A time series, \({y}_{t}\), will be called homogeneous non-stationary if it is non-stationary but its first difference, i.e. \({w}_{t}={y}_{t}-{y}_{t-1}=(1-B){y}_{t}\), or its \(d\)th difference, \({w}_{t}={(1-B)}^{d}{y}_{t}\), yields a stationary time series. In addition, \({y}_{t}\) will be called an autoregressive integrated moving average (ARIMA) process of orders p, d, and q, denoted ARIMA (p, d, q), if its \(d\)th difference yields a stationary ARMA(p, q) process. Therefore, an ARIMA (p, d, q) can be written as:

$$\Phi \left(B\right){\left(1-B\right)}^{d}{y}_{t}=\delta +\Theta \left(B\right){\varepsilon }_{t},$$
(1)

where:

$$\Phi \left(B\right)=1-\sum_{i=1}^{p}{\phi }_{i}{B}^{i},\Theta \left(B\right)=1-\sum_{i=1}^{q}{\theta }_{i}{B}^{i}$$
(2)

are the backshift operator terms in the AR(p) and MA(q) models, defined as \(\Phi \left(B\right){y}_{t}=\delta +{\varepsilon }_{t}\) and \({y}_{t}=\mu +\Theta (B){\varepsilon }_{t}\), with \(\delta =\mu -\phi \mu\), where μ is the mean and \({\varepsilon }_{t}\) the white noise with \(E\left({\varepsilon }_{t}\right)=0\).

Model orders \(p, q\) are determined by the nature of the autocorrelation and partial autocorrelation functions. The model coefficients are calculated using the maximum likelihood method (Box et al., 2008). The best model is identified by diagnostic checks such as the Akaike information criterion (AIC), the Bayes information criterion (BIC) and the Jarque–Bera normality test on the residual error series.
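As a minimal illustration of this identification step, the sketch below fits one candidate model with statsmodels and inspects the AIC, BIC and the Jarque–Bera test on its residuals; the series name and the order (4, 0, 4) are placeholders rather than the final specification used later in the paper.

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.stattools import jarque_bera

# 'series' is a 1-D array or pandas Series of the time series to be modelled (placeholder name)
result = ARIMA(series, order=(4, 0, 4)).fit()      # coefficients estimated by maximum likelihood

print(result.aic, result.bic)                      # information criteria used for model comparison
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(result.resid)
print(jb_stat, jb_pvalue)                          # normality check on the residual series
```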

3.2 Wavelet-based ARIMA

The wavelet-based ARIMA is a composite technique, in which the raw data is decomposed first using wavelet transformation (WT) and then forecasting models are applied. The WT decomposes a signal into different time scales. It is defined as a set of basis functions \({\psi }_{a,b}(t)\) that can be generated by translating and scaling the so-called mother wavelet (Daubechies, 1992)

$${\psi }_{a,b}\left(t\right)=\frac{1}{\sqrt{a}}\psi \left(\frac{t-b}{a}\right), a>0, -\infty <b<\infty ,$$
(3)

where a is the scale parameter and b the location of the wavelet. There are several types of WT for which different mother wavelets can be used; in this work we concentrate on the most widely used, the Discrete Wavelet Transform (DWT).

A DWT is a discrete set of wavelet scales and translations. The DWT is mainly suited to sampled data and decomposes the signal into a mutually orthogonal set of wavelets. The DWT uses a dyadic grid, where the mother wavelet is scaled by powers of two (\(a={2}^{j}\)) and shifted by an integer multiple (\(b=k{2}^{j}\)), where \(k\) is a location index ranging from \(1\) to \({2}^{-j}N\) (\(N\) is the number of observations) and \(j\) from \(0\) to \(J\) (\(J\) is the total number of scales). The DWT is given by the following equation:

$${\Psi }_{j,k}\left(t\right)={2}^{-j/2}\Psi \left({2}^{-j}t-k\right),$$
(4)

where the coefficients are calculated according to the following expression:

$$W_{j,k}=W\left({2}^{j}, k{2}^{j}\right)={2}^{-j/2}\int_{-\infty }^{\infty }f\left(t\right)\overline{\Psi \left({2}^{-j}t-k\right)}\, dt$$
(5)

The inverse discrete wavelet transform (IDWT) is then used to reconstruct the original signal from the wavelet coefficients \({W}_{j,k}\) as follows:

$$f\left(t\right)=\sum_{j=-\infty }^{\infty }\sum_{k=-\infty }^{\infty }{W}_{j,k}{\Psi }_{j,k}(t)$$
(6)

There are different mother wavelets in the DWT, for example the Haar wavelet, the Daubechies wavelet, the orthogonal wavelet, the Symlet wavelet, the Meyer wavelet and the Coiflet wavelet (Mallat & Peyré, n.d.). The Haar wavelet is considered the simplest mother wavelet, while the Daubechies wavelets are a family of orthogonal wavelets. In this paper, Daubechies mother wavelets are used for the pre-processing. The Symlet and Coiflet wavelets are modified versions of the Daubechies wavelets; in particular, the Symlet is a Daubechies wavelet with higher symmetry.

Multi-Resolution Analysis (MRA), also called the pyramid algorithm, is a hierarchical representation of the DWT (Mallat, n.d.). It is based on the decomposition of the raw data into m levels by translation and convolution of the mother wavelet using low-pass (LP) and high-pass (HP) filters. Detail \((D)\) and approximation \(\left(A\right)\) components are obtained using these filters. The signal can be reconstructed through the sum of the last approximation component and every detail component.

The wavelet transform is employed to eliminate noise from the time series, thereby enhancing the stability of the data structure. As an example, consider a time series \({y}_{t}\). By performing Daubechies wavelet filtering on \({y}_{t}\), two sets of filtered and decomposed series are generated, the (LP) and (HP) versions, named the \(i\)-th approximate and detailed components at the time \(t\), denoted by \({\mathcal{A}}_{it}\) and \({\mathcal{D}}_{it}\), respectively (see Fig. 1).

Fig. 1
figure 1

Multi-resolution analysis applied to the original time series \({y}_{t}\), where \({\mathcal{A}}_{it}\) represent the approximation components, and \({\mathcal{D}}_{it}\), the detailed. (HP) is the high pass filter and (LP) is the low pass filter

Each component can be further filtered to obtain a second level for each one. The decomposed time series is predicted by applying the corresponding time series model to each component. For example, the ARIMA model described in Sect. 3.1 can be used to predict the \({\mathcal{D}}_{it}\) and \({\mathcal{A}}_{it}\) components; the final predictive model for the original time series, denoted by w-ARIMA, can be obtained by the following addition.

$${\widetilde{y}}_{t}^{\text{w-ARIMA}}={\widetilde{\mathcal{A}}}_{2t}^{\text{ARIMA}}+{\widetilde{\mathcal{D}}}_{2t}^{\text{ARIMA}}+{\widetilde{\mathcal{D}}}_{1t}^{\text{ARIMA}}$$
(7)

where each term of the sum is obtained from the inverse reconstruction of signals (IDWT) (see Fig. 2). This model uses the Daubechies filter as pre-processing to reduce noise in the time series data and then applies ARIMA to each component, to obtain higher prediction accuracy than the classical ARIMA.
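A minimal sketch of this w-ARIMA construction with the PyWavelets and statsmodels libraries is shown below; the wavelet (db2), the decomposition level and the ARIMA orders are illustrative placeholders, and the per-component forecasts are simply added as in Eq. (7).

```python
import numpy as np
import pywt
from statsmodels.tsa.arima.model import ARIMA

def mra_components(y, wavelet="db2", level=2):
    """Split y into MRA components (A_level, D_level, ..., D_1) via DWT and IDWT."""
    coeffs = pywt.wavedec(y, wavelet, level=level)       # e.g. [cA2, cD2, cD1]
    components = []
    for i in range(len(coeffs)):
        masked = [np.zeros_like(c) for c in coeffs]
        masked[i] = coeffs[i]                            # keep only one set of coefficients
        components.append(pywt.waverec(masked, wavelet)[: len(y)])
    return components                                    # their sum reconstructs y

def w_arima_forecast(y, horizon=21, order=(4, 0, 4)):
    """w-ARIMA as in Eq. (7): fit ARIMA to each component and add the forecasts."""
    forecasts = [ARIMA(c, order=order).fit().forecast(steps=horizon)
                 for c in mra_components(y)]
    return np.sum(forecasts, axis=0)
```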

Fig. 2
figure 2

Flow chart for Python implementation of partitioned models to forecast volatility using the PyWavelets, statsmodels.tsa.arima and arch.arch_model libraries

3.3 GARCH models

Autoregressive conditional heteroscedasticity (ARCH) models, first studied by Engle (1982), have been widely used for high volatility forecasting. These models effectively describe the variance changing over time, i.e. heteroscedasticity, by employing a deterministic mapping based on historical errors. Formally, the ARCH model of order \(p\ge 0\) is defined as follows:

$${y}_{t}=\sqrt{{h}_{t}}{\varepsilon }_{t}, {h}_{t}={\alpha }_{0}+\sum_{i=1}^{p}{\alpha }_{i}{y}_{t-i}^{2}$$
(8)

The generalized autoregressive conditional heteroskedasticity (GARCH) model, defined by Eq. (9), can be considered a generalization of the ARCH model. GARCH adds a linear combination of lagged conditional variances and, for orders \(p\ge 0\) and \(q\ge 0\), is given by

$${y}_{t}=\sqrt{{h}_{t}}{\varepsilon }_{t}, {h}_{t}={\alpha }_{0}+\sum_{i=1}^{p}{\alpha }_{i}{y}_{t-i}^{2}+\sum_{j=1}^{q}{\beta }_{j}{h}_{t-j},$$
(9)

where \({\varepsilon }_{t}\sim {\text{iid}}(0, 1)\), \({\alpha }_{0}>0, {\alpha }_{i}\ge 0, i=\mathrm{1,2},\dots ,p, {\beta }_{j}\ge 0,j=\mathrm{1,2},\dots ,q\),

$$\sum_{i=1}^{p}{\alpha }_{i}+\sum_{j=1}^{q}{\beta }_{j}<1,$$
(10)

and \({\varepsilon }_{t}\), \({y}_{t-j}\) are independent for \(j\ge 1\). The stochastic process defined by Eq. (9) is known as the GARCH process of order p, q and is denoted GARCH \((p, q)\). Let \({\Sigma }_{t-1}\) be the \(\sigma\)-field generated by the sequence \(\left\{{y}_{t-1}, {y}_{t-2}, \dots \right\}\); hence \({\mathbb{E}}({y}_{t}^{2}|{\Sigma }_{t-1})={h}_{t}\), i.e. the conditional variance \({h}_{t}\) is time varying rather than constant, as is the case for the volatility time series studied in this work.

Given that linear GARCH models are not efficient in capturing possible asymmetries in a time series, one alternative is to use non-linear GARCH techniques. One of these is the Exponential GARCH approach, denoted EGARCH, which is defined as follows:

$$\mathrm{log}\left({h}_{t}\right)={\alpha }_{0}+\sum_{i=1}^{p}\left[{\alpha }_{i}\left\{\frac{\left|{\varepsilon }_{t-i}\right|}{\sqrt{{h}_{t-i}}}-\sqrt{\frac{2}{\pi }}\right\}+\omega \frac{{\varepsilon }_{t-i}}{\sqrt{{h}_{t-i}}}\right]+\sum_{j=1}^{q}{\beta }_{j}\mathrm{log}\left({h}_{t-j}\right),$$
(11)

where \({\alpha }_{i}, {\beta }_{j}, i=\mathrm{1,2},\dots , p, j=\mathrm{1,2},\dots ,q\) and \(\omega\) are positive constants. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were used to find the best model in terms of goodness of fit, as well as to penalize the number of selected parameters through the minimization of these criteria. The associated parameters were estimated by maximum likelihood (MLE). Forecasts were carried out using one-step-ahead predictions for a fixed forecast horizon.
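The sketch below illustrates this selection procedure with the arch library referenced in Fig. 2: a grid over (p, q) for a zero-mean EGARCH model is ranked by AIC, and the best fit produces the volatility forecasts. The search range, the name of the return series and the use of simulation-based multi-step forecasts are assumptions for illustration.

```python
import itertools
from arch import arch_model

# 'returns' is the daily return series (e.g. scaled to percent); the name is illustrative
best_aic, best_res = float("inf"), None
for p, q in itertools.product(range(1, 5), range(1, 5)):
    res = arch_model(returns, mean="Zero", vol="EGARCH", p=p, q=q).fit(disp="off")
    if res.aic < best_aic:
        best_aic, best_res = res.aic, res

# arch requires simulation (or bootstrap) for multi-step EGARCH forecasts
fc = best_res.forecast(horizon=21, method="simulation")
vol_forecast = fc.variance.iloc[-1] ** 0.5   # forecast conditional volatility for 21 steps
```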

3.4 Hybrid and wavelet models

3.4.1 Hybrid model

To improve the predictions obtained by each predictive model individually, hybrid models are proposed, which exploit the advantages of each model and have been demonstrated to be effective in different applications (Khashei & Bijari, 2011, 2012; Rubio & Alba, 2022; Wang et al., 2013; Zhang, 2003). The first type of hybrid model used in this work considers, for example, the use of ARIMA to extract the linear components of the time series and then forecasts the associated residuals, which are non-linear and heteroscedastic, using for example the GARCH model. With this strategy, the prediction of both linear and nonlinear patterns is improved, thus obtaining a final forecast with higher accuracy. The hybrid model \({\mathcal{Y}}_{t}\) is then expressed as:

$${\mathcal{Y}}_{t}={\widetilde{\mathcal{L}}}_{t}+{\widetilde{\mathcal{N}}}_{t},$$
(12)

where \({\widetilde{\mathcal{L}}}_{t}\) and \({\widetilde{\mathcal{N}}}_{t}\) correspond to the linear and nonlinear components of the decomposition of the time series data. Let \({\widetilde{\mathcal{L}}}_{t}\) be the forecast obtained from ARIMA for the volatility time series at \(t\); hence, the corresponding residuals \({\varepsilon }_{t}\) are given by

$${\widehat{\varepsilon }}_{t}={\mathcal{Y}}_{t}-{\widehat{\mathcal{L}}}_{t},$$
(13)

and are predicted by applying the GARCH model; they may be represented as

$${\widehat{\varepsilon }}_{t}={\widehat{\mathfrak{f}}}_{\text{GARCH}}\left({\widehat{\varepsilon }}_{t-1}, {\widehat{\varepsilon }}_{t-2}, \dots , {\widehat{\varepsilon }}_{t-n}\right)+{\Delta }_{t},$$
(14)

where \({\widehat{\mathfrak{f}}}_{\text{GARCH}}\) is the nonlinear mapping corresponding to the GARCH model with random error \({\Delta }_{t}\) and its forecast can be represented as \({\widetilde{\mathcal{N}}}_{t}\). Accordingly, the hybrid model (ARIMA-GARCH) is represented by

$${\widehat{\mathcal{Y}}}_{t}={\widetilde{\mathcal{L}}}_{t}+{\widetilde{\mathcal{N}}}_{t},$$
(15)

where \({\widetilde{\mathcal{L}}}_{t}\) and \({\widetilde{\mathcal{N}}}_{t}\) represent the predicted linear and nonlinear parts based on the ARIMA and GARCH models, respectively. In the present work a second alternative (GARCH-ARIMA) is also considered, defined analogously to the hybrid model presented above, but with GARCH applied first to the original series and ARIMA applied to its residuals. Therefore, the most appropriate strategy for predicting volatility using hybrid models is also analysed. In other words, the models associated with the terms \({\widetilde{\mathcal{L}}}_{t}\) and \({\widetilde{\mathcal{N}}}_{t}\) are swapped to the GARCH and ARIMA models, respectively.
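A minimal sketch of the ARIMA-GARCH variant of Eqs. (12)–(15) is given below: ARIMA provides the linear forecast, and a GARCH-type model fitted to the in-sample residuals supplies the nonlinear correction. The orders, and the choice of an AR mean so that the residual model returns a non-zero mean forecast, are assumptions rather than the exact specification reported in the experiments.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model

def arima_garch_forecast(y, horizon=21):
    """Hybrid of Eqs. (12)-(15): ARIMA for the linear part, GARCH on its residuals."""
    # Linear component: ARIMA fitted to the volatility series
    arima_res = ARIMA(y, order=(4, 0, 4)).fit()
    linear_fc = np.asarray(arima_res.forecast(steps=horizon))

    # Nonlinear component: GARCH-type model on the in-sample residuals (Eq. 14)
    garch_res = arch_model(arima_res.resid, mean="AR", lags=1,
                           vol="GARCH", p=1, q=1).fit(disp="off")
    nonlinear_fc = garch_res.forecast(horizon=horizon).mean.iloc[-1].to_numpy()

    return linear_fc + nonlinear_fc   # Eq. (15)
```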

3.4.2 Wavelet model

Another technique studied in this work, which also performs a mixed combination of predictive models, is considered with the purpose of performing a comparative study between these partitioned techniques. The technique proposed below is related to the approach presented in Sect. 3.2. In this approach, each signal \({\widetilde{\mathcal{A}}}_{it}\), \({\widetilde{\mathcal{D}}}_{it}\) resulting from the discrete wavelet decomposition (DWT) is predicted by applying the ARIMA or GARCH model to each component. These predictions are denoted by \({\widetilde{h}}_{t}^{\text{w-ARIMA}}\) or \({\widetilde{h}}_{t}^{\text{w-GARCH}}\), depending on the model used.

This work proposes to use the ARIMA and GARCH models in a mixed strategy to predict the low and high frequency signals from the DWT. Each possible combination in the selection of the models employed for each component is considered, with the objective of identifying which of these methodologies is best suited for volatility forecasting.

Formally, it is proposed to use models of the type:

$${\widetilde{h}}_{t}^{\text{w-ARIMA-GARCH}}={\widetilde{\mathcal{A}}}_{2t}^{\text{ARIMA}}+{\widetilde{\mathcal{D}}}_{2t}^{\text{GARCH}}+{\widetilde{\mathcal{D}}}_{1t}^{\text{GARCH}},$$

where for each signal obtained from the wavelet decomposition, either high or low frequency, the ARIMA and GARCH models are applied in a mixed form. In our literature review, it was found that the usual way to apply these wavelet models is to use a single model, either ARIMA or GARCH, for both the low-frequency and high-frequency components; in this work, however, a comparative study of all possible options is performed, as sketched below. The selection of the individual models for each wave is described in Sects. 3.1 and 3.3.
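A minimal sketch of this mixed strategy, reusing the mra_components() helper from the sketch in Sect. 3.2, is shown below: ARIMA forecasts the approximation, a GARCH-type model forecasts each detail component, and the results are added as in the expression above. The orders, the AR mean for the detail models and the simulation-based forecasts are illustrative assumptions; the specification actually selected is reported in Sect. 7.3.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model

def w_arima_garch_forecast(y, horizon=21, wavelet="db3"):
    """Mixed wavelet model: ARIMA on the approximation, GARCH-type models on the details."""
    approx, *details = mra_components(y, wavelet=wavelet, level=2)

    # Low frequency (approximate) component -> linear ARIMA forecast
    fc = np.asarray(ARIMA(approx, order=(4, 0, 4)).fit().forecast(steps=horizon))

    # High frequency (detail) components -> GARCH-type forecasts
    for d in details:
        res = arch_model(d, mean="AR", lags=1, vol="EGARCH", p=4, q=2).fit(disp="off")
        fc = fc + res.forecast(horizon=horizon, method="simulation").mean.iloc[-1].to_numpy()

    return fc
```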

4 Stochastic volatility model

To compare the accuracy and performance of the proposed model with one of the models currently used for volatility forecasting, the stochastic volatility (SV) model is introduced. Asset prices exhibit fluctuations in volatility over time, characterized by periods of both high and low return variability. Stochastic volatility models capture this phenomenon by incorporating a latent volatility variable, which is modelled as a stochastic process (Hoffman & Gelman, n.d.; Kim et al., 1998).

$$\begin{array}{c}{y}_{t}={\epsilon }_{t}{\text{exp}}\left(\frac{{h}_{t}}{2}\right),\\ {h}_{t+1}=\mu +\phi \left({h}_{t}-\mu \right)+{\delta }_{t}\sigma ,\\ \begin{array}{c}{h}_{1}\sim N\left(\mu ,\frac{\sigma }{\sqrt{1-{\phi }^{2}}}\right),\\ {\epsilon }_{t}, {\delta }_{t}\sim N\left(0, 1\right),\end{array}\end{array}$$

where \({y}_{t}\) is the asset return and \({\epsilon }_{t}\) its white noise at time \(t\), \({\delta }_{t}\) is the shock on volatility and \({h}_{t}\) is the latent parameter for log volatility. The primary idea of this model is to identify hidden factors that impact alterations in asset values. These undisclosed elements, encompassing shifts in market sentiment, news, or other relevant variables, play a role in determining the extent of market volatility. By representing volatility as a dynamic and stochastic process, the SV model provides a more accurate representation of financial markets, distinguishing it from simpler models that assume a constant level of volatility.

5 Methodology

Prediction of volatility time series has been performed in different works using machine learning and deep learning techniques as well as classical time series models. On the other hand, results obtained using a single forecasting strategy can be improved by using mixed models, combining techniques that have different predictive properties. Hybrid models have been used previously for time series forecasting, usually by decomposing the original data into two components associated with the linear and nonlinear parts of the series. In this manner, the use of two predictive methods is combined, where usually the non-linear part is adjusted using models able to capture strong fluctuations, such as ANNs (Zhang & Zhang, 2018). Another hybrid technique found in the literature considers the outputs of a GARCH model as inputs to an ANN, and conversely, for volatility forecasting (Lu et al., 2016). Stacking models have also been used to forecast volatility, where predictions are dynamically selected from a set of trained machine learning models based on feature extraction and selection (Aras, 2021).

Some authors propose to use the wavelet transform as a pre-processing step to decompose a time series into multiple high and low frequency waves and, depending on the nature of the data, different strategies are proposed to forecast each component. For example, D. Liu et al. proposed to use SVM to forecast wind speed based just on the approximate signal. On the other hand, the Glosten–Jagannathan–Runkle (GJR)-GARCH model has been used together with the wavelet transform to predict financial returns; in this case each wavelet component is adjusted with the same model (Berger, 2016). To forecast electricity prices, a wavelet transform combined with ARIMA has also been used; in this case the prediction is obtained by applying ARIMA-GARCH to the approximate signal and GARCH to the detailed signals (Tan et al., 2010).

In this work, a comparative study between hybridization techniques based on residual decomposition and partitioned strategies based on the wavelet transform for volatility forecasting is presented. The strategy that provides the best predictions in terms of the accuracy metrics MAPE, RMSE, and \({R}^{2}\) is proposed. Initially, a technique based on the wavelet transform and the ARIMA and GARCH models is addressed, where ARIMA, being linear, is used to forecast the low frequency wave, while GARCH is used for the high frequency waves. For realized volatility forecasting, the use of this strategy was not found in the literature reviewed to date, nor was a detailed description of the implementation process of this type of model in the selected programming language, Python. This choice of models has been found to be the most natural and efficient for volatility forecasting, and consequently, it is the one proposed in this work.

One of the main challenges presented during the implementation of the wavelet transform for time series forecasting is its limited development in the Python language. Most of the applications of the wavelet transform found in Python are targeted at digital image processing. Therefore, this paper describes how this implementation was done, to serve also as a starting point for those interested in using these types of partitioned models. In addition, in the related literature, limited detail is provided regarding the functioning of the associated libraries for the wavelet decomposition and the recovery of the original signals. Figure 2 describes the process of implementing the hybridization technique proposed in this work. As can be seen in Fig. 2, to obtain the final prediction, first, the respective decomposition of the daily returns is obtained by using the discrete wavelet transform. Then, by using the inverse discrete wavelet transform, the original signal is reconstructed through the approximate and detailed components. Finally, each signal is predicted by using, for example, ARIMA for approximate components and GARCH for detailed components, to get the prediction \({\widetilde{h}}_{t}^{\text{w-ARIMA-GARCH}}\).

6 Data

To carry out the empirical part of this investigation, the daily closing prices of TSLA stock were considered. Data were loaded directly from Yahoo Finance using the free Python API, for a total of 3178 observations (from 2010-07-07 to 2023-02-17). Descriptive parameters were calculated to obtain relevant information related to measures of central tendency, dispersion, kurtosis and skewness of the closing prices (see Table 1). The values presented point to a positively skewed data distribution, with values of \({Q}_{1}, {Q}_{2}\) and \({Q}_{3}\) relatively close (in the first ten years considered in the study, historical data show low values). Around 75% of the observations show a closing price below USD 27, far from the maximum value observed (USD 409.97) at the end of 2021.

Table 1 Descriptive parameters for TSLA closing prices
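The paper only states that the free Yahoo Finance Python API was used; assuming this refers to the yfinance package, a minimal sketch of the download and of the descriptive statistics summarized in Table 1 is:

```python
import yfinance as yf

# Assumes the yfinance package; the end date is exclusive, hence 2023-02-18
data = yf.download("TSLA", start="2010-07-07", end="2023-02-18")
close = data["Close"].squeeze().dropna()        # daily closing prices as a Series

print(close.describe())                         # central tendency and dispersion
print(close.skew(), close.kurt())               # skewness and (excess) kurtosis
```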

Figure 3 shows strong fluctuations in the TSLA closing price and trading volume, associated with high volatility. A strong uptrend and a large trading volume ending around 2021 can be identified. In terms of trends, we can identify three clearly distinct periods. Until 2020, historical data show a relatively constant trend (with a low closing price). Globally, there is a marked growth between 2020 and the end of 2021, although with some fluctuations, namely in early 2021 (due to the instability of the financial markets, partly caused by the COVID-19 pandemic). From the end of 2021, the data are marked by an abrupt drop in the closing price.

Fig. 3
figure 3

Time series for TSLA closing price (left) and trading volume (right)

To analyse some features of the closing price time series, Table 2 contains the test statistics for the following hypothesis tests: normality tests (Jarque–Bera test and skewness and kurtosis tests), stationarity/existence of a unit root (ADF test and KPSS test) and an independence test (BDS test). As expected, for any significance level, the hypotheses of normality, stationarity, and independence are rejected. In fact, there is statistical evidence to: (i) reject normality of the data distribution (with rejection of the null hypothesis, which indicates normal behaviour, in all tests performed); (ii) assume non-stationarity (since the null hypothesis is not rejected by the ADF test, and the KPSS statistic exceeds the critical reference values); (iii) infer non-iid behaviour, since the null hypothesis that the data are iid is rejected by the BDS test.

Table 2 Normality tests (Jarque–Bera test and skewness and kurtosis tests), stationarity/existence of unit root (ADF test and KPSS test) and independence test (BDS test) for TSLA closing prices
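These tests are available in statsmodels; a minimal sketch, applied to the closing-price series of the previous snippet (the BDS embedding dimension is an illustrative choice), is:

```python
from statsmodels.stats.stattools import jarque_bera
from statsmodels.tsa.stattools import adfuller, kpss, bds

x = close.to_numpy()                                     # closing prices, as in Table 2

jb_stat, jb_p, jb_skew, jb_kurt = jarque_bera(x)         # H0: normal distribution
adf_stat, adf_p, *_ = adfuller(x)                        # H0: unit root (non-stationarity)
kpss_stat, kpss_p, _, kpss_crit = kpss(x, nlags="auto")  # H0: stationarity
bds_stat, bds_p = bds(x, max_dim=2)                      # H0: the data are iid
```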

According to (Karasan, n.d., 2005), to model volatility we need to calculate the return volatility, which is also known as realized volatility. Realized volatility is the square root of the realized variance, which is the sum of squared returns. Realized volatility is used to evaluate the performance of the volatility prediction method; it is denoted by \({h}_{t}\) in the model description of GARCH (Eq. 9).
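Following that definition, a minimal sketch of the computation is given below. The daily returns are built from the closing prices of the earlier snippet, and the rolling window length (5 trading days here) is an assumption, since the paper does not state the window used.

```python
import numpy as np

returns = 100 * close.pct_change().dropna()     # daily returns, in percent

# Realized volatility: square root of the rolling sum of squared returns
realized_vol = np.sqrt((returns ** 2).rolling(window=5).sum()).dropna()
```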

7 Numerical results

To continue the study, based on the historical data of TSLA closing prices described in the previous section, the time series of the daily returns and realized volatility were obtained. Figure 4 shows the daily returns plot, whose values fluctuate around the zero mean. As can be seen, the variance is a function of time, which confirms the effect of heteroscedasticity in the time series, and the associated histogram shows that the daily returns have a non-normal distribution. Regarding the realized volatility time series, note the peaks of values observed during the period 2020/2021 (instability of the financial markets, partly motivated by the COVID-19 pandemic).

Fig. 4
figure 4

Daily return series (left) and KDE histogram (right) plots for TSLA stock

This detail can be better perceived when evaluating the annual box plots (Fig. 5), where a considerable sample range is observed in 2020 (with several outlier observations above the upper whisker of the box plot) and a notable interquartile range, in comparison with the other years.

Fig. 5
figure 5

Realized volatility time series for TSLA stock. Graphical representation of annual box plots

Complementing the graphical representation, some descriptive statistical measures are presented in Table 3. Based on these values, a positive kurtosis (leptokurtic distribution) and a slight asymmetry to the right are observed.

Table 3 Descriptive parameters for TSLA daily returns and realized volatility

As was done in the previous section, to analyse some features of the time series, Table 4 contains the test statistics for the following hypothesis tests: normality tests (Jarque–Bera test and skewness and kurtosis tests), stationarity/existence of a unit root (ADF test and KPSS test) and an independence test (BDS test). As expected, for any significance level, normality is rejected for the two series under study, concluding that they do not have a normal distribution.

Table 4 Normality tests (Jarque–Bera test and skewness and kurtosis tests), stationarity/existence of unit root (ADF test and KPSS test) and independence test (BDS test) for TSLA daily returns and realized volatility

In addition, for both time series, the ADF and KPSS tests confirm that the series are stationary (since the null hypothesis is rejected by the ADF test, and the KPSS statistic is less than the critical reference values). When testing for independence, for the daily returns the null hypothesis is not rejected at any significance level and therefore the data are i.i.d. For the realized volatility time series, there is statistical evidence to infer non-iid behaviour, since the null hypothesis that the data are iid is rejected by the BDS test. Different normality tests, namely the Shapiro–Wilk, D’Agostino’s K2 and Anderson–Darling tests, were implemented to verify normality, and the results are presented in Table 5. As can be seen, based on the null hypothesis:

\({H}_{0}:\) dataset has normal distribution

Table 5 Normality tests on daily returns and realized volatility for TSLA stock

the Shapiro–Wilk and D’Agostino’s \({K}^{2}\) tests yield \(p\)-values far below the significance threshold of 0.05. Therefore, the null hypothesis is rejected, i.e. the distribution of daily returns is not normal. On the other hand, it can be seen that the daily return distributions are peaked around the mean, which is confirmed by positive kurtosis values larger than 2, meaning that most of the daily return values are concentrated around the mean. Furthermore, this implies a high level of risk but the possibility of higher returns due to large price movements. Since the Anderson–Darling statistic (42.514) is greater than all critical values for the different significance levels, non-normality is confirmed.
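The three tests are available in scipy.stats; a minimal sketch applied to the daily returns of the earlier snippet is:

```python
from scipy import stats

sw_stat, sw_p = stats.shapiro(returns)        # Shapiro-Wilk, H0: normal distribution
k2_stat, k2_p = stats.normaltest(returns)     # D'Agostino's K^2, H0: normal distribution
ad = stats.anderson(returns, dist="norm")     # Anderson-Darling

print(sw_p, k2_p)                             # p-values compared with the 0.05 threshold
print(ad.statistic, ad.critical_values)       # statistic vs. critical values
```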

7.1 Application of ARIMA model

The daily returns and realized volatility for TSLA are stationary signals, with p-values less than 0.05 in the Augmented Dickey-Fuller test. Therefore, the null hypothesis

\({H}_{0}\): dataset has nonstationary distribution

is rejected; thus each series shows a stationary signal and, consequently, an integration order greater than zero is not necessary. The ARIMA model was trained based on the Box and Jenkins strategy to get the best model in terms of goodness of fit. The optimal orders \(p, d, q\) were obtained based on the Akaike information criterion (AIC), which selects the orders giving the best model quality through the minimization of the AIC. The minimum AIC was obtained by looping over a fixed range for \(p, q\), where the order \(d\) was set to zero since no differencing was necessary, given that the Dickey-Fuller test confirmed stationarity.

ARIMA was adjusted using the TSLA stock volatility training set composed of all the available historical data, except for the last 21 days reserved for the test set, i.e. the prediction horizon. One-step-ahead forecasting was considered in this work. The best ARIMA model obtained for realized volatility prediction was ARIMA(4, 0, 4). Figure 6 shows the realized volatility prediction and the correlation plots between the original and predicted time series. Goodness of fit and simulation performance can be confirmed with the \({R}^{2}\) and the accuracy metrics MAPE, MAE, and RMSE (Table 6). As can be seen, ARIMA shows good performance forecasting the implicit linear patterns in TSLA realized volatility, maintaining trends for a long-term horizon; nevertheless, it is essential to use another method to detect non-linear patterns.
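A minimal sketch of this evaluation is shown below: the last 21 observations are held out, one-step-ahead forecasts are produced by rolling the fitted model forward over the test window (one possible reading of the one-step-ahead scheme), and the accuracy metrics of Table 6 are computed.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y = realized_vol.to_numpy()
train, test = y[:-21], y[-21:]               # last 21 days as the prediction horizon

res = ARIMA(train, order=(4, 0, 4)).fit()
preds = []
for obs in test:
    preds.append(res.forecast(steps=1)[0])   # one-step-ahead forecast
    res = res.append([obs], refit=False)     # roll forward with the observed value
preds = np.asarray(preds)

mape = np.mean(np.abs((test - preds) / test)) * 100
mae = mean_absolute_error(test, preds)
rmse = np.sqrt(mean_squared_error(test, preds))
r2 = r2_score(test, preds)
```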

Fig. 6
figure 6

Real vs. ARIMA adjustment for TSLA realized volatility (left). Correlation plots for the test set and prediction with its corresponding \({R}^{2}\) value (right)

Table 6 Accuracy metrics MAPE, MAE, RMSE, and \({R}^{2}\) for TSLA realized volatility forecast using ARIMA and GARCH

7.2 Application of GARCH models

Similar to the ARIMA training and considering the same prediction horizon, the best orders p, q for the GARCH model were obtained by minimization of the AIC. The parameters obtained for the GARCH model with a prediction horizon of 21 days were \(p=4, q=2\), considering a zero-mean process and the exponential version of the GARCH model as the optimal configuration after a grid search was applied. The coefficients of the model were obtained by maximum likelihood estimation (MLE). Figure 7 shows the model adjustment for the real time series in the test set and the correlation plots between the original and predicted realized volatility. The difference in goodness of fit between the GARCH and ARIMA models is remarkable, although in terms of error metrics they are only slightly different (Table 6). ARIMA remains the model with the best goodness-of-fit. Figure 8 shows the ARIMA and GARCH model predictions and their correlations with the original time series.

Fig. 7
figure 7

Real vs. GARCH adjustment for TSLA realized volatility (left). Real time series is represented in green and its prediction in blue colour. Correlation plots for the test set and prediction with its corresponding \({R}^{2}\) value (right)

Fig. 8
figure 8

Real vs. ARIMA, and GARCH adjustment for TSLA realized volatility (left). Correlation plots with its corresponding \({R}^{2}\) value (right)

7.3 Application of hybrid and partitioned models

A first approach to combining the ARIMA and GARCH models is to predict realized volatility first using, for example, ARIMA and then predict its residuals with GARCH, giving the first hybrid model implemented in this work, denoted ARIMA-GARCH (see Sect. 3.4). The second hybrid model is obtained by changing the roles of the first and second models applied to the original time series and its forecasted residuals; this model is denoted GARCH-ARIMA. This second option has the disadvantage that the residuals are extremely small, and therefore the contribution of ARIMA is not significant, resulting in the same accuracy results provided by GARCH (see Table 7).

Table 7 Accuracy metrics MAPE, MAE, RMSE, and \({R}^{2}\) for realized volatility forecast using individual and hybrid models

The second option for combining the econometric models proposed in this work is to consider the wavelet transform as a pre-processing method for the time series of interest. The daily return time series is decomposed into an approximate signal named \({\mathcal{A}}_{t}\) and a detailed signal denoted by \({\mathcal{D}}_{t}\) using the wavelet transform. Usually, the detailed signal has a higher frequency; therefore, in this work it is proposed to use the GARCH model to predict this component and ARIMA, as a linear model, for the low frequency signal. This first combination is denoted by W-ARIMA-GARCH, meaning that the ARIMA and GARCH models are applied to the signals \({\mathcal{A}}_{t}\) and \({\mathcal{D}}_{t}\), respectively. W-GARCH-ARIMA denotes the opposite selection of predictive models for the low and high frequency decomposed signals (see Fig. 2). In addition, the case where the same model, either ARIMA or GARCH, is applied to each of the wavelets obtained from the wavelet transform was investigated to study the behaviour of these predictions. Prediction results are presented in Fig. 9, where the original TSLA time series is plotted over the prediction and the variance explained by each model is shown on the right side.

Fig. 9
figure 9

Real vs. ARIMA-GARCH and GARCH-ARIMA predictions for TSLA realized volatility (left). Correlation plot with its corresponding \({R}^{2}\) value (right)

For the W-ARIMA model, the following parameters were obtained from the training datasets for TSLA daily returns after a manual grid search in pursuit of the best decomposition and models: the Daubechies wavelet db2 yielded the orders (4, 1, 4) and (4, 0, 4) for the high and low frequency waves. Figure 10 shows the time series decomposition using the db2 Daubechies filter. Next, the W-GARCH model was implemented, and the parameters obtained were: Daubechies wavelet db3, with orders (4, 4) and (4, 4) for the low and high frequency signals, considering a zero mean model and exponential GARCH. Figure 11 shows the model adjustment for the real time series, with its respective correlation plots. As can be seen, the W-GARCH prediction strongly penalizes the high values of the time series. A possible reason is the use of a GARCH process to adjust the low frequency signal from the decomposition. Compared with W-GARCH, W-ARIMA shows an improvement, confirmed by the \({R}^{2}\) goodness-of-fit measure and the accuracy metrics (see Table 8). Therefore, in terms of goodness of fit, applying ARIMA to each component is better than using GARCH on each one.

Fig. 10
figure 10

Decomposition with the db2 Daubechies filter and W-ARIMA adjustment for each component

Fig. 11
figure 11

Real vs. W-GARCH and W-ARIMA models predictions for TSLA realized volatility. Correlation plots for the test set and prediction with its corresponding \({R}^{2}\) value (right)

Table 8 Accuracy metrics MAPE, MAE, RMSE, and \({R}^{2}\) for realized volatility forecast using the W-ARIMA, W-GARCH models

For the W-ARIMA-GARCH model, the following parameters were obtained for TSLA daily returns: Daubechies wavelet db3, GARCH orders (4, 2) with the exponential GARCH version for the high frequency signals, and ARIMA orders (4, 0, 4) for the low frequency signal. The W-GARCH-ARIMA model yielded the following parameters for the best model: Daubechies wavelet db2, ARIMA orders (4, 1, 4) for the detailed signals, and (4, 2), zero mean, exponential GARCH for the low frequency component.

Figure 12 shows the model adjustments for the real time series of realized volatility and the respective correlation plots. Goodness of fit can be confirmed with \({R}^{2}\), and the accuracy metrics MAPE, MAE, and RMSE are summarized in Table 9 for all the models studied in this work. Thus, applying the GARCH and ARIMA processes to the high and low frequency signals, respectively, provides better predictions in terms of goodness-of-fit. As before, applying a GARCH process to the low frequency component fails to give optimal prediction results.

Fig. 12
figure 12

Real vs. W-GARCH-ARIMA and W-ARIMA-GARCH models predictions for TSLA realized volatility. Correlation plots with its corresponding \({R}^{2}\) value (right)

Table 9 Accuracy metrics MAPE, MAE, RMSE, and \({R}^{2}\) for realized volatility forecast using all the models studied

Finally, the stochastic volatility model was implemented for the same prediction horizon, using a rolling forecast. However, the model proposed in this work showed better error metrics and goodness of fit. For the implementation of the SV model defined in Sect. 4, \(\phi \sim {\text{uniform}}(-1, 1)\), \(\sigma \sim {\text{cauchy}}(0, 5)\) and \(\mu \sim {\text{cauchy}}(0, 10)\) were considered (Fig. 13).

Fig. 13
figure 13

Real vs. SV model with \(\phi \sim {\text{uniform}}(-1, 1)\), \(\sigma \sim {\text{cauchy}}\left(0, 5\right)\) and \(\mu \sim {\text{cauchy}}(0, 10)\)

8 Conclusions

The present work provides a starting point for any research related to the use of hybrid and wavelet transform models for volatility forecasting. It also proposed ARIMA and GARCH models, which do not require significant computational training costs, as alternative strategies to predict each signal. This same strategy can be used with any other pair of models, e.g. machine learning or deep learning models. A detailed description, with outlines, has been provided showing how these models should be implemented in the Python language, given that there is very little in the literature on the subject.

ARIMA models, when used in combination with the wavelet transform to predict realized volatility, will provide better predictions than those obtained with the GARCH model in terms of goodness of fit and each accuracy metric. By using multiple hybridization techniques between the proposed models, prediction scores can be improved. Selection of models to be used in each component of the wavelet transform decomposition is crucial to obtain more accurate predictions. In conclusion, the best alternative is to forecast low frequency signals with linear models such as ARIMA and high frequency signals with models that can capture strong nonlinear patterns, such as the GARCH model.

Models based on the wavelet transform are compared with hybrid models based on forecasting the original time series using, for example, a linear model and then its residuals using a model able to detect strong fluctuations, such as GARCH. The two possible combinations were studied, reaching the conclusion that, in order to obtain improved predictions, the model that should always be applied first is the linear model, in this case ARIMA. The hybrid model was compared with each model individually and with the partitioned wavelet transform approach. In terms of goodness of fit, the best model for forecasting volatility is the wavelet-based model in which the approximate signal is predicted with a linear model, in this case ARIMA, and the detailed high frequency signals with GARCH.

The present W-ARIMA-GARCH model has been applied to Tesla stock volatility to quantify risk; however, it can be applied to any other time series with strong fluctuations, such as cryptocurrencies. One of the advantages of this model is that it exploits the ARIMA and GARCH models to capture different types of patterns in the time series. In addition, another advantage of implementing this type of forecast is that it can be optimized, allowing predictions to be obtained in a short CPU time. This is of high importance, for example, for algorithmic trading firms, where fast predictions must be produced to reduce risk and to detect market and arbitrage opportunities.

As future work, it is proposed to apply this technique in different applications where it has not yet been used, to prove its effectiveness, and to investigate which alternative models can be used with this type of decomposition. Techniques based on genetic algorithms and stacked neural networks will also be studied to compare their performance with the models studied in this work. Since sentiment analysis can be used to refine financial decisions, future work will also include the study of natural language processing and sentiment analysis to obtain even more accurate predictions.