Our method comprises four steps. First, the overall market sectors and their tickers are analyzed, and the corresponding data are compiled and extracted. Then, the relevant features are processed and extracted. Next, LSTM models are trained and tuned on these features to create forecasting models over a range of hyperparameters, step sizes, and historical periods. Finally, the method and its results are evaluated and validated for use in future forecasting tasks.
LSTM Models
The LSTM architecture originates in Hochreiter's 1991 work [22] on the problem of vanishing and exploding gradients, which is very common in RNNs. Generally, RNNs are good at handling sequence dependencies. LSTMs are a type of RNN better suited for larger architectures and more capable of extracting patterns from long sequences. LSTMs are also known to respond better to non-linearity [28]. The COVID-19 and market time-series data show non-linear behavior, which motivates the application of LSTM in this research.
As illustrated in Fig. 1, each LSTM unit consists of three gates: a forget gate, which controls how long values are retained in the cell, and two further gates, the input and output gates, which regulate the flow of information into and out of the cell. Each LSTM cell maintains a cell state vector, and at each time step the next LSTM unit can choose to read from it, write to it, or reset the cell. These gates give the LSTM control over the memorization process, allowing it to avoid the long-term dependency problem [25], which is a key factor in solving COVID-19-related problems with a short historical dataset. The parameters of the gates are expressed in Eq. 1, where \(\alpha\) denotes the sigmoid function, \(w_x\) the weights for the neurons of gate x, \(h_{t-1}\) the output of the previous LSTM unit, \(x_t\) the current input, and \(b_x\) the biases of gate x.
$$\begin{aligned} \begin{aligned}&i_t = \alpha (w_i[h_{t-1},x_t]+b_i)&\text {for input gate}\\&\quad f_t = \alpha (w_f[h_{t-1},x_t]+b_f)&\text {for forget gate}\\&\quad o_t = \alpha (w_o[h_{t-1},x_t]+b_o)&\text {for output gate} \end{aligned}. \end{aligned}$$
(1)
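The gate computations of Eq. 1 can be sketched directly in NumPy. This is a minimal illustration of the three sigmoid gates, not the implementation used in this study; the hidden and input dimensions and the random weights are placeholders.

```python
import numpy as np

def sigmoid(z):
    # The sigmoid (alpha in Eq. 1) squashes gate activations into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def lstm_gates(h_prev, x_t, w_i, w_f, w_o, b_i, b_f, b_o):
    """Compute the input, forget, and output gate activations of Eq. 1.

    Each weight matrix acts on the concatenation [h_{t-1}, x_t].
    """
    hx = np.concatenate([h_prev, x_t])
    i_t = sigmoid(w_i @ hx + b_i)  # input gate
    f_t = sigmoid(w_f @ hx + b_f)  # forget gate
    o_t = sigmoid(w_o @ hx + b_o)  # output gate
    return i_t, f_t, o_t

# Toy example: hidden size 2, input size 3 (so [h_{t-1}, x_t] has length 5).
rng = np.random.default_rng(0)
h_prev, x_t = rng.normal(size=2), rng.normal(size=3)
w = [rng.normal(size=(2, 5)) for _ in range(3)]
b = [np.zeros(2) for _ in range(3)]
i_t, f_t, o_t = lstm_gates(h_prev, x_t, *w, *b)
```

Because every gate passes through the sigmoid, each activation lies strictly between 0 and 1, which is what lets the cell smoothly blend remembering and forgetting.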
In this study, we conduct both single-step and multi-step analyses, predicting gold prices at least one day ahead. For a better sensitivity analysis, we also employ both multi-variate and uni-variate approaches to demonstrate the effect of the other variables. There are numerous LSTM-based methods for time-series forecasting, and the appropriate choice depends on the dataset and the task. In the following sections, these approaches are discussed and established to pave the ground for comparison and model selection.
Single-Step LSTM
To predict the gold price one day in advance, single-step feed-forward stacked LSTM networks are used. As mentioned earlier, a series of hyper-parameters and input variables are tested to better understand the effect of the feature space on the prediction error. The overall structure of the LSTM networks employed in this section is depicted in Fig. 2. Generally, a higher number of LSTM cells within a layer allows for a longer memory: for longer historical windows, the width of the LSTM network can be grown accordingly (and shrunk for shorter ones) to obtain an optimal fit. The activation function in all architectures is the Rectified Linear Unit (ReLU) (Eq. 2), and the optimizer of choice for LSTM networks is usually ADAM rather than Stochastic Gradient Descent (SGD), which is known for robust but slower optimization. After tuning numerous hyper-parameters, the top models selected are presented in Table 3.
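Single-step forecasting relies on framing the series as a supervised problem: a window of past days is mapped to the next day's value. The sketch below shows one way to build that framing; the window length of 5 is a hypothetical choice, not one of the tuned hyper-parameters in Table 3.

```python
import numpy as np

def make_single_step_windows(series, n_history):
    """Frame a 1-D series as (samples, n_history, 1) inputs and next-day targets."""
    X, y = [], []
    for t in range(len(series) - n_history):
        X.append(series[t:t + n_history])  # the last n_history days
        y.append(series[t + n_history])    # the single day to predict
    X = np.asarray(X, dtype=float).reshape(-1, n_history, 1)
    return X, np.asarray(y, dtype=float)

prices = np.arange(10.0)  # stand-in for a normalized gold-price series
X, y = make_single_step_windows(prices, n_history=5)
# X[0] holds days 0..4 and y[0] is day 5
```

The trailing dimension of 1 matches the (samples, timesteps, features) input shape that LSTM layers expect; for the multi-variate experiments, that last dimension grows to the number of features.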
$$\begin{aligned} \begin{aligned}&\text {ReLU}(x) = \text {max}(0.0,x). \end{aligned} \end{aligned}$$
(2)
Multi-step LSTM
Forecasting more than one day or step ahead makes the task a multi-step forecasting problem. Methods for multi-step forecasting can be categorized into vector-output sequence prediction approaches and encoder–decoder approaches; both are the main focus of this study. To validate the results and allow comparison with other methods suggested in the literature, the results are also compared against Bidirectional LSTM and CNN–LSTM models.
The output of a multi-step forecasting LSTM can be a vector sequence. This can be achieved simply by adding n output neurons to a vanilla LSTM network; hence, the overall architecture of the multi-step vector-output approach is almost identical to Fig. 2.
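The vector-output framing differs from the single-step one only in the target: each window of past days now maps to a vector of the following n_out days. A minimal sketch, with illustrative window and horizon sizes:

```python
import numpy as np

def make_multi_step_windows(series, n_history, n_out):
    """Frame a 1-D series for vector-output forecasting: each sample maps
    n_history past days to a vector of the following n_out days."""
    X, y = [], []
    for t in range(len(series) - n_history - n_out + 1):
        X.append(series[t:t + n_history])
        y.append(series[t + n_history:t + n_history + n_out])
    return (np.asarray(X, dtype=float).reshape(-1, n_history, 1),
            np.asarray(y, dtype=float))

series = np.arange(12.0)
X, y = make_multi_step_windows(series, n_history=5, n_out=3)
# each row of y holds the 3 days that follow the corresponding 5-day window
```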
The encoder–decoder approach, as its name suggests, predicts by first encoding the inputs and then decoding the output. This approach is used for multi-step time-series forecasting [10]. The model was originally designed to solve sequence-to-sequence problems such as natural language processing [35], text translation, and textual question answering. Encoder–decoder models are also known to yield good results for image classification, image-to-text, movement classification, and image captioning tasks [38]. The encoder–decoder approach in LSTM admits many different implementations suiting different workloads. In general, the overall architecture of the encoder–decoder models experimented with in this study is illustrated in Fig. 3. As illustrated, the model encodes the inputs, then repeats the final state of the encoding layer for all time steps. The decoder comprises at least one LSTM layer and a time-distributed dense layer to produce output of the desired shape and structure.
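The encode–repeat–decode layout described above can be sketched in Keras as follows. This is an illustrative skeleton only: the layer widths, window length, and forecast horizon are assumed values, not the tuned configuration of this study.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_history, n_features, n_out = 14, 3, 7  # assumed window, feature, and horizon sizes

model = keras.Sequential([
    layers.Input(shape=(n_history, n_features)),
    layers.LSTM(32, activation="relu"),          # encoder: compress the input window
    layers.RepeatVector(n_out),                  # repeat the final state per output step
    layers.LSTM(32, activation="relu",
                return_sequences=True),          # decoder LSTM over the repeated state
    layers.TimeDistributed(layers.Dense(1)),     # one predicted value per forecast step
])
model.compile(optimizer="adam", loss="mse")

pred = model.predict(np.zeros((2, n_history, n_features)), verbose=0)
```

The `RepeatVector` layer is what realizes the "repeats the final state of the encoding layer for all time steps" step of Fig. 3, and `TimeDistributed` applies the same dense head independently at each of the n_out forecast steps.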
CNN–LSTMs combine CNNs and LSTMs [29] and are commonly utilized in computer vision problems [13]. CNN–LSTMs are also encoder–decoder-based approaches in which the encoding happens in the CNN section. They have been applied to various tasks in the literature, such as caption generation [38] and the prediction of gold prices [27]. Thus, CNN–LSTM models have also been experimented with in this study, and the results are discussed in the following section.
Bidirectional LSTM is inspired by Bidirectional Recurrent Neural Networks [32]. The network learns the sequences in both the forward and backward directions and then concatenates the two passes for prediction. Bidirectional LSTM networks can outperform unidirectional ones in terms of results [20].
As in the single-step LSTM approaches, the ReLU activation and the ADAM optimizer have been used in all architectures. In the following section, we discuss the evaluation metrics, the data required for this analysis and the forecasting models, and the results of the analysis.
Validation and Evaluation
For validation purposes, we set aside the most recent 90 days of our dataset as a validation set and validated the models on that period. To accommodate randomness, the tests were repeated a number of times, and only the top-performing seeds and training instances are reported.
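The 90-day hold-out amounts to a simple chronological split, which preserves temporal order rather than shuffling. A sketch (the one-year series is a stand-in):

```python
import numpy as np

def chronological_split(data, n_validation=90):
    """Hold out the most recent n_validation observations; time series
    must never be shuffled before splitting."""
    return data[:-n_validation], data[-n_validation:]

daily_values = np.arange(365.0)  # stand-in for roughly one year of daily data
train, val = chronological_split(daily_values)
# train covers the older days; val covers the last 90 days
```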
There are various metrics for calculating the loss in regression and prediction tasks. Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Squared Logarithmic Error (MSLE) all measure the difference between the predicted value and the actual value. In this study, however, the models are optimized on the MSE values; hence, the comparison results will favor this metric. This affects both best-model and training-checkpoint selection, and the comparison of the different LSTM methodologies and models based on validation errors. Nevertheless, to better understand the results, we take advantage of all of them, respectively defined by Eq. 3, where n is the number of predictions, \(y_i\) is the ground truth of instance i, and \({\hat{y}}_i\) is the corresponding prediction.
$$\begin{aligned} \begin{aligned}&\text {RMSE} = \sqrt{\frac{1}{n}\sum _{i=1}^{n} (y_i - {\hat{y}}_i)^2}&\\&\quad \text {MSE} = \frac{1}{n} \sum _{i=1}^{n} (y_i - {\hat{y}}_i)^2&\\&\quad \text {MAE} = \frac{1}{n} \sum _{i=1}^{n} |y_i - {\hat{y}}_i|&\\&\quad \text {MSLE} = \frac{1}{n} \sum _{i=1}^{n}\left( \log (1+y_i) - \log (1+{\hat{y}}_i)\right) ^2 \end{aligned}. \end{aligned}$$
(3)
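The four metrics of Eq. 3 translate directly into NumPy. The MSLE below uses the common log1p form, which assumes non-negative values; the toy vectors are illustrative only.

```python
import numpy as np

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def rmse(y, y_hat):
    return float(np.sqrt(mse(y, y_hat)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def msle(y, y_hat):
    # log1p keeps the metric defined at zero and emphasizes relative error
    return float(np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2))

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])
# a single miss of 2.0 on the last point drives all four metrics
```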
The Data
There are thousands of publicly traded stocks around the world, and each can be categorized into one of the 11 major market sectors [14]: Financial, Utilities, Consumer Discretionary, Consumer Staples, Energy, Healthcare, Industrial, Technology, Telecom, Materials, and Real Estate. These 11 sectors correspond to the key areas of the economy, and all the companies in each sector share the same broad focus. The list of tickers corresponding to the 11 market sectors has been collected from ETFdb (Footnote 1). As the list is long and the market values and volumes of the tickers vary drastically, only the top 10 tickers with the highest values in each sector were selected. The market data in this study were gathered for these top 10 tickers per sector for the five-year period from 30-07-2015 to 30-07-2020 from Yahoo! Finance, which also covers the recent global COVID-19 pandemic. In addition, the COVID-19 pandemic data (including newly infected and total infections) to incorporate into our model were collected from the “JHU CSSE COVID-19 Data” daily time series. As the market data span 30-07-2015 to 30-07-2020 while the COVID-19 data start on 22-01-2020, the COVID-19 values prior to 22-01-2020 were set to zero to match the dimensionality of the market dates.
Understanding the Feature Space
The COVID-19 time-series data comprise separate world and USA series. To best utilize these data, an aggregation of the cases has also been computed and added to the feature space. The new cases were then calculated as the difference of the daily cases, yielding six features: US-new, US-all, World-wide-new (except the US), World-wide-all (except the US), Total-new, and Total-all. Finally, the case numbers were normalized before being fed to the neural networks, making the feature space better suited for the task. The stock market data have many missing rows as a result of market closures on holidays and weekends; hence, the data were padded to interpolate the missing values. Padding is preferred over other interpolation techniques because it is intuitive to treat the last known exchange rates and values as the current ones. To obtain sector data, the selected ticker data for market close rate, volume, and daily average rates were acquired and calculated. The mean of the corresponding values was then taken as the overall sector value and normalized to better fit the neural networks. To better understand the feature space and the relationships among the features, Fig. 4 illustrates the technology sector symbol average vs. the COVID-19 cases, which hints at a correlation in the data. Hence, a correlation analysis was carried out to better understand the relationships between COVID-19 and the market. This analysis provides further insights into the strength of the relationships among the variables and parameters to be used in our model, helping us confidently include important variables in the feature space and eliminate less important ones.
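The padding step described above, carrying the last known close across weekends and holidays, can be sketched with pandas; the prices and dates below are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices with a weekend gap (Saturday and Sunday missing).
close = pd.Series(
    [100.0, 101.5, 99.8],
    index=pd.to_datetime(["2020-07-24", "2020-07-27", "2020-07-28"]),
)

# Reindex onto a full daily calendar, then pad forward: each missing day takes
# the most recent known value, i.e. Friday's close fills Saturday and Sunday.
daily = close.reindex(pd.date_range("2020-07-24", "2020-07-28")).ffill()

# Min-max normalization before feeding the network.
normalized = (daily - daily.min()) / (daily.max() - daily.min())
```

Forward-filling reflects the intuition stated above: while the market is closed, the "current" price is simply the last traded one.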
To prepare the data for analysis, the volume and the average of the daily sector values were calculated, the results were normalized, and then the logistic regression model of the values was calculated. As the goal of this study is to forecast the gold price, the feature space must be prepared accordingly. To better understand the feature space, the correlation coefficients were calculated against the daily gold price (as the possible dependent variable) over the last 300-day period. What counts as a strong correlation can be subjective and varies from one study to another [37]; in this study, \(r=\pm 0.4\) was selected as the correlation coefficient threshold for eliminating weak correlations. As shown in Table 1, only a few features, including total new COVID-19 cases in the world, Consumer Staples (closing price), and Technology (closing price), have statistically significant correlations (significance of over \(95\%\) and \(r>0.4\) or \(r<-0.4\)) with the gold price. Hence, all the remaining sector values with non-significant correlations were eliminated to finalize the feature space. Table 1 shows the correlation coefficients across the sectors, presenting the correlations for both ‘Close-gold’ and ‘Average-gold’ to illustrate the difference between the correlations of the closing price of the market and the daily average prices. Overall, we see similar results for both.
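The feature-selection rule above, keeping only features with \(|r| \ge 0.4\) against the gold price, can be sketched as follows; the feature names and synthetic series are illustrative, not the study's actual data.

```python
import numpy as np

def select_features(features, target, threshold=0.4):
    """Keep feature columns whose Pearson r against the target satisfies |r| >= threshold."""
    kept = {}
    for name, values in features.items():
        r = np.corrcoef(values, target)[0, 1]
        if abs(r) >= threshold:
            kept[name] = r
    return kept

rng = np.random.default_rng(1)
gold = np.linspace(0.0, 1.0, 300)  # stand-in for 300 days of normalized gold prices
features = {
    "tech_close": gold + 0.05 * rng.normal(size=300),  # strongly correlated feature
    "noise_only": rng.normal(size=300),                # uncorrelated; should be dropped
}
kept = select_features(features, gold)
```

A significance test on each r (as in Table 1) would accompany this threshold in practice; the sketch shows only the thresholding step.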
Table 1 Correlation analysis of all sectors vs. gold

As shown in Table 1, new COVID-19 cases have a stronger correlation (r value) with the market data. It is also observable that the daily market average value (normalized) has a stronger correlation with the COVID-19 pandemic than the market volume (normalized). It should be noted that correlation does not necessarily imply causation, yet strong correlations can bear latent underlying connections, relationships, or meanings. For instance, the energy sector has the strongest (negative) correlation coefficient with the new cases: as the cases rise, the energy sector's market value falls. This is very plausible, as can be observed from the recent drop in fuel prices, and hints at causality. The same applies to the industrial sector, which comes second after the energy sector. The financial sector also has a very strong correlation coefficient, at \(-0.954\), which comes third in this table, indicating that the financial sector has also been hit hard by the pandemic at similar levels.