Abstract
Spread prediction models are vital tools to help health authorities and governments fight against infectious diseases such as COVID-19. The availability of historical daily COVID-19 cases, in conjunction with other datasets such as temperature and humidity (which are believed to play a key role in the spread of the disease), has opened a window for researchers to investigate the potential of different techniques to model and thereby expand our understanding of the factors (e.g., interaction or exposure resulting from mobility) that govern the underlying dynamics of the spread. Traditionally, infectious diseases are modeled using compartmental models such as the SIR model. However, this model shortcoming is that it does not account for mobility, and the resulting mixing or interactions, which we conjecture are a key factor in the dynamics of the spread. Statistical analysis and deep learning-based approaches such as autoregressive integrated moving average (ARIMA), gated recurrent units, variational autoencoder, long short-term memory (LSTM), convolution LSTM, stacked LSTM, and bidirectional LSTM have been tested with COVID-19 historical data to predict the disease spread mainly in medium- and high-income countries with good COVID-19 testing capabilities. However, few studies have focused on low-income countries with low access to COVID-19 testing and, hence, highly biased historical datasets. In addition to this, the arguable best model (BiLSTM) has not been tested with an arguably good set of features (people mobility data, temperature, and relative humidity). Therefore, in this study, the multi-layer BiLSTM model is tested with mobility trend data from Google, temperature, and relative humidity to predict daily COVID-19 cases in low-income countries. The performance of the proposed multi-layer BiLSTM is evaluated by comparing its RMSE with the one from multi-layer LSTM (with the same settings as BiLSTM) in four developing countries namely Mozambique, Rwanda, Nepal, and Myanmar. The proposed multi-layer BiLSTM outperformed the multi-layer LSTM in all four countries. The proposed multi-layer BiLSTM was also evaluated by comparing its root mean-squared error (RMSE) with multi-layer LSTM models, ARIMA- and stacked LSTM-based models in eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK. Finally, the proposed multi-layer BiLSTM model was evaluated at the city level by comparing its average relative error with the other four models, namely the LSTM-based model considering multi-layer architecture, Google Cloud Forecasting, the LSTM-based model with mobility data only, and the LSTM-based model with mobility, temperature, and relative humidity data for 7 periods (of 28 days each) in six highly populated regions in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The proposed multi-layer BiLSTM model outperformed the multi-layer LSTM model and other previous models by up to 1.6 and 0.6 times in terms of RMSE and ARE, respectively. Therefore, the proposed model enables more accurate forecasting of COVID-19 cases and can support governments and health authorities in their decisions, mainly in developing countries with limited resources.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Models for infectious disease forecasting are essential tools that can support decision-makers on how to fight against outbreak dissemination. COVID-19 caused by Severe Acute Respiratory Syndrome of Coronavirus-2 (SARS-CoV-2) is an example of an infectious disease [1]. Since its first reported case in Wuhan, China, in December 2019, COVID-19 has posed serious challenges to the economy [2] and health systems worldwide [3]. The World Health Organization (WHO) declared COVID-19 a pandemic in March 2020, and since then, several interventions have been imposed to contain and delay its spread. The interventions range from light to middle to radical, i.e., quarantine, eye protection, non-essential business and school and university closures, testing and tracing, international mobility restrictions, bans on large gatherings, social distancing, face mask use, hands washing, and countries and regional lockdowns [4,5,6,7,8,9]. These interventions were implemented to achieve mainly two goals, i.e., avoid the collapse of health system while saving lives and reduce the economic impact of the pandemic on society [2, 3]. Therefore, modeling COVID-19 could provide strategic information, which is crucial for decision-makers such as governments and health authorities.
Traditionally, infectious diseases are modeled using “compartmental” models. The standard compartmental model is SIR, which was originally proposed by Kermack and McKendrick [10]. In this model, the entire population is divided into three compartments, namely S: susceptible, I: infected, and R: removed or recovered, and a set of ordinary differential equations is solved to predict disease dynamics and propagation. For instance, Nguemdjo et al. [11] used SIR models to evaluate the evolution of COVID-19 in Cameroon. The authors evaluated the basic reproduction number of the virus, determined the peak of the infection and the spread-out period of the disease, and simulated the interventions of public health authorities. Their results suggested that the basic reproduction number, sometimes called \(R_o\), of new coronaviruses in Cameroon is about 1.5. In addition to that, the models revealed that the peak of infections would be observed at the end of May, with more than 7% of the population infected. Moreover, the models suggested that the different public health interventions, such as physical distancing and hygiene measures, would flatten the epidemic curve. Cooper et al. [12] implemented SIR models in different communities (China, South Korea, India, Australia, the USA, Italy, and the state of Texas in the USA) to investigate the time evolution of different populations and monitor diverse significant parameters for the spread of the disease. Unlike what happens in the classical SIR model where the total population is considered constant and there is homogeneous mixing of the infected and susceptible, the proposed model does not consider the total population and takes the susceptible population as a variable that can be adjusted at various times to account for newly infected individuals spreading throughout a community. The SIR models implemented in different communities suggested that if proper restrictions and strong policies are implemented to control the infection rates early from the spread of the disease, COVID-19 could be controlled. Anand et al. [13] proposed an extended SIR model to predict the spread of COVID-19 taking into account the percentage of infected individuals who are tested and quarantined. The effects of testing and quarantine greatly affected the dynamics of the model. Similarly, as Anand et al. [13], Wangping et al. [14] developed an extended SIR model to predict the epidemic trend of COVID-19 in Italy. However, while Anand et al. [13] focused on incorporating the effects of testing and quarantine in the model, in Wangping et al. [14] the model adds a transmission modifier parameter to allow a time-varying probability of the transmission rate. Dansana et al. [15] used data from January 22 to April 25, 2020, and implemented a susceptible–exposed–infectious–recovered (SEIR) model to predict the future outbreak of COVID-19 in Hubei Province, China, Taiwan, South Korea, Japan, and Italy. This was done by applying four differential equations to calculate the number of confirmed cases in each country, plotting them on a graph, and then applying polynomial regression with the logic of multiple linear regression to predict the further spread of the pandemic. The calculated and predicted cases showed a perfect match.
However, it is known that people are the main vector of dissemination of infectious diseases. Therefore, incorporating their mobility information into models is crucial for effective infectious disease spread prediction [16]. In addition to this, weather conditions, such as temperature and relative humidity, play a key role in the transmission of COVID-19 [17]. Moreover, machine learning-based approaches have proven their capabilities to provide better data analysis and accurate prediction models.
Therefore, in this study, a deep learning approach is tested for COVID-19 spread prediction. The proposed approach employs bidirectional long-short-term memory (BiLSTM) neural network to predict future COVID-19 cases based on the daily number of positive cases and other factors such as people’s mobility trend, temperature, and relative humidity. The proposed model was tested using datasets from four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar, and showed good performance compared to the multi-layer LSTM model. Then a comparison with other models (ARIMA and LSTM) was carried out on a national scale in other settings, in eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK. The proposed model was also compared with the LSTM-based model with a different number of features and the SEIR-based model at city level in six regions with high infection records in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka.
The remainder of the paper is organized as follows: Sect. 2 presents the related work that has been carried out. Section 3 describes the data that were used in this study and the proposed method. Section 4 presents the results and discussion. Finally, Sect. 5 concludes and presents future directions of the research.
2 Related work
Many organizations and institutions have been collecting COVID-19 data and making it available for research purposes. Researchers have been taking advantage of this opportunity to develop different LSTM-based neural network models for forecasting COVID-19 cases to support government and health authorities in their decisions. An early study was conducted by Chimmula and Zhang [18]. In this study, the authors used the available COVID-19 dataset up to March 31, 2020, to test a LSTM model for COVID-19 spread prediction. Based on their simulation results, the authors predicted that the outbreak could end around June 2020, which was not the case. Another study was conducted by Arora et al. [19]. In this study, the authors used different LSTM variants, such as deep LSTM, convolutional LSTM, and bidirectional LSTM (BiLSTM), on different Indian datasets (from March to May 2020) to predict the number of positive cases. From the analysis, bidirectional LSTM was found to be very accurate (with an error less than 3%) for short-term prediction (1−3 days). Similarly, Shastri et al. [20] tested stacked LSTM, bi-directional LSTM, and convolutional LSTM for datasets from India and the USA. Convolution LSTM outperformed the other two models and predicted the COVID-19 cases with high accuracy and very little error when forecasting the COVID-19 cases for one month ahead. Zeroual et al. [21] used COVID-19 data (from 22 January 2020 till 17 June 2020) from various countries, namely, Italy, Spain, France, China, the USA, and Australia, to test different models such as LSTM, BiLSTM, gated recurrent units (GRUs), and variational autoencoder (VAE). Their results showed that the VAE model achieved better forecasting performance in comparison to all other models. Dansana et al. [22] used autoregressive integrated moving average (ARIMA) to forecast the total number of confirmed cases and fatalities and Q-Q plots of confirmed cases and fatalities at the beginning of the pandemic. The results showed that the ARIMA model successfully forecasted the future trend for COVID-19 in terms of confirmed cases and fatalities.
Shahid et al. [23] tested ARIMA, support vector regression (SVR), LSTM, and BiLSTM for time series prediction of confirmed cases, deaths, and recoveries in ten major affected countries, namely, Brazil, China, Germany, India, Israel, Italy, Russia, Spain, the UK, and the USA (from January 22 to June 27, 2020). Overall, the BiLSTM model outperforms the other models with the lowest mean absolute error (MAE) and root mean-squared error (RMSE) values of 0.0070 and 0.0077, respectively. Yu et al. [24] developed an online artificial intelligence (AI) system to analyze the dynamic trend of the COVID-19 pandemic by using four models, namely ARIMA, feedforward neural network (FNN), multilayer perceptron (MLP) neural network, and LSTM. The authors integrated two data sets from a total of 171 countries: the dataset from the Oxford COVID-19 Government Response Tracker from the Blavatnik School of Government and the dataset from the COVID-19 Data Repository, which was established by the Johns Hopkins University Center for Systems Science and Engineering. As a result, LSTM demonstrated the best forecast accuracy for Canada with RMSE, MAE, and mean absolute percentage error (MAPE). Devaraj et al. [25] tested ARIMA, LSTM, stacked long short-term memory (SLSTM) and Prophet approaches to predict cumulative confirmed, dead, and recovered global cases. The stacked LSTM algorithm yields higher accuracy with an error of less than 2% as compared to the other considered algorithms for the studied performance metrics. Recently, Xu et al. [26] tested three deep learning models, including CNN, LSTM, and CNN-LSTM, to predict the number of COVID-19 cases in Brazil, India, and Russia. In this study, the authors used data from January 22, 2020, to July 17, 2020, and compared their results with models from Zeroual et al. [21], and the LSTM model outperformed other models.
Almost all the studies presented above focus only on testing different models for COVID-19 cases’ prediction based only on COVID-19 historical data. However, it is known that other factors, such as people’s mobility, temperature, and humidity, play a key role in the COVID-19 spread. There are few studies that have incorporated these factors into deep learning-based approaches to modeling the spread of COVID-19. García-Cremades et al. [27] used COVID-19 data in conjunction with mobility data to test LSTM, GRU, and ARIMA models. Results showed that a multivariate model that includes mobility data provided by Google yields a better forecast. Banerjee and Lian [28] used COVID-19 data and mobility from Google Reports from eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK, to test a LSTM model. The results from the proposed model were compared to the results from the ARIMA model proposed by Hernandez-Matamoros et al. [29], and the LSTM model showed better performance for all the study areas. Rashed and Hirata [30] also used COVID-19 data together with mobility and metrological data from different cities in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka to train a multi-path LSTM model. The results were compared to the data from the Google Cloud Forecast platform [31] and overall showed that the multi-path LSTM model with mobility and meteorological data performed better than other models.
As can be seen from previous studies, many researchers have been mostly focusing on testing the standard LSTM model for COVID-19 spread prediction. However, from the few studies that have compared the standard LSTM model and BiLSTM, the second has proven to be more accurate. Nevertheless, none of the above studies have tested multivariate BiLSTM with COVID-19 cases in conjunction with mobility, temperature, and humidity data for COVID-19 spread forecasting. Secondly, there is a need to evaluate the multivariate BiLSTM model for small and large COVID-19 training datasets. Lastly, there is a lack of studies that evaluate the performance of the multivariate BiLSTM model for large-scale (e.g., country level) and small-scale (e.g., city or district level) COVID-19 spread prediction.
3 Materials and methods
For this study, three types of datasets were used: daily confirmed cases, mobility data, and meteorological data. The time series files that include all these data were downloaded from Google COVID-19 Open Data [32]. Below, a short description of each dataset is provided.
3.1 Daily confirmed cases
The time series data of daily confirmed cases for different study areas represent the daily number of positive cases. Figure 1 shows an example of daily confirmed cases in Mozambique, Rwanda, Nepal, and Myanmar.
From Fig. 1 it is clear that the COVID-19 pandemic has affected countries in different ways. For example, while the pattern of daily cases in Mozambique and Rwanda shows that the pandemic was controlled at the beginning; in Nepal and Myanmar, the daily positive cases reached 1000. In addition to this, Mozambique and Rwanda have controlled the pandemic [33, 34] for the entire study period, with a maximum number of daily cases around 4000 and 1500, respectively. For all four countries, Nepal is the most affected country, with daily positive cases that reach 10,000.
3.2 Mobility data
Since the main vector of COVID-19 transmission is human [35, 36], it is important to incorporate information about human mobility in the prediction models. Therefore, Google COVID-19 Community Mobile Reports, which describe the movement trends over time by geography across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential [37] were considered in the model. Figure 2 presents an example of what the Google Community Mobility Reports look like in Mozambique.
From Fig. 2, it is clear that the mobility trend for different purposes has changed differently over the time compared to the baseline (which represents the median value, for the corresponding day of the week, during the 5-week period, 3 January–06 February 2020) in Mozambique. At the beginning of the pandemic, almost all activities were stopped and people stayed at home. As can be seen in Fig. 2, the percent of mobility for almost all activities went down at the beginning of the pandemic except the percent of mobility trend for places of residence, which increased. Then, the curves went up and down as the pandemic progressed, which can be associated with the different COVID-19 waves.
3.3 Meteorological data
It has been proven that meteorological conditions play an important role in COVID-19 dissemination [38,39,40]. In this study, maximum temperature and relative humidity were selected to be incorporated in the proposed COVID-19 spread prediction model. Figure 3 shows the daily maximum temperature, daily relative humidity, 7-day average maximum temperature, and 7-day average relative humidity for Mozambique, Rwanda, Nepal, and Myanmar.
From Fig. 3 it is clear the seasonality of the temperature and relative humidity in all four countries. While in Mozambique, Nepal, and Myanmar, the average temperature goes up around 40 \(^{\circ }\)C, in Rwanda, the maximum temperature goes up 30 \(^{\circ }\)C. While in Mozambique and Rwanda, the relative humidity goes from 60 to 90%, in Nepal and Myanmar, it goes from 40 to 90%.
3.4 Proposed deep learning-based framework
The proposed framework for data analysis for COVID-19 spread prediction is composed of six main steps: data collection, data preprocessing (data preparation), training the model, evaluating the model, parameter tuning, and making predictions. Figure 4 summarizes the tasks that are done in each step. Some important notations used in this paper with their descriptions are given in Table 1.
3.4.1 Bidirectional LSTM (BiLSTM) model design
A LSTM is a type of recurrent neural network (RNN) designed to learn short- and long-term temporal dependencies in a time series. The key feature is that those networks can store information that can be used for future cell processing. This network was introduced by Hochreiter and Schmidhuber [41] to overcome the vanishing gradient problem exhibited by RNN. The significant element of the LSTM is the cell state, \(C_t\), which is regulated by three different gates, namely the forget gate (\(f_t\)), input gate (\(i_t\)), and the output gate (\(o_t\)), each of which performs a specific task, as shown in Fig. 5.
-
(1)
Forget Gate decides which information the network should keep or remove in the memory cell (expressed in Eq. 1) using a sigmoid function (Eq. 2). The output value is between 0 and 1.
$$\begin{aligned} f_t&= \sigma (W_{xf}*x_t+W_{hf}*h_{t-1}\nonumber \\&\quad +W_{cf}*C_{t-1}+b_f), f_t \in [0,1] \end{aligned}$$(1)$$\begin{aligned} S(\sigma )&= \frac{1}{1 + e^{-1}} \end{aligned}$$(2)where \(h_{t-1}\) is the output in the previous state and \(x_t\) is the input of LSTM unit. \(C_{t-1}\) is the state of memory cell at the last moment. \(W_{xf}\), \(W_{hf}\), and \(W_{cf}\) are the connecting weights of \(x_t\), \(h_{t-1}\), and \(C_{t-1}\) with forget gate f, respectively. \(b_f\) represent the biases values. \(\sigma \) is sigmoid activation function.
-
(2)
Input Gate is expressed by Eq. 3. It decides what new information the network should add to the cell state based on sigmoid function, and create a vector value (\(\tilde{C}_t\)) (expressed in Eq. 4). Then, the old cell is updated with the new values.
$$\begin{aligned}&i_t = \sigma (W_{xi}*x_t+W_{hi}*h_{t-1}+W_{ci}*C_{t-1}+b_i) \end{aligned}$$(3)$$\begin{aligned}&\tilde{C}_t = \tanh (W_{xc}*x_t+W_{hc}*h_{t-1}+W_{cc}*C_{t-1}+b_c), \end{aligned}$$(4)$$\begin{aligned}&C_t =f_t*C_{t-1} + \imath _{t}*\tilde{C}_t \end{aligned}$$(5)where \(C_t\) (Eq. 5) represents the memory cell, \(C_{t-1}\) is the memory cell from previous time step, \(\tanh \) is the tanh activation function, \(W_{xi}\) is the connecting weight of \(x_t\) and \(i_t\), \(W_{hi}\) is the connecting weight of \(h_{i-1}\) and \(i_t\), \(W_{ci}\) is the connecting weight of \(C_{i-1}\) and \(i_t\), \(W_{xc}\) is the connecting weight of \(x_t\) and \(\tilde{C}_t\), \(W_{hc}\) is the connecting weight of \(h_{i-1}\) and \(\tilde{C}_t\), \(W_{cc}\) is the connecting weight of \(C_{i-1}\) and \(\tilde{C}_t\), and \(b_i\) and \(b_c\) are the bias terms.
-
(3)
Output Gate based on the input and cell state values, the output gate decides what should be the output of the cell. It is expressed by Eq. 6 that permits the cell state either to affect other neurons or not. This is achieved by passing the cell state through a tanh layer and multiplying it with the outcome of the output gate to get the ultimate output (\(h_t\)) defined by Eq. 7.
$$\begin{aligned}&o_t = \sigma (W_{xo}*x_t+W_{ho}*h_{t-1}+W_{co}*C_{t-1}+b_o) \end{aligned}$$(6)$$\begin{aligned}&h_t = o_t * \tanh (C_t) \end{aligned}$$(7)where \(\sigma \) is the sigmoid activation function, \(W_{xo}\) is the connecting weight of \(x_t\) and \(o_t\), \(W_{ho}\) is the connecting weight of \(h_{i-1}\) and \(o_t\), \(W_{co}\) is the connecting weight of \(C_{i-1}\) and \(o_t\), \(b_o\) represents the bias vector, and \(C_t\) represents the memory cell.
In LSTM, memory units can ultimately carry the results from observation data X into the prediction of Y. This means that the process occurs in only one direction, forward, which neglects the backward connection and makes the system less efficient. This drawback is overcome by performing the data training phase in a BiLSTM system in two sequential directions: forward and backward. The idea of BiLSTM comes from bidirectional RNN Schuster and Paliwal [42], and it was applied for the first time by Graves and Schmidhuber [43]. Figure 6 presents the basic structure of a BiLSTM.
BiLSTM is a sequence processing model that consists of two LSTMs: one learns the sequence of the input provided, and the second model learns the reverse of that sequence. After training the two models, the results are merged using one of the following functions: concatenation (which is used by default), averaging, sum, or multiplication. In other words, the BiLSTM generates an output vector, \(Y_T\), in which each element is calculated by using the following equation (Eq. 8):
where \(\sigma \) function is used to combine the two output sequences, which can be a concatenating function, a summation function, an average function, or a multiplication function.
By doing this, it increases the amount of information available to the network, improving the context available to the algorithm. The training process of BiLSTM over time can be summarized as below:
-
(1)
Forward Pass
-
i
Run input data for time 1 \(\le \) t \(\le \) T through BiLSTM and evaluate the predicted outputs as calculated in standard LSTM
-
ii
Run forward pass for forward states from t = 1 to t = T and backward states from t = T to t = 1
-
iii
Run the forward pass for output neurons
-
i
-
(2)
Backward pass
-
i
Evaluate the objective function derivative for time 1 \(\le \) t \(\le \) T calculated in forward pass
-
ii
Run backward pass for output neurons
-
iii
Run backward pass for forward states from t = T to t = 1 and backward states from t = 1 to t = T
-
i
-
(3)
Update Weights
Existing studies [44,45,46,47] have shown that BiLSTM architectures with several hidden layers can build up progressively higher levels of representations of sequence data and thus work more effectively. Therefore, in this study, the stacked layers mechanism, which can enhance the power of neural networks, is adopted. As mentioned previously, BiLSTM makes use of both forward and backward dependencies. Figure 7 summarizes the network architecture adopted in this study.
As shown in Fig. 7, in this study, four time series data were used, namely daily COVID-19 cases, daily people mobility, daily temperature, and daily relative humidity. The input is prepared using a sliding window (set to seven), i.e., daily COVID-19 cases are predicted based on the previous seven days of data. The adopted network, apart from the input layer, consists of three layers (two BiLSTM layers and one dense layer). The first and second BiLSTM layers consist of 200 and 100 neurons, respectively. The sequence of LSTM cells is fed forward and backward, as depicted in Fig. 6. The input layer has dimension (7,8), where 7 represents the time steps (window size), and 8 is the number of features. To stack BiLSTM layers, we change the configuration of the prior BiLSTM layer to output a 3D array which is used as input for the subsequent layer (layer 2). Layer 2 processes one input sequence of time steps and outputs a single value for the whole sequence as a 2D array which is used as input for the dense layer. The network is trained using the backpropagation [48] algorithm to minimize the mean-squared error between the actual daily cases and the value predicted by the network. The dense layer in the model is composed of a neuron that is connected to every neuron of its preceding layer, from where it receives the outcome.
3.4.2 Hyper-parameter tuning
Hyper-parameter tuning is conducted with trial and error during the training. In the experiments, an Adam optimizer with a learning rate of 0.1 was used for training the BiLSTM model, and the mean-squared error was used as the loss function. The model was trained with 80% of the data for 200 epochs with a batch size of 64. After that, the models with the selected hyper-parameters were used to forecast the number of daily COVID-19 cases in the testing stage with the remaining 20% of data. The model’s accuracy was verified by comparing the measured data with real data via different statistical indicators, including root mean square error (RMSE) and average relative error (ARE) (see evaluation metric).
3.4.3 Evaluation metrics
For the quantitative evaluation, two statistical measures were used, namely the root mean-squared error (RMSE) given by equation (Eq. 9) and the average relative error (ARE) over a period of n days (Eq. 10).
where \(y_i\) and \(\hat{y_i}\) are the real and predicted positive cases on day i, respectively.
4 Results and discussion
4.1 Results
The proposed multi-layer BiLSTM was used to predict the COVID-19 spread in four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. Figure 8 shows the real 7-day average daily COVID-19 cases and the predicted 7-day average daily COVID-19 cases using the proposed stacked BiLSTM framework.
In order to evaluate the proposed multi-layer BiLSTM, its RMSE was compared to RMSE of multi-layer LSTM (with the same setting as BiLSTM) for the four countries. Tables 2 presents the computed RMSE for the two models in four developing countries, namely, Mozambique, Rwanda, Nepal, and Myanmar.
As can be seen from Table 2, RMSE for BiLSTM varies from 38.53 to 344.95. The RMSE of BiLSTM was compared to RMSE of multi-layer LSTM (with the same settings as BiLSTM), and the results show that BiLSTM outperformed multi-layer LSTM in all developing countries (Mozambique, Rwanda, Nepal, and Myanmar). Moreover, the proposed multi-layer BiLSTM model was applied in regions where ARIMA model [29] and stacked LSTM model [28] were tested, namely Turkey, Italy, Australia, Brazil, Canada, Egypt, Japan, and the UK. The RMSE for each country was computed and compared with the RMSE from multi-layer LSTM and Hernandez-Matamoros et al. [29] who used an ARIMA-based model, and Banerjee and Lian [28] who used a stacked LSTM-based model. The percent improvement of the proposed multi-layer BiLSTM model was computed as a relative RMSE to the second-best model for each country. Table 3 shows the RMSE comparison of the proposed multi-layer BiLSTM for different countries.
From Table 3 it is clear that the proposed multi-layer BiLSTM model outperformed the second-best model (multi-layer LSTM) for at least 12% Japan) and 167% at most (in Turkey).
While this first evaluation focused on comparing the RMSE at the country level, i.e., prediction of COVID-19 cases at the macroscale, there was a need to assess the proposed multi-layer BiLSTM-based model at the microscale, i.e., at the city or municipality level. Therefore, seven data samples, each of 28 days from November 17, 2020, to January 7, 2021, for six cities in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka, were considered. For each period, ARE (Eq. 10) was computed using the proposed multi-layer BiLSTM-based model and compared with the results from the LSTM-based approach considering temperature, humidity, and mobility data, and considering only mobility data (both from Rashed and Hirata [30]), and data from Google Cloud Forecasting Platform [31] obtained using a model that integrates machine learning with compartmental disease model (SEIR model) [49]. Figure 9 shows the periods considered in this study, each of 28 days, as suggested by Rashed and Hirata [30]. These periods include mainly data before, during, and after Christmas.
Figure 10 shows the 7-day average real COVID-19 cases and 7-day average predicted COVID-19 cases for each period of prediction in six city in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka.
The colors of periods in Fig. 10 were selected so that they match with those presented in Fig. 9. From Fig. 10, it is clear that Tokyo, Kyoto, and Fukuoka data were predicted very well for all the periods, i.e., the curves for all periods match very well with the ground truth data represented by the black lines (7-day average daily cases). However, data from Osaka and Hyogo were well predicted for the first four periods, but the models did not perform very well for the last three periods. The worst case was registered in Aichi, where the model only performed well for the first three periods and failed to predict for the four last periods.
Table 4 presents the comparison results of ARE between the predicted 7-day average predicted COVID-19 cases and 7-day real COVID-19 cases in different prediction periods (Period-1, Period-2, Period-3, Period-4, Period-5, Period-6, and Period-7) between the proposed multi-layer BiLSTM-based framework, with Google Cloud Forecast (GCF), LSTM-based model considering all the features (\(LSTM_{all}\)), and LSTM-based model considering only the mobility data (\(LSTM_m\)) for each city (Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka)
Table 4 shows that overall, the proposed multi-layer BiLSTM-based model outperformed the other considered models, namely LSTM, GCF, \(LSTM_{all}\), and \(LSTM_m\), in almost all cities, except in Aichi City, where in some periods the \(LSTM_{all}\) performed better. In order to clearly show the overall performance of the proposed model, the relative ARE to the second-best model in each region was computed, and the results are presented in Fig. 11.
4.2 Discussion
Fighting against COVID-19 has been challenging for all countries in the world. Accurate and effective COVID-19 spread prediction models have been demanded to help governments and health authorities make informed decisions. Since the first reported case in Wuhan, Hubei Province (China), in December 2019, researchers from different fields have proposed different models, including machine and deep learning-based approaches. Data availability is one of the key factors in building an accurate deep learning-based prediction model. However, at the beginning of the COVID-19 pandemic, the data were still scarce. Nevertheless, the health authorities always needed some tools to support them in fighting against the spread of this infectious disease. ARIMA-based model [29] and stacked LSTM-based model [28] are just some examples of models that were trained with a limited sample of data. However, the results from these studies provided insight on how statistical analysis and deep learning could help fight against the COVID-19 spread with a limited dataset. Since the predictions are expected to be as close as possible to the real data, in this study, a stacked BiLSTM model was trained with data from the same period as used in Hernandez-Matamoros et al. [29] and Banerjee and Lian [28]. The performance of the proposed multi-layer BiLSTM model was evaluated first by comparing its RMSE with the one from multi-layer LSTM (with the same settings as BiLSTM) in four developing countries. The proposed multi-layer BiLSTM model outperformed the multi-layer LSTM model in all four countries. In addition to this, the proposed multi-layer BiLSTM was also tested in eight other countries, namely Turkey, Italy, Australia, Brazil, Canada, Egypt, Japan, and the UK and the results were compared with those from ARIMA and stacked LSTM (both from previous studies). The BiLSTM model, again, outperformed all other models considered in this study.
Even though different countries all over the world were impacted in different ways in terms of social relations, health systems, and economy [50, 51], the proposed multi-layer BiLSTM model proved its potential in predicting future COVID-19 cases with high accuracy compared to the baseline approaches considered in this study. Table 3 clearly shows that the proposed model outperformed the other three approaches considered in this study, even with the limited dataset used to train the model.
Similarly, at the country level, the cities or municipalities were impacted differently due to differences in measures that were put in place to respond to the pandemic [52]. Therefore, an evaluation at the local scale was carried out. The proposed multi-layer BiLSTM was tested in terms of ARE for six regions in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The test was carried out for seven prediction periods (each of 28 days) in each region. From Table 4, it is clear that the proposed multi-layer BiLSTM model outperformed the other four considered models, namely LSTM, GCF, \(LSTM_{all}\), and \(LSTM_m\) in Tokyo, Osaka, Hyogo, Kyoto, and Fukuoka regions in all prediction periods. However, in the Aichi region, the proposed approach performed better in the first three prediction periods but failed in the last four, which were shared by \(LSTM_{all}\), and \(LSTM_m\) models. This might be related to the fact that the number of polymerase chain reaction (PCR) tests decreased during the third and seventh periods because most hospitals were closed from December 28, 2020, to January 3, 2021, suggesting that the new positive cases were reported in the week of January 4, 2021 [30]. The low-quality data from the Aichi region for these periods were not captured by the proposed model, and hence, the results were not good.
5 Conclusions
In this study, a deep learning approach for COVID-19 spread prediction in developing countries was investigated. The investigated model, well known as BiLSTM, with two hidden layers, one of 200 neurons and another of 100 neurons, was trained with data from four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. The results from the proposed multi-layer BiLSTM were compared to the multi-layer LSTM (with the same settings as BiLSTM) in terms of RMSE. Multi-layer BiLSTM outperformed multi-layer LSTM model in all four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. Moreover, the performance of the multi-layer BiLSTM model was evaluated in other settings through the computation of RMSE for eight countries, namely Turkey, Italy, Australia, Brazil, Canada, Egypt, Japan, and the UK. The proposed multi-layer BiLSTM outperformed the three other models, namely multi-layers LSTM, LSTM and ARIMA trained for the same period from 12% (worst case) to 167% (best case). The proposed multi-layer BiLSTM model was also evaluated for seven periods (of 28 days each) in six cities in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The mean ARE showed that the proposed multi-layer BiLSTM model outperformed the four other models trained with the same data, namely, multi-layer LSTM (with same settings as BiLSTM), Google Cloud Forecasting, an LSTM-based model with mobility data only, and a LSTM-based model with mobility, temperature, and relative humidity data in all seven periods in Tokyo, Osaka, Hyogo, Kyoto, and Fukuoka. However, it failed to outperform the LSTM-based model with mobility, temperature, and relative humidity data in the Aichi region in the last four periods. This is related to the quality of data during the Christmas and New Year period since most of the hospitals were closed and the model could not capture these changes. This study focused on testing the BiLSTM model in developing countries. However, there is still a lack of studies that test deep learning techniques in developing countries. Therefore, other studies that implement different techniques in developing countries are needed to certify the robustness of the proposed multi-layer BiLSM model.
6 Data sharing statement
All data analyzed in the present study were gathered from media and government resources, which are freely available in the public domain.
References
Lai, C.-C., Shih, T.-P., Ko, W.-C., Tang, H.-J., Hsueh, P.-R.: Severe acute respiratory syndrome coronavirus 2 (sars-cov-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges. Int. J. Antimicrobial Agents 55, 105924 (2020)
World Bank: The Global Economic Outlook During the COVID-19 Pandemic: A Changed World. (2020). https://www.worldbank.org/en/news/feature/2020/06/08/the-global-economic-outlook-during-the-COVID-19-pandemic-a-changed-world. Accessed 27 Oct 2020
World Health Organization: Attacks on health care in the context of COVID-19. (2020). https://www.who.int/news-room/feature-stories/detail/attacks-on-health-care-in-the-context-of-COVID-19. Accessed 27 Oct 2020
Ferguson, N., Laydon, D., Nedjati Gilani, G., Imai, N., Ainslie, K., Baguelin, M., Bhatia, S., Boonyasiri, A., Cucunuba Perez, Z., Cuomo-Dannenburg, G., et al. Report 9: impact of non-pharmaceutical interventions (npis) to reduce covid19 mortality and healthcare demand (2020)
Zhang, Y., Jiang, B., Yuan, J., Tao, Y.: The impact of social distancing and epicenter lockdown on the COVID-19 epidemic in mainland China: a data-driven SEIQR model study. MedRxiv (2020)
Chu, D.K., Akl, E.A., Duda, S., Solo, K., Yaacoub, S., Schünemann, H.J., El-harakeh, A., Bognanni, A., Lotfi, T., Loeb, M., et al.: Physical distancing, face masks, and eye protection to prevent person-to-person transmission of sars-cov-2 and COVID-19: a systematic review and meta-analysis. The Lancet 395, 1973–1987 (2020)
Zhou, Y., Xu, R., Hu, D., Yue, Y., Li, Q., Xia, J.: Effects of human mobility restrictions on the spread of COVID-19 in Shenzhen, China: a modelling study using mobile phone data. Lancet Digit. Health 2(8), e417–e424 (2020)
Alzyood, M., Jackson, D., Aveyard, H., Brooke, J.: COVID-19 reinforces the importance of hand washing. J. Clin. Nurs. 29, 2760 (2020)
Kumaravel, S.K., Subramani, R.K., Sivakumar, T.K.J., Elavarasan, R.M., Vetrichelvan, A.M., Annam, A., Subramaniam, U.: Investigation on the impacts of COVID-19 quarantine on society and environment: preventive measures and supportive technologies. 3 Biotech 10(9), 1–24 (2020)
Kermack, W.O., McKendrick, A.G.: A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Charact. 115(772), 700–721 (1927)
Nguemdjo, U., Meno, F., Dongfack, A., Ventelou, B.: Simulating the progression of the COVID-19 disease in Cameroon using sir models. PLoS ONE 15(8), e0237832 (2020)
Cooper, I., Mondal, A., Antonopoulos, C.G.: A sir model assumption for the spread of COVID-19 in different communities. Chaos Solitons Fractals 139, 110057 (2020)
Anand, N., Sabarinath, A., Geetha, S., Somanath, S.: Predicting the spread of COVID-19 using sir model augmented to incorporate quarantine and testing. Trans. Indian Natl. Acad. Eng. 5(2), 141–148 (2020)
Wangping, J., Ke, H., Yang, S., Wenzhe, C., Shengshu, W., Shanshan, Y., Jianwei, W., Fuyin, K., Penggang, T., Jing, L., et al.: Extended sir prediction of the epidemics trend of COVID-19 in Italy and compared with Hunan, China. Front. Med. 7, 169 (2020)
Dansana, D., Kumar, R., Parida, A., Sharma, R., Adhikari, J.D., Le, H.V., Pham, B.T., Singh, K.K., Pradhan, B.: Using susceptible-exposed-infectious-recovered model to forecast coronavirus outbreak. Comput. Mater. Contin. 67(2), 1595–1612 (2021)
Engebretsen, S., Engoe-Monsen, K., Aleem, M.A., Gurley, E.S., Frigessi, A., de Blasio, B.F.: Time-aggregated mobile phone mobility data are sufficient for modelling influenza spread: the case of Bangladesh. medRxiv (2020)
Wang, J., Tang, K., Feng, K., Lin, X., Lv, W., Chen, K., Wang, F.: Impact of temperature and relative humidity on the transmission of covid-19: a modelling study in China and the United States. BMJ Open 11(2), e043863 (2021)
Chimmula, V.K.R., Zhang, L.: Time series forecasting of covid-19 transmission in Canada using lstm networks. Chaos Solitons Fractals 135, 109864 (2020)
Arora, P., Kumar, H., Panigrahi, B.K.: Prediction and analysis of covid-19 positive cases using deep learning models: a descriptive case study of India. Chaos Solitons Fractals 139, 110017 (2020)
Shastri, S., Singh, K., Kumar, S., Kour, P., Mansotra, V.: Time series forecasting of covid-19 using deep learning models: India-USA comparative case study. Chaos Solitons Fractals 140, 110227 (2020)
Zeroual, A., Harrou, F., Dairi, A., Sun, Y.: Deep learning methods for forecasting covid-19 time-series data: a comparative study. Chaos Solitons Fractals 140, 110121 (2020)
Dansana, D., Kumar, R., Das Adhikari, J., Mohapatra, M., Sharma, R., Priyadarshini, I., Le, D.-N.: Global forecasting confirmed and fatal cases of covid-19 outbreak using autoregressive integrated moving average model. Front. Public Health 8, 580327 (2020)
Shahid, F., Zameer, A., Muneeb, M.: Predictions for covid-19 with deep learning models of lstm, gru and bi-lstm. Chaos Solitons Fractals 140, 110212 (2020)
Yu, C.-S., Chang, S.-S., Chang, T.-H., Wu, J.L., Lin, Y.-J., Chien, H.-F., Chen, R.-J., et al.: A covid-19 pandemic artificial intelligence-based system with deep learning forecasting and automatic statistical data acquisition: development and implementation study. J. Med. Internet Res. 23(5), e27806 (2021)
Devaraj, J., Elavarasan, R.M., Pugazhendhi, R., Shafiullah, G., Ganesan, S., Jeysree, A.K., Khan, I.A., Hossain, E.: Forecasting of covid-19 cases using deep learning models: is it reliable and practically significant? Results Phys. 21, 103817 (2021)
Xu, L., Magar, R., Farimani, A.B.: Forecasting covid-19 new cases using deep learning methods. Comput. Biol. Med. 144, 105342 (2022)
García-Cremades, S., Morales-García, J., Hernández-Sanjaime, R., Martínez-España, R., Bueno-Crespo, A., Hernández-Orallo, E., López-Espín, J.J., Cecilia, J.M.: Improving prediction of covid-19 evolution by fusing epidemiological and mobility data. Sci. Rep. 11(1), 1–16 (2021)
Banerjee, S., Lian, Y.: Data driven covid-19 spread prediction based on mobility and mask mandate information. Appl. Intell. 52(2), 1969–1978 (2022)
Hernandez-Matamoros, A., Fujita, H., Hayashi, T., Perez-Meana, H.: Forecasting of covid19 per regions using arima models and polynomial functions. Appl. Soft Comput. 96, 106610 (2020)
Rashed, E.A., Hirata, A.: One-year lesson: machine learning prediction of covid-19 positive cases with meteorological data and mobility estimate in Japan. Int. J. Environ. Res. Public Health 18(11), 5736 (2021)
COVID-19 Public Forecasts: COVID-19 Public Forecasts (2022) https://datastudio.google.com/reporting/8224d512-a76e-4d38-91c1-935ba119eb8f/page/p_fpn1zv84pc?s=jbtyZdv8uwI (accessed on 06-11-2022)
Google Covid-19 Open Data: How to access and use the dataset (2022) https://health.google.com/covid-19/open-data/raw-data Accessed 08 Sept 2022
Africa News: Mozambique has covid-19 under control - WHO (2020) https://www.africanews.com/2020/11/17/mozambique-has-covid-19-under-control-who/ Accessed 06 Mar 2023
Club of Mozambique: Rwanda ranked 6th globally in Covid-19 management (2021) https://clubofmozambique.com/news/rwanda-ranked-6th-globally-in-covid-19-management-183179/ Accessed 06 Mar 2023
Chan, J.F.-W., Yuan, S., Kok, K.-H., To, K.K.-W., Chu, H., Yang, J., Xing, F., Liu, J., Yip, C.C.-Y., Poon, R.W.-S., et al.: A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet 395(10223), 514–523 (2020)
Liu, J., Liao, X., Qian, S., Yuan, J., Wang, F., Liu, Y., Wang, Z., Wang, F.-S., Liu, L., Zhang, Z.: Community transmission of severe acute respiratory syndrome coronavirus 2, Shenzhen, China, 2020. Emerg. Infect. Dis. 26(6), 1320 (2020)
Google Community Mobility Reports: see how your community moved differently due to COVID-19 (2022) https://www.google.com/covid19/mobility/ Accessed 08 Nov 2022
Chen, B., Liang, H., Yuan, X., Hu, Y., Xu, M., Zhao, Y., Zhang, B., Tian, F., Zhu, X.: Roles of meteorological conditions in covid-19 transmission on a worldwide scale. MedRxiv (2020)
Ma, Y., Pei, S., Shaman, J., Dubrow, R., Chen, K.: Role of meteorological factors in the transmission of sars-cov-2 in the United States. Nat. Commun. 12(1), 1–9 (2021)
Suman, T.Y., Keerthiga, R., Remya, R.R., Jacintha, A., Jeon, J.: Assessing the impact of meteorological factors on covid-19 seasonality in metropolitan Chennai, India. Toxics 10(8), 440 (2022)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional lstm networks. In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., volume 4, pp. 2047–2052. IEEE (2005)
Zhou, J., Lu, Y., Dai, H.-N., Wang, H., Xiao, H.: Sentiment analysis of Chinese microblog based on stacked bidirectional lstm. IEEE Access 7, 38856–38866 (2019)
Trueman, T.E., Cambria, E., et al.: A convolutional stacked bidirectional lstm with a multiplicative attention mechanism for aspect category and sentiment detection. Cognit. Comput. 13(6), 1423–1432 (2021)
Althelaya, K.A., El-Alfy, E.-S.M., Mohammed, S.: Stock market forecast using multivariate analysis with bidirectional and stacked (lstm, gru). In: 2018 21st Saudi Computer Society National Computer Conference (NCC), pp. 1–7. IEEE (2018)
Said, A.B., Erradi, A., Aly, H.A., Mohamed, A.: Predicting covid-19 cases using bidirectional lstm on multivariate time series. Environ. Sci. Pollut. Res. 28(40), 56043–56052 (2021)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
Arik, S., Li, C.-L., Yoon, J., Sinha, R., Epshteyn, A., Le, L., Menon, V., Singh, S., Zhang, L., Nikoltchev, M., et al.: Interpretable sequence learning for covid-19 forecasting. Adv. Neural Inf. Process. Syst. 33, 18807–18818 (2020)
Singh, J., Singh, J.: COVID-19 and its impact on society. Electron. Res. J. Soc. Sci. Human. 2 (2020)
Kaye, A.D., Okeagu, C.N., Pham, A.D., Silva, R.A., Hurley, J.J., Arron, B.L., Sarfraz, N., Lee, H.N., Ghali, G.E., Gamble, J.W., et al.: Economic impact of covid-19 pandemic on healthcare facilities and systems: international perspectives. Best Pract. Res. Clin. Anaesthesiol. 35(3), 293–306 (2021)
Böhme, K., Besana, F., Lüer, C., Holstein, F., Hans, S., Valenza, A., Caillaud, B., Derszniak-Noirjean, M.: Potential impacts of COVID-19 on regions and cities of the EU. European Committee of the regions (2020)
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments, which significantly improved the paper.
Funding
The research is funded by the Swedish International Development Agency (SIDA) through the SIDA-Eduardo Mondlane University Program, Sub-Program 1.3.1, the collaborations of KTH Royal Institute of Technology, Lund University, North West University, and Eduardo Mondlane University.
Author information
Authors and Affiliations
Contributions
Silvino Pedro Cumbane contributed to the study conception and design of the article. Silvino Pedro Cumbane also contributed in acquisition of all data, analysis, and writing. In addition, Győző Gidófalvi provided comments for the revision of the paper. Both authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cumbane, S.P., Gidófalvi, G. Deep learning-based approach for COVID-19 spread prediction. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00558-1
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s41060-024-00558-1