1 Introduction

Models for infectious disease forecasting are essential tools that can support decision-makers on how to fight against outbreak dissemination. COVID-19 caused by Severe Acute Respiratory Syndrome of Coronavirus-2 (SARS-CoV-2) is an example of an infectious disease [1]. Since its first reported case in Wuhan, China, in December 2019, COVID-19 has posed serious challenges to the economy [2] and health systems worldwide [3]. The World Health Organization (WHO) declared COVID-19 a pandemic in March 2020, and since then, several interventions have been imposed to contain and delay its spread. The interventions range from light to middle to radical, i.e., quarantine, eye protection, non-essential business and school and university closures, testing and tracing, international mobility restrictions, bans on large gatherings, social distancing, face mask use, hands washing, and countries and regional lockdowns [4,5,6,7,8,9]. These interventions were implemented to achieve mainly two goals, i.e., avoid the collapse of health system while saving lives and reduce the economic impact of the pandemic on society [2, 3]. Therefore, modeling COVID-19 could provide strategic information, which is crucial for decision-makers such as governments and health authorities.

Traditionally, infectious diseases are modeled using “compartmental” models. The standard compartmental model is SIR, which was originally proposed by Kermack and McKendrick [10]. In this model, the entire population is divided into three compartments, namely S: susceptible, I: infected, and R: removed or recovered, and a set of ordinary differential equations is solved to predict disease dynamics and propagation. For instance, Nguemdjo et al. [11] used SIR models to evaluate the evolution of COVID-19 in Cameroon. The authors evaluated the basic reproduction number of the virus, determined the peak of the infection and the spread-out period of the disease, and simulated the interventions of public health authorities. Their results suggested that the basic reproduction number, sometimes called \(R_o\), of new coronaviruses in Cameroon is about 1.5. In addition to that, the models revealed that the peak of infections would be observed at the end of May, with more than 7% of the population infected. Moreover, the models suggested that the different public health interventions, such as physical distancing and hygiene measures, would flatten the epidemic curve. Cooper et al. [12] implemented SIR models in different communities (China, South Korea, India, Australia, the USA, Italy, and the state of Texas in the USA) to investigate the time evolution of different populations and monitor diverse significant parameters for the spread of the disease. Unlike what happens in the classical SIR model where the total population is considered constant and there is homogeneous mixing of the infected and susceptible, the proposed model does not consider the total population and takes the susceptible population as a variable that can be adjusted at various times to account for newly infected individuals spreading throughout a community. The SIR models implemented in different communities suggested that if proper restrictions and strong policies are implemented to control the infection rates early from the spread of the disease, COVID-19 could be controlled. Anand et al. [13] proposed an extended SIR model to predict the spread of COVID-19 taking into account the percentage of infected individuals who are tested and quarantined. The effects of testing and quarantine greatly affected the dynamics of the model. Similarly, as Anand et al. [13], Wangping et al. [14] developed an extended SIR model to predict the epidemic trend of COVID-19 in Italy. However, while Anand et al. [13] focused on incorporating the effects of testing and quarantine in the model, in Wangping et al. [14] the model adds a transmission modifier parameter to allow a time-varying probability of the transmission rate. Dansana et al. [15] used data from January 22 to April 25, 2020, and implemented a susceptible–exposed–infectious–recovered (SEIR) model to predict the future outbreak of COVID-19 in Hubei Province, China, Taiwan, South Korea, Japan, and Italy. This was done by applying four differential equations to calculate the number of confirmed cases in each country, plotting them on a graph, and then applying polynomial regression with the logic of multiple linear regression to predict the further spread of the pandemic. The calculated and predicted cases showed a perfect match.

However, it is known that people are the main vector of dissemination of infectious diseases. Therefore, incorporating their mobility information into models is crucial for effective infectious disease spread prediction [16]. In addition to this, weather conditions, such as temperature and relative humidity, play a key role in the transmission of COVID-19 [17]. Moreover, machine learning-based approaches have proven their capabilities to provide better data analysis and accurate prediction models.

Therefore, in this study, a deep learning approach is tested for COVID-19 spread prediction. The proposed approach employs bidirectional long-short-term memory (BiLSTM) neural network to predict future COVID-19 cases based on the daily number of positive cases and other factors such as people’s mobility trend, temperature, and relative humidity. The proposed model was tested using datasets from four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar, and showed good performance compared to the multi-layer LSTM model. Then a comparison with other models (ARIMA and LSTM) was carried out on a national scale in other settings, in eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK. The proposed model was also compared with the LSTM-based model with a different number of features and the SEIR-based model at city level in six regions with high infection records in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka.

The remainder of the paper is organized as follows: Sect. 2 presents the related work that has been carried out. Section 3 describes the data that were used in this study and the proposed method. Section 4 presents the results and discussion. Finally, Sect. 5 concludes and presents future directions of the research.

2 Related work

Many organizations and institutions have been collecting COVID-19 data and making it available for research purposes. Researchers have been taking advantage of this opportunity to develop different LSTM-based neural network models for forecasting COVID-19 cases to support government and health authorities in their decisions. An early study was conducted by Chimmula and Zhang [18]. In this study, the authors used the available COVID-19 dataset up to March 31, 2020, to test a LSTM model for COVID-19 spread prediction. Based on their simulation results, the authors predicted that the outbreak could end around June 2020, which was not the case. Another study was conducted by Arora et al. [19]. In this study, the authors used different LSTM variants, such as deep LSTM, convolutional LSTM, and bidirectional LSTM (BiLSTM), on different Indian datasets (from March to May 2020) to predict the number of positive cases. From the analysis, bidirectional LSTM was found to be very accurate (with an error less than 3%) for short-term prediction (1−3 days). Similarly, Shastri et al. [20] tested stacked LSTM, bi-directional LSTM, and convolutional LSTM for datasets from India and the USA. Convolution LSTM outperformed the other two models and predicted the COVID-19 cases with high accuracy and very little error when forecasting the COVID-19 cases for one month ahead. Zeroual et al. [21] used COVID-19 data (from 22 January 2020 till 17 June 2020) from various countries, namely, Italy, Spain, France, China, the USA, and Australia, to test different models such as LSTM, BiLSTM, gated recurrent units (GRUs), and variational autoencoder (VAE). Their results showed that the VAE model achieved better forecasting performance in comparison to all other models. Dansana et al. [22] used autoregressive integrated moving average (ARIMA) to forecast the total number of confirmed cases and fatalities and Q-Q plots of confirmed cases and fatalities at the beginning of the pandemic. The results showed that the ARIMA model successfully forecasted the future trend for COVID-19 in terms of confirmed cases and fatalities.

Shahid et al. [23] tested ARIMA, support vector regression (SVR), LSTM, and BiLSTM for time series prediction of confirmed cases, deaths, and recoveries in ten major affected countries, namely, Brazil, China, Germany, India, Israel, Italy, Russia, Spain, the UK, and the USA (from January 22 to June 27, 2020). Overall, the BiLSTM model outperforms the other models with the lowest mean absolute error (MAE) and root mean-squared error (RMSE) values of 0.0070 and 0.0077, respectively. Yu et al. [24] developed an online artificial intelligence (AI) system to analyze the dynamic trend of the COVID-19 pandemic by using four models, namely ARIMA, feedforward neural network (FNN), multilayer perceptron (MLP) neural network, and LSTM. The authors integrated two data sets from a total of 171 countries: the dataset from the Oxford COVID-19 Government Response Tracker from the Blavatnik School of Government and the dataset from the COVID-19 Data Repository, which was established by the Johns Hopkins University Center for Systems Science and Engineering. As a result, LSTM demonstrated the best forecast accuracy for Canada with RMSE, MAE, and mean absolute percentage error (MAPE). Devaraj et al. [25] tested ARIMA, LSTM, stacked long short-term memory (SLSTM) and Prophet approaches to predict cumulative confirmed, dead, and recovered global cases. The stacked LSTM algorithm yields higher accuracy with an error of less than 2% as compared to the other considered algorithms for the studied performance metrics. Recently, Xu et al. [26] tested three deep learning models, including CNN, LSTM, and CNN-LSTM, to predict the number of COVID-19 cases in Brazil, India, and Russia. In this study, the authors used data from January 22, 2020, to July 17, 2020, and compared their results with models from Zeroual et al. [21], and the LSTM model outperformed other models.

Fig. 1
figure 1

Example of daily real and 7-day average confirmed cases in Mozambique, Rwanda, Nepal, and Myanmar

Fig. 2
figure 2

Example of daily % of mobility change for different purposes in Mozambique

Almost all the studies presented above focus only on testing different models for COVID-19 cases’ prediction based only on COVID-19 historical data. However, it is known that other factors, such as people’s mobility, temperature, and humidity, play a key role in the COVID-19 spread. There are few studies that have incorporated these factors into deep learning-based approaches to modeling the spread of COVID-19. García-Cremades et al. [27] used COVID-19 data in conjunction with mobility data to test LSTM, GRU, and ARIMA models. Results showed that a multivariate model that includes mobility data provided by Google yields a better forecast. Banerjee and Lian [28] used COVID-19 data and mobility from Google Reports from eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK, to test a LSTM model. The results from the proposed model were compared to the results from the ARIMA model proposed by Hernandez-Matamoros et al. [29], and the LSTM model showed better performance for all the study areas. Rashed and Hirata [30] also used COVID-19 data together with mobility and metrological data from different cities in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka to train a multi-path LSTM model. The results were compared to the data from the Google Cloud Forecast platform [31] and overall showed that the multi-path LSTM model with mobility and meteorological data performed better than other models.

As can be seen from previous studies, many researchers have been mostly focusing on testing the standard LSTM model for COVID-19 spread prediction. However, from the few studies that have compared the standard LSTM model and BiLSTM, the second has proven to be more accurate. Nevertheless, none of the above studies have tested multivariate BiLSTM with COVID-19 cases in conjunction with mobility, temperature, and humidity data for COVID-19 spread forecasting. Secondly, there is a need to evaluate the multivariate BiLSTM model for small and large COVID-19 training datasets. Lastly, there is a lack of studies that evaluate the performance of the multivariate BiLSTM model for large-scale (e.g., country level) and small-scale (e.g., city or district level) COVID-19 spread prediction.

3 Materials and methods

For this study, three types of datasets were used: daily confirmed cases, mobility data, and meteorological data. The time series files that include all these data were downloaded from Google COVID-19 Open Data [32]. Below, a short description of each dataset is provided.

Fig. 3
figure 3

Example of daily maximum temperature, relative humidity, 7-day average maximum temperature, and 7-day average relative humidity for Mozambique, Rwanda, Nepal, and Myanmar

3.1 Daily confirmed cases

The time series data of daily confirmed cases for different study areas represent the daily number of positive cases. Figure 1 shows an example of daily confirmed cases in Mozambique, Rwanda, Nepal, and Myanmar.

From Fig. 1 it is clear that the COVID-19 pandemic has affected countries in different ways. For example, while the pattern of daily cases in Mozambique and Rwanda shows that the pandemic was controlled at the beginning; in Nepal and Myanmar, the daily positive cases reached 1000. In addition to this, Mozambique and Rwanda have controlled the pandemic [33, 34] for the entire study period, with a maximum number of daily cases around 4000 and 1500, respectively. For all four countries, Nepal is the most affected country, with daily positive cases that reach 10,000.

3.2 Mobility data

Since the main vector of COVID-19 transmission is human [35, 36], it is important to incorporate information about human mobility in the prediction models. Therefore, Google COVID-19 Community Mobile Reports, which describe the movement trends over time by geography across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential [37] were considered in the model. Figure 2 presents an example of what the Google Community Mobility Reports look like in Mozambique.

From Fig. 2, it is clear that the mobility trend for different purposes has changed differently over the time compared to the baseline (which represents the median value, for the corresponding day of the week, during the 5-week period, 3 January–06 February 2020) in Mozambique. At the beginning of the pandemic, almost all activities were stopped and people stayed at home. As can be seen in Fig. 2, the percent of mobility for almost all activities went down at the beginning of the pandemic except the percent of mobility trend for places of residence, which increased. Then, the curves went up and down as the pandemic progressed, which can be associated with the different COVID-19 waves.

3.3 Meteorological data

It has been proven that meteorological conditions play an important role in COVID-19 dissemination [38,39,40]. In this study, maximum temperature and relative humidity were selected to be incorporated in the proposed COVID-19 spread prediction model. Figure 3 shows the daily maximum temperature, daily relative humidity, 7-day average maximum temperature, and 7-day average relative humidity for Mozambique, Rwanda, Nepal, and Myanmar.

From Fig. 3 it is clear the seasonality of the temperature and relative humidity in all four countries. While in Mozambique, Nepal, and Myanmar, the average temperature goes up around 40 \(^{\circ }\)C, in Rwanda, the maximum temperature goes up 30 \(^{\circ }\)C. While in Mozambique and Rwanda, the relative humidity goes from 60 to 90%, in Nepal and Myanmar, it goes from 40 to 90%.

3.4 Proposed deep learning-based framework

The proposed framework for data analysis for COVID-19 spread prediction is composed of six main steps: data collection, data preprocessing (data preparation), training the model, evaluating the model, parameter tuning, and making predictions. Figure 4 summarizes the tasks that are done in each step. Some important notations used in this paper with their descriptions are given in Table 1.

Fig. 4
figure 4

Work flow

Table 1 Notations and their meanings in this paper

3.4.1 Bidirectional LSTM (BiLSTM) model design

A LSTM is a type of recurrent neural network (RNN) designed to learn short- and long-term temporal dependencies in a time series. The key feature is that those networks can store information that can be used for future cell processing. This network was introduced by Hochreiter and Schmidhuber [41] to overcome the vanishing gradient problem exhibited by RNN. The significant element of the LSTM is the cell state, \(C_t\), which is regulated by three different gates, namely the forget gate (\(f_t\)), input gate (\(i_t\)), and the output gate (\(o_t\)), each of which performs a specific task, as shown in Fig. 5.

Fig. 5
figure 5

LSTM cell

  1. (1)

    Forget Gate decides which information the network should keep or remove in the memory cell (expressed in Eq. 1) using a sigmoid function (Eq. 2). The output value is between 0 and 1.

    $$\begin{aligned} f_t&= \sigma (W_{xf}*x_t+W_{hf}*h_{t-1}\nonumber \\&\quad +W_{cf}*C_{t-1}+b_f), f_t \in [0,1] \end{aligned}$$
    (1)
    $$\begin{aligned} S(\sigma )&= \frac{1}{1 + e^{-1}} \end{aligned}$$
    (2)

    where \(h_{t-1}\) is the output in the previous state and \(x_t\) is the input of LSTM unit. \(C_{t-1}\) is the state of memory cell at the last moment. \(W_{xf}\), \(W_{hf}\), and \(W_{cf}\) are the connecting weights of \(x_t\), \(h_{t-1}\), and \(C_{t-1}\) with forget gate f, respectively. \(b_f\) represent the biases values. \(\sigma \) is sigmoid activation function.

  2. (2)

    Input Gate is expressed by Eq. 3. It decides what new information the network should add to the cell state based on sigmoid function, and create a vector value (\(\tilde{C}_t\)) (expressed in Eq. 4). Then, the old cell is updated with the new values.

    $$\begin{aligned}&i_t = \sigma (W_{xi}*x_t+W_{hi}*h_{t-1}+W_{ci}*C_{t-1}+b_i) \end{aligned}$$
    (3)
    $$\begin{aligned}&\tilde{C}_t = \tanh (W_{xc}*x_t+W_{hc}*h_{t-1}+W_{cc}*C_{t-1}+b_c), \end{aligned}$$
    (4)
    $$\begin{aligned}&C_t =f_t*C_{t-1} + \imath _{t}*\tilde{C}_t \end{aligned}$$
    (5)

    where \(C_t\) (Eq. 5) represents the memory cell, \(C_{t-1}\) is the memory cell from previous time step, \(\tanh \) is the tanh activation function, \(W_{xi}\) is the connecting weight of \(x_t\) and \(i_t\), \(W_{hi}\) is the connecting weight of \(h_{i-1}\) and \(i_t\), \(W_{ci}\) is the connecting weight of \(C_{i-1}\) and \(i_t\), \(W_{xc}\) is the connecting weight of \(x_t\) and \(\tilde{C}_t\), \(W_{hc}\) is the connecting weight of \(h_{i-1}\) and \(\tilde{C}_t\), \(W_{cc}\) is the connecting weight of \(C_{i-1}\) and \(\tilde{C}_t\), and \(b_i\) and \(b_c\) are the bias terms.

  3. (3)

    Output Gate based on the input and cell state values, the output gate decides what should be the output of the cell. It is expressed by Eq. 6 that permits the cell state either to affect other neurons or not. This is achieved by passing the cell state through a tanh layer and multiplying it with the outcome of the output gate to get the ultimate output (\(h_t\)) defined by Eq. 7.

    $$\begin{aligned}&o_t = \sigma (W_{xo}*x_t+W_{ho}*h_{t-1}+W_{co}*C_{t-1}+b_o) \end{aligned}$$
    (6)
    $$\begin{aligned}&h_t = o_t * \tanh (C_t) \end{aligned}$$
    (7)

    where \(\sigma \) is the sigmoid activation function, \(W_{xo}\) is the connecting weight of \(x_t\) and \(o_t\), \(W_{ho}\) is the connecting weight of \(h_{i-1}\) and \(o_t\), \(W_{co}\) is the connecting weight of \(C_{i-1}\) and \(o_t\), \(b_o\) represents the bias vector, and \(C_t\) represents the memory cell.

In LSTM, memory units can ultimately carry the results from observation data X into the prediction of Y. This means that the process occurs in only one direction, forward, which neglects the backward connection and makes the system less efficient. This drawback is overcome by performing the data training phase in a BiLSTM system in two sequential directions: forward and backward. The idea of BiLSTM comes from bidirectional RNN Schuster and Paliwal [42], and it was applied for the first time by Graves and Schmidhuber [43]. Figure 6 presents the basic structure of a BiLSTM.

Fig. 6
figure 6

Basic structure of a BiLSTM

BiLSTM is a sequence processing model that consists of two LSTMs: one learns the sequence of the input provided, and the second model learns the reverse of that sequence. After training the two models, the results are merged using one of the following functions: concatenation (which is used by default), averaging, sum, or multiplication. In other words, the BiLSTM generates an output vector, \(Y_T\), in which each element is calculated by using the following equation (Eq. 8):

$$\begin{aligned} y_t = \sigma (\overrightarrow{h}, \overleftarrow{h}) \end{aligned}$$
(8)

where \(\sigma \) function is used to combine the two output sequences, which can be a concatenating function, a summation function, an average function, or a multiplication function.

By doing this, it increases the amount of information available to the network, improving the context available to the algorithm. The training process of BiLSTM over time can be summarized as below:

  1. (1)

    Forward Pass

    1. i

      Run input data for time 1 \(\le \) t \(\le \) T through BiLSTM and evaluate the predicted outputs as calculated in standard LSTM

    2. ii

      Run forward pass for forward states from t = 1 to t = T and backward states from t = T to t = 1

    3. iii

      Run the forward pass for output neurons

  2. (2)

    Backward pass

    1. i

      Evaluate the objective function derivative for time 1 \(\le \) t \(\le \) T calculated in forward pass

    2. ii

      Run backward pass for output neurons

    3. iii

      Run backward pass for forward states from t = T to t = 1 and backward states from t = 1 to t = T

  3. (3)

    Update Weights

Existing studies [44,45,46,47] have shown that BiLSTM architectures with several hidden layers can build up progressively higher levels of representations of sequence data and thus work more effectively. Therefore, in this study, the stacked layers mechanism, which can enhance the power of neural networks, is adopted. As mentioned previously, BiLSTM makes use of both forward and backward dependencies. Figure 7 summarizes the network architecture adopted in this study.

Fig. 7
figure 7

Adopted network architecture

As shown in Fig. 7, in this study, four time series data were used, namely daily COVID-19 cases, daily people mobility, daily temperature, and daily relative humidity. The input is prepared using a sliding window (set to seven), i.e., daily COVID-19 cases are predicted based on the previous seven days of data. The adopted network, apart from the input layer, consists of three layers (two BiLSTM layers and one dense layer). The first and second BiLSTM layers consist of 200 and 100 neurons, respectively. The sequence of LSTM cells is fed forward and backward, as depicted in Fig. 6. The input layer has dimension (7,8), where 7 represents the time steps (window size), and 8 is the number of features. To stack BiLSTM layers, we change the configuration of the prior BiLSTM layer to output a 3D array which is used as input for the subsequent layer (layer 2). Layer 2 processes one input sequence of time steps and outputs a single value for the whole sequence as a 2D array which is used as input for the dense layer. The network is trained using the backpropagation [48] algorithm to minimize the mean-squared error between the actual daily cases and the value predicted by the network. The dense layer in the model is composed of a neuron that is connected to every neuron of its preceding layer, from where it receives the outcome.

3.4.2 Hyper-parameter tuning

Hyper-parameter tuning is conducted with trial and error during the training. In the experiments, an Adam optimizer with a learning rate of 0.1 was used for training the BiLSTM model, and the mean-squared error was used as the loss function. The model was trained with 80% of the data for 200 epochs with a batch size of 64. After that, the models with the selected hyper-parameters were used to forecast the number of daily COVID-19 cases in the testing stage with the remaining 20% of data. The model’s accuracy was verified by comparing the measured data with real data via different statistical indicators, including root mean square error (RMSE) and average relative error (ARE) (see evaluation metric).

3.4.3 Evaluation metrics

For the quantitative evaluation, two statistical measures were used, namely the root mean-squared error (RMSE) given by equation (Eq. 9) and the average relative error (ARE) over a period of n days (Eq. 10).

$$\begin{aligned}&RMSE = \sqrt{\sum _{i=1}^{n}\frac{(\hat{y_i} - y_i)^2}{n}} \end{aligned}$$
(9)
$$\begin{aligned}&ARE = \frac{1}{n}\sum _{i=1}^{n} \frac{|y_i - \hat{y_i}|}{y_i} \end{aligned}$$
(10)

where \(y_i\) and \(\hat{y_i}\) are the real and predicted positive cases on day i, respectively.

4 Results and discussion

4.1 Results

The proposed multi-layer BiLSTM was used to predict the COVID-19 spread in four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. Figure 8 shows the real 7-day average daily COVID-19 cases and the predicted 7-day average daily COVID-19 cases using the proposed stacked BiLSTM framework.

Fig. 8
figure 8

Real 7-day average daily COVID-19 cases and the predicted 7-day average daily COVID-19 cases using the proposed stacked BiLSTM framework for Mozambique, Rwanda, Nepal, and Myanmar

In order to evaluate the proposed multi-layer BiLSTM, its RMSE was compared to RMSE of multi-layer LSTM (with the same setting as BiLSTM) for the four countries. Tables 2 presents the computed RMSE for the two models in four developing countries, namely, Mozambique, Rwanda, Nepal, and Myanmar.

Table 2 RMSE of proposed BiLSM and multi-layer LSTM for Mozambique, Rwanda, Nepal, and Myanmar

As can be seen from Table 2, RMSE for BiLSTM varies from 38.53 to 344.95. The RMSE of BiLSTM was compared to RMSE of multi-layer LSTM (with the same settings as BiLSTM), and the results show that BiLSTM outperformed multi-layer LSTM in all developing countries (Mozambique, Rwanda, Nepal, and Myanmar). Moreover, the proposed multi-layer BiLSTM model was applied in regions where ARIMA model [29] and stacked LSTM model [28] were tested, namely Turkey, Italy, Australia, Brazil, Canada, Egypt, Japan, and the UK. The RMSE for each country was computed and compared with the RMSE from multi-layer LSTM and Hernandez-Matamoros et al. [29] who used an ARIMA-based model, and Banerjee and Lian [28] who used a stacked LSTM-based model. The percent improvement of the proposed multi-layer BiLSTM model was computed as a relative RMSE to the second-best model for each country. Table 3 shows the RMSE comparison of the proposed multi-layer BiLSTM for different countries.

Table 3 RMSE comparison of the proposed multi-layer BiLSTM model for different countries and percent of improvement of the proposed model compared to the second best model

From Table 3 it is clear that the proposed multi-layer BiLSTM model outperformed the second-best model (multi-layer LSTM) for at least 12% Japan) and 167% at most (in Turkey).

While this first evaluation focused on comparing the RMSE at the country level, i.e., prediction of COVID-19 cases at the macroscale, there was a need to assess the proposed multi-layer BiLSTM-based model at the microscale, i.e., at the city or municipality level. Therefore, seven data samples, each of 28 days from November 17, 2020, to January 7, 2021, for six cities in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka, were considered. For each period, ARE (Eq. 10) was computed using the proposed multi-layer BiLSTM-based model and compared with the results from the LSTM-based approach considering temperature, humidity, and mobility data, and considering only mobility data (both from Rashed and Hirata [30]), and data from Google Cloud Forecasting Platform [31] obtained using a model that integrates machine learning with compartmental disease model (SEIR model) [49]. Figure 9 shows the periods considered in this study, each of 28 days, as suggested by Rashed and Hirata [30]. These periods include mainly data before, during, and after Christmas.

Fig. 9
figure 9

Periods of prediction data (each of 28 days) used to evaluate the BiLSTM model

Figure 10 shows the 7-day average real COVID-19 cases and 7-day average predicted COVID-19 cases for each period of prediction in six city in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka.

Fig. 10
figure 10

Prediction for each period data (each of 28 days) used to evaluate the BiLSTM model

The colors of periods in Fig. 10 were selected so that they match with those presented in Fig. 9. From Fig. 10, it is clear that Tokyo, Kyoto, and Fukuoka data were predicted very well for all the periods, i.e., the curves for all periods match very well with the ground truth data represented by the black lines (7-day average daily cases). However, data from Osaka and Hyogo were well predicted for the first four periods, but the models did not perform very well for the last three periods. The worst case was registered in Aichi, where the model only performed well for the first three periods and failed to predict for the four last periods.

Table 4 presents the comparison results of ARE between the predicted 7-day average predicted COVID-19 cases and 7-day real COVID-19 cases in different prediction periods (Period-1, Period-2, Period-3, Period-4, Period-5, Period-6, and Period-7) between the proposed multi-layer BiLSTM-based framework, with Google Cloud Forecast (GCF), LSTM-based model considering all the features (\(LSTM_{all}\)), and LSTM-based model considering only the mobility data (\(LSTM_m\)) for each city (Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka)

Table 4 Comparison results of ARE between the predicted 7-day average predicted COVID-19 cases and 7-day real COVID-19 cases in different prediction periods (T-1, T-2, T-3, T-4, T-5, T-6, and T-7) between the proposed multi-layer BiLSTM-based framework with LSTM-based model considering multi-layer architecture (LSTM), Google Cloud Forecast (GCF), LSTM-based model considering all the features (\(LSTM_{all}\)), and LSTM-based model considered only the mobility data (\(LSTM_m\)) for each city (Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka)

Table 4 shows that overall, the proposed multi-layer BiLSTM-based model outperformed the other considered models, namely LSTM, GCF, \(LSTM_{all}\), and \(LSTM_m\), in almost all cities, except in Aichi City, where in some periods the \(LSTM_{all}\) performed better. In order to clearly show the overall performance of the proposed model, the relative ARE to the second-best model in each region was computed, and the results are presented in Fig. 11.

Fig. 11
figure 11

Relative of ARE of the proposed model to the second-best model in each region (Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka)

4.2 Discussion

Fighting against COVID-19 has been challenging for all countries in the world. Accurate and effective COVID-19 spread prediction models have been demanded to help governments and health authorities make informed decisions. Since the first reported case in Wuhan, Hubei Province (China), in December 2019, researchers from different fields have proposed different models, including machine and deep learning-based approaches. Data availability is one of the key factors in building an accurate deep learning-based prediction model. However, at the beginning of the COVID-19 pandemic, the data were still scarce. Nevertheless, the health authorities always needed some tools to support them in fighting against the spread of this infectious disease. ARIMA-based model [29] and stacked LSTM-based model [28] are just some examples of models that were trained with a limited sample of data. However, the results from these studies provided insight on how statistical analysis and deep learning could help fight against the COVID-19 spread with a limited dataset. Since the predictions are expected to be as close as possible to the real data, in this study, a stacked BiLSTM model was trained with data from the same period as used in Hernandez-Matamoros et al. [29] and Banerjee and Lian [28]. The performance of the proposed multi-layer BiLSTM model was evaluated first by comparing its RMSE with the one from multi-layer LSTM (with the same settings as BiLSTM) in four developing countries. The proposed multi-layer BiLSTM model outperformed the multi-layer LSTM model in all four countries. In addition to this, the proposed multi-layer BiLSTM was also tested in eight other countries, namely Turkey, Italy, Australia, Brazil, Canada, Egypt, Japan, and the UK and the results were compared with those from ARIMA and stacked LSTM (both from previous studies). The BiLSTM model, again, outperformed all other models considered in this study.

Even though different countries all over the world were impacted in different ways in terms of social relations, health systems, and economy [50, 51], the proposed multi-layer BiLSTM model proved its potential in predicting future COVID-19 cases with high accuracy compared to the baseline approaches considered in this study. Table 3 clearly shows that the proposed model outperformed the other three approaches considered in this study, even with the limited dataset used to train the model.

Similarly, at the country level, the cities or municipalities were impacted differently due to differences in measures that were put in place to respond to the pandemic [52]. Therefore, an evaluation at the local scale was carried out. The proposed multi-layer BiLSTM was tested in terms of ARE for six regions in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The test was carried out for seven prediction periods (each of 28 days) in each region. From Table 4, it is clear that the proposed multi-layer BiLSTM model outperformed the other four considered models, namely LSTM, GCF, \(LSTM_{all}\), and \(LSTM_m\) in Tokyo, Osaka, Hyogo, Kyoto, and Fukuoka regions in all prediction periods. However, in the Aichi region, the proposed approach performed better in the first three prediction periods but failed in the last four, which were shared by \(LSTM_{all}\), and \(LSTM_m\) models. This might be related to the fact that the number of polymerase chain reaction (PCR) tests decreased during the third and seventh periods because most hospitals were closed from December 28, 2020, to January 3, 2021, suggesting that the new positive cases were reported in the week of January 4, 2021 [30]. The low-quality data from the Aichi region for these periods were not captured by the proposed model, and hence, the results were not good.

5 Conclusions

In this study, a deep learning approach for COVID-19 spread prediction in developing countries was investigated. The investigated model, well known as BiLSTM, with two hidden layers, one of 200 neurons and another of 100 neurons, was trained with data from four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. The results from the proposed multi-layer BiLSTM were compared to the multi-layer LSTM (with the same settings as BiLSTM) in terms of RMSE. Multi-layer BiLSTM outperformed multi-layer LSTM model in all four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. Moreover, the performance of the multi-layer BiLSTM model was evaluated in other settings through the computation of RMSE for eight countries, namely Turkey, Italy, Australia, Brazil, Canada, Egypt, Japan, and the UK. The proposed multi-layer BiLSTM outperformed the three other models, namely multi-layers LSTM, LSTM and ARIMA trained for the same period from 12% (worst case) to 167% (best case). The proposed multi-layer BiLSTM model was also evaluated for seven periods (of 28 days each) in six cities in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The mean ARE showed that the proposed multi-layer BiLSTM model outperformed the four other models trained with the same data, namely, multi-layer LSTM (with same settings as BiLSTM), Google Cloud Forecasting, an LSTM-based model with mobility data only, and a LSTM-based model with mobility, temperature, and relative humidity data in all seven periods in Tokyo, Osaka, Hyogo, Kyoto, and Fukuoka. However, it failed to outperform the LSTM-based model with mobility, temperature, and relative humidity data in the Aichi region in the last four periods. This is related to the quality of data during the Christmas and New Year period since most of the hospitals were closed and the model could not capture these changes. This study focused on testing the BiLSTM model in developing countries. However, there is still a lack of studies that test deep learning techniques in developing countries. Therefore, other studies that implement different techniques in developing countries are needed to certify the robustness of the proposed multi-layer BiLSM model.

6 Data sharing statement

All data analyzed in the present study were gathered from media and government resources, which are freely available in the public domain.