Introduction

The coronavirus disease 2019 (COVID-19) pandemic officially emerged in Wuhan, China, in December 2019. As of January 21, 2021, it had affected 219 countries and territories, with almost 100 million cases and over 2 million deaths [77]. At the time of writing, the most severely affected countries include both advanced and developing economies, such as Brazil, France, India, Italy, Russia, Spain, the UK, and the US. From October to December 2020, several European countries, including Italy, experienced a worrisome surge of COVID-19 infections.

Italy was the first European country to be severely impacted by COVID-19, and it remained one of the main epicenters of the pandemic for approximately 2 months, i.e., from mid-February 2020 to mid-April 2020. After that first peak, the pandemic curve progressively decreased until mid-August 2020. However, the spread of infection accelerated again in the late summer and early fall of 2020, and this second surge continues today. As of January 21, 2021, Italy has suffered 84,202 deaths and 2,428,221 cases.

The likelihood of new and consecutive COVID-19 waves is real, and efforts to study the pandemic’s trajectory are imperative for planning the procurement of medical devices, managing health centers, clinics, hospitals, and other healthcare facilities, and allocating ordinary and intensive care beds.

Thus, the first goal of this paper is to provide short-term and mid-term forecasts for the number of patients hospitalized with COVID-19 during the second wave of COVID-19 infections, i.e., during the period after October 13, 2020. COVID-19-related hospitalization trends offer a clear picture of the overall pressure on the national healthcare system. Moreover, models fitted to hospitalized patients are usually more reliable and accurate than models fitted to confirmed cases [30].Footnote 1 The paper’s second goal is to compare and investigate the accuracy of several statistical methods.

In particular, I estimated four time series forecast techniques and all of their feasible hybrid combinations: the autoregressive integrated moving average (ARIMA) model, innovations state space models for exponential smoothing (ETS), the neural network autoregression (NNAR) model, and the trigonometric exponential smoothing state space model with Box–Cox transformation, ARMA errors, and trend and seasonal components (TBATS).

The rest of this paper is organized as follows. “Related literature” reviews the relevant literature while “Materials and methods” presents the data used in the analysis and discusses the empirical strategy. “Evaluation metrics” presents the evaluation metrics used to measure the performance of the models. “Results and discussion” discusses the main findings and policy implications. Finally, “Conclusions” provides some conclusive considerations.

Related literature

From the beginning of 2020, an increasing body of literature has employed various approaches to forecast the spread of the COVID-19 outbreak [9, 22, 26, 58, 73, 78, 79, 83, 85]. The most frequently used were ARIMA models [3, 8, 14, 62], ETS models [13, 44], artificial neural network (ANN) models [55, 75], TBATS models [68, 71], models derived from the susceptible–infected–removed (SIR) basic approach [22, 26, 58, 78, 85], and hybrid models [15, 29, 68, 69]. The implementation and comparison of these approaches—with the exception of mechanistic–statistical models (such as SIR)—represents the core of this paper.

Ala’raj et al. [2] utilized a dynamic hybrid model based on a modified susceptible–exposed–infected–recovered–dead (SEIRD) model with ARIMA corrections of the residuals. They provided long-term forecasts for infected, recovered, and deceased people using a US COVID-19 dataset, and their model had a remarkable ability to make accurate predictions. Using a nonseasonal ARIMA model, Ceylan [14] made short-term predictions of cumulative confirmed cases after April 15, 2020, for France, Italy, and Spain. The forecasts showed low mean absolute percentage errors (MAPE) and seemed to be sufficiently reliable and suitable for the short-term epidemiological analysis of COVID-19 trends.

Hasan [29] proposed a hybrid model that incorporates ensemble empirical mode decomposition (EEMD) and neural networks to forecast real-time global COVID-19 cases for the period after May 18, 2020. The analysis showed that the ANN-EEMD approach was quite promising and outperformed traditional statistical methods, such as regression analysis and moving average.

Ribeiro et al. [65] provided short-term estimates of COVID-19 cumulative confirmed cases in Brazil by employing multiple approaches and selecting several models, such as ARIMA, cubist regression (CUBIST), random forest (RF), ridge regression (RIDGE), support vector regression (SVR), and stacking-ensemble learning (SEL). The models’ reliabilities were evaluated based on the improvement index, mean absolute error (MAE), and symmetric MAPE criteria. The analysis demonstrated that SVR and SEL performed best, but all models exhibited good forecasting performances.

Using ARIMA, TBATS, their statistical hybrid, and a mechanistic mathematical model combining the best of the previous models, Sardar et al. [68] attempted to forecast daily COVID-19 confirmed cases across India and in five different states (Delhi, Gujarat, Maharashtra, Punjab, and Tamil Nadu) from May 17, 2020, until May 31, 2020. The ensemble model showed the best prediction skills and suggested that daily COVID-19 cases would increase significantly in the considered forecast window and that lockdown measures would be more effective in states with the highest percentages of symptomatic infection.

Wieczorek et al. [75] implemented deep neural network architectures, which learned by using a Nesterov-accelerated adaptive moment (Nadam) training model, to forecast cumulative confirmed COVID-19 cases in several countries and regions. The predictions, which referred to different time windows, revealed that the models had an extremely high level of accuracy (approximately 87.7% for most regions but, in some cases, reaching almost 100%).

Talkhi et al. [71] attempted to forecast the number of COVID-19 confirmed infections and deaths in Iran between August 15, 2020, and September 14, 2020, using several single and hybrid models. The extreme learning machine (ELM) and hybrid ARIMA–NNAR models were the most suitable for forecasting confirmed cases, while the Holt–Winters (HW) approach outperformed the others in predicting death cases.

Finally, Table 1 reports 30 international studies that utilized single or hybrid ARIMA, ETS, neural network, and TBATS models to forecast the transmission patterns of COVID-19 across the world.

Table 1 30 selected international studies that utilized single or hybrid ARIMA, ETS, neural network, and TBATS models.

Materials and methods

The data used in this article, which comprise 236 observations, refer to the real-time number of COVID-19 hospitalizations of patients with mild symptoms and patients assigned to the ICU in Italy from February 21, 2020, to October 13, 2020. I extracted the data from the official Italian Ministry of Health’s website (www.salute.gov.it). The confirmed COVID-19-related hospitalization trends appear in Fig. 1.

Fig. 1
figure 1

Source: Italian Ministry of Health [43]

Patients hospitalized with mild symptoms and in the ICU from February 21, 2020 to October 13, 2020.

The data showed that the number of COVID-19 patients hospitalized with mild symptoms and the number of COVID-19 patients assigned to the ICU reached an initial peak on April 4, 2020. They then followed a downward trend until mid-August before accelerating again from the end of September 2020 to mid-October 2020. Recognizing that the use of only one model is never wise and may lead to unreliable forecasts [68], I computed the forecasts by employing different statistical techniques and their combinations. Specifically, linear ARIMA models, ETS models, linear and nonlinear NNAR models, TBATS models, and their feasible hybrid combinations were examined.

ARIMA models, which were first proposed by Box and Jenkins [11], represent one of the most widely used frameworks for epidemic/pandemic and disease time series predictions [66, 84]. Considering only the linear trend of a time series, they are able to capture both nonseasonal and seasonal patterns of that series. The following should be noted regarding the nonseasonal component of ARIMA models: (i) the autoregressive (AR) process forecasts a time series using a linear combination of its past values; (ii) differencing (I) is required to make the time series stationary by removing (or mitigating) any trend or seasonality; and (iii) the moving average (MA) process forecasts future values using a linear combination of previous forecast errors. The seasonal component is similar to the nonseasonal component but implies backshifts of the seasonal period, i.e., adding seasonal parameters to the AR, I, and MA components, which allows the model to handle most of the seasonal patterns in real-world data. Therefore, the final seasonal model can be denoted as ARIMA (p,d,q)(P,D,Q)m, where m is the seasonal period and the lowercase and uppercase letters indicate the number of nonseasonal and seasonal parameters, respectively, for each of the three components ([34], Sect. 8).

The ETS class of models was introduced in the late 1950s [12, 31, 76] to consider different combinations of trend and seasonal components. The basic ETS model consists of two main equations: a forecast equation and a smoothing equation. By integrating these two equations into an innovations state space model, which may correspond to the additive (A) or multiplicative (M) error assumption, it is possible to obtain an observation/measurement equation and a transition/state equation, respectively.Footnote 2 The first equation describes the observed data, while the second describes the behavior of the unobserved states. The states refer to the level, trend, and seasonality. The trend and seasonal components may be none (N), additive (A), additive damped (Ad),Footnote 3 or multiplicative (M), resulting in a wide range of model combinations. The final model assumes the form of a three-character string (Z,Z,Z), where the first letter identifies the error assumption of the state space model, the second letter identifies the trend type, and the third letter identifies the season type. These models produce a time series forecast by using the weighted average of the series’ past values, giving more weight to recent observations ([34], Sect. 7; [40]).

NNAR models can be viewed as a network of neurons, or nodes, that depicts complex nonlinear relationships and functional forms. In a basic neural network framework, the neurons are organized in two layers: (i) the bottom layer identifies the original time series, and (ii) the top layer identifies the predictions. The resulting model is equivalent to a simple linear regression and becomes nonlinear only when an intermediate layer with “hidden neurons” is included. For seasonal data, NNAR models can be described with the notation NNAR (p,P,k)m, where m is the seasonal period, p denotes the number of nonseasonal lagged inputs for the linear AR process, P represents the seasonal lags for the AR process, and k indicates the number of nodes/neurons in the hidden layer ([34], Sect. 11.3).

Finally, TBATS models are a class of models that combine different approaches: trigonometric terms for modeling seasonality, the Box–Cox transformation [10] for addressing heterogeneity, ARMA errors for addressing short-term dynamics, damped trends (if present), and seasonal components. Therefore, TBATS models have several properties: (i) they deal well with very complex seasonal patterns, which might, for example, exhibit daily, weekly, and annual patterns simultaneously; (ii) they are able to consider nonlinear time series patterns; and (iii) they can handle any type of autocorrelation in the residuals ([34], Sect. 11.1; [71]).

The combination of different time series forecast methods maximizes the chance of capturing seasonal, linear, and nonlinear patterns [60, 82] and is especially useful for predicting real-world phenomena, such as the COVID-19 pandemic, that are characterized by complex dynamics [7]. As has been well established since the seminal work of Bates and Granger [6], combining techniques with unique properties can yield better performance and forecast accuracy.Footnote 4 The models were calculated by using the following analytical procedures:

  • ARIMA models were detected by applying the “auto.arima()” function included in the package “forecast” (in the R environment) and developed by Hyndman and Khandakar [35]. This function follows sequential steps to identify the best model, i.e., the number of p parameters of the autoregressive (AR) process, the order d of differencing (I), the number of q parameters of the MA process, and the number of parameters of the seasonal component. It combines unit root testsFootnote 5 with the minimization of the following estimation criteria: the bias-corrected Akaike’s information criterion (AICc)Footnote 6 and the maximum likelihood estimation (MLE). The unit root tests identify the order of differencing, while the AICc and the MLE methods identify the order and the values of the parameters, respectively, of the seasonal and nonseasonal AR and MA processes;

  • ETS models were identified by using the “ets()” function included in the package “forecast” (in the R environment) and developed by Hyndman et al. [39].Footnote 7 In particular, I applied the Box–Cox [10] transformation to the data before estimating the model and then used the AICc metric to determine if the trend type was damped or not. The final three-character string identifying method (Z,Z,Z) was selected automatically;

  • NNAR models were identified via the “nnetar()” function included in the package “forecast” (in the R environment) written by Hyndman [33].Footnote 8 I proceeded as follows: (i) first, the Box–Cox transformation [10] was applied to the data before estimating the model; (ii) second, the optimal number of nonseasonal p lags for the AR(p) process was obtained by using the AICc metric; (iii) third, the seasonal P lags for the AR process were set to 1Footnote 9; and (iv) finally, the optimal number of neurons was identified using the formula \(k=\frac{(p+P+1)}{2}\) [34], Sect. 11.3);

  • TBATS models were identified using the “tbats()” function included in the package “forecast” (in the R environment) as described in De Livera et al. [17]. The optimal Box–Cox transformation parameter, ARMA (p,q) order, damping parameter, and number of Fourier terms were selected using the Akaike’s information criterion (AIC) metric;Footnote 10

  • Hybrid models were identified via the “hybridModel()” function included in the “forecastHybrid” package (in the R environment) developed by Shaub and Ellis.Footnote 11 The individual time series forecasting methods were combined as follows: (i) first, the Box–Cox power transformation [10] was applied to the inputs to increase the plausibility of the normality assumption; and (ii) then, the individual models were combined using both equal weights and cross-validated errors (“cv.errors”), which gave greater weight to the models that performed relatively better. In fact, since the best weighting procedure has not been established, I adopted a parsimonious approach and chose the one that performed better. Specifically, I tested the overall goodness-of-fit of all models with four common forecast accuracy measures: MAE, MAPE, mean absolute scaled error (MASE), and root mean square error (RMSE).
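To illustrate the combination step, the following Python sketch is my own analogue of the weighting logic (the paper's estimations were carried out in R with the “forecast” and “forecastHybrid” packages); the model names, forecast values, and cross-validated errors below are hypothetical:

```python
import numpy as np

def combine_forecasts(forecasts, cv_errors=None):
    """Combine individual model forecasts.

    forecasts : dict mapping model name -> array of h-step-ahead forecasts.
    cv_errors : optional dict mapping model name -> cross-validated error
                (e.g., RMSE); if given, weights are inversely proportional
                to these errors, so better-performing models weigh more.
    Returns the combined forecast array.
    """
    names = sorted(forecasts)
    mat = np.array([forecasts[m] for m in names], dtype=float)
    if cv_errors is None:
        weights = np.full(len(names), 1.0 / len(names))  # equal weights
    else:
        inv = np.array([1.0 / cv_errors[m] for m in names])
        weights = inv / inv.sum()  # normalize weights to sum to 1
    return weights @ mat

# Hypothetical 3-step-ahead forecasts from two models
f = {"arima": np.array([100.0, 110.0, 120.0]),
     "nnar":  np.array([104.0, 114.0, 124.0])}
print(combine_forecasts(f))                               # equal weights
print(combine_forecasts(f, {"arima": 2.0, "nnar": 1.0}))  # error-based weights
```

With equal weights, each horizon is the simple average of the two forecasts; with error-based weights, the model with the smaller cross-validated error (here, the hypothetical NNAR) receives two-thirds of the weight.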

The estimated basic equation for the ARIMA was the following [16]:

$${\Delta }^{d}{y}_{t}={\phi }_{1}{\Delta }^{d}{y}_{t-1}+\dots +{\phi }_{p}{\Delta }^{d}{y}_{t-p}+{\gamma }_{1}{\varepsilon }_{t-1}+\dots +{\gamma }_{q}{\varepsilon }_{t-q}+{\varepsilon }_{t},$$
(1)

where \({\Delta }^{d}\) is the difference operator of order d, \({y}_{t}\) indicates the predicted values, p is the lag order of the AR process, \({\phi }_{i}\) are the coefficients of the p AR parameters, q is the order of the MA process, \({\gamma }_{i}\) are the coefficients of the q MA parameters, and \({\varepsilon }_{t}\) denotes the error term at time t.
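To make Eq. (1) concrete, here is a minimal Python sketch (my own illustration; the coefficients and data are hypothetical, not the paper's fitted values) of a one-step-ahead forecast with first differencing (d = 1):

```python
import numpy as np

def arima_one_step(y, phi, gamma, resid):
    """One-step forecast of Delta y_{t+1} per Eq. (1) with d = 1,
    then undifference to recover the forecast of y_{t+1}.

    y     : observed series (1-D array)
    phi   : AR coefficients [phi_1, ..., phi_p] on the differenced series
    gamma : MA coefficients [gamma_1, ..., gamma_q]
    resid : most recent residuals [eps_t, eps_{t-1}, ...]
    """
    dy = np.diff(y)  # Delta y_t
    ar = sum(p * dy[-(i + 1)] for i, p in enumerate(phi))   # AR part
    ma = sum(g * resid[i] for i, g in enumerate(gamma))     # MA part
    dy_next = ar + ma                # forecast of Delta y_{t+1}
    return y[-1] + dy_next           # undifference: y_{t+1} = y_t + Delta y_{t+1}

y = np.array([10.0, 12.0, 15.0, 19.0])  # hypothetical hospitalization counts
print(arima_one_step(y, phi=[0.5], gamma=[0.2], resid=[1.0]))
```

The differenced series is (2, 3, 4), so the AR term contributes 0.5 × 4 = 2 and the MA term 0.2 × 1 = 0.2, giving a forecast of 19 + 2.2 = 21.2.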

The estimated equations for the basic ETS (A,N,N) model with additive errors were the following [34], Sect. 7):

$$\mathrm{F}\mathrm{o}\mathrm{r}\mathrm{e}\mathrm{c}\mathrm{a}\mathrm{s}\mathrm{t}\;\mathrm{e}\mathrm{q}\mathrm{u}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}:{\widehat{y}}_{t+1|t}={l}_{t}.$$
(2)
$$\mathrm{S}\mathrm{m}\mathrm{o}\mathrm{o}\mathrm{t}\mathrm{h}\mathrm{i}\mathrm{n}\mathrm{g}\;\mathrm{e}\mathrm{q}\mathrm{u}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}:{l}_{t}={l}_{t-1}+\alpha \left({y}_{t}-{l}_{t-1}\right),$$
(3)

where \({l}_{t}\) is the new estimated level, \({\widehat{y}}_{t+1|t}\) denotes each one-step-ahead prediction for time t+1, which results from the weighted average of all the observed data, \(0\le \alpha \le 1\) is the smoothing parameter, which controls the rate of decrease of the weights, and \({y}_{t}-{l}_{t-1}\) is the error at time t. Hence, each forecasted observation is the sum of the previous level and an error, and each type of error, additive or multiplicative, corresponds to a specific probability distribution. For a model with additive errors, as is the case here, the errors are assumed to follow a normal distribution. Thus, Equations (2) and (3), respectively, can be rewritten as follows:

$$\mathrm{O}\mathrm{b}\mathrm{s}\mathrm{e}\mathrm{r}\mathrm{v}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\;\mathrm{e}\mathrm{q}\mathrm{u}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}:{y}_{t}={l}_{t-1}+{\varepsilon}_{t},$$
(4)
$$\mathrm{T}\mathrm{r}\mathrm{a}\mathrm{n}\mathrm{s}\mathrm{i}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\;\mathrm{e}\mathrm{q}\mathrm{u}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}:{l}_{t}={l}_{t-1}+\alpha {\varepsilon}_{t.}$$
(5)

Equations (4) and (5) represent the innovation state space models that underlie the exponential smoothing methods.
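The recursion in Eqs. (2)–(5) can be sketched directly; the following Python illustration (mine, with made-up data and smoothing parameter) updates the level by a fraction of each one-step forecast error and returns the next forecast:

```python
def ses_forecast(y, alpha, l0):
    """Simple exponential smoothing, i.e., ETS(A,N,N), per Eqs. (2)-(5).

    Each smoothed level equals the previous level plus alpha times the
    one-step forecast error; the forecast for t+1 is the current level.
    """
    level = l0
    for obs in y:
        error = obs - level            # y_t - l_{t-1}, Eq. (3)
        level = level + alpha * error  # smoothing/transition equation
    return level                       # forecast y_hat_{t+1|t}, Eq. (2)

# Hypothetical series, smoothing parameter, and initial level
print(ses_forecast([10.0, 12.0, 14.0], alpha=0.5, l0=10.0))  # → 12.5
```

With α = 0.5 the level moves halfway toward each new observation (10 → 10 → 11 → 12.5), showing how recent observations receive exponentially greater weight.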

The basic form of the neural network autoregression equation was the following [34], Sect. 11.3):

$${y}_{t}=f\left({y}_{t-1}\right)+{\varepsilon}_{t},$$
(6)

where \({y}_{t}\) indicates the predicted values, \({y}_{t-1}{=\left({y}_{t-1},{y}_{t-2},\dots ,{y}_{t-n}\right)}^{{'}}\) is a vector containing the n lagged values of the observed data, f is a neural network with hidden neurons organized in a single layer, and \({\varepsilon }_{t}\) is the error at time t. A simple graphical example of a nonlinear neural network is shown in Fig. 2.
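A minimal sketch of the nonlinear function f in Eq. (6) follows (my own Python illustration with randomly drawn weights; the paper estimated its NNAR models with R's “nnetar()”, not this code). It shows a single hidden layer mapping lagged inputs to a one-step prediction:

```python
import numpy as np

def nnar_forward(lags, W1, b1, w2, b2):
    """Forward pass of a single-hidden-layer autoregression, Eq. (6):
    the prediction is f applied to the vector of lagged values.

    lags   : vector (y_{t-1}, ..., y_{t-p}) of lagged inputs
    W1, b1 : hidden-layer weights and biases
    w2, b2 : output-layer weights and bias
    """
    hidden = np.tanh(W1 @ lags + b1)  # nonlinear hidden layer
    return float(w2 @ hidden + b2)    # linear output neuron

rng = np.random.default_rng(0)
p, k = 4, 3  # 4 lagged inputs, 3 hidden neurons (as in Fig. 2)
W1 = rng.normal(size=(k, p)); b1 = rng.normal(size=k)
w2 = rng.normal(size=k); b2 = 0.0
print(nnar_forward(np.array([1.0, 2.0, 3.0, 4.0]), W1, b1, w2, b2))
```

Without the hidden layer (or with zero hidden weights) the output collapses to a linear function of the lags, which is why the model becomes nonlinear only once hidden neurons are introduced.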

Fig. 2
figure 2

A neural network with four inputs and an intermediate layer with three hidden neurons

Finally, the basic equation of the TBATS model took the following form [17]:

$${y}_{t}^{({\omega })}={\mathrm{l}}_{t-1}+\phi {b}_{t-1}+\sum _{i=1}^{T}{s}_{t-{m}_{i}}^{(i)}+{d}_{t},$$
(7)

where \({y}_{t}^{(\omega )}\) denotes the observation \({y}_{t}\) after a Box–Cox transformation with parameter ω, \({l}_{t-1}\) is the local level, \(\phi\) is the trend damping parameter, \({b}_{t-1}\) is the trend, T denotes the number of seasonal patterns, \({s}_{t}^{(i)}\) is the ith seasonal component,Footnote 12 \({m}_{i}\) denotes the seasonal periods, and \({d}_{t}\) indicates an ARMA (p,q) process for the residuals.
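The measurement equation (7) is simply additive in its components; a tiny Python sketch (my own, with entirely hypothetical component values) makes the decomposition explicit:

```python
def tbats_measurement(level, trend, phi, seasonal, arma_err):
    """Measurement equation of TBATS, Eq. (7): the Box-Cox-transformed
    observation equals level + damped trend + sum of the seasonal
    components + an ARMA error term."""
    return level + phi * trend + sum(seasonal) + arma_err

# Hypothetical components: level 100, trend 2 damped by 0.9,
# two seasonal components, and a small ARMA residual
print(tbats_measurement(100.0, 2.0, 0.9, [1.5, -0.5], 0.3))
```

Each fitted TBATS model then reduces to estimating these components (and the Box–Cox parameter ω) from the data; the transformed observation here is 100 + 1.8 + 1.0 + 0.3 = 103.1.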

Evaluation metrics

The main metrics used to compare the performances of the single and hybrid prediction models were MAE, MAPE, MASE, and RMSE. The formulae used to calculate each of these metrics appear below (Eqs. 8–11):

$$\mathrm{M}\mathrm{A}\mathrm{E}=\frac{1}{n}\sum _{i=1}^{n}\left|{y}_{i}-{\hat{y}}_{i}\right|,$$
(8)
$$\mathrm{M}\mathrm{A}\mathrm{P}\mathrm{E}=\frac{1}{n}\sum _{\;i=1}^{n}\frac{\left|{y}_{i}-{\hat{y}}_{i}\right|}{{y}_{i}}{*}100\mathrm{\%},$$
(9)
$$\mathrm{M}\mathrm{A}\mathrm{S}\mathrm{E}=\frac{1}{n}\sum _{i=1}^{n}\left(\frac{\left|{y}_{i}-{\hat{y}}_{i}\right|}{\frac{1}{n-1}{\sum }_{j=2}^{n}\left|{y}_{j}-{y}_{j-1}\right|}\right),$$
(10)
$$\mathrm{R}\mathrm{M}\mathrm{S}\mathrm{E}=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{\left({y}_{i}-{{\hat{y}}}_{i}\right)}^2},$$
(11)

where n represents the number of observations, \({y}_{i}\) denotes the actual values, and \(\widehat{{y}_{i}}\) indicates the predicted values. Specifically, MAE and RMSE are both scale-dependent measures, although based on different errors. MAE is easier to interpret because minimizing it leads to predictions of the median, while minimizing RMSE leads to predictions of the mean. In fact, the first metric is based on absolute errors, while the second is based on squared errors. MAPE is probably the most widely employed error measure [27, 47], and unlike MAE and RMSE, it is not scale-dependent because it is based on percentage errors. Thus, it has the advantage of being a unit-free metric. However, it also requires some critical considerations. For example, it can lead to biased forecasts because it gives infinite or undefined results when one or more time series data points equal 0, and it puts a heavier penalty on negative errors (i.e., when predicted values are higher than actual values) than on positive errors. Finally, MASE, which was proposed by Hyndman and Koehler [36], is a scale-free error metric and probably the most versatile and reliable measure of forecast accuracy. It is superior to MAPE in that it does not give infinite or undefined values and can be used to compare forecast accuracy both on single and on multiple time series.Footnote 13 Since each metric thus entails specific strengths and disadvantages, I opted for the prudent approach of evaluating the output of all of them.
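The four metrics in Eqs. (8)–(11) translate directly into code; this Python sketch (my own, with made-up actual and predicted values) mirrors the formulae, scaling MASE by the in-sample error of the naive no-change forecast:

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error, Eq. (8)."""
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    """Mean absolute percentage error, Eq. (9).
    Undefined when any y_i == 0 (one of the caveats noted above)."""
    return float(np.mean(np.abs(y - yhat) / y) * 100)

def mase(y, yhat):
    """Mean absolute scaled error, Eq. (10): errors scaled by the
    in-sample MAE of the naive (no-change) forecast."""
    naive_mae = np.mean(np.abs(np.diff(y)))
    return float(np.mean(np.abs(y - yhat)) / naive_mae)

def rmse(y, yhat):
    """Root mean square error, Eq. (11)."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

y = np.array([100.0, 110.0, 120.0, 130.0])      # hypothetical actuals
yhat = np.array([102.0, 108.0, 121.0, 133.0])   # hypothetical predictions
print(mae(y, yhat), mape(y, yhat), mase(y, yhat), rmse(y, yhat))
```

On this toy series, a MASE of 0.2 means the forecast errors are one-fifth the size of the naive forecast's in-sample errors, i.e., far better than no-change forecasting.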

Results and discussion

Table 2 reports the best selected parameters for the single modelsFootnote 14 while Tables 3 and 4 include the forecast accuracy measures for the single and hybrid models.Footnote 15 For patients hospitalized with mild symptoms (Table 2), the optimal single models were the seasonal ARIMA (1,2,3) (0,0,1)7, ETS (A,Ad,N),Footnote 16 NNAR (7,1,4)7, and TBATS (0.428, {2,2}, 1, {< 7,2 >}).Footnote 17 For patients hospitalized in the ICU (Table 2), the optimal single models were the seasonal ARIMA (1,2,2)(0,0,1)7, ETS (A,A,N),Footnote 18 NNAR (6,1,4)7, and (T)BATS (0.427,{0,0},1,−).Footnote 19 The hybrid models were derived by combining the optimal single models with equal weights, which proved to be more suitable than weights based on error values.Footnote 20

Table 2 Structure of the single models for patients hospitalized with mild symptoms and in the ICU
Table 3 Forecast accuracy measures for the single and hybrid models (patients hospitalized with mild symptoms)
Table 4 Forecast accuracy measures for the single and hybrid models (patients hospitalized in the ICU)

In particular, for patients hospitalized with mild symptoms, the most accurate single model was NNAR, followed by seasonal ARIMA, while the best hybrid model was NNAR–TBATS, followed by ARIMA–NNAR, and ARIMA–NNAR–TBATS. For patients hospitalized in the ICU, the best single model was NNAR, followed by ARIMA, while the best hybrid model was ARIMA–NNAR, followed by ARIMA–ETS–NNAR, and ETS–NNAR.Footnote 21

The autocorrelation function (ACF) indicated that current values were not correlated with previous values at lag 1 (Tables 2 and 3). In fact, the correlation coefficient between one point and the next in the time series ranged from − 0.07 to 0.19 for patients hospitalized with mild symptoms and from − 0.16 to 0.31 for patients in the ICU. The highest values (0.19 and 0.31) were obtained for the TBATS model.Footnote 22

According to Lewis’ [52] interpretation, since MAPE was always significantly lower than 10%, all predictive models can be considered highly accurate. Moreover, MASE was much lower than 1 for all models; therefore, all the proposed forecasting approaches performed significantly better than the forecasts from the (no-change) “naïve” method, i.e., forecasts with no adjustments for causal factors [36], which justifies the use of more complex and sophisticated models.

Tables 5 and 6 compare the hybrid models with the respective single models considering the minimization of the MAE, MAPE, MASE, and RMSE metrics. For patients hospitalized with mild symptoms, the hybrid models outperformed the respective single models in 98 out of 112 metrics, i.e., on 87.5% of all the forecast accuracy measures. For patients hospitalized in the ICU, the hybrid models outperformed the respective single models in 81 out of 112 metrics, i.e., on 72.3% of all measures. In the latter case, however, almost all losses of efficiency (25 out of 31) were attributable to NNAR.Footnote 23 Thus, the hybrid models generally increased forecast accuracy, but this increase was more evident for patients hospitalized with mild symptoms.

Table 5 Comparison between hybrid models and respective single models considering the minimization of MAE, MAPE, MASE, and RMSE metrics (in percentage), for patients hospitalized with mild symptoms
Table 6 Comparison between hybrid models and respective single models considering the minimization of MAE, MAPE, MASE, and RMSE metrics (in percentage), for patients hospitalized in the ICU

The best hybrid model for patients hospitalized with mild symptoms, NNAR–TBATS, outperformed the single NNAR and TBATS models by 5.63–9.23% on MAE and 5.88–9.95% on RMSE, while the best hybrid model for patients hospitalized in the ICU, ARIMA–NNAR, outperformed the single ARIMA and NNAR models by 1.2–8.67% on MAE and 2.81–11.44% on RMSE. Figures 3 and 4 graphically represent the models for both time series ranked by the MAE and RMSE metrics.

Fig. 3
figure 3

Models ranked by the MAE and RMSE metrics for patients hospitalized with mild symptoms

Fig. 4
figure 4

Models ranked by the MAE and RMSE metrics for patients hospitalized in the ICU

Figures 5 and 6 show the best six models and the remaining nine models for patients hospitalized with mild symptoms, respectively. Similarly, Figs. 7 and 8 show the best six models and the remaining nine models for patients hospitalized in the ICU, respectively. The light blue area in each graph shows the prediction intervals at 80%, while the dark blue area shows the prediction intervals at 95%.Footnote 24 The forecasts of the best single and hybrid models anticipated an increase in the number of patients hospitalized with mild symptoms and in the number of patients admitted to the ICU over the next 30 days, i.e., from October 14, 2020, to November 12, 2020. This predicted trend was also confirmed by the remaining estimated models.

Fig. 5
figure 5

The six best forecast models for predicting patients with mild symptoms. Notes: the models were ranked (from first to sixth place) on the MAE metric

Fig. 6
figure 6

The remaining nine forecast models for predicting patients hospitalized with mild symptoms. Notes: The models were ranked (from seventh to fifteenth place) on the MAE metric

Fig. 7
figure 7

The six best forecast models for predicting patients hospitalized in the ICU. Notes: the models were ranked (from first to sixth place) on the MAE metric

Fig. 8
figure 8

The remaining nine forecast models for predicting patients hospitalized in the ICU. Notes: the models were ranked (from seventh to fifteenth place) on the MAE metric

Specifically, the NNAR–TBATS, ARIMA–NNAR, and ARIMA–NNAR–TBATS models predicted that: (i) after 10 days (October 23), the number of patients hospitalized with mild symptoms should have been 9624, 9397, and 9259, respectively; (ii) after 20 days (by November 2), the number should have been 18,000, 16,986, and 16,062, respectively; and (iii) after 30 days (by November 12), the number should have been 25,039, 21,669, and 21,430, respectively (Fig. 5). Regarding the number of patients hospitalized in the ICU, the ARIMA–NNAR, NNAR, and ARIMA–ETS–NNAR models predicted that: (i) after 10 days, the required number of intensive care beds should have been 1175, 972, and 1114, respectively; (ii) after 20 days, the required number should have been 2164, 1493, and 1915, respectively; and (iii) after 30 days, the number should have been 3270, 1985, and 2726, respectively (Fig. 7).Footnote 25

Figures 9, 10, 11 and 12 compare all 15 of the estimated models and the observed data over the period October 14, 2020, to November 12, 2020. For patients hospitalized with mild symptoms, the NNAR–TBATS model best fit the observed data (Fig. 9). Of the remaining models, predictions from ARIMA–NNAR, ARIMA–NNAR–TBATS, and NNAR most closely approximated the observed data (Figs. 9, 10). For patients hospitalized in the ICU, Fig. 11 reveals that the ARIMA–NNAR, NNAR–TBATS, and ARIMA–NNAR–TBATS hybrid models also fit the observed data quite well. All estimated models generally exhibited a strong match between predictions and observed data, except for ETS, NNAR, and ETS–TBATS, where the trends differed (Figs. 11 and 12). Notably, the models with the lowest loss of efficiency adapted better to the observed data, which confirms the consistency and robustness of the statistical approach.

Fig. 9
figure 9

Comparison between forecasts and real data during the period October 14, 2020, to November 12, 2020, for patients hospitalized with mild symptoms (six best models)

Fig. 10
figure 10

Comparison between forecasts and real data during the period October 14, 2020, to November 12, 2020, for patients hospitalized with mild symptoms (nine remaining models)

Fig. 11
figure 11

Comparison between forecasts and real data during the period October 14, 2020, to November 12, 2020, for patients hospitalized in the ICU (six best models)

Fig. 12
figure 12

Comparison between forecasts and real data during the period October 14, 2020, to November 12, 2020, for patients hospitalized in the ICU (nine remaining models)

Thus, a second wave of COVID-19 was predicted for the period following October 13, 2020, which could have several policy implications both for the national healthcare system and the economy. In particular, the predictions underscored the importance of implementing adequate containment measures and increasing the number of ordinary and intensive care beds, hiring additional healthcare personnel, and buying care facilities, protective equipment, and ventilators to fight the infection and reduce deaths.Footnote 26 Meanwhile, the opportunity to implement more or less restrictive non-pharmaceutical interventions (NPIs) to tackle the pandemic—such as social distancing, travel bans, the use of face masks, hand hygiene, and bar and restaurant restrictions [20]—should be evaluated carefully in light of these measures’ potentially negative economic impacts. In fact, according to Fitch Ratings’ [24] projections, the first wave of COVID-19 and the consequent massive lockdown measures may already have caused up to a 9.5% contraction in Italy’s 2020 GDP.

While the models with the lowest loss of efficiency seemed to adapt substantially well to the observed data in the forecast window, the predictions should, in general, be treated with caution and employed mainly to inform short-term decisions. In fact, pandemic forecasting has raised many doubts in the last year due to several issues that can affect its accuracy and reliability, including, for example, (i) wide prediction intervals and the sensitivity of the estimates, especially with long-term forecasts; (ii) inaccurate modeling assumptions; and (iii) the lack of, or difficulty in, measuring and identifying biological features of COVID-19 transmission [30, 42]. These limitations—and the possibility of misleading forecasts—may erode public trust in science and thus affect compliance with policies intended to mitigate the spread of COVID-19 [49]. Indeed, the inevitable uncertainty associated with this novel disease and the general failure of long-term and even mid-term forecasts require a different scientific approach toward model predictions. More prudent and balanced communication with the public is crucial if the field of science desires to maintain its leading role in human development and policymaking.

Conclusions

This paper attempted to forecast the short-term dynamics of the real-time number of patients hospitalized with COVID-19 in Italy. In particular, it employed both single time series forecast methods and their feasible hybrid combinations. The results demonstrated that (i) the best single models were NNAR and ARIMA for both patients hospitalized with mild symptoms and patients admitted to the ICU; (ii) the most accurate hybrid models were NNAR–TBATS, ARIMA–NNAR, and ARIMA–NNAR–TBATS for patients hospitalized with mild symptoms and ARIMA–NNAR, ARIMA–ETS–NNAR, and ETS–NNAR for patients hospitalized in the ICU; (iii) hybrid models generally outperformed the respective single models by offering more accurate predictions; and (iv) predictions for the number of patients hospitalized in the ICU generally better fit the observed data than did predictions for patients hospitalized with mild symptoms. Notably, the best hybrid models always included an NNAR process, confirming the extensive and successful use of this algorithm in the COVID-19-related literature [56, 57, 71, 72, 81].

Compared to the single models, the hybrid statistical models captured a greater number of properties in the data structure, and the predictions seemed to offer useful policy implications. In fact, consistent with real-time data, the models predicted that the number of patients hospitalized with mild symptoms and admitted to the ICU would grow significantly until mid-November 2020. According to the estimations, the necessary ordinary and intensive care beds were expected to double in 10 days and to triple in approximately 20 days. Thus, since new waves of COVID-19 infections cannot be excluded, it may be necessary to strengthen the national healthcare system by buying protective equipment and hospital beds, managing healthcare facilities, and training healthcare staff.

Although the hybrid models proved to be sufficiently accurate, it is nevertheless important to stress that statistical methods may lead to unavoidable uncertainty and bias, which tend to grow over time, due, for example, to public authorities’ progressive implementation of NPIs, such as the closure of public spaces and national or local lockdown measures, which the forecasts cannot adequately incorporate. Combining hybrid models with mechanistic mathematical models may partially overcome these issues by considering the effects of lockdowns on epidemiological parameters [68]. Thus, future research could proceed along these lines. Ultimately, since other factors may have affected COVID-19 dynamics, especially as the pandemic progressed, these predictions should be treated with caution and utilized only to inform short-term decision-making processes.