1 Introduction

Infectious diseases can spread rapidly because they are transmitted by breathing in an airborne virus, by insect bites, through sexual intercourse, or by skin contact with a patient who is already suffering from the disease (Kaur et al. 2020).

When the spread of a disease runs out of control and infects a community or region, with cases of a specified health behavior or other health-related events clearly in excess of normal expectancy, it is called an epidemic (Porta 2014). The term pandemic is commonly taken to refer to a widespread epidemic of contagious disease throughout a whole country or one or more continents at the same time (Honigsbaum 2009).

Although personal measures should be taken to avoid infection and therefore its spread, for instance not sharing personal items, cleaning hands properly, always eating good and safe food, getting vaccinated and covering the mouth when sneezing or coughing (Kaur et al. 2020), health systems and governments of all countries must be able to develop and improve nonpharmacological measures, such as animal source containment, early detection and diagnosis, rigorous infection control, timely case reporting and rapid information dissemination, quarantines, mask mandates and lockdowns, as well as pharmacological measures such as vaccine development (Yang et al. 2020).

Over the last few decades, mathematical models applied to infectious disease growth have been helpful to gain insights into transmission dynamics (Chowell et al. 2016), allowing scientists to forecast new cases and deaths as well as to evaluate the impact of interventions (Metcalf and Lessler 2017).

Although such models still show numerous limitations and pitfalls, often driven by data scarcity and delay, Smirnova and Chowell (2017) state that the integration of mathematical models' prediction results with public health practice has the potential to increase the timeliness and quality of health care unit responses.

In addition, the research of Chen et al. (2021) investigates the temporal and spatial distribution characteristics of the COVID-19 outbreak in China, such as the influence of different meteorological factors, the proportion of the population flow from Wuhan into other regions and the effects of nonpharmaceutical interventions. By dealing with these different factors, the authors were able to predict the number of infected cases under different control scenarios and conditions.

In this context, during the COVID-19 pandemic, Marques et al. (2021) applied four univariate forecasting approaches to real COVID-19 data from 5 countries. These approaches are classical statistical models, compartmental models, state-space models and machine learning models, and they will be presented in Sect. 2.

After evaluating and comparing 66 previous works (see Table 15), we conclude that less than 10% of previous research applied multivariate techniques and none of them used more than one country or region. Thus, this research contributes to the application of forecasting methods to human infectious disease outbreaks by being the first attempt to evaluate, over real time-series data:

  • Of three different countries (Brazil, Italy and USA);

  • Using six univariate and two multivariate methods;

  • Providing a short-term prediction of 28 days ahead, which is two to four times longer than in similar previous research.

In Sect. 3, we present all the time series evaluated in this research and their features, how we chose the data range for all time series and how we split these data into training and test sets.

In Sect. 4, we explain in detail all forecasting methods used and how the error criterion was selected. Thereafter, in Sect. 5, we apply these methods to all time series, specify how the results are obtained and compared, choose the best model for each time series and make a short-term prediction of 28 days.

Finally, in Sect. 6 we present the research's conclusions, address limitations and make proposals for further research.

2 Theoretical Background

Epidemic or pandemic disease outbreaks have been devastating populations worldwide throughout history (Hays 2005; White 2006; Kaur et al. 2020). From the Athens epidemic (“Plague of Athens”) in 430–427 B.C. (see Hays 2005 for more details) to the ongoing pandemic caused by the coronavirus SARS-CoV-2, known as COVID-19, civilizations have lived with epidemics or pandemics caused mainly by viruses and bacteria.

Kaur et al. (2020) summarized the most relevant disease outbreaks in human history, such as the Black Death (black plague), cholera, malaria and influenza viruses (Spanish, Hong Kong and Russian Flu). In addition, Hays (2005), White (2006) and Yamey et al. (2017) point out many others, such as smallpox, the Black Death (black plague), cholera, influenza, HIV/AIDS, measles, dengue, Ebola and Zika virus.

Table 1 presents a nonexhaustive list of worldwide human disease outbreaks (epidemics or pandemics) by year, impact in number of deaths and where each one occurred.

Table 1 Major worldwide epidemics or pandemics. Source: The authors, adapted from Yamey et al. (2017), Yang et al. (2020), Kaur et al. (2020)

Beyond the number of human deaths caused by epidemics and pandemics, Kaur et al. (2020) state that such outbreaks will not disappear in the future if we do not find efficient ways to stop any disease before it spreads to other populations or countries.

Many authors use a time-series approach (Chen et al. 2021; ArunKumar et al. 2021; Katris 2021; Benítez et al. 2020) to explain, evaluate and estimate future values (forecast) of the behavior over time of some variable, such as outbreak disease cases, deaths or transmission rate.

A time series is a set of data points arranged in time, and its analysis intends to reveal reliable and meaningful statistics (Marques et al. 2021) that can be used to evaluate patterns and forecast future values (Hyndman and Athanasopoulos 2018). Time-series analysis drew the attention of the scientific community when Yule introduced a general approach to it in 1927 (Yule 1927).

In the same year, a deterministic compartmental model widely applied in epidemiology was proposed by Kermack and McKendrick (1927): the susceptible–infectious–removed (SIR) model.

Almost three decades later (1950s), classical time-series statistical models started to appear (Holt 1957; Brown 1959; Winters 1960; Box and Jenkins 1970), as well as machine learning (Samuel 1959) and the state-space model (Kalman 1960).

Bringing forecasting methods into the human infectious disease outbreak context, Chretien et al. (2014) proposed a framework to classify research as follows: population-based forecasting studies (seasonal or pandemic), forecast type (temporal or spatial–temporal) and forecasting method (mechanistic, statistical).

The same authors divided the forecasting methods into compartmental model, regression tree, generalized linear model, agent-based model, survival analysis, Bayesian network and time-series model.

Focusing on the forecasting method, during the ongoing COVID-19 pandemic, Marques et al. (2021) presented four different univariate approaches for epidemiological time-series prediction, which are able to provide support for government and healthcare decision-makers. They worked with real data from five countries: China, USA, Brazil, Italy and Singapore.

In this research, we adopt the framework proposed by Marques et al. (2021), which divides epidemiological time-series prediction into: classical statistical models (Sect. 2.1), compartmental models (Sect. 2.2), state-space models (Sect. 2.3) and machine learning models (Sect. 2.4).

In the following sections, we do not aim to present an exhaustive list of forecasting methods, but we present all methods applied to human disease outbreak prediction (summarized in “Appendix A,” Table 15). These methods were obtained after an extensive literature review whose steps are presented in “Appendix D.”

2.1 Classical Statistical Models (CSM)

In this section, we present CSM methods found in literature that are divided into:

  • Exponential smoothing (ES) or their generalization error, trend and seasonal (ETS);

  • Autoregressive integrated moving average (ARIMA);

  • Vector autoregressive (VAR);

  • Vector error correction (VEC);

  • Vector autoregressive moving average (VARMA).

ES was proposed in the late 1950s (Holt 1957; Brown 1959; Winters 1960), and has motivated some of the most successful forecasting methods.

ARIMA was introduced by Box and Jenkins (1970) in 1970 and takes into consideration changing disturbances in time and tendencies.

Hyndman and Athanasopoulos (2018) state that ES, or their generalization ETS, and ARIMA models are the two most widely used approaches to time-series forecasting and provide complementary approaches to the problem. While ES models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.

VAR, VEC and VARMA are the models most used for the prediction of multivariate time series in econometric research, but they can also be applied to predict human disease outbreaks (for more details, see Wu et al. 2018; Khan et al. 2020).

For instance, Kiang et al. (2021), ArunKumar et al. (2021), Talkhi et al. (2021), Katris (2021), Khan et al. (2020), Bomfim et al. (2020), Liang et al. (2020), Ramos et al. (2020), Zhang et al. (2019), Wang et al. (2019), Li et al. (2019), Choi et al. (2019), Chakraborty et al. (2019), Chumachenko et al. (2019), Haddawy et al. (2018), Wu et al. (2018), Wu et al. (2018), Zhao et al. (2018), Jerónimo-Martínez et al. (2017), Ray et al. (2017), Anggraeni and Aristiani (2016), Ke et al. (2016), Li et al. (2016), Johansson et al. (2016), Pradhan et al. (2016), Wu et al. (2015), Mekparyup and Saithanu (2015), Kane et al. (2014), Feng et al. (2014), Soebiyanto et al. (2010), Shen et al. (2008), Medina et al. (2007), Burkom et al. (2007) and Nobre et al. (2001) applied these models to several human disease outbreaks such as COVID-19, Ebola, Zika virus, dengue hemorrhagic fever (DHF), scarlet fever (SF), tuberculosis, malaria, leprosy, hemorrhagic fever with renal syndrome (HFRS), hand, foot and mouth disease (HFMD), HIV/AIDS, influenza-like illness (ILI) and other acute respiratory infections (ARI). Predicted variables (daily cases, reproduction number, among others), prediction range and other methods applied in each mentioned research are summarized in Table 15.

2.2 Compartmental Models (CM)

In this section, we present CM methods found in literature that are divided into:

  • Susceptible–infectious–removed (SIR);

  • Susceptible-exposed-infectious-removed (SEIR);

  • Susceptible-infectious-susceptible (SIS);

  • Cellular automaton (CA);

  • Growth models (GM).

One deterministic model widely considered in epidemiology is the SIR model, which is based on the classification of the individuals into three stages of infection and was introduced almost one hundred years ago by Kermack and McKendrick (1927).
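In its most common continuous-time form (notation ours, not a reproduction of the original 1927 formulation), with \(N = S + I + R\) the total population, \(\beta \) the transmission rate and \(\gamma \) the removal rate, the SIR dynamics can be written as

$$\begin{aligned} \frac{dS}{dt}&= -\frac{\beta S I}{N},\\ \frac{dI}{dt}&= \frac{\beta S I}{N} - \gamma I,\\ \frac{dR}{dt}&= \gamma I, \end{aligned}$$

with basic reproduction number \(R_0 = \beta /\gamma \).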

Over the years, the SIR model has been improved and other stages have been added (Krause et al. 2018), for instance SEIR, with or without intervention, and SIS, among others.

Considering single variables, GM such as the Richards (GMR), Gompertz (GMG) and Logistic (GML) models, as well as cellular automata (CA), are widely used (Gerardi and Monteiro 2011) to describe and predict cases and deaths in infectious disease spread.
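For cumulative cases \(C(t)\), one common set of parameterizations of these growth models (notation ours; several equivalent forms appear in the literature) is

$$\begin{aligned} C_{\mathrm {GML}}(t)&= \frac{K}{1+a e^{-rt}},\\ C_{\mathrm {GMG}}(t)&= K \exp \left( -a e^{-rt}\right) ,\\ C_{\mathrm {GMR}}(t)&= \frac{K}{\left( 1+a e^{-rt}\right) ^{1/\nu }}, \end{aligned}$$

where \(K\) is the final epidemic size, \(r\) the growth rate and \(a\) and \(\nu \) shape parameters; the Richards model reduces to the logistic model when \(\nu = 1\).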

Research such as Chen et al. (2021), Katris (2021), Paul et al. (2021), Benítez et al. (2020), Wang et al. (2020), Smirnova et al. (2019), Eilertson et al. (2019), Suparit et al. (2018), Basile et al. (2018), Li et al. (2018), Valeri et al. (2016), Yang et al. (2014), Wang et al. (2013), Towers and Chowell (2012), Aguiar et al. (2011), Gerardi and Monteiro (2011), Laneri et al. (2010), Santos et al. (2009), Finkenstädt et al. (2005) and Gamerman and Migon (1991) applied these models to several human disease outbreaks like COVID-19, Measles, ILI, dengue, DHF, Skin and Soft Tissue Infections (SSTIS). Predicted variables (daily cases, reproduction number, among others), prediction range and other methods applied for each mentioned research are summarized in Table 15.

2.3 State-Space Models (SSM)

In this section, we present SSM methods found in literature that are divided into:

  • Hidden Markov Model (HMM);

  • Monte Carlo Markov Chain (MCMC);

  • Kalman filter (KF);

  • Exponential smoothing state-space model with Trigonometric, Box-Cox transformation, ARMA errors, trend and seasonal components (TBATS).

A SSM, also known in the technical literature as HMM, can be defined as a class of probabilistic models that describes the dependence between a latent state variable and an observed measurement (Koller and Friedman 2009). The term “state space” originated in control engineering (Kalman 1960). HMM can also be combined with a simulation approach such as Monte Carlo, which is then called MCMC according to Wang et al. (2013).

SSM is a general framework encompassing ES, ARMA and trend and seasonal components, and TBATS, according to Talkhi et al. (2021), is widely applied to univariate time series.

The KF is a state-space model that provides estimates of the unknown variables given the measurements observed over time, using only the previous estimate for calculation, which reduces the need for saving the whole data from previous iterations (Haykin 2004).
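For illustration only (this is a minimal sketch with hypothetical noise values, not the implementation used in this research), one predict-update iteration of the KF for a local linear trend state (level and slope) can be written in Python as:

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One Kalman filter iteration: predict the next state,
    then correct it with the new measurement y."""
    x_pred = A @ x                        # prior state estimate
    P_pred = A @ P @ A.T + Q              # prior covariance
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new

# Local linear trend: state = [level, slope], observation = level only
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = np.eye(2) * 0.1      # process noise covariance (hypothetical)
R = np.array([[25.0]])   # measurement noise variance (hypothetical)

x, P = np.zeros(2), np.eye(2) * 100.0
for y_t in [120.0, 133.0, 150.0, 141.0]:   # toy daily-case counts
    x, P = kalman_step(x, P, np.array([y_t]), A, C, Q, R)
print(x)  # filtered level and slope after the last observation
```

Only the current estimate (x, P) is carried from one iteration to the next, which is exactly the memory-saving property mentioned above.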

Research like Talkhi et al. (2021), Han et al. (2021), Benítez et al. (2020), Eilertson et al. (2019), Yang et al. (2014), Wang et al. (2013), Nunes et al. (2013), Mode et al. (1991) applied these models to several human disease outbreaks like COVID-19, Ebola, Zika virus, ILI, SSTIS, HIV/AIDS. Predicted variables (daily cases, reproduction number, among others), prediction range and other methods applied for each mentioned research are summarized in Table 15.

2.4 Machine Learning Models (MLM)

In this section, we present MLM methods found in literature that are divided into:

  • Multilayer perceptron (MLP);

  • Artificial recurrent neural network (RNN);

  • Long short-term memory (LSTM);

  • Convolutional neural network (CNN);

  • Feed-forward neural networks with a single hidden layer and lagged inputs (NNETAR) that is also divided into neural network autoregressive (NNAR) and nonlinear auto-regressive neural network (NARNN);

  • Extreme learning machine algorithm (ELM);

  • Automated machine learning (AutoML);

  • Ensemble empirical mode decomposition (EEMD);

  • Cross-location attention graph neural network (CLAGNN);

  • Support vector machine (SVM);

  • Bayesian model averaging (BMA);

  • Kernel conditional density estimation (KCDE);

  • Kernel ridge regression Gaussian process network (KRRGPN);

  • Neural fuzzy inference system (NFIS);

  • Random forest (RF);

  • Generalized regression neural network (GRNN);

  • Genetic algorithm (GA);

  • Wavelet neural network (WNN).

First defined by computer scientists at the Dartmouth Conferences in 1956, the artificial intelligence (AI) field draws upon computer science, mathematics, psychology, linguistics, neuroscience and many other areas (Ongsulee 2017).

Having evolved from the study of pattern recognition and computational learning theory in AI, machine learning (ML) appeared in 1959 (Samuel 1959) to study and build algorithms that can learn from and make predictions on data (Kohavi 1998). Deep learning (DL) and neural networks (NN) are subfields of ML (for more details, see Alzubi et al. 2018; Ongsulee 2017).

Some ML algorithms widely applied to forecast are MLP, RNN like LSTM with or without false nearest neighbors (FNN), CNN, NNETAR, ELM, AutoML, EEMD, CLAGNN, SVM, BMA, NARNN, NNAR, KCDE, KRRGPN, NFIS, RF, GRNN, GA and WNN.

All algorithms mentioned above were applied, for example, in ArunKumar et al. (2021), Talkhi et al. (2021), Katris (2021), Han et al. (2021), Ribeiro et al. (2020), Deng et al. (2020), Wang et al. (2020), Bomfim et al. (2020), Liang et al. (2020), Zhang et al. (2019), Wang et al. (2019), Choi et al. (2019), Chakraborty et al. (2019), Stolerman et al. (2019), Wu et al. (2018), Ray et al. (2017), Caicedo-Torres et al. (2017), Nguyen et al. (2017), Chau and Ngoc Anh (2016), Wu et al. (2015), Kane et al. (2014), Gerardi and Monteiro (2011) and Peng et al. (2008) research. Predicted variables (daily cases, reproduction number, among others), prediction range and other methods applied for each mentioned research are summarized in Table 15.

2.5 Research Synthesis

Table 15 summarizes forecasting research applied in the human infectious disease outbreak (pandemic or epidemic) context, considering the “general methods” pointed out by Marques et al. (2021). Only research that in fact makes predictions was considered.

Columns 3 to 7 address the approach used in each research mentioned in the last section. Columns 8 and 9 present the time window used as well as the prediction range. Time windows found in previous research were day (d), week (w), month (m) or year (y). The prediction range is expressed in the mentioned time windows, but some models proposed to forecast the whole pandemic period (wpp).

The variable measured/evaluated and forecasted in each research (column 10) on those time windows can be number of patient cases (ca), deaths (de) and recovered (re), admitted and discharged from hospital or intensive care unit (adhosp and dishosp) and transmission rate (rt). Excluding transmission rate, all measures mentioned can be counted in two different ways: by time window or cumulative. For example, daily cases (dca), monthly deaths (mde), yearly patients admitted in hospital (yadhosp), cumulative cases (Cca), cumulative patients discharged from hospital (Cdishosp).

Columns 11 to 13 present the countries in which each research applied the methods specified in columns 3 to 7, the type of forecast approach, divided into univariate (Uni) and causal or multivariate (Mul), and the disease outbreak studied.

After comparing the sixty six works in Table 15, we can conclude that:

  • Approach: only 7 (10.61%) research apply multivariate methods to make predictions. 23 (34.84%) apply causal methods to make predictions, combining the number of specific queries over time on web search engines (Google, Baidu index), climate variables (temperature, air pollution, rainfall) or other seasonal infectious diseases. 36 (54.55%) apply only univariate methods to make predictions. The number of publications by approach is presented in Fig. 1.

  • Disease outbreaks: 14 research forecasted Dengue, Chikungunya or DHF, fourteen ILI, nine COVID-19, six HFRS, five Malaria, four Measles, three Ebola and three HIV/AIDS.

  • Countries: 15 research were applied to disease outbreaks in China, nine in Brazil, nine in the USA, 4 in Thailand and three in Japan. Only 4 research worked with African countries.

  • Data range and prediction: only 4 research proposed to predict the whole pandemic period (wpp). Considering different time windows, twenty six research forecasted six or fewer steps ahead.

  • Time window: twenty eight research worked with monthly cases and twenty four with weekly cases.

  • Variables: number of patient cases was studied in 63 research (95.45%) while deaths, recovered and hospital admission or discharge or transmission rate are not much explored (deaths appears in second place with only five research).

  • Epidemiological time-series prediction: 34 research applied CSM, twenty three applied MLM, twenty applied CM and only eight applied SSM. We found no research in which all approaches were applied. The number of publications by type of epidemiological time-series prediction is presented in Fig. 2. Only two research used three univariate approaches (Talkhi et al. 2021; Katris 2021) in a single country (Iran and Greece, respectively), sixteen research used two approaches, forty five research used only one approach and three research used approaches not mentioned by Marques et al. (2021).

  • Of the twenty CM models, only five worked with growth models and basically applied three models: Richards (GMR), Gompertz (GMG) and Logistic (GML). But we point out that there are fourteen other GM models (Fekedulegn et al. 1999; Kaps et al. 2000; Tsoularis and Wallace 2002; Khamis 2005) that were not explored in the human disease outbreak context.

Fig. 1
figure 1

(Color figure online) Research publication by approach by year. Source: The authors

Fig. 2
figure 2

(Color figure online) Research publication by model by year. Source: The authors

Although the current review shows benefits of using CM models, including the ability to provide mid- and long-term predictions, and these models mostly use susceptible–infectious–removed formulations or their variations, many assumptions must be made before obtaining all parameters (Smirnova et al. 2019), the results of all stages and then a prediction of a whole pandemic period.

In addition, real-time COVID-19 data showed us that new stages need to be considered, such as immunity period and rate of reinfection, vaccination, and periods of strong nonpharmacological measures (quarantine and lockdown), among others.

The current research is the first attempt to evaluate over real data of three different countries (Brazil, Italy and USA) using three univariate approaches (CSM, SSM and MLM) proposed by Marques et al. (2021). We apply the same univariate methods proposed in Talkhi et al. (2021) and add KF.

Finally, we apply two multivariate approaches and compare their results with previous mentioned univariate methods to find which approach can better fit real data and give us a reliable short-term prediction to each region/country.

3 Data Sets Selection and Problem Statement

In this research, we work with real COVID-19 data of Rio de Janeiro (RJ) (Assad 2022) city health regions in Brazil, Italy (IT) regions (Krispin 2021) and US states (Dobbyn 2020). All time series used are presented in figures below.

We select these data sets because Brazil, Italy and US population were highly affected by COVID-19 pandemic and adopted different rules to fight against COVID-19 dissemination.

After the first COVID-19 wave started to spread, the Italian government established strict common measures for all of the country's regions. Could we expect that the number of daily new cases from one region helps to explain and predict this number in another region, or is the past data of the same region alone enough?

In other words, given common government rules, which approach best fits and predicts the daily number of cases: univariate or multivariate methods?

Regional divisions of each Italian time series used in this research are quickly presented in Table 2. For more details, see supplementary material.

In the USA, each state has autonomy to establish the measures it considers necessary to fight against COVID-19 dissemination. As a result, many states adopted different measures, but the question proposed for Italy remains: could we expect that the number of daily new cases from one state helps to explain and predict this number in another state, or is the past data of the same region alone enough?

The USA comprises 50 states plus the District of Columbia, and working with this number of time series would be impractical and time consuming considering the scope of this research. Thus, we chose the state with the highest number of positive cases (California) and its surrounding states (Oregon, Nevada, Arizona), also presented in Table 2.

Closer to the US policy, in Brazil each state was in charge of defining the necessary measures to avoid COVID-19 dissemination. Here we bring the RJ city health region time series with the same question, but we also want to evaluate whether the smaller distance between these health regions (compared to the distances between US states and IT regions) could bring a different result compared to the IT and US time series. The RJ health regions are also presented in Table 2.

In Figs. 3, 4 and 5, we can see that COVID-19 pandemic started at different dates (presented below) and the data set range of each country or region can also vary according to data set source. To establish comparisons between forecasting techniques, we work with the same time-series range to all regions. Thus, in this research we work with time-series range of 369 days.

Fig. 3
figure 3

(Color figure online) Rio de Janeiro city COVID-19 daily cases per health region. Source: The authors

Fig. 4
figure 4

(Color figure online) Italy regions COVID-19 data per region. Source: The authors

Fig. 5
figure 5

(Color figure online) US states COVID-19 data per region. Source: The authors

  • Rio de Janeiro city time-series range available: from January 13, 2020, to December 22, 2021, but we decided to start on March 12, 2020 (when cases started to appear every day). 651 days;

  • Italy regions time-series range available: from February 24, 2020, to July 27, 2021. 520 days;

  • US time-series range available: from March 4, 2020, to July 3, 2021. 369 days.

In this research, we evaluate all the time series presented above using univariate and multivariate approaches. Applying a multivariate approach can potentially provide reliable predictions given the high correlation that each time series has with the others in the same region at the same lag (correlation) and at different lags (auto-correlation), as we can see in Fig. 6. All correlation plots are available in “Appendix C.”

Fig. 6
figure 6

(Color figure online) Auto-correlation plot between NV and OR. Source: The authors
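As an illustration, the kind of lagged correlation shown in Fig. 6 can be computed with a short sketch (names are hypothetical; nv and ore are assumed to be pandas Series of daily cases aligned on the same date index):

```python
import pandas as pd

def lagged_corr(a: pd.Series, b: pd.Series, max_lag: int = 15) -> pd.Series:
    """Pearson correlation between a(t) and b(t - lag) for lag = 0..max_lag."""
    return pd.Series({lag: a.corr(b.shift(lag)) for lag in range(max_lag + 1)})

print(lagged_corr(nv, ore))  # consistently high values motivate VAR / SSM-M
```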

All data sets are divided into training and test data and their lengths are 341 and 28 days, respectively. Our short-term forecasting is 28 days ahead. The reasons for choosing forecasting range of 28 days are presented below.

  • We work with 341 past observations, which is more than 10 times the prediction length. This does not mean, however, that our past data are large enough to train some models well and give reliable predictions;

  • Forecasting daily new cases four weeks ahead allows decision-makers in health departments to better plan resource availability and governments to choose adequate measures. This range is longer than in most previous research with daily predictions, which worked with shorter forecasting ranges (seven or fourteen days);

  • Considering that resource availability depends on health department resource allocation (doctors, beds, among others), the smaller the time window unit we are able to work with while still providing reliable predictions (so far, daily data), the more useful the information will be to help decision-makers meet the resource requirements that ensure adequate treatment of patient demand.

Table 2 Regions or states division (for more details see “Appendix B”)

4 Forecasting Models Applied

In this research, we expand the framework proposed by Marques et al. (2021) by using more univariate approaches and adding multivariate approach. Models applied in next sections are presented in Table 3.

Table 3 Forecasting models applied in this research

Building each model sometimes requires estimating more than 10 parameters, and presenting all of them in one or more tables is not the aim of this research. Thus, we explain in Sect. 4.1 the main features of each model.

4.1 Applied Models Description

In this section, we provide a detailed explanation of forecasting methods summarized in Table 3.

  • ES: ETS is a class of models that essentially works with two component equations, trend and season, which can be added to or multiplied by the remainder. In each model, these components can be nonsignificant, also known as none (N), or can be significant and better describe the original time-series features as follows: additive (A), additive damped (Ad) or multiplicative (M). This class of models can be combined in 18 different ways (A, N, A; M, Ad, M; for instance). Equations of each model are presented in Fig. 7, and an illustrative fitting sketch for some of the models in this list follows that figure. For more details, see Hyndman and Athanasopoulos (2018);

  • ARIMA: ARIMA or seasonal ARIMA (SARIMA) is a class of models that combines autoregressive (AR) and moving average (MA) terms with differenced values. The AR part of ARIMA (p) shows that the time series is regressed on its own past data. The MA part of ARIMA (q) indicates that the forecast error is a linear combination of past respective errors. The I part of ARIMA (d) shows that the data values have been replaced with differenced values of order d to obtain stationary data, which is a requirement of the ARIMA model approach (Kotu and Deshpande 2019). When we work with SARIMA, the same components appear lagged by the length of the seasonal time window (frequency) as P, D and Q. For instance, ARIMA (\(p=5\), \(d=0\), \(q=3\)) (\(P=0\), \(D=1\), \(Q=1\)) [\({\textit{frequency}}=7\)]. For more details, see Hyndman and Athanasopoulos (2018) and Kotu and Deshpande (2019);

  • State-space model univariate (SSM-U): The state of a deterministic dynamic system is the smallest vector that summarises the past of the system in full (Haykin 2004). The linearity of the state dynamics and observation process and the normal distribution of the noise in the state dynamics and measurements are the assumptions of SSM. A linear autoregressive state equation \(x(t) = A*x(t-1) + W(t)\), where \(W(t) \sim N(0,Q)\), together with a measurement equation \(y(t) = C*x(t) + V(t)\), where \(V(t) \sim N(0,R)\), defines the linearized process in which \(y(t) \in \mathbb {R}\). The random variables W(t) and V(t) represent the process and measurement noise, respectively, and are assumed to be independent of each other and normally distributed. In our case, we work with a state vector of length \(n = 2\) for the linear model and \(n = 3\) for the order 2 polynomial model, which means an \(n \times n\) A matrix. We select the best approach for each time series based on the lowest Akaike information criterion (AIC).

  • MLP: MLP is a class of feed-forward neural network. It consists of three types of layers: the input layer, the output layer and the hidden layers. The input layer receives the input signal to be processed. An arbitrary number of hidden layers placed between the input and output layers are the true computational engine of the MLP. As in a feed-forward network, in an MLP the data flow in the forward direction from input to output layer. The neurons in the MLP are trained with the backpropagation learning algorithm. MLPs are designed to approximate any continuous function and can solve problems which are not linearly separable. In the time-series problem, the input layer consists of past observations, and we let the model choose, between 1 and 28 (the prediction length) and according to the mean square error, the optimal number of lags and which lags will be used. The same criterion was used to define the number of hidden nodes in each hidden layer;

  • NNETAR: NNETAR is a feed-forward neural network with a single hidden layer and lagged inputs. This model works with 2 (for nonseasonal time series) or 3 (for seasonal time series) parameters: the number of past observations used as input layers (p), the number of past observations lagged by the length of the seasonal time window used as input layers (P) and the number of neurons (k) in the single layer. For instance, (\(p=21\), \(P=1\), \(k=11\))[7]. For more details, see Hyndman and Athanasopoulos (2018);

  • TBATS: The BATS model is the exponential smoothing method + Box-Cox transformation + ARMA model for residuals. Aiming to reduce the number of model parameters when the frequencies of the seasonalities are high and to give more flexibility to deal with complex seasonality, De Livera et al. (2011) proposed the TBATS model, which is the BATS model + trigonometric seasonality. The equations of the TBATS model are presented below, where \(\omega \) and \(\phi \) are the Box-Cox and damping parameters, respectively, an ARMA(pq) process models the error and \(m_1\) to \(m_J\) list the seasonal periods used (in our case there is only \(m_1\), always equal to 7) while \(k_1\) to \(k_J\) are the corresponding numbers of Fourier terms used (in our case there is only \(k_1\)). For instance, TBATS (\(\omega \) = 0.21, [\(p=0\), \(q=0\)], \(\phi \) = 0.96, \( [\langle m_1=7, k_1=3 \rangle ]) \).

    $$\begin{aligned} y_{t}^{(\omega )}&= \frac{ y_{t}^{\omega }-1}{\omega },\quad \omega \ne 0, \\ y_{t}^{(\omega )}&= \log {y_{t}},\quad \omega = 0,\\ y_{t}^{(\omega )}&= l_{t-1}+\phi *b_{t-1}+\sum _{i=1}^{J} s_{t-m_i}^{i} +d_t,\\ l_{t}&= l_{t-1}+\phi *b_{t-1} +\alpha *d_t, \\ b_{t}&= (1-\phi )*b +\phi *b_{t-1}+\beta *d_t,\\ s_{t}^{i}&= s_{t-m_i}^{i} +\gamma _i *d_t, \\ d_{t}&= \sum _{i=1}^{p} \phi _i*d_{t-i}+\sum _{i=1}^{q} \theta _i*\epsilon _{t-i} +\epsilon _{t},\\ s_{t}^{i}&= \sum _{j=1}^{k_i} s_{j,t}^{i}, \\ s_{j,t}^{i}&= s_{j,t-1}^{i}*\cos {\lambda _j^i} + s_{j,t-1}^{*i}*\sin {\lambda _j^i} + \gamma _1^i*d_t ,\\ s_{j,t}^{*i}&= -s_{j,t-1}^{i}*\sin {\lambda _j^i} + s_{j,t-1}^{*i}*\cos {\lambda _j^i} + \gamma _2^i*d_t \end{aligned}$$
  • VAR: A VAR(p) model is a generalization of the univariate autoregressive (AR) where (p) shows that the time series is regressed on past data of all time series for forecasting a vector of time series (Hyndman and Athanasopoulos 2018). Each variable has one equation that includes a constant and lags of all of the variables in the system.

  • State-space model multivariate (SSM-M): A SSM-M model is a generalization of SSM-U and works similarly, but with \(y(t) \in \mathbb {R}^m\), where m is the number of time series considered.

Fig. 7
figure 7

ETS equations. Source: Hyndman and Athanasopoulos (2018)
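As a minimal illustration of some of the univariate models above (not the authors' code; `series` is assumed to be a pandas Series of daily new cases, and all orders and hyperparameters are illustrative only), ETS, seasonal ARIMA and an MLP on lagged inputs can be fitted and used for a 28-day forecast as follows:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

H = 28                                 # forecast horizon (days)
train, test = series[:-H], series[-H:]

# ETS: additive trend and additive weekly seasonality (one of the 18 variants)
ets = ExponentialSmoothing(train, trend="add", seasonal="add",
                           seasonal_periods=7).fit()
ets_fc = ets.forecast(H)

# Seasonal ARIMA with weekly frequency; the orders here are illustrative
arima = ARIMA(train, order=(5, 0, 3), seasonal_order=(0, 1, 1, 7)).fit()
arima_fc = arima.forecast(H)

# MLP on lagged inputs: predict y_t from the previous 28 observations
lags = 28
X = np.array([train.values[i - lags:i] for i in range(lags, len(train))])
y = train.values[lags:]
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(X, y)
window, mlp_fc = list(train.values[-lags:]), []
for _ in range(H):                     # recursive multi-step forecast
    nxt = mlp.predict(np.array(window[-lags:]).reshape(1, -1))[0]
    mlp_fc.append(nxt)
    window.append(nxt)
```

Each forecast can then be compared against the 28-day test window with the error criterion defined in Sect. 4.2.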

4.2 Error Evaluation

To compare model results, an error criterion must be selected, but choosing the right forecasting metric is not straightforward (Vandeput 2021) because each error criterion has shortcomings (for more details, see Shcherbakov et al. 2013).

For instance, Vandeput (2021) states that although the mean absolute percentage error (MAPE) is one of the most used KPIs to measure forecast accuracy, it is considered a poor accuracy indicator because it divides each error individually by, in our research, the daily cases, so it is skewed: high errors during low-demand periods will significantly impact MAPE.

Shcherbakov et al. (2013) provide an analysis of existing and quite common forecast error measures used in forecasting and divide them into:

  • Measures based on absolute forecast error: mean absolute error (MAE), median absolute error (MdMAE), mean square error (MSE) and root mean square error (RMSE);

  • Measures based on percentage errors: mean absolute percentage error (MAPE), median absolute percentage error (MdAPE), root mean square percentage error (RMSPE) and root median square percentage error (RMdSPE);

  • Measures based on symmetric errors: symmetric mean absolute percentage error (sMAPE) and symmetric median absolute percentage error (sMdAPE);

  • Measures based on relative errors: mean relative absolute error (MRAE), median relative absolute error (MdRAE) and geometric mean relative absolute error (GMRAE);

  • Measures based on scaled error: mean absolute scaled error (MASE), root mean square scaled error (RMSSE).

The same authors (Shcherbakov et al. 2013) state the following shortcomings for each type of error measure:

  • Measures based on absolute forecast error:

    1. The scale dependency: these measures do not work with objects on different scales or magnitudes;

    2. The high influence of outliers in the data on the forecast performance evaluation: if the data contain an outlier with maximal value, then absolute error measures provide conservative values;

    3. RMSE and MSE have low reliability: the results can differ depending on the fraction of data used.

  • Measures based on percentage errors:

    1. Division by zero occurs when the actual value is equal to zero;

    2. The nonsymmetry issue: the error values differ depending on whether the predicted value is bigger or smaller than the actual one;

    3. Outliers have a significant impact on the result, particularly if an outlier has a value much bigger than the maximal value of the regular cases;

    4. The error measures are biased, which can lead to an incorrect evaluation of the forecasting models' performance.

  • Measures based on symmetric errors:

    1. If the actual value is equal to the forecasted value but with opposite sign, or both of these values are zero, then a division by zero error occurs;

    2. These criteria are affected by outliers, analogously to the percentage errors;

    3. If more complex estimations are used, a problem of interpretability of the results arises, which slows their spread in practice;

    4. In fact, they do not solve the nonsymmetry issue.

  • Measures based on relative errors:

    1. A division by zero error still occurs when the predicted value obtained by the reference model is equal to the actual value;

    2. If a naive model is chosen, then a division by zero error occurs in the case of a continuous sequence of identical values in the time series.

  • Measures based on scaled error:

    1. If the real values over the forecast horizon are equal to each other, then division by zero occurs;

    2. In addition, weakly biased estimates can be observed.

Thus, considering that all time series are on the same scale and that we want to minimize the number of values requiring scientific notation, we chose the root mean square error (RMSE) accuracy criterion to compare all models presented in Table 3. The results of each model are presented in Tables 4, 5 and 6. The error evaluation is divided into 3 parts and was applied to each model (a generic sketch follows the list below):

  • In-sample (RMSE IN): comparing training data with fitted values obtained;

  • Out-sample all (RMSE OUT-ALL): comparing all test data with predicted values obtained;

  • Out-sample mean (RMSE OUT-MEAN): comparing one week of test data (7 days ahead) with the predicted values, repeated 4 times, and averaging the errors. We run the same model without parameter re-estimation, but before each step we add one new week of data (7 days).
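The following generic sketch (not the authors' code; forecast_fn stands in for any fitted model wrapped as a function that forecasts h steps from a given history) shows how the three RMSE figures can be computed:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def rmse_out_mean(forecast_fn, train, test, step=7, horizon=28):
    """RMSE OUT-MEAN: forecast one week at a time, append the observed week
    to the history (no parameter re-estimation assumed inside forecast_fn),
    and average the weekly RMSEs over the 28-day horizon."""
    history, errors = list(train), []
    for start in range(0, horizon, step):
        actual = test[start:start + step]
        errors.append(rmse(actual, forecast_fn(history, len(actual))))
        history.extend(actual)          # add the new week of observed data
    return float(np.mean(errors))

# RMSE IN:      rmse(train, fitted_values)           -- fit vs. training data
# RMSE OUT-ALL: rmse(test, forecast_fn(train, 28))   -- single 28-day forecast
```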

The reasons why we define our forecast range as 28 days ahead were presented at the end of Sect. 3.

5 Experimentation

In this section, we apply the methods presented in the second (Sect. 5.1) and third (Sect. 5.2) columns of Table 3 to each health region (Rio de Janeiro city), region (Italy) and state (USA).

5.1 Univariate Approaches

Tables 4, 5 and 6 summarize the models obtained by each method, and in Tables 7, 8 and 9 the RMSE IN, RMSE OUT-ALL and RMSE OUT-MEAN are presented for each model. These methods only consider the previous values of the same variable to make predictions. In all models, the seasonality time window is 7 (weekly).

All plots of the time-series approaches applied are available in the supplementary material. Figure 8 provides an example showing the results of the ETS model applied to Center (Italy region). The best model type (ETS(M, Ad, M)) within each class of models (ETS), as in Fig. 8, for instance, is chosen by the lowest AIC.

Fig. 8
figure 8

(Color figure online) ETS model applied in Center. Source: The authors

Table 4 Forecasting parameters per model per region for Rio de Janeiro health region
Table 5 Forecasting parameters per model per region for Italy regions
Table 6 Forecasting parameters per model per state for US
Table 7 Forecasting results per model per region for Rio de Janeiro health region
Table 8 Forecasting results per model per region for Italy regions
Table 9 Forecasting results per model per state for US

From Tables 7, 8 and 9, we can conclude that the models with the best in-sample error considering the RMSE criterion are NNETAR (for thirteen time series) and MLP (for one), which is not surprising since neural networks work better the more data we give them.

Although outperforming in the in-sample comparison, ML models did not obtain the same result when evaluating RMSE OUT-ALL and RMSE OUT-MEAN, in which they got the lowest RMSE in only five and two time series, respectively.

Trying to predict daily cases 28 days ahead without adding new data or re-estimating parameters (OUT-ALL), MLP showed better results for four RJ health regions and one US state. In second place, TBATS showed better results for three IT regions and one US state. SSM-U appeared in third position, being chosen in two IT regions and one US state.

However, when we predict daily cases 28 days ahead adding new data weekly without parameter re-estimation (OUT-MEAN), we conclude that ES models give better predictions for six time series, while TBATS models and SSM-U were chosen for three and two time series, respectively. All these results are summarized in Table 10.

The best SSM-U approach considering the lowest AIC was the order two polynomial model (\(n=3\)) in thirteen time series. Only in the AZ time series was the linear model (\(n=2\)) chosen.

Table 10 Best performance models frequency

After comparing 6 different classes of univariate forecasting models and pointing out the best class of model according to the lowest RMSE criterion, in the next section we present two multivariate forecasting models.

5.2 Multivariate Approaches

SSM-M and VAR methods consider previous values of all variables available to make predictions. In Table 11, we summarize the forecasting error results.

VAR models are divided into four types of deterministic regressors: none, constant, trend or both (constant and trend). We select the deterministic regressor type for each multivariate time series using the lowest AIC. In addition, to select the VAR model order (p) we adopt the Schwarz Criterion (SC(n)) and obtained \(p=1\) for RJ with constant and trend deterministic regressors (18 parameters), \(p=23\) for IT with trend deterministic regressors (113 parameters) and \(p=2\) for US with constant and trend deterministic regressors (20 parameters).
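For illustration, lag selection by the Schwarz Criterion and estimation with constant-and-trend regressors can be sketched with statsmodels (assuming df is a DataFrame whose columns are the daily-case series of the regions in one country; this is not the authors' code):

```python
import pandas as pd
from statsmodels.tsa.api import VAR

model = VAR(df)
order = model.select_order(maxlags=30)  # reports AIC, BIC (SC), HQIC and FPE
p = order.bic                           # lag length chosen by the Schwarz Criterion
results = model.fit(p, trend="ct")      # "ct": constant and trend regressors
fc = results.forecast(df.values[-p:], steps=28)  # 28-day-ahead forecast
```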

In SSM-M, we select a vector length (nx) that gave us the lowest error considering Akaike information criterion (AIC). The nx can be 8 (linear model) or 12 (polynomial order 2 model) to USA and 10 (linear model) or 15 (polynomial order 2 model) to RJ and IT (two or three times the number of univariate time series).

Table 11 Forecasting results per multivariate model per region for Rio de Janeiro, Italy and USA

From Table 11, we can conclude that:

  • Linear models were chosen for all RJ (\(n=10\)), IT (\(n=10\)) and US (\(n=8\)) data considering the lowest AIC. In the univariate time series, we obtained the opposite (almost all models obtained the lowest AIC with polynomial order 2 models).

  • The VAR(1) model obtained the best in-sample (IN) error in four RJ health regions, while in three US states and four IT regions SSM-M(8) and SSM-M(10) outperform the VAR approach considering in-sample error. In other words, SSM-M models better fitted the training data in ten time series, while VAR models better fitted the other four;

  • Predicting 28 days ahead without adding new data or re-estimating parameters (OUT-ALL), VAR models tied with SSM-M models. VAR(2) achieved better results in three US states, while SSM-M(10) better fitted four IT regions.

  • Predicting 28 days ahead adding new data weekly without parameter re-estimation (OUT-MEAN), SSM-M models better fitted all IT regions and two US states, while VAR(1) better fitted four RJ health regions. In other words, SSM-M models showed better results for eight time series, while VAR models better fitted the other six.

In the next section, we compare the results obtained with all approaches mentioned in Table 3 and presented in detail in Tables 4 to 9 and 11.

5.3 Comparing Results of Univariate and Multivariate Methods

In Tables 12 and 13, we compare the best models (univariate and multivariate) for all time series considering the IN, OUT-ALL and OUT-MEAN RMSE results. This comparison combines Tables 7, 8, 9 and 11 presented in previous sections.

Table 12 Univariate and multivariate selected models and error in-sample

Table 12 reinforces the flexibility of neural networks to fit training data (IN) when working with a large number of observations. Results obtained by NNETAR (thirteen times) and MLP (one time) models outperform all univariate and multivariate models applied in this research.

Table 13 Univariate and multivariate selected models and error out-sample

Table 13 shows that, although NNETAR does not present the same performance taking into account the out-sample results, another neural network method (MLP) provides the lowest RMSE OUT-ALL for 4 of the 5 RJ health regions. This suggests that for the RJ data, working with a large number of observations, neural network methods can also give a reliable short-term prediction (OUT-ALL).

However, for the five Italy regions and four US states, neural network short-term prediction (OUT-ALL) only presented better results for NV and CA in the US, while TBATS models outperform in four Italy regions (CEN, NOW, NOE and SOT) and in AZ (US).

Despite the high correlation between the variables of the RJ, IT and US time-series data (see Fig. 6 and the supplementary data), the multivariate approach outperforms only in R3 from RJ and in OR from the US (OUT-ALL) using VAR models, and in OR from the US (OUT-MEAN) using SSM-M.

It is important to emphasize that, although univariate models obtained the lowest RMSE in 39 of the 42 time-series comparisons, the difference between the best univariate and multivariate approaches is lower in RJ than in IT and the USA. This may occur because the health regions in RJ city are close together compared to the US states or IT regions.

The univariate methods could also outperform the multivariate ones because we chose simpler pure models (SSM-M and VAR) and did not combine them or include more complex models in this analysis, such as VARMA or some multivariate neural network method. Finally, comparing the RMSE OUT-ALL and OUT-MEAN results, we can observe that:

  • The best class of univariate models only remains the same in 5 time series (R1, CEN, ISL, NOW and CA). In all these predictions, as expected, OUT-MEAN results were lower than OUT-ALL;

  • Even when the selected model changes, OUT-MEAN results are lower than OUT-ALL in both approaches (univariate and multivariate).

Then, we can conclude that although we can make a reliable forecast 28 days ahead, updating the new daily cases weekly allows us to reduce the expected mean error of the forecast in all time series used.

5.4 Forecasting 28 Days Ahead

In Tables 12 and 13, we compared the error results between univariate and multivariate approach which provide us many useful insights.

In this section, we summarize the results presented in Tables 12 and 13 to select the best model for each time series evaluated and then apply it, considering all data available (training and test data), to predict daily new cases 28 days ahead. The reasons for choosing a forecasting range of 28 days were presented at the end of Sect. 3.

To provide the proposed daily new case predictions, we re-estimate all parameters of the models selected in the third column of Table 14. Finally, in Figs. 9, 10 and 11, we present the forecast values with a confidence interval of 0.95.

Fig. 9
figure 9

(Color figure online) RJ health regions forecasting. Source: The authors

Fig. 10
figure 10

(Color figure online) IT regions forecasting. Source: The authors

Fig. 11
figure 11

(Color figure online) US states forecasting. Source: The authors

Table 14 Best model selection to each time series

6 Conclusions

In this research, we apply 6 univariate and 2 multivariate models to evaluate 14 time series from a Brazilian city (RJ), all Italian regions and 4 US states. For each time series, we pointed out the best approach considering the lowest RMSE criteria.

An extensive literature review (for more details, see “Appendix D”) was conducted to find forecasting models applied to human infectious disease outbreaks (the research's scope), presented in Sects. 2.1 to 2.4.

In the mentioned sections, we only pointed out forecasting models applied to the scope of this research, which are summarized in Sect. 4. Thus, it is suggested to explore forecasting methods used in other subjects or knowledge areas. An extensive list of forecasting methods can be seen in Petropoulos et al. (2022).

Although unusual in the current literature on human infectious disease outbreak prediction or forecasting (less than 10% of the research we found), we applied multivariate methods because of the high correlation and auto-correlation between different time series from the same region at many lags, as we saw in Fig. 6.

In “Appendix C,” all auto-correlation plots are presented, where we see a significant correlation between regional data up to lag 15 for RJ and at all lags for the Italy regions and US states.

The best in-sample (RMSE IN) results were obtained using univariate MLM for all time series, which is expected considering that these types of models usually provide better results the more data are available for training.

However, the same pattern was not observed in the out-sample (RMSE OUT-ALL and RMSE OUT-MEAN) evaluations. In RMSE OUT-ALL, univariate MLM outperformed 4 times, TBATS 4 times and SSM-U 3 times. In RMSE OUT-MEAN, ES outperformed 6 times and TBATS 3 times.

Despite the strong potential of multivariate methods, we did not observe them outperforming univariate methods except in 3 cases (RMSE OUT-ALL and RMSE OUT-MEAN for CA and RMSE OUT-MEAN for OR). For these three time series, SSM-M gave the most reliable predictions.

Our predictions, presented in Figs. 9, 10 and 11, suggest that in the next 28 days:

  • 4 RJ health regions will remain at the same level of daily new cases, but R5 is expected to face a considerable increase in daily new COVID-19 cases, although still lower than the levels observed in previous data;

  • IT regions will face an exponential increase in daily new COVID-19 cases, excluding the CEN region;

  • In the US states, we can expect different behaviours of daily new COVID-19 cases. For AZ, a slight decrease is expected, while CA and NV will increase. In OR, daily cases are expected to remain at the same level of around 600 new daily cases.

As further research, we suggest the application of multivariate MLM techniques such as multivariate MLP or LSTM (largely and successfully applied in the literature for the univariate time-series approach). Another possible way is to combine the mentioned multivariate methods with VARMA.

Causal models are largely applied in the current literature and should also be explored. However, this type of approach also depends on collecting data from other sources that are sometimes unavailable.

Finally, it is important to emphasize that the set of models and data collection that should be applied to any forecast human disease outbreaks depends on the type of disease transmission.

For airborne infectious diseases such as COVID-19 and influenza, among others, we observe interesting applications combining daily, weekly or monthly cases with Google or Baidu (in China) search engine data or mobility data to find better predictions. On the other hand, diseases transmitted by vectors such as mosquitoes, like dengue and Zika virus among others, are typically combined with temperature and rainfall data, for instance.