1 Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that was first detected in late 2019 and the resultant disease (COVID-19) have completely upended the way we lead our lives. The global pandemic has inflicted debilitating damage on our lives, economies, health care systems, and food security [1]. The virus is extremely transmissible, spreading through respiratory droplets produced when talking, coughing, or sneezing [2]. Since the virus was first reported in 2019, it has mutated multiple times, and some of these mutations have proven to spread faster and infect more easily [3]. Additionally, when a virus mutates, combating it with vaccines and other public health measures becomes increasingly difficult [4]. Hence, it is in our best interest to investigate its spreading mechanisms and obtain ways to predict outbreaks, which is one of the goals of this study. The transmission of the virus has been further aided by modern society’s hypermobility [5]. Early on, COVID-19 spread fast within China and to other countries through both international and national travel by air, land, and sea.

In previous works, the relationship between human mobility and epidemic spreading has been investigated. Recently, several important works have related COVID-19 to traffic. Indicatively, we mention the fundamental work of Kraemer et al. [6], which analyzed human mobility data and traced infection metrics in the early and later stages of the COVID-19 pandemic. The results reinforce that early in the pandemic, strict travel restrictions are helpful and lead to easier confinement and control; later, once the outbreak has spread, travel restrictions are less useful and local measures, such as social distancing and masking, are preferable. In a second recent work [7], the authors posit that while traffic and human mobility are often the culprits driving viral spreading, the traffic network structure is often overlooked in these studies. Hence, they propose a traffic-driven model that accounts for that; here, we also account for human mobility through the transportation network and consider how traffic along its edges affects the spread of a disease.

Specifically, in this work we investigate whether and how travel patterns affect COVID-19 dissemination; we do so by employing generalized network autoregression on a proxy of the US transportation network. We generate and use a network of all counties in the USA in an effort to forecast the spread of COVID-19 using data available for each county as well as travel patterns across counties. The remainder of the manuscript is organized as follows. First, in Sect. 2, we provide a brief literature review of models that have been put to use to forecast COVID-19 cases and to protect communities from its transmission. Then, in Sect. 3, we discuss our approach using generalized network autoregression (GNAR). We also provide a description of the data that were acquired to perform the analysis in Sect. 4. Section 5 presents the computational experiments and the results we observed during our analysis. We conclude this work in Sect. 6.

2 Literature Review

Due to the impact of COVID-19 on our daily lives, a great deal of research has already appeared on the analysis of the spread of the disease. That said, epidemics and pandemics such as the one caused by SARS-CoV-2 are not a recent phenomenon for humanity. As an example, the “Spanish flu” ravaged the world in 1918. In the years following that outbreak, Kermack and McKendrick published papers that presented mathematical models for predicting the number of infections in a population as a function of time; the underlying assumption was that it is valid to split the population into smaller clusters or “compartments” when analyzing a disease’s propagation through a population [8]. Their foundational work continues to help epidemiologists model disease outbreaks today.

More recently, epidemiological models such as the Susceptible-Infected-Removed (Recovered) (SIR) and Susceptible-Exposed-Infected-Removed (Recovered) (SEIR) models and other extensions have been put to use to model the movement of individuals from one “compartment” (i.e., Susceptible, Exposed, Infected, Removed) to the next [9]. As an example, a person may move from the initial state of Susceptible to the intermediate state of Infected upon exposure to and infection with a disease; later, that same person may be categorized as Removed once they recover. As expected, such epidemiological models have been applied in the fight against COVID-19. These models have been largely successful, demonstrating their utility for informing policies to prevent the spread of disease.
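To make the compartment mechanics concrete, the following is a minimal Python sketch of the classic SIR dynamics; the step function, parameter values, and population size are purely illustrative choices and are not taken from the cited studies.

```python
# A minimal sketch of the SIR compartmental model: individuals move from
# Susceptible to Infected to Removed. Parameter values are illustrative.

def sir_step(S, I, R, beta, gamma, N, dt=1.0):
    """Advance the S, I, R compartments by one Euler step of length dt."""
    new_infections = beta * S * I / N * dt   # Susceptible -> Infected
    new_recoveries = gamma * I * dt          # Infected -> Removed
    return S - new_infections, I + new_infections - new_recoveries, R + new_recoveries

# Illustrative run: population of 10,000 with a single initial case.
S, I, R = 9999.0, 1.0, 0.0
for day in range(160):
    S, I, R = sir_step(S, I, R, beta=0.3, gamma=0.1, N=10000.0)
print(round(S), round(I), round(R))  # final compartment sizes
```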

In Cameroon, research based on the SIR model determined that the number of COVID-19 cases was limited due to the health precautions taken [10]. Another similar application of the SIR model originates from Saudi Arabia, where researchers analyzed the number of COVID-19 cases and deaths both with and without public health measures such as quarantine enforcement [11]. Although SIR models have been accurate enough in predicting the size of COVID-19 outbreaks, more recent research indicates that individuals who contract the virus once can become infected again [12], necessitating a means to dynamically update the parameters of the SIR model in an effort to improve its predictive power. In [13], the authors propose making these parameters time-varying to account for changes over time, using machine learning to determine exactly how to update them.

Moreover, the incubation period of COVID-19 (i.e., the period during which an infected individual bears no symptoms yet can still transmit the virus to others) has proven to be an important factor in the spread of COVID-19 [14]. While asymptomatic, some recently infected individuals can unknowingly spread COVID-19, a fact that needs to be included in epidemic models [15].

Similarly to the work from Saudi Arabia, researchers in Wuhan used the SEIR model to analyze the impacts of public health measures such as quarantines and restrictions of movement [16]. Following the time-varying updates recommended in [13], researchers in Portugal dynamically adjusted the exposure rates and other parameters in order to simulate infected asymptomatic individuals who can spread the virus [17].

Another methodology that has been put to use in the fight against COVID-19 is agent-based simulation modeling. Even before SARS-CoV-2 first appeared, researchers had been using simulation in conjunction with transit data; the insight is that population movements critically affect the spread of diseases [18]. As far as COVID-19 is concerned, agent-based simulation models have been used to test the effect of public health mitigation efforts. As an example, using agent-based simulation models, the researchers in [19] conclude that traditional measures such as mask-wearing and social distancing, as well as lockdowns, are viable tools in the fight against COVID-19.

The work presented here is heavily motivated by the literature on diffusion processes on networks (see [20]). COVID-19 and its spread is no exception, with many works pointing to the relationship between outbreaks and population movements through the transportation network [6, 21]. Since 2020, we have seen a multitude of works investigating the network spreading dynamics in air and rail networks as well as public transit [22,23,24,25].

Finally, we discuss time series models. Autoregressive Integrated Moving Average (ARIMA) models regress a forecast value onto previous values of the time series [26]; thus, ARIMA models seek to describe autocorrelations in the time series data [27]. In India, researchers used ARIMA to model and predict COVID-19 infections [28], achieving higher accuracy than other moving average and exponential smoothing models. Also in India, other research analyzed COVID-19 spreading trends using both an ARIMA and a Holt-Winters model (Holt-Winters accounts for trends and seasonality) [29]; the accuracy of the models over the time period specified proved very high, at 99.8%. Another example of ARIMA and Holt-Winters models comes from Jakarta [30], where ARIMA was found to outperform the other time series approaches. Last, ensemble methods combine a variety of time series models; the final prediction of an ensemble model is a combination of the models included [31]. Such an ensemble model was put to use in Nigeria: the model, called Prophet, handled missing values, seasonal effects, and outliers, allowing it to perform well against other models for predicting spread [32].
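As an illustration of the ARIMA workflow described above, the following hedged sketch fits an ARIMA model with the statsmodels Python library; the series values, the order (1, 1, 1), and the 4-step horizon are illustrative and do not reproduce the settings of any cited study.

```python
# A sketch of fitting an ARIMA model to a weekly case-count series; the data
# and model order are illustrative.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

cases = pd.Series([120, 150, 180, 260, 310, 420, 530, 640, 700, 690, 650, 600])
fit = ARIMA(cases, order=(1, 1, 1)).fit()   # AR order 1, first differencing, MA order 1
print(fit.forecast(steps=4))                # forecast the next four weeks
```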

Researchers employing these techniques across the world can help leaders interdict the spread of the virus; that is, spread prediction can inform mobility policy so that the threat to human life is minimized. A recent interdiction policy, motivated by COVID-19, is presented in [33]. Interestingly, the authors utilize mobility data and a set of different network science notions on a network obtained from the districts and boroughs of New York City. Outside the context of viral spread and epidemics, researchers have investigated the idea of using betweenness centrality and extensions, such as betweenness-accessibility [34], to identify the most critical links (i.e., streets or main arteries) and nodes (i.e., zip codes, cities, or counties) whose interdiction or closure leads to better isolation of areas. While our work does not focus on interdiction, our contributions can help policy-makers identify parts of the network that are more susceptible to increases in positivity rate.

3 The Generalized Network Autoregressive Process

In this section, first we describe the Generalized Network Autoregressive Process (GNAR) [35] and the associated R package [36]. Then, we present the way that we adapt the GNAR model to our problem. We also provide the different metrics that we use to evaluate the performance of time series models.

3.1 The GNAR Process

Suppose we have a directed graph \(\mathcal {G} = (N, E)\), where \(N = \{1, \ldots , n\}\) is a set of nodes and E is a set of edges. If an edge \(e = (i, j) \in E\) with \(i, j \in N\) is directed from node i to node j, we write \(i \rightarrow j\). For any \(A \subset N\) we define the neighbor set of A as follows:

$$\begin{aligned} \mathcal {N}(A) := \{j \in N \setminus A \mid i \rightarrow j \text{ for some } i \in A\}. \end{aligned}$$

The r-th stage neighbors of a node \(i \in N\) are defined as

$$\begin{aligned} \mathcal {N}^{(r)}(i) := \mathcal {N}\{\mathcal {N}^{(r-1)}(i)\} \setminus \left[ \left\{ \cup _{q = 1}^{r-1}\mathcal {N}^{(q)}(i)\right\} \cup \{i\}\right] , \end{aligned}$$

for \(r = 2, 3, \ldots\) with \(\mathcal {N}^{(1)}(i) = \mathcal {N}(\{i\})\).

Under this model, we assume that we can assign a weight \(\mu _{i, j}\) to each edge \((i, j) \in E\). For nodes \(i, j \in N\) connected by an edge \((i, j) \in E\), we define the distance between them as \(d_{i, j} = \mu _{i, j}^{-1}\). Then we define

$$\begin{aligned} w_{i, k} =\frac{\mu _{i, k}}{\sum \limits _{l \in \mathcal {N}^{(r)}(i)} \mu _{i, l}}. \end{aligned}$$
(1)
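The definitions above translate directly into code. The following Python sketch computes the r-th stage neighbor sets and the weights of Eq. (1), assuming the directed, weighted network is stored as a nested dictionary of edge weights \(\mu _{i,j}\); all names here are illustrative, not part of any package interface.

```python
# Sketch: r-th stage neighbour sets N^(r)(i) and the weights w_{i,k} of Eq. (1).
# mu[i][j] holds the weight of the directed edge i -> j (illustrative layout).

def stage_neighbours(mu, i, r):
    """Return the r-th stage neighbour set N^(r)(i) of node i."""
    visited = {i}
    frontier = {i}
    for _ in range(r):
        frontier = {j for q in frontier for j in mu.get(q, {})} - visited
        visited |= frontier
    return frontier

def stage_weights(mu, i, r):
    """Return the weights w_{i,k} of Eq. (1) over the r-th stage neighbours of i."""
    nbrs = stage_neighbours(mu, i, r)
    total = sum(mu[i].get(k, 0.0) for k in nbrs)
    return {k: mu[i].get(k, 0.0) / total for k in nbrs} if total > 0 else {}

# Example: three nodes with commuter counts as edge weights.
mu = {1: {2: 500.0, 3: 100.0}, 2: {3: 50.0}, 3: {}}
print(stage_neighbours(mu, 1, 1))  # {2, 3}
print(stage_weights(mu, 1, 1))     # {2: ~0.83, 3: ~0.17}
```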

The GNAR model can also account for edge effects that differ across types of nodes through an additional covariate attribute, such as infected or not infected in an epidemiological network. Assume that the covariate takes discrete values in \(\{1, \ldots , C\} \subset \mathbb {Z}\). Then, let \(w_{i, k, c}\) denote the weight \(w_{i, k}\) associated with covariate c, normalized such that

$$\begin{aligned} \sum _{q \in \mathcal {N}^{(r)}(i)}\sum _{c = 1}^C w_{i, q, c} = 1. \end{aligned}$$

Now we are ready to define the generalized network autoregressive process (GNAR) model. Suppose we have a vector of random variables

$$\begin{aligned} X_t := (X_{1, t}, \ldots , X_{n, t}) \in \mathbb {R}^n \end{aligned}$$

which varies over the time horizon, with each random variable associated with a node. For each node \(i \in N\) and time \(t \in \{1, \ldots , T\}\), a generalized network autoregressive process model of order \((p, [s]) \in \mathbb {N} \times (\mathbb {N}\cup \{0\})^p\) on the vector of random variables \(X_t\) is

$$\begin{aligned} X_{i, t}:= \sum _{j = 1}^p \left( \alpha _{i, j}X_{i, t-j} + \sum _{c = 1}^C\sum _{r = 1}^{s_j}\beta _{j, r, c} \sum _{q \in \mathcal {N}_t^{(r)}(i)} w_{i, q, c}^{(t)}X_{q, t-j}\right) \end{aligned}$$
(2)

where \(p \in \mathbb {N}\) is the maximum time lag, \([s]:= (s_1, \ldots , s_p)\), \(s_j \in \mathbb {N} \cup \{0\}\) is the maximum stage of neighbor dependence for time lag j, \(\mathcal {N}_t^{(r)}(i)\) is the rth stage neighbor set of a node i at time t, and \(w_{i, q, c}^{(t)} \in [0, 1]\) is the connection weight between node i and node q at time t if the path corresponds to covariate c. \(\alpha _{i, j} \in \mathbb {R}\) is a parameter of autoregression at lag j for a node \(i \in N\) and \(\beta _{j, r, c} \in \mathbb {R}\) corresponds to the effect of the rth stage neighbors, at lag j, according to a covariate \(c = 1, \ldots , C\).
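For clarity, the following Python sketch evaluates the right-hand side of Eq. (2) for a single node and time step (omitting any error term); the coefficient containers and the weights function are illustrative stand-ins for a fitted model, not the GNAR package interface.

```python
# Sketch of one step of the GNAR recursion in Eq. (2). X maps (node, time) -> value,
# alpha[(i, j)] and beta[(j, r, c)] hold coefficients, and weights(i, r, c, t)
# returns the normalised connection weights w_{i,q,c}^{(t)} (all names illustrative).

def gnar_one_step(i, t, X, alpha, beta, weights, p, s, C):
    """Compute X_{i,t} from past values according to Eq. (2)."""
    value = 0.0
    for j in range(1, p + 1):                       # time lags 1, ..., p
        value += alpha[(i, j)] * X[(i, t - j)]      # own past value at lag j
        for c in range(1, C + 1):                   # covariates 1, ..., C
            for r in range(1, s[j - 1] + 1):        # neighbour stages 1, ..., s_j
                w = weights(i, r, c, t)             # dict: neighbour q -> w_{i,q,c}^{(t)}
                value += beta[(j, r, c)] * sum(w[q] * X[(q, t - j)] for q in w)
    return value

# Tiny usage with p = 1, one covariate, and a two-node network:
X = {(1, 0): 10.0, (2, 0): 4.0}
alpha = {(1, 1): 0.5}
beta = {(1, 1, 1): 0.2}
w = lambda i, r, c, t: {2: 1.0}          # node 2 is the only stage-1 neighbour of node 1
print(gnar_one_step(1, 1, X, alpha, beta, w, p=1, s=[1], C=1))   # 0.5*10 + 0.2*1.0*4 = 5.8
```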

3.2 COVID-19 Analysis Using GNAR

In order to apply the GNAR model defined in this section to COVID-19 data on the county network, we set the variables as follows.

Note that the GNAR model conducts a time series analysis of time series data on networks and assumes that the topology of the network is fixed over the time horizon \(t > 0\). In this research, the network is the county network \(\mathcal {G} = (N, E)\), where each node \(i \in N\) is a county in a particular state of the USA, and we draw an edge \((i, j) \in E\) between counties \(i, j \in N\) if and only if county i has commuters traveling to county j. The weight \(\mu _{i, j}\) on each edge \((i, j) \in E\) is the number of commuters from county i to county j. The GNAR model assumes that these weights \(\mu _{i, j}\) are also fixed over the time horizon \(t > 0\). These variables form part of the input to the GNAR package. What we wish to infer using the GNAR model are the random variables

$$\begin{aligned} X_t := (X_{1, t}, \ldots , X_{n, t}) \in \mathbb {R}^n, \end{aligned}$$

where \(X_{i, t}\) is the number of COVID-19 cases or deaths from COVID-19 in county \(i \in N\) at time \(t > 0\).

In this research, we do not distinguish between nodes, i.e., we treat all counties in N as the same type. Therefore, we drop the covariate index c, rendering the formulation of the GNAR model as follows. For each county \(i \in N\) and time \(t \in \{1, \ldots , T\}\), a generalized network autoregressive process model of order \((p, [s]) \in \mathbb {N} \times (\mathbb {N}\cup \{0\})^p\) on the vector of county-level COVID-19 case or death counts \(X_t\) is

$$\begin{aligned} X_{i, t}:= \sum _{j = 1}^p \left( \alpha _{i, j}X_{i, t-j} + \sum _{r = 1}^{s_j}\beta _{j, r} \sum _{q \in \mathcal {N}_t^{(r)}(i)} w_{i, q}^{(t)}X_{q, t-j}\right) \end{aligned}$$
(3)

where \(p \in \mathbb {N}\) is the maximum time lag, \([s]:= (s_1, \ldots , s_p)\), \(s_j \in \mathbb {N} \cup \{0\}\) is the maximum stage of neighbor dependence for time lag j, \(\mathcal {N}_t^{(r)}(i)\) is the rth stage neighbor set of a county i at time t, and \(w_{i, q}^{(t)} \in [0, 1]\) is the connection weight between a county i and a county q at time t. \(\alpha _{i, j} \in \mathbb {R}\) is the parameter of autoregression at lag j for a county \(i \in N\), and \(\beta _{j, r} \in \mathbb {R}\) corresponds to the effect of the rth stage neighbors at lag j. The orders p and [s] are tuning parameters specified by the user; in this paper, we select three combinations of these tuning parameters after conducting the model selection discussed in Sect. 3.2.1.

In addition, note that \(w_{i, k}\) for a county \(i \in N\) and its neighbor \(k \in \mathcal {N}^{(r)}_t(i)\) is computed using Eq. (1).

3.2.1 Model Parameters

The GNAR package takes a number of parameters for its predictive time series models. For both the cases and the deaths, we adjusted two GNAR parameters to create three unique models. The first model sets alphaOrder = 1, a non-negative integer that specifies a maximum time lag of 1, together with betaOrder = 0, a vector that specifies the maximum neighbor stage to model at each of the time lags [36]. These parameters correspond to the time lag, p, and the maximum stage of neighbor dependence for each time lag, [s], as discussed above. The second model sets \(\mathtt{{alphaOrder}} = 0\) and \(\mathtt{{betaOrder}} = 1\). The third model is the default model in GNAR, with no parameter modifications, i.e., \(\mathtt{{alphaOrder}} = 0\) and \(\mathtt{{betaOrder}} = 0\). We conduct model selection by varying alphaOrder and betaOrder from 0 to 5 independently. Table 1 provides a summary of the model parameter combinations.

Table 1 Model Parameter Summary. We vary the value of alphaOrder and betaOrder to create three different models for the COVID-19 cases and deaths

Because there are two prediction targets (cases and deaths) and three model parameter selections, in total we create 6 different combinations (e.g., Deaths - Model 1). Moreover, since we predict by state, each state has these 6 models for comparison.

3.3 Performance Evaluation

Measuring performance in traditional statistics often calls for measures such as the RMSE and adjusted \(R^2\). Although easily calculated, these measures do not capture errors in terms of the time horizon [37]. For the outputs of a predictive time series model, performance can instead be measured by the mean absolute percentage error (MAPE) and the mean absolute scaled error (MASE). The MAPE estimates the average of a model’s forecast error over the time horizon, while the MASE measures the ratio of the estimated absolute error of the forecast divided by the estimated absolute error of the naïve forecast method over the time horizon [38]. The MAPE commonly falls between 0 and 1, but can be skewed outside this range if actual values are close to zero [38]. The MASE is less than one if a model has smaller error than the naïve model and greater than one if a model has greater error than the naïve model [38]. The MAPE and MASE are defined by the following equations:

$$\begin{aligned} MAPE = \frac{\sum _{t=1}^{N}\left|\frac{Y_t-F_t}{Y_t}\right|}{N}, \end{aligned}$$
(4)
$$\begin{aligned} MASE = \frac{\sum _{t=2}^{N}\left|\frac{Y_t-F_t}{Y_t-Y_{t-1}}\right|}{N-1}, \end{aligned}$$
(5)

where \(Y_t\) is the observation at time t, \(F_t\) is the predicted value, and \(Y_t-Y_{t-1}\) is the error of the one-step naïve forecast.

In order for a model to have predictive power, it must be more accurate than the respective naïve model, i.e., its MASE must be below one; we say a model has good forecasting ability if its MAPE is less than 0.2 [39].
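A small Python sketch of Eqs. (4) and (5) follows; the observation and forecast vectors are illustrative, and the MASE sum starts at t = 2 because the first observation has no one-step naïve forecast.

```python
# Sketch of the MAPE (Eq. 4) and MASE (Eq. 5); y holds observations Y_t and
# f the forecasts F_t over the evaluation window (illustrative names and data).

def mape(y, f):
    """Mean absolute percentage error, Eq. (4)."""
    return sum(abs((yt - ft) / yt) for yt, ft in zip(y, f)) / len(y)

def mase(y, f):
    """Mean absolute scaled error, Eq. (5): forecast error scaled by the
    one-step naive error Y_t - Y_{t-1}."""
    ratios = [abs((y[t] - f[t]) / (y[t] - y[t - 1])) for t in range(1, len(y))]
    return sum(ratios) / len(ratios)

y = [100, 120, 150, 160]   # observed weekly cases (illustrative)
f = [ 98, 125, 140, 165]   # model forecasts
print(mape(y, f), mase(y, f))
```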

When we measure the performance of each model in terms of the MASE and MAPE, we apply a rolling horizon design [40]. A rolling horizon design assesses the accuracy of a time series model by successively updating the forecasted value using different subsets of previous and current observations, and then averaging the performance of the model over the different time periods.
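The rolling horizon design can be sketched as follows in Python; the expanding training window, the one-step horizon, and the placeholder forecaster are assumptions of this sketch rather than the exact design of [40].

```python
# Sketch of a rolling horizon evaluation: refit on an expanding window of past
# observations, forecast the next point, and average the errors over all origins.
# fit_and_forecast is a placeholder for any one-step forecaster (e.g., GNAR or naive).

def rolling_horizon(series, fit_and_forecast, min_train=10):
    """Average absolute percentage error over successive forecast origins."""
    errors = []
    for origin in range(min_train, len(series)):
        history = series[:origin]                 # observations available so far
        forecast = fit_and_forecast(history)      # one-step-ahead forecast
        actual = series[origin]
        errors.append(abs((actual - forecast) / actual))
    return sum(errors) / len(errors)

# Naive forecaster: repeat the last observed value.
naive = lambda history: history[-1]
series = [100.0, 104.0, 109.0, 113.0, 120.0, 128.0, 135.0, 141.0, 150.0, 158.0, 165.0, 171.0]
print(rolling_horizon(series, naive))
```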

4 Data

We obtain the data for this work from the United States Census Bureau (USCB), USAFacts, and the Centers for Disease Control and Prevention (CDC). USAFacts obtains its data from the CDC [41] and updates the daily death count on its website [42]. Manipulating these data in Python, we transform them into a usable format for the GNAR package, create our models, and assess them using a variety of performance metrics.

4.1 Data Description and Limitations

The data are entirely numerical, with no categorical predictors or response variables, and no transformations are applied to the original data for the proposed models. Autoregressive models require stationarity of the errors, meaning that the series’ variance must be constant over a long time period [43]; we note that the COVID-19 cases and deaths data meet this assumption because the noise of the data does not depend on the time at which the data were observed [27].

Furthermore, we assume that the COVID-19 data are complete and accurate. Although human error and reporting standards sometimes affect the number of deaths and cases reported on any given day, we assume the data obtained from the CDC are accurate. Additionally, we assume the data are autoregressive and contain no outliers [44]. The principal limitation of the data involves the static nature of the commuting network structure [45]: the USCB compiled the commuting data over a 5-year period from 2011 to 2015. We thus assume that the traffic and commuting patterns by county remained the same throughout the COVID-19 pandemic.

4.2 County Network

The USA comprises 3,143 counties or county equivalents as of 2020 [46]. Forty-eight states use the term “county” to describe their administrative districts, while Louisiana and Alaska use the terms “parishes” and “boroughs,” respectively. Each county is assigned a unique five-digit FIPS code: the first two digits represent the state’s FIPS code, while the latter three digits identify the county within the state. This number serves as a uniform index for each county, facilitating county data sorting and filtering.
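For illustration, splitting a FIPS code into its state and county parts is straightforward; the example code below uses a Rhode Island FIPS code (state code 44) purely as an illustration.

```python
# The five-digit FIPS index: first two digits = state, last three = county.
fips = "44007"                     # a Rhode Island county (illustrative example)
state_code, county_code = fips[:2], fips[2:]
print(state_code, county_code)     # "44" and "007"
```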

The number of counties per state varies widely across the USA, regardless of a state’s geographic size, population, or terrain. For example, Rhode Island, the state with the smallest land area, has 5 counties, while Alaska, the state with the largest land area, has 29 county equivalents [47]. Delaware contains the fewest counties and Texas the most. Moreover, the majority of the country’s population lives in only 143 of the 3,143 counties as of 2020. Table 2 and Fig. 1 display the number of counties in each state.

Table 2 The number of US counties per state (including the District of Columbia). As can be seen, the number of counties per state varies widely. Source: [48]
Fig. 1

All counties in the USA, plotted using the ggplot2 [49] and usmap [50] packages in R

Additionally, it is not uncommon for county information to change over time. Counties can divide, merge, or rename themselves at any time, even if that time does not fall in a census year. For example, Colorado created Broomfield County by merging parts of other counties [51], and Shannon County, South Dakota, renamed itself Oglala Lakota County in 2015 out of respect for its Native American heritage [52]. When counties are formed or renamed, they are assigned new FIPS codes, which can complicate the reporting of statistics later on. Regardless, many counties and states have protocols in place to prevent such mistakes.

In this work, we construct the network structure from the original USCB commuting data using Python. The commuting data come in the form of a data frame with three columns: the county from which individuals commute, the county to which they commute, and the number of commuters. This represents a flow structure from which we can deduce how many commuters travel from one county to the next. Using Python, we transform this flow structure into a matrix format, with the rows and columns representing the “From” and “To” columns of the original data. Thus, one can easily look up in this commuting matrix how many commuters go from one county to another.
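A sketch of this flow-to-matrix transformation is shown below, assuming the commuting data are loaded into a pandas DataFrame; the column names and the sample FIPS codes are illustrative.

```python
# Sketch: turn the three-column flow data into an origin-by-destination matrix.
import pandas as pd

flows = pd.DataFrame({
    "From":      ["44007", "44007", "44003"],
    "To":        ["44003", "25027", "44007"],
    "Commuters": [1200,     300,     900],
})

# Rows are origin counties, columns are destination counties, entries are
# commuter counts (0 where no flow is reported).
commute_matrix = flows.pivot_table(index="From", columns="To",
                                   values="Commuters", fill_value=0)
print(commute_matrix.loc["44007", "44003"])   # commuters from 44007 to 44003
```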

In this research, we divide the county network by state. We design the county commuting network structure for each state using the following information:

  • Workers commuting from within-state counties to within-state counties.

  • Workers commuting from out-of-state counties to within-state counties.

  • Workers commuting from within-state counties to out-of-state counties.

Dividing the network into states allows us to see more localized trends in COVID-19, instead of considering the entire country at once. States can act as “communities” in the country’s commuting network. Communities in network science are groups of nodes with similar characteristics [53].
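Continuing the illustrative pandas sketch above, extracting the commuting subnetwork of a single state (with its within-state, inbound, and outbound flows) could look as follows; the helper name and the use of two-digit state FIPS prefixes are assumptions of this sketch.

```python
# Sketch: keep all flows that touch a given state, covering the three flow types
# listed above (within-state, inbound, and outbound commuters).
def state_subnetwork(flows, state_code):
    """Keep flows whose origin or destination county lies in the given state."""
    from_state = flows["From"].str[:2] == state_code
    to_state = flows["To"].str[:2] == state_code
    return flows[from_state | to_state]

rhode_island = state_subnetwork(flows, "44")   # FIPS state code 44 = Rhode Island
```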

5 Computational Experiments

As discussed earlier, GNAR takes a univariate time series dataset along with an underlying network structure in order to create a predictive time series model. After transforming the data for the GNAR model, we fit three models for each prediction target (COVID-19 cases and COVID-19 deaths), giving us 6 models per state. We select 5 states on which to showcase the results; since each of the 5 states is tested on 6 models, this gives us a total of 30 individual models. We evaluate the models for prediction accuracy graphically, using the MAPE and MASE as measures of performance.

In order to determine whether our models perform similarly across different states, we select a diverse array of states based on vaccination rates. Vaccination rates could affect the time-varying number of cases and deaths in a state, potentially leading to differing model performance. Hence, we choose the following states for our analysis: Rhode Island, Massachusetts, California, Florida, and Arkansas. Table 3 describes the vaccination rate (% of population) and the corresponding rank out of 50 for each state we choose.

Table 3 State vaccination rates as of 11/2021. The vaccination rates capture the percentage of population that is considered fully vaccinated (two doses of the appropriate vaccines, or one dose of a single dose vaccine). The rates help us determine which states to choose for our comparison. Adapted from [54]

We calculate the MASE and MAPE for each GNAR model with respect to each test period within the 40-week forecast. We then calculate the mean, median, and variance of these values. In order to determine whether transforming the measurements improves model performance, we test the following transformations of the measurements in each county commuting network:

  1. logarithm transformation,

  2. square-root transformation, and

  3. normalization.

In our experiments, however, all of the above transformations result in only minor changes in model performance; therefore, we report the results using the original scale of measurements.
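For completeness, the three transformations above can be sketched as follows in Python; the use of log(1 + x) to handle zero counts and min-max scaling for the normalization are assumptions of this sketch, since the exact variants are not specified above.

```python
# Sketch of the three measurement transformations tested (illustrative data).
import numpy as np

x = np.array([0.0, 12.0, 35.0, 80.0, 140.0])       # weekly counts for one state's counties

log_x  = np.log1p(x)                                # logarithm transformation (log(1 + x) to handle zeros)
sqrt_x = np.sqrt(x)                                 # square-root transformation
norm_x = (x - x.min()) / (x.max() - x.min())        # min-max normalization
```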

The naïve models across all states start with a high MAPE at the beginning of the time horizon, sometimes double the MAPE of all the other models. Additionally, the MAPE of the naïve model appears to increase slightly across all states in the last 4 weeks of the time horizon. The poor performance of the naïve model at the beginning of the time horizon may be caused by a sudden increase in cases and a lack of information early on.

As we will see below, the MASE for Models 1 and 2 in each state exhibits a bimodal “hump,” with the largest hump centered around week 80 of our dataset. The hump is followed by a sharp decrease during the last 4 weeks of the testing period for all models. Models 1 and 2 perform worse than the baseline naïve model during this bimodal hump period. The 80th week falls near the end of August 2021: during that period, everywhere in the USA, the number of cases increased at a much slower pace than earlier. The timing coincides with when many people were fully vaccinated (for clarity, full vaccination in our work refers to a two-dose regimen or a single dose of an approved vaccine [55]). Hence, these big “humps” in model performance could be caused by the effect of these vaccinations on the number of cases. Models 1 and 2 are the ones impacted by this increased number of vaccinations, because they are most affected by the numbers of cases in neighboring counties, and vaccination rates in neighboring counties are not necessarily good predictors of the number of cases in an individual county. Overall, since Model 3 seems largely unaffected by correlations between the numbers of fully vaccinated individuals in neighboring counties, it performs the best, outperforming the naïve model at almost all times during the time horizon. It would be interesting, given updated information and data, to study whether this effect changes when considering an individual as fully vaccinated only after having received the appropriate booster shot, also referred to as “up-to-date” individuals [55].

Table 5 provides a summary of the results obtained for each of the models in each of the states. More detailed graphical results are shown in Figs. 2, 3, 4, 5, and 6 for Rhode Island (RI), Massachusetts (MA), California (CA), Florida (FL), and Arkansas (AR), respectively. Recall that in order for a model to have predictive power, it must be more accurate than the respective naïve model [39]; therefore, its MASE should be smaller than 1. Additionally, a model’s MAPE must be lower than 50%. Table 4 describes appropriate interpretations for different levels of the MAPE.

Table 4 MAPE Interpretation. This table provides a guide for interpreting a time series model’s MAPE: the higher the MAPE, the less predictive power the model has. Source: [39]

As can be seen from Table 5 and Fig. 2, Rhode Island, a state with few counties but a dense, highly vaccinated population, exhibits a great deal of difference among its models. From Table 5 and Fig. 3, Massachusetts, similar to Rhode Island, has a more urban, highly vaccinated population; this state did not exhibit much difference among its models, but Model 3 nevertheless outperformed the others. California, a state with a large land area and a highly vaccinated population, likewise did not exhibit much difference among its models, with Model 3 again outperforming the others (Table 5 and Fig. 4). Florida, a state with a low vaccination rate and a large population, did not exhibit much difference among its models either, and Model 3 still performed the best (Table 5 and Fig. 5). Finally, Arkansas, a state with a great deal of rural land and one of the lowest vaccination rates in the country, performed differently than the other states; yet, as we see in Table 5 and Fig. 6, Model 3 remains the best performing model.

Table 5 Summary statistics of each model’s performance in MAPE and MASE over the time horizon
Fig. 2

The horizontal axis depicts the number of weeks since data collection began on January 22, 2020, for all plots in this figure. The four subfigures present the three models for cases and deaths and the MAPE and MASE evaluation metrics for the state of Rhode Island. Adapted from [45] and [56]

Fig. 3

The horizontal axis depicts the number of weeks since data collection began on January 22, 2020, for all plots in this figure. The four subfigures present the three models for cases and deaths and the MAPE and MASE evaluation metrics for the state of Massachusetts. Adapted from [45] and [56]

Fig. 4

The horizontal axis depicts the number of weeks since data collection began on January 22, 2020, for all plots in this figure. The four subfigures present the three models for cases and deaths and the MAPE and MASE evaluation metrics for the state of California. Adapted from [45] and [56]

Fig. 5

The horizontal axis depicts the number of weeks since data collection began on January 22, 2020, for all plots in this figure. The four subfigures present the three models for cases and deaths and the MAPE and MASE evaluation metrics for the state of Florida. Adapted from [45] and [56]

Fig. 6

The horizontal axis depicts the number of weeks since data collection began on January 22, 2020, for all plots in this figure. The four subfigures present the three models for cases and deaths and the MAPE and MASE evaluation metrics for the state of Arkansas. Adapted from [45] and [56]

All states analyzed exhibit similar behavior in the MAPE and MASE measures of performance. The MAPE of all models decreases gradually as the time series prediction moves through the time horizon. The MAPE of the naïve model consistently starts very high but then gradually approaches 0 along with the other models. Models 1 and 2 perform worse than the naïve model in MAPE from approximately week 71 to week 84. Most of the states exhibit a bimodal structure in the MASE from approximately week 66 to week 84, with a rapid increase in the MASE across all states from approximately week 70 to week 77. Models for all states perform similarly because of the inclusion of both into-state and out-of-state travel: likely, there are some counties that people throughout the country commute to that appear in many state networks, including those that we select in this work. Because Model 3 outperforms the naïve model for most of the time horizon in all states, this model should be considered by epidemiologists. The GNAR models on county networks with commuting information prove potentially useful for predicting COVID-19 cases and deaths.

6 Conclusion

The coronavirus pandemic has ravaged the world, killing many and affecting the daily lives of all people. Concentrated efforts from all parts of the earth have attempted to curb the virus’ spread. From mathematical models to public health policy decisions, these efforts have brought the world together in an attempt to eradicate this virus. In this work, we show that the GNAR model performs very well in predicting COVID-19 cases and deaths throughout the county network in the USA. Using open-source data from common sources, including the USCB and USAFacts, we can create a predictive model that could better inform public health officials.

For example, cell phone data are both nearly ubiquitous and surprisingly accurate [57]. Companies and organizations have been able to harness the data from commuters’ cell phones, via their navigation applications, to better predict traffic flow through an area [57]. These data are almost live, since they come directly from drivers as they travel along a road, and could help describe a by-county commuting network. The network could be dynamic, changing as more data are obtained; perhaps analyzing trends over the past few weeks in a local area could result in a more accurate and current county commuting network structure. In addition, in this paper we assume that the traffic and commuting patterns by county remained the same throughout the COVID-19 pandemic, since the USCB compiled the commuting data over a 5-year period from 2011 to 2015, giving it a static property. At the beginning of the pandemic, because of lock-downs in many states, traffic flows were much lower. The effect of reduced traffic flows between counties is likely greatest for Model 1 and Model 2, since these models place weight on neighbor traffic flows. In contrast, since Model 3 places less weight on the traffic flows between a county and its neighboring counties, it was less affected by the change in traffic flows. If we had information on traffic flows between counties during the pandemic, we expect that the performance of Model 1 and Model 2 would improve.

The data sources of this work are delineated by county, which provides data for a relatively localized area. However, one could further subdivide the data into zone improvement plan (ZIP) codes in order to obtain an even more refined prediction at a lower level. The CDC currently only collects data at the county level; however, with future technologies for tracking a disease’s spread, the CDC could subdivide its data even further. As of November 2021, there exist 41,692 ZIP codes in the USA [58]. Since individuals move freely between ZIP codes, and since the frequency of moving between ZIP codes is likely higher on average, this subdivision of data may provide a great deal of insight into localized trends in the movement of people. A ZIP code analysis may provide a more realistic representation of daily life and community interactions due to the relatively smaller distance between nodes.

In this paper, we treated all counties the same and therefore ignored the covariates \(c = 1, \ldots , C\) in Eq. (2). We could use these covariates to encode, for example, whether a county lies in a state with a low or high vaccination rate, which might improve the performance of the GNAR model in predicting the number of cases.

Finally, one could incorporate any network structure into the GNAR model as long as it is geographically delineated in the same way as the time series data. Any data that describe a flow from one geographic area to another can be formulated as a network structure, which is a key component of a GNAR model. Comparing multiple network structures could provide insight into what is important in the dissemination of a disease. With the advent of structure centrality [59], this could be an interesting avenue for extending traditional centrality metrics (see, e.g., [34]) in epidemic spreading.

Applying this methodology to other geographic areas or governance divisions around the world, not just the USA, could also prove useful. Any country’s municipalities, provinces, or townships could represent nodes in a network similar to the US county structure. Comparing countries of a similar geographic, climatic, and demographic makeup to the USA may prove especially insightful. One could also compare and contrast the effects of public health policy in different geographic areas.