1 Introduction

The ongoing COVID-19 pandemic has had a tremendous negative impact on human health and society. As of February 2023, it had resulted in 1,140,000 deaths and over 100 million cases (JHU CSSE 2023). The virus is highly contagious, and its transmission routes are divided into direct, aerosol, and contact transmission. Respiratory droplets produced when an infected person coughs, sneezes, or talks may spread COVID-19 (CDC Spread 2021). Social distancing has proven to be an effective and pivotal countermeasure for controlling the spread of COVID-19 (Hu et al. 2022). In addition, the emergence of COVID-19 vaccines has significantly reduced the mortality rate (Steiger et al. 2021). The effectiveness of the vaccines in preventing severe illness, hospitalizations, and deaths has been a monumental achievement in public health. However, the pandemic continues to pose challenges due to the emergence of new variants, which have occasionally led to increases in cases and have tested the resilience of public health measures.

A geographic information system (GIS) is a technical system that collects, stores, manages, calculates, analyzes, and displays spatial data with the help of computer hardware and software to support spatial decision-making (Rogers 1999; Zhang et al. 2014; Zhang et al. 2020; Zhang et al. 2021). GIS has therefore played a vital role in examining the spatial patterns of infectious diseases (Mollalo et al. 2018; Li et al. 2020; Huang et al. 2022). For example, the emergence of Web-GIS has produced various GIS dashboards, such as that of JHU CSSE, that help monitor the live distribution of the pandemic worldwide (Dong et al. 2020; Berry et al. 2020). GIS modeling can also examine the spread patterns of COVID-19 spatially and help scholars understand the associations between the pandemic and explanatory socioeconomic factors. Fortaleza et al. (2020) applied Cox regression to confirm the spatial and hierarchical spread of the disease, which indicates the importance of travel distance. In addition, socioeconomic factors such as median household income and poverty have been identified as crucial factors that affect social distancing practices and thereby the spread of the disease (Ehlert 2021).

Nevertheless, while GIS has provided valuable insights into the spatial aspects of the pandemic, there remains a gap in understanding the influence of individual health factors. Studies have shown that older people with health concerns, such as heart disease and diabetes, are more vulnerable to severe symptoms and mortality due to COVID-19 (Iyanda et al. 2020; CDC COVID 2021). For instance, Abdi et al. (2020) highlighted diabetes as a significant contributor to COVID-19 severity and mortality, and Gu et al. (2020) found that the mortality risk for patients with heart disease was three times higher than for those without. These findings underscore the need for a more comprehensive analysis integrating health and mobility factors with spatial and temporal data to fully understand COVID-19's impacts.

Geographically weighted regression (GWR) has been one of the most effective methodologies for examining non-stationary spatial relationships between dependent and independent variables (Fotheringham et al. 2002). GWR explores local changes in spatial objects and their driving factors by establishing a local regression equation at each point in the study area, and it can also be used for prediction (Fotheringham et al. 2002). Many studies have applied the GWR model to COVID-19 (Liu et al. 2020; Jiao et al. 2021; Wu and Zhang 2021; Cui et al. 2022) and have already demonstrated spatial heterogeneity in COVID-19-related data. However, while GWR effectively captures spatial heterogeneity, it does not inherently account for temporal dynamics, which are crucial in understanding the evolution and progression of phenomena like the COVID-19 pandemic. The temporal dimension is particularly vital in infectious disease modeling, where transmission, recovery, and mortality rates can change significantly over time due to factors such as policy interventions, behavioral changes, and natural disease progression. The absence of this temporal component in traditional GWR models can limit their effectiveness in providing a comprehensive understanding of the pandemic's dynamics. As an extension of the GWR model, the Geographically and Temporally Weighted Regression (GTWR) model was introduced to model spatial and temporal variations simultaneously. It can deal with non-stationary spatial data and consider temporal effects (Fu and Li 2020). Additionally, to address the spatial heterogeneity evident at different scales in COVID-19 mortality data, we also employ the Multiscale Geographically Weighted Regression (MGWR) model. MGWR can analyze data across multiple scales and helps identify the differential impacts of various factors in distinct regions, further enriching our understanding of the spread and impact of COVID-19.

This study integrates various socioeconomic factors, mobility variables, and health conditions to understand their impact on COVID-19 mortality. This holistic approach allows for a more detailed and nuanced analysis than previous studies, which often considered these factors in isolation (Iyanda et al. 2020; Liu et al. 2020; Jiao et al. 2021; Wu and Zhang 2021; Cui et al. 2022). It aims to quantitatively examine the non-stationary spatial and temporal associations between mortality and several explanatory factors, such as mobility, health conditions, poverty level, and insurance coverage, to assist policymakers in making better risk-informed decisions for public health management.

2 Background and theory

Geographically Weighted Regression (GWR) and its extensions, MGWR and GTWR, are statistical tools used to explore spatial variation in the relationships between variables (Cui et al. 2022). They are widely used in various research fields, including geography, environmental sciences, and urban planning. This section gives an overview of these three models and their applications in different research fields.

GWR is a spatial regression method that has been widely used to examine the spatial variation of relationships between variables. The method was first proposed by Brunsdon et al. (1998) and has been extended by various researchers since then:

$${y}_{i}={\alpha }_{0}\left(i\right)+{\sum }_{k=1}^{p}{\alpha }_{k}\left(i\right){X}_{ik}+{\varepsilon }_{i}, i=1,\dots ,n$$

The parameters in the model vary by location \(i\): the location-specific coefficients \({\alpha }_{k}\left(i\right)\) allow local variation in the relationship between the variables to be captured. In other words, the relationship between the dependent and independent variables varies over space in GWR. The basic idea of GWR is to estimate a separate regression model for each location in the study area, using data from neighboring locations to inform the estimation of the model (Brunsdon et al. 1998). The influence of each neighboring location is usually encoded in a spatial weight matrix, which can be defined based on geographic distance or some other spatial relationship (Deller and Lledo 2007; Leung et al. 2000; Hu et al. 2022; McMillen 1996).
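The local estimation idea can be illustrated with a minimal sketch. It assumes a fixed Gaussian distance-decay kernel and a user-supplied bandwidth; the function names `gaussian_weights` and `gwr_fit` are hypothetical, and the kernel choice is an assumption for illustration rather than the configuration used in this study. At each location, the coefficients are obtained by weighted least squares, \(\widehat{\alpha }\left(i\right)={\left({X}^{T}W\left(i\right)X\right)}^{-1}{X}^{T}W\left(i\right)y\).

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of all observations relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_fit(coords, X, y, bandwidth):
    """Return an (n, p + 1) array of local coefficients, one row per location.

    Column 0 holds the local intercept alpha_0(i); the remaining columns hold
    the local slopes alpha_k(i). The bandwidth is fixed (an assumption);
    bandwidth selection is not shown.
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])      # prepend the intercept column
    alphas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        w = gaussian_weights(coords, i, bandwidth)
        XtW = Xd.T * w                         # same as Xd.T @ diag(w)
        alphas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return alphas
```

With a very large bandwidth all weights approach one and every row of the result approaches the global OLS estimate; smaller bandwidths let the coefficients vary more strongly across locations.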

MGWR is a significant extension of the GWR model. While GWR allows for the estimation of spatially varying coefficients at the local level, MGWR allows for estimating these coefficients at multiple scales, capturing the complexity of spatial relationships across different levels of analysis (Fotheringham et al. 2017):

$${y}_{i}={\sum }_{j=0}^{m}{\beta }_{bwj}\left({u}_{i},{v}_{i}\right){X}_{ij}+{\varepsilon }_{i}, i=1,\dots ,n$$

where \({y}_{i}\) is the dependent variable for the \(i\)th observation, \(\left({u}_{i},{v}_{i}\right)\) are the coordinates of location \(i\), and \(bwj\) in \({\beta }_{bwj}\) indicates the bandwidth used for calibration of the \(j\)th conditional relationship. The concept of scale dependency is widely recognized in the spatial analysis literature, referring to the observation that spatial relationships between variables can vary at different scales. It means that the spatial processes and patterns observed at one scale may not hold at another scale. Therefore, analyzing spatial relationships at a single scale may not capture the full complexity of the relationship and may produce biased or incomplete results. To overcome this challenge, MGWR estimates coefficients for multiple spatial scales simultaneously, providing a more comprehensive and nuanced analysis of spatial relationships. The critical difference between GWR and MGWR is that MGWR models relationships between variables at multiple spatial scales, allowing spatial non-stationarity to be examined across multiple scales (Fotheringham et al. 2017; Oshan et al. 2020).
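To make the role of the covariate-specific bandwidths concrete, the following is a minimal sketch of a back-fitting calibration in which each term \({\beta }_{bwj}{X}_{ij}\) is re-estimated in turn with its own bandwidth. It assumes Gaussian kernels and bandwidths that are already known; the per-covariate bandwidth search that MGWR performs is omitted, and the function names `local_slope` and `mgwr_backfit` are hypothetical. The first column of `X` is expected to be a column of ones so that \(j=0\) corresponds to the intercept.

```python
import numpy as np

def local_slope(coords, x, r, bandwidth):
    """Univariate local regression (no intercept) of r on x at every location."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-0.5 * d2 / bandwidth ** 2)     # (n, n) Gaussian kernel weights
    return (W @ (x * r)) / (W @ (x * x))

def mgwr_backfit(coords, X, y, bandwidths, n_iter=200, tol=1e-10):
    """Back-fitting with one fixed bandwidth per column of X (assumed known).

    Returns an (n, m) array of local coefficients. Bandwidth selection,
    which a full MGWR calibration performs for each covariate, is omitted.
    """
    n, m = X.shape
    betas = np.zeros((n, m))
    for _ in range(n_iter):
        previous = betas.copy()
        for j in range(m):
            # Partial residual: remove the fitted contribution of all other terms.
            others = (X * betas).sum(axis=1) - X[:, j] * betas[:, j]
            betas[:, j] = local_slope(coords, X[:, j], y - others, bandwidths[j])
        if np.max(np.abs(betas - previous)) < tol:
            break
    return betas
```

A large bandwidth for one column makes that coefficient nearly constant over the study area, while a small bandwidth for another column lets it vary locally, which is the multiscale behavior described above.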

The GTWR model is another extension of GWR. It models the relationship between a dependent variable and one or more independent variables, extending traditional regression models by incorporating both spatial and temporal variations into the modeling process (Fotheringham et al. 2015):

$${y}_{i}={\beta }_{0}\left({u}_{i},{v}_{i},{t}_{i} \right)+{\sum }_{k=1}^{p}{\beta }_{k}\left({u}_{i},{v}_{i},{t}_{i}\right){X}_{ik}+{\varepsilon }_{i}, i=1,\dots ,n$$

Compared with MGWR, a temporal element \({t}_{i}\), the time of observation \(i\), is added to the formula of GTWR. GTWR allows for considering temporal effects and non-stationary spatial data (Fu and Li 2020). This method accounts for the heterogeneity of the relationships between the variables, which can vary based on geographic or temporal location. GTWR is particularly useful when there is evidence of non-stationarity in the data, which means that the relationships between variables change over space and time. While it may have limitations in terms of computational intensity and the selection of optimal bandwidths, GTWR remains a valuable tool for researchers looking to model spatial or temporal variations in their data.
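One way to see how the temporal element enters the estimation is through the kernel weights. The sketch below assumes a formulation that is common in the GTWR literature but is not stated in this paper: a squared spatiotemporal distance \(\lambda \left[{\left({u}_{i}-{u}_{j}\right)}^{2}+{\left({v}_{i}-{v}_{j}\right)}^{2}\right]+\mu {\left({t}_{i}-{t}_{j}\right)}^{2}\) combined with a Gaussian kernel of bandwidth \(h\). The scale factors `lam` and `mu`, the bandwidth `h`, and the function names are illustrative assumptions.

```python
import numpy as np

def gtwr_weights(coords, times, i, h, lam=1.0, mu=1.0):
    """Gaussian weights from an assumed spatiotemporal distance to observation i."""
    ds2 = ((coords - coords[i]) ** 2).sum(axis=1)   # squared spatial distance
    dt2 = (times - times[i]) ** 2                   # squared temporal distance
    dst2 = lam * ds2 + mu * dt2                     # combined squared distance
    return np.exp(-dst2 / h ** 2)

def gtwr_fit_at(coords, times, X, y, i, h, lam=1.0, mu=1.0):
    """Local coefficients (intercept first) at the place and time of observation i."""
    w = gtwr_weights(coords, times, i, h, lam, mu)
    Xd = np.column_stack([np.ones(len(y)), X])      # prepend the intercept column
    XtW = Xd.T * w                                  # same as Xd.T @ diag(w)
    return np.linalg.solve(XtW @ Xd, XtW @ y)
```

Setting `mu=0` removes the temporal term, so the weights reduce to a purely spatial GWR kernel; increasing `mu` relative to `lam` makes observations from other months count less.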

3 Methods

This study investigated county-level variations of COVID-19 mortality across the United States by incorporating mobility, social, and other health-related data during July, August, and September 2021. The mobility data (home-dwell time) were retrieved from Safegraph (Safegraph 2021). Health-related data on heart disease and diabetes were collected from the Centers for Disease Control and Prevention Atlas (CDC Atlas 2018). We collected other county-level demographic data from the National Historical Geographic Information System (NHGIS) (NHGIS 2021). The COVID-19 data were downloaded from the New York Times daily report on GitHub (Covid GitHub 2020). All data were normalized during the model processing. Higher levels of mobility may exacerbate the impact of pre-existing health conditions on COVID-19 mortality. Increased mobility could result in greater exposure to the virus and a higher risk of infection, as suggested by previous studies (Fortaleza et al. 2020). Additionally, populations with pre-existing health conditions, such as heart disease and diabetes, are known to be more vulnerable to COVID-19-related mortality. We hypothesized that the prevalence of these health conditions might play a critical role in the spatial distribution of COVID-19 mortality, with areas having higher rates of heart disease and diabetes experiencing a greater number of deaths (Iyanda et al. 2020; CDC COVID 2021; Gu et al. 2020). Table 1 provides descriptions of the datasets and variables.

Table 1 Description of the datasets

3.1 Ordinary least squares (OLS) model

In this study, we first applied OLS models to observe the associations between mortality and the six explanatory factors (health, socioeconomic, and demographic factors). The OLS regression model assumes that the relationship between the dependent variable and the explanatory factors is stationary and constant (Mahanty et al. 2021). It calculates a global model for the variables, generating one equation for the entire dataset (Bacha 2003; Batisani and Yarnal 2009; Geri et al. 2010; Ali et al. 2007). Here, the dependent variable is the COVID-19 mortality rate, and the independent variables are Mobility, Income, Non-Insurance, Old, Diabetes, and Heart disease, described in Table 1. The results of the OLS model are shown in Table 2. Table 2 shows that the OLS model has a low \({R}^{2}\) value, which indicates that the model explained only 47% of the variance in the dependent variable. It also suggests that the OLS model may not capture all of the variation in the data, especially in the presence of spatial heterogeneity.
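As an illustration of the global model and of the diagnostics used to compare models in this section, the following minimal sketch fits OLS and reports \({R}^{2}\), adjusted \({R}^{2}\), RMSE, and AICc. Several points are assumptions rather than details taken from the study: the normalization is taken to be a z-score, the AICc follows one common Gaussian log-likelihood convention (software packages differ in the constants they include), and the function names `zscore` and `ols_diagnostics` are hypothetical.

```python
import numpy as np

def zscore(a):
    """Standardize each column to zero mean and unit variance (assumed normalization)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def ols_diagnostics(X, y):
    """Fit a global OLS model and return coefficients and fit diagnostics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])            # add the intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)    # least-squares solution
    resid = y - Xd @ beta
    rss = float(resid @ resid)                       # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())         # total sum of squares
    k = Xd.shape[1]                                  # coefficients incl. intercept
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    rmse = float(np.sqrt(rss / n))
    q = k + 1                                        # coefficients plus error variance
    aic = float(n * np.log(2.0 * np.pi * rss / n) + n + 2 * q)
    aicc = aic + 2.0 * q * (q + 1) / (n - q - 1)     # small-sample correction
    return {"beta": beta, "r2": r2, "adj_r2": adj_r2,
            "rmse": rmse, "aic": aic, "aicc": aicc}
```

Under this convention, a higher adjusted \({R}^{2}\) and lower AICc and RMSE values indicate a better fit, which is the sense in which the local models are compared with OLS.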

Table 2 Ordinary least squares (OLS) model results in July 2021

3.2 Multiscale geographically weighted regression (MGWR) model

According to Table 2, OLS produced a low \({R}^{2}\) value (0.471), which indicates a poor model fit. Therefore, a local regression model such as MGWR is a better choice for improving model accuracy and observing spatial variations at the local level. The MGWR calibration follows a systematic approach to model selection and parameter estimation. The statistical results of the MGWR model for each month (July, August, and September 2021) are summarized in Tables 3, 4 and 5. Figure 1 illustrates the spatial distribution of local \({R}^{2}\) in July, with counties that have missing or null values shown in white. Missing values arise where some input data were unavailable; for example, some counties lacked mobility data. Most counties received a high local \({R}^{2}\) value (over 0.75), except for some counties in North Dakota and Montana. According to Table 3, the MGWR model has a much higher adjusted \({R}^{2}\) value (over 0.8 for July, August, and September) than the OLS model (0.471) and a lower AICc value (2876.387) than the OLS model (4114.932). The MGWR model performs better than the OLS model in predicting COVID-19 mortality, mainly because it handles spatial heterogeneity in the data better and provides more accurate local predictions. This performance is also reflected in the RMSE values of the two models. In addition, the bandwidth values of the parameters in the MGWR model vary between months (as shown in Tables 3, 4 and 5), indicating that the model can capture spatial heterogeneity at different time points. The dynamic changes in bandwidth suggest that the model adapts flexibly to the spatial distribution characteristics of different periods, thereby providing more accurate analysis and prediction at the local level. Therefore, the MGWR model was selected not only because its overall statistical performance is superior to that of the OLS model but also because of its advantages in analyzing spatial heterogeneity.

Table 3 Multiscale geographically weighted regression model for July 2021
Table 4 Multiscale geographically weighted regression model for August 2021
Table 5 Multiscale geographically weighted regression model for September 2021
Fig. 1 Local \({R}^{2}\) map in July 2021

3.3 Geographically and temporally weighted regression (GTWR) model

Even though MGWR produced reliable results in terms of model fit (\({R}^{2}\)) and AICc, a spatiotemporal kernel function that consists of mixed spatial and temporal bandwidths does not always seem reasonable (Fotheringham et al. 2015; Fotheringham et al. 1998). MGWR must first find and fix an optimized spatial bandwidth and then an optimized temporal bandwidth to calculate the spatiotemporal weight, which means that it cannot optimize the temporal and spatial bandwidths simultaneously. Therefore, we applied GTWR to estimate the temporal and spatial weights and to analyze the spatiotemporal associations between the variables. The statistical results in Table 6 show that the GTWR model has a higher adjusted \({R}^{2}\) value and lower AICc and RMSE values than the OLS model. GTWR also produced a slightly higher \({R}^{2}\) than the MGWR models, along with lower RMSE and AICc values.

Table 6 Geographically and temporally weighted regression model results

4 Results

This section presents the key findings of the GTWR model. We used six independent variables (number of diabetes and heart disease cases, income level, mobility, number of older people, and non-insured population) to test their relationship with the COVID-19 mortality rate. According to Table 2, the OLS model produced a low \({R}^{2}\) value, which indicates poor model performance (Fotheringham et al. 2017; Oshan et al. 2020). Both MGWR and GTWR produced good \({R}^{2}\) and AICc values, indicating robust goodness of fit. Figures 2, 3, 4, 5, 6 and 7 illustrate the local effects (coefficient maps) of the independent variables using GTWR.

Fig. 2 Local effects of old factor

Fig. 3 Local effects of median household income factor

Fig. 4 Local effects of non-insurance factor

Fig. 5 Local effects of heart disease factor

Fig. 6 Local effects of diabetes factor

Fig. 7 Local effects of mobility factor

The strength of the local association between each factor and the COVID-19 mortality rate in a county is represented on the maps using a color gradient, with darker shades indicating a stronger association and lighter shades indicating a weaker one. According to Fig. 2, the importance of the old factor in COVID-19 mortality increased over the study period. Although the correlation was already positive in July, it was relatively low in most regions. However, as time progressed, this factor became increasingly important. The Arizona cluster in the southwest, Maine in the northeast, and Florida in the southeast appear to be relatively insensitive to time, as their COVID-19 mortality rates remained consistently high throughout the study period. The number of older adults in a region positively correlates with COVID-19 mortality rates. This relationship is statistically significant across most areas, except for a few small regions where the correlation is weak or nonexistent. These findings highlight the importance of protecting vulnerable elderly populations in the fight against the COVID-19 pandemic, particularly in areas where older people comprise a significant proportion of the population.

Figure 3 shows distinct spatial clusters in areas such as Arizona, Nevada, and California. Interestingly, these clusters are relatively insensitive to temporal changes compared to other regions, suggesting unique regional dynamics at play. Notably, the impact of income on COVID-19 mortality intensified over the study period, with a new cluster emerging around the middle of September, particularly in New Mexico and Arizona. In contrast, some regions show a positive correlation with COVID-19 mortality, indicating a diminishing significance of the household income factor in these areas. Figure 4 presents a clustering pattern that mirrors the observations made in Fig. 3, with both figures highlighting clusters in the central and southeastern areas of the study region. However, it is crucial to interpret these findings cautiously, as they represent correlations rather than direct causation.

In the southern U.S., particularly in Texas, a cluster negatively correlated with mortality is observed, suggesting a lower COVID-19 risk among uninsured populations. This counterintuitive finding coincides with a period when Texas reported a decrease in daily case and fatality numbers. Such observations underscore the complex interplay of socioeconomic factors in the pandemic's trajectory and highlight the importance of regional analyses in understanding COVID-19's impact.

Figure 5 shows a positive correlation between the heart disease factor and COVID-19 mortality across nearly all regions, indicating that areas with a higher prevalence of heart disease were more likely to experience higher levels of mortality, with a few exceptions. These exceptions could be attributed to the already low mortality rates from coronary heart disease in areas like Kansas and Nebraska, potentially due to effective healthcare strategies and public health interventions targeting heart disease in these regions. Such interventions could have inadvertently strengthened the population's resilience against COVID-19. Additionally, these findings reflect broader socio-economic and health system factors that contribute to these states' overall lower mortality rates.

Similar to the findings for heart disease, Fig. 6 shows that most regions had a positive correlation between diabetes prevalence and COVID-19 mortality rates, suggesting that areas with a higher prevalence of diabetes were more likely to experience higher COVID-19 mortality rates. Intriguingly, a shift is observed in August, with the central region, particularly a small cluster in the heart of Texas, demonstrating a sudden negative correlation with mortality rates. The persistence of a cluster in central Texas across various local effect maps suggests a regional anomaly. This unusual pattern could be linked to Texas experiencing a significant reduction in daily reported COVID-19 cases and fatalities during that time, potentially influencing the overall mortality correlation in the state. This reduction may reflect the impact of localized public health initiatives, changes in community behavior, or other region-specific factors, such as the timing of virus waves or the introduction of health policies and interventions. The emergence of this cluster warrants further investigation to understand the confluence of factors that contributed to the observed decrease in COVID-19 mortality in Texas during the study period.

In assessing the local effects of mobility on the epidemiological landscape, Fig. 7 provides a temporal choropleth representation. A clear temporal transition is evident, with mobility levels varying significantly across regions. In July, most counties exhibit low mobility levels, with darker shades concentrated in specific areas, suggesting stricter adherence to mobility restrictions or potentially less need for movement. As the summer progresses, the map for August reveals an increased mobility range, with a substantial number of counties shifting towards lighter shades of blue, particularly in the Midwest and Southern regions. This shift could indicate a relaxation of restrictions or an adaptation to new norms of travel and movement. By September, the distribution of mobility levels becomes more homogeneous, with many regions returning to near-normal mobility levels, indicated by the prevalence of the 'above 0' category in the legend. The increase in mobility could be correlated with various factors, including the reopening of schools and businesses and an increase in travel and economic activity.

5 Discussion

This study advances the understanding of the spatiotemporal dynamics of the COVID-19 pandemic by investigating its relationship with six explanatory factors, namely mobility, heart disease prevalence, diabetes prevalence, median household income, number of older adults, and insurance coverage, using the MGWR and GTWR models. While GTWR has been utilized in several studies, our research contributes by integrating these specific health and mobility factors into a spatial–temporal framework, thereby shedding light on how these variables collectively influence the pandemic's dynamics over time and across different regions. The MGWR and GTWR models improve on the diagnostic statistics (AICc and \({R}^{2}\)) of OLS, so they perform better on our dataset and explain more of the total variation. The signs of the estimated coefficients are in line with expectations.

The positive associations of diabetes prevalence, heart disease prevalence, and the number of older adults with COVID-19 mortality align with existing literature that highlights the vulnerability of individuals with pre-existing conditions to the pandemic (Iyanda et al. 2020; CDC COVID 2021; Gu et al. 2020). However, the diminishing influence of the older adult population on mortality over time warrants a nuanced exploration. These results could suggest a successful implementation of protective measures for this demographic or a shift in the pandemic's impact on other age groups.

In contrast, the negative association of median household income with mortality suggests that socio-economic status acts as a buffer against the pandemic's worst outcomes, potentially due to better access to healthcare resources or the ability to adhere to protective measures such as social distancing (Ehlert 2021). However, the uniformity of this pattern across the country suggests underlying systemic factors at play that transcend local variations. Moreover, the fluctuating significance of the mobility factor raises questions about its role in the spread of COVID-19. While mobility is generally negatively related to mortality, indicating the effectiveness of stay-at-home measures, its varying sensitivity across regions points to the complex interplay of local policies, community compliance, and perhaps even cultural attitudes towards mobility and social interaction.

This research has some data limitations. First, relying on only a single temporal dataset might limit the performance of the GTWR model, although the results appear reasonable. Second, only one type of mobility data was used; comparing different mobility indexes in the same model could improve the results. Third, owing to limited health data availability, we could not access temporal health data to enhance the results.

6 Conclusion

The study investigated the significant factors contributing to the COVID-19 mortality rate in the United States. Employing the MGWR and GTWR models, the study analyzed six variables covering health, socioeconomic, and mobility factors. The results revealed that the GTWR and MGWR models could explain up to 91% and 88% of the variation in COVID-19 mortality, respectively. The findings highlight the critical role of mobility and health conditions in controlling the prevalence of the pandemic. The study indicates that policymakers should pay close attention to individuals with coronary heart disease and diabetes and to their mobility. Moreover, the results show that the mean home-dwelling time in most counties is negatively associated with the spread of COVID-19, and this effect decreased from July to August and then increased from August to September. The study also revealed that higher median household income could reduce mortality.

Additionally, the number of older adults is essential in addressing this issue, although the role of this factor needs to be examined more closely. While the importance of addressing the needs of older adults is recognized, future research should explore the nuanced roles of various demographics in pandemic response. The positive correlation between lack of insurance and increased mortality underscores the need for policies focusing on healthcare access and coverage. The study therefore recommends that policymakers focus on individuals without insurance, as their lack of access to healthcare is positively related to the mortality rate.

The GTWR and MGWR models are crucial tools for analyzing spatiotemporal COVID-19 issues. They can aid federal and state agencies in allocating resources and applying lockdown policies. This study provides valuable insights into the factors contributing to COVID-19 mortality, and the results can help governors make evidence-based policies for different regions.