Analyzing spatial variations of heart disease and type-2 diabetes: A multi-scale geographically weighted regression approach

Cui, Wencong; Hu, Nanzhou; Zhang, Shuyang; Li, Diya; Martinez, Luis; Goldberg, Daniel; Güneralp, Burak; Zhang, Zhe

doi:10.1007/s43762-022-00059-6

Original Paper
Open access
Published: 24 September 2022

Volume 2, article number 34, (2022)

Computational Urban Science

Wencong Cui¹, Nanzhou Hu¹, Shuyang Zhang¹, Diya Li¹, Luis Martinez¹, Daniel Goldberg¹, Burak Güneralp¹ & Zhe Zhang¹ (ORCID: orcid.org/0000-0001-7108-182X)

Abstract

Heart disease is the leading cause of death in the United States. A person who has type-2 diabetes is twice as likely to have heart disease as someone who does not have diabetes. Therefore, analyzing factors associated with both diseases and their interrelationships is essential for cardiovascular disease control and public health. In this article, we propose a Multi-scale Geographically Weighted Regression (MGWR) approach to observe spatial variations of environmental and demographic risk factors such as alcohol consumption behavior, lack of physical activity, obesity rate, urbanization rate, and income from 2005 to 2015 in the United States. The MGWR model was applied at the county level to eight census divisions of the United States: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, and Mountain. Results illustrate notable differences in the spatial variation of the risk factors behind these two diseases. In particular, obesity has been a leading factor associated with diabetes in the east, south-central, and South Atlantic regions of the U.S. On the other hand, smoking and alcohol consumption were the primary concerns in the northern part of the U.S. in 2005. In 2015, alcohol consumption levels decreased, but the smoking level remained the same in those regions, which showed a significant impact on diabetes in the neighboring regions. Between 2005 and 2015, lack of physical exercise became a significant risk factor associated with diabetes in the Northeast and West of the U.S. The proposed MGWR produced high goodness of fit (R²) for most areas of the United States.

1 Introduction

Heart disease and diabetes are two cardinal public health issues in the United States. According to official CDC data, about 647,000 Americans die of heart disease each year, which is nearly one-quarter of all deaths in America (Heron 2019). According to the CDC, in 2014 and 2015 the United States spent $219 billion on healthcare services, medicine, and lost productivity due to death. Medical conditions and lifestyle choices affect the likelihood of developing heart disease. A review of the literature indicates a strong association between type-2 diabetes (see Notes) and coronary heart disease mortality (Pan et al. 1986). About 34.2 million people in the United States have diabetes, approximately 10.5% of the total population (Centers for Disease Control and Prevention, 2022).

Several leading factors can cause heart disease, including alcohol use (Vogel, 2019), smoking (Barengo et al. 2017), household income (Xiang et al. 2018), urbanization rate (Allender et al. 2008), blood pressure and depression (Brunström and Carlberg 2018, Li et al. 2020), as well as obesity and physical inactivity (Arsenault, 2010). Many of these factors, such as urbanization rate (Fei et al. 2016), obesity, physical inactivity (Eaton and Eaton 2017), household income (Maty et al. 2005), smoking (Akter et al. 2017), and drinking habits (Holst et al. 2017), can also lead to diabetes. However, the existing research does not consider the spatiotemporal variation of the risk factors associated with heart disease or type-2 diabetes. For instance, does one geographic area have more heart disease and diabetes patients than other areas? If so, why? What are the leading environmental and demographic factors that cause heart disease and diabetes within a specific region? What kinds of public and health care services are needed in that region to reduce the disease risk?

The Geographically Weighted Regression (GWR) model has been used to model non-stationary spatial variation and has been applied in various domains to support spatial decision-making (Jiang et al., 2021). It explores spatial changes in the research object and their driving factors at a given scale by fitting a local regression equation at each point in the study area. GWR has been used to understand how determinants vary across space in health and disease-related research, such as targeting the spatial context of obesity determinants (Oshan et al. 2020) and modeling the transmission of hand, foot, and mouth disease (Hu et al. 2020). Some research has also used GWR to model diabetes and heart disease. For example, Siordia et al. (2012) used GWR to estimate how poverty affects diabetes prevalence. Ford and Highfield (2016) applied GWR to measure the association between social deprivation and cardiovascular disease mortality.

While much literature focuses on the applications of GWR in diabetes and heart disease (Hu et al. 2020), none has studied the spatiotemporal variation of the leading factors that may cause the diseases. Furthermore, the traditional GWR method assumes that all of the regression processes operate at the same spatial scale, constraining local relationships within each model to vary at the same spatial scale (Yang 2014). Bandwidth in a GWR process represents the spatial range within which data points can affect each other based on the distance between them. This range can differ across independent variables, because the spatial relationship between the dependent variable and each independent variable may differ. GWR, however, is limited to fitting a single optimal bandwidth that reflects an “average” of the best bandwidths for each process (or variable). Multiscale Geographically Weighted Regression (MGWR) overcomes that limit by allowing covariate-specific bandwidths to be optimized (Fotheringham et al. 2017).

In this research, we developed an MGWR-based approach to analyze the non-stationary spatiotemporal pattern of heart disease and diabetes using multiple environmental and demographic factors such as urbanization rate, household income, obesity percentage, blood pressure level, medical cost, physical inactivity percentage, smoking, and alcohol consumption habit in each county of the United States. The alcohol consumption habit data describe the proportion of adults (21 years and older) who have had, on average, more than one (for women) or two (for men) alcoholic drinks per day during the previous month. By using data from two years (2005 and 2015), the trend of the major contributing factors can be determined. Showing the correlation between diabetes and heart disease risk factors will aid Federal and State agencies in developing educational campaigns, deciding where to allocate resources, and predicting future heart disease and diabetes rates based on adjusted risk factor levels. The rest of the article is organized as follows: Section 2 introduces the study area, the data processing, and the theory behind the OLS and MGWR models. Section 3 presents the results of applying OLS and MGWR to the datasets. Section 4 discusses the results, and Section 5 draws conclusions and notes limitations.

2 Methodology and theory

2.1 Study area and data processing

Our study area covers eight U.S. Census Divisions: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, and Mountain. We included all states in those eight divisions. From the Pacific Division, we included only California, Oregon, and Washington. The medical datasets, such as heart disease mortality rate and the number of diabetes patients, were collected from the Centers for Disease Control and Prevention (CDC) (2022). We also included datasets that describe people’s living habits, such as health care costs, moderate drinking and smoking habits, and frequency of physical activity (IHME, 2014). Table 1 describes the datasets used in this project. In addition to the health-related datasets, we considered the urbanization rate as an important indicator (Kumar et al. 2006). Here, we used 300 m resolution classified land cover data from the European Space Agency Climate Change Initiative (ESA-CCI-LC). We calculated the percentage of urban area in each county as an index of urbanization rate. Figs. 1 and 2 visualize the variables and the study area.

Table 1 Descriptions of the datasets

In this study, we included datasets collected in 2005 and 2015 to observe the change in the spatial variation of these two diseases between these two time points. The 10-year study period allows us to capture significant changes over time in the spatial pattern of the prevalence of these two diseases and the corresponding risk factors across the country. We used the following variables: median household income as the income level variable; the proportion of adults aged 18 and older who smoke cigarettes as the smoking rate variable; the proportion of adults aged 21 and older who have had, on average, more than one (for women) or two (for men) alcoholic drinks per day as the moderate alcohol consumption rate variable; the age-adjusted percentage of diagnosed diabetes patients as the diabetes variable; the age-adjusted percentage of leisure-time physical inactivity as the physical inactivity variable; the age-adjusted obesity percentage as the obesity variable; and the percentage of urban land cover in each county as the urbanization variable.

Comparing Figs. 1 and 2, we find that most variables showed an increasing trend across the country between 2005 and 2015, with the exception of urbanization, because the United States had largely completed its urbanization process by the 1980s and the rate of urbanization slowed after that. The smoking and obesity rates experienced the most significant increases among all variables. For each factor, all data values were normalized to the range 0 to 1 using Eq. 1 in order to make the data comparable within each attribute.

$${x}_{norm}=\frac{x-{x}_{min}}{{x}_{max}-{x}_{min}}$$

(1)
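The normalization in Eq. 1 can be sketched in a few lines of Python. This is only an illustration: the function name `min_max_normalize`, the handling of constant columns, and the sample values are assumptions, not part of the original study.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale a 1-D array to the range [0, 1] following Eq. 1."""
    x = np.asarray(values, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption: a constant attribute carries no contrast, so map it to zeros.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Hypothetical county-level obesity percentages, not values from the study.
obesity_pct = [22.0, 31.0, 28.5, 40.0]
obesity_norm = min_max_normalize(obesity_pct)
```

Each attribute would be normalized separately, so the smallest county value of that attribute maps to 0 and the largest to 1.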

2.2 Ordinary least squares (OLS) and multiscale geographically weighted regression (MGWR)

In the first step, we applied an ordinary least squares (OLS) model to the datasets to observe the global distribution pattern of both diseases. The OLS equation is given below:

$${Heart}_{i}, {Diabetes}_{i}={\alpha }_{0}+{\alpha }_{1}{OBE}_{i}+{\alpha }_{2}{SMO}_{i}+{\alpha }_{3}{DR}_{i}+{\alpha }_{4}{INAC}_{i}+{\alpha }_{5}{Income}_{i}+{\alpha }_{6}{Urban}_{i}+{\varepsilon }_{i}$$

(2)
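As a rough illustration of Eq. 2, the global coefficients can be estimated with a least-squares solve. The sketch below uses NumPy and synthetic data; the helper names `fit_ols` and `r_squared`, the random predictors, and the chosen coefficient values are all assumptions made for the example and do not come from the study.

```python
import numpy as np

def fit_ols(X, y):
    """Return [alpha_0, alpha_1, ..., alpha_p] that minimize the squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def r_squared(X, y, coef):
    """Proportion of the variance of y explained by the fitted model."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    resid = y - design @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

# Synthetic stand-in for six normalized predictors
# (OBE, SMO, DR, INAC, Income, Urban); not data from the study.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 6))
true_alpha = np.array([0.1, 0.5, 0.3, -0.2, 0.4, -0.3, 0.0])
y_demo = true_alpha[0] + X_demo @ true_alpha[1:] + rng.normal(0, 0.01, 200)

alpha_hat = fit_ols(X_demo, y_demo)
fit_quality = r_squared(X_demo, y_demo, alpha_hat)
```

Because a single coefficient vector is fitted for all counties, this model cannot express relationships that differ from place to place, which is the motivation for the local models introduced next.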

In this article, several MGWR models were developed to observe non-stationary correlation between each type of disease (heart disease and diabetes) and various environmental and demographic risk factors. Table 2 illustrates the description of dependent and independent variables. In this case, the dependent variables are heart disease mortality rate and the number of diabetes patients; the independent variables include various environmental and sociodemographic indicators. In this study, we also looked at the temporal variation of the disease pattern by including two years of observations, 2005 and 2015.

Table 2 Description of dependent and independent variables

In this study, we employed Multiscale Geographically Weighted Regression (MGWR) to analyze how the relationships between heart disease and diabetes, as well as between each of these diseases and its corresponding risk factors, vary spatially.

Geographically Weighted Regression (GWR) has often been used for exploring spatially varying relationships. It is a spatial regression model that can be used to model spatial variations and non-stationary relationships between a dependent variable and a set of independent variables. Traditional statistical methods, such as correlation analysis and ordinary least squares (OLS) regression, can hide the impact of local variations (Bacha 2003, Batisani and Yarnal 2009, Geri et al. 2010) because they produce ‘average’ or ‘global’ parameters to estimate the spatial relationships (Ali et al. 2007). In light of this, GWR was developed to build regression models that explore how one dependent variable changes in response to one or more independent variables at the local scale (McMillen 1996, Fotheringham et al. 1998, Leung et al. 2000, Yu and Wu 2004, Deller and Lledo 2007, Waller et al. 2007). The outcomes of the GWR model depend on the observations that are in close proximity to the subject point, so they reveal the relationships within the neighborhood (Fotheringham et al. 2001, Foody 2004, Bickford and Laffan 2006). Fotheringham et al. (1998, 2001) presented a general form of the basic GWR model:

$${Y}_{i}={\alpha }_{0}\left(i\right)+{\sum }_{k=1}^{p} {\alpha }_{k}\left(i\right){X}_{ik}+{\varepsilon }_{i}, i=1,\dots ,n$$

(3)

It allows the parameters in the model to vary by location i. The GWR estimator is:

$${\alpha }^{\prime}\left(i\right)={\left({X}^{T}W\left(i\right)X\right)}^{-1}{X}^{T}W\left(i\right)Y$$

(4)
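A minimal sketch of the estimator in Eq. 4 follows, solving one weighted least-squares problem per location. The Gaussian distance-decay kernel, the fixed `bandwidth` parameter, the function name `gwr_local_coefficients`, and the synthetic grid are assumptions for illustration; the text does not state which kernel was used in the study.

```python
import numpy as np

def gwr_local_coefficients(coords, X, y, bandwidth):
    """Estimate (X^T W(i) X)^{-1} X^T W(i) y at every location i."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = design.shape
    betas = np.empty((n, k))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)  # assumed Gaussian kernel
        xtw = design.T * w  # X^T W(i) without building the diagonal matrix
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic 10 x 10 grid where the slope grows from west (0) to east (1).
rng = np.random.default_rng(1)
grid = np.array([(u, v) for u in range(10) for v in range(10)], dtype=float)
x_demo = rng.random(100)
y_demo = 1.0 + (grid[:, 0] / 9.0) * x_demo

local = gwr_local_coefficients(grid, x_demo[:, None], y_demo, bandwidth=1.5)
near_global = gwr_local_coefficients(grid, x_demo[:, None], y_demo, bandwidth=1e6)
```

With the small bandwidth, the estimated slopes in `local[:, 1]` increase from the western to the eastern edge of the grid. With the very large bandwidth, every row of `near_global` is practically identical, which is the behavior of a global OLS fit.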

W(i) is a matrix of weights specific to location i (longitude, latitude) such that observations nearer to i are given greater weight than observations further away. The basic principle of GWR is that the data is ‘borrowed’ from nearby locations, weighted by the proximity of the location from which the data is being borrowed to the location for which the local regression is calibrated. This allows models to be calibrated specifically to location i. To minimize bias in the results, data from nearby locations is weighted more heavily than from more distant ones. In order to evaluate the performance of the GWR model, the following concepts were considered:

R²: A measure of goodness of fit of the model with values varying from 0.0 to 1.0, with higher values being preferable. It may be interpreted as the proportion of dependent variable variance accounted for by the regression model (Fotheringham et al. 1998).

AIC: The optimal bandwidth was obtained in each GWR calibration through an iterative process to minimize the corrected Akaike Information Criterion (AIC_c) value calculated as:

$${AIC}_{c}=2n{log}_{e}\left(\widehat{\sigma }\right)+n{log}_{e}\left(2\pi \right)+2n\left[\frac{n+tr\left(S\right)}{n-2-tr\left(S\right)}\right]$$

(5)

where n denotes the sample size, $\widehat{\sigma }$ is the estimated standard deviation of the error term, and tr(S) denotes the trace of the hat matrix S.

Spatial autocorrelation: Global spatial autocorrelation describes the spatial characteristics of attribute values throughout the region. Spatial autocorrelation in the regression residuals is often interpreted to mean that (1) an important independent variable is missing from the regression, or (2) an underlying spatial process that induces spatial autocorrelation in some of the variables is missing from the model. Geographically Weighted Regression (GWR) can be used when there is spatial autocorrelation in the regression residuals, or when the regression coefficients might change from one location to another (i.e., the regression coefficients are not stationary). It is therefore critical to carry out a spatial autocorrelation analysis before applying GWR methods.

Moran's Index is the most commonly used method to measure global spatial autocorrelation and quantify the similarity of outcome variables between regions defined as spatially related (Fu et al. 2014). It can be applied to detect departures from spatial randomness, which indicate spatial patterns such as clustering or trends across space. The value of Moran's Index ranges from -1 to 1. A value of zero indicates no clustering; a positive value indicates positive spatial autocorrelation, which means that adjacent locations have similar, clustered values; a negative value indicates negative spatial autocorrelation, which means that adjacent locations have dissimilar values (Pfeiffer et al. 2008). Moran’s Index can be obtained using Eq. (6) (Lee and Wong 2001).

$$I=\frac{n}{{S}_{0}}\frac{{\sum }_{i=1}^{n} {\sum }_{j=1}^{n} {W}_{ij}({x}_{i}-\overline{x})({x}_{j}-\overline{x})}{{\sum }_{i=1}^{n} {({x}_{i}-\overline{x})}^{2}}$$

(6)

where S_0 is the sum of all spatial weights W_ij, x_i is the input value of spatial unit i, x_j is the input value of spatial unit j, $\overline{x}$ is the average value, n is the number of spatial units, and W_ij is the spatial weight between units i and j. If units i and j are adjacent, the corresponding element W_ij is 1; otherwise it is 0.
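Eq. 6 can be checked on a toy example. The sketch below assumes a binary adjacency matrix and takes S_0 as the sum of all weights, which is the usual definition of Moran's Index; the function name `morans_i` and the four-unit chain are hypothetical and only serve as illustration.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's Index for values x and a spatial weight matrix W."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    dev = x - x.mean()   # deviations from the mean
    s0 = w.sum()         # S_0: sum of all spatial weights
    return (n / s0) * (dev @ w @ dev) / (dev @ dev)

# Four units in a row; each unit is adjacent only to its direct neighbors.
w_chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

i_trend = morans_i([1, 2, 3, 4], w_chain)         # smooth trend: positive index
i_alternating = morans_i([1, -1, 1, -1], w_chain)  # checkerboard: negative index
```

For this chain, the smooth trend gives an index of 1/3 and the alternating pattern gives -1, matching the interpretation of positive and negative spatial autocorrelation described above.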

The hypothesis test for Moran's Index is carried out as follows:

H0: there is no spatial autocorrelation.

H1: there is positive autocorrelation (Moran's Index is positive), or

H1: there is negative autocorrelation (Moran's Index is negative).

Although GWR captures any spatial heterogeneity in relationships, it does so under the assumption that all such relationships vary at the same spatial scale across all covariates. MGWR is a significant improvement over GWR because it relaxes the “same spatial scale” assumption and allows covariate‐specific bandwidths to be optimized. It is formulated as (Fotheringham et al. 2017):

$${y}_{i}={\sum }_{j=0}^{m} {\beta }_{bwj}({u}_{i},{v}_{i}){x}_{ij}+{\varepsilon }_{i}$$

(7)

where bwj in ${\beta }_{bwj}$ indicates the bandwidth used for calibration of the jth conditional relationship. MGWR thus allows different processes to operate at different spatial scales by deriving separate bandwidths for the conditional relationships between the response variable and different predictor variables. MGWR is calibrated using a back‐fitting algorithm as described in Fotheringham et al. (2017). The back‐fitting process is initialized with GWR parameter estimates. Based on these initial values, the calibration process works iteratively, and during each iteration all local parameter estimates and optimal bandwidths are evaluated. Iteration terminates when the difference between the parameter estimates from successive iterations falls below a specified threshold (we selected 1e‐5 in this study) (Fotheringham et al. 2017, Oshan et al. 2020).
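The back-fitting idea can be sketched as follows. This is a deliberately simplified illustration and not the calibration used in the study: the bandwidths are supplied rather than optimized in each iteration, the estimates start from zeros rather than from GWR estimates, a Gaussian kernel is assumed, and convergence is judged by the largest absolute change in any local coefficient. The names `local_fit` and `mgwr_backfit` are hypothetical.

```python
import numpy as np

def local_fit(coords, x, target, bandwidth):
    """Locally weighted regression of target on a single covariate x."""
    beta = np.empty(len(target))
    for i in range(len(target)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)  # assumed Gaussian kernel
        beta[i] = (w * x) @ target / ((w * x) @ x)
    return beta

def mgwr_backfit(coords, X, y, bandwidths, tol=1e-5, max_iter=200):
    """Back-fit one local coefficient surface per column of X.

    X is expected to contain a column of ones for the intercept.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    betas = np.zeros((n, m))  # simplification: start from zeros
    for iteration in range(max_iter):
        previous = betas.copy()
        for j in range(m):
            # Partial residual: y minus the contribution of all other terms.
            others = (betas * X).sum(axis=1) - betas[:, j] * X[:, j]
            betas[:, j] = local_fit(coords, X[:, j], y - others, bandwidths[j])
        if np.max(np.abs(betas - previous)) < tol:
            break
    return betas, iteration + 1

# Synthetic 8 x 8 grid: constant intercept, slope rising from west to east.
rng = np.random.default_rng(2)
grid = np.array([(u, v) for u in range(8) for v in range(8)], dtype=float)
x1 = rng.normal(size=64)
X_demo = np.column_stack([np.ones(64), x1])
y_demo = 2.0 + (grid[:, 0] / 7.0) * x1

betas, n_iter = mgwr_backfit(grid, X_demo, y_demo, bandwidths=[1e6, 1.5])
```

In this example the intercept is given a very large bandwidth, so it is estimated as an essentially global constant, while the slope is given a small bandwidth and is free to vary across the grid. A full implementation would additionally search for the best bandwidth of each term inside the loop.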

The selection of bandwidth is a trade-off, because the bandwidth sets the spatial range within which data points can affect each other. If the bandwidth is too large, the model cannot reflect the spatial non-stationarity of the correlation between the dependent variable and the independent variables, which will cause large bias in the local estimates (i.e., if the bandwidth equals N, the size of the dataset, the model will not capture any spatial heterogeneity and will be the same as the OLS model). Conversely, too small a bandwidth can lead to a large variance in the local estimates. In this study, the associations between health conditions and the factors that influence them are expected to exhibit spatial heterogeneity because of differences in lifestyle, diet, income, etc. across the US.

3 Results

3.1 Results of OLS and spatial autocorrelation

The detailed information of the dependent and independent variables can be found in Table 2. The results of the OLS model for the years 2005 and 2015 are illustrated in Tables 3 and 4.

Table 3 The statistical analysis results of ordinary least squares regression model using datasets collected in 2005

Table 4 The statistical analysis results of ordinary least squares regression model using datasets collected in 2015

The results of calibrating the model for each disease are shown in Tables 3 and 4, which include normalized parameter estimates, significance test results, and spatial autocorrelation statistics. The p values are low, which indicates that the variables are significantly correlated with both diseases, except for the urbanization variable (see Table 4). According to Table 4, the R² value for diabetes reaches 0.776, which is relatively high, indicating that the model performs well for the diabetes variable in 2015. For heart disease, the R² values are relatively low (around 0.5) for the 2015 data. For the 2005 data, the R² is low for both diseases. The results also show that there is less than a 1% likelihood that the clustered pattern could be the result of random chance, which indicates spatial heterogeneity among the independent variables. Therefore, MGWR should be used for more advanced analysis.

3.2 Results of spatial coefficient analysis using MGWR

Due to the limitations shown in the OLS analysis, we applied MGWR with the same dependent and independent variables to observe the spatial variations of heart disease and diabetes for the 2005 and 2015 data. The statistical analysis results of the MGWR model are summarized in Tables 5 and 6. Figures 3 and 4 illustrate the spatial distribution of R² for both years. According to Fig. 3, most areas received high R² values, except small areas in New York, Ohio, Texas, and Indiana, which indicates that the model explains most of the variance in the majority of the country. For the 2015 data, the results are similar, except that Florida and Michigan received low R² values. The spatial interpretation of the coefficient analysis results is given below. When visualizing the local coefficients of selected variables spatially, we excluded counties (colored in grey) where the variable is locally statistically non-significant (p > 0.1).

Table 5 Statistical analysis results of MGWR estimation using the datasets collected in 2005

Table 6 Statistical analysis results of MGWR estimation using the datasets collected in year 2015

According to Tables 5 and 6, MGWR performed better than the OLS model: it produced significantly higher R² values and lower AIC_c values.

The spatial coefficient analysis results for heart disease in 2005 and 2015 are shown in Figs. 5 and 6. For heart disease, the obesity variable shows statistical significance across the entire country in 2005, but not in 2015. The drinking habit appeared to correlate negatively with heart disease in the Pacific coast area in 2005. However, this impact increased considerably by 2015, especially in most areas of the Southeast and in the areas where the Southeast, Midwest, and Southwest of the U.S. meet. According to Figs. 5 and 6, smoking was associated with a markedly higher probability of heart disease along the Gulf, Pacific, and East coasts of the U.S. in 2005. However, the situation improved for the entire country between 2005 and 2015, as the coefficient values decreased in 2015 compared to 2005. Lack of physical exercise was the leading factor for heart disease in East Coast areas in 2005. The correlation became more evident in 2015, especially in states such as Virginia, North Carolina, New Mexico, and Colorado. Furthermore, income correlates negatively with heart disease in many places in the Midwest and Southeast of the U.S. and in California. Finally, East Coast areas show a positive correlation between urban cover and heart disease in 2005, whereas the results for the urban cover variable were not statistically significant for the 2015 data.

The two datasets (2005 and 2015) produced similar spatial patterns in the diabetes analysis (see Figs. 7 and 8). For example, obesity has been a leading risk factor for diabetes in the east, south-central, and South Atlantic regions of the U.S. On the other hand, smoking and alcohol consumption were the primary concerns in the northern part of the U.S., such as North and South Dakota, Montana, and Wyoming, in 2005. In 2015, the alcohol consumption level decreased, but the smoking level remained the same in those regions and showed a more significant impact on diabetes in the neighboring regions. During the ten-year period, lack of physical exercise became a significant risk factor for diabetes in the Northeast and West of the U.S.

Furthermore, Tables 5 and 6 show that the standard deviations of the variable coefficients changed from 2005 to 2015, but the trends differ between heart disease and diabetes. For heart disease, the standard deviations of the lifestyle variables (drinking, smoking, obesity, and inactivity) increased considerably over the ten years, while those of the income and urban area variables decreased. For diabetes, however, only the standard deviations of the inactivity and smoking variables increased.

4 Discussion

The impact of the variables on both diseases exhibits strong spatial heterogeneity across the country. The direction of most variables is in line with our hypothesis, except for the moderate alcohol consumption variable. Moderate alcohol consumption has a negative coefficient for both diseases, which is a controversial finding. However, in our dataset, moderate drinking is defined as “among all adults aged 21 and older, the proportion who have had, on average, more than one (for women) or two (for men) alcoholic drinks per day during the previous month.” This is the same criterion for moderate drinking as defined by the Dietary Guidelines for Americans: 2015–2020. A previous study showed a significant decreasing trend in diabetes risk as alcohol consumption increased, with the risk of diabetes especially low at a consumption of 8–14 drinks/week (He et al. 2019). Similarly, Zhang et al. (2017) conducted a cohort study in China and found that men who consumed 20.01–40 g of ethanol per occasion, fewer than 5 times per week, had a 24% lower risk of coronary heart disease (CHD) incidence compared with non-drinkers.

Furthermore, different variables may share a similar spatial pattern of correlation coefficients for the same disease across the ten years. For instance, the drinking and smoking variables have high coefficient values concentrated in northern states for diabetes (e.g., Montana, North Dakota, and Wyoming), while obesity has high coefficients in South Atlantic states, such as Florida and Louisiana. These patterns may result from the low population density, culture, and climate shared by these states. Also, the relationship of the physical inactivity variable to both diseases showed similar spatiotemporal patterns from 2005 to 2015. For diabetes, the low coefficients of physical inactivity are concentrated in West North Central states in 2005, whereas in 2015 they are concentrated in South Atlantic states, such as Texas, Louisiana, and Florida. The same trend can be found in the results for heart disease, which means the influence of physical activity on both diseases is very similar. Another important finding is that urban land cover is not significant in all areas for the heart disease death rate, which was contrary to our expectation.

By comparing the analysis results between 2005 and 2015, we captured the temporal trend of the spatial distribution of coefficients for both diseases. We found that the inactivity variable played a more important role in 2015 than in 2005 for both diseases. On the other hand, the coefficient of obesity decreased from 2005 to 2015 for both diseases. However, this does not mean that obesity had a lower effect on both diseases in 2015 than in 2005, as the exact values of the coefficients are not comparable between different years. In other words, the lower coefficient value only means that obesity had a relatively lower impact than other risk factors (e.g., inactivity) in 2015. This study also shows the spatial variations of different independent variables for both diseases across the country. Notably, the effect of income on the prevalence of diabetes became more uniform across the country between 2005 and 2015. The absolute values of the income coefficients also decreased across the country, indicating that the impact of income on the prevalence of diabetes decreased overall. These changes might be caused by advances in biotechnology, rising living standards, and the wider availability of public medical insurance.

5 Conclusion and limitations

Understanding the leading factors of heart disease and diabetes is key to addressing many spatial decision problems related to disease control and public health management. Geographic Information Systems have played an important role in supporting spatial decision-making in various application domains (Zhang et al., 2017; Zhang et al., 2021A; Zhang et al., 2021B; Zhang et al., 2014). This article introduced an MGWR model into the risk factor analysis of heart disease and diabetes. We used urbanization rate, obesity, and living habits such as moderate drinking and smoking habits and the frequency of physical activities as independent variables to evaluate their impacts on heart disease and diabetes. The MGWR model was applied at the county level to eight census divisions of the United States: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, and Mountain. The analysis results illustrate the spatial variations of different risk factors for these two diseases. The findings can inform the development of an intelligent decision support system for Federal and State agencies to facilitate the allocation of resources to combat these diseases and predict future heart disease and diabetes rates based on adjusted risk factor levels. The findings can also inform public education campaigns.

The major limitation of this study is that it does not measure other types of risk factors for type-2 diabetes and heart disease, such as high blood pressure and high cholesterol. Although MGWR can capture spatial heterogeneity, area-based characteristics are not a good representation of individual characteristics, as each county is influenced by the surrounding counties. Nevertheless, the MGWR model can serve as a good tool for reviewing the spatial distribution and historical trend of the coefficients of risk factors. In the future, a predictive model can be developed to forecast future trends based on the MGWR results.

Availability of data and material

Urban land cover dataset: https://climate.esa.int/en/odp/#/project/land-cover

Smoking, drinking and obesity dataset: https://vizhub.healthdata.org/subnational/usa

Heart disease and diabetes: https://nccd.cdc.gov/DHDSPAtlas/

Notes

All references to diabetes in this study are to type-2 diabetes.

References

Akter, S., Goto, A., & Mizoue, T. (2017). Smoking and the risk of type 2 diabetes in Japan: A systematic review and meta-analysis. J Epidemiol, 27(12), 553–561.
Ali, K., Partridge, M. D., & Olfert, M. R. (2007). Can geographically weighted regressions improve regional analysis and policy making? Int Reg Sci Rev, 30(3), 300–329.
Allender, S., Foster, C., Hutchinson, L., & Arambepola, C. (2008). Quantification of urbanization in relation to chronic diseases in developing countries: A systematic review. J Urban Health, 85(6), 938–951.
Arsenault, B. J., Rana, J. S., Lemieux, I., Despres, J. P., Kastelein, J. J. P., Boekholdt, S. M., & Khaw, K. T. (2010). Physical inactivity, abdominal obesity and risk of coronary heart disease in apparently healthy men and women. International journal of obesity, 34(2), 340–347.
Bacha, C. J. (2003). The determinants of reforestation in Brazil. Appl Econ, 35(6), 631–639.
Barengo, N. C., Teuschl, Y., Moltchanov, V., Laatikainen, T., Jousilahti, P., & Tuomilehto, J. (2017). Coronary heart disease incidence and mortality, and all-cause mortality among diabetic and non-diabetic people according to their smoking behavior in Finland. Tob Induc Dis, 15(1), 1–8.
Batisani, N., & Yarnal, B. (2009). Urban expansion in Centre County, Pennsylvania: Spatial dynamics and landscape transformations. Appl Geogr, 29(2), 235–249.
Bickford, S. A., & Laffan, S. W. (2006). Multi-extent analysis of the relationship between pteridophyte species richness and climate. Glob Ecol Biogeography, 15(6), 588–601.
Brunström, M., & Carlberg, B. (2018). Association of blood pressure lowering with mortality and cardiovascular disease across blood pressure levels: A systematic review and meta-analysis. JAMA Intern Medi, 178(1), 28–36.
CDC (Centers for Disease Control and Prevention). (2020). National diabetes statistics report. Available from web: https://www.cdc.gov/diabetes/data/statistics-report/index.html. Accessed 26 Aug 2022.
Deller, S. C., & Lledo, V. (2007). Amenities and rural Appalachia economic growth. Agri Res Econ Rev, 36, 107–132.
Dwyer-Lindgren, L., Mokdad, A. H., Srebotnjak, T., Flaxman, A. D., Hansen, G. M., & Murray, C. J. L. (2014). Cigarette smoking prevalence in US counties: 1996–2012. Popul Health Metrics, 12(1), 5.
Eaton, S. B., & Eaton, S. B. (2017). Physical inactivity, obesity, and type 2 diabetes: An evolutionary perspective. Res Q Exerc Sport, 88(1), 1–8.
Fei, Y., He, Y., Sun, L., Chen, J., Lou, Q., Bao, L., & Cha, J. (2016). The study of diabetes prevalence and related risk factors in Fuyang, a Chinese county under rapid urbanization. Int J Diabetes in Dev Ctries, 36(2), 213–219.
Foody, G. M. (2004). Spatial nonstationarity and scale-dependency in the relationship between species richness and environmental determinants for the sub-Saharan endemic avifauna. Glob Ecol Biogeography, 13(4), 315–320.
Ford, M. M., & Highfield, L. D. (2016). Exploring the spatial association between social deprivation and cardiovascular disease mortality at the neighborhood level. PLoS ONE, 11(1), e0146085.
Fotheringham, A. S., Charlton, M. E., & Brunsdon, C. (1998). Geographically weighted regression: A natural evolution of the expansion method for spatial data analysis. Environ Plann A, 30(11), 1905–1927.
Fotheringham, A. S., Charlton, M. E., & Brunsdon, C. (2001). Spatial variations in school performance: A local analysis using geographically weighted regression. Geogr Environ Model, 5(1), 43–66.
Fotheringham, A. S., Yang, W., & Kang, W. (2017). Multiscale geographically weighted regression (MGWR). Annals of the Ame Assoc Geogr, 107(6), 1247–1265.
Fu, W. J., Jiang, P. K., Zhou, G. M., & Zhao, K. L. (2014). Using Moran’s I and GIS to study the spatial pattern of forest litter carbon density in a subtropical region of southeastern China. Biogeosciences, 11(8), 2401–2409.
Geri, F., Amici, V., & Rocchini, D. (2010). Human activity impact on the heterogeneity of a Mediterranean landscape. Appl Geogr, 30(3), 370–379.
He, X., Rebholz, C. M., Daya, N., Lazo, M., & Selvin, E. (2019). Alcohol consumption and incident diabetes: The Atherosclerosis Risk in Communities (ARIC) study. Diabetol, 62(5), 770–778.
Heron, M. P. (2019). Deaths: Leading causes for 2017. Natl Vital Stat Rep, 68, 1–77.
Holst, C., Becker, U., Jørgensen, M. E., Grønbæk, M., & Tolstrup, J. S. (2017). Alcohol drinking patterns and risk of diabetes: A cohort study of 70,551 men and women from the general Danish population. Diabetol, 60(10), 1941–1950.
Hu, B., Qiu, W., Xu, C., & Wang, J. (2020). Integration of a Kalman filter in the geographically weighted regression for modeling the transmission of hand, foot and mouth disease. BMC Pub Health, 20, 1–15.
Institute for Health Metrics and Evaluation (IHME) (2014). United States Smoking Prevalence by County 1996-2012. Seattle, United States of America: Institute for Health Metrics and Evaluation (IHME).
Institute for Health Metrics and Evaluation. (2021). US Health Map. Available from web: https://vizhub.healthdata.org/subnational/usa. Accessed 16 Aug 2022.
Jiang, C., Yang, Z., Wen, M., Huang, L., Liu, H., Wang, J., & Zhuang, C. (2021). Identifying the spatial disparities and determinants of ecosystem service balance and their implications on land use optimization. Science of The Total Environment, 793, 148472.
Kumar, R., Singh, M. C., Ahlawat, S. K., Thakur, J. S., Srivastava, A., Sharma, M. K., Malhotra, P., Bali, H. K., & Kumari, S. (2006). Urbanization and coronary heart disease: A study of urban-rural differences in northern India. Indian Heart J, 58(2), 126–130.
Lee, J., & Wong, D. W. (2001). Statistical analysis with ArcView GIS. John Wiley and Sons.
Leung, Y., Mei, C.-L., & Zhang, W.-X. (2000). Statistical tests for spatial nonstationarity based on the geographically weighted regression model. Environ Plann A, 32(1), 9–32.
Li, D., Chaudhary, H., & Zhang, Z. (2020). Modeling spatiotemporal pattern of depressive symptoms caused by COVID-19 using social media data mining. Int J Environ Res Pub Health, 17(14), 4988.
Maty, S. C., Everson-Rose, S. A., Haan, M. N., Raghunathan, T. E., & Kaplan, G. A. (2005). Education, income, occupation, and the 34-year incidence (1965–99) of type 2 diabetes in the Alameda County Study. Int J Epidemiol, 34(6), 1274–1281.
McMillen, D. P. (1996). One hundred fifty years of land values in Chicago: A nonparametric approach. J Urban Econ, 40(1), 100–124.
Oshan, T. M., Smith, J. P., & Fotheringham, A. S. (2020). Targeting the spatial context of obesity determinants via multiscale geographically weighted regression. Int J Health Geogr, 19, 1–17.
Pan, W. H., Cedres, L. B., Liu, K., Dyer, A., Schoenberger, J. A., Shekelle, R. B., Stamler, R., Smith, D., Collette, P., & Stamler, J. (1986). Relationship of clinical diabetes and asymptomatic hyperglycemia to risk of coronary heart disease mortality in men and women. Ame J Epidemiol, 123(3), 504–516.
Pfeiffer, D., Robinson, T. P., Stevenson, M., Stevens, K. B., Rogers, D. J., & Clements, A. C. (2008). Spatial analysis in epidemiology. Oxford University Press.
Siordia, C., Saenz, J., & Tom, S. E. (2012). An introduction to macro-level spatial nonstationarity: A geographically weighted regression analysis of diabetes and poverty. Hum Geogr, 6(2), 5.
Manson, S., Schroeder, J., Van Riper, D., & Ruggles, S. (2019). IPUMS National Historical Geographic Information System: Version 14.0. Minneapolis, MN: IPUMS. https://doi.org/10.18128/D050.V14.0
Vintage, C. (2012). "bridged-race postcensal population estimates [File pcen_v2013_y13. sasbdat]. Hyattsville, MD: CDC." National Center for Health Statistics.
Vogel, R. A. (2019). Alcohol, heart disease, and mortality: A review. Rev Cardiovasc Med, 3(1), 7–13.
Waller, L. A., Zhu, L., Gotway, C. A., Gorman, D. M., & Gruenewald, P. J. (2007). Quantifying geographic variations in associations between alcohol distribution and violence: A comparison of geographically weighted regression and spatially varying coefficient models. Stochastic Environ Res Risk Assess, 21(5), 573–588.
Xiang, L., Su, Z., Liu, Y., Zhang, X., Li, S., Hu, S., & Zhang, H. (2018). Effect of family socioeconomic status on the prognosis of complex congenital heart disease in children: An observational cohort study from China. The Lancet Child Adolesc Health, 2(6), 430–439.
Yang, W. (2014). An extension of geographically weighted regression with flexible bandwidths. Available from web: https://research-repository.st-andrews.ac.uk/handle/10023/7052. Accessed 26 Aug 2022.
Yu, D., & Wu, C. (2004). Understanding population segregation from Landsat ETM+ imagery: A geographically weighted regression approach. Giscience and Remote Sensing, 41(3), 187–206.
Zhang, Z., Demšar, U., Rantala, J., & Virrantaus, K. (2014). A fuzzy multiple-attribute decision-making modelling for vulnerability analysis on the basis of population information for disaster management. International Journal of Geographical Information Science, 28(9), 1922–1939.
Zhang, Y., Yu, Y., Yuan, Y., Yu, K., Yang, H., Li, X., Min, X., Zhang, C., He, M., & Zhang, X. (2017). Association of drinking pattern with risk of coronary heart disease incidence in the middle-aged and older Chinese men: Results from the Dongfeng-Tongji cohort. PLoS ONE, 12(5), e0178070.
Zhang, Z., Zou, L., Li, W., Usery, L., Albrecht, J., & Armstrong, M. (2021A). Cyberinfrastructure and intelligent spatial decision support system. Trans GIS, 25(4), 1651–1653.
Zhang, Z., Yin, D., Virrantaus, K., Ye, X., & Wang, S. (2021B). Modeling Population Dynamics: An Object-Oriented Space-Time Composite Model based on Social Media and Urban Infrastructure Data. Comput Urban Sci, 1(1), 1–13.

Acknowledgements

Thanks to Karina Riches for proofreading the article.

Code availability

Not Applicable.

Funding

Not Applicable.

Author information

Authors and Affiliations

Department of Geography, Texas A&M University, College Station, TX, 77843, USA
Wencong Cui, Nanzhou Hu, Shuyang Zhang, Diya Li, Luis Martinez, Daniel Goldberg, Burak Güneralp & Zhe Zhang

Contributions

Wencong Cui: Writing - Original draft preparation, Visualization, Investigation, Conceptualization, Methodology, Software. Nanzhou Hu: Methodology, Software, Data Preprocessing, Investigation, Writing - Partial draft writing. Shuyang Zhang: Writing - Partial draft writing, Data Preprocessing. Diya Li: Data Processing, Visualization. Luis Martinez, Daniel Goldberg, Burak Güneralp: Writing - Proofreading and editing. Zhe Zhang: Writing - Reviewing and editing, Investigation, Conceptualization, Methodology. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhe Zhang.

Ethics declarations

Competing interests

Prof. Zhe Zhang is an Editorial Board Member (EBM) of Computational Urban Science. She was not involved in the peer review or handling of the manuscript. The authors have no other competing interests to disclose.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cui, W., Hu, N., Zhang, S. et al. Analyzing spatial variations of heart disease and type-2 diabetes: A multi-scale geographically weighted regression approach. Comput. Urban Sci. 2, 34 (2022). https://doi.org/10.1007/s43762-022-00059-6

Received: 15 May 2022
Accepted: 23 August 2022
Published: 24 September 2022
DOI: https://doi.org/10.1007/s43762-022-00059-6

1 Introduction