Background

Dengue is an arboviral disease characterized by fever and vascular complications and is endemic in many tropical and subtropical regions of the world [1]. Within the global efforts to control the disease, representation of the problem in space and time is key to supporting disease surveillance systems [2]. Spatial and spatio-temporal models are useful for generating risk maps for dengue disease, supporting early warning systems and intervention programs [2, 3].

Disease mapping tools correlated dengue incidence with socioeconomic, demographic and environmental variables using the Moran index in Brazil [4], while its geographical distribution has been characterized using spatial statistics and geographical information system (GIS) analysis in Ecuador [5]. Dengue disease mapping has also been combined with surveillance and monitoring of the Aedes aegypti vector in the Middle East [6], and in Peru, investigators have explored the association between dengue and clinical, meteorological, climatic, and sociopolitical variables through fuzzy association rule mining in a spatial setting [7]. At a micro-regional scale, Lowe et al. [8] included temperature, rainfall, the El Niño Southern Oscillation index, and other relevant socioeconomic and environmental variables in a spatio-temporal Bayesian hierarchical model implemented with Markov Chain Monte Carlo (MCMC), generating predictions at spatial and temporal levels and supporting a dengue alert system. Honorato et al. [9] studied the relationship between risk of dengue and sociodemographic variables using Bayesian spatial regression models in the municipalities of Espírito Santo, Brazil, while Ferreira and Smith [10] modeled the number of cases of dengue fever in Rio de Janeiro, Brazil, considering the cases as a Poisson random variable, with conditional autoregressive (CAR) priors in the spatial random effects, testing different neighborhood structures and covariates with fixed coefficients.

Colombia is highly endemic for dengue disease. From 2000 to 2011, the country experienced a stable annual incidence of dengue, with major outbreaks in 2001–2003 and 2010, followed by a considerable decrease of incidence in 2011, with cases mainly occurring in children (\({<}15\) years of age) and the highest incidence in 2009 in infants (\({<}1\) year of age) [11]. Small scale studies using dengue reports have investigated the spatial autocorrelation of dengue cases [12] and the association between dengue and satellite environmental data using spatially stratified tests of ecological niche models [13]. Hagenlocher et al. [14] performed a spatial assessment of current socioeconomic vulnerabilities to dengue fever in 340 neighborhoods of a Colombian city through a spatial approach that included expert-based and purely statistical-based modeling of current vulnerability levels using a GIS.

At national level, Quintero et al. [15] used epidemiological surveillance data (weekly cases) and Poisson regression models to assess the influence of the El Niño Southern Oscillation index and pluviometry on dengue incidence, adjusting by year and week.At a regional scale, Cadavid-Restrepo et al. [16] explored the variation in spatial distribution of notified dengue cases in Colombia from 2007 to 2010, exploring associations between the disease and selected environmental risk factors through a Bayesian spatio-temporal conditional autoregressive model. The results elucidate the role of environmental risk factors in the spatial distribution of dengue disease, explaining how these factors can be used to develop and refine preventive approaches for dengue in Colombia. All these studies are strategic to the research of surveillance and control of dengue disease, demonstrating the importance of representation of the disease in space and time.

In Colombia, dengue epidemiological surveillance is based in passive notification of cases, coordinated by the ‘Instituto Nacional de Salud’ (Colombia National Institute of Health) [17], and supports monitoring, prediction, risk factor identification and intervention measures. Even though the surveillance network works adequately, and provides information to all the national institutions involved in dengue control, we appreciate that disease mapping techniques currently developed and employed for many health problems are not widely applied.

We selected the Colombian city of Bucaramanga to use Bayesian areal disease mapping models, testing the challenges and difficulties of the approach to dengue disease mapping. Bucaramanga was one of the cities with the highest dengue incidence in Colombia by year during the period 2008–2015. We estimated the relative risk of dengue disease, applying Bayesian spatial areal models to dengue case counts and satellite covariates (normalized difference vegetation index (NDVI) and land surface temperature (LST)) with fixed and space-varying coefficients, at a small spatial scale and with global and annual aggregation time scales, for the 2008–2015 period. Ideally, we would use data on the vector presence, distribution and ecology, but that kind of data does not exist for the city of Bucaramanga at the aggregation level of the study. We relied on satellite images to search associations between dengue incidence and environmental data. In addition, we provided a Bayesian model to estimate the Cohen’s Kappa measure of agreement for the interpretation of the change in relative risk of dengue between the global and annual time scales.

Methods

Cases of dengue disease from Bucaramanga, Colombia

The city of Bucaramanga, Colombia is located at coordinates \(7^{\circ }7'07''\)N–\(73^{\circ }06'58''\) W, at 959 m above sea level. It covers an urban area of 27 km\(^{2}\) and it has a population of 527,913 people in 2016, living in 220 neighborhoods nested in 17 communes. While Colombia presented an incidence rate of 436 cases per 100,000 persons in 2010, Bucaramanga reported an incidence rate of 1359.1 per 100,000 persons. We obtained data on incident cases of dengue disease (dengue and severe dengue) from the SIVIGILA (public health surveillance system) for the urban area of Bucaramanga for the period from January 2008 to December 2015.

We geocoded and allocated every case of dengue disease to one of 293 Bucaramanga census sections (geographical unit composed by approximately 20 closed city blocks) according to the cartography of the 2005 census from the national geostatistical framework of the ‘Departamento Nacional de Estadística’ (Colombia national statistics office) [18]. For geocoding purposes, we started with a database of 30,063 cases corresponding to the notified dengue cases from health institutions in Bucaramanga to the surveillance system. The cases were obtained from the database checked for duplicates reported to the surveillance system. The dengue cases data included address, sex, age and an identification code which anonymized the name and personal identification of the case to the geocoder. From this database, we selected the cases with address of residence belonging to Bucaramanga. We discarded cases without address, cases with rural address and wrong addresses. Then, an R [19] script sent batches of addresses to the web geocoding service of ArcGIS server. The returned geocoding were checked and accepted, or revised for a new geocoding cycle. At the end of the process, we succesfully geocoded to the urban area of Bucaramanga a total of 27,301 cases, which were aggregated to the spatial scale defined above, therefore our data does not relate to an identifiable natural person thus the data subject is not identifiable.

The cases aggregated by census section were aggregated along two time scales: a global scale, running for the entire study period (2008–2015); and an annual scale, resulting in eight respective datasets for each included year.

We obtained disaggregated data by census section, sex and five-years age groups from the 2005 census and calculated annual and global crude incidence rates according to these variables. We calculated expected values for dengue case counts by multiplying the global and annual crude incidence of dengue times the population by sex and age at census section level.

Satellite images for normalized difference vegetation index

We used satellite raster images obtained from Landsat Surface Reflectance (SR) Enhanced Thematic Mapper (ETM) 7, bands 3 and 4 (60 m resolution) for the years 2008, 2009, 2010, 2011; and from Landsat SR Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) 8, bands 4 and 5 (30 m resolution) for years 2013, 2014, and 2015 (Landsat Surface Reflectance products courtesy of the U.S. Geological Survey). We selected Landsat multispectral images based on those with the least cloud cover. Images covered the city of Bucaramanga and were taken from row 55, path 8 or path 7, with Universal Transversal Mercator (UTM) projection, zone 18 North and datum WGS-84. From Landsat SR ETM 7, we selected images from January 2, 2008; January 27, 2009; January, 14 2010; and February 2, 2011. From Landsat SR OLI-TIRS 8, we chose images from June 16, 2013; January 10, 2014; and January 4, 2015. We did not find any suitable Landsat images for the year 2012.

Satellite images for land surface temperature

We use Moderate Resolution Imaging Spectroradiometer (MODIS) satellite raster images from the MOD11A2 version 6 product [20] to obtain mean 8-day, per-pixel land surface temperature (LST), in a 1200 km \(\times\) 1200 km grid. Each pixel value in the MOD11A2 is a simple average of all the corresponding MOD11A1 LST pixels collected within that 8-day period, with a pixel size of 1000 m \(\times\) 1000 m. We selected the ‘day time surface temperature’ band.

Image processing

For the Landsat SR 7 ETM raster images, we calculated a composite NDVI raster image for the annual satellite images using band 4 [near infra red (NIR)] and band 3 (red). For the Landsat SR 8 OLI-TIRS raster images, we calculated a composite NDVI raster image for the annual satellite images using band 5 NIR and band 4 (red). We applied the formula NDVI = (NIR band − red band)/(NIR band + red band), following Yuan et al. [21]. Due to the absence of good quality images for 2012, we created a composite NDVI image for 2012 by pixel averaging of the composite NDVI images for 2011 and 2013.

To obtain an NDVI value by census section, we superimposed a mask comprised by the polygons (census sections) from the shape file of the city of Bucaramanga, onto the composite NDVI raster image, calculating the NDVI pixel mean by census section to the composite NDVI annual satellite images. We also produced a composite NDVI image at global aggregation scale (2008–2015 period) by pixel averaging by census section of all the composite NDVI images per year. For the composite NDVI image at global scale, we followed the same masking procedure to produce NDVI pixel mean by census section applied to the composite NDVI images by year.

For the MODIS LST raster images, we first reprojected the images from sinusoidal projection to UTM 18N projection, datum WGS84, and resampled to 30 m using the Modis reprojection tool (MRT) software [22]. We then created composite LST raster images per epidemiological period by pixel averaging all the reprojected and resampled images available in every epidemiological period. We generated composite LST images per year by pixel averaging all the composite LST raster images per epidemiological period. For every composite LST raster image per year, we applied a mask comprised by the polygons (census sections) from the shape file of the city of Bucaramanga, and calculated the average LST by polygon (census section).

Finally, we created a composite LST image at global aggregation scale (2008–2015 period) by pixel averaging all the composite LST images per year. We produced LST average by census section in the composite LST image at global scale, following the same masking process to obtain LST average by census section per year. Raster image processing was done using the R software version 3.3 [19] with the raster package version 2.5-8 [23].

Statistical models

Disease mapping is the area of epidemiology that estimates the spatial pattern in disease risk over an extended geographical region in order to identify areas at high risk [24]. Besag et al. [25] developed a Bayesian hierarchical model for spatial analysis of areal data. Let \(\theta _{i}\) be the log relative risk of a contagious disease with low transmission, \(O_{i}\) the observed number of cases and \(E_{i}\) the expected number of cases in area i. Assuming \(O_{i}\) are independent Poisson variables with mean \(E_{i}\)exp\((\theta _{i})\), where exp\((\theta _{i})\) is the relative risk of the disease and the linear predictor \(\theta = t + u + v\) is adopted, where t is a term associated with measured covariates; and u and v are surrogates for unknown observed covariates. The \(u_{i}\)’s represent variables with spatial structure and the \(v_{i}\)’s represent spatially unstructured variables. In the hierarchical framework, prior probability distributions are assigned to u and v. For v, Normal prior with zero mean and variance \(\lambda\), and for u,

$$\begin{aligned} p(u)\propto \left\{ -\sum _{i<j}\zeta _{ij}\phi (u_{i}-u_{j}) \right\} , \quad u \in \mathcal {R} \end{aligned}$$
(1)

where \(\phi\) is a function of pairwise differences among u’s, and \(\zeta _{ij}\) are weights equal to zero for i non-contiguous to j. Besag et al. [25] consider two options for \(\phi\): \(\phi (z)=z^{2}/2\kappa\) or \(\phi (z)=|z|/\kappa\), where \(\kappa\) is an unknown constant. Choosing the first options for \(\phi\), the conditional structure of u follows

$$\begin{aligned} p(u|\kappa )\propto \frac{1}{\kappa ^{n/2}}\left\{ -\frac{1}{2\kappa }\sum _{i\sim j}\zeta _{ij}(u_{i}-u_{j})^{2} \right\} \end{aligned}$$
(2)

where \(i\sim j\) represent neighbor areas, and the model is referred to as the ‘Normal intrinsic autoregression.’ For the non-zero \(\zeta _{ij}\) several options are available, such as 1 for zones sharing a border and 0 otherwise, or the length of the boundary between contiguous zones [10]. The model with independent normal priors for \(v_{i}\) and Normal intrinsic priors for \(u_{i}\) is known as the ‘convolution model’.

We fitted Poisson log normal models for relative risk, following Besag et al. [25], to the aggregated data at annual and global scale, with observation equation,

$$\begin{aligned} O_{i} \sim \text{ Poisson } (E_{i}e^{\theta _{i}}) \end{aligned}$$
(3)

where \(O_{i}\), \(E_{i}\) and exp(\(\theta _{i}\)) are the observed count, the expected count, and the relative risk of dengue disease, respectively, in census section i (\(i=1,\ldots ,m, m=293\)). For the linear predictor, we explored the following structures:

$$\begin{aligned} \theta _{i}= \left\{ \begin{array}{l} \alpha + u_{i} + v_{i} \\ \alpha + u_{i} + v_{i} + \beta _{1}\text{ NDVI }_{i} \\ \alpha + u_{i} + v_{i} + \beta _{2}\text{ LST }_{i}\\ \alpha + u_{i} + v_{i} + \beta _{1}\text{ NDVI }_{i} + \beta _{2}\text{ LST }_{i}\\ \end{array} \right. \end{aligned}$$
(4)

where the \(u_{i}\) are spatially correlated effects with Normal intrinsic conditional autoregressive (ICAR) priors distribution with precision parameter \(\tau _{u}\), and the \(v_{i}\) are the spatially uncorrelated effects with Normal prior distribution with zero mean and precision parameter \(\tau _{v}\). \(\beta _{j}\) (\(j=1,2\)) are normally distributed fixed coefficients for NDVI and LST, with zero mean and precision 100. Uniform (0,1) hyperpriors were assigned to \(\tau _{u}^{-1/2}\) and \(\tau _{v}^{-1/2}\).

Next, we fitted models with spatially correlated effects with Leroux [26] Normal conditional autorregressive (CAR) prior distribution and fixed coefficients for the covariates,

$$\begin{aligned} \theta _{i}=\left\{ \begin{array}{l} \alpha + w_{i}\\ \alpha + w_{i} + \beta _{1}\text{ NDVI }_{i} \\ \alpha + w_{i} + \beta _{2}\text{ LST }_{i} \\ \alpha + w_{i} + \beta _{1}\text{ NDVI }_{i} + \beta _{2}\text{ LST }_{i} \\ \end{array} \right. \end{aligned}$$
(5)

The \(w_{i}\) are the spatially correlated effects with Leroux Normal CAR prior distributions, with precision matrix \(\tau _{w}\) with Gamma(1, 0.001) hyperpriors. Finally, models were fitted with fixed and spatially varying coefficients for the covariates with Leroux CAR priors as follow:

$$\begin{aligned} \theta _{i}=\left\{ \begin{array}{l} \alpha + (\beta _{1} + b_{i,1})\text{ NDVI }_{i} + (\beta _{2} + b_{i,2})\text{ LST }_{i}\\ \alpha + (\beta _{1} + b_{i,1})\text{ NDVI }_{i}\\ \alpha + (\beta _{2} + b_{i,2})\text{ LST }_{i}\\ \end{array} \right. \end{aligned}$$
(6)

where the \(b_{i,j}\) are spatially varying coefficients for NDVI (j = 1) and LST (j = 2). For the linear predictor including two spatially varying coefficients \(b_{i,j}\), these coefficients are modelled multivariate Normal with Leroux conditional mean vector \(\mu _{i,j}\) (\(j=1,2\)) and precision matrix \(\xi _{2 \times 2}\) following Congdon (2014) [27], where \(\mu _{i,j}|\mu _{k \in \partial _{i},j} = (\rho /(1-\rho + \rho d_{i}))\sum _{k \in \partial _{i}} \mu _{k,j}\) and \(\xi _{2 \times 2}=(1-\rho + \rho d_{i})\Xi _{2 \times 2}\), where the precision matrix \(\Xi _{2 \times 2}\) is Wishart distributed with symmetric matrix S and 2 degrees of freedom.

The coefficient \(\rho\) establishes the degree of spatial structure of the spatial effects \(\mu _{i,j}\). When \(\rho\) = 1, the Leroux prior for the spatial effects implies an Normal ICAR prior, while, \(\rho\) = 0, we have an independent model [28].

For the models with linear predictor including spatially varying coefficients for only one covariate, the \(b_{i,1}\) or \(b_{i,2}\) space-varying coefficients for NDVI or LST are modelled with Leroux conditional mean vector \(\mu _{i,j}\) (\(j = 1,2\)) and precision \(\tau _{b_{j}}\) (j = 1 for NDVI or j = 2 for LST), with Gamma(1,0.001) hyperpriors. All models included an intercept \(\alpha\) with a diffuse improper prior .

From the Poisson log Normal models, we obtain the relative risk exp(\(\theta _{i}\)) of dengue disease by census section and calculate point-wise mean estimates and 95% credible intervals (CI). Choropleth maps were produced using the logarithm of the mean relative risk (\(\theta _{i}\)) and the mean spatially correlated effects \(u_{i}\) by census section.

Additionally, the relative risk of dengue disease by census section was discretized as low or high risk, based on the lower bound of the 95% CI, where a value of 1 or less for the lower bound of the relative risk of dengue disease by census section indicates a low risk, and values exceding 1 signify a high risk. Choropleth maps of the discretized relative risk (DRR) of dengue disease are produced at global and annual aggregation scale.

Using the DRR of dengue disease, we calculated the Cohen’s Kappa [29] coefficients for global-to-annual or annual-to-annual agreement of low-high risk by census section, using the following Bayesian model adapted from Lee and Wagenmakers [30]:

$$\begin{aligned} \kappa&= {} (\delta - \psi )/(1 - \psi )\nonumber \\ \delta&= {} \pi _{1} + \pi _{4}\nonumber \\ \psi&= {} (\pi _{1}+\pi _{2})(\pi _{1}+\pi _{3}) + (\pi _{3}+\pi _{4})(\pi _{2}+\pi _{4})\nonumber \\ \mathbf {y}&\sim {} \text{ Multinomial }([\pi _{1},\pi _{2},\pi _{3},\pi _{4}],n)\nonumber \\ \pi _{i}&\sim {} \text{ Dirichlet }(1,1,1,1) \end{aligned}$$
(7)

where \(\kappa\) is the Kappa coefficient and \(\mathbf {y}\) is the vector of counts in categories from the cross tabulation of low and high risk from global-to-annual or annual-to-annual agreement. Let year\(_{a}\) be one the eight study years and year\(_{b}\) other year not equal to year\(_{a}\), the categories are as follows: \(y_{1}\) = low risk in global scale or year\(_{a}\) and low risk in year\(_{b}\); \(y_{2}\) = low risk in global scale or year\(_{a}\) and high risk in year\(_{b}\); \(y_{3}\) = high risk in global scale or year\(_{a}\) and low risk in year\(_{b}\); and \(y_{4}\) = high risk in global scale or year\(_{a}\) and low risk in year\(_{b}\).

We first calculated the global-to-annual agreement of DRR, between the pairs of DRR from model by global aggregation scale and the models for the data at annual aggregation scale. Second, we calculated the annual-to-annual agreement of DRR between all pairs of models fitted at annual aggregation scale. For the interpretation of the Kappa coefficients, we used the categories in Table 1, from Broemeling [31].

Table 1 Degree of agreement for Kappa

Models were fitted with Markov Chain Monte Carlo (MCMC) using the WinBUGS software version 1.4 [32]. We utilize three chains with a burn-in period of 30,000 iterations, a final run of 10,000 iterations and thinning rate of 10, deriving a final sample of 1000 iterations by chain for the inference. To evaluate convergence, we check trace and density plots as well as the Gelman, Brooks and Rubin and Geweke tests [19]. Model selection was accomplished using the deviance, the deviance information criterion (DIC) and the number of effective parameters (p\(_{D}\)) [33].

Results

Summary statistics

Table 2 presents summary statistics for the dengue case counts, the standardized morbidity rate (SMR, the observed dengue cases divided by the expected dengue cases by census section), the NDVI, the LST and, the correlations between these variables, for the data aggregated at global and annual scales.

Table 2 Summary statistics for counts of dengue disease, NDVI and LST, by Bucaramanga census section, for globally and annually aggregated data, 2008–2015

For the global scale, a total of 27,301 cases (range by census section: 1–433) were reported and geocoded. The mean SMR for all census section was 1.098, with a minimum of 0.225 and a maximum of 4.05. The mean value of NDVI was 0.368, with a minimum of 0.135 and a maximum of 0.792. Mean LST for the aggregated data was \(32.1\,^\circ\)C, with a minimum of \(28.8\,^\circ\)C and a maximum of \(34\,^\circ\)C. Linear correlations between counts of cases of Dengue and NDVI (r = −0.071) and LST (r = 0.127) was weak, while, correlation between the NDVI and LST was moderate and negative (r = −0.629).

For the summary statistics at annual scale, 2010 was the year with the highest number of cases (n = 6932) followed by 2014 (n = 4956) and 2013 (n = 4839), with 2011 showing the lowest number of cases (896). The maximum number of cases in a census section ocurred in 2013 (n = 115), followed by 2010 (109) and 2014 (84). For the annual SMR, the year 2011 presented the highest average SMR (1.193) followed by 2010 (SMR = 1.143) and 2015 (SMR = 1.165).

The lowest maximum NDVI by census section corresponded to the year 2010 (NDVI = 0.757) followed, in order, by 2008 (NDVI = 0.767) and 2014 (NDVI = 0.784), while the lowest mean NDVI were for years 2008 and 2015 (mean NDVI = 0.346) followed by 2010 (mean NDVI = 0.355).

With respect to the LST, the year 2010 displayed the lowest mean temperature (\(30.7\,^\circ\)C), followed by 2015 (\(31.7\,^\circ\)C) and 2011 (\(31.8\,^\circ\)C), while for the rest of the years, the LST mean was close to \(32\,^\circ\)C.

The linear correlations between dengue case counts and NDVI or LST were low, whereas the most pronounced correlation was for 2013: between dengue and NDVI, r = 0.198; and between dengue and LST, r = −0.126. At an annual scale, the correlation between NDVI and LST was moderate and inverse for all years, with the strongest correlations for 2009 (r = −0.640) and 2012 (r = −0.629).

Model selection

Table 3 shows the selection statistics deviance, number of effective parameters (p\(_{D}\)) and DIC, for the models fitted at global and annual aggregation scales in Bucaramanga.

Table 3 Information criterion statistics, for relative risk models of dengue disease, 2008–2015

In general, the convolution models with CAR priors fitted the data better than the models with Leroux CAR priors, evidenced by the smallest deviance in the convolution models. For the data at global aggregation scale, the model with spatially correlated and uncorrelated effects with fixed coefficient for NDVI (\(u_{i} + v_{i} + \beta _{1}\)) presented the smallest DIC (DIC = 2364.6). For the year 2008, the smallest DIC was for the model with spatial effects with Leroux CAR priors and fixed coefficient for NDVI (\(\beta _{1} + w_{i}\)) (DIC = 1457.8). For the years 2009, 2010, and 2014, we selected the models with correlated and uncorrelated spatial effects plus a fixed coefficient for NDVI (\(u_{i} + v_{i} + \beta _{1}\)), which presented the smallest DIC values of 1618.4, 1916.5, and 1798.3, respectively. For the years 2011, 2012, 2013, and 2015, the selected models contained space-varying coefficients for NDVI with Leroux CAR priors and a fixed coefficient for NDVI (\(\beta _{1} + b_{i,1}\)), displaying the smallest DIC values of 1178.4, 1361.8, 1772.2, and 1629.1 respectively.

Parameter estimates of the selected model at global aggregation scale, 2008–2015

The selected model at global scale was the convolution model with fixed coefficient for NDVI (\(u_{i} + v_{i} + \beta _{1}\)). Figure 1 shows the map of the SMR logarithm, the map of the logarithm of the mean relative risk of dengue disease \(\theta _{i}\), the DRR of dengue disease, and the spatially correlated effects (\(u_{i}\)) from the model at global scale. The log mean relative risk of dengue disease shows high risk clusters in the south and north-west census sections of the city. The DRR map presents the areas where the 95% CIs for the relative risk do not include 1, and the map of spatial effects displays spatial correlation at the south of the city.

Fig. 1
figure 1

a Logarithm of the standardized morbidity rate (log SMR); b logarithm of the mean relative risk (log RR) of dengue disease; c discretized relative risk (DRR\(_{i}\)); and d mean spatial effects (SE) \(u_{i}\), by census section for the data aggregated at global scale for the period 2008–2015

Parameter estimates from models at annual aggregation scale, 2008–2015

Table 4 presents the mean point-wise parameters estimates and 95% CI from the selected models fitted at the annual aggregation scale.

Table 4 Parameter estimates (point-wise mean and 95% CI) from the selected model at annual scale, 2008–2015 period

The selected model for 2008 was the model with spatially correlated effects with Leroux CAR priors plus fixed coefficient for NDVI (\(w_{i} + \beta _{1}\)). The point-wise mean estimate for the NDVI fixed coefficient is positive (0.294), and the 95% CI include zero (−0.214, 0.814), suggesting a weak association between dengue and NDVI by census section, at the same time, the point mean estimate of \(\rho\) is 0.486, which denotes moderate spatially correlated effects \(w_{i}\) in the relative risk of dengue disease.

The convolution model plus fixed coefficient for NDVI was the selected model for the years 2009, 2010, and 2014 (\(u_{i} + v_{i} + \beta _{1}\)). The point-wise mean and 95% CI estimates of the fixed coefficients for NDVI for years 2009, 2010, and 2014 show a strong, positive association between Dengue and NDVI for 2009 (0.589; 95% C.I: 0.168, 1.024), but not for 2010 and 2014. The mean of the spatially correlated effects \(w_{i}\) (2008) and \(u_{i}\) (2009, 2010 and 2014) are presented in Fig. 2a. We observe the highest values of the mean spatial effects for the south of the city in 2008, but a scattered pattern of the mean of the spatially correlated effects for the rest of the years.

The model with fixed coefficient and space-varying coefficients for NDVI was selected for years 2011, 2012, 2013, and 2014 (\(\beta _{1} + b_{i,1}\)). The fixed coefficients for NDVI have to be accompanied by the space-varying coefficients, to be fully interpreted. The mean estimates for the fixed coefficient plus the space-varying coefficients for NDVI (\(\beta _{1} + b_{i,1}\)) are presented in the Fig. 2b. We discretized the space-varying coefficients in a similar way as the discretized relative risk (DRR). We considered the association between the NDVI and dengue by census section to be weak when the lower bound of the 95% CI was 1 or less and, to be strong otherwise (Fig. 2c). The discretized space-varying (DSV) NDVI coefficients enabled us to identify census sections where there was strong association between dengue incidence and NDVI. For 2012, 2013, and 2015, we observe a strong asssociation in 4 to 8 census sections, while for 2011, the discretized effect was so low that there were no census sections showing an association between the covariate and dengue disease.

Fig. 2
figure 2

a Mean spatial effects (SE) \(w_{i}\) (2008) and \(u_{i}\) (2009, 2010, and 2014) from the selected models at annual aggregation scale; b mean space-varying NDVI coefficients (\(\beta _{1}+b_{i,1}\)); and c discretized space-varying (DSV) NDVI coefficients (\(\beta _{1}+b_{i,1}\)) for years 2011, 2012, 2013, 2015, from models at annual aggregation scale

Mapping of relative risk of dengue disease, from models by annual scale

In this section, we begin by presenting the maps for the logarithm of the mean relative risk of dengue disease, and then we display the maps of the DRR of dengue, from the models selected by year 2008–2015.

Figure 3a presents the logarithm of the mean relative risk (Log RR) by year, for the period 2008–2015. The smoothed estimates of the Log RR let us discern patterns of the disease distribution in Bucaramanga. We observe similar patterns for 2009, 2010, and 2011, with clusters of high relative risk in the southern and northwestern census sections. For 2008, 2012, 2013 and 2015, we observe slightly fewer zones with high relative risk. Estimates of relative risk of dengue for 2014 presented a great number of high-risk census sections in the northwestern, southern and central zones of the city.

The maps of DRR of dengue disease help us to identify rapidly those census sections presenting the highest risk for each year (Fig. 3b). The areas along the southern and northwestern edges of the city show a consistent tendency towards high relative risk, while the center generally presents a low risk, with some exceptions.

Fig. 3
figure 3

a Logarithm of the mean relative risk (log RR) of dengue disease , from models at annual aggregation scale 2008–2015; b discretized relative risk (DRR) of dengue disease, 2008–2015

Kappa coefficient for the global-toa-annual and annual-to-annual agreement of DRR of dengue disease

We present estimates for the Kappa coefficient (point-wise mean and 95% CI) for global-to-annual and annual-to-annual agreement of DRR of dengue disease in Table 5. We interpreted the Kappa coefficient for agreement based on the coverage of the 95% CI over the categories of ‘degree of agreement’ from Table 1. Firstly, we established the global-to-annual agreement of DRRs of dengue disease. Secondly, we determined the annual-to-annual agreement of the DRR of dengue disease.

From Table 5, we interpreted the global-to-annual agreement in DRR at years 2008, 2009, 2012, and 2015 to be poor to fair, while the agreement was poor to slight for 2011. Agreeement of DRR of dengue disease was slight to moderate between the model at global scale and models for 2013 and 2014, and, substantial to moderate for the global scale model and the model for 2010.

For the annual-to-annual agreement of DRR, from 2009 to 2010, there was poor to fair agreeement of DRR. From years 2011 to 2012 and 2012 to 2013, the agreement of DRR for dengue disease was slight to moderate. Additionally, the agreement was slight to fair from 2010 to 2011, 2013 to 2014, and 2014 to 2015. Finally, annual-to-annual agreement of DRR between other year pairs was almost always poor to slight.

Table 5 Kappa coefficient (point-wise mean and 95% CI) for global-to-annual and annual-to-annual agreement of discretized relative risk (DRR) of dengue by census sections, from models at global and annual aggregation scales in Bucaramanga, 2008–2015

Discussion

In this study we applied Bayesian hierarchical models for spatial analysis of areal data to the estimation of relative risk of dengue disease in a Colombian city. We chose this particular city for its high incidence of dengue disease, in 2008–2015. We fitted models at global and annual scales for the study period. The hierarchical models included covariates (NDVI and LST) obtained from satellite raster images.

From the descriptive statistics, we found low correlation between the covariates and the dengue case counts by census section. NDVI and LST were moderately correlated by census sections at global and annual scales. Two main models were fitted: first the convolution model (spatially correlated effects with Normal ICAR priors, and uncorrelated spatial effects with Normal zero mean priors and precision \(\tau _{v}\)); and second, spatially correlated effects model with Leroux Normal CAR priors. We used those structures with or without covariates. The covariates effects were modeled as fixed coefficients with Normal prior distributions with zero mean and high precision, or space-varying coefficients with Leroux Normal CAR priors.

We used MCMC to fit the models, and selected the models for inference, applying DIC measures. All the selected models (at global and annual scale) included NDVI fixed coefficients (2008, 2009, 2010, and 2015), and NDVI space-varying coefficients (2011, 2012, 2013, and 2014).

The convolution model was selected at global and annual scale for 2009, 2010, and 2014. The models with spatially correlated effects with Leroux Normal CAR priors were selected for 2008, 2011, 2012, 2013, and 2015. We illustrated the relative risk of dengue disease using two types of maps. First, we produced maps of the logarithm of the mean relative risk, containing smoothed estimates of relative risk, allowing us to distinguish clusters of high relative risk at global and annual scales. Second, we created maps of DRR of dengue disease, allowing us to identify zones where the risk is higher than in zones with 95% CIs including 1.

We employed satellite images from two sources (Landsat and MODIS) to calculate NDVI and LST data at areal level (census section). We were interested in establishing association between the information from raster images and dengue incidence at global and annual scale according to census section. The selected models for inferences did include NDVI but not LST. Parameter estimates (point-wise mean and 95% CI) for the association of the NDVI and dengue disease are mainly positive; however, they only reveal a strong association between dengue and NDVI, in 2009 (95% CI not including zero). Our results were different from some results in the literature. In Costa Rica, Troyo et al. [34] found negative coefficients for NDVI suggesting an inverse association with relative risk of dengue disease. They reported NDVI from satellite images at different resolutions, finding an association between high NDVI and low incidence of dengue. Meza-Ballesta et al. [35] reported the association between dengue incidence and high air temperature, high rainfall, and vegetation deterioration. Araujo et al. [36] analyzed dengue incidence in São Paulo, Brazil. They associated thermal remote sensing images, census data, and dengue incidence, and their findings point to low dengue incidence in areas with high vegetation cover, and high incidence in areas with low vegetation coverage and land surface temperatures above \(29\,^{\circ }\mathrm {C}\); however Nazri et al. [37] did not find NDVI to be a major factor influencing dengue incidence in Malaysia. Qi et al. [38] found that the associations between cases of Dengue disease and population density, GDP per capita, road density, and NDVI were nonlinear, and the risk of dengue disease declined gradually with rising NDVI.

These reports involving NDVI as a covariate did not use areal data, or the aggregation resolution was higher than the resolution used in our study. Additionally, at high spatial scale, the link between NDVI and rainfall variability in zones with epidemiological reports of dengue or malaria has been reported across South America [39], contributing to a better understanding of climate, environment, and epidemics facilitating the implementation of local and regional health early warning systems. In this context, our study contributes to understand the relationship between NDVI and incidence of dengue disease at a finer resolution.

As limitations of the satellite data employed in our study, we noted that for the period 2008–2011, we had to impute the NDVI in some census sections because of known issues with the sensor from Landsat ETM 7. We employed a convolution model to make the imputation of the missing NDVI values. For the NDVI data, the pixel resolution was 30 m, while the pixel resolution of the MODIS LST data was 1000 m, resampled to 30 meters, which makes some census section highly associated with respect to the LST. Additionally, for LST, we used single raster image per year, which is a composite of around 48 MODIS images. Moreover, finding high-quality Landsat images for the period was difficult, mainly because of cloud cover. In addition, we used a non-conventional method to treat information from satellite images, as compared to the literature, because we employed continuous variables such as NDVI and LST in space obtaining areal inputs as covariates, using the mean value for every census section.

With respect to the data quality on dengue, we employed official data, aggregating the cases into census sections acknowledging factors such as age or sex by means of internal standardization. We consider our dengue case counts to be high-quality data given that the surveillance system is based on compulsory reporting physicians in Colombia at all service levels, but keeping in mind critic considerations regarding possible underreporting as shown by Romero-Vega et al. [40].

For the spatial aggregation level and census data, we based our population estimates on the 2005 census, with the assumption that city’s population has been fairly stable throughout the study period. We recognize the possible bias produced by the use of 2005 data for estimation of the expected values for the study period. This is one of the challenges of the study, but it is not only for the disease under study. At the time of the realization of this study, there were not updated data for population living in Bucaramanga, at census section level. The Colombian government updates its census every ten years. It was programmed a census for 2016, but was not developed. We consider this finding extremely important, if public health authorities are interested to provide information, not only for temporal analysis (as is currently done) but also for spatial analysis at finer resolution scales (census block, section or sector). This study provides the space to recommend to the Colombian authorities, to build real-time registries of population, which support evaluation, decision making and intervention, not only for dengue disease, but for all the notifiable diseases. Additionally, we are aware that the spatial aggregation scale changes the conclusion from areal data as shown by Khormi and Kumar [41], as does the choice of neighborhood structure [10]. The spatial aggregation scale used in this study corresponds to the spatial scale supplied by the official cartography from the 2005 census.

Together with the generation of relative risk maps and the evaluation of satellite data associated with incidence of dengue disease, we determined whether there was an association in the DRR of dengue disease by census section from year to year, and from the risk at global scale and annual scale. To this end, we employed the Kappa coefficients to define the global-to-annual agreement of risk when the relative risk is discretized as low or high risk, estimating the Kappa coefficients, using a Bayesian model. We have found substantial to moderate agreeement between the DRR at global scale and the year 2010, and moderate to fair agreeement for 2013 and 2014. For the rest of the global-to-annual agreement or all the annual-to-annual agreement of the DRR for consecutive years, we found slight to fair agreement, reflecting a volatile pattern of dengue risk by census section from year to year. The main results for the annual-to-annual agreement of the discretized relative risk for non consecutive year, point to low agreeement between census sections in terms of risk level between years.

Conclusions

We applied disease mapping models at small spatial scale in a Colombian city with fixed and space-varying coefficients for covariates derived of satellite images. We found the NDVI associated to high dengue risk by census section. The modeling process produced relative risk maps of dengue disease, allowing us to identify areas with high risk in the city. We compared the concordance of high risk by census section between the global aggregated model and the annual aggregated models, and between years in the 2008–2015 period. We found in general, slight to fair agreement in high risk. The information obtained by the use of disease mapping models is not currently available to public health authorities in Colombia. We highlight the importance of transforming raw spatial data into relative risk maps of dengue disease for planning and implementation of public health strategies. The map quality depends on high-quality population data at the selected spatial aggregation scale, which are sometimes difficult to obtain; and the geocoding process to allocate every case to a correct spatial coordinate. Information from satellite images improves the output of the spatial modeling, by associating environmental variables to the dengue incidence. For future research we are interested to apply disease mapping models in similar datasets from municipalities in Colombia. The main limitation is the cost of geocoding the data from the official records in terms of work-time and accurate geocoding. However, after the geocoding task is finished, the data will be available not only to spatial analysis, but also to temporal and spatiotemporal analysis, including the covariates with constant or time-varying coefficients[42]. We would like to present the method to the public health authorities from diverse surveillance offices in Colombia, and test the field applicability of the disease mapping models discussed in our study. Finally, we also are interested to apply disease mapping methods based on hierarchical Bayesian models to map the relative risk not only for dengue disease, but also for arboviral diseases like Chykungunya and Zika at small spatial scale in Colombia, to support public health authorities in disease surveillance and control strategies.