Study area and local climate regions
We focused our analysis on the main island of Taiwan and selected neighboring islands (the Pescadore, Kinmen, and Matou Islands). Smaller, more isolated islands, such as the Little Liuchiu, Green, and Orchid Islands, were excluded. We collected data recorded between 1897 and 2008 at 25 traditional surface stations. Taiwan’s Central Weather Bureau (CWB) divided the regional climate and associated geographical areas into 6 regions: 4 metropolitan areas (Northern, Central, Southern, and Eastern Taiwan) and 2 mountainous regions (Northern and Central Taiwan) [16]. Because the regional climate differs markedly between the northeastern and southeastern parts of Taiwan, we divided Eastern Taiwan into 2 regions for the purpose of this study, resulting in 7 regions. We then added the 3 neighboring islands, for a total of 10 geographical regions in Taiwan (Figure 1). For our analysis, we used meteorological data collected from 2002 to 2011 at 24 meteorological stations. We also obtained monthly observations of meteorological variables (e.g., surface temperature, precipitation, relative humidity, atmospheric pressure, number of wet days, and hours of sunshine) from the CWB [17]. Table 1 summarizes the mean, median, and range of the meteorological factors for the 10 local climate regions of this study.
Table 1
The 10 local climatic regions and their monthly meteorological factors in Taiwan (2002–2011)
Incidence of scrub typhus in people
Data for confirmed cases of scrub typhus were obtained from the Notifiable Infectious Diseases Statistics System and Infectious Diseases Database at the Taiwanese Center for Disease Control (CDC) [18]. Because scrub typhus is a notifiable disease, blood samples from patients with suspected scrub typhus were collected and sent to the CDC for laboratory confirmation. A sample was labeled positive for scrub typhus if it yielded a positive real-time polymerase chain reaction test, or if paired sera showed a 4-fold increase in Orientia tsutsugamushi (OT)-specific immunoglobulin M or immunoglobulin G antibody as measured by an indirect immunofluorescence assay. Data were obtained for the years 2002 through 2011. Ethical approval for this study was unnecessary because the data are in the public domain.
Data management
Data on the population density of farm workers and the use of forest land (e.g., timber management, recreational forests, national protectorates, natural reserves, plantations, clear-cut areas, and other purposes) were obtained from the 2005 Agricultural, Forestry, Fishery, and Husbandry census [19], which also provided the socioeconomic and environmental variables used in our analysis. The standardized incidence ratio (SIR) of scrub typhus calculated for each township was subsequently used as the response variable in the geographically weighted regression (GWR) model. The GWR model used the following explanatory variables: (a) percentage of farm labor; and (b) land share ratio of timber management, recreational forests, national protectorates, natural reserves, plantations, clear-cut areas, and other purposes.
We used the statistical software package SPSS (v.12) to calculate Pearson’s product–moment correlations (r), and employed ArcMap (v.9.3) to map the GWR results.
General statistics
We applied the chi-square goodness-of-fit and Fisher exact tests [20] to analyze the seasonal variation within and among the 10 local climatic regions. The strength of the relationships between scrub typhus incidence and meteorological variables across the 10 regions was assessed using Pearson’s r.
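As a minimal sketch of two of the statistics named above, the following pure-Python snippet computes a chi-square goodness-of-fit statistic and Pearson's r. The monthly counts and temperatures are invented for illustration, and the null hypothesis of equal expected counts in every month is an assumption, not necessarily the one used in this study. Obtaining p values (and running the Fisher exact test) would require a statistics package.

```python
import math

def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly case counts and mean temperatures (deg C) for one region
cases = [2, 1, 3, 5, 9, 14, 16, 12, 10, 8, 4, 2]
temperature = [16, 17, 19, 22, 25, 27, 29, 29, 27, 24, 21, 18]

chi2 = chi_square_gof(cases)        # compare with a chi-square distribution, df = 11
r = pearson_r(cases, temperature)   # strength of the linear association
```

A large chi-square statistic relative to its reference distribution would indicate that cases are not spread evenly across months, which is one way to express seasonal variation.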
Geographically weighted regression
The GWR method is a spatial statistical tool that generates parameters disaggregated by the spatial units of analysis. We considered analyzing the contiguity-based spatial units (e.g., 349 administrative government areas on the main island of Taiwan) by using the GWR method. However, this method is unsuitable for examining isolated regions (e.g., the Pescadore, Kinmen, and Matou Islands). The GWR model is an extension of the traditional standard regression framework that estimates local rather than global parameters [21]. It is a type of local statistic that produces a set of local parameter estimates showing the spatial variation of relationships. Thus, the spatial pattern of local estimates can be examined to elucidate potentially ambiguous causes of observed differences [22]. Conversely, traditional regression methods, such as the ordinary least squares (OLS) method, are global statistical tools that assume the spatial constancy of a particular relationship (i.e., a parameter is assumed to remain constant across an entire area).
An OLS model can be defined as follows:
$$y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \epsilon$$
where $y$ is the response variable, $\beta_0$ is the intercept, $\beta_i$ is the parameter estimate (coefficient) for explanatory variable $x_i$, $p$ is the number of explanatory variables, and $\epsilon$ is the error term.
The GWR model allows local rather than global parameters to be estimated for the study area. Thus, the GWR model rewrites the OLS model as follows:
$$y_j = \beta_0(u_j, v_j) + \sum_{i=1}^{p} \beta_i(u_j, v_j)\, x_{ij} + \epsilon_j \tag{1}$$
where $u_j$ and $v_j$ are the coordinates for each location $j$, $\beta_0(u_j, v_j)$ is the intercept for location $j$, and $\beta_i(u_j, v_j)$ is the local parameter estimate for explanatory variable $x_i$ at location $j$. The weight assigned to each observation is based on a distance decay function centered on location $j$.
The estimator for the GWR model is similar to that of the global weighted least squares model. However, in the GWR model, the weights are determined relative to location $u$ and consequently change for each location. The estimator is expressed as follows:
$$\hat{\beta}(u) = \left(X^{T} W(u) X\right)^{-1} X^{T} W(u)\, y \tag{2}$$
where $W(u)$ is the square matrix of weights relative to position $u$, and a specific location in the study area can be indexed as $(u_j, v_j)$. The geographically weighted variance-covariance matrix is represented by $X^{T} W(u) X$, and $y$ is the vector of values of the response variable. The $W(u)$ matrix contains the geographical weights in its leading diagonal elements and zeros in its off-diagonal elements:
$$W(u) = \begin{pmatrix} w_1(u) & 0 & \cdots & 0 \\ 0 & w_2(u) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n(u) \end{pmatrix} \tag{3}$$
where $n$ is the number of observations.
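To make the weighted least squares form concrete, the sketch below evaluates $(X^{T} W(u) X)^{-1} X^{T} W(u) y$ at a single location in pure Python. The helper names (`solve_linear`, `local_wls`), the data, and the weights are hypothetical, and the diagonal of $W(u)$ is passed as a plain list. This is an illustration of the estimator, not the ArcMap implementation used in this study.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def local_wls(X, y, weights):
    """Local estimate (X^T W X)^{-1} X^T W y, with diagonal W given as a list."""
    p = len(X[0])
    xtwx = [[sum(w * row[a] * row[b] for row, w in zip(X, weights))
             for b in range(p)] for a in range(p)]
    xtwy = [sum(w * row[a] * yi for row, yi, w in zip(X, y, weights))
            for a in range(p)]
    return solve_linear(xtwx, xtwy)

# Hypothetical data: an intercept column plus one explanatory variable
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]          # constructed so that y = 1 + 2x exactly
weights = [1.0, 0.8, 0.3, 0.0]    # diagonal of W(u) at one location u

beta_local = local_wls(X, y, weights)  # [intercept, slope] at this location
```

Repeating the call with a different weight vector for each location yields the set of local parameter estimates that GWR maps.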
In the study area, the sample points produced by the polygon centroids were clustered rather than regularly spaced. A convenient way to implement an adaptive bandwidth is to select a kernel that includes an identical number of sample points in each local estimation. The weights can then be calculated using the selected kernel, with the weight of any observation whose distance exceeds the bandwidth set to zero. The bisquare function is expressed as follows:
$$w_i(u_j, v_j) = \begin{cases} \left[1 - \left(\dfrac{d_i(u_j, v_j)}{h}\right)^{2}\right]^{2} & \text{if } d_i(u_j, v_j) < h \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
where $d_i(u_j, v_j)$ is the distance between observation $i$ and location $(u_j, v_j)$, and $h$ is the bandwidth; $w_i(u_j, v_j)$ is zero when $d_i(u_j, v_j)$ is greater than $h$. The bisquare kernel is a near-Gaussian function with the useful property that the weight is zero at a finite distance.
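The adaptive bisquare weighting described above can be sketched as follows. The function name, the example coordinates, and the choice to set $h$ to the distance of the k-th nearest sample point are assumptions made for illustration; the study's software may define the adaptive bandwidth slightly differently.

```python
import math

def bisquare_weights(points, target, k):
    """Adaptive bisquare weights at `target`.

    The bandwidth h is the distance from `target` to its k-th nearest
    sample point, so every location uses the same number of neighbors.
    """
    dists = [math.dist(p, target) for p in points]
    h = sorted(dists)[k - 1]
    # Points at or beyond h get zero weight, so the k-th neighbor itself
    # contributes nothing and k - 1 points carry positive weight.
    return [(1 - (d / h) ** 2) ** 2 if d < h else 0.0 for d in dists]

# Hypothetical polygon centroids (arbitrary planar units)
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
w = bisquare_weights(centroids, target=(0.0, 0.0), k=3)
```

In a clustered region the k-th neighbor is close, so $h$ is small; in a sparse region $h$ grows, which keeps the number of contributing points constant.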
In this study, we selected the bandwidth by minimizing the corrected Akaike information criterion (AIC) score, calculated using the following equation:
$$\mathrm{AIC_c} = 2n \ln(\hat{\sigma}) + n \ln(2\pi) + n \left\{ \frac{n + \mathrm{tr}(S)}{n - 2 - \mathrm{tr}(S)} \right\} \tag{5}$$
where $n$ is the number of observations, $\hat{\sigma}$ is the estimated standard deviation of the error term, and $\mathrm{tr}(S)$ is the trace of the hat matrix $S$. The AIC method is advantageous because it accounts for the fact that the degrees of freedom may vary among models centered on different observations. We determined the optimal bandwidth by minimizing the adjusted AIC, as detailed by Fotheringham et al. (2002) [22]. GWR models produce a set of local regression results, including local parameter estimates and local residuals, which can be mapped to show their spatial variability.
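Bandwidth selection by AIC minimization can be sketched as below. The score assumes the corrected AIC form given by Fotheringham et al. (2002), namely 2n ln(σ̂) + n ln(2π) + n(n + tr(S))/(n − 2 − tr(S)). The candidate bandwidths and the fit summaries attached to them are invented numbers, and `select_bandwidth` is a hypothetical helper rather than part of any GWR package.

```python
import math

def aicc(n, sigma_hat, trace_s):
    """Corrected AIC for a GWR fit (form assumed from Fotheringham et al.)."""
    return (2 * n * math.log(sigma_hat)
            + n * math.log(2 * math.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))

def select_bandwidth(fits):
    """Return the bandwidth whose fit summary gives the smallest AICc.

    `fits` maps each candidate bandwidth to (n, sigma_hat, trace_s).
    """
    return min(fits, key=lambda h: aicc(*fits[h]))

# Invented summaries: smaller bandwidths fit more closely (lower sigma_hat)
# but use more effective parameters (larger trace of the hat matrix).
fits = {
    20: (349, 0.80, 60.0),
    40: (349, 0.85, 30.0),
    80: (349, 0.95, 12.0),
}
best = select_bandwidth(fits)
```

The penalty term grows quickly as tr(S) approaches n, which is why the smallest bandwidth does not win here despite having the lowest residual spread.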
We applied the Benjamini-Hochberg (B-H) procedure to control the false discovery rate (FDR) in multiple comparisons and to determine the significance of the parameter estimates obtained using the GWR model. Rather than applying a single significance level to every test, the procedure assigns each test its own critical value. Thissen et al. (2002) proposed a simple method for carrying out the B-H procedure in Microsoft Excel [23]. The B-H approach controls the FDR by sequentially comparing the observed p values for each family of multiple test statistics, ordered from largest to smallest, with a list of computed B-H critical values [pB-H(i)]. The critical value for the test statistic indexed by i is determined by linear interpolation between α/2 (for the largest observed p value) and (α/2)/m (for the smallest observed p value), where m is the family size. Because the latter value is the Bonferroni critical value, the B-H approach gains power relative to the Bonferroni approach: only the smallest of the m observed p values is compared with the Bonferroni critical value, and all other p values are compared with less stringent critical values. A local parameter is considered significant if its p value is less than the B-H critical value; otherwise, the parameter is considered non-significant [23].
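A minimal sketch of the B-H comparison follows, assuming the critical value for the i-th smallest p value is (i/m)(α/2), which matches the linear interpolation described above, and assuming the standard step-up rule (every test ranked at or below the largest rank that passes is declared significant). The function name and the p values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05, two_tailed=True):
    """Return a list of booleans marking tests significant under B-H."""
    m = len(p_values)
    level = alpha / 2 if two_tailed else alpha
    order = sorted(range(m), key=lambda idx: p_values[idx])  # ascending p
    # Find the largest rank i (1-based) with p_(i) <= (i / m) * level
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * level:
            cutoff = rank
    significant = [False] * m
    for idx in order[:cutoff]:
        significant[idx] = True
    return significant

# Hypothetical p values for five local parameter estimates
p_local = [0.200, 0.001, 0.030, 0.014, 0.012]
flags = benjamini_hochberg(p_local)
```

With m = 5 and α = 0.05 the critical values are 0.005, 0.010, 0.015, 0.020, and 0.025. The third-smallest p value (0.014) passes its critical value, so the second-smallest (0.012) is also declared significant even though it exceeds its own critical value of 0.010; a Bonferroni threshold of 0.005 would have retained only the smallest.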