Multilingual abstracts

Please see Additional file 1 for translations of the abstract into the five official working languages of the United Nations.

Background

Angiostrongyliasis is a food-borne parasitic zoonosis; human infection is caused by the third-stage larvae of Angiostrongylus cantonensis. The life cycle of A. cantonensis involves rodents as definitive hosts and molluscs as intermediate hosts [1]. Humans are dead-end hosts for A. cantonensis, but they can become infected through ingestion of infected molluscs, contaminated vegetables, or transport hosts such as shrimps, crabs, frogs and lizards [2]. Infected individuals present with eosinophilic meningitis or meningoencephalitis [3]. More than 30 outbreaks have been reported in mainland China to date [4,5,6,7]. Human infection is listed as an emerging food-borne parasitic disease by the World Health Organization (WHO) and has been classified as an emerging infectious disease by the Chinese Ministry of Health since 2003 [6, 8, 9].

Previous studies on angiostrongyliasis concentrated mainly on epidemic foci and case descriptions. Most reports relied on descriptive epidemiological methods, and spatial parameters were seldom considered [10]. In recent years, spatial analysis using geographic information systems (GIS) has been applied more widely in the field of parasitic diseases [11, 12]. Such work on schistosomiasis and malaria at different scales has provided new ideas and methods for angiostrongyliasis research. It is therefore important to know the spatial distribution, the infection status, and the factors that influence them, both for snail intermediate hosts and for rat definitive hosts. Such knowledge would be most useful for the prevention and control of angiostrongyliasis and for estimating the risk of infection. This study aimed to analyse the spatial distribution of these two hosts at village scale and to explore the applicability of ordinary least squares (OLS) regression and geographically weighted regression (GWR) models, in order to provide a methodological reference and theoretical basis for the prevention and control of this emerging human parasitic disease.

Methods

Study area

Nanao Island is located at the extreme eastern edge (116°53′–117°19′E, 23°11′–23°32′N) of Guangdong Province, China (see Fig. 1). The annual average temperature is 21.5 °C and the annual average rainfall is 1348 mm. The snail intermediate host Pomacea canaliculata and various rat definitive hosts of A. cantonensis are widely distributed on the island. Previous research also suggests that this region of China might harbour hotspots of A. cantonensis infection [13, 14].

Fig. 1 Geographical location of Nanao Island and the villages involved in the study

Data collection

Three villages on Nanao Island (Gongqian Village, Liudu Village and Jinshan Village), belonging to the towns of Houzhai, Shenao and Yunao, respectively, were selected as study areas for stratified random sampling. The density and infection status of P. canaliculata and various rat species were surveyed every three months from December 2015 to September 2016. Rats were trapped following the Chinese national standard "Sampling procedure of vector infected by pathogens – Rodent" (GB/T 28940–2012). In addition, environmental data such as season, environmental type, distance from residential areas and relation to fresh water were collected. The geographical coordinates of each sampling point were recorded with a Garmin 60CS global positioning system (GPS) instrument (Garmin Corp., Olathe, KS, USA).

The lung and muscle tissue of P. canaliculata were examined for third-stage larvae by lung microscopy, tissue homogenization and enzyme digestion [15]. Rats were captured by night feeding and trapping, then drowned and dissected, and their hearts and lungs were checked for adult worms. Morphological identification of P. canaliculata, rats and third-stage larvae of A. cantonensis was followed by DNA extraction and polymerase chain reaction to confirm the species. Each sampling site was assigned a unique identification number linked to its spatial data, host density and natural infection rate, to reflect the distribution and infection status of these hosts.

Statistical analysis

Descriptive analysis

Infection rates and densities of snails and rats were compared using non-parametric rank-sum tests and the Chi-square test (or Fisher’s exact test) in SAS 9.3 (SAS Institute, Cary, NC, USA). The correlation between density and infection rate was analysed by Spearman’s rank correlation. A two-tailed P < 0.05 was considered statistically significant.
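As an illustration of the correlation step, the sketch below computes Spearman’s rank correlation as the Pearson correlation of average ranks. It is a minimal standard-library example and not the SAS procedure used in the study; the function names and the density and infection-rate values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rs: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-site host densities and infection rates (not study data).
densities = [12, 5, 30, 8, 21]
rates = [0.10, 0.00, 0.25, 0.05, 0.05]
print(spearman(densities, rates))
```

The significance test for rs is left out of the sketch; a statistical package would normally supply it.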

Spatial correlation analysis

ArcGIS (version 10.2, ESRI Corp., Redlands, CA, USA) was used to analyse the spatial correlation of the infection rates, both for snails and for rats, by establishing a semivariogram model, which provides the measure of variance as a function of distance between data points. The semivariance graph conveyed information about the continuity and spatial variability of the process [16, 17]:

$$ \mathrm{r}(h)=\frac{1}{2N(h)}{\sum}_{i=1}^{N(h)}{\left[Z\left({x}_i\right)-Z\left({x}_i+h\right)\right]}^2. $$

In the formula, Z(xi) and Z(xi + h) are the sample values at locations xi and (xi + h), N(h) is the number of data pairs separated by the distance h, and r(h) is the semivariance. Each value of the semivariogram is calculated for a fixed distance h. The model includes the following four parameters: Nugget, Range, Sill and Partial Sill, where Sill is the sum of Nugget and Partial Sill. The degree of spatial autocorrelation is reflected by the ratio of Partial Sill to Sill; the greater the ratio, the stronger the spatial autocorrelation. The ratio of Nugget to Sill is called the nugget effect, which represents the variation among samples caused by random factors.
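The semivariance formula can be sketched in a few lines of Python. The example below bins all point pairs by separation distance and applies r(h) = Σ[Z(xi) − Z(xi + h)]² / 2N(h) within each bin; it also includes a spherical model of the kind fitted in ArcGIS. The function names, the lag binning and the toy data are assumptions made for illustration, not the ArcGIS implementation.

```python
import math


def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) for each non-empty lag bin."""
    sq_sums = [0.0] * n_lags   # sum of squared value differences per bin
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags      # N(h): number of pairs per bin
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            b = int(h // lag_width)
            if b < n_lags:
                sq_sums[b] += (values[i] - values[j]) ** 2
                dist_sums[b] += h
                counts[b] += 1
    result = []
    for b in range(n_lags):
        if counts[b]:
            result.append((dist_sums[b] / counts[b],
                           sq_sums[b] / (2 * counts[b]),
                           counts[b]))
    return result


def spherical_model(h, nugget, partial_sill, rng):
    """Spherical semivariogram: rises from the nugget to the sill at the range."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    ratio = h / rng
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)


# Toy transect (hypothetical): four sites one unit apart.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [1.0, 2.0, 4.0, 7.0]
for mean_h, gamma, n_pairs in empirical_semivariogram(pts, vals, 1.0, 3):
    print(mean_h, gamma, n_pairs)
```

Beyond the range, the spherical model returns Nugget + Partial Sill, which is the Sill used in the ratios described above.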

Scan statistics

SaTScan 9.4 (https://www.satscan.org/) was used to detect spatial aggregation of infection rates. A Poisson model was used, with the maximum spatial cluster size set to 30% of the population at risk. Monte Carlo simulation was used to assess the log likelihood ratio (LLR) and to identify the most likely cluster [18,19,20].
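The idea behind the scan can be sketched as follows, assuming the commonly used Poisson log likelihood ratio for a circular window, LLR = c·ln(c/e) + (C − c)·ln((C − c)/(C − e)) when the observed count c exceeds the expected count e, with C the total number of cases. This is a simplified illustration and not SaTScan itself: the function names and the site data are hypothetical, ties in distance are ignored, and the Monte Carlo step that turns the maximum LLR into a P value is omitted.

```python
import math


def poisson_llr(c, e, total):
    """Log likelihood ratio for a window with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only windows with excess risk are of interest
    inside = c * math.log(c / e)
    outside = 0.0
    if total > c:
        outside = (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def most_likely_cluster(sites, max_pop_fraction=0.3):
    """sites: list of (x, y, cases, population). Returns (LLR, centre index, radius)."""
    total_cases = sum(s[2] for s in sites)
    total_pop = sum(s[3] for s in sites)
    best = (0.0, None, 0.0)
    for ci, centre in enumerate(sites):
        # Grow a circle around the centre by adding sites in order of distance.
        ordered = sorted(sites, key=lambda s: math.dist(centre[:2], s[:2]))
        cases = pop = 0
        for s in ordered:
            cases += s[2]
            pop += s[3]
            if pop > max_pop_fraction * total_pop:
                break  # window would exceed the maximum cluster size
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best[0]:
                best = (llr, ci, math.dist(centre[:2], s[:2]))
    return best


# Hypothetical sites: (x, y, infected hosts, hosts examined).
example = [(0, 0, 8, 20), (1, 0, 1, 20), (5, 0, 1, 20), (6, 0, 0, 20), (10, 0, 0, 20)]
print(most_likely_cluster(example))
```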

Spatial modelling

The OLS and GWR models were compared with respect to their ability to describe the relationship between the explanatory variables (season, type of environment, distance from residential areas and host density) and the spatial distribution of infection rates in both the snail intermediate hosts and the rat definitive hosts. The models were fitted using the Geostatistical Analyst tool in ArcGIS and compared by their degree of fit.

The OLS model is a global spatial analytical method where the basic assumption is that the dependent and independent variables have the same linear relations in all spatial parts of the area studied, i.e. its parameters are constant in different locations. Therefore, OLS can only produce average and global parameter estimates, rather than local ones [21]. The OLS model was calculated as follows:

$$ {y}_i={\beta}_0+{\sum}_{j=1}^n{\beta}_j{\chi}_{ij}+{\varepsilon}_i; $$

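where yi is the value of the dependent variable at the ith region (i = 1, 2, …, N), χij the value of the jth independent variable at the ith region, β0 the intercept, βj the regression coefficient of the jth variable, and εi the random error.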

The GWR model is a local spatial analytical method, which is mainly used in non-stationary parameter estimation for the analysis of local spatial relation. Its parameters can change with different geographic positions. Since both spatial autocorrelation and spatial heterogeneity are taken into account in the GWR model, it has more value in the choice of spatial influence factors of disease compared with the general spatial models, such as OLS, spatial lag model, spatial error model and spatial Durbin model [22,23,24]. The GWR model was calculated as follows:

$$ {Y}_i={\beta}_0\left({u}_i,{\nu}_i\right)+{\sum}_{j=1}^k{\beta}_j\left({u}_i,{\nu}_i\right){\chi}_{ij}+{\varepsilon}_i; $$

where (ui, νi) are the geographic coordinates of the ith sample, βj(ui, νi) is the regression coefficient of the jth variable, which changes with geographic position, and εi is the random error.
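The difference between the two models can be illustrated with a small sketch. Here the local coefficients at each sample location are estimated by weighted least squares with Gaussian distance weights, w = exp(−½(d/b)²), and OLS is the special case in which every weight equals 1. The kernel form, the fixed bandwidth b and all function names are assumptions for illustration; the ArcGIS implementation used in the study may differ in detail.

```python
import math


def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_least_squares(X, y, w):
    """Return [beta0, beta1, ...] minimising sum_i w_i * residual_i**2."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    n = len(rows)
    xtwx = [[sum(w[i] * rows[i][a] * rows[i][b] for i in range(n))
             for b in range(p)] for a in range(p)]
    xtwy = [sum(w[i] * rows[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(xtwx, xtwy)


def ols(X, y):
    """Global fit: one coefficient vector, all observations weighted equally."""
    return weighted_least_squares(X, y, [1.0] * len(y))


def gwr(coords, X, y, bandwidth):
    """Local fit: one coefficient vector per sample location (u_i, v_i)."""
    betas = []
    for u in coords:
        w = [math.exp(-0.5 * (math.dist(u, v) / bandwidth) ** 2) for v in coords]
        betas.append(weighted_least_squares(X, y, w))
    return betas


# Hypothetical data: one explanatory variable at four locations.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(ols(X, y))
print(gwr(coords, X, y, bandwidth=1.0))
```

When the underlying relationship is the same everywhere, as in this toy data set, the local GWR coefficients coincide with the global OLS estimates; they diverge only when the relationship varies in space.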

Results

Descriptive analysis

A total of 2192 P. canaliculata snails were collected from the three study villages (see Fig. 2). About half of the snails (1190) were randomly chosen for microscopic examination, of which 72 (6.1%) were positive (see Table 1). The difference in infection rates between the villages was statistically significant (χ2 = 12.8058, P = 0.0017). In pairwise comparisons, the infection rates differed significantly between Gongqian Village and Jinshan Village (χ2 = 9.8581, P = 0.0017) and between Liudu Village and Jinshan Village (χ2 = 6.5297, P = 0.0106), but not between Gongqian Village and Liudu Village (χ2 = 1.3459, P = 0.2460).

Fig. 2 Collection of snails and screening of Angiostrongylus cantonensis

Table 1 Summary of the investigations on the snail intermediate hosts

A total of 110 rats were captured, including Rattus norvegicus, R. flavipectus, R. losea and Suncus murinus (see Fig. 3), of which 32 (29.1%) were positive (see Table 2). There were 31 positive R. norvegicus and only one positive R. flavipectus, while no adult worms were found in R. losea or S. murinus. The infection rate differed significantly between species (Fisher’s exact test, P = 0.0051) and between villages (χ2 = 13.9719, P = 0.0009). In pairwise comparisons, the infection rates differed significantly between Gongqian Village and Jinshan Village (χ2 = 12.1951, P = 0.0005), but not between Gongqian Village and Liudu Village (χ2 = 3.0590, P = 0.0803) or between Liudu Village and Jinshan Village (χ2 = 1.6250, P = 0.2024).

Fig. 3 Capture of rats and screening of Angiostrongylus cantonensis

Table 2 Summary of the investigations on the rat definitive hosts

Spearman’s rank correlation showed no statistically significant correlation between density and infection rate for P. canaliculata (rs = 0.20582, P = 0.2151), whereas the correlation between these parameters in the rats was significant (rs = 0.51755, P < 0.0001).

Spatial autocorrelation

The semivariance model for the infection rate of P. canaliculata (see Fig. 4) was:

$$ \mathrm{r}(h)=0.008833\times \mathrm{Nugget}+0.014418\times \mathrm{Spherical}(0.0030444). $$

Fig. 4 Semivariogram of infection rate of Angiostrongylus cantonensis in Pomacea canaliculata

In this model, the values of Nugget, Partial Sill and Sill were 0.008833, 0.014418 and 0.023251, respectively. The ratio of Nugget to Sill was 38% and the ratio of Partial Sill to Sill was 62%, which indicated that the spatial heterogeneity was mainly caused by spatial autocorrelation.
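As a quick check of these ratios, taking Sill as the sum of Nugget and Partial Sill:

$$ \frac{\mathrm{Nugget}}{\mathrm{Sill}}=\frac{0.008833}{0.008833+0.014418}=\frac{0.008833}{0.023251}\approx 0.38,\quad \frac{\mathrm{Partial\ Sill}}{\mathrm{Sill}}=\frac{0.014418}{0.023251}\approx 0.62. $$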

The semivariance model for the infection rate of rats (see Fig. 5) was:

$$ \mathrm{r}(h)=0.032829\times \mathrm{Nugget}+0.093567\times \mathrm{Spherical}(0.00038847). $$

Fig. 5 Semivariogram of infection rate of Angiostrongylus cantonensis in rats

Here, the values of Nugget, Partial Sill and Sill were 0.032829, 0.093567 and 0.126396, respectively. The ratio of Nugget to Sill was 26% and the ratio of Partial Sill to Sill was 74%, which also indicated that the spatial heterogeneity was mainly caused by spatial autocorrelation.

Spatial scan statistics

As shown in Tables 3 and 4, eleven spatial clusters were detected: nine clusters of infected snails and two of infected rats. The maximum cluster radii for the infection rates of P. canaliculata and rats were 0.049 km and 0.069 km, respectively.

Table 3 Scanning of Angiostrongylus cantonensis infection rate in Pomacea canaliculata
Table 4 Scanning of Angiostrongylus cantonensis infection rate in rats

Spatial cluster areas and the sampling sites were added to remote sensing images from Google Earth for spatial overlay analysis. The infected snail clusters were mostly seen in artificial channels near the villages, while the infected rat clusters were found in places with a messy environment, such as rubbish heaps and recycling centres (see Fig. 6).

Fig. 6 An overview of the sampling site and cluster areas of positive samples

Spatial modelling

First, the OLS model was fitted to find associations with the chosen variables, and the fit was then examined to decide whether the GWR model was also needed. The parameters and fit criteria of the OLS models are shown in Tables 5 and 6.

Table 5 Infection rates of Pomacea canaliculata with Angiostrongylus cantonensis using different variables estimated by OLS approach
Table 6 Infection rates of rats with Angiostrongylus cantonensis using different variables estimated by OLS approach

To analyse the residuals of the OLS models for the infection rates of P. canaliculata and of rats, the Jarque-Bera (JB) statistic was calculated. JB was 13.013 (P = 0.0015) for the P. canaliculata infection rate and 21.627 (P = 0.00002) for that of the rats. Both were statistically significant, which indicates that the residuals did not follow a normal distribution; this was taken to mean that the infection rate data were non-stationary in space and that there was spatial heterogeneity. Spatial information was therefore not fully extracted by the OLS model.
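For reference, the JB statistic combines the sample skewness S and kurtosis K of the residuals as JB = n/6 · (S² + (K − 3)²/4) and is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(−JB/2). The sketch below is a minimal standard-library version with hypothetical function names; it is not the ArcGIS routine, and the study's residuals are not reproduced here.

```python
import math


def jb_p_value(jb):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-jb / 2)


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic P value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n  # second central moment
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, jb_p_value(jb)


# P values implied by the two reported JB statistics.
for jb in (13.013, 21.627):
    print(jb, jb_p_value(jb))
```

The reported P values are consistent with this relationship: exp(−13.013/2) ≈ 0.0015 and exp(−21.627/2) ≈ 0.00002.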

Choosing the weighting function is an important step in establishing a GWR model. A Gaussian function was chosen as the weighting function, and the best bandwidth was selected according to the principle of minimum corrected Akaike information criterion (AICC). The parameters and fit criteria of the GWR models are shown in Tables 7 and 8.

Table 7 Infection rates of Pomacea canaliculata with Angiostrongylus cantonensis using different variables estimated by GWR approach
Table 8 Infection rates of rats with Angiostrongylus cantonensis using different variables estimated by GWR approach

The results of the comparison between the OLS and GWR approaches are displayed in Table 9. AICC (AICC = (2k − 2L)/n + 2k(k + 1)/(n − k − 1)) is a criterion for evaluating the performance of a statistical model, where n is the sample size, k represents the concision of the model (the number of parameters) and L its accuracy (the log-likelihood); the smaller the AIC or AICC, the better the model. The coefficient of determination (R2) and the degree-of-freedom adjusted coefficient of determination (R2 adjusted) measure how much of the variation in the dependent variable is explained by the independent variables; the closer R2 and R2 adjusted are to one, the better the model. The mean square deviation (σ2) and the residual sum of squares (RSS) measure the residual deviation, with smaller values indicating a better fit.

For P. canaliculata, the OLS model gave AIC 306.0523, AICC 308.7620, R2 0.0864, R2 adjusted 0.0243 and σ2 154.6894, while the GWR model gave AICC −30.9822, R2 0.1307, R2 adjusted 0.1345, RSS 0.4857 and σ2 0.0171. Thus AICC, R2, R2 adjusted and σ2 in the GWR model were superior to those in the OLS model for P. canaliculata. For rats, the OLS model gave AIC 529.8539, AICC 531.6039, R2 0.2976, R2 adjusted 0.2414 and σ2 790.7322, while the GWR model gave AICC 27.0182, R2 0.4411, R2 adjusted 0.3195, RSS 3.1458 and σ2 0.0709. Again, AICC, R2, R2 adjusted and σ2 in the GWR model were superior to those in the OLS model.

According to some authors [25], the GWR approach should be chosen, even though it is more complex, if the difference in AIC between the two models is greater than 3. In our study, for both P. canaliculata and rats, AICC in the GWR models was much smaller than in the OLS models, indicating that the GWR model was better suited to analysing spatially heterogeneous data. However, R2 was not large in either model, implying that the proportion of spatial variance that the models could explain was small, and that the existing influence factors did not adequately represent the spatial variance of the infection rate data.
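The model comparison can be made explicit with a short calculation on the reported AICC values, using the threshold of 3 cited from [25]. The sketch also shows the usual small-sample correction term 2k(k + 1)/(n − k − 1). The values n = 38 and k = 6 in the last lines are hypothetical and are not stated in the text; they are used only because they happen to reproduce the gap between the reported AIC and AICC of the OLS model for P. canaliculata.

```python
def aicc_correction(n, k):
    """Small-sample term added to AIC to obtain AICC: 2k(k+1)/(n-k-1)."""
    return 2 * k * (k + 1) / (n - k - 1)


def delta_aicc(aicc_ols, aicc_gwr):
    """Positive values mean the GWR model has the lower (better) AICC."""
    return aicc_ols - aicc_gwr


# Reported AICC values: (OLS, GWR).
reported = {
    "P. canaliculata": (308.7620, -30.9822),
    "rats": (531.6039, 27.0182),
}

for host, (ols_value, gwr_value) in reported.items():
    diff = delta_aicc(ols_value, gwr_value)
    verdict = "GWR preferred" if diff > 3 else "no clear preference"
    print(host, round(diff, 4), verdict)

# Hypothetical n and k (assumption, not from the text): 84 / 31.
print(round(aicc_correction(38, 6), 4), round(308.7620 - 306.0523, 4))
```

In both hosts the difference far exceeds 3, which is the basis for preferring GWR here.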

Table 9 Comparison between OLS and GWR models

Discussion

The infection rate of P. canaliculata found in this study was higher than those reported from some other cities and provinces in China, e.g. Shenzhen [26], Xiamen [27] and Hainan [28], but lower than those reported from Dali [29] and Guangzhou [30]. The infection rate of rats in this study was lower than those reported in Rio de Janeiro in Brazil [31], the Canary Islands [32] and Nueva Ecija in the Philippines [33], but higher than those found in Guangzhou [34] and Zhongshan [35] in China.

There was a significant difference in the infection rate between the host species investigated: only one infected R. flavipectus was found, and there was no infection in either R. losea or S. murinus. This could be attributed to biological differences between the species: R. norvegicus and R. flavipectus live close to human dwellings and are known to prey on snails, while R. losea and S. murinus are field-dwelling species that feed mainly on plants [36], which lowers their likelihood of infection.

The spatial distribution of infection rates of both P. canaliculata and the rat species showed spatial autocorrelation and aggregation. Moreover, positive P. canaliculata were mostly clustered in artificial channels near the villages, while positive rats were clustered in places with a messy environment, such as garbage sites. Such places were always near human settlements, which increases the probability of transmission between rats and molluscs.

The results showed that the GWR model was better than the OLS model in terms of applicability, but its degree of fit was not impressive. A possible reason is that our choice of factors that might influence infection was incomplete: some other potentially important factors, such as the normalized difference vegetation index (NDVI), temperature, rainfall, soil pH and the socioeconomic status of the people in the study area [37,38,39,40], were not part of our modelling. Such variables, particularly those with potentially higher explanatory power, need to be explored and included in future spatial models in order to improve them. We would also like to investigate different spatial scales [41] and to improve the degree of fit of the models by increasing the sample size and the number of villages, since these parameters might affect the findings.

Conclusions

The intermediate and definitive hosts of A. cantonensis were widely distributed on Nanao Island, and infection was found in both. These findings indicate that there is a risk of angiostrongyliasis in this region of China, and intensive monitoring of the hosts should be undertaken.

Our study also showed spatial correlation and spatial clusters in the distribution of positive P. canaliculata and rats. The maximum radius of the spatial clusters of positive rats was broadly consistent with the rats’ sphere of activity. Moreover, the GWR model had an advantage over the OLS model in the spatial analysis of the hosts of Angiostrongylus cantonensis.