Background

Dogs are susceptible to infection with numerous tick-borne rickettsial pathogens, including Anaplasma phagocytophilum, the etiologic agent of granulocytic anaplasmosis in people, dogs, horses, sheep and other animals [1]. A closely related pathogen, A. platys, causes infectious cyclic thrombocytopenia in dogs and cross-reacts with antibodies to A. phagocytophilum. Clinical signs of canine granulocytic anaplasmosis range in severity, but commonly include fever, thrombocytopenia, lethargy, and polyarthritis, while infectious cyclic thrombocytopenia, caused by A. platys, is generally considered a mild disease except when co-infection exacerbates other diseases such as ehrlichiosis [2]. People with A. phagocytophilum infections may have flu-like symptoms, but rashes are rare, unlike in other tick-borne zoonoses such as Lyme disease or Rocky Mountain spotted fever [3]. Although considered a low risk for human infection, a recent case report suggested A. platys might also be zoonotic [4].

In the United States, Ixodes scapularis (the blacklegged tick) and Ixodes pacificus (the western blacklegged tick) are considered the primary vectors of A. phagocytophilum. Ixodes scapularis is found in at least 32 eastern and central states, while I. pacificus appears limited to five western states [5]. However, evidence of autochthonous transmission of pathogenic strains of A. phagocytophilum to people and dogs has only been documented in the Northeast, Upper Midwest, and limited parts of the western United States [6]. Ixodes scapularis and Ixodes pacificus are also found northward into Canada. In contrast, Rhipicephalus sanguineus (the brown dog tick) is thought to transmit A. platys, although this cycle has not been confirmed in North America. The distribution of R. sanguineus is described as cosmopolitan, as these ticks can infest buildings in otherwise inhospitable climes [7]. Brown dog ticks also thrive in arid areas with high temperatures. Accordingly, populations of this tick are densest, and infestations of premises are more common, in the southern United States.

Transmission by tick vectors is considered the primary means of canine exposure to Anaplasma spp.; thus, variation in regional risk factors is tied to the presence and abundance of competent tick vectors and vertebrate reservoirs. Factors associated with the presence of tick vectors include vector amplification hosts, pathogen reservoir host population densities, climate, and topography [8, 9]. Advances in testing and recording technologies have led to large datasets of diagnostic test results by county for canine exposure to Anaplasma spp. [6, 10]. With support from a veterinary diagnostic company (IDEXX Laboratories, Inc., Westbrook, ME), the Companion Animal Parasite Council (CAPC) has compiled a dataset of diagnostic test results that were reported by veterinary practitioners and a network of reference laboratories within the contiguous United States. This database allowed us to conduct the first comprehensive risk factor study of canine Anaplasma spp. in North America. The CAPC also convened a workshop to identify factors that are putatively associated with canine seroprevalence of tick-borne pathogens, specifically focusing on risk factors for which data are available, so these factors could be quantitatively evaluated for predictive power with respect to spatial-temporal seroprevalence patterns [11]. The objectives of this investigation were to identify risk factors associated with canine seroprevalence of Anaplasma spp. and to incorporate these factors into a refined spatial-temporal analysis. These data allow for the creation of maps that indicate risk of Anaplasma infections of people, dogs, horses, and wildlife.

Methods

Data collection

To spatially analyze the canine seroprevalence of Anaplasma spp., the results of 3,950,852 diagnostic tests performed during 2011–2013 were acquired by the CAPC from IDEXX Laboratories, who provided qualitative (positive/negative) results reported for each county in the contiguous United States. Test results were generated using SNAP® 4Dx® and SNAP® 4Dx® Plus Test kits (IDEXX Laboratories, Inc.), which are point-of-care ELISAs to detect antigen from or antibodies to several vector-borne pathogens. The tests were performed both at the clinic level and at reference laboratories. The performance of these test kits was reported elsewhere [12, 13]. The Anaplasma portion of these tests uses a synthetic peptide from a major surface protein of A. phagocytophilum (MSP2/P44) and detects antibodies to both A. phagocytophilum and A. platys [13].

Data analysis

Spatial structure of canine exposure to Anaplasma spp. in the United States

Two statistical smoothing techniques were applied to the data to generate a spatial prevalence map of canine exposure to Anaplasma spp. in the United States. A weighted head-banging algorithm was first used to reveal patterns in the data [14, 15]. To account for counties not reporting data, kriging, an interpolation method, was subsequently used to construct a spatially complete map [16].

Risk factors

Previously, 15 posited risk factors were proposed for canine exposure to pathogens transmitted by I. scapularis, I. pacificus or R. sanguineus [11]. Of these, nine were analyzed for predictive power in explaining the observed regional canine seroprevalence. To be considered, a factor had to be quantifiable with currently available data; this limited the number of factors to climate (annual temperature, precipitation, and relative humidity), socioeconomic characteristics (human population density and household income), and local topography (surface water, forestation coverage, and elevation) [11]. Finally, nationwide county-level deer densities were not available; hence, a state-by-state estimated annual probability of deer/vehicle collisions was used as a surrogate risk factor [17]. Counties within a state were assigned the collision proportion for the entire state (Additional file 1: Figure S1). The premise was that regions with greater deer/vehicle collision reports support higher deer populations. A list of the considered factors and their sources is provided in Table 1.

Table 1 Candidate factors, considered in both the Endemic Regions and Contiguous US models, along with their units, data sources, and spatial resolution

Statistical methods

To assess the significance of the putative risk factors, let \( {Y}_{i,j} \) denote the number of positive tests in the \( i \)th county during the \( j \)th year and \( {n}_{i,j} \) the corresponding total number of tests performed. An estimate of the \( i \)th county’s prevalence over the three study years is

$$ {\widehat{p}}_i=\left({Y}_{i,1}+{Y}_{i,2}+{Y}_{i,3}\right)/\left({n}_{i,1}+{n}_{i,2}+{n}_{i,3}\right). $$
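The pooled estimate above can be sketched in a few lines. This is a minimal illustration only: the function name `pooled_prevalence` and the example counts are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def pooled_prevalence(positives: Sequence[int], totals: Sequence[int]) -> float:
    """Pooled county prevalence: total positives over total tests across years."""
    if len(positives) != len(totals):
        raise ValueError("positives and totals need one entry per study year")
    n_total = sum(totals)
    if n_total == 0:
        # A county that reported no tests has no estimate.
        raise ValueError("county reported no tests")
    return sum(positives) / n_total


# Hypothetical county: positive and total tests for 2011, 2012, 2013.
example_estimate = pooled_prevalence([12, 15, 9], [300, 320, 280])
```

Note that pooling counts in this way weights each year by its number of tests, so it generally differs from the simple average of the three yearly proportions.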

Generalized linear models (GLMs) are used here with assumptions that the observed data are (1) independent and (2) follow a distribution belonging to an exponential family. For further details, see [18]. Here, it is assumed that the number of positive test results is a true random sample, obeying a binomial distribution (an exponential family member). Possible departures from this assumption are discussed later in the Conclusions. Consequently, a GLM can be formulated as

$$ \mathit{\mathsf{g}}\left({p}_{ij}\right)={\beta}_0+{\sum}_{k=1}^p{\beta}_k{X}_{ijk}={X}_{ij}^{\prime}\beta, $$

where g is an invertible link function, \( {X}_{ij}={\left(1,{X}_{ij1},\dots, {X}_{ijp}\right)}^{\prime } \) is a vector of risk factors from the \( i \)th county during the \( j \)th year, and \( \beta ={\left({\beta}_0,\dots, {\beta}_p\right)}^{\prime } \) is a vector of regression coefficients. Herein, g is specified to be the logistic link; i.e., \( \mathit{\mathsf{g}}\left({p}_{ij}\right)= \log \left\{{p}_{ij}/\left(1-{p}_{ij}\right)\right\}. \) Models of this form are easily fit using standard statistical software. For a fixed county, it is unreasonable to assume that seroprevalence estimates are statistically independent in time. In fact, in endemic areas, infections persist in reservoir host populations; consequently, the number of positive test results from year-to-year in a given county may be highly positively correlated.
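To make the logistic link concrete, the following sketch converts a linear predictor into a prevalence and shows why an exponentiated coefficient can be read as an odds ratio. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def logit(p: float) -> float:
    """Logistic link: log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(beta_k: float, delta: float = 1.0) -> float:
    """Multiplicative change in odds for a delta-unit increase in one factor."""
    return math.exp(beta_k * delta)


# Hypothetical intercept and single-factor coefficient (illustration only).
beta0, beta1 = -3.0, 0.5
p_at_0 = inv_logit(beta0)          # prevalence when the factor equals 0
p_at_1 = inv_logit(beta0 + beta1)  # prevalence when the factor equals 1

# The ratio of the two odds equals exp(beta1), whatever the intercept is.
ratio_of_odds = (p_at_1 / (1.0 - p_at_1)) / (p_at_0 / (1.0 - p_at_0))
```

Under this link, a negative coefficient gives an odds ratio below one and a positive coefficient gives an odds ratio above one.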

To allow for temporal correlation, a generalized estimating equation (GEE) was used to estimate regression coefficients [19, 20]. GEEs are similar in form to GLMs, but account for the correlation between observations within a particular county over time by minimizing a “weighted” sum of squares to obtain parameter estimators [19, 20] (GLMs minimize an “unweighted” sum of squares). To apply the GEE method a working correlation matrix has to be specified; e.g., independent, exchangeable, auto-regressive, etc. The specification of this matrix accounts for the temporal correlation within a given county. In order to prevent misspecification, an unstructured working correlation matrix was considered and its components were estimated along with the regression parameters. GEE models can be fitted using standard statistical software (e.g., SAS, Stata, Splus, and R) [21, 22].
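For reference, the estimating equation of the standard GEE formulation (Liang and Zeger) can be written in the notation above as follows. This is a sketch under the binomial assumption stated earlier; the symbols \( {\mu}_i \), \( {D}_i \), \( {V}_i \), \( {A}_i \), \( R\left(\alpha \right) \), and \( \phi \) are introduced here for illustration and are not defined in the original text.

$$ {\sum}_i{D}_i^{\prime }{V}_i^{-1}\left\{{Y}_i-{\mu}_i\left(\beta \right)\right\}=0,\kern1em {V}_i=\phi {A}_i^{1/2}R\left(\alpha \right){A}_i^{1/2}, $$

where \( {Y}_i={\left({Y}_{i,1},{Y}_{i,2},{Y}_{i,3}\right)}^{\prime } \), \( {\mu}_i \) has entries \( {n}_{i,j}{p}_{ij} \), \( {D}_i=\partial {\mu}_i/\partial {\beta}^{\prime } \), \( {A}_i \) is the diagonal matrix with entries \( {n}_{i,j}{p}_{ij}\left(1-{p}_{ij}\right) \), \( \phi \) is a scale parameter, and \( R\left(\alpha \right) \) is the 3 × 3 working correlation matrix. With an unstructured \( R\left(\alpha \right) \), the three off-diagonal correlations are estimated from the data; replacing \( R\left(\alpha \right) \) with the identity matrix recovers the usual GLM score equations under independence.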

While GEE techniques account for temporal dependence within a county, they assume observations from different counties are independent. Consequently, the weighted head-banging and Kriging algorithms [23, 24], which implicitly account for spatial dependence, were used to graphically display prevalence estimates. The weighted head-banging algorithm, which made use of 20 triples, was first used to smooth the county-level prevalence estimates. The weights were set as the reciprocal of the estimated standard deviation of the prevalence estimates. Thus, counties with more observations had more importance in the smoothing. Kriging was then applied to the head-banging estimates to infill counties not reporting data and to generate spatially complete prevalence maps. Kriging was implemented using the default settings within ArcGIS. Two main effects models, described below, were considered.
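The weighting step described above can be sketched as follows. The standard deviation formula is the usual binomial one for an estimated proportion, which we assume is what was used. The clamping constant `eps`, which avoids a zero standard deviation when a county has no positive (or only positive) tests, is our own assumption, and the function name is hypothetical.

```python
import math


def smoothing_weight(positives: int, tests: int, eps: float = 1e-6) -> float:
    """Reciprocal of the estimated standard deviation of a county's prevalence."""
    if tests <= 0:
        raise ValueError("county must report at least one test")
    p_hat = positives / tests
    # Assumption: clamp p_hat away from 0 and 1 so the SD is never exactly zero.
    p_clamped = min(max(p_hat, eps), 1.0 - eps)
    sd = math.sqrt(p_clamped * (1.0 - p_clamped) / tests)
    return 1.0 / sd


# Two hypothetical counties with the same prevalence (4 %) but different test volumes.
w_small = smoothing_weight(4, 100)
w_large = smoothing_weight(40, 1000)
```

For a fixed prevalence, the weight grows with the square root of the number of tests, which is why counties with more observations carry more importance in the smoothing.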

In describing model fits, estimated regression coefficients and their standard errors were obtained by fitting the proposed model in SAS; confidence intervals were constructed from these statistics. In order to retain model interpretability, this analysis considers only first-order models. Backward elimination was implemented, with a cutoff of 0.05, to complete model selection; i.e., the factor with the highest p-value greater than 0.05 was removed from the model at each step. Based on variance inflation factors, it was found that multicollinearity was not a significant issue. To assess the quality of the model fit, a coefficient of determination, \( {R}^2 \), is reported [25].
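The backward elimination rule can be sketched generically. The analysis itself was run in SAS; the Python function below is only an illustration of the selection loop, and `fit_pvalues` is a hypothetical callable standing in for refitting the GEE model on the retained factors and returning one p-value per factor.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    factors: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    cutoff: float = 0.05,
) -> List[str]:
    """Drop the least significant factor until every p-value is <= cutoff."""
    kept = list(factors)
    while kept:
        pvals = fit_pvalues(kept)  # refit the model on the retained factors
        worst = max(kept, key=lambda name: pvals[name])
        if pvals[worst] <= cutoff:
            break  # every remaining factor is significant at the cutoff
        kept.remove(worst)
    return kept


# Toy stand-in for a model fit: fixed, made-up p-values that ignore refitting.
toy_pvalues = {
    "temperature": 0.001,
    "water_coverage": 0.40,
    "elevation": 0.20,
    "precipitation": 0.01,
}
selected = backward_eliminate(
    list(toy_pvalues), lambda kept: {name: toy_pvalues[name] for name in kept}
)
```

In a real fit the p-values change each time a factor is removed, which is why the model is refitted inside the loop rather than once at the start.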

Endemic region and contiguous US models

Two models were posited. The first was an “Endemic Regions” model, which only used data from regions where A. phagocytophilum was considered potentially endemic based on published reports and expert opinion (shown in Additional file 2: Figure S2). Although the data used to designate a particular region as endemic are imprecise, we subsequently show that the conclusions are not heavily dependent on this region’s definition. The second model considered was a “Contiguous US” model. Here, an indicator factor was added that demarcated whether or not a county was located within the A. phagocytophilum-endemic area (Additional file 2: Figure S2). This latter approach made use of all available data.

Results and discussion

Spatial prevalence

Nationwide, from 2011–2013, 3.76 % of tests were seropositive (4.26 % in 2011, 4.45 % in 2012, and 3.24 % in 2013). Approximately 1,500 of 3,144 US counties reported data each year, although this number varied slightly from year-to-year. Figure 1 shows the distribution and prevalence of dogs with antibodies to Anaplasma spp. by county. Most Anaplasma-positive test results originated from the Upper Midwest and Northeast, with the highest probabilities coming from northern Wisconsin, northern Minnesota, and eastern New England. Most counties not reporting data are in regions where these infections are considered uncommon (e.g., the South, Southwest and West), with the exception of the Rio Grande River Valley north through eastern New Mexico and Colorado.

Fig. 1 Map illustrating percentages of positive tests for canine exposure to Anaplasma spp. reported from US counties from 2011 to 2013

Prevalence was highly variable and data were missing for many counties; thus, to improve map utility, these estimates were statistically smoothed using head-banging and kriging algorithms. The expected prevalence of canine exposure to Anaplasma spp. during a typical year by county is shown in Fig. 2. These data confirm that canine exposure to Anaplasma spp. was most prevalent in the Northeast, the upper Midwest, northern California, western Texas, and eastern New Mexico.

Fig. 2 Statistically Smoothed Prevalence Estimates for Canine Exposure to Anaplasma spp. (2011 to 2013). Spatial smoothing was completed via the head-banging and Kriging algorithms

Risk factor data

Several factors were significantly associated with the prevalence of Anaplasma-positive dogs, although the set of significant factors differed slightly between the Endemic Regions and Contiguous US models (Table 2). All factors except for water coverage were significant with 95 % confidence in the Contiguous US model. When just the endemic regions were considered, all factors except water coverage and elevation were significant with 95 % confidence. Temperature, population density, relative humidity, elevation, and deer/vehicle collisions were negatively correlated with Anaplasma prevalence, whereas precipitation, forestation coverage, and median household income were positively correlated with it.

Table 2 Estimates, standard errors, and odds ratios for the parameters corresponding to the factors found to be significantly associated with prevalence of canine exposure to Anaplasma spp. See Table 1 for the factor units

There was a significant correlation in the prevalence of Anaplasma spp. in dogs between years, regardless of the model (Table 3). The highly positive correlations imply that regions experiencing high or low canine seroprevalence will likely experience similarly high or low proportions in the near future. Correlations between proportions two years apart were lower than those separated by one year.

Table 3 Estimated year-to-year working correlation matrix in each model

Regional prevalence based on contiguous US and endemic regions models

Based on the Endemic Regions model, the highest prevalence estimates were reported for the Northeast followed by the upper Midwest, western Texas and central coastal California (Fig. 3). The Contiguous US model estimated higher prevalence in the upper Midwest but lower prevalence in Texas (Fig. 4). The model fits are summarized in Table 2. For the Endemic Regions model, prevalence estimates for counties in the endemic region were obtained from the fitted GEE model. This fit only uses data and factors for counties in the endemic regions. However, non-endemic regions were assigned the crude estimates depicted in Fig. 1 to coincide with the usual notion of prevalence (there are sporadic cases in non-endemic regions and some dogs also travel). The fitted models were similar and explain considerable structure: the \( {R}^2 \) values for the fits are 0.72 (Endemic Regions model) and 0.71 (Contiguous US model).

Fig. 3 Estimated Canine Anaplasma Prevalence from Endemic Region Model. The presented results consist of statistically smoothed prevalence estimates, where the prevalence estimates were obtained from the fitted Endemic Region model. Spatial smoothing was completed via the head-banging and Kriging algorithms

Fig. 4 Estimated Canine Anaplasma Prevalence from Contiguous US Model. The presented results consist of statistically smoothed prevalence estimates, where the prevalence estimates were obtained from the fitted Contiguous US model. Spatial smoothing was completed via the head-banging and Kriging algorithms

Conclusions

As with other tick-borne diseases in the United States, the incidence of human anaplasmosis has been increasing [26, 27]. Although canine anaplasmosis is not reportable, the incidence of seropositive canine cases also appears to be increasing. Similar to Bowman et al. [6], we found the highest prevalence of Anaplasma antibodies in dogs from the upper Midwest and eastern New England. These data also correlated with areas where the highest incidence of human anaplasmosis was reported, supporting the suggestion that dogs can make useful sentinels for human risk [26, 27]. Many of the dogs with antibodies reactive to Anaplasma were likely infected with A. phagocytophilum, given the general distribution and concordance with antibodies to Borrelia burgdorferi in dogs and human Lyme disease cases [6, 26, 28]. Further support comes from Qurollo et al. [29], who used A. platys- and A. phagocytophilum-specific assays to find similarly low seroprevalence of both pathogens in the Southeast and West. In contrast, the prevalence of antibodies to A. phagocytophilum was significantly higher in other regions. Notably, however, there were isolated areas with unexpectedly high prevalence estimates for Anaplasma (e.g., Texas, New Mexico, and Oklahoma) where neither A. phagocytophilum nor known tick vectors are common. Possible explanations of these findings include (1) exposure to A. platys or a novel Anaplasma spp., (2) an unrecognized novel A. phagocytophilum vector-reservoir transmission cycle in that region, or (3) a relatively high frequency of dogs tested that had previously traveled to endemic regions [6]. These data, while sometimes enigmatic, should not be ignored, as demonstrated by similar unexplained foci in the upper Midwest, where a novel E. muris-like agent was ultimately found in association with an unexpectedly high seroprevalence of Ehrlichia spp. among dogs [6, 30, 31].

Data from both the Endemic Regions and Contiguous US models agreed well with each other and original serologic data. However, there were some minor differences between the two models that resulted in some regions having a higher or lower estimated prevalence. For example, the Contiguous US model had higher prevalence estimates than the Endemic Regions model in some regions of the upper Midwest (e.g., Wisconsin, Minnesota, and Illinois) where granulocytic anaplasmosis is considered endemic and other regions of the Midwest (e.g., Indiana, Kentucky, and Ohio) where granulocytic anaplasmosis is currently considered rare. Also, the Contiguous US model estimated a lower prevalence for Maine, where granulocytic anaplasmosis is common. Lastly, the Contiguous US model estimated lower prevalence in western Texas, which was arguably influenced by smaller sample sizes.

The estimated regression coefficient for the endemic risk factor in the Contiguous US model is positive and significant. This implies higher prevalence among dogs living in areas where human granulocytic anaplasmosis is endemic.

Numerous factors were useful predictors for the seroprevalence of Anaplasma in dogs. Because rodents and white-tailed deer are important in the maintenance of A. phagocytophilum in nature, the association with increased forest coverage and decreased human population density is likely tied to suitable habitat for these critical wildlife species. Forest cover was also associated with higher prevalence of another tick-borne pathogen, E. chaffeensis, in white-tailed deer [32]. Importantly, forest fragmentation is highly associated with increasing Lyme disease incidence so these fragmented habitats will likely be important areas for A. phagocytophilum; however, the scale of this study was not fine enough to investigate edge effects [33].

Climatic variables such as temperature, precipitation and relative humidity have been associated with prevalence of ticks and tick-borne pathogens [34–36]. In both of our models, precipitation was positively associated with Anaplasma infections in dogs and temperature was negatively associated with prevalence. Although one previous study found no effect of precipitation on the density of I. scapularis, a more recent long-term study found that increased regional winter precipitation was associated with higher tick densities [37]. Ixodid tick survival and activity are tied to temperature, and a recent study found that I. scapularis survived better under temperatures more representative of northern states compared with those in the southern states [38]. Relative humidity is important for ixodid ticks to maintain moisture while off of the host, but both of our models found that increasing relative humidity was negatively associated with Anaplasma seroprevalence in dogs. A plausible explanation for this finding is that increased humidity may be related to decreased tick densities. That is, higher humidity levels are conducive to mold and fungal growth, to which ticks are fatally susceptible as eggs and during molting. For example, I. ricinus densities on rodents were reported to decrease with increasing relative humidity [39, 40].

The seroprevalence of Anaplasma spp. in dogs decreased as deer/vehicle collision reports increased, which was contrary to our initial hypothesis given the importance of deer to the life cycle of I. scapularis [41]. Unfortunately, this factor does not account for the rural/urban nature of the habitats or road types (e.g., secondary or tertiary) where the collisions take place; see [42] for a more in-depth discussion of these issues. While further investigation is warranted to understand this negative association, other authors have also found “deer density associations” counterintuitive; see [32, 40, 43–46] for some of the discussion and related literature.

Another puzzling finding was the positive association of Anaplasma seroprevalence in dogs with increasing household income. It is conceivable that high Anaplasma spp. prevalence areas coincide with some of the richer areas of the United States, thus confounding the factor. People in these richer areas may engage in behaviors that increase the likelihood of ticks feeding on their dogs, such as outdoor recreational activities; on the other hand, wealthier dog owners may tend to keep their pets predominantly indoors, thus minimizing their risk of acquiring ticks [47]. However, even dogs that spend only small periods of time outdoors can acquire vector-borne infections; thus, the use of tick preventives is recommended for all dogs. Dogs in poorer regions may never be taken to a veterinarian and may clear the infection on their own, or they may be treated with antibiotics without being tested. Overall, the confounding nature of socioeconomic status merits further study.

The fitted models explain much of the data, but better fits could be achieved by including additional factors. One difficulty is that these data may not have been a true random sample, with correlation existing between some of the tests conducted at the same location. A more problematic issue lies with sampling biases: dogs in different parts of the country may be tested for exposure to Anaplasma for different reasons. For example, veterinarians in the Upper Midwest and Northeast, where Lyme disease has a high prevalence, may be more likely to screen all dogs using this rapid test. However, in areas where canine anaplasmosis or Lyme disease is uncommon, it is possible that only dogs with clinical signs or with travel histories to endemic regions may be tested. Other dogs could be coincidentally tested when screened for other vector-borne pathogens (e.g., heartworm), as the SNAP 4Dx Plus Test simultaneously tests for four distinct pathogen genera. Diagnostic tests specific for exposure to A. platys and acquisition of travel histories of seropositive dogs could help answer these questions about areas where granulocytic anaplasmosis is not considered endemic. Unfortunately, such data were unavailable at the time of this study. Because of these issues, caution should be used when comparing prevalence at two different areas of the United States.

The spatial prevalence maps here should not be interpreted at too fine a spatial scale; they are intended as rough guidance. A county’s estimated prevalence is impacted by factor conditions in that county and by factor conditions in adjacent counties. For example, ticks are not expected to be numerous within New York City (say Manhattan), even though our mathematical model does not predict zero prevalence for Manhattan. Due to the zoonotic nature of anaplasmosis, one may compare the findings of our analysis to the reported geographic distribution of anaplasmosis incidence in humans provided by the Centers for Disease Control and Prevention [48]. Further, as I. scapularis is a primary vector of A. phagocytophilum, another relevant comparison can be made between our findings and the predicted geographic density of nymphal I. scapularis presented in [49]. From these comparisons, one will note that the geographic patterns of our spatial prevalence maps are largely in agreement with the spatial patterns found in these two surrogate measures.

Clearly, our list of risk factors is incomplete. Tick abundances, for example, are likely an important consideration, but these data are not available for the entire United States. However, this model can be updated as data on additional factors, such as tick densities, land-use changes, or acaricide use, become available.