Background

Dirofilaria immitis is the causative agent of heartworm disease, which can produce life-threatening morbidity that affects dogs, cats, and wild canids [1,2,3,4,5,6]. This filarial nematode is distributed in North and South America, Southern Europe, Japan, Australia, India, and China [7, 8]. The macrocyclic lactones (ML) milbemycin oxime, ivermectin, moxidectin, and selamectin are the available prophylactic drugs in the U.S. veterinary marketplace that prevent the establishment of L3–L4 larval D. immitis stages in dogs and cats [9]. The first ML loss of efficacy (LOE) report was published in 2005 [10]. Reports of LOE dogs in the United States have persisted over the years. Some of these suspected LOE cases are no doubt due to lack of full compliance with recommended chemoprophylaxis regimens [11]. Nevertheless, recently ML resistance has been confirmed in the U.S. [12, 13]. Because ML resistance has a genetic origin [14], whole genome analysis has been performed on well characterized susceptible D. immitis isolates from the U.S., Italy, Gran Canaria, and Grenada and LOE isolates from the U.S. to identify genetic differences that could correlate with evidence of LOE and resistance [13]. One hundred eighty-six single nucleotide polymorphisms (SNP) showed highly significant differences between pools of susceptible and LOE D. immitis. Based on these 186 SNPs, Sequenom® SNP frequency analyses were conducted on 663 individual parasites (adult worms and microfilariae) which were phenotypically characterized as susceptible (SUS), confirmed ML treatment survivors/resistant (RES), or suspected resistant/loss of efficacy (LOE) parasites. This approach identified 42 SNPs that appeared to differentiate ML-susceptible from LOE and resistant D. immitis isolates [13]. It is highly desirable to reduce the number of marker SNPs using additional ML phenotypically characterized D. immitis isolates. Previously, only a small number of confirmed resistant isolates had been genetically characterized. Ultimately, the goal is to build a robust protocol for field application that could be used to monitor for resistant D. immitis isolates in the field. Such genetic markers for susceptibility/resistance may also be useful in developing protocols for managing drug resistance in D. immitis and for establishing improved scientifically based protocols for registration of new heartworm preventives.

Methods

Samples

Ten isolates were provided by Zoetis Animal Health for analysis. The detailed information on the isolates has been published elsewhere [15, 16]. The ML phenotype response was assessed in efficacy studies with a dose of 3 μg/kg of moxidectin (MOX), and the origin of the isolates are presented in Table 1. In total, five isolates were susceptible to MOX while five were resistant to heartworm preventive to varying degrees.

Table 1 Dirofilaria immitis isolates used for MiSeq sequencing

Sample manipulation and DNA extraction

Microfilariae (MF) were shipped to McGill University in fresh blood collected in EDTA tubes from untreated dogs infected with each isolate. A filtration procedure [13] was used to extract and clean MF from a 15- to 20-mL blood sample. Polycarbonate membrane filters (3.0 μm; 25 mm; Sterlitech® Corporation, Kent, WA, USA) were used for the filtration. A 1:1 dilution from venous blood with NaHCO3 (SIGMA®, Aldrich Co., Oakville, ON, Canada) solution (2 g/L) was made before filtration (5–20 mL per filter). DNA extraction of pooled MF was achieved using QIAamp® DNA Micro kit (Qiagen Inc., Toronto, ON, Canada). DNA concentrations for all samples were evaluated using the Quant-iT™ PicoGreen DNA Assay Kit (Invitrogen®, Life Technologies Inc., Burlington, ON, Canada).

SNP markers

Based on previous work, 42 SNPs out of the 186 identified from the whole genome [13] were evaluated as they seemed to better differentiate the ML-susceptible phenotype from the LOE and resistant phenotype. The list of SNP positions investigated in the current study is available in Additional file 1 at the end of this article.

Genetic analysis

Ten DNA pools of MF were sequenced with MiSeq® at Génome Québec Innovation Centre (McGill University). Libraries were prepared following two successive thermocycler steps for tagging with CS1 and CS2 primers [17] and barcoding the fragments. The lists of primers are available in Additional file 2 at the end of this article. Then the fragments were pooled and purified using AMPure XP beads (Beckman Coulter, Inc.) [18], and library quality control was performed. Libraries were then run on a MiSeq® sequencing system using paired end read of 250 base pairs (PE250).

Data analysis

Reads obtained from MiSeq® were trimmed from the 3’ end to have a Phred score of at least 30. Illumina sequencing adapters were removed from the reads, and all reads were required to have a length of at least 50 bp. Trimming and clipping were performed using Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic) [19]. The filtered reads were aligned to the nDi.2.2. D. immitis genome (http://www.nematodes.org/genomes/dirofilaria_immitis/). Each read set was aligned using BWA (http://bio-bwa.sourceforge.net/) [20], which has a low error rate (<3%), and which a Binary Alignment Map file (.bam) created. Then, all read set BAM files from the same sample were merged into a single global BAM file using Picard (http://broadinstitute.github.io/picard/). BVATool (https://bitbucket.org/mugqic/bvatools/src) was then used to extract from the BAM files the read frequencies at each of the 42 SNPs. The read frequencies were assimilated to the allele frequencies. Allele frequencies from the ten isolates were compared with those of isolates described previously [13].

Statistical analysis

The identification of the best SNPs to predict ML resistance from these samples was assessed visually by plotting the allele frequencies using GraphPad Prism Software (Version 5). Any allele frequency difference between groups was assessed using Chi-squared tests. The limit of significance was p value = 0.05.

Predictive model

The Random Forest algorithm [21] as implemented in the “Biomarker analysis” module in MetaboAnalyst 3.0 (http://www.metaboanalyst.ca) [22,23,24,25,26] was used to build classification models and to evaluate their performance in predicting ML phenotypes in D. immitis. Random Forest is a well-established algorithm based on ensemble learning using a multitude of decision trees. It has been successfully used in building predictive models from SNP data [27, 28]. The tool allowed the identification of different SNP combinations that could best distinguish the two groups. In this case, a score of zero was the optimal value for ML susceptibility prediction; and a score of one was the optimal value for ML resistance prediction. Using a cut-off of 0.5, any sample with a predicted class probability less than 0.5 was considered ML susceptible while any sample with a predicted class probability higher than 0.5 was considered ML resistant. The sensitivity [True Positive/(True Positive + False Negative)] and the specificity [True Negative/(False Positive + True Negative)] of the different models based on different numbers of SNP combinations were obtained using MetaboAnalyst 3.0. To obtain further insight, a heat map was constructed with the percentage of the alternative allele that characterized resistance and the phenotype response from the ML efficacy study of each isolate. The Metairie isolate was not used in the heat map, as the percentage efficacy was not known, although the isolate was classified as resistant. The susceptible samples from a previous study [13] were assumed to give 100% reduction in an efficacy trial, while RES-1 and RES-2, a resistant isolate [13], were known to have 21.6% and 71.3% efficacy, respectively, with ivermectin.

Results

Genetic analysis

The mean depth sequencing coverage of the region, including the SNP, was ~2000X. The percentage frequencies of the alternative alleles of the 42 SNPs previously associated with LOE and resistance [13] are presented in Fig. 1. The differences of the percentage alternative allele frequencies between the new resistant isolates ZoeRES (ZoeMO, ZoeAMAL, ZoeLA, ZoeJYD-34, Metairie) and the new susceptible isolates ZoeSUS (ZoeKY, ZoeMI, ZoeGCFL, ZoeAL, ZoeMP3) showed that 40 of the 42 SNP positions had a higher percentage of the alternative alleles in ZoeRES compared with ZoeSUS (Fig. 1). The allele frequencies identified in the ZoeSUS and in ZoeRES populations are in accordance with the allele frequencies previously reported [13]. Two SNP positions (NODE_42411_RC and NODE_21554_RC) are presented as examples in Fig. 2 which shows the percentage allele frequencies of these two SNPs in all of the characterized D. immitis isolated collected so far. This current study allowed the number of SNPs that can best predict ML response in D. immitis to be narrowed.

Fig. 1
figure 1

Percentage frequencies of the alternative alleles at 42 previously reported SNP positions in ZoeSUS and ZoeRES isolates. ZoeSUS correspond to ML-susceptible isolates (ZoeKY, ZoeMI, ZoeGCFL, ZoeAL, and ZoeMP3). ZoeRES correspond to MLresistant isolates (ZoeMO, ZoeAMAL, ZoeLA, ZoeJYD-34, and Metairie). ZoeRES – ZoeSUS is the difference in the percentage alternative allele frequencies between ZoeRES and ZoeSUS. RC stands for reverse complement

Fig. 2
figure 2

Example of two SNP positions (NODE_42411_RC and NODE_21544_RC) for which allele frequencies from ZoeSUS and ZoeRES are in accordance with previously reported data from ML-susceptible, LOE, and resistant isolates. RC stands for reverse complement

In addition, ZoeMO and ZoeJYD-34 isolates were compared. The difference between the two samples was that ZoeMO was the ZoeJYD-34 isolate re-passaged to a recipient dog 1.5 years later, with no intervening drug exposure. Thus, ZoeMO D. immitis population was a subset of the Zoe-JYD34 population (from a 50 L3 inoculum). So, ZoeMO and ZoeJYD-34 parasite populations are related but may not be genetically identical. ZoeMO and ZoeJYD-34 isolates had 82 and 19% percentage MOX efficacy, respectively [15] (Table 1). Interestingly, in Fig. 3, out the 42 SNPs described, 28 SNPs showed a higher frequency of the alternative allele (resistance associated) in ZoeJYD-34 compared with ZoeMO, six SNPs shared similar genetic profiles while nine SNPs had showed a higher frequency of the alternative allele in ZoeMO compared with ZoeJYD-34.

Fig. 3
figure 3

Difference in the percentage alternative allele frequencies of 42 SNPs between ZoeMO and ZoeJYD-34. RC stands for reverse complement

Predictive model

Using the Random Forest algorithm as implemented in the biomarker module from MetaboAnalyst 3.0, a series of predictive models were generated using any combination of two, three, five, and ten SNPs to differentiate the ML-susceptible isolates from the resistant isolates (see Additional file 3 at the end of this article). The results showed that using just a few SNPs, all models perform well with >90% sensitivity and 100% specificity. Given the relatively small the sample size (n = 17), it is expected that more robust performance estimate will be obtained when results of more isolates become available. For practical reasons for future field application, it was decided to test SNP combinations with the three- and five-SNP models, the three and five SNPs that gave, individually, the best performance (Fig. 4). Caution is necessary as they may be only the best individual SNP markers for this current dataset, and not necessarily for new isolates. Thus, there is a risk that performance evaluation was over-optimistic due to small sample size. With this in mind, only SNPs labeled NODE_42411_RC, NODE_21554_RC, and NODE_45689 were used in the three-SNP model. NODE_42411_RC, NODE_21554_RC, NODE_45689, NODE_20587_RC, and NODE_9400 were used in the five-SNP model. The results are presented in Fig. 4. Interestingly, the three-SNP model, using the SNPs NODE_42411_RC, NODE_21554_RC, and NODE_45689, better differentiated the ML-susceptible samples from the ML-resistant samples compared with the five-SNP model. In both cases, however, none of the samples were misclassified. When the allele frequencies of some of the SNPs used in the best three-SNP and five-SNP models were plotted against the percentage resistance (100 - % efficacy) for the nine isolates for which efficacy data are available, significant regressions were obtained (Fig. 5).

Fig. 4
figure 4

Mathematical models that predict for ML susceptibility and ML resistance in D. immitis isolates using combinations of three or five SNPs. The models were built with MetaboAnalyst 3.0 and Random Forest algorithm (n = 17). RC stands for reverse complement

Fig. 5
figure 5

Linear regressions with some of the SNPs retained in the three-SNP and five-SNP models (Fig. 4). Each regression is based on the nine isolates for which efficacy data was available. RC and alt stand for reverse complement and alternative nucleotide, respectively

The heat map presented in Fig. 6 allowed the phenotype ML response of the isolates to be mapped against the percentage frequency of the alternative allele at all of the different SNP positions for all isolates with efficacy data. The analysis identified nine SNPs that were more closely correlated with the phenotype response such as NODE_42411_RC, NODE_9400, NODE_29128, NODE_45689, NODE_27461, NODE_15709_A_RC, NODE_30575, NODE_21554_RC, and NODE_48992_B. Some of the SNPs were identified earlier as having a better individual performance than others in the mathematical model (see Additional file 4 at the end of this article). The difference between the two analyses was that for the individual performance, the ML response was categorical (0 for susceptible and 1 for resistant), while in the heat map, phenotype was a continuous variable as the percentage reduction efficacy was used. Details of the estimation of efficacy are described in detail elsewhere [15]. Although the genetic data and the biological data were in accordance, the data should be treated with caution, as the efficacy studies and samples for genetic analysis (using untreated control dogs only) were related but not from the same dogs (in the efficacy studies, obviously the treated dogs were in the treatment group, whereas the MF for genetic analysis were from untreated dogs, from the same isolate). In general, the heat map showed significant differences in the frequencies of the alternative alleles between ML-susceptible and ML-resistant isolates.

Fig. 6
figure 6

Heat map built with MetaboAnalyst 3.0 and Random Forest algorithm including genetic information and ML phenotype response from efficacy studies (n = 16). The red circles identified the SNPs that are the most correlated with ML phenotype response. RC stands for reverse complement

Discussion

At a time when ML resistance in D. immitis has become a concern [12, 13], identifying reliable genetic markers to predict ML response is important. The current study allowed us to determine the percentage of alternative alleles, previously identified as putative markers, in additional phenotypically well-characterized D. immitis isolates. ZoeRES isolates showed similar genetic profiles to LOE and confirmed resistant isolates previously reported [13], which was encouraging when seeking universal genetic markers to predict ML response in D. immitis. In addition, MiSeq technology appeared to be a suitable technology to work with MF pool samples.

Genetic analysis of ZoeMO and ZoeJYD-34 isolates was consistent with some reversion towards susceptibility in ZoeMO compared with parental ZoeJYD-34. Previous work in Onchocerca volvulus [29], a closely related filarial parasite, showed that female worms that carried an IVM selected genotype were less fertile than unselected worms. Thus, a possible difference in fitness between susceptible and resistant parasites could be considered. With no additional drug pressure on ZoeMO, the more susceptible female worms in the population may have produced more susceptible offspring (microfilariae) that could change the genetic profile and the resistance phenotype of ZoeMO compared with ZoeJYD-34. These fitness possibilities need investigation.

Conclusion

Predictive models based on the Random Forest algorithm offer a promising approach to investigate which SNP combinations would best predict ML response. At the current stage, due to a limited sample size (n = 17), the combination of SNPs identified with the mathematical model may not be yet the final optimal set of markers. However, they will provide useful tools to consider when additional isolates can be added to the current models. This should help to identify a small number of the SNP for field monitoring for resistance. A new study supported by the American Heartworm Society to further evaluate the SNP markers in isolates coming from veterinary clinics in U.S., should provide additional information and increase confidence in using these SNPs for resistance identification and monitoring.