Background

Triatomines are hematophagous insect vectors of Trypanosoma cruzi (Chagas, 1909), the etiological agent of Chagas disease, which is considered the principal parasitic disease of Latin America in terms of economic impact and public health [1,2,3].

Among the autochthonous triatomines of Brazil, Panstrongylus megistus (Burmeister, 1835) is considered the vector of greatest epidemiological importance of Chagas disease due to its high capacity to invade and colonize domestic buildings, high levels of infection with T. cruzi, eclectic blood-feeding and a wide geographical distribution in Brazil, ranging from the state of Maranhão to the state of Rio Grande do Sul [4,5,6,7]. In recent years, the presence of P. megistus has also been reported in the Brazilian states of Acre and Rondônia [8].

It has become apparent that the principal factor hindering the control of triatomines is reinfestation of houses previously treated with insecticides, possibly due to insects that survive spraying (residual foci) or triatomines from sylvatic foci [9]. Analysis of this problem requires an investigation at the microgeographic level. Microsatellite molecular markers have been used with success to evaluate the gene flow of triatomine populations between natural and artificial environments (intradomestic and peridomestic) [10,11,12,13,14,15,16,17,18,19,20,21,22]. It has been reported that populations of T. infestans originating from three Bolivian Andean locations did not show any difference in preference for insects in the artificial environment and those in the natural environment at the same location, suggesting that the source of infestation in the dwellings is wild insects [15]. De Rosas et al. [12] demonstrated that the level of genetic structure of Argentine T. infestans populations is approximately 400 m, indicating the possibility of passive dispersion within that distance. To reduce the odds of reinfestation, these authors suggested that insecticide treatment should occur on a larger radius around the infested area. In northern Argentina, an evaluation of the genetic structure of T. infestans populations revealed putative sources of reinfestation and its dynamics [22]. These studies illustrate that microsatellites can be useful for understanding the factors that favor the infestation/reinfestation of domiciles.

Given the absence of tools capable of capturing, at a fine scale, the process of (re)colonization of the artificial environment by P. megistus, the aim of this work was to isolate and characterize microsatellite loci from P. megistus using Illumina HiSeq genome sequencing.

Methods

This work was performed in five stages: (i) genome sequencing of P. megistus; (ii) assembly and selection of contigs with microsatellites; (iii) validation of the amplification and evaluation of the polymorphisms in the loci; (iv) standardization of the polymorphic loci; and (v) verification of cross-amplification with other triatomine species.

Sequencing, identification of regions with microsatellites and development of the primers

Two pools of five P. megistus each, one of females and one of males, were selected for sequencing. The insects came from a colony derived from the municipality of Santana do Riacho, Minas Gerais, Brazil (19°10′8″S, 43°42′50″W) and maintained in the insectary of the Laboratório de Referência em Triatomíneos e Epidemiologia da Doença de Chagas. DNA was extracted from the reproductive organs of the insects using the Genomic DNA Extraction and Purification Kit® (Promega, Madison, WI, USA) following the protocol described by the manufacturer. The libraries were constructed using the TruSeq DNA PCR Free 350 bp kit (Macrogen Inc., Seoul, South Korea) according to the manufacturer’s instructions, and the sequencing of each pool was performed separately using the Illumina HiSeq X platform (Illumina, Inc., San Diego, CA, USA). The raw sequence reads were evaluated in terms of read quality with PRINSEQ [23]. Data filtering and trimming (adaptor removal and Phred quality score cut-off ≥ 25) were performed with Trimmomatic [24]. Contigs were assembled de novo with a kmer size of 77 using SOAPdenovo2 [25]. Microsatellite regions were identified with MISA software [26] with the following parameters: a minimum of ten repeats for mononucleotide motifs, six repeats for dinucleotide motifs and five repeats each for tri-, tetra-, penta- and hexanucleotide motifs. Microsatellites were selected based on two criteria: (i) perfect di- or trinucleotide repeats with a minimum of six repeat units; and (ii) location in contigs with a minimum size of 2 kb. Additionally, Primer3 [27, 28] was used to identify primer annealing sites flanking the repeat regions. Only intergenic regions with a predicted amplicon size of between 150 and 400 bp were selected for further analysis.
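The repeat thresholds given above can be illustrated with a minimal sketch. This is not MISA itself: it is a simple regular-expression scan for perfect tandem repeats, and the function name find_microsatellites and the toy sequence in the usage note are assumptions made for illustration only.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds stated above.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_microsatellites(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # One motif copy followed by at least (min_count - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ATAT"),
            # so that each repeat is reported only under its shortest motif.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

Applied to a hypothetical sequence such as "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "TTC" + "CAG" * 5 + "T", the scan reports one dinucleotide, one mononucleotide and one trinucleotide repeat. Restricting the result to motifs of length two or three with at least six repeat units would then mimic selection criterion (i).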

Standardization of amplification and evaluation of polymorphism in microsatellite regions

Amplification and polymorphism of the loci selected were evaluated in six specimens of P. megistus: (i) two from a colony derived from the municipality of Juquiá, São Paulo, Brazil (24°19′15″S, 47°38′6″W); (ii) two from a colony formed from insects captured in diverse localities in the state of Minas Gerais, Brazil; and (iii) two others from the same colony used for genome sequencing, which came from Santana do Riacho.

The DNA was extracted from two legs of each insect following an adapted protocol of the Wizard® Genomic DNA Purification Kit (Promega) [29]. The DNA was quantified using a NanoDrop One Microvolume UV–Vis spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and stored at −20 °C until used.

To standardize and optimize the specificity of the PCR amplification, different dilutions of enzymes and cofactors, as well as different annealing temperatures (range: 54 °C to 65 °C), were tested. The PCRs were performed in a final volume of 10 µl containing the Master Mix GoTaq Green (Promega), 10 nmol of forward primer, 10 nmol of the reverse primer and approximately 10 ng of template DNA. The reactions were performed on a Veriti thermocycler (Applied Biosystems, Foster City, CA, USA) under the following conditions: an initial denaturation at 95 °C, 4 min; followed by 95 °C/30 s, 54 °C to 65 °C/30 s, 72 °C/30 s for 35 cycles; and a final extension at 72 °C/5 min. The amplicons were run on 8% polyacrylamide gels stained with silver nitrate. The approximate size of the fragments was determined using the molecular marker ΦX 174 DNA HaeIII (Promega).

Characterization of polymorphic microsatellite loci

Among the evaluated loci that were found to be polymorphic under the conditions described in the previous section, 20 were selected for standardization (Table 1) based on an amplicon size of between 100 and 300 bp and annealing temperature, for use in future multiplex PCRs.

Table 1 Primer sequence and repeat motif of the genotyped microsatellite loci

Fifteen specimens of P. megistus captured in the municipality of Jaboticatubas, Minas Gerais, Brazil (19°30′50″S, 43°44′42″W) by Belisário et al. [30] were used. These samples were divided into two groups in order to evaluate intra-populational variability: nine insects from the locality Fazenda Santo Antônio (group I); and one insect each from the following localities: Barreiro do Papagaio, Fazenda Espada, Capão Grande II, Fazenda Borges, Guarazinho, and Fazenda Boiça (group II). The second group also included the insects from Santana do Riacho and Juquiá described in section Standardization of amplification and evaluation of polymorphism in microsatellite regions.

Four other species of triatomines were used to evaluate cross-amplification: (i) one specimen of Panstrongylus diasi Pinto & Lent, 1946 from the insect collection of the Instituto René Rachou, Fiocruz Minas, Belo Horizonte, Brazil; (ii) one specimen of Panstrongylus lignarius (Walker, 1873); (iii) one specimen of Triatoma tibiamaculata (Pinto, 1926); and (iv) one specimen of Triatoma sordida (Stål, 1859). These last three insects were derived from the colonies of the Laboratório de Referência em Triatomíneos e Epidemiologia da Doença de Chagas of the Instituto René Rachou, Fiocruz Minas.

The DNA was extracted from two legs from each individual as described in section Standardization of amplification and evaluation of polymorphism in microsatellite regions. The PCRs were performed in a total final volume of 10 µl containing 5× Colorless GoTaq® Flexi (Promega), 3 mM MgCl2, 10 nmol of the fluorescently-labeled forward primer, 10 nmol of the reverse primer and approximately 10 ng of template DNA. The reactions were performed in a Veriti® 96-well thermocycler (Applied Biosystems) using the following cycling conditions: an initial denaturation at 95 °C, 4 min; then 95 °C/30 s, 65 °C/30 s, 72 °C/30 s for 35 cycles; 95 °C/30 s, 53 °C/30 s, 65 °C/30 s, 72 °C/30 s for 3 cycles; and a final extension of 72 °C for 5 min. The products were diluted in the ratio of 1:10 in UltraPure Distilled Water (Invitrogen, Carlsbad, CA, USA) and genotyped on an ABI 3730 Sequencer (Life Technologies, Carlsbad, CA, USA) with the GeneScan Liz 500 size standard at the capillary electrophoresis DNA sequencing platform (Sequenciamento de DNA por Eletroforese Capilar) of the Instituto René Rachou. The chromatograms were analyzed using the software Geneious (ver. 10.1.3) [31]. The number of alleles, observed heterozygosity (OH) and expected heterozygosity (EH) were evaluated with Arlequin (ver. 3.5.2.2) [32], and the presence of null alleles was evaluated with MICRO-CHECKER (ver. 2.2.3) [33].
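As an illustration of the two heterozygosity statistics, the sketch below computes OH as the proportion of heterozygous individuals and EH as the unbiased gene diversity, n/(n − 1) × (1 − Σp²), where n is the number of gene copies and p the allele frequencies. It assumes diploid genotypes with no missing data; the function name and the example genotypes are hypothetical and are not taken from the data set.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n_individuals = len(genotypes)
    observed = sum(a != b for a, b in genotypes) / n_individuals

    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    n_copies = 2 * n_individuals
    sum_sq_freq = sum((count / n_copies) ** 2 for count in allele_counts.values())
    # Unbiased gene diversity: n / (n - 1) * (1 - sum of squared allele frequencies).
    expected = n_copies / (n_copies - 1) * (1.0 - sum_sq_freq)
    return observed, expected


# Hypothetical locus: nine homozygotes and one heterozygote (allele sizes are made up).
example = [(180, 180)] * 9 + [(180, 184)]
oh, eh = heterozygosity(example)
print(f"OH = {oh:.5f}, EH = {eh:.5f}")
```

With these hypothetical genotypes, a 19:1 split among 20 gene copies, both statistics equal 0.1. A locus at which the two values diverge strongly, with OH well below EH, is a candidate for null alleles or population substructure.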

Results

Sequencing of the P. megistus genome yielded 463,151,518 reads (90.41% > Q30) for the female pool and 338,531,204 reads (89.54% > Q30) for the male pool. The GC-content was 34.53% and 35.06%, respectively. The de novo assembly was made using only reads with a quality > 25, generating a total of 7,908,463 contigs with a total length of 2,043,422,613 bp. The N50 of the assembly (the length of the shortest contig among the longest contigs that together cover 50% of the total assembly length) was 1034 bp and the final GC-content was 33.14%. We identified a total of 2,043,690 microsatellite regions located in 1,441,091 contigs, with mononucleotide repeats being the most abundant (1,054,968, corresponding to 51.62%) and pentanucleotide repeats the least abundant (1337, corresponding to 0.06%). In accordance with the parameters described in the Methods section, 96 different microsatellite regions were selected for further analysis.
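For reference, the N50 statistic can be computed from a list of contig lengths as in the following minimal sketch. The function name n50 and the toy lengths in the usage note are illustrative assumptions; the assembly itself is not reproduced here.

```python
def n50(contig_lengths):
    """Length of the shortest contig among the longest contigs covering >= 50% of the assembly."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half of the total length is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")
```

For example, n50([100, 50, 30, 20, 10]) returns 50, because the two longest contigs (150 bp) are the first to cover at least half of the 210 bp total. In the present assembly the mean contig length is about 258 bp (2,043,422,613 bp over 7,908,463 contigs), so an N50 of 1034 bp indicates that a comparatively small number of longer contigs carries half of the assembled sequence.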

Among the 96 primer pairs selected for PCR amplification, 79 resulted in amplicons visualized on polyacrylamide gels after standardization. The annealing temperature after standardization varied from 60 °C to 67 °C (Additional file 1: Table S1). It was possible to visualize the presence of polymorphism on the gel for 64.55% (51) of the loci. Among these 51 loci, 20 markers were selected for characterization (Table 1). It was not possible to standardize the amplification of 17 primer pairs due either to low specificity or the absence of amplicons.

In the samples of P. megistus, the number of alleles per locus varied from two (Pm051, Pm071 and Pm079) to nine (Pm049), with a mean of 4.9 (Table 2). Among the 20 loci genotyped, two (Pm051 in group II and Pm058 in group I) were monomorphic and for one (Pm054) it was not possible to visualize peaks on the chromatogram (Table 3).

Table 2 Allele number and size range per locus for each triatomine species used in the study
Table 3 Number of gene copies, allele number, and observed and expected heterozygosity for each Panstrongylus megistus group

The OH of group I varied from 0.00000 (Pm002) to 0.55556 (Pm049) (mean 0.29630), while the EH varied from 0.20915 (Pm071) to 0.77778 (Pm083) (mean 0.57807). In group II, the OH varied from 0.00000 (Pm002) to 0.60000 (Pm049) (mean 0.24504), while the EH varied from 0.10000 (Pm071) to 0.88421 (Pm049) (mean 0.61446) (Table 3).

Of the 20 loci analyzed, group I had null alleles at seven loci (Pm002, Pm008, Pm018, Pm048, Pm051, Pm076 and Pm079) and group II had null alleles at five loci (Pm002, Pm027, Pm048, Pm079 and Pm083).

Regarding cross-amplification, 13 loci were amplified from other species. The Pm030 marker was amplified from all the samples tested. However, the number of alleles was lower than that in P. megistus, varying from one to four, with a mean of 1.5 (Table 2). OH and EH were not calculated because there was only one specimen for each of the species evaluated.

Discussion

This study is novel in two aspects: in the development of primers to microsatellites of P. megistus and in the methodology used. Unlike previously published studies on the identification of microsatellite markers in triatomines, in which the methodology most commonly used is enriched libraries and the subsequent use of probes for the selection of microsatellite regions [34,35,36,37,38,39,40], the present study is the first to use next-generation sequencing. This methodology was chosen due to the limited information available on the genome of P. megistus, the principal species transmitting T. cruzi in Brazil.

For the design of the initial primers, we chose to synthesize only primers to regions flanking dinucleotide and trinucleotide repeats because these latter arrangements have higher mutation rates compared to other microsatellite classes [41]. Therefore, they will be more informative in future population genetic studies.

To make the PCR reactions more specific, we tested different annealing temperatures. The optimal temperatures utilized were those observed prior to the absence of bands in the polyacrylamide gel, as well as prior to the dilution of enzymes and cofactors. Modification of such determinants can alter the success of PCRs [42]. Even after these attempts, it was not possible to standardize the PCR for 17.7% of the 96 loci selected due to the absence of fragments or the presence of nonspecific fragments. In these cases, the primer may have been designed to a non-conserved region or have more than one binding site.

Two loci (Pm051 in group II and Pm058 in group I) exhibited monomorphic patterns and in another locus (Pm054) it was not possible to visualize peaks in the chromatogram. However, working with a larger sample number can increase the chances of observing heterozygous and polymorphic individuals [43]. Accordingly, these three loci will be evaluated in a larger sample set in a future study to verify the absence of polymorphism, since it was possible to visualize amplicons from these loci in polyacrylamide gels.

The number of alleles encountered at the 20 loci tested differed from that encountered in other studies on triatomines: in T. dimidiata, the mean allele number (AN) was reported to be 16, varying from six to 27 alleles at eight loci [35]; in T. infestans, the mean AN was 9.7, varying from five to 17 alleles at 13 loci [36]; in Rhodnius pallescens, the mean observed AN was nine alleles, varying from two to 20 at ten loci analyzed [34]; in T. sordida, the mean AN was 7.4, varying from one to 12 at ten loci [44]; in T. pseudomaculata, the mean AN was 6.6, varying from two to 15 alleles at seven loci [39]; and for T. brasiliensis, the mean observed AN was five at seven standardized loci, ranging from one to 13 alleles per locus [40]. This difference can be explained by the sample size in each study, ranging from 34 [35] to 171 [44] samples, while we used 19 samples of P. megistus.

The presence of four loci in P. diasi and eight in P. lignarius was expected due to the conservation of some genomic regions in phylogenetically related species. This has also been reported in other studies [35, 36, 38,39,40, 44]. However, the present work is the first to report amplification in species from different genera (T. sordida and T. tibiamaculata).

The values observed for OH and EH may be indicative of an excess of homozygotes in the groups tested, population structure or the presence of null alleles [45]. The analysis of null alleles demonstrated low occurrence in both groups (35% and 25% of the loci in groups I and II, respectively). Null alleles can arise from mutations in the flanking sequences of the microsatellite regions that prevent binding of the primers. An allele that fails to amplify makes a heterozygous sample appear homozygous [46, 47].
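The size of the homozygote excess can be gauged with the fixation index F = 1 − OH/EH. The sketch below applies it to the mean OH and EH reported for each group in the Results. This is an illustration only: a ratio of means is not the same as a per-locus or multilocus estimate, and the function name is a hypothetical choice.

```python
def homozygote_excess(observed_het, expected_het):
    """Fixation index F = 1 - OH / EH; values above zero indicate a heterozygote deficit."""
    if expected_het <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - observed_het / expected_het


# Mean OH and EH per group as reported in the Results (Table 3).
reported_means = {"group I": (0.29630, 0.57807), "group II": (0.24504, 0.61446)}
for group, (oh, eh) in reported_means.items():
    print(f"{group}: F = {homozygote_excess(oh, eh):.3f}")
```

Under these assumptions the index is roughly 0.49 for group I and 0.60 for group II, consistent with the heterozygote deficit discussed above, whatever its cause.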

In the last 40 years, microsatellites have been the most widely used molecular markers for assessing polymorphisms in a wide variety of organisms. One of the difficulties in applying this methodology is the need for prior knowledge of the genome. Currently, this problem can be overcome by using genotyping-by-sequencing (GBS). However, GBS requires a greater amount of DNA and higher throughput, and its cost is higher [48].

Conclusions

Due to the high polymorphism and number of alleles encountered in each locus, as well as the capacity to amplify from geographically distant populations, we conclude that the markers developed in this study show promise for population genetic studies of P. megistus. Thus, we hope to help to elucidate the reinfestation processes in the artificial environment by this vector.