Introduction

Populus L. is a model tree genus, with many of its approximately 30 species being extensively used for both pure and applied research purposes (Ellis et al. 2010). Despite the economic and scientific importance of this genus, the identification of different Populus species and hybrids can be problematic due to intraspecific morphological plasticity (e.g., Bylesjö et al. 2008) and frequent natural hybridization between species, particularly among members of the sections Aigeiros Duby and Tacamahaca Spach (Eckenwalder 1984; Floate 2004; Mahama et al. 2011). Adding to this complexity, trihybrids (crosses involving three different species) and more complex combinations have been developed and deployed in Populus-breeding programs (e.g., Meirmans et al. 2010; Talbot et al. 2011) and have been detected in nature (Thompson et al. 2010; Talbot et al. 2012; Williams et al., unpublished). Molecular diagnostics, including AFLP (Cervera et al. 2005), DNA sequencing (Hamzeh and Dayanandan 2004) combined with RFLP (Schroeder et al. 2012), microsatellites (Liesebach et al. 2010), and medium-throughput single nucleotide polymorphism (SNP)-based genotyping assays (Hamzeh et al. 2007; Meirmans et al. 2007; Thompson et al. 2010; Talbot et al. 2011), have been used successfully to diagnose poplar species and hybrids.

Nevertheless, a single combined genotyping array that is optimized to identify a maximum number of poplar species with the greatest marker success rates would be a valuable, cost-effective, and versatile diagnostic tool. Here, we present an optimized set of 36 SNP markers that can discriminate among eight poplar (Populus) species (Populus angustifolia James, Populus balsamifera L., Populus deltoides Bartram, Populus fremontii Watson, Populus laurifolia Ledeb., Populus maximowiczii Henry, Populus nigra L., and Populus trichocarpa Torr. & Gray) and that could be used to detect their early-generation hybrids (e.g., Thompson et al. 2010; Meirmans et al. 2010; Talbot et al. 2012).

Our strategy was to (1) obtain variable sequence sets from six Populus species commonly used in breeding programs; (2) identify “species-specific” SNPs in these target regions; (3) develop, test, and optimize a genotyping assay composed of these putative diagnostic SNPs; and (4) test the assay on additional Populus species to evaluate its accuracy and broader utility.

Materials and methods

Twenty-nine leaf samples (representing 27 provenances) from six Populus species were selected (1–15 individuals per species, Table 1) for DNA sequencing of 31 gene regions. From these, a total of 700 high-quality bidirectional DNA sequences were used to construct the SNP array. For practical considerations, one set of DNA sequences (forward and reverse) per gene per species was submitted to GenBank and has an accession number (171 DNA sequences in total, Supplementary Table S1), i.e., 52 new sequences and the remaining 119 from previously published studies (Meirmans et al. 2007; Thompson et al. 2010; Talbot et al. 2011).

Primer pairs were designed using Primer3 (Rozen and Skaletsky 2000) and tested in silico on the P. trichocarpa genome v1 to avoid simultaneous amplification of paralogous loci, because co-amplified paralogues produce biased, non-Mendelian results. Amplified regions were designed to be approximately 800 bp in length.

DNA was extracted from dried leaf material with the Nucleospin 96 Plant II kit (Macherey-Nagel, Bethlehem, PA) following the manufacturer’s protocol for vacuum processing with the following modifications: (a) cell lysis using buffers PL2 and PL3 (PL2 was heated for 2 h at 65 °C instead of 30 min) and (b) elution with an in-house Tris-HCl 0.01 mM pH 8.0 buffer. Gene regions were amplified by PCR using a PTC-100 (MJ Research, Waltham, MA) thermocycler. Reactions contained 1× PCR buffer, 0.13 μM of forward and reverse primers, 0.17 mM of each dNTP, 2.0 mM MgCl2, and 1 U Platinum Taq polymerase (Invitrogen, Burlington, ON). Temperature profiles were as follows: (1) 4 min at 95 °C, (2) 35 cycles of 30 s at 94 °C, 30 s at 58 °C, and 45 s at 72 °C, (3) 5 min at 72 °C. PCR products were visualized by gel electrophoresis and then sequenced at the McGill University and Génome Québec Innovation Centre on ABI 3730XL DNA Analyzer systems (Applied Biosystems, Carlsbad, CA) using their internal protocols.

Table 1 List of the 29 individuals (representing 27 provenances) of Populus used for Sanger sequencing and SNP discovery

SeqMan software v8 (DNAStar, Madison, WI) was used to assemble electropherograms for all DNA sequences and to identify nucleotide variations within the 31 gene regions. All potential variations were carefully validated visually and homozygous SNPs that differentiated Populus species were identified. A total of 40 loci were chosen for inclusion in a Sequenom iPLEX MassARRAY genotyping assay. An optimized set of primers for multiplex PCR was designed in invariant flanking regions of our sequences by the McGill University and Génome Québec Innovation Centre according to internal protocols (primer sequences in Supplementary Table S2).
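The screening step described above (keeping only homozygous sites that differ among species) can be illustrated with a minimal sketch. It is not the SeqMan workflow used in the study: it assumes the sequences have already been aligned to equal length, that heterozygous positions are written as IUPAC ambiguity codes, and the function name `diagnostic_sites`, the species labels, and the toy sequences are all hypothetical.

```python
def diagnostic_sites(alignment):
    """Find alignment columns that are fixed within every species but differ among species.

    alignment: {species: [aligned sequences of equal length]} (hypothetical input format).
    Returns {position: {species: base}} using 0-based positions.
    """
    lengths = {len(seq) for seqs in alignment.values() for seq in seqs}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    (length,) = lengths
    sites = {}
    for pos in range(length):
        fixed = {}
        for species, seqs in alignment.items():
            bases = {seq[pos] for seq in seqs}
            # Reject columns that vary within a species or contain a
            # non-ACGT symbol (e.g., an IUPAC code for a heterozygous call).
            if len(bases) != 1 or not bases <= set("ACGT"):
                break
            fixed[species] = bases.pop()
        else:
            # Every species is fixed here; keep the column only if species differ.
            if len(set(fixed.values())) > 1:
                sites[pos] = fixed
    return sites


# Toy data, invented for illustration only.
toy = {
    "sp_A": ["ACGTA", "ACGTA"],
    "sp_B": ["ACTTA"],
    "sp_C": ["ACTTR"],  # R marks a heterozygous A/G call
}
print(diagnostic_sites(toy))
```

In the toy alignment only position 2 qualifies: position 4 is rejected because one species carries a heterozygous call, which mirrors the requirement that candidate SNPs be homozygous.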

The performance of the SNP array was evaluated by genotyping a validation set of 337 Populus samples from ten different species in both the Aigeiros and Tacamahaca sections, plus 111 samples from four species of the section Populus (1–120 samples per species, Supplementary Table S3). A SNP locus was considered to be monomorphic (i.e., fixed) when the minor allele frequency was equal to or lower than 3 %. SNP loci that failed or gave inconsistent results for every individual for the majority of all eight studied species were excluded from the remaining analyses. If a SNP locus did not amplify (absence of signal) for every individual in a species but consistently amplified for the majority of remaining species, this “technically” failed reaction was denoted as a fixed null allele (Carlson et al. 2006) and scored as homozygous 00 (0 is used to designate a null allele). However, a SNP locus was considered “unreliable” for a species if individuals of this species displayed a puzzling distribution of genotype classes (e.g., TT 00 or AA AG GG 00). In the latter case, the expected genotype classes were observed but an unusual number of individuals had failed reactions, suggesting the presence of a null allele. Confirmation of these putative null alleles would require additional evidence (more specimens/sequencing) and was beyond the scope of this project.
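The scoring rules above can be summarized as a small decision procedure. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the 3 % minor allele frequency threshold comes from the text, but the cut-off for an "unusual number" of failed reactions is not given, so `max_fail_rate` is a hypothetical parameter. The sketch also does not check the condition that the locus amplified in the majority of the other species.

```python
from collections import Counter

NULL = "00"  # score used in the text for a failed reaction / null allele


def classify_locus(genotypes, maf_threshold=0.03, max_fail_rate=0.10):
    """Classify one locus/species combination from per-individual genotype calls.

    genotypes: list of two-letter calls such as "AA", "AG", or "00".
    max_fail_rate is a hypothetical cut-off, not a value from the text.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = [g for g in genotypes if g != NULL]
    if not called:
        # No signal in any individual of the species: fixed null allele.
        return "fixed_null"
    fail_rate = 1 - len(called) / len(genotypes)
    if fail_rate > max_fail_rate:
        # Expected genotype classes plus many failures (e.g., TT 00).
        return "unreliable"
    allele_counts = Counter(allele for g in called for allele in g)
    total = sum(allele_counts.values())
    maf = (total - max(allele_counts.values())) / total
    return "fixed" if maf <= maf_threshold else "polymorphic"


print(classify_locus(["TT"] * 20))               # all individuals TT
print(classify_locus(["00"] * 38))               # no signal at all
print(classify_locus(["TT"] * 10 + ["00"] * 10)) # half of the reactions failed
```

Separating the "fixed null" case (no signal in any individual) from the "unreliable" case (a mixture of called genotypes and failures) follows the distinction drawn in the text; only the numeric failure cut-off is invented here.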

Results and discussion

The 31 gene regions ultimately targeted were distributed across 18 of the 19 Populus chromosomes, with one to four genes (median = 1.5) and one to three SNPs (median = 2) per chromosome. The physical distance between neighboring genes on the same chromosome ranged from 240,596 to 12,800,671 bp, so the 31 gene regions could be considered as unlinked. Out of the 40 loci tested, four failed or gave inconsistent results for all species in the Aigeiros/Tacamahaca sections, and very few of the locus/species combinations (7 out of the 288; 2.4 %) were deemed unreliable (Supplementary Table S4).
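The proportion of unreliable locus/species combinations can be checked with a few lines of arithmetic. The sketch below assumes that the denominator of 288 corresponds to the 36 retained loci scored in each of the eight Aigeiros/Tacamahaca species; the text does not state this explicitly, and the helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


loci_tested = 40
loci_failed = 4
loci_retained = loci_tested - loci_failed   # 36 loci kept
n_species = 8                               # assumption: eight Aigeiros/Tacamahaca species
combinations = loci_retained * n_species    # 288 locus/species combinations

unreliable = 7
print(combinations)                                       # 288
print(percent(unreliable, combinations))                  # 2.4
print(percent(combinations - unreliable, combinations))   # 97.6
```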

The remaining 36 loci from 28 gene regions (Supplementary Figure S1; Supplementary Table S5) accurately diagnosed all poplar species of the Aigeiros/Tacamahaca sections. Between 0 and 19 fixed differences separated pairs of poplar species (Table 2; locus/species combinations that contained fixed null alleles are shown but are not included in the final counts). P. balsamifera, P. deltoides, and P. nigra, all of which are known to hybridize (Thompson et al. 2010), were differentiated by 12–19 fixed SNPs, while P. fremontii, P. trichocarpa, and P. nigra, all of which are known to hybridize in California and Nevada (Eckenwalder 1982, 1984), were differentiated by 12–18 fixed SNPs, theoretically providing enough resolving power to identify hybrids using model-based Bayesian methods (Vähä and Primmer 2006). However, only four SNPs consistently differentiated P. balsamifera and P. trichocarpa, despite our screening efforts. Interestingly, no SNP was detected that could discriminate between P. deltoides and P. fremontii, except at locus A-025 where a null allele was detected. This lack of genetic differentiation between these two pairs of species is congruent with earlier surveys of the genus (Hamzeh and Dayanandan 2004; Levsen et al. 2012).
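Pairwise counts of the kind reported in Table 2 can be derived from a per-species summary of each locus. The following sketch assumes a hypothetical input format in which every locus is recorded as a fixed nucleotide, as "0" for a fixed null allele, or as `None` when the locus is polymorphic or unreliable in that species. The species labels, locus names, and allele states are invented for illustration and are not data from the study.

```python
from itertools import combinations


def fixed_differences(profiles):
    """Count fixed nucleotide differences and fixed-null differences per species pair.

    profiles: {species: {locus: state}}, where state is a nucleotide if fixed,
    "0" for a fixed null allele, or None if polymorphic/unreliable.
    Returns {(species_1, species_2): (n_fixed, n_null)}.
    """
    result = {}
    for sp1, sp2 in combinations(sorted(profiles), 2):
        n_fixed = n_null = 0
        for locus in profiles[sp1].keys() & profiles[sp2].keys():
            a, b = profiles[sp1][locus], profiles[sp2][locus]
            if a is None or b is None or a == b:
                continue  # not diagnostic for this pair
            if "0" in (a, b):
                n_null += 1   # reported separately, like the (+n) entries
            else:
                n_fixed += 1
        result[(sp1, sp2)] = (n_fixed, n_null)
    return result


# Toy profiles, invented for illustration only.
toy_profiles = {
    "sp_A": {"L1": "A", "L2": "C", "L3": "0"},
    "sp_B": {"L1": "G", "L2": "C", "L3": "T"},
    "sp_C": {"L1": "G", "L2": None, "L3": "T"},
}
for pair, counts in fixed_differences(toy_profiles).items():
    print(pair, counts)
```

Keeping the null-allele tally separate from the nucleotide tally matches the convention stated for Table 2, where fixed null alleles are shown but excluded from the final counts.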

Table 2 Number of fixed SNP loci that differentiate pairs of Populus species and polymorphic SNPs within species (bold), as detected with the SNP assay (see Supplementary Table S4 for in-depth genotyping results); (+n) indicates the number of additional diagnostic loci displaying fixed null alleles

Thirty-six loci were selected for this study and were expected to be fixed within species, yet a number of loci showed intraspecific polymorphism within some species (Table 2, File S1). A total of 11 loci were polymorphic within at least one of the Aigeiros or Tacamahaca species (4.5 %, 13/288 of the locus/species combinations; Supplementary Table S4). SNPs have often been developed without consideration for broader polymorphism data (and hence may suffer from ascertainment bias) and are occasionally identified based on sequence data from a single heterozygous individual (Vezzulli et al. 2008), as was done here for P. laurifolia and P. maximowiczii (Table 1). It should be noted that, despite the sequencing of only a single individual from each of these two species, 23 individuals of P. laurifolia and 20 individuals of P. maximowiczii consistently differed from the other seven poplar species by 4–19 and 4–14 fixed SNPs, respectively (Table 2, Supplementary Table S4).

For particular locus/species combinations, null alleles were detected (Table 2; Supplementary Table S4, File S1). This type of result has already been observed when working with distant species because of the presence of unexpected polymorphisms affecting the amplification/hybridization process (e.g., Ollitrault et al. 2012). Although fixed null alleles were not included in the counts presented in Table 2, they could be used for diagnostic purposes under certain conditions (see below). In fact, since DNA sequences of P. angustifolia and P. fremontii were not used to design primers for the SNP array, it was not surprising that unexpected polymorphisms should affect SNP amplification (Carlson et al. 2006). Indeed, a posteriori examinations of DNA sequences for these two species (one individual per species) revealed mutations at the priming sites that likely hampered amplification of the studied loci, thus resulting in a failed reaction and subsequent absence of signal (= null allele). For instance, in P. angustifolia, such mutations were observed in one of the three priming sites used in the genotyping method at each of loci A-024 and B-021 (no DNA sequence could be obtained for locus F-001). In this study, 38 pure P. angustifolia individuals consistently displayed an absence of signal for these three loci, whereas they were successfully genotyped at the remaining loci (File S1). These individuals were therefore considered to have a fixed null allele and were scored as homozygous 00. Null alleles were also observed consistently at loci A-024 and B-021 in more than 160 P. angustifolia individuals (Floate et al., unpublished results). These null alleles proved to be useful for hybrid detection between P. angustifolia, P. balsamifera, and P. deltoides without the need to redesign the SNP array (Supplementary Table S6, File S2).

Transferability of this diagnostic SNP array to the four distantly related aspen species in the section Populus was limited. In fact, 22 out of 144 (15.3 %) locus/species combinations were considered unreliable because they had putative null alleles (Supplementary Table S4), a higher proportion than that observed for the members of the Aigeiros/Tacamahaca sections. Only two loci, A-040 and A-041, could distinguish Populus grandidentata and Populus tremuloides from each other (Supplementary Table S4), whereas Populus alba and Populus tremula could not be distinguished using this SNP array. Three additional loci (A-007, A-024, and A-034) had fixed null alleles that could differentiate P. grandidentata from the three other aspen species. As described previously, DNA sequences of this group of species were not used to design the SNP assay, which presumably resulted in unexpected polymorphisms in the priming sites that led to failed amplifications (Carlson et al. 2006). Since no further genetic survey was conducted on aspens, these results should be interpreted with caution. Confirmation of the priming site polymorphisms and improvements of the array for the aspen species would require additional sequencing and validation. Nonetheless, under certain circumstances (e.g., a regeneration survey in mixed natural forests), the SNP array could discriminate among individuals belonging to different sections (Aigeiros/Tacamahaca versus Populus) of poplar species that are otherwise indistinguishable based on leaf morphology.

We demonstrated that this SNP assay, with its high success rate (97.6 %; 281 out of 288 locus/species combinations), could discriminate most Aigeiros and Tacamahaca species. Polymorphic loci are included in this final count because they can be useful for individual assignment (Vähä and Primmer 2006). Detection of early-generation hybrids among these species (for example, see Supplementary Table S6, File S2) will be possible in both natural populations and breeding programs, with various subsets of SNPs being most informative in different contexts (i.e., depending on the number and identity of the species involved, natural versus artificial hybrid zones, etc.). In particular, this SNP array will be a reliable method to monitor gene escape from commercial plantations of exotic poplars into natural forests, providing a way for plantation managers to demonstrate compliance with regional certification standards (e.g., Forest Stewardship Council Certification).
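To make the idea of early-generation hybrid detection concrete, the sketch below computes a simple hybrid index and interspecific heterozygosity from loci that are fixed for different nucleotides in two parental species. This is a deliberately simplified illustration and not the model-based Bayesian approach cited in the text; the function name `hybrid_index`, the locus names, and the allele states are hypothetical. Loci with null alleles are skipped here because a null allele inherited from one parent would make an F1 hybrid appear homozygous.

```python
def hybrid_index(genotype, parent_a, parent_b):
    """Return (share of alleles from parent A, share of heterozygous loci).

    genotype: {locus: two-letter call}, e.g. {"L1": "AG"}.
    parent_a, parent_b: {locus: fixed nucleotide, "0" for a fixed null, or None}.
    Only loci fixed for different nucleotides in the two parents are used.
    """
    a_alleles = heterozygous = loci = 0
    for locus, call in genotype.items():
        a, b = parent_a.get(locus), parent_b.get(locus)
        if a is None or b is None or a == b or "0" in (a, b):
            continue  # locus is not informative for this pair of species
        if call == "00" or any(allele not in (a, b) for allele in call):
            continue  # failed reaction or unexpected allele
        loci += 1
        a_alleles += call.count(a)
        heterozygous += call[0] != call[1]
    if loci == 0:
        raise ValueError("no informative loci")
    return a_alleles / (2 * loci), heterozygous / loci


# Toy parental profiles and individuals, invented for illustration only.
species_a = {"L1": "A", "L2": "C", "L3": "G"}
species_b = {"L1": "G", "L2": "T", "L3": "G"}
print(hybrid_index({"L1": "AG", "L2": "CT", "L3": "GG"}, species_a, species_b))
print(hybrid_index({"L1": "AA", "L2": "CT", "L3": "GG"}, species_a, species_b))
```

Under these assumptions, an F1 hybrid is expected to be heterozygous at every informative locus (index 0.5, heterozygosity 1.0), whereas a first-generation backcross is expected to be heterozygous at roughly half of them.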