Theoretical and Applied Genetics, Volume 109, Issue 6, pp 1204–1214

Cross-species transferability and mapping of genomic and cDNA SSRs in pines

Authors

  • D. Chagné
    • UMR 1202 BIOGECO-INRA
  • P. Chaumeil
    • UMR 1202 BIOGECO-INRA
  • A. Ramboer
    • UMR 1202 BIOGECO-INRA
  • C. Collada
    • Departamento de Biotecnología, Escuela Técnica Superior de Ingenieros de Montes de Madrid, UPM, Ciudad Universitaria s/n
  • A. Guevara
    • Departamento de Genética Forestal, CIFOR-INIA
  • M. T. Cervera
    • Departamento de Genética Forestal, CIFOR-INIA
  • G. G. Vendramin
    • Istituto di Genetica Vegetale, Sezione di Firenze, Consiglio Nazionale delle Ricerche
  • V. Garcia
    • UMR Physiologie et Biotechnologie Végétale, INRA Bordeaux
  • J-M Frigerio
    • UMR 1202 BIOGECO-INRA
  • C. Echt
    • New Zealand Forest Research Institute Ltd
  • T. Richardson
    • New Zealand Forest Research Institute Ltd
    • UMR 1202 BIOGECO-INRA
Original Paper

DOI: 10.1007/s00122-004-1683-z

Cite this article as:
Chagné, D., Chaumeil, P., Ramboer, A. et al. Theor Appl Genet (2004) 109: 1204. doi:10.1007/s00122-004-1683-z

Abstract

Two unigene datasets of Pinus taeda and Pinus pinaster were screened to detect di-, tri- and tetranucleotide repeated motifs using the SSRIT script. A total of 419 simple sequence repeats (SSRs) were identified, of which only 12.8% overlapped between the two sets. The positions of the SSRs within their coding sequences were predicted using FrameD. Trinucleotides appeared to be the most abundant repeated motif (63 and 51% in P. taeda and P. pinaster, respectively) and tended to be found within translated regions (76% in both species), whereas dinucleotide repeats were preferentially found within the 5′- and 3′-untranslated regions (75 and 65%, respectively). Fifty-three primer pairs amplifying a single PCR fragment in the source species (mainly P. taeda) were tested for amplification in six other pine species. The amplification rate with other pine species was high and corresponded with the phylogenetic distance between species, varying from 64.6% in P. canariensis to 94.2% in P. radiata. Genomic SSRs were found to be less transferable; 58 of the 107 primer pairs (i.e., 54%) derived from P. radiata amplified a single fragment in P. pinaster. Nine cDNA-SSRs were located to their chromosomes in two P. pinaster linkage maps. The level of polymorphism of these cDNA-SSRs was compared to that of previously and newly developed genomic-SSRs. Overall, genomic SSRs tend to perform better in terms of heterozygosity and number of alleles. This study suggests that useful SSR markers can be developed from pine ESTs.

Introduction

In contrast to other plant species, few polymorphic single-copy nuclear microsatellite markers or simple sequence repeats (SSR) have been reported in the Pinaceae (reviewed in Table1). The genome structure of these species, characterised by a large physical size (22 pg/C, Leitch et al. 2001) with a large amount of repeated sequence (Kriebel 1985; Kamm et al. 1996; Kossack and Kinlaw 1999; Elsik and Williams 2000), has been the main obstacle to the development of useful markers. In addition, the ancient divergence time between coniferous species (Price et al. 1998) and the complexity of their genomes mean that transferability of single-copy SSRs among genera and even within Pinus (the most studied genus) is generally poor, resulting in a large proportion of amplification failure, non-specific amplification, multi-banding patterns or lack of polymorphism (Echt et al. 1999; Mariette et al. 2001). Given the high cost of developing useful SSR markers, cross-species transferability is a valuable attribute.

In an attempt to circumvent these genome-related problems, Elsik and Williams (2001) removed most of the repetitive portion of the genome using a DNA reassociation kinetics-based method, and Zhou et al. (2002) targeted the low-copy portion of the genome using an undermethylated region enrichment method. Both approaches yielded remarkable enrichment for useful SSR markers in Pinus taeda. Scotti et al. (2002a, b) used an alternative strategy based on the pre-screening of single-copy microsatellite containing clones, using dot blot hybridisation analysis, and also obtained a high number of single-copy polymorphic SSR markers in Picea abies. Pinus taeda SSRs developed by Elsik and Williams (2001) and Zhou et al. (2002) transferred quite well between American hard pines (Shepherd et al. 2002), but were shown to be less transferable in the phylogenetically divergent Mediterranean hard pines (Gonzalez-Martinez et al. 2004). Interestingly, perfect trinucleotide SSRs transferred from American to Mediterranean pines better than other motifs (Kutil and Williams 2001).

Simple sequence repeats have been found in all genomic regions, including coding regions (Toth et al. 2000). By developing a cDNA library enriched in SSRs, Scotti et al. (2000) showed the presence of microsatellites within the coding regions of Norway spruce (Picea abies), a species belonging to the Pinaceae. The availability of expressed sequence tags (ESTs) resulting from large sequencing projects is potentially a valuable source of SSRs that can be evaluated with less intensive laboratory development. Recently, cDNA-SSRs were obtained from EST databases developed in several plant species such as grape (Scott et al. 2000), cereals (Temnykh et al. 2000, 2001; Cho et al. 2000; Cordeiro et al. 2001; Kantety et al. 2002; Eujayl et al. 2002; Varshney et al. 2002; Gao et al. 2003) and Arabidopsis (Cardle et al. 2000; Morgante et al. 2002). These EST-derived markers showed good transferability between phylogenetically related species (Eujayl et al. 2003; Gupta et al. 2003).

The objectives of this study were threefold: (1) to investigate the relative occurrence and types of SSRs present in the coding regions of two pine genomes, (2) compare polymorphism levels of SSRs derived from cDNA and genomic sources, and (3) compare the transferability of cDNA-SSRs and genomic SSR markers across several pine species.

Materials and methods

In silico SSR detection in pine ESTs

Public EST databases were independently assembled for Pinus pinaster and P. taeda using StackPack (Christoffels et al. 2001). A total of 18,498 P. pinaster ESTs provided 2,893 contigs and 5,001 singletons (http://cbi.labri.fr/outils/SAM/COMPLETE/index.php). For P. taeda, 8,070 contigs and 12,307 singletons resulted from 75,047 ESTs (http://web.ahc.umn.edu/biodata/nsfpine/contig_dir16/).

Pinus pinaster and P. taeda unigene sets were searched for tandemly repeated motifs of 2, 3 and 4 bp using the SSRIT SSR search tool (Temnykh et al. 2001; http://www.gramene.org/db/searches/ssrtool), with 14, 15 and 20 as the minimum repeat length, respectively. We associated the SSRIT Perl script with the FrameD gene prediction software (Schiex et al. 2003) to determine if the detected repeat motifs were located in the 5′ or 3′ untranslated regions (UTRs) or in the open reading frames (ORF). FrameD was developed to predict the position of the translated regions in EST sequences. Because FrameD uses interpolated Markov models (IMM; Salzberg et al. 1998) to build probabilistic models of coding sequences, a pine-specific IMM was constructed to enhance the prediction in P. taeda and P. pinaster sequences. We used 67 kb from 65 pine full-length coding sequences to build the Pinus IMM (Table S2). Finally, the sequences containing microsatellites in P. pinaster and P. taeda were compared in order to check the redundancy of the sequences containing SSRs in both species.
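The search criteria above can be illustrated with a minimal sketch. It is not the SSRIT Perl script itself; it is a plain regular-expression scan for perfect repeats, with the minimum lengths of 14, 15 and 20 bp expressed as 7, 5 and 5 repeat units. The function name `find_perfect_ssrs` and the filters for homopolymers and for tetranucleotides that are really dinucleotides are assumptions made for this illustration.

```python
import re

# Minimum number of repeat units per motif length, i.e. 14, 15 and 20 bp
# for 2-, 3- and 4-bp motifs as stated in the text.
MIN_REPEATS = {2: 7, 3: 5, 4: 5}


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, n_repeats) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{k}) captures one motif; \1{min_rep-1,} demands the extra copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run, not a di-/tri-/tetranucleotide SSR
            if unit_len == 4 and motif[:2] == motif[2:]:
                continue  # e.g. ATAT is a dinucleotide repeat in disguise
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AG" * 8 + "TTC" + "AAG" * 5 + "C"
    for start, motif, n in find_perfect_ssrs(example):
        print(start, motif, n)
```

Each hit could then be compared with the coding-region coordinates predicted by a tool such as FrameD to classify it as 5′ UTR, ORF or 3′ UTR; that step is not shown here.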

PCR primer design and amplification

We designed 56 PCR primer pairs (set no. 1) flanking the microsatellites identified with our in silico analysis using Primer v3.0 software (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi) with default parameters, except that we used a range of 40–55% for the primer GC%, GC clamps of 2 bases and a maximum Tm difference of 10. We kept the expected amplified fragment length below 500 bp to avoid the risk of the presence of introns, which may induce PCR failure. Fifty-three out of 56 PCR primers were designed based on P. taeda sequences and three were developed from P. pinaster sequences. The PCR primers were chosen to represent the broadest range of SSRs possible considering the repeat type (di-, tri- or tetranucleotide), the motif (e.g., AG, AT), the length (5–26 repeats) and the position (UTR or ORF). In addition to these new SSRs, we also included a set of 16 cDNA-SSRs previously developed from P. taeda sequences (set no. 2, C. Echt, http://dendrome.ucdavis.edu/Gen_res.htm). This second set resulted from an SSR search using a preliminary sequence dataset of about 10,000 P. taeda ESTs.

A third set of 107 PCR primers (set no. 3) was developed from P. radiata genomic SSRs and screened for amplification success in pine species (C. Echt and T. Richardson, unpublished data). A fourth set of three SSR markers described by Mariette et al. (2001) was also used (set no. 4).

DNA was isolated using the protocol described by Doyle and Doyle (1990). PCR reactions were performed with 15 ng of genomic DNA in a total reaction volume of 10 μl, with 1× reaction buffer (Gibco BRL), 2 mM MgCl2, 1 μM of each primer, 0.2 mM of dNTP and 0.5 U of Taq polymerase (Gibco BRL) on a Stratagene Robocycler Gradient 96 (Stratagene, La Jolla, Calif., USA) using the following cycles: preliminary denaturing (94°C, 5 min) followed by 30 cycles of denaturing (94°C, 30 s), annealing (locus-specific temperature, 30 s), and extension (72°C, 1 min), and a final extension (72°C, 10 min). An additional touchdown was performed for some loci (10 cycles with the annealing temperature decreasing by 1°C for every cycle).
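The touchdown step can be written out as a per-cycle list of annealing temperatures. This is only a sketch under two assumptions that the protocol does not state explicitly: that the 10 touchdown cycles precede the 30 standard cycles, and that a range such as 60–50 means starting at 60°C and finishing at a constant 50°C. The function name `annealing_schedule` is hypothetical.

```python
def annealing_schedule(start_c, end_c, main_cycles=30, step_c=1):
    """Annealing temperature (°C) for each cycle of a touchdown PCR programme.

    Assumes the touchdown phase comes first and drops by step_c per cycle
    until just above end_c, followed by main_cycles cycles at end_c.
    """
    touchdown = list(range(start_c, end_c, -step_c))  # 60, 59, ..., 51 for 60-50
    return touchdown + [end_c] * main_cycles


if __name__ == "__main__":
    schedule = annealing_schedule(60, 50)
    print(len(schedule), schedule[:10], schedule[-1])
```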

Amplification success was checked on 1.5% agarose gels. We checked that the amplification showed a single band pattern with a size corresponding to the expected length. Amplifications resulting in multiple bands were discarded from further analysis since they could result from non-specific amplification or paralogous loci. The useful loci were then run on a LICOR automated sequencer using the same conditions described by Mariette et al. (2001) to precisely determine the length of each amplification product (i.e., allele).

Sequencing

Amplified fragments in P. pinaster were cloned and sequenced as described by Dubos and Plomion (2003) in order to check the orthology of the same markers as based on sequence identity.

Plant material

Polymorphism and reliable co-dominant inheritance were tested in three P. pinaster mapping pedigrees (the INRA-F2 pedigree, Costa et al. 2000; the INRA-G2 pedigree Chagné et al. 2002; and the AFOCEL-F1 pedigree, Ritter et al. 2002) for which saturated genetic maps are available, and a fourth (INIA-F1) which is under construction (M.T. Cervera, unpublished data). Loci that were polymorphic in at least one mapping pedigree were also tested on 26 unrelated P. pinaster elite trees from the Aquitaine region (south-western France). These trees are first generation selections for the P. pinaster breeding programme and were used to estimate the level of diversity (heterozygosity and number of alleles) of the SSRs.

Samples from seven species belonging to the genus Pinus (subgenus Pinus): P. canariensis, P. halepensis, P. pinaster, P. pinea, P. radiata, P. sylvestris, and P. taeda were used to test the amplification rate of the cDNA-SSR markers.

Mapping

Markers segregating in the INRA-G2 and INRA-F2 mapping pedigrees were visually scored and assigned two-allele genotypes. We used JoinMap v3.0 (Van Ooijen and Voorrips 2001) with a minimum LOD of 6.0 for genetic map construction. The Arlequin software (Schneider et al. 2000) was used to estimate genetic diversity parameters based on the genotypes of the 26 unrelated P. pinaster individuals.
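To give a sense of what a LOD threshold of 6.0 means, the sketch below computes the textbook two-point LOD score for phase-known, backcross-type data: the log10 ratio of the likelihood at the estimated recombination fraction to the likelihood under free recombination (r = 0.5). This is an illustration only and is not claimed to be the statistic JoinMap uses internally; the function name `two_point_lod` is hypothetical.

```python
import math


def two_point_lod(n_recombinant, n_total):
    """Textbook two-point LOD for phase-known backcross-type data."""
    r = n_recombinant / n_total
    if r >= 0.5:
        return 0.0  # no evidence for linkage
    log_l_linked = 0.0
    if n_recombinant > 0:
        log_l_linked += n_recombinant * math.log10(r)
    if n_total - n_recombinant > 0:
        log_l_linked += (n_total - n_recombinant) * math.log10(1 - r)
    log_l_unlinked = n_total * math.log10(0.5)
    return log_l_linked - log_l_unlinked


if __name__ == "__main__":
    # With no recombinants, each informative offspring adds log10(2) to the LOD.
    print(round(two_point_lod(0, 20), 2))
    print(round(two_point_lod(5, 50), 2))
```

Under this simple model, about 20 informative offspring with no recombinants are needed before a pair of markers reaches a LOD of 6.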

Results

SSR detection in pine ESTs and sequence annotation

A total of 251 and 168 SSRs were found in P. taeda and P. pinaster unigene sets (Table S3). This corresponds to enrichment rates of 1.2 and 2.1%, respectively (Table 1). The most common repeat types were trinucleotides (63% in P. taeda and 51% in P. pinaster), followed by dinucleotides (36% in P. taeda and 45% in P. pinaster). Tetranucleotide repeats were almost absent (1% in P. taeda and 3% in P. pinaster). These results were obtained for a minimum repeat number of 7, 5 and 5 for di-, tri- and tetranucleotide motifs, respectively. These thresholds are comparable to those used by Cardle et al. (2000) and Scott et al. (2000), and correspond to perfect motifs only. If we used less stringent detection criteria (e.g., minimum of 5 repeats for dinucleotides, as in Morgante et al. 2002) and allowed the detection of compound motifs we have estimated that the SSR enrichment would increase by twofold.
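The reported rates of 1.2 and 2.1% can be reproduced from the numbers given in the Materials and methods, assuming the denominator is the total number of unigenes (contigs plus singletons) in each species. That choice of denominator is an inference from the figures, not something the text states explicitly.

```python
# Counts taken from the text; the contigs + singletons denominator is an assumption.
unigenes = {
    "P. taeda": {"contigs": 8070, "singletons": 12307, "ssrs": 251},
    "P. pinaster": {"contigs": 2893, "singletons": 5001, "ssrs": 168},
}


def ssr_rate_percent(entry):
    """SSRs per 100 unigene sequences."""
    return 100.0 * entry["ssrs"] / (entry["contigs"] + entry["singletons"])


rates = {species: round(ssr_rate_percent(e), 1) for species, e in unigenes.items()}

if __name__ == "__main__":
    for species, rate in rates.items():
        print(species, rate)
```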
Table 1

Di-, tri- and tetranucleotide SSR detection in Pinus pinaster and P. taeda unigenes using SSRIT software

Regarding the types of repeated motif (Fig. 1), the AT and AG motifs were the most represented among the dinucleotides (76 and 19% in P. taeda, and 47 and 51% in P. pinaster, respectively), whereas the AC and CG types were rare (<3% in both species). Regarding trinucleotides, the AAG motif was the most common repeat type (23.9 and 19.3% in P. taeda and P. pinaster, respectively), followed by AGC and AGG motifs.
Fig. 1

Distribution of the different classes of di- and trinucleotide SSRs in Pinus taeda (grey boxes) and P. pinaster (black boxes) unigenes

Figure 2 shows the position of the detected SSRs in the gene sequences of both species based on the results obtained with FrameD (Schiex et al. 2003). Significant differences between di- and trinucleotide SSRs were observed. Dinucleotides were found mostly in the UTRs (75 and 65% in P. taeda and P. pinaster, respectively), whereas trinucleotides were more frequent in the ORFs (76% in both species). For both types of repeat, SSRs were less abundant in the 5′ UTR than in the 3′ UTR.
Fig. 2

Distribution of the di- and trinucleotide SSRs within the open reading frame (ORF, in white) or in the 5′ untranslated regions (UTR, dark grey) and 3′ UTR (light grey) in P. taeda and P. pinaster contigs. Sequences for which no ORF could be detected were not considered

By assembling the P. taeda and P. pinaster contigs and singletons that contained SSRs using StackPack (Christoffels et al. 2001), we found that only 22 of the 171 (12.8%) P. pinaster sequences matched contig sequences in the P. taeda unigene set, providing a catalogue of 397 non-redundant putative SSR markers for pines.

Transferability of cDNA and genomic SSRs in pines

As a representative sample, 72 primer pairs (sets no. 1 and 2) were designed from cDNA-SSR sequences. Fifty-two out of the 69 P. taeda and one out of the three P. pinaster cDNA-SSRs amplified a single band of the expected size in the source species. The multi-banding pattern observed for five loci could be attributed to non-specific amplifications or the presence of multi-gene families that are frequent in pines (Kinlaw and Neale 1997). The lack of amplification obtained for 14 loci could be explained by the quality of the primer pairs and/or the presence of introns. Table 2 summarises the amplification success for these 53 cDNA-SSR markers in seven pine species. Overall, the amplification rates in non-source species ranged between 64.6% in P. canariensis and 94.2% in P. radiata. This transferability rate was comparable to the result obtained with EST-derived markers in pines (Brown et al. 2001; Chagné et al. 2003; Komulainen et al. 2003).
Table 2

Cross-specific amplification of 53 cDNA-SSR markers: locus ID and amplification in seven hard pine species. The locus nomenclature follows the recommendations of the Treegenes database (http://dendrome.ucdavis.edu/Tree_Page.htm) for pine STS, also described by Brown et al. (2001). Position in the gene: UTR untranslated region, ORF open reading frame, NP no protein. The annealing temperature (°C) or touchdown temperature range used for the PCR amplification are given. Amplification: Pp Pinus pinaster (subsection Sylvestres), Pt Pinus taeda (subsection Australes), Pr Pinus radiata (subsection Oocarpae), Ps Pinus sylvestris (subsection Sylvestres), Ph Pinus halepensis (subsection Sylvestres), Ppi Pinus pinea (subsection Pineae), Pc Pinus canariensis (subsection Canarienses), + single locus amplification, − no amplification, NA no data

Primer set | Locus name | Identification | Repeated motif | Number of repeats | Position in gene | Forward primer | Reverse primer | Annealing temperature (°C) | Expected length (bp) | Amplification (Pp, Pt, Pr, Ps, Ph, Ppi, Pc)
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | SsPp_cn524 | Contig524^c | AG | 14 | 5′UTR | cgattgtttttgccttttaagc | aaatatggcggggtgtgc | 50 | 156 | + + + + + +
1 | SsrPt_AA739797 | AA739797^b | AT | 11 | 3′UTR | actttgcggtgaatcagacc | aaagtaaggctgcttgcatga | 51 | 281 | + + +
1 | SsrPt_AW010960 | AW010960^b | AT | 9 | ORF | atcgactaggcatcaggtgg | tcctcgtagcccagctttta | 49 | 225 | + + + + + + +
1 | SsrPt_AW225917 | AW225917^b | AT | 9 | 3′UTR | tgcattgaaaaatacagcgg | attatgtacgaggccccaca | 49 | 198 | + + + + + + +
1 | SsrPt_AW981642 | AW981642^b | AAG | 7 | ORF | gtggcacagggttttctgat | caaaccttcggtagcctcat | 60–50 | 245 | + + NA NA NA NA
1 | SsrPt_AW981772 | AW981772^b | CCT | 4 | ORF | gatcctgttcctcctcctcc | cctggacagaaacagcaaca | 49 | 266 | + + + + + + +
1 | SsrPt_BF049767 | BF049767^b | AG | 22 | ORF | ttttgggtcgtaggaacctg | taaaacgggtgtctcttcgg | 51 | 227 | + + + + + +
1 | SsrPt_BF778306 | BF778306^b | AG | 7 | NP | gaagatggagacgaagcagg | tttgcagtctgttgcctttg | 60–50 | 172 | + + NA NA NA NA
1 | SsrPt_ctg1376 | Contig1376^a | AT | 20 | NP | cgatattatggattttgcttgtga | aaatgcatgccaaacttaaatac | 60–50 | 145 | + + + +
1 | SsrPt_ctg1525 | Contig1525^a | AGG | 7 | ORF | ttgaaaccatataagcaatgcc | aggacctgggtaaggaggc | 60–50 | 173 | + + + + + + +
1 | SsrPt_ctg16480 | Contig16480^a | AAAT | 13 | NP | ctaaaacatcggtcggaagc | atttagtccaggccatgtcg | 60–50 | 151 | + + NA NA NA NA NA
1 | SsrPt_ctg16811 | Contig16811^a | AT | 11 | 5′UTR | gtccatgatgttgcagattgg | tgttccccaatggtctgtc | 56 | 199 | + + + +
1 | SsrPt_ctg17601 | Contig17607^a | AAG | 9 | ORF | cgccattaatatgcctaccg | atctctgcgctgcttgaagt | 54 | 225 | + + + + + + +
1 | SsrPt_ctg18103 | Contig18103^a | AT | 10 | NP | cctggattcatttgtggctaa | catgccaacttcttgcattg | 60 | 184 | + + + + + +
1 | SsrPt_ctg2300 | Contig2300^a | CCG | 6 | ORF | cactttgcgagagactgcac | acgctgaaggaaatcgagaa | 49 | 173 | + + + + + + +
1 | SsrPt_ctg275 | Contig275^a | AT | 16 | 3′UTR | acggagatatattgctggcg | aaagaataacgtgaaacaaaccc | 60–50 | 137 | + + + +
1 | SsrPt_ctg3021 | Contig3021^a | AGC | 14 | ORF | ctcagattcctccaaatgcg | catgcaacatatgcaaaccg | 60–50 | 234 | + + + + + + +
1 | SsrPt_ctg3089 | Contig3089^a | AT | 17 | NP | ctttcttcacgttggacttctt | ttagccatggagagtgcaga | 45 | 482 | + + + + + +
1 | SsrPt_ctg3754 | Contig3754^a | AGC | 6 | 5′UTR | tctttgggtttctggagtgg | gctgttgctgttgttcttgg | 60–50 | 421 | + + + + + + +
1 | SsrPt_ctg4363 | Contig4363^a | AT | 10 | 3′UTR | taataattcaagccaccccg | agcaggctaataacaacacgc | 60–50 | 100 | + + + + + + +
1 | SsrPt_ctg4487a | Contig4487^a | CCG | 5 | ORF | tctgctgtgtggacaaacct | ttcttggctcaaaatctcgg | 60–50 | 155 | + + + + +
1 | SsrPt_ctg4487b | Contig4487^a | CCG | 10 | 3′UTR | atgacgcattatcaggggaa | ttgcacagaaagcaggtttg | 45 | 254 | + + + + +
1 | SsrPt_ctg4698 | Contig4698^a | ATC | 10 | ORF | cgaaaaggtggttctgatgg | ttttccgctggatttaccac | 49 | 246 | + + + + + + +
1 | SsrPt_ctg5167 | Contig5167^a | AAC | 7 | ORF | tgcagagagattcgatggg | attttggtttgtttgctggc | 60–50 | 293 | + + + + + + +
1 | SsrPt_ctg5333 | Contig5333^a | AGC | 7 | ORF | gaaggagtcggcgataacag | gggaattcgacctgtgaaga | 49 | 163 | + + + +
1 | SsrPt_ctg6390 | Contig6390^a | AAG | 8 | 5′UTR | atccacgacttgtcgacgc | atcaaccaacttaggcagcg | 45 | 440 | + + + + +
1 | SsrPt_ctg64 | Contig64^a | CCG | 7 | ORF | ggaagctgttacaagtgcgg | atcgagaagagaggaagggc | 60–50 | 284 | + + + + + + +
1 | SsrPt_ctg7024 | Contig7024^a | AAG | 7 | ORF | gggaattctgaaagacaaggg | aacttacccatcgagagcccc | 60–50 | 277 | + + + + + +
1 | SsrPt_ctg7081 | Contig7081^a | AAG | 7 | ORF | gtcatccacgttcattggc | tcacaactgaccaaactgcc | 60–50 | 442 | + + + + + +
1 | SsrPt_ctg7141 | Contig7141^a | CCG | 8 | ORF | gaatgacgcattatcagggg | tcacctttctcacctctgcc | 45 | 381 | + + + + + +
1 | SsrPt_ctg7170 | Contig7170^a | AGC | 5 | ORF | ggtttttcgatttctgaggc | aacaggtgtgcaaatagccc | 60–50 | 385 | + + + + + +
1 | SsrPt_ctg7425 | Contig7425^a | AAG | 6 | ORF | aataagaccccagaggagcc | gacgtctttcaccaaatcgc | 60–50 | 384 | + + + +
1 | SsrPt_ctg7444 | Contig7444^a | AT | 10 | 5′UTR | tcttcaccatcggtttctcc | tggatctgtcacctcctcatc | 58 | 285 | + + + + + + +
1 | SsrPt_ctg7731 | Contig7731^a | AT | 12 | 5′UTR | agtggtgaagggtccatctg | gcataacacaaaagccagca | 51 | 217 | + + + + + + +
1 | SsrPt_ctg7824 | Contig7824^a | AT | 12 | 3′UTR | tgacctgtcttgtgagacgc | ttttgaaacagattgcagcc | 60–50 | 501 | + + + + +
1 | SsrPt_ctg7867 | Contig7867^a | CCG | 6 | 5′UTR | ggtcgtggaggaggtaggg | actgataacagctgccccc | 45 | 154 | + + + + + + +
1 | SsrPt_ctg8064 | Contig8064^a | ACC | 6 | ORF | gaacgtggttatggcggtag | tcgtggcaactatctcctcc | 50 | 147 | + + + + + + +
1 | SsrPt_ctg865 | Contig865^a | AT | 15 | 3′UTR | tttcagaagctcccgatttg | cttgtggacatggttaatgaag | 45 | 232 | + + + + + + +
1 | SsrPt_ctg8767 | Contig8767^a | AGC | 8 | ORF | tggggaaaaatggcatacat | ggagcagacacccatggact | 55 | 180 | + + +
1 | SsrPt_ctg9249 | Contig9249^a | AAG | 7 | 5′UTR | ctgctccctcagctcttcc | agacgtcactgccattaccc | 55 | 156 | + + + + + +
1 | SsrPt_ctg946 | Contig946^a | AGG | 9 | 3′UTR | tatcaggtataggcctccgc | aaataggagcccttctggga | 53 | 287 | + + +
1 | SsrPt_ctg988 | Contig988^a | AT | 7 | 3′UTR | taataattcaagccaccccg | aacattttgcacgatagccc | 51 | 319 | + + +
2 | RPtest1 | Contig4518^a | AAT | 7 | 5′UTR | gatcgttattcctcctgcca | ttcgatatcctccctgcttg | 50 | 125 | + + + + + + +
2 | RPtest5 | Contig6309^a | AAC | 6 | ORF | acaacaataataacgggggc | acgctttagatcctcctgca | 55 | 197 | + + + + + + +
2 | RPtest6 | Contig3845^a | TGC | 5 | ORF | aggattccaacagcatcacc | ctgaacatgaagcgcagtgt | 55 | 147 | + + + + + + +
2 | RPtest8 | Contig8048^a | CCG | 6 | ORF | ggtgcgagattgaaattcgt | tttgcagtctgttgcctttg | 60–50 | 196 | + + NA NA NA NA
2 | RPtest9 | Contig1667^a | AGC | 10 | ORF | ccagacaacccaaatgaagg | gcctgctatcgaatccagaa | 51 | 289 | + + + + + + +
2 | RPtest11 | Contig3631^a | ATC | 7 | 3′UTR | aggatgcctatgatatgcgc | aaccataacaaaagcggtcg | 56 | 213 | + + + + + +
2 | RPt11est13 | AA739656^b | CTG | 5 | ORF | gatttttcaggaagaccccc | tgtaaggcacaagccctctt | 51 | 277 | + + + + +
2 | RPtest15 | Contig8064^a | ACC | 6 | ORF | gaacgtggttatggcggtag | ccagggacagttaccagcat | 56 | 246 | + + + + + + +
2 | RPtest16 | AA739818^b | AGT | 5 | ORF | cagaaatggcgtccaaattc | accccacttatatccccagc | 56 | 132 | + + + + + +
2 | RPtest20 | Contig6393^a | AGC | 5 | ORF | gttcccactcaagggttgaa | acatcatttgttgccgcata | 56 | 259 | + + +
2 | RPtgbLP5 | AF013805^b | AAT | 6 | 5′UTR | agaggttccaaacgagagt | tcgacttctgatttctttacatga | 60–50 | 176 | + NA NA NA NA

Amplification rate (%): Pp 86.8, Pt 100, Pr 94.2, Ps 85.4, Ph 72.9, Ppi 70.8, Pc 64.6

Amplification entries are listed in the order in which they appear in the source; rows with fewer than seven entries contain empty cells whose column positions could not be recovered.

^a Pinus taeda unigene contig numbering (http://web.ahc.umn.edu/biodata/nsfpine/contig_dir16/)

^b GenBank accession

^c Pinus pinaster unigene contig numbering (http://cbi.labri.fr/outils/SAM/COMPLETE/index.php)

Fifty-eight out of 107 (54%) of the set no. 3 P. radiata SSR markers amplified a single band in P. pinaster. This transferability rate was higher than that of Gonzalez-Martinez et al. (2004) in P. pinaster using P. taeda-derived SSRs (42%), and that of Shepherd et al. (2002) in P. elliottii and P. caribaea using P. radiata-derived SSRs (44%). Overall, the interspecific transferability of cDNA-SSR markers was higher than that of the genomic SSRs.

Polymorphism, orthology, and genetic mapping of cDNA and genomic SSRs in Pinus pinaster

Among the 46 single-copy cDNA and 58 genomic SSR loci that amplified in P. pinaster, nine (19.5%) and seven (12%) were found to be polymorphic in at least one of the four mapping pedigrees, respectively. Six out of 18 (33%) of the cDNA-SSRs located in UTRs were polymorphic, compared to three out of 30 (10%) of those located in ORFs. This result suggests that a pre-annotation of the sequences containing SSRs can be used to enrich for primer pairs that yield polymorphic cDNA-SSR markers. If we consider the repeat type and position of the cDNA-SSRs (Table 2), then it should be noted that five out of 17 dinucleotide cDNA-SSRs (29%) were polymorphic in at least one P. pinaster mapping pedigree whereas four out of 35 (11%) trinucleotide cDNA-SSRs were polymorphic.

We verified the orthology of the seven polymorphic SSR loci originating from the P. radiata genomic library by sequencing PCR products amplified from P. pinaster DNA. The high levels of sequence identity found for six of the loci (Table 3) were comparable to the levels found between orthologous pine ESTs in previous studies (Brown et al. 2001; Chagné et al. 2003; Komulainen et al. 2003). Interestingly, one locus (NZPR1702_b) was not homologous between the species and did not contain an SSR motif. Electrophoresis on an acrylamide gel showed that this locus presented two distinct bands, 30 bp apart (i.e., two alleles corresponding to an insertion-deletion polymorphism). This locus presented the lowest genetic diversity (H=0.38), and was subsequently excluded from the comparison between genomic and cDNA SSRs (see next section).
Table 3

Pinus radiata genomic SSR markers that were mapped in P. pinaster and marker sequence homologies between P. pinaster and P. radiata

Primer set | Locus name | Repeated motif | Forward primer | Reverse primer | Annealing temperature (°C) | Expected length (bp) | Sequence homology (%)
--- | --- | --- | --- | --- | --- | --- | ---
3 | NZPR1078 | AC10 | tggtgatcaagcctttttcc | gttgatgagtgatggcatgg | 53 | 342 | 91.5
3 | NZPR114 | CA15... CA13 TA22 | aagatgacccacatgaagtttgg | ggagctttataacatatctcgatgc | 56 | 193 | 88.2
3 | NZPR1702_b | AC15 CA13...AT5 | tatgattggaccattggggt | ccaaaccctcctccacatatc | 53 | 187 | No homology
3 | NZPR413 | TG23 GT6 | tgaacctcgatggaatagcc | cccgccttgcatcaatta | 53 | 253 | 89.1
3 | NZPR472 | AC13 | gagaaaattcaaccaccgga | ggttgtagggcagtgaatcc | 53 | 309 | 89.4
3 | NZPR544 | CA5AC12 TA5 | gcgatgtgcaacccttgata | tgctattccgtcaaaaaccc | 56 | 286 | 86.1
3 | NZPR823_a | AC57 | tatcgggagcaagttatgcc | tgcactctttttcgtctcca | 53 | 296 | 92.5

The chromosomal assignments of 19 polymorphic SSR markers in the INRA-G2 and INRA-F2 genetic maps (Chagné et al. 2002; Costa et al. 2000) and their polymorphism state in two other P. pinaster pedigrees are presented in Table 4. All the loci were linked with a minimum LOD of 6.0, except for locus ssrPt_ctg275 that was not linked to any linkage group in either of the maps. The three SSR markers of set No. 4 previously developed by Mariette et al. (2001) were also mapped in both pedigrees. Overall, these SSRs made it possible to align eight of the 12 linkage groups between the two maps. Linkage group homology was also confirmed using a set of ESTPs mapped in the INRA-G2 (Chagné et al. 2003) and INRA-F2 pedigrees (D. Chagné and P. Semat, unpublished data).
Table 4

Chromosomal assignment and genetic diversity parameters of the three classes of microsatellites genotyped on 26 unrelated P. pinaster trees. The mapping location in the INRA-G2 (following linkage group numbering of Chagné et al. 2002) and INRA-F2 maps (following linkage group numbering of Costa et al. 2000) are indicated. M Monomorphic, P polymorphic, UL unlinked, H heterozygosity, A number of alleles

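Columns INRA-G2 to INIA-F1: mapping pedigree. Columns H and A: genetic diversity.

Marker type | Primer set | Locus ID | INRA-G2 | INRA-F2 | AFOCEL-F1 | INIA-F1 | H | A
--- | --- | --- | --- | --- | --- | --- | --- | ---
cDNA-SSR | 1 | RPtEST11 | 5 | 2 | P | M | 0.74 | 4
cDNA-SSR | 1 | RPtEST13 | 10 | M | M | M | 0.66 | 3
cDNA-SSR | 2 | SsrPp_cn524 | 6 | 1 | P | M | 0.81 | 5
cDNA-SSR | 2 | SsrPt_ctg275 | P/UL | P/UL | P | P | 0.74 | 8
cDNA-SSR | 2 | SsrPt_ctg4363 | M | 12 | P | M | 0.68 | 4
cDNA-SSR | 2 | SsrPt_ctg7824 | 10 | M | M | M | 0.35 | 2
cDNA-SSR | 2 | SsrPt_ctg988 | 11 | M | P | M | 0.55 | 3
cDNA-SSR | 2 | SsrPt_ctg1525 | M | 11 | M | M | 0.16 | 2
cDNA-SSR | 2 | SsrPt_ctg64 | 3 | 3 | M | P | 0.68 | 4
P. radiata genomic SSR | 3 | NZPR1078 | 2 | 7 | P | M | 0.68 | 4
P. radiata genomic SSR | 3 | NZPR114 | M | 5 | M | P | 0.68 | 5
P. radiata genomic SSR | 3 | NZPR1702_b | 11 | 6 | P | M | 0.38^a | 2^a
P. radiata genomic SSR | 3 | NZPR413 | 4 | 8 | P | P | 0.58 | 4
P. radiata genomic SSR | 3 | NZPR472 | 1 | M | P | P | 0.67 | 4
P. radiata genomic SSR | 3 | NZPR544 | M | 3 | M | P | 0.41 | 4
P. radiata genomic SSR | 3 | NZPR823_a | 5 | M | P | P | 0.67 | 3
P. pinaster and P. halepensis genomic SSR | 4 | FRPp91 | 1 | 9 | P | P | 0.85 | 9
P. pinaster and P. halepensis genomic SSR | 4 | FRPp94 | 10 | 5 | P | P | 0.80 | 8
P. pinaster and P. halepensis genomic SSR | 4 | ITPh4516 | 3 | 3 | P | P | 0.84 | 8

^a These values were not taken into account for the comparison of diversity parameters between cDNA and genomic SSRs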

Level of diversity of cDNA and genomic SSRs in Pinus pinaster

The nine polymorphic cDNA-SSR loci and 10 polymorphic genomic SSR loci were genotyped in 26 unrelated P. pinaster trees. Their expected heterozygosities (H) and number of alleles (A) are shown in Table 4. Within the cDNA-SSRs, there was no significant difference between the heterozygosity values obtained in the ORF and the UTRs, or between tri- and dinucleotide SSRs (F test with a P value of 0.46). Within the genomic SSRs, a significant difference (F test with a P value of 0.11) in the diversity parameters was found between the loci transferred from P. radiata and those developed from P. pinaster and P. halepensis by Mariette et al. (2001). This difference suggests that genomic SSRs tend to be less polymorphic when transferred from phylogenetically distant species; P. radiata belongs to the Oocarpeae subsection, whereas P. pinaster and P. halepensis belong to the Sylvestres subsection of the pine genus (Mirov 1967). Finally, the level of diversity was not different between the transferred P. radiata genomic SSRs and the cDNA-SSRs (F test with a P value of 0.27).
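For readers unfamiliar with the two diversity parameters, the following sketch shows one common way to compute them from diploid genotypes: the number of distinct alleles (A) and the unbiased expected heterozygosity, H = n/(n − 1) × (1 − Σp²), where n is the number of gene copies sampled and p the allele frequencies. Whether this matches the exact Arlequin settings used in the study is an assumption, and the genotypes in the example are invented fragment lengths, not data from the paper.

```python
from collections import Counter


def diversity(genotypes):
    """Return (H, A) for a list of diploid genotypes given as allele pairs.

    H is the unbiased expected heterozygosity (gene diversity) and
    A is the number of distinct alleles observed.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    n = len(alleles)  # number of gene copies, 2 per diploid individual
    counts = Counter(alleles)
    sum_p_squared = sum((count / n) ** 2 for count in counts.values())
    h = n / (n - 1) * (1.0 - sum_p_squared)
    return h, len(counts)


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) for four trees, for illustration only.
    toy_genotypes = [(150, 150), (150, 152), (152, 154), (154, 154)]
    print(diversity(toy_genotypes))
```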

Discussion

Composition and distribution of SSRs in the expressed genome of pine

The SSR composition of the coding region of the pine genome was first compared to the results published in other plant species. In the dicotyledonous species for which cDNA-SSR evaluations have been reported, i.e., Vitis vinifera (Scott et al. 2000) and Arabidopsis thaliana (Cardle et al. 2000; Morgante et al. 2002), the most represented repeat types, i.e., AG, AT, AAG, AGG and AGC, were also found to be the most frequent in pines (Fig. 1). Conversely, the most common repeated motif in monocotyledonous species (Varshney et al. 2002), CCG, was quite rare in pines (5.2 and 7.2% in P. pinaster and P. taeda, respectively). This result suggests that the SSR composition of gymnosperm genes is more similar to that of dicots than monocots. However, given the small number of species analysed, this interpretation remains to be confirmed.

The presence of a majority of trinucleotides in the ORFs (Fig. 2) was also in agreement with what has been described in other plants. Morgante et al. (2002) showed a strong positive selection for trinucleotides in the translated regions of A. thaliana. Metzgar et al. (2000) explained the excess of triplet repeat microsatellites in the coding regions by the effect of important mutation pressures. Indeed, a mutation in a mono-, di-, tetra- or pentanucleotide SSR in the ORFs would result in a frameshift that could change the translated protein structure and function.
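The frameshift argument comes down to simple arithmetic: gaining or losing repeat units changes the sequence length by the unit length times the number of units, and the reading frame is preserved only when that change is a multiple of three. A minimal sketch, with the hypothetical helper `causes_frameshift`:

```python
def causes_frameshift(unit_len, delta_repeats):
    """True if gaining/losing delta_repeats units of length unit_len shifts the frame."""
    return (unit_len * delta_repeats) % 3 != 0


if __name__ == "__main__":
    for unit_len in (1, 2, 3, 4, 5):
        # A slippage event that adds a single repeat unit.
        print(unit_len, causes_frameshift(unit_len, 1))
```

Any change in the number of trinucleotide units keeps the frame, whereas a dinucleotide SSR keeps it only when it gains or loses a multiple of three units, which is consistent with the scarcity of dinucleotide repeats in ORFs.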

Morgante et al. (2002) detected much higher levels of SSRs in the 5′ UTRs, especially AG/CT repeats. The rather small number of SSRs detected in the 5′ UTRs of pine genes (17.4%, Table S3) contrasted with their results and could reflect a true feature of pine genes or it could simply be that the low coverage of the 5′-end in the pine ESTs has provided a bias. Some support for the latter view comes from ESTs obtained from the sequencing of the 5′ ends of 3′ anchored cDNAs (Frigerio et al. 2004; Kirst et al. 2003). Therefore, the 5′ UTRs were probably under-represented in the two pine EST collections analysed.

Transferability of cDNA and genomic SSRs in pines

From 64.6 to 94.2% of the pine cDNA-SSRs transferred to one or more of the seven pine species tested (Table 2). It has been clearly shown that the transferability of molecular markers (including SSRs) depends on the phylogenetic distance between species. Most of the markers developed in this study originated from P. taeda, an American pine which belongs to the Pinus section of the subgenus Pinus (Mirov 1967). It is not surprising, therefore, that the highest transfer rate was observed for P. radiata markers (94.2%), another American pine belonging to the same section. Similarly, the transfer rate decreased for SSR markers of Mediterranean pines of the same section (P. pinaster, 86.8%; P. sylvestris, 85.4%; P. halepensis, 72.9%), and was even lower with Mediterranean pine markers of the more distant section Pinea (P. pinea, 70.8%; P. canariensis, 64.6%). We also anticipate a lower transferability of cDNA-SSR markers in the subgenus Strobus, or even within other genera of the Pinaceae family. However, the transferability rates in these more distant species should be higher for cDNA-SSR markers compared to genomic SSRs (Echt et al. 1999).

Similar rates of cross-species transferability were reported using EST-derived SSR markers in the genus Medicago (Eujayl et al. 2003, 89%) and within the Poaceae (Gupta et al. 2003, 55%). By comparison, genomic SSR markers have been shown to be less transferable in pine (54% between P. radiata and P. pinaster, this study; 29% between P. strobus and P. radiata, Echt et al. 1999; and 42% between P. taeda and P. pinaster, Gonzalez-Martinez et al. 2004). This rate is low compared to other plant genera (e.g., up to 85% between Glycine spp., Peakall et al. 1998). These results suggest that the data mining of pine cDNA libraries is a valuable approach to developing transferable SSR markers. Furthermore, it should be noted that the cDNA-SSR markers were obtained without library screening. Clearly, the development of pine sequence databases and the in silico approach described here provide a cost-effective route to SSR marker development.

In rice and wheat, EST-derived SSR markers have been reported to have a lower rate of polymorphism than SSR markers derived from genomic libraries (Cho et al. 2000; Eujayl et al. 2002). However, such differences were not found in Medicago (Eujayl et al. 2003) and Picea (Scotti et al. 2000), two highly polymorphic genera compared to the highly domesticated cereal crops. Our findings in P. pinaster revealed that non-source species genomic SSRs and cDNA-SSRs have similar levels of diversity, and thus that cDNA-SSRs are not less polymorphic.

At the intraspecific level, these markers have been mapped within the different genetic maps of P. pinaster, which will make it possible to construct a consensus map of this species. Nevertheless, more markers will be needed to reach the saturation levels desired. The markers developed in this study were also mapped in the P. pinaster genetic map that was aligned with the loblolly pine map using comparative genome mapping (Chagné et al. 2003) and so can be used as orthologous markers in other conifer species.

Conclusion

We have shown in this study that database-sourced cDNA-SSRs can be efficiently developed for, and transferred across, pine species. Pine SSR markers developed in this way are less expensive to produce and are as informative as SSR markers derived from other (genomic-based) methods. However, since these markers correspond to transcribed regions, further study is necessary to determine whether they behave as neutral markers if they are to be used in genetic diversity analysis and in association studies.

Acknowledgements

D.C. was funded by the French Ministry of Research. This research was supported by grants from France (Ministère de l’Agriculture et de la Pêche-DERF No. 61.45.80.15/02) and the European Union (TREESNIPS project: QLK3-CT-2002-01973). The maritime pine ESTs were produced with the support of the Aquitaine Région (n°2002 0307002A) and INRA (Lignome) as well as the European Union (GEMINI: QLK5-CT-1999-00942). The work at New Zealand Forest Research was funded by New Zealand’s Foundation for Research, Science and Technology (CO4X005).

Supplementary material

122_2004_1683_esm.pdf (517 kb)
Supplemental Table 2 and 3: Sequences used for the construction of the pine interpolated Markov model (PDF 47 KB)

Copyright information

© Springer-Verlag 2004