Background

Almost all protocols applied to isolate microsatellites de novo include the construction of partial genomic libraries (selected for small insert size) followed by cumbersome screening steps with hybridization probes [2]. Here, we introduce an improved approach called TOMMI (Targeted Oligonucleotide-Mediated Microsatellite Identification) to develop microsatellites by straightforward sequencing, with repeat-containing oligonucleotides, of clones isolated from large-insert libraries such as PAC (P1-derived Artificial Chromosome) and BAC (Bacterial Artificial Chromosome) libraries. The need to specifically identify and isolate STS-markers from these types of libraries is evident for two reasons. First, large-insert libraries are predominantly used in animal genetics, e.g. [3, 4], as tools to identify candidate genes or to generate overlapping contigs of chromosomal regions that are associated with quantitative or economic trait loci (QTL or ETL). Secondly, the overall number of microsatellites present in a genome depends mainly on the complexity and size of that genome. Assuming a total size of 3 × 10^9 bp and an estimated frequency of one dinucleotide repeat every 30–50 kb in mammals (as reviewed by [5]), a genome-wide figure of 100,000 microsatellite markers of that kind can be assumed [6]. However, only approximately 1,200 porcine microsatellites have been reported so far [7]. Furthermore, both the total number and the distribution of the loci are still not sufficient to provide well-distributed microsatellite coverage throughout the genome or for several chromosomes, e.g. SSC18 [8]. The objective of the present study was the selective generation of microsatellites from PAC clones that had been isolated, prior to STS development, from the porcine PAC library TAIGP714 [3] by a three-dimensional PCR screening strategy [9]. 
Eight of the eleven clones harbored functional or positional candidate genes involved in health, reproduction, production, and regulation, whereas the other three clones have been used in the attempt to construct a PAC contig covering SSC16q11-13 (Table 1).
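The genome-wide estimate quoted above can be reproduced with a few lines of arithmetic. The following is only an illustrative sketch: the function name is hypothetical, and the calculation assumes that repeats are spread evenly over a genome of 3 × 10^9 bp.

```python
GENOME_SIZE_BP = 3e9  # total genome size assumed in the text

def expected_repeat_count(genome_size_bp: float, spacing_bp: float) -> int:
    """Expected number of loci if one repeat occurs every `spacing_bp` bases."""
    return round(genome_size_bp / spacing_bp)

low = expected_repeat_count(GENOME_SIZE_BP, 50_000)   # one repeat per 50 kb
high = expected_repeat_count(GENOME_SIZE_BP, 30_000)  # one repeat per 30 kb
reported = 1_200  # porcine microsatellites reported so far [7]

print(f"expected loci: {low:,} to {high:,}")
print(f"share reported so far: {reported / high:.1%} to {reported / low:.1%}")
```

Under these assumptions the expected number of dinucleotide repeat loci lies between 60,000 and 100,000, so the roughly 1,200 reported porcine microsatellites correspond to about 1 to 2% of the loci that could be developed.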

Table 1 Primers used for selecting the PAC clones from TAIGP714 large-insert library

Results and discussion

Fifteen of the seventeen microsatellites (Table 2) were developed with sequencing primers containing one selective nucleotide at the 3'-end: (CA)8T (S0701, S0703, and S0767), (CA)8A (S0702, S0704, and S0710), (CA)8G (S0705, S0706, S0712, and S0766), (AC)8C (S0709), (AC)8G (S0707 and S0715), and (AC)8T (S0708 and S0711). Microsatellites S0713 and S0714 could be characterized only after the PAC clone sequences were discriminated more finely with sequencing primers extended at the 3'-end by a second nucleotide [(CA)8AT for S0713 and (CA)8GC for S0714]. The second nucleotide became necessary because the respective clones TAIGP714L02061Q (for S0713) and TAIGP714I23038Q (for S0714) contained additional (CA)8A or (CA)8G primer binding regions or motifs. In contrast, a further extension to three nucleotides at the 3'-ends of the primers was either not required or did not yield additional microsatellites in any of the PAC clones. Therefore, we conclude that repeat primers with two 3'-nucleotides next to the repeat motif are sufficient to detect and sequence all repeats potentially present on a large-insert library clone. The results of our isolation strategy also indicate that two sequencing reactions (the reverse sequencing primer being designed from the sequence obtained in the first reaction) are sufficient in most cases to obtain sequence information of high enough quality to amplify microsatellites (Table 2). Sequencing primers that were degenerate at the 3'-end, however, proved inadequate, as they yielded no sequence information at all. Also, to avoid overlapping primary sequences, oligonucleotides that merely extend the dinucleotide repeat at the 3'-end, such as (CA)8C and (AC)8A, are not recommended. TOMMI proved to be an efficient and reliable isolation strategy. Besides new STS-markers, six previously described microsatellites were also detected. 
Three of these loci, microsatellites S0111 [10], SW742 [11], and SW813 [12], were initially used as probes for the isolation of clones TAIGP714L02061Q, TAIGP714I23038Q, and TAIGP714F10061Q. The other three previously described microsatellite sequences reside on TAIGP714C09004Q [GenBank: AJ440949 (repeat location: 3172–3231) and GenBank: AJ440950 (repeat location: 15831–15860 and 16007–16038)]. They were not considered further in this study because they are not novel. Independently of our effort, two other groups [13, 14] introduced similar sequencing approaches to generate microsatellites from large-insert libraries. There are, however, several differences between our approach and those of the other groups in terms of sequence generation and selective amplification of microsatellites. In contrast to Waldbieser and colleagues [14], who used trinucleotide repeat containing primers for sequencing, neither of our gene-specific primers is 5'-tailed with extra nucleotide stretches to enable product labeling or to promote alleged non-template adenylation. Our approach differs from that of Fujishima-Kanaya's group [13] in four respects. First, they used longer repeat stretches in the primer [(CA/GT)10 instead of (CA/GT)8]. Secondly, their sequencing primers generally carried three selective nucleotides at the 3'-end adjacent to the repeat motif (e.g. CNA/GVG); the first of the three terminal nucleotides was always identical to the starting nucleotide of the dinucleotide repeat primer used. In addition, their primers contained a degenerate base, according to the International Union of Biochemistry (IUB) codes, at the second position from or directly at the 3'-end. Thirdly, the double-stranded primary DNA sequence stretch was determined by four sequencing reactions, using a CA-repeat containing primer plus a GT-repeat containing primer heading in the opposite direction, together with two reverse primers developed from the obtained sequence. 
Finally, they always designed an additional primer pair for the specific amplification of the microsatellite. In contrast, we used the single reverse sequencing primer in combination with a newly developed sequence specific primer (S0766 and S0767) or designed a new primer pair to amplify the microsatellite (S0701 to S0715).
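The primer design rule used for TOMMI can be sketched in code. This is a minimal illustration, not the procedure used to order the oligonucleotides: the function name `anchored_primers` is hypothetical, and applying the exclusion rule to two-base anchors as well is an assumption (the text only names the one-base cases (CA)8C and (AC)8A).

```python
from itertools import product

BASES = "ACGT"

def anchored_primers(unit: str, copies: int = 8, max_anchor: int = 2) -> list[str]:
    """Return repeat primers carrying 1..max_anchor selective 3' nucleotides.

    Anchors whose first base equals the first base of the repeat unit are
    skipped, because they merely extend the repeat, e.g. (CA)8C or (AC)8A.
    Assumption: the same rule is applied to two-base anchors.
    """
    repeat = unit * copies
    primers = []
    for length in range(1, max_anchor + 1):
        for anchor in product(BASES, repeat=length):
            if anchor[0] == unit[0]:
                continue  # anchor would only prolong the dinucleotide repeat
            primers.append(repeat + "".join(anchor))
    return primers

# One selective nucleotide: (CA)8A, (CA)8G, (CA)8T
for primer in anchored_primers("CA", max_anchor=1):
    print(primer)
```

With `max_anchor=1` the sketch yields exactly the three (CA)8 primers and, for the unit `"AC"`, the three (AC)8 primers listed above. With `max_anchor=2` the candidate set also contains the extended primers (CA)8AT and (CA)8GC that were needed for S0713 and S0714.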

Table 2 Forward and reverse sequencing primer

The observed number of alleles per locus in the heterogeneous sampling (the monomorphic locus S0709 is not included in this calculation) ranged from 2 (S0702) to 22 (S0713), with an average of 9.94 alleles. NE ranged from 1.05 to 11.54, and both HT and PIC ranged from 0.05 to 0.91 (Table 3).

Table 3 Characteristics of TOMMI-microsatellites

Because they were isolated from partial genomic libraries selected for small insert sizes, most of the publicly available porcine microsatellites lie within DNA fragments of about 80 to 200 bp. Their combination in multiplex assays is therefore hampered, especially when different annealing temperatures and technical limitations of the automated sequencers (a limited number of available fluorescent dyes) are also taken into account. Hence, a higher number of genotypes per run can only be achieved by integrating STS-markers that cover a larger allelic size spectrum. We therefore focused on developing large amplicons for microsatellites by utilizing as much sequence information as possible for primer design. Indeed, fourteen STS-markers had allele sizes of at least 200 bp, and for five of the isolated microsatellites the sequence information proved good enough to amplify alleles of at least 300 bp (Table 3).

By the guided isolation of STS-markers S0709 to S0715 from three SSC16q derived PAC clones (relative position 0 cM to 9.3 cM [7]; 2.33 STS-markers per clone), the marker density in this chromosomal region was improved remarkably. An average of 1.25 new microsatellites per clone was isolated from the eight PAC clones harboring functional candidate genes (S0701–S0708; S0766 and S0767). Considering all PAC clones used and all STS-markers developed, 1.55 microsatellites per clone were isolated. As the PAC clones had an average length of 80 kb (as shown by pulsed-field gel electrophoresis), the reported frequency of one dinucleotide repeat every (30 to) 50 kb [5] was approximately confirmed. TOMMI therefore holds the potential to identify existing STS-markers linked or adjacent to, e.g., candidate genes on large-insert library clones. Thus, in combination with a genome scan, putative candidate genes could either be converted to or excluded as positional candidate genes prior to their complete structural characterization, including SNP detection. Linkage mapping results for S0701, S0705, S0707, S0711, S0712, S0713, S0715, and S0766 are presented in Table 4. A comparison of their mapping positions with QTL positions (Pig Quantitative Trait Loci (QTL) database [15]) reveals that S0705 (64.22 cM), S0707 (43.19 cM), and S0766 (102.50 cM) reside on the respective chromosomes exactly at QTL locations (S0705: backfat between the last 3rd and 4th rib; S0707: early growth rate and water holding capacity; S0766: backfat thickness at first rib and intra-muscular fat). The other STS-markers are located in QTL spans of ± 5 cM. This indicates their immediate potential to further dissect the respective QTL regions.
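The per-clone yields for the SSC16q clones and for the complete clone set can be checked against the published repeat frequency with a short calculation. This is a sketch only: the function name is hypothetical, and the expected values assume that repeats are spaced evenly along an 80 kb insert.

```python
def per_clone(markers: int, clones: int) -> float:
    """Average number of isolated microsatellites per PAC clone."""
    return markers / clones

ssc16q = per_clone(7, 3)     # S0709 to S0715 from three SSC16q clones
overall = per_clone(17, 11)  # all seventeen markers from all eleven clones

CLONE_KB = 80  # average PAC insert size from pulsed-field gel electrophoresis
expected_low = CLONE_KB / 50   # one repeat per 50 kb
expected_high = CLONE_KB / 30  # one repeat per 30 kb

print(f"SSC16q clones: {ssc16q:.2f} per clone")
print(f"all clones:    {overall:.2f} per clone")
print(f"expected from [5]: {expected_low:.2f} to {expected_high:.2f} per clone")
```

The overall yield of 1.55 markers per clone lies just below the 1.6 repeats expected for one repeat every 50 kb, which is the sense in which the published frequency is confirmed.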

Table 4 MARC marker information and linkage mapping results

Conclusion

The sequencing strategy described in this study provides a targeted, inexpensive and fast method to develop microsatellites from large-insert libraries. It is also well suited to generating polymorphic markers for selected chromosomal regions and contigs of overlapping clones, and it yielded sufficient high-quality sequence data to develop marker amplicons greater than 250 bases.

Methods

PAC clone isolation and physical mapping

Prior to STS development, a total of 11 clones were isolated from the porcine PAC library TAIGP714 [3] by a three-dimensional PCR screening strategy. PAC-DNA preparations were done according to the manufacturer's protocol (Qiagen, Hilden, Germany). The physical assignment of the PAC clones was performed by fluorescence in situ hybridization (FISH) as described in [16] or alternatively by analysis of the INRA-UMN porcine radiation hybrid (IMpRH) panel [17]. Microsatellite primers (Table 3) were used to RH map S0703, S0704 and S0708–S0715. Marker assignment of S0701, S0702, S0705–S0707, S0766 and S0767 was performed with primers from further sequence segments of the PAC clones.

Microsatellite generation and characterization

All sequencing reactions and the separation of microsatellites were performed on an ABI PRISM® 3100 DNA analyzer (ABI, Weiterstadt, Germany). Sequencing reactions were done using the BigDye™ Terminator (v 3.0) Cycle Sequencing Kit (ABI, Weiterstadt, Germany). DNA sequencing was performed using 10 pmol of the respective oligonucleotide, 1 μl BigDye Premix and 50–100 ng of purified plasmid DNA as template in a total volume of 10 μl. Sequencing conditions were 96°C for 30 s followed by 30 cycles of 96°C for 10 s, the respective annealing temperature for 5 s and 60°C for 4 min. The optimal annealing temperature for the repeat-containing primer was between 50°C and 52°C, except for the generation of sequences for S0714, for which it was 56°C. To generate STS-markers, oligonucleotides containing the repeat motif (CA)8 or (AC)8 at the 5'-end and a few (one or two) non-repetitive bases at the 3'-end were first used as sequencing primers. Based on the obtained sequence, specific primers were developed and used as reverse oligonucleotides to determine the composition of the repeat region and its 5'-flanking region (Table 2; Figure 1). BLAST comparison followed sequence determination to verify the novelty and uniqueness of the obtained sequences. Depending on the quality of the sequenced stretch, primers were developed to amplify seventeen STS-markers (S0701 to S0715; S0766 and S0767; Table 3). To confirm the sequence identity of the respective microsatellites [GenBank: AY253989 to AY254003, AY731063, and AY731064] on genomic DNA, the resulting PCR products were subcloned into the polylinker of the pGEM®-T vector (Promega, Mannheim, Germany), and three independent clones each were bi-directionally sequenced using the standard sequencing primers SP6 (5'-ATT TAG GTG ACA CTA TAG AA-3') and T7 (5'-TAA TAC GAC TCA CTA TAG GG-3').
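Once a reverse read is available, the repeat region and its flanks have to be located in the sequence. The sketch below shows one possible way to do this with a regular expression; it is not the software used in this study. The function name `find_ca_repeats`, the minimum of eight copies and the toy read are all assumptions made for illustration.

```python
import re

def find_ca_repeats(seq: str, min_copies: int = 8) -> list[tuple[int, int, str]]:
    """Locate perfect CA/AC dinucleotide runs and their TG/GT complements.

    Returns (start, end, repeat) tuples with 1-based inclusive coordinates.
    A read of the opposite strand shows (TG)n instead of (CA)n, so both
    orientations are searched.
    """
    pattern = re.compile(r"(?:CA){%d,}|(?:TG){%d,}" % (min_copies, min_copies))
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(seq.upper())]

# Hypothetical toy read: 5 bp flank, (CA)12, 5 bp flank
toy_read = "GATTG" + "CA" * 12 + "GGTAC"
for start, end, repeat in find_ca_repeats(toy_read):
    print(start, end, len(repeat) // 2, "copies")
```

The coordinates are reported in the same 1-based style as the repeat locations given for the GenBank entries above. Interrupted or compound repeats would need a more tolerant pattern than this perfect-repeat expression.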

Figure 1 Generation of STS-markers by TOMMI.

Evaluation of microsatellites and size determination of alleles were done with the ABI software packages GENESCAN (3.7) and GENOTYPER (3.6) using GENESCAN™-500ROX™ as internal size standard. Oligonucleotides were designed with the Oligo Selection Program [18] and synthesized by MWG Biotech (Ebersberg, Germany). To characterize size range, number of alleles, polymorphism information content (PIC), average heterozygosity (HT) and effective allele number (NE) of the microsatellites, STS-markers were amplified separately. PCR assays were performed at 54°C for S0706, S0708, S0712, S0713, S0714, and S0767, at 56°C for S0701, S0702, S0703, S0705, S0707 and S0715, and at 58°C for S0704, S0709, S0710, S0711, and S0766 in a RoboCycler Gradient 96® (Stratagene, La Jolla, USA) using PURE Taq Ready-To-Go PCR Beads® (Amersham Biosciences, Freiburg, Germany), along with the respective oligonucleotides (one of each pair labeled at the 5'-end with one of the fluorescent dyes FAM, JOE or NED) and 50 ng of genomic porcine DNA in a volume of 12.5 μl (final concentrations: 100 μM of each dNTP in 10 mM Tris-HCl (pH 9.0 at room temperature), 50 mM KCl and 1.5 mM MgCl2). In total, 336 unrelated pigs representing nine European breeds (9 Angeln Saddleback, 18 Bunte Bentheimer, 9 German Edelschwein, 15 German Landrace, 30 Hampshire, 27 Göttingen Minipig, 31 Pietrain, 12 Swabian-Haellian Swine, and 7 European Wild Boar), and six Chinese breeds (30 Chinese Jiangquhai, 28 Chinese Luchuan, 30 Chinese Minpig, 30 Chinese Rongchang, 30 Chinese Tibetan, and 30 Chinese Yushanhei) were investigated. The standard PCR profile was as follows: pre-denaturation at 92°C for 2 min, followed by 35 cycles of 92°C for 30 s, the optimal annealing temperature for 30 s, and 72°C for 30 s. The final cycle ended with an extension at 72°C for 10 min. PIC, HT and NE were estimated with the algorithms introduced by Botstein and colleagues [19], Nei [20], and Kimura and Crow [21], respectively.
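For orientation, the three diversity statistics can be written down compactly from allele frequencies. The sketch below uses the textbook forms of the estimators: PIC after Botstein and colleagues, gene diversity as 1 minus the sum of squared allele frequencies, and the effective allele number as the reciprocal of that sum. The function names are hypothetical, and it is an assumption that no small-sample correction was applied to the heterozygosity.

```python
from itertools import combinations

def heterozygosity(freqs):
    """Gene diversity: H = 1 - sum(p_i^2). Assumes no sample-size correction."""
    return 1.0 - sum(p * p for p in freqs)

def effective_alleles(freqs):
    """Effective number of alleles: NE = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross

# Hypothetical locus with two equally frequent alleles
freqs = [0.5, 0.5]
print(heterozygosity(freqs), effective_alleles(freqs), pic(freqs))
```

Under these definitions NE and H are linked by H = 1 - 1/NE, which agrees with the reported extremes: NE = 1.05 gives H of about 0.05, and NE = 11.54 gives H of about 0.91.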

Linkage mapping of STS-markers on the USDA-MARC linkage map

Seven families of the MARC Swine Reference Population were genotyped as described [22]. Amplified DNA was radioactively labeled, separated by denaturing polyacrylamide gel electrophoresis and visualized with autoradiography. To ensure accurate sizing and discrimination of alleles, amplification primers were redesigned to yield smaller products for all markers except S0706, S0707 and S0709. S0767 was not tested in this population. Four markers were not informative in the MARC Swine Reference Population (S0702, S0706, S0709 and S0714) and four primer sets failed to produce reliable products (S0703, S0704, S0708 and S0710). Genotypes were determined and entered into the MARC Genome Database. Each marker was initially assigned to a chromosome based on TWOPOINT results of CRIMAP [23], then multipoint linkage analyses determined the final location of each marker. Genotypic data were evaluated with CHROMPIC and corrections made if necessary. The final position reported is based on the current MARC swine linkage map. Amplification primers for the eight successfully mapped markers are presented in Table 4.