An improved allele-specific PCR primer design method for SNP marker analysis and its application
Although the Single Nucleotide Polymorphism (SNP) marker is an invaluable tool for positional cloning, association studies and evolutionary analysis, the low SNP detection efficiency of Allele-Specific PCR (AS-PCR) still restricts its use as a molecular marker compared with markers such as the Simple Sequence Repeat (SSR). To overcome this problem, primers with a single artificial nucleotide mismatch introduced within the three bases closest to the 3’ end (the SNP site) have been used in AS-PCR. However, for one SNP site, nine possible mismatches can be generated among the three bases, and how to select the right one to increase primer specificity remains a challenge.
In this study, unlike previous reports that used a limited number of randomly chosen primers (several or a few dozen pairs), we systematically investigated the effects of mismatch base pairs, mismatch sites and SNP types on primer specificity with 2071 primer pairs designed from SNPs between Brassica oleracea lines 01-88 and 02-12. According to the statistical results, we (1) found that primers designed for the SNP type A/T with a CA mismatch at the 3rd nucleotide from the 3’ end had the highest allele-specificity (81.9%), information that can be used when designing primers from a large number of SNP sites; and (2) established a primer design principle that yields a single best primer for every SNP type, which has not been reported in previous studies. We further verified the applicability of the method in rapeseed (Brassica napus L.) and sesame (Sesamum indicum). The high polymorphism percent (75%) of the designed primers indicates that the method is general and can be applied in other species.
The method provided in this study generates primers for every SNP site more effectively than other AS-PCR primer design methods. The high allele-specific efficiency of the SNP primers makes low- to moderate-throughput SNP analysis feasible and is well suited to gene mapping, map-based cloning and marker-assisted selection in crops.
Keywords: SNP, AS-PCR, Mismatch, Polymorphism, Destabilization
SNP: Single Nucleotide Polymorphism
SSR: Simple Sequence Repeat
RFLP: Restriction fragment length polymorphism
AFLP: Amplified fragment length polymorphism
CAPS: Cleaved Amplified Polymorphic Sequence
dCAPS: Derived Cleaved Amplified Polymorphic Sequence
Single Nucleotide Polymorphisms (SNPs) are single base differences between the DNA of different individuals. Once discovered, SNPs can be converted into genetic markers that can be assayed[1, 2]. As the most abundant and stable form of genetic variation in most genomes, SNPs are better suited as genotyping markers than conventional markers such as RFLP (Restriction fragment length polymorphism), AFLP (Amplified fragment length polymorphism) and SSR (Simple Sequence Repeat). With the development of biotechnology, SNPs are becoming the favored genetic markers for marker-assisted breeding, map-based cloning, the study of evolutionary conservation between species[5, 6], and the detection of risk-associated alleles linked to human diseases.
Recently, massively parallel sequencing platforms such as GS FLX (Roche), Solexa (Illumina) and SOLiD (Applied Biosystems) have significantly reduced the cost of high-throughput sequencing. A large number of genomes and transcriptomes have been rapidly sequenced on these platforms to identify novel SNPs in maize, rapeseed, human and other species. A wide variety of techniques for high-throughput SNP genotyping have also been developed, using Taqman, Amplifluor, genome re-sequencing[14, 15], and SNP arrays[16, 17]. These techniques are expensive, require specialized equipment and cost more than assays based on standard primers, so they are not practical for low- to moderate-throughput SNP assays. Hence, there is a need for simple and accurate genotyping assays that can be implemented in laboratories lacking access to sophisticated equipment.
Traditional SNP genotyping methods such as CAPS (Cleaved Amplified Polymorphic Sequence), dCAPS (derived CAPS), and AS-PCR (Allele-specific PCR) are widely used for low-throughput applications in plant research. In practice, CAPS and dCAPS are restricted by the availability of endonuclease sites and can be inefficient and not cost-effective[18, 19, 20]. AS-PCR relies on the primer being extended only when its 3’ end is perfectly complementary to the template. In principle, SNPs can be detected using allele-specific PCR primers whose 3’ terminal nucleotide corresponds to a specific SNP site. However, this approach alone does not discriminate reliably between the alleles. To overcome this problem, allele-specific primers with an additional base pair change within the three bases closest to the SNP site have been used[21, 22]. Each specific SNP site in an allele can generate at least 18 possible primers with one mismatch base. The SNAPER program generates a list of up to 16 possible primers per SNP site for each allele. Therefore, choosing the additional mismatch that increases primer specificity has been a challenge for AS-PCR. Some studies have proposed criteria for designing AS-PCR primers. Hayashi et al. (2004) proposed that base pair mismatches created through T-G or C-A transversions at the third base from the 3’ end could increase allele-specificity. Hirotsu et al. (2010) found that the A-T transversion and the A-G transition were useful base pair mismatches for improving allele-specific amplification. The WASP tool can also be used to introduce mismatches at the penultimate (2nd from the terminal) base of the primer[26, 27]. However, most studies used only a limited number of primers, which may have limited the reliability of their conclusions about SNP primer specificity.
In this study, over 2000 primer pairs designed from SNPs between B. oleracea lines 01-88 and 02-12 were used to analyze the effects of SNP type, mismatch base and mismatch site within the three bases closest to the 3’ end on primer specificity. Based on these results, we propose an improved SNP primer design principle. Compared with traditional SNP genotyping methods, our method provides a cost-effective way to obtain highly efficient allele-specific primers and should greatly facilitate plant research.
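To illustrate how such a candidate set arises, the following Python sketch enumerates the single-base variants of an allele-specific primer at the 2nd, 3rd and 4th positions from the 3’ end. The function name mismatch_variants and the primer sequences are hypothetical and are not taken from this study or from the SNAPER program; the sketch assumes the primer is written 5’ to 3’ with its last base on the SNP.

```python
def mismatch_variants(primer):
    """Return single-mismatch variants of an allele-specific primer.

    Assumption: the primer is written 5'->3' and its last base sits on the SNP.
    Positions are counted from the 3' end, so position 1 is the SNP base and
    positions 2, 3 and 4 are the three neighbouring bases.
    """
    variants = []
    for pos in (2, 3, 4):
        idx = len(primer) - pos          # string index of this position
        original = primer[idx]
        for base in "ACGT":
            if base != original:         # three alternative bases per position
                mutated = primer[:idx] + base + primer[idx + 1:]
                variants.append((pos, original, base, mutated))
    return variants


# Hypothetical primers for the two alleles of one A/T SNP.
allele_a = "ATGCCGTTAGCAGGTCA"
allele_b = allele_a[:-1] + "T"

per_allele = mismatch_variants(allele_a)
both_alleles = per_allele + mismatch_variants(allele_b)
print(len(per_allele), len(both_alleles))
```

Each allele-specific primer yields nine variants (three positions with three alternative bases each), so the two allele-specific primers of one SNP site give 18 candidates in this sketch.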
SNP analysis of B. oleracea 01-88 and 02-12 genome sequences
Table: Putative SNPs identified between B. oleracea genomes of 01-88 and 02-12. Columns: No. of every SNP type; Total No. of transitions and transversions.
To validate the putative SNPs and estimate the proportion of false positives, 96 SNP sites derived from only 8-read sequences were chosen randomly. Primers were designed from the genome sequences flanking these SNP sites so that each amplicon was a fragment of about 500 bp containing the corresponding SNP. Sanger sequencing showed that 93 of the 96 SNPs (about 97%) were identical to the putative SNPs, while 3 were not as predicted or were undetectable. This indicates that a very high proportion of the putative SNPs truly exist between lines 01-88 and 02-12.
Design and efficiency detection of SNP primers
Table: Destabilization strength of eight combinations of mismatch nucleotide pairing. Columns: Mismatch types at 3’ end of primer in 01-88; Destabilization strength of mismatch type.
Table: Effect of mismatch sites and SNP types on the specificity of allele-specific PCR. Columns: Mismatch sites closest to the 3’ end of primers; Polymorphism percent of SNP primers; Polymorphism percent of primers for every SNP type.
Table: Effect of artificial base mismatches in three mismatch sites on the specificity of allele-specific PCR. Columns: Mismatch sites closest to the 3’ end of primers; Polymorphism percent of primers.
Application of the design method
Besides the most polymorphic primers mentioned above, highly polymorphic primers could be found for every SNP type based on our results. The principle of primer design is as follows. First, for every kind of SNP, the mismatch site (the 2nd, 3rd or 4th site closest to the SNP site) is chosen according to the polymorphism percents in Table 3. For the three SNP types A/T (T/A), A/C (G/T) and A/G (C/T), the highest polymorphism percents are 45.9%, 37.4% and 30.7%, respectively, and in each case the mismatch is located at the 3rd site. For the SNP type C/G (G/C), mismatches at the 2nd site closest to the 3’ end of the primer show the highest polymorphism percent (43.3%). Second, for the base at the chosen mismatch site, there are three possible mismatches, and the best one is chosen according to the statistical results in Table 4. For example, if there is a G at the 3rd site, the mismatches CC, CA and CT can be formed. The polymorphism percent of primers with the CA mismatch (46.8%) is higher than that of CC (23.9%) and CT (32.1%).
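The two-step selection described in this paragraph can be sketched in Python as follows. The site choices and percentages are those quoted in the text; the mismatch lookup holds only the worked example (a G at the 3rd site), so other bases return None rather than a guessed value. All names (SITE_BY_CLASS, choose_site, choose_mismatch and so on) are hypothetical and serve only as an illustration.

```python
# Best mismatch site (counted from the 3' end) and its polymorphism percent
# for each SNP class, as quoted in the text.
SITE_BY_CLASS = {
    "A/T": (3, 45.9),
    "A/C": (3, 37.4),
    "A/G": (3, 30.7),
    "C/G": (2, 43.3),
}

# Complementary SNP pairs are treated as one class, e.g. A/C and G/T.
CLASS_BY_ALLELES = {
    frozenset("AT"): "A/T",
    frozenset("AC"): "A/C", frozenset("GT"): "A/C",
    frozenset("AG"): "A/G", frozenset("CT"): "A/G",
    frozenset("CG"): "C/G",
}

# Polymorphism percent per mismatch type. Only the worked example from the
# text is included here: a G at the 3rd site.
MISMATCH_PERCENT = {(3, "G"): {"CC": 23.9, "CA": 46.8, "CT": 32.1}}


def choose_site(allele_1, allele_2):
    """Step 1: pick the mismatch site from the SNP class."""
    snp_class = CLASS_BY_ALLELES[frozenset((allele_1, allele_2))]
    return SITE_BY_CLASS[snp_class][0]


def choose_mismatch(site, base):
    """Step 2: pick the mismatch type with the highest polymorphism percent.

    Returns None when no statistics are available for this site and base.
    """
    options = MISMATCH_PERCENT.get((site, base))
    if options is None:
        return None
    return max(options, key=options.get)


site = choose_site("T", "A")
print(site, choose_mismatch(site, "G"))
```

For a T/A SNP with a G at the chosen site, the sketch selects the 3rd site and the CA mismatch, matching the example in the text.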
Table: 20 primer pairs designed according to SNPs between rapeseed zy036 and 51070. Columns: Mismatch site closest to 3’ end of primers; Mismatch base pairs.
In recent years, various methods for high-throughput SNP analysis have been described. Although these methods are highly efficient compared with traditional SNP genotyping by electrophoresis, the significant investment in expensive probes, microchips or special instrumentation has limited their use in most laboratories. For traditional low-throughput SNP genotyping methods, the considerable time and labor required and the low efficiency of specific primers are still challenges. The allele-specific PCR method was developed for allele analysis of clinically significant mutations. To achieve reliable discrimination between two alleles, the addition of artificial mismatches within the three bases from the 3’ end of the primers can be beneficial. Although the third position from the 3’ end has previously been identified as the best place for a mismatch base in the primer, it was not known which kind of mismatch (for every base, there are three kinds of mismatches) is the best choice at the 3rd position. In this study, unlike previous reports that used only a limited number of primers, a large number of SNP primers designed by introducing mismatches within the three bases closest to the 3’ end were used to address this question.
Generally, AS-PCR primers designed randomly have a low allelic specificity rate of approximately 30%, which is consistent with our results (29.1%). However, the mismatch sites (2nd, 3rd and 4th site closest to the 3’ end) had different effects on the polymorphic efficiency of primers. In our study, the polymorphism percent was lowest at the 2nd base because many primers did not amplify any band in either line 01-88 or line 02-12. At the 4th base, the polymorphism efficiencies of all mismatch types were almost equivalent (under 30%). The highest polymorphism percent was found at the 3rd base from the SNP site, as also observed by Hayashi et al. (2004).
According to the thermodynamics of mismatches reported by Peyret et al. (1999) and Little (2001)[28, 29], mismatch base pairs have different destabilization effects, which can be divided into weak, medium and strong destabilization strength. Therefore, when designing AS-PCR primers, the effects of mismatches both at the 3’ end and within the three bases closest to the 3’ end should be considered[26, 27]. In this study, for convenience of analysis, we grouped the SNPs into four types, A/T (T/A), A/G (T/C), A/C (T/G) and G/C (C/G), based on the destabilization effects of the mismatch base pairs formed at the 3’ end of the primers. Among them, primers generated from the SNP type A/G (T/C) had the lowest detection efficiency at all mismatch sites. This is reasonable because AC and GT mismatches have weak destabilization strength, so primers carrying these mismatches at the 3’ end are more likely to amplify from both alleles.
Similarly, the mismatch type within the three bases closest to the 3’ end affects primer specificity. At the 3rd base, CA and TG (the most polymorphic mismatches) are mismatches of weak destabilization strength. The mismatches GA, TC, TT and CC (the more polymorphic mismatches) at the 4th base from the SNP site are mismatches of strong destabilization strength. From the results, we deduced that primers for the SNP type A/T containing a CA mismatch at the 3rd nucleotide from the 3’ end had the highest allele specificity. According to the combination rules, the combination of TT (mismatch at the 3’ end of the primer, strong destabilization strength) with CA (weak destabilization strength) is expected to give higher polymorphic efficiency than the combination of AA (mismatch at the 3’ end of the primer, medium destabilization strength) with CA. Our results confirmed this deduction.
Based on these results, we established a primer design principle that yields a single best primer for every SNP type. Notably, mismatches at the second position were more appropriate for the SNP type C/G (G/C), which differs from the view that a mismatch at the 3rd position is the best choice for AS-PCR. Using the primer design principle, we further tested primers designed from SNPs of rapeseed and sesame. The high polymorphism efficiency of these primers confirmed the usability of the method in other species.
An SNP primer design method was developed that greatly improves the polymorphism efficiency of AS-PCR primers. The modified primer design helps to identify the most effective primer for each SNP and is potentially a valuable tool for gene mapping, map-based cloning and marker-assisted selection in crops.
Plant materials and SNP information
At least 20 µg of genomic DNA from each of the B. oleracea lines 01-88 and 02-12, at a concentration of ≥50 ng/µl, was sent for Solexa sequencing as a commercial service. The DNA was fragmented into small pieces using divalent cations at elevated temperature. The cleaved short DNA fragments were prepared for Solexa sequencing at BGI (China). REPEAT MASTER was used with default parameters to screen repeated sequences and to label the sequences from the different materials. To locate fragments in the genome, SOAP with default parameter values was used for the initial alignment and screening to avoid the effects of paralogs. SNP primer design was performed using the screened results.
SNP analysis and verification by Sanger sequencing
To verify the putative SNPs, 96 SNP sites derived from only 8-read sequences were randomly chosen between B. oleracea lines 01-88 and 02-12. Primers (Sangon, China) were designed to amplify fragments of about 500 bp containing the corresponding SNPs. The PCR reaction contained 25 ng DNA, 0.2 mM dNTP, 0.5 U Taq (MBI, USA) with 1× buffer, and 5 pM of each primer. PCR parameters were as follows: pre-denaturation at 94°C for 2 min, 35 cycles of amplification (94°C for 30 s, 60°C for 1 min and 72°C for 1 min) and a final extension at 72°C for 5 min. PCR products were detected by electrophoresis on a 1.0% agarose gel and ligated into the pMD18-T vector (Takara, Japan) for SNP identification.
Primer design and testing
Allele-specific primers corresponding to 12 kinds of SNP in B. oleracea were designed according to different combinations of mismatch base and mismatch site. Melting temperature, primer length and amplified product length were optimized using the primer program WebSNAPER (http://pga.mgh.harvard.edu/cgi-bin/snap3/websnaper3.cgi). Primer sequences were screened against B. oleracea genome repetitive sequences to minimize mis-priming.
Polymorphism assays of the SNP primers were performed by PCR and detected by agarose gel electrophoresis. All forward primers are allele-specific for B. oleracea line 02-12, and the reverse primer is not allele-specific. Amplification with the SNP primers was performed on a C1000™ Thermal Cycler (Bio-Rad, USA) using 20 µl reactions. Before carrying out this study, we compared several Taq polymerases to identify their effects on amplification efficiency: MBI Taq DNA Polymerase and Takara Taq (two general Taq polymerases), and MBI DreamTaq™ DNA Polymerase and Takara Ex Taq (which have better amplification efficiency and sensitivity than general Taq polymerases). The results showed that the two general polymerases (MBI Taq DNA Polymerase and Takara Taq) had the same allele-specific amplification efficiency, which was higher than that of the other two polymerases. Therefore, general Taq polymerase is the better choice for allele-specific PCR, and Taq DNA Polymerase from MBI was chosen for this study. The PCR reaction contained 25 ng DNA, 0.2 mM dNTP, 0.5 U Taq DNA Polymerase with 1× buffer, and 5 pM of each primer. PCR parameters were as follows: pre-denaturation at 94°C for 2 min, 35 cycles of amplification (94°C for 30 s, 55°C-65°C for 1 min and 72°C for 30 s) and a final extension at 72°C for 10 min. PCR products were separated by electrophoresis on a 2.5% agarose gel.
Application in rapeseed and sesame
Rapeseed DNA samples, including two parents (the high oil content line zy036 and the low oil content line 51070) and DH lines reported by Hua et al., were prepared using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA). Zy036 and 51070 were re-sequenced and aligned by BLAST against the B. napus genome sequence (unpublished). Additionally, two sesame lines, 28-31 and ZZM2289 (genome sequence not yet published), were also used in our research. All SNPs were chosen according to the method described for B. oleracea lines 01-88 and 02-12. The SNP primers were designed according to our primer design method.
This study was supported by the National Key Basic Research Program of China (2011CB109300), National 863 plans projects (2012AA101107), and Key Projects in the National Science & Technology Pillar Program (2010BAD01B02).
- 6. Hillier LW, Miller RD, Baird SE, Chinwalla A, Fulton LA, Koboldt DC, Waterston RH: Comparison of C. elegans and C. briggsae genome sequences reveals extensive conservation of chromosome organization and synteny. PLoS Biol. 2007, 5: e167. 10.1371/journal.pbio.0050167.
- 16. Matsuzaki H, Dong S, Loi H, Di X, Liu G, Hubbell E, Law J, Berntsen T, Chadha M, Hui H, Yang G, Kennedy GC, Webster TA, Cawley S, Walsh PS, Jones KW, Fodor SP, Mei R: Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat Methods. 2004, 1: 109-111. 10.1038/nmeth718.
- 17. Shen R, Fan JB, Campbell D, Chang W, Chen J, Doucet D, Yeakley J, Bibikova M, Wickham Garcia E, McBride C, Steemers F, Garcia F, Kermani BG, Gunderson K, Oliphant A: High-throughput SNP genotyping on universal bead arrays. Mutat Res. 2005, 573: 70-82. 10.1016/j.mrfmmm.2004.07.022.
- 23. Drenkard E, Richter BG, Rozen S, Stutius LM, Angell NA, Mindrinos M, Cho RJ, Oefner PJ, Davis RW, Ausubel FM: A simple procedure for the analysis of single nucleotide polymorphisms facilitates map-based cloning in Arabidopsis. Plant Physiol. 2000, 124: 1483-1492. 10.1104/pp.124.4.1483.
- 29. Little S: Amplification-refractory mutation system (ARMS) analysis of point mutations. Curr Protoc Hum Genet. 2001, 9: 9.8.1-9.8.12.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.