Theoretical and Applied Genetics, Volume 119, Issue 1, pp 151–164

Single feature polymorphisms between two rice cultivars detected using a median polish method

  • Weibo Xie
  • Ying Chen
  • Gang Zhou
  • Lei Wang
  • Chengjun Zhang
  • Jianwei Zhang
  • Jinghua Xiao
  • Tong Zhu
  • Qifa Zhang
Original Paper


Expression levels measured on oligonucleotide microarrays have been adapted as a high-throughput approach for identifying DNA sequence variation between genotypes, referred to as single feature polymorphisms (SFPs). Although interest in this approach is increasing, the algorithms still need improvement to achieve high sensitivity and specificity, especially with complex genomes and large datasets, while maintaining good computational performance. We obtained microarray expression profiles of two rice cultivars and adapted a median polish method to detect SFPs. The analysis identified 6,655 SFPs between the two rice varieties, representing 3,131 unique rice genes. We showed that the median polish method avoids fitting complex linear models and can therefore be used to analyze complex transcriptome datasets such as those in this study. The method was also superior in sensitivity, accuracy and computing time to two previously used methods. A comparison with data from a resequencing project indicated that 75.6% of the SFPs were supported by SNPs in the probe regions. Further comparison revealed that SNPs in the sequences immediately flanking the probes also contributed to the detection of SFPs in cases where the probes and the targets had perfectly matched sequences. Differences in minimum free energy caused by flanking SNPs, which may change the stability of RNA secondary structure, may partly explain the SFPs detected in such cases. These SFPs may facilitate gene discovery in future studies.
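As an illustration of the general idea, the following is a minimal sketch of Tukey's median polish applied to one probe set (rows are probes, columns are arrays), followed by a per-probe comparison of residuals between two genotypes. It is not the authors' implementation: the function names `median_polish` and `genotype_residual_gap`, the toy intensities, and the use of a simple difference of mean residuals are all assumptions made for this example, and the abstract does not specify the test statistic or thresholds actually used.

```python
from statistics import mean, median


def median_polish(table, max_iter=10, tol=1e-6):
    """Tukey's median polish on a probes x arrays table of log intensities.

    Decomposes table[i][j] into overall + row_eff[i] + col_eff[j] + resid[i][j].
    """
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        change = 0.0
        # Sweep the median out of every row (probe effect).
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
            change += abs(m)
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep the median out of every column (array effect).
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
            change += abs(m)
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
        if change < tol:
            break
    return overall, row_eff, col_eff, resid


def genotype_residual_gap(resid, cols_a, cols_b):
    """Hypothetical score: mean residual in genotype A minus genotype B, per probe."""
    return [
        mean(row[j] for j in cols_a) - mean(row[j] for j in cols_b)
        for row in resid
    ]


# Toy data (invented): 5 probes x 4 arrays, arrays 0-1 are genotype A and
# arrays 2-3 are genotype B. Probe 2 hybridizes poorly in genotype B.
probe_effects = [8.0, 9.0, 10.0, 11.0, 9.5]
array_effects = [0.0, 0.2, -0.1, 0.1]
table = [
    [p + a - (2.0 if i == 2 and j >= 2 else 0.0)
     for j, a in enumerate(array_effects)]
    for i, p in enumerate(probe_effects)
]

overall, row_eff, col_eff, resid = median_polish(table)
gaps = genotype_residual_gap(resid, cols_a=[0, 1], cols_b=[2, 3])
candidate = max(range(len(gaps)), key=lambda i: abs(gaps[i]))
print("probe with the largest residual gap:", candidate)
```

Because the probe and array effects are removed with medians rather than a fitted linear model, a single probe that behaves differently in one genotype leaves the other probes' residuals near zero and stands out in its own residuals. In practice a significance test with multiple-testing correction would replace the raw gap used here.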


Minimum Free Energy · Median Polish · Single Feature Polymorphism · Flank SNPs · Hybridization Affinity
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We thank Dr. James Ronald and Dr. Rachel B. Brem for their help and suggestions with the yeast data. This work was supported by grants from the National Special Key Project of China on Functional Genomics of Major Plants and Animals, and the National Natural Science Foundation of China.

Supplementary material

122_2009_1025_MOESM1_ESM.pdf (4 kb)
Supplementary file 1 is a figure showing the effect of SNP position on SFP detection. Description of supplementary file 1: The x axis shows the distance of the SNP position from the edge of the 25-mer probe, and the y axis shows the false negative rate of SNPs. (PDF 4 kb)
122_2009_1025_MOESM2_ESM.pdf (22 kb)
Supplementary file 2 is a figure showing the distribution of \( \Delta G_{RM - BY} \), indicating the influence of the minimum free energy of RNA on binding affinity. Description of supplementary file 2: The x axis shows the difference in minimum RNA free energy, RM minus BY (\( \Delta G_{RM - BY} \)). Grey lines indicate the 50% quantile (0.601 kJ) of all 3,439 \( \Delta G_{RM - BY} \) values. The distribution of all \( \Delta G \) values is shown in grey. The distribution for the subset of sequence pairs whose corresponding probes have higher median polish residuals for RM targets than for BY targets (\( \bar{E}_{RM} > \bar{E}_{BY} \), BH adjusted P value <0.5) is shown in red, while the distribution for probes with lower residuals for RM targets \( \left( {\bar{E}_{RM} < \bar{E}_{BY} } \right) \) is shown in green. The enrichment of positive \( \Delta G_{RM - BY} \) in the \( \bar{E}_{RM} > \bar{E}_{BY} \) group and of negative \( \Delta G_{RM - BY} \) in the \( \bar{E}_{RM} < \bar{E}_{BY} \) group gives an intuitive indication that the minimum free energy of RNA is positively correlated with the residuals and thus with binding affinity. (PDF 22 kb)



Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Weibo Xie (1)
  • Ying Chen (1)
  • Gang Zhou (1)
  • Lei Wang (1)
  • Chengjun Zhang (1)
  • Jianwei Zhang (1)
  • Jinghua Xiao (1)
  • Tong Zhu (1)
  • Qifa Zhang (1)
  1. National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
