Molecular Genetics and Genomics, Volume 270, Issue 1, pp 24–33

Snipping polymorphisms from large EST collections in barley (Hordeum vulgare L.)

  • R. Kota
  • S. Rudd
  • A. Facius
  • G. Kolesov
  • T. Thiel
  • H. Zhang
  • N. Stein
  • K. Mayer
  • A. Graner
Original Paper


The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. To verify these candidates, we selected a small subset of 63 SNPs located in 36 ESTs. Of these 63 SNPs, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those obtained by random sequencing. These results demonstrate that the strategy identifies SNPs efficiently and enables the cost-efficient development of EST-based SNP markers.


Keywords: Single-nucleotide polymorphisms (SNPs); Expressed sequence tags (ESTs); Denaturing high-performance liquid chromatography (DHPLC); Data mining; Bioinformatics



Copyright information

© Springer-Verlag 2003

Authors and Affiliations

  • R. Kota (1)
  • S. Rudd (2)
  • A. Facius (2)
  • G. Kolesov (2)
  • T. Thiel (1)
  • H. Zhang (1)
  • N. Stein (1)
  • K. Mayer (2)
  • A. Graner (1)

  1. Institute for Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
  2. MIPS-Institute for Bioinformatics, National Research Center for Environment and Health (GSF), Neuherberg, Germany
