Plant Molecular Biology, Volume 54, Issue 3, pp 461–470

Automated SNP Detection in Expressed Sequence Tags: Statistical Considerations and Application to Maritime Pine Sequences

  • Loïck Le Dantec
  • David Chagné
  • David Pot
  • Olivier Cantin
  • Pauline Garnier-Géré
  • Frank Bedon
  • Jean-Marc Frigerio
  • Philippe Chaumeil
  • Patrick Léger
  • Virginie Garcia
  • Frédéric Laigret
  • Antoine de Daruvar
  • Christophe Plomion

Abstract

We developed an automated pipeline for the detection of single nucleotide polymorphisms (SNPs) in expressed sequence tag (EST) data sets by combining three DNA sequence analysis programs: Phred, Phrap and PolyBayes. This application requires access to the individual electropherogram traces. First, a reference set of 65 SNPs was obtained by sequencing 13 maritime pine (Pinus pinaster Ait.) gene fragments (6671 bp) in 30 gametes, giving a frequency of 1 SNP every 102.6 bp. Second, the parameters of the three programs were optimized to retrieve as many true SNPs as possible while keeping the false-positive rate as low as possible. Overall, the detection efficiency for true SNPs was 83.1%. However, this rate varied considerably with the frequency of the rare SNP allele: it dropped to 41% for rare alleles with a frequency below 10% and reached 98% for allele frequencies above 10%. Third, the detection method was applied to 18498 assembled maritime pine ESTs, identifying a total of 1400 candidate SNPs in contigs containing between 4 and 20 sequence reads. These genetic resources, described here for the first time in a forest tree species, were made available at http://www.pierroton.inra/genetics/Pinesnps. We also derived an analytical expression for the SNP detection probability as a function of the SNP allele frequency, the number of haploid genomes used to generate the EST sequence database, and the sample size of the contigs considered for SNP detection. The frequency of the SNP allele was shown to be the main factor influencing the probability of SNP detection.
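The abstract reports the reference SNP density (65 SNPs in 6671 bp) and mentions an analytical detection probability that depends on the rare allele frequency, the number of haploid genomes behind the EST library, and contig size, but the formula itself is not given here. The sketch below reproduces the density arithmetic and then illustrates one simplified model of detection. It is not the authors' expression: it assumes (hypothetically) that the library comes from `n_genomes` haploid genomes, each carrying the rare allele independently with probability `p`, that every read in a contig is drawn uniformly with replacement from those genomes, and that a SNP counts as detected when each allele appears in at least `min_count` reads. All function and parameter names are illustrative.

```python
from math import comb


def snp_spacing(length_bp: float, n_snps: int) -> float:
    """Mean number of base pairs per SNP in a resequenced region."""
    return length_bp / n_snps


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_both_alleles_seen(q: float, n_reads: int, min_count: int = 1) -> float:
    """Probability that, among n_reads independent reads each carrying the
    rare allele with probability q, both alleles occur at least min_count times."""
    return sum(
        binom_pmf(x, n_reads, q)
        for x in range(min_count, n_reads - min_count + 1)
    )


def detection_probability(p: float, n_genomes: int, n_reads: int,
                          min_count: int = 1) -> float:
    """Hypothetical model (not the authors' formula): the EST library comes from
    n_genomes haploid genomes, each carrying the rare allele independently with
    probability p; each read in a contig is drawn uniformly, with replacement,
    from those genomes."""
    total = 0.0
    # j = number of library genomes carrying the rare allele; j = 0 or
    # j = n_genomes leaves the site monomorphic in the library, so it is skipped.
    for j in range(1, n_genomes):
        q = j / n_genomes  # chance that a single read carries the rare allele
        total += binom_pmf(j, n_genomes, p) * prob_both_alleles_seen(q, n_reads, min_count)
    return total


if __name__ == "__main__":
    # Reference set from the abstract: 65 SNPs in 6671 bp.
    print(f"1 SNP every {snp_spacing(6671, 65):.1f} bp")
    # Illustrative parameters only (hypothetical, not taken from the paper).
    for p in (0.05, 0.10, 0.30, 0.50):
        print(p, round(detection_probability(p, n_genomes=20, n_reads=10), 3))
```

The first line printed is "1 SNP every 102.6 bp", matching the abstract. Under these assumptions, detection rises with both `p` and `n_reads`, and raising `min_count` (as a stricter SNP caller might require) lowers detection most for rare alleles, which is consistent in direction with the reported drop for alleles below 10%.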

Keywords: EST, in silico detection, maritime pine, SNP

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Loïck Le Dantec (1, 2)
  • David Chagné (3)
  • David Pot (3)
  • Olivier Cantin (1, 2)
  • Pauline Garnier-Géré (3)
  • Frank Bedon (3)
  • Jean-Marc Frigerio (3)
  • Philippe Chaumeil (3)
  • Patrick Léger (3)
  • Virginie Garcia (4)
  • Frédéric Laigret (1)
  • Antoine de Daruvar (2)
  • Christophe Plomion (3)

  1. Unité de Recherche sur les Espèces Fruitières et la Vigne, INRA, Villenave d'Ornon Cedex, France
  2. Centre de Bioinformatique de Bordeaux, Université V. Segalen Bordeaux 2, Bordeaux Cedex, France
  3. UMR 1202 BIOGECO, INRA, Equipe de Génétique, 69 route d'Arcachon, Cestas Cedex, France
  4. UMR 619 Physiologie et Biotechnologie Végétales, INRA, IBVM, Villenave d'Ornon Cedex, France
