
Automated SNP Detection in Expressed Sequence Tags: Statistical Considerations and Application to Maritime Pine Sequences

Plant Molecular Biology

Abstract

We developed an automated pipeline for the detection of single nucleotide polymorphisms (SNPs) in expressed sequence tag (EST) data sets by combining three DNA sequence analysis programs: Phred, Phrap and PolyBayes. This application requires access to the individual electropherogram traces. First, a reference set of 65 SNPs was obtained by sequencing 30 gametes in 13 maritime pine (Pinus pinaster Ait.) gene fragments (6671 bp), corresponding to a frequency of 1 SNP every 102.6 bp. Second, the parameters of the three programs were optimized in order to retrieve as many true SNPs as possible while keeping the false positive rate as low as possible. Overall, the efficiency of detection of true SNPs was 83.1%. However, this rate varied widely as a function of the frequency of the rare SNP allele: from as low as 41% for rare SNP alleles (frequency < 10%) up to 98% for allele frequencies above 10%. Third, the detection method was applied to the 18498 assembled maritime pine ESTs, which allowed the identification of a total of 1400 candidate SNPs in contigs containing between 4 and 20 sequence reads. These genetic resources, described for the first time in a forest tree species, were made available at http://www.pierroton.inra/genetics/Pinesnps. We also derived an analytical expression for the SNP detection probability as a function of the SNP allele frequency, the number of haploid genomes used to generate the EST sequence database, and the sample size of the contigs considered for SNP detection. The frequency of the SNP allele was shown to be the main factor influencing the probability of SNP detection.
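The abstract does not reproduce the analytical expression itself, so the following is only a rough illustration of why allele frequency dominates detection, not the authors' formula. It assumes that the reads in a contig are independent draws from an effectively infinite pool (so the number of haploid genomes is ignored) and that a SNP counts as detectable when the rare allele appears in at least `min_reads` reads. The function names and the `min_reads` threshold are hypothetical. The second helper simply reproduces the reported SNP density from the figures in the abstract.

```python
from math import comb


def detection_probability(p, n, min_reads=2):
    """Probability that the rare allele shows up in at least `min_reads`
    of `n` reads, under a simple binomial sampling model.

    Assumption (not from the paper): reads are independent draws and the
    rare allele has frequency `p` in the sampled pool.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    if n < 0 or min_reads < 0:
        raise ValueError("n and min_reads must be non-negative")
    if min_reads > n:
        return 0.0
    # Sum the binomial probabilities of seeing fewer than min_reads copies,
    # then take the complement.
    p_too_few = sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min_reads)
    )
    return 1.0 - p_too_few


def snp_density(total_bp, n_snps):
    """Average number of base pairs per SNP."""
    return total_bp / n_snps


if __name__ == "__main__":
    # Contig depths of 4 and 20 reads are the bounds used in the abstract.
    for p in (0.05, 0.10, 0.30, 0.50):
        print(p, [round(detection_probability(p, n), 3) for n in (4, 20)])
    # 65 SNPs in 6671 bp, as reported in the abstract.
    print(round(snp_density(6671, 65), 1))
```

Under these simplifying assumptions, a rare allele is seldom observed twice in a shallow contig, which is qualitatively in line with the drop in detection efficiency reported for allele frequencies below 10%.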




Cite this article

Dantec, L.L., Chagné, D., Pot, D. et al. Automated SNP Detection in Expressed Sequence Tags: Statistical Considerations and Application to Maritime Pine Sequences. Plant Mol Biol 54, 461–470 (2004). https://doi.org/10.1023/B:PLAN.0000036376.11710.6f


  • DOI: https://doi.org/10.1023/B:PLAN.0000036376.11710.6f
