
Identification of protein-coding regions in Arabidopsis thaliana genome based on quadratic discriminant analysis

Plant Molecular Biology


A new method (MZEF) for predicting internal coding exons in genomic DNA sequences has been developed. The method is based on a prediction algorithm that uses the quadratic discriminant function for multivariate statistical pattern recognition. With improved feature measures, an Arabidopsis thaliana-specific implementation of MZEF has been completed and made available to the plant genome community.
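The abstract names the quadratic discriminant function but does not spell it out. In the standard Gaussian formulation, a class k (for example, true exons versus non-exon candidates) has mean vector m_k, its own covariance matrix S_k and prior pi_k, and a feature vector x receives the score delta_k(x) = -1/2 log|S_k| - 1/2 (x - m_k)^T S_k^-1 (x - m_k) + log pi_k. The sketch below is a minimal pure-Python illustration of that textbook technique, not MZEF's code. The two classes, the two-feature toy vectors, the equal priors, and the names `GaussianClass` and `exon_posterior` are all hypothetical; MZEF's actual feature measures and training data are described in the full article and are not reproduced here.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Per-feature sample mean of the training vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows: Sequence[Sequence[float]], mu: Vector) -> Matrix:
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, p = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T; a must be symmetric positive definite."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


class GaussianClass:
    """One class (hypothetically, true exons) with its own mean and covariance."""

    def __init__(self, rows: Sequence[Sequence[float]], prior: float):
        self.mu = mean_vector(rows)
        self.L = cholesky(covariance(rows, self.mu))
        self.log_prior = math.log(prior)

    def discriminant(self, x: Sequence[float]) -> float:
        """delta_k(x) = -1/2 log|S_k| - 1/2 (x - m_k)^T S_k^-1 (x - m_k) + log pi_k."""
        d = [xi - mi for xi, mi in zip(x, self.mu)]
        # Forward substitution solves L z = d, so z.z equals d^T S_k^-1 d.
        z: Vector = []
        for i, row in enumerate(self.L):
            z.append((d[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
        # log|S_k| = 2 * sum(log L_ii) for the Cholesky factor.
        log_det = 2.0 * sum(math.log(self.L[i][i]) for i in range(len(d)))
        return -0.5 * log_det - 0.5 * sum(v * v for v in z) + self.log_prior


def exon_posterior(x: Sequence[float], exon: GaussianClass,
                   non_exon: GaussianClass) -> float:
    """Two-class posterior P(exon | x); the shared (2 pi)^(-p/2) factor cancels."""
    t = non_exon.discriminant(x) - exon.discriminant(x)
    if t > 0.0:  # rewrite 1 / (1 + e^t) so that exp never overflows
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


if __name__ == "__main__":
    # Hypothetical two-feature training vectors, not real Arabidopsis data.
    exon = GaussianClass([[2.0, 1.5], [2.5, 2.0], [3.0, 2.2], [2.2, 1.0], [2.8, 1.8]], 0.5)
    non_exon = GaussianClass([[0.0, 0.2], [0.5, -0.3], [-0.4, 0.1], [0.3, 0.6], [-0.2, -0.5]], 0.5)
    for candidate in ([2.6, 1.8], [0.0, 0.0]):
        print(candidate, round(exon_posterior(candidate, exon, non_exon), 4))
```

Because each class keeps its own covariance matrix, the boundary where the two posteriors are equal is quadratic in x. Pooling the covariances into a single shared matrix would cancel the quadratic terms and reduce the rule to linear discriminant analysis.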




About this article

Cite this article

Zhang, M. Identification of protein-coding regions in Arabidopsis thaliana genome based on quadratic discriminant analysis. Plant Mol Biol 37, 803–806 (1998).
