Abstract
A new method (MZEF) for predicting internal coding exons in genomic DNA sequences has been developed. The method is based on a prediction algorithm that uses the quadratic discriminant function for multivariate statistical pattern recognition. With improved feature measures, an Arabidopsis thaliana-specific implementation of MZEF has been completed and made available to the plant genome community.
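To make the idea of a quadratic discriminant function concrete, the following is a minimal sketch and not the MZEF implementation. It assumes only two feature measures per candidate (the abstract does not list the actual MZEF features, so the two scores used here, and all data values, are hypothetical), fits one Gaussian per class, and assigns a candidate to the class with the larger quadratic score.

```python
import math

def fit_class(samples):
    """Estimate the mean vector and 2x2 covariance of 2-feature samples."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def quadratic_score(x, mean, cov, prior):
    """Log(prior * Gaussian density), dropping the constant shared by all classes."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)

def classify(x, exon_model, pseudo_model, prior_exon=0.5):
    """Pick the class whose quadratic discriminant score is larger."""
    s_exon = quadratic_score(x, exon_model[0], exon_model[1], prior_exon)
    s_pseudo = quadratic_score(x, pseudo_model[0], pseudo_model[1], 1.0 - prior_exon)
    return "exon" if s_exon > s_pseudo else "pseudo-exon"

# Hypothetical training data: (splice-site score, coding-potential score).
# These values are invented for illustration only.
true_exons = [(0.8, 0.9), (0.7, 0.8), (0.9, 0.7), (0.6, 0.95), (0.85, 0.6)]
pseudo_exons = [(0.2, 0.3), (0.3, 0.1), (0.1, 0.25), (0.35, 0.4), (0.15, 0.05)]

exon_model = fit_class(true_exons)
pseudo_model = fit_class(pseudo_exons)

print(classify((0.75, 0.8), exon_model, pseudo_model))
print(classify((0.2, 0.2), exon_model, pseudo_model))
```

Because each class keeps its own covariance matrix, the decision boundary is a quadratic curve rather than a straight line, which is what distinguishes this rule from Fisher's linear discriminant. A real predictor would use more features and a general matrix inverse instead of the 2x2 closed form assumed here.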
Cite this article
Zhang, M. Identification of protein-coding regions in Arabidopsis thaliana genome based on quadratic discriminant analysis. Plant Mol Biol 37, 803–806 (1998). https://doi.org/10.1023/A:1006023912378
DOI: https://doi.org/10.1023/A:1006023912378