Abstract
Motivation: High-accuracy splicing-site recognition in rice (Oryza sativa L.) DNA sequences is especially difficult. We describe a new method for splicing-site recognition in rice DNA sequences. Method: Because introns in eukaryotic organisms conform to the GT-AG rule, we used support vector machines (SVM) to predict the splicing sites. We trained a model by machine learning and evaluated it on a test data set of true and pseudo splicing sites. Results: The prediction accuracy was 87.53% for true 5′ end splicing sites and 87.37% for true 3′ end splicing sites. These results suggest that the SVM approach can achieve higher accuracy than previous approaches.
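To make the GT-AG step concrete, the following is a minimal sketch of how candidate sites might be located and turned into feature vectors before being passed to an SVM. It is not the authors' implementation: the function names `candidate_sites` and `encode_window`, the one-hot encoding, the three-base flanking window, and the example sequence are all assumptions made for illustration. The abstract does not state the window size or encoding that was actually used.

```python
# Illustrative sketch only. Window size, encoding, and the example
# sequence are assumptions, not settings taken from the paper.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def candidate_sites(seq, dinucleotide):
    """Return the start index of every occurrence of a dinucleotide.

    Under the GT-AG rule, 'GT' marks candidate 5' (donor) sites and
    'AG' marks candidate 3' (acceptor) sites. Most hits are pseudo sites.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def encode_window(seq, pos, upstream=3, downstream=3):
    """One-hot encode the bases around a candidate site.

    Returns None when the window would extend past either end of seq.
    """
    seq = seq.upper()
    start, end = pos - upstream, pos + 2 + downstream
    if start < 0 or end > len(seq):
        return None
    vec = []
    for base in seq[start:end]:
        # Ambiguous bases such as N are encoded as all zeros.
        vec.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return vec


seq = "CCAGGTAAGTCTTTTCAGGAT"  # hypothetical toy sequence
donors = candidate_sites(seq, "GT")
acceptors = candidate_sites(seq, "AG")
features = [encode_window(seq, p) for p in donors]
```

Each vector in `features`, paired with a true or pseudo label, could then be used to train a binary classifier such as an SVM (for example with LIBSVM or scikit-learn's `sklearn.svm.SVC`), with separate models for the 5′ and 3′ sites.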
Additional information
Project partially supported by the Start-up Funding of Zhejiang University to Chen Liang-biao
Cite this article
Si-hua, P., Long-jiang, F., Xiao-ning, P. et al. Splicing-site recognition of rice (Oryza sativa L.) DNA sequences by support vector machines. J. Zhejiang Univ. Sci. A 4, 573–577 (2003). https://doi.org/10.1631/jzus.2003.0573