Improving the Accuracy of Classifiers for the Prediction of Translation Initiation Sites in Genomic Sequences
Predicting the Translation Initiation Site (TIS) in a genomic sequence is an important problem in biological research. Although several methods have been proposed, there is considerable room to improve their accuracy. Because of noise in the data and for biological reasons, TIS prediction remains an open and non-trivial problem. In this paper we follow a three-step approach to increase TIS prediction accuracy. In the first step, we use a feature generation algorithm that we developed. In the second step, all candidate features, including new ones generated by our algorithm, are ranked according to their impact on prediction accuracy. In the third step, a classification model is built from a number of the top-ranked features. We experiment with various feature sets, feature selection methods, and classification algorithms, compare against alternative methods, draw conclusions, and propose models with improved prediction accuracy.
Keywords: Initiation Codon, Translation Initiation Site, Amino Acid Pattern, Correlation Base Feature Selection, Adjusted Accuracy
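The three-step approach described in the abstract (generate features, rank them, build a classifier from the top-ranked ones) can be illustrated with a minimal Python sketch. The abstract does not specify the feature generation algorithm, the ranking criterion, or the classifier, so everything here is an assumption made for illustration: the function names are hypothetical, the features are simple nucleotide counts around a candidate ATG, the ranking score is the absolute difference between class means, and the classifier is a nearest-centroid rule. The toy sequences are invented and are not data from the paper.

```python
from collections import Counter
from statistics import mean


def candidate_features(seq, atg_pos, window=6):
    """Step 1 (stand-in): count each nucleotide upstream and downstream of a candidate ATG."""
    up = Counter(seq[max(0, atg_pos - window):atg_pos])
    down = Counter(seq[atg_pos + 3:atg_pos + 3 + window])
    feats = {}
    for base in "ACGT":
        feats[f"up_{base}"] = up[base]
        feats[f"down_{base}"] = down[base]
    return feats


def rank_features(rows, labels):
    """Step 2 (stand-in): rank features by absolute difference between class means."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = {
        name: abs(mean(r[name] for r in pos) - mean(r[name] for r in neg))
        for name in rows[0]
    }
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(scores, key=lambda name: (-scores[name], name))


def train_centroids(rows, labels, selected):
    """Step 3 (stand-in): one mean vector per class over the selected features."""
    centroids = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(r[f] for r in members) for f in selected]
    return centroids


def predict(centroids, row, selected):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    vec = [row[f] for f in selected]

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda cls: dist(centroids[cls]))


# Invented toy data: (sequence, index of candidate ATG, label); 1 = true TIS, 0 = not.
training = [
    ("GCCGCCATGGCGGCG", 6, 1),
    ("GGCGCCATGGCGGAG", 6, 1),
    ("TTATTTATGTATTTA", 6, 0),
    ("ATTTATATGTTTAAT", 6, 0),
]
rows = [candidate_features(seq, pos) for seq, pos, _ in training]
labels = [label for _, _, label in training]

selected = rank_features(rows, labels)[:3]  # keep the three top-ranked features
centroids = train_centroids(rows, labels, selected)
```

A new candidate is then scored with `predict(centroids, candidate_features(seq, pos), selected)`. In practice the ranking step would use an established criterion and the classifier would be evaluated on held-out data; this sketch only shows how the three steps connect.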