A Novel Data Mining Approach for the Accurate Prediction of Translation Initiation Sites

  • George Tzanis
  • Christos Berberidis
  • Ioannis Vlahavas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4345)


In an mRNA sequence, predicting the exact codon where translation starts (the Translation Initiation Site, TIS) is a particularly important problem. So far it has been tackled by several researchers who apply various statistical and machine learning techniques, achieving high accuracy levels, often over 90%. In this paper we propose a machine learning approach that can further improve the prediction accuracy. First, we provide a concise review of the literature in this field. Then we propose a novel feature set. We perform extensive experiments on a publicly available, real-world dataset for various vertebrate organisms using a variety of novel features and classification setups. We evaluate our results against a reference study and show that our approach, which combines new features and the Ribosome Scanning Model with a meta-classifier, achieves higher accuracy in most cases.
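As a rough illustration of how a ribosome scanning step can sit on top of any per-codon classifier, the following minimal sketch scans a sequence from the 5' end and returns the first ATG whose score passes a threshold. It is not the authors' implementation: the function names, the 0.5 threshold, and the toy scorer (which only checks for a purine at position -3, loosely following the Kozak context) are assumptions made for illustration, and a real setup would plug in a trained classifier or meta-classifier instead.

```python
from typing import Callable, List, Optional


def find_atg_positions(mrna: str) -> List[int]:
    """Return the 0-based index of every candidate ATG, ordered 5' to 3'."""
    seq = mrna.upper().replace("U", "T")  # accept RNA or cDNA alphabets
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def scan_for_tis(
    mrna: str,
    score_atg: Callable[[str, int], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Optional[int]:
    """Scanning-model selection: the first ATG from the 5' end that the
    scorer accepts is reported as the TIS; later candidates are ignored."""
    for pos in find_atg_positions(mrna):
        if score_atg(mrna, pos) >= threshold:
            return pos
    return None  # no candidate was accepted


def toy_score(mrna: str, pos: int) -> float:
    """Hypothetical stand-in for a trained classifier: returns 1.0 when a
    purine (A or G) occupies position -3 relative to the ATG, else 0.0."""
    seq = mrna.upper().replace("U", "T")
    return 1.0 if pos >= 3 and seq[pos - 3] in "AG" else 0.0


if __name__ == "__main__":
    example = "CCTATGCCACCATGGCG"
    print(find_atg_positions(example))
    print(scan_for_tis(example, toy_score))
```

In this sketch the scorer is passed in as a plain function, so the scanning rule stays independent of whichever classifier or voting scheme produces the scores.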


Keywords: Feature Selection, Weighted Vote, Reference Study, Gain Ratio, Translation Initiation Site
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • George Tzanis (1)
  • Christos Berberidis (1)
  • Ioannis Vlahavas (1)
  1. Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
