Prediction of Translation Initiation Sites Using Classifier Selection

  • George Tzanis
  • Ioannis Vlahavas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3955)


The prediction of the translation initiation site (TIS) in a genomic sequence is an important issue in biological research. Several methods have been proposed to deal with it; however, it remains an open problem. In this paper we follow a multi-step approach to increase TIS prediction accuracy. First, all the sequences are scanned and the candidate TISs are detected. These sites are grouped according to the length of the sequence upstream and downstream of them, and a number of features are generated for each one. The features are evaluated among the instances of every group, and a number of the top-ranked ones are selected for building a classifier. A new instance is assigned to a group and is classified by the corresponding classifier. We experiment with various feature sets and classification algorithms, compare with alternative methods, and draw important conclusions.
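The routing idea described above (detect candidates, assign each to a group by the length of its flanking sequence, and let the classifier trained for that group decide) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `window` threshold, the two-way grouping in `group_key`, and the `MajorityClassifier` placeholder are assumptions made for the example, and feature generation and feature selection are left out entirely.

```python
from collections import Counter, defaultdict


def find_candidate_tis(seq):
    """Return the 0-based position of every ATG in seq (the candidate TISs)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def group_key(seq, pos, window=99):
    """Group a candidate by how much sequence lies upstream and downstream.

    The threshold `window` and the resulting four groups are hypothetical;
    the abstract only states that grouping depends on these two lengths.
    """
    upstream = pos                      # bases before the ATG
    downstream = len(seq) - (pos + 3)   # bases after the ATG
    return (upstream >= window, downstream >= window)


class MajorityClassifier:
    """Placeholder learner: always predicts the most common training label."""

    def fit(self, features, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, feature_vector):
        return self.label


class ClassifierSelector:
    """Train one classifier per group and route new instances by group."""

    def __init__(self, make_classifier=MajorityClassifier):
        self.make_classifier = make_classifier
        self.models = {}

    def fit(self, instances):
        """instances: iterable of (group, feature_vector, label) triples."""
        by_group = defaultdict(lambda: ([], []))
        for group, features, label in instances:
            by_group[group][0].append(features)
            by_group[group][1].append(label)
        for group, (X, y) in by_group.items():
            self.models[group] = self.make_classifier().fit(X, y)
        return self

    def predict(self, group, feature_vector):
        # Only the classifier built for this instance's group is consulted.
        return self.models[group].predict(feature_vector)
```

Any learner exposing `fit` and `predict` could be passed as `make_classifier` in place of the placeholder; the point of the sketch is only that each group has its own model and that a new candidate is scored by exactly one of them.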


Translation Initiation Site, Amino Acid Pattern, Initial Dataset, Classifier Selection, Multiple Classifier System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • George Tzanis (1)
  • Ioannis Vlahavas (1)
  1. Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
