High Efficiency on Prediction of Translation Initiation Site (TIS) of RefSeq Sequences

  • Cristiane N. Nobre
  • J. Miguel Ortega
  • Antônio de Pádua Braga
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4643)


An important task in gene discovery is the correct prediction of the translation initiation site (TIS). The TIS can correspond to the first AUG, but this is not always the case. This task can be modeled as a binary classification problem that separates positive (TIS) patterns from negative (non-TIS) patterns. Here we used a Support Vector Machine (SVM) trained on data processed with the class-balancing method SMOTE (Synthetic Minority Over-sampling Technique). SMOTE was used because the databases in this work are imbalanced, with an average positive-to-negative pattern ratio of about 1:28. As a result, we attained accuracy, precision, sensitivity, and specificity values of 99% on average.
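
To make the class-balancing step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the authors' implementation: the function name `smote`, its parameters, and the toy two-dimensional points are hypothetical, and it assumes each candidate site has already been encoded as a numeric feature vector (the abstract does not state the encoding used).

```python
import random
from math import dist


def smote(minority, per_sample, k=5, seed=0):
    """Create `per_sample` synthetic points for each minority-class sample.

    Each synthetic point lies on the line segment between a minority sample
    and one of its k nearest minority-class neighbours (Euclidean distance).
    Illustrative sketch only; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for i, base in enumerate(minority):
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        for _ in range(per_sample):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(
                tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
            )
    return synthetic


# Toy positive (TIS) patterns as 2-D feature vectors (hypothetical data).
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# With a 1:28 positive/negative ratio, 27 synthetic points per positive
# bring the positive class up to the size of the negative class.
balanced_positives = positives + smote(positives, per_sample=27, k=3)
print(len(balanced_positives))  # 112 = 4 originals + 4 * 27 synthetic
```

In a pipeline of the kind the abstract describes, oversampling like this would typically be applied to the training data only, before fitting the SVM, so that the evaluation data keep their original class distribution; whether the authors proceeded this way is not stated here.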


Translation Initiation Site, Support Vector Machine, SMOTE, Imbalanced Data





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Cristiane N. Nobre (1)
  • J. Miguel Ortega (2)
  • Antônio de Pádua Braga (3)
  1. Bioinformática, UFMG
  2. Laboratório de Biodados, ICB, UFMG
  3. Engenharia Eletrônica, UFMG
