Machine Learning Techniques Applied to the Cleavage Site Prediction Problem

  • Gloria Inés Alvarez
  • Enrique Bravo
  • Diego Linares
  • Jheyson Faride Vargas
  • Jairo Andrés Velasco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8265)

Abstract

The genome of the Potyviridae virus family is usually expressed as a polyprotein, which is divided into ten proteins by enzymes (proteases) that cut the chain at specific positions called cleavage sites. Three different techniques were employed to model each cleavage site: Hidden Markov Models (HMM), the grammatical inference algorithm OIL, and Artificial Neural Networks (ANN).

In the experiments, the Hidden Markov Model achieves the best classification performance and is highly robust to class imbalance. The Order Independent Language (OIL) algorithm, however, improves when its models are trained on a larger number of samples, even when those samples are heavily imbalanced.

Keywords

Hidden Markov Model; Cleavage Site; Negative Sample; Machine Learning Technique; Vote System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Gloria Inés Alvarez (1)
  • Enrique Bravo (2)
  • Diego Linares (1)
  • Jheyson Faride Vargas (1)
  • Jairo Andrés Velasco (1)

  1. Pontificia Universidad Javeriana Cali, Colombia
  2. Universidad del Valle, Colombia
