Abstract
The genome of viruses in the Potyviridae family is usually expressed as a polyprotein that is divided into ten proteins by proteases, enzymes that cut the chain at specific places called cleavage sites. Three different techniques were employed to model each cleavage site: Hidden Markov Models (HMM), the Order Independent Language (OIL) grammatical inference algorithm, and Artificial Neural Networks (ANN).
In our experiments, the HMM achieved the best classification performance and was highly robust to class imbalance. The OIL algorithm, however, improved when its models were trained on a larger number of samples, even when those samples were heavily imbalanced.
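The class imbalance mentioned above can be illustrated with a minimal sketch. It assumes, hypothetically, that each candidate cleavage site is represented by a fixed-width window of residues centred on a possible cut, so that every true site yields one positive sample and every other position yields a negative one. The function names, the window width, and the toy sequence are illustrative assumptions and are not taken from the chapter.

```python
from typing import List, Set, Tuple


def candidate_windows(
    sequence: str, cleavage_positions: Set[int], half_width: int = 3
) -> List[Tuple[str, int]]:
    """Slide a window of 2 * half_width residues over the sequence.

    A window is labelled 1 when the cut between its two middle residues
    is a known cleavage site, and 0 otherwise. A cut is identified by the
    index of the first residue after it (an assumption of this sketch).
    """
    width = 2 * half_width
    samples = []
    for start in range(len(sequence) - width + 1):
        cut = start + half_width
        label = 1 if cut in cleavage_positions else 0
        samples.append((sequence[start:start + width], label))
    return samples


def imbalance_ratio(samples: List[Tuple[str, int]]) -> float:
    """Return the number of negative samples per positive sample."""
    positives = sum(label for _, label in samples)
    negatives = len(samples) - positives
    return negatives / positives if positives else float("inf")


if __name__ == "__main__":
    # Toy sequence and cut position, invented purely for illustration.
    toy = candidate_windows("ACDEFGHIKL", {5}, half_width=2)
    print(imbalance_ratio(toy))  # 6.0: six negative windows for one positive
```

Even in this ten-residue toy case a single site produces six negatives for one positive; on a full polyprotein with only a few cleavage sites, the ratio would be far larger, which is why robustness to imbalance matters when comparing the three techniques.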
The translation for publication in English was done by John Field Palencia Roth, assistant professor in the Department of Communication and Language of the Faculty of Humanities and Social Sciences at the Pontificia Universidad Javeriana Cali. This work is funded by the Departamento Administrativo de Ciencia, Tecnología e Innovación de Colombia (COLCIENCIAS) under the grant project code 1251-521-28290.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alvarez, G.I., Bravo, E., Linares, D., Vargas, J.F., Velasco, J.A. (2013). Machine Learning Techniques Applied to the Cleavage Site Prediction Problem. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_39
DOI: https://doi.org/10.1007/978-3-642-45114-0_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45113-3
Online ISBN: 978-3-642-45114-0
eBook Packages: Computer Science, Computer Science (R0)