Machine Learning Techniques Applied to the Cleavage Site Prediction Problem
The genome of viruses in the family Potyviridae is usually expressed as a single polyprotein, which proteases cut at specific positions, called cleavage sites, to yield ten proteins. Three techniques were used to model each cleavage site: Hidden Markov Models (HMM), the grammatical inference algorithm OIL, and Artificial Neural Networks (ANN).
In the experiments, the Hidden Markov Model achieved the best classification performance and was highly robust to class imbalance. The Order Independent Language (OIL) algorithm, however, improved when models were trained on a larger number of samples, even though those samples were heavily imbalanced.
Keywords: Hidden Markov Model, Cleavage Site, Negative Sample, Machine Learning Technique, Vote System