Functional Protein Prediction Using HMM Based Feature Representation and Relevance Analysis
Predicting the subcellular location of proteins helps to understand the biological processes carried out within the cell. Here, a feature representation methodology is proposed to identify subcellular locations in gram-positive bacteria. To this end, a hidden Markov model is trained for each considered class, and the probability of an amino acid sequence being generated by each trained model is used as a feature in a subsequent classification stage. Our proposal is tested on a well-known database containing amino acid sequences of bacteria. For testing, sequences with less than 80% identity are studied, using a multi-label soft-margin Support Vector Machine classifier. The attained results show that our approach improves on issues raised in PfamFeat. Moreover, it seems to be an appropriate tool for predicting the subcellular location of proteins.
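The feature representation described above can be illustrated with a minimal sketch. It assumes that one discrete HMM per class has already been trained and only shows how a sequence is scored against each model with the forward algorithm. The class names, the three-letter alphabet, and all parameter values are hypothetical toy choices, not the authors' models. Log-likelihoods are used in place of raw probabilities for numerical stability, which is also an assumption, and the SVM stage is left out.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_likelihood(seq, start, trans, emit):
    """Forward algorithm in log space: log P(seq | HMM) for a non-empty seq.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][c]  : probability that state i emits symbol c
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i].get(seq[0], 0.0))
             for i in range(n)]
    for symbol in seq[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j].get(symbol, 0.0))
            for j in range(n)
        ]
    return logsumexp(alpha)


def hmm_features(seq, models):
    """One feature per class (sorted by class name): log P(seq | class HMM)."""
    return [log_likelihood(seq, *models[name]) for name in sorted(models)]


# Hypothetical, hand-set two-state models over a toy alphabet {A, K, L}.
MODELS = {
    "cytoplasm": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [{"A": 0.5, "K": 0.4, "L": 0.1}, {"A": 0.3, "K": 0.5, "L": 0.2}],
    ),
    "membrane": (
        [0.5, 0.5],
        [[0.8, 0.2], [0.3, 0.7]],
        [{"A": 0.2, "K": 0.1, "L": 0.7}, {"A": 0.3, "K": 0.1, "L": 0.6}],
    ),
}

features = hmm_features("LLLAL", MODELS)
print(dict(zip(sorted(MODELS), features)))
```

Each sequence is thus mapped to a vector with one entry per class, and such vectors would then be passed to the classifier. In this toy setting the leucine-rich sequence scores higher under the hypothetical "membrane" model than under the "cytoplasm" one.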
Keywords: HMM, Multiclass SVM, Protein Subcellular Localization
- 5. Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., Boursnell, C., Pang, N., Forslund, K., Ceric, G., Clements, J., Heger, A., Holm, L., Sonnhammer, E.L.L., Eddy, S.R., Bateman, A., Finn, R.D.: The Pfam protein families database. Nucleic Acids Research 40(Database issue), D290–D301 (2012)
- 8. Schölkopf, B., Smola, A.J.: Learning with Kernels. The MIT Press, Cambridge (2002)
- 9. Rey, S., Acab, M., Gardy, J.L., Laird, M.R., Lambert, C., Brinkman, F.S., et al.: PSORTdb: a protein subcellular localization database for bacteria. Nucleic Acids Research 33(suppl. 1), D164–D168 (2005)