Abstract
Detecting the cleavage site in ChIP-seq data is of central interest for understanding the lock-and-key relationship between enzymes and inhibitors in diseases such as AIDS, and for designing suitable inhibitors for these illnesses. Approaches such as support vector machines and artificial neural networks have been suggested for this task. In this study, we use the hidden Markov model (HMM) for cleavage site detection. We first explain the mathematical details of the HMM and the inference of its parameters comprehensively, and then discuss the effect of various clustering approaches on both feature selection and state formation. We demonstrate the calculation of each step on a toy dataset and a benchmark dataset, and compare the accuracy of the estimates with that of other approaches in the literature.
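As a rough illustration of the kind of calculation an HMM involves, the sketch below computes the log-likelihood of a short symbol sequence with the scaled forward algorithm. It is not the chapter's implementation: the two hidden states, the two observation symbols (imagined here as cluster labels of amino acids, for example 0 = hydrophobic and 1 = polar), the eight-residue toy sequence, and every probability value are assumptions made up for this example.

```python
import math


def forward_log_likelihood(obs, init, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    obs   : list of observation symbol indices
    init  : init[i]     = P(first hidden state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k is emitted | state i)
    """
    n_states = len(init)
    alpha = []
    log_lik = 0.0
    for t, symbol in enumerate(obs):
        if t == 0:
            # Initialisation: start in state i and emit the first symbol
            alpha = [init[i] * emit[i][symbol] for i in range(n_states)]
        else:
            # Recursion: move from any state i to state j, then emit the symbol
            alpha = [
                emit[j][symbol] * sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Rescale to avoid underflow; the log scale factors sum to the log-likelihood
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical toy parameters (invented for illustration, not taken from the chapter)
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
toy_sequence = [0, 0, 1, 0, 1, 1, 0, 0]  # eight residues mapped to cluster labels

print(forward_log_likelihood(toy_sequence, init, trans, emit))
```

One common way to use such a score for classification, again stated as an assumption rather than as the chapter's procedure, is to fit one HMM to cleaved sequences and another to non-cleaved sequences and assign a new sequence to the model that gives it the higher log-likelihood.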
Acknowledgements
The authors would like to thank the EU 7th Framework project COSTNET (Project no: CA15109) and the BAP project at the Middle East Technical University (Project no: BAP-08-11-2017-035) for their support.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dar, E.D., Purutçuoğlu, V., Purutçuoğlu, E. (2020). Detection of HIV-1 Protease Cleavage Sites via Hidden Markov Model and Physicochemical Properties of Amino Acids. In: Machado, J., Özdemir, N., Baleanu, D. (eds) Numerical Solutions of Realistic Nonlinear Phenomena. Nonlinear Systems and Complexity, vol 31. Springer, Cham. https://doi.org/10.1007/978-3-030-37141-8_10
DOI: https://doi.org/10.1007/978-3-030-37141-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37140-1
Online ISBN: 978-3-030-37141-8
eBook Packages: Mathematics and Statistics (R0)