Abstract
HIV-1 protease has a broad and complex substrate specificity. An accurate, robust, and rapid method for predicting the sites at which HIV protease cleaves proteins would greatly expedite the search for inhibitors of the enzyme. Over the last two decades, various methods have been developed to explore the specificity of HIV protease cleavage activity. However, because the understanding of HIV-1 protease cleavage site specificity has advanced little, not much progress has been reported in either devising effective methods or maintaining high prediction accuracy. In this article, a theoretical framework based on the kernel method is developed for dimensionality reduction and prediction of HIV-1 protease cleavage site specificity. A nonlinear dimensionality reduction kernel method, based on manifold learning, is proposed to reduce the high dimensionality of the protease specificity data, and a support vector machine is then applied to predict protease cleavage. Numerical simulations show that the basic specificities of HIV-1 protease are preserved in the reduced feature space, and that combining the nonlinear dimensionality reduction algorithm with a support vector machine classifier yields performance superior to that previously reported in the literature.
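The two-stage idea described in the abstract (a manifold-learning reduction followed by a support vector machine) can be sketched roughly as below. This is a minimal illustration under several assumptions that do not come from the abstract: it uses scikit-learn's `LocallyLinearEmbedding` as a stand-in for the proposed kernel reduction method, it encodes each octapeptide as a 160-dimensional one-hot vector, and it trains on randomly generated peptides with a made-up labeling rule. The neighbor count and target dimension are arbitrary, so the output says nothing about real cleavage data.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(peptide):
    """Encode a peptide as a flat one-hot vector (20 slots per position)."""
    vec = np.zeros(len(peptide) * len(AMINO_ACIDS))
    for pos, residue in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)] = 1.0
    return vec


# Synthetic octapeptides: placeholder data, NOT real HIV-1 substrates.
rng = np.random.default_rng(0)
peptides = ["".join(rng.choice(list(AMINO_ACIDS), size=8)) for _ in range(300)]

# Hypothetical labeling rule, used only so the classifier has a signal:
# call a peptide "cleaved" if its fourth residue is F, L, M, or Y.
labels = np.array([int(p[3] in "FLMY") for p in peptides])

X = np.stack([one_hot(p) for p in peptides])  # shape (300, 160)

# Stage 1: nonlinear reduction from 160 to 10 dimensions.
# Stage 2: SVM classifier on the reduced features.
model = make_pipeline(
    LocallyLinearEmbedding(n_neighbors=12, n_components=10, eigen_solver="dense"),
    SVC(kernel="rbf"),
)

model.fit(X[:240], labels[:240])   # train on the first 240 peptides
preds = model.predict(X[240:])     # predict the 60 held-out peptides
print("held-out accuracy:", (preds == labels[240:]).mean())
```

Wrapping both stages in a single pipeline means the embedding is fitted on the training peptides only, and held-out peptides are projected into the same reduced space before classification.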
Acknowledgments
This study was supported by the National Natural Science Foundation (No. 10671030). We are very grateful to Prof. J.F. Wang for guidance on information acquisition and technical assistance with proteases.
Cite this article
Li, X., Hu, H. & Shu, L. Predicting human immunodeficiency virus protease cleavage sites in nonlinear projection space. Mol Cell Biochem 339, 127–133 (2010). https://doi.org/10.1007/s11010-009-0376-y
DOI: https://doi.org/10.1007/s11010-009-0376-y