Abstract
Different statistical modeling methods (SMMs) are used for classification and regression of nonlinear systems. The Gaussian process (GP), which is based on Bayesian probabilistic inference, has seen preliminary use in quantitative structure-activity relationship (QSAR) studies but has not yet been applied to the quantitative sequence-activity model (QSAM) of biosystems. This paper proposes GP as an alternative tool for the QSAM modeling of peptides. To investigate the modeling performance of GP, three classical peptide panels were used: angiotensin-I converting enzyme inhibitory dipeptides, bradykinin-potentiating pentapeptides and cationic antimicrobial pentadecapeptides. On this basis, we made a comprehensive comparison between GP and some widely used SMMs such as partial least squares (PLS), artificial neural network (ANN) and support vector machine (SVM), and drew the following conclusions: (1) for structurally complicated peptides, particularly the polypeptides, linear PLS was incapable of capturing all dependences hidden in the peptide systems; (2) even with the aid of the monitoring technique, ANN tended to be overtrained when the number of peptide samples was insufficient; (3) SVM and GP performed best for the three peptide panels. Moreover, since GP was able to model hybrid linear and nonlinear relationships, it was slightly superior to SVM on most peptide sets.
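As a rough illustration of what modeling a hybrid linear and nonlinear relationship with a GP can look like, the following minimal sketch fits a zero-mean GP regression whose covariance is the sum of a linear term and a squared-exponential term. It is not the authors' implementation: the function names (`hybrid_kernel`, `gp_predict`), the fixed hyperparameter values and the synthetic data standing in for peptide descriptors are all assumptions made for illustration, and the hyperparameters are not optimized here.

```python
import numpy as np

def hybrid_kernel(A, B, lin_var=1.0, rbf_var=1.0, length=1.0):
    """Linear plus squared-exponential covariance (an illustrative choice)."""
    linear = lin_var * (A @ B.T)
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * (A @ B.T))
    # Clip tiny negative distances caused by floating-point rounding.
    rbf = rbf_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length**2)
    return linear + rbf

def gp_predict(X_train, y_train, X_test, noise_var=1e-2):
    """Posterior mean and latent variance of a zero-mean GP regression."""
    K = hybrid_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = hybrid_kernel(X_train, X_test)
    K_ss = hybrid_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Synthetic stand-in for descriptor/activity data (NOT real peptide data):
# the response mixes a linear trend with a nonlinear component.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 3))
y = 0.8 * X[:, 0] + np.sin(2.0 * X[:, 1]) + 0.05 * rng.standard_normal(40)

X_new = rng.uniform(-3.0, 3.0, size=(10, 3))
y_new = 0.8 * X_new[:, 0] + np.sin(2.0 * X_new[:, 1])

mean, var = gp_predict(X, y, X_new)
rmse = np.sqrt(np.mean((mean - y_new) ** 2))
print("test RMSE:", rmse)
```

In practice the kernel hyperparameters and noise variance would be estimated from the data, for example by maximizing the marginal likelihood, rather than fixed as above; the predictive variance returned alongside the mean is what distinguishes a GP from point-estimate methods such as PLS or SVM.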
Cite this article
Zhou, P., Chen, X., Wu, Y. et al. Gaussian process: an alternative approach for QSAM modeling of peptides. Amino Acids 38, 199–212 (2010). https://doi.org/10.1007/s00726-008-0228-1
DOI: https://doi.org/10.1007/s00726-008-0228-1