Gaussian process: an alternative approach for QSAM modeling of peptides

  • Original Article
  • Amino Acids

Abstract

Statistical modeling methods (SMMs) of many kinds are used for the classification and regression of nonlinear systems. The Gaussian process (GP), which is grounded in Bayesian probabilistic inference, has seen preliminary use in quantitative structure-activity relationship (QSAR) studies but has not yet been applied to the quantitative sequence-activity model (QSAM) of biosystems. This paper proposes GP as an alternative tool for the QSAM modeling of peptides. To investigate its modeling performance, three classical peptide panels were used: angiotensin-I converting enzyme inhibitory dipeptides, bradykinin-potentiating pentapeptides and cationic antimicrobial pentadecapeptides. On this basis, we made a comprehensive comparison between GP and some widely used SMMs such as partial least squares (PLS), artificial neural network (ANN) and support vector machine (SVM), and drew the following conclusions: (1) for structurally complicated peptides, particularly the polypeptides, linear PLS was incapable of capturing all the dependences hidden in the peptide systems; (2) even with the aid of the monitoring technique, ANN tended to be overtrained when the number of peptide samples was insufficient; (3) SVM and GP performed best on the three peptide panels. Moreover, since GP was able to correlate hybrid linear and nonlinear relationships, it was slightly superior to SVM on most peptide sets.
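To make the idea of a GP that captures both linear and nonlinear dependences concrete, the following is a minimal sketch of GP regression in plain Python. It is an illustration under stated assumptions, not the covariance function or fitting procedure used in the article: the kernel (a linear term plus a squared-exponential term), the fixed hyperparameters, the function names and the one-dimensional toy data are all hypothetical. In a real QSAM setting each input vector would hold amino acid descriptors for a peptide, and the hyperparameters would be estimated from the data rather than fixed.

```python
import math

def hybrid_kernel(a, b, lin_scale=1.0, rbf_scale=1.0, length=1.0):
    """Assumed covariance: linear term plus squared-exponential (nonlinear) term."""
    dot = sum(x * y for x, y in zip(a, b))
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return lin_scale * dot + rbf_scale * math.exp(-sq_dist / (2.0 * length ** 2))

def cholesky(A):
    """Lower-triangular factor L with A = L L^T (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(X, y, noise=1e-6, **kernel_params):
    """Return alpha = (K + noise * I)^{-1} y for a zero-mean GP."""
    n = len(X)
    K = [[hybrid_kernel(X[i], X[j], **kernel_params) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_cholesky(cholesky(K), y)

def gp_predict(X, alpha, x_new, **kernel_params):
    """Posterior mean at x_new: sum_i alpha_i * k(x_new, x_i)."""
    return sum(a * hybrid_kernel(x_new, xi, **kernel_params)
               for a, xi in zip(alpha, X))

# Hypothetical toy data: one descriptor per sample, roughly linear with a bend.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 2.8, 4.9, 6.1]

alpha = gp_fit(X_train, y_train)
prediction = gp_predict(X_train, alpha, [1.5])
```

With a very small noise term, the posterior mean passes almost exactly through the training activities; a larger `noise` value trades that fit for smoothness, which is one way a GP guards against the overtraining noted for ANN on small peptide sets.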

Author information

Corresponding author

Correspondence to Zhicai Shang.

About this article

Cite this article

Zhou, P., Chen, X., Wu, Y. et al. Gaussian process: an alternative approach for QSAM modeling of peptides. Amino Acids 38, 199–212 (2010). https://doi.org/10.1007/s00726-008-0228-1

  • DOI: https://doi.org/10.1007/s00726-008-0228-1
