Predicting protein–protein interactions from sequence using correlation coefficient and high-quality interaction dataset

Shi, Ming-Guang; Xia, Jun-Feng; Li, Xue-Ling; Huang, De-Shuang

doi:10.1007/s00726-009-0295-y


Original Article
Published: 24 April 2009

Volume 38, pages 891–899 (2010)

Amino Acids

Ming-Guang Shi^1,2,3, Jun-Feng Xia^1,4, Xue-Ling Li^1 & De-Shuang Huang^1


Abstract

Identifying protein–protein interactions (PPIs) is critical for understanding the cellular functions of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. It is therefore important to develop both computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines a correlation coefficient (CC) transformation with a support vector machine (SVM). The CC transformation not only accounts for the neighboring effect within a protein sequence but also describes the level of correlation between two protein sequences. A gold-standard positive (interacting) dataset, MIPS Core, and a gold-standard negative (non-interacting) dataset, GO-NEG, both from the yeast Saccharomyces cerevisiae, were mined to evaluate the method objectively and to attenuate bias. The SVM model combined with the CC transformation yielded the best performance, with an accuracy of 87.94% on the gold-standard positive and negative datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.
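The abstract does not give the formula for the CC transformation, so the following is only a rough sketch of one way a correlation-based sequence descriptor could be built, not the authors' implementation. It assumes that each residue is mapped to a single physicochemical value (here the Kyte-Doolittle hydropathy index, chosen purely for illustration), that the "neighboring effect" is captured by the Pearson correlation between the encoded sequence and a copy of itself shifted by a lag, and that a protein pair is represented by concatenating the two per-protein vectors. The names `cc_features`, `pair_features`, and `max_lag` are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy index. ASSUMPTION: this scale is used only for
# illustration; the abstract does not say which residue properties were used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series; use 0
    return cov / math.sqrt(var_x * var_y)


def cc_features(seq, max_lag=3):
    """Hypothetical descriptor: correlation of the property-encoded sequence
    with itself at lags 1..max_lag, giving a fixed-length vector."""
    if len(seq) <= max_lag + 1:
        raise ValueError("sequence too short for the requested max_lag")
    values = [HYDROPATHY[aa] for aa in seq]
    return [pearson(values[:-lag], values[lag:]) for lag in range(1, max_lag + 1)]


def pair_features(seq_a, seq_b, max_lag=3):
    """Represent a protein pair by concatenating both descriptors (assumption)."""
    return cc_features(seq_a, max_lag) + cc_features(seq_b, max_lag)


# Two made-up peptide strings, used only to show the shape of the output.
vector = pair_features("MKTAYIAKQRQISFVKSH", "ILILILILGGDEKRST")
print(len(vector), [round(v, 3) for v in vector])
```

A vector of this kind has the same length for every protein pair regardless of sequence length, which is what allows it to be passed to a standard classifier such as an SVM with labels drawn from interacting and non-interacting pairs.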



Acknowledgments

This work was supported by the grants of the National Science Foundation of China, Nos. 60472111 and 30570368, the grant from the National Basic Research Program of China (973 Program), No. 2007CB311002, the grants from the National High Technology Research and Development Program of China (863 Program), Nos. 2007AA01Z167 and 2006AA02Z309, the grant of Oversea Outstanding Scholars Fund of CAS, No. 2005-1-18, HFUT, No. 070403F and the Knowledge Innovation Program of the Chinese Academy of Sciences (0823A16121).

Author information

Authors and Affiliations

Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, 230031, Hefei, China
Ming-Guang Shi, Jun-Feng Xia, Xue-Ling Li & De-Shuang Huang
Department of Automation, University of Science and Technology of China, 230026, Hefei, China
Ming-Guang Shi
School of Electric Engineering and Automation, Hefei University of Technology, 230009, Hefei, China
Ming-Guang Shi
School of Life Science, University of Science and Technology of China, 230026, Hefei, China
Jun-Feng Xia


Corresponding author

Correspondence to De-Shuang Huang.

Electronic supplementary material

Supplementary Material 1 (XLS 18 KB)


About this article

Cite this article

Shi, MG., Xia, JF., Li, XL. et al. Predicting protein–protein interactions from sequence using correlation coefficient and high-quality interaction dataset. Amino Acids 38, 891–899 (2010). https://doi.org/10.1007/s00726-009-0295-y


Received: 22 December 2008
Accepted: 03 April 2009
Published: 24 April 2009
Issue Date: March 2010
DOI: https://doi.org/10.1007/s00726-009-0295-y
