Abstract
In the era of structural genomics, predicting protein interactions with docking algorithms is an important goal. The success of this approach critically relies on identifying good docking solutions among a vast excess of false ones. We have adapted the concept of mutual information (MI) from information theory to achieve a fast, quantitative screening of structural features with respect to their ability to discriminate between physiological and nonphysiological protein interfaces. The strategy includes discretizing each structural feature into distinct value ranges chosen to optimize its MI. Using 11 structural features and two datasets, we demonstrate that the MI, being dimensionless, can be compared directly across diverse structural features and between datasets of different sizes. Converting the MI values into a simple scoring function revealed that features with a higher MI are indeed more powerful for identifying good docking solutions. Thus, an MI-based approach allows structural features to be screened rapidly for their information content and should help in designing improved scoring functions in the future. In addition, the concept presented here may be adapted to related areas that require feature selection for biomolecules or organic ligands.
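To make the screening idea concrete, here is a minimal Python sketch, not the authors' implementation, of how the MI between one discretized structural feature and a binary interface label (1 = physiological, 0 = nonphysiological) could be estimated from counts, and how a single bin boundary could be chosen to maximize that MI. The function names (`mutual_information`, `discretize`, `best_single_threshold`), the plug-in frequency estimator, the base-2 logarithm, and the restriction to one threshold (two bins) are all assumptions for illustration; the discretization in the paper may use more value ranges and a different search.

```python
import bisect
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    joint = Counter(zip(x, y))  # counts of (x_i, y_j) pairs
    px = Counter(x)             # marginal counts of X (feature bins)
    py = Counter(y)             # marginal counts of Y (interface labels)
    mi = 0.0
    for (xi, yj), c in joint.items():
        p_xy = c / n
        # p(x, y) * log2[p(x, y) / (p(x) p(y))]; unobserved pairs contribute 0
        mi += p_xy * math.log2(p_xy / ((px[xi] / n) * (py[yj] / n)))
    return mi


def discretize(values: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Map each value to a bin index; `thresholds` must be sorted ascending."""
    return [bisect.bisect_right(thresholds, v) for v in values]


def best_single_threshold(values: Sequence[float],
                          labels: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical search: try each midpoint between distinct feature values
    as a two-bin split and keep the one with the highest MI."""
    distinct = sorted(set(values))
    best_t: Optional[float] = None
    best_mi = -math.inf
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        mi = mutual_information(discretize(values, [t]), labels)
        if mi > best_mi:
            best_t, best_mi = t, mi
    if best_t is None:
        raise ValueError("need at least two distinct feature values")
    return best_t, best_mi
```

For example, `best_single_threshold([0.1, 0.2, 0.3, 0.8, 0.9, 1.0], [0, 0, 0, 1, 1, 1])` picks a threshold of about 0.55, which separates the two classes perfectly and yields an MI of 1 bit, the maximum possible for balanced binary labels. Because the MI carries no physical unit, values obtained this way for different features can be ranked against each other directly.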
References
Lensink MF, Mendez R, Wodak SJ (2007) Docking and scoring protein complexes: CAPRI 3rd edition. Proteins 69:704–718
Lensink MF, Wodak SJ (2010) Docking and scoring protein interactions: CAPRI 2009. Proteins 78:3073–3084
Janin J (2010) Protein–protein docking tested in blind predictions: the CAPRI experiment. Mol Biosyst 6:2351–2362
Janin J (2010) The targets of CAPRI rounds 13–19. Proteins 78:3067–3072
Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA (1992) Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA 89:2195–2199
Walls PH, Sternberg MJ (1992) New algorithm to model protein–protein recognition based on surface complementarity. Applications to antibody–antigen docking. J Mol Biol 228:277–297
Jones S, Thornton JM (1996) Principles of protein–protein interactions. Proc Natl Acad Sci USA 93:13–20
Meyer M, Wilson P, Schomburg D (1996) Hydrogen bonding and molecular surface shape complementarity as a basis for protein docking. J Mol Biol 264:199–210
Ausiello G, Cesareni G, Helmer-Citterich M (1997) Escher: a new docking procedure applied to the reconstruction of protein tertiary structure. Proteins 28:556–567
Vakser IA, Aflalo C (1994) Hydrophobic docking: a proposed enhancement to molecular recognition techniques. Proteins 20:320–329
Gabb HA, Jackson RM, Sternberg MJ (1997) Modelling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol 272:106–120
Robert CH, Janin J (1998) A soft, mean-field potential derived from crystal contacts for predicting protein–protein interactions. J Mol Biol 283:1037–1047
Moont G, Gabb HA, Sternberg MJ (1999) Use of pair potentials across protein interfaces in screening predicted docked complexes. Proteins 35:364–373
Zhang C, Liu S, Zhou H, Zhou Y (2004) An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci 13:400–411
Pons C, Talavera D, de la Cruz X, Orozco M, Fernandez-Recio J (2011) Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): a new efficient potential for protein–protein docking. J Chem Inf Model 51:370–377
Cover TM, Thomas JA (2006) Elements of information theory. Wiley-Interscience, Hoboken
Douguet D, Chen HC, Tovchigrechko A, Vakser IA (2006) Dockground resource for studying protein–protein interfaces. Bioinformatics 22:2612–2618
Gao Y, Douguet D, Tovchigrechko A, Vakser IA (2007) Dockground system of databases for protein recognition studies: unbound structures for docking. Proteins 69:845–851
Liu S, Gao Y, Vakser IA (2008) Dockground protein–protein docking decoy set. Bioinformatics 24:2634–2635
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637
Fiorucci S, Zacharias M (2010) Prediction of protein–protein interaction sites using electrostatic desolvation profiles. Biophys J 98:1921–1930
Aloy P, Russell RB (2002) Interrogating protein interaction networks through structural biology. Proc Natl Acad Sci USA 99:5896–5901
Ansari S, Helms V (2005) Statistical analysis of predominantly transient protein–protein interfaces. Proteins 61:344–355
Melo F, Feytmans E (1997) Novel knowledge-based mean force potential at atomic level. J Mol Biol 267:207–222
Melo F, Sanchez R, Sali A (2002) Statistical potentials for fold assessment. Protein Sci 11:430–448
Launay G, Mendez R, Wodak S, Simonson T (2007) Recognizing protein–protein interfaces with empirical potentials and reduced amino acid alphabets. BMC Bioinforma 8:270
Fiorucci S, Zacharias M (2010) Binding site prediction and improved scoring during flexible protein–protein docking with ATTRACT. Proteins 78:3131–3139
Ohlson MB, Huang Z, Alto NM, Blanc MP, Dixon JE, Chai J, Miller SI (2008) Structure and function of Salmonella SifA indicate that its interactions with SKIP, SseJ, and RhoA family GTPases induce endosomal tubulation. Cell Host Microbe 4:434–446
Diacovich L, Dumont A, Lafitte D, Soprano E, Guilhon AA, Bignon C, Gorvel JP, Bourne Y, Meresse S (2009) Interaction between the SifA virulence factor and its host target SKIP is essential for Salmonella pathogenesis. J Biol Chem 284:33151–33160
Perkins JR, Diboun I, Dessailly BH, Lees JG, Orengo C (2010) Transient protein–protein interactions: structural, functional, and network properties. Structure 18:1233–1243
Dey S, Pal A, Chakrabarti P, Janin J (2010) The subunit interfaces of weakly associated homodimeric proteins. J Mol Biol 398:146–160
Gatenby RA, Frieden BR (2007) Information theory in living systems, methods, applications, and challenges. Bull Math Biol 69:635–657
Kauffman C, Karypis G (2008) An analysis of information content present in protein–DNA interactions. Pac Symp Biocomput:477–488
Sterner B, Singh R, Berger B (2007) Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 14:1058–1073
Magliery TJ, Regan L (2005) Sequence variation in ligand binding sites in proteins. BMC Bioinforma 6:240
Kulharia M, Goody RS, Jackson RM (2008) Information theory-based scoring function for the structure-based prediction of protein–ligand binding affinity. J Chem Inf Model 48:1990–1998
Wassermann AM, Nisius B, Vogt M, Bajorath J (2010) Identification of descriptors capturing compound class-specific features by mutual information analysis. J Chem Inf Model 50:1935–1940
Cline MS, Karplus K, Lathrop RH, Smith TF, Rogers RG Jr, Haussler D (2002) Information-theoretic dissection of pairwise contact potentials. Proteins 49:7–14
Shackelford G, Karplus K (2007) Contact prediction using mutual information and neural nets. Proteins 69(Suppl 8):159–164
Miller CS, Eisenberg D (2008) Using inferred residue contacts to distinguish between correct and incorrect protein models. Bioinformatics 24:1575–1582
Solis AD, Rackovsky S (2008) Information and discrimination in pairwise contact potentials. Proteins 71:1071–1087
Acknowledgments
The authors thank Kristin Kassler and Dr. Christophe Jardin for critically reading the manuscript. The project was funded within the DFG (Deutsche Forschungsgemeinschaft) priority program (SPP 1395) by grants to JH and HS.
Appendix
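Proof that $I(X; y_j)$ is always non-negative
We will show that

$$I(X; y_j) = \sum_{i=1}^{M_x} \Pr(x_i, y_j) \log \frac{\Pr(x_i, y_j)}{\Pr(x_i)\Pr(y_j)},$$

where the sum runs over the $M_x$ values $x_i$ of $X$, is always greater than or equal to 0. For this purpose, the log sum inequality is quoted first [16]: for non-negative numbers $a_1, \dots, a_n$ and $b_1, \dots, b_n$,

$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \geq \left(\sum_{i=1}^{n} a_i\right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}.$$

With $a_i = \Pr(x_i, y_j)$, $b_i = \Pr(x_i)\Pr(y_j)$ and $n = M_x$, and using $\sum_{i} \Pr(x_i, y_j) = \Pr(y_j)$ and $\sum_{i} \Pr(x_i)\Pr(y_j) = \Pr(y_j)$, it follows that

$$I(X; y_j) \geq \Pr(y_j) \log \frac{\Pr(y_j)}{\Pr(y_j)} = \Pr(y_j) \log 1 = 0.$$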
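The inequality can also be checked numerically. The sketch below is an illustrative check, not part of the original proof: it draws random joint distributions $\Pr(x_i, y_j)$, evaluates each $I(X; y_j)$ with base-2 logarithms, and confirms that every term is non-negative even though individual summands can be negative, and that the terms add up to the full mutual information $I(X; Y)$ computed directly. The helper names (`partial_mi`, `full_mi`, `random_joint`, `check`) and the uniform random-table generator are hypothetical.

```python
import math
import random
from typing import List

Table = List[List[float]]  # joint[i][j] = Pr(x_i, y_j)


def partial_mi(joint: Table, j: int) -> float:
    """I(X; y_j) = sum_i Pr(x_i, y_j) * log2(Pr(x_i, y_j) / (Pr(x_i) Pr(y_j)))."""
    px = [sum(row) for row in joint]     # marginals Pr(x_i)
    py_j = sum(row[j] for row in joint)  # marginal Pr(y_j)
    total = 0.0
    for i, row in enumerate(joint):
        a = row[j]
        if a > 0:  # terms with Pr(x_i, y_j) = 0 contribute 0 (0 log 0 = 0)
            total += a * math.log2(a / (px[i] * py_j))
    return total


def full_mi(joint: Table) -> float:
    """I(X; Y) computed directly over all (i, j) pairs."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)


def random_joint(mx: int, my: int, rng: random.Random) -> Table:
    """Random normalized table with mx values of X and my values of Y."""
    w = [[rng.random() for _ in range(my)] for _ in range(mx)]
    s = sum(map(sum, w))
    return [[v / s for v in row] for row in w]


def check(trials: int = 1000, seed: int = 0) -> bool:
    """True if every I(X; y_j) >= 0 and the terms sum to I(X; Y)."""
    rng = random.Random(seed)
    for _ in range(trials):
        joint = random_joint(rng.randint(2, 6), rng.randint(2, 4), rng)
        parts = [partial_mi(joint, j) for j in range(len(joint[0]))]
        if min(parts) < -1e-12:  # small tolerance for floating-point error
            return False
        if not math.isclose(sum(parts), full_mi(joint), rel_tol=1e-9, abs_tol=1e-12):
            return False
    return True
```

Calling `check()` should return `True`; for an independent table such as `[[0.2, 0.3], [0.2, 0.3]]`, each $I(X; y_j)$ is zero up to rounding, which corresponds to the equality case of the log sum inequality.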
About this article
Cite this article
Othersen, O.G., Stefani, A.G., Huber, J.B. et al. Application of information theory to feature selection in protein docking. J Mol Model 18, 1285–1297 (2012). https://doi.org/10.1007/s00894-011-1157-6
DOI: https://doi.org/10.1007/s00894-011-1157-6