Journal of Molecular Modeling

, Volume 18, Issue 4, pp 1285–1297 | Cite as

Application of information theory to feature selection in protein docking

  • Olaf G. Othersen
  • Arno G. Stefani
  • Johannes B. Huber
  • Heinrich Sticht
Original Paper


In the era of structural genomics, the prediction of protein interactions using docking algorithms is an important goal. The success of this method critically relies on the identification of good docking solutions among a vast excess of false solutions. We have adapted the concept of mutual information (MI) from information theory to achieve a fast and quantitative screening of different structural features with respect to their ability to discriminate between physiological and nonphysiological protein interfaces. The strategy includes the discretization of each structural feature into distinct value ranges to optimize its mutual information. We have selected 11 structural features and two datasets to demonstrate that the MI is dimensionless and can be directly compared for diverse structural features and between datasets of different sizes. Conversion of the MI values into a simple scoring function revealed that those features with a higher MI are actually more powerful for the identification of good docking solutions. Thus, an MI-based approach allows the rapid screening of structural features with respect to their information content and should therefore be helpful for the design of improved scoring functions in future. In addition, the concept presented here may also be adapted to related areas that require feature selection for biomolecules or organic ligands.


Protein interaction Structure analysis Feature selection Protein interface Mutual information 



The authors thank Kristin Kassler and Dr. Christophe Jardin for critically reading the manuscript. The project was funded within the DFG (Deutsche Forschungsgemeinschaft) priority program (SPP 1395) by grants to JH and HS.


  1. 1.
    Lensink MF, Mendez R, Wodak SJ (2007) Docking and scoring protein complexes: CAPRI 3rd edition. Proteins 69:704–718Google Scholar
  2. 2.
    Lensink MF, Wodak SJ (2010) Docking and scoring protein interactions: CAPRI 2009. Proteins 78:3073–3084Google Scholar
  3. 3.
    Janin J (2010) Protein–protein docking tested in blind predictions: the CAPRI experiment. Mol Biosyst 6:2351–2362Google Scholar
  4. 4.
    Janin J (2010) The targets of CAPRI rounds 13–19. Proteins 78:3067–3072Google Scholar
  5. 5.
    Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA (1992) Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA 89:2195–2199CrossRefGoogle Scholar
  6. 6.
    Walls PH, Sternberg MJ (1992) New algorithm to model protein–protein recognition based on surface complementarity. Applications to antibody–antigen docking. J Mol Biol 228:277–297Google Scholar
  7. 7.
    Jones S, Thornton JM (1996) Principles of protein–protein interactions. Proc Natl Acad Sci USA 93:13–20Google Scholar
  8. 8.
    Meyer M, Wilson P, Schomburg D (1996) Hydrogen bonding and molecular surface shape complementarity as a basis for protein docking. J Mol Biol 264:199–210CrossRefGoogle Scholar
  9. 9.
    Ausiello G, Cesareni G, Helmer-Citterich M (1997) Escher: a new docking procedure applied to the reconstruction of protein tertiary structure. Proteins 28:556–567CrossRefGoogle Scholar
  10. 10.
    Vakser IA, Aflalo C (1994) Hydrophobic docking: a proposed enhancement to molecular recognition techniques. Proteins 20:320–329CrossRefGoogle Scholar
  11. 11.
    Gabb HA, Jackson RM, Sternberg MJ (1997) Modelling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol 272:106–120CrossRefGoogle Scholar
  12. 12.
    Robert CH, Janin J (1998) A soft, mean-field potential derived from crystal contacts for predicting protein–protein interactions. J Mol Biol 283:1037–1047Google Scholar
  13. 13.
    Moont G, Gabb HA, Sternberg MJ (1999) Use of pair potentials across protein interfaces in screening predicted docked complexes. Proteins 35:364–373CrossRefGoogle Scholar
  14. 14.
    Zhang C, Liu S, Zhou H, Zhou Y (2004) An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci 13:400–411CrossRefGoogle Scholar
  15. 15.
    Pons C, Talavera D, de la Cruz X, Orozco M, Fernandez-Recio J (2011) Scoring by intermolecular pairwise propensities of exposed residues (sipper): a new efficient potential for protein–protein docking. J Chem Inf Model 51:370–377CrossRefGoogle Scholar
  16. 16.
    Cover TM, Thomas JA (2006) Elements of information theory. Wiley-Interscience, HobokenGoogle Scholar
  17. 17.
    Douguet D, Chen HC, Tovchigrechko A, Vakser IA (2006) Dockground resource for studying protein–protein interfaces. Bioinformatics 22:2612–2618Google Scholar
  18. 18.
    Gao Y, Douguet D, Tovchigrechko A, Vakser IA (2007) Dockground system of databases for protein recognition studies: unbound structures for docking. Proteins 69:845–851CrossRefGoogle Scholar
  19. 19.
    Liu S, Gao Y, Vakser IA (2008) Dockground protein–protein docking decoy set. Bioinformatics 24:2634–2635Google Scholar
  20. 20.
    Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637CrossRefGoogle Scholar
  21. 21.
    Fiorucci S, Zacharias M (2010) Prediction of protein–protein interaction sites using electrostatic desolvation profiles. Biophys J 98:1921–1930Google Scholar
  22. 22.
    Aloy P, Russell RB (2002) Interrogating protein interaction networks through structural biology. Proc Natl Acad Sci USA 99:5896–5901CrossRefGoogle Scholar
  23. 23.
    Ansari S, Helms V (2005) Statistical analysis of predominantly transient protein–protein interfaces. Proteins 61:344–355Google Scholar
  24. 24.
    Melo F, Feytmans E (1997) Novel knowledge-based mean force potential at atomic level. J Mol Biol 267:207–222CrossRefGoogle Scholar
  25. 25.
    Melo F, Sanchez R, Sali A (2002) Statistical potentials for fold assessment. Protein Sci 11:430–448CrossRefGoogle Scholar
  26. 26.
    Launay G, Mendez R, Wodak S, Simonson T (2007) Recognizing protein–protein interfaces with empirical potentials and reduced amino acid alphabets. BMC Bioinforma 8:270Google Scholar
  27. 27.
    Fiorucci S, Zacharias M (2010) Binding site prediction and improved scoring during flexible protein–protein docking with attract. Proteins 78:3131–3139Google Scholar
  28. 28.
    Ohlson MB, Huang Z, Alto NM, Blanc MP, Dixon JE, Chai J, Miller SI (2008) Structure and function of salmonella sifa indicate that its interactions with skip, ssej, and rhoa family gtpases induce endosomal tubulation. Cell Host Microbe 4:434–446CrossRefGoogle Scholar
  29. 29.
    Diacovich L, Dumont A, Lafitte D, Soprano E, Guilhon AA, Bignon C, Gorvel JP, Bourne Y, Meresse S (2009) Interaction between the sifa virulence factor and its host target skip is essential for salmonella pathogenesis. J Biol Chem 284:33151–33160CrossRefGoogle Scholar
  30. 30.
    Perkins JR, Diboun I, Dessailly BH, Lees JG, Orengo C (2010) Transient protein–protein interactions: structural, functional, and network properties. Structure 18:1233–1243Google Scholar
  31. 31.
    Dey S, Pal A, Chakrabarti P, Janin J (2010) The subunit interfaces of weakly associated homodimeric proteins. J Mol Biol 398:146–160CrossRefGoogle Scholar
  32. 32.
    Gatenby RA, Frieden BR (2007) Information theory in living systems, methods, applications, and challenges. Bull Math Biol 69:635–657CrossRefGoogle Scholar
  33. 33.
    Kauffman C, Karypis G (2008) An analysis of information content present in protein–DNA interactions. Pac Symp Biocomput:477–488Google Scholar
  34. 34.
    Sterner B, Singh R, Berger B (2007) Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 14:1058–1073Google Scholar
  35. 35.
    Magliery TJ, Regan L (2005) Sequence variation in ligand binding sites in proteins. BMC Bioinforma 6:240CrossRefGoogle Scholar
  36. 36.
    Kulharia M, Goody RS, Jackson RM (2008) Information theory-based scoring function for the structure-based prediction of protein–ligand binding affinity. J Chem Inf Model 48:1990–1998Google Scholar
  37. 37.
    Wassermann AM, Nisius B, Vogt M, Bajorath J (2010) Identification of descriptors capturing compound class-specific features by mutual information analysis. J Chem Inf Model 50:1935–1940CrossRefGoogle Scholar
  38. 38.
    Cline MS, Karplus K, Lathrop RH, Smith TF, Rogers RG Jr, Haussler D (2002) Information-theoretic dissection of pairwise contact potentials. Proteins 49:7–14CrossRefGoogle Scholar
  39. 39.
    Shackelford G, Karplus K (2007) Contact prediction using mutual information and neural nets. Proteins 69(Suppl 8):159–164CrossRefGoogle Scholar
  40. 40.
    Miller CS, Eisenberg D (2008) Using inferred residue contacts to distinguish between correct and incorrect protein models. Bioinformatics 24:1575–1582CrossRefGoogle Scholar
  41. 41.
    Solis AD, Rackovsky S (2008) Information and discrimination in pairwise contact potentials. Proteins 71:1071–1087CrossRefGoogle Scholar

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Olaf G. Othersen
    • 1
  • Arno G. Stefani
    • 2
  • Johannes B. Huber
    • 2
  • Heinrich Sticht
    • 1
  1. 1.Bioinformatik, Institut für BiochemieFriedrich-Alexander-Universität Erlangen-NürnbergErlangenGermany
  2. 2.Lehrstuhl für InformationsübertragungFriedrich-Alexander-Universität Erlangen-NürnbergErlangenGermany

Personalised recommendations