Journal of Molecular Modeling, Volume 19, Issue 9, pp 3901–3910

An information-theoretic classification of amino acids for the assessment of interfaces in protein–protein docking

  • Christophe Jardin
  • Arno G. Stefani
  • Martin Eberhardt
  • Johannes B. Huber
  • Heinrich Sticht
Original Paper


Docking represents a versatile and powerful method to predict the geometry of protein–protein complexes. However, despite significant methodological advances, the identification of good docking solutions among a large number of false solutions remains a difficult task. We have previously demonstrated that the formalism of mutual information (MI) from information theory can be adapted to protein docking, and we have now extended this approach to enhance its robustness and applicability. A large dataset consisting of 22,934 docking decoys derived from 203 different protein–protein complexes was used for an MI-based optimization of reduced amino acid alphabets representing the protein–protein interfaces. This optimization relied on a clustering analysis that allows one to estimate the mutual information of whole amino acid alphabets by considering all structural features simultaneously, rather than by treating them individually. This clustering approach is fast and can be applied in a similar fashion to the generation of reduced alphabets for other biological problems like fold recognition, sequence data mining, or secondary structure prediction. The reduced alphabets derived from the present work were converted into a scoring function for the evaluation of docking solutions, which is available for public use via the web service score-MI.
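To make the central quantity concrete, the following is a minimal sketch of how the mutual information between a discrete interface feature and a decoy quality label could be estimated, and how a reduced alphabet changes the feature. It is not the authors' implementation: the function names, the two-group alphabet, and the toy residues and labels are all hypothetical, and the plug-in (frequency-count) estimator is assumed here only for simplicity.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def reduce_alphabet(residues, groups):
    """Map each one-letter residue code to the index of its group."""
    lookup = {aa: i for i, group in enumerate(groups) for aa in group}
    return [lookup[aa] for aa in residues]


# Hypothetical two-letter alphabet (hydrophobic vs. other); an assumption
# for illustration, not an alphabet taken from the paper.
groups = ["AVLIMFWC", "KRDEHNQSTGPY"]

# Toy data: one interface residue per decoy and a label
# (1 = near-native decoy, 0 = false solution). Entirely made up.
residues = ["L", "I", "V", "K", "D", "E", "L", "K"]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

mi_full = mutual_information(residues, labels)
mi_reduced = mutual_information(reduce_alphabet(residues, groups), labels)
print(f"full alphabet:    {mi_full:.3f} bits")
print(f"reduced alphabet: {mi_reduced:.3f} bits")
```

Because the reduced symbol is a deterministic function of the residue, its estimated MI with the label can never exceed that of the full alphabet (data-processing inequality). A reduced alphabet is attractive when it keeps most of that information with far fewer symbols, which also makes the estimate more reliable on limited data.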


Protein interaction; Structure analysis; Reduced amino acid alphabet; Protein interface; Mutual information



The project was funded by Deutsche Forschungsgemeinschaft within the priority program 1395 (SPP 1395) by grants to J.H. and H.S.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Christophe Jardin (1)
  • Arno G. Stefani (2)
  • Martin Eberhardt (1)
  • Johannes B. Huber (2)
  • Heinrich Sticht (1)
  1. Bioinformatik, Institut für Biochemie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  2. Lehrstuhl für Informationsübertragung, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
