Application of Methods from Information Theory in Protein-Interaction Analysis

  • Arno G. Stefani
  • Achim Sandmann
  • Andreas Burkovski
  • Johannes B. Huber
  • Heinrich Sticht
  • Christophe Jardin
Part of the Lecture Notes in Bioengineering book series (LNBE)


The interactions of proteins with other biomolecules play a central role in various aspects of the structural and functional organization of the cell. Their elucidation is crucial for understanding processes such as metabolic control, signal transduction, and gene regulation. However, an experimental structural characterization of all of them is impractical, and only a small fraction of the potential complexes will be amenable to direct experimental analysis. Docking represents a versatile and powerful method for predicting the geometry of protein–protein complexes. However, despite significant methodological advances, identifying good docking solutions among a large number of false solutions remains a difficult task. In the present work, we adapted the formalism of mutual information (MI) from information theory to protein docking. In this context, we developed a method that finds a lower bound for the MI between a binary and an arbitrary finite random variable, valid for all joint distributions whose variational distance to a known joint distribution does not exceed a known value. This lower bound can be applied to MI estimation with confidence intervals. Unlike previous results, these confidence intervals do not require any assumptions on the distribution or the sample size. An MI-based optimization protocol in conjunction with a clustering procedure was used to define reduced amino acid alphabets describing the interface properties of protein complexes. The reduced alphabets were subsequently converted into a scoring function for the evaluation of docking solutions, which is available for public use via a web service. This approach has recently been extended to the analysis of protein–DNA complexes by also taking into account geometrical parameters of the DNA.
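To make the role of MI in alphabet reduction more concrete, the following minimal Python sketch computes a plug-in MI estimate between a binary variable and an amino acid symbol, and then recomputes it after merging symbols into a reduced alphabet. It is an illustration only and not the protocol used in this work: the toy counts, the interpretation of the binary variable as "interface" versus "non-interface", the two-group alphabet, and the function names `mutual_information` and `reduce_alphabet` are all assumptions made for this example. The sketch does not implement the lower bound or the confidence intervals described above.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # Each term is p(x,y) * log2( p(x,y) / (p(x) * p(y)) ),
        # written with raw counts to avoid repeated divisions.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def reduce_alphabet(pairs, groups):
    """Replace each amino acid symbol by the label of its group."""
    mapping = {aa: label for label, members in groups.items() for aa in members}
    return [(x, mapping[y]) for x, y in pairs]


# Hypothetical toy data: x = 1 for an interface residue, 0 otherwise;
# y = one-letter amino acid code. The counts are invented for illustration.
samples = (
    [(1, "L")] * 30 + [(0, "L")] * 10
    + [(1, "I")] * 28 + [(0, "I")] * 12
    + [(1, "K")] * 8 + [(0, "K")] * 32
    + [(1, "E")] * 10 + [(0, "E")] * 30
)

# Hypothetical reduced alphabet with two groups.
groups = {"hydrophobic": "LI", "charged": "KE"}

mi_full = mutual_information(samples)
mi_reduced = mutual_information(reduce_alphabet(samples, groups))

print(f"MI with full alphabet:    {mi_full:.4f} bits")
print(f"MI with reduced alphabet: {mi_reduced:.4f} bits")
```

Merging symbols can never increase the MI (data processing inequality), so a reduced alphabet is useful when it retains most of the MI of the full alphabet while using far fewer symbols. Note that the plug-in estimate is computed from a finite sample and carries no guarantee by itself, which is the gap that distribution-free confidence intervals are meant to close.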


Keywords: Docking solutions; Direct experimental analysis; Reduced amino acid alphabet; Mutual information (MI); Finite symbol alphabet

Publications within the Project

  1. Jardin C et al (2013) An information-theoretic classification of amino acids for the assessment of interfaces in protein-protein docking. J Mol Model 19(9):3901–3910
  2. Othersen OG et al (2012) Application of information theory to feature selection in protein docking. J Mol Model 18(4):1285–1297
  3. Stefani AG et al (2012) Towards confidence intervals for the mutual information between two binary random variables. In: Proceedings of the 9th international workshop on computational systems biology, pp 105–105
  4. Stefani AG et al (2013) A lower bound for the confidence interval of the mutual information of high dimensional random variables. In: Proceedings of the 10th international workshop on computational systems biology, pp 136–136
  5. Stefani AG et al (2014a) A tight lower bound on the mutual information of a binary and an arbitrary finite random variable as a function of the variational distance. In: Australian communications theory workshop (AusCTW), pp 1–4
  6. Stefani AG et al (2014b) Confidence intervals for the mutual information. Int J Mach Intell Sens Signal Process 1(3):201–214. doi: 10.1504/IJMISSP.2014.066430
  7. Stefani AG (2017, to appear) Nonparametric and nonasymptotic confidence intervals for estimation of mutual information with applications in protein–protein docking analysis. Ph.D. thesis. Friedrich-Alexander-Universität Erlangen-Nürnberg

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Arno G. Stefani (1, corresponding author)
  • Achim Sandmann (2)
  • Andreas Burkovski (3)
  • Johannes B. Huber (1)
  • Heinrich Sticht (2)
  • Christophe Jardin (2)

  1. Institute for Information Transmission, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
  2. Emil-Fischer Zentrum, Institut für Biochemie, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
  3. Department Biologie, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
