Abstract
The interactions of proteins with other biomolecules play a central role in various aspects of the structural and functional organization of the cell. Their elucidation is crucial for understanding processes such as metabolic control, signal transduction, and gene regulation. However, experimental structural characterization of all of these interactions is impractical, and only a small fraction of the potential complexes will be amenable to direct experimental analysis. Docking is a versatile and powerful method for predicting the geometry of protein–protein complexes. Despite significant methodological advances, however, identifying good docking solutions among a large number of false solutions remains a difficult task. In the present work, we adapted the formalism of mutual information (MI) from information theory to protein docking. In this context, we developed a method that finds a lower bound for the MI between a binary and an arbitrary finite random variable, valid for all joint distributions whose variational distance to a known joint distribution does not exceed a known value. This lower bound can be applied to MI estimation with confidence intervals. In contrast to previous results, these confidence intervals do not require any assumptions about the distribution or the sample size. An MI-based optimization protocol, in conjunction with a clustering procedure, was used to define reduced amino acid alphabets that describe the interface properties of protein complexes. The reduced alphabets were subsequently converted into a scoring function for the evaluation of docking solutions, which is available for public use via a web service. This approach has recently been extended to the analysis of protein–DNA complexes by also taking geometrical parameters of the DNA into account.
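To make the quantities named in the abstract concrete, the following minimal sketch computes the MI of a joint distribution between a binary variable and a finite variable, the variational distance between two joint distributions, and the effect of merging symbols into a reduced alphabet. It is an illustration under stated assumptions, not the chapter's method: the function names, the example probabilities, and the symbol grouping are hypothetical, and the variational distance is taken here as the plain L1 distance (without a factor of 1/2). The lower bound and the confidence intervals described in the abstract are not implemented.

```python
import math


def mutual_information(joint):
    """MI in bits of a joint distribution given as rows of p(x, y)."""
    px = [sum(row) for row in joint]          # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (columns)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                       # terms with p = 0 contribute 0
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


def variational_distance(joint_a, joint_b):
    """L1 distance between two joint distributions of the same shape.

    Assumption: no factor 1/2 is applied; other conventions exist.
    """
    return sum(
        abs(a - b)
        for row_a, row_b in zip(joint_a, joint_b)
        for a, b in zip(row_a, row_b)
    )


def merge_columns(joint, groups):
    """Merge symbols of Y into classes; groups lists column indices per class."""
    return [[sum(row[j] for j in group) for group in groups] for row in joint]


# Hypothetical joint distribution: binary X (two rows), Y with three symbols.
joint = [[0.30, 0.10, 0.10],
         [0.05, 0.15, 0.30]]

# Hypothetical reduced alphabet: symbol 0 alone, symbols 1 and 2 merged.
reduced = merge_columns(joint, [[0], [1, 2]])

mi_full = mutual_information(joint)
mi_reduced = mutual_information(reduced)
```

Merging symbols can never increase the MI (data processing inequality), so `mi_reduced` is at most `mi_full`. A reduced alphabet is useful when it keeps most of the information while using fewer symbols, which is the trade-off an MI-based optimization would have to weigh.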
Keywords
- Docking Solutions
- Direct Experimental Analysis
- Reduced Amino Acid Alphabet
- Mutual Information (MI)
- Finite Symbol Alphabet
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Notes
1. A finite random variable is a discrete random variable with a finite symbol alphabet.
2. The superscripts 1 and 2 are indices and should not be confused with powers.
3. Please note: \(\mathbf {R}\) corresponds to \(r_{XY}\), not to \(R_{XY}\).
4. The letter \(\mathrm {l}\) in \(\varepsilon _\mathrm {l}\) and \(\mathbf {Q}^\mathrm {l}\) stands for lower value and should not be confused with the digit 1.
5. The letter \(\mathrm {l}\) in \(\varepsilon _\mathrm {ld}\) and \(\mathbf {Q}^\mathrm {ld}\) again stands for lower value and the letter \(\mathrm {d}\) for determinant.
6. The letter \(\mathrm {u}\) in \(\varepsilon _\mathrm {ud}\) and \(\mathbf {Q}^\mathrm {ud}\) stands for upper value and the letter \(\mathrm {d}\) again for determinant.
7. The superscripts 1 and 2 of \(q_{Y|X}^1\), \(q_{Y|X}^2\) are indices, not powers.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Stefani, A.G., Sandmann, A., Burkovski, A., Huber, J.B., Sticht, H., Jardin, C. (2018). Application of Methods from Information Theory in Protein-Interaction Analysis. In: Bossert, M. (eds) Information- and Communication Theory in Molecular Biology. Lecture Notes in Bioengineering. Springer, Cham. https://doi.org/10.1007/978-3-319-54729-9_13
DOI: https://doi.org/10.1007/978-3-319-54729-9_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54728-2
Online ISBN: 978-3-319-54729-9
eBook Packages: Engineering, Engineering (R0)