Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures
We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., IEEE data compression conference proceedings (DCC), 340–349, 2017). Specifically, this methodology uses the information-theoretic framework of the minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). The best dictionary of supersecondary structures is the one that yields the greatest lossless compression of the source collection of folding patterns represented as tableaux, which are matrix representations that capture the essence of protein folding patterns (Lesk, J Mol Graph. 13:159–164, 1995). This chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html.
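The selection principle stated above (prefer the dictionary that gives the shortest lossless encoding of the data) can be illustrated with a toy two-part message length calculation. The sketch below is not the encoding used in the chapter: it assumes folding patterns are reduced to plain strings over a small alphabet, that dictionary entries are substrings, and that all codes are uniform. The names `null_length`, `two_part_length`, and `best_dictionary` are hypothetical and chosen only for illustration.

```python
import math


def null_length(sequences, alphabet_size):
    """Bits to send the data with no dictionary: a uniform code per symbol.

    Sequence lengths are not encoded here or in two_part_length, so the
    omission cancels out when the two totals are compared.
    """
    return sum(len(s) for s in sequences) * math.log2(alphabet_size)


def two_part_length(sequences, dictionary, alphabet_size):
    """Toy two-part message length in bits: I(H) + I(D | H).

    Part 1 states the hypothesis (the dictionary): each entry is spelled
    out symbol by symbol, followed by an end-of-entry marker.
    Part 2 states the data given the dictionary: each sequence is parsed
    greedily (longest match first) into tokens, where a token is either a
    dictionary entry or a single symbol, and every token costs the same.
    """
    # Part 1: cost of the dictionary itself (assumed uniform code,
    # with one extra symbol acting as the end-of-entry marker).
    symbol_bits = math.log2(alphabet_size + 1)
    first_part = sum((len(entry) + 1) * symbol_bits for entry in dictionary)

    # Part 2: cost of the data encoded as tokens.
    token_bits = math.log2(len(dictionary) + alphabet_size)
    second_part = 0.0
    for s in sequences:
        i = 0
        while i < len(s):
            match = max(
                (entry for entry in dictionary if s.startswith(entry, i)),
                key=len,
                default=None,
            )
            i += len(match) if match else 1  # fall back to a single symbol
            second_part += token_bits
    return first_part + second_part


def best_dictionary(sequences, candidates, alphabet_size):
    """Return the candidate dictionary with the shortest total message."""
    return min(
        candidates,
        key=lambda d: two_part_length(sequences, d, alphabet_size),
    )


# Illustrative data only: two strings over the alphabet {A, B, C}.
data = ["ABABABAB", "ABABCAB"]
candidates = [(), ("AB",), ("CCC",)]
chosen = best_dictionary(data, candidates, alphabet_size=3)
saving = null_length(data, 3) - two_part_length(data, chosen, 3)
print(chosen, f"saves {saving:.2f} bits over the null model")
```

In this toy setting an entry earns its place only if the bits it saves in the second part exceed the bits needed to state it in the first part, which is the trade-off the MML criterion formalizes. An entry that never occurs in the data, such as `"CCC"` here, makes the total message longer than the null encoding.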
Key words: Minimum message length, MML, Tableau representation, Protein folding pattern, Supersecondary structure
- 3. Subramanian R, Allison L, Stuckey PJ, Garcia De La Banda M, Abramson D, Lesk AM, Konagurthu AS (2017) Statistical compression of protein folding patterns for inference of recurrent substructural themes. In: IEEE data compression conference proceedings (DCC), pp 340–349
- 9. Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer Science & Business Media, New York