Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures

  • Arun S. Konagurthu (corresponding author)
  • Ramanan Subramanian
  • Lloyd Allison
  • David Abramson
  • Maria Garcia de la Banda
  • Peter J. Stuckey
  • Arthur M. Lesk
Part of the Methods in Molecular Biology book series (MIMB, volume 1958)


We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., IEEE data compression conference proceedings (DCC), 340–349, 2017). Specifically, this methodology uses the information-theoretic framework of the minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). The best dictionary of supersecondary structures is the one that yields the most (lossless) compression of the source collection of folding patterns represented as tableaux, which are matrix representations that capture the essence of protein folding patterns (Lesk, J Mol Graph 13:159–164, 1995). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at
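The selection criterion described in the abstract can be illustrated with a toy two-part message. The sketch below is not the chapter's encoding: it assumes folding patterns are flattened into strings over a two-symbol alphabet (real tableaux are matrices), uses simple uniform codes, and assumes sequence lengths are known to the receiver. The function names `null_length` and `two_part_length` are hypothetical. It only shows the principle that a dictionary is preferred when the bits needed to state the dictionary plus the bits needed to state the data given the dictionary are fewer than the bits needed without it.

```python
import math

def null_length(seqs, alphabet_size):
    """Bits to send every symbol with a fixed uniform code (no dictionary)."""
    return sum(len(s) for s in seqs) * math.log2(alphabet_size)

def two_part_length(seqs, dictionary, alphabet_size):
    """Bits for a two-part message: the dictionary, then the data given it."""
    entries = sorted((e for e in dictionary if e), key=len, reverse=True)
    # Part 1 (assumed code): spell out each entry symbol by symbol with an
    # end marker per entry, plus one final marker that closes the dictionary.
    symbols_sent = sum(len(e) + 1 for e in entries) + 1
    part1 = symbols_sent * math.log2(alphabet_size + 1)
    # Part 2 (assumed code): greedy longest-match parse; every token is either
    # a dictionary entry or one literal symbol, coded uniformly over both kinds.
    token_bits = math.log2(alphabet_size + len(entries))
    tokens = 0
    for s in seqs:
        i = 0
        while i < len(s):
            match = next((e for e in entries if s.startswith(e, i)), None)
            i += len(match) if match else 1
            tokens += 1
    return part1 + tokens * token_bits

# Toy "folding patterns": strings over {E, H}, a stand-in for real tableaux.
seqs = ["EEHEEH", "EEHHH", "HEEHEEH"] * 20
candidates = {"none": [], "EEH": ["EEH"], "HHHH": ["HHHH"]}
lengths = {name: two_part_length(seqs, d, 2) for name, d in candidates.items()}
best = min(lengths, key=lengths.get)
print(best, round(lengths[best], 1), "bits vs null", null_length(seqs, 2))
```

In this toy setting a recurring motif pays for its own statement cost, while a dictionary entry that never occurs only lengthens the message, which is the trade-off the MML criterion balances.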

Key words

Minimum message length; MML; Tableau representation; Protein folding pattern; Supersecondary structure


References

  1. Lesk AM (1995) Systematic representation of protein folding patterns. J Mol Graph 13:159–164
  2. Konagurthu AS, Lesk AM, Allison L (2012) Minimum message length inference of secondary structure from protein coordinate data. Bioinformatics 28(12):i97–i105
  3. Subramanian R, Allison L, Stuckey PJ, Garcia De La Banda M, Abramson D, Lesk AM, Konagurthu AS (2017) Statistical compression of protein folding patterns for inference of recurrent substructural themes. In: IEEE data compression conference proceedings (DCC), pp 340–349
  4. Fox NK, Brenner SE, Chandonia JM (2013) SCOPe: Structural Classification of Proteins—extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res 42(D1):D304–D309
  5. Kamat AP, Lesk AM (2007) Contact patterns between helices and strands of sheet define protein folding patterns. Proteins 66:869–876
  6. Konagurthu AS, Lesk AM (2010) Cataloging topologies of protein folding patterns. J Mol Recognit 23(2):253–257
  7. Konagurthu AS, Stuckey PJ, Lesk AM (2008) Structural search and retrieval using a tableau representation of protein folding patterns. Bioinformatics 24(5):645–651
  8. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423 and 623–656
  9. Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer Science & Business Media, New York
  10. Allison L (2018) Coding Ockham’s Razor. Springer, Cham

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Arun S. Konagurthu (1), corresponding author
  • Ramanan Subramanian (1)
  • Lloyd Allison (1)
  • David Abramson (2)
  • Maria Garcia de la Banda (1)
  • Peter J. Stuckey (1, 3)
  • Arthur M. Lesk (4)

  1. Faculty of Information Technology, Monash University, Clayton, Australia
  2. Research Computing Centre, University of Queensland, St Lucia, Australia
  3. Department of Computing and Information Systems, University of Melbourne, Parkville, Australia
  4. Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, USA
