Multiscale Persistent Functions for Biomolecular Structure Characterization

Abstract

In this paper, we introduce multiscale persistent functions for biomolecular structure characterization. The essential idea is to combine our multiscale rigidity functions (MRFs) with persistent homology analysis to construct a series of multiscale persistent functions, in particular multiscale persistent entropies, for structure characterization. To clarify the fundamental idea of our method, the multiscale persistent entropy (MPE) model is discussed in detail. Mathematically, unlike previous persistent entropy models (Chintakunta et al. in Pattern Recognit 48(2):391–401, 2015; Merelli et al. in Entropy 17(10):6872–6892, 2015; Rucco et al. in: Proceedings of ECCS 2014, Springer, pp 117–128, 2016), our model incorporates a resolution parameter, and different scales can be reached by tuning its value. Physically, our MPE can be used in conformational entropy evaluation. More specifically, our method is found to contain a natural classification scheme, which is achieved through a density filtration of an MRF built from angular distributions. To further validate our model, a systematic comparison with the traditional entropy evaluation model is carried out. Our model is found to preserve the intrinsic topological features of biomolecular data much better than traditional approaches, particularly for resolutions in the intermediate range. Moreover, in a comparison with traditional entropies from various grid sizes, bond angle-based methods and a persistent homology-based support vector machine method (Cang et al. in Mol Based Math Biol 3:140–162, 2015), our MPE method gives the best average true positive rate in a classic protein structure classification test. More interestingly, only our model separates the all-alpha and all-beta protein classes from each other with zero error.
Finally, a special protein structure index (PSI) is proposed, for the first time, to describe the “regularity” of protein structures. A protein structure is deemed regular if it has a consistent and orderly configuration. Our PSI model is tested on a database of 110 proteins; we find that structures with larger portions of loops and intrinsically disordered regions are always associated with larger PSI, indicating an irregular configuration, while proteins with larger portions of secondary structure, i.e., alpha-helix or beta-sheet, have smaller PSI. Essentially, PSI can be used to describe the “regularity” information of any system.
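
As a rough illustration of the quantities named above, the following sketch computes a persistent entropy in the sense of the cited persistent entropy literature: the Shannon entropy of the normalized bar lengths of a persistence barcode. It also shows how a resolution parameter can enter through a kernel density. The function names, the Gaussian kernel, and the parameter name eta are assumptions made for illustration only; the abstract does not specify the MRF kernel or the exact MPE definition, and no persistent homology computation is performed here.

```python
import math


def persistent_entropy(bars):
    """Shannon entropy of the normalized bar lengths of a barcode.

    `bars` is a list of (birth, death) pairs with finite death values.
    Bars of zero or negative length are ignored.
    """
    lengths = [death - birth for birth, death in bars if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)


def gaussian_density(angles, grid, eta):
    """Hypothetical rigidity-style density on `grid`.

    Sums one Gaussian kernel of width `eta` (the assumed resolution
    parameter) per input angle; a larger `eta` gives a smoother density.
    """
    return [
        sum(math.exp(-(((x - a) / eta) ** 2)) for a in angles)
        for x in grid
    ]


if __name__ == "__main__":
    # Two bars of equal length carry maximal entropy for two bars: log(2).
    print(persistent_entropy([(0.0, 1.0), (0.5, 1.5)]))
    # The same angles viewed at a fine and at a coarse resolution.
    angles = [1.0, 1.1, 2.5]
    grid = [i * 0.5 for i in range(8)]
    print(gaussian_density(angles, grid, eta=0.1))
    print(gaussian_density(angles, grid, eta=1.0))
```

In such a setup, a barcode obtained from a filtration of the density at each value of eta would be passed to persistent_entropy, giving one entropy value per scale.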

References

  1. Baron R, Hunenberger PH, McCammon JA (2009) Absolute single-molecule entropies from quasi-harmonic analysis of microsecond molecular dynamics: correction terms and convergence properties. J Chem Theory Comput 5(12):3150–3160

  2. Baruah A, Rani P, Biswas P (2015) Conformational entropy of intrinsically disordered proteins from amino acid triads. Sci Rep 5:11740

  3. Bauer U, Kerber M, Reininghaus J (2014) Distributed computation of persistent homology. In: Proceedings of the sixteenth workshop on algorithm engineering and experiments (ALENEX), 2014

  4. Bendich P, Edelsbrunner H, Kerber M (2010) Computing robustness and persistence for images. IEEE Trans Vis Comput Gr 16:1251–1260

  5. Biasotti S, De Floriani L, Falcidieno B, Frosini P, Giorgi D, Landi C, Papaleo L, Spagnuolo M (2008) Describing shapes by geometrical-topological properties of real functions. ACM Comput Surv 40(4):12

  6. Binchi J, Merelli E, Rucco M, Petri G, Vaccarino F (2014) jHoles: a tool for understanding biological complex networks via clique weight rank persistent homology. Electron Notes Theor Comput Sci 306:5–18

  7. Bowen R (1973) Topological entropy for noncompact sets. Trans Am Math Soc 184:125–136

  8. Brady GP, Sharp KA (1997) Entropy in protein folding and in protein–protein interactions. Curr Opin Struct Biol 7(2):215–221

  9. Brooijmans N, Kuntz ID (2003) Molecular recognition and docking algorithms. Ann Rev Biophys Biomol Struct 32(1):335–373

  10. Bubenik P (2015) Statistical topological data analysis using persistence landscapes. J Mach Learn Res 16(1):77–102

  11. Bubenik P, Kim PT (2007) A statistical approach to persistent homology. Homol Homot Appl 9:337–362

  12. Cang ZX, Mu L, Wu KD, Opron K, Xia KL, Wei GW (2015) A topological approach to protein classification. Mol Based Math Biol 3:140–162

  13. Carlsson G (2009) Topology and data. Bull Am Math Soc 46(2):255–308

  14. Carlsson G (2014) Topological pattern recognition for point cloud data. Acta Numerica 23:289

  15. Carlsson G, Ishkhanov T, Silva V, Zomorodian A (2008) On the local behavior of spaces of natural images. Int J Comput Vis 76(1):1–12

  16. Carlsson G, Singh G, Zomorodian A (2009) Computing multidimensional persistence. Algorithms and computation. Springer, Berlin, pp 730–739

  17. Carlsson G, Zomorodian A (2009) The theory of multidimensional persistence. Discrete Comput Geom 42(1):71–93

  18. Cerri A, Fabio B, Ferri M, Frosini P, Landi C (2013) Betti numbers in multidimensional persistent homology are stable functions. Math Methods Appl Sci 36(12):1543–1557

  19. Cerri A, Landi C (2013) The persistence space in multidimensional persistent homology. Discrete geometry for computer imagery. Springer, Berlin, pp 180–191

  20. Chazal F, De Silva V, Oudot S (2014) Persistence stability for geometric complexes. Geometriae Dedicata 173(1):193–214

  21. Chintakunta H, Gentimis T, Gonzalez-Diaz R, Jimenez MJ, Krim H (2015) An entropy-based persistence barcode. Pattern Recognit 48(2):391–401

  22. Chung F (1997) Spectral graph theory. American Mathematical Society, Providence

  23. Cohen-Steiner D, Edelsbrunner H, Morozov D (2006) Vines and vineyards by updating persistence in linear time. In: Proceedings of the twenty-second annual symposium on computational geometry, ACM, pp 119–126

  24. Dey TK, Li KY, Sun J, Cohen-Steiner D (2008) Computing geometry-aware handle and tunnel loops in 3D models. ACM Trans Gr 27:45

  25. Dey TK, Wang YS (2013) Reeb graphs: approximation and persistence. Discrete Comput Geom 49(1):46–73

  26. Di Fabio B, Landi C (2011) A Mayer–Vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions. Found Comput Math 11:499–527

  27. Dionysus: the persistent homology software. Software available at http://www.mrzv.org/software/dionysus

  28. Doig AJ, Sternberg MJE (1995) Side-chain conformational entropy in protein folding. Prot Sci 4(11):2247–2251

  29. Edelsbrunner H (2010) Computational topology: an introduction. American Mathematical Society, Providence

  30. Edelsbrunner H, Letscher D, Zomorodian A (2002) Topological persistence and simplification. Discrete Comput Geom 28:511–533

  31. Edelsbrunner H, Mucke EP (1994) Three-dimensional alpha shapes. ACM Trans Gr 13:43–72

  32. Fitter J (2003) A measure of conformational entropy change during thermal protein unfolding using neutron spectroscopy. Biophys J 84(6):3924–3930

  33. Frederick KK, Marlow MS, Valentine KG, Wand AJ (2007) Conformational entropy in molecular recognition by proteins. Nature 448(7151):325–329

  34. Frosini P, Landi C (2013) Persistent Betti numbers for a noise tolerant shape-based approach to image retrieval. Pattern Recognit Lett 34(8):863–872

  35. Frosini P, Landi C (1999) Size theory as a topological tool for computer vision. Pattern Recognit Image Anal 9(4):596–603

  36. Gameiro M, Hiraoka Y, Izumi S, Kramar M, Mischaikow K, Nanda V (2015) A topological measurement of protein compressibility. Jpn J Ind Appl Math 32(1):1–17

  37. Gellman SH (1997) Introduction: molecular recognition. Chem Rev 97(5):1231–1232

  38. Ghrist R (2008) Barcodes: the persistent topology of data. Bull Am Math Soc 45(1):61–75

  39. Halle B (2002) Flexibility and packing in proteins. PNAS 99:1274–1279

  40. Hatcher A (2001) Algebraic topology. Cambridge University Press, Cambridge

  41. Horak D, Maletic S, Rajkovic M (2009) Persistent homology of complex networks. J Stat Mech Theory Exp 2009(03):P03034

  42. Janin J, Sternberg MJ (2013) Protein flexibility, not disorder, is intrinsic to molecular recognition. F1000 Biol Rep 5(2):1–7

  43. Kaczynski T, Mischaikow K, Mrozek M (2004) Computational homology. Springer, New York

  44. Karplus M, Kushick JN (1981) Method for estimating the configurational entropy of macromolecules. Macromolecules 14(2):325–332

  45. Kasson PM, Zomorodian A, Park S, Singhal N, Guibas LJ, Pande VS (2007) Persistent voids: a new structural metric for membrane fusion. Bioinformatics 23:1753–1759

  46. Korkut A, Hendrickson WA (2013) Stereochemistry of polypeptide conformation in coarse grained analysis. In: Biomolecular forms and functions: a celebration of 50 years of the Ramachandran Map, World Scientific Publishing, pp 136–147

  47. Lee H, Kang H, Chung MK, Kim B, Lee DS (2012) Persistent brain network homology from the perspective of dendrogram. IEEE Trans Med Imaging 31(12):2267–2277

  48. Levitt M, Warshel A (1975) Computer simulation of protein folding. Nature 253(5494):694–698

  49. Liu X, Xie Z, Yi DY (2012) A fast algorithm for constructing topological structure in large data. Homol Homot Appl 14:221–238

  50. Marlow MS, Dogan J, Frederick KK, Valentine KG, Wand AJ (2010) The role of conformational entropy in molecular recognition by calmodulin. Nat Chem Biol 6(5):352–358

  51. Merelli E, Rucco M, Sloot P, Tesei L (2015) Topological characterization of complex systems: using persistent entropy. Entropy 17(10):6872–6892

  52. Mischaikow K, Mrozek M, Reiss J, Szymczak A (1999) Construction of symbolic dynamics from experimental time series. Phys Rev Lett 82:1144–1147

  53. Mischaikow K, Nanda V (2013) Morse theory for filtrations and efficient computation of persistent homology. Discrete Comput Geom 50(2):330–353

  54. Munkres JR (1984) Elements of algebraic topology, vol 2. Addison-Wesley, Menlo Park

  55. Nanda V. Perseus: the persistent homology software. Software available at http://www.sas.upenn.edu/~vnanda/perseus

  56. Nguyen D, Xia KL, Wei GW (2016) Generalized flexibility–rigidity index. J Chem Phys 144(23):234106

  57. Niyogi P, Smale S, Weinberger S (2011) A topological view of unsupervised learning from noisy data. SIAM J Comput 40:646–663

  58. Opron K, Xia KL, Burton ZF, Wei GW (2016) Flexibility rigidity index for protein nucleic acid flexibility and fluctuation analysis. J Comput Chem 37(14):1283–1295

  59. Opron K, Xia KL, Wei GW (2014) Fast and anisotropic flexibility–rigidity index for protein flexibility and fluctuation analysis. J Chem Phys 140:234105

  60. Opron K, Xia KL, Wei GW (2015) Communication: capturing protein multiscale thermal fluctuations. J Chem Phys 142(21):211101

  61. Pachauri D, Hinrichs C, Chung MK, Johnson SC, Singh V (2011) Topology-based kernels with application to inference problems in Alzheimer’s disease. IEEE Trans Med Imaging 30(10):1760–1770

  62. Rieck B, Mara H, Leitte H (2012) Multivariate data analysis using persistence-based filtering and topological signatures. IEEE Trans Vis Comput Gr 18:2382–2391

  63. Robins V (1999) Towards computing homology from finite approximations. Topol Proc 24:503–532

  64. Rucco M, Castiglione F, Merelli E, Pettini M (2016) Characterisation of the idiotypic immune network through persistent entropy. In: Proceedings of ECCS 2014, Springer, pp 117–128

  65. Rucco M, Gonzalez-Diaz R, Jimenez MJ, Atienza N, Cristalli C, Concettoni E, Ferrante A, Merelli E (2017) A new topological entropy-based approach for measuring similarities among piecewise linear functions. Signal Process 134:130–138

  66. Sapienza PJ, Lee AL (2010) Using NMR to study fast dynamics in proteins: methods and applications. Curr Opin Pharmacol 10(6):723–730

  67. Shen MY, Sali A (2006) Statistical potential for assessment and prediction of protein structures. Prot Sci 15(11):2507–2524

  68. Silva VD, Ghrist R (2005) Blind swarms for coverage in 2-d. In: Proceedings of robotics: science and systems, pp 01

  69. Singh G, Memoli F, Ishkhanov T, Sapiro G, Carlsson G, Ringach DL (2008) Topological analysis of population activity in visual cortex. J Vis 8(8):11.1–18

  70. Stites WE, Pranata J (1995) Empirical evaluation of the influence of side chains on the conformational entropy of the polypeptide backbone. Prot Struct Funct Bioinf 22(2):132–140

  71. Tausz A, Vejdemo-Johansson M, Adams H (2011) Javaplex: a research software package for persistent (co)homology. Software available at http://code.google.com/p/javaplex

  72. Thompson JB, Hansma HG, Hansma PK, Plaxco KW (2002) The backbone conformational entropy of protein folding: experimental measures from atomic force microscopy. J Mol Biol 322(3):645–652

  73. Trbovic N, Cho JH, Abel R, Friesner RA, Rance M, Palmer AG III (2008) Protein side-chain dynamics and residual conformational entropy. J Am Chem Soc 131(2):615–622

  74. Wang B, Summa B, Pascucci V, Vejdemo-Johansson M (2011) Branching and circular features in high dimensional data. IEEE Trans Vis Comput Gr 17:1902–1911

  75. Wang B, Wei GW (2016) Object-oriented persistent homology. J Comput Phys 305:276–299

  76. Xia KL, Feng X, Tong YY, Wei GW (2015) Persistent homology for the quantitative prediction of fullerene stability. J Comput Chem 36:408–422

  77. Xia KL, Opron K, Wei GW (2013) Multiscale multiphysics and multidomain models—flexibility and rigidity. J Chem Phys 139:194109

  78. Xia KL, Opron K, Wei GW (2015) Multiscale Gaussian network model (mGNM) and multiscale anisotropic network model (mANM). J Chem Phys 143(20):204106

  79. Xia KL, Wei GW (2014) Persistent homology analysis of protein structure, flexibility and folding. Int J Numer Methods Biomed Eng 30:814–844

  80. Xia KL, Wei GW (2015) Multidimensional persistence in biomolecular data. J Comput Chem 36:1502–1520

  81. Xia KL, Wei GW (2015) Persistent topology for cryo-EM data analysis. Int J Numer Methods Biomed Eng 31:e02719

  82. Xia KL, Zhao ZX, Wei GW (2015) Multiresolution topological simplification. J Comput Biol 22:1–5

  83. Yao Y, Sun J, Huang XH, Bowman GR, Singh G, Lesnick M, Guibas LJ, Pande VS, Carlsson G (2009) Topological methods for exploring low-density states in biomolecular folding pathways. J Chem Phys 130:144115

  84. Zhang J, Lin M, Chen R, Wang W, Liang J (2008) Discrete state model and accurate estimation of loop entropy of RNA secondary structures. J Chem Phys 128(12):125107

  85. Zhong S, Moix JM, Quirk S, Hernandez R (2006) Dihedral-angle information entropy as a gauge of secondary structure propensity. Biophys J 91(11):4014–4023

  86. Zomorodian A, Carlsson G (2005) Computing persistent homology. Discrete Comput Geom 33:249–274

  87. Zomorodian A, Carlsson G (2008) Localized homology. Comput Geom Theory Appl 41(3):126–148

Acknowledgements

This work was supported in part by Nanyang Technological University Startup Grant M4081842.110 and Singapore Ministry of Education Academic Research Fund Tier 1 M401110000. Zhiming Li thanks the Chinese Scholarship Council for financial support (No. 201506775038). Lin Mu’s research is based upon work supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under award number ERKJE45, and by the Laboratory Directed Research and Development program at the Oak Ridge National Laboratory, which is operated by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725.

Author information

Corresponding author

Correspondence to Kelin Xia.

About this article

Cite this article

Xia, K., Li, Z. & Mu, L. Multiscale Persistent Functions for Biomolecular Structure Characterization. Bull Math Biol 80, 1–31 (2018). https://doi.org/10.1007/s11538-017-0362-6

Keywords

  • Conformational entropy (CE)
  • Persistent entropy
  • Multiscale rigidity function (MRF)
  • Multiscale persistent function (MPF)
  • Multiscale persistent entropy (MPE)
  • Protein structure
  • Persistent homology