Abstract
In this paper, we introduce multiscale persistent functions for biomolecular structure characterization. The essential idea is to combine our multiscale rigidity functions (MRFs) with persistent homology analysis to construct a series of multiscale persistent functions, particularly multiscale persistent entropies, for structure characterization. To clarify the fundamental idea of our method, the multiscale persistent entropy (MPE) model is discussed in detail. Mathematically, unlike previous persistent entropy models (Chintakunta et al. in Pattern Recognit 48(2):391–401, 2015; Merelli et al. in Entropy 17(10):6872–6892, 2015; Rucco et al. in: Proceedings of ECCS 2014, Springer, pp 117–128, 2016), our model incorporates a special resolution parameter, and various scales can be achieved by tuning its value. Physically, our MPE can be used in conformational entropy evaluation. More specifically, we find that our method incorporates a natural classification scheme, which is achieved through a density filtration of an MRF built from angular distributions. To further validate our model, we carry out a systematic comparison with the traditional entropy evaluation model. We find that our model preserves the intrinsic topological features of biomolecular data much better than traditional approaches, particularly for resolutions in the intermediate range. Moreover, by comparing with traditional entropies from various grid sizes, bond angle-based methods and a persistent homology-based support vector machine method (Cang et al. in Mol Based Math Biol 3:140–162, 2015), we find that our MPE method gives the best results in terms of average true positive rate in a classic protein structure classification test. More interestingly, only in our model can the all-alpha and all-beta protein classes be clearly separated from each other with zero error.
Finally, a special protein structure index (PSI) is proposed, for the first time, to describe the “regularity” of protein structures. Basically, a protein structure is deemed regular if it has a consistent and orderly configuration. Our PSI model is tested on a database of 110 proteins; we find that structures with larger portions of loops and intrinsically disordered regions are always associated with larger PSI, meaning an irregular configuration, while proteins with larger portions of secondary structures, i.e., alpha-helix or beta-sheet, have smaller PSI. Essentially, PSI can be used to describe the “regularity” information in any system.
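For orientation, the persistent entropy used in the earlier works cited above (e.g., Chintakunta et al. 2015) is the Shannon entropy of the normalized bar lengths of a persistence barcode. The following is a minimal sketch of that earlier quantity only. It does not include the resolution parameter or the rigidity-function filtration that distinguish the MPE model, and the function name and the two example barcodes are hypothetical, chosen purely for illustration.

```python
import math


def persistent_entropy(bars):
    """Shannon entropy of the normalized bar lengths of a barcode.

    bars: iterable of (birth, death) pairs with finite death values.
    Bars of zero or negative length are ignored.
    """
    lengths = [death - birth for birth, death in bars if death > birth]
    total = sum(lengths)
    if total == 0:
        # No bars with positive length: define the entropy as zero.
        return 0.0
    probs = [length / total for length in lengths]
    return -sum(p * math.log(p) for p in probs)


# Hypothetical barcodes (not data from the paper).
# Four bars of equal length: the entropy takes its maximum, log(4).
uniform_bars = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
# One dominant bar and three short ones: the entropy is much lower.
skewed_bars = [(0.0, 9.7), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)]

print(persistent_entropy(uniform_bars))
print(persistent_entropy(skewed_bars))
```

In this sketch, a barcode whose bars all have similar lengths gives a high entropy, while a barcode dominated by a few long bars gives a low one. In practice the barcodes would come from a persistent homology package rather than being typed in by hand.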
References
Baron R, Hunenberger PH, McCammon JA (2009) Absolute single-molecule entropies from quasi-harmonic analysis of microsecond molecular dynamics: correction terms and convergence properties. J Chem Theory Comput 5(12):3150–3160
Baruah A, Rani P, Biswas P (2015) Conformational entropy of intrinsically disordered proteins from amino acid triads. Sci Rep 5:11740
Bauer U, Kerber M, Reininghaus J (2014) Distributed computation of persistent homology. In: Proceedings of the sixteenth workshop on algorithm engineering and experiments (ALENEX), 2014
Bendich P, Edelsbrunner H, Kerber M (2010) Computing robustness and persistence for images. IEEE Trans Vis Comput Gr 16:1251–1260
Biasotti S, De Floriani L, Falcidieno B, Frosini P, Giorgi D, Landi C, Papaleo L, Spagnuolo M (2008) Describing shapes by geometrical-topological properties of real functions. ACM Comput Surv 40(4):12
Binchi J, Merelli E, Rucco M, Petri G, Vaccarino F (2014) jHoles: a tool for understanding biological complex networks via clique weight rank persistent homology. Electron Notes Theor Comput Sci 306:5–18
Bowen R (1973) Topological entropy for noncompact sets. Trans Am Math Soc 184:125–136
Brady GP, Sharp KA (1997) Entropy in protein folding and in protein–protein interactions. Curr Opin Struct Biol 7(2):215–221
Brooijmans N, Kuntz ID (2003) Molecular recognition and docking algorithms. Ann Rev Biophys Biomol Struct 32(1):335–373
Bubenik P (2015) Statistical topological data analysis using persistence landscapes. J Mach Learn Res 16(1):77–102
Bubenik P, Kim PT (2007) A statistical approach to persistent homology. Homol Homot Appl 9(2):337–362
Cang ZX, Mu L, Wu KD, Opron K, Xia KL, Wei GW (2015) A topological approach to protein classification. Mol Based Math Biol 3:140–162
Carlsson G (2009) Topology and data. Bull Am Math Soc 46(2):255–308
Carlsson G (2014) Topological pattern recognition for point cloud data. Acta Numerica 23:289
Carlsson G, Ishkhanov T, Silva V, Zomorodian A (2008) On the local behavior of spaces of natural images. Int J Comput Vis 76(1):1–12
Carlsson G, Singh G, Zomorodian A (2009) Computing multidimensional persistence. Algorithms and computation. Springer, Berlin, pp 730–739
Carlsson G, Zomorodian A (2009) The theory of multidimensional persistence. Discrete Comput Geom 42(1):71–93
Cerri A, Fabio B, Ferri M, Frosini P, Landi C (2013) Betti numbers in multidimensional persistent homology are stable functions. Math Methods Appl Sci 36(12):1543–1557
Cerri A, Landi C (2013) The persistence space in multidimensional persistent homology. Discrete geometry for computer imagery. Springer, Berlin, pp 180–191
Chazal F, De Silva V, Oudot S (2014) Persistence stability for geometric complexes. Geometriae Dedicata 173(1):193–214
Chintakunta H, Gentimis T, Gonzalez-Diaz R, Jimenez MJ, Krim H (2015) An entropy-based persistence barcode. Pattern Recognit 48(2):391–401
Chung F (1997) Spectral graph theory. American Mathematical Society, Providence
Cohen-Steiner D, Edelsbrunner H, Morozov D (2006) Vines and vineyards by updating persistence in linear time. In: Proceedings of the twenty-second annual symposium on Computational geometry, ACM. pp 119–126
Dey TK, Li KY, Sun J, Cohen-Steiner D (2008) Computing geometry aware handle and tunnel loops in 3D models. ACM Trans Gr 27:45
Dey TK, Wang YS (2013) Reeb graphs: approximation and persistence. Discrete Comput Geom 49(1):46–73
Di Fabio B, Landi C (2011) A Mayer–Vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions. Found Comput Math 11:499–527
Dionysus: the persistent homology software. Software available at http://www.mrzv.org/software/dionysus
Doig AJ, Sternberg MJE (1995) Side-chain conformational entropy in protein folding. Prot Sci 4(11):2247–2251
Edelsbrunner H (2010) Computational topology: an introduction. American Mathematical Society, Providence
Edelsbrunner H, Letscher D, Zomorodian A (2002) Topological persistence and simplification. Discrete Comput Geom 28:511–533
Edelsbrunner H, Mucke EP (1994) Three-dimensional alpha shapes. ACM Trans Gr 13(1):43–72
Fitter J (2003) A measure of conformational entropy change during thermal protein unfolding using neutron spectroscopy. Biophys J 84(6):3924–3930
Frederick KK, Marlow MS, Valentine KG, Wand AJ (2007) Conformational entropy in molecular recognition by proteins. Nature 448(7151):325–329
Frosini P, Landi C (2013) Persistent Betti numbers for a noise tolerant shape-based approach to image retrieval. Pattern Recognit Lett 34(8):863–872
Frosini P, Landi C (1999) Size theory as a topological tool for computer vision. Pattern Recognit Image Anal 9(4):596–603
Gameiro M, Hiraoka Y, Izumi S, Kramar M, Mischaikow K, Nanda V (2015) A topological measurement of protein compressibility. Jpn J Ind Appl Math 32(1):1–17
Gellman SH (1997) Introduction: molecular recognition. Chem Rev 97(5):1231–1232
Ghrist R (2008) Barcodes: the persistent topology of data. Bull Am Math Soc 45(1):61–75
Halle B (2002) Flexibility and packing in proteins. PNAS 99:1274–1279
Hatcher A (2001) Algebraic topology. Cambridge University Press, Cambridge
Horak D, Maletic S, Rajkovic M (2009) Persistent homology of complex networks. J Stat Mech Theory Exp 2009(03):P03034
Janin J, Sternberg MJ (2013) Protein flexibility, not disorder, is intrinsic to molecular recognition. F1000 Biol Rep 5(2):1–7
Kaczynski T, Mischaikow K, Mrozek M (2004) Computational homology. Springer, New York
Karplus M, Kushick JN (1981) Method for estimating the configurational entropy of macromolecules. Macromolecules 14(2):325–332
Kasson PM, Zomorodian A, Park S, Singhal N, Guibas LJ, Pande VS (2007) Persistent voids: a new structural metric for membrane fusion. Bioinformatics 23:1753–1759
Korkut A, Hendrickson WA (2013) Stereochemistry of polypeptide conformation in Coarse Grained analysis. In: Biomolecular forms and functions: a celebration of 50 years of the Ramachandran Map, World Scientific Publishing. pp 136–147
Lee H, Kang H, Chung MK, Kim B, Lee DS (2012) Persistent brain network homology from the perspective of dendrogram. IEEE Trans Med Imaging 31(12):2267–2277
Levitt M, Warshel A (1975) Computer simulation of protein folding. Nature 253(5494):694–698
Liu X, Xie Z, Yi DY (2012) A fast algorithm for constructing topological structure in large data. Homol Homot Appl 14:221–238
Marlow MS, Dogan J, Frederick KK, Valentine KG, Wand AJ (2010) The role of conformational entropy in molecular recognition by calmodulin. Nat Chem Biol 6(5):352–358
Merelli E, Rucco M, Sloot P, Tesei L (2015) Topological characterization of complex systems: using persistent entropy. Entropy 17(10):6872–6892
Mischaikow K, Mrozek M, Reiss J, Szymczak A (1999) Construction of symbolic dynamics from experimental time series. Phys Rev Lett 82:1144–1147
Mischaikow K, Nanda V (2013) Morse theory for filtrations and efficient computation of persistent homology. Discrete Comput Geom 50(2):330–353
Munkres JR (1984) Elements of algebraic topology, vol 2. Addison-Wesley, Menlo Park
Nanda V. Perseus: the persistent homology software. Software available at http://www.sas.upenn.edu/~vnanda/perseus
Nguyen D, Xia KL, Wei GW (2016) Generalized flexibility–rigidity index. J Chem Phys 144(23):234106
Niyogi P, Smale S, Weinberger S (2011) A topological view of unsupervised learning from noisy data. SIAM J Comput 40:646–663
Opron K, Xia KL, Burton ZF, Wei GW (2016) Flexibility rigidity index for protein nucleic acid flexibility and fluctuation analysis. J Comput Chem 37(14):1283–1295
Opron K, Xia KL, Wei GW (2014) Fast and anisotropic flexibility–rigidity index for protein flexibility and fluctuation analysis. J Chem Phys 140:234105
Opron K, Xia KL, Wei GW (2015) Communication: capturing protein multiscale thermal fluctuations. J Chem Phys 142(21):211101
Pachauri D, Hinrichs C, Chung MK, Johnson SC, Singh V (2011) Topology-based kernels with application to inference problems in Alzheimer’s disease. IEEE Trans Med Imaging 30(10):1760–1770
Rieck B, Mara H, Leitte H (2012) Multivariate data analysis using persistence-based filtering and topological signatures. IEEE Trans Vis Comput Gr 18:2382–2391
Robins V (1999) Towards computing homology from finite approximations. Topol Proc 24:503–532
Rucco M, Castiglione F, Merelli E, Pettini M (2016) Characterisation of the idiotypic immune network through persistent entropy. In: Proceedings of ECCS 2014, Springer. pp 117–128
Rucco M, Gonzalez-Diaz R, Jimenez MJ, Atienza N, Cristalli C, Concettoni E, Ferrante A, Merelli E (2017) A new topological entropy-based approach for measuring similarities among piecewise linear functions. Signal Process 134:130–138
Sapienza PJ, Lee AL (2010) Using NMR to study fast dynamics in proteins: methods and applications. Curr Opin Pharmacol 10(6):723–730
Shen MY, Sali A (2006) Statistical potential for assessment and prediction of protein structures. Prot Sci 15(11):2507–2524
De Silva V, Ghrist R (2005) Blind swarms for coverage in 2-D. In: Proceedings of robotics: science and systems, pp 01
Singh G, Memoli F, Ishkhanov T, Sapiro G, Carlsson G, Ringach DL (2008) Topological analysis of population activity in visual cortex. J Vis 8(8):11.1–18
Stites WE, Pranata J (1995) Empirical evaluation of the influence of side chains on the conformational entropy of the polypeptide backbone. Prot Struct Funct Bioinf 22(2):132–140
Tausz A, Vejdemo-Johansson M, Adams H (2011) Javaplex: a research software package for persistent (co)homology. Software available at http://code.google.com/p/javaplex
Thompson JB, Hansma HG, Hansma PK, Plaxco KW (2002) The backbone conformational entropy of protein folding: experimental measures from atomic force microscopy. J Mol Biol 322(3):645–652
Trbovic N, Cho JH, Abel R, Friesner RA, Rance M, Palmer AG III (2008) Protein side-chain dynamics and residual conformational entropy. J Am Chem Soc 131(2):615–622
Wang B, Summa B, Pascucci V, Vejdemo-Johansson M (2011) Branching and circular features in high dimensional data. IEEE Trans Vis Comput Gr 17:1902–1911
Wang B, Wei GW (2016) Object-oriented persistent homology. J Comput Phys 305:276–299
Xia KL, Feng X, Tong YY, Wei GW (2015) Persistent homology for the quantitative prediction of fullerene stability. J Comput Chem 36:408–422
Xia KL, Opron K, Wei GW (2013) Multiscale multiphysics and multidomain models—flexibility and rigidity. J Chem Phys 139:194109
Xia KL, Opron K, Wei GW (2015) Multiscale Gaussian network model (mGNM) and multiscale anisotropic network model (mANM). J Chem Phys 143(20):204106
Xia KL, Wei GW (2014) Persistent homology analysis of protein structure, flexibility and folding. Int J Numer Methods Biomed Eng 30:814–844
Xia KL, Wei GW (2015) Multidimensional persistence in biomolecular data. J Comput Chem 36:1502–1520
Xia KL, Wei GW (2015) Persistent topology for cryo-EM data analysis. Int J Numer Methods Biomed Eng 31:e02719
Xia KL, Zhao ZX, Wei GW (2015) Multiresolution topological simplification. J Comput Biol 22:1–5
Yao Y, Sun J, Huang XH, Bowman GR, Singh G, Lesnick M, Guibas LJ, Pande VS, Carlsson G (2009) Topological methods for exploring low-density states in biomolecular folding pathways. J Chem Phys 130:144115
Zhang J, Lin M, Chen R, Wang W, Liang J (2008) Discrete state model and accurate estimation of loop entropy of RNA secondary structures. J Chem Phys 128(12):125107
Zhong S, Moix JM, Quirk S, Hernandez R (2006) Dihedral-angle information entropy as a gauge of secondary structure propensity. Biophys J 91(11):4014–4023
Zomorodian A, Carlsson G (2005) Computing persistent homology. Discrete Comput Geom 33:249–274
Zomorodian A, Carlsson G (2008) Localized homology. Comput Geom Theory Appl 41(3):126–148
Acknowledgements
This work was supported in part by Nanyang Technological University Startup Grant M4081842.110 and Singapore Ministry of Education Academic Research Fund Tier 1 M401110000. Zhiming Li thanks the Chinese Scholarship Council for financial support (No. 201506775038). Lin Mu’s research is based upon work supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under award number ERKJE45, and by the Laboratory Directed Research and Development program at the Oak Ridge National Laboratory, which is operated by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725.
Cite this article
Xia, K., Li, Z. & Mu, L. Multiscale Persistent Functions for Biomolecular Structure Characterization. Bull Math Biol 80, 1–31 (2018). https://doi.org/10.1007/s11538-017-0362-6
DOI: https://doi.org/10.1007/s11538-017-0362-6