Abstract
We explored the extent to which graph algorithms for community detection can improve the mining of structural information from the predicted Boltzmann/Gibbs ensemble for the biological objects known as RNA secondary structures. A new computational pipeline was developed, implemented, and tested against the prior method RNAStructProfiling. Since the new approach was judged to provide more structural information in 75% of the test cases, this proof-of-principle analysis supports efforts to improve the data mining of RNA secondary structure ensembles.
Notes
- 1.
- 2.
- 3. The results in this manuscript use a new version of profiling which is still under development and will be made available at https://github.com/gtDMMB/RNAStructProfiling.
Acknowledgements
The work described herein was initiated during the Collaborative Workshop for Women in Mathematical Biology hosted by the Institute for Pure and Applied Mathematics at the University of California, Los Angeles in June 2019. Funding for the workshop was provided by IPAM, the Association for Women in Mathematics’ NSF ADVANCE “Career Advancement for Women Through Research-Focused Networks” (NSF-HRD 1500481) and the Society for Industrial and Applied Mathematics. The authors thank the organizers of the IPAM-WBIO workshop (Rebecca Segal, Blerta Shtylla, and Suzanne Sindi) for facilitating this research.
The authors thank the anonymous reviewers whose comments significantly improved the paper.
This research of Huijing Du was supported in part by NSF-DMS 1853636; of Margherita Maria Ferrari by the NSF-Simons Southeast Center for Mathematics and Biology (SCMB) through NSF-DMS 1764406 and Simons Foundation SFARI 594594; of Christine Heitsch by NIH R01GM126554; of Forrest Hurley by GBMF4560 (to Sullivan) and NSF-DMS 1344199 (to Heitsch); of Christine Mennicke by an NSF GRFP; of Blair D. Sullivan by the Gordon & Betty Moore Foundation’s Data-Driven Discovery Initiative under Grant GBMF4560; and of Bin Xu by the Robert and Sara Lumpkins Endowment for Postdoctoral Fellows in Applied and Computational Mathematics and Statistics at the University of Notre Dame.
Appendix
Two data files (Figs. 9 and 10) are included to supplement the examples presented and discussed in Sect. 4.5. The files begin by listing the RNA sequence analyzed and the CH index for this partition, which had the largest proportional increase. Recall that the (dis)similarity measure is the frequency-weighted symmetric difference and the community detection is performed by the GMC graph algorithm.
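As a rough illustration of the (dis)similarity measure named above, the following Python sketch computes one plausible reading of a frequency-weighted symmetric difference between two extended profiles. It assumes that an extended profile is a set of helix class labels, and that each helix class in the symmetric difference is weighted by the fraction of sampled structures containing it. This weighting, the function names, and the toy data are assumptions made for illustration; they are not taken from the chapter's implementation.

```python
from collections import Counter


def helix_class_frequencies(profiles):
    """Return the fraction of sampled structures containing each helix class.

    `profiles` maps an extended profile (a frozenset of helix class labels)
    to its multiplicity in the sample. Hypothetical data layout.
    """
    total = sum(profiles.values())
    counts = Counter()
    for helix_classes, multiplicity in profiles.items():
        for helix_class in helix_classes:
            counts[helix_class] += multiplicity
    return {helix_class: count / total for helix_class, count in counts.items()}


def weighted_symmetric_difference(profile_a, profile_b, frequencies):
    """Sum the frequencies of helix classes found in exactly one profile.

    Assumed form of the measure: frequent helix classes contribute more
    to the dissimilarity than rare ones.
    """
    return sum(frequencies[h] for h in profile_a ^ profile_b)


# Toy sample of 1000 structures grouped into three extended profiles.
sample = {
    frozenset({1, 2, 3}): 500,
    frozenset({1, 2}): 300,
    frozenset({1, 4}): 200,
}
freqs = helix_class_frequencies(sample)
d_close = weighted_symmetric_difference(frozenset({1, 2, 3}), frozenset({1, 2}), freqs)
d_far = weighted_symmetric_difference(frozenset({1, 2, 3}), frozenset({1, 4}), freqs)
```

In this toy sample the two profiles that differ only in helix class 3 are closer than the two that differ in classes 2, 3, and 4. Pairwise values of this kind could serve as edge weights in a graph over extended profiles, which is the sort of input a community detection algorithm operates on.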
The clusters of the partition are listed by decreasing total frequency. Each one is first described by a “Summary” which lists a cluster label (‘partition id’) and its total frequency mass, followed by the list of helix classes appearing in the union of the cluster elements. This Summary is followed by a list of all the extended profiles in the cluster, also sorted by decreasing frequency. Each extended profile is specified by the cluster label, its frequency in the sample (‘multiplicity’), and the set of helix classes particular to this extended profile.
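The ordering described above can be sketched in a few lines of Python. The sketch assumes a partition is given as a mapping from a cluster label to a list of (multiplicity, helix classes) pairs; the function name, the dictionary keys, and the toy data are hypothetical and do not reproduce the exact layout of the data files.

```python
def summarize_partition(clusters):
    """Build per-cluster summaries, ordered by decreasing total frequency.

    `clusters` maps a cluster label to a list of (multiplicity, helix_classes)
    pairs, one pair per extended profile. Hypothetical data layout.
    """
    summaries = []
    for label, profiles in clusters.items():
        # Total frequency mass of the cluster.
        total = sum(multiplicity for multiplicity, _ in profiles)
        # Helix classes appearing in the union of the cluster elements.
        union = set().union(*(helix_classes for _, helix_classes in profiles))
        # Extended profiles sorted by decreasing multiplicity.
        ordered = sorted(profiles, key=lambda p: p[0], reverse=True)
        summaries.append(
            {
                "partition_id": label,
                "total_frequency": total,
                "helix_classes": sorted(union),
                "profiles": ordered,
            }
        )
    summaries.sort(key=lambda s: s["total_frequency"], reverse=True)
    return summaries


# Toy partition with two clusters.
partition = {
    0: [(120, {1, 2}), (300, {1, 2, 3})],
    1: [(500, {1, 4}), (80, {1, 4, 5})],
}
report = summarize_partition(partition)
for summary in report:
    print("Summary:", summary["partition_id"], summary["total_frequency"], summary["helix_classes"])
    for multiplicity, helix_classes in summary["profiles"]:
        print("  ", summary["partition_id"], multiplicity, sorted(helix_classes))
```

Here cluster 1 is listed first because its total frequency (580) exceeds that of cluster 0 (420), and within each cluster the most frequent extended profile comes first.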
Copyright information
© 2021 The Association for Women in Mathematics and the Author(s)
About this chapter
Cite this chapter
Du, H. et al. (2021). Secondary Structure Ensemble Analysis via Community Detection. In: Segal, R., Shtylla, B., Sindi, S. (eds) Using Mathematics to Understand Biological Complexity. Association for Women in Mathematics Series, vol 22. Springer, Cham. https://doi.org/10.1007/978-3-030-57129-0_4
DOI: https://doi.org/10.1007/978-3-030-57129-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57128-3
Online ISBN: 978-3-030-57129-0
eBook Packages: Mathematics and Statistics (R0)