Secondary Structure Ensemble Analysis via Community Detection

Du, Huijing; Ferrari, Margherita Maria; Heitsch, Christine; Hurley, Forrest; Mennicke, Christine V.; Sullivan, Blair D.; Xu, Bin

doi:10.1007/978-3-030-57129-0_4


Part of the book series: Association for Women in Mathematics Series (AWMS, volume 22)


Abstract

We explored the extent to which graph algorithms for community detection can improve the mining of structural information from the predicted Boltzmann/Gibbs ensemble for the biological objects known as RNA secondary structures. As described, a new computational pipeline was developed, implemented, and tested against the prior method RNAStructProfiling. Since the new approach was judged to provide more structural information in 75% of the test cases, this proof-of-principle analysis supports efforts to improve the data mining of RNA secondary structure ensembles.


Notes

1. https://rna.urmc.rochester.edu/RNAstructure.html
2. https://github.com/gtDMMB/ipam-wbio-scripts
3. The results in this manuscript use a new version of profiling which is still under development and will be made available at https://github.com/gtDMMB/RNAStructProfiling.


Acknowledgements

The work described herein was initiated during the Collaborative Workshop for Women in Mathematical Biology hosted by the Institute for Pure and Applied Mathematics at the University of California, Los Angeles in June 2019. Funding for the workshop was provided by IPAM, the Association for Women in Mathematics’ NSF ADVANCE “Career Advancement for Women Through Research-Focused Networks” (NSF-HRD 1500481) and the Society for Industrial and Applied Mathematics. The authors thank the organizers of the IPAM-WBIO workshop (Rebecca Segal, Blerta Shtylla, and Suzanne Sindi) for facilitating this research.

The authors thank the anonymous reviewers whose comments significantly improved the paper.

The research of Huijing Du was supported in part by NSF-DMS 1853636; of Margherita Maria Ferrari by the NSF-Simons Southeast Center for Mathematics and Biology (SCMB) through NSF-DMS 1764406 and Simons Foundation SFARI 594594; of Christine Heitsch by NIH R01GM126554; of Forrest Hurley by GBMF4560 (to Sullivan) and NSF-DMS 1344199 (to Heitsch); of Christine Mennicke by an NSF GRFP; of Blair D. Sullivan by the Gordon & Betty Moore Foundation’s Data-Driven Discovery Initiative under Grant GBMF4560; and of Bin Xu by the Robert and Sara Lumpkins Endowment for Postdoctoral Fellows in Applied and Computational Mathematics and Statistics at the University of Notre Dame.

Author information

Authors and Affiliations

Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE, USA
Huijing Du
Department of Mathematics and Statistics, University of South Florida, Tampa, FL, USA
Margherita Maria Ferrari
School of Mathematics, Georgia Institute of Technology, Atlanta, GA, USA
Christine Heitsch
North Carolina State University, Raleigh, NC, USA
Forrest Hurley
Department of Mathematics, North Carolina State University, Raleigh, NC, USA
Christine V. Mennicke
School of Computing, University of Utah, Salt Lake City, UT, USA
Blair D. Sullivan
Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA
Bin Xu


Corresponding author

Correspondence to Christine Heitsch.

Editor information

Editors and Affiliations

Department of Mathematics and Applied Mathematics, Virginia Commonwealth University, Richmond, VA, USA
Rebecca Segal
Department of Mathematics, Pomona College, Claremont, CA, USA
Blerta Shtylla
Department of Applied Mathematics, University of California, Merced, CA, USA
Suzanne Sindi

Appendix

Two data files (Figs. 9 and 10) are included to supplement the examples presented and discussed in Sect. 4.5. The files begin by listing the RNA sequence analyzed and the CH index for this partition, which had the largest proportional increase. Recall that the (dis)similarity measure is the frequency-weighted symmetric difference and the community detection is performed by the GMC graph algorithm.
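As an illustration of the (dis)similarity measure recalled above, here is a minimal sketch in Python. The measure is not defined in this excerpt, so the code assumes one plausible reading: each extended profile is a set of helix classes, and the distance between two profiles is the sum of the sample frequencies of the helix classes found in exactly one of them. The function names, the `helix_frequency` table, and the toy counts are all hypothetical and are not taken from the authors' pipeline.

```python
from itertools import combinations


def weighted_symmetric_difference(a, b, helix_frequency):
    """Sum the frequencies of helix classes present in exactly one profile.

    ASSUMPTION: this is one plausible reading of "frequency-weighted
    symmetric difference"; the chapter's exact definition may differ.
    """
    return sum(helix_frequency[h] for h in a ^ b)


def pairwise_dissimilarities(profiles, helix_frequency):
    """Return {(i, j): dissimilarity} for every pair of profiles with i < j."""
    return {
        (i, j): weighted_symmetric_difference(
            profiles[i], profiles[j], helix_frequency
        )
        for i, j in combinations(range(len(profiles)), 2)
    }


# Hypothetical toy data: helix class id -> number of sampled structures
# (out of 1000) that contain a helix from that class.
helix_frequency = {1: 900, 2: 400, 3: 250}

# Hypothetical extended profiles, each a set of helix class ids.
profiles = [frozenset({1, 2}), frozenset({1, 3}), frozenset({1, 2, 3})]

distances = pairwise_dissimilarities(profiles, helix_frequency)
```

Pairwise values of this kind could serve as edge weights in a graph over extended profiles, which a community detection algorithm would then partition. How the weights are actually turned into a graph is not stated here, so that step is left out of the sketch.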

The clusters of the partition are listed by decreasing total frequency. Each one is first described by a “Summary”, which lists a cluster label (‘partition id’) and its total frequency mass, followed by the list of helix classes appearing in the union of the cluster elements. This Summary is followed by a list of all the extended profiles in the cluster, also sorted by decreasing frequency. Each extended profile is specified by the cluster label, its frequency in the sample (‘multiplicity’), and the set of helix classes particular to this extended profile.
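The layout just described can be sketched in a few lines of Python. This is only an illustration of the ordering and of the Summary contents, not the authors' file writer: the function name, the dictionary keys, and the toy profiles are hypothetical, and the real data files may format each entry differently.

```python
from collections import defaultdict


def summarize_partition(profiles, labels):
    """Arrange a partition of extended profiles in the order described above.

    profiles: list of (set of helix classes, multiplicity) pairs.
    labels:   list of cluster labels, parallel to `profiles`.
    """
    clusters = defaultdict(list)
    for (helices, multiplicity), label in zip(profiles, labels):
        clusters[label].append((helices, multiplicity))

    report = []
    for label, members in clusters.items():
        # Summary: total frequency mass and union of helix classes.
        total = sum(mult for _, mult in members)
        union = set().union(*(helices for helices, _ in members))
        # Extended profiles within a cluster, by decreasing frequency.
        ordered = sorted(members, key=lambda hm: -hm[1])
        report.append({
            "partition_id": label,
            "total_frequency": total,
            "helix_classes": sorted(union),
            "profiles": [(mult, sorted(helices)) for helices, mult in ordered],
        })

    # Clusters by decreasing total frequency.
    report.sort(key=lambda cluster: -cluster["total_frequency"])
    return report


# Hypothetical toy input: four extended profiles in two clusters.
toy_profiles = [({1, 2}, 300), ({1, 3}, 500), ({4}, 150), ({1, 2, 3}, 50)]
toy_labels = [0, 0, 1, 0]
toy_report = summarize_partition(toy_profiles, toy_labels)
```

In this toy input, cluster 0 carries a total frequency of 850 and is listed first, with its three profiles ordered 500, 300, 50. Cluster 1, with a single profile of multiplicity 150, follows.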


About this chapter

Cite this chapter

Du, H. et al. (2021). Secondary Structure Ensemble Analysis via Community Detection. In: Segal, R., Shtylla, B., Sindi, S. (eds) Using Mathematics to Understand Biological Complexity. Association for Women in Mathematics Series, vol 22. Springer, Cham. https://doi.org/10.1007/978-3-030-57129-0_4


DOI: https://doi.org/10.1007/978-3-030-57129-0_4
Published: 30 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57128-3
Online ISBN: 978-3-030-57129-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
