Secondary Structure Ensemble Analysis via Community Detection

Using Mathematics to Understand Biological Complexity

Abstract

We explored the extent to which graph algorithms for community detection can improve the mining of structural information from the predicted Boltzmann/Gibbs ensemble for the biological objects known as RNA secondary structures. As described, a new computational pipeline was developed, implemented, and tested against the prior method RNAStructProfiling. Since the new approach was judged to provide more structural information in 75% of the test cases, this proof-of-principle analysis supports efforts to improve the data mining of RNA secondary structure ensembles.

Notes

  1. https://rna.urmc.rochester.edu/RNAstructure.html

  2. https://github.com/gtDMMB/ipam-wbio-scripts

  3. The results in this manuscript use a new version of profiling which is still under development and will be made available at https://github.com/gtDMMB/RNAStructProfiling.

Acknowledgements

The work described herein was initiated during the Collaborative Workshop for Women in Mathematical Biology hosted by the Institute for Pure and Applied Mathematics at the University of California, Los Angeles in June 2019. Funding for the workshop was provided by IPAM, the Association for Women in Mathematics’ NSF ADVANCE “Career Advancement for Women Through Research-Focused Networks” (NSF-HRD 1500481) and the Society for Industrial and Applied Mathematics. The authors thank the organizers of the IPAM-WBIO workshop (Rebecca Segal, Blerta Shtylla, and Suzanne Sindi) for facilitating this research.

The authors thank the anonymous reviewers whose comments significantly improved the paper.

The research of Huijing Du was supported in part by NSF-DMS 1853636; of Margherita Maria Ferrari by the NSF-Simons Southeast Center for Mathematics and Biology (SCMB) through NSF-DMS 1764406 and Simons Foundation SFARI 594594; of Christine Heitsch by NIH R01GM126554; of Forrest Hurley by GBMF4560 (to Sullivan) and NSF-DMS 1344199 (to Heitsch); of Christine Mennicke by an NSF GRFP; of Blair D. Sullivan by the Gordon & Betty Moore Foundation’s Data-Driven Discovery Initiative under Grant GBMF4560; and of Bin Xu by the Robert and Sara Lumpkins Endowment for Postdoctoral Fellows in Applied and Computational Mathematics and Statistics at the University of Notre Dame.

Author information

Corresponding author

Correspondence to Christine Heitsch.

Appendix

Two data files (Figs. 9 and 10) are included to supplement the examples presented and discussed in Sect. 4.5. The files begin by listing the RNA sequence analyzed and the CH index for this partition, which had the largest proportional increase. Recall that the (dis)similarity measure is the frequency-weighted symmetric difference and the community detection is performed by the GMC graph algorithm.

The clusters of the partition are listed by decreasing total frequency. Each one is first described by a “Summary” which lists a cluster label (‘partition id’) and its total frequency mass, followed by the list of helix classes appearing in the union of the cluster elements. This Summary is followed by a list of all the extended profiles in the cluster, also sorted by decreasing frequency. Each extended profile is specified by the cluster label, its frequency in the sample (‘multiplicity’), and the set of helix classes particular to this extended profile.
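
The layout just described can be illustrated with a minimal Python sketch. It is not the authors' code and does not reproduce the actual file format of Figs. 9 and 10; the function name `summarize_partition`, the dictionary keys, and the example profiles (integer helix-class labels and multiplicities) are all hypothetical, chosen only to show the ordering and the union step.

```python
from collections import defaultdict


def summarize_partition(profiles):
    """Group extended profiles by cluster and order them as described above.

    `profiles` is a list of (cluster_label, multiplicity, helix_classes)
    triples; this input shape is an assumption made for illustration.
    """
    clusters = defaultdict(list)
    for label, multiplicity, helix_classes in profiles:
        clusters[label].append((multiplicity, frozenset(helix_classes)))

    summaries = []
    for label, members in clusters.items():
        # Total frequency mass of the cluster.
        total = sum(multiplicity for multiplicity, _ in members)
        # Helix classes appearing in the union of the cluster elements.
        union = set().union(*(helices for _, helices in members))
        # Extended profiles sorted by decreasing frequency.
        ordered = sorted(members, key=lambda member: -member[0])
        summaries.append(
            {
                "label": label,
                "total": total,
                "helix_union": sorted(union),
                "profiles": ordered,
            }
        )

    # Clusters listed by decreasing total frequency.
    summaries.sort(key=lambda summary: -summary["total"])
    return summaries


# Hypothetical example data: (cluster label, multiplicity, helix classes).
example_profiles = [
    (2, 65, {6}),
    (1, 305, {1, 4}),
    (0, 412, {1, 2, 3}),
    (1, 98, {1, 4, 5}),
    (0, 120, {1, 2}),
]

summaries = summarize_partition(example_profiles)
for summary in summaries:
    print("Summary:", summary["label"], summary["total"], summary["helix_union"])
    for multiplicity, helices in summary["profiles"]:
        print("   ", summary["label"], multiplicity, sorted(helices))
```

Each printed "Summary" line is followed by the extended profiles of that cluster, mirroring the order in which the appendix says the data files present them.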

Copyright information

© 2021 The Association for Women in Mathematics and the Author(s)

About this chapter

Cite this chapter

Du, H. et al. (2021). Secondary Structure Ensemble Analysis via Community Detection. In: Segal, R., Shtylla, B., Sindi, S. (eds) Using Mathematics to Understand Biological Complexity. Association for Women in Mathematics Series, vol 22. Springer, Cham. https://doi.org/10.1007/978-3-030-57129-0_4