Superbubbles, Ultrabubbles and Cacti

  • Benedict Paten
  • Adam M. Novak
  • Erik Garrison
  • Glenn Hickey
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10229)

Abstract

A superbubble is a type of directed acyclic subgraph with single distinct source and sink vertices. In genome assembly and genetics, the possible paths through a superbubble can be considered to represent the set of possible sequences at a location in a genome. Bidirected and biedged graphs are generalizations of digraphs that are increasingly being used to represent genome assembly and variation problems more fully. Here we define snarls and ultrabubbles, generalizations of superbubbles for bidirected and biedged graphs, and give an efficient algorithm for detecting these more general structures. Key to this algorithm is the cactus graph, which we show encodes within its structure the nested decomposition of a graph into snarls and ultrabubbles. We propose, and demonstrate empirically, that this decomposition on bidirected and biedged graphs solves a fundamental problem by defining genetic sites for any collection of genomic variations, including complex structural variations, without the need for any single reference genome coordinate system. Furthermore, the nesting of the decomposition gives a natural way to describe and model variations contained within large variations, a case not currently dealt with by existing formats, e.g. VCF.
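
To make the idea of paths through a superbubble concrete, here is a minimal sketch in Python. It is not the detection algorithm of the paper. It assumes a hypothetical four-vertex directed acyclic graph in which each vertex carries a DNA label, and it simply enumerates every source-to-sink path and spells out the sequence each path represents. The names `LABELS`, `EDGES`, `enumerate_paths` and `spelled_sequences` are illustrative only.

```python
from typing import Dict, List

# Hypothetical toy graph (an assumption for illustration): vertex -> DNA label.
LABELS: Dict[str, str] = {"s": "A", "u": "C", "v": "G", "t": "T"}
# Directed edges: "s" is the single source, "t" the single sink.
EDGES: Dict[str, List[str]] = {"s": ["u", "v"], "u": ["t"], "v": ["t"], "t": []}


def enumerate_paths(edges: Dict[str, List[str]], source: str, sink: str) -> List[List[str]]:
    """Return every directed path from source to sink; the graph must be acyclic."""
    if source == sink:
        return [[sink]]
    found: List[List[str]] = []
    for successor in edges[source]:
        for rest in enumerate_paths(edges, successor, sink):
            found.append([source] + rest)
    return found


def spelled_sequences(edges: Dict[str, List[str]], labels: Dict[str, str],
                      source: str, sink: str) -> List[str]:
    """Concatenate vertex labels along each source-to-sink path, sorted for stable output."""
    return sorted(
        "".join(labels[vertex] for vertex in path)
        for path in enumerate_paths(edges, source, sink)
    )


# The two paths s-u-t and s-v-t spell the two candidate sequences at this site.
print(spelled_sequences(EDGES, LABELS, "s", "t"))  # ['ACT', 'AGT']
```

In this toy case the two paths correspond to two alternative sequences at one site. The sketch does not check that the subgraph actually satisfies the superbubble conditions, and it says nothing about the bidirected or biedged case that the paper addresses.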

Keywords

Simple Cycle, Directed Walk, Black Edge, Breakpoint Graph, Bidirected Graph
These keywords were added by machine and not by the authors.

Notes

Acknowledgements

This work was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number 5U54HG007990 and by grants from the W. M. Keck Foundation and the Simons Foundation.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Benedict Paten (1)
  • Adam M. Novak (1)
  • Erik Garrison (2)
  • Glenn Hickey (1)

  1. UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, USA
  2. Wellcome Trust Sanger Institute, Cambridge, UK
