# Superbubbles, Ultrabubbles and Cacti

## Abstract

A superbubble is a type of directed acyclic subgraph with single distinct source and sink vertices. In genome assembly and genetics, the possible paths through a superbubble can be considered to represent the set of possible sequences at a location in a genome. Bidirected and biedged graphs are generalizations of digraphs that are increasingly being used to more fully represent genome assembly and variation problems. Here we define snarls and ultrabubbles, generalizations of superbubbles for bidirected and biedged graphs, and give an efficient algorithm for the detection of these more general structures. Key to this algorithm is the cactus graph, which we show encodes the nested decomposition of a graph into snarls and ultrabubbles within its structure. We propose and demonstrate empirically that this decomposition on bidirected and biedged graphs solves a fundamental problem by defining genetic sites for any collection of genomic variations, including complex structural variations, without the need for any single reference genome coordinate system. Furthermore, the nesting of the decomposition gives a natural way to describe and model variations contained within large variations, a case that existing formats such as VCF do not currently handle.
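To make the notion of a superbubble in an ordinary digraph concrete, the following is a minimal sketch of a naive check for a single candidate pair of vertices. It is not the cactus-graph algorithm described in the abstract, and it does not handle bidirected or biedged graphs. It assumes the definition commonly used in the superbubble literature (the sink is reachable from the source, the vertices reachable forward from the source match those reaching the sink backward, the induced subgraph is acyclic, and no smaller bubble shares the same source), which goes beyond the one-sentence description given here. The function names `reachable`, `is_acyclic`, `bubble_vertices` and `is_superbubble` are hypothetical.

```python
from collections import defaultdict


def reachable(adj, start, blocked):
    """Vertices reachable from `start`, never expanding past `blocked`."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        if v == blocked:
            continue  # the blocked vertex may be reached but not passed through
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_acyclic(adj, vertices):
    """Kahn's algorithm restricted to the subgraph induced by `vertices`."""
    indeg = {v: 0 for v in vertices}
    for v in vertices:
        for w in adj[v]:
            if w in indeg:
                indeg[w] += 1
    queue = [v for v, d in indeg.items() if d == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for w in adj[v]:
            if w in indeg:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
    return removed == len(vertices)


def bubble_vertices(fwd, rev, s, t):
    """Vertex set of the candidate (s, t), or None if it fails
    reachability, matching, or acyclicity (assumed definition)."""
    if s == t:
        return None
    forward = reachable(fwd, s, t)
    if t not in forward:
        return None  # reachability fails
    backward = reachable(rev, t, s)
    if forward != backward:
        return None  # matching fails (e.g. a tip or an extra entry point)
    return forward if is_acyclic(fwd, forward) else None


def is_superbubble(edges, s, t):
    """Naive check of one (source, sink) pair in a digraph given as edge pairs."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    inside = bubble_vertices(fwd, rev, s, t)
    if inside is None:
        return False
    # Minimality: no interior vertex may close a smaller bubble from s.
    return all(bubble_vertices(fwd, rev, s, u) is None for u in inside - {s, t})


diamond = [(1, 2), (1, 3), (2, 4), (3, 4)]
print(is_superbubble(diamond, 1, 4))             # True
print(is_superbubble(diamond + [(2, 5)], 1, 4))  # False
```

In the diamond graph the two paths 1, 2, 4 and 1, 3, 4 play the role of alternative sequences at one site; adding the edge (2, 5) lets a path escape before reaching the sink, so the pair (1, 4) no longer qualifies. Each call explores the graph from scratch, so this is only suitable for illustrating the definition on small inputs.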

## Keywords

Simple Cycle, Directed Walk, Black Edge, Breakpoint Graph, Bidirected Graph

## Notes

### Acknowledgements

This work was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number 5U54HG007990 and by grants from the W.M. Keck Foundation and the Simons Foundation.
