Describing the Local Structure of Sequence Graphs
Analysis of genetic variation using graph structures is an emerging paradigm in genomics. However, defining genetic sites on sequence graphs remains an open problem. Paten's invention of the ultrabubble and snarl, special subgraphs of sequence graphs which can be identified with efficient algorithms, represents an important first step toward segregating graphs into genetic sites. We extend the theory of ultrabubbles to a special subclass in which every detail of the ultrabubble can be described as a series and parallel arrangement of genetic sites. We furthermore introduce the concept of bundle structures, which allows us to recognize the graph motifs created by additional combinations of variation in the graph, including but not limited to runs of abutting single nucleotide variants. We demonstrate linear-time identification of bundles in a bidirected graph. These two advances build on initial work on ultrabubbles in bidirected graphs, and define a more granular concept of genetic site.
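To make the phrase "series and parallel arrangement" concrete, the sketch below tests whether a small two-terminal graph collapses to a single edge under repeated series and parallel reductions, in the spirit of the classical series-parallel literature. It is an illustration only: the function name `is_series_parallel`, the undirected multigraph encoding, and the naive repeated-scan strategy are assumptions made here, not the method of this paper, and it is neither linear-time nor defined on bidirected graphs.

```python
from collections import Counter


def is_series_parallel(edges, source, sink):
    """Return True if the two-terminal multigraph reduces to one source-sink edge.

    `edges` is an iterable of (u, v) pairs of an undirected multigraph.
    Hypothetical helper for illustration; a naive reduction, not linear-time.
    """
    # Multiset of undirected edges keyed by their unordered endpoint pair.
    multi = Counter(frozenset(e) for e in edges)
    if any(len(key) == 1 for key in multi):
        return False  # self-loops cannot be reduced away

    changed = True
    while changed:
        changed = False

        # Parallel reduction: merge duplicate edges between the same endpoints.
        for key in list(multi):
            if multi[key] > 1:
                multi[key] = 1
                changed = True

        # Series reduction: splice out one internal node of degree exactly 2.
        degree = Counter()
        for key, count in multi.items():
            for node in key:
                degree[node] += count
        for node, deg in degree.items():
            if deg == 2 and node not in (source, sink):
                first, second = [key for key in multi if node in key]
                (u,) = first - {node}
                (v,) = second - {node}
                del multi[first]
                del multi[second]
                multi[frozenset((u, v))] += 1
                changed = True
                break  # degrees are stale now; recompute on the next pass

    return dict(multi) == {frozenset((source, sink)): 1}


# Two alternative paths between s and t: a purely parallel arrangement.
diamond = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
# Adding a cross edge a-b gives the classic non-series-parallel bridge.
bridge = diamond + [("a", "b")]
```

Under these assumptions, `is_series_parallel(diamond, "s", "t")` holds, while the same call on `bridge` does not, because no node of degree 2 and no duplicate edge is available to reduce.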
Keywords: Sequence graphs, Genetic variants
Y.R. is supported by a Howard Hughes Medical Institute Medical Research Fellowship. This work was also supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number 5U54HG007990 and grants from the W.M. Keck Foundation and the Simons Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We thank Wolfgang Beyer for his visualizations of 1000 Genomes data in a variation graph.
- 1. 1000 Genomes Project Consortium, et al.: A global reference for human genetic variation. Nature 526(7571), 68–74 (2015)
- 2. Beyer, W.: Sequence tube maps (2016). https://github.com/wolfib/sequenceTubeMap
- 3. Brankovic, L., Iliopoulos, C.S., Kundu, R., Mohamed, M., Pissis, S.P., Vayani, F.: Linear-time superbubble identification algorithm for genome assembly. Theor. Comput. Sci. 609(Pt. 2), 374–383 (2016). http://www.sciencedirect.com/science/article/pii/S0304397515009147
- 5. Duffin, R.: Topology of series-parallel networks. J. Math. Anal. Appl. 10(2), 303–318 (1965). http://www.sciencedirect.com/science/article/pii/0022247X65901253
- 7. Novak, A.M., Hickey, G., Garrison, E., Blum, S., Connelly, A., Dilthey, A., Eizenga, J., Elmohamed, M.A.S., Guthrie, S., Kahles, A., Keenan, S., Kelleher, J., Kural, D., Li, H., Lin, M.F., Miga, K., Ouyang, N., Rakocevic, G., Smuga-Otto, M., Zaranek, A.W., Durbin, R., McVean, G., Haussler, D., Paten, B.: Genome graphs. bioRxiv (2017). http://biorxiv.org/content/early/2017/01/18/101378
- 9. Paten, B., Novak, A.M., Garrison, E., Hickey, G.: Superbubbles, ultrabubbles and cacti. bioRxiv (2017). http://biorxiv.org/content/early/2017/01/18/101493
- 11. Sung, W.K., Sadakane, K., Shibuya, T., Belorkar, A., Pyrogova, I.: An O(m log m)-time algorithm for detecting superbubbles. IEEE/ACM Trans. Comput. Biol. Bioinform. 12(4), 770–777. https://doi.org/10.1109/TCBB.2014.2385696
- 12. Valdes, J., Tarjan, R.E., Lawler, E.L.: The recognition of series parallel digraphs. SIAM J. Comput. 11(2), 298–313 (1982). http://dx.doi.org/10.1137/0211023