On Bubble Generators in Directed Graphs

  • Vicente Acuña
  • Roberto Grossi
  • Giuseppe F. Italiano
  • Leandro Lima
  • Romeo Rizzi
  • Gustavo Sacomoto
  • Marie-France Sagot
  • Blerina Sinaimeri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10520)

Abstract

Bubbles are pairs of internally vertex-disjoint (s, t)-paths with applications in the processing of DNA and RNA data. For example, enumerating alternative splicing events in a reference-free context can be done by enumerating all bubbles in a de Bruijn graph built from RNA-seq reads [16]. However, listing and analysing all bubbles in a given graph is usually infeasible in practice, because the number of bubbles in graphs built from real data can be exponential. In this paper, we propose the notion of a bubble generator set, i.e. a polynomial-sized subset of bubbles from which all the others can be obtained through the application of a specific symmetric difference operator. This set provides a compact representation of the bubble space of a graph, which can be useful in practice since some pertinent information about all the bubbles can be extracted more conveniently from this compact set. Furthermore, we provide a polynomial-time algorithm to decompose any bubble of a graph into the bubbles of such a generator in a tree-like fashion.
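
To make the definitions in the abstract concrete, the following minimal Python sketch checks whether two paths form a bubble and combines two bubbles by symmetric difference. It is an illustration only, not the authors' algorithm: the abstract does not define its "specific symmetric difference operator", so the plain set symmetric difference on arc sets used here is an assumption, and the names `path_arcs`, `is_bubble`, `bubble_arcs` and `symmetric_difference` are hypothetical.

```python
from typing import List, Set, Tuple

Arc = Tuple[str, str]


def path_arcs(path: List[str]) -> Set[Arc]:
    """Return the set of arcs traversed by a path given as a vertex list."""
    return set(zip(path, path[1:]))


def is_bubble(p: List[str], q: List[str]) -> bool:
    """Check that p and q are two distinct, internally vertex-disjoint (s, t)-paths."""
    if len(p) < 2 or len(q) < 2 or p == q:
        return False
    # Both paths must share the same source s and the same target t, with s != t.
    if p[0] != q[0] or p[-1] != q[-1] or p[0] == p[-1]:
        return False
    # Each path must be simple (no repeated vertex).
    if len(set(p)) != len(p) or len(set(q)) != len(q):
        return False
    # Internal vertices (everything except s and t) must not be shared.
    return set(p[1:-1]).isdisjoint(q[1:-1])


def bubble_arcs(p: List[str], q: List[str]) -> Set[Arc]:
    """Represent a bubble by the union of the arc sets of its two paths."""
    return path_arcs(p) | path_arcs(q)


def symmetric_difference(b1: Set[Arc], b2: Set[Arc]) -> Set[Arc]:
    """Assumed operator: arcs that belong to exactly one of the two bubbles."""
    return b1 ^ b2


# Toy graph with three parallel (s, t)-paths: s->a->t, s->b->t, s->c->t.
b_ab = bubble_arcs(["s", "a", "t"], ["s", "b", "t"])
b_bc = bubble_arcs(["s", "b", "t"], ["s", "c", "t"])
b_ac = symmetric_difference(b_ab, b_bc)
```

In this toy graph the shared path s->b->t cancels, so `b_ac` is exactly the arc set of the bubble formed by s->a->t and s->c->t. In general, the plain symmetric difference of two bubbles need not be a bubble, which is presumably why the paper restricts itself to a specific operator.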

Keywords

Bubbles · Bubble generator set · Bubble space · Decomposition algorithm

Notes

Acknowledgments

V. Acuña was supported by Fondecyt 1140631, CIRIC-INRIA Chile and Basal Project PBF 03. R. Grossi and G.F. Italiano were partially supported by MIUR, the Italian Ministry of Education, University and Research, under the Project AMANDA (Algorithmics for MAssive and Networked DAta). Part of this work was done while G.F. Italiano was visiting Université de Lyon. L. Lima is supported by the Brazilian Ministry of Science, Technology and Innovation (in Portuguese, Ministério da Ciência, Tecnologia e Inovação - MCTI) through the National Council of Technological and Scientific Development (in Portuguese, Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq), under the Science Without Borders (in Portuguese, Ciências Sem Fronteiras) scholarship grant process number 203362/2014-4. B. Sinaimeri, L. Lima and M.-F. Sagot are partially funded by the French ANR project Aster (2016–2020), and together with V. Acuña, also by the Stic AmSud project MAIA (2016–2017). This work was performed using the computing facilities of the CC LBBE/PRABI.

References

  1. Birmelé, E., et al.: Efficient bubble enumeration in directed graphs. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 118–129. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-34109-0_13
  2. Bollobás, B.: Modern Graph Theory. Graduate Texts in Mathematics, vol. 184. Springer-Verlag, Berlin (1998). doi: 10.1007/978-1-4612-0619-4
  3. Bondy, J.A., Murty, U.S.R.: Graph Theory with Applications. Elsevier, New York (1976)
  4. Brankovic, L., Iliopoulos, C.S., Kundu, R., Mohamed, M., Pissis, S.P., Vayani, F.: Linear-time superbubble identification algorithm for genome assembly. Theoret. Comput. Sci. 609, 374–383 (2016)
  5. Deo, N.: Graph Theory with Applications to Engineering and Computer Science. Prentice-Hall Series in Automatic Computation. Prentice-Hall, Englewood Cliffs (1974)
  6. Gleiss, P.M., Leydold, J., Stadler, P.F.: Circuit bases of strongly connected digraphs. Discuss. Math. Graph Theory 23(2), 241–260 (2003)
  7. Iqbal, Z., Caccamo, M., Turner, I., Flicek, P., McVean, G.: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44(2), 226–232 (2012)
  8. Kavitha, T., Liebchen, C., Mehlhorn, K., Michail, D., Rizzi, R., Ueckerdt, T., Zweig, K.A.: Cycle bases in graphs: characterization, algorithms, complexity, and applications. Comput. Sci. Rev. 3(4), 199–243 (2009)
  9. Kavitha, T., Mehlhorn, K.: Algorithms to compute minimum cycle bases in directed graphs. Theory Comput. Syst. 40(4), 485–505 (2007)
  10. Li, H.: Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics 28(14), 1838–1844 (2012)
  11. Lima, L., Sinaimeri, B., Sacomoto, G., Lopez-Maestre, H., Marchet, C., Miele, V., Sagot, M.F., Lacroix, V.: Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads. Algorithms Mol. Biol. 12(1), 2:1–2:19 (2017). doi: 10.1186/s13015-017-0091-2
  12. MacLane, S.: A combinatorial condition for planar graphs. Fundam. Math. 28, 22–32 (1937)
  13. Onodera, T., Sadakane, K., Shibuya, T.: Detecting superbubbles in assembly graphs. In: Darling, A., Stoye, J. (eds.) WABI 2013. LNCS, vol. 8126, pp. 338–348. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-40453-5_26
  14. Pevzner, P.A., Tang, H., Tesler, G.: De novo repeat classification and fragment assembly. Genome Res. 14(9), 1786–1796 (2004)
  15. Sacomoto, G., Lacroix, V., Sagot, M.-F.: A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data. In: Darling, A., Stoye, J. (eds.) WABI 2013. LNCS, vol. 8126, pp. 99–111. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-40453-5_9
  16. Sacomoto, G., Kielbassa, J., Chikhi, R., Uricaru, R., Antoniou, P., Sagot, M.F., Peterlongo, P., Lacroix, V.: KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. BMC Bioinf. 13(S–6), S5 (2012)
  17. Sammeth, M.: Complete alternative splicing events are bubbles in splicing graphs. J. Comput. Biol. 16(8), 1117–1140 (2009)
  18. Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J.M., Birol, I.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117–1123 (2009)
  19. Sung, W.K., Sadakane, K., Shibuya, T., Belorkar, A., Pyrogova, I.: An O(m log m)-time algorithm for detecting superbubbles. IEEE/ACM Trans. Comput. Biol. Bioinf. 12(4), 770–777 (2015)
  20. Uricaru, R., Rizk, G., Lacroix, V., Quillery, E., Plantard, O., Chikhi, R., Lemaitre, C., Peterlongo, P.: Reference-free detection of isolated SNPs. Nucleic Acids Res. 43(2), e11 (2015)
  21. Younsi, R., MacLean, D.: Using 2k+2 bubble searches to find single nucleotide polymorphisms in k-mer graphs. Bioinformatics 31(5), 642–646 (2015)
  22. Zerbino, D., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Vicente Acuña (1)
  • Roberto Grossi (2, 5)
  • Giuseppe F. Italiano (3)
  • Leandro Lima (5)
  • Romeo Rizzi (4)
  • Gustavo Sacomoto (5)
  • Marie-France Sagot (5)
  • Blerina Sinaimeri (5)

  1. Center for Mathematical Modeling (UMI 2807 CNRS), University of Chile, Santiago, Chile
  2. Università di Pisa, Pisa, Italy
  3. Università di Roma “Tor Vergata”, Roma, Italy
  4. Università di Verona, Verona, Italy
  5. Inria Grenoble Rhône-Alpes, CNRS, UMR5558, LBBE, Université Lyon 1, Villeurbanne, France
