Efficient Bubble Enumeration in Directed Graphs

  • Etienne Birmelé
  • Pierluigi Crescenzi
  • Rui Ferreira
  • Roberto Grossi
  • Vincent Lacroix
  • Andrea Marino
  • Nadia Pisanti
  • Gustavo Sacomoto
  • Marie-France Sagot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7608)

Abstract

Polymorphisms in DNA- or RNA-seq data lead to recognisable patterns in a de Bruijn graph representation of the reads obtained by sequencing. Such patterns have been called mouths or bubbles in the literature. They correspond to two vertex-disjoint directed paths between a source s and a target t. Because real data may contain a very large number of such bubbles, enumerating them is a major efficiency concern for dedicated algorithms. In this paper we propose the first linear-delay algorithm to enumerate all bubbles with a given source.
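
To make the definition of a bubble concrete, the following is a minimal brute-force sketch in Python. It is not the linear-delay algorithm proposed in the paper, and it can take exponential time. It assumes the graph is a dictionary mapping each vertex to a list of its out-neighbours, with no parallel edges, and it reads "vertex-disjoint" as "sharing no vertex other than s and t". The function names simple_paths_from and bubbles_with_source are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def simple_paths_from(graph, source):
    """Yield every simple directed path that starts at source and has at least one edge."""
    stack = [(source,)]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:  # keep the path simple (no repeated vertex)
                stack.append(path + (nxt,))


def bubbles_with_source(graph, source):
    """Return all pairs of paths from source to a common target that share
    no internal vertex. Brute force: illustrative only, assumes no parallel edges."""
    by_target = {}
    for path in simple_paths_from(graph, source):
        by_target.setdefault(path[-1], []).append(path)

    bubbles = []
    for paths in by_target.values():
        for p, q in combinations(paths, 2):
            # p[1:-1] and q[1:-1] are the internal vertices of each path
            if set(p[1:-1]).isdisjoint(q[1:-1]):
                bubbles.append((p, q))
    return bubbles
```

For example, on the graph {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": ["c"]} with source "s", the only bubble found pairs the paths s, a, t and s, b, t. The two paths ending at c both pass through t, so they do not form a bubble.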

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Etienne Birmelé (1, 3)
  • Pierluigi Crescenzi (4)
  • Rui Ferreira (5)
  • Roberto Grossi (5)
  • Vincent Lacroix (1, 2)
  • Andrea Marino (1, 4)
  • Nadia Pisanti (5)
  • Gustavo Sacomoto (1, 2)
  • Marie-France Sagot (1, 2)

  1. INRIA Grenoble Rhône-Alpes, France
  2. Université de Lyon 1, Villeurbanne, France
  3. Université d’Évry, France
  4. Dipartimento di Sistemi e Informatica, Università di Firenze, Firenze, Italy
  5. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
