
Constructing String Graphs in External Memory

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 8701)

Abstract

In this paper we present an efficient external memory algorithm to compute the string graph of a collection of reads. The string graph is a fundamental data representation used for sequence assembly.
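
To make the notion of a string graph concrete, the following is a minimal in-memory sketch and not the external memory algorithm of the paper. It assumes reads on a single strand, no read contained in another, and a hypothetical threshold `min_len` on the overlap length. It builds all suffix-prefix overlaps by brute force and then drops every edge a→c for which some read b has edges a→b and b→c, a simplified form of transitive reduction. The function names are illustrative only.

```python
def overlap(a, b, min_len):
    """Length of the longest proper suffix of a that is a prefix of b.

    Returns 0 when no such overlap of at least min_len (>= 1) characters exists.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def string_graph(reads, min_len):
    """Brute-force overlap graph followed by a simplified transitive reduction.

    Returns a dict mapping (i, j) to the overlap length between reads i and j.
    """
    # Step 1: all pairwise suffix-prefix overlaps (quadratic, illustration only).
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap(a, b, min_len)
                if k:
                    edges[(i, j)] = k

    # Step 2: an edge (i, j) is treated as transitive if a two-edge path
    # i -> m -> j exists; such edges carry no extra assembly information here.
    transitive = {
        (i, j)
        for (i, j) in edges
        for m in range(len(reads))
        if (i, m) in edges and (m, j) in edges
    }
    return {e: k for e, k in edges.items() if e not in transitive}


# Three reads sampled from the made-up sequence AACGTTGCAT.
reads = ["AACGTT", "CGTTGC", "TTGCAT"]
graph = string_graph(reads, min_len=2)
```

In this example the short overlap between the first and third read is implied by the two longer overlaps through the second read, so only the edges (0, 1) and (1, 2) remain.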

Our algorithm builds upon recent results on lightweight Burrows-Wheeler Transform (BWT) and Longest Common Prefix (LCP) construction. As a by-product, it provides an efficient procedure to extend intervals of the BWT, which could be of independent interest.
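
As an illustration of what extending an interval of the BWT means, the sketch below shows the textbook backward extension step on a single string held in main memory. It is not the external memory procedure of the paper: the BWT is built naively by sorting suffixes, the input is one string rather than a read collection, the text is assumed not to contain the sentinel `$`, and all names are hypothetical.

```python
def bwt_index(text):
    """Return (bwt, C) for text + '$', built naively by sorting all suffixes.

    C[c] is the number of characters in the text that are smaller than c.
    """
    text += "$"  # sentinel, assumed absent from the input and smallest in order
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)

    counts = {}
    for c in text:
        counts[c] = counts.get(c, 0) + 1
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    return bwt, C


def extend_left(bwt, C, interval, c):
    """Map the half-open interval of suffixes prefixed by w to that of c + w."""
    lo, hi = interval
    if c not in C:
        return (0, 0)
    # Rank queries are done by counting; a real index would precompute them.
    return (C[c] + bwt[:lo].count(c), C[c] + bwt[:hi].count(c))


def count_occurrences(text, pattern):
    """Count occurrences of pattern by extending the interval right to left."""
    bwt, C = bwt_index(text)
    interval = (0, len(bwt))  # interval of the empty string: all suffixes
    for c in reversed(pattern):
        interval = extend_left(bwt, C, interval, c)
        if interval[0] >= interval[1]:
            return 0
    return interval[1] - interval[0]
```

For example, `count_occurrences("ACGTACGA", "ACG")` extends the interval three times, once per character of the pattern read from right to left, and the size of the final interval is the number of occurrences of `ACG` in the text.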

We have implemented our algorithm and compared its efficiency against SGA, the most advanced assembly string graph construction program.




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bonizzoni, P., Della Vedova, G., Pirola, Y., Previtali, M., Rizzi, R. (2014). Constructing String Graphs in External Memory. In: Brown, D., Morgenstern, B. (eds) Algorithms in Bioinformatics. WABI 2014. Lecture Notes in Computer Science, vol 8701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44753-6_23


  • DOI: https://doi.org/10.1007/978-3-662-44753-6_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44752-9

  • Online ISBN: 978-3-662-44753-6

  • eBook Packages: Computer Science, Computer Science (R0)