Divide and Conquer Computation of the Multi-string BWT and LCP Array

  • Paola Bonizzoni
  • Gianluca Della Vedova
  • Serena Nicosia
  • Yuri Pirola
  • Marco Previtali
  • Raffaella Rizzi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10936)

Abstract

Indexing huge collections of strings, such as those produced by widespread sequencing technologies, relies heavily on multi-string generalizations of the Burrows-Wheeler Transform (BWT) and the Longest Common Prefix (LCP) array, since computing both structures efficiently is an essential ingredient of several algorithms on a collection of strings.

In this paper we explore lightweight and parallel computational strategies for building the BWT and LCP array. We design a novel algorithm based on a divide-and-conquer approach that computes the multi-string BWT and the LCP array simultaneously and in parallel.
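
To make the two structures named in the abstract concrete, the following minimal sketch computes a multi-string BWT and LCP array directly from their definitions, by sorting every suffix of every string. It is not the divide-and-conquer algorithm of the paper, which the abstract does not detail. It is a naive baseline, and it assumes one common convention: each string ends with its own sentinel, sentinels sort before all other characters and are ordered by string index, and every sentinel is printed as `$`. The function name `multi_bwt_lcp` is hypothetical.

```python
def multi_bwt_lcp(strings):
    """Naive multi-string BWT and LCP array, built by explicit suffix sorting.

    Illustration of the definitions only; far too slow and memory-hungry
    for real sequencing data.
    """
    suffixes = []
    for idx, s in enumerate(strings):
        # Encode symbols as tuples so that sentinels (0, idx) sort before
        # ordinary characters (1, ch) and differ from one string to the next.
        text = [(1, ch) for ch in s] + [(0, idx)]
        for start in range(len(text)):
            # Symbol preceding the suffix; the whole string is preceded
            # (cyclically) by its own sentinel, written here as "$".
            prev = s[start - 1] if start > 0 else "$"
            suffixes.append((text[start:], prev))

    # Lexicographic order of all suffixes of all strings.
    suffixes.sort(key=lambda pair: pair[0])

    # BWT: the preceding symbols, read in suffix order.
    bwt = "".join(prev for _, prev in suffixes)

    # LCP[i]: length of the longest common prefix of suffixes i-1 and i.
    lcp = [0] * len(suffixes)
    for i in range(1, len(suffixes)):
        a, b = suffixes[i - 1][0], suffixes[i][0]
        k = 0
        while k < len(a) and k < len(b) and a[k] == b[k]:
            k += 1
        lcp[i] = k
    return bwt, lcp


if __name__ == "__main__":
    bwt, lcp = multi_bwt_lcp(["ab", "aa"])
    print(bwt, lcp)
```

Because every suffix is materialized and compared symbol by symbol, this sketch needs time and space that grow quadratically with the string lengths, which is the kind of cost lightweight and parallel constructions aim to avoid.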

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Paola Bonizzoni (1)
  • Gianluca Della Vedova (1)
  • Serena Nicosia (1)
  • Yuri Pirola (1)
  • Marco Previtali (1)
  • Raffaella Rizzi (1)

  1. DISCo, University of Milano-Bicocca, Milan, Italy
