
Parallel Computation for the All-Pairs Suffix-Prefix Problem

  • Felipe A. Louza
  • Simon Gog
  • Leandro Zanotto
  • Guido Araujo
  • Guilherme P. Telles
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9954)

Abstract

We show how to parallelize the optimal algorithm proposed by Tustumi et al. [19] for solving the all-pairs suffix-prefix matching problem over general alphabets. We compare our parallel algorithm with \(\mathsf{SOF}\) [17], a practical solution for DNA sequences that exhibits good time and space performance in multithreading environments. The experimental results show that our parallel algorithm achieves a consistent speedup over the sequential algorithm and is competitive with \(\mathsf{SOF}\) when the minimum overlap length is small.
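To make the problem statement concrete, the following is a minimal brute-force sketch of all-pairs suffix-prefix matching. It is not the algorithm of Tustumi et al. [19], nor its parallel version, and it runs in far worse than optimal time; it only illustrates the output such algorithms compute. The function name `all_pairs_suffix_prefix`, the parameter `min_overlap`, and the convention of skipping pairs with i = j are assumptions made for this illustration.

```python
def all_pairs_suffix_prefix(strings, min_overlap=1):
    """Naive baseline (illustrative only, not the paper's algorithm).

    Returns a matrix ov where ov[i][j] is the length of the longest suffix
    of strings[i] that is also a prefix of strings[j], or 0 when no overlap
    of length at least min_overlap exists. Pairs with i == j are skipped
    (an assumption of this sketch).
    """
    n = len(strings)
    ov = [[0] * n for _ in range(n)]
    for i, s in enumerate(strings):
        for j, t in enumerate(strings):
            if i == j:
                continue
            # Try the longest candidate length first; stop at the first match.
            for k in range(min(len(s), len(t)), min_overlap - 1, -1):
                if s.endswith(t[:k]):
                    ov[i][j] = k
                    break
    return ov


if __name__ == "__main__":
    reads = ["abca", "cab", "bcab"]
    for row in all_pairs_suffix_prefix(reads):
        print(row)
```

Raising `min_overlap` discards short overlaps, which is the parameter the experimental comparison with \(\mathsf{SOF}\) varies.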

Keywords

Suffix-prefix matching, Parallel algorithm, Multithreading, Suffix array, LCP array

Notes

Acknowledgments

FAL acknowledges the financial support of CAPES and CNPq (grant No. 162338/2015-5). GPT acknowledges the support of CNPq. The authors thank Prof. Nalvo Almeida for granting access to the machine used for the experiments.

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2(1), 53–86 (2004)
  2. Dinh, H., Rajasekaran, S.: A memory-efficient data structure representing exact-match overlap graphs with application for next-generation DNA assembly. Bioinformatics 27(14), 1901–1907 (2011)
  3. El-Metwally, S., Hamza, T., Zakaria, M., Helmy, M.: Next-generation sequence assembly: four stages of data processing and computational challenges. PLoS Comput. Biol. 9(12), e1003345 (2013)
  4. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)
  5. Gonnella, G., Kurtz, S.: Readjoiner: a fast and memory efficient string graph-based sequence assembler. BMC Bioinform. 13(1), 82 (2012)
  6. Gonnet, G.H., Baeza-Yates, R.A., Snider, T.: New indices for text: PAT trees and PAT arrays. In: Information Retrieval, pp. 66–82. Prentice-Hall Inc, Upper Saddle River (1992)
  7. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York (1997)
  8. Gusfield, D., Landau, G.M., Schieber, B.: An efficient algorithm for the all pairs suffix-prefix problem. Inf. Process. Lett. 41(4), 181–185 (1992)
  9. Kalyanaraman, A., Aluru, S.: Expressed sequence tags: clustering and applications. In: Handbook of Computational Molecular Biology. CRC Press, Boca Raton (2005)
  10. Kasai, T., Lee, G.H., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
  11. Louza, F.A., Gog, S., Telles, G.P.: Induced suffix sorting for string collections. In: Proceedings of DCC, pp. 43–52. IEEE, Snowbird (2016)
  12. Louza, F.A., Telles, G.P., Ciferri, C.D.D.A.: External memory generalized suffix and LCP arrays construction. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 201–210. Springer, Heidelberg (2013)
  13. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
  14. Ohlebusch, E.: Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. Oldenbusch Verlag (2013)
  15. Ohlebusch, E., Gog, S.: Efficient algorithms for the all-pairs suffix-prefix problem and the all-pairs substring-prefix problem. Inf. Process. Lett. 110(3), 123–128 (2010)
  16. Puglisi, S.J., Smyth, W.F., Turpin, A.H.: A taxonomy of suffix array construction algorithms. ACM Comp. Surv. 39(2), 1–31 (2007)
  17. Rachid, M.H., Malluhi, Q.: A practical and scalable tool to find overlaps between sequences. BioMed Res. Int. 2015, 1–12 (2015)
  18. Simpson, J.T., Durbin, R.: Efficient construction of an assembly string graph using the FM-index. Bioinformatics 26(12), i367–i373 (2010)
  19. Tustumi, W.H., Gog, S., Telles, G.P., Louza, F.A.: An improved algorithm for the all-pairs suffix-prefix problem. J. Discrete Algorithms 47, 34–43 (2016)
  20. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the Annual Symposium on Switching and Automata Theory, pp. 1–11. IEEE Computer Society, Washington, DC (1973)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Felipe A. Louza (1)
  • Simon Gog (2)
  • Leandro Zanotto (1)
  • Guido Araujo (1)
  • Guilherme P. Telles (1)

  1. Institute of Computing, University of Campinas, São Paulo, Brazil
  2. Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
