Improved Filters for the Approximate Suffix-Prefix Overlap Problem

  • Gregory Kucherov
  • Dekel Tsur
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8799)


Computing suffix-prefix overlaps for a large collection of strings is a fundamental building block for the analysis of genomic next-generation sequencing data. The approximate suffix-prefix overlap problem is to find all pairs of strings from a given set such that a prefix of one string is similar to a suffix of the other. Välimäki et al. (Information and Computation, 2012) gave a solution to this problem based on suffix filters. In this work, we propose two improvements to the method of Välimäki et al. that reduce the running time of the computation.
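To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the suffix-filter method of Välimäki et al., nor either of the improvements proposed in this work; it only illustrates what counts as an approximate suffix-prefix overlap. The function names, the absolute error bound `max_errors`, the minimum overlap length `min_len`, and the choice of edit distance as the similarity measure are all assumptions made for this illustration.

```python
from itertools import permutations


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def approx_overlap_pairs(strings, min_len, max_errors):
    """Return all ordered index pairs (i, j), i != j, such that some suffix
    of strings[i] and some prefix of strings[j], both of length >= min_len,
    are within edit distance max_errors of each other.

    Hypothetical helper: exhaustive search, only practical for tiny inputs.
    """
    pairs = set()
    for i, j in permutations(range(len(strings)), 2):
        a, b = strings[i], strings[j]
        found = any(
            edit_distance(a[start:], b[:plen]) <= max_errors
            for start in range(len(a) - min_len + 1)   # suffix a[start:]
            for plen in range(min_len, len(b) + 1)     # prefix b[:plen]
        )
        if found:
            pairs.add((i, j))
    return pairs


reads = ["TTACGTAC", "CGTTCGGA"]
# Suffix "CGTAC" of reads[0] and prefix "CGTTC" of reads[1] differ by one substitution.
print(approx_overlap_pairs(reads, min_len=5, max_errors=1))  # {(0, 1)}
print(approx_overlap_pairs(reads, min_len=5, max_errors=0))  # set()
```

The exhaustive search compares every suffix with every prefix for every ordered pair, which is far too slow for sequencing-scale collections; that cost is what filtering approaches aim to avoid.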


Keywords: Edit Distance, Overlap Problem, Index Point, Partitioning Scheme, Fundamental Building Block



References

  1. Burkhardt, S., Kärkkäinen, J.: Better filtering with gapped q-grams. Fundamenta Informaticae 56(1,2), 51–70 (2003)
  2. Gusfield, D., Landau, G., Schieber, B.: An efficient algorithm for the all pairs suffix-prefix problem. Inf. Process. Lett. 41(4), 181–185 (1992)
  3. Kärkkäinen, J., Na, J.C.: Faster filters for approximate string matching. In: Proc. 9th Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 84–90 (2007)
  4. Kucherov, G., Salikhov, K., Tsur, D.: Approximate string matching using a bidirectional index. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 222–231. Springer, Heidelberg (2014), Full version at
  5. Lam, T.W., Li, R., Tam, A., Wong, S.C.K., Wu, E., Yiu, S.-M.: High throughput short read alignment via bi-directional BWT. In: Proc. IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 31–36 (2009)
  6. Li, Z., Chen, Y., Mu, D., Yuan, J., Shi, Y., Zhang, H., Gan, J., Li, N., Hu, X., Liu, B., Yang, B., Fan, W.: Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-Bruijn-graph. Brief Funct. Genomics 11(1), 25–37 (2012)
  7. Ma, B., Tromp, J., Li, M.: PatternHunter: Faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)
  8. Myers, E.W., Sutton, G.G., Delcher, A.L., Dew, I.M., Fasulo, D.P., Flanigan, M.J., Kravitz, S.A., Mobarry, C.M., Reinert, K.H., Remington, K.A., Anson, E.L., Bolanos, R.A., Chou, H.H., Jordan, C.M., Halpern, A.L., Lonardi, S., Beasley, E.M., Brandon, R.C., Chen, L., Dunn, P.J., Lai, Z., Liang, Y., Nusskern, D.R., Zhan, M., Zhang, Q., Zheng, X., Rubin, G.M., Adams, M.D., Venter, J.C.: A whole-genome assembly of Drosophila. Science 287(5461), 2196–2204 (2000)
  9. Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings – Practical on-line search algorithms for texts and biological sequences. Cambridge University Press (2002)
  10. Noé, L., Kucherov, G.: YASS: Enhancing the sensitivity of DNA similarity search. Nucleic Acids Research 33, W540–W543 (2005)
  11. Ohlebusch, E., Gog, S.: Efficient algorithms for the all-pairs suffix-prefix problem and the all-pairs substring-prefix problem. Information Processing Letters 110(3), 123–128 (2010)
  12. Simpson, J.T., Durbin, R.: Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22(3), 549–556 (2012)
  13. Välimäki, N., Ladra, S., Mäkinen, V.: Approximate all-pairs suffix/prefix overlaps. Information and Computation 213, 49–58 (2012)
  14. Vyverman, M., De Baets, B., Fack, V., Dawyndt, P.: Prospects and limitations of full-text index structures in genome analysis. Nucleic Acids Res. 40(15), 6993–7015 (2012)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Gregory Kucherov (1, 2)
  • Dekel Tsur (2)

  1. CNRS/LIGM, Université Paris-Est Marne-la-Vallée, France
  2. Department of Computer Science, Ben-Gurion University of the Negev, Israel