Faster Algorithms for 1-Mappability of a Sequence

  • Mai Alzamel
  • Panagiotis Charalampopoulos
  • Costas S. Iliopoulos
  • Solon P. Pissis
  • Jakub Radoszewski
  • Wing-Kin Sung
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10628)


In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other length-m factors of x that are at Hamming distance at most k from y. We focus here on the version of the problem where \(k=1\). The fastest known algorithm for \(k=1\) requires time \(\mathcal {O}(mn \log n/\log \log n)\) and space \(\mathcal {O}(n)\). We present two new algorithms that require worst-case time \(\mathcal {O}(mn)\) and \(\mathcal {O}(n \log n \log \log n)\), respectively, and space \(\mathcal {O}(n)\), thus greatly improving the state of the art. Moreover, we present another algorithm that requires average-case time and space \(\mathcal {O}(n)\) for integer alphabets of size \(\sigma \) if \(m=\varOmega (\log _\sigma n)\). Notably, we show that this algorithm generalizes to arbitrary k, requiring average-case time \(\mathcal {O}(kn)\) and space \(\mathcal {O}(n)\) if \(m=\varOmega (k\log _\sigma n)\).
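To make the problem statement concrete, here is a brute-force reference implementation in Python. It is not one of the algorithms from the paper: it simply compares every pair of length-m factors, taking \(\mathcal {O}(n^2 m)\) time, and is useful mainly as a correctness check for faster methods on small inputs. It assumes that factors are counted by starting position, so two equal factors occurring at different positions count as distinct "other" factors. The function names are hypothetical.

```python
from typing import List


def hamming_at_most(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions.

    Stops scanning as soon as more than k mismatches have been seen.
    """
    mismatches = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def k_mappability_naive(x: str, m: int, k: int) -> List[int]:
    """Hypothetical brute-force k-mappability (not the paper's algorithm).

    For each start position i, counts the positions j != i such that the
    length-m factors x[i:i+m] and x[j:j+m] are at Hamming distance <= k.
    Runs in O(n^2 m) time.
    """
    n = len(x)
    if m <= 0 or m > n:
        return []
    num_factors = n - m + 1
    result = [0] * num_factors
    # Each unordered pair (i, j) is compared once and credited to both ends.
    for i in range(num_factors):
        for j in range(i + 1, num_factors):
            if hamming_at_most(x[i:i + m], x[j:j + m], k):
                result[i] += 1
                result[j] += 1
    return result
```

For example, for x = aabaa, m = 2 and k = 1, the factors are aa, ab, ba, aa, and `k_mappability_naive("aabaa", 2, 1)` returns `[3, 2, 2, 3]`; with k = 0 (exact repeats only) it returns `[1, 0, 0, 1]`.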



We warmly thank Szymon Grabowski, who, via personal communication, drew our attention to Remark 10 and Ref. [9]; the latter allowed us to reduce the complexity of the algorithm described in Sect. 4.2 from \(\mathcal {O}(n \log ^2 n)\) to \(\mathcal {O}(n \log n \log \log n)\).


  1. Amir, A., Landau, G.M., Lewenstein, M., Sokol, D.: Dynamic text and static pattern matching. ACM Trans. Algor. 3(2), 19 (2007).
  2. Antoniou, P., Daykin, J.W., Iliopoulos, C.S., Kourie, D., Mouchard, L., Pissis, S.P.: Mapping uniquely occurring short sequences derived from high throughput technologies to a reference genome. In: 2009 9th International Conference on Information Technology and Applications in Biomedicine, pp. 1–4. IEEE Computer Society (2009).
  3. Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000).
  4. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Babai, L. (ed.) Proceedings of the 36th Annual ACM Symposium on Theory of Computing, 2004, pp. 91–100. ACM (2004).
  5. Crochemore, M., Tischler, G.: The gapped suffix array: a new index structure for fast approximate matching. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 359–364. Springer, Heidelberg (2010).
  6. Derrien, T., Estellé, J., Marco Sola, S., Knowles, D., Raineri, E., Guigó, R., Ribeca, P.: Fast computation and applications of genome mappability. PLoS ONE 7(1), e30377 (2012).
  7. Farach, M.: Optimal suffix tree construction with large alphabets. In: 38th Annual Symposium on Foundations of Computer Science, FOCS 1997, pp. 137–143. IEEE Computer Society (1997).
  8. Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011).
  9. Fischer, J., Köppl, D., Kurpicz, F.: On the benefit of merging suffix array intervals for parallel pattern matching. In: Grossi, R., Lewenstein, M. (eds.) 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016. LIPIcs, vol. 54, pp. 26:1–26:11. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016).
  10. Fonseca, N.A., Rung, J., Brazma, A., Marioni, J.C.: Tools for mapping high-throughput sequencing data. Bioinformatics 28(24), 3169–3177 (2012).
  11. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. J. ACM 31(3), 538–544 (1984).
  12. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993).
  13. Manzini, G.: Longest common prefix with mismatches. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds.) SPIRE 2015. LNCS, vol. 9309, pp. 299–310. Springer, Cham (2015).
  14. Metzker, M.L.: Sequencing technologies - the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010).
  15. Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Storer, J.A., Marcellin, M.W. (eds.) 2009 Data Compression Conference (DCC 2009), pp. 193–202. IEEE Computer Society (2009).
  16. Thankachan, S.V., Apostolico, A., Aluru, S.: A provably efficient algorithm for the k-mismatch average common substring problem. J. Comput. Biol. 23(6), 472–482 (2016).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Mai Alzamel (1)
  • Panagiotis Charalampopoulos (1)
  • Costas S. Iliopoulos (1)
  • Solon P. Pissis (1)
  • Jakub Radoszewski (1, 2), corresponding author
  • Wing-Kin Sung (3)

  1. Department of Informatics, King’s College London, London, UK
  2. Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
  3. Department of Computer Science, National University of Singapore, Singapore, Singapore
