Faster Algorithms for 1-Mappability of a Sequence

Alzamel, Mai; Charalampopoulos, Panagiotis; Iliopoulos, Costas S.; Pissis, Solon P.; Radoszewski, Jakub; Sung, Wing-Kin

doi:10.1007/978-3-319-71147-8_8


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10628)

Included in the following conference series:

International Conference on Combinatorial Optimization and Applications


Abstract

In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where $k=1$. The fastest known algorithm for $k=1$ requires time $\mathcal {O}(mn \log n/\log \log n)$ and space $\mathcal {O}(n)$. We present two new algorithms that require worst-case time $\mathcal {O}(mn)$ and $\mathcal {O}(n \log n \log \log n)$, respectively, and space $\mathcal {O}(n)$, thus greatly improving the state of the art. Moreover, we present another algorithm that requires average-case time and space $\mathcal {O}(n)$ for integer alphabets of size $\sigma $ if $m=\varOmega (\log _\sigma n)$. Notably, we show that this algorithm generalizes to arbitrary k, requiring average-case time $\mathcal {O}(kn)$ and space $\mathcal {O}(n)$ if $m=\varOmega (k\log _\sigma n)$.
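A minimal Python sketch may help make the problem statement concrete. It assumes 0-based positions and reads "other factors" as factors starting at other positions, so repeated occurrences of the same string are counted; the function names are hypothetical. `mappability_naive` evaluates the definition directly for any k in O(n²m) time. `one_mappability_masked` illustrates a standard masking idea for k = 1: delete one position p from every factor and group equal remainders. It is not the O(mn)-time, O(n)-space algorithm of the paper; as written it builds m keys of length m − 1 per factor, so it needs O(m²n) expected time and O(mn) extra space.

```python
from collections import Counter
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    return sum(1 for c, d in zip(a, b) if c != d)


def mappability_naive(x: str, m: int, k: int) -> List[int]:
    """Direct O(n^2 m) evaluation of the definition.

    Entry i counts the length-m factors starting at positions j != i whose
    Hamming distance to x[i:i+m] is at most k (identical occurrences count)."""
    factors = [x[i:i + m] for i in range(len(x) - m + 1)]
    return [
        sum(1 for j, g in enumerate(factors) if j != i and hamming(f, g) <= k)
        for i, f in enumerate(factors)
    ]


def one_mappability_masked(x: str, m: int) -> List[int]:
    """k = 1 by grouping factors with one position deleted (hypothetical helper,
    not the paper's algorithm).

    A factor at distance exactly 1 from y differs from it at a single position p,
    so it shares y's key only when p is the deleted position. Factors equal to y
    share every key, so they are counted once up front and subtracted from each
    group."""
    factors = [x[i:i + m] for i in range(len(x) - m + 1)]
    exact = Counter(factors)                   # occurrences of each distinct factor
    result = [exact[f] - 1 for f in factors]   # distance-0 matches, excluding self
    for p in range(m):
        # Delete position p; building each key costs O(m).
        keys = [f[:p] + f[p + 1:] for f in factors]
        groups = Counter(keys)
        for i, f in enumerate(factors):
            # Factors matching f everywhere except possibly at p, minus exact copies.
            result[i] += groups[keys[i]] - exact[f]
    return result


if __name__ == "__main__":
    print(mappability_naive("abaab", 2, 1))     # [2, 1, 3, 2]
    print(one_mappability_masked("abaab", 2))   # [2, 1, 3, 2]
```

For x = "abaab" and m = 2 the factors are ab, ba, aa, ab, and both functions return [2, 1, 3, 2]; for instance, aa (position 2) is within distance 1 of all three other factors. Comparing a faster implementation against the naive one on small strings is a cheap sanity check.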

M. Alzamel and C.S. Iliopoulos—Partially supported by the Onassis Foundation.

J. Radoszewski—Supported by the “Algorithms for text processing with errors and uncertainties” project carried out within the HOMING programme of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.


References

Amir, A., Landau, G.M., Lewenstein, M., Sokol, D.: Dynamic text and static pattern matching. ACM Trans. Algor. 3(2), 19 (2007). http://doi.acm.org/10.1145/1240233.1240242
Antoniou, P., Daykin, J.W., Iliopoulos, C.S., Kourie, D., Mouchard, L., Pissis, S.P.: Mapping uniquely occurring short sequences derived from high throughput technologies to a reference genome. In: 2009 9th International Conference on Information Technology and Applications in Biomedicine, pp. 1–4. IEEE Computer Society (2009). https://doi.org/10.1109/ITAB.2009.5394394
Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000). https://doi.org/10.1007/10719839_9
Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Babai, L. (ed.) Proceedings of the 36th Annual ACM Symposium on Theory of Computing, 2004, pp. 91–100. ACM (2004). http://doi.acm.org/10.1145/1007352.1007374
Crochemore, M., Tischler, G.: The gapped suffix array: a new index structure for fast approximate matching. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 359–364. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16321-0_37
Derrien, T., Estellé, J., Marco Sola, S., Knowles, D., Raineri, E., Guigó, R., Ribeca, P.: Fast computation and applications of genome mappability. PLoS ONE 7(1), e30377 (2012). https://doi.org/10.1371/journal.pone.0030377
Farach, M.: Optimal suffix tree construction with large alphabets. In: 38th Annual Symposium on Foundations of Computer Science, FOCS 1997, pp. 137–143. IEEE Computer Society (1997). https://doi.org/10.1109/SFCS.1997.646102
Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22300-6_32
Fischer, J., Köppl, D., Kurpicz, F.: On the benefit of merging suffix array intervals for parallel pattern matching. In: Grossi, R., Lewenstein, M. (eds.) 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016. LIPIcs, vol. 54, pp. 26:1–26:11. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016). https://doi.org/10.4230/LIPIcs.CPM.2016.26
Fonseca, N.A., Rung, J., Brazma, A., Marioni, J.C.: Tools for mapping high-throughput sequencing data. Bioinformatics 28(24), 3169–3177 (2012). https://doi.org/10.1093/bioinformatics/bts605
Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. J. ACM 31(3), 538–544 (1984). http://doi.acm.org/10.1145/828.1884
Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993). https://doi.org/10.1137/0222058
Manzini, G.: Longest common prefix with mismatches. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds.) SPIRE 2015. LNCS, vol. 9309, pp. 299–310. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23826-5_29
Metzker, M.L.: Sequencing technologies - the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010). https://doi.org/10.1038/nrg2626
Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Storer, J.A., Marcellin, M.W. (eds.) 2009 Data Compression Conference (DCC 2009), pp. 193–202. IEEE Computer Society (2009). https://doi.org/10.1109/DCC.2009.42
Thankachan, S.V., Apostolico, A., Aluru, S.: A provably efficient algorithm for the k-mismatch average common substring problem. J. Comput. Biol. 23(6), 472–482 (2016). https://doi.org/10.1089/cmb.2015.0235


Acknowledgements

We warmly thank Szymon Grabowski who drew our attention via personal communication to Remark 10 and Ref. [9]; the latter reduced the complexity of the algorithm described in Sect. 4.2 from $\mathcal {O}(n \log ^2 n)$ to $\mathcal {O}(n \log n \log \log n)$.

Author information

Authors and Affiliations

Department of Informatics, King’s College London, London, UK
Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis & Jakub Radoszewski
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Jakub Radoszewski
Department of Computer Science, National University of Singapore, Singapore, Singapore
Wing-Kin Sung


Corresponding author

Correspondence to Jakub Radoszewski.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Xiaofeng Gao
Harbin Institute of Technology, Shenzhen, China
Hongwei Du
Kennesaw State University, Kennesaw, Georgia, USA
Meng Han


About this paper

Cite this paper

Alzamel, M., Charalampopoulos, P., Iliopoulos, C.S., Pissis, S.P., Radoszewski, J., Sung, WK. (2017). Faster Algorithms for 1-Mappability of a Sequence. In: Gao, X., Du, H., Han, M. (eds) Combinatorial Optimization and Applications. COCOA 2017. Lecture Notes in Computer Science, vol 10628. Springer, Cham. https://doi.org/10.1007/978-3-319-71147-8_8


DOI: https://doi.org/10.1007/978-3-319-71147-8_8
Published: 16 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71146-1
Online ISBN: 978-3-319-71147-8
eBook Packages: Computer Science (R0)
