Abstract
Alignment-free approaches to sequence similarity based on substring composition are attracting increasing interest from the scientific community. In several contexts, alignment-free techniques are faster than alignment-based approaches, but less accurate. Recently, several studies (e.g. [4, 8, 9]) have attempted to bridge the accuracy gap by introducing approximate matches into the definition of composition-based distance measures.
In this work we present MissMax, an exact algorithm for computing the longest common substring with mismatches between each suffix of a sequence x and a sequence y. This collection of statistics is useful for computing two distance measures that have recently been extended to incorporate approximate matching, namely the longest and the average common substring with k mismatches. The approach is based on a filtering technique that, in a set of preliminary experiments, substantially reduced the number of candidate sites for a longest match.
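To make the statistics described above concrete, the following is a minimal brute-force sketch in Python. It is not MissMax and does not use the filtering technique; it simply tries every starting position in y for every suffix of x. The function names `extend_with_mismatches` and `longest_match_per_suffix` are hypothetical, and the sketch assumes that a match with k mismatches is a pair of equal-length substrings at Hamming distance at most k.

```python
def extend_with_mismatches(x, i, y, j, k):
    """Length of the longest common prefix of x[i:] and y[j:]
    when up to k mismatching positions are tolerated."""
    length = 0
    mismatches = 0
    while i + length < len(x) and j + length < len(y):
        if x[i + length] != y[j + length]:
            if mismatches == k:
                break  # a (k+1)-th mismatch would be needed: stop here
            mismatches += 1
        length += 1
    return length


def longest_match_per_suffix(x, y, k):
    """For each suffix x[i:], the maximum over all positions j in y of the
    k-mismatch extension length (brute force, for illustration only)."""
    return [
        max((extend_with_mismatches(x, i, y, j, k) for j in range(len(y))), default=0)
        for i in range(len(x))
    ]


if __name__ == "__main__":
    stats = longest_match_per_suffix("abcd", "abxd", 1)
    longest = max(stats, default=0)                       # longest common substring with k mismatches
    average = sum(stats) / len(stats) if stats else 0.0   # unnormalized average over the suffixes of x
    print(stats, longest, average)
```

The maximum of the per-suffix values gives the longest common substring with k mismatches, and their mean is the raw quantity behind the average common substring measure; any normalization used by a specific distance definition is omitted here.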
This research was partially supported by PRIN 20122F8B2.
Notes
- 1. Here we use a slightly different notation from the one used in [4].
References
Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2, 53–86 (2004)
Aluru, S., Apostolico, A., Thankachan, S.V.: Efficient alignment free sequence comparison with bounded mismatches. In: Przytycka, T.M. (ed.) RECOMB 2015. LNCS, vol. 9029, pp. 1–12. Springer, Heidelberg (2015)
Apostolico, A., Denas, O., Dress, A.: Efficient tools for comparative substring analysis. J. Biotechnol. 149(3), 120–126 (2010)
Apostolico, A., Guerra, C., Pizzi, C.: Alignment free sequence similarity with bounded hamming distance. In: Data Compression Conference, pp. 183–192. IEEE Press (2014)
Apostolico, A., Pizzi, C.: Motif discovery by monotone scores. Discrete Appl. Math. 155(6–7), 695–706 (2007)
Flouri, T., Giaquinta, E., Kobert, K., Ukkonen, E.: Longest common substrings with k mismatches. Information Process. Lett. 115(6–8), 643–647 (2015)
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestor. SIAM J. Comput. 13, 338–355 (1984)
Leimeister, C.-A., Boden, M., Horwege, S., Lindner, S., Morgenstern, B.: Fast alignment-free sequence comparison using spaced-word frequencies. Bioinformatics 30(14), 1991–1999 (2014)
Leimeister, C.-A., Morgenstern, B.: kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics 30(14), 2000–2008 (2014)
Pizzi, C.: K-difference matching in amortized linear time for all the words in a text. Theor. Comput. Sci. 410(8–10), 983–987 (2009)
Qi, J., Wang, W., Hao, B.: Whole proteome prokaryote phylogeny without sequence alignment: a k-string composition approach. J. Mol. Evol. 58(1), 1–11 (2004)
Ulitsky, I., Burstein, D., Tuller, T., Chor, B.: The average common substring approach to phylogenetic reconstruction. J. Comput. Biol. 13(2), 336–350 (2006)
Vinga, S., Almeida, J.: Alignment-free sequence comparison - a review. Bioinformatics 20, 206–215 (2003)
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pizzi, C. (2015). A Filtering Approach for Alignment-Free Biosequences Comparison with Mismatches. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_17
DOI: https://doi.org/10.1007/978-3-662-48221-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48220-9
Online ISBN: 978-3-662-48221-6
eBook Packages: Computer Science, Computer Science (R0)