A Filtering Approach for Alignment-Free Biosequences Comparison with Mismatches

Pizzi, Cinzia

doi:10.1007/978-3-662-48221-6_17

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9289)

Abstract

Alignment-free approaches for sequence similarity based on substring composition are attracting increasing interest from the scientific community. In several contexts, alignment-free techniques are faster than alignment-based approaches, but less accurate. Recently, several studies (e.g. [4, 8, 9]) have attempted to bridge the accuracy gap by introducing approximate matches in the definition of composition-based distance measures.

In this work we present MissMax, an exact algorithm for the computation of the longest common substring with mismatches between each suffix of a sequence x and a sequence y. This collection of statistics is useful for the computation of two distance measures that have recently been extended to incorporate approximate matching, namely the longest and the average common substring with k mismatches. Our approach is based on a filtering technique that, in a set of preliminary experiments, substantially reduced the size of the set of potential sites of a longest match.
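To make the statistics named above concrete, the following is a minimal brute-force sketch in Python. It is not MissMax and uses no filtering: it simply tries every start position in y for every suffix of x, so it runs in roughly cubic time and is only meant as a reference definition. The function names are hypothetical, and the "average" shown is a plain mean of the per-suffix lengths; any further normalization used to turn it into a distance is not reproduced here.

```python
def extend_with_mismatches(x, y, i, j, k):
    """Length of the longest common prefix of x[i:] and y[j:]
    when at most k mismatching positions are allowed."""
    length = 0
    mismatches = 0
    while i + length < len(x) and j + length < len(y):
        if x[i + length] != y[j + length]:
            if mismatches == k:
                break  # a further mismatch would exceed the budget
            mismatches += 1
        length += 1
    return length


def max_match_lengths(x, y, k):
    """For each suffix x[i:], the longest prefix that occurs somewhere
    in y with at most k mismatches (naive scan over all positions of y)."""
    return [
        max((extend_with_mismatches(x, y, i, j, k) for j in range(len(y))),
            default=0)
        for i in range(len(x))
    ]


def longest_common_substring_k(x, y, k):
    """Longest common substring of x and y with at most k mismatches."""
    return max(max_match_lengths(x, y, k), default=0)


def mean_match_length(x, y, k):
    """Plain mean of the per-suffix lengths (assumed, unnormalized form)."""
    lengths = max_match_lengths(x, y, k)
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    print(max_match_lengths("abcd", "abxd", 0))  # [2, 1, 0, 1]
    print(max_match_lengths("abcd", "abxd", 1))  # [4, 3, 2, 1]
    print(mean_match_length("abcd", "abxd", 1))  # 2.5
```

With k = 0 the per-suffix values reduce to exact longest matches; allowing one mismatch lets "abcd" align with "abxd" over its full length. A filtering approach such as the one described in the abstract aims to avoid examining most of the (i, j) pairs that this naive scan visits.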

This research was partially supported by PRIN 20122F8B2.

Notes

1. Here we use a slightly different notation from the one used in [4].

References

1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2, 53–86 (2004)
2. Aluru, S., Apostolico, A., Thankachan, S.V.: Efficient alignment free sequence comparison with bounded mismatches. In: Przytycka, T.M. (ed.) RECOMB 2015. LNCS, vol. 9029, pp. 1–12. Springer, Heidelberg (2015)
3. Apostolico, A., Denas, O., Dress, A.: Efficient tools for comparative substring analysis. J. Biotechnol. 149(3), 120–126 (2010)
4. Apostolico, A., Guerra, C., Pizzi, C.: Alignment free sequence similarity with bounded hamming distance. In: Data Compression Conference, pp. 183–192. IEEE Press (2014)
5. Apostolico, A., Pizzi, C.: Motif discovery by monotone scores. Discrete Appl. Math. 155(6–7), 695–706 (2007)
6. Flouri, T., Giaquinta, E., Kobert, K., Ukkonen, E.: Longest common substrings with k mismatches. Information Process. Lett. 115(6–8), 643–647 (2015)
7. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13, 338–355 (1984)
8. Leimeister, C.-A., Boden, M., Horwege, S., Lindner, S., Morgenstern, B.: Fast alignment-free sequence comparison using spaced-word frequencies. Bioinformatics 30(14), 1991–1999 (2014)
9. Leimeister, C.-A., Morgenstern, B.: kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics 30(14), 2000–2008 (2014)
10. Pizzi, C.: K-difference matching in amortized linear time for all the words in a text. Theor. Comput. Sci. 410(8–10), 983–987 (2009)
11. Qi, J., Wang, W., Hao, B.: Whole proteome prokaryote phylogeny without sequence alignment: a k-string composition approach. J. Mol. Evol. 58(1), 1–11 (2004)
12. Ulitsky, I., Burstein, D., Tuller, T., Chor, B.: The average common substring approach to phylogenetic reconstruction. J. Comput. Biol. 13(2), 336–350 (2006)
13. Vinga, S., Almeida, J.: Alignment-free sequence comparison - a review. Bioinformatics 20, 206–215 (2003)

Author information

Authors and Affiliations

Department of Information Engineering, University of Padova, Padova, Italy
Cinzia Pizzi

Corresponding author

Correspondence to Cinzia Pizzi.

Editor information

Editors and Affiliations

University of Maryland, College Park, Maryland, USA
Mihai Pop
University of Lille, Lille, France
Hélène Touzet

About this paper

Cite this paper

Pizzi, C. (2015). A Filtering Approach for Alignment-Free Biosequences Comparison with Mismatches. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_17

DOI: https://doi.org/10.1007/978-3-662-48221-6_17
Published: 28 August 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48220-9
Online ISBN: 978-3-662-48221-6
eBook Packages: Computer Science, Computer Science (R0)
