A Filtering Approach for Alignment-Free Biosequences Comparison with Mismatches

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9289)

Abstract

Alignment-free approaches for sequence similarity based on substring composition are attracting increasing interest from the scientific community. In several contexts, alignment-free techniques are faster than alignment-based approaches, but less accurate. Recently, several studies (e.g. [4, 8, 9]) have attempted to bridge the accuracy gap by introducing approximate matches into the definition of composition-based distance measures.

In this work we present MissMax, an exact algorithm for the computation of the longest common substring with mismatches between each suffix of a sequence x and a sequence y. This collection of statistics is useful for computing two distance measures that have recently been extended to incorporate approximate matching, namely the longest and the average common substring with k mismatches. Our approach is based on a filtering technique that, in a set of preliminary experiments, substantially reduced the size of the set of potential sites of a longest match.
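To make the statistics mentioned above concrete, the following is a minimal brute-force sketch in Python. It is not MissMax and uses no filtering: it simply tries every pair of start positions, which takes cubic time in the worst case. The function name `longest_k_mismatch_per_suffix` and the convention that a match may end on a mismatch are assumptions made for illustration; the exact definitions and normalisations used in the paper and in [4, 9] may differ.

```python
def longest_k_mismatch_per_suffix(x: str, y: str, k: int) -> list[int]:
    """Naive baseline (hypothetical helper, not the paper's algorithm).

    For each start position i in x, return the length of the longest
    prefix of x[i:] that matches some substring of y with at most k
    mismatches (Hamming distance).
    """
    result = []
    for i in range(len(x)):
        best = 0
        for j in range(len(y)):
            mismatches = 0
            length = 0
            # Extend the match from (i, j) until the budget of k mismatches
            # is exceeded or either sequence ends.
            while i + length < len(x) and j + length < len(y):
                if x[i + length] != y[j + length]:
                    mismatches += 1
                    if mismatches > k:
                        break
                length += 1
            best = max(best, length)
        result.append(best)
    return result


if __name__ == "__main__":
    lengths = longest_k_mismatch_per_suffix("abcd", "abxd", k=1)
    # The longest and the (unnormalised) average of the per-suffix values
    # are the raw ingredients of the two measures named in the abstract.
    print(lengths, max(lengths), sum(lengths) / len(lengths))
```

A baseline like this is mainly useful as a reference for checking faster implementations on small inputs.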

This research was partially supported by PRIN 20122F8B2.

Notes

  1. Here we use a slightly different notation than the one used in [4].

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2, 53–86 (2004)

  2. Aluru, S., Apostolico, A., Thankachan, S.V.: Efficient alignment free sequence comparison with bounded mismatches. In: Przytycka, T.M. (ed.) RECOMB 2015. LNCS, vol. 9029, pp. 1–12. Springer, Heidelberg (2015)

  3. Apostolico, A., Denas, O., Dress, A.: Efficient tools for comparative substring analysis. J. Biotechnol. 149(3), 120–126 (2010)

  4. Apostolico, A., Guerra, C., Pizzi, C.: Alignment free sequence similarity with bounded Hamming distance. In: Data Compression Conference, pp. 183–192. IEEE Press (2014)

  5. Apostolico, A., Pizzi, C.: Motif discovery by monotone scores. Discrete Appl. Math. 155(6–7), 695–706 (2007)

  6. Flouri, T., Giaquinta, E., Kobert, K., Ukkonen, E.: Longest common substrings with k mismatches. Information Process. Lett. 115(6–8), 643–647 (2015)

  7. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestor. SIAM J. Comput. 13, 338–355 (1984)

  8. Leimeister, C.-A., Boden, M., Horwege, S., Lindner, S., Morgenstern, B.: Fast alignment-free sequence comparison using spaced-word frequencies. Bioinformatics 30(14), 1991–1999 (2014)

  9. Leimeister, C.-A., Morgenstern, B.: kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics 30(14), 2000–2008 (2014)

  10. Pizzi, C.: K-difference matching in amortized linear time for all the words in a text. Theor. Comput. Sci. 410(8–10), 983–987 (2009)

  11. Qi, J., Wang, W., Hao, B.: Whole proteome prokaryote phylogeny without sequence alignment: a k-string composition approach. J. Mol. Evol. 58(1), 1–11 (2004)

  12. Ulitsky, I., Burstein, D., Tuller, T., Chor, B.: The average common substring approach to phylogenetic reconstruction. J. Comput. Biol. 13(2), 336–350 (2006)

  13. Vinga, S., Almeida, J.: Alignment-free sequence comparison - a review. Bioinformatics 20, 206–215 (2003)

Corresponding author

Correspondence to Cinzia Pizzi.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pizzi, C. (2015). A Filtering Approach for Alignment-Free Biosequences Comparison with Mismatches. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_17

  • DOI: https://doi.org/10.1007/978-3-662-48221-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48220-9

  • Online ISBN: 978-3-662-48221-6

  • eBook Packages: Computer Science, Computer Science (R0)
