Skip to main content

Fast Algorithm for Vernier Search of Long Repeats in DNA Sequences with Bounded Error Density

  • Conference paper
  • First Online:
Algorithms for Computational Biology (AlCoB 2018)

Abstract

A novel algorithm to find all sufficiently long repeating nucleotide substrings in one or several DNA sequences is proposed. The algorithm searches approximately matching strings very fast with given level of local error density. Some biological applications illustrate the method.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Alsmadi, I., Nuser, M.: String matching evaluation methods for DNA comparison. Int. J. Adv. Sci. Technol. 47, 13–32 (2012)

    Google Scholar 

  2. Altschul, S.F., Gish, W., Miller, W., Meyers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

    Article  Google Scholar 

  3. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. NAR 25(17), 3389–3402 (1997)

    Article  Google Scholar 

  4. Bugaenko, N.N., Gorban, A.N., Sadovsky, M.G.: Maximum entropy method in analysis of genetic text and measurement of its information content. Open Syst. Inf. Dyn. 5(2), 265–278 (1998)

    Article  Google Scholar 

  5. Canzar, S., Salzberg, S.L.: Short read mapping: an algorithmic tour. Proc. IEEE 105(3), 436–458 (2017)

    Article  Google Scholar 

  6. Chegrane I., Belazzougui, D.: Simple, compact and robust approximate string dictionary. arXiv:1312.4678v2 [cs.DS] (2014)

    Article  MathSciNet  Google Scholar 

  7. Gonnet, G.H.: Some string matching problems from Bioinformatics which still need better solutions. J. Dis. Alg. 2(1), 3–15 (2004)

    MathSciNet  MATH  Google Scholar 

  8. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10(8), 707–710 (1966)

    MathSciNet  MATH  Google Scholar 

  9. Maier, D.: The complexity of some problems on substrings and supersequences. J. ACM 25(2), 322–336 (1978)

    Article  Google Scholar 

  10. Marçais, G., Delcher, A.L., Phillippy, A.M., Coston, R., Salzberg, S.L., Zimin, A.: MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14(1), e1005944 (2018)

    Article  Google Scholar 

  11. Martin, R.R.: On the computation of edit distance functions. Dis. Math. 338(2), 291–305 (2015)

    Article  MathSciNet  Google Scholar 

  12. Pearson, W., Lipman, D.: Improved tools for biological sequence comparison. PNAS 85, 2444–2448 (1988)

    Article  Google Scholar 

  13. Sadovsky, M.G.: Comparison of real frequencies of strings vs. the expected ones reveals the information capacity of macromoleculae. J. Biol. Phys. 29, 23 (2003)

    Article  Google Scholar 

  14. Sadovsky, M.G.: Information capacity of nucleotide sequences and its applications. Bull. Math. Biol. 68, 156 (2006)

    Article  MathSciNet  Google Scholar 

  15. Tsarev, S.P., Sadovsky, M.G.: New error tolerant method for search of long repeats in DNA sequences. In: Boton-Fernandez, M., Martin-Vide, C., Santander-Jimenez, S., Vega-Rodríguez, M.A. (eds.) Algorithms for Computational Biology. LNCS, vol. 9702, pp. 171–182. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-38827-4_14

    Chapter  MATH  Google Scholar 

  16. https://en.wikipedia.org/wiki/Vernier_scale

Download references

Acknowledgments

This study was supported by a research grant No. 14.Y26.31.0004 from the Government of the Russian Federation (M.G. Sadovsky) and the grant from Russian Ministry of Education and Science \(N^{o}\) 1.8591.2017/VU (S.P. Tsarev).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Michael G. Sadovsky .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Tsarev, S.P., Senashova, M.Y., Sadovsky, M.G. (2018). Fast Algorithm for Vernier Search of Long Repeats in DNA Sequences with Bounded Error Density. In: Jansson, J., Martín-Vide, C., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2018. Lecture Notes in Computer Science(), vol 10849. Springer, Cham. https://doi.org/10.1007/978-3-319-91938-6_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-91938-6_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91937-9

  • Online ISBN: 978-3-319-91938-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics