Skip to main content

New Error Tolerant Method for Search of Long Repeats in DNA Sequences

  • Conference paper
  • First Online:
Algorithms for Computational Biology (AlCoB 2016)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 9702))

Included in the following conference series:

Abstract

A new method to identify all sufficiently long repeating nucleotide substrings in one or several DNA sequences is proposed. The method based on a specific gauge applied to DNA sequences that guarantees identification of the repeating substrings. The method allows the matching substrings to contain a given level of errors. The gauge is based on the development of a heavily sparse dictionary of repeats, thus drastically accelerating the search procedure. Some biological applications illustrate the method.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

References

  1. Altschul, S.F., Gish, W., Miller, W., Meyers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

    Article  Google Scholar 

  2. Bergroth, L., Hakonen, H., Raita, T.: A survey of longest common subsequence algorithms. In: SPIRE 2000, pp. 39–48. IEEE Computer Society (2000)

    Google Scholar 

  3. Chen, G.L., Chang, Y.J., Hsueh, C.H.: PRAP: an ab initio software package for automated genome-wide analysis of DNA repeats for prokaryotes. Bioinformatics. 29(21), 2683–2689 (2013)

    Article  Google Scholar 

  4. Erciyes, K.: Distributed and Sequential Algorithms for Bioinformatics. Computational Biology, vol. 23, p. 367. Springer, Cham (2015)

    MATH  Google Scholar 

  5. Girgis, H.Z.: Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinform. 16, 227 (2015)

    Article  Google Scholar 

  6. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  7. Lian, S., Chen, X., Wang, P., Zhang, X., Dai, X.: A complete and accurate Ab initio repeat finding algorithm. Interdiscip. Sci. 8(1), 75–83 (2016)

    Article  Google Scholar 

  8. Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25(2), 322–336 (1978). ACM Press

    Article  MathSciNet  MATH  Google Scholar 

  9. Masek, W.J., Paterson, M.S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)

    Article  MathSciNet  MATH  Google Scholar 

  10. Novák, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J.: RepeatExplorer: a galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29(6), 792–793 (2013)

    Article  Google Scholar 

  11. Pearson, W., Lipman, D.: Improved tools for biological sequence comparison. PNAS USA 85, 2444–2448 (1988)

    Article  Google Scholar 

  12. Sadovsky, M.G.: Information capacity of nucleotide sequences and its applications. Bull. Math. Biology. 68, 156 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  13. https://en.wikipedia.org/wiki/Vernier_scale

Download references

Acknowledgments

This study was supported by a research grant No. 14.Y26.31.0004 from the Government of the Russian Federation (M.G. Sadovsky) and the grant from Russian Ministry of Education and Science to Siberian Federal University, contract No. 1.1462.2014/K (S.P. Tsarev).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Michael G. Sadovsky .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Tsarev, S.P., Sadovsky, M.G. (2016). New Error Tolerant Method for Search of Long Repeats in DNA Sequences. In: Botón-Fernández, M., Martín-Vide, C., Santander-Jiménez, S., Vega-Rodríguez, M.A. (eds) Algorithms for Computational Biology. AlCoB 2016. Lecture Notes in Computer Science(), vol 9702. Springer, Cham. https://doi.org/10.1007/978-3-319-38827-4_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-38827-4_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38826-7

  • Online ISBN: 978-3-319-38827-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics