New Error Tolerant Method for Search of Long Repeats in DNA Sequences

Tsarev, Sergey P.; Sadovsky, Michael G.

doi:10.1007/978-3-319-38827-4_14

Sergey P. Tsarev¹⁸ &
Michael G. Sadovsky¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 9702))

Included in the following conference series:

International Conference on Algorithms for Computational Biology

532 Accesses
2 Citations

Abstract

A new method to identify all sufficiently long repeating nucleotide substrings in one or several DNA sequences is proposed. The method based on a specific gauge applied to DNA sequences that guarantees identification of the repeating substrings. The method allows the matching substrings to contain a given level of errors. The gauge is based on the development of a heavily sparse dictionary of repeats, thus drastically accelerating the search procedure. Some biological applications illustrate the method.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

References

Altschul, S.F., Gish, W., Miller, W., Meyers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
Article Google Scholar
Bergroth, L., Hakonen, H., Raita, T.: A survey of longest common subsequence algorithms. In: SPIRE 2000, pp. 39–48. IEEE Computer Society (2000)
Google Scholar
Chen, G.L., Chang, Y.J., Hsueh, C.H.: PRAP: an ab initio software package for automated genome-wide analysis of DNA repeats for prokaryotes. Bioinformatics. 29(21), 2683–2689 (2013)
Article Google Scholar
Erciyes, K.: Distributed and Sequential Algorithms for Bioinformatics. Computational Biology, vol. 23, p. 367. Springer, Cham (2015)
MATH Google Scholar
Girgis, H.Z.: Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinform. 16, 227 (2015)
Article Google Scholar
Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975)
Article MathSciNet MATH Google Scholar
Lian, S., Chen, X., Wang, P., Zhang, X., Dai, X.: A complete and accurate Ab initio repeat finding algorithm. Interdiscip. Sci. 8(1), 75–83 (2016)
Article Google Scholar
Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25(2), 322–336 (1978). ACM Press
Article MathSciNet MATH Google Scholar
Masek, W.J., Paterson, M.S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)
Article MathSciNet MATH Google Scholar
Novák, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J.: RepeatExplorer: a galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29(6), 792–793 (2013)
Article Google Scholar
Pearson, W., Lipman, D.: Improved tools for biological sequence comparison. PNAS USA 85, 2444–2448 (1988)
Article Google Scholar
Sadovsky, M.G.: Information capacity of nucleotide sequences and its applications. Bull. Math. Biology. 68, 156 (2006)
Article MathSciNet MATH Google Scholar
https://en.wikipedia.org/wiki/Vernier_scale

Download references

Acknowledgments

This study was supported by a research grant No. 14.Y26.31.0004 from the Government of the Russian Federation (M.G. Sadovsky) and the grant from Russian Ministry of Education and Science to Siberian Federal University, contract No. 1.1462.2014/K (S.P. Tsarev).

Author information

Authors and Affiliations

Siberian Federal University, Kirenskogo, 26, 660074, Krasnoyarsk, Russia
Sergey P. Tsarev
Institute of Computational Modelling of SB RAS, Akademgorodok, 660036, Krasnoyarsk, Russia
Michael G. Sadovsky

Authors

Sergey P. Tsarev
View author publications
You can also search for this author in PubMed Google Scholar
Michael G. Sadovsky
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Michael G. Sadovsky .

Editor information

Editors and Affiliations

CETA-Ciemat, Trujillo, Spain
María Botón-Fernández
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
University of Extremadura, Cáceres, Spain
Sergio Santander-Jiménez
University of Extremadura, Cáceres, Spain
Miguel A. Vega-Rodríguez

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tsarev, S.P., Sadovsky, M.G. (2016). New Error Tolerant Method for Search of Long Repeats in DNA Sequences. In: Botón-Fernández, M., Martín-Vide, C., Santander-Jiménez, S., Vega-Rodríguez, M.A. (eds) Algorithms for Computational Biology. AlCoB 2016. Lecture Notes in Computer Science(), vol 9702. Springer, Cham. https://doi.org/10.1007/978-3-319-38827-4_14

Download citation

DOI: https://doi.org/10.1007/978-3-319-38827-4_14
Published: 12 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38826-7
Online ISBN: 978-3-319-38827-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics