Abstract
A new error-tolerant method for the comparison and analysis of symbol sequences is proposed. The method is based on calculating a convolution function defined over the binary numeric sequences obtained by a specific transformation of the original symbol sequence. The method allows a highly parallel implementation and is of great value in the search for insertion/deletion mutations. The implementation uses the fast Fourier transform to calculate the convolution function.
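The idea of comparing sequences through a convolution of binary sequences can be illustrated with a minimal sketch. The abstract does not specify the transformation, so the sketch assumes one common choice: a separate 0/1 indicator sequence for each symbol of the alphabet. The function name `match_counts_by_shift` and the `alphabet` parameter are hypothetical, introduced only for illustration, and the code is not the authors' implementation.

```python
import numpy as np


def match_counts_by_shift(a: str, b: str, alphabet: str = "ACGT") -> np.ndarray:
    """Count matching symbols between a and b for every relative shift.

    Entry j of the result corresponds to shift s = j - (len(a) - 1) and
    equals the number of positions i with a[i] == b[i + s].
    Assumption: each symbol is encoded as its own 0/1 indicator sequence.
    """
    n, m = len(a), len(b)
    size = n + m - 1  # length of the full linear convolution
    total = np.zeros(size)
    for sym in alphabet:
        # Binary indicator sequences: 1 where the symbol occurs, 0 elsewhere.
        ind_a = np.array([ch == sym for ch in a], dtype=float)
        ind_b = np.array([ch == sym for ch in b], dtype=float)
        # Cross-correlation = convolution with one sequence reversed.
        # Zero-padding to `size` makes the circular FFT product linear.
        fa = np.fft.rfft(ind_a[::-1], size)
        fb = np.fft.rfft(ind_b, size)
        total += np.fft.irfft(fa * fb, size)
    # FFT results are floating point; the true counts are integers.
    return np.rint(total).astype(int)


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGTACGT"
    counts = match_counts_by_shift(a, b)
    # Print the shift with the largest number of matches.
    best = int(np.argmax(counts))
    print("best shift:", best - (len(a) - 1), "matches:", counts[best])
```

Each symbol's indicator pair is processed independently, which is one place where parallelism can enter. Under this encoding, an insertion or deletion tends to split the matches between two neighbouring shifts rather than destroy them, which is the kind of signal an in/del search could look for.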
Notes
1. Actually, there is no difference between insertion and deletion: by changing the reference sequence, one is always able to reduce the situation to a single kind of mutation, e.g., to insertion.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Molyavko, A., Shaidurov, V., Karepova, E., Sadovsky, M. (2020). Highly Parallel Convolution Method to Compare DNA Sequences with Enforced In/Del and Mutation Tolerance. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2020. Lecture Notes in Computer Science, vol 12108. Springer, Cham. https://doi.org/10.1007/978-3-030-45385-5_42
DOI: https://doi.org/10.1007/978-3-030-45385-5_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45384-8
Online ISBN: 978-3-030-45385-5
eBook Packages: Computer Science, Computer Science (R0)