Abstract
In this work, we consider the problem of pattern matching under the dynamic time warping (\(\textrm{DTW}\)) distance, motivated by potential applications in the analysis of biological data produced by third-generation sequencing. To measure the \(\textrm{DTW}\) distance between two strings, one must “warp” them, that is, duplicate some letters in the strings to obtain two equal-length strings, and then sum the distances between the letters in the corresponding positions. When the distances between letters are integers, we show that for a pattern P with m runs and a text T with n runs:
1. there is an \(\mathcal {O}(m+n)\)-time algorithm that computes all locations where the \(\textrm{DTW}\) distance from P to T is at most 1;
2. there is an \(\mathcal {O}(kmn)\)-time algorithm that computes all locations where the \(\textrm{DTW}\) distance from P to T is at most k.
As a corollary of the second result, we also derive an approximation algorithm for general metrics on the alphabet.
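To make the definition concrete, the following minimal Python sketch computes the \(\textrm{DTW}\) distance with the classic quadratic dynamic program and, for pattern matching, reports every position r of T such that some substring of T ending at r is within \(\textrm{DTW}\) distance k of P. It is only a baseline running in \(\mathcal {O}(|P|\cdot|T|)\) time on the uncompressed strings, not the run-length-based \(\mathcal {O}(m+n)\) or \(\mathcal {O}(kmn)\) algorithms above. It assumes that letters are integers, that a "location" means a 1-based ending position in T, and that the letter distance is passed in as a function; the names `dtw` and `dtw_occurrences` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Dist = Callable[[int, int], float]


def dtw(x: Sequence[int], y: Sequence[int], dist: Dist) -> float:
    """DTW distance between two non-empty strings (classic O(|x||y|) DP)."""
    n, m = len(x), len(y)
    # D[i][j] = DTW distance between x[:i] and y[:j]; the border is unreachable.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pair x[i-1] with y[j-1]; the previous aligned pair is a diagonal,
            # vertical or horizontal neighbour (the latter two duplicate a letter).
            D[i][j] = dist(x[i - 1], y[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


def dtw_occurrences(p: Sequence[int], t: Sequence[int], k: float,
                    dist: Dist) -> List[int]:
    """1-based end positions r such that DTW(p, t[l..r]) <= k for some l <= r."""
    # Row 0 is all zeros: the matched substring of t may start anywhere.
    prev = [0.0] * (len(t) + 1)
    for a in range(1, len(p) + 1):
        cur = [math.inf] * (len(t) + 1)  # column 0: no letter of t used yet
        for b in range(1, len(t) + 1):
            cur[b] = dist(p[a - 1], t[b - 1]) + min(prev[b - 1], prev[b], cur[b - 1])
        prev = cur
    return [r for r in range(1, len(t) + 1) if prev[r] <= k]


if __name__ == "__main__":
    def absdiff(u: int, v: int) -> float:
        return abs(u - v)

    print(dtw([1, 2], [1, 1, 2, 2], absdiff))
    print(dtw_occurrences([1, 2], [0, 1, 1, 2, 2, 3], 0, absdiff))
```

For example, with \(d(x,y) = |x-y|\), the pattern [1, 2] matches the text [0, 1, 1, 2, 2, 3] with distance 0 at end positions 4 and 5, since P can be warped to 1 1 2 or to 1 1 2 2. Keeping only one row of the table at a time is a standard space-saving choice; it does not change the quadratic running time.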
Keywords
- Dynamic time warping distance
- Pattern matching
- Small-distance regime
- Approximation algorithms
This work was partially funded by the grants ANR-20-CE48-0001, ANR-19-CE45-0008 SeqDigger and ANR-19-CE48-0016 from the French National Research Agency.
Notes
1. The preprocessing time \(\mathcal {O}(|\varSigma |^2 \log L)\) required to embed \(\mu \) into well-separated tree metrics is not included in the running time of the algorithm.
References
1. Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: FOCS 2015, pp. 59–78. IEEE Computer Society (2015). https://doi.org/10.1109/FOCS.2015.14
2. Amarasinghe, S.L., Su, S., Dong, X., Zappia, L., Ritchie, M.E., Gouil, Q.: Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21(1), 1–16 (2020)
3. Bansal, N., Buchbinder, N., Madry, A., Naor, J.: A polylogarithmic-competitive algorithm for the k-server problem. In: FOCS 2011, pp. 267–276 (2011). https://doi.org/10.1109/FOCS.2011.63
4. Braverman, V., Charikar, M., Kuszmaul, W., Woodruff, D.P., Yang, L.F.: The one-way communication complexity of dynamic time warping distance. In: SoCG 2019. LIPIcs, vol. 129, pp. 16:1–16:15 (2019). https://doi.org/10.4230/LIPIcs.SoCG.2019.16
5. Bringmann, K., Künnemann, M.: Quadratic conditional lower bounds for string problems and dynamic time warping. In: FOCS 2015, pp. 79–97 (2015). https://doi.org/10.1109/FOCS.2015.15
6. Chen, J.Q., Wu, Y., Yang, H., Bergelson, J., Kreitman, M., Tian, D.: Variation in the ratio of nucleotide substitution and indel rates across genomes in mammals and bacteria. Mol. Biol. Evol. 26(7), 1523–1531 (2009). https://doi.org/10.1093/molbev/msp063
7. Driemel, A., Silvestri, F.: Locality-sensitive hashing of curves. In: SoCG 2017. LIPIcs, vol. 77, pp. 37:1–37:16 (2017). https://doi.org/10.4230/LIPIcs.SoCG.2017.37
8. Dupont, M., Marteau, P.-F.: Coarse-DTW for sparse time series alignment. In: Douzal-Chouakria, A., Vilar, J.A., Marteau, P.-F. (eds.) AALTD 2015. LNCS (LNAI), vol. 9785, pp. 157–172. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44412-3_11
9. Emiris, I.Z., Psarros, I.: Products of Euclidean metrics and applications to proximity questions among curves. In: SoCG 2018. LIPIcs, vol. 99, pp. 37:1–37:13 (2018). https://doi.org/10.4230/LIPIcs.SoCG.2018.37
10. Fakcharoenphol, J., Rao, S., Talwar, K.: A tight bound on approximating arbitrary metrics by tree metrics. In: STOC 2003, pp. 448–455 (2003). https://doi.org/10.1145/780542.780608
11. Froese, V., Jain, B.J., Rymar, M., Weller, M.: Fast exact dynamic time warping on run-length encoded time series. CoRR abs/1903.03003 (2019)
12. Gold, O., Sharir, M.: Dynamic time warping and geometric edit distance: breaking the quadratic barrier. ACM Trans. Algorithms 14(4), 50:1–50:17 (2018). https://doi.org/10.1145/3230734
13. Gonzalez-Garay, M.L.: Introduction to isoform sequencing using Pacific Biosciences technology (Iso-Seq). In: Wu, J. (ed.) Transcriptomics and Gene Regulation. TRBIO, vol. 9, pp. 141–160. Springer, Dordrecht (2016). https://doi.org/10.1007/978-94-017-7450-5_6
14. Huang, Y.T., Liu, P.Y., Shih, P.W.: Homopolish: a method for the removal of systematic errors in nanopore sequencing by homologous polishing. Genome Biol. 22(1), 95 (2021). https://doi.org/10.1186/s13059-021-02282-6
15. Hwang, Y., Gelfand, S.B.: Sparse dynamic time warping. In: Perner, P. (ed.) MLDM 2017. LNCS (LNAI), vol. 10358, pp. 163–175. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62416-7_12
16. Hwang, Y., Gelfand, S.B.: Binary sparse dynamic time warping. In: MLDM 2019, pp. 748–759. ibai Publishing (2019)
17. Kuszmaul, W.: Dynamic time warping in strongly subquadratic time: algorithms for the low-distance regime and approximate evaluation. In: ICALP 2019. LIPIcs, vol. 132, pp. 80:1–80:15 (2019). https://doi.org/10.4230/LIPIcs.ICALP.2019.80
18. Kuszmaul, W.: Dynamic time warping in strongly subquadratic time: algorithms for the low-distance regime and approximate evaluation. CoRR abs/1904.09690 (2019). https://doi.org/10.48550/ARXIV.1904.09690
19. Kuszmaul, W.: Binary dynamic time warping in linear time. CoRR abs/2101.01108 (2021)
20. Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27(2), 557–582 (1998). https://doi.org/10.1137/S0097539794264810
21. Landau, G.M., Vishkin, U.: Fast string matching with k differences. J. Comput. Syst. Sci. 37(1), 63–78 (1988). https://doi.org/10.1016/0022-0000(88)90045-1
22. Li, H.: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34(18), 3094–3100 (2018). https://doi.org/10.1093/bioinformatics/bty191
23. Mahmoud, M., Gobet, N., Cruz-Dávalos, D.I., Mounier, N., Dessimoz, C., Sedlazeck, F.J.: Structural variant calling: the long and the short of it. Genome Biol. 20(1), 1–14 (2019). https://doi.org/10.1186/s13059-019-1828-7
24. Mueen, A., Chavoshi, N., Abu-El-Rub, N., Hamooni, H., Minnich, A.: AWarp: fast warping distance for sparse time series. In: ICDM 2016, pp. 350–359. IEEE (2016)
25. Nishi, A., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Towards efficient interactive computation of dynamic time warping distance. In: Boucher, C., Thankachan, S.V. (eds.) SPIRE 2020. LNCS, vol. 12303, pp. 27–41. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59212-7_3
26. Sakai, Y., Inenaga, S.: A reduction of the dynamic time warping distance to the longest increasing subsequence length. In: ISAAC 2020. LIPIcs, vol. 181, pp. 6:1–6:16 (2020). https://doi.org/10.4230/LIPIcs.ISAAC.2020.6
27. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Sig. Process. 26(1), 43–49 (1978)
Appendices
Appendix A
Lemma 2
Consider a block \(B = D[i_p\mathinner {.\,.}j_p, i_t \mathinner {.\,.}j_t]\) and a cell (a, b) in it. If \(i_p \le a < j_p\), then \(D[a,b] \le D[a+1,b]\), and if \(i_t \le b < j_t\), then \(D[a,b] \le D[a,b+1]\).
Proof
Let us first give an equivalent statement of the lemma: if (a, b) and \((a+1,b)\) are in the same block, then \(D[a,b] \le D[a+1,b]\), and if (a, b) and \((a,b+1)\) are in the same block, then \(D[a,b] \le D[a,b+1]\).
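We show the lemma by induction on \(a+b\). The base cases of the induction are the cells with \(a = 0\) or \(b = 0\), for which the statement holds by the definition of D. Consider now a cell (a, b) with \(a,b \ge 1\), and assume that the statement holds for all cells (x, y) such that \(x+y < a+b\). Let d denote the distance between the letters P[a] and T[b]; it is the same for every cell of the block containing (a, b). By Eq. 1, we have:
\[ D[a,b] = d + \min \{D[a-1,b-1],\, D[a-1,b],\, D[a,b-1]\}. \]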
Assume that (a, b) and \((a+1,b)\) are in the same block. We have \(D[a,b] \le D[a, b-1]+d\) and, since \(d \ge 0\), trivially \(D[a,b] \le D[a,b] + d\). By the induction assumption, \(D[a,b-1] \le D[a+1,b-1]\) (the cells \((a,b-1)\) and \((a+1,b-1)\) must belong to the same block), hence also \(D[a,b] \le D[a+1,b-1]+d\). Therefore,
\[ D[a,b] \le d + \min \{D[a,b-1],\, D[a,b],\, D[a+1,b-1]\} = D[a+1,b]. \]
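Assume now that (a, b) and \((a,b+1)\) are in the same block. We have \(D[a,b] \le D[a-1, b]+d\) and again \(D[a,b] \le D[a,b]+d\). Furthermore, as \((a-1,b)\) and \((a-1,b+1)\) are in the same block, we have \(D[a-1,b] \le D[a-1,b+1]\) by the induction assumption, hence \(D[a,b] \le D[a-1,b+1]+d\). Therefore,
\[ D[a,b] \le d + \min \{D[a-1,b],\, D[a-1,b+1],\, D[a,b]\} = D[a,b+1]. \]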
This concludes the proof of the lemma. \(\square \)
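Lemma 2 can be sanity-checked numerically. The sketch below is a hypothetical helper, not code from the paper: it builds D with the recurrence \(D[a,b] = d + \min \{D[a-1,b-1], D[a-1,b], D[a,b-1]\}\) and the boundary \(D[0,b] = 0\), \(D[a,0] = \infty \) for \(a \ge 1\) (the boundary of the pattern-matching table is an assumption here), splits the table into blocks given by the runs of P and T, and checks that D never decreases downwards or rightwards inside a block. Letters are assumed to be integers with \(d(x,y) = |x-y|\) in the random trials; passing tests support the lemma but of course do not prove it.

```python
import math
import random
from itertools import groupby


def dp_table(p, t, dist):
    """D[a][b] = min over 1 <= l <= b of DTW(p[:a], t[l-1:b]); D[0][b] = 0, D[a][0] = inf."""
    width = len(t) + 1
    D = [[0] * width] + [[math.inf] * width for _ in p]
    for a in range(1, len(p) + 1):
        for b in range(1, len(t) + 1):
            D[a][b] = dist(p[a - 1], t[b - 1]) + min(
                D[a - 1][b - 1], D[a - 1][b], D[a][b - 1])
    return D


def run_intervals(s):
    """1-based inclusive index intervals (i, j) of the maximal runs of s."""
    intervals, start = [], 1
    for _, group in groupby(s):
        length = sum(1 for _ in group)
        intervals.append((start, start + length - 1))
        start += length
    return intervals


def check_lemma2(p, t, dist):
    """True iff D is non-decreasing downwards and rightwards inside every block."""
    D = dp_table(p, t, dist)
    for ip, jp in run_intervals(p):          # rows of one run of P
        for it, jt in run_intervals(t):      # columns of one run of T
            for a in range(ip, jp + 1):
                for b in range(it, jt + 1):
                    if a < jp and D[a][b] > D[a + 1][b]:
                        return False
                    if b < jt and D[a][b] > D[a][b + 1]:
                        return False
    return True


def lemma2_random_trials(trials=200, seed=0):
    """Run check_lemma2 on random small integer strings with d(x, y) = |x - y|."""
    rng = random.Random(seed)

    def dist(x, y):
        return abs(x - y)

    for _ in range(trials):
        p = [rng.randint(0, 2) for _ in range(rng.randint(1, 7))]
        t = [rng.randint(0, 2) for _ in range(rng.randint(1, 10))]
        if not check_lemma2(p, t, dist):
            return False
    return True


if __name__ == "__main__":
    print(lemma2_random_trials())
```

Small alphabets are used on purpose so that long runs, and hence large blocks, occur frequently.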
Appendix B
Theorem 2
Let P be a pattern with m runs and T a text with n runs over an alphabet \(\varSigma \), both given by their run-length encodings. Assume that the \(\textrm{DTW}\) distance is specified by a metric \(\mu \) on \(\varSigma \), and suppose that the ratio between the largest and the smallest non-zero distances between the letters of \(\varSigma \) is at most exponential in \(L = \max \{|P|,|T|\}\). For any \(0< \varepsilon < 1\), there is an \(\mathcal {O}(L^{1-\varepsilon } \cdot mn \log ^3 L)\)-time algorithm that, with high probability, correctly computes an \(\mathcal {O}(L^{\varepsilon })\)-approximation of the smallest \(\textrm{DTW}\) distance between P and a substring of T (see Footnote 1).
Proof
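Any metric \(\mu \) can be embedded in \(\mathcal {O}(\sigma ^2)\) time, where \(\sigma = |\varSigma |\), into a well-separated tree metric \(\mu _\tau \) of depth \(\mathcal {O}(\log \sigma )\) with expected distortion \(\mathcal {O}(\log \sigma )\) (see [10] and [3, Theorem 2.4]). Furthermore, the ratio between the largest and the smallest non-zero distances grows at most polynomially. Formally, for any two letters a, b we have \(\mu (a,b) \le \mu _\tau (a,b)\) and \(\mathbb {E}(\mu _\tau (a,b)) \le \mathcal {O}(\log \sigma ) \cdot \mu (a,b)\). Therefore, for any two strings U and V over \(\varSigma \), we have:
\[ \textrm{DTW}_\mu (U,V) \le \textrm{DTW}_{\mu _\tau }(U,V), \tag{4} \]
\[ \mathbb {E}(\textrm{DTW}_{\mu _\tau }(U,V)) \le \mathcal {O}(\log \sigma ) \cdot \textrm{DTW}_\mu (U,V). \tag{5} \]
The first inequality holds because every warping costs at least as much under \(\mu _\tau \) as under \(\mu \); the second follows by applying linearity of expectation to a warping that is optimal under \(\mu \).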
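Let \(\delta = \min _{S \text { substring of } T} \textrm{DTW}_\mu (P,S)\) and \(\delta _\tau = \min _{S \text { substring of } T} \textrm{DTW}_{\mu _\tau } (P,S)\). Assume that \(\delta \) is realised on a substring X, and \(\delta _\tau \) on a substring \(X_\tau \). By Eq. 4 and the minimality of X, we then obtain:
\[ \delta = \textrm{DTW}_\mu (P,X) \le \textrm{DTW}_\mu (P,X_\tau ) \le \textrm{DTW}_{\mu _\tau }(P,X_\tau ) = \delta _\tau . \]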
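Similarly, the definition of \(\delta _\tau \) and Eq. 5 give the following:
\[ \mathbb {E}(\delta _\tau ) \le \mathbb {E}(\textrm{DTW}_{\mu _\tau }(P,X)) \le \mathcal {O}(\log \sigma ) \cdot \textrm{DTW}_\mu (P,X) = \mathcal {O}(\log \sigma ) \cdot \delta . \]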
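We apply the embedding \(\log L\) times independently to obtain well-separated tree metrics \(\mu _\tau ^i\), \(i = 1, 2, \ldots , \log L\), and let \(\delta _\tau ^i = \min _{S \text { substring of } T} \textrm{DTW}_{\mu _\tau ^i}(P,S)\). From the above and by Chernoff bounds, with high probability,
\[ \delta \le \min _{1 \le i \le \log L} \delta _\tau ^i \le \mathcal {O}(\log \sigma ) \cdot \delta . \]
Hence \(\min _i \delta _\tau ^i\) gives an \(\mathcal {O}(\log \sigma ) = \mathcal {O}(\log L)\) approximation of \(\delta \) with high probability and can be computed in time \(\mathcal {O}(L^{1-\varepsilon } \cdot mn \log ^3 L)\) by Lemma 6, concluding the proof of the theorem. \(\square \)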
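The high-probability step can be spelled out with a short calculation. The derivation below is a sketch that uses Markov's inequality together with the independence of the \(\log L\) embeddings, a simpler tool than the Chernoff bounds invoked above. Here \(\delta _\tau ^i\) denotes the smallest \(\textrm{DTW}_{\mu _\tau ^i}\) distance between P and a substring of T, c stands for the constant hidden in \(\mathbb {E}(\delta _\tau ^i) \le \mathcal {O}(\log \sigma ) \cdot \delta \), logarithms are to base 2, and \(\delta > 0\), \(\sigma \ge 2\) are assumed; all of these are assumptions of this sketch.

```latex
\begin{align*}
\Pr\bigl[\delta_\tau^i > 2c\log\sigma\cdot\delta\bigr]
  &\le \frac{\mathbb{E}(\delta_\tau^i)}{2c\log\sigma\cdot\delta} \le \frac{1}{2}
  && \text{(Markov, for each } i\text{)}\\
\Pr\Bigl[\min_{1\le i\le \log L}\delta_\tau^i > 2c\log\sigma\cdot\delta\Bigr]
  &= \prod_{i=1}^{\log L}\Pr\bigl[\delta_\tau^i > 2c\log\sigma\cdot\delta\bigr]
   \le 2^{-\log L} = \frac{1}{L}
  && \text{(independence)}
\end{align*}
```

Since \(\delta \le \delta _\tau ^i\) holds for every i by the argument based on Eq. 4, it follows that \(\delta \le \min _i \delta _\tau ^i \le 2c\log \sigma \cdot \delta \) with probability at least \(1 - 1/L\).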
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gourdel, G., Driemel, A., Peterlongo, P., Starikovskaya, T. (2022). Pattern Matching Under \(\textrm{DTW}\) Distance. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_23
DOI: https://doi.org/10.1007/978-3-031-20643-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer Science, Computer Science (R0)