Improving the efficiency of traditional DTW accelerators

  • Regular Paper
Knowledge and Information Systems

Abstract

Dynamic time warping (DTW) is the most popular approach for evaluating the similarity of time series, but its computation is costly. Therefore, simple functions lower bounding DTW distances have been designed, accelerating searches by quickly pruning sequences that could not possibly be best matches. The tighter the bounds, the more they prune and the better the performance. Designing new functions that are even tighter is difficult because their computation is likely to become complex, canceling the benefits of their pruning. It is possible, however, to design simple functions with a higher pruning power by relaxing the no false dismissal assumption, resulting in approximate lower bound functions. This paper describes how very popular approaches accelerating DTW such as \(\text {LB}\_\text {Keogh}{}\) and \(\text {LB}\_\text {PAA}{}\) can be made more efficient via approximations. The accuracy of approximations can be tuned, ranging from no false dismissal to potential losses when aggressively set for great response time savings. At very large scale, indexing time series is mandatory. This paper also describes how approximate lower bound functions can be used with iSAX. Furthermore, it shows that a \(k\)-means-based quantization step for iSAX gives significant performance gains.

Notes

  1. The \(\mathcal{A}\_{}\) prefix stands for approximate.

  2. Note that, in this paper, we rely on the DTW formulation of Sakoe and Chiba [14], under which the Manhattan distance is an upper bound on DTW. Under the formulation of Keogh and Ratanamahatana [8], DTW is instead upper-bounded by the Euclidean distance.

  3. There exists exactly one such node. isax_approximate_search is in fact the method defined in Shieh and Keogh [15] and used in the next section describing \(i\text {SAX}\_\text {Approx}{}\) indexing.

References

  1. Aach J, Church G (2001) Aligning gene expression time series with time warping algorithms. Bioinformatics 17(6):495

  2. Camerra A, Palpanas T, Shieh J, Keogh EJ (2010) iSAX 2.0: indexing and mining one billion time series. In: Proceedings of the IEEE international conference on data mining

  3. Chu S, Keogh E, Hart D, Pazzani M et al (2002) Iterative deepening dynamic time warping for time series. In: Proceedings of the SIAM international conference on data mining

  4. Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. In: Proceedings of the ACM SIGMOD conference on management of data

  5. Gavrila D, Davis L (1995) Towards 3-d model-based tracking and recognition of human movement: a multi-view approach. In: International workshop on automatic face- and gesture-recognition, pp 272–277

  6. Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech Signal Process 23(1):67–72

  7. Kashyap S, Karras P (2011) Scalable kNN search on vertically stored time series. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining, ACM, pp 1334–1342

  8. Keogh E, Ratanamahatana C (2005) Exact indexing of dynamic time warping. Knowl Inform Syst 7(3):358–386

  9. Keogh E, Xi X, Wei L, Ratanamahatana CA (2006) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/

  10. Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Mining Knowl Discov 15(2):107–144

  11. Munich M, Perona P (1999) Continuous dynamic time warping for translation-invariant curve alignment with applications to signature verification. In: Proceedings of the IEEE international conference on computer vision, vol 1, pp 108–115

  12. Nistér D, Stewénius H (2006) Scalable recognition with a vocabulary tree. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2161–2168

  13. Paulevé L, Jégou H, Amsaleg L (2010) Locality sensitive hashing: a comparison of hash function types and querying mechanisms. Pattern Recogn Lett 31(11):1348–1358

  14. Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26:43–49

  15. Shieh J, Keogh E (2008) iSAX: indexing and mining terabyte sized time series. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining

  16. Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the IEEE international conference on computer vision, pp 1470–1477

  17. Tavenard R, Jégou H, Amsaleg L (2011) Balancing clusters to reduce response time variability in large scale image search. In: Proceedings of the IEEE workshop on content-based multimedia indexing

  18. Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh EJ (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining, pp 216–225

  19. Yi B, Jagadish HV, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In: Proceedings of the IEEE international conference on data engineering

Author information

Correspondence to Romain Tavenard.

Additional information

This work was conducted while Romain Tavenard was pursuing his Ph.D. at INRIA, Rennes, with a scholarship from Université de Rennes 1.

Appendices

1.1 Proofs and mathematical definitions for upper bounds

We prove here that UB_Keogh upper-bounds DTW when the latter is restricted to a Sakoe–Chiba band. We also introduce upper bounds related to \(\text {LB}\_\text {PAA}{}\) and \(i\text {SAX}\_\text {MinDist}{}\), for which we omit the proofs as they follow the same principles.

Definition 1

Let UB_Keogh be:

$$\begin{aligned} \text {UB}\_\text {Keogh}{}(Q,C) = \sum _{i=1}^{n} {\left\{ \begin{array}{ll} (c_i - L_i) &{} \text { if } c_i > U_i\\ (U_i - c_i) &{} \text { if } c_i < L_i\\ \max (U_i - c_i, c_i - L_i) &{} \text { otherwise} \end{array}\right. } \end{aligned}$$
(11)

Lemma 1

For any two sequences \(Q\) and \(C\) of length \(n\), the following inequality holds:

$$\begin{aligned} L_1(Q,C) \ge \text {DTW}{}(Q,C) \end{aligned}$$

where the considered DTW is constrained to a Sakoe–Chiba band of width \(r\).

Proof

Let \(Q\) and \(C\) be two sequences of length \(n\). The Manhattan distance \(L_1\) is the cost of the alignment that follows the diagonal path, which lies inside the Sakoe–Chiba band for any width \(r\). It is therefore the cost of one of the paths considered by the DTW algorithm, and is greater than or equal to the cost of the minimal path, that is, the value returned by DTW. This concludes the proof of Lemma 1. \(\square \)

Proposition 1

For any two sequences \(Q\) and \(C\) of length \(n\), the following inequality holds:

$$\begin{aligned} \text {UB}\_\text {Keogh}{}(Q,C) \ge \text {DTW}{}(Q,C) \end{aligned}$$

where the considered DTW is constrained to a Sakoe–Chiba band of width \(r\).

Proof

Each term of the sum in the definition of UB_Keogh corresponds to exactly one term of the \(L_1\) distance. For UB_Keogh, the \(i\)th term is the distance between the \(i\)th point of the candidate sequence and the furthest point it can be matched to in the envelope of the query, whereas for \(L_1\), the \(i\)th term is the distance between that same point and one particular point of the envelope, namely \(q_i\). The latter distance is, by definition, no larger than the former, and the following inequality then follows from Lemma 1:

$$\begin{aligned} \text {UB}\_\text {Keogh}{}(Q,C) \ge L_1(Q,C) \ge \text {DTW}{}(Q,C). \end{aligned}$$

\(\square \)
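The chain of inequalities in Proposition 1 can be checked numerically on a toy example. The sketch below is only illustrative and rests on several assumptions that are not spelled out in this appendix: the local cost is \(|q_i - c_j|\), the DTW recursion uses the three usual predecessors, the band keeps cells with \(|i-j| \le r\), and \(U_i\) and \(L_i\) are the maximum and minimum of \(Q\) over \([i-r, i+r]\). All function names are ours.

```python
def envelope(q, r):
    """Upper and lower envelopes of q over a window of half-width r (assumed definition)."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def ub_keogh(q, c, r):
    """Eq. (11). Its three cases all reduce to max(U_i - c_i, c_i - L_i)."""
    upper, lower = envelope(q, r)
    return sum(max(u - ci, ci - lo) for u, lo, ci in zip(upper, lower, c))


def l1(q, c):
    """Manhattan distance, i.e. the cost of the diagonal alignment."""
    return sum(abs(a - b) for a, b in zip(q, c))


def dtw_band(q, c, r):
    """DTW with |q_i - c_j| local cost, restricted to cells with |i - j| <= r."""
    n = len(q)
    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = abs(q[i - 1] - c[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][n]


q = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0]
c = [1.0, 0.0, 1.0, 2.0, 1.0, 0.0]
r = 1
dtw_value, l1_value, ub_value = dtw_band(q, c, r), l1(q, c), ub_keogh(q, c, r)
```

On this example, `dtw_value <= l1_value <= ub_value` holds, with `l1_value` equal to 6 and `ub_value` equal to 8. With `r = 0` the band reduces to the diagonal and the three quantities coincide.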

Definition 2

Let us define UB_PAA as:

$$\begin{aligned} \text {UB}\_\text {PAA}{}(Q,C)&= \frac{n}{N} \cdot {} \left( \sum _{i=1}^{N}\max (\hat{U}_i - \bar{c_i}, \bar{c_i} - \hat{L}_i)\right) \nonumber \\&+ \frac{n}{N} \cdot {} \left( \max (C)-\min (C)\right) . \end{aligned}$$
(12)

Lemma 2

For any two sequences \(Q\) and \(C\) of length \(n\), the following inequality holds:

$$\begin{aligned} \text {UB}\_\text {PAA}{}(Q,C) \ge \text {UB}\_\text {Keogh}{}(Q,C). \end{aligned}$$

Proposition 2

For any two sequences \(Q\) and \(C\) of length \(n\), the following inequality holds:

$$\begin{aligned} \text {UB}\_\text {PAA}{}(Q,C) \ge \text {DTW}{}(Q,C) \end{aligned}$$

where the considered DTW is constrained to a Sakoe–Chiba band of width \(r\).

Proof

By Proposition 1, proving Lemma 2 is sufficient to establish Proposition 2.

Let \(Q\) and \(C\) be two sequences of length \(n\) and \(W\) be the minimum cost path used for DTW computation with a Sakoe–Chiba band of width \(r\) between \(Q\) and \(C\). To prove:

$$\begin{aligned} \sum _{i=1}^{n}\max (U_i - c_i, c_i - L_i) \le \frac{n}{N} \cdot {} \left( \sum _{i=1}^{N}\max (\hat{U}_i - \bar{c_i}, \bar{c_i} - \hat{L}_i) + \max (C)-\min (C)\right) \end{aligned}$$

it is sufficient to prove that, for all \(i \in \{ 1,\ldots ,N \}\):

$$\begin{aligned} \sum _{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i}\max (U_j - c_j, c_j - L_j) \le \frac{n}{N} \cdot {} \left( \max (\hat{U}_i - \bar{c_i}, \bar{c_i} - \hat{L}_i) + \max (C)-\min (C)\right) \end{aligned}$$

Let \(i \in \{ 1,\ldots ,N \}\). Writing \(c_j = \bar{c_i} + \Delta c_j\) for all \(j \in \{ \frac{n}{N}(i-1)+1,\ldots ,\frac{n}{N}i \}\), we get:

$$\begin{aligned}&\sum _{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i}\max (U_j - c_j, c_j - L_j)\\&\quad \qquad = \sum _{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i}\max (U_j - (\bar{c_i} + \Delta c_j), \bar{c_i} + \Delta c_j - L_j) \\&\quad \qquad \le \sum _{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i}\max (\hat{U_i} - (\bar{c_i} + \Delta c_j), \bar{c_i} + \Delta c_j - \hat{L_i}) \\&\quad \qquad \le \sum _{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i}\max (\hat{U_i} - \bar{c_i}, \bar{c_i} - \hat{L_i}) + |\Delta c_j| \\&\quad \qquad \le \frac{n}{N}\left( \max (\hat{U_i} - \bar{c_i}, \bar{c_i} - \hat{L_i}) + \max (C) - \min (C)\right) \end{aligned}$$

which concludes the proof. \(\square \)
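The three inequalities of the per-segment chain can be unpacked as follows. This is a sketch that assumes, as is usual for PAA envelopes, that \(\hat{U}_i\) and \(\hat{L}_i\) are the maximum of \(U_j\) and the minimum of \(L_j\) over segment \(i\); these definitions are not restated in this appendix. The first inequality then follows from \(U_j \le \hat{U}_i\) and \(L_j \ge \hat{L}_i\). The second uses the elementary bound, valid for any reals \(a\), \(b\) and \(\Delta \):

$$\begin{aligned} \max (a - \Delta , b + \Delta ) \le \max (a, b) + |\Delta |, \end{aligned}$$

applied with \(a = \hat{U}_i - \bar{c_i}\), \(b = \bar{c_i} - \hat{L}_i\) and \(\Delta = \Delta c_j\). The third holds because both \(c_j\) and \(\bar{c_i}\) lie in \([\min (C), \max (C)]\), so that \(|\Delta c_j| \le \max (C) - \min (C)\) for each of the \(\frac{n}{N}\) points of the segment.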

Definition 3

Let us define iSAX_MaxDist as:

$$\begin{aligned} { i}\text {SAX}\_\text {MaxDist}{}(Q,R) = \sqrt{\frac{n}{N} \sum _{i=1}^{N} X_i} \end{aligned}$$
(13)

where

$$\begin{aligned} \forall i \le N, X_i = {\left\{ \begin{array}{ll} (\bar{q_i} - B_i)^2 &{} \text { if } \bar{q_i} > H_i\\ (H_i - \bar{q_i})^2 &{} \text { if } \bar{q_i} < B_i\\ \max (H_i - \bar{q_i}, \bar{q_i} - B_i)^2 &{} \text { otherwise} \end{array}\right. }. \end{aligned}$$
(14)
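As an illustration, Eqs. (13) and (14) can be transcribed directly into Python. The sketch below assumes that \(B_i\) and \(H_i\) are the lower and upper boundaries of the region covered by the \(i\)th symbol of the iSAX word \(R\), and that both are finite (the outermost SAX regions are unbounded and would need clipping). The function and argument names are ours.

```python
import math


def isax_max_dist(q_paa, lows, highs, n):
    """Eqs. (13)-(14).

    q_paa: the N PAA coefficients of the query (the values written q-bar_i).
    lows, highs: assumed region boundaries B_i and H_i of the iSAX word R.
    n: length of the original sequences.
    """
    n_segments = len(q_paa)
    total = 0.0
    for q, b, h in zip(q_paa, lows, highs):
        if q > h:        # query coefficient above the region
            x = (q - b) ** 2
        elif q < b:      # query coefficient below the region
            x = (h - q) ** 2
        else:            # query coefficient inside the region
            x = max(h - q, q - b) ** 2
        total += x
    return math.sqrt(n / n_segments * total)


# Toy values: n = 8 points summarized by N = 2 segments.
example = isax_max_dist([0.0, 3.0], lows=[-1.0, 0.0], highs=[1.0, 1.0], n=8)
```

Here the two terms \(X_1 = 1\) and \(X_2 = 9\) give \(\sqrt{4 \cdot 10} = \sqrt{40}\).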

1.2 Balancing \(k\)-means

When \(k=2\), balancing \(k\)-means does not require an iterative process such as the one proposed in Tavenard et al. [17]. One can directly derive the elevation \(h\) to be given to the centroid of the more populated cluster so that both clusters end up with equal populations.

Let us assume, without loss of generality, that \(k\)-means produced two centroids \(\mathbf {C_1}\) and \(\mathbf {C_2}\) and that cluster \(C_1\) is more populated than \(C_2\). Using the notations introduced in Fig. 9, the intersection between the line \((\mathbf {C_1}, \mathbf {C_2})\) and the boundary between clusters \(C_1\) and \(C_2\) is \(\mathbf {C_0}\), the midpoint of the line segment \([\mathbf {C_1}, \mathbf {C_2}]\). We aim at finding the elevation \(h\) such that this point moves to \(\mathbf {C'_0}\), the median of the projected data points. If one builds a new boundary that is parallel to the original one and passes through \(\mathbf {C'_0}\), both clusters will be equally populated. Solving the related system of equations gives:

$$\begin{aligned} h = \sqrt{2(x_2-x_1) \left( \frac{x_1+x_2}{2}-x'_0\right) }, \end{aligned}$$
(15)

where \(x_1\) and \(x_2\) are known from the \(k\)-means and \(x'_0\) is the median of projected data points.
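Eq. (15) can be tried out on a small one-dimensional example. The sketch below relies on our reading of the geometry, which the text does not state explicitly: \(x_1 < x_2\) and \(x'_0\) are coordinates along the line \((\mathbf {C_1}, \mathbf {C_2})\), and the elevation lifts \(\mathbf {C_1}\) by \(h\) in a direction orthogonal to the data, so that the squared distance from a point to \(\mathbf {C_1}\) becomes \((x - x_1)^2 + h^2\). The function names and toy data are ours.

```python
import math
import statistics


def elevation(x1, x2, x0_median):
    """Eq. (15); assumes x1 < x2 and x0_median <= (x1 + x2) / 2."""
    return math.sqrt(2.0 * (x2 - x1) * ((x1 + x2) / 2.0 - x0_median))


def cluster_sizes(xs, x1, x2, h=0.0):
    """Count points closer to C1 (lifted by h) and points closer to C2."""
    in_c1 = sum((x - x1) ** 2 + h ** 2 < (x - x2) ** 2 for x in xs)
    return in_c1, len(xs) - in_c1


# Toy projections: 30 points around 0 and 10 points around 4.
left = [-1.5 + 0.1 * i for i in range(30)]
right = [3.5 + 0.1 * i for i in range(10)]
xs = left + right
x1, x2 = statistics.mean(left), statistics.mean(right)

h = elevation(x1, x2, statistics.median(xs))
sizes_before = cluster_sizes(xs, x1, x2)      # plain nearest-centroid assignment
sizes_after = cluster_sizes(xs, x1, x2, h)    # assignment with C1 elevated by h
```

On this toy data the split goes from 30/10 to 20/20. Under the assumed geometry, lifting \(\mathbf {C_1}\) by \(h\) moves the boundary from the midpoint to \(\frac{x_1+x_2}{2} - \frac{h^2}{2(x_2-x_1)}\), which equals \(x'_0\) when \(h\) is taken from Eq. (15).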

Fig. 9 Balancing \(k\)-means for the \(k=2\) case

Cite this article

Tavenard, R., Amsaleg, L. Improving the efficiency of traditional DTW accelerators. Knowl Inf Syst 42, 215–243 (2015). https://doi.org/10.1007/s10115-013-0698-7
