Abstract
Dynamic time warping (DTW) is the most popular approach for evaluating the similarity of time series, but its computation is costly. Therefore, simple functions lower bounding DTW distances have been designed, accelerating searches by quickly pruning sequences that could not possibly be best matches. The tighter the bounds, the more they prune and the better the performance. Designing new functions that are even tighter is difficult because their computation is likely to become complex, canceling the benefits of their pruning. It is possible, however, to design simple functions with a higher pruning power by relaxing the no false dismissal assumption, resulting in approximate lower bound functions. This paper describes how very popular approaches accelerating DTW such as \(\text {LB}\_\text {Keogh}{}\) and \(\text {LB}\_\text {PAA}{}\) can be made more efficient via approximations. The accuracy of approximations can be tuned, ranging from no false dismissal to potential losses when aggressively set for great response time savings. At very large scale, indexing time series is mandatory. This paper also describes how approximate lower bound functions can be used with iSAX. Furthermore, it shows that a \(k\)-means-based quantization step for iSAX gives significant performance gains.
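The pruning idea can be made concrete with a minimal Python sketch. It shows the classic exact scheme, not the approximate variants this paper introduces: a 1-nearest-neighbour search that computes full DTW only for candidates whose LB_Keogh bound is below the best distance found so far. The sketch assumes an absolute-difference local cost (the \(L_1\) setting used in the appendix proofs), a Sakoe–Chiba band of width r, and equal-length sequences. The function names envelope, lb_keogh, dtw and nn_search are illustrative choices, not an existing API.

```python
import math
import random


def envelope(q, r):
    """Upper and lower envelopes of the query q for a band of width r."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(c, upper, lower):
    """L1 variant of LB_Keogh: how far the candidate c leaves the envelope."""
    total = 0.0
    for ci, ui, li in zip(c, upper, lower):
        if ci > ui:
            total += ci - ui
        elif ci < li:
            total += li - ci
    return total


def dtw(q, c, r):
    """DTW with absolute-difference cost, restricted to |i - j| <= r."""
    n = len(q)
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = abs(q[i - 1] - c[j - 1]) + step
    return cost[n][n]


def nn_search(q, database, r):
    """Exact 1-NN under banded DTW; LB_Keogh skips hopeless candidates."""
    upper, lower = envelope(q, r)
    best_idx, best_dist, n_dtw = -1, math.inf, 0
    for idx, c in enumerate(database):
        # O(n) bound: if it is already >= the best DTW so far,
        # c cannot be a strictly better match, so skip the O(n*r) DTW.
        if lb_keogh(c, upper, lower) >= best_dist:
            continue
        n_dtw += 1
        d = dtw(q, c, r)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist, n_dtw


random.seed(0)
db = [[random.gauss(0, 1) for _ in range(32)] for _ in range(200)]
query = [x + random.gauss(0, 0.1) for x in db[42]]
idx, dist, n_dtw = nn_search(query, db, r=3)
print(idx, round(dist, 3), f"{n_dtw} full DTW computations out of {len(db)}")
```

Because LB_Keogh never exceeds the banded DTW distance, skipping a candidate whose bound is already at least the best distance cannot cause a false dismissal. The approximate bounds discussed in this paper give up exactly this guarantee in exchange for more pruning.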
Notes
The \(\mathcal{A}\_{}\) prefix stands for approximate.
There exists exactly one such node. isax_approximate_search is in fact the method defined in Shieh and Keogh [15] and used in the next section describing \(i\text {SAX}\_\text {Approx}{}\) indexing.
References
Aach J, Church G (2001) Aligning gene expression time series with time warping algorithms. Bioinformatics 17(6):495
Camerra A, Palpanas T, Shieh J, Keogh EJ (2010) iSAX 2.0: indexing and mining one billion time series. In: Proceedings of the IEEE international conference on data mining
Chu S, Keogh E, Hart D, Pazzani M et al (2002) Iterative deepening dynamic time warping for time series. In: Proceedings of the SIAM international conference on data mining
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. In: Proceedings of the ACM SIGMOD conference on management of data
Gavrila D, Davis L (1995) Towards 3-D model-based tracking and recognition of human movement: a multi-view approach. In: International workshop on automatic face- and gesture-recognition, pp 272–277
Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech Signal Process 23(1):67–72
Kashyap S, Karras P (2011) Scalable kNN search on vertically stored time series. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining, ACM, pp 1334–1342
Keogh E, Ratanamahatana C (2005) Exact indexing of dynamic time warping. Knowl Inform Syst 7(3):358–386
Keogh E, Xi X, Wei L, Ratanamahatana CA (2006) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Mining Knowl Discov 15(2):107–144
Munich M, Perona P (1999) Continuous dynamic time warping for translation-invariant curve alignment with applications to signature verification. In: Proceedings of the IEEE international conference on computer vision, vol 1, pp 108–115
Nistér D, Stewénius H (2006) Scalable recognition with a vocabulary tree. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2161–2168
Paulevé L, Jégou H, Amsaleg L (2010) Locality sensitive hashing: a comparison of hash function types and querying mechanisms. Pattern Recogn Lett 31(11):1348–1358
Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26:43–49
Shieh J, Keogh E (2008) iSAX: indexing and mining terabyte sized time series. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining
Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the IEEE international conference on computer vision, pp 1470–1477
Tavenard R, Jégou H, Amsaleg L (2011) Balancing clusters to reduce response time variability in large scale image search. In: Proceedings of the IEEE workshop on content-based multimedia indexing
Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh EJ (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining, pp 216–225
Yi B, Jagadish HV, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In: Proceedings of the IEEE international conference on data engineering
Additional information
This work was conducted while Romain Tavenard was pursuing his Ph.D. at INRIA, Rennes, with a scholarship from Université de Rennes 1.
Appendices
1.1 Proofs and mathematical definitions for upper bounds
We prove here that UB_Keogh upper-bounds DTW when the latter is restricted to a Sakoe–Chiba band. We also introduce upper bounds related to \(\text {LB}\_\text {PAA}{}\) and \(i\text {SAX}\_\text {MinDist}{}\), whose proofs we omit as they follow exactly the same principles.
Definition 1
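Let UB_Keogh be defined as:

\[ \text{UB\_Keogh}(Q, C) = \sum_{i=1}^{n} \max\left( |c_i - U_i|,\ |c_i - L_i| \right), \]

where \(U_i = \max_{|j-i| \le r,\ 1 \le j \le n} q_j\) and \(L_i = \min_{|j-i| \le r,\ 1 \le j \le n} q_j\) are the upper and lower envelopes of the query \(Q\) for a Sakoe–Chiba band of width \(r\).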
Lemma 1
For any two sequences \(Q\) and \(C\) of length \(n\), the following inequality holds:

\[ \text{DTW}(Q, C) \le L_1(Q, C) = \sum_{i=1}^{n} |q_i - c_i|, \]

where the considered DTW is constrained to a Sakoe–Chiba band of width \(r\).
Proof
Let \(Q\) and \(C\) be two sequences of length \(n\). The Manhattan distance \(L_1\) corresponds to the alignment that follows the diagonal path, which lies within any Sakoe–Chiba band. Hence, this distance is the cost of one of the paths considered by the DTW algorithm and is therefore greater than or equal to the cost of the minimal path, that is, the value returned by DTW. This concludes the proof of Lemma 1. \(\square \)
Proposition 1
For any two sequences \(Q\) and \(C\) of length \(n\), the following inequality holds:

\[ \text{DTW}(Q, C) \le \text{UB\_Keogh}(Q, C), \]

where the considered DTW is constrained to a Sakoe–Chiba band of width \(r\).
Proof
Each term of the sum defining UB_Keogh corresponds to exactly one term of the \(L_1\) distance. For UB_Keogh, the \(i\)th term is the distance between the \(i\)th point of the candidate sequence and its furthest corresponding point in the envelope of the query, whereas for \(L_1\), it is the distance between the \(i\)th point of the candidate sequence and \(q_i\), one of its possible corresponding points within the envelope of the query. The latter distance is therefore, by definition, no larger than the former, and Lemma 1 directly yields:

\[ \text{DTW}(Q, C) \le L_1(Q, C) \le \text{UB\_Keogh}(Q, C). \]

\(\square \)
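The chain of inequalities in Lemma 1 and Proposition 1 can be checked numerically. The Python sketch below assumes an absolute-difference local cost for DTW and computes UB_Keogh as described in the proof of Proposition 1, summing, for each \(i\), the distance from \(c_i\) to its furthest point in the query envelope. The helper names dtw_band, l1 and ub_keogh are hypothetical, and random trials are only a sanity check, not a proof.

```python
import math
import random


def dtw_band(q, c, r):
    """DTW with absolute-difference cost, restricted to |i - j| <= r."""
    n = len(q)
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = abs(q[i - 1] - c[j - 1]) + step
    return cost[n][n]


def l1(q, c):
    """Manhattan distance: the cost of the diagonal alignment path."""
    return sum(abs(a - b) for a, b in zip(q, c))


def ub_keogh(q, c, r):
    """Sum of distances from each c_i to its furthest query-envelope bound."""
    n = len(q)
    total = 0.0
    for i, ci in enumerate(c):
        window = q[max(0, i - r):min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        total += max(abs(ci - upper), abs(ci - lower))
    return total


# Randomized check of DTW <= L1 <= UB_Keogh (Lemma 1 and Proposition 1).
random.seed(1)
for _ in range(200):
    n = random.randint(1, 30)
    r = random.randint(0, n)
    q = [random.uniform(-1, 1) for _ in range(n)]
    c = [random.uniform(-1, 1) for _ in range(n)]
    d, m, u = dtw_band(q, c, r), l1(q, c), ub_keogh(q, c, r)
    assert d <= m + 1e-9 and m <= u + 1e-9, (n, r, d, m, u)
print("DTW <= L1 <= UB_Keogh held on all random trials")
```

With r = 0 the band contains only the diagonal path, so DTW and \(L_1\) coincide; widening the band can only lower DTW while raising UB_Keogh.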
Definition 2
Let us define UB_PAA as:
Lemma 2
For any two sequences \(Q\) and \(C\) of length \(n\), the following inequality holds:
Proposition 2
For any two sequences \(Q\) and \(C\) of length \(n\), the following inequality holds:
where the considered DTW is constrained to a Sakoe–Chiba band of width \(r\).
Proof
Given Proposition 1, proving Lemma 2 is sufficient to prove that Proposition 2 holds.
Let \(Q\) and \(C\) be two sequences of length \(n\) and \(W\) be the minimum cost path used for the DTW computation between \(Q\) and \(C\) with a Sakoe–Chiba band of width \(r\). To prove:
it is sufficient to prove that, for all \(i \in \{ 1,\ldots ,N \}\):
Let \(i \in \{ 1,\ldots ,N \}\). Denoting, for all \(j \in \{ \frac{n}{N}(i-1)+1,\ldots ,\frac{n}{N}i \}\), \(c_j = \bar{c}_i + \Delta c_j\), where \(\bar{c}_i\) is the \(i\)th PAA coefficient of \(C\) (the mean of \(C\) over the \(i\)th segment), we get:
which concludes the proof. \(\square \)
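The decomposition \(c_j = \bar{c}_i + \Delta c_j\) used in this proof can be illustrated with a short Python sketch. It assumes that \(\bar{c}_i\) is the standard PAA coefficient (the mean of the \(i\)th segment) and that \(N\) divides \(n\). The helpers paa and paa_residuals are illustrative names. Indices are 0-based, so the 1-based segment \(\{ \frac{n}{N}(i-1)+1,\ldots ,\frac{n}{N}i \}\) becomes the positions j with j // (n/N) == i - 1.

```python
def paa(c, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    n = len(c)
    if n % n_segments != 0:
        raise ValueError("this sketch assumes N divides n")
    width = n // n_segments
    return [sum(c[k * width:(k + 1) * width]) / width for k in range(n_segments)]


def paa_residuals(c, n_segments):
    """Delta c_j such that c_j = bar(c)_i + Delta c_j for j in segment i."""
    width = len(c) // n_segments
    means = paa(c, n_segments)
    # 0-based position j belongs to segment j // width (1-based i = j // width + 1)
    return [c[j] - means[j // width] for j in range(len(c))]


c = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 0.0, 4.0]
print(paa(c, 2))            # [2.0, 4.0]
print(paa_residuals(c, 2))  # [-1.0, 1.0, 0.0, 0.0, 1.0, 3.0, -4.0, 0.0]
```

Within each segment, the residuals \(\Delta c_j\) sum to zero, since \(\bar{c}_i\) is the segment mean.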
Definition 3
Let us define iSAX_MaxDist as:
where
1.2 Balancing \(k\)-means
When \(k=2\), balancing \(k\)-means does not require the iterative process proposed in Tavenard et al. [17]: the elevation \(h\) that the centroid of the more populated cluster must receive for both clusters to end up equally populated can be derived directly.
Let us assume, without loss of generality, that \(k\)-means produced two centroids \(\mathbf {C_1}\) and \(\mathbf {C_2}\) and that cluster \(C_1\) is more populated than \(C_2\). Using the notations introduced in Fig. 9, the intersection between the line \((\mathbf {C_1}, \mathbf {C_2})\) and the boundary between clusters \(C_1\) and \(C_2\) is \(\mathbf {C_0}\), the midpoint of the segment \([\mathbf {C_1}, \mathbf {C_2}]\). We aim to compute the elevation \(h\) that moves this point to \(\mathbf {C'_0}\), the median of the data points projected onto the line \((\mathbf {C_1}, \mathbf {C_2})\). Indeed, a new boundary parallel to the original one and passing through \(\mathbf {C'_0}\) splits the data into two equally populated clusters. Solving the related system of equations gives:
where \(x_1\) and \(x_2\) are known from the \(k\)-means output and \(x'_0\) is the median of the projected data points.
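The two-cluster construction can be sketched in Python. This rests on an assumption of ours that the text above does not spell out: elevating a centroid by \(h\) in an extra dimension orthogonal to the data adds \(h^2\) to squared Euclidean distances to that centroid. Under that assumption, requiring the new boundary to cross the line \((\mathbf {C_1}, \mathbf {C_2})\) at \(\mathbf {C'_0}\) gives \(h^2 = (x_1 - x_2)(2x'_0 - x_1 - x_2)\), which may differ in form from the equation of the original article. The helper balance_two_means is hypothetical, and group means stand in for actual \(k\)-means centroids.

```python
import math
import random
import statistics


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def balance_two_means(points, centroids):
    """Closed-form balancing of a 2-means partition (hypothetical helper).

    Returns (h, big, labels): h is the elevation given to the centroid of the
    more populated cluster `big`; labels[k] in {0, 1} is the balanced label.
    ASSUMPTION: elevating a centroid by h adds h**2 to squared distances to it.
    """
    c = [tuple(centroids[0]), tuple(centroids[1])]
    # Which cluster is more populated under plain nearest-centroid assignment?
    n0 = sum(sqdist(p, c[0]) <= sqdist(p, c[1]) for p in points)
    big = 0 if 2 * n0 >= len(points) else 1
    small = 1 - big
    # Unit vector of the line (C1, C2), oriented from C1 (big) to C2 (small).
    diff = [b - a for a, b in zip(c[big], c[small])]
    norm = math.sqrt(sum(d * d for d in diff))
    u = [d / norm for d in diff]

    def proj(p):
        return sum(pi * ui for pi, ui in zip(p, u))

    x1, x2 = proj(c[big]), proj(c[small])
    x0_prime = statistics.median([proj(p) for p in points])  # C'_0
    # Boundary through C'_0: (x'0 - x1)^2 + h^2 = (x'0 - x2)^2
    h2 = (x1 - x2) * (2 * x0_prime - x1 - x2)
    h = math.sqrt(max(h2, 0.0))  # h = 0 if the clusters are already balanced
    labels = [big if sqdist(p, c[big]) + h * h < sqdist(p, c[small]) else small
              for p in points]
    return h, big, labels


random.seed(3)
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
pts += [(random.gauss(4, 1), random.gauss(0, 1)) for _ in range(10)]
# Stand-ins for k-means centroids: means of the two generating groups.
cents = [tuple(sum(p[k] for p in grp) / len(grp) for k in range(2))
         for grp in (pts[:30], pts[30:])]
h, big, labels = balance_two_means(pts, cents)
print(round(h, 3), labels.count(big), labels.count(1 - big))
```

Projecting onto the line \((\mathbf {C_1}, \mathbf {C_2})\) reduces the problem to one dimension, which is why a single median is enough to fix the new boundary; with an even number of points and distinct projections, the elevated assignment splits them exactly in half.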
Cite this article
Tavenard, R., Amsaleg, L. Improving the efficiency of traditional DTW accelerators. Knowl Inf Syst 42, 215–243 (2015). https://doi.org/10.1007/s10115-013-0698-7