
Sufficient conditions for the existence of a sample mean of time series under dynamic time warping

Annals of Mathematics and Artificial Intelligence

Abstract

Time series averaging is an important subroutine for several time series data mining tasks. The most successful approaches formulate time series averaging as an optimization problem based on the dynamic time warping (DTW) distance. The existence of an optimal solution, called a sample mean, has been an open problem for more than four decades. Its existence is a necessary prerequisite for formulating exact algorithms, deriving complexity results, and studying statistical consistency. In this article, we propose sufficient conditions for the existence of a sample mean. A key result for deriving the proposed sufficient conditions is the Reduction Theorem, which provides an upper bound on the minimum length of a sample mean.



Acknowledgements

B. Jain was funded by the DFG Sachbeihilfe JA 2109/4-2.

Author information

Corresponding author

Correspondence to Brijnesh Jain.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A Proof of Example 9

Proof

From the Reduction Theorem it follows that it is sufficient to consider candidate solutions of length one and two, and hence the restricted Fréchet functions \(F_{1}\) and \(F_{2}\). In addition, it is sufficient to assume that all warping paths are compact (see Example 25). Then each set \(\mathcal{P}_{m,n}\) with m,n ∈ {1,2} consists of exactly one warping path. Suppose that x = (x1), y = (y1,y2), and z = (z1,z2). Then the squared DTW distances are of the form
$$ \delta(x,z)^{2} = d(x_{1},z_{1}) + d(x_{1},z_{2}), \qquad \delta(y,z)^{2} = d(y_{1},z_{1}) + d(y_{2},z_{2}). $$
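To make the reduction concrete, the following minimal Python sketch (not part of the article; the numeric values are purely illustrative) evaluates these two closed forms under a generic local distance d:

```python
def sq_dtw_1_vs_2(x, z, d):
    """Squared DTW distance between x = (x1) and z = (z1, z2) when the only
    admissible warping path is the compact one, aligning x1 with both z1 and z2."""
    return d(x[0], z[0]) + d(x[0], z[1])

def sq_dtw_2_vs_2(y, z, d):
    """Squared DTW distance between two length-two series when the only
    admissible warping path is the compact (pointwise) one."""
    return d(y[0], z[0]) + d(y[1], z[1])

# Illustration with the squared-difference local distance d'(a, b) = (a - b)^2.
d_prime = lambda a, b: (a - b) ** 2
print(sq_dtw_1_vs_2((0.5,), (1.0, 1.0), d_prime))       # (0.5-1)^2 + (0.5-1)^2 = 0.5
print(sq_dtw_2_vs_2((1.0, 0.0), (1.0, -1.0), d_prime))  # 0 + 1 = 1.0
```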

We proceed by considering a slightly modified setting using the local distance function \(d^{\prime }(a, a^{\prime }) = (a-a^{\prime })^{2}\) for all \(a, a^{\prime } \in \mathcal {A}\). Under this setting, we denote the DTW distance by \(\delta ^{\prime }\) and the restricted Fréchet functions by \(F^{\prime }_{1}\) and \(F^{\prime }_{2}\). The function \(F^{\prime }_{1}(x)\) at a time series x = (x1) is of the form

$$ \begin{array}{@{}rcl@{}} F^{\prime}_{1}(x) = \underbrace{(x_{1} - 1)^{2} + (x_{1} - 1)^{2}}_{= \delta^{\prime}(x, x^{(1)})} + \underbrace{(x_{1} - 1)^{2} + (x_{1} + 1)^{2}}_{= \delta^{\prime}(x, x^{(2)})}. \end{array} $$
The function \(F^{\prime }_{1}\) is convex and differentiable with respect to x1. Taking the derivative, setting it to zero, and solving yields the unique solution x1 = 0.5. Thus, z = (0.5) is the restricted sample mean of \(\mathcal {X}\) on \(\mathcal {T}_{1}\) with Fréchet variation \(F^{\prime }_{1}(z) = 3\).
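This minimizer can be checked numerically; the following sketch (not from the article) assumes the sample series (1,1) and (1,-1) that can be read off from the expansion of \(F^{\prime }_{1}\) above:

```python
# Restricted Fréchet function for length-one candidates, expanded as above:
# F'_1(x1) = 3*(x1 - 1)^2 + (x1 + 1)^2.
F1_prime = lambda x1: 3 * (x1 - 1) ** 2 + (x1 + 1) ** 2

# Minimize on a fine grid; the minimum is attained at x1 = 0.5 with value 3.
grid = [i / 1000 for i in range(-2000, 2001)]
x1_star = min(grid, key=F1_prime)
print(x1_star, F1_prime(x1_star))  # 0.5, 3.0
```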

For a given x = (x1,x2), a similar calculation with

$$ \begin{array}{@{}rcl@{}} F^{\prime}_{2}(x) = (x_{1} - 1)^{2} + (x_{2} - 1)^{2} + (x_{1} - 1)^{2} + (x_{2} + 1)^{2} \end{array} $$
gives z = (1,0) as the unique restricted sample mean on \(\mathcal {T}_{2}\) with Fréchet variation \(F^{\prime }_{2}(z) = 2\). By combining both results, we conclude that z = (1,0) is an unrestricted sample mean of \(\mathcal {X}\) with total variation \(F^{\prime *} = 2\).
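The same kind of check works for length-two candidates; in this sketch (again not from the article) the two coordinates of \(F^{\prime }_{2}\) decouple and are minimized separately:

```python
# F'_2(x1, x2) = 2*(x1 - 1)^2 + (x2 - 1)^2 + (x2 + 1)^2 splits into two
# independent one-dimensional problems.
F2_in_x1 = lambda x1: 2 * (x1 - 1) ** 2
F2_in_x2 = lambda x2: (x2 - 1) ** 2 + (x2 + 1) ** 2

grid = [i / 1000 for i in range(-2000, 2001)]
x1_star = min(grid, key=F2_in_x1)  # 1.0
x2_star = min(grid, key=F2_in_x2)  # 0.0
print((x1_star, x2_star), F2_in_x1(x1_star) + F2_in_x2(x2_star))  # (1.0, 0.0), 2.0
```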

Next, we assume the original local distance function d on \(\mathcal {A}\) as defined in Example 9. Again, we first consider time series x = (x1) of length one. Then we have \(F_{1}(x) = F^{\prime }_{1}(x)\) if x1 ≠ 0 and \(F_{1}(x) = 4\) if x1 = 0. Thus, z = (0.5) is the restricted sample mean of \(\mathcal {X}\) on \(\mathcal {T}_{1}\) with Fréchet variation \(F_{1}(z) = 3\).

Now, we consider time series of length two. Let xε = (1,ε) for some \(\varepsilon \in \mathbb {R}\). Then

$$ \lim_{\varepsilon \to 0} F(x_{\varepsilon}) = 2 \qquad (5) $$
but F(x0) = 4. Suppose there is a restricted sample mean z = (z1,z2) on \(\mathcal {T}_{2}\). From (5) it follows that F(z) ≤ 2. If at least one element of z were zero, we would have F(z) ≥ 4, contradicting F(z) ≤ 2. Thus, both elements of z are non-zero, and therefore \(F_{2}(z) = F^{\prime }_{2}(z)\). Recall that the unique minimizer of \(F^{\prime }_{2}\) has a zero element, so z is not this minimizer. This yields the contradiction \(2 < F^{\prime }_{2}(z) = F_{2}(z) \leq 2\). Consequently, the function \(F_{2}\) has no minimizer. Thus, the unrestricted Fréchet function F never attains its infimum 2, and therefore \(\mathcal {X}\) has no sample mean. □
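As a closing remark (not part of the proof), the limit in (5) can be illustrated numerically; the sketch assumes, as stated above, that d coincides with the squared difference whenever all entries are non-zero:

```python
# For eps != 0 all entries of x_eps = (1, eps) are non-zero, so
# F(x_eps) = F'_2(x_eps) = (eps - 1)^2 + (eps + 1)^2 = 2*eps^2 + 2,
# which tends to the infimum 2, while the proof shows F(x_0) = 4.
for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, (eps - 1) ** 2 + (eps + 1) ** 2)
```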


About this article


Cite this article

Jain, B., Schultz, D. Sufficient conditions for the existence of a sample mean of time series under dynamic time warping. Ann Math Artif Intell 88, 313–346 (2020). https://doi.org/10.1007/s10472-019-09682-2
