Uncertainty-DTW for Time Series and Sequences

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13681)

Abstract

Dynamic Time Warping (DTW) is used for matching pairs of sequences and is celebrated in applications such as forecasting the evolution of time series, clustering time series, or even matching sequence pairs in few-shot action recognition. The transportation plan of DTW contains a set of paths; each path matches frames between two sequences under a varying degree of time warping, to account for varying temporal intra-class dynamics of actions. However, as DTW is the smallest distance among all paths, it may be affected by the feature uncertainty, which varies across time steps/frames. Thus, in this paper, we propose to model the so-called aleatoric uncertainty of a differentiable (soft) version of DTW. To this end, we model the heteroscedastic aleatoric uncertainty of each path by the product of likelihoods from Normal distributions, each capturing the variance of a pair of frames. (The path distance is the sum of base distances between features of pairs of frames of the path.) The Maximum Likelihood Estimation (MLE) applied to a path yields two terms: (i) a sum of Euclidean distances weighted by the inverse of the variance, and (ii) a sum of log-variance regularization terms. Thus, our uncertainty-DTW is the smallest weighted path distance among all paths, and the regularization term (a penalty for high uncertainty) is the aggregate of log-variances along the path. The distance and the regularization term can be used in various objectives. We showcase forecasting the evolution of time series, estimating the Fréchet mean of time series, and supervised/unsupervised few-shot action recognition of the articulated human 3D body joints.
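
The two terms described in the abstract can be illustrated with a small standard-library Python sketch. It is not the authors' implementation (their code is at the repository linked below) and it rests on several assumptions: the names softmin, udtw_sketch, var and gamma are hypothetical; the per-pair variances are passed in as a matrix rather than predicted by a network; the base distance is the squared Euclidean distance; and the log-variance term is aggregated under the soft alignment weights of the soft-DTW recursion, which is one plausible reading of "the aggregate of log-variances along the path" in the soft setting. Up to constants, the negative log-likelihood of a Normal distribution for one frame pair is the squared distance divided by the variance plus a multiple of the log-variance, which is what the two returned values accumulate.

```python
import math


def softmin(values, gamma):
    """Soft minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    lo = min(values)
    if math.isinf(lo):
        return lo
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in values))


def udtw_sketch(x, y, var, gamma=0.1):
    """Return (weighted soft-DTW distance, soft log-variance regularizer).

    x, y : sequences given as lists of feature vectors (lists of floats).
    var  : len(x) by len(y) matrix of positive variances, one per frame pair
           (assumed to be given; hypothetical input format).
    """
    n, m = len(x), len(y)
    inf = float("inf")

    # Base distances weighted by the inverse variance, padded by one cell per side.
    cost = [[0.0] * (m + 2) for _ in range(n + 2)]
    for i in range(n):
        for j in range(m):
            sq = sum((a - b) ** 2 for a, b in zip(x[i], y[j]))
            cost[i + 1][j + 1] = sq / var[i][j]

    # Forward pass: soft-min dynamic programme over all warping paths.
    r = [[inf] * (m + 2) for _ in range(n + 2)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = [r[i - 1][j - 1], r[i - 1][j], r[i][j - 1]]
            r[i][j] = cost[i][j] + softmin(prev, gamma)
    dist = r[n][m]

    # Backward pass: expected (soft) alignment of each frame pair.
    for i in range(n + 2):
        r[i][m + 1] = -inf
    for j in range(m + 2):
        r[n + 1][j] = -inf
    r[n + 1][m + 1] = r[n][m]
    e = [[0.0] * (m + 2) for _ in range(n + 2)]
    e[n + 1][m + 1] = 1.0
    for j in range(m, 0, -1):
        for i in range(n, 0, -1):
            a = math.exp((r[i + 1][j] - r[i][j] - cost[i + 1][j]) / gamma)
            b = math.exp((r[i][j + 1] - r[i][j] - cost[i][j + 1]) / gamma)
            c = math.exp((r[i + 1][j + 1] - r[i][j] - cost[i + 1][j + 1]) / gamma)
            e[i][j] = e[i + 1][j] * a + e[i][j + 1] * b + e[i + 1][j + 1] * c

    # Regularizer: log-variances aggregated with the soft alignment weights.
    reg = sum(e[i + 1][j + 1] * math.log(var[i][j])
              for i in range(n) for j in range(m))
    return dist, reg


if __name__ == "__main__":
    x = [[0.0], [1.0], [2.0]]
    y = [[0.0], [2.0]]
    var = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
    print(udtw_sketch(x, y, var, gamma=0.1))
```

With all variances equal to one, the regularizer is zero and the distance reduces to plain soft-DTW on squared Euclidean distances; raising the variances lowers the weighted distance but raises the log-variance penalty, which is the trade-off the abstract describes. An objective could combine the two returned values with a weighting factor, which would be a further assumption.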

L. Wang and P. Koniusz—Equal contribution. Code: https://github.com/LeiWangR/uDTW.

Notes

  1. We use temporal blocks as they were shown to be more robust than frame-wise FSAR [50] models.

References

  1. Abid, A., Zou, J.: AutoWarp: learning a warping distance from unlabeled time series using sequence autoencoders. In: NIPS 2018. Curran Associates Inc., Red Hook (2018)

  2. Ben-Ari, R., Shpigel Nacson, M., Azulai, O., Barzelay, U., Rotman, D.: TAEN: temporal aware embedding network for few-shot action recognition. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2780–2788 (2021)

  3. Blondel, M., Mensch, A., Vert, J.P.: Differentiable divergences between time series. In: Banerjee, A., Fukumizu, K. (eds.) Proceedings of the 24th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 130, pp. 3853–3861. PMLR (2021)

  4. Cao, K., Ji, J., Cao, Z., Chang, C.Y., Niebles, J.C.: Few-shot video classification via temporal alignment. In: CVPR (2020)

  5. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6m: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1325–1339 (2014)

  6. Cuturi, M.: Fast global alignment kernels. In: International Conference on Machine Learning (ICML) (2011)

  7. Cuturi, M., Blondel, M.: Soft-DTW: a differentiable loss function for time-series. In: International Conference on Machine Learning (ICML) (2017)

  8. Dau, H.A., et al.: The UCR Time Series Classification Archive (2018). https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

  9. Dempster, A., Schmidt, D.F., Webb, G.I.: MINIROCKET: a very fast (almost) deterministic transform for time series classification. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD 2021, pp. 248–257. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3447548.3467231

  10. Donahue, J., Dieleman, S., Binkowski, M., Elsen, E., Simonyan, K.: End-to-end adversarial text-to-speech. In: International Conference on Learning Representations (2021)

  11. Dosovitskiy, A., et al.: An image is worth \(16 \times 16\) words: transformers for image recognition at scale. In: International Conference on Learning Representations (2020)

  12. García-García, D., Parrado Hernández, E., Díaz-de María, F.: A new distance measure for model-based sequence clustering. IEEE Trans. Pattern Anal. Mach. Intell. 31(7), 1325–1331 (2009). https://doi.org/10.1109/TPAMI.2008.268

  13. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer Series in Statistics, Springer, New York (2001)

  14. Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach. Learn. 110(3), 457–506 (2021). https://doi.org/10.1007/s10994-021-05946-3

  15. Indrayan, A.: Medical Biostatistics, 2nd edn. Chapman & Hall/CRC, Boca Raton (2008). https://www.loc.gov/catdir/toc/ecip0723/2007030353.html

  16. Kay, W., et al.: The kinetics human action video dataset (2017)

  17. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)

  18. Kiureghian, A.D., Ditlevsen, O.: Aleatory or epistemic? Does it matter? Struct. Saf. 31(2), 105–112 (2009). https://doi.org/10.1016/j.strusafe.2008.06.020. Risk Acceptance and Risk Communication

  19. Koniusz, P., Mikolajczyk, K.: Soft assignment of visual words as linear coordinate coding and optimisation of its reconstruction error. In: 2011 18th IEEE International Conference on Image Processing, pp. 2413–2416 (2011). https://doi.org/10.1109/ICIP.2011.6116129

  20. Koniusz, P., Wang, L., Cherian, A.: Tensor representations for action recognition. TPAMI 44, 648–665 (2020)

  21. Koniusz, P., Yan, F., Mikolajczyk, K.: Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection. Comput. Vis. Image Underst. 117(5), 479–492 (2013). https://doi.org/10.1016/j.cviu.2012.10.010

  22. Li, S., et al.: TTAN: two-stage temporal alignment network for few-shot action recognition. CoRR (2021)

  23. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45, 503–528 (1989)

  24. Liu, J., Shahroudy, A., Perez, M., Wang, G., Duan, L.Y., Kot, A.C.: NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https://doi.org/10.1109/TPAMI.2019.2916873

  25. Liu, L., Wang, L., Liu, X.: In defense of soft-assignment coding. In: 2011 International Conference on Computer Vision, pp. 2486–2493 (2011). https://doi.org/10.1109/ICCV.2011.6126534

  26. Lohit, S., Wang, Q., Turaga, P.: Temporal transformer networks: joint learning of invariant and discriminative time warping. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

  27. Martinez, J., Hossain, R., Romero, J., Little, J.J.: A simple yet effective baseline for 3D human pose estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2659–2668 (2017). https://doi.org/10.1109/ICCV.2017.288

  28. Matthies, H.G.: Quantifying uncertainty: modern computational representation of probability and applications. In: Ibrahimbegovic, A., Kozar, I. (eds.) Extreme Man-Made and Natural Hazards in Dynamics of Structures, pp. 105–135. Springer, Netherlands, Dordrecht (2007). https://doi.org/10.1007/978-1-4020-5656-7_4

  29. Memmesheimer, R., Häring, S., Theisen, N., Paulus, D.: Skeleton-DML: deep metric learning for skeleton-based one-shot action recognition (2021)

  30. Memmesheimer, R., Theisen, N., Paulus, D.: Signal level deep metric learning for multimodal one-shot action recognition (2020)

  31. Mensch, A., Blondel, M.: Differentiable dynamic programming for structured prediction and attention. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 3462–3471. PMLR (2018)

  32. Mina, B., Zoumpourlis, G., Patras, I.: Tarn: temporal attentive relation network for few-shot and zero-shot action recognition. In: Sidorov, K., Hicks, Y. (eds.) Proceedings of the British Machine Vision Conference (BMVC), pp. 130.1–130.14. BMVA Press (2019). https://doi.org/10.5244/C.33.130

  33. Perrett, T., Masullo, A., Burghardt, T., Mirmehdi, M., Damen, D.: Temporal-relational crosstransformers for few-shot action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 475–484 (2021)

  34. Ramachandran, P., Liu, P.J., Le, Q.V.: Unsupervised pretraining for sequence to sequence learning (2018)

  35. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 43–49 (1978). https://doi.org/10.1109/TASSP.1978.1163055

  36. Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)

  37. Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017, pp. 4077–4087 (2017)

  38. Su, B., Hua, G.: Order-preserving optimal transport for distances between sequences. IEEE Trans. Pattern Anal. Mach. Intell. 41(12), 2961–2974 (2019). https://doi.org/10.1109/TPAMI.2018.2870154

  39. Su, B., Wen, J.R.: Temporal alignment prediction for supervised representation learning and few-shot sequence classification. In: International Conference on Learning Representations (2022)

  40. Su, B., Zhou, J., Wu, Y.: Order-preserving Wasserstein discriminant analysis. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9884–9893 (2019). https://doi.org/10.1109/ICCV.2019.00998

  41. Tan, S., Yang, R.: Learning similarity: feature-aligning network for few-shot action recognition. In: International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (2019)

  42. Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Lee, D.D., Sugiyama, M., von Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016, pp. 3630–3638 (2016)

  43. Wang, L.: Analysis and evaluation of Kinect-based action recognition algorithms. Master’s thesis, School of the Computer Science and Software Engineering, The University of Western Australia (2017)

  44. Wang, L., Huynh, D.Q., Koniusz, P.: A comparative review of recent Kinect-based action recognition algorithms. IEEE Trans. Image Process. 29, 15–28 (2020)

  45. Wang, L., Huynh, D.Q., Mansour, M.R.: Loss switching fusion with similarity search for video classification. In: ICIP (2019)

  46. Wang, L., Koniusz, P.: Self-supervising action recognition by statistical moment and subspace descriptors, pp. 4324–4333. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3474085.3475572

  47. Wang, L., Koniusz, P., Huynh, D.Q.: Hallucinating IDT descriptors and I3D optical flow features for action recognition with CNNs. In: The IEEE International Conference on Computer Vision (ICCV) (2019)

  48. Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI (2018)

  49. Yang, C.H.H., Tsai, Y.Y., Chen, P.Y.: Voice2series: reprogramming acoustic models for time series classification. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 11808–11819. PMLR (2021)

  50. Zhang, H., Zhang, L., Qi, X., Li, H., Torr, P.H.S., Koniusz, P.: Few-shot action recognition with permutation-invariant attention. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 525–542. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_31

  51. Zhu, H., Koniusz, P.: Simple spectral graph convolution. In: International Conference on Learning Representations (ICLR) (2021)

Author information

Corresponding author

Correspondence to Piotr Koniusz.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, L., Koniusz, P. (2022). Uncertainty-DTW for Time Series and Sequences. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13681. Springer, Cham. https://doi.org/10.1007/978-3-031-19803-8_11

  • DOI: https://doi.org/10.1007/978-3-031-19803-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19802-1

  • Online ISBN: 978-3-031-19803-8

  • eBook Packages: Computer Science, Computer Science (R0)
