Sequence likelihood divergence for fast time series comparison

Huang, Yi; Rotaru, Victor; Chattopadhyay, Ishanu

doi:10.1007/s10115-023-01855-0

Regular Paper
Published: 16 March 2023

Volume 65, pages 3079–3098 (2023)

Knowledge and Information Systems

Abstract

Comparing and contrasting subtle historical patterns is central to time series analysis. Here we introduce a new approach to quantify deviations in the underlying hidden stochastic generators of sequential discrete-valued data streams. The proposed measure is universal in the sense that we can compare data streams without any feature engineering step and without the need for any hyperparameters. Our core idea is the generalization of the Kullback–Leibler divergence, often used to compare probability distributions, to a notion of divergence between finite-valued ergodic stationary stochastic processes. Using this notion of process divergence, we craft a measure of deviation on finite sample paths, which we call the sequence likelihood divergence (SLD), that approximates a metric on the space of the underlying generators within a well-defined class of discrete-valued stochastic processes. We compare the performance of SLD against state-of-the-art approaches, e.g., dynamic time warping (Petitjean et al. in Pattern Recognit 44(3):678–693, 2011), on synthetic data, on real-world applications with electroencephalogram data and in gait recognition, and on diverse time series classification problems from the University of California, Riverside time series classification archive (Rakthanmanon et al.). We demonstrate that the new tool is on par or better in classification accuracy, while being significantly faster in comparable implementations. With SLD released in the public domain, we are hopeful that it will enhance the standard toolbox used in classification, clustering and inference problems in time series analysis.
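To make the idea of a likelihood-based divergence between symbol streams concrete, the following is a minimal Python sketch and not the SLD algorithm itself. It assumes a first-order Markov chain with additive smoothing as a stand-in for the generators; the paper's own generator class and estimation procedure are not reproduced here. All function names (`kl_divergence`, `fit_markov`, `log_likelihood_rate`, `likelihood_gap`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete
    distributions given as dicts mapping symbol -> probability."""
    return sum(pi * math.log2(pi / q[x]) for x, pi in p.items() if pi > 0)


def fit_markov(seq, alphabet, alpha=1.0):
    """Estimate first-order transition probabilities from seq.
    ASSUMPTION: a first-order Markov chain with additive smoothing
    (alpha) is used purely as an illustrative generator model."""
    counts = Counter(zip(seq, seq[1:]))
    model = {}
    for a in alphabet:
        total = sum(counts[(a, b)] for b in alphabet) + alpha * len(alphabet)
        model[a] = {b: (counts[(a, b)] + alpha) / total for b in alphabet}
    return model


def log_likelihood_rate(seq, model):
    """Average log2-likelihood per transition of seq under model."""
    pairs = list(zip(seq, seq[1:]))
    return sum(math.log2(model[a][b]) for a, b in pairs) / len(pairs)


def likelihood_gap(x, y, alphabet):
    """Bits per transition lost when x is explained by the model
    fitted to y instead of the model fitted to x itself."""
    model_x = fit_markov(x, alphabet)
    model_y = fit_markov(y, alphabet)
    return log_likelihood_rate(x, model_x) - log_likelihood_rate(x, model_y)


if __name__ == "__main__":
    alternating = "01" * 50   # strictly alternating stream
    constant = "0" * 100      # constant stream
    print(likelihood_gap(alternating, alternating, "01"))
    print(likelihood_gap(alternating, constant, "01"))
```

In this sketch the gap is zero when a stream is compared with itself and grows as the two fitted models disagree about the transitions observed in the first stream. Like the Kullback–Leibler divergence, the quantity is asymmetric in its two arguments.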

References

Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: KDD workshop, vol 10. Seattle, WA, pp 359–370
Bondy JA, Murty USR (2008) Graph theory. Grad Texts in Math
Chattopadhyay I (2014) Causality networks. arXiv preprint arXiv:1406.6651
Chattopadhyay I, Lipson H (2013) Abductive learning of quantized stochastic processes with probabilistic finite automata. Philos Trans R Soc A Math Phys Eng Sci 371(1984):20110543
Chattopadhyay I, Lipson H (2014) Data smashing: uncovering lurking order in data. J R Soc Interface 11(101):20140826
Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data, pp 491–502. ACM
Ching WK, Ng MK (2006) Chains: models, algorithms and applications. International Series in Operations Research & Management Science. Springer US, ISBN 9780387293370
Cover TM, Thomas JA (2012) Elements of information theory. Wiley, New York
Crutchfield JP (1994) The calculi of emergence: computation, dynamics and induction. Physica D Nonlinear Phenomena 75(1–3):11–54
Dau HA, Bagnall A, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh E (2019) The UCR time series archive. IEEE/CAA J Automatica Sinica 6(6):1293–1305
Dekking FM, Kraaikamp C, Lopuhaä HP, Meester LE (2005) A modern introduction to probability and statistics: understanding why and how. Springer, Berlin
Dempster A, Petitjean F, Webb GI (2020) Rocket: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min Knowl Discov 34(5):1454–1495
Dua D, Graff C (2017) UCI machine learning repository
Dupont P, Denis F, Esposito Y (2005) Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern Recognit 38(9):1349–1371
Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Eugene Stanley H (2000) Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220
Gupta G, Pequito S, Bogdan P (2018) Dealing with unknown unknowns: identification and selection of minimal sensing for fractional dynamics with unknown inputs. In: 2018 Annual American Control Conference (ACC). IEEE, pp 2814–2820
Gupta G, Pequito S, Bogdan P (2019) Learning latent fractional dynamics with unknown unknowns. In: 2019 American Control Conference (ACC). IEEE, pp 217–222
Hardy GH (1992) Divergent series, with a preface by J. E. Littlewood and a note by L. S. Bosanquet, reprint of the revised (1963) edition. Éditions Jacques Gabay, Sceaux
Helstrom CW (1991) Probability and stochastic processes for engineers. Macmillan Coll Division
Jain S, Xiao X, Bogdan P, Bruck J (2021) Generator based approach to analyze mutations in genomic datasets. Sci Rep 11(1):1–12
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery. ACM, pp 2–11
Löning M, Bagnall A, Ganesh S, Kazakov V, Lines J, Király FJ (2019) sktime: a unified interface for machine learning with time series. arXiv preprint arXiv:1909.07872
Middlehurst M, Large J, Flynn M, Lines J, Bostrom A, Bagnall A (2021) Hive-cote 2.0: a new meta ensemble for time series classification. Mach Learn 110(11):3211–3243
Möller-Levet CS, Klawonn F, Cho K-H, Wolkenhauer O (2003) Fuzzy clustering of short time-series and unevenly distributed sampling points. In: International symposium on intelligent data analysis. Springer, pp 330–340
Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv (CSUR) 33(1):31–88
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Petitjean F, Ketterlin A, Gançarski P (2011) A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit 44(3):678–693
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. pp 262–270
Rényi A (1965) On the foundations of information theory. Revue de l’Institut International de Statistique, pp 1–14
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021) The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 35(2):401–449
Shannon CE (2001) A mathematical theory of communication. ACM SIGMOBILE Mob Comput Commun Rev 5(1):3–55
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E. UCR suite for time series subsequence search. Accessed 20 Jan 2021
Vidyasagar M (2007) Bounds on the Kullback–Leibler divergence rate between hidden Markov models. In: 2007 46th IEEE conference on decision and control. IEEE, pp 6160–6165
Vidyasagar M (2014) Hidden Markov processes: theory and applications to biology, vol 44. Princeton University Press, Princeton
Xue Y, Bogdan P (2019) Reconstructing missing complex networks against adversarial interventions. Nat Commun 10(1):1–12
Xue Y, Rodriguez S, Bogdan P (2016) A spatio-temporal fractal model for a CPS approach to brain-machine-body interfaces. In: 2016 design, automation & test in Europe conference & exhibition (DATE), pp 642–647. IEEE
Yang R, Sala F, Bogdan P (2021) Hidden network generating rules from partially observed complex networks. Commun Phys 4(1):1–12

Acknowledgements

We thank the anonymous reviewers for their very useful comments and suggestions. Part of this work was done while Li Shen and Ling Cheng were doing research at Griffith University. The work was supported by the Australian Research Council (ARC) Large Grant A849602031.

Author information

Authors and Affiliations

Department of Medicine, University of Chicago, Chicago, IL, USA
Yi Huang, Victor Rotaru & Ishanu Chattopadhyay
Committee on Quantitative Methods in Social, Behavioral, and Health Sciences, University of Chicago, Chicago, IL, USA
Ishanu Chattopadhyay

Corresponding author

Correspondence to Ishanu Chattopadhyay.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, Y., Rotaru, V. & Chattopadhyay, I. Sequence likelihood divergence for fast time series comparison. Knowl Inf Syst 65, 3079–3098 (2023). https://doi.org/10.1007/s10115-023-01855-0

Received: 04 June 2021
Revised: 17 February 2023
Accepted: 25 February 2023
Published: 16 March 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s10115-023-01855-0
