C22MP: the marriage of catch22 and the matrix profile creates a fast, efficient and interpretable anomaly detector

  • Regular Paper

Knowledge and Information Systems

Abstract

Many time series data mining algorithms work by reasoning about the relationships between the conserved shapes of subsequences. To facilitate this, the Matrix Profile is a data structure that annotates a time series by recording each subsequence’s Euclidean distance to its nearest neighbor. In recent years, the community has shown that the Matrix Profile can be used to discover many useful properties of a time series, including repeated behaviors (motifs), anomalies, evolving patterns, regimes, etc. However, the Matrix Profile is limited to representing the relationships between subsequences’ shapes. It is understood that, in some domains, useful information is conserved not in the subsequences’ shapes but in their features. More recently, a new set of time series features called catch22 has revolutionized feature-based mining of time series. Combining these two ideas seems to offer many possibilities for novel data mining applications; however, there are two difficulties in attempting this. A direct application of the Matrix Profile with the catch22 features would be prohibitively slow. Less obviously, as we will demonstrate, in almost all domains, using all twenty-two of the catch22 features produces poor results, and we must somehow select the subset appropriate for the domain. In this work, we introduce novel algorithms to solve both problems and demonstrate that, for most domains, the proposed C22MP is a state-of-the-art anomaly detector.
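The Matrix Profile definition above can be illustrated with a minimal brute-force sketch. This is not the authors' C22MP code. It assumes z-normalized Euclidean distance for shape comparison (the usual choice in the Matrix Profile literature), an exclusion zone of `m // 2` positions around each subsequence to skip trivial matches (an assumed value), and a hypothetical two-feature embedding, `toy_features` (standard deviation and lag-1 autocorrelation), standing in for catch22. Swapping the embedding function switches between a shape-space and a feature-space matrix profile; in both cases the subsequence with the largest value (the top discord) is the anomaly candidate.

```python
import numpy as np


def znorm(x):
    """Shape embedding: z-normalize a subsequence; flat windows map to zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def toy_features(x):
    """Hypothetical stand-in for catch22: [standard deviation, lag-1 autocorrelation].

    This is NOT the catch22 feature set; it only illustrates a feature-space embedding.
    """
    xc = x - x.mean()
    energy = float(np.dot(xc, xc))
    ac1 = float(np.dot(xc[:-1], xc[1:]) / energy) if energy > 0 else 0.0
    return np.array([xc.std(), ac1])


def matrix_profile(ts, m, embed, excl=None):
    """Brute-force O(n^2) matrix profile over embedded length-m subsequences.

    For each subsequence, record the Euclidean distance (between embeddings)
    to its nearest neighbor, ignoring trivial matches within `excl` positions.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    excl = m // 2 if excl is None else excl  # assumed exclusion-zone half-width
    vecs = np.array([embed(ts[i:i + m]) for i in range(n)])
    mp = np.full(n, np.inf)
    nn = np.full(n, -1, dtype=int)
    for i in range(n):
        d = np.linalg.norm(vecs - vecs[i], axis=1)
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf  # exclusion zone
        j = int(np.argmin(d))
        if np.isfinite(d[j]):
            mp[i], nn[i] = d[j], j
    return mp, nn


if __name__ == "__main__":
    # Synthetic series: a clean sine wave (period 50) with a noise burst at 500..549.
    rng = np.random.default_rng(0)
    t = np.arange(1000)
    ts = np.sin(2 * np.pi * t / 50)
    ts[500:550] += rng.normal(0.0, 0.8, 50)

    m = 50
    for name, embed in [("shape (z-norm)", znorm), ("features (toy)", toy_features)]:
        mp, _ = matrix_profile(ts, m, embed)
        # The top discord is the subsequence whose nearest neighbor is farthest away.
        print(f"{name}: top discord starts at index {int(np.argmax(mp))}")
```

The quadratic loop, plus computing features for every subsequence, hints at why a direct Matrix Profile over catch22 features is costly on long series. The sketch also omits the per-domain feature-subset selection that the abstract identifies as the second difficulty.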

Figs. 1–23

Data availability

C22MP (2022) Supporting webpage: sites.google.com/view/c22mp/home.

Notes

  1. The two most cited datasets for evaluating TSAD algorithms are tiny: NY-Taxi (length 10,320) and Yahoo! Webscope (mean length 1415) [49].

  2. This contrived example is not as implausible as it may seem. Suppose we are monitoring the accelerometer time series from a smartphone in a user’s pocket. If the user takes a call and then returns the phone to her pocket upside down, the Y-axis time series will be flipped upside down but not reversed in time.

  3. In blog forums, private conversations, openreview.net etc.

References

  1. Agrahari R et al (2022) Assessing feature representations for instance-based cross-domain anomaly detection in cloud services univariate time series data. IoT 3(1):123–144

  2. Alzantot M, Chakraborty S, Srivastava M (2017) Sensegen: a deep learning architecture for synthetic sensor data generation. In: 2017 IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), pp 188–193. IEEE

  3. Aminifar F et al (2022) A review of power system protection and asset management with machine learning techniques. Energy Syst 13(4):855–892

  4. Audibert J, Marti S, Guyard F, Zuluaga MA (2021) From univariate to multivariate time series anomaly detection with non-local information. In: International workshop on advanced analytics and learning on temporal data, pp 186–194. Springer, Cham

  5. Audibert J, Michiardi P, Guyard F, Marti S, Zuluaga MA (2022) Do deep neural networks contribute to multivariate time series anomaly detection? arXiv preprint https://arxiv.org/abs/2204.01637

  6. Audibert J, Michiardi P, Guyard F, Marti S, Zuluaga MA (2020) USAD: unsupervised anomaly detection on multivariate time series. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3395–3404

  7. Boniol P, Linardi M, Roncallo F, Palpanas T, Meftah M, Remy E (2021) Unsupervised and scalable subsequence anomaly detection in large data series. VLDB J 30(6):909–931

  8. Brophy E, Wang Z, She Q, Ward T (2021) Generative adversarial networks in time series: A survey and taxonomy. arXiv preprint https://arxiv.org/abs/2107.11098

  9. C22MP (2022) Supporting webpage: sites.google.com/view/c22mp/home

  10. Dau HA et al (2019) The UCR time series archive. IEEE/CAA J Automatica Sinica 6(6):1293–1305

  11. Fährmann D, Damer N, Kirchbuchner F, Kuijper A (2022) Lightweight long short-term memory variational auto-encoder for multivariate time series anomaly detection in industrial control systems. Sensors 22(8):2886

  12. Fengming Z, Shufang L, Zhimin G, Bo W, Shiming T, Mingming P (2017) Anomaly detection in smart grid based on encoder-decoder framework with recurrent neural network. J China Univ Posts Telecommun 24(6):67–73

  13. Geiger A, Liu D, Alnegheimish S, Cuesta-Infante A, Veeramachaneni K (2020) Tadgan: time series anomaly detection using generative adversarial networks. In: 2020 IEEE international conference on big data (Big Data), pp 33–43. IEEE

  14. Goh J, Adepu S, Junejo KN, Mathur A (2016) A dataset to support research in the design of secure water treatment systems. In: International conference on critical information infrastructures security, pp 88–99. Springer

  15. Haq IU, Lee BS (2023) TransNAS-TSAD: harnessing transformers for multi-objective neural architecture search in time series anomaly detection. arXiv preprint https://arxiv.org/abs/2311.18061

  16. Huet A, Navarro JM, Rossi D (2022) Local evaluation of time series anomaly detection algorithms. In: Proceedings of the 28th ACM SIGKDD, pp 635–645

  17. Hundman K, Constantinou V, Laporte C, Colwell I, Soderstrom T (2018) Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In: Proceedings of 24th ACM SIGKDD, pp 387–395

  18. Idé T (2006) Why does subsequence time-series clustering produce sine waves? In: Knowledge discovery in databases: PKDD 2006: 10th European conference on principles and practice of knowledge discovery in databases Berlin, Germany, Proceedings, vol 10, pp 211–222. Springer, Berlin

  19. Jackson TD et al (2021) The motion of trees in the wind: a data synthesis. Biogeosciences 18(13):4059–4072

  20. Keogh E, Lin J (2005) Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl Inf Syst 8:154–177

  21. Kravchik M, Shabtai A (2021) Efficient cyber attack detection in industrial control systems using lightweight neural networks and PCA. IEEE Trans Depend Secure Comput 19(4):2179–2197

  22. Lai KH, Zha D, Xu J, Zhao Y, Wang G, Hu X (2021) Revisiting time series outlier detection: Definitions and benchmarks. In: 35th Conference on NeurIPS datasets and benchmarks track

  23. Li D, Chen D, Jin B, Shi L, Goh J, Ng SK (2019) MAD-GAN: multivariate anomaly detection for time series data with generative adversarial networks. In: Artificial neural networks and machine learning—ICANN 2019: text and time series: 28th international conference on artificial neural networks, Munich, Germany, Proceedings, part IV, pp 703–716. Springer, Cham

  24. Liu HY, Gao ZZ, Wang ZH, Deng YH (2022) Time series classification with shapelet and canonical features. Appl Sci 12(17):8685

  25. Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23

  26. Lu Y, Wu R, Mueen A, Zuluaga MA, Keogh E (2022) Matrix profile XXIV: scaling time series anomaly detection to trillions of datapoints and ultra-fast arriving data streams. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pp 1173–1182

  27. Lubba CH, Sethi SS, Knaute P, Schultz SR, Fulcher BD, Jones NS (2019) catch22: CAnonical time-series CHaracteristics. Data Min Knowl Disc 33(6):1821–1852

  28. Lauer J, Zhou M, Ye S, Menegas W, Nath T, Rahman MM, Di Santo V, Soberanes D, Feng G, Murthy VN, Lauder G (2021) Multi-animal pose estimation and tracking with DeepLabCut. BioRxiv

  29. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, pp 281–297. University of California Press, Berkeley

  30. Marimon X, Traserra S, Jiménez M, Ospina A, Benítez R (2022) Detection of abnormal cardiac response patterns in cardiac tissue using deep learning. Mathematics 10(15):2786

  31. Munir M, Siddiqui SA, Dengel A, Ahmed S (2018) DeepAnT: a deep learning approach for unsupervised anomaly detection in time series. IEEE Access 7:1991–2005

  32. Nakamura T, Imamura M, Mercer R, Keogh E (2020) Merlin: parameter-free discovery of arbitrary length anomalies in massive time series archives. In: 2020 IEEE ICDM, pp 1190–1195

  33. Park D, Hoshi Y, Kemp CC (2018) A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robot Autom Lett 3(3):1544–1551

  34. Ren H, Xu B, Wang Y, Yi C, Huang C, Kou X, Xing T, Yang M, Tong J, Zhang Q (2019) Time-series anomaly detection service at microsoft. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3009–3017

  35. Rewicki F, Denzler J, Niebling J (2022) Is it worth it? An experimental comparison of six deep-and classical machine learning methods for unsupervised anomaly detection in time series. arXiv preprint https://arxiv.org/abs/2212.11080

  36. Saarela M, Jauhiainen S (2021) Comparison of feature importance measures as explanations for classification models. SN Appl Sci 3:1–2

  37. Thompson DW (1917) On growth and form. Cambridge University Press

  38. Tuli S, Casale G, Jennings NR (2022) TranAD: deep transformer networks for anomaly detection in multivariate time series data. arXiv preprint https://arxiv.org/abs/2201.07284

  39. Turowski M et al. (2022) Modeling and generating synthetic anomalies for energy and power time series. In: Proceedings of the 13th ACM e-Energy, pp 471–484

  40. Wang R, Liu C, Mou X, Guo X, Gao K, Liu P, Wo T, Liu X (2022) Deep contrastive one-class time series anomaly detection. arXiv preprint https://arxiv.org/abs/2207.01472

  41. Wen Q, Sun L, Yang F, Song X, Gao J, Wang X, Xu H (2020) Time series data augmentation for deep learning: a survey. arXiv preprint https://arxiv.org/abs/2002.12478

  42. Wu R, Keogh E (2021) Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. IEEE TKDE

  43. Yairi T, Kato Y, Hori K (2001) Fault detection by mining association rules from house-keeping data. In: Proceedings of the 6th international symposium on artificial intelligence, robotics and automation in space, vol 18, p 21. Citeseer

  44. Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inf Syst 17:241–262

  45. Yoon J, Jarrett D, Van der Schaar M (2019) Time-series generative adversarial networks. In: Advances in neural information processing systems, vol 32

  46. Zhang C, Kuppannagari SR, Kannan R, Prasanna VK (2018) Generative adversarial network for synthetic time series data generation in smart grids. In: 2018 IEEE international conference on communications, control, and computing technologies for smart grids (SmartGridComm), pp 1–6. IEEE

  47. Zhu Y, Yeh CC, Zimmerman Z, Kamgar K, Keogh E (2018) Matrix profile XI: SCRIMP++: time series motif discovery at interactive speeds. In: 2018 IEEE ICDM, pp 837–846

Acknowledgements

We thank all the creators of the data sets used in this work and the original authors of catch22, who were very helpful with their time [33].

Funding

Funding was provided by gifts from Google, Mitsubishi and by NSF Award 2103976.

Author information

Contributions

ST was involved in algorithm design, writing, and implementation. YL was involved in the design of the algorithm comparison measure. RW was involved in code optimization. TVAS was involved in the design of the normalization algorithm. HDC was involved in the design of the biological (mouse) experiments. RM was involved in the design of the DNA experiments. EK was involved in writing and editing.

Corresponding author

Correspondence to Sadaf Tafazoli.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tafazoli, S., Lu, Y., Wu, R. et al. C22MP: the marriage of catch22 and the matrix profile creates a fast, efficient and interpretable anomaly detector. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02107-5

  • DOI: https://doi.org/10.1007/s10115-024-02107-5
