PETSC: pattern-based embedding for time series classification

Feremans, Len; Cule, Boris; Goethals, Bart

doi:10.1007/s10618-022-00822-7

Published: 24 March 2022

Volume 36, pages 1015–1061 (2022)

Data Mining and Knowledge Discovery

Len Feremans¹, Boris Cule²,³ & Bart Goethals¹,⁴

Abstract

Efficient and interpretable classification of time series is an essential data mining task with many real-world applications. Recently, several dictionary- and shapelet-based time series classification methods have been proposed that employ contiguous subsequences of fixed length. We extend pattern mining to efficiently enumerate long variable-length sequential patterns with gaps. Additionally, we discover patterns at multiple resolutions, thereby combining cohesive sequential patterns that vary in length, duration and resolution. For time series classification, we construct an embedding based on sequential pattern occurrences and learn a linear model. The discovered patterns form the basis for interpretable insight into each class of time series. The pattern-based embedding for time series classification (PETSC) supports both univariate and multivariate time series datasets of varying length and subject to noise or missing data. We experimentally validate that MR-PETSC, the multi-resolution variant of PETSC, performs significantly better than baseline interpretable methods such as DTW, BOP and SAX-VSM on univariate and multivariate time series. On univariate time series, our method performs comparably to many recent methods, including BOSS, cBOSS, S-BOSS, ProximityForest and ResNET, and is only narrowly outperformed by state-of-the-art methods such as HIVE-COTE, ROCKET, TS-CHIEF and InceptionTime. Moreover, on multivariate datasets PETSC performs comparably to current state-of-the-art methods such as HIVE-COTE, ROCKET, CIF and ResNET, none of which are interpretable. PETSC scales to large datasets: the total time for training and making predictions on all 85 ‘bake off’ datasets in the UCR archive is under 3 h, making it one of the fastest methods available.
PETSC is particularly useful because it learns a linear model in which each feature represents a sequential pattern in the time domain. This supports human oversight to ensure that predictions are trustworthy and fair, which is essential in financial, medical or bioinformatics applications.
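The idea of an embedding built from pattern occurrences can be illustrated with a minimal sketch. It assumes a single univariate series, fixed-length sliding windows, SAX discretisation of each window into a word of w symbols over an alphabet of α symbols, and a hand-picked list of patterns in place of mined ones. As a simplified stand-in for the paper's gap constraint, a pattern is counted in a window only if it occurs as a subsequence spanning at most rdur × |pattern| symbols. The function names (`sax_word`, `occurs`, `embed`) are hypothetical; this is not the authors' implementation.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(window, w, alpha):
    """z-normalise a window, reduce it to w PAA segments and map each to a SAX letter.

    Assumes len(window) >= w so that every PAA segment is non-empty.
    """
    mu, sd = mean(window), pstdev(window)
    # Flat windows (zero deviation) are mapped to all-zero values.
    z = [(x - mu) / sd if sd > 1e-8 else 0.0 for x in window]
    n = len(z)
    # PAA: mean value of each of w roughly equal segments.
    paa = [mean(z[i * n // w:(i + 1) * n // w]) for i in range(w)]
    # alpha - 1 breakpoints splitting N(0, 1) into alpha equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / alpha) for k in range(1, alpha)]
    # Letter index = number of breakpoints below the PAA value.
    return "".join(chr(ord("a") + sum(v > c for c in cuts)) for v in paa)


def occurs(pattern, word, max_span):
    """True if pattern is a subsequence of word (gaps allowed) spanning at most max_span symbols."""
    for start in range(len(word)):
        if word[start] != pattern[0]:
            continue
        j = 1
        # Greedy matching from this start gives the earliest possible end position.
        for pos in range(start + 1, min(len(word), start + max_span)):
            if j == len(pattern):
                break
            if word[pos] == pattern[j]:
                j += 1
        if j == len(pattern):
            return True
    return False


def embed(series, patterns, window_len, w, alpha, rdur=1.0):
    """Feature vector: for each pattern, the number of sliding windows in which it occurs."""
    words = [sax_word(series[i:i + window_len], w, alpha)
             for i in range(len(series) - window_len + 1)]
    return [sum(occurs(p, wd, int(rdur * len(p))) for wd in words) for p in patterns]


if __name__ == "__main__":
    ramp = list(range(8))
    print(sax_word(ramp, 4, 4))                                   # abcd
    print(embed(ramp, ["ab", "ac", "da"], 8, 4, 4, rdur=1.5))     # [1, 1, 0]
```

With rdur = 1 only contiguous occurrences count; larger values admit gaps, as with the pattern "ac" in the word "abcd" above. Each entry of the resulting vector belongs to one pattern, so a linear model trained on such vectors has one weight per pattern, which is what keeps it readable.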

Notes

We remark that alternatives to sliding-window-based frequency for sequential patterns, which do not require choosing \(\varDelta t\), have been investigated (Cule et al. 2019). However, these are not compatible with the window-based normalisation performed by SAX.
This observation has led to adaptations for numerosity reduction in time series classification (Lin et al. 2012) or non-overlapping minimal windows in frequent pattern (or episode) mining (Zhu et al. 2010; Cule et al. 2019).
We use the ordinal values of SAX symbols when computing the Euclidean distance; that is, \(b-a\) is 1 and \(c-a\) is 2.
Note that pattern mining has a worst-case time complexity that is exponential in the size of the pattern, with the alphabet size as its base: with a pattern size (or word size) of w and \(\alpha \) different symbols, there are \(\alpha ^w\) possible sequential patterns of length w. However, we treat parameters such as w, \(\alpha \), k and \( rdur \) as constants. We argue that in the context of time series classification, rather than pattern mining, a detailed efficiency analysis for large values of k or \( rdur \) is less relevant, since we do not observe an increase in classification accuracy when both k and \( rdur \) are large.
Source code of PETSC: https://bitbucket.org/len_feremans/petsc.
We remark that our version of the BeetleFly dataset differs from the UCR version because of small changes in the pre-processing of the original MPEG-7 source images.
Full experimental results: https://bitbucket.org/len_feremans/petsc/src/master/Results.html.
We use the implementations available in the sktime library (Löning et al. 2019).
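Two of the notes above involve small computations that a sketch can make concrete: the Euclidean distance computed on ordinal SAX values (so \(b-a\) contributes 1 and \(c-a\) contributes 2), and the \(\alpha ^w\) count of candidate patterns of length w. This sketch assumes SAX words are plain strings over the letters a, b, c and so on, and that the distance is taken symbol by symbol between two equal-length words; the function names are hypothetical.

```python
import math
from itertools import product


def ordinal_euclidean(word_a, word_b):
    """Euclidean distance between equal-length SAX words, mapping 'a', 'b', 'c', ... to 0, 1, 2, ..."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have equal length")
    # Differences of character codes equal differences of ordinal symbol values.
    return math.sqrt(sum((ord(x) - ord(y)) ** 2 for x, y in zip(word_a, word_b)))


def candidate_patterns(alpha, w):
    """Enumerate all alpha**w sequential patterns of length w over the first alpha SAX symbols."""
    symbols = [chr(ord("a") + i) for i in range(alpha)]
    return ["".join(p) for p in product(symbols, repeat=w)]


if __name__ == "__main__":
    print(ordinal_euclidean("b", "a"), ordinal_euclidean("c", "a"))  # 1.0 2.0
    print(len(candidate_patterns(4, 5)))                             # 1024, i.e. 4**5
```

Even for \(\alpha = 4\) and w = 10, exhaustive enumeration would produce \(4^{10} = 1{,}048{,}576\) candidates, which illustrates the exponential growth in w mentioned in the complexity note.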

References

Adamek T, O’Connor N (2003) Efficient contour-based shape representation and matching. In: Proceedings of the 5th ACM SIGMM international workshop on Multimedia information retrieval, pp 138–143
Aggarwal CC, Han J (2014) Frequent pattern mining. Springer, Berlin
Agrawal R, Srikant R et al (1994) Fast algorithms for mining association rules. In: Proceedings 20th international conference on very large databases, vol 1215, pp 487–499
Bagnall A, Lines J, Hills J, Bostrom A (2015) Time-series classification with cote: the collective of transformation-based ensembles. IEEE Trans Knowl Data Eng 27(9):2522–2535
Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31(3):606–660
Bagnall A, Dau HA, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018) The UEA multivariate time series classification archive. arXiv preprint arXiv:1811.00075
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(Feb):281–305
Bober M (2001) Mpeg-7 visual shape descriptors. IEEE Trans Circuits Syst Video Technol 11(6):716–719
Chen Y, Nascimento MA, Ooi BC, Tung AKH (2007) Spade: on shape-based pattern detection in streaming time series. In: 2007 IEEE 23rd international conference on data engineering. IEEE, pp 786–795
Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: 2008 IEEE 24th international conference on data engineering. IEEE, pp 169–178
Cule B, Feremans L, Goethals B (2019) Efficiently mining cohesion-based patterns and rules in event sequences. Data Min Knowl Discov 33(4):1125–1182
Dau HA, Keogh E, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Yanping HB, Begum N, Bagnall A, Mueen A, Batista G, Hexagon ML (2018) The UCR time series classification archive, October 2018. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
Dempster A, Petitjean F, Webb GI (2020) Rocket: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min Knowl Discov 34(5):1454–1495
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Deng H, Runger G, Tuv E, Vladimir M (2013) A time series forest for classification and feature extraction. Inf Sci 239:142–153
Fan W, Zhang K, Cheng H, Gao J, Yan X, Han J, Yu P, Verscheure O (2008) Direct mining of discriminative and essential frequent patterns via model-based search tree. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 230–238
Fawaz HI, Forestier G, Weber J, Idoumghar L, Muller PA (2019) Deep learning for time series classification: a review. Data Min Knowl Discov 33(4):917–963
Fawaz HI, Lucas B, Forestier G, Pelletier C, Schmidt DF, Weber J, Webb GI, Idoumghar L, Muller P-A, Petitjean F (2020) Inceptiontime: finding alexnet for time series classification. Data Min Knowl Discov 34(6):1936–1962
Feremans L, Cule B, Goethals B (2018) Mining top-k quantile-based cohesive sequential patterns. In: Proceedings of the 2018 SIAM international conference on data mining. SIAM, pp 90–98
Fournier-Viger P, Gomariz A, Gueniche T, Mwamikazi E, Thomas R (2013) Tks: efficient mining of top-k sequential patterns. In: International conference on advanced data mining and applications. Springer, pp 109–120
Fradkin D, Mörchen F (2015) Mining sequential patterns for classification. Knowl Inf Syst 45(3):731–749
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier
Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Min Knowl Discov 28(4):851–881
Hsieh T-Y, Wang S, Sun Y, Honavar V (2021) Explainable multivariate time series classification: a deep neural network which learns to attend to important variables as well as time intervals. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 607–615
Hyndman RJ, Athanasopoulos G (2018) Forecasting: principles and practice. OTexts
Karlsson I, Papapetrou P, Boström H (2016) Generalized random shapelet forests. Data Min Knowl Discov 30(5):1053–1085
Kate RJ (2016) Using dynamic time warping distances as features for improved time series classification. Data Min Knowl Discov 30(2):283–312
Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3):263–286
Lam HT, Mörchen F, Fradkin D, Calders T (2014) Mining compressing sequential patterns. Stat Anal Data Min ASA Data Sci J 7(1):34–52
Large J, Bagnall A, Malinowski S, Tavenard R (2019) On time series classification with dictionary-based classifiers. Intell Data Anal 23(5):1073–1089
Laxman S, Sastry PS, Unnikrishnan KP (2007) A fast algorithm for finding frequent episodes in event streams. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 410–419
Le Nguyen T, Gsponer S, Ifrim G (2017) Time series classification by sequence learning in all-subsequence space. In: 2017 IEEE 33rd international conference on data engineering (ICDE). IEEE, pp 947–958
Le Nguyen T, Gsponer S, Ilie I, O’Reilly M, Ifrim G (2019) Interpretable time series classification using linear models and multi-resolution multi-domain symbolic representations. Data Min Knowl Discov 33(4):1183–1222
Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery. ACM, pp 2–11
Lin J, Khade R, Li Y (2012) Rotation-invariant similarity in time series using bag-of-patterns representation. J Intell Inf Syst 39(2):287–315
Lines J, Taylor S, Bagnall A (2018) Time series classification with hive-cote: the hierarchical vote collective of transformation-based ensembles. ACM Trans Knowl Discov Data 12(5):1–35
Löning M, Bagnall A, Ganesh S, Kazakov V, Lines J, Király FJ (2019) sktime: A unified interface for machine learning with time series. In: Workshop on systems for ML at NeurIPS
Lucas B, Shifaz A, Pelletier C, O’Neill L, Zaidi N, Goethals B, Petitjean F, Webb GI (2019) Proximity forest: an effective and scalable distance-based classifier for time series. Data Min Knowl Discov 33(3):607–635
Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: Proceedings of the 31st international conference on neural information processing systems, pp 4768–4777
Mannila H, Toivonen H, Inkeri Verkamo A (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289
Middlehurst M, Vickers W, Bagnall A (2019) Scalable dictionary classifiers for time series classification. In: International conference on intelligent data engineering and automated learning. Springer, pp 11–19
Middlehurst M, Large J, Bagnall A (2020) The canonical interval forest (CIF) classifier for time series classification. arXiv preprint arXiv:2008.09172
Middlehurst M, Large J, Cawley G, Bagnall A (2020) The temporal dictionary ensemble (TDE) classifier for time series classification. In: The European conference on machine learning and principles and practice of knowledge discovery in databases
Molnar C (2020) Interpretable machine learning. Lulu.com
Nguyen D, Luo W, Nguyen TD, Venkatesh S, Phung D (2018) Sqn2vec: learning sequence representation via sequential patterns with a gap constraint. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 569–584
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M-C (2004) Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Petitjean F, Forestier G, Webb GI, Nicholson AE, Chen Y, Keogh E (2014) Dynamic time warping averaging of time series allows faster and more accurate classification. In: 2014 IEEE international conference on data mining. IEEE, pp 470–479
Petitjean F, Li T, Tatti N, Webb GI (2016) Skopus: mining top-k sequential patterns under leverage. Data Min Knowl Discov 30(5):1086–1111
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 262–270
Raza A, Kramer S (2020) Accelerating pattern-based time series classification: a linear time and space string mining approach. Knowl Inf Syst 62(3):1113–1141
Ribeiro MT, Singh S, Guestrin C (2016) “Why should i trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1135–1144
Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021) The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 35(2):401–449
Schäfer P (2015) The boss is concerned with time series classification in the presence of noise. Data Min Knowl Discov 29(6):1505–1530
Schäfer P (2016) Scalable time series classification. Data Min Knowl Discov 30(5):1273–1298
Schäfer P, Leser U (2017) Fast and accurate time series classification with weasel. In: Proceedings of the 2017 ACM on conference on information and knowledge management, pp 637–646
Senin P, Malinchik S (2013) Sax-vsm: interpretable time series classification using sax and vector space model. In: 2013 IEEE 13th international conference on data mining. IEEE, pp 1175–1180
Shifaz A, Pelletier C, Petitjean F, Webb GI (2020) TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Min Knowl Discov 34(3):742–775
Shokoohi-Yekta M, Hu B, Jin H, Wang J, Keogh E (2017) Generalizing DTW to the multi-dimensional case requires an adaptive approach. Data Min Knowl Discov 31(1):1–31
Wang Z, Yan W, Oates T (2017) Time series classification from scratch with deep neural networks: a strong baseline. In: 2017 international joint conference on neural networks (IJCNN). IEEE, pp 1578–1585
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82
Ye L, Keogh E (2011) Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Discov 22(1–2):149–182
Yeh C-CM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, Silva DF, Mueen A, Keogh E (2016) Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 1317–1322
Zaki MJ, Meira W (2014) Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, Cambridge
Zhou C, Cule B, Goethals B (2016) Pattern based sequence classification. IEEE Trans Knowl Data Eng 28(5):1285–1298. https://doi.org/10.1109/TKDE.2015.2510010
Zhu H, Wang P, He X, Li Y, Wang W, Shi B (2010) Efficient episode mining with minimal and non-overlapping occurrences. In: 2010 IEEE international conference on data mining. IEEE, pp 1211–1216
Zimmermann A (2014) Understanding episode mining techniques: benchmarking on diverse, realistic, artificial data. Intell Data Anal 18(5):761–791
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320

Author information

Authors and Affiliations

Department of Computer Science, University of Antwerp, Antwerp, Belgium
Len Feremans & Bart Goethals
Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
Boris Cule
Department of Accountancy and Finance, University of Antwerp, Antwerp, Belgium
Boris Cule
Faculty of Information Technology, Monash University, Melbourne, Australia
Bart Goethals

Corresponding author

Correspondence to Len Feremans.

Additional information

Responsible editor: Panagiotis Papapetrou.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Feremans, L., Cule, B. & Goethals, B. PETSC: pattern-based embedding for time series classification. Data Min Knowl Disc 36, 1015–1061 (2022). https://doi.org/10.1007/s10618-022-00822-7

Received: 30 September 2020
Accepted: 22 January 2022
Published: 24 March 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s10618-022-00822-7