Abstract
Motif discovery can be used as a subroutine in many time series data mining tasks, such as classification, clustering and anomaly detection. A motif represents two or more highly similar subsequences of a time series. The vast majority of motif discovery methods implicitly assume that subsequences need to be normalized before their similarity is determined. Although normalization is widely adopted, it may affect which motifs are discovered. We examine the effect of normalization on motif discovery using 96 real-world time series. To determine whether the discovered motifs are meaningful, all time series are assigned labels that indicate the states of the system generating the time series. Our experiments show that in over half of the considered cases, normalization affects motif discovery negatively by returning motifs that are not meaningful. We therefore conclude that the assumption underlying normalization does not always hold for real-world time series and thus should not be adopted uncritically.
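As a rough illustration of why normalization can change which subsequences count as similar, the sketch below compares the Euclidean distance between two subsequences before and after z-normalization. It is not the authors' code: z-normalization is assumed here as the normalization in question, and the helper names `znorm` and `euclidean` and the two toy subsequences are hypothetical.

```python
import math

def znorm(x):
    """Z-normalize a subsequence to zero mean and unit standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        # Assumed convention: a constant subsequence maps to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in x]

def euclidean(a, b):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical subsequences: same shape, very different offset and amplitude.
low = [0.0, 1.0, 0.0, -1.0]
high = [10.0, 30.0, 10.0, -10.0]

raw_dist = euclidean(low, high)                  # large: levels and scales differ
norm_dist = euclidean(znorm(low), znorm(high))   # close to zero: shapes coincide
```

If offset or amplitude carries information about the state of the system, the normalized distance would pair these two subsequences as a motif even though they may stem from different states.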
Notes
1.
2. In the practical implementation of motif discovery, an exclusion zone of length m/2 before and after the location of the subsequence of interest is commonly set [20]. This ensures that so-called trivial matches are avoided.
3. The time series were downloaded from the website of the authors, where the names of some time series differ from the original naming convention.
4. Due to the nature of the ECG and SLC time series, we concatenated the separate sequences into a single time series. Due to the large size of the SLC time series, we took a subset that includes two different states, each occurring at least twice.
5. The selected time series, code and results are available on GitHub.
6. Several subsequences may share the minimum distance to the subsequence of interest. In that case, one of them is chosen at random for direct comparison.
7. Due to the demanding running time of the calculations, the maximum length of the time series is set to \(n = 20{,}000\). We only consider time series that have no missing values and whose (sub)set comprises two or more states.
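The exclusion zone and the random tie-breaking described in the notes can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `nearest_neighbor` is hypothetical, the zone is taken as `m // 2` positions on each side with the boundary excluded, distances are non-normalized Euclidean, and ties are detected by exact floating-point equality.

```python
import math
import random

def nearest_neighbor(ts, i, m, rng=random):
    """Return (index, distance) of the nearest non-trivial match of ts[i:i+m].

    Candidates within m // 2 positions of i are skipped (exclusion zone).
    If several candidates share the minimum distance, one is picked at random.
    """
    excl = m // 2  # assumed integer rounding of m/2
    query = ts[i:i + m]
    best_dist, best_idx = math.inf, []
    for j in range(len(ts) - m + 1):
        if abs(i - j) <= excl:
            continue  # trivial match: overlaps too much with the query
        cand = ts[j:j + m]
        d = math.sqrt(sum((p - q) ** 2 for p, q in zip(query, cand)))
        if d < best_dist:
            best_dist, best_idx = d, [j]
        elif d == best_dist:
            best_idx.append(j)  # tie on the minimum distance
    if not best_idx:
        return None, math.inf  # series too short for any non-trivial match
    return rng.choice(best_idx), best_dist

# Hypothetical periodic series: the pattern at index 0 recurs at 4 and 8.
ts = [0, 1, 2, 1] * 3
idx, dist = nearest_neighbor(ts, 0, 4, random.Random(0))
```

Passing a seeded `random.Random` instance makes the tie-breaking reproducible; the quadratic search is for illustration only and would be far too slow for series of length 20,000.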
References
Dau, H.A., et al.: The UCR time series archive. IEEE/CAA J. Autom. Sin. 6(6), 1293–1305 (2019). https://doi.org/10.1109/jas.2019.1911747
Dau, H.A., Keogh, E.: Matrix profile V: a generic technique to incorporate domain knowledge into motif discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 125–134 (2017)
Esling, P., Agon, C.: Time-series data mining. ACM Comput. Surv. 45(1), 12 (2012)
Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Deep learning for time series classification: a review. Data Min. Knowl. Disc. 33(4), 917–963 (2019). https://doi.org/10.1007/s10618-019-00619-1
Gao, Y., Lin, J.: HIME: discovering variable-length motifs in large-scale time series. Knowl. Inf. Syst. 61(1), 513–542 (2018). https://doi.org/10.1007/s10115-018-1279-6
Gao, Y., Lin, J., Rangwala, H.: Iterative grammar-based framework for discovering variable-length time series motifs. In: 2017 IEEE International Conference on Data Mining (ICDM), pp. 111–116. IEEE (2017)
Keogh, E., Kasetty, S.: On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min. Knowl. Disc. 7(4), 349–371 (2003)
van Leeuwen, F., Bosma, B., van den Born, A., Postma, E.: RTL: a robust time series labeling algorithm. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds.) IDA 2021. LNCS, vol. 12695, pp. 414–425. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-74251-5_33
Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 2–11 (2003)
Linardi, M., Zhu, Y., Palpanas, T., Keogh, E.: Matrix profile X: VALMOD-scalable discovery of variable-length motifs in data series. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1053–1066 (2018)
Madrid, F., Singh, S., Chesnais, Q., Mauck, K., Keogh, E.: Matrix profile XVI: efficient and effective labeling of massive time series archives. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 463–472 (2019). https://doi.org/10.1109/DSAA.2019.00061
Mohammad, Y., Nishida, T.: Exact discovery of length-range motifs. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds.) ACIIDS 2014. LNCS (LNAI), vol. 8398, pp. 23–32. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05458-2_3
Mueen, A., Chavoshi, N.: Enumeration of time series motifs of all lengths. Knowl. Inf. Syst. 45(1), 105–132 (2015)
Mueen, A., Keogh, E.: Online discovery and maintenance of time series motifs. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1089–1098 (2010)
Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 473–484. SIAM (2009)
Patel, P., Keogh, E., Lin, J., Lonardi, S.: Mining motifs in massive time series databases. In: 2002 IEEE International Conference on Data Mining, 2002. Proceedings, pp. 370–377. IEEE (2002)
Senin, P., et al.: GrammarViz 2.0: a tool for grammar-based pattern discovery in time series. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8726, pp. 468–472. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44845-8_37
Shifaz, A., Pelletier, C., Petitjean, F., Webb, G.I.: TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Min. Knowl. Disc. 34(3), 742–775 (2020)
Wang, X., et al.: RPM: representative pattern mining for efficient time series classification. In: EDBT, pp. 185–196 (2016)
Yeh, C.C.M., et al.: Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1317–1322. IEEE (2016)
Yin, M.S., Tangsripairoj, S., Pupacdi, B.: Variable length motif-based time series classification. In: Boonkrong, S., Unger, H., Meesad, P. (eds.) Recent Advances in Information and Communication Technology. AISC, vol. 265, pp. 73–82. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06538-0_8
Yingchareonthawornchai, S., Sivaraks, H., Rakthanmanon, T., Ratanamahatana, C.A.: Efficient proper length time series motif discovery. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1265–1270. IEEE (2013)
Zhu, Y., Yeh, C.C.M., Zimmerman, Z., Kamgar, K., Keogh, E.: Matrix profile XI: SCRIMP++: time series motif discovery at interactive speeds. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 837–846. IEEE (2018)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
van Leeuwen, F., Bosma, B., van den Born, A., Postma, E. (2023). Normalization in Motif Discovery. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_24
DOI: https://doi.org/10.1007/978-3-031-25891-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25890-9
Online ISBN: 978-3-031-25891-6
eBook Packages: Computer Science (R0)