Abstract
Time series anomaly detection has attracted considerable attention because of its usefulness in many application domains. However, most methods proposed so far rely on Euclidean distance. Dynamic Time Warping (DTW) distance is more suitable than Euclidean distance in many practical fields, for example those involving multimedia data, because it supports shape-based similarity matching. In this paper, we propose two efficient anomaly detection methods, EP-Leader-DTW and SEP-Leader-DTW, for static and streaming time series under DTW, respectively. Both methods are based on time series segmentation, subsequence clustering, and anomaly scoring. For segmentation, the major extrema method is used to obtain subsequences. For clustering, we apply the Leader algorithm to the subsequences, together with a lower bounding technique that accelerates DTW distance computation. Experimental results on several benchmark time series datasets show that our method for static time series under DTW is fast and accurate on large datasets. For streaming time series, our method meets the requirement for instantaneous response with high accuracy. As a result, our anomaly detection methods are applicable to both static and streaming time series in practice.
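The clustering step described above can be illustrated with a minimal sketch. It is not the EP-Leader-DTW implementation: it assumes equal-length subsequences, a Sakoe-Chiba band of half-width `window`, LB_Keogh as the lower bound, and a user-chosen distance `threshold`. The function names `dtw`, `lb_keogh`, and `leader_cluster` are hypothetical and chosen for illustration only.

```python
import math


def dtw(a, b, window):
    """DTW distance with a Sakoe-Chiba band (squared pointwise cost, then sqrt)."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must at least cover the length gap
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return math.sqrt(prev[m])


def lb_keogh(q, c, window):
    """LB_Keogh lower bound on dtw(q, c, window); assumes len(q) == len(c)."""
    total = 0.0
    for i, x in enumerate(q):
        lo, hi = max(0, i - window), min(len(c), i + window + 1)
        upper, lower = max(c[lo:hi]), min(c[lo:hi])
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (lower - x) ** 2
    return math.sqrt(total)


def leader_cluster(subsequences, threshold, window):
    """Single-pass Leader clustering: join the first leader within `threshold`."""
    leaders, members = [], []  # leader indices and, per leader, member indices
    for idx, s in enumerate(subsequences):
        for k, lead in enumerate(leaders):
            ref = subsequences[lead]
            # If the cheap lower bound already exceeds the threshold,
            # the true DTW distance does too, so the DTW call is skipped.
            if lb_keogh(s, ref, window) > threshold:
                continue
            if dtw(s, ref, window) <= threshold:
                members[k].append(idx)
                break
        else:
            leaders.append(idx)  # no leader is close enough: start a new cluster
            members.append([idx])
    return leaders, members


# Toy data (illustrative only): two similar shapes and one flat outlier.
subs = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1], [5, 5, 5, 5, 5]]
leaders, members = leader_cluster(subs, threshold=1.5, window=1)
print(leaders, members)
```

In a cluster-based scheme of this kind, subsequences that end up in very small clusters are natural anomaly candidates. The scoring function actually used by the paper is not given in the abstract, so none is assumed here.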
Acknowledgements
This research is funded by Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, under grant number BK-SDH-2020-8141217.
Cite this article
Thuy, H.T.T., Anh, D.T. & Chau, V.T.N. Efficient segmentation-based methods for anomaly detection in static and streaming time series under dynamic time warping. J Intell Inf Syst 56, 121–146 (2021). https://doi.org/10.1007/s10844-020-00609-6