Abstract
Dynamic time warping (DTW) has proven to be an exceptionally strong distance measure for time series. DTW in combination with one-nearest neighbor, one of the simplest machine learning methods, has been difficult to convincingly outperform on the time series classification task. In this paper, we present a simple technique for time series classification that exploits DTW’s strength on this task. Instead of directly using DTW as a distance measure to find nearest neighbors, the technique uses DTW to create new features, which are then given to a standard machine learning method. We experimentally show that our technique improves over one-nearest neighbor DTW on 31 out of 47 UCR time series benchmark datasets. In addition, the method can easily be extended for use in combination with other methods. In particular, we show that when combined with the symbolic aggregate approximation (SAX) method, it improves over SAX on 37 out of 47 UCR datasets. The proposed method thus also provides a mechanism for combining distance-based methods like DTW with feature-based methods like SAX. We also show that combining the proposed classifiers through ensembles further improves performance on time series classification.
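The idea of turning DTW distances into features can be illustrated with a minimal sketch. It assumes, for illustration only, that each series is represented by its unconstrained DTW distances to every training series; the function names `dtw_distance` and `dtw_features`, the squared-difference local cost, and the toy data are hypothetical choices and not necessarily the exact setup used in the paper.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance between two numeric sequences.

    Assumption: squared difference as the local cost, with a square
    root taken over the total cost of the best warping path.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])


def dtw_features(series, train_series):
    """Represent one series by its DTW distances to all training series."""
    return [dtw_distance(series, t) for t in train_series]


# Toy training set (hypothetical data): the first two series are
# time-shifted copies of each other, the third has a different shape.
train = [
    [0, 0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0, 0],
    [2, 2, 1, 0, 1, 2],
]

# One feature vector per training series; a test series would be
# converted the same way, using the same training set as reference.
features = [dtw_features(s, train) for s in train]
```

Each row of `features` is a fixed-length vector that could be passed to any standard learner in place of the raw series. Features from another representation could be appended to the same vector, which is one plausible way to read the combination with feature-based methods described above.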
Notes
Using a two-tailed Wilcoxon signed-rank test.
Using more than one nearest neighbor has not been found to be helpful.
We compare all classifiers together using the Friedman test in Sect. 5.3.
The same \(r\) values were also used in Feature-DTW-DTW-R and in the other feature-based classifiers that used DTW-R.
Acknowledgments
We thank editor Eamonn Keogh and the anonymous reviewers for feedback that helped improve this paper.
Additional information
Responsible editor: Eamonn Keogh.
Cite this article
Kate, R.J. Using dynamic time warping distances as features for improved time series classification. Data Min Knowl Disc 30, 283–312 (2016). https://doi.org/10.1007/s10618-015-0418-x
DOI: https://doi.org/10.1007/s10618-015-0418-x