Data Mining and Knowledge Discovery, Volume 30, Issue 2, pp 283–312

Using dynamic time warping distances as features for improved time series classification

  • Rohit J. Kate


Abstract

Dynamic time warping (DTW) has proven itself to be an exceptionally strong distance measure for time series. DTW in combination with one-nearest neighbor, one of the simplest machine learning methods, has been difficult to convincingly outperform on the time series classification task. In this paper, we present a simple technique for time series classification that exploits DTW’s strength on this task. But instead of directly using DTW as a distance measure to find nearest neighbors, the technique uses DTW to create new features which are then given to a standard machine learning method. We experimentally show that our technique improves over one-nearest neighbor DTW on 31 out of 47 UCR time series benchmark datasets. In addition, this method can be easily extended to be used in combination with other methods. In particular, we show that when combined with the symbolic aggregate approximation (SAX) method, it improves over it on 37 out of 47 UCR datasets. Thus the proposed method also provides a mechanism to combine distance-based methods like DTW with feature-based methods like SAX. We also show that combining the proposed classifiers through ensembles further improves the performance on time series classification.
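The core idea above — representing each series by its vector of DTW distances to the training series, and handing that vector to a standard classifier — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the local cost (squared difference), the absence of a warping-window constraint, and the helper names `dtw_distance` and `dtw_features` are all choices made here for brevity.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D series.

    Uses a squared-difference local cost with no warping-window
    constraint; real implementations usually add a Sakoe-Chiba band.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # step in a only
                                 cost[i, j - 1],      # step in b only
                                 cost[i - 1, j - 1])  # step in both
    return np.sqrt(cost[n, m])

def dtw_features(series_list, train_series):
    """Map each series to its vector of DTW distances to every training series."""
    return np.array([[dtw_distance(s, t) for t in train_series]
                     for s in series_list])
```

The resulting feature matrix (one row per series, one column per training series) can then be passed to any off-the-shelf classifier such as an SVM; as the abstract notes, the same mechanism also allows these DTW features to be concatenated with feature-based representations such as SAX.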


Keywords: Time series classification · Dynamic time warping · Symbolic aggregate approximation



We thank editor Eamonn Keogh and the anonymous reviewers for their feedback, which helped improve this paper.



Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. University of Wisconsin-Milwaukee, Milwaukee, USA
