Knowledge and Information Systems, Volume 54, Issue 1, pp 237–263

Speeding up dynamic time warping distance for sparse time series data

  • Abdullah Mueen
  • Nikan Chavoshi
  • Noor Abu-El-Rub
  • Hossein Hamooni
  • Amanda Minnich
  • Jonathan MacCarthy
Regular Paper

Abstract

Dynamic time warping (DTW) distance has been used effectively to mine time series data in a multitude of domains. However, in its original formulation, DTW is extremely inefficient when comparing long sparse time series, which contain mostly zeros and a few unevenly spaced nonzero observations. The original DTW distance does not take advantage of this sparsity, which leads to redundant calculations and a prohibitively large computational cost for long time series. We derive a new time warping similarity measure (AWarp) for sparse time series that works on the run-length encoded representation of sparse time series. The complexity of AWarp is quadratic in the number of observations rather than in the time range spanned by the series. Therefore, AWarp can be several orders of magnitude faster than DTW on sparse time series. AWarp is exact for binary-valued time series and closely approximates the original DTW distance for arbitrary-valued series. We discuss useful variants of AWarp: bounded (both upper and lower), constrained, and multidimensional. We show applications of AWarp to three data mining tasks, namely clustering, classification, and outlier detection, which are otherwise not feasible using classic DTW, while producing equivalent results. Potential areas of application include bot detection, human activity classification, search trend analysis, seismic analysis, and unusual review pattern mining.
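To make the cost argument above concrete, here is a minimal Python sketch; it is not the authors' AWarp algorithm. Classic DTW fills an n × m table no matter how many entries are zero, whereas a run-length encoding of a sparse series is much shorter, so a measure defined on the encoded form works on far fewer cells. The function names `dtw`, `run_length_encode`, and `run_length_decode` are hypothetical, the squared-difference cost with a final square root is one common DTW convention (assumed here), and the paper's own encoding of zero runs may differ from the plain (value, count) pairs used below.

```python
from typing import List, Tuple


def dtw(x: List[float], y: List[float]) -> float:
    """Classic unconstrained DTW: fills an (n+1) x (m+1) table, O(n*m) time.

    Assumed convention: squared pointwise cost, square root of the total.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] ** 0.5


def run_length_encode(x: List[float]) -> List[Tuple[float, int]]:
    """Standard run-length encoding as (value, run length) pairs.

    For a sparse series, long runs of zeros collapse into a single pair.
    """
    runs: List[Tuple[float, int]] = []
    for v in x:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def run_length_decode(runs: List[Tuple[float, int]]) -> List[float]:
    """Expand (value, run length) pairs back into the original series."""
    out: List[float] = []
    for v, k in runs:
        out.extend([v] * k)
    return out


if __name__ == "__main__":
    x = [0, 0, 0, 1, 0, 0, 0, 0, 2, 0]
    y = [0, 0, 1, 0, 0, 0, 0, 0, 0, 2]
    ex, ey = run_length_encode(x), run_length_encode(y)
    print("DTW distance:", dtw(x, y))
    print("DTW table cells:", len(x) * len(y))
    print("Cells over the encoded series:", len(ex) * len(ey))
```

For these two length-10 series, DTW evaluates 100 cells, while the encoded forms have only 5 and 4 runs, giving 20 cell pairs. The gap grows with the length of the zero runs, which is the effect the abstract attributes to working on the run-length encoded representation.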

Keywords

Sparse time series · Dynamic time warping · Run-length encoding

Acknowledgements

This work was supported by the NSF CCF Grant No. 1527127 and the NSF Graduate Research Fellowship under Grant No. DGE-0237002.

Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

  • Abdullah Mueen (1; corresponding author)
  • Nikan Chavoshi (1)
  • Noor Abu-El-Rub (1)
  • Hossein Hamooni (1)
  • Amanda Minnich (1)
  • Jonathan MacCarthy (2)

  1. Department of Computer Science, University of New Mexico, Albuquerque, USA
  2. Los Alamos National Laboratory, Los Alamos, USA

Personalised recommendations