Knowledge and Information Systems, Volume 60, Issue 2, pp 1105–1134

Similarity measures for time series data classification using grid representation and matrix distance

  • Yanqing Ye
  • Jiang Jiang
  • Bingfeng Ge
  • Yajie Dou
  • Kewei Yang
Regular Paper

Abstract

Two similarity measures are proposed that capture both the numerical and point distribution characteristics of time series. First, a novel grid representation is presented, with which a time series is segmented and compiled into a matrix. Based on this grid representation, two matrix matching algorithms, matrix-based Euclidean distance (GMED) and matrix-based dynamic time warping (GMDTW), are adapted to measure the similarity of matrix-like time series. To assess the effectiveness of the proposed measures, 1NN classification and K-means experiments are conducted on 22 datasets from the UCR time series datasets Web site. The results indicate that the GMDTW measure is generally more accurate than most current measures, while GMED achieves much higher efficiency than the dynamic time warping algorithm with equivalent performance. Furthermore, the effects of the parameters in the proposed measures are analyzed, and a way to determine their values is given.
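
The pipeline outlined in the abstract (grid representation, matrix distance, 1NN classification) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the grid or the distances in detail, so the sketch assumes that each grid cell holds a count of the points falling into it, that the matrix Euclidean distance is the Frobenius norm of the difference of two grids, and that the matrix DTW warps over grid columns. The function names `grid_matrix`, `matrix_euclidean`, `column_dtw`, and `classify_1nn`, and the `rows` and `cols` parameters, are hypothetical.

```python
import math

def grid_matrix(series, rows, cols):
    """Count how many points of `series` fall in each cell of a rows x cols grid.

    Assumption: time is split into `cols` equal-width segments and the value
    range [min, max] of the series into `rows` equal-height bands.
    """
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    n = len(series)
    grid = [[0] * cols for _ in range(rows)]
    for t, value in enumerate(series):
        c = min(t * cols // n, cols - 1)
        r = min(int((value - lo) / span * rows), rows - 1)
        grid[r][c] += 1
    return grid

def matrix_euclidean(a, b):
    """Euclidean (Frobenius) distance between two equally shaped grids."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def column_dtw(a, b):
    """DTW over grid columns; the local cost is the Euclidean distance
    between two column vectors (an assumption, see the lead-in)."""
    cols_a, cols_b = list(zip(*a)), list(zip(*b))
    n, m = len(cols_a), len(cols_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(cols_a[i - 1], cols_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify_1nn(query, train, distance, rows=4, cols=4):
    """Return the label of the training series whose grid is closest to the query's."""
    q = grid_matrix(query, rows, cols)
    nearest = min(train, key=lambda item: distance(q, grid_matrix(item[0], rows, cols)))
    return nearest[1]

# Toy usage with two labelled series (made-up data, not from the UCR archive).
train = [([0, 1, 2, 3, 4, 5, 6, 7], "up"), ([7, 6, 5, 4, 3, 2, 1, 0], "down")]
label = classify_1nn([1, 2, 3, 5, 6, 8, 9, 10], train, matrix_euclidean, rows=2, cols=2)
```

In this sketch `rows` and `cols` play the role of the grid parameters whose effects the abstract says are analyzed; the full paper should be consulted for the actual definitions and for how the values are chosen.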

Keywords

Time series, Similarity measure, Grid representation, Matrix distance, 1NN classification

Notes

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 71571182, 71571185, and 71671186, and the Research Project of National University of Defense Technology. The authors would like to thank the UCR time series archive for providing online datasets and the results of some of the measures. Many thanks to the reviewers for their sound advice, which was very helpful in improving our paper.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. College of Systems Engineering, National University of Defense Technology, Changsha, China