
Data Mining and Knowledge Discovery, Volume 26, Issue 2, pp 275–309

Experimental comparison of representation methods and distance measures for time series data

  • Xiaoyue Wang
  • Abdullah Mueen
  • Hui Ding
  • Goce Trajcevski
  • Peter Scheuermann
  • Eamonn Keogh

Abstract

The previous decade has brought a remarkable increase in interest in applications that deal with querying and mining time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each work introducing a particular method has made specific claims and, aside from occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed method over a few previously introduced ones. To provide a comprehensive validation, we conducted an extensive experimental study, re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this article, we give an overview of these techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.
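To make the two ingredients named above concrete, here is a minimal Python sketch, not the authors' implementation. It pairs one dimensionality-reducing representation (Piecewise Aggregate Approximation, picked here purely as an illustration) with one lock-step measure (Euclidean distance) and one elastic measure (dynamic time warping, with an optional Sakoe-Chiba band). Effectiveness is scored as the leave-one-out error rate of a 1-nearest-neighbour classifier. That protocol, every function name (`paa`, `euclidean`, `dtw`, `one_nn_error_rate`), and the toy series are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Optional, Sequence

Series = Sequence[float]


def paa(x: Series, segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: replace each of `segments`
    equal-width frames by its mean (a simple dimensionality reduction).
    Assumption of this sketch: len(x) is a multiple of `segments`."""
    if segments <= 0 or len(x) % segments != 0:
        raise ValueError("sketch assumes len(x) is a positive multiple of segments")
    w = len(x) // segments
    return [sum(x[k * w:(k + 1) * w]) / w for k in range(segments)]


def euclidean(a: Series, b: Series) -> float:
    """Lock-step measure: the i-th point of a is compared only with the i-th point of b."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw(a: Series, b: Series, window: Optional[int] = None) -> float:
    """Elastic measure: dynamic time warping over squared point differences,
    optionally restricted to a Sakoe-Chiba band of half-width `window`."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def one_nn_error_rate(data: List[Series], labels: List[str],
                      dist: Callable[[Series, Series], float]) -> float:
    """Leave-one-out 1-nearest-neighbour error rate under the measure `dist`."""
    errors = 0
    for i, query in enumerate(data):
        best_d, best_label = float("inf"), None
        for j, cand in enumerate(data):
            if i != j:
                d = dist(query, cand)
                if d < best_d:
                    best_d, best_label = d, labels[j]
        errors += int(best_label != labels[i])
    return errors / len(data)


if __name__ == "__main__":
    print(paa([0, 1, 2, 3, 4, 5, 6, 7], 4))  # [0.5, 2.5, 4.5, 6.5]
    # Hand-made toy series (not data from the study): two shifted bumps, two near-flat lines.
    data = [[0, 1, 2, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 2, 1, 0],
            [0] * 8, [0, 0, 0, 0, 0, 0, 0, 0.5]]
    labels = ["bump", "bump", "flat", "flat"]
    print(one_nn_error_rate(data, labels, euclidean))  # 0.5
    print(one_nn_error_rate(data, labels, dtw))        # 0.0
```

On these toy series the two shifted bumps lie closer to a flat line than to each other under Euclidean distance, whereas DTW aligns them exactly. A contrived example like this says nothing about the relative accuracies reported in the study itself.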

Keywords

Time series; Representation; Distance measure; Experimental comparison

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Xiaoyue Wang (1)
  • Abdullah Mueen (1)
  • Hui Ding (2)
  • Goce Trajcevski (2)
  • Peter Scheuermann (2)
  • Eamonn Keogh (1)

  1. University of California, Riverside, Riverside, USA
  2. Northwestern University, Evanston, USA
