Pattern Analysis and Applications, Volume 16, Issue 4, pp 535–548

A shape-based similarity measure for time series data with ensemble learning

  • Tetsuya Nakamura
  • Keishi Taki
  • Hiroki Nomiya
  • Kazuhiro Seki
  • Kuniaki Uehara
Theoretical Advances

Abstract

This paper introduces a shape-based similarity measure for time series data, called the angular metric for shape similarity (AMSS). Unlike most similarity or dissimilarity measures, AMSS is based not on the individual data points of a time series but on vectors that equivalently represent it. AMSS treats a time series as a vector sequence in order to focus on the shape of the data, and it compares data shapes using a variant of cosine similarity. By design, AMSS is expected to be robust to time and amplitude shifting and scaling, but sensitive to short-term oscillations. To address this potential drawback, ensemble learning is adopted, which integrates data smoothing when AMSS is used for classification. Evaluative experiments reveal distinct properties of AMSS and demonstrate its effectiveness, compared to existing measures, when applied in the ensemble framework.
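
The idea of comparing vector sequences rather than data points can be illustrated with a small Python sketch. This is not the AMSS definition from the paper, which the abstract does not spell out. It is a simplified stand-in that assumes equal-length series, unit time steps, and a plain one-to-one alignment, and the names `to_vectors`, `shape_similarity`, and `moving_average` are hypothetical. It only shows why a cosine-based comparison of displacement vectors ignores amplitude shifts yet reacts to short-term oscillations, and why smoothing can help.

```python
import math


def to_vectors(series, dt=1.0):
    """Turn a series into 2-D displacement vectors (dt, x[i+1] - x[i]).

    Assumption: uniformly sampled data with time step dt.
    """
    return [(dt, b - a) for a, b in zip(series, series[1:])]


def cosine(u, v):
    """Cosine similarity of two 2-D vectors (0.0 if either has zero length)."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def shape_similarity(a, b):
    """Mean cosine similarity of index-aligned displacement vectors.

    Toy measure for illustration only: no warping or time alignment.
    """
    va, vb = to_vectors(a), to_vectors(b)
    if not va or len(va) != len(vb):
        raise ValueError("series must have equal length of at least 2")
    return sum(cosine(u, v) for u, v in zip(va, vb)) / len(va)


def moving_average(series, window=3):
    """Simple smoothing: averages over each full window of the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


if __name__ == "__main__":
    ramp = [0, 1, 2, 3, 4, 5, 6]
    shifted = [x + 5 for x in ramp]  # amplitude shift only
    noisy = [x + (0.8 if i % 2 == 0 else -0.8) for i, x in enumerate(ramp)]

    print("shifted :", shape_similarity(ramp, shifted))
    print("noisy   :", shape_similarity(ramp, noisy))
    print("smoothed:", shape_similarity(moving_average(ramp),
                                        moving_average(noisy)))
```

In this toy setting the shifted copy scores essentially 1, the oscillating copy scores noticeably lower, and smoothing both series first recovers most of the similarity. That mirrors the motivation given in the abstract for pairing the measure with data smoothing.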

Keywords

  • Time series analysis
  • Similarity measures
  • Machine learning

Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  • Tetsuya Nakamura (1)
  • Keishi Taki (1)
  • Hiroki Nomiya (2)
  • Kazuhiro Seki (1)
  • Kuniaki Uehara (1)

  1. Kobe University, Kobe, Japan
  2. Kyoto Institute of Technology, Kyoto, Japan
