# A shape-based similarity measure for time series data with ensemble learning

## Abstract

This paper introduces a shape-based similarity measure for time series data, called the angular metric for shape similarity (AMSS). Unlike most similarity or dissimilarity measures, AMSS is based not on the individual data points of a time series but on vectors that equivalently represent it. AMSS treats a time series as a vector sequence to focus on the shape of the data and compares data shapes by employing a variant of cosine similarity. By design, AMSS is expected to be robust to time and amplitude shifting and scaling, but sensitive to short-term oscillations. To deal with this potential drawback, ensemble learning is adopted, which integrates data smoothing when AMSS is used for classification. Evaluative experiments reveal distinct properties of AMSS and its effectiveness when applied in the ensemble framework, as compared to existing measures.
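The idea of comparing vector sequences rather than data points can be illustrated with a minimal sketch. This is not the AMSS algorithm itself, whose exact definition is not given in the abstract. It assumes a simplified variant in which each series is turned into the vectors between consecutive points and aligned vectors are compared with plain cosine similarity, without any time warping. The function names and the `dt` parameter are hypothetical and chosen only for illustration.

```python
import math


def to_vectors(series, dt=1.0):
    """Represent a series as vectors between consecutive points.

    Each vector is (dt, y[i+1] - y[i]); dt is an assumed uniform time step.
    """
    return [(dt, b - a) for a, b in zip(series, series[1:])]


def cosine(u, v):
    """Plain cosine similarity of two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


def mean_cosine_similarity(x, y):
    """Average cosine similarity of index-aligned vectors.

    Simplifying assumption: both series have the same length and no
    warping along the time axis is performed.
    """
    vx, vy = to_vectors(x), to_vectors(y)
    if len(vx) != len(vy) or not vx:
        raise ValueError("series must have equal length of at least 2")
    return sum(cosine(u, v) for u, v in zip(vx, vy)) / len(vx)


x = [0.0, 1.0, 3.0, 2.0]
shifted = [value + 10.0 for value in x]   # same shape, amplitude shifted
mirrored = [-value for value in x]        # shape flipped vertically

print(mean_cosine_similarity(x, shifted))
print(mean_cosine_similarity(x, mirrored))
```

Because the vectors depend only on differences between consecutive points, the amplitude-shifted copy scores essentially 1, while the mirrored series scores lower. Robustness to scaling and to time shifting, which the abstract attributes to AMSS, is not reproduced by this simplified version.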

## Keywords

Time series analysis, Similarity measures, Machine learning

## Notes

### Acknowledgments

The authors would like to thank Takashi Okamura for his help with implementations and experiments.
