# Classification of time series by shapelet transformation

- 3.8k Downloads
- 68 Citations

## Abstract

Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A *shapelet* is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible, and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree, and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best \(k\) shapelets, which are used to produce a transformed dataset, where each of the \(k\) features represent the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce equivalent results to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselves.

## Keywords

Support Vector Machine Classification Accuracy Bayesian Network Information Gain Near Neighbour## References

- Bagnall A, Hills J, Lines J (2012) Shapelet based time series classification. http://www.uea.ac.uk/computing/machine-learning/Shapelets. Accessed 14 May 2013
- Bagnall A, Davis L, Hills J, Lines J (2012) Transformation based ensembles for time series classification. Proceedings of the twelfth SIAM conference on data mining (SDM)Google Scholar
- Batista G, Wang X, Keogh E (2011) A complexity-invariant distance measure for time series. Proceedings of the eleventh SIAM conference on data mining (SDM)Google Scholar
- Bober M (2001) Mpeg-7 visual shape descriptors. IEEE Trans Circ Syst Video Technol 11(6):716–719CrossRefGoogle Scholar
- Breiman L (2001) Random forests. Mach Learn 45(1):5–32CrossRefzbMATHGoogle Scholar
- Buza K (2011) Fusion methods for time-series classification. Ph.D. thesis, University of Hildesheim, GermanyGoogle Scholar
- Campana S, Casselman J (1993) Stock discrimination using otolith shape analysis. Can J Fish Aquat Sci 50(5):1062–1083CrossRefGoogle Scholar
- Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297zbMATHGoogle Scholar
- Davis LM, Theobald B-J, Lines J, Toms A, Bagnall A (2012) On the segmentation and classification of hand radiographs. Int J Neural Syst 22(5):1250020Google Scholar
- Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30zbMATHMathSciNetGoogle Scholar
- Deng H, Runger G, Tuv E, Vladimir M (2011) A time series forest for classification and feature extraction. Tech. rep., Arizona State UniversityGoogle Scholar
- De Vries D, Grimes C, Prager M (2002) Using otolith shape analysis to distinguish eastern gulf of mexico and atlantic ocean stocks of king mackerel. Fish Res 57(1):51–62CrossRefGoogle Scholar
- Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552Google Scholar
- Duarte-Neto P, Lessa R, Stosic B, Morize E (2008) The use of sagittal otoliths in discriminating stocks of common dolphinfish (coryphaena hippurus) off northeastern brazil using multishape descriptors. ICES J Mar Sci J du Conseil 65(7):1144–1152CrossRefGoogle Scholar
- Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29(2–3):131–163CrossRefzbMATHGoogle Scholar
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18CrossRefGoogle Scholar
- Hartmann B, Link N (2010) Gesture recognition with inertial sensors and optimized dtw prototypes. Systems man and cybernetics (SMC), 2010 IEEE international conference on. IEEE pp 2102–2109Google Scholar
- He Q, Dong Z, Zhuang F, Shi Z (2012) Fast Time Series Classification based on infrequent shapelets. Machine learning and applications (ICMLA), 2012 11th international conference on. IEEE, pp 215–219Google Scholar
- Hoare C (1962) Quicksort. Comput J 5(1):10–16CrossRefzbMATHMathSciNetGoogle Scholar
- Image Processing and Informatics Lab, University of Southern California, The Digital Hand Atlas Database System. http://www.ipilab.org/BAAweb
- Janacek G, Bagnall A, Powell M (2005) A likelihood ratio distance measure for the similarity between the fourier transform of time series. Proceedings of the Ninth Pacific-Asia Conference on knowledge discovery and data mining (PAKDD)Google Scholar
- Jeong Y, Jeong M, Omitaomu O (2010) Weighted dynamic time warping for time series classification. Pattern Recognit 44:2231–2240CrossRefGoogle Scholar
- Hu B, Chen Y, Keogh E (2013) Time series classification under more realistic assumptions. Proceedings of the thirteenth SIAM conference on data mining (SDM)Google Scholar
- Kruskal W (1952) A nonparametric test for the several sample problem. Ann Math Stat 23(4):525–540CrossRefzbMATHMathSciNetGoogle Scholar
- Latecki L, Lakamper R, Eckhardt T (2000) Shape descriptors for non-rigid shapes with a single closed contour. Computer vision and pattern recognition, 2000. Proceedings. IEEE conference on vol. 1. IEEE, pp 424–429Google Scholar
- Lines J, Bagnall A (2012) Alternative quality measures for time series shapelets. Intelligent data engineering and automated learning (IDEAL). Lect Notes Comput Sci 7435:475–483CrossRefGoogle Scholar
- Lines J, Davis L, Hills J, Bagnall A (2012) A shapelet transform for time series classification. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 289–297Google Scholar
- Mood AM, Graybill FA, Boes DC (1974) Introduction to the theory of statistics, 3rd edn. McGraw-Hill, New YorkGoogle Scholar
- Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1154–1162Google Scholar
- Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 262–270Google Scholar
- Rakthanmanon T, Keogh E (2013) Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets. Proceedings of the thirteenth SIAM conference on data mining (SDM)Google Scholar
- Rodriguez J, Alonso C (2005) Support vector machines of interval-based features for time series classification. Knowl-Based Syst 18:171–178Google Scholar
- Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. Pattern Anal Mach Intell IEEE Trans 28(10):1619–1630CrossRefGoogle Scholar
- Shannon C, Weaver W, Blahut R, Hajek B (1949) The mathematical theory of communication, vol 117. University of Illinois press, UrbanazbMATHGoogle Scholar
- Sivakumar P et al. (2012) Human gait recognition and classification using time series shapelets. International conference on advances in computing and communications (ICACC)Google Scholar
- Stransky C (2005) Geographic variation of golden redfish (sebastes marinus) and deep-sea redfish (s. mentella) in the north atlantic based on otolith shape analysis. ICES J Mar Sci J du Conseil 62(8):1691–1698CrossRefGoogle Scholar
- Wu Y, Agrawal D, Abbadi AE (2000) A comparison of dft and dwt based similarity search in time-series databases. Proceedings of the ninth international conference on information and knowledge management (ACM CIKM)Google Scholar
- Xing Z, Pei J, Yu P (2012) Early classification on time series. Knowl Inform syst 31(1):105–127CrossRefGoogle Scholar
- Xing Z, Pei J, Yu P, Wang K (2011) Extracting interpretable features for early classification on time series. Proceedings of the eleventh SIAM conference on data mining (SDM)Google Scholar
- Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 947–956Google Scholar
- Ye L, Keogh E (2011) Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Discov 22(1):149–182CrossRefzbMATHMathSciNetGoogle Scholar
- Zakaria J, Mueen A, Keogh E (2012) Clustering time series using unsupervised-shapelets. Data mining (ICDM), 2012 IEEE 12th international conference. pp. 785–794Google Scholar