Scalable time series classification
Time series classification tries to mimic the human understanding of similarity. For long time series or large time series datasets, state-of-the-art classifiers reach their limits because of unreasonably high training or testing times. One representative example is the 1-nearest-neighbor dynamic time warping classifier (1-NN DTW), which is commonly used as the benchmark for comparison. It has two main shortcomings: its time complexity is quadratic in the time series length, and its accuracy degrades in the presence of noise. To reduce its computational cost, early abandoning techniques, cascading lower bounds and, more recently, a nearest centroid classifier have been introduced. Still, classification times on datasets of a few thousand time series are on the order of hours. We present the Bag-of-SFA-Symbols in Vector Space (BOSS VS) classifier, which is accurate, fast, and robust to noise. We show that it is significantly more accurate than 1-NN DTW while being multiple orders of magnitude faster. Its low computational complexity combined with its good classification accuracy makes it relevant for use cases such as long time series, large numbers of time series, or real-time analytics.
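To make the quadratic cost of 1-NN DTW and the two speed-ups named above more concrete, here is a minimal, stdlib-only Python sketch. It is not the paper's code and not the BOSS VS classifier. It assumes equal-length series, squared point-wise differences, and a Sakoe-Chiba warping window `r`, and it uses LB_Keogh as a single lower bound instead of a full cascade. The function names (`dtw`, `envelope`, `lb_keogh`, `classify_1nn_dtw`) are hypothetical.

```python
import math


def dtw(a, b, window=None, best_so_far=math.inf):
    """DTW distance (sum of squared differences) between series a and b.

    The cost matrix is filled row by row, so the time is O(n * m), i.e.
    quadratic in the series length. `window` is an optional Sakoe-Chiba band
    (an assumption for this sketch). Early abandoning: every warping path
    crosses every row and all costs are non-negative, so if the smallest
    cumulative cost in a row already exceeds `best_so_far`, the final
    distance must too, and we return infinity immediately.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        if min(cur) > best_so_far:
            return math.inf  # abandoned: cannot beat the current best
        prev = cur
    return prev[m]


def envelope(q, r):
    """Upper/lower envelope of q over a window of +-r points (for LB_Keogh)."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(c, upper, lower, best_so_far=math.inf):
    """LB_Keogh lower bound on the windowed DTW between the query and c."""
    total = 0.0
    for x, u, l in zip(c, upper, lower):
        if x > u:
            total += (x - u) ** 2
        elif x < l:
            total += (x - l) ** 2
        if total > best_so_far:
            break  # the bound alone already exceeds the best distance
    return total


def classify_1nn_dtw(query, train, r):
    """1-NN DTW over (series, label) pairs, pruning with LB_Keogh first."""
    upper, lower = envelope(query, r)
    best, label = math.inf, None
    for series, y in train:
        # Since DTW >= LB_Keogh, a bound >= best means no improvement is possible.
        if lb_keogh(series, upper, lower, best) >= best:
            continue
        d = dtw(query, series, r, best)
        if d < best:
            best, label = d, y
    return label, best


if __name__ == "__main__":
    train = [([0, 0, 0, 0, 0], "flat"),
             ([0, 1, 2, 3, 4], "up"),
             ([4, 3, 2, 1, 0], "down")]
    print(classify_1nn_dtw([0, 1, 2, 3, 3.5], train, r=1))  # ('up', 0.25)
```

Even with pruning, every candidate that survives the lower bound still costs a full quadratic DTW computation, and every query is compared against every training series. This is the scaling problem the abstract describes for datasets of thousands of series.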
Keywords: Time series · Classification · Data mining · Symbolic representation
The author would like to thank Claudia Eichert-Schäfer, Florian Schintke, the anonymous reviewers and the owners of the datasets.
Compliance with Ethical Standards
This project was motivated and partially funded by the German Federal Ministry of Education and Research through the project “Berlin Big Data Center (BBDC)”, Funding mark: 01IS14013A.
Conflict of Interest
The author P. Schäfer received research grants from this project.