Data Mining and Knowledge Discovery, Volume 30, Issue 5, pp 1273–1298

Scalable time series classification

Abstract

Time series classification tries to mimic the human understanding of similarity. For long time series or large datasets, state-of-the-art classifiers reach their limits because of unreasonably high training or testing times. One representative example is the 1-nearest-neighbor dynamic time warping classifier (1-NN DTW), which is commonly used as the benchmark to compare against. It has several shortcomings: its time complexity is quadratic in the time series length, and its accuracy degenerates in the presence of noise. To reduce the computational complexity, early abandoning techniques, cascading lower bounds and, more recently, a nearest centroid classifier have been introduced. Still, classification times on datasets of a few thousand time series are on the order of hours. We present our Bag-Of-SFA-Symbols in Vector Space classifier, which is accurate, fast and robust to noise. We show that it is significantly more accurate than 1-NN DTW while being multiple orders of magnitude faster. Its low computational complexity combined with its good classification accuracy makes it relevant for use cases such as long time series, large numbers of time series, or real-time analytics.
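
To illustrate where the quadratic cost of the 1-NN DTW benchmark comes from, the following is a minimal Python sketch. It is not the code evaluated in the article: the function names `dtw_distance` and `classify_1nn_dtw` are hypothetical, the point cost is assumed to be the squared difference, and none of the speed-ups named above (early abandoning, lower bounds, warping windows) are applied.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance between two numeric sequences.

    Assumption: squared difference as point cost, square root of the
    accumulated cost as the final distance.
    """
    n, m = len(a), len(b)
    # prev[j] is the cheapest cost of aligning the first i-1 points of a
    # with the first j points of b; only two rows are kept in memory.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        # The nested loop fills n * m cells: this is the quadratic cost.
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def classify_1nn_dtw(train, query):
    """Return the label of the training series closest to `query`.

    `train` is a list of (series, label) pairs. Every query is compared
    against every training series, so one prediction costs
    O(len(train) * n * m) time.
    """
    best_label, best_dist = None, math.inf
    for series, label in train:
        dist = dtw_distance(series, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    train = [([0, 0, 1, 0], "a"), ([5, 5, 6, 5], "b")]
    print(classify_1nn_dtw(train, [0, 1, 0, 0]))
```

Because each prediction scans the whole training set and fills a full cost matrix per comparison, the running time grows with both the number and the length of the series, which is the scalability problem the abstract describes.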

Keywords

Time series, Classification, Data mining, Symbolic representation

Notes

Acknowledgments

The author would like to thank Claudia Eichert-Schäfer, Florian Schintke, the anonymous reviewers and the owners of the datasets.

Compliance with Ethical Standards

Funding

This project was motivated and partially funded by the German Federal Ministry of Education and Research through the project “Berlin Big Data Center (BBDC)”, Funding mark: 01IS14013A.

Conflict of Interest

The author P. Schäfer received research grants from this project.

Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. Zuse Institute Berlin, Berlin, Germany