Data Mining and Knowledge Discovery, Volume 29, Issue 6, pp 1505–1530

The BOSS is concerned with time series classification in the presence of noise

Abstract

Similarity search is one of the most important and probably best-studied methods for data mining. In the context of time series analysis, it reaches its limits when it comes to mining raw datasets. Raw time series data may be recorded at variable lengths, be noisy, or be composed of repetitive substructures. These properties form a foundation for state-of-the-art search algorithms. However, noise has received surprisingly little attention and is assumed to be filtered out in a preprocessing step carried out by a human. Our Bag-of-SFA-Symbols (BOSS) model combines the extraction of substructures with tolerance to extraneous and erroneous data by using a noise-reducing representation of the time series. We show that our BOSS ensemble classifier improves on the best published classification accuracies in diverse application areas and on the official UCR classification benchmark datasets by a large margin.
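
The idea of combining substructure extraction with a noise-reducing representation can be illustrated with a minimal sketch. It is not the BOSS implementation: it assumes fixed, hand-picked bin edges (`BREAKPOINTS`), a five-letter alphabet, and no window normalisation beyond dropping the mean, and the function names are hypothetical. Each sliding window is reduced to its first few Fourier values (a low-pass filter), those values are quantised into letters to form a word, and the series is summarised as a histogram of words, counting runs of identical consecutive words only once.

```python
import cmath
from bisect import bisect_right
from collections import Counter

# Assumption: fixed bin edges chosen for illustration only.
# A data-driven representation would learn its bins from training data.
BREAKPOINTS = (-0.5, -0.1, 0.1, 0.5)
ALPHABET = "abcde"  # one letter per bin, len(BREAKPOINTS) + 1 in total


def fourier_features(window, n_values):
    """Return the first n_values real/imaginary Fourier values of a window.

    The k = 0 coefficient (the window mean) is skipped, so constant
    offsets do not influence the result.
    """
    n = len(window)
    values = []
    k = 1
    while len(values) < n_values:
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(window)) / n
        values.extend((coeff.real, coeff.imag))
        k += 1
    return values[:n_values]


def to_word(values):
    """Quantise each value into a letter according to BREAKPOINTS."""
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in values)


def bag_of_words(series, window, n_values):
    """Histogram of words over all sliding windows of the series.

    A word that merely repeats the previous window's word is not counted
    again, so long stable stretches do not dominate the histogram.
    """
    bag = Counter()
    previous = None
    for start in range(len(series) - window + 1):
        word = to_word(fourier_features(series[start:start + window], n_values))
        if word != previous:
            bag[word] += 1
        previous = word
    return bag


clean = [1.0] * 20
noisy = [1.0 + 0.01 * (-1) ** i for i in range(20)]  # small high-frequency noise
print(bag_of_words(clean, window=8, n_values=4))
print(bag_of_words(noisy, window=8, n_values=4))
```

In this toy setting the clean and the noisy series yield the same histogram, because the noise sits in high frequencies that are discarded and its amplitude is below the bin width. Two series could then be compared by a distance between their histograms.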

Keywords

Time series, Classification, Similarity, Noise, Fourier transform

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Zuse Institute Berlin, Berlin, Germany
