Similarity search is one of the most important and probably best-studied methods in data mining. In the context of time series analysis, it reaches its limits when mining raw datasets. Raw time series data may be recorded at variable lengths, be noisy, or be composed of repetitive substructures. These characteristics form a foundation for state-of-the-art search algorithms. Noise, however, has received surprisingly little attention and is assumed to be filtered out in a preprocessing step carried out by a human. Our Bag-of-SFA-Symbols (BOSS) model combines the extraction of substructures with tolerance to extraneous and erroneous data by using a noise-reducing representation of the time series. We show that our BOSS ensemble classifier improves on the best published classification accuracies in diverse application areas and on the official UCR classification benchmark datasets by a large margin.
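The combination of substructure extraction and a noise-reducing representation can be illustrated with a minimal Python sketch. It is not the BOSS implementation: the function names, the window length, the number of retained Fourier coefficients, and the equal-width binning are all assumptions chosen for illustration. The sketch slides a window over the series, keeps only a few low-frequency Fourier coefficients per window (which discards high-frequency noise), maps each coefficient to a letter, and counts the resulting words.

```python
import cmath
import math
import random
from collections import Counter


def low_pass_features(window, n_coeffs):
    """Real and imaginary parts of the first n_coeffs non-constant DFT terms."""
    n = len(window)
    mean = sum(window) / n
    # Guard against constant windows, whose standard deviation is zero.
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n) or 1.0
    normed = [(x - mean) / std for x in window]
    feats = []
    for k in range(1, n_coeffs + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(normed))
        feats.extend([coeff.real, coeff.imag])
    return feats


def bag_of_fourier_words(series, window=16, n_coeffs=2, n_bins=4):
    """Histogram of symbolic words over all sliding windows (illustrative)."""
    # Substructure extraction: one feature vector per sliding window.
    feats = [low_pass_features(series[i:i + window], n_coeffs)
             for i in range(len(series) - window + 1)]
    # Assumption: equal-width bins per feature position, a simple stand-in
    # for whatever discretisation the paper actually uses.
    columns = list(zip(*feats))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    words = []
    for row in feats:
        symbols = [min(int((v - lo) / span * n_bins), n_bins - 1)
                   for v, lo, span in zip(row, lows, spans)]
        words.append("".join(chr(ord("a") + s) for s in symbols))
    return Counter(words)


# Usage: a noisy sine wave reduced to a histogram of short words.
random.seed(0)
noisy = [math.sin(0.2 * i) + random.gauss(0, 0.1) for i in range(128)]
histogram = bag_of_fourier_words(noisy)
```

Two series could then be compared through their word histograms rather than point by point, which is one way a representation of this kind can tolerate noise and differing lengths.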
The author would like to thank the anonymous reviewers, Claudia Eichert-Schäfer, Florian Schintke, Florian Wende, and Ulf Leser for their valuable comments on the paper and the owners of the datasets.
Responsible editors: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.
Schäfer, P. The BOSS is concerned with time series classification in the presence of noise. Data Min Knowl Disc 29, 1505–1530 (2015). https://doi.org/10.1007/s10618-014-0377-7
- Time series
- Fourier transform