Towards Time Series Classification without Human Preprocessing

  • Patrick Schäfer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8556)

Abstract

Similarity search is a core functionality in many data mining algorithms. Over the past decade, these algorithms were mostly designed to work with human assistance to extract characteristic, aligned patterns of equal length and scaling. Human assistance is not cost-effective. We propose the shotgun distance, a similarity metric that extracts, scales, and aligns segments from a query to a sample time series. This simplifies the classification of time series as produced by sensors. Our shotgun ensemble classifier classifies a time series based on its segments at varying lengths. It improves on the best published accuracies in case studies from bioacoustics, human motion detection, spectrographs, and personalized medicine. Finally, it performs better than the state of the art on the official UCR classification benchmark.
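
The idea of extracting, scaling, and aligning segments can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`z_normalize`, `shotgun_distance`), the use of disjoint query windows, z-normalization as the scaling step, and the squared Euclidean distance are all assumptions made for illustration only.

```python
import math


def z_normalize(segment):
    """Scale a segment to zero mean and unit variance (assumed scaling step)."""
    n = len(segment)
    mean = sum(segment) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in segment) / n)
    if std < 1e-12:
        # Constant segment: return all zeros to avoid division by zero.
        return [0.0] * n
    return [(x - mean) / std for x in segment]


def shotgun_distance(query, sample, window):
    """Hypothetical sketch of a segment-based distance.

    The query is cut into disjoint windows of length `window`. Each window
    is scaled, slid along the sample to find its best-aligned position, and
    the squared Euclidean distances of the best matches are summed.
    """
    if window < 1 or window > len(query) or window > len(sample):
        raise ValueError("window must fit into both query and sample")

    # Extract and scale every candidate segment of the sample once.
    sample_windows = [
        z_normalize(sample[i:i + window])
        for i in range(len(sample) - window + 1)
    ]

    total = 0.0
    for start in range(0, len(query) - window + 1, window):
        q = z_normalize(query[start:start + window])
        # Align: keep the sample position with the smallest distance.
        total += min(
            sum((a - b) ** 2 for a, b in zip(q, s)) for s in sample_windows
        )
    return total
```

In this sketch, a query segment that reappears in the sample at a different offset and with a different amplitude contributes a distance close to zero, which is the behaviour that manual extraction, scaling, and alignment would otherwise have to provide. Evaluating the distance for several values of `window` would correspond to looking at segments of varying lengths.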

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Patrick Schäfer (1)
  1. Zuse Institute Berlin, Berlin, Germany
