Time Series Retrieval Using DTW-Preserving Shapelets

  • Ricardo Carlini Sperandio
  • Simon Malinowski
  • Laurent Amsaleg
  • Romain Tavenard
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11223)

Abstract

Dynamic Time Warping (DTW) is a very popular similarity measure used for time series classification, retrieval, and clustering. DTW is, however, a costly measure, and applying it to numerous and/or very long time series is difficult in practice. This paper proposes a new approach for time series retrieval: time series are embedded into another space where the search procedure is less computationally demanding while remaining accurate. The approach transforms time series into high-dimensional vectors using DTW-preserving shapelets, such that the relative distances between vectors in the transformed Euclidean space closely reflect the corresponding DTW measurements in the original space. We also propose strategies for selecting a subset of shapelets in the transformed space, resulting in a trade-off between the complexity of the transformation and the accuracy of the retrieval. Experimental results on the well-known UCR time series archive demonstrate the importance of this trade-off.
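
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes univariate series stored as NumPy arrays, uses randomly chosen subsequences as placeholder shapelets (the paper relies on learned DTW-preserving shapelets, which are not reproduced here), and defines each feature as the minimum Euclidean distance between a shapelet and any same-length subsequence. The function names `dtw`, `shapelet_transform`, and `retrieve` are hypothetical.

```python
import numpy as np

def dtw(a, b):
    """Textbook O(len(a) * len(b)) DTW with squared point-wise cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return np.sqrt(acc[n, m])

def shapelet_transform(series, shapelets):
    """Embed one series: feature k is the smallest Euclidean distance
    between shapelet k and any same-length subsequence of the series."""
    features = np.empty(len(shapelets))
    for k, s in enumerate(shapelets):
        windows = np.lib.stride_tricks.sliding_window_view(series, len(s))
        features[k] = np.sqrt(((windows - s) ** 2).sum(axis=1).min())
    return features

def retrieve(query_vec, database_vecs, top_k=1):
    """Indices of the top_k nearest database vectors (Euclidean distance)."""
    dists = np.linalg.norm(database_vecs - query_vec, axis=1)
    return np.argsort(dists)[:top_k]

rng = np.random.default_rng(0)
database = rng.standard_normal((20, 64)).cumsum(axis=1)  # toy random walks

# ASSUMPTION: placeholder shapelets cut from the data at fixed offsets.
# They are NOT learned DTW-preserving shapelets.
shapelets = [database[i, 10:10 + length] for i, length in [(0, 8), (1, 16), (2, 24)]]

# Offline step: embed the whole database once.
embedded = np.array([shapelet_transform(ts, shapelets) for ts in database])

# Online step: embed the query, then search with cheap Euclidean distances.
query = database[5]
nearest = retrieve(shapelet_transform(query, shapelets), embedded, top_k=3)
```

Under these assumptions, the quality of the embedding could be checked by comparing the ranking in `nearest` with the ranking obtained from `dtw(query, ts)` over the database. Using fewer shapelets makes `shapelet_transform` cheaper but may degrade that agreement, which is the trade-off the abstract refers to.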

Notes

Acknowledgments

This work was performed with the support of CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), Brazil (Process number 233209/2014–0). The authors are grateful to the TRANSFORM project, funded by STIC-AMSUD (18-STIC-09), for partial financial support of this work.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. IRISA-Inria, Rennes, France
  2. IRISA-Univ. Rennes 1, Rennes, France
  3. CNRS-IRISA, Rennes, France
  4. Univ. Rennes 2, Rennes, France
