Scaling up Dynamic Time Warping to Massive Datasets

  • Eamonn J. Keogh
  • Michael J. Pazzani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1704)

Abstract

There has been much recent interest in adapting data mining algorithms to time series databases. Many of these algorithms need to compare time series, and typically some variation or extension of Euclidean distance is used. However, as we demonstrate in this paper, Euclidean distance can be an extremely brittle distance measure. Dynamic time warping (DTW) has been suggested as a technique that allows more robust distance calculations; however, it is computationally expensive. In this paper we introduce a modification of DTW that operates on a higher-level abstraction of the data, in particular a piecewise linear representation. We demonstrate that our approach allows us to outperform DTW in speed by one to three orders of magnitude. We experimentally evaluate our approach on medical, astronomical, and sign language data.
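
The contrast the abstract draws between Euclidean distance and DTW, and the idea of warping a reduced representation instead of the raw series, can be illustrated with a small sketch. It assumes squared pointwise differences as the local cost, no warping-window constraint, and a deliberately naive piecewise linear approximation that simply joins every `seg_len`-th point. The names `euclidean`, `dtw`, and `piecewise_linear_breakpoints` are hypothetical, and this is not the paper's modified DTW; it only shows why shrinking the series before warping reduces the quadratic cost.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Classic O(len(a) * len(b)) DTW distance, with no warping window."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: cheapest cumulative squared cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # step in a only (b[j-1] is reused)
                cost[i][j - 1],      # step in b only (a[i-1] is reused)
                cost[i - 1][j - 1],  # step in both
            )
    return math.sqrt(cost[n][m])


def piecewise_linear_breakpoints(series, seg_len):
    """Naive piecewise linear approximation (hypothetical, for illustration).

    Keeps every seg_len-th value plus the final value; joining consecutive
    kept values with straight lines gives the approximation.
    """
    if seg_len < 1:
        raise ValueError("seg_len must be at least 1")
    if not series:
        return []
    idx = list(range(0, len(series), seg_len))
    if idx[-1] != len(series) - 1:
        idx.append(len(series) - 1)
    return [series[i] for i in idx]


if __name__ == "__main__":
    # Two identical Gaussian bumps, the second shifted 5 steps later in time.
    q = [math.exp(-(((t - 40) / 6) ** 2)) for t in range(100)]
    c = [math.exp(-(((t - 45) / 6) ** 2)) for t in range(100)]
    print("Euclidean:", euclidean(q, c))
    print("DTW:", dtw(q, c))  # never larger than Euclidean for equal lengths
    pq = piecewise_linear_breakpoints(q, 5)
    pc = piecewise_linear_breakpoints(c, 5)
    # 21 x 21 = 441 DP cells instead of 100 x 100 = 10,000
    print("DTW on breakpoints:", dtw(pq, pc))
```

On these two bumps the point-by-point Euclidean comparison penalizes the small time shift, while DTW can align the peaks. Because the diagonal alignment is one of the paths DTW considers, its distance can never exceed the Euclidean distance for equal-length series. Running DTW on the reduced breakpoint sequence shrinks the dynamic-programming table roughly by the square of the reduction factor; how much accuracy survives depends on how well the segments fit the data, which a real segmentation method has to address.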

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Eamonn J. Keogh (1)
  • Michael J. Pazzani (1)
  1. Department of Information and Computer Science, University of California, Irvine, USA