Quality of Similarity Rankings in Time Series

  • Thomas Bernecker
  • Michael E. Houle
  • Hans-Peter Kriegel
  • Peer Kröger
  • Matthias Renz
  • Erich Schubert
  • Arthur Zimek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6849)

Abstract

Time series data objects can be interpreted as high- dimensional vectors, which allows the application of many traditional distance measures as well as more specialized measures. However, many distance functions are known to suffer from poor contrast in high-dimensional settings, putting their usefulness as similarity measures into question. On the other hand, shared-nearest-neighbor distances based on the ranking of data objects induced by some primary distance measure have been known to lead to improved performance in high-dimensional settings. In this paper, we study the performance of shared-neighbor similarity measures in the context of similarity search for time series data objects. Our findings are that the use of shared-neighbor similarity measures generally results in more stable performances than that of their associated primary distance measures.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Berndt, D., Clifford, J.: Using dynamic time warping to find patterns in time series. In: KDD Workshop (1994)Google Scholar
  2. 2.
    Vlachos, M., Kollios, G., Gunopulos, D.: Discovering similar multidimensional trajectories. In: Proc. ICDE (2002)Google Scholar
  3. 3.
    Chen, L., Özsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: Proc. SIGMOD (2005)Google Scholar
  4. 4.
    Chen, L., Ng, R.: On the marriage of Lp-norms and edit distance. In: Proc. VLDB (2004)Google Scholar
  5. 5.
    Agrawal, R., Faloutsos, C., Swami, A.: Efficient similarity search in sequence databases. In: Lomet, D.B. (ed.) FODO 1993. LNCS, vol. 730. Springer, Heidelberg (1993)Google Scholar
  6. 6.
    Yi, B.K., Jagadish, H., Faloutsos, C.: Fast time sequence indexing for arbitrary Lp norms. In: Proc. VLDB (2000)Google Scholar
  7. 7.
    Ahmed, N., Natarajan, T., Rao, K.R.: Discrete cosine transform. IEEE TC 23(1), 90–93 (1974)MATHGoogle Scholar
  8. 8.
    Chan, K.P., Fu, A.W.C.: Efficient time series matching by wavelets. In: Proc. ICDE, pp. 126–133 (1999)Google Scholar
  9. 9.
    Cai, Y., Ng, R.: Index spatio-temporal trajectories with Chebyshev polynomials. In: Proc. SIGMOD (2004)Google Scholar
  10. 10.
    Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. IEEE TKDD 3(1), 1–58 (2009)CrossRefGoogle Scholar
  11. 11.
    Jarvis, R.A., Patrick, E.A.: Clustering using a similarity measure based on shared near neighbors. IEEE TC C-22(11), 1025–1034 (1973)Google Scholar
  12. 12.
    Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proc. SDM (2003)Google Scholar
  13. 13.
    Houle, M.E.: The relevant-set correlation model for data clustering. Stat. Anal. Data Min. 1(3), 157–176 (2008)CrossRefGoogle Scholar
  14. 14.
    Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: Outlier detection in axis-parallel subspaces of high dimensional data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 831–838. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  15. 15.
    Houle, M.E., Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: Can shared-neighbor distances defeat the curse of dimensionality? In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 482–500. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  16. 16.
    Houle, M.E.: Navigating massive data sets via local clustering. In: Proc. KDD (2003)Google Scholar
  17. 17.
    Achtert, E., Bernecker, T., Kriegel, H.P., Schubert, E., Zimek, A.: ELKI in time: ELKI 0.2 for the performance evaluation of distance measures for time series. In: Mamoulis, N., Seidl, T., Pedersen, T.B., Torp, K., Assent, I. (eds.) SSTD 2009. LNCS, vol. 5644, pp. 436–440. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  18. 18.
    Keogh, E., Xi, X., Wei, L., Ratanamahatana, C.A.: The UCR time series classification/clustering homepage (2006), http://www.cs.ucr.edu/~eamonn/time_series_data/
  19. 19.
    Saito, N.: Local feature extraction and its application using a library of bases. PhD thesis, Yale University (1994)Google Scholar
  20. 20.
    Pham, D.T., Chan, A.B.: Control chart pattern recognition using a new type of self-organizing neural network. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 212(2), 115–127 (1998)Google Scholar
  21. 21.
    Gandhi, A.: Content-based image retrieval: Plant species identification. Master’s thesis, Oregon State University (2002)Google Scholar
  22. 22.
    Davis, S., Jacobson, A., Suszcynsky, D., Cai, M., Eads, D.: FORTE forte time series dataset (2005), http://nis-www.lanl.gov/~eads/datasets/forte
  23. 23.
    Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998)CrossRefGoogle Scholar
  24. 24.
    Hinneburg, A., Aggarwal, C.C., Keim, D.A.: What is the nearest neighbor in high dimensional spaces? In: Proc. VLDB (2000)Google Scholar
  25. 25.
    Aggarwal, C.C., Hinneburg, A., Keim, D.: On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, p. 420. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  26. 26.
    Chakrabarti, K., Keogh, E., Mehrotra, S., Pazzani, M.: Locally adaptive dimensionality reduction for indexing large time series databases. ACM TODS 27(2), 188–228 (2002)CrossRefGoogle Scholar
  27. 27.
    Verleysen, M., François, D.: The curse of dimensionality in data mining and time series prediction. In: Cabestany, J., Prieto, A.G., Sandoval, F. (eds.) IWANN 2005. LNCS, vol. 3512, pp. 758–770. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  28. 28.
    Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Time-series classification in many intrinsic dimensions. In: Proc. SDM (2010)Google Scholar
  29. 29.
    Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Hubs in space: Popular nearest neighbors in high-dimensional data. J. Mach. Learn. Res. 11, 2487–2531 (2010)MATHGoogle Scholar
  30. 30.
    Bennett, K.P., Fayyad, U., Geiger, D.: Density-based indexing for approximate nearest-neighbor queries. In: Proc. KDD (1999)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Thomas Bernecker
    • 1
  • Michael E. Houle
    • 2
  • Hans-Peter Kriegel
    • 1
  • Peer Kröger
    • 1
  • Matthias Renz
    • 1
  • Erich Schubert
    • 1
  • Arthur Zimek
    • 1
  1. 1.Ludwig-Maximilians-Universität MünchenMünchenGermany
  2. 2.National Institute of InformaticsTokyoJapan

Personalised recommendations