Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency

  • David Novak
  • Petr Volny
  • Pavel Zezula
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7447)

Abstract

Subsequence matching has emerged as an ideal approach to many problems in data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be represented by some kind of time series or string of symbols, which can serve as input for various subsequence matching approaches. The variety of data types, specific tasks and their solutions is so wide that properly comparing them and choosing a combination suitable for a particular task can be very complicated and time-consuming. In this work, we present a new generic Subsequence Matching Framework (SMF) that tries to overcome this problem with a uniform frame that simplifies and speeds up the design, development and evaluation of subsequence matching systems. We identify several relatively separate subtasks that are solved differently across the literature, and SMF allows them to be combined in a straightforward manner, achieving new quality and efficiency. The strictly modular architecture and openness of SMF also enable the incorporation of efficient solutions from different fields, for instance advanced metric-based indexes.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • David Novak (1)
  • Petr Volny (1)
  • Pavel Zezula (1)
  1. Masaryk University, Brno, Czech Republic
