
Scalable Similarity Matching in Streaming Time Series

  • Alice Marascu
  • Suleiman A. Khan
  • Themis Palpanas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7302)

Abstract

Nowadays, online monitoring of data streams is essential in many real-life applications, such as sensor network monitoring, manufacturing process control, and video surveillance. A major problem in this area is the online identification of streaming sequences that are similar to a predefined set of pattern sequences.

In this paper, we present a novel solution that extends the state of the art in terms of both effectiveness and efficiency. We propose the first online similarity matching algorithm based on the Longest Common SubSequence (LCSS) that is specifically designed to operate in a streaming context and that can effectively handle time scaling as well as noisy data. To deal with high stream rates and multiple streams, we extend the algorithm to operate on multilevel approximations of the streaming data, thereby quickly pruning the search space. Finally, we incorporate error estimation mechanisms into our approach in order to reduce the number of false negatives.
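The abstract does not spell out the matching procedure, so the following is only a minimal sketch of the underlying idea, not the authors' algorithm. It uses a common formulation of LCSS for numeric time series: two points match when their values differ by at most `eps` (tolerance to noise) and their positions differ by at most `delta` (tolerance to time shifts and scaling). It then applies this measure naively to a sliding window over one stream, reporting every predefined pattern whose normalised similarity reaches `threshold`. All names (`lcss_length`, `lcss_similarity`, `StreamMonitor`) and parameters are hypothetical. Unlike the approach described above, this sketch recomputes LCSS from scratch on every arrival (cost proportional to window length times pattern length per value) and has no multilevel approximations, pruning, or error estimation.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence, Tuple


def lcss_length(a: Sequence[float], b: Sequence[float], eps: float, delta: int) -> int:
    """LCSS length of two numeric series (standard dynamic programming).

    a[i] and b[j] match when |a[i] - b[j]| <= eps and |i - j| <= delta.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCSS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps and abs(i - j) <= delta:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_similarity(a: Sequence[float], b: Sequence[float], eps: float, delta: int) -> float:
    """LCSS length normalised to [0, 1] by the length of the shorter series."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))


class StreamMonitor:
    """Hypothetical helper: keeps the last `window` values of one stream and
    reports which predefined patterns the current window matches."""

    def __init__(self, patterns: Dict[str, List[float]], window: int,
                 eps: float, delta: int, threshold: float) -> None:
        self.patterns = patterns
        self.window = window
        self.buffer: Deque[float] = deque(maxlen=window)  # oldest values drop out
        self.eps, self.delta, self.threshold = eps, delta, threshold

    def push(self, value: float) -> List[Tuple[str, float]]:
        """Append one stream value; return (pattern name, similarity) matches."""
        self.buffer.append(value)
        if len(self.buffer) < self.window:
            return []  # window not full yet
        current = list(self.buffer)
        matches = []
        for name, pattern in self.patterns.items():
            sim = lcss_similarity(current, pattern, self.eps, self.delta)
            if sim >= self.threshold:
                matches.append((name, sim))
        return matches


if __name__ == "__main__":
    monitor = StreamMonitor({"spike": [0.0, 1.0, 5.0, 1.0, 0.0]},
                            window=5, eps=0.5, delta=1, threshold=0.8)
    for t, x in enumerate([0.1, 1.2, 4.8, 0.9, 0.1, 0.0]):
        for name, sim in monitor.push(x):
            print(f"t={t}: matched {name} (similarity {sim:.2f})")
```

Because `delta` bounds how far apart matched positions may be, the same pattern still matches a window in which it appears slightly shifted or stretched, which is the kind of time scaling an LCSS-based matcher is meant to tolerate.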

We perform an extensive experimental evaluation on forty real datasets of diverse nature and characteristics, and we compare our approach to previous techniques. The experiments demonstrate the validity of our approach.

Keywords

data stream, online similarity matching, time series

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alice Marascu (1)
  • Suleiman A. Khan (2)
  • Themis Palpanas (1)
  1. University of Trento, Italy
  2. Aalto University, Finland
