Speeding up pattern matching in streaming time-series via block vector and multilevel lower bound

  • Original Article
Neural Computing and Applications

Abstract

Pattern matching is one of the essential tasks in streaming time-series data mining. Its purpose is to identify all sliding windows in a streaming time series whose Euclidean distance to a predefined pattern is smaller than a predetermined threshold. The patterns can be high-dimensional, and the streaming time series is updated frequently. Thus, the brute-force method, which calculates the Euclidean distance between each sliding window and every pattern, is too slow for practical applications. This paper develops a lower-bound-based method that performs pattern matching in less time while guaranteeing the same results as the brute-force method, so the speedup comes without any sacrifice in matching accuracy. Block vectors are used to calculate a lower bound of the Euclidean distance. Whenever this lower bound already reaches the threshold, the exact distance cannot fall below the threshold either, so the proposed method can safely eliminate many expensive Euclidean distance calculations between patterns and sliding windows and thereby improve the efficiency of pattern matching. In addition, we present an approach that obtains the block vectors on the fly in streaming scenarios to improve efficiency further. An experimental study on synthetic and real-life data sets verifies the efficiency and advantage of the proposals over the state of the art.
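
The pruning idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the preview does not define the block vector or the multilevel bound, so the sketch assumes one common construction in which each block-vector entry is the Euclidean norm of a contiguous block. By the Cauchy-Schwarz inequality applied per block, the distance between two such block vectors never exceeds the exact Euclidean distance. The function names (`block_norms`, `lower_bound`, `match_stream`, `brute_force`) and the parameter `num_blocks` are hypothetical.

```python
import math


def block_norms(x, num_blocks):
    """Assumed block vector: the Euclidean norm of each contiguous block of x."""
    size = math.ceil(len(x) / num_blocks)
    return [
        math.sqrt(sum(v * v for v in x[i:i + size]))
        for i in range(0, len(x), size)
    ]


def euclidean(x, y):
    """Exact Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def lower_bound(bx, by):
    """Distance between block vectors; never larger than the exact distance."""
    return euclidean(bx, by)


def brute_force(stream, patterns, threshold):
    """Reference result: compare every sliding window with every pattern."""
    w = len(patterns[0])
    matches = []
    for start in range(len(stream) - w + 1):
        window = stream[start:start + w]
        for idx, pattern in enumerate(patterns):
            if euclidean(window, pattern) < threshold:
                matches.append((start, idx))
    return matches


def match_stream(stream, patterns, threshold, num_blocks):
    """Same matches as brute_force, skipping pairs the lower bound rules out."""
    w = len(patterns[0])  # assumes all patterns share one length
    # Pattern block vectors are fixed, so they are computed once up front.
    pattern_blocks = [block_norms(p, num_blocks) for p in patterns]
    matches = []
    for start in range(len(stream) - w + 1):
        window = stream[start:start + w]
        # Recomputed per window for simplicity; the abstract describes
        # obtaining block vectors on the fly instead.
        window_blocks = block_norms(window, num_blocks)
        for idx, pattern in enumerate(patterns):
            if lower_bound(window_blocks, pattern_blocks[idx]) >= threshold:
                continue  # exact distance >= lower bound >= threshold
            if euclidean(window, pattern) < threshold:
                matches.append((start, idx))
    return matches


if __name__ == "__main__":
    stream = [0, 1, 2, 3, 4, 3, 2, 1, 0, 1, 2, 3]
    patterns = [[1, 2, 3, 4], [4, 3, 2, 1]]
    print(match_stream(stream, patterns, threshold=1.5, num_blocks=2))
    print(brute_force(stream, patterns, threshold=1.5))
```

Both calls return the same list of `(window_start, pattern_index)` pairs. The only difference is that `match_stream` computes the exact distance just for the pairs whose lower bound stays below the threshold.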

Data availability

The synthetic data sets that support the findings of this study are available from the corresponding author upon reasonable request. The public data sets are available online [47].

References

  1. Mohammadi M, Al-Fuqaha A, Sorour S, Guizani M (2018) Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun Surv Tutor 20(4):2923–2960

  2. Gomes HM, Read J, Bifet A, Barddal JP, Gama J (2019) Machine learning for streaming data: state of the art, challenges, and opportunities. ACM SIGKDD Explor Newsl 21(2):6–22

  3. Bodenham DA, Adams NM (2017) Continuous monitoring for changepoints in data streams using adaptive estimation. Stat Comput 27(5):1257–1270

  4. Butler B, Pearson RG, Birtles RA (2021) Water-quality and ecosystem impacts of recreation in streams: monitoring and management. Environ Chall 5:100328

  5. Henning S, Hasselbring W (2020) Scalable and reliable multi-dimensional sensor data aggregation in data streaming architectures. Data-Enabled Discov Appl 4(1):1–12

  6. Lin H, Wu S, Kou NM, Gao Y, Lu D et al (2018) Finding the hottest item in data streams. Inf Sci 430:314–330

  7. Chen L, Zou L-J, Tu L (2012) A clustering algorithm for multiple data streams based on spectral component similarity. Inf Sci 183(1):35–47

  8. Wu J, Wang P, Pan N, Wang C, Wang W, Wang J (2019) Kv-match: a subsequence matching approach supporting normalization and time warping. In: 2019 IEEE 35th international conference on data engineering (ICDE), pp 866–877. IEEE

  9. Alghamdi N, Zhang L, Zhang H, Rundensteiner EA, Eltabakh MY (2020) Chainlink: indexing big time series data for long subsequence matching. In: 2020 IEEE 36th international conference on data engineering (ICDE), pp 529–540. IEEE

  10. Gong X, Fong S, Si Y-W (2019) Fast fuzzy subsequence matching algorithms on time-series. Expert Syst Appl 116:275–284

  11. Peng B, Fatourou P, Palpanas T (2021) Fast data series indexing for in-memory data. VLDB J, 1–27

  12. Linardi M, Palpanas T (2020) Scalable data series subsequence matching with ULISSE. VLDB J 29(6):1449–1474

  13. Lian X, Chen L, Yu JX, Han J, Ma J (2008) Multiscale representations for fast pattern matching in stream time series. IEEE Trans Knowl Data Eng 21(4):568–581

  14. Zhou K, Hou Q, Wang R, Guo B (2008) Real-time kd-tree construction on graphics hardware. ACM Trans Graph (TOG) 27(5):1–11

  15. Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: VLDB, vol 97, pp 426–435. Citeseer

  16. Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbor. In: Proceedings of the 23rd international conference on machine learning, pp 97–104

  17. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD international conference on management of data, pp 47–57

  18. Almalawi AM, Fahad A, Tari Z, Cheema MA, Khalil I (2015) \(k\)NNVWC: an efficient \(k\)-nearest neighbors approach based on various-widths clustering. IEEE Trans Knowl Data Eng 28(1):68–81

  19. Pan Y, Pan Z, Wang Y, Wang W (2020) A new fast search algorithm for exact k-nearest neighbors based on optimal triangle-inequality-based check strategy. Knowl-Based Syst 189:105088

  20. Wang X (2011) A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality. In: The 2011 international joint conference on neural networks, pp 1293–1299. IEEE

  21. Camerra A, Palpanas T, Shieh J, Keogh E (2010) iSAX 2.0: indexing and mining one billion time series. In: 2010 IEEE international conference on data mining, pp 58–67. IEEE

  22. Peng B, Fatourou P, Palpanas T (2020) Paris+: data series indexing on multi-core architectures. IEEE Trans Knowl Data Eng 33(5):2151–2164

  23. Wang Y, Wang P, Pei J, Wang W, Huang S (2013) A data-adaptive and dynamic segmentation index for whole matching on time series. Proc VLDB Endow 6(10):793–804

  24. Zoumpatianos K, Idreos S, Palpanas T (2016) ADS: the adaptive data series index. VLDB J 25(6):843–866

  25. Shieh J, Keogh E (2008) iSAX: indexing and mining terabyte sized time series. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 623–631

  26. Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Locally adaptive dimensionality reduction for indexing large time series databases. In: Proceedings of the 2001 ACM SIGMOD international conference on management of data, pp 151–162

  27. Peng J, Wang H, Li J, Gao H (2016) Set-based similarity search for time series. In: Proceedings of the 2016 international conference on management of data, pp 2039–2052

  28. Zhang H, Dong Y, Li J, Xu D (2021) An efficient method for time series similarity search using binary code representation and Hamming distance. Intell Data Anal 25(2):439–461

  29. Ye Y, Jiang J, Ge B, Dou Y, Yang K (2019) Similarity measures for time series data classification using grid representation and matrix distance. Knowl Inf Syst 60(2):1105–1134

  30. Hwang Y, Baek M, Kim S, Han B, Ahn H-K (2018) Product quantized translation for fast nearest neighbor search. In: Proceedings of the AAAI conference on artificial intelligence, vol 32

  31. Hwang Y, Han B, Ahn H-K (2012) A fast nearest neighbor search algorithm by nonlinear embedding. In: 2012 IEEE conference on computer vision and pattern recognition, pp 3053–3060. IEEE

  32. Jeong S, Kim S-W, Kim K, Choi B-U (2006) An effective method for approximating the Euclidean distance in high-dimensional space. In: International conference on database and expert systems applications, pp 863–872. Springer

  33. Li M, Zhang Y, Sun Y, Wang W, Tsang IW, Lin X (2018) An efficient exact nearest neighbor search by compounded embedding. In: International conference on database systems for advanced applications, pp 37–54. Springer

  34. Liu Y, Wei H, Cheng H (2018) Exploiting lower bounds to accelerate approximate nearest neighbor search on high-dimensional data. Inf Sci 465:484–504

  35. Bottesch T, Bühler T, Kächele M (2016) Speeding up k-means by approximating Euclidean distances via block vectors. In: International conference on machine learning, pp 2578–2586. PMLR

  36. Zhang H, Dong Y, Xu D (2021) Accelerating exact nearest neighbor search in high dimensional Euclidean space via block vectors. Int J Intell Syst 37:1697–1722

  37. Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv (CSUR) 45(1):1–34

  38. Berndt DJ, Clifford J (1996) Finding patterns in time series: a dynamic programming approach. In: Advances in knowledge discovery and data mining, pp 229–248

  39. Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data, pp 491–502

  40. Marteau P-F (2008) Time warp edit distance with stiffness adjustment for time series matching. IEEE Trans Pattern Anal Mach Intell 31(2):306–318

  41. Stefan A, Athitsos V, Das G (2012) The move-split-merge metric for time series. IEEE Trans Knowl Data Eng 25(6):1425–1438

  42. Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 262–270

  43. Kim S-W, Park S, Chu WW (2001) An index-based approach for similarity search supporting time warping in large sequence databases. In: Proceedings 17th international conference on data engineering, pp 607–614. IEEE

  44. Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386

  45. Yi B-K, Faloutsos C (2000) Fast time sequence indexing for arbitrary Lp norms

  46. Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3):263–286

  47. Dau HA, Keogh E, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Yanping, Hu B, Begum N, Bagnall A, Mueen A, Batista G, Hexagon-ML (2018) The UCR time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

Acknowledgements

We thank Dr. Eamonn Keogh for providing the data sets used in this paper. This work is supported by the Science Foundation of Zhejiang Sci-Tech University (ZSTU) under Grant No. 22232264-Y.

Author information

Corresponding author

Correspondence to Haowen Zhang.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, H., Li, J. Speeding up pattern matching in streaming time-series via block vector and multilevel lower bound. Neural Comput & Applic 36, 3389–3403 (2024). https://doi.org/10.1007/s00521-023-09291-5

  • DOI: https://doi.org/10.1007/s00521-023-09291-5
