Speeding up pattern matching in streaming time-series via block vector and multilevel lower bound

Zhang, Haowen; Li, Jing

doi:10.1007/s00521-023-09291-5

Original Article
Published: 02 December 2023

Volume 36, pages 3389–3403 (2024)

Neural Computing and Applications


Abstract

Pattern matching is one of the essential tasks in streaming time-series data mining. Its purpose is to identify all sliding windows in a streaming time-series whose Euclidean distances to predefined patterns are smaller than a predetermined threshold. Because patterns can be high-dimensional and the streaming time-series is frequently updated, the brute-force method, which computes the Euclidean distance between each sliding window and every pattern, is too slow for practical applications. This paper develops a lower bound-based method that performs pattern matching in less time while returning exactly the same results as the brute-force method, so the speedup comes without any sacrifice in matching accuracy. Block vectors are used to calculate a lower bound of the Euclidean distance. Since this lower bound never exceeds the true distance, any pattern and sliding window whose lower bound already reaches the threshold can be discarded without computing their exact distance; our proposal thereby safely eliminates many expensive Euclidean distance calculations and improves the efficiency of pattern matching. In addition, we present an approach that obtains the block vectors on-the-fly in streaming scenarios to improve efficiency further. Experiments on synthetic and real-life data sets verify the efficiency of the proposals and their advantage over the state-of-the-art.
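The abstract names two ingredients: a lower bound on the Euclidean distance computed from block vectors, and block vectors obtained on-the-fly for every sliding window. The Python sketch below shows one way to realize both under stated assumptions; it is not the authors' implementation. Following the block-vector idea of Bottesch et al. (2016), a series is cut into k contiguous blocks and represented by the vector of its block norms. By the reverse triangle inequality, | ||x_i|| - ||y_i|| | <= ||x_i - y_i|| holds for every block i, so summing the squares over all blocks gives ||x_B - y_B|| <= ||x - y||. The block-vector distance therefore never exceeds the true distance, and a pair whose bound already reaches the threshold eps can be skipped without changing the result. Several choices here are assumptions made for illustration only: the "multilevel" bound is modeled as nested partitions with k = 2, 4, 8 blocks checked from coarse to fine, the on-the-fly block vectors are derived in O(k) per window from prefix sums of squared stream values, and all names (lb_match, block_vector, window_block_vector, eps, levels, and so on) are hypothetical.

```python
import itertools
import math
import random


def prefix_squares(stream):
    """ps[i] holds the sum of squares of stream[0:i]."""
    ps = [0.0]
    for v in stream:
        ps.append(ps[-1] + v * v)
    return ps


def block_bounds(n, k):
    """Start/end offsets of k contiguous blocks over n positions.

    With k = 2, 4, 8, ... the partitions are nested: every finer level
    splits the blocks of the coarser one, which can only tighten the bound.
    """
    return [i * n // k for i in range(k + 1)]


def block_vector(series, k):
    """Block vector: the Euclidean norm of each of the k blocks."""
    b = block_bounds(len(series), k)
    return [math.sqrt(sum(v * v for v in series[b[i]:b[i + 1]])) for i in range(k)]


def window_block_vector(ps, start, n, k):
    """Block vector of stream[start:start + n] in O(k) from prefix sums."""
    b = block_bounds(n, k)
    # max() guards against tiny negative values caused by rounding.
    return [math.sqrt(max(0.0, ps[start + b[i + 1]] - ps[start + b[i]]))
            for i in range(k)]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force(stream, patterns, eps):
    """Reference answer: every (window start, pattern index) with distance < eps."""
    n = len(patterns[0])
    return [(s, j)
            for s in range(len(stream) - n + 1)
            for j, p in enumerate(patterns)
            if euclid(stream[s:s + n], p) < eps]


def lb_match(stream, patterns, eps, levels=(2, 4, 8)):
    """Same answer as brute_force, but skips pairs whose lower bound is >= eps.

    Returns (matches, number of pairs pruned by a lower bound).
    """
    n = len(patterns[0])
    ps = prefix_squares(stream)
    pattern_bv = {k: [block_vector(p, k) for p in patterns] for k in levels}
    matches, pruned = [], 0
    for s in range(len(stream) - n + 1):
        window_bv = {k: window_block_vector(ps, s, n, k) for k in levels}
        for j, p in enumerate(patterns):
            # Coarse to fine: any() stops at the first level whose bound
            # already reaches eps, and the exact distance is then skipped.
            if any(euclid(window_bv[k], pattern_bv[k][j]) >= eps for k in levels):
                pruned += 1
                continue
            if euclid(stream[s:s + n], p) < eps:
                matches.append((s, j))
    return matches, pruned


def random_walk(length, rng):
    """Synthetic random-walk series, used only for the demo."""
    return list(itertools.accumulate(rng.gauss(0.0, 1.0) for _ in range(length)))


if __name__ == "__main__":
    rng = random.Random(7)
    stream = random_walk(1000, rng)
    # Two patterns copied from the stream, plus unrelated random patterns.
    patterns = [stream[100:164], stream[600:664]] + [random_walk(64, rng) for _ in range(5)]
    matches, pruned = lb_match(stream, patterns, eps=3.0)
    assert matches == brute_force(stream, patterns, eps=3.0)
    print(matches, pruned)
```

Checking the cheap coarse level first lets many non-matching pairs be discarded with a few arithmetic operations, while the exact O(n) distance is computed only for pairs that survive every level. Keeping the prefix sums in one growing list is a simplification for the sketch; a long-running stream would need a bounded buffer.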

Similar content being viewed by others

Similarity search for numerous patterns over multiple time series streams under dynamic time warping which supports data normalization

Article Open access 11 March 2016

A Framework for Similarity Search in Streaming Time Series based on Spark Streaming

Article 11 June 2022

Improving SPRING Method in Similarity Search Over Time-Series Streams by Data Normalization

Data availability

The synthetic data sets that support the findings of this study are available from the corresponding author upon reasonable request. The public data sets are available online from the UCR Time Series Classification Archive [47].

References

[1] Mohammadi M, Al-Fuqaha A, Sorour S, Guizani M (2018) Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun Surv Tutor 20(4):2923–2960
[2] Gomes HM, Read J, Bifet A, Barddal JP, Gama J (2019) Machine learning for streaming data: state of the art, challenges, and opportunities. ACM SIGKDD Explor Newsl 21(2):6–22
[3] Bodenham DA, Adams NM (2017) Continuous monitoring for changepoints in data streams using adaptive estimation. Stat Comput 27(5):1257–1270
[4] Butler B, Pearson RG, Birtles RA (2021) Water-quality and ecosystem impacts of recreation in streams: monitoring and management. Environ Chall 5:100328
[5] Henning S, Hasselbring W (2020) Scalable and reliable multi-dimensional sensor data aggregation in data streaming architectures. Data-Enabled Discov Appl 4(1):1–12
[6] Lin H, Wu S, Kou NM, Gao Y, Lu D et al (2018) Finding the hottest item in data streams. Inf Sci 430:314–330
[7] Chen L, Zou L-J, Tu L (2012) A clustering algorithm for multiple data streams based on spectral component similarity. Inf Sci 183(1):35–47
[8] Wu J, Wang P, Pan N, Wang C, Wang W, Wang J (2019) KV-match: a subsequence matching approach supporting normalization and time warping. In: 2019 IEEE 35th international conference on data engineering (ICDE), pp 866–877. IEEE
[9] Alghamdi N, Zhang L, Zhang H, Rundensteiner EA, Eltabakh MY (2020) Chainlink: indexing big time series data for long subsequence matching. In: 2020 IEEE 36th international conference on data engineering (ICDE), pp 529–540. IEEE
[10] Gong X, Fong S, Si Y-W (2019) Fast fuzzy subsequence matching algorithms on time-series. Expert Syst Appl 116:275–284
[11] Peng B, Fatourou P, Palpanas T (2021) Fast data series indexing for in-memory data. VLDB J, 1–27
[12] Linardi M, Palpanas T (2020) Scalable data series subsequence matching with ULISSE. VLDB J 29(6):1449–1474
[13] Lian X, Chen L, Yu JX, Han J, Ma J (2008) Multiscale representations for fast pattern matching in stream time series. IEEE Trans Knowl Data Eng 21(4):568–581
[14] Zhou K, Hou Q, Wang R, Guo B (2008) Real-time kd-tree construction on graphics hardware. ACM Trans Graph (TOG) 27(5):1–11
[15] Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: VLDB, vol 97, pp 426–435. Citeseer
[16] Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbor. In: Proceedings of the 23rd international conference on machine learning, pp 97–104
[17] Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD international conference on management of data, pp 47–57
[18] Almalawi AM, Fahad A, Tari Z, Cheema MA, Khalil I (2015) kNNVWC: an efficient k-nearest neighbors approach based on various-widths clustering. IEEE Trans Knowl Data Eng 28(1):68–81
[19] Pan Y, Pan Z, Wang Y, Wang W (2020) A new fast search algorithm for exact k-nearest neighbors based on optimal triangle-inequality-based check strategy. Knowl-Based Syst 189:105088
[20] Wang X (2011) A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality. In: The 2011 international joint conference on neural networks, pp 1293–1299. IEEE
[21] Camerra A, Palpanas T, Shieh J, Keogh E (2010) iSAX 2.0: indexing and mining one billion time series. In: 2010 IEEE international conference on data mining, pp 58–67. IEEE
[22] Peng B, Fatourou P, Palpanas T (2020) ParIS+: data series indexing on multi-core architectures. IEEE Trans Knowl Data Eng 33(5):2151–2164
[23] Wang Y, Wang P, Pei J, Wang W, Huang S (2013) A data-adaptive and dynamic segmentation index for whole matching on time series. Proc VLDB Endow 6(10):793–804
[24] Zoumpatianos K, Idreos S, Palpanas T (2016) ADS: the adaptive data series index. VLDB J 25(6):843–866
[25] Shieh J, Keogh E (2008) iSAX: indexing and mining terabyte sized time series. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 623–631
[26] Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Locally adaptive dimensionality reduction for indexing large time series databases. In: Proceedings of the 2001 ACM SIGMOD international conference on management of data, pp 151–162
[27] Peng J, Wang H, Li J, Gao H (2016) Set-based similarity search for time series. In: Proceedings of the 2016 international conference on management of data, pp 2039–2052
[28] Zhang H, Dong Y, Li J, Xu D (2021) An efficient method for time series similarity search using binary code representation and Hamming distance. Intell Data Anal 25(2):439–461
[29] Ye Y, Jiang J, Ge B, Dou Y, Yang K (2019) Similarity measures for time series data classification using grid representation and matrix distance. Knowl Inf Syst 60(2):1105–1134
[30] Hwang Y, Baek M, Kim S, Han B, Ahn H-K (2018) Product quantized translation for fast nearest neighbor search. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
[31] Hwang Y, Han B, Ahn H-K (2012) A fast nearest neighbor search algorithm by nonlinear embedding. In: 2012 IEEE conference on computer vision and pattern recognition, pp 3053–3060. IEEE
[32] Jeong S, Kim S-W, Kim K, Choi B-U (2006) An effective method for approximating the Euclidean distance in high-dimensional space. In: International conference on database and expert systems applications, pp 863–872. Springer
[33] Li M, Zhang Y, Sun Y, Wang W, Tsang IW, Lin X (2018) An efficient exact nearest neighbor search by compounded embedding. In: International conference on database systems for advanced applications, pp 37–54. Springer
[34] Liu Y, Wei H, Cheng H (2018) Exploiting lower bounds to accelerate approximate nearest neighbor search on high-dimensional data. Inf Sci 465:484–504
[35] Bottesch T, Bühler T, Kächele M (2016) Speeding up k-means by approximating Euclidean distances via block vectors. In: International conference on machine learning, pp 2578–2586. PMLR
[36] Zhang H, Dong Y, Xu D (2021) Accelerating exact nearest neighbor search in high dimensional Euclidean space via block vectors. Int J Intell Syst 37:1697–1722
[37] Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv (CSUR) 45(1):1–34
[38] Berndt DJ, Clifford J (1996) Finding patterns in time series: a dynamic programming approach. In: Advances in knowledge discovery and data mining, pp 229–248
[39] Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data, pp 491–502
[40] Marteau P-F (2008) Time warp edit distance with stiffness adjustment for time series matching. IEEE Trans Pattern Anal Mach Intell 31(2):306–318
[41] Stefan A, Athitsos V, Das G (2012) The move-split-merge metric for time series. IEEE Trans Knowl Data Eng 25(6):1425–1438
[42] Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 262–270
[43] Kim S-W, Park S, Chu WW (2001) An index-based approach for similarity search supporting time warping in large sequence databases. In: Proceedings 17th international conference on data engineering, pp 607–614. IEEE
[44] Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386
[45] Yi B-K, Faloutsos C (2000) Fast time sequence indexing for arbitrary Lp norms
[46] Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3):263–286
[47] Dau HA, Keogh E, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Yanping, Hu B, Begum N, Bagnall A, Mueen A, Batista G, Hexagon-ML (2018) The UCR time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

Acknowledgements

We want to express our gratitude to Dr. Eamonn Keogh for providing the data sets used in this paper. This work is supported by Science Foundation of Zhejiang Sci-Tech University (ZSTU) under Grant No. 22232264-Y.

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, 310000, China
Haowen Zhang
College of Computer Science and Technology, Zhejiang University, Hangzhou, 310000, China
Jing Li

Corresponding author

Correspondence to Haowen Zhang.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, H., Li, J. Speeding up pattern matching in streaming time-series via block vector and multilevel lower bound. Neural Comput & Applic 36, 3389–3403 (2024). https://doi.org/10.1007/s00521-023-09291-5

Received: 07 April 2023
Accepted: 06 November 2023
Published: 02 December 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00521-023-09291-5

Keywords
