Abstract
Searching for similar patterns in time series data is useful in many applications. The problem becomes difficult when shifting and scaling are considered. We find that the problem can be treated geometrically, and the major contribution of this paper is a uniform geometrical model with which existing related methods can be analyzed. Based on this analysis, we conclude that the angle between two vectors after the Shift-Eliminated Transformation is a more intrinsic similarity measure that is invariant to shifting and scaling. We then enhance the original conical index to adapt it to the geometrical properties of the problem and compare its performance with that of sequential search and the R*-tree. Experimental results show that, in high dimensions, the enhanced conical index achieves a larger improvement over both the R*-tree and sequential search. It also maintains steady performance as the selectivity increases.
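The similarity measure described in the abstract can be illustrated with a short sketch. It assumes, as a simplification, that the Shift-Eliminated Transformation is the projection onto the hyperplane orthogonal to the all-ones vector, which is the same as subtracting the mean. Under that assumption, a shifted and scaled sequence a·s + b·(1, ..., 1) with a > 0 maps to a times the transformed s, so the angle between the transformed vectors does not change. The names `se_transform`, `se_angle`, and `sequential_search` are hypothetical. The linear scan only mimics the sequential-search baseline and is not the enhanced conical index.

```python
import math
from typing import List, Sequence


def se_transform(x: Sequence[float]) -> List[float]:
    """Assumed Shift-Eliminated Transformation: project x onto the
    hyperplane orthogonal to (1, ..., 1), i.e. subtract the mean of x
    from every element."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def se_angle(x: Sequence[float], y: Sequence[float]) -> float:
    """Angle in radians between the SE-transformed versions of x and y."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    u, v = se_transform(x), se_transform(y)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        # A constant sequence maps to the zero vector, so its angle is undefined.
        raise ValueError("angle undefined for a constant sequence")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.acos(cos_theta)


def sequential_search(
    database: Sequence[Sequence[float]], query: Sequence[float], eps: float
) -> List[int]:
    """Baseline linear scan (hypothetical helper): indices of sequences whose
    SE-angle to the query is at most eps radians, i.e. a conical range query."""
    return [i for i, s in enumerate(database) if se_angle(s, query) <= eps]


if __name__ == "__main__":
    q = [1.0, 3.0, 2.0, 5.0]
    db = [
        [2 * v + 10 for v in q],  # q scaled by 2 and shifted by 10
        [5.0, 2.0, 3.0, 1.0],     # a sequence with a different shape
    ]
    print(sequential_search(db, q, eps=0.1))  # prints [0]
```

For mean-centred vectors, the cosine of this angle equals the Pearson correlation coefficient, so a threshold on the angle acts as a threshold on correlation. A negative scale factor (a < 0) maps an angle θ to π − θ. This sketch does not decide whether such reversed matches should count as similar.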
References
Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. In: Proceedings of the fourth international conference on foundations of data organization and algorithms, pp 69–84
Agrawal R, Lin K-I, Sawhney HS, Shim K (1995) Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In: Proceedings of the 21st VLDB conference, pp 490–501
Berchtold S, Böhm C, Kriegel H-P (1998) The pyramid-technique: Towards breaking the curse of dimensionality. In: SIGMOD 1998, proceedings ACM SIGMOD international conference on management of data, ACM Press, pp 142–153
Berndt DJ, Clifford J (1995) Finding patterns in time series: A dynamic programming approach. In: Advances in knowledge discovery and data mining, AAAI Press/MIT Press, pp 229–248
Bollobás B, Das G, Gunopulos D, Mannila H (1997) Time-series similarity problems and well-separated geometric sets. In: 13th annual ACM symposium on computational geometry, pp 454–456
Beckmann N, Kriegel H-P, Schneider R, Seeger B (1990) The R*-tree: An efficient and robust access method for points and rectangles. In: Proceedings of the 1990 ACM SIGMOD international conference on management of data, pp 322–331
Bozkaya T, Yazdani N, Ozsoyoglu ZM (1997) Matching and indexing sequences of different lengths. In: Proceedings of the 1997 ACM CIKM, sixth international conference on information and knowledge management, pp 128–135
Chan K-P, Fu AW-C (1999) Efficient time series matching by wavelets. In: Proceedings of the 15th international conference on data engineering, IEEE Computer Society, pp 126–133
Chakrabarti K, Keogh E, Mehrotra S, Pazzani M (2002) Locally adaptive dimensionality reduction for indexing large time series databases. ACM Trans Database Syst 27(2):188–228
Chu KW, Lam SK, Wong MH (1998) An efficient hash-based algorithm for sequence data searching. Comput J 41:402–415
Chakrabarti K, Mehrotra S (1999) The hybrid tree: An index structure for high dimensional feature spaces. In: ICDE, pp 440–447
Chu KKW, Wong MH (1999) Fast time-series searching with scaling and shifting. In: Proceedings of the eighteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, ACM Press, pp 237–248
Das G, Gunopulos D, Mannila H (1997) Finding similar time series. In: First European symposium on principles of data mining and knowledge discovery, pp 88–100
Davis HF, Snider AD (1995) Introduction to vector analysis. Wm. C. Brown Publishers
Ferhatosmanoglu H, Agrawal D, Abbadi AE (2001) Efficient processing of conical queries. In: Proceedings of the 2001 ACM CIKM international conference on information and knowledge management, ACM Press, pp 1–8
Fraleigh JB, Beauregard RA (1995) Linear algebra. Addison Wesley, Reading
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. In: Proceedings of the 1994 ACM SIGMOD international conference on management of data, pp 419–429
Gaede V, Günther O (1998) Multidimensional access methods. ACM Comput Surv 30(2):170–231
Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. In: Proceedings of the 25th international conference on very large data bases, pp 518–529
Goldin DQ, Kanellakis PC (1995) On similarity queries for time-series data: constraint specification and implementation. In: First international conference on the principles and practice of constraint programming, pp 137–153
Ge X, Smyth P (2000) Deformable Markov model templates for time-series pattern matching. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, ACM Press, pp 81–90
Jagadish HV, Mendelzon AO, Milo T (1995) Similarity-based queries. In: Symposium on principles of database systems, pp 36–45
Keogh EJ, Chu S, Hart D, Pazzani MJ (2001) An online algorithm for segmenting time series. In: Proceedings of the 2001 IEEE international conference on data mining, IEEE Computer Society, pp 289–296
Keogh EJ, Chakrabarti K, Pazzani MJ, Mehrotra S (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(1):263–286
Keogh EJ (2001) Mining and indexing time series data. In: The 2001 IEEE international conference on data mining
Korn F, Jagadish HV, Faloutsos C (1997) Efficiently supporting ad hoc queries in large datasets of time sequences. In: SIGMOD 1997, proceedings ACM SIGMOD international conference on management of data, ACM Press, pp 289–300
Kahveci T, Singh A, Gurel A (2001) Shift and scale invariant search of multi-attribute time sequences. Technical report, University of California, Santa Barbara
Kim S-W, Yoon J, Park S, Kim T-H (2002) Shape-based retrieval of similar subsequences in time-series databases. In: Proceedings of the 2002 ACM symposium on applied computing, ACM Press, pp 438–445
Lam SK, Wong MH (1998) A fast projection algorithm for sequence data searching. Data Knowl Eng 28:321–339; A preliminary version appeared in the third international workshop on next generation information technologies and systems, pp 172–181 (1997)
Li C-S, Yu PS, Castelli V (1996) Similarity search algorithm for databases of long sequences. In: Proceedings of the 12th international conference on data engineering, pp 546–553
Mortenson ME (1995) Geometric transformations. Industrial Press
Pratt KB, Fink E (2002) Search for patterns in compressed time series. Int J Image Graphics 2(1):89–106
Park S, Kim S-W, Chu WW (2001) Segment-based approach for subsequence searches in sequence databases. In: Proceedings of the 2001 ACM symposium on applied computing, ACM Press, pp 248–252
Popivanov I, Miller RJ (2002) Similarity search over time series data using wavelets. In: Proceedings of the 18th international conference on data engineering, IEEE Computer Society
Polly WPM, Wong NH (2001) Efficient and robust feature extraction and pattern matching of time series by a lattice structure. In: Proceedings of the 2001 ACM CIKM international conference on information and knowledge management, ACM, pp 271–278
Perng CS, Wang H, Zhang SR, Parker DS (2000) Landmarks: A new model for similarity-based pattern querying in time series databases. In: Proceedings of the 16th international conference on data engineering, IEEE Computer Society, pp 33–42
Rafiei D (1999) On similarity-based queries for time series data. In: Proceedings of the 15th international conference on data engineering, IEEE Computer Society, pp 410–417
Rafiei D, Mendelzon A (1997) Similarity-based queries for time series data. In: Proceedings of the 1997 ACM SIGMOD international conference on management of data, pp 13–25
Struzik ZR, Siebes A (1999) The Haar wavelet transform in the time series similarity paradigm. In: Principles of data mining and knowledge discovery, third European conference, PKDD '99, Springer-Verlag, Berlin, pp 12–22
Shatkay H, Zdonik SB (1996) Approximate queries and representations for large data sequences. In: Proceedings of the 12th international conference on data engineering, pp 536–545
Wu Y-L, Agrawal D, Abbadi AE (2000) A comparison of DFT and DWT based similarity search in time-series databases. In: Proceedings of the ninth international conference on information and knowledge management, ACM Press, pp 488–495
Yi B-K, Faloutsos C (2000) Fast time sequence indexing for arbitrary lp norms. In: VLDB 2000, proceedings of 26th international conference on very large data bases, Morgan Kaufmann, pp 385–394
Yi B-K, Jagadish HV, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In: Proceedings of the 14th international conference on data engineering, pp 201–208
Keogh E, Folias T (2000) The UCR time series data mining archive. University of California, Computer Science & Engineering Department, Riverside, CA, [http://www.cs.ucr.edu/~eamonn/TSDMA/index.html]
Additional information
Part of the results related to the geometrical model have been published in the Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp 237–248.
Mi Zhou was born in China. He received his BS and MS degrees in computer science from Northeastern University, China, in 1999 and 2002, respectively. He is currently pursuing a PhD degree in the Computer Science and Engineering Department, The Chinese University of Hong Kong. His research interests include indexing of time series data, high-dimensional indexing, and sensor networks.
Man-Hon Wong received his BSc and MPhil degrees from The Chinese University of Hong Kong in 1987 and 1989, respectively. He then went to the University of California at Santa Barbara, where he received his PhD degree in 1993. Dr. Wong joined The Chinese University of Hong Kong in August 1993 as an assistant professor and was promoted to associate professor in 1998. His research interests include transaction management, mobile databases, data replication, distributed systems, and computer and network security.
Kam-Wing Chu was born in Hong Kong. He received his BS and MPhil degrees in computer science and engineering from The Chinese University of Hong Kong. While in Hong Kong, his research interests included database indexing of high-dimensional data and data mining. He later went to the United States and received his MS degree in computer science from the University of Maryland at College Park, where he focused on high-performance implementation and algorithm design for advanced database systems. He is currently a senior software engineer in the Server Performance group at Actuate Corporation. His expertise is in enterprise software development and software performance optimization.
Cite this article
Zhou, M., Wong, MH. & Chu, KW. A geometrical solution to time series searching invariant to shifting and scaling. Knowl Inf Syst 9, 202–229 (2006). https://doi.org/10.1007/s10115-005-0215-8