A geometrical solution to time series searching invariant to shifting and scaling

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Searching for similar patterns in time series data is useful in many applications. The problem becomes difficult when shifting and scaling must be taken into account. We show that the problem can be treated geometrically; the major contribution of this paper is a uniform geometrical model that can be used to analyze the existing related methods. Based on this analysis, we conclude that the angle between two vectors after the Shift-Eliminated Transformation is a more intrinsic similarity measure that is invariant to shifting and scaling. We then enhance the original conical index to fit the geometrical properties of the problem and compare its performance with that of sequential search and the R*-tree. Experimental results show that the enhanced conical index achieves a larger improvement over the R*-tree and sequential search in high dimensions. Its performance also remains steady as the selectivity increases.
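
The angle-based measure described in the abstract can be illustrated with a minimal sketch. It assumes that the Shift-Eliminated Transformation amounts to removing the component of a sequence along the all-ones direction, that is, subtracting the sequence mean; the abstract does not spell out the definition, so this is an assumption rather than the authors' implementation. The function names `se_transform` and `se_angle` and the sample data are hypothetical.

```python
import math


def se_transform(x):
    """Assumed form of the transformation: subtract the mean, which removes
    the component of x along the all-ones (shifting) direction."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def se_angle(a, b):
    """Angle in radians between the mean-removed versions of a and b."""
    u, v = se_transform(a), se_transform(b)
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a constant sequence has no direction after mean removal")
    cos = sum(p * q for p, q in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cos)))


# Hypothetical sample sequences of equal length.
base = [1.0, 3.0, 2.0, 5.0, 4.0]
scaled_shifted = [2.5 * v + 10.0 for v in base]  # positive scale plus shift
other = [5.0, 1.0, 4.0, 2.0, 3.0]

angle_same_shape = se_angle(base, scaled_shifted)
angle_other = se_angle(base, other)
```

Under this assumption, a sequence and any positively scaled and shifted copy of it have an angle of (numerically almost) zero, while sequences of different shape are separated by a larger angle. A negative scaling factor would flip the direction and give an angle of π, so whether such matches count as similar is a design choice that this sketch leaves open.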

Author information

Corresponding author

Correspondence to Mi Zhou.

Additional information

Part of the results related to the geometrical model was published in the Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp 237–248.

Mi Zhou was born in China. He received his BS and MS degrees in computer science from Northeastern University, China, in 1999 and 2002, respectively. He is currently pursuing the PhD degree in the Computer Science and Engineering Department, The Chinese University of Hong Kong. His research interests include indexing of time series data, high-dimensional indexing, and sensor networks.

Man-Hon Wong received his BSc and MPhil degrees from The Chinese University of Hong Kong in 1987 and 1989, respectively. He then went to the University of California at Santa Barbara, where he received his PhD degree in 1993. Dr. Wong joined The Chinese University of Hong Kong in August 1993 as an assistant professor. He was promoted to associate professor in 1998. His research interests include transaction management, mobile databases, data replication, distributed systems, and computer and network security.

Kam-Wing Chu was born in Hong Kong. He received his BS and MPhil degrees in computer science and engineering from The Chinese University of Hong Kong. While in Hong Kong, his research interests included database indexing of high-dimensional data and data mining. He later went to the United States and received his MS degree in computer science from the University of Maryland at College Park. While in Maryland, he focused on high-performance implementation and algorithm design of advanced database systems. He is currently a senior software engineer in the Server Performance group at Actuate Corporation. His expertise is in enterprise software development and software performance optimization.

About this article

Cite this article

Zhou, M., Wong, MH. & Chu, KW. A geometrical solution to time series searching invariant to shifting and scaling. Knowl Inf Syst 9, 202–229 (2006). https://doi.org/10.1007/s10115-005-0215-8

  • DOI: https://doi.org/10.1007/s10115-005-0215-8
