Rotation-invariant similarity in time series using bag-of-patterns representation

Journal of Intelligent Information Systems

Abstract

For more than a decade, time series similarity search has received a great deal of attention from data mining researchers. As a result, many time series representations and distance measures have been proposed. However, most existing work on time series similarity search relies on shape-based similarity matching. While some of the existing approaches work well for short time series data, they typically fail to produce satisfactory results when the sequence is long. For long sequences, it is more appropriate to consider similarity based on higher-level structures. In this work, we present a histogram-based representation for time series data, similar to the “bag of words” approach that is widely accepted by the text mining and information retrieval communities. We perform extensive experiments and show that our approach outperforms the leading existing methods in clustering, classification, and anomaly detection on dozens of real datasets. We further demonstrate that the representation allows rotation-invariant matching in shape datasets.
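To make the histogram idea concrete, the following is a minimal Python sketch of a bag-of-patterns style representation. It is not the authors' implementation. It assumes that the "patterns" are SAX-style words (Lin et al. 2007) taken from every sliding window, and that two histograms are compared with Euclidean distance. The function names, the window length, the word length, and the alphabet are all illustrative choices that do not come from the abstract.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev


def znorm(values, eps=1e-8):
    """Z-normalize a sequence; near-constant input maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma < eps:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def sax_word(window, word_len, alphabet):
    """Turn one window into a SAX-style word (assumed discretization)."""
    z = znorm(window)
    # Breakpoints split the standard normal into equiprobable regions.
    cuts = [NormalDist().inv_cdf(i / len(alphabet))
            for i in range(1, len(alphabet))]
    seg = len(z) / word_len
    word = []
    for k in range(word_len):
        # Piecewise aggregate approximation: average each segment.
        chunk = z[int(k * seg):int((k + 1) * seg)]
        avg = mean(chunk)
        # The symbol index is the number of breakpoints below the average.
        word.append(alphabet[sum(avg > c for c in cuts)])
    return "".join(word)


def bag_of_patterns(series, window=8, word_len=4, alphabet="abcd"):
    """Histogram of words over all sliding windows (order is discarded)."""
    bag = Counter()
    for start in range(len(series) - window + 1):
        bag[sax_word(series[start:start + window], word_len, alphabet)] += 1
    return bag


def bag_distance(a, b):
    """Euclidean distance between two word histograms."""
    return sum((a[w] - b[w]) ** 2 for w in set(a) | set(b)) ** 0.5


series = [((i * 7) % 11) + (i % 5) for i in range(40)]
rotated = series[13:] + series[:13]  # circular shift of the same data
print(bag_distance(bag_of_patterns(series), bag_of_patterns(rotated)))
```

Because the histogram ignores where each word occurs, a circular shift only changes the windows that span the cut point and the wrap-around point, so most counts stay the same. In this sketch that accounts for at most 2 × (window − 1) windows, which gives one intuition for why such a representation tolerates rotation.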

Notes

  1. In previous work (Lin and Li 2009), we used a different normalization technique, min-max normalization, which resulted in slightly better clustering for Euclidean distance: signals #5 and #6 were correctly clustered together. However, z-normalization is more commonly used for time series data mining tasks.
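For readers unfamiliar with the two normalizations named in this note, here is a minimal Python sketch of each. The function names are illustrative, and using the population standard deviation for z-normalization is an assumption, since the note does not specify it.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant input: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # constant input: avoid division by zero
    return [(v - mu) / sigma for v in values]


signal = [2, 4, 6]
print(min_max_normalize(signal))
print(z_normalize(signal))
```

Min-max normalization depends only on the two extreme values, so a single outlier compresses the rest of the signal. Z-normalization uses the mean and spread of all points.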

References

  • Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases. In Proceedings of the 4th int’l conference on foundations of data organization and algorithms (pp. 69–84). Chicago, IL.

  • Bradley, P., Fayyad, U., & Reina, C. (1998). Scaling clustering algorithms to large databases. In Proceedings of the 4th int’l conference on knowledge discovery and data mining (pp. 9–15). New York, NY.

  • Chan, K., & Fu, A. W. (1999). Efficient time series matching by wavelets. In Proceedings of the 15th IEEE int’l conference on data engineering (pp. 126–133). Sydney, Australia.

  • Chen, L., & Ng, R. (2004). On the marriage of Lp-norms and edit distance. In Proceedings of the thirtieth international conference on very large data bases (Vol. 30, pp. 792–803).

  • Crochemore, M., Czumaj, A., Gasieniec, L., Jarominek, S., Lecroq, T., Plandowski, W., et al. (1994). Speeding up two string-matching algorithms. Algorithmica, 12, 247–267.

  • Deng, K., Moore, A., & Nechyba, M. (1997). Learning to recognize time series: Combining ARMA models with memory-based learning. IEEE International Symposium on Computational Intelligence in Robotics and Automation, 1, 246–250.

  • Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., & Keogh, E. (2008). Querying and mining of time series data: Experimental comparison of representations and distance measures. Proceedings of the VLDB Endowment, 1(2), 1542–1552.

  • Faloutsos, C., Ranganathan, M., & Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases. SIGMOD Record, 23, 419–429.

  • Gavrilov, M., Anguelov, D., Indyk, P., & Motwani, R. (2000). Mining the stock market: Which measure is best? In Proceedings of the 6th ACM SIGKDD.

  • Ge, X., & Smyth, P. (2000). Deformable Markov model templates for time-series pattern matching. In Proceedings of the 6th ACM SIGKDD (pp. 81–90). Boston, MA.

  • Geurts, P. (2001). Pattern extraction for time series classification. In Proceedings of the 5th European conference on principles of data mining and knowledge discovery (pp. 115–127). Freiburg, Germany.

  • Goldberger, A. L., Amaral, L., Glass, L., Hausdorff, J. M., Ivanov, P. Ch., Mark, R. G., et al. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215–e220.

  • Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32, 241–254.

  • Keogh, E. (2002). Exact indexing of dynamic time warping. In Proceedings of the 28th international conference on very large data bases. Hong Kong, China.

  • Keogh, E. (2004). Tutorial in SIGKDD. In Data mining and machine learning in time series databases.

  • Keogh, E., & Kasetty, S. (2002). On the need for time series data mining benchmarks: A survey and empirical demonstration. In Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery (pp. 102–111). Edmonton, Alberta, Canada.

  • Keogh, E., Chakrabarti, K., & Pazzani, M. (2001). Locally adaptive dimensionality reduction for indexing large time series databases. In Proceedings of ACM SIGMOD conference on management of data (pp. 151–162). Santa Barbara.

  • Keogh, E., Lonardi, S., & Ratanamahatana, C. A. (2004). Towards parameter-free data mining. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. Seattle, WA, USA.

  • Keogh, E., Xi, X., Wei, L., & Ratanamahatana, C. (2006a). The UCR time series classification/clustering homepage. http://www.cs.ucr.edu/~eamonn/time_series_data. Accessed 12 July 2011.

  • Keogh, E., Lin, J., & Fu, A. (2006b). Finding the most unusual time series subsequence: Algorithms and applications. Knowledge and Information Systems (KAIS). Springer-Verlag.

  • Keogh, E., Wei, L., Xi, X., Lee, S., & Vlachos, M. (2006c). LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures. In Proceedings of the 32nd international conference on very large data bases.

  • Ko, M. K., West, G., Venkatesh, S., & Kumar, M. (2005). Online context recognition in multisensor systems using dynamic time warping. In Intelligent sensors, sensor networks and information processing conference (pp. 283–288).

  • Kriegel, H.-P., Kroger, P., Pryakhin, A., Renz, M., & Zherdin, A. (2008). Approximate clustering of time series using compact model-based descriptions. In J. R. Haritsa, R. Kotagiri, & V. Pudi (Eds.), Proceedings of the 13th international conference on database systems for advanced applications (DASFAA’08) (pp. 364–379). Berlin, Heidelberg: Springer-Verlag.

  • Li, M., & Vitanyi, P. (1997). An introduction to Kolmogorov complexity and its applications, 2nd Edn. Springer Verlag.

  • Lin, J., & Li, Y. (2009). Finding structural similarity in time series data using Bag-of-Patterns representation. In M. Winslett (Ed.), Proceedings of the 21st international conference on scientific and statistical database management (SSDBM 2009) (pp. 461–477). Berlin, Heidelberg: Springer-Verlag.

  • Lin, J., Keogh, E., Li, W., & Lonardi, S. (2007). Experiencing SAX: A novel symbolic representation of time series. Data Mining and Knowledge Discovery, 15(2), 107–144.

  • Lin, J., Vlachos, M., Keogh, E., & Gunopulos, D. (2004). Iterative incremental clustering of time series. In IX conference on extending database technology (EDBT).

  • Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press.

  • Marteau, P.-F. (2009). Time warp edit distance with stiffness adjustment for time series matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 306–318.

  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. Le Cam, & J. Neyman (Eds.), Proceedings of the 5th Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281–297). Berkeley, CA.

  • Mueen, A., Keogh, E., & Young, N. (2011). Logical-shapelets: An expressive primitive for time series classification. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’11). San Diego, CA.

  • Nanopoulos, A., Alcock, R., & Manolopoulos, Y. (2001). Feature-based classification of time-series data. In N. Mastorakis, & S. D. Nikolopoulos (Eds.), Information processing and technology (pp. 49–61). Commack, NY: Nova Science Publishers.

  • Olszewski, R. (2001). Generalized feature extraction for structural pattern recognition in time-series data. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.

  • Radovanovic, M., Nanopoulos, A., & Ivanovic, M. (2010). Time-series classification in many intrinsic dimensions (pp. 677–688). SDM.

  • Ratanamahatana, C. A., & Keogh, E. (2004). Making time-series classification more accurate using learned constraints. In Proceedings of SIAM international conference on data mining. Lake Buena Vista, Florida.

  • Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18, 613–620.

  • Sart, D., Mueen, A., Najjar, W., Keogh, E., & Niennattrakul, V. (2010). Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In Proceedings of the 2010 IEEE international conference on data mining (ICDM’10) (pp. 1001–1006). Washington, DC, USA.

  • Wang, X., Smith, K., & Hyndman, R. (2006). Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery, 13(3), 335–364.

  • Wei, L., & Keogh, E. (2006). Semi-supervised time series classification. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 748–753). New York, NY, U.S.A.: ACM.

  • Wei, L., Keogh, E., & Xi, X. (2006). SAXually explicit images: Finding unusual shapes. In Proceedings of the IEEE international conference on data mining. Hong Kong.

  • Vlachos, M., Gunopulos, D., & Kollios, G. (2002). Discovering similar multidimensional trajectories. In Proceedings of the 18th international conference on data engineering.

  • Xing, Z., Pei, J., Yu, P., & Wang, K. (2011). Extracting interpretable features for early classification on time series. In Proceedings of SDM, 2011.

  • Ye, L., & Keogh, E. (2009). Time series shapelets: A new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’09). New York, NY.

Author information

Corresponding author

Correspondence to Jessica Lin.

About this article

Cite this article

Lin, J., Khade, R. & Li, Y. Rotation-invariant similarity in time series using bag-of-patterns representation. J Intell Inf Syst 39, 287–315 (2012). https://doi.org/10.1007/s10844-012-0196-5
