Data Mining, pp. 255-276

PCA-based Time Series Similarity Search

  • Leonidas Karamitopoulos
  • Georgios Evangelidis
  • Dimitris Dervos
Part of the Annals of Information Systems book series (AOIS, volume 8)


We propose a novel approach to multivariate time series similarity search. Its goal is to improve the efficiency of data mining techniques without substantially affecting the quality of the results. Our approach uses a representation based on principal component analysis (PCA) to reduce the intrinsically high dimensionality of time series. As its distance measure, it uses a variant of the squared prediction error (SPE), a well-known statistic in the Statistical Process Control community. Unlike other PCA-based measures proposed in the literature, the proposed measure does not require applying the computationally expensive PCA technique to the query. In this chapter, we investigate the usefulness of our approach in the context of query by content and 1-NN classification. More specifically, we consider the case where new objects arrive frequently and must either be matched with the most similar objects in a database or be classified into one of several predetermined classes. We conduct experiments on four data sets used extensively in the literature. We report the performance of our measure and of other PCA-based measures in terms of classification accuracy and precision/recall. The experiments indicate that our approach is at least comparable to other PCA-based measures and is a promising option for similarity search in data mining.
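The abstract does not spell out the exact SPE variant. What follows is therefore only a minimal sketch of the general idea, assuming NumPy, and not the chapter's own method.

- The names `fit_pca_model`, `spe_distance`, and `classify_1nn` are hypothetical.
- Two choices are assumptions rather than the chapter's definitions: centering the query with the database object's column means, and averaging the per-instance SPE over the query's rows.

Each database object is an m × n array of m time instances over n variables. It gets a PCA model computed offline. The query is only projected onto that stored model, so no PCA is run on the query.

```python
import numpy as np


def fit_pca_model(series, k):
    """Fit a PCA model (offline) to one multivariate time series.

    series: array of shape (m, n), m time instances x n variables.
    Returns the column means and the n x k matrix of leading loadings.
    """
    m, n = series.shape
    if not 1 <= k <= min(m, n):
        raise ValueError("k must be between 1 and min(m, n)")
    mean = series.mean(axis=0)
    centered = series - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k].T


def spe_distance(query, model):
    """SPE-style distance of a query under a stored PCA model.

    Assumption (not taken from the chapter): the query is centered with the
    model's means, and the squared residual norm is averaged over query rows.
    Only the stored model is used; no PCA is computed on the query.
    """
    mean, loadings = model
    centered = query - mean
    # Residual = the part of each row not captured by the k retained components.
    residual = centered - (centered @ loadings) @ loadings.T
    return float(np.mean(np.sum(residual ** 2, axis=1)))


def classify_1nn(query, models, labels):
    """Return the label of the stored object with the smallest SPE distance."""
    distances = [spe_distance(query, model) for model in models]
    return labels[int(np.argmin(distances))]
```

The query is only multiplied by a stored n × k loading matrix. Each comparison therefore costs O(mnk) arithmetic and needs no eigendecomposition of the query, which is the efficiency argument the abstract makes. This SPE-style distance is also asymmetric. The distance of a query under an object's model generally differs from the distance of that object under a model fitted to the query.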


Keywords: Time Series, Similarity Search, Time Instance, Multivariate Time Series, Query Object
These keywords were added by machine and not by the authors.


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece
  2. Information Technology Department, Alexander Technology Educational Institute of Thessaloniki, Sindos, Greece
