Advances in Data Analysis and Classification, Volume 6, Issue 3, pp 185–200

Time series classification by class-specific Mahalanobis distance measures

  • Zoltán Prekopcsák
  • Daniel Lemire
Regular Article


To classify time series by nearest neighbors, we need to specify or learn one or several distance measures. We consider variations of the Mahalanobis distance measure, which rely on the inverse covariance matrix of the data. Unfortunately, for time series data, the covariance matrix often has low rank. To alleviate this problem, we can use a pseudoinverse, apply covariance shrinking, or limit the matrix to its diagonal. We review these alternatives and benchmark them against competitive methods such as the related Large Margin Nearest Neighbor Classification (LMNN) and the Dynamic Time Warping (DTW) distance. As expected, we find that DTW is superior, but the Mahalanobis distance measures are one to two orders of magnitude faster. To get the best results with Mahalanobis distance measures, we recommend learning one distance measure per class using either covariance shrinking or the diagonal approach.
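The three remedies for a low-rank covariance matrix, and the idea of one distance measure per class, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed shrinkage weight `lam`, the variance floor `eps`, and the choice to measure the distance to each training series with the matrix of that series' class are all assumptions made for illustration. It also assumes every class has at least two training series of equal length.

```python
import numpy as np


def class_matrix(X_c, method="shrinkage", lam=0.1, eps=1e-12):
    """Return M for the squared distance d(x, y) = (x - y)^T M (x - y).

    X_c holds the training series of ONE class, one series per row.
    lam (shrinkage weight) and eps (variance floor) are hypothetical
    illustration values, not values taken from the article.
    """
    S = np.atleast_2d(np.cov(X_c, rowvar=False))  # sample covariance
    var = np.maximum(np.diag(S), eps)             # guard against zero variance
    if method == "pseudoinverse":
        # Moore-Penrose pseudoinverse: defined even when S is singular.
        return np.linalg.pinv(S)
    if method == "shrinkage":
        # Blend S with its own diagonal so the result becomes invertible.
        return np.linalg.inv((1.0 - lam) * S + lam * np.diag(var))
    if method == "diagonal":
        # Keep only per-position variances (a weighted Euclidean distance).
        return np.diag(1.0 / var)
    raise ValueError(f"unknown method: {method}")


def fit_class_matrices(X, y, method="shrinkage"):
    """Learn one matrix per class label."""
    return {c: class_matrix(X[y == c], method) for c in np.unique(y)}


def predict_1nn(x, X, y, matrices):
    """1-NN: the distance to each training series uses that series' class matrix."""
    best_label, best_dist = None, np.inf
    for x_i, y_i in zip(X, y):
        diff = x - x_i
        dist = diff @ matrices[y_i] @ diff
        if dist < best_dist:
            best_label, best_dist = y_i, dist
    return best_label


if __name__ == "__main__":
    # Toy data: class 0 is tightly clustered, class 1 is widely spread.
    X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                  [100, 100], [120, 100], [100, 120], [120, 120]], dtype=float)
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    matrices = fit_class_matrices(X, y, method="diagonal")
    print(predict_1nn(np.array([30.0, 30.0]), X, y, matrices))
```

In this toy data the query (30, 30) is closer to class 0 in plain Euclidean terms, yet it is assigned to class 1 because class 0 has a much smaller variance and therefore penalizes the same offset far more heavily. In practice the shrinkage weight would normally be estimated from the data rather than fixed by hand.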


Time-series classification, Distance measure learning, Nearest Neighbor, Mahalanobis distance measure

Mathematics Subject Classification (2000)

62-07 62H30 





Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. Budapest University of Technology and Economics, Budapest, Hungary
  2. LICEF, Université du Québec à Montréal (UQAM), Montreal, Canada
