
Time series classification by class-specific Mahalanobis distance measures


To classify time series by nearest neighbors, we need to specify or learn one or several distance measures. We consider variations of the Mahalanobis distance measure, which relies on the inverse covariance matrix of the data. Unfortunately, for time series data the covariance matrix often has low rank (for instance, when there are fewer training series than time points), so it cannot be inverted directly. To alleviate this problem, we can use a pseudoinverse or covariance shrinking, or limit the matrix to its diagonal. We review these alternatives and benchmark them against competitive methods such as the related Large Margin Nearest Neighbor Classification (LMNN) and the Dynamic Time Warping (DTW) distance. As expected, we find that DTW is superior, but the Mahalanobis distance measures are one to two orders of magnitude faster. To get the best results with Mahalanobis distance measures, we recommend learning one distance measure per class using either covariance shrinking or the diagonal approach.
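The approach summarized above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: series are equal-length rows of an array; the Mahalanobis distance is d(x, y) = sqrt((x − y)ᵀ M (x − y)) with M standing in for the inverse covariance; shrinking uses a fixed, hand-picked intensity `lam` toward the diagonal rather than an estimated optimal one; the pseudoinverse cutoff `tol` is arbitrary; and "one distance measure per class" is read as 1-NN where a query is compared with each class's training series using that class's own matrix M_c. The names `covariance_inverse`, `fit`, `predict`, `lam`, and `tol` are hypothetical.

```python
import numpy as np


def covariance_inverse(X, method="shrink", lam=0.5, tol=1e-10):
    """Return a PSD matrix standing in for the inverse covariance of the rows
    of X (one equal-length time series per row). Hypothetical helper.

    method="pinv":   pseudoinverse of the sample covariance S
    method="shrink": inverse of (1 - lam) * S + lam * diag(S), fixed lam (assumed)
    method="diag":   inverse of diag(S), i.e. per-time-point variance weighting
    """
    X = np.asarray(X, dtype=float)
    if X.shape[0] < 2:
        raise ValueError("need at least two series to estimate a covariance")
    S = np.atleast_2d(np.cov(X, rowvar=False))  # shape (length, length)
    if method == "pinv":
        # Pseudoinverse of a symmetric PSD matrix: invert only the eigenvalues
        # above a relative cutoff, drop the (numerically) zero ones.
        w, V = np.linalg.eigh(S)
        keep = w > tol * w.max()
        return (V[:, keep] / w[keep]) @ V[:, keep].T
    variances = np.maximum(np.diag(S), 1e-12)  # guard against constant time points
    if method == "diag":
        return np.diag(1.0 / variances)
    if method == "shrink":
        # Blend S with its diagonal; any lam > 0 makes the result invertible.
        return np.linalg.inv((1.0 - lam) * S + lam * np.diag(variances))
    raise ValueError(f"unknown method: {method!r}")


def fit(X, y, method="shrink", lam=0.5):
    """Learn one matrix M_c per class c from that class's training series only."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return {c: (X[y == c], covariance_inverse(X[y == c], method, lam))
            for c in np.unique(y)}


def mahalanobis(a, b, M):
    """sqrt((a - b)^T M (a - b)), clipped at zero against rounding error."""
    diff = a - b
    return float(np.sqrt(max(diff @ M @ diff, 0.0)))


def predict(model, x):
    """1-NN: compare x with every training series of class c using that class's M_c."""
    x = np.asarray(x, dtype=float)
    best_label, best_dist = None, np.inf
    for label, (Xc, M) in model.items():
        for row in Xc:
            d = mahalanobis(x, row, M)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

For example, with ten noisy sine and ten noisy cosine series of length 20 per class, each class covariance has rank at most 9, so its plain inverse does not exist; all three variants above still produce a usable matrix M_c. Because the diagonal variant never forms the full matrix inverse, it is also the cheapest of the three to compute.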




Author information

Corresponding author

Correspondence to Zoltán Prekopcsák.


About this article

Cite this article

Prekopcsák, Z., Lemire, D. Time series classification by class-specific Mahalanobis distance measures. Adv Data Anal Classif 6, 185–200 (2012).



Keywords

  • Time-series classification
  • Distance measure learning
  • Nearest neighbor
  • Mahalanobis distance measure

Mathematics Subject Classification (2000)

  • 62-07
  • 62H30