Abstract
To classify time series by nearest neighbors, we need to specify or learn one or several distance measures. We consider variations of the Mahalanobis distance measure, which relies on the inverse covariance matrix of the data. Unfortunately, for time series data, the covariance matrix often has low rank. To alleviate this problem, we can use a pseudoinverse, apply covariance shrinking, or limit the matrix to its diagonal. We review these alternatives and benchmark them against competitive methods such as the related Large Margin Nearest Neighbor Classification (LMNN) and the Dynamic Time Warping (DTW) distance. As expected, we find that DTW is superior, but the Mahalanobis distance measures are one to two orders of magnitude faster. To get the best results with Mahalanobis distance measures, we recommend learning one distance measure per class using either covariance shrinking or the diagonal approach.
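As an illustration of the diagonal approach with one distance measure per class, the following is a minimal sketch and not the authors' implementation. It assumes equal-length series, estimates a per-position variance for each class, and measures the distance from a query to a training series with the weights of that training series' class. That assignment rule, the `eps` variance floor, and all function names are assumptions introduced for this example.

```python
from statistics import pvariance


def diagonal_inverse_variances(series_list, eps=1e-8):
    """Inverse per-position variances for the series of one class.

    This is the diagonal of the inverse covariance matrix when all
    off-diagonal entries are ignored. `eps` is a hypothetical floor that
    avoids division by zero at positions where the class is constant.
    """
    length = len(series_list[0])
    return [
        1.0 / max(pvariance([s[t] for s in series_list]), eps)
        for t in range(length)
    ]


def diagonal_mahalanobis_sq(x, y, inv_var):
    """Squared Mahalanobis distance restricted to a diagonal matrix."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, inv_var))


def fit_class_specific(train):
    """Learn one weight vector per class from (series, label) pairs."""
    by_class = {}
    for series, label in train:
        by_class.setdefault(label, []).append(series)
    return {
        label: diagonal_inverse_variances(members)
        for label, members in by_class.items()
    }


def classify_1nn(query, train, weights):
    """1-nearest-neighbor label for `query`.

    Assumption: the distance to each training series uses the weights
    learned for that training series' class.
    """
    best_label, best_dist = None, float("inf")
    for series, label in train:
        d = diagonal_mahalanobis_sq(query, series, weights[label])
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Because only the diagonal is estimated, fitting and each distance evaluation take time linear in the series length, and the low-rank problem of the full covariance matrix does not arise. The pseudoinverse and shrinkage variants need a full matrix inverse and are not shown here.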
Cite this article
Prekopcsák, Z., Lemire, D. Time series classification by class-specific Mahalanobis distance measures. Adv Data Anal Classif 6, 185–200 (2012). https://doi.org/10.1007/s11634-012-0110-6
DOI: https://doi.org/10.1007/s11634-012-0110-6