Abstract
Dimensionality reduction algorithms are powerful mathematical tools for data analysis and visualization. In many pattern recognition applications, a feature extraction step is required to mitigate the curse of dimensionality, the collection of negative effects caused by an arbitrary increase in the number of features in classification tasks. Principal Component Analysis (PCA) is a classical statistical method that builds new features as linear combinations of the original ones through the eigenvectors of the covariance matrix. In this paper, we propose PCA-KL, a parametric dimensionality reduction algorithm for unsupervised metric learning based on the computation of the entropic covariance matrix: a surrogate for the covariance matrix of the data, obtained from the relative entropy between local Gaussian distributions instead of the usual Euclidean distance between data points. Numerical experiments with several real datasets show that the proposed method produces better-defined clusters and higher classification accuracy than regular PCA and several manifold learning algorithms, making PCA-KL a promising alternative for unsupervised metric learning.
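The core idea, replacing Euclidean distances with relative entropies between local Gaussian models before an eigendecomposition, can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it assumes diagonal local Gaussians fitted to each point's k nearest neighbours, a symmetrised KL divergence, and an MDS-style double centering to turn the divergence matrix into a Gram-matrix surrogate; the function name `pca_kl_embedding` and all parameter choices are illustrative.

```python
import numpy as np

def pca_kl_embedding(X, n_neighbors=10, n_components=2, eps=1e-6):
    """Sketch of a KL-divergence-based surrogate for PCA (see assumptions above)."""
    n, d = X.shape
    # 1. Fit a local diagonal Gaussian to each point's k nearest neighbours
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]   # skip the point itself
    mu = X[idx].mean(axis=1)                             # (n, d) local means
    var = X[idx].var(axis=1) + eps                       # (n, d) local variances

    # 2. Symmetrised KL divergence between local diagonal Gaussians
    def kl(i, j):
        return 0.5 * np.sum(np.log(var[j] / var[i])
                            + (var[i] + (mu[i] - mu[j]) ** 2) / var[j] - 1.0)

    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = kl(i, j) + kl(j, i)

    # 3. Double-centre the divergence matrix (classical-MDS-style Gram surrogate)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ K @ J

    # 4. Top eigenvectors of the surrogate give the low-dimensional coordinates
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))
```

Because the divergence matrix need not be Euclidean, negative eigenvalues can appear after centering; the sketch simply clips them at zero when scaling the coordinates.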
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES)-Finance Code 001.
Cite this article
Levada, A.L.M. PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning. Adv Data Anal Classif 15, 829–868 (2021). https://doi.org/10.1007/s11634-020-00434-3