Abstract
Dimensionality reduction algorithms are powerful mathematical tools for data analysis and visualization. In many pattern recognition applications, a feature extraction step is required to mitigate the curse of dimensionality, the collection of negative effects caused by an arbitrary increase in the number of features in classification tasks. Principal Component Analysis (PCA) is a classical statistical method that builds new features as linear combinations of the original ones through the eigenvectors of the covariance matrix. In this paper, we propose PCA-KL, a parametric dimensionality reduction algorithm for unsupervised metric learning based on the computation of the entropic covariance matrix: a surrogate for the covariance matrix of the data, obtained from the relative entropy between local Gaussian distributions instead of the usual Euclidean distance between data points. Numerical experiments with several real datasets show that the proposed method produces better-defined clusters and higher classification accuracy than regular PCA and several manifold learning algorithms, making PCA-KL a promising alternative for unsupervised metric learning.
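The core idea, replacing Euclidean distances with relative entropies between local Gaussian models before an eigendecomposition, can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it assumes diagonal local Gaussians fitted to each point's k nearest neighbours, a symmetrised KL divergence, and an MDS-style double centering to turn the divergence matrix into a Gram-matrix surrogate; the function name `pca_kl_embedding` and all parameter choices are illustrative.

```python
import numpy as np

def pca_kl_embedding(X, n_neighbors=10, n_components=2, eps=1e-6):
    """Sketch of a KL-divergence-based surrogate for PCA (see assumptions above)."""
    n, d = X.shape
    # 1. Fit a local diagonal Gaussian to each point's k nearest neighbours
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]   # skip the point itself
    mu = X[idx].mean(axis=1)                             # (n, d) local means
    var = X[idx].var(axis=1) + eps                       # (n, d) local variances

    # 2. Symmetrised KL divergence between local diagonal Gaussians
    def kl(i, j):
        return 0.5 * np.sum(np.log(var[j] / var[i])
                            + (var[i] + (mu[i] - mu[j]) ** 2) / var[j] - 1.0)

    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = kl(i, j) + kl(j, i)

    # 3. Double-centre the divergence matrix (classical-MDS-style Gram surrogate)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ K @ J

    # 4. Top eigenvectors of the surrogate give the low-dimensional coordinates
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))
```

Because the divergence matrix need not be Euclidean, negative eigenvalues can appear after centering; the sketch simply clips them at zero when scaling the coordinates.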
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES)-Finance Code 001.
Cite this article
Levada, A.L.M. PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning. Adv Data Anal Classif 15, 829–868 (2021). https://doi.org/10.1007/s11634-020-00434-3