
PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning

Regular Article · Advances in Data Analysis and Classification

Abstract

Dimensionality reduction algorithms are powerful mathematical tools for data analysis and visualization. In many pattern recognition applications, a feature extraction step is often required to mitigate the curse of dimensionality, a collection of negative effects caused by an arbitrary increase in the number of features in classification tasks. Principal Component Analysis (PCA) is a classical statistical method that creates new features as linear combinations of the original ones, defined by the eigenvectors of the covariance matrix. In this paper, we propose PCA-KL, a parametric dimensionality reduction algorithm for unsupervised metric learning based on the computation of the entropic covariance matrix, a surrogate for the covariance matrix of the data obtained from the relative entropy between local Gaussian distributions rather than from the usual Euclidean distance between data points. Numerical experiments with several real datasets show that the proposed method produces better-defined clusters and higher classification accuracy than regular PCA and several manifold learning algorithms, making PCA-KL a promising alternative for unsupervised metric learning.
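To make the idea concrete, the sketch below shows one way an entropic-covariance-style embedding can be assembled. It is an illustrative approximation under stated assumptions, not the paper's exact algorithm: the local models are assumed to be isotropic Gaussians fitted to each point's k nearest neighbors, the symmetrized Kullback-Leibler divergence between these models replaces Euclidean distances between points, and the divergence matrix is converted into a Gram-like surrogate by classical MDS-style double centering before the PCA-like eigendecomposition. The function name pca_kl_sketch and all parameter defaults are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def pca_kl_sketch(X, k=10, n_components=2):
    """Hypothetical PCA-KL-style embedding (illustrative only).

    Fits an isotropic Gaussian N(mu_i, s_i * I) to the k nearest
    neighbors of each sample, measures symmetrized KL divergence
    between those local models, and embeds the divergence matrix
    via MDS-style double centering and an eigendecomposition.
    """
    n, d = X.shape

    # Local Gaussian models from each point's k-nearest-neighbor patch
    # (self excluded); a single pooled variance keeps the KL closed-form.
    nbrs = np.argsort(cdist(X, X), axis=1)[:, 1:k + 1]
    mu = np.stack([X[idx].mean(axis=0) for idx in nbrs])
    s = np.array([X[idx].var() for idx in nbrs]) + 1e-8

    # Symmetrized KL between isotropic Gaussians, in closed form:
    # KL(p_i||p_j) + KL(p_j||p_i)
    #   = 0.5 * [ d*(s_i/s_j + s_j/s_i - 2)
    #             + ||mu_i - mu_j||^2 * (1/s_i + 1/s_j) ]
    s_i, s_j = s[:, None], s[None, :]
    m2 = cdist(mu, mu) ** 2
    skl = 0.5 * (d * (s_i / s_j + s_j / s_i - 2.0)
                 + m2 * (1.0 / s_i + 1.0 / s_j))

    # Entropic surrogate: double-center the divergence matrix into a
    # Gram-like matrix and keep its leading eigenvectors, mirroring
    # what PCA does with the covariance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ skl @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_components]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# Example usage on a small real dataset:
# from sklearn.datasets import load_iris
# Y = pca_kl_sketch(load_iris().data, k=15)   # (150, 2) embedding
```

The double-centering step is borrowed from classical multidimensional scaling and stands in for the paper's entropic covariance matrix; the key point it illustrates is that a divergence between local distributions, rather than a distance between raw points, drives the spectral decomposition.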



Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES)-Finance Code 001.

Author information


Corresponding author

Correspondence to Alexandre L. M. Levada.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Levada, A.L.M. PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning. Adv Data Anal Classif 15, 829–868 (2021). https://doi.org/10.1007/s11634-020-00434-3

