PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning

Abstract

Dimensionality reduction algorithms are powerful mathematical tools for data analysis and visualization. In many pattern recognition applications, a feature extraction step is often required to mitigate the curse of dimensionality, a collection of negative effects caused by an arbitrary increase in the number of features in classification tasks. Principal Component Analysis (PCA) is a classical statistical method that creates new features as linear combinations of the original ones, using the eigenvectors of the covariance matrix. In this paper, we propose PCA-KL, a parametric dimensionality reduction algorithm for unsupervised metric learning based on the computation of the entropic covariance matrix, a surrogate for the covariance matrix of the data obtained from the relative entropy (KL divergence) between local Gaussian distributions rather than from the usual Euclidean distance between data points. Numerical experiments with several real datasets show that the proposed method produces better-defined clusters and higher classification accuracy than regular PCA and several manifold learning algorithms, making PCA-KL a promising alternative for unsupervised metric learning.
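To make the pipeline described in the abstract concrete, the following Python snippet is a minimal sketch of one plausible reading of PCA-KL: fit a local Gaussian to each point's k nearest neighbors, replace pairwise Euclidean distances with symmetrized KL divergences between these local models, and eigendecompose a double-centered version of the resulting divergence matrix as the "entropic covariance" surrogate. The neighborhood size `k`, the diagonal-covariance simplification, and the MDS-style double-centering are illustrative assumptions, not the exact formulation from the paper.

```python
# Hypothetical sketch of the PCA-KL idea from the abstract. The kNN-based
# local Gaussians, the symmetrized KL divergence, and the double-centering
# step are assumptions made for illustration, not the paper's exact method.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def kl_gaussians(mu1, var1, mu2, var2):
    """Symmetrized KL divergence between two diagonal Gaussians."""
    kl12 = 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(var1 / var2) + (var2 + (mu2 - mu1) ** 2) / var1 - 1.0)
    return 0.5 * (kl12 + kl21)

def pca_kl(X, n_components=2, k=10, eps=1e-6):
    n, d = X.shape
    # Fit a local diagonal Gaussian to the k nearest neighbors of each sample
    # (the query point itself is included as the first neighbor).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    mus = np.array([X[i].mean(axis=0) for i in idx])
    vars_ = np.array([X[i].var(axis=0) + eps for i in idx])
    # Pairwise symmetrized KL divergences between the local models.
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = kl_gaussians(mus[i], vars_[i], mus[j], vars_[j])
    # Double-center the divergence matrix (as in classical MDS) to obtain a
    # Gram-like "entropic covariance" surrogate, then eigendecompose it.
    J = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * J @ D @ J
    w, V = np.linalg.eigh(K)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.abs(w[order]))
```

The O(n^2) divergence loop and the classical-MDS centering keep the sketch short; how closely the resulting embedding matches the published method depends on how well these assumed choices track the paper's actual construction of the entropic covariance matrix.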

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

Author information

Corresponding author

Correspondence to Alexandre L. M. Levada.

Cite this article

Levada, A.L.M. PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning. Adv Data Anal Classif (2021). https://doi.org/10.1007/s11634-020-00434-3

Keywords

  • Dimensionality reduction
  • PCA
  • KL-divergence
  • Unsupervised metric learning

Mathematics Subject Classification

  • 62H30
  • 94A16
  • 94A17
  • 68T10