
Polynomial Kernel Discriminant Analysis for 2D visualization of classification problems

Abstract

In multivariate classification problems, 2D visualization methods help to understand the properties of the data when they transform the n-dimensional data into a set of 2D patterns that are similar to the original data from the classification point of view. Similarity here means that a classification method behaves similarly on the original n-dimensional patterns and on the 2D mapped patterns, i.e., the classifier performance should not be much lower on the mapped patterns than on the original ones. We propose several simple and efficient mapping methods that allow classification problems to be visualized in 2D. In order to preserve the structure of the original classification problem, the mappings minimize different class overlap measures, combined with different functions (linear, quadratic and polynomial of several degrees) from \({\mathbb {R}}^n\) to \({\mathbb {R}}^2\). They can also map into \({\mathbb {R}}^2\) new (out-of-sample) data points that were not used to learn the mapping. This is one of the main benefits of the proposed methods, since few supervised mappings offer similar behavior. For 71 data sets of the UCI database, we compare the SVM performance using the original and the 2D mapped patterns. The comparison also includes 34 other popular supervised and unsupervised dimensionality reduction methods, some of them used for the first time in classification. One of the proposed methods, the Polynomial Kernel Discriminant Analysis of degree 2 (PKDA2), outperforms the remaining mappings. Compared to the original n-dimensional patterns, PKDA2 achieves 82% of the performance (measured by Cohen's kappa), raising or keeping the performance for 26.8% of the data sets. For 36.6% of the data sets, the performance is reduced by less than 10%, and it is reduced by more than 20% for only 22.5% of the data sets. This low reduction in performance shows that the 2D maps created by PKDA2 represent the original data well, and that the ability of the data to be classified is largely preserved in 2D. Besides, PKDA is very fast, with times of the same order as LDA. The MATLAB code is available.
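The general idea of a supervised 2D mapping that also handles out-of-sample points can be sketched in a few lines. This is only an illustration under stated assumptions, not the PKDA2 method of the paper: the abstract does not give the formulation or the class overlap measures, so the sketch swaps in an explicit degree-2 polynomial feature expansion followed by a standard Fisher discriminant projection onto two directions. All function names (`poly2_features`, `fit_map_2d`, `apply_map_2d`, `cohen_kappa`), the ridge parameter `reg`, and the synthetic data are hypothetical.

```python
import numpy as np

def poly2_features(X):
    """Explicit degree-2 expansion: the inputs plus all products x_i * x_j with i <= j."""
    i, j = np.triu_indices(X.shape[1])
    return np.hstack([X, X[:, i] * X[:, j]])

def fit_map_2d(X, y, reg=1e-6):
    """Learn a 2D Fisher discriminant projection of the expanded patterns.

    Assumption: at least 3 classes, otherwise the second axis carries no class information.
    """
    Z = poly2_features(X)
    mean = Z.mean(axis=0)
    d = Z.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)                    # within-class scatter
        Sb += len(Zc) * np.outer(mc - mean, mc - mean)   # between-class scatter
    Sw += reg * np.trace(Sw) / d * np.eye(d)             # small ridge keeps Sw positive definite
    # Whiten with the Cholesky factor of Sw, then solve a symmetric eigenproblem.
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))
    _, vecs = np.linalg.eigh(Linv @ Sb @ Linv.T)         # eigenvalues in ascending order
    W = Linv.T @ vecs[:, [-1, -2]]                       # two leading discriminant directions
    return mean, W

def apply_map_2d(X, mean, W):
    """Map any patterns, including out-of-sample ones, with the learned projection."""
    return (poly2_features(X) - mean) @ W

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance agreement."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    C = np.zeros((len(labels), len(labels)))
    np.add.at(C, (np.searchsorted(labels, y_true), np.searchsorted(labels, y_pred)), 1)
    n = C.sum()
    p_o = np.trace(C) / n                                # observed agreement
    p_e = C.sum(axis=1) @ C.sum(axis=0) / n**2           # agreement expected by chance
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Synthetic 3-class problem in 4 dimensions (illustrative data only).
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]], dtype=float)
y_train = np.repeat(np.arange(3), 50)
X_train = centers[y_train] + rng.normal(size=(150, 4))

mean, W = fit_map_2d(X_train, y_train)
X_new = centers + rng.normal(size=(3, 4))    # points not used to learn the mapping
P_new = apply_map_2d(X_new, mean, W)         # their 2D coordinates
```

Because the projection is an explicit function of the input, new points are mapped by evaluating it, with no need to recompute the embedding. To reproduce the kind of comparison described above, one could train the same classifier on the original and on the mapped patterns and compare the two values returned by `cohen_kappa` on held-out data.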

Notes

  1. https://wiki.citius.usc.es/inv:downloadable_results:fish_ovary.

  2. http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/mappings.

Funding

This work was funded by the program Erasmus Mundus Acción 2, Strand 1, Lot 2, PEACE II, with Project Code 2013-2443/001-001.

Author information

Correspondence to Manuel Fernández-Delgado.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Cite this article

Alawadi, S., Fernández-Delgado, M., Mera, D. et al. Polynomial Kernel Discriminant Analysis for 2D visualization of classification problems. Neural Comput & Applic 31, 3515–3531 (2019). https://doi.org/10.1007/s00521-017-3290-3

Keywords

  • Classification
  • Data visualization
  • Mapping
  • Dimensionality reduction
  • Class overlap
  • Discriminant analysis