Abstract
This study proposes a new linear dimension reduction technique called Maximizing Adjusted Covariance (MAC), designed for supervised classification. The approach adjusts the covariance matrix between the input and target variables using the within-class sum of squares, thereby promoting class separation after linear dimension reduction. MAC has a low computational cost and can complement existing linear dimensionality reduction techniques for classification. In this study, the classification performance of MAC was compared with that of existing linear dimension reduction methods on 44 datasets. For most of the classification models used in the experiment, the MAC dimension reduction method achieved better classification accuracy and F1 scores than the other linear dimension reduction methods.
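The abstract's description can be illustrated with a minimal sketch. The authors' exact objective is given in Appendix A; the code below is only one plausible reading of "adjusting the input-target covariance by the within-class sum of squares": the cross-covariance between centered inputs and one-hot class indicators is scaled by the (regularized) inverse of the within-class scatter, and the top singular vectors of the adjusted matrix serve as projection directions. The function name `mac_directions`, the ridge term, and the precise form of the adjustment are all assumptions for illustration, not the authors' definition.

```python
import numpy as np

def mac_directions(X, y, k=2, ridge=1e-6):
    """Hypothetical sketch of a MAC-style linear projection.

    Builds the cross-covariance between centered inputs and one-hot
    class indicators, adjusts it by the inverse within-class sum of
    squares, and returns the top-k left singular vectors.
    """
    X = np.asarray(X, dtype=float)
    classes, idx = np.unique(y, return_inverse=True)
    n, p = X.shape

    Xc = X - X.mean(axis=0)
    # Centered one-hot target matrix (n x c)
    Y = np.eye(len(classes))[idx]
    Yc = Y - Y.mean(axis=0)

    # Cross-covariance between inputs and targets (p x c)
    C = Xc.T @ Yc / n

    # Within-class sum of squares (p x p)
    W = np.zeros((p, p))
    for g in range(len(classes)):
        Xg = X[idx == g]
        Xg = Xg - Xg.mean(axis=0)
        W += Xg.T @ Xg

    # Adjust the covariance by W; the ridge keeps the solve stable
    A = np.linalg.solve(W + ridge * np.eye(p), C)

    # Left singular vectors of the adjusted matrix span the subspace
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

# Usage: project a toy two-class dataset onto one direction
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)) + [2, 0, 0],
               rng.normal(0, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
V = mac_directions(X, y, k=1)
Z = X @ V  # reduced representation, shape (100, 1)
```

Under this reading, dividing by the within-class scatter penalizes directions along which classes are internally spread out, which is the stated mechanism for promoting class separation after projection.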
Data Availability
The datasets are publicly available and can be downloaded from their respective repositories.
Funding
Hyunjoong Kim’s work was supported by the National Research Foundation of Korea (NRF) grant (No. NRF-2016R1D1A1B02011696) and by the ICAN support program (IITP-2023-00259934) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), funded by the Korean government (Ministry of Science and ICT). Yung-Seop Lee’s work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2021R1A2C1007095).
Author information
Authors and Affiliations
Contributions
Hyejoon Park provided the core idea of the study, implemented it, and conducted the data experiments. Hyunjoong Kim provided the core idea of the study and wrote the manuscript. Yung-Seop Lee provided the conception and design of the project. All authors contributed significantly to the analysis and interpretation of the data, reviewed and gave final approval of the manuscript, and agreed to be accountable for all aspects of the work.
Corresponding authors
Ethics declarations
Conflict of interest
All the authors declare no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Objective function of MAC
Appendix B: Optimizing
Appendix C: Difference in numerator between CLDA and MAC
This section mathematically compares the numerators of the CLDA and MAC objective functions.
(C5) is more sensitive to class-specific details than (C2) because it incorporates the number of observations in each class.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Park, H., Kim, H. & Lee, YS. Maximizing adjusted covariance: new supervised dimension reduction for classification. Comput Stat (2024). https://doi.org/10.1007/s00180-024-01472-7