Abstract
This study proposes a new linear dimension reduction technique called Maximizing Adjusted Covariance (MAC), designed for supervised classification. The approach adjusts the covariance matrix between the input and target variables using the within-class sum of squares, thereby promoting class separation after linear dimension reduction. MAC has a low computational cost and can complement existing linear dimensionality reduction techniques for classification. In this study, the classification performance of MAC was compared with that of existing linear dimension reduction methods on 44 datasets. For most of the classification models used in the experiment, the MAC dimension reduction method achieved better classification accuracy and F1 scores than the other linear dimension reduction methods.
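The abstract's description can be illustrated with a minimal sketch. The authors' exact objective is given in Appendix A; the code below is only one plausible reading of "adjusting the input-target covariance by the within-class sum of squares": the cross-covariance between centered inputs and one-hot class indicators is scaled by the (regularized) inverse of the within-class scatter, and the top singular vectors of the adjusted matrix serve as projection directions. The function name `mac_directions`, the ridge term, and the precise form of the adjustment are all assumptions for illustration, not the authors' definition.

```python
import numpy as np

def mac_directions(X, y, k=2, ridge=1e-6):
    """Hypothetical sketch of a MAC-style linear projection.

    Builds the cross-covariance between centered inputs and one-hot
    class indicators, adjusts it by the inverse within-class sum of
    squares, and returns the top-k left singular vectors.
    """
    X = np.asarray(X, dtype=float)
    classes, idx = np.unique(y, return_inverse=True)
    n, p = X.shape

    Xc = X - X.mean(axis=0)
    # Centered one-hot target matrix (n x c)
    Y = np.eye(len(classes))[idx]
    Yc = Y - Y.mean(axis=0)

    # Cross-covariance between inputs and targets (p x c)
    C = Xc.T @ Yc / n

    # Within-class sum of squares (p x p)
    W = np.zeros((p, p))
    for g in range(len(classes)):
        Xg = X[idx == g]
        Xg = Xg - Xg.mean(axis=0)
        W += Xg.T @ Xg

    # Adjust the covariance by W; the ridge keeps the solve stable
    A = np.linalg.solve(W + ridge * np.eye(p), C)

    # Left singular vectors of the adjusted matrix span the subspace
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

# Usage: project a toy two-class dataset onto one direction
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)) + [2, 0, 0],
               rng.normal(0, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
V = mac_directions(X, y, k=1)
Z = X @ V  # reduced representation, shape (100, 1)
```

Under this reading, dividing by the within-class scatter penalizes directions along which classes are internally spread out, which is the stated mechanism for promoting class separation after projection.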
Data Availability
The datasets are publicly available and can be downloaded from their respective repositories.
Funding
Hyunjoong Kim’s work was supported by the National Research Foundation of Korea (NRF) grant (No. NRF-2016R1D1A1B02011696) and by the ICAN support program (IITP-2023-00259934) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), funded by the Korean government (Ministry of Science and ICT). Yung-Seop Lee’s work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2021R1A2C1007095).
Author information
Authors and Affiliations
Contributions
Hyejoon Park provided the core idea of the study, implemented it, and conducted the data experiments. Hyunjoong Kim provided the core idea of the study and wrote the manuscript. Yung-Seop Lee provided the conception and design of the project. All authors contributed significantly to the analysis and interpretation of the data, reviewed and gave final approval of the manuscript, and agreed to be accountable for all aspects of the work.
Corresponding authors
Ethics declarations
Conflict of interest
All the authors declare no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Objective function of MAC
Appendix B: Optimizing
Appendix C: Difference in numerator between CLDA and MAC
This section mathematically compares the numerators of the CLDA and MAC objective functions.
(C5) is more sensitive to class-specific details than (C2) because it incorporates the number of observations in each class.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Park, H., Kim, H. & Lee, YS. Maximizing adjusted covariance: new supervised dimension reduction for classification. Comput Stat (2024). https://doi.org/10.1007/s00180-024-01472-7