Abstract
Linear discriminant analysis (LDA) is a popular dimensionality reduction method in machine learning. However, LDA cannot be applied directly to unlabelled data because it requires class labels for training, so a clustering algorithm must first be used to predict the class labels. Different clustering algorithms, in turn, require different parameters to be specified. The objective of this paper is to investigate how these parameters behave with respect to a measurement criterion for feature selection, namely the total error reduction ratio (TERR). The k-means and Gaussian mixture distribution algorithms were adopted for clustering, and each was tested on four datasets with four distinct clustering evaluation criteria: Calinski-Harabasz, Davies-Bouldin, Gap, and Silhouette. Overall, k-means outperforms the Gaussian mixture distribution in selecting smaller feature subsets. It was found that when a certain TERR threshold is set and k-means is applied, the Calinski-Harabasz, Davies-Bouldin, and Silhouette criteria yield the same number of selected features, fewer than the subset size given by the Gap criterion. When the Gaussian mixture distribution algorithm is adopted, no single criterion consistently selects the fewest features. The higher the TERR threshold, the larger the selected feature subset, regardless of which clustering algorithm and evaluation criterion are used. These results provide an essential direction for future work on designing robust unsupervised feature selection based on LDA.
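The two-stage pipeline the abstract describes — cluster the unlabelled data, then train LDA on the resulting pseudo-labels — can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the TERR criterion itself is omitted, only one clustering algorithm (k-means) and one evaluation criterion (Silhouette) are shown, and the Iris data stands in for the paper's datasets.

```python
# Sketch (assumed scikit-learn API): cluster unlabelled data to obtain
# pseudo-labels, then fit LDA on those labels. The paper's TERR-based
# feature selection step is NOT implemented here.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = load_iris().data  # treated as unsupervised: the true labels are discarded

# Step 1: choose the number of clusters with a clustering evaluation
# criterion (Silhouette shown; Calinski-Harabasz and Davies-Bouldin work
# analogously via calinski_harabasz_score / davies_bouldin_score).
best_k, best_score = None, -np.inf
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

# Step 2: re-cluster with the selected k and use the pseudo-labels as the
# class labels that LDA needs for training.
pseudo = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
lda = LinearDiscriminantAnalysis().fit(X, pseudo)
X_reduced = lda.transform(X)  # at most (best_k - 1) discriminant components
```

The parameters that must be specified at the clustering stage (the candidate range of k, initialization, the evaluation criterion) are exactly the ones whose interaction with the TERR threshold the paper investigates.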
Acknowledgements
The authors would like to thank the International Islamic University Malaysia (IIUM), Universiti Malaysia Pahang (UMP) and the Universiti Teknologi MARA (UiTM) for providing financial support under the IIUM-UMP-UiTM Sustainable Research Collaboration Grant 2020 (Vote Number: RDU200722).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Tie, K.H., Senawi, A., Chuan, Z.L. (2022). An Observation of Different Clustering Algorithms and Clustering Evaluation Criteria for a Feature Selection Based on Linear Discriminant Analysis. In: Khairuddin, I.M., et al. Enabling Industry 4.0 through Advances in Mechatronics. Lecture Notes in Electrical Engineering, vol 900. Springer, Singapore. https://doi.org/10.1007/978-981-19-2095-0_42
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2094-3
Online ISBN: 978-981-19-2095-0
eBook Packages: Intelligent Technologies and Robotics (R0)