An Observation of Different Clustering Algorithms and Clustering Evaluation Criteria for a Feature Selection Based on Linear Discriminant Analysis

  • Conference paper
  • First Online:
Enabling Industry 4.0 through Advances in Mechatronics

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 900))

Abstract

Linear discriminant analysis (LDA) is a very popular method for dimensionality reduction in machine learning. However, LDA cannot be applied directly to unsupervised data because it requires class labels to train the algorithm. Thus, a clustering algorithm is needed to predict the class labels before LDA can be utilized. Different clustering algorithms, however, have different parameters that must be specified. The objective of this paper is to investigate how these parameters behave with respect to a measurement criterion for feature selection, namely the total error reduction ratio (TERR). The k-means and the Gaussian mixture distribution were adopted as the clustering algorithms, and each algorithm was tested on four datasets with four distinct clustering evaluation criteria: Calinski-Harabasz, Davies-Bouldin, Gap and Silhouette. Overall, k-means outperforms the Gaussian mixture distribution in selecting smaller feature subsets. It was found that if a certain TERR threshold value is set and the k-means algorithm is applied, the Calinski-Harabasz, Davies-Bouldin and Silhouette criteria yield the same number of selected features, which is smaller than the feature subset size given by the Gap criterion. When the Gaussian mixture distribution algorithm is adopted, no criterion consistently selects the fewest features. The higher the TERR threshold value is set, the larger the selected feature subset becomes, regardless of the clustering algorithm and clustering evaluation criterion used. These results are essential in setting a future work direction for designing a robust unsupervised feature selection method based on LDA.
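The pipeline the abstract describes — cluster the unlabeled data to obtain pseudo-labels, then use those labels to fit LDA — can be sketched as follows. This is a minimal illustration in scikit-learn, not the authors' implementation: the TERR feature-selection criterion is not reproduced here, the dataset and parameter choices are placeholders, and only one of the four evaluation criteria (Silhouette) is used to choose the number of clusters.

```python
# Sketch of the clustering-then-LDA pipeline from the abstract (assumptions:
# iris as a stand-in dataset, Silhouette as the cluster-count criterion).
# The paper's TERR feature-selection step is NOT implemented here.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = load_iris().data  # treated as unlabeled data: the targets are ignored

# Step 1: choose the number of clusters k with an internal evaluation
# criterion (Silhouette is maximized; Davies-Bouldin would be minimized,
# Calinski-Harabasz maximized, in the same loop structure).
best_k, best_score = None, -np.inf
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

# Step 2: generate pseudo-labels with the chosen k, using either
# k-means or a Gaussian mixture (the two algorithms the paper compares).
kmeans_labels = KMeans(n_clusters=best_k, n_init=10,
                       random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=best_k,
                             random_state=0).fit_predict(X)

# Step 3: the pseudo-labels make the otherwise unsupervised data usable
# by LDA, which projects onto at most (n_clusters - 1) discriminant axes.
lda = LinearDiscriminantAnalysis().fit(X, kmeans_labels)
X_reduced = lda.transform(X)
print(best_k, X_reduced.shape)
```

With pseudo-labels in place, a wrapper criterion such as the paper's TERR can then score candidate feature subsets against the LDA output; the point of the sketch is only that the label-generation step is what unlocks LDA for unlabeled data.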



Acknowledgements

The authors would like to thank the International Islamic University Malaysia (IIUM), Universiti Malaysia Pahang (UMP) and the Universiti Teknologi MARA (UiTM) for providing financial support under the IIUM-UMP-UiTM Sustainable Research Collaboration Grant 2020 (Vote Number: RDU200722).

Author information

Correspondence to A. Senawi.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Tie, K.H., Senawi, A., Chuan, Z.L. (2022). An Observation of Different Clustering Algorithms and Clustering Evaluation Criteria for a Feature Selection Based on Linear Discriminant Analysis. In: Khairuddin, I.M., et al. Enabling Industry 4.0 through Advances in Mechatronics. Lecture Notes in Electrical Engineering, vol 900. Springer, Singapore. https://doi.org/10.1007/978-981-19-2095-0_42
