A Feature Selection Method Using Conditional Correlation Dispersion and Redundancy Analysis

Zhang, Li

doi:10.1007/s11063-023-11256-7

Published: 31 March 2023

Neural Processing Letters, Volume 55, pages 7175–7209 (2023)

Li Zhang ORCID: orcid.org/0000-0002-9306-7778¹

Abstract

High-dimensional small-sample data commonly contain many irrelevant and redundant features. Feature selection addresses this problem by removing such features, which in turn improves the accuracy of the learning algorithm. In some information-theoretic feature selection algorithms, however, the criterion depends on parameters fixed in advance, so that choosing different parameter values amounts to choosing a different feature selection algorithm. How to avoid these predetermined parameters by setting them dynamically has become a pressing problem. This paper proposes a dynamic weighted conditional relevance dispersion and redundancy analysis (WRRFS) algorithm for feature selection. First, the algorithm uses mutual information to calculate feature correlations and the redundancy between features. Second, it calculates the mean of the feature correlation terms and dynamically adjusts the parameter weights of the conditional feature correlation terms using the standard deviation. Finally, WRRFS is compared with other feature selection algorithms on three classifiers and 12 datasets, using the classification metrics f1_macro, f1_micro, and f1_weighted. The experimental results show that WRRFS can improve the quality of the selected feature subsets and increase classification accuracy.
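
The abstract does not state the WRRFS criterion itself, so the following is only a minimal sketch of the general idea, not the author's formula. It assumes discrete features, takes "feature correlation" to mean mutual information with the class label, and uses a hypothetical weight `1 / (1 + std)` to damp the mean conditional relevance when the conditional terms are widely dispersed. The names `entropy`, `mutual_info`, `cond_mutual_info`, and `select_features` are illustrative.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def entropy(*cols):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def select_features(features, label, k):
    """Greedy forward selection of up to k features.

    features maps a feature name to a list of discrete values.
    The scoring rule is an assumption made for illustration only.
    """
    selected = []
    remaining = list(features)

    def score(name):
        f = features[name]
        if not selected:
            # First pick: plain relevance to the class label.
            return mutual_info(f, label)
        # Conditional relevance and redundancy w.r.t. each selected feature.
        cond = [cond_mutual_info(f, label, features[s]) for s in selected]
        red = [mutual_info(f, features[s]) for s in selected]
        # Hypothetical dynamic weight driven by the dispersion of cond.
        weight = 1.0 / (1.0 + pstdev(cond))
        return weight * mean(cond) - mean(red)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a" determines the label, "b" duplicates "a", "c" is unrelated.
label = [0, 0, 1, 1, 0, 0, 1, 1]
toy = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
chosen = select_features(toy, label, k=2)
print(chosen)
```

On this toy data the duplicate feature "b" is penalised for its redundancy with "a", so the second pick falls on "c" even though "c" carries no information about the label. This shows the redundancy term at work, not a useful selection.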

Funding

This study was supported by the Jiangsu University of Technology Doctoral Research Start-up Fund (Grant No. KYY19042).

Author information

Authors and Affiliations

College of Computer Engineering, Jiangsu University of Technology, Changzhou, 213001, China
Li Zhang

Corresponding author

Correspondence to Li Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, L. A Feature Selection Method Using Conditional Correlation Dispersion and Redundancy Analysis. Neural Process Lett 55, 7175–7209 (2023). https://doi.org/10.1007/s11063-023-11256-7

Accepted: 13 March 2023
Published: 31 March 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11063-023-11256-7
