Abstract
Feature selection can effectively reduce data dimensionality by selecting a relevant feature subset. Rough set theory provides a powerful theoretical framework for the feature selection of categorical data with complete labels. However, in practice, datasets often contain only a small number of labelled objects and many unlabelled ones. Furthermore, most feature selection approaches are computationally expensive. To address these problems, a semisupervised feature selection algorithm based on neighbourhood discernibility with pseudolabelled granular balls is proposed. First, a set of granular balls is generated based on purity, which reduces the universe space by sampling. Then, a neighbourhood discernibility measure is proposed to evaluate the importance of candidate features for both labelled and unlabelled objects. Finally, an ensemble voting algorithm is designed to perform feature selection, so that a feature subset with satisfactory performance is selected fairly rather than arbitrarily. Experimental results on UCI datasets verify the advantages of the proposed algorithm over other algorithms in terms of feature subset size, classification accuracy and computational time.
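The first step named in the abstract, generating granular balls according to purity, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the splitting rule, the purity threshold, or the radius definition, so the sketch assumes a simple scheme in which a ball is split in two around the centroids of its two most frequent labels until its purity reaches a threshold. It also assumes every object already carries a label or pseudolabel. All function names and parameters (`purity`, `generate_granular_balls`, `purity_threshold`, `min_size`) are hypothetical.

```python
from collections import Counter
import math


def purity(labels):
    """Fraction of objects in a ball that carry the ball's majority label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def make_ball(points, labels):
    """Summarise a group of objects as one ball (mean radius is an assumption)."""
    centre = centroid(points)
    radius = sum(math.dist(p, centre) for p in points) / len(points)
    majority = Counter(labels).most_common(1)[0][0]
    return {"centre": centre, "radius": radius, "label": majority, "size": len(points)}


def split_in_two(points, labels):
    """Assumed split rule: seed with the centroids of the two most frequent
    labels, then assign every object to its nearest seed."""
    top_two = [lab for lab, _ in Counter(labels).most_common(2)]
    seeds = [centroid([p for p, l in zip(points, labels) if l == lab]) for lab in top_two]
    halves = [([], []), ([], [])]
    for p, l in zip(points, labels):
        k = 0 if math.dist(p, seeds[0]) <= math.dist(p, seeds[1]) else 1
        halves[k][0].append(p)
        halves[k][1].append(l)
    return halves


def generate_granular_balls(points, labels, purity_threshold=1.0, min_size=2):
    """Split impure balls until each ball is pure enough or too small to split."""
    pending = [(list(points), list(labels))]
    balls = []
    while pending:
        pts, labs = pending.pop()
        if purity(labs) >= purity_threshold or len(pts) <= min_size:
            balls.append(make_ball(pts, labs))
            continue
        first, second = split_in_two(pts, labs)
        if not first[0] or not second[0]:
            # Degenerate split (e.g. coinciding seeds): keep the ball as it is.
            balls.append(make_ball(pts, labs))
            continue
        pending.extend([first, second])
    return balls


if __name__ == "__main__":
    toy_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    toy_labels = ["a", "a", "a", "b", "b", "b"]
    for ball in generate_granular_balls(toy_points, toy_labels):
        print(ball)
```

Each resulting ball stands in for the objects it covers, which is one way to read the abstract's remark that the balls reduce the universe space by sampling: later steps can work on a handful of balls instead of every object.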
Data availability and access
The data that support the findings of this study are openly available in the UCI machine learning repository at http://archive.ics.uci.edu/ml, reference number [44].
References
Khaire UM, Dhanalakshmi R (2022) Stability of feature selection algorithm: A review. Journal of King Saud University - Computer and Information Sciences 34(4):1060–1073
Li X, Wang Y et al (2020) A Survey on Sparse Learning Models for Feature Selection. IEEE Transactions on Cybernetics 52(3):1642–1660
Hancer E, Xue B et al (2022) Fuzzy filter cost-sensitive feature selection with differential evolution. Knowl-Based Syst 241:108259
Huang P, Yang X (2022) Unsupervised feature selection via adaptive graph and dependency score. Pattern Recogn 127:108622
Hja B, Bao Q (2022) On (O, G)-fuzzy rough sets based on overlap and grouping functions over complete lattices. Int J Approximate Reasoning 144:18–50
Shu W, Yan Z et al (2022) Information granularity-based incremental feature selection for partially labeled hybrid data. Intelligent Data Analysis 26(1):33–56
Hb A, Dla B et al (2022) Spatial rough set-based geographical detectors for nominal target variables. Inf Sci 586:525–539
Jxa B, Bao Q et al (2022) A novel method to attribute reduction based on weighted neighborhood probabilistic rough sets. Int J Approximate Reasoning 144:1–17
Chen B, Chen L et al (2022) Uncertainty Measurement and Attribute Reduction Algorithm Based on Kernel Similarity Rough Set Model. Journal of Mathematics 2022:5675200
Hu Q, Yu D et al (2022) Granular computing based machine learning in the era of big data. Inf Sci 591:422–423
Xia S, Zhang Z et al (2020) GBNRS: A Novel Rough Set Algorithm for Fast Adaptive Attribute Reduction in Classification. IEEE Trans Knowl Data Eng 34(3):1231–1242
Qian Y, Liang X et al (2018) Local rough set: A solution to rough data analysis in big data. Int J Approximate Reasoning 97:38–63
Wan J, Chen H et al (2021) A novel hybrid feature selection method considering feature interaction in neighborhood rough set. Knowl-Based Syst 227:107167
Kim K, Jun C (2018) Rough set model based feature selection for mixed-type data with feature space decomposition. Expert Syst Appl 103:196–205
Wang C, Huang Y et al (2019) Feature selection based on neighborhood self-information. IEEE Transactions on Cybernetics 50(1):4031–4042
Pang Q, Zhang L (2020) Semi-supervised neighborhood discrimination index for feature selection. Knowl-Based Syst 204:106244
Liu K, Yang X et al (2019) Rough set based semi-supervised feature selection via ensemble selector. Knowl-Based Syst 165:282–296
Dai J, Hu Q et al (2017) Attribute selection for partially labeled categorical data by rough set approach. IEEE Transactions on Cybernetics 47:2460–2471
Dai J, Liu Q (2022) Semi-supervised attribute reduction for interval data based on misclassification cost. Int J Mach Learn Cybern 13(6):1739–1750
Wang F, Liu J et al (2018) Semi-supervised feature selection algorithm based on information entropy. Computer Science 45:427–430
Gao C, Zhou J (2021) Granular conditional entropy-based attribute reduction for partially labeled data with proxy labels. Inf Sci 580:111–128
Liu K, Tsang E (2020) Neighborhood attribute reduction approach to partially labeled data. Granular Computing 5:239–250
Jiang Z, Liu K et al (2021) Accelerator for crosswise computing reduct. Appl Soft Comput 98:106740
Ni P, Zhao S (2019) PARA: A positive-region based attribute reduction accelerator. Inf Sci 503:533–550
Wang C, Huang Y et al (2019) Fuzzy rough set-based attribute reduction using distance measures. Knowl-Based Syst 164:205–212
Dai J, Wang W et al (2019) Attribute selection based on a new conditional entropy for incomplete decision systems. Knowl-Based Syst 39:207–213
Zhang X, Mei C et al (2020) Active incremental feature selection using a fuzzy-rough-set-based information entropy. IEEE Transactions on Fuzzy Systems 28(5):901–915
Luo S, Miao D et al (2020) A neighborhood rough set model with nominal metric embedding. Inf Sci 520:373–388
Sun L, Zhang X et al (2019) Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf Sci 502:18–41
Wei W, Wu X et al (2018) Discernibility matrix based incremental attribute reduction for dynamic data. Knowl-Based Syst 140:142–157
Lin R, Li J et al (2021) Attribute reduction in fuzzy multi-covering decision systems via observational-consistency and fuzzy discernibility. Journal of Intelligent & Fuzzy Systems 40(3):5239–5253
Liu Y, Zheng L et al (2020) Discernibility matrix based incremental feature selection on fused decision tables. Int J Approximate Reasoning 118:1–26
Li L, Li M et al (2019) A simple discernibility matrix for attribute reduction in formal concept analysis based on granular concepts. Journal of Intelligent & Fuzzy Systems 37(3):4325–4337
Sheng K, Wang W et al (2020) Neighborhood Discernibility Degree Incremental Attribute Reduction Algorithm for Mixed Data. Acta Electron Sin 48(04):682–696
Jiang Z, Liu K et al (2020) Accelerator for supervised neighborhood based attribute reduction. Int J Approximate Reasoning 119:122–150
Jiang Z, Yang X et al (2019) Accelerator for multi-granularity attribute reduction. Knowl-Based Syst 177:145–158
Chen Y, Wang P et al (2021) Granular ball guided selector for attribute reduction. Knowl-Based Syst 229:107326
Zhao J, Liang J et al (2020) Accelerating information entropy-based feature selection using rough set theory with classified nested equivalence classes. Pattern Recogn 107:107517
Rao X, Yang X et al (2020) Quickly calculating reduct: An attribute relationship based approach. Knowl-Based Syst 200(7):106014
Xia S, Liu Y et al (2019) Granular ball computing classifiers for efficient, scalable and robust learning. Inf Sci 483:136–152
Xia S, Peng D et al (2020) A Fast Adaptive k-means with No Bounds. IEEE Trans Pattern Anal Mach Intell 44(1):87–99
Ba J, Chen Y et al (2021) Quick Strategy for Searching Granular Ball Rough Set Based Reduct. Journal of Nanjing University of Science and Technology 45(4):394–400
Shu W, Qian W et al (2020) Incremental feature selection for dynamic hybrid data using neighborhood rough set. Knowl-Based Syst 194:105516
UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62266018 and No. 61966016), the Natural Science Foundation of Jiangxi Province, China (No. 20224BAB202020), and the National Key Research and Development Program of China (No. 2020YFD1100605).
Author information
Contributions
Conceptualization: Wenhao Shu, Jianhui Yu; Methodology: Jianhui Yu; Writing – original draft preparation: Jianhui Yu, Ting Chen; Writing – review and editing: Jianhui Yu, Wenhao Shu; Funding acquisition: Wenbin Qian.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Shu, W., Yu, J., Chen, T. et al. Neighbourhood discernibility degree-based semisupervised feature selection for partially labelled mixed-type data with granular ball. Appl Intell 53, 22467–22487 (2023). https://doi.org/10.1007/s10489-023-04657-7
DOI: https://doi.org/10.1007/s10489-023-04657-7