Neighbourhood discernibility degree-based semisupervised feature selection for partially labelled mixed-type data with granular ball

Published in Applied Intelligence

Abstract

Feature selection can effectively reduce data dimensionality by selecting a relevant feature subset. Rough set theory provides a powerful theoretical framework for feature selection on categorical data with complete labels. In reality, however, datasets often contain only a small number of labelled objects and many unlabelled ones. Furthermore, most feature selection approaches are computationally expensive. To address these problems, a semisupervised feature selection algorithm based on neighbourhood discernibility with pseudolabelled granular balls is proposed. First, a set of granular balls is generated based on purity, which reduces the universe space by sampling. Then, a neighbourhood discernibility measure is proposed to evaluate the importance of candidate features for both labelled and unlabelled objects. Finally, an ensemble voting algorithm is designed to perform feature selection, so that a feature subset with satisfactory performance is chosen fairly rather than arbitrarily. Experimental results on UCI datasets verify the advantages of the proposed algorithm over other algorithms in terms of feature subset size, classification accuracy and computational time.
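For intuition only, the short Python sketch below mimics two of the steps described above: growing purity-based granular balls by repeated 2-means splitting, and scoring each feature with a crude neighbourhood discernibility count over ball centres. It is not the paper's algorithm: the purity threshold, the neighbourhood radius delta, the pairwise scoring rule and the purely numeric toy data are all assumptions, and the mixed-type distance and ensemble-voting steps are omitted.

```python
# A minimal sketch, assuming numeric features only. It is NOT the authors'
# implementation; it only illustrates purity-driven granular-ball generation
# and a toy neighbourhood discernibility score. All thresholds are made up.
import numpy as np
from sklearn.cluster import KMeans

def purity(labels):
    """Share of the majority class among the labelled objects of a ball (-1 = unlabelled)."""
    labelled = labels[labels != -1]
    if labelled.size == 0:
        return 1.0  # a ball with no labelled objects is not refined further in this sketch
    _, counts = np.unique(labelled, return_counts=True)
    return counts.max() / labelled.size

def granular_balls(X, y, purity_threshold=0.9, min_size=4):
    """Recursively split the data with 2-means until every ball is pure enough or too small."""
    queue, balls = [(X, y)], []
    while queue:
        Xb, yb = queue.pop()
        if len(Xb) <= min_size or purity(yb) >= purity_threshold:
            balls.append((Xb.mean(axis=0), yb))  # keep (centre, member labels)
            continue
        split = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xb)
        if len(np.unique(split)) < 2:  # degenerate split: stop refining this ball
            balls.append((Xb.mean(axis=0), yb))
            continue
        for k in (0, 1):
            queue.append((Xb[split == k], yb[split == k]))
    return balls

def discernibility_score(balls, feature, delta=0.2):
    """Toy score: ball pairs with different majority labels that this feature separates by more than delta."""
    centres = np.array([c for c, _ in balls])
    majority = np.array([np.bincount(yb[yb != -1]).argmax() if np.any(yb != -1) else -1
                         for _, yb in balls])
    score = 0
    for i in range(len(balls)):
        for j in range(i + 1, len(balls)):
            if majority[i] != majority[j] and abs(centres[i, feature] - centres[j, feature]) > delta:
                score += 1
    return score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 5))                 # 200 objects, 5 numeric features
    y = (X[:, 0] > 0.5).astype(int)          # only feature 0 carries the class
    y[rng.random(200) < 0.7] = -1            # mark 70% of the objects as unlabelled
    balls = granular_balls(X, y)
    scores = [discernibility_score(balls, f) for f in range(X.shape[1])]
    print(len(balls), "granular balls; per-feature discernibility:", scores)
```

In the paper's setting, the distance would additionally have to combine numerical and categorical components for mixed-type data, and the final subset would be chosen by voting across candidate selectors rather than by a single ranking.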


Data availability and access

The data that support the findings of this study are openly available in the UCI machine learning repository at http://archive.ics.uci.edu/ml, reference number [44].
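For readers who want to reproduce the experiments, dataset files can be fetched directly from the repository. The snippet below is a generic example using the classic Iris file as a stand-in; it is not claimed to be one of the paper's benchmark datasets.

```python
# Generic example of loading one dataset file straight from the UCI repository.
# The Iris file is only a stand-in; substitute the benchmark datasets used in the paper.
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

iris = pd.read_csv(URL, header=None, names=COLUMNS)
print(iris.shape)   # (150, 5)
print(iris.head())
```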

References

1. Khaire UM, Dhanalakshmi R (2022) Stability of feature selection algorithm: A review. Journal of King Saud University - Computer and Information Sciences 34(4):1060–1073
2. Li X, Wang Y et al (2020) A survey on sparse learning models for feature selection. IEEE Transactions on Cybernetics 52(3):1642–1660
3. Hancer E, Xue B et al (2022) Fuzzy filter cost-sensitive feature selection with differential evolution. Knowl-Based Syst 241:108259
4. Huang P, Yang X (2022) Unsupervised feature selection via adaptive graph and dependency score. Pattern Recogn 127:108622
5. Hja B, Bao Q (2022) On (O, G)-fuzzy rough sets based on overlap and grouping functions over complete lattices. Int J Approximate Reasoning 144:18–50
6. Shu W, Yan Z et al (2022) Information granularity-based incremental feature selection for partially labeled hybrid data. Intelligent Data Analysis 26(1):33–56
7. Hb A, Dla B et al (2022) Spatial rough set-based geographical detectors for nominal target variables. Inf Sci 586:525–539
8. Jxa B, Bao Q et al (2022) A novel method to attribute reduction based on weighted neighborhood probabilistic rough sets. Int J Approximate Reasoning 144:1–17
9. Chen B, Chen L et al (2022) Uncertainty measurement and attribute reduction algorithm based on kernel similarity rough set model. Journal of Mathematics 2022:5675200
10. Hu Q, Yu D et al (2022) Granular computing based machine learning in the era of big data. Inf Sci 591:422–423
11. Xia S, Zhang Z et al (2020) GBNRS: A novel rough set algorithm for fast adaptive attribute reduction in classification. IEEE Trans Knowl Data Eng 34(3):1231–1242
12. Qian Y, Liang X et al (2018) Local rough set: A solution to rough data analysis in big data. Int J Approximate Reasoning 97:38–63
13. Wan J, Chen H et al (2021) A novel hybrid feature selection method considering feature interaction in neighborhood rough set. Knowl-Based Syst 227:107167
14. Kim K, Jun C (2018) Rough set model based feature selection for mixed-type data with feature space decomposition. Expert Syst Appl 103:196–205
15. Wang C, Huang Y et al (2019) Feature selection based on neighborhood self-information. IEEE Transactions on Cybernetics 50(1):4031–4042
16. Pang Q, Zhang L (2020) Semi-supervised neighborhood discrimination index for feature selection. Knowl-Based Syst 204:106244
17. Liu K, Yang X et al (2019) Rough set based semi-supervised feature selection via ensemble selector. Knowl-Based Syst 165:282–296
18. Dai J, Hu Q et al (2017) Attribute selection for partially labeled categorical data by rough set approach. IEEE Transactions on Cybernetics 47:2460–2471
19. Dai J, Liu Q (2022) Semi-supervised attribute reduction for interval data based on misclassification cost. Int J Mach Learn Cybern 13(6):1739–1750
20. Wang F, Liu J et al (2018) Semi-supervised feature selection algorithm based on information entropy. Computer Science 45:427–430
21. Gao C, Zhou J (2021) Granular conditional entropy-based attribute reduction for partially labeled data with proxy labels. Inf Sci 580:111–128
22. Liu K, Tsang E (2020) Neighborhood attribute reduction approach to partially labeled data. Granular Computing 5:239–250
23. Jiang Z, Liu K et al (2021) Accelerator for crosswise computing reduct. Appl Soft Comput 98:106740
24. Ni P, Zhao S (2019) PARA: A positive-region based attribute reduction accelerator. Inf Sci 503:533–550
25. Wang C, Huang Y et al (2019) Fuzzy rough set-based attribute reduction using distance measures. Knowl-Based Syst 164:205–212
26. Dai J, Wang W et al (2019) Attribute selection based on a new conditional entropy for incomplete decision systems. Knowl-Based Syst 39:207–213
27. Zhang X, Mei C et al (2020) Active incremental feature selection using a fuzzy-rough-set-based information entropy. IEEE Transactions on Fuzzy Systems 28(5):901–915
28. Luo S, Miao D et al (2020) A neighborhood rough set model with nominal metric embedding. Inf Sci 520:373–388
29. Sun L, Zhang X et al (2019) Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf Sci 502:18–41
30. Wei W, Wu X et al (2018) Discernibility matrix based incremental attribute reduction for dynamic data. Knowl-Based Syst 140:142–157
31. Lin R, Li J et al (2021) Attribute reduction in fuzzy multi-covering decision systems via observational-consistency and fuzzy discernibility. Journal of Intelligent & Fuzzy Systems 40(3):5239–5253
32. Liu Y, Zheng L et al (2020) Discernibility matrix based incremental feature selection on fused decision tables. Int J Approximate Reasoning 118:1–26
33. Li L, Li M et al (2019) A simple discernibility matrix for attribute reduction in formal concept analysis based on granular concepts. Journal of Intelligent & Fuzzy Systems 37(3):4325–4337
34. Sheng K, Wang W et al (2020) Neighborhood discernibility degree incremental attribute reduction algorithm for mixed data. Acta Electron Sin 48(04):682–696
35. Jiang Z, Liu K et al (2020) Accelerator for supervised neighborhood based attribute reduction. Int J Approximate Reasoning 119:122–150
36. Jiang Z, Yang X et al (2019) Accelerator for multi-granularity attribute reduction. Knowl-Based Syst 177:145–158
37. Chen Y, Wang P et al (2021) Granular ball guided selector for attribute reduction. Knowl-Based Syst 229:107326
38. Zhao J, Liang J et al (2020) Accelerating information entropy-based feature selection using rough set theory with classified nested equivalence classes. Pattern Recogn 107:107517
39. Rao X, Yang X et al (2020) Quickly calculating reduct: An attribute relationship based approach. Knowl-Based Syst 200(7):106014
40. Xia S, Liu Y et al (2019) Granular ball computing classifiers for efficient, scalable and robust learning. Inf Sci 483:136–152
41. Xia S, Peng D et al (2020) A fast adaptive k-means with no bounds. IEEE Trans Pattern Anal Mach Intell 44(1):87–99
42. Ba J, Chen Y et al (2021) Quick strategy for searching granular ball rough set based reduct. Journal of Nanjing University of Science and Technology 45(4):394–400
43. Shu W, Qian W et al (2020) Incremental feature selection for dynamic hybrid data using neighborhood rough set. Knowl-Based Syst 194:105516
44. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml


Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 62266018 and 61966016), the Natural Science Foundation of Jiangxi Province, China (No. 20224BAB202020), and the National Key Research and Development Program of China (No. 2020YFD1100605).

Author information


Contributions

Conceptualization: Wenhao Shu, Jianhui Yu; Methodology: Jianhui Yu; Writing – original draft preparation: Jianhui Yu, Ting Chen; Writing – review and editing: Jianhui Yu, Wenhao Shu; Funding acquisition: Wenbin Qian.

Corresponding author

Correspondence to Wenbin Qian.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shu, W., Yu, J., Chen, T. et al. Neighbourhood discernibility degree-based semisupervised feature selection for partially labelled mixed-type data with granular ball. Appl Intell 53, 22467–22487 (2023). https://doi.org/10.1007/s10489-023-04657-7

