Abstract
Rough set theory is one of the most representative models for handling supervised data affected by vagueness, imprecision, or uncertainty. However, little work has been devoted to learning from partially labeled data with rough sets. This study proposes a rough sets-based tri-trade model for partially labeled data. First, a new discernibility matrix that considers both labeled and unlabeled data is defined, and a beam search-based heuristic algorithm built on it generates multiple semi-supervised reducts. Then, a tri-trade model is developed from three diverse semi-supervised reducts, with an embedded data editing technique that generates reliable pseudo-labels for unlabeled data to improve the model. Both theoretical analysis and comparative experiments on UCI datasets show that the proposed model can effectively utilize unlabeled data to improve generalization performance and compares favorably with other representative methods.
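The first step can be illustrated with a minimal sketch. It is not the authors' definition, which the abstract does not give. It assumes that a pair of objects must be discerned unless both are labeled with the same class, so any pair involving an unlabeled object is handled as in the unsupervised case. It also assumes that a reduct is a minimal attribute subset that intersects every discernibility entry. The function names, the `None` marker for unlabeled objects, and the toy data are all hypothetical.

```python
from itertools import combinations


def discernibility_entries(X, y):
    """Collect the non-empty entries of a discernibility matrix.

    X is a list of attribute-value tuples; y holds class labels, with None
    marking an unlabeled object (an assumption of this sketch).
    """
    entries = set()
    for i, j in combinations(range(len(X)), 2):
        both_labeled = y[i] is not None and y[j] is not None
        if both_labeled and y[i] == y[j]:
            continue  # same known class: the pair need not be discerned
        diff = frozenset(a for a in range(len(X[i])) if X[i][a] != X[j][a])
        if diff:
            entries.add(diff)
    return entries


def covers(subset, entries):
    """True if the attribute subset intersects every entry."""
    return all(e & subset for e in entries)


def prune(subset, entries):
    """Drop redundant attributes so the covering subset becomes minimal."""
    kept = set(subset)
    for a in sorted(subset):
        if covers(kept - {a}, entries):
            kept.discard(a)
    return frozenset(kept)


def beam_search_reducts(entries, n_attrs, beam_width=3, n_reducts=3):
    """Grow attribute subsets level by level, keeping the best partial ones."""
    beam = [frozenset()]
    reducts = []
    for _ in range(n_attrs):
        candidates = {s | {a} for s in beam for a in range(n_attrs) if a not in s}
        # Rank by the number of entries still not intersected; ties by attribute order.
        ranked = sorted(
            candidates,
            key=lambda s: (sum(1 for e in entries if not e & s), sorted(s)),
        )
        beam = []
        for s in ranked:
            if covers(s, entries):
                r = prune(s, entries)
                if r not in reducts:
                    reducts.append(r)
            elif len(beam) < beam_width:
                beam.append(s)
        if len(reducts) >= n_reducts or not beam:
            break
    return reducts[:n_reducts]


# Toy data: four objects, three attributes, two of the objects unlabeled.
X = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
y = [0, 1, None, None]
entries = discernibility_entries(X, y)
reducts = beam_search_reducts(entries, n_attrs=3)
print([sorted(r) for r in reducts])
```

In a tri-trade style setup, each returned reduct would define the feature view for one of the base classifiers. That training loop and the data editing step are omitted here.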
Data Availability
Data and code are available from the corresponding author upon reasonable request.
Acknowledgements
The authors would like to thank the Editor-in-Chief, Editor, and anonymous reviewers for their kind help and valuable comments. This work is supported in part by the National Natural Science Foundation of China (Nos. 61806127, 62076164), the Natural Science Foundation of Guangdong Province, China (No. 2021A1515011861), Shenzhen Science and Technology Program (No. JCYJ20210324094601005), and Shenzhen Institute of Artificial Intelligence and Robotics for Society.
Ethics declarations
Competing interests
The authors declare that they have no competing interests that could influence the work reported in this paper.
About this article
Cite this article
Luo, Z., Gao, C. & Zhou, J. Rough sets-based tri-trade for partially labeled data. Appl Intell 53, 17708–17726 (2023). https://doi.org/10.1007/s10489-022-04405-3
DOI: https://doi.org/10.1007/s10489-022-04405-3