Abstract
Online streaming feature selection (OSFS) methods dynamically update the feature space while removing irrelevant and redundant features from the data. Since most Big Data in real-world applications is generated in the form of data streams, effective methods are needed in this area. Moreover, online decisions require methods with low computational complexity. In this paper, the OSFS process is modeled as a multi-objective optimization problem. To the best of our knowledge, this is the first time that the concept of Pareto dominance has been applied to find the optimal subset of features in OSFS. When a new feature arrives, it is evaluated in the multi-objective space, and the non-dominated features form the optimal subset at each timestamp. We propose an efficient and effective method that improves classification accuracy in OSFS while minimizing the number of selected features and keeping the running time short. In addition, the proposed method is insensitive to the feature streams. Experiments are conducted using two classifiers, and the proposed method is compared with seven OSFS methods: OSFSMI, K-OFSD, OFS-A3M, OFS-Density, Alpha-Investing, SAOLA, and OFSS-FI.
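To make the Pareto-dominance idea concrete, the sketch below shows how a set of non-dominated features can be maintained as features arrive one at a time. It is a minimal illustration under stated assumptions, not the NSOFS algorithm itself: the abstract does not say which objectives are used, so each feature here carries a hypothetical score vector (for example, relevance to the class label and non-redundancy with respect to other features) in which every objective is maximized. The helper names `dominates` and `update_front` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical objective vector per feature; every objective is maximized.
# The actual objectives used by NSOFS are NOT given in the abstract.
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a Pareto-dominates b: no worse on every objective,
    strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_front(front: Dict[str, Scores], name: str, scores: Scores) -> Dict[str, Scores]:
    """Process one arriving feature and return the updated non-dominated set."""
    # Discard the newcomer if any currently selected feature dominates it.
    if any(dominates(s, scores) for s in front.values()):
        return front
    # Otherwise keep it and drop every selected feature it dominates.
    new_front = {n: s for n, s in front.items() if not dominates(scores, s)}
    new_front[name] = scores
    return new_front


if __name__ == "__main__":
    # Made-up (relevance, non-redundancy) scores, for illustration only.
    stream = [
        ("f1", (0.40, 0.10)),
        ("f2", (0.60, 0.30)),  # dominates f1, so f1 is dropped
        ("f3", (0.50, 0.50)),
        ("f4", (0.20, 0.90)),
        ("f5", (0.55, 0.50)),  # dominates f3, so f3 is dropped
    ]
    front: Dict[str, Scores] = {}
    for name, scores in stream:
        front = update_front(front, name, scores)
    print(sorted(front))  # ['f2', 'f4', 'f5']
```

Because dominance is transitive, a feature discarded earlier can never re-enter: whichever feature later replaces its dominator also dominates it. The maintained set is therefore exactly the non-dominated subset of all features seen so far at each timestamp.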
Data availability
Data generated during the study are subject to a data sharing mandate and are available in public repositories. All datasets used are cited in the text.
References
Dhal P, Azad C (2022) A comprehensive survey on feature selection in the various fields of machine learning. Appl Intell 52:4543–4581. https://doi.org/10.1007/s10489-021-02550-9
Hashemi A, Bagher Dowlatshahi M, Nezamabadi-pour H (2021) An efficient Pareto-based feature selection algorithm for multi-label classification. Inf Sci 581:428–447. https://doi.org/10.1016/j.ins.2021.09.052
Jia W, Sun M, Lian J, Hou S (2022) Feature dimensionality reduction: a review. Complex Intell Syst 8:2663–2693. https://doi.org/10.1007/s40747-021-00637-x
Zaman EAK, Mohamed A, Ahmad A (2022) Feature selection for online streaming high-dimensional data: a state-of-the-art review. Appl Soft Comput 127:109355. https://doi.org/10.1016/j.asoc.2022.109355
You D, Sun M, Liang S et al (2022) Online feature selection for multi-source streaming features. Inf Sci 590:267–295. https://doi.org/10.1016/j.ins.2022.01.008
Zhou P, Zhang Y, Li P, Wu X (2022) General assembly framework for online streaming feature selection via Rough Set models. Expert Syst Appl 204:117520. https://doi.org/10.1016/j.eswa.2022.117520
Wu D, He Y, Luo X, Zhou M (2022) A latent factor analysis-based approach to online sparse streaming feature selection. IEEE Trans Syst Man Cybern Syst 52:6744–6758. https://doi.org/10.1109/TSMC.2021.3096065
Hashemi A, Joodaki M, Joodaki NZ, Dowlatshahi MB (2022) Ant colony optimization equipped with an ensemble of heuristics through multi-criteria decision making: a case study in ensemble feature selection. Appl Soft Comput 124:109046. https://doi.org/10.1016/j.asoc.2022.109046
Bayati H, Dowlatshahi MB, Hashemi A (2022) MSSL: a memetic-based sparse subspace learning algorithm for multi-label classification. Int J Mach Learn Cyber 13:3607–3624. https://doi.org/10.1007/s13042-022-01616-5
Dowlatshahi MB, Hashemi A (2023) Unsupervised feature selection: A fuzzy multi-criteria decision-making approach. Iran J Fuzzy Syst 20:55–70
Karimi F, Dowlatshahi MB, Hashemi A (2023) SemiACO: A semi-supervised feature selection based on ant colony optimization. Expert Syst Appl 214:119130. https://doi.org/10.1016/j.eswa.2022.119130
Miri M, Dowlatshahi MB, Hashemi A et al (2022) Ensemble feature selection for multi-label text classification: an intelligent order statistics approach. Int J Intell Syst 37:11319–11341. https://doi.org/10.1002/int.23044
Eskandari S, Seifaddini M (2023) Online and offline streaming feature selection methods with bat algorithm for redundancy analysis. Pattern Recogn 133:109007. https://doi.org/10.1016/j.patcog.2022.109007
Hu X, Zhou P, Li P et al (2018) A survey on online feature selection with streaming features. Front Comput Sci 12:479–493. https://doi.org/10.1007/s11704-016-5489-3
Pajoohan M-R, Hashemi A, Dowlatshahi MB (2022) An online streaming feature selection method based on the Choquet fuzzy integral. Fuzzy Syst Appl 5:161–185. https://doi.org/10.22034/jfsa.2022.331660.1116
Rafie A, Moradi P, Ghaderzadeh A (2023) A Multi-Objective online streaming Multi-Label feature selection using mutual information. Expert Syst Appl 216:119428. https://doi.org/10.1016/j.eswa.2022.119428
Wang J, Zhao P, Hoi SCH, Jin R (2014) Online feature selection and its applications. IEEE Trans Knowl Data Eng 26:698–710. https://doi.org/10.1109/TKDE.2013.32
Hashemi A, Pajoohan M-R, Dowlatshahi MB (2022) Online streaming feature selection based on Sugeno fuzzy integral. In: 2022 9th Iranian joint congress on fuzzy and intelligent systems (CFIS). pp 1–6
Hashemi A, Dowlatshahi MB, Nezamabadi-pour H (2021) Minimum redundancy maximum relevance ensemble feature selection: A bi-objective Pareto-based approach. J Soft Comput Inf Technol
Hashemi A, Bagher Dowlatshahi M, Nezamabadi-pour H (2021) A pareto-based ensemble of feature selection algorithms. Expert Syst Appl 180:115130. https://doi.org/10.1016/j.eswa.2021.115130
Kashef S, Nezamabadi-pour H (2019) A label-specific multi-label feature selection algorithm based on the Pareto dominance concept. Pattern Recogn 88:654–667. https://doi.org/10.1016/j.patcog.2018.12.020
Perkins S, Theiler J (2003) Online feature selection using grafting. pp 592–599
Zhou J, Foster DP, Stine RA, Ungar LH (2006) Streamwise feature selection. J Mach Learn Res 7:1861–1885
Wu X, Yu K, Ding W et al (2013) Online feature selection with streaming features. IEEE Trans Pattern Anal Mach Intell 35:1178–1192. https://doi.org/10.1109/TPAMI.2012.197
Yu K, Wu X, Ding W, Pei J (2016) Scalable and accurate online feature selection for big data. ACM Trans Knowl Discov Data 11:16:1–16:39. https://doi.org/10.1145/2976744
Zhou P, Hu X, Li P, Wu X (2017) Online feature selection for high-dimensional class-imbalanced data. Knowl-Based Syst 136:187–199. https://doi.org/10.1016/j.knosys.2017.09.006
Rahmaninia M, Moradi P (2018) OSFSMI: Online stream feature selection method based on mutual information. Appl Soft Comput 68:733–746. https://doi.org/10.1016/j.asoc.2017.08.034
Zhou P, Hu X, Li P, Wu X (2019) Online streaming feature selection using adapted neighborhood rough set. Inf Sci 481:258–279. https://doi.org/10.1016/j.ins.2018.12.074
Zhou P, Hu X, Li P, Wu X (2019) OFS-Density: a novel online streaming feature selection method. Pattern Recogn 86:48–61. https://doi.org/10.1016/j.patcog.2018.08.009
Zhou P, Li P, Zhao S, Wu X (2021) Feature interaction for streaming feature selection. IEEE Trans Neural Netw Learn Syst 32:4691–4702. https://doi.org/10.1109/TNNLS.2020.3025922
Luo C, Wang S, Li T et al (2023) RHDOFS: a distributed online algorithm towards scalable streaming feature selection. IEEE Trans Parallel Distrib Syst 34:1830–1847. https://doi.org/10.1109/TPDS.2023.3265974
AlNuaimi N, Masud MM, Serhani MA, Zaki N (2020) Streaming feature selection algorithms for big data: a survey. Appl Comput Inf 18:113–135
Hashemi A, Pajoohan M-R, Dowlatshahi MB (2023) An election strategy for online streaming feature selection. In: 2023 28th international computer conference, computer society of Iran (CSICC). pp 1–4
Wang M, Li H, Tao D et al (2012) Multimodal graph-based reranking for web image search. IEEE Trans Image Process 21:4649–4661. https://doi.org/10.1109/TIP.2012.2207397
Li J, Hu X, Tang J, Liu H (2015) Unsupervised streaming feature selection in social media
Talbi E (2009) Metaheuristics: from design to implementation. Wiley
Shao F, Liu H (2021) The theoretical and experimental analysis of the maximal information coefficient approximate algorithm. J Syst Sci Inf 9:95–104
Gu Q, Li Z, Han J (2012) Generalized fisher score for feature selection. arXiv:1202.3725
Suryanarayan P, Subramanian A, Mandalapu D (2010) Dynamic hand pose recognition using depth data. In: 2010 20th international conference on pattern recognition. pp 3105–3108
Friedman M (1940) A Comparison of Alternative Tests of Significance for the Problem of m Rankings. Ann Math Stat 11:86–92
Bag S, Kumar SK, Tiwari MK (2019) An efficient recommendation generation using relevant Jaccard similarity. Inf Sci 483:53–64. https://doi.org/10.1016/j.ins.2019.01.023
Funding
This research did not receive any specific grant from public, commercial, or not-for-profit funding agencies.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hashemi, A., Pajoohan, MR. & Dowlatshahi, M.B. NSOFS: a non-dominated sorting-based online feature selection algorithm. Neural Comput & Applic 36, 1181–1197 (2024). https://doi.org/10.1007/s00521-023-09089-5
DOI: https://doi.org/10.1007/s00521-023-09089-5