NSOFS: a non-dominated sorting-based online feature selection algorithm

  • Original Article
  • Neural Computing and Applications

Abstract

Online streaming feature selection (OSFS) methods dynamically update the feature space and remove irrelevant and redundant features from the data. Since most Big Data in real-world applications is generated as data streams, effective methods are needed in this area, and they must have low computational complexity to support online decisions. In this paper, the OSFS process is modeled as a multi-objective optimization problem. To the best of our knowledge, this is the first time that the concept of Pareto dominance has been applied to find the optimal subset of features in OSFS. When a new feature arrives, it is evaluated in the multi-objective space, and the non-dominated features form the optimal subset at each timestamp. We propose an efficient and effective method that improves classification accuracy in OSFS while minimizing the number of selected features within a short time. In addition, the proposed method is insensitive to the feature streams. Experiments are conducted using two classifiers, and the proposed method is compared with seven OSFS methods: OSFSMI, K-OFSD, OFS-A3M, OFS-Density, Alpha-Investing, SAOLA, and OFSS-FI.
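The idea of keeping only non-dominated features as each new feature arrives can be illustrated with a minimal sketch. This is not the NSOFS algorithm itself: the abstract does not name the objectives, so the two used here (a relevance score to maximise and a redundancy score to minimise) are assumptions, as are the function names `dominates` and `update_front` and the toy scores in `stream`. For simplicity each feature's scores are treated as fixed once computed, which a real method may not do.

```python
from typing import Dict, Tuple

# Hypothetical objective vector: (relevance, redundancy).
# Assumption of this sketch: relevance is maximised, redundancy is minimised.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Pareto dominance: a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def update_front(front: Dict[str, Scores], name: str, scores: Scores) -> Dict[str, Scores]:
    """Return the non-dominated set after the feature `name` arrives."""
    # Discard the newcomer if any kept feature dominates it.
    if any(dominates(kept, scores) for kept in front.values()):
        return front
    # Otherwise drop every kept feature that the newcomer dominates.
    survivors = {f: s for f, s in front.items() if not dominates(scores, s)}
    survivors[name] = scores
    return survivors


# Toy feature stream with made-up scores, processed one feature at a time.
stream = [
    ("f1", (0.40, 0.30)),
    ("f2", (0.60, 0.20)),  # dominates f1, so f1 is dropped
    ("f3", (0.55, 0.10)),  # trades relevance for redundancy, so it is kept
    ("f4", (0.30, 0.50)),  # dominated by f2, so it is discarded on arrival
]

front: Dict[str, Scores] = {}
for name, scores in stream:
    front = update_front(front, name, scores)

selected = sorted(front)
print(selected)
```

In this toy stream the selected subset at the final timestamp is the set of mutually non-dominated features; each arrival costs one comparison against every feature currently kept.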

Data availability

Data generated during the study are subject to a data sharing mandate and are available in public repositories. All data used are cited in the text.

Notes

  1. https://jundongl.github.io/scikit-feature/datasets.html.

  2. https://schlieplab.org/Static/Supplements/CompCancer/datasets.htm.

  3. https://search.r-project.org/CRAN.

  4. https://archive.ics.uci.edu/ml/datasets.php.

  5. http://www.rii.com/publications/2002/vantveer.htm.

Funding

This research did not receive any specific grant from public, commercial, or not-for-profit funding agencies.

Author information

Corresponding author

Correspondence to Mohammad-Reza Pajoohan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Hashemi, A., Pajoohan, MR. & Dowlatshahi, M.B. NSOFS: a non-dominated sorting-based online feature selection algorithm. Neural Comput & Applic 36, 1181–1197 (2024). https://doi.org/10.1007/s00521-023-09089-5

  • DOI: https://doi.org/10.1007/s00521-023-09089-5
