Feature Selection Method Based on Classification Performance Score and P-value

  • Conference paper
Advanced Intelligent Systems for Sustainable Development (AI2SD’2020) (AI2SD 2020)

Abstract

Learning from high-dimensional data has gained increasing attention in recent years due to the massive growth of skewed data across many scientific fields. Research shows that feature selection plays an essential role in identifying the significant features while reducing the number of dimensions.

In this study, we propose a feature selection approach based on the P-value and a performance score. The solution consists of two main steps that work in parallel. The first measures the performance score of each feature using a support vector machine (SVM). The second estimates the P-value of each feature and then fixes the threshold. Combining both steps, we obtain the Kth feature and the range of features used to create the subsets. We tested the classification performance of the selected feature subsets using the performance score with three different classifiers, and chose naive Bayes (NB), which showed the highest performance, as the basis for the discussion. We also assessed the performance score of the proposed approach on four different datasets from Kaggle. Comparing two cases for each selected subset, we concluded that higher performance can be reached by including the Kth and (Kth + 1) features in the selected subset.
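The two-step idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only so the example runs with the Python standard library: a one-feature threshold rule stands in for the per-feature SVM score, a label-permutation test stands in for the P-value estimate, the threshold `alpha = 0.05` is arbitrary, and K is taken to be the number of features whose P-value falls below that threshold. The helper names (`stump_accuracy`, `permutation_p_value`, `select_subsets`) and the toy data are hypothetical.

```python
import random
from statistics import mean


def stump_accuracy(values, labels):
    """Best accuracy of a single-feature threshold rule.

    Assumption: this simple rule is a stand-in for the per-feature
    SVM performance score mentioned in the abstract.
    """
    best = 0.0
    for t in sorted(set(values)):
        pred = [1 if v > t else 0 for v in values]
        acc = sum(p == y for p, y in zip(pred, labels)) / len(labels)
        # The rule may be used in either direction (v > t means class 1 or 0).
        best = max(best, acc, 1 - acc)
    return best


def permutation_p_value(values, labels, n_perm=500, seed=0):
    """Permutation P-value for the gap between the two class means."""
    rng = random.Random(seed)

    def gap(ys):
        ones = [v for v, y in zip(values, ys) if y == 1]
        zeros = [v for v, y in zip(values, ys) if y == 0]
        return abs(mean(ones) - mean(zeros))

    observed = gap(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gap(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_subsets(X, y, alpha=0.05):
    """Rank features by score and build the top-K and top-(K + 1) subsets.

    Assumption: K is the count of features with P-value below alpha.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [stump_accuracy(col, y) for col in columns]
    p_values = [permutation_p_value(col, y) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    k = sum(1 for p in p_values if p < alpha)
    return ranked, k, ranked[:k], ranked[:k + 1]


# Hypothetical toy data: 20 samples, two classes, three features.
y = [0] * 10 + [1] * 10
noise = list(range(10)) * 2          # identical distribution in both classes
strong = list(range(20))             # perfectly separates the classes
moderate = [0, 1, 2, 3, 4, 5, 6, 7, 8, 12,
            9, 10, 11, 13, 14, 15, 16, 17, 18, 19]  # one overlapping sample
X = [list(row) for row in zip(noise, strong, moderate)]

ranked, k, top_k, top_k_plus_1 = select_subsets(X, y)
print("ranking:", ranked, "K:", k)
print("top-K subset:", top_k, "top-(K + 1) subset:", top_k_plus_1)
```

In this sketch the two candidate subsets would then be passed to a downstream classifier and compared, which is the kind of two-case comparison the abstract describes. How the paper actually derives K and the feature range from the scores and P-values is not stated in the abstract, so that part should be read as a placeholder.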



Author information

Corresponding author

Correspondence to Fatima El Barakaz.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

El Barakaz, F., Boutkhoum, O., El Moutaouakkil, A. (2022). Feature Selection Method Based on Classification Performance Score and P-value. In: Kacprzyk, J., Balas, V.E., Ezziyyani, M. (eds) Advanced Intelligent Systems for Sustainable Development (AI2SD’2020). AI2SD 2020. Advances in Intelligent Systems and Computing, vol 1418. Springer, Cham. https://doi.org/10.1007/978-3-030-90639-9_30
