
Correlation Based Feature Selection Algorithms for Varying Datasets of Different Dimensionality

Published in Wireless Personal Communications

Abstract

The curse of dimensionality must be addressed carefully when designing a classifier. Given a high-dimensional dataset, an interesting problem is the optimal selection of features for classification. Feature selection is an effective solution to the curse of dimensionality. Numerous feature selection algorithms have been proposed in the recent past, but no one-stop solution prevails. This paper proposes two novel feature selection algorithms, Reverse Piece-wise Correlation Based Feature Selection (RPwCBFS) and Shuffled Piece-wise Correlation Based Feature Selection (SPwCBFS), which divide the feature space into pieces and compute the similarity of feature subsets in reverse order and in randomly shuffled order, respectively. The proposed algorithms are compared with Fast Correlation Based Feature Selection (FCBF), Fast Correlation Based Feature Selection # (FCBF#) and Fast Correlation Based Feature Selection in Piece (FCBFiP). Standard medium- and high-dimensional datasets are used for the experiments. Experimental results show that RPwCBFS and SPwCBFS are strong solutions for feature selection when the underlying dataset is of medium size, while for high-dimensional datasets SPwCBFS proves to be the optimal choice.
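
The sketch below illustrates, under stated assumptions, how such a piece-wise correlation-based selector can be organised: features are scored against the class with symmetrical uncertainty (the correlation measure used by FCBF), the feature index space is split into pieces, and the pieces are visited either in reverse order (RPwCBFS-like) or in randomly shuffled order (SPwCBFS-like). The helper names (entropy, symmetrical_uncertainty, piecewise_cbfs), the SU threshold, and the redundancy rule are illustrative assumptions drawn from the FCBF family of algorithms, not the authors' exact procedure.

```python
import numpy as np


def entropy(x):
    """Shannon entropy (base 2) of a discrete vector."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), the correlation measure used by FCBF."""
    hx, hy = entropy(x), entropy(y)
    # Joint entropy via paired symbols (assumes discretised features).
    joint = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    ig = hx + hy - entropy(joint)  # information gain = mutual information
    return 2.0 * ig / (hx + hy) if (hx + hy) > 0 else 0.0


def piecewise_cbfs(X, y, n_pieces=4, order="reverse", su_threshold=0.0, seed=0):
    """Piece-wise correlation-based feature selection (illustrative sketch).

    Splits the feature indices into `n_pieces` pieces and scans the pieces in
    reverse order (RPwCBFS-like) or in randomly shuffled order (SPwCBFS-like).
    Within each piece, a feature is kept if its SU with the class exceeds
    `su_threshold` and it is not redundant with respect to an already selected
    feature (i.e. SU(f_i, f_j) >= SU(f_i, class)); this redundancy rule is an
    assumption borrowed from FCBF.
    """
    pieces = np.array_split(np.arange(X.shape[1]), n_pieces)
    if order == "reverse":
        pieces = pieces[::-1]
    elif order == "shuffled":
        rng = np.random.default_rng(seed)
        pieces = [pieces[k] for k in rng.permutation(len(pieces))]

    selected = []
    for piece in pieces:
        # Rank features in this piece by relevance to the class label.
        su_class = {i: symmetrical_uncertainty(X[:, i], y) for i in piece}
        for i in sorted(piece, key=lambda k: su_class[k], reverse=True):
            if su_class[i] <= su_threshold:
                continue
            redundant = any(
                symmetrical_uncertainty(X[:, i], X[:, j]) >= su_class[i]
                for j in selected
            )
            if not redundant:
                selected.append(i)
    return selected


# Example: select features from a small synthetic discrete dataset.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(200, 12))
    y = (X[:, 0] + X[:, 5]) % 2  # class depends on features 0 and 5
    print(piecewise_cbfs(X, y, n_pieces=3, order="shuffled", su_threshold=0.1))
```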


References

  1. Yu, L., & Liu, H. (2003). Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th international conference on machine learning (ICML-03).

  2. Senliol, B., et al. (2008). Fast correlation based filter (FCBF) with a different search strategy. In 2008 23rd international symposium on computer and information sciences. IEEE.

  3. Egea, S., et al. (2018). Intelligent IoT traffic classification using novel search strategy for fast-based-correlation feature selection in industrial environments. IEEE Internet of Things Journal, 5(3), 1616–1624.

  4. Hancer, E., Xue, B., & Zhang, M. (2018). Differential evolution for filter feature selection based on information theory and feature ranking. Knowledge-Based Systems, 140, 103–119.

  5. Yu, L., & Liu, H. (2004). Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5, 1205–1224.

  6. Onan, A., & Korukoğlu, S. (2017). A feature selection model based on genetic rank aggregation for text sentiment classification. Journal of Information Science, 43(1), 25–38.

  7. Hall, M. A., & Smith, L. A. (1997). Feature subset selection: A correlation based filter approach. In International Conference on Neural Information Processing and Intelligent Information Systems (pp. 855–858).

  8. Das, S. (2001). Filters, wrappers and a boosting-based hybrid for feature selection. In International Conference on Machine Learning (Vol. 1, pp. 74–81).

  9. Zhang, Y., Gong, D., & Cheng, J. (2017). Multi-objective particle swarm optimization approach for cost-based feature selection in classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 14(1), 64–75.

  10. Hall, M. A. (2000). Correlation-based feature selection of discrete and numeric class machine learning. In International Conference on Machine Learning (pp. 359–366).

  11. Liu, H., & Yu, L. (2005). Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491–502.

  12. Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In ICML (Vol. 97, pp. 412–420).

  13. Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238.

  14. Jacob, S., & Raju, G. (2017). Software defect prediction in large space systems through hybrid feature selection and classification. International Arab Journal of Information Technology, 14(2), 208–214.

  15. Mao, K. Z. (2004). Orthogonal forward selection and backward elimination algorithms for feature subset selection. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(1), 629–634.

  16. Forina, M., et al. (2010). UCI machine learning repository. Wine Dataset, [Online] Available: https://archive.ics.uci.edu/ml/datasets/wine.

  17. Fisher, R. A. (2010). UCI machine learning repository. Iris Dataset, [Online] Available: https://archive.ics.uci.edu/ml/datasets/iris.

  18. Zwitter, M., & Soklic, M. (1988). UCI machine learning repository. Breast Cancer Dataset, [Online] Available: http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29.

  19. Alpaydin, E., & Alimoglu, F. (2008). UCI machine learning repository. Digits Dataset, [Online] Available: https://archive.ics.uci.edu/ml/datasets/optical+recognition+of+handwritten+digits.

  20. Reyes-Ortiz, J. L., Anguita, D., Ghio, A., Oneto, L., & Parra, X. (2013). UCI machine learning repository. UCI HAR Dataset, [Online] Available: https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones.

  21. Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar), 1157–1182.

  22. Urbanowicz, R. J., et al. (2018). Relief-based feature selection: Introduction and review. Journal of Biomedical Informatics, 85, 189–203.

  23. Alsheikh, M. A., et al. (2014). Machine learning in wireless sensor networks: Algorithms, strategies, and applications. IEEE Communications Surveys and Tutorials, 16(4), 1996–2018.

  24. Wahid, F., Ghazali, R., & Ismail, L. H. (2019). An enhanced approach of artificial bee colony for energy management in energy efficient residential building. Wireless Personal Communications, 104(1), 235–257.

  25. Wahid, F., & Ghazali, R. (2019). Hybrid of firefly algorithm and pattern search for solving optimization problems. Evolutionary Intelligence, 12(1), 1–10.

  26. Wahid, F., Ghazali, R., & Shah, H. (2018). An improved hybrid firefly algorithm for solving optimization problems. In International conference on soft computing and data mining, (pp. 14–23). Cham: Springer.

  27. Wahid, F., & Kim, D. H. (2016). An efficient approach for energy consumption optimization and management in residential building using artificial bee colony and fuzzy logic. Mathematical Problems in Engineering (pp. 1–13). Hindawi.

  28. Wahid, F., & Kim, D. H. (2017). Short-term energy consumption prediction in Korean residential buildings using optimized multi-layer perceptron. Kuwait Journal of Science, 44(2), 179–187.

  29. Wahid, F., Ghazali, R., Shah, A. S., & Fayaz, M. (2017). Prediction of energy consumption in the buildings using multi-layer perceptron and random forest. IJAST, 101, 13–22.

Author information

Corresponding author

Correspondence to A. Meena Kowshalya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kowshalya, A.M., Madhumathi, R. & Gopika, N. Correlation Based Feature Selection Algorithms for Varying Datasets of Different Dimensionality. Wireless Pers Commun 108, 1977–1993 (2019). https://doi.org/10.1007/s11277-019-06504-w
