Abstract
The curse of dimensionality must be addressed carefully when designing a classifier. Given a very high-dimensional dataset, one important problem is selecting the best subset of features for classification. Feature selection is an effective remedy for the curse of dimensionality. Numerous feature selection algorithms have been proposed in the recent past, but no one-stop solution prevails. This paper proposes two novel feature selection algorithms, Reverse Piece-wise Correlation Based Feature Selection (RPwCBFS) and Shuffled Piece-wise Correlation Based Feature Selection (SPwCBFS), which divide the feature space into pieces and compute the similarity of feature subsets in reverse order and in randomly shuffled order, respectively. The proposed algorithms are compared with Fast Correlation Based Feature Selection (FCBF), Fast Correlation Based Feature Selection # (FCBF#), and Fast Correlation Based Feature Selection In Piece (FCBFiP). Standard medium-dimensional and very high-dimensional datasets are used in the experiments. The experimental results show that RPwCBFS and SPwCBFS are strong choices for feature selection when the underlying dataset is of medium size. For very high-dimensional datasets, SPwCBFS proves to be the preferred choice.
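The piece-wise idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' RPwCBFS or SPwCBFS implementation: the abstract does not specify the correlation measure, the piece size, or the thresholds, so absolute Pearson correlation is used as a stand-in, and the function names `pearson` and `piecewise_select` and the parameters `piece_size`, `relevance`, `redundancy`, and `seed` are all hypothetical. The sketch ranks features by correlation with the target, cuts the ranking into pieces, visits the pieces in reverse or shuffled order, and keeps a feature only if it is not strongly correlated with a feature already kept.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant. This measure is an
    assumption; the abstract does not say which correlation is used.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def piecewise_select(columns, target, piece_size, order="reverse",
                     relevance=0.1, redundancy=0.9, seed=0):
    """Illustrative piece-wise correlation filter (hypothetical sketch).

    columns: list of feature columns, each a list of numbers.
    Returns the sorted indices of the features that are kept.
    """
    # Keep features that are relevant to the target, ranked best first.
    scores = {j: pearson(col, target) for j, col in enumerate(columns)}
    ranked = sorted((j for j in scores if scores[j] >= relevance),
                    key=lambda j: scores[j], reverse=True)

    # Divide the ranked feature list into consecutive pieces.
    pieces = [ranked[i:i + piece_size]
              for i in range(0, len(ranked), piece_size)]

    # Choose the order in which the pieces are visited.
    if order == "reverse":
        pieces.reverse()
    elif order == "shuffled":
        random.Random(seed).shuffle(pieces)
    else:
        raise ValueError("order must be 'reverse' or 'shuffled'")

    # Keep a feature only if no already-kept feature is redundant with it.
    selected = []
    for piece in pieces:
        for j in piece:
            if all(pearson(columns[j], columns[k]) < redundancy
                   for k in selected):
                selected.append(j)
    return sorted(selected)
```

For example, calling `piecewise_select(columns, target, piece_size=2, order="shuffled")` on a dataset where one column is a scaled copy of another would keep only one of the two, since their mutual correlation exceeds the assumed redundancy threshold.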
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants performed by any of the authors.
About this article
Cite this article
Kowshalya, A.M., Madhumathi, R. & Gopika, N. Correlation Based Feature Selection Algorithms for Varying Datasets of Different Dimensionality. Wireless Pers Commun 108, 1977–1993 (2019). https://doi.org/10.1007/s11277-019-06504-w
DOI: https://doi.org/10.1007/s11277-019-06504-w