Abstract
Feature selection, one of the most important preprocessing steps in machine learning, is the process of automatically or manually selecting the features that contribute most to the prediction variable or output of interest. Working with such a subset of features has several important benefits: it reduces the computational complexity of learning algorithms, saves time, improves accuracy, and the selected features can be insightful for people working in the problem domain. Among the different approaches to feature selection, such as filter, wrapper and hybrid methods, filter-based separability measures can be used as feature-ranking tools in binary classification problems, the most popular being the Bhattacharyya distance and the Jeffries–Matusita (JM) distance. However, these measures are parametric, and computing them requires knowledge of the distribution from which the samples are drawn. In practice, we often encounter data whose underlying distribution is unknown. In this paper, we present a new non-parametric approach to feature selection called the ‘Non-Parametric Distance Measure’. The new measure is evaluated on nine datasets, and the results are compared with other ranking-based feature selection methods on the same datasets. The experiments show that the new box-plot-based method can provide higher accuracy and efficiency than conventional ranking-based measures for feature selection such as Chi-Square, Symmetric Uncertainty and Information Gain.
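The parametric baseline the abstract contrasts against can be sketched in a few lines. For two univariate samples assumed Gaussian, the Bhattacharyya distance has a closed form, and the JM distance is a bounded (0 to 2) transform of it; a feature-ranking helper then scores each feature of a binary-class dataset by JM separability. This is a sketch of the conventional parametric measures only, not the paper's box-plot-based measure, and the function names (`rank_features` etc.) are illustrative, not from the paper.

```python
import math
import statistics


def bhattacharyya_gaussian(x, y):
    """Bhattacharyya distance between two univariate samples, assuming
    each class is Gaussian -- the parametric assumption the paper's
    non-parametric measure avoids."""
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    return (0.25 * math.log(0.25 * (v1 / v2 + v2 / v1 + 2))
            + 0.25 * (m1 - m2) ** 2 / (v1 + v2))


def jeffries_matusita(x, y):
    """JM distance: a transform of the Bhattacharyya distance bounded in [0, 2],
    where values near 2 indicate well-separated classes."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_gaussian(x, y)))


def rank_features(class0_rows, class1_rows):
    """Rank features of a binary-class dataset (rows = samples,
    columns = features) by descending JM separability."""
    n_features = len(class0_rows[0])
    scores = []
    for j in range(n_features):
        x = [row[j] for row in class0_rows]
        y = [row[j] for row in class1_rows]
        scores.append((j, jeffries_matusita(x, y)))
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

For instance, if feature 0 differs strongly between the two classes while feature 1 is distributed identically in both, `rank_features` places feature 0 first with a JM score near 2 and feature 1 last with a score near 0.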
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Roychowdhury, S., Basak, A., Goswami, S. (2021). Non-parametric Distance—A New Class Separability Measure. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5618-0
Online ISBN: 978-981-15-5619-7
eBook Packages: Intelligent Technologies and Robotics (R0)