Abstract
Feature selection, one of the most important preprocessing steps in machine learning, is the process of automatically or manually selecting the features that contribute most to the prediction variable or output of interest. Working with such a subset of features has several important benefits: it reduces the computational complexity of learning algorithms, saves time, improves accuracy, and the selected features can be insightful for people working in the problem domain. Among the different approaches to feature selection, such as filter, wrapper and hybrid methods, filter-based separability measures can be used as feature-ranking tools in binary classification problems, the most popular being the Bhattacharyya distance and the Jeffries–Matusita (JM) distance. However, these measures are parametric, and computing them requires knowledge of the distribution from which the samples are drawn. In practice, we often encounter data whose underlying distribution is unknown. In this paper, we present a new non-parametric approach to feature selection called the ‘Non-Parametric Distance Measure’. The new measure is evaluated on nine datasets, and the results are compared with other ranking-based feature selection methods on the same datasets. The experiments show that the new box-plot-based method can provide higher accuracy and efficiency than conventional ranking-based measures for feature selection such as Chi-Square, Symmetric Uncertainty and Information Gain.
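The parametric baseline the abstract contrasts against can be sketched in a few lines. For two univariate samples assumed Gaussian, the Bhattacharyya distance has a closed form, and the JM distance is a bounded (0 to 2) transform of it; a feature-ranking helper then scores each feature of a binary-class dataset by JM separability. This is a sketch of the conventional parametric measures only, not the paper's box-plot-based measure, and the function names (`rank_features` etc.) are illustrative, not from the paper.

```python
import math
import statistics


def bhattacharyya_gaussian(x, y):
    """Bhattacharyya distance between two univariate samples, assuming
    each class is Gaussian -- the parametric assumption the paper's
    non-parametric measure avoids."""
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    return (0.25 * math.log(0.25 * (v1 / v2 + v2 / v1 + 2))
            + 0.25 * (m1 - m2) ** 2 / (v1 + v2))


def jeffries_matusita(x, y):
    """JM distance: a transform of the Bhattacharyya distance bounded in [0, 2],
    where values near 2 indicate well-separated classes."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_gaussian(x, y)))


def rank_features(class0_rows, class1_rows):
    """Rank features of a binary-class dataset (rows = samples,
    columns = features) by descending JM separability."""
    n_features = len(class0_rows[0])
    scores = []
    for j in range(n_features):
        x = [row[j] for row in class0_rows]
        y = [row[j] for row in class1_rows]
        scores.append((j, jeffries_matusita(x, y)))
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

For instance, if feature 0 differs strongly between the two classes while feature 1 is distributed identically in both, `rank_features` places feature 0 first with a JM score near 2 and feature 1 last with a score near 0.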
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Roychowdhury, S., Basak, A., Goswami, S. (2021). Non-parametric Distance—A New Class Separability Measure. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5618-0
Online ISBN: 978-981-15-5619-7
eBook Packages: Intelligent Technologies and Robotics (R0)