Non-parametric Distance—A New Class Separability Measure

  • Conference paper
Data Management, Analytics and Innovation

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1175))


Abstract

Feature selection, one of the most important preprocessing steps in machine learning, is the process of automatically or manually selecting the features that contribute most to the prediction variable or output of interest. Such a subset of features offers several important benefits: it reduces the computational complexity of learning algorithms, saves time, improves accuracy, and the selected features can be insightful for people working in the problem domain. Among the different approaches to feature selection, such as filter, wrapper and hybrid methods, filter-based separability measures can serve as feature-ranking tools in binary classification problems, the most popular being the Bhattacharyya distance and the Jeffries–Matusita (JM) distance. However, these measures are parametric: computing them requires knowledge of the distribution from which the samples are drawn. In real life, we often encounter instances where it is difficult to characterize the distribution of the observations. In this paper, we present a new non-parametric approach to feature selection called the 'Non-Parametric Distance Measure'. The new measure is evaluated on nine datasets and the results are compared with other ranking-based feature selection methods on the same datasets. The experiments show that the new box-plot-based method can provide greater accuracy and efficiency than conventional ranking-based measures for feature selection such as Chi-Square, Symmetric Uncertainty and Information Gain.
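To make the contrast concrete, the sketch below implements the standard univariate Gaussian forms of the Bhattacharyya and JM distances mentioned in the abstract, alongside a *hypothetical* box-plot-based separability score. The abstract does not define the paper's actual Non-Parametric Distance Measure, so `boxplot_separation` (an interquartile-range overlap score) is an illustrative stand-in for the general idea of a box-plot-based, distribution-free class separability measure, not the authors' method.

```python
import numpy as np

def bhattacharyya_distance(x0, x1):
    """Bhattacharyya distance between two 1-D samples, assuming each
    class is Gaussian (the parametric assumption the paper avoids)."""
    m0, m1 = np.mean(x0), np.mean(x1)
    v0, v1 = np.var(x0), np.var(x1)
    return (0.25 * np.log(0.25 * (v0 / v1 + v1 / v0 + 2.0))
            + 0.25 * (m0 - m1) ** 2 / (v0 + v1))

def jm_distance(x0, x1):
    """Jeffries-Matusita distance, a [0, 2]-bounded transform of the
    Bhattacharyya distance."""
    return 2.0 * (1.0 - np.exp(-bhattacharyya_distance(x0, x1)))

def boxplot_separation(x0, x1):
    """Hypothetical non-parametric score: one minus the fraction of the
    combined interquartile span where the two classes' IQRs overlap
    (larger = better separated). Illustration only."""
    q1a, q3a = np.percentile(x0, [25, 75])
    q1b, q3b = np.percentile(x1, [25, 75])
    overlap = max(0.0, min(q3a, q3b) - max(q1a, q1b))
    span = max(q3a, q3b) - min(q1a, q1b)
    return 1.0 - overlap / span if span > 0 else 0.0
```

In a filter-based ranking, each feature would be scored independently by one of these functions on the two class-conditional samples, and the top-scoring features retained; the non-parametric score needs only sample quartiles, never a distributional assumption.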



Author information

Correspondence to Aditya Basak.


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Roychowdhury, S., Basak, A., Goswami, S. (2021). Non-parametric Distance—A New Class Separability Measure. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_1
