Feature Selection Algorithms in Medical Data Classification: A Brief Survey and Experimentation

  • Conference paper
ICDSMLA 2019

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 601))

Abstract

Feature selection algorithms play a crucial role in machine learning problems. Choosing a suitable algorithm yields a well-chosen subset of features, which can increase accuracy and reduce the time required for training. For high-dimensional datasets, feature selection also helps remove irrelevant features. This paper surveys the popular feature selection algorithms used specifically in medical data classification, covering three types of medical data: signals, images, and numerical data. The survey is intended to give researchers a first-hand overview of the field: it reviews the available medical datasets, feature selection techniques, the choice of classifier, issues in identifying a suitable feature selection technique, and the major feature selection methodologies and their detailed mechanisms. We also performed sample experiments on standard medical datasets from UCI and analyzed the effects on time and performance using 12 popular classifiers. The results demonstrate improved accuracy and lower computation times.
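
To make the idea of selecting a feature subset concrete, the following is a minimal sketch of one simple filter-style technique: rank each feature by the absolute value of its Pearson correlation with the class label and keep the top k. It is an illustration only, not the method or tooling used in the paper. The data are synthetic (not a UCI dataset), and the names `pearson`, `select_top_k`, and `make_toy_data` are hypothetical helpers introduced here.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def select_top_k(rows, labels, k):
    """Filter-style selection: score each column by |correlation with the label|
    and return the indices of the k highest-scoring columns, in ascending order."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def make_toy_data(n=200, n_noise=8, seed=0):
    """Synthetic data (an assumption for illustration): columns 0 and 1 depend on
    the binary label, the remaining columns are pure Gaussian noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = [y + rng.gauss(0, 0.3), -y + rng.gauss(0, 0.3)]
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        rows.append(informative + noise)
        labels.append(y)
    return rows, labels


rows, labels = make_toy_data()
selected = select_top_k(rows, labels, k=2)
# Keep only the selected columns; a classifier would then train on `reduced`.
reduced = [[r[j] for j in selected] for r in rows]
print("selected feature indices:", selected)
```

A classifier trained on `reduced` sees 2 columns instead of 10, which is the mechanism by which feature selection can shorten training. Whether accuracy also improves depends on the data and the classifier.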

Author information

Corresponding author

Correspondence to P. Gayathri.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

Cite this paper

Panicker, S.S., Gayathri, P. (2020). Feature Selection Algorithms in Medical Data Classification: A Brief Survey and Experimentation. In: Kumar, A., Paprzycki, M., Gunjan, V. (eds) ICDSMLA 2019. Lecture Notes in Electrical Engineering, vol 601. Springer, Singapore. https://doi.org/10.1007/978-981-15-1420-3_90
