Abstract
Feature selection algorithms play a crucial role in any machine learning problem. Choosing the best algorithm yields an optimal subset of features, thereby increasing accuracy and reducing the time required for training. For high-dimensional datasets, it also helps remove irrelevant features. This paper presents a novel approach to surveying the popular feature selection algorithms used specifically in medical data classification, considering three types of medical data: signals, images, and numerical data. This work should be useful to researchers seeking first-hand information, since it reviews available medical datasets, feature selection techniques, the choice of classifier, issues in identifying a feature selection technique, and an analysis of the major feature selection methodologies and their detailed mechanisms. We also performed sample experiments on standard medical datasets from UCI and analyzed the effects on time and performance using 12 popular classifiers. The results demonstrate improved accuracy and lower computation times.
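To make the idea of selecting features before classification concrete, the following is a minimal sketch and not the experiment reported in the paper. It assumes synthetic two-class data (two informative features plus eight noise features), a simple filter score (class-mean gap divided by the summed class standard deviations), and a nearest-centroid classifier. All of these choices and every function name are illustrative assumptions; none of them is taken from the chapter.

```python
import random
import statistics


def filter_scores(X, y):
    """Score each feature by class-mean separation relative to spread.

    Illustrative filter criterion (an assumption, not the paper's method).
    """
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        gap = abs(statistics.fmean(col1) - statistics.fmean(col0))
        # Small constant avoids division by zero for constant features.
        spread = statistics.pstdev(col0) + statistics.pstdev(col1) + 1e-12
        scores.append(gap / spread)
    return scores


def select_top_k(X, y, k):
    """Return the sorted indices of the k highest-scoring features."""
    scores = filter_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in X]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit one centroid per class and report accuracy on the test rows."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    correct = 0
    for row, label in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
        correct += pred == label
    return correct / len(y_test)


def make_data(n_per_class, n_noise, rng):
    """Synthetic data: features 0 and 1 are informative, the rest are noise."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X_train, y_train = make_data(100, 8, rng)
X_test, y_test = make_data(100, 8, rng)

keep = select_top_k(X_train, y_train, k=2)
acc_all = nearest_centroid_accuracy(X_train, y_train, X_test, y_test)
acc_sel = nearest_centroid_accuracy(
    project(X_train, keep), y_train, project(X_test, keep), y_test
)
print("selected features:", keep)
print("accuracy, all features:", acc_all)
print("accuracy, selected features:", acc_sel)
```

On real medical datasets the selected subset, and whether accuracy actually improves, depends on the data, the selection criterion, and the classifier, so this sketch only illustrates the workflow.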
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Panicker, S.S., Gayathri, P. (2020). Feature Selection Algorithms in Medical Data Classification: A Brief Survey and Experimentation. In: Kumar, A., Paprzycki, M., Gunjan, V. (eds) ICDSMLA 2019. Lecture Notes in Electrical Engineering, vol 601. Springer, Singapore. https://doi.org/10.1007/978-981-15-1420-3_90
DOI: https://doi.org/10.1007/978-981-15-1420-3_90
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1419-7
Online ISBN: 978-981-15-1420-3
eBook Packages: Intelligent Technologies and Robotics (R0)