Survey on Classification and Feature Selection Approaches for Disease Diagnosis

Tripathi, Diwakar; Manoj, I.; Raja Prasanth, G.; Neeraja, K.; Varma, Mohan Krishna; Ramachandra Reddy, B.

doi:10.1007/978-981-15-0135-7_52

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1054)

Abstract

Patient case similarity means finding and extracting patient cases in the knowledge base that have similar features. The knowledge base contains data drawn from demographics, progress notes, medications, past medical history, discharge summaries and lab values. Data pre-processing is the first, and an important, step in the modelling process. Its aim is to increase the effectiveness of classification by providing a representative and consistent data set. Pre-processing includes data cleaning, data transformation and feature selection. To predict a new case, the new sample is then submitted to the trained model. Various feature selection and classification approaches are available in the literature, but it is not clear which feature selection approach yields better classification performance. This study therefore presents a survey of feature selection and classification approaches applied to seven benchmark disease data sets obtained from the UCI repository.
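The workflow outlined in the abstract (cleaning, transformation, feature selection, then submitting a new sample to a trained model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records are invented, and mean imputation, min-max scaling, a Pearson-correlation filter and a 1-nearest-neighbour classifier are simple stand-ins, not necessarily the approaches surveyed in the chapter.

```python
from math import sqrt
from statistics import mean


def clean(rows):
    """Data cleaning: replace missing values (None) with the column mean."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def fit_min_max(rows):
    """Learn per-column (min, max) from the training data."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(rows, params):
    """Data transformation: min-max scale each column using learned params."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, params)]
            for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(rows, labels, k):
    """Filter-style selection: keep the k columns most correlated with the label."""
    scores = [abs(pearson(col, labels)) for col in zip(*rows)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def predict_1nn(train_rows, train_labels, sample):
    """Classify a sample with the label of its nearest training row."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, sample)) for row in train_rows]
    return train_labels[dists.index(min(dists))]


# Hypothetical toy records: three numeric features, None marks a missing value.
raw_rows = [
    [1.0, 5.0, 10.0],
    [1.2, 1.0, None],
    [0.9, 3.0, 11.0],
    [3.0, 4.0, 20.0],
    [3.1, 2.0, 21.0],
    [2.9, None, 19.0],
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = healthy, 1 = diseased (invented labels)

cleaned = clean(raw_rows)
params = fit_min_max(cleaned)
scaled = scale(cleaned, params)
selected = select_features(scaled, labels, k=2)
train = [[row[j] for j in selected] for row in scaled]


def diagnose(sample):
    """Apply the same scaling and feature subset to a new, complete sample."""
    scaled_sample = scale([sample], params)[0]
    reduced = [scaled_sample[j] for j in selected]
    return predict_1nn(train, labels, reduced)


print(selected)
print(diagnose([3.0, 9.0, 18.0]))
```

The scaling parameters and the selected feature indices are learned from the training records only and then reused for each new sample, which mirrors the abstract's separation between building the model and predicting new cases.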

Author information

Authors and Affiliations

Madanapalle Institute of Technology & Science, Madanapalle, A.P, India
Diwakar Tripathi, I. Manoj, G. Raja Prasanth, K. Neeraja, Mohan Krishna Varma & B. Ramachandra Reddy

Corresponding author

Correspondence to Diwakar Tripathi.

Editor information

Editors and Affiliations

Department of Computer Science, Sri Padmavati Mahila University, Tirupathi, Andhra Pradesh, India
P. Venkata Krishna
Department of ECE, Nazarbayev University, Astana, Kazakhstan
Mohammad S. Obaidat

About this paper

Cite this paper

Tripathi, D., Manoj, I., Raja Prasanth, G., Neeraja, K., Varma, M.K., Ramachandra Reddy, B. (2020). Survey on Classification and Feature Selection Approaches for Disease Diagnosis. In: Venkata Krishna, P., Obaidat, M. (eds) Emerging Research in Data Engineering Systems and Computer Communications. Advances in Intelligent Systems and Computing, vol 1054. Springer, Singapore. https://doi.org/10.1007/978-981-15-0135-7_52

DOI: https://doi.org/10.1007/978-981-15-0135-7_52
Published: 11 February 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0134-0
Online ISBN: 978-981-15-0135-7
eBook Packages: Intelligent Technologies and Robotics (R0)
