Abstract
Patient case similarity refers to finding and extracting, from a knowledge base, patient cases that have similar features. The knowledge base contains data drawn from demographics, progress notes, medications, past medical history, discharge summaries and lab values. Data pre-processing is the first and an important step in the modelling process. Its aim is to increase the effectiveness of classification by producing a representative and consistent data set. Pre-processing includes data cleaning, data transformation and feature selection. To predict a new case, the new sample is then submitted to the trained model. Various feature selection and classification approaches are available in the literature, but it is not clear which feature selection approach yields better classification performance. This study therefore presents a survey of feature selection and classification approaches applied to seven benchmark disease data sets obtained from the UCI repository.
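The workflow described above (cleaning, transformation, feature selection, training, then scoring a new sample) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six toy records and their three columns are invented, mean imputation and min-max scaling stand in for cleaning and transformation, an absolute Pearson-correlation filter stands in for feature selection, and a 3-nearest-neighbour vote stands in for the classifier. None of these is claimed to be the specific method evaluated in the chapter.

```python
import math
from statistics import mean

def impute_mean(rows):
    """Data cleaning: replace missing values (None) with the column mean."""
    means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def fit_min_max(rows):
    """Data transformation, step 1: record (min, max) for every column."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, bounds):
    """Data transformation, step 2: rescale each column to [0, 1]."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]

def abs_pearson(xs, ys):
    """Absolute Pearson correlation between one feature and the labels."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0

def select_top_k(rows, labels, k):
    """Filter-style feature selection: keep the k best-correlated columns."""
    scores = [abs_pearson(col, labels) for col in zip(*rows)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def project(rows, columns):
    """Keep only the selected columns of every row."""
    return [[row[j] for j in columns] for row in rows]

def knn_predict(train_x, train_y, sample, k=3):
    """Classify a sample by majority vote among its k nearest neighbours."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], sample))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical toy records: [glucose, unrelated measurement, BMI].
raw = [
    [85.0, 3.0, 22.0],
    [90.0, 7.0, None],   # one missing value to exercise the cleaning step
    [88.0, 5.0, 24.0],
    [150.0, 4.0, 31.0],
    [160.0, 6.0, 33.0],
    [155.0, 5.0, 35.0],
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = healthy, 1 = diseased (invented)

cleaned = impute_mean(raw)
bounds = fit_min_max(cleaned)
scaled = apply_min_max(cleaned, bounds)
selected = select_top_k(scaled, labels, k=2)
train_x = project(scaled, selected)

def predict_new(sample):
    """Apply the fitted scaling and selected columns, then classify."""
    prepared = project(apply_min_max([sample], bounds), selected)[0]
    return knn_predict(train_x, labels, prepared)

if __name__ == "__main__":
    print("selected feature indices:", selected)
    print("prediction for new case:", predict_new([158.0, 4.0, 32.0]))
```

Note that the scaling bounds and the selected columns are fitted on the training records only and then reused for the new sample, which mirrors the idea of submitting a new case to an already trained model.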
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tripathi, D., Manoj, I., Raja Prasanth, G., Neeraja, K., Varma, M.K., Ramachandra Reddy, B. (2020). Survey on Classification and Feature Selection Approaches for Disease Diagnosis. In: Venkata Krishna, P., Obaidat, M. (eds) Emerging Research in Data Engineering Systems and Computer Communications. Advances in Intelligent Systems and Computing, vol 1054. Springer, Singapore. https://doi.org/10.1007/978-981-15-0135-7_52
DOI: https://doi.org/10.1007/978-981-15-0135-7_52
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0134-0
Online ISBN: 978-981-15-0135-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)