Survey on Classification and Feature Selection Approaches for Disease Diagnosis

  • Conference paper
Emerging Research in Data Engineering Systems and Computer Communications

Abstract

Patient case similarity refers to finding and extracting patient cases with similar features from a knowledge base. The knowledge base contains data obtained from demographics, progress notes, medications, past medical history, discharge summaries and lab values. Data pre-processing is the first and an important step in the modelling process. Its aim is to increase the effectiveness of classification by providing a representative and consistent data set. Pre-processing includes data cleaning, data transformation and feature selection. A classification model is then trained on the pre-processed data, and to predict a new case, the new sample is submitted to the trained model. Various feature selection and classification approaches are available in the literature, but it is not clear which feature selection approach yields better classification performance. This study therefore presents a survey of feature selection and classification approaches applied to seven benchmark disease data sets obtained from the UCI repository.
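The pre-processing and prediction steps named in the abstract can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the toy records, the mean-imputation rule, the class-mean-difference score used for feature selection and the nearest-centroid classifier are all assumptions chosen only to keep the example self-contained, and binary labels 0/1 are assumed.

```python
from statistics import mean

def clean(rows):
    """Data cleaning: replace missing values (None) with the column mean."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def fit_scaler(rows):
    """Data transformation, step 1: record each column's minimum and maximum."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def scale(row, lo, hi):
    """Data transformation, step 2: map a row to [0, 1] using the stored range."""
    return [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
            for j, v in enumerate(row)]

def select_features(rows, labels, k):
    """Feature selection (illustrative filter): keep the k columns whose
    class means differ most. Assumes binary labels 0 and 1."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def fit_centroids(rows, labels, keep):
    """Training: compute one centroid per class over the selected columns."""
    centroids = {}
    for y in set(labels):
        members = [[r[j] for j in keep] for r, lab in zip(rows, labels) if lab == y]
        centroids[y] = [mean(c) for c in zip(*members)]
    return centroids

def predict(centroids, row, keep):
    """Prediction: assign the class whose centroid is nearest to the new sample."""
    x = [row[j] for j in keep]
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

# Hypothetical toy records (three features, one missing value); not real patient data.
raw = [
    [1.0, 5.0, 10.0],
    [2.0, 3.0, None],
    [1.5, 4.0, 12.0],
    [8.0, 4.5, 30.0],
    [9.0, 3.5, 28.0],
    [8.5, 5.0, 31.0],
]
labels = [0, 0, 0, 1, 1, 1]

cleaned = clean(raw)
lo, hi = fit_scaler(cleaned)
scaled = [scale(r, lo, hi) for r in cleaned]
keep = select_features(scaled, labels, k=2)
model = fit_centroids(scaled, labels, keep)

# A new case goes through the same transformation before it is submitted to the model.
new_case = scale([7.5, 4.0, 27.0], lo, hi)
prediction = predict(model, new_case, keep)
```

The scaling range is fitted on the training records and reused for the new case, so that the new sample is measured on the same scale as the data the model was trained on.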

Author information

Corresponding author

Correspondence to Diwakar Tripathi.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tripathi, D., Manoj, I., Raja Prasanth, G., Neeraja, K., Varma, M.K., Ramachandra Reddy, B. (2020). Survey on Classification and Feature Selection Approaches for Disease Diagnosis. In: Venkata Krishna, P., Obaidat, M. (eds) Emerging Research in Data Engineering Systems and Computer Communications. Advances in Intelligent Systems and Computing, vol 1054. Springer, Singapore. https://doi.org/10.1007/978-981-15-0135-7_52
