Integrating multiple methods to enhance medical data classification

  • Original Paper
  • Evolving Systems

Abstract

In medical data classification, reducing the data and improving classification performance are the two central concerns. Existing methods first pre-process the medical data and then perform feature selection; without feature selection, classification is more time-consuming and less accurate. We propose two algorithms for improving classification performance on medical data. The first uses the Bag of Words technique to select a better feature subset, followed by a hybrid Fuzzy-Neural Network classifier that can handle imprecision in the data during classification. This combination of feature selection and Fuzzy-Neural Network classification improves classification accuracy. The second algorithm adds data cleaning as a pre-processing step to improve data quality, alongside Bag of Words and the Fuzzy-Neural Network. It therefore classifies clean, filtered data with an appropriately reduced feature set, which yields more accurate classification than existing methods. Together, the proposed approaches address three issues: removing noise from the data, selecting an optimal feature subset, and handling imprecision in the data. A comparative study on several medical datasets shows that, in terms of accuracy, the two proposed algorithms outperform existing techniques, with improvements of around 3% and 17%, respectively. In addition, the Bag of Words feature selection method used in the proposed system is compared with two other feature selection methods, LSFS and SFFS.
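The three stages named in the abstract (cleaning, feature subset selection, fuzzy classification) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything beyond the stage order is an assumption. The abstract does not say how Bag of Words is applied to the features, so a plain correlation filter stands in for it. A single logistic unit fed with triangular low/medium/high memberships stands in for the Fuzzy-Neural Network. The data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def clean(X, y):
    """Stage 1 (stand-in): drop records that contain missing values."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]

def select_features(X, y, k):
    """Stage 2 (stand-in, NOT Bag of Words): keep the k features whose
    absolute correlation with the label is largest."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(score)[-k:])

def fuzzify(X, lo, hi):
    """Stage 3a: map each feature to low/medium/high triangular memberships.
    For every feature the three degrees sum to 1."""
    Z = np.clip((X - lo) / (hi - lo + 1e-12), 0.0, 1.0)
    low = np.clip(1.0 - 2.0 * Z, 0.0, 1.0)
    med = 1.0 - np.abs(2.0 * Z - 1.0)
    high = np.clip(2.0 * Z - 1.0, 0.0, 1.0)
    return np.hstack([low, med, high])

def train(F, y, epochs=500, lr=0.5):
    """Stage 3b (stand-in): one logistic unit trained by gradient descent."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        g = p - y                      # gradient of the log-loss w.r.t. the logit
        w -= lr * (F.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def predict(F, w, b):
    return (F @ w + b > 0).astype(int)

# Synthetic data: 5 features, only features 0 and 2 determine the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 2] > 0).astype(float)
X[::25, 1] = np.nan                    # simulate missing values in 8 records

X_clean, y_clean = clean(X, y)
cols = select_features(X_clean, y_clean, k=2)
X_sel = X_clean[:, cols]
F = fuzzify(X_sel, X_sel.min(axis=0), X_sel.max(axis=0))
w, b = train(F, y_clean)
accuracy = (predict(F, w, b) == y_clean).mean()
print("selected features:", cols, "training accuracy:", round(accuracy, 3))
```

The fuzzification step is what lets the classifier work with degrees of membership rather than crisp values, which is one common way to model imprecise measurements. The accuracy printed here is on the training data only and says nothing about the results reported in the paper.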

Author information

Corresponding author

Correspondence to Balasaheb Tarle.

About this article

Cite this article

Tarle, B., Chintakindi, S. & Jena, S. Integrating multiple methods to enhance medical data classification. Evolving Systems 11, 133–142 (2020). https://doi.org/10.1007/s12530-019-09272-x

  • DOI: https://doi.org/10.1007/s12530-019-09272-x
