A new complement naïve Bayesian approach for biomedical data classification

  • Amare Anagaw
  • Yang-Lang Chang
Original Research

Abstract

Biomedical data classification is challenging because the data are usually large, noisy and imbalanced. Noise in particular can degrade a system in terms of classification accuracy, the time needed to build a classifier and the size of the classifier. Most existing learning algorithms therefore integrate various mechanisms to improve learning in noisy environments, but noise can still have a serious negative impact. A more reasonable solution may be to apply a preprocessing step that handles noisy instances before the learner is built. We therefore introduce a method called double learning to improve the classification performance of our model. To the best of the authors' knowledge, most previous work isolates the noisy instances and then uses the normal (noise-free) instances for model construction (training). This increases the cost of model construction for active learners and the total computational time for passive learners. It also ignores minority instances, which leads to misclassification of test cases from the minority group. The main idea of this paper is to construct the model from the noisy instances instead: only the instances identified as noisy, rather than the normal (noise-free) data, are used for model construction. Reducing the number of training instances in this way shortens model construction time and improves classification performance. Because noisy instances are used for model construction, the entire working logic of naïve Bayesian classification is reversed. The resulting method, called complement naïve Bayesian (CNB), uses the idea of complement-based learning to improve accuracy. Finally, the proposed CNB is compared with naïve Bayesian and several other classification algorithms on the single photon emission computed tomography (SPECT), Indian liver patient, Wilt and Tic-Tac-Toe endgame datasets. The experimental results show that the proposed approach is promising in terms of computational time and accuracy on both the balanced and the imbalanced datasets used.
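
The abstract does not spell out how noisy instances are identified or how the naïve Bayesian logic is reversed, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes categorical features, flags an instance as noisy when a first-pass naïve Bayesian model disagrees with its label, builds a second model from the flagged instances only, and then predicts the class with the lowest score under that second model. The function names (`fit_nb`, `log_scores`, `predict`, `double_learning`), the flagging rule and the lowest-score decision rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_nb(rows, labels, classes, domains, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing."""
    class_counts = Counter(labels)
    # value_counts[c][j][v]: number of class-c rows whose feature j equals v
    value_counts = {c: [Counter() for _ in domains] for c in classes}
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[y][j][v] += 1
    return {"classes": classes, "domains": domains, "alpha": alpha,
            "n": len(rows), "class_counts": class_counts,
            "value_counts": value_counts}


def log_scores(model, row):
    """Return log P(c) + sum_j log P(x_j | c) for every class c."""
    a, classes = model["alpha"], model["classes"]
    scores = {}
    for c in classes:
        n_c = model["class_counts"][c]
        s = math.log((n_c + a) / (model["n"] + a * len(classes)))
        for j, v in enumerate(row):
            count = model["value_counts"][c][j][v]
            s += math.log((count + a) / (n_c + a * model["domains"][j]))
        scores[c] = s
    return scores


def predict(model, row, reverse=False):
    """Pick the highest-scoring class, or the lowest when reverse is True."""
    scores = log_scores(model, row)
    pick = min if reverse else max
    return pick(model["classes"], key=scores.get)


def double_learning(rows, labels):
    """Return (model, reverse): a model built from flagged instances only."""
    classes = sorted(set(labels))
    domains = [len({r[j] for r in rows}) for j in range(len(rows[0]))]
    first = fit_nb(rows, labels, classes, domains)
    # Assumption: an instance is "noisy" if the first-pass model
    # disagrees with its recorded label.
    noisy = [(r, y) for r, y in zip(rows, labels) if predict(first, r) != y]
    if not noisy:
        return first, False  # nothing flagged: fall back to ordinary naive Bayes
    second = fit_nb([r for r, _ in noisy], [y for _, y in noisy],
                    classes, domains)
    # Assumption: the reversed logic means choosing the lowest-scoring class.
    return second, True


# Toy data (hypothetical): one feature, one flipped label per value.
rows = [("a",)] * 5 + [("b",)] * 5
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
model, reverse = double_learning(rows, labels)
print(predict(model, ("a",), reverse), predict(model, ("b",), reverse))
```

On this toy data the second model is built from just two of the ten instances, which illustrates the claimed reduction in model construction cost; whether accuracy is preserved on real data depends on details that the abstract does not give.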

Keywords

  • Biomedical data classification
  • Complement naïve Bayesian
  • Double learning

Notes

Acknowledgements

Funding was provided by Ministry of Science and Technology, Taiwan (Grant nos. MOST 107-2116-M-027-003 and MOST 106-2116-M-027-004).

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan
