
Naive Bayes: applications, variations and vulnerabilities: a review of literature with code snippets for implementation

  • Methodologies and Application
Soft Computing

Abstract

Naïve Bayes (NB) is a well-known probabilistic classification algorithm. It is simple yet efficient and has a wide variety of real-world applications, ranging from product recommendation through medical diagnosis to the control of autonomous vehicles. Because real data often fail to satisfy the assumptions of NB, several variations of NB have been developed to handle more general data. Each variation has its own applications and achieves a different level of accuracy. This manuscript surveys recent applications of NB and discusses its variations in different settings. Furthermore, it makes recommendations on the applicability of NB while exploring the robustness of the algorithm. Finally, it discusses the pros and cons of the NB algorithm and some of its vulnerabilities, with related computing code for implementation.
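To make the independence assumption and one kind of vulnerability concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a multinomial (bag-of-words) NB with Laplace smoothing, which is one common variation, trained on a hypothetical toy spam/ham corpus. The class name `ToyMultinomialNB`, the helper `good_word_attack` and all data are illustrative inventions. The attack follows the "good word" style of evasion against statistical spam filters: append words typical of the legitimate class until the prediction flips.

```python
import math
from collections import Counter


class ToyMultinomialNB:
    """Minimal multinomial Naive Bayes for bag-of-words documents (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_likelihood(self, word, c):
        # Smoothed estimate of P(word | c); unseen (word, class) pairs get a small non-zero probability
        num = self.counts[c][word] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def scores(self, doc):
        # Independence assumption: log P(c | doc) = log P(c) + sum_i log P(w_i | c) + const
        # Words never seen in training are skipped.
        return {
            c: self.log_prior[c]
            + sum(self.log_likelihood(w, c) for w in doc if w in self.vocab)
            for c in self.classes
        }

    def predict(self, doc):
        s = self.scores(doc)
        return max(s, key=s.get)


def good_word_attack(clf, doc, good_words, target):
    """Hypothetical helper: append words one at a time until the prediction becomes `target`."""
    attacked = list(doc)
    for w in good_words:
        if clf.predict(attacked) == target:
            break
        attacked.append(w)
    return attacked


# Hypothetical toy corpus: short messages with spam/ham labels.
train = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("win cheap prize", "spam"),
    ("meeting agenda attached", "ham"),
    ("project meeting notes", "ham"),
    ("lunch with project team", "ham"),
]
docs = [text.split() for text, _ in train]
labels = [label for _, label in train]
clf = ToyMultinomialNB(alpha=1.0).fit(docs, labels)

msg = "win money".split()
print(clf.predict(msg))  # spam

attacked = good_word_attack(clf, msg, ["meeting", "project", "notes", "agenda"], target="ham")
print(attacked)               # ['win', 'money', 'meeting', 'project', 'notes']
print(clf.predict(attacked))  # ham
```

On this toy corpus the message "win money" is classified as spam, and appending three ham-typical words flips the prediction to ham. Because NB scores each word independently, every added word contributes its own log-likelihood ratio to the class comparison, which is what makes this kind of evasion cheap.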



Acknowledgements

This work was partly funded by the Burroughs Wellcome Fund, and we thank the funder for its support.

Author information

Corresponding author

Correspondence to Indika Wickramasinghe.

Ethics declarations

Conflict of interest

Both authors, Indika Wickramasinghe and Harsha Kalutarage, declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by V. Loia.


About this article

Cite this article

Wickramasinghe, I., Kalutarage, H. Naive Bayes: applications, variations and vulnerabilities: a review of literature with code snippets for implementation. Soft Comput 25, 2277–2293 (2021). https://doi.org/10.1007/s00500-020-05297-6

  • DOI: https://doi.org/10.1007/s00500-020-05297-6
