
Efficient heuristics for learning scalable Bayesian network classifier from labeled and unlabeled data


Abstract

Naive Bayes (NB) is one of the top ten machine learning algorithms, yet its attribute independence assumption rarely holds in practice. A feasible and efficient way to improve NB is to relax this assumption by adding augmented edges to its restricted topology. In this paper we prove theoretically that the generalized topology may be a suboptimal model of the multivariate probability distribution if its fitness to the data cannot be measured. We therefore propose to use the log-likelihood function as the scoring function and introduce an efficient heuristic search strategy to explore high-dependence relationships; at each iteration the learned topology is refined to fit the data better. The proposed algorithm, called the log-likelihood Bayesian classifier (LLBC), learns two submodels, one from the labeled training set and one from each individual unlabeled testing instance, and combines them for classification in the framework of ensemble learning. Extensive experimental evaluations on 36 benchmark datasets from the University of California at Irvine (UCI) machine learning repository show that LLBC delivers excellent classification performance and provides a competitive approach to learning from labeled and unlabeled data.
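As a concrete illustration of the scoring-and-search idea sketched above, the following is a minimal Python sketch that scores a candidate topology by its Laplace-smoothed log-likelihood on discretized data and then greedily adds augmented edges that raise the score, subject to a parent cap and an acyclicity check. This is an assumption-laden sketch rather than the authors' method: the names log_likelihood, creates_cycle, and greedy_augment and the max_parents cap are ours, and the released LLBC code (see note 1 below) should be consulted for the actual algorithm.

```python
import numpy as np
from itertools import product

def log_likelihood(data, parents, alpha=1.0):
    """Log-likelihood of discrete data under a fixed network topology.

    data    : (n, d) integer NumPy array; rows are instances, columns are
              variables (the class label is treated as just another column).
    parents : dict mapping EVERY column index to a tuple of parent columns.
    alpha   : Laplace smoothing pseudo-count.
    """
    ll = 0.0
    for child, pa in parents.items():
        arity = int(data[:, child].max()) + 1
        counts = {}  # parent configuration -> smoothed child counts
        for row in data:
            key = tuple(row[list(pa)])
            if key not in counts:
                counts[key] = np.full(arity, alpha)
            counts[key][row[child]] += 1.0
        for row in data:
            vec = counts[tuple(row[list(pa)])]
            ll += np.log(vec[row[child]] / vec.sum())
    return ll

def creates_cycle(parents, child, cand):
    """True if adding the edge cand -> child would create a directed cycle."""
    stack, seen = [cand], set()
    while stack:  # walk cand's ancestors; hitting child closes a cycle
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_augment(data, base_parents, max_parents=2):
    """Greedily add augmented edges while they improve the log-likelihood."""
    parents = {v: tuple(pa) for v, pa in base_parents.items()}
    best = log_likelihood(data, parents)
    improved = True
    while improved:
        improved = False
        for child, cand in product(list(parents), repeat=2):
            if cand == child or cand in parents[child]:
                continue
            if len(parents[child]) >= max_parents:
                continue
            if creates_cycle(parents, child, cand):
                continue
            trial = dict(parents)
            trial[child] = parents[child] + (cand,)
            score = log_likelihood(data, trial)
            if score > best:  # keep the edge only if it raises the score
                best, parents, improved = score, trial, True
    return parents, best

# Hypothetical usage: class in column 0, NB topology as the starting point.
# data = np.loadtxt("discretized.csv", delimiter=",", dtype=int)
# topology, score = greedy_augment(data, {0: (), 1: (0,), 2: (0,), 3: (0,)})
```

In the spirit of the abstract, one submodel would be learned this way from the labeled training set and a second from statistics weighted toward each unlabeled testing instance, with the two posteriors then combined ensemble-style; the sketch covers only the shared scoring-and-search core.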


Data Availability and Access

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Notes

  1. The source code of LLBC is available at https://github.com/Wangjj1129/LLBC.

  2. The datasets used in the experiments are available at https://archive.ics.uci.edu/ml/datasets.html.


Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2019YFC1804804), Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (No. KLIGIP-2021A04), and the Scientific and Technological Developing Scheme of Jilin Province (No. 20200201281JC).

Author information


Contributions

Limin Wang: Methodology, Supervision, Writing-review & editing, Funding acquisition. Junjie Wang: Conceptualization, Validation, Visualization, Writing-original draft. Lu Guo: Formal analysis, Project administration. Qilong Li: Software, Investigation.

Corresponding author

Correspondence to Junjie Wang.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest.

Ethical and informed consent for data used

This study does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 8 Experimental results for zero-one loss (ZOL)
Table 9 Experimental results for bias
Table 10 Experimental results for variance
Table 11 Experimental results for root mean squared error (RMSE)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, L., Wang, J., Guo, L. et al. Efficient heuristics for learning scalable Bayesian network classifier from labeled and unlabeled data. Appl Intell 54, 1957–1979 (2024). https://doi.org/10.1007/s10489-023-05242-8

