Abstract
The Bayesian network classifier (BNC) enables efficient and effective inference under uncertainty for classification, depicting the interdependencies among random variables with a directed acyclic graph (DAG). However, learning an optimal BNC is NP-hard, and overly complicated DAGs may lead to biased estimates of multivariate probability distributions and a subsequent degradation in classification performance. In this study, we propose using the entropy function as the scoring metric and then applying a greedy search strategy to improve, at each iteration, the fitness of the learned DAG to the training data. The proposed algorithm, called One\(+\) Bayesian Classifier (O\(^{+}\)BC), can represent high-dependence relationships in a robust DAG with a limited number of directed edges. We compare the performance of O\(^{+}\)BC with that of six other state-of-the-art single and ensemble BNCs. The experimental results reveal that O\(^{+}\)BC achieves competitive or superior performance in terms of zero-one loss, bias-variance decomposition, and the Friedman and Nemenyi tests.
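The abstract's recipe — score candidate structures with an entropy function and greedily add the directed edge that most improves fit to the training data at each iteration, under an edge budget — can be sketched as below. The paper's exact scoring and search details are not reproduced in this excerpt, so the naive-Bayes starting structure, the entropy-gain criterion, the function names, and the `max_edges` budget are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def joint_entropy(cols):
    """Empirical joint entropy (bits) of the rows of a 2-D integer array."""
    _, counts = np.unique(cols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def cond_entropy(data, child, parents):
    """H(child | parents) = H(child, parents) - H(parents), from counts."""
    if not parents:
        return joint_entropy(data[:, [child]])
    return joint_entropy(data[:, [child] + parents]) - joint_entropy(data[:, parents])

def _creates_cycle(edges, u, v):
    """True if adding u -> v would close a directed cycle (DFS from v back to u)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, []))
    return False

def greedy_entropy_dag(X, y, max_edges=2):
    """Greedily add attribute -> attribute edges that most reduce conditional
    entropy, starting from the naive-Bayes structure (class is a parent of
    every attribute) and keeping the graph acyclic."""
    n, d = X.shape
    data = np.column_stack([X, y]).astype(int)
    cls = d                                  # class variable index
    parents = {i: [cls] for i in range(d)}   # naive-Bayes starting point
    edges = []
    for _ in range(max_edges):
        best = None
        for u in range(d):
            for v in range(d):
                if u == v or (u, v) in edges or _creates_cycle(edges, u, v):
                    continue
                # entropy reduction from letting u become a parent of v
                gain = (cond_entropy(data, v, parents[v])
                        - cond_entropy(data, v, parents[v] + [u]))
                if best is None or gain > best[0]:
                    best = (gain, u, v)
        if best is None or best[0] <= 1e-12:
            break                            # no edge still improves the score
        _, u, v = best
        edges.append((u, v))
        parents[v].append(u)
    return parents, edges
```

On data where one attribute is a copy of another, a single greedy step recovers that dependence as an augmenting edge, which is the intended behavior of an entropy-driven search: each added edge is the one that best explains a child attribute given its current parents.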
Data Availability
The data that support the findings of this study are available from the corresponding author, upon reasonable request.
Acknowledgements
This work is supported by the National Key Research and Development Program of China (No. 2019YFC1804804), the Open Research Project of The Hubei Key Laboratory of Intelligent Geo-Information Processing (No. KLIGIP-2021A04), and the Scientific and Technological Developing Scheme of Jilin Province (No. 20200201281JC).
Author information
Authors and Affiliations
Contributions
Lanni Wang: Conceptualization, Validation, Visualization, Writing - original draft. Limin Wang: Methodology, Supervision, Writing - review & editing, Funding acquisition. Lu Guo: Formal analysis, Project administration. Qilong Li: Software, Investigation. Xiongfei Li: Writing - review & editing, Validation.
Corresponding author
Ethics declarations
Competing Interest
The authors declare that they have no conflict of interest.
Ethical and Informed Consent for Data Used
This study does not contain any studies with human participants or animals performed by any of the authors.
Appendix
Wang, L., Wang, L., Guo, L. et al. Exploring complex multivariate probability distributions with simple and robust Bayesian network topology for classification. Appl Intell 53, 29799–29817 (2023). https://doi.org/10.1007/s10489-023-05098-y