Multi-dimensional Bayesian network classifiers: A survey

Abstract

Multi-dimensional classification is a cutting-edge problem in which the values of multiple class variables have to be assigned simultaneously to a given example. It is an extension of the well-known multi-label subproblem, in which the class variables are all binary. In this article, we review and expand the set of performance evaluation measures suitable for assessing multi-dimensional classifiers. We focus on multi-dimensional Bayesian network classifiers, which directly cope with multi-dimensional classification and take the dependencies among class variables into account. We offer a comprehensive survey of these state-of-the-art classifiers, covering the complexity of their learning and inference processes. We also describe algorithms for structural learning, present real-world applications where they have been used, and compile a collection of related software.
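
To make the problem setting concrete, here is a minimal sketch of a multi-dimensional data set whose class variables have different cardinalities, fitted with one independent classifier per class variable (scikit-learn's MultiOutputClassifier over a naive Bayes base model). This baseline is not one of the Bayesian network classifiers surveyed here and the data are synthetic; it only illustrates the setting and the fact that an independent-classifiers approach ignores the dependencies among class variables that multi-dimensional Bayesian network classifiers explicitly model.

```python
# Minimal sketch of the multi-dimensional classification setting.
# Each example receives a value for several class variables at once,
# and each class variable may take a different number of values.
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # 4 predictive features
Y = np.column_stack([                  # d = 3 class variables
    rng.integers(0, 2, 100),           # C1: binary (the multi-label case)
    rng.integers(0, 3, 100),           # C2: three possible values
    rng.integers(0, 4, 100),           # C3: four possible values
])

# One classifier per class variable, ignoring class dependencies --
# precisely the information that multi-dimensional Bayesian network
# classifiers are designed to capture.
clf = MultiOutputClassifier(GaussianNB()).fit(X, Y)
print(clf.predict(X[:5]))              # a joint class assignment per example
```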

Notes

  1. This is a simplification taken from Read et al. (2013) to facilitate discussion of the problem complexity. In fact, as we will see later, each class variable can take a different number of values (a worked example of the resulting joint class space size is given right after these notes).

  2. A graph is said to be maximal connected if there is a path between every pair of vertices in its undirected version (Bielza et al. 2011).

  3. Note that we have replaced the term \(r_s = \sum_{j=1}^{d} |\Omega_{C_j}|\) of Fernandes et al. (2013) with \(d\) in the denominator of the equation in order to correctly normalize the score to lie between 0 and 1.

  4. The popular ensemble learning approach to handling concept drift consists of combining the predictions of a set of individual classifiers, the so-called ensemble, in order to predict new incoming examples. A comprehensive review of ensemble approaches for data stream analysis was conducted by Krawczyk et al. (2017).

  5. http://mulan.sourceforge.net/datasets-mlc.html.

  6. http://palm.seu.edu.cn/zhangml/files/MLNB.rar.

  7. http://www.sc.ehu.es/ccwbayes/members/jafernandes/files/Multi-dimensional_Pre-processing.zip.

  8. https://github.com/jacintoArias/academic-FMC.

  9. https://github.com/marcobb8/tr_bn.

  10. https://github.com/ComputationalIntelligenceGroup/MBCTree.
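
As a worked complement to note 1 (referenced there), the size of the joint class space grows multiplicatively with the cardinalities of the class variables. With \(d\) class variables \(C_1, \ldots, C_d\) taking values in \(\Omega_{C_1}, \ldots, \Omega_{C_d}\), a classifier must discriminate among

\[
|\Omega_{C_1} \times \cdots \times \Omega_{C_d}| = \prod_{j=1}^{d} |\Omega_{C_j}|
\]

joint class configurations, which reduces to \(2^d\) in the multi-label case where every class variable is binary. For instance, \(d = 3\) class variables taking 2, 3 and 4 values give \(2 \cdot 3 \cdot 4 = 24\) joint configurations.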

References

  1. Abdelbar AM, Hedetniemi SM (1998) Approximating MAPs for belief networks is NP-hard and other theorems. Artif Intell 102(1):21–38

  2. Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010a) Local causal and Markov blanket induction for causal discovery and feature selection for classification. Part I: Algorithms and empirical evaluation. J Mach Learn Res 11:171–234

  3. Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010b) Local causal and Markov blanket induction for causal discovery and feature selection for classification. Part II: Analysis and extensions. J Mach Learn Res 11:235–284

  4. Antonucci A, Corani G, Mauá D, Gabaglio S (2013) An ensemble of Bayesian networks for multilabel classification. In: Proceedings of the 23rd international joint conference on artificial intelligence, AAAI Press, pp 1220–1225

  5. Arias J, Gámez JA, Nielsen TD, Puerta JM (2016) A scalable pairwise class interaction framework for multidimensional classification. Int J Approx Reason 68:194–210

  6. Arnborg S, Corneil DG, Proskurowski A (1987) Complexity of finding embeddings in a k-tree. SIAM J Alg Discrete Methods 8(2):277–284

  7. Benjumeda M, Bielza C, Larrañaga P (2018) Tractability of most probable explanations in multidimensional Bayesian network classifiers. Int J Approx Reason 93:74–87

  8. Bielza C, Larrañaga P (2014) Discrete Bayesian network classifiers: A survey. ACM Comput Surv 47(1):5

  9. Bielza C, Li G, Larrañaga P (2011) Multi-dimensional classification with Bayesian networks. Int J Approx Reason 52(6):705–727

  10. Blanco R, Inza I, Merino M, Quiroga J, Larrañaga P (2005) Feature selection in Bayesian classifiers for the prognosis of survival of cirrhotic patients treated with TIPS. J Biomed Inform 38(5):376–388

  11. Bolt JH, van der Gaag LC (2017) Balanced sensitivity functions for tuning multi-dimensional Bayesian network classifiers. Int J Approx Reason 80:361–376

  12. Borchani H, Bielza C, Larrañaga P (2010) Learning CB-decomposable multi-dimensional Bayesian network classifiers. In: Proceedings of the 5th European workshop on probabilistic graphical models, pp 25–32

  13. Borchani H, Bielza C, Martínez-Martín P, Larrañaga P (2012) Markov blanket-based approach for learning multi-dimensional Bayesian network classifiers: An application to predict the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). J Biomed Inform 45(6):1175–1184

  14. Borchani H, Bielza C, Toro C, Larrañaga P (2013) Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artif Intell Med 57(3):219–229

  15. Borchani H, Larrañaga P, Gama J, Bielza C (2016) Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers. Intell Data Anal 20(2):257–280

  16. Bouckaert RR (1992) Optimizing causal orderings for generating DAGs from data. In: Proceedings of the 8th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 9–16

  17. Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recogn 37(9):1757–1771

  18. Brier GW (1950) Verification of forecasts expressed in terms of probability. Mon Weather Rev 78(1):1–3

  19. Buntine W (1991) Theory refinement on Bayesian networks. In: Proceedings of the 7th conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 52–60

  20. Charte F, Charte D (2015) Working with multilabel datasets in R: The mldr package. R J 7(2):149–162

  21. Charte F, Rivera AJ, Charte D, del Jesus MJ, Herrera F (2018) Tips, guidelines and tools for managing multi-label datasets: The mldr.datasets R package and the Cometa data repository. Neurocomputing 289:68–85

  22. Cheng W, Hühn J, Hüllermeier E (2009) Decision tree and instance-based learning for label ranking. In: Proceedings of the 26th annual international conference on machine learning, ACM, pp 161–168

  23. Chow C, Liu C (1968) Approximating discrete probability distributions with dependence trees. IEEE Trans Inf Theory 14(3):462–467

  24. Chu YJ, Liu TH (1965) On the shortest arborescence of a directed graph. Sci Sinica 14:1396–1400

  25. Cooper GF, Herskovits E (1992) A Bayesian method for the induction of probabilistic networks from data. Mach Learn 9(4):309–347

  26. Corani G, Antonucci A, Mauá DD, Gabaglio S (2014) Trading off speed and accuracy in multilabel classification. In: Proceedings of the 7th European workshop on probabilistic graphical models, Lecture Notes in Artificial Intelligence, Springer, pp 145–159

  27. Dawid AP (1992) Applications of a general propagation algorithm for probabilistic expert systems. Stat Comput 2(1):25–36

  28. Dean T, Kanazawa K (1989) A model for reasoning about persistence and causation. Comput Intell 5(2):142–150

  29. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197

  30. Dechter R (1999) Bucket elimination: A unifying framework for reasoning. Artif Intell 113(1–2):41–85

  31. Dechter R, Rish I (1997) A scheme for approximating probabilistic inference. In: Proceedings of the 13th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 132–141

  32. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38

  33. Fayyad U, Irani K (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th international joint conference on artificial intelligence, pp 1022–1027

  34. Fernandes JA, Lozano JA, Inza I, Irigoien X, Pérez A, Rodríguez JD (2013) Supervised pre-processing approaches in multiple class variables classification for fish recruitment forecasting. Environ Modell Softw 40:245–254

  35. Fernandez-Gonzalez P, Bielza C, Larrañaga P (2015) Multidimensional classifiers for neuroanatomical data. In: ICML Workshop on statistics, machine learning and neuroscience (Stamlins 2015)

  36. Frank E, Hall M (2001) A simple approach to ordinal classification. In: Proceedings of the 12th European conference on machine learning, Lecture Notes in Artificial Intelligence, Springer, pp 145–156

  37. Friedman N (1997) Learning belief networks in the presence of missing values and hidden variables. In: Proceedings of the 14th international conference on machine learning, Morgan Kaufmann Publishers Inc, vol 97, pp 125–133

  38. Friedman N (1998) The Bayesian structural EM algorithm. In: Proceedings of the 14th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 129–138

  39. Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29(2–3):131–163

  40. Fürnkranz J, Hüllermeier E, Mencía EL, Brinker K (2008) Multilabel classification via calibrated label ranking. Mach Learn 73(2):133–153

  41. van der Gaag LC, de Waal PR (2006) Multi-dimensional Bayesian network classifiers. In: Proceedings of the 3rd European workshop on probabilistic graphical models, pp 107–114

  42. Gama J, Castillo G (2006) Learning with local drift detection. In: Proceedings of the 2nd international conference on advanced data mining and applications, Lecture Notes in Artificial Intelligence, Springer, pp 42–55

  43. Gama J, Zliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4):44

  44. Gelsema ES (1995) Abductive reasoning in Bayesian belief networks using a genetic algorithm. Pattern Recogn Lett 16(8):865–871

  45. Gibaja E, Ventura S (2015) A tutorial on multi-label learning. ACM Comput Surv 47(3):52

  46. Gil-Begue S, Larrañaga P, Bielza C (2018) Multi-dimensional Bayesian network classifier trees. In: Proceedings of the 19th international conference on intelligent data engineering and automated learning, Lecture Notes in Computer Science, Springer, pp 354–363

  47. Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. In: Proceedings of the 8th Pacific-Asia conference on knowledge discovery and data mining, Lecture Notes in Artificial Intelligence, Springer, pp 22–30

  48. Guan DJ (1998) Generalized Gray codes with applications. In: Proceedings of the national science council of the Republic of China, part a: Physical science and engineering, vol 22, No 6, pp 841–848

  49. Hall MA (2000) Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the 17th international conference on machine learning, Morgan Kaufmann Publishers Inc, pp 359–366

  50. Henrion M (1988) Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In: Machine intelligence and pattern recognition, vol 5, Elsevier, pp 149–163

  51. Hernández-González J, Inza I, Lozano JA (2015) Multidimensional learning from crowds: Usefulness and application of expertise detection. Int J Intell Syst 30(3):326–354

  52. Hinkley DV (1971) Inference about the change-point from cumulative sum tests. Biometrika 58(3):509–523

  53. Hüllermeier E, Fürnkranz J, Cheng W, Brinker K (2008) Label ranking by learning pairwise preferences. Artif Intell 172(16–17):1897–1916

  54. Hutter F, Hoos HH, Stützle T (2005) Efficient stochastic local search for MPE solving. In: Proceedings of the 19th international joint conference on artificial intelligence, Morgan Kaufmann Publishers Inc, pp 169–174

  55. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the 11th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 338–345

  56. Kask K, Dechter R (1999) Mini-bucket heuristics for improved search. In: Proceedings of the 15th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 314–323

  57. Kask K, Dechter R (2001) A general scheme for automatic generation of search heuristics from specification dependencies. Artif Intell 129(1–2):91–131

  58. Koller D, Friedman N (2009) Probabilistic graphical models: Principles and techniques. The MIT Press, London

  59. Kong X, Philip SY (2011) An ensemble-based approach to fast classification of multi-label data streams. In: Proceedings of the 7th international conference on collaborative computing: Networking, applications and worksharing, IEEE, pp 95–104

  60. Krawczyk B, Minku LL, Gama J, Stefanowski J, Woźniak M (2017) Ensemble learning for data stream analysis: A survey. Inf Fusion 37:132–156

  61. Kruskal JB (1956) On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Am Math Soc 7(1):48–50

  62. Kullback S (1997) Information theory and statistics. Courier Corporation

  63. Kwisthout J (2011) Most probable explanations in Bayesian networks: Complexity and tractability. Int J Approx Reason 52(9):1452–1469

  64. Langley P, Sage S (1994) Induction of selective Bayesian classifiers. In: Proceedings of the 10th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 399–406

  65. Li Z, D’Ambrosio B (1993) An efficient approach for finding the MPE in belief networks. In: Proceedings of the 9th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publisher Inc, pp 342–349

  66. Marinescu R, Dechter R (2009) AND/OR branch-and-bound search for combinatorial optimization in graphical models. Artif Intell 173(16–17):1457–1491

  67. Mencía EL, Fürnkranz J (2010) Efficient multilabel classification algorithms for large-scale problems in the legal domain. In: Semantic processing of legal texts, Springer, pp 192–215

  68. Minku LL, White AP, Yao X (2010) The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans Knowl Data Eng 22(5):730–742

  69. Minsky M (1961) Steps toward artificial intelligence. Proc Inst Radio Eng 49(1):8–30

  70. Nodelman U, Shelton CR, Koller D (2002) Continuous time Bayesian networks. In: Proceedings of the 18th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 378–387

  71. Ortigosa-Hernández J, Rodríguez JD, Alzate L, Lucania M, Inza I, Lozano JA (2012) Approaching sentiment analysis by using semi-supervised learning of multi-dimensional classifiers. Neurocomputing 92:98–115

  72. Page ES (1954) Continuous inspection schemes. Biometrika 41(1/2):100–115

  73. Park S, Fürnkranz J (2008) Multi-label classification with label constraints. In: Proceedings of the joint European conference on machine learning and principles and practice of knowledge discovery in databases workshop on preference learning, pp 157–171

  74. Pastink A, van der Gaag LC (2015) Multi-classifiers of small treewidth. In: Proceedings of the 13th European conference on symbolic and quantitative approaches to reasoning and uncertainty, Lecture Notes in Artificial Intelligence, Springer, pp 199–209

  75. Pearl J (1988) Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann Publishers, New York

  76. Pérez A, Larrañaga P, Inza I (2009) Bayesian classifiers based on kernel density estimation: Flexible classifiers. Int J Approx Reason 50(2):341–362

  77. Provost F, Domingos P (2000) Improving probability estimation trees. Mach Learn 52(3):199–215

  78. Qazi M, Fung G, Krishnan S, Rosales R, Steck H, Rao RB, Poldermans D, Chandrasekaran D (2007) Automated heart wall motion abnormality detection from ultrasound images using Bayesian networks. In: Proceedings of the 20th international joint conference on artificial intelligence, Morgan Kaufmann Publishers Inc, pp 519–525

  79. Qu W, Zhang Y, Zhu J, Qiu Q (2009) Mining multi-label concept-drifting data streams using dynamic classifier ensemble. In: Proceedings of the 1st Asian conference on machine learning, Lecture Notes in Artificial Intelligence, Springer, pp 308–321

  80. Read J (2008) A pruned problem transformation method for multi-label classification. In: Proceedings of the New Zealand computer science research student conference, pp 143–150

  81. Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359

  82. Read J, Bifet A, Holmes G, Pfahringer B (2012) Scalable and efficient multi-label classification for evolving data streams. Mach Learn 88(1–2):243–272

  83. Read J, Bielza C, Larrañaga P (2013) Multi-dimensional classification with super-classes. IEEE Trans Knowl Data Eng 26(7):1720–1733

  84. Read J, Reutemann P, Pfahringer B, Holmes G (2016) MEKA: A multi-label/multi-target extension to WEKA. J Mach Learn Res 17:667–671

  85. Rebane G, Pearl J (1987) The recovery of causal poly-trees from statistical data. In: Proceedings of the 3rd conference on uncertainty in artificial intelligence, AUAI Press, pp 222–228

  86. Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471

  87. Rivas JJ, Orihuela-Espina F, Sucar LE (2018) Circular chain classifiers. In: Proceedings of the 9th international conference on probabilistic graphical models, proceedings of machine learning research, pp 380–391

  88. Rivolli A, de Carvalho ACPLF (2018) The utiml package: Multi-label classification in R. R J 10(2):24–37

  89. Robinson RW (1973) Counting labeled acyclic digraphs. In: New directions in the theory of graphs, Academic Press, pp 239–273

  90. Rodríguez JD, Lozano JA (2008) Multi-objective learning of multi-dimensional Bayesian classifiers. In: Proceedings of the 8th international conference on hybrid intelligent systems, IEEE Computer Society, pp 501–506

  91. Rodríguez JD, Pérez A, Arteta D, Tejedor D, Lozano JA (2012) Using multidimensional Bayesian network classifiers to assist the treatment of multiple sclerosis. IEEE Trans Syst Man Cybern Part C Appl Rev 42(6):1705–1715

  92. Rojas-Guzman C, Kramer MA (1993) GALGO: A genetic algorithm decision support tool for complex uncertain systems modeled with Bayesian belief networks. In: Proceedings of the 9th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publisher Inc, pp 368–375

  93. Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1(5):206–215

  94. Sahami M (1996) Learning limited dependence Bayesian classifiers. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, AAAI Press, 1, pp 335–338

  95. Santos E (1991) On the generation of alternative explanations with implications for belief revision. In: Proceedings of the 7th conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc, pp 339–347

  96. Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336

  97. Schapire RE, Singer Y (2000) BoosTexter: A boosting-based system for text categorization. Mach Learn 39(2–3):135–168

  98. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  99. Sechidis K, Tsoumakas G, Vlahavas I (2011) On the stratification of multi-label data. In: Proceedings of the joint European conference on machine learning and knowledge discovery in databases, Lecture Notes in Artificial Intelligence, Springer, pp 145–158

  100. Shimony SE (1994) Finding MAPs for belief networks is NP-hard. Artif Intell 68(2):399–410

  101. Shimony SE, Charniak E (1990) A new algorithm for finding MAP assignments to belief networks. In: Proceedings of the 6th annual conference on uncertainty in artificial intelligence, Elsevier, pp 185–196

  102. Song G, Ye Y (2014) A new ensemble method for multi-label data stream classification in non-stationary environment. In: Proceedings of the 2014 international joint conference on neural networks, IEEE, pp 1776–1783

  103. Stella F, Amer Y (2012) Continuous time Bayesian network classifiers. J Biomed Inform 45(6):1108–1119

  104. Sucar LE, Bielza C, Morales EF, Hernandez-Leal P, Zaragoza JH, Larrañaga P (2014) Multi-label classification with Bayesian network-based chain classifiers. Pattern Recogn Lett 41:14–22

  105. Sy BK (1992) Reasoning MPE to multiply connected belief networks using message passing. In: Proceedings of the 10th national conference on artificial intelligence, AAAI Press, pp 570–576

  106. Szymanski P, Kajdanowicz T (2019) Scikit-multilearn: A scikit-based Python environment for performing multi-label classification. J Mach Learn Res 20:209–230

  107. Teyssier M, Koller D (2005) Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In: Proceedings of the 21st conference on uncertainty in artificial intelligence, AUAI Press, pp 584–590

  108. Tsoumakas G, Katakis I (2007) Multi-label classification: An overview. Int J Data Warehouse Min 3(3):1–13

  109. Tsoumakas G, Vlahavas I (2007) Random k-labelsets: An ensemble method for multilabel classification. In: Proceedings of the 18th European conference on machine learning, Lecture Notes in Artificial Intelligence, Springer, pp 406–417

  110. Tsoumakas G, Katakis I, Vlahavas I (2009) Mining multi-label data. In: Data mining and knowledge discovery handbook, Springer, pp 667–685

  111. Tsoumakas G, Spyromitros-Xioufis E, Vilcek J, Vlahavas I (2011) MULAN: A Java library for multi-label learning. J Mach Learn Res 12:2411–2414

  112. de Waal PR, van der Gaag LC (2007) Inference and learning in multi-dimensional Bayesian network classifiers. In: Proceedings of the 9th European conference on symbolic and quantitative approaches to reasoning with uncertainty, Lecture Notes in Artificial Intelligence, Springer, pp 501–511

  113. Wang L, Shen H, Tian H (2017) Weighted ensemble classification of multi-label data streams. In: Proceedings of the 21st Pacific-Asia conference on knowledge discovery and data mining, Lecture Notes in Artificial Intelligence, Springer, pp 551–562

  114. Wang P, Zhang P, Guo L (2012) Mining multi-label data streams using ensemble-based active learning. In: Proceedings of the 2012 SIAM international conference on data mining, SIAM, pp 1131–1140

  115. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101

  116. Xioufis ES, Spiliopoulou M, Tsoumakas G, Vlahavas IP (2011) Dealing with concept drift and class imbalance in multi-label stream classification. In: Proceedings of the 22nd international joint conference on artificial intelligence, AAAI Press, pp 1583–1588

  117. Yang Y (1999) An evaluation of statistical approaches to text categorization. Inf Retrieval 1(1–2):69–90

  118. Yang Y, Ding M (2019) Decision function with probability feature weighting based on Bayesian network for multi-label classification. Neural Comput Appl 31(9):4819–4828

  119. Yang Y, Liu X (1999) A re-examination of text categorization methods. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp 42–49

  120. Zaragoza JH, Sucar LE, Morales EF (2011a) A two-step method to learn multidimensional Bayesian network classifiers based on mutual information measures. In: Proceedings of the 24th international FLAIRS conference, AAAI Press, pp 644–649

  121. Zaragoza JH, Sucar LE, Morales EF, Bielza C, Larrañaga P (2011b) Bayesian chain classifiers for multidimensional classification. In: Proceedings of the 22nd international joint conference on artificial intelligence, AAAI Press, pp 2192–2197

  122. Zhang ML, Zhou ZH (2007) ML-KNN: A lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048

  123. Zhang ML, Zhou ZH (2014) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837

  124. Zhang ML, Peña JM, Robles V (2009) Feature selection for multi-label naive Bayes classification. Inf Sci 179(19):3218–3229

  125. Zhu M, Liu S, Jiang J (2016) A hybrid method for learning multi-dimensional Bayesian network classifiers based on an optimization model. Appl Intell 44(1):123–148

  126. Zhu S, Ji X, Xu W, Gong Y (2005) Multi-labelled classification using maximum entropy method. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp 274–281

Acknowledgements

This work has been partially supported by the Spanish Ministry of Science and Innovation through the PID2019-109247GB-I00 project. Santiago Gil-Begue has been supported by the predoctoral grant FPU17/04341 from the Spanish Ministry of Science, Innovation and Universities.

Author information

Corresponding author

Correspondence to Santiago Gil-Begue.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gil-Begue, S., Bielza, C. & Larrañaga, P. Multi-dimensional Bayesian network classifiers: A survey. Artif Intell Rev 54, 519–559 (2021). https://doi.org/10.1007/s10462-020-09858-x

Keywords

  • Multi-dimensional classification
  • Multi-label classification
  • Bayesian networks
  • Performance evaluation measures
  • Structural learning
  • Bayesian network inference complexity