Artificial Intelligence Review, Volume 42, Issue 4, pp 1069–1093

A survey on independence-based Markov networks learning

  • Federico Schlüter


The problem of learning Markov network structure from data has become increasingly important in machine learning and in many other application fields. Markov networks are probabilistic graphical models, a widely used formalism for handling probability distributions in intelligent systems. This document focuses on a technology called independence-based learning, which learns the independence structure of Markov networks from data efficiently and soundly, provided that the dataset is sufficiently large and the data are a representative sample of the target distribution. In analyzing this technology, this work surveys the current state-of-the-art algorithms, discusses their limitations, and poses a series of open problems where future work may produce advances in the area in terms of quality and efficiency.


Keywords: Markov networks, Structure learning, Independence-based, Survey





Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Lab. DHARMa of Artificial Intelligence, Dept of Information Systems, Facultad Regional Mendoza, National Technological University, Mendoza, Argentina
