Machine Learning, Volume 50, Issue 1–2, pp 127–158

Improving Markov Chain Monte Carlo Model Search for Data Mining

  • Paolo Giudici
  • Robert Castelo
Abstract

This paper is motivated by the application of Markov chain Monte Carlo (MCMC) model scoring procedures to data mining problems, which involve a large number of competing models and other relevant model choice aspects.

To achieve this aim we analyze one of the most popular Markov chain Monte Carlo methods for structural learning in graphical models, namely the MC3 algorithm proposed by D. Madigan and J. York (International Statistical Review, 63, 215–232, 1995). Our aim is to improve their algorithm to make it an effective and reliable tool in the field of data mining. In such a context, which is typically high-dimensional in the number of variables, little can be known a priori, so a good model search algorithm is crucial.

We present and describe in detail our implementation of the MC3 algorithm, which provides an efficient general framework for computations with both Directed Acyclic Graphical (DAG) models and Undirected Decomposable Models (UDG). We believe that the possibility of switching easily between the two classes of models is an important asset in data mining, where a priori knowledge of causal effects is usually difficult to establish.
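To make the idea of an MC3-style search concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a chain over DAGs whose neighbourhood consists of all DAGs obtained by adding or deleting a single edge, a uniform proposal over that neighbourhood, and a Metropolis-Hastings acceptance step that corrects for unequal neighbourhood sizes. The names `mc3`, `neighbours`, `is_acyclic`, `toy_log_score` and `TARGET` are hypothetical, and the toy score merely stands in for a real log marginal likelihood plus log prior.

```python
import math
import random
from collections import Counter
from itertools import permutations


def is_acyclic(n, edges):
    """Kahn-style check: repeatedly remove nodes that have no incoming edge."""
    indeg = [0] * n
    for _, child in edges:
        indeg[child] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for parent, child in edges:
            if parent == u:
                indeg[child] -= 1
                if indeg[child] == 0:
                    stack.append(child)
    return removed == n


def neighbours(n, edges):
    """All DAGs that differ from `edges` by adding or deleting one edge."""
    result = []
    for e in permutations(range(n), 2):
        if e in edges:
            result.append(edges - {e})      # deletion never creates a cycle
        else:
            candidate = edges | {e}
            if is_acyclic(n, candidate):    # keep only acyclic additions
                result.append(candidate)
    return result


def mc3(n, log_score, n_iter, rng):
    """Run the chain from the empty graph; return visit counts per DAG."""
    current = frozenset()
    visits = Counter()
    for _ in range(n_iter):
        nbd = neighbours(n, current)
        proposal = rng.choice(nbd)
        # Hastings ratio: score ratio times ratio of neighbourhood sizes.
        log_ratio = (log_score(proposal) - log_score(current)
                     + math.log(len(nbd))
                     - math.log(len(neighbours(n, proposal))))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
        visits[current] += 1
    return visits


# Toy (assumed) score: penalise each edge that disagrees with a fixed target DAG.
TARGET = frozenset({(0, 1), (1, 2)})


def toy_log_score(graph):
    return -3.0 * len(graph ^ TARGET)


visits = mc3(3, toy_log_score, 5000, random.Random(0))
best_graph, best_count = visits.most_common(1)[0]
print(sorted(best_graph), best_count / 5000)
```

The relative visit frequencies in `visits` approximate the posterior model probabilities implied by the score. In a real application the score would be computed from data, and the same loop could in principle be run over undirected decomposable graphs by replacing the acyclicity check with a chordality check.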

Furthermore, in order to improve the MC3 method, we provide several graphical monitors that help in extracting results and in assessing the quality of the Markov chain Monte Carlo approximation to the posterior distribution of interest.
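As an illustration of what such a monitor might track, the sketch below computes the running relative frequency with which a given edge appears in the graphs visited by a chain. This is an assumed, generic diagnostic rather than one of the monitors described in the paper; the function name `running_edge_frequency` and the tiny hand-written `trace` are hypothetical. Plotting the returned series against the iteration number, ideally for several chains started from different graphs, gives a visual check of whether the estimate has stabilised.

```python
from itertools import accumulate


def running_edge_frequency(trace, edge):
    """Running estimate of the posterior probability that `edge` is present.

    `trace` is the sequence of visited graphs, each a set of (parent, child)
    pairs; entry t of the result is the fraction of the first t + 1 graphs
    that contain `edge`.
    """
    hits = accumulate(1 if edge in graph else 0 for graph in trace)
    return [h / (t + 1) for t, h in enumerate(hits)]


# Hand-written toy trace of four visited graphs (assumed, for illustration only).
trace = [
    frozenset(),
    frozenset({(0, 1)}),
    frozenset({(0, 1), (1, 2)}),
    frozenset({(1, 2)}),
]
series = running_edge_frequency(trace, (0, 1))
print(series)
```

If series from independently started chains settle around the same value, that is some evidence, though not proof, that the chains are sampling from the same distribution.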

We apply our proposed methodology first to the well-known coronary heart disease dataset (D. Edwards & T. Havránek, Biometrika, 72:2, 339–351, 1985). We then introduce a novel data mining application which concerns market basket analysis.

Keywords: Bayesian structural learning, convergence diagnostics, Dirichlet distribution, market basket analysis, Markov chain Monte Carlo

References

  1. Brooks, S. (1998). Markov chain Monte Carlo method and its application. The Statistician, 47, 69-100.
  2. Buntine, W. (1991). Theory refinement on Bayesian networks. In P. S. B. D'Ambrosio, & P. Bonissone (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 52-60). Morgan Kaufmann.
  3. Chickering, D. (1995). A transformational characterization of equivalent Bayesian networks. In P. Besnard, & S. Hanks (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 87-98). Morgan Kaufmann.
  4. Cowell, R., Dawid, A., Lauritzen, S., & Spiegelhalter, D. (1999). Probabilistic networks and expert systems. New York: Springer-Verlag.
  5. Dawid, A. (1979). Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society B, 41:1, 1-31.
  6. Dawid, A., & Lauritzen, S. (1993). Hyper-Markov laws in the statistical analysis of decomposable graphical models. Annals of Statistics, 21:3, 1272-1317.
  7. Dellaportas, P., & Forster, J. (1999). Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika, 86:3, 615-633.
  8. Edwards, D., & Havránek, T. (1985). A fast procedure for model search in multidimensional contingency tables. Biometrika, 72:2, 339-351.
  9. Frydenberg, M., & Lauritzen, S. (1989). Decomposition of maximum likelihood in mixed interaction models. Biometrika, 76:3, 539-555.
  10. Gillispie, S., & Perlman, M. (2001). Enumerating Markov equivalence classes of acyclic digraph models. In J. Breese, & D. Koller (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 171-177). Morgan Kaufmann.
  11. Giudici, P., & Green, P. (1999). Decomposable graphical Gaussian model determination. Biometrika, 86:4, 785-801.
  12. Giudici, P., & Passerone, G. (2001). Data mining of association structures to model consumer behaviour. Journal of Computational Statistics and Data Analysis, to appear.
  13. Heckerman, D., Geiger, D., & Chickering, D. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20, 194-243.
  14. Kass, R., & Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:430, 773-795.
  15. Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.
  16. Lauritzen, S., Dawid, A., Larsen, B., & Leimer, H. (1990). Independence properties of directed Markov fields. Networks, 20, 491-505.
  17. Madigan, D., Andersson, S., Perlman, M., & Volinsky, C. (1996). Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Communications in Statistics (theory and methods), 25:11, 2493-2512.
  18. Madigan, D., & Raftery, A. (1994). Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89:428, 1535-1546.
  19. Madigan, D., & York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review, 63, 215-232.
  20. Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo, California: Morgan Kaufmann.
  21. Pearl, J., & Verma, T. (1987). The logic of representing dependencies by directed graphs. In Proc. of the Conf. of the American Association of Artificial Intelligence (pp. 374-379).
  22. Robinson, R. (1973). Counting labeled acyclic digraphs. In F. Harary (Ed.), New directions in the theory of graphs (pp. 239-273). Academic Press: New York.
  23. Tarjan, R., & Yannakakis, M. (1984). Simple linear time algorithms to test chordality of graphs, test acyclicity of hypergraphs and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 13, 566-579.
  24. Verma, T., & Pearl, J. (1990). Equivalence and synthesis of causal models. In P. Bonissone, M. Henrion, L. Kanal, & J. Lemmer (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 255-268). Morgan Kaufmann.
  25. Wormald, N. (1985). Counting labeled chordal graphs. Graphs and Combinatorics, 1, 193-200.

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Paolo Giudici (1)
  • Robert Castelo (2)
  1. Department of Economics and Quantitative Methods, University of Pavia, Pavia, Italy
  2. Institute of Information and Computing Sciences, University of Utrecht, Utrecht, The Netherlands
