Abstract
The motivation of this paper is the application of MCMC model scoring procedures to data mining problems, which involve a large number of competing models as well as other relevant aspects of model choice.
To achieve this aim, we analyze one of the most popular Markov chain Monte Carlo methods for structural learning in graphical models, namely the MC³ algorithm proposed by D. Madigan and J. York (International Statistical Review, 63, 215–232, 1995). Our goal is to improve their algorithm so that it becomes an effective and reliable tool for data mining. In this context, which is typically high-dimensional in the number of variables, little is known a priori, and a good model search algorithm is therefore crucial.
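To make the idea behind MC³ concrete: it is a Metropolis-Hastings chain on the space of graphs, where each move proposes, uniformly at random, one of the graphs that differ from the current one by a single edge. The sketch below is our own minimal illustration, not the authors' implementation. It assumes DAGs on nodes 0..n-1 stored as frozensets of arcs, uses only add/delete moves (no arc reversal), and takes a user-supplied `log_score` standing in for log p(D | M) + log p(M). All function names are hypothetical.

```python
import math
import random
from itertools import permutations


def is_acyclic(n, edges):
    """Kahn's algorithm: True if the digraph on nodes 0..n-1 has no directed cycle."""
    indegree = [0] * n
    children = [[] for _ in range(n)]
    for u, v in edges:
        indegree[v] += 1
        children[u].append(v)
    ready = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for w in children[u]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == n


def neighbourhood(n, dag):
    """DAGs obtained from `dag` by deleting one arc or adding one arc that keeps it acyclic."""
    result = []
    for u, v in permutations(range(n), 2):
        if (u, v) in dag:
            result.append(dag - {(u, v)})
        elif (v, u) not in dag:
            candidate = dag | {(u, v)}
            if is_acyclic(n, candidate):
                result.append(candidate)
    return result


def mc3(n, log_score, iterations, seed=0):
    """Metropolis-Hastings over DAGs with a uniform proposal on the neighbourhood.

    `log_score(dag)` should return log p(D | dag) + log p(dag) up to a constant.
    Returns a dict mapping each visited DAG to the number of iterations spent there.
    """
    rng = random.Random(seed)
    current = frozenset()                      # start from the empty graph
    current_nbrs = neighbourhood(n, current)
    visits = {}
    for _ in range(iterations):
        proposal = rng.choice(current_nbrs)
        proposal_nbrs = neighbourhood(n, proposal)
        # Hastings ratio for a uniform neighbourhood proposal:
        # [p(M'|D) / p(M|D)] * [|nbd(M)| / |nbd(M')|]
        log_alpha = (log_score(proposal) - log_score(current)
                     + math.log(len(current_nbrs)) - math.log(len(proposal_nbrs)))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            current, current_nbrs = proposal, proposal_nbrs
        visits[current] = visits.get(current, 0) + 1
    return visits
```

As a usage example, a hypothetical toy score such as `lambda g: -2.0 * len(g ^ target)` concentrates posterior mass on a chosen `target` DAG, so the most visited state should be `target`. In a real application the score would be a marginal likelihood (for example under a hyper-Dirichlet prior), and the visit counts estimate the posterior model probabilities.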
We present and describe in detail our implementation of the MC³ algorithm, which provides an efficient general framework for computations with both directed acyclic graph (DAG) models and undirected decomposable graph (UDG) models. We believe that the ability to move easily between the two classes of models is an important asset in data mining, where a priori knowledge of causal effects is usually difficult to establish.
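Working with UDG models means that every proposed undirected graph must be checked for decomposability, i.e. chordality; the reference list cites Tarjan and Yannakakis (1984) for this test. The sketch below is an illustrative, assumption-laden version of that check, not the paper's code: it runs maximum cardinality search (MCS) and then verifies that the reverse visit order is a perfect elimination ordering. The graph representation (a dict mapping each vertex to a set of neighbours) and the function names are hypothetical, and selecting the next vertex with `max` makes this quadratic rather than linear-time.

```python
def mcs_order(adj):
    """Maximum cardinality search: repeatedly visit the unvisited vertex with the
    most already-visited neighbours (ties go to the smallest label)."""
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(sorted(unvisited), key=weight.__getitem__)
        order.append(v)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    return order


def is_decomposable(adj):
    """True if the undirected graph `adj` (vertex -> set of neighbours) is chordal.

    The reverse of an MCS order is a perfect elimination ordering exactly when the
    graph is chordal. For each vertex we check that its earlier-visited neighbours
    form a clique, using the "follower" test: all of them except the latest-visited
    one (the follower) must be adjacent to the follower.
    """
    order = mcs_order(adj)
    position = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if position[u] < position[v]]
        if not earlier:
            continue
        follower = max(earlier, key=position.__getitem__)
        if any(w != follower and w not in adj[follower] for w in earlier):
            return False
    return True
```

For instance, the 4-cycle `{0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}` is rejected, while adding the chord 0-2 makes it decomposable. In an MC³ run over UDG models, a proposal failing this check would simply be excluded from the neighbourhood.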
Furthermore, to improve the MC³ method, we propose several graphical monitors that can help in extracting results and in assessing how well the Markov chain Monte Carlo approximation matches the posterior distribution of interest.
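The abstract does not say which quantities the monitors display, so the following is only a hypothetical example of the kind of summary such a monitor could plot. It computes the running Monte Carlo estimate of an edge's posterior inclusion probability along one chain, and the largest disagreement in estimated edge probabilities between two independently started chains. Both are assumptions made for illustration. Graphs are assumed to be frozensets of arcs `(u, v)` on nodes 0..n-1, and the function names are invented here.

```python
from collections import Counter
from itertools import permutations


def running_edge_probability(chain, edge):
    """Running Monte Carlo estimate of P(edge in G | D) after each iteration."""
    hits = 0
    estimates = []
    for t, graph in enumerate(chain, start=1):
        hits += edge in graph
        estimates.append(hits / t)
    return estimates


def edge_frequencies(chain, n):
    """Fraction of sampled graphs containing each ordered pair (u, v) as an arc."""
    counts = Counter()
    for graph in chain:
        counts.update(graph)
    return {e: counts[e] / len(chain) for e in permutations(range(n), 2)}


def max_edge_discrepancy(chain_a, chain_b, n):
    """Largest disagreement in estimated edge probabilities between two chains."""
    freq_a = edge_frequencies(chain_a, n)
    freq_b = edge_frequencies(chain_b, n)
    return max(abs(freq_a[e] - freq_b[e]) for e in freq_a)
```

Plotting the running estimate against the iteration number shows whether it has settled, and a small discrepancy between chains started from different graphs is one informal sign that the approximation to the posterior is adequate.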
We apply the proposed methodology first to the well-known coronary heart disease dataset (D. Edwards & T. Havránek, Biometrika, 72:2, 339–351, 1985). We then introduce a novel data mining application concerning market basket analysis.
References
Brooks, S. (1998). Markov chain Monte Carlo method and its application. The Statistician, 47, 69-100.
Buntine, W. (1991). Theory refinement on Bayesian networks. In B. D'Ambrosio, P. Smets, & P. Bonissone (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 52-60). Morgan Kaufmann.
Chickering, D. (1995). A transformational characterization of equivalent Bayesian networks. In P. Besnard & S. Hanks (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 87-98). Morgan Kaufmann.
Cowell, R., Dawid, A., Lauritzen, S., & Spiegelhalter, D. (1999). Probabilistic networks and expert systems. New York: Springer-Verlag.
Dawid, A. (1979). Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society B, 41:1, 1-31.
Dawid, A., & Lauritzen, S. (1993). Hyper-Markov laws in the statistical analysis of decomposable graphical models. Annals of Statistics, 21:3, 1272-1317.
Dellaportas, P., & Forster, J. (1999). Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika, 86:3, 615-633.
Edwards, D., & Havránek, T. (1985). A fast procedure for model search in multidimensional contingency tables. Biometrika, 72:2, 339-351.
Frydenberg, M., & Lauritzen, S. (1989). Decomposition of maximum likelihood in mixed interaction models. Biometrika, 76:3, 539-555.
Gillispie, S., & Perlman, M. (2001). Enumerating Markov equivalence classes of acyclic digraph models. In J. Breese & D. Koller (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 171-177). Morgan Kaufmann.
Giudici, P., & Green, P. (1999). Decomposable graphical Gaussian model determination. Biometrika, 86:4, 785-801.
Giudici, P., & Passerone, G. (2001). Data mining of association structures to model consumer behaviour. Journal of Computational Statistics and Data Analysis, to appear.
Heckerman, D., Geiger, D., & Chickering, D. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20, 194-243.
Kass, R., & Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:430, 773-795.
Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.
Lauritzen, S., Dawid, A., Larsen, B., & Leimer, H. (1990). Independence properties of directed Markov fields. Networks, 20, 491-505.
Madigan, D., Andersson, S., Perlman, M., & Volinsky, C. (1996). Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Communications in Statistics (Theory and Methods), 25:11, 2493-2512.
Madigan, D., & Raftery, A. (1994). Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89:428, 1535-1546.
Madigan, D., & York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review, 63, 215-232.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo, California: Morgan Kaufmann.
Pearl, J., & Verma, T. (1987). The logic of representing dependencies by directed graphs. In Proc. of the Conf. of the American Association of Artificial Intelligence (pp. 374-379).
Robinson, R. (1973). Counting labeled acyclic digraphs. In F. Harary (Ed.), New directions in the theory of graphs (pp. 239-273). Academic Press: New York.
Tarjan, R., & Yannakakis, M. (1984). Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 13, 566-579.
Verma, T., & Pearl, J. (1990). Equivalence and synthesis of causal models. In P. Bonissone, M. Henrion, L. Kanal, & J. Lemmer (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 255-268). Morgan Kaufmann.
Wormald, N. (1985). Counting labeled chordal graphs. Graphs and Combinatorics, 1, 193-200.
Cite this article
Giudici, P., Castelo, R. Improving Markov Chain Monte Carlo Model Search for Data Mining. Machine Learning 50, 127–158 (2003). https://doi.org/10.1023/A:1020202028934