Improving Markov Chain Monte Carlo Model Search for Data Mining
The motivation of this paper is the application of MCMC model scoring procedures to data mining problems, involving a large number of competing models and other relevant model choice aspects.
To achieve this aim we analyze one of the most popular Markov Chain Monte Carlo methods for structural learning in graphical models, namely the MC3 algorithm proposed by D. Madigan and J. York (International Statistical Review, 63, 215–232, 1995). Our aim is to improve their algorithm to make it an effective and reliable tool in the field of data mining. In such a context, which is typically high-dimensional in the number of variables, little can be known a priori and, therefore, a good model search algorithm is crucial.
We present and describe in detail our implementation of the MC3 algorithm, which provides an efficient general framework for computations with both Directed Acyclic Graphical (DAG) models and Undirected Decomposable Models (UDG). We believe that the possibility of switching easily between the two classes of models constitutes an important asset in data mining, where a priori knowledge of causal effects is usually difficult to establish.
Furthermore, in order to improve the MC3 method, we provide several graphical monitors which can help in extracting results and in assessing the goodness of the Markov chain Monte Carlo approximation to the posterior distribution of interest.
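As a rough illustration of the kind of model search discussed above, the following is a minimal sketch of a Metropolis chain over DAG structures in the spirit of MC3. It is not the implementation described in the paper: the names `mc3`, `creates_cycle` and `log_score` are hypothetical, the single-edge toggle proposal is a simplification of the neighbourhood-based move, and the scoring function is left to the caller (in practice it would be a log marginal likelihood plus a log prior).

```python
import math
import random
from itertools import permutations


def creates_cycle(edges, src, dst):
    """Return True if adding src -> dst to `edges` would close a directed cycle."""
    # A cycle appears exactly when src is already reachable from dst.
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for (parent, child) in edges if parent == node)
    return False


def mc3(n_nodes, log_score, n_iter, seed=0):
    """Toggle-one-edge Metropolis chain over DAGs; returns visit counts per graph.

    `log_score(graph)` is an assumed user-supplied function returning the
    unnormalised log posterior of a graph given as a frozenset of (parent, child).
    """
    rng = random.Random(seed)
    pairs = list(permutations(range(n_nodes), 2))  # all ordered node pairs
    current = frozenset()                          # start from the empty graph
    current_score = log_score(current)
    visits = {}
    for _ in range(n_iter):
        src, dst = rng.choice(pairs)
        if (src, dst) in current:
            proposal = current - {(src, dst)}      # delete an existing edge
        elif creates_cycle(current, src, dst):
            proposal = None                        # cyclic proposal: stay put
        else:
            proposal = current | {(src, dst)}      # add a new edge
        if proposal is not None:
            proposal_score = log_score(proposal)
            delta = proposal_score - current_score
            # The toggle proposal is symmetric, so the acceptance
            # probability reduces to min(1, exp(delta)).
            if delta >= 0 or rng.random() < math.exp(delta):
                current, current_score = proposal, proposal_score
        visits[current] = visits.get(current, 0) + 1
    return visits
```

The relative visit frequencies in the returned dictionary approximate posterior model probabilities under the supplied score. For a quick experiment one could call `mc3(3, lambda g: -2.0 * len(g), 5000)`, where the toy score simply penalises each edge.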
We apply our proposed methodology first to the well-known coronary heart disease dataset (D. Edwards & T. Havránek, Biometrika, 72:2, 339–351, 1985). We then introduce a novel data mining application which concerns market basket analysis.
- Brooks, S. (1998). Markov chain Monte Carlo method and its application. The Statistician, 47, 69-100.
- Buntine, W. (1991). Theory refinement on Bayesian networks. In P. S. B. D'Ambrosio, & P. Bonissone (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 52-60). Morgan Kaufmann.
- Chickering, D. (1995). A transformational characterization of equivalent Bayesian networks. In P. Besnard, & S. Hanks (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 87-98). Morgan Kaufmann.
- Cowell, R., Dawid, A., Lauritzen, S., & Spiegelhalter, D. (1999). Probabilistic networks and expert systems. New York: Springer-Verlag.
- Dawid, A. (1979). Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society B, 41:1, 1-31.
- Dawid, A., & Lauritzen, S. (1993). Hyper-Markov laws in the statistical analysis of decomposable graphical models. Annals of Statistics, 21:3, 1272-1317.
- Dellaportas, P., & Forster, J. (1999). Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika, 86:3, 615-633.
- Edwards, D., & Havránek, T. (1985). A fast procedure for model search in multidimensional contingency tables. Biometrika, 72:2, 339-351.
- Frydenberg, M., & Lauritzen, S. (1989). Decomposition of maximum likelihood in mixed interaction models. Biometrika, 76:3, 539-555.
- Gillispie, S., & Perlman, M. (2001). Enumerating Markov equivalence classes of acyclic digraph models. In J. Breese, & D. Koller (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 171-177). Morgan Kaufmann.
- Giudici, P., & Green, P. (1999). Decomposable graphical Gaussian model determination. Biometrika, 86:4, 785-801.
- Giudici, P., & Passerone, G. (2001). Data mining of association structures to model consumer behaviour. Journal of Computational Statistics and Data Analysis, to appear.
- Heckerman, D., Geiger, D., & Chickering, D. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20, 194-243.
- Kass, R., & Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:430, 773-795.
- Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.
- Lauritzen, S., Dawid, A., Larsen, B., & Leimer, H. (1990). Independence properties of directed Markov fields. Networks, 20, 491-505.
- Madigan, D., Andersson, S., Perlman, M., & Volinsky, C. (1996). Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Communications in Statistics (theory and methods), 25:11, 2493-2512.
- Madigan, D., & Raftery, A. (1994). Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89:428, 1535-1546.
- Madigan, D., & York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review, 63, 215-232.
- Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo, California: Morgan Kaufmann.
- Pearl, J., & Verma, T. (1987). The logic of representing dependencies by directed graphs. In Proc. of the Conf. of the American Association of Artificial Intelligence (pp. 374-379).
- Robinson, R. (1973). Counting labeled acyclic digraphs. In F. Harary (Ed.), New directions in the theory of graphs (pp. 239-273). Academic Press: New York.
- Tarjan, R., & Yannakakis, M. (1984). Simple linear time algorithms to test chordality of graphs, test acyclicity of hypergraphs and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 13, 566-579.
- Verma, T., & Pearl, J. (1990). Equivalence and synthesis of causal models. In P. Bonissone, M. Henrion, L. Kanal, & J. Lemmer (Eds.), Proc. of the Conf. on Uncertainty in Artificial Intelligence (pp. 255-268). Morgan Kaufmann.
- Wormald, N. (1985). Counting labeled chordal graphs. Graphs and Combinatorics, 1, 193-200.