High Dimensional Modelling
This chapter describes methods suitable for high-dimensional graphical modeling. Recent years have seen intense interest in applying graphical modeling techniques to data of high dimension, that is, data with hundreds to tens of thousands of variables. Such data arise routinely in fields such as molecular biology. We first describe two typical datasets: one from a study of gene expression in breast cancer patients, and the other from the HapMap project, in which a large number of genomic markers and gene expression measurements are recorded for 90 individuals. We then compare the computational efficiency of several model selection algorithms as applied to one of the example datasets. Of these, an extension of the Chow-Liu algorithm that finds the minimal BIC forest, implemented in the gRapHD package, is found to be the most efficient. The glasso algorithm and a stepwise decomposable search algorithm are also highly efficient. We describe these algorithms in more detail and illustrate their use on the example datasets. Finally, as a more advanced example, we show how a Bayesian equivalent of the minimal BIC forest algorithm for high-dimensional discrete data may be obtained. Assuming a hyper-Dirichlet prior, the maximum a posteriori forest is derived by applying the extended Chow-Liu algorithm with appropriate user-defined edge weights. This is illustrated using a subset of the HapMap data.
Keywords: Gaussian Data; Undirected Graphical Model; Model Selection Algorithm; Maximum Weight Span Tree; Decomposable Model
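The idea behind the minimal BIC forest mentioned in the abstract can be illustrated with a minimal sketch for discrete data. It is written in plain Python rather than with the gRapHD package, and the function names `mutual_information` and `min_bic_forest` are hypothetical. The sketch assumes that each candidate edge is weighted by a BIC-penalized mutual information, 2N·I(u, v) − log(N)·(r_u − 1)(r_v − 1), where r_u and r_v are the numbers of observed levels, and that only edges with positive weight are eligible. A Kruskal-style maximum weight spanning forest is then built from the eligible edges.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (natural log) of two discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (cx[a] * cy[b]))
        for (a, b), c in cxy.items()
    )


def min_bic_forest(columns):
    """Return the edges of a forest chosen by BIC-penalized edge weights.

    columns: dict mapping a variable name to a list of discrete values
    (all lists have the same length). The edge weight used here is an
    assumption of this sketch, not an API of any package.
    """
    names = list(columns)
    n = len(columns[names[0]])

    # Score every pair; keep only edges whose addition would lower the BIC.
    candidates = []
    for u, v in combinations(names, 2):
        df = (len(set(columns[u])) - 1) * (len(set(columns[v])) - 1)
        mi = mutual_information(columns[u], columns[v])
        weight = 2 * n * mi - math.log(n) * df
        if weight > 0:
            candidates.append((weight, u, v))

    # Kruskal's algorithm on descending weights, using union-find
    # to reject edges that would create a cycle.
    candidates.sort(reverse=True)
    parent = {name: name for name in names}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    forest = []
    for _, u, v in candidates:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            forest.append((u, v))
    return forest


if __name__ == "__main__":
    # Toy data: b copies a, while c is empirically independent of both.
    a = [0, 1] * 20
    data = {"a": a, "b": list(a), "c": [0, 0, 1, 1] * 10}
    print(min_bic_forest(data))
```

Because edges with non-positive penalized weight are discarded before the spanning step, the result may be a forest with several components rather than a single tree; this is what distinguishes the sketch from an unpenalized Chow-Liu tree.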