Sequential Monte Carlo on large binary sampling spaces
A Monte Carlo algorithm is said to be adaptive if it automatically calibrates its current proposal distribution using past simulations. The choice of the parametric family that defines the set of proposal distributions is critical for good performance. In this paper, we present such a parametric family for adaptive sampling on high-dimensional binary spaces.
A practical motivation for this problem is variable selection in a linear regression context. We want to sample from a Bayesian posterior distribution on the model space using an appropriate version of Sequential Monte Carlo.
Raw versions of Sequential Monte Carlo are easily implemented using binary vectors with independent components. For high-dimensional problems, however, these simple proposals do not yield satisfactory results. The key to an efficient adaptive algorithm is a binary parametric family that takes correlations into account, analogously to the multivariate normal distribution on continuous spaces.
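To make the independent-component proposal concrete, the following is a minimal sketch (not the paper's implementation) of a product-of-Bernoullis family fitted to weighted particles. The function names, the clipping constant `eps`, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def fit_product_bernoulli(particles, weights, eps=1e-3):
    """Fit marginal probabilities as the weighted mean of binary particles.

    eps is an illustrative choice: it keeps each probability away from
    0 and 1 so that no component of the proposal becomes degenerate.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise the importance weights
    p = w @ np.asarray(particles, dtype=float)
    return np.clip(p, eps, 1.0 - eps)

def sample_product_bernoulli(p, n, rng):
    """Draw n binary vectors whose components are independent Bernoulli(p_i)."""
    return (rng.random((n, p.size)) < p).astype(np.int8)

def log_prob_product_bernoulli(x, p):
    """Log-probability of binary vector(s) x under independent components."""
    x = np.asarray(x, dtype=float)
    return x @ np.log(p) + (1.0 - x) @ np.log1p(-p)

# Hypothetical usage: refit the proposal from a weighted particle system.
rng = np.random.default_rng(0)
particles = rng.integers(0, 2, size=(500, 10))
weights = rng.random(500)
p_hat = fit_product_bernoulli(particles, weights)
proposals = sample_product_bernoulli(p_hat, 500, rng)
```

Because this family matches only the marginal means of the particle system, any dependence between components is lost when the proposal is refitted, which is the limitation the correlated families are meant to address.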
We provide a review of models for binary data and make one of them work in the context of Sequential Monte Carlo sampling. Computational studies on real-life data with about a hundred covariates suggest that, on difficult instances, our Sequential Monte Carlo approach clearly outperforms standard techniques based on Markov chain exploration.
Keywords: Adaptive Monte Carlo; Multivariate binary data; Sequential Monte Carlo; Linear regression; Variable selection