Sparse Point Estimation for Bayesian Regression via Simulated Annealing

  • Sudhir Raman
  • Volker Roth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7476)


In the context of variable selection in a regression model, the classical Lasso-based optimization approach provides an estimate that is sparse in the regression coefficients, but it gives no further information about their distribution. A Bayesian approach is more informative, since it gives direct access to the posterior distribution, which is usually summarized by its expectation (not sparse) and variance. To satisfy the sparsity requirements of many applications, heuristics such as thresholding are then commonly applied to produce sparse estimates for variable selection. In this paper, we present a more principled approach to generating a sparse point estimate within a Bayesian framework. We extend an existing Bayesian framework for sparse regression to produce a MAP estimate via simulated annealing, and we justify this extension by showing that the resulting MAP estimate is also sparse in the regression coefficients. Experiments on real-world applications, namely splice-site detection and diabetes progression modeling, demonstrate the usefulness of the extension.
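The idea of annealing toward a posterior mode can be illustrated with a minimal sketch (this is not the authors' algorithm, and all data here are synthetic): under a Gaussian likelihood and a Laplace prior, the negative log-posterior is the Lasso objective, and Metropolis-type simulated annealing with a decreasing temperature concentrates samples around its minimizer, i.e. the MAP estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data: only the first two coefficients are truly nonzero.
n, p = 50, 8
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5] + [0.0] * (p - 2))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 5.0  # Laplace-prior scale, acting as the l1 penalty weight


def neg_log_posterior(beta):
    # Gaussian likelihood + Laplace prior = Lasso objective (up to constants).
    return 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))


# Metropolis-type simulated annealing with a geometric cooling schedule:
# accept downhill moves always, uphill moves with probability exp(-dE / T).
beta = np.zeros(p)
energy = neg_log_posterior(beta)
T = 1.0
for step in range(20000):
    proposal = beta + 0.05 * rng.standard_normal(p)
    e_new = neg_log_posterior(proposal)
    if e_new < energy or rng.random() < np.exp((energy - e_new) / T):
        beta, energy = proposal, e_new
    T = max(1e-3, T * 0.9995)  # cool slowly, with a small floor

# As T -> 0 the chain settles near the posterior mode: the large true
# coefficients are recovered and the irrelevant ones are driven toward zero.
```

Note that this random-walk proposal only drives irrelevant coefficients approximately to zero; obtaining an exactly sparse MAP estimate, without thresholding, is precisely the contribution of the framework developed in the paper.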





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sudhir Raman, Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland
  • Volker Roth, Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland
