
Statistics and Computing, Volume 15, Issue 2, pp 93–101

Efficient sampling schemes for Bayesian MARS models with many predictors

  • David J. Nott
  • Anthony Y. C. Kuk
  • Hiep Duc

Abstract

Multivariate adaptive regression spline fitting, or MARS (Friedman 1991), provides a useful methodology for flexible adaptive regression with many predictors. The MARS methodology produces an estimate of the mean response that is a linear combination of adaptively chosen basis functions. Recently, a Bayesian version of MARS has been proposed (Denison, Mallick and Smith 1998a; Holmes and Denison 2002), combining the MARS methodology with the benefits of Bayesian methods for accounting for model uncertainty to achieve improvements in predictive performance. In the implementation of the Bayesian MARS approach, Markov chain Monte Carlo methods are used for computation; at each iteration of the algorithm it is proposed to change the current model by either (a) adding a basis function (birth step), (b) deleting a basis function (death step), or (c) altering an existing basis function (change step). In the algorithm of Denison, Mallick and Smith (1998a), when a birth step is proposed, the type of basis function is determined by simulation from the prior. This works well in problems with a small number of predictors, is simple to program, and leads to a simple form for the Metropolis-Hastings acceptance probabilities. However, in problems with very large numbers of predictors, many of which are useless, it may be difficult to find interesting interactions with such an approach. In the original MARS algorithm of Friedman (1991), a heuristic is used that builds up higher-order interactions from lower-order ones, which greatly reduces the complexity of the search for good basis functions to add to the model. While we do not exactly follow the intuition of the original MARS algorithm in this paper, we nevertheless suggest a similar idea in which the Metropolis-Hastings proposals of Denison, Mallick and Smith (1998a) are altered to allow dependence on the current model.
Our modification allows more rapid identification and exploration of important interactions, especially in problems with very large numbers of predictor variables and many useless predictors. Performance of the algorithms is compared in simulation studies.
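The birth/death/change scheme described in the abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the prior on model size and the reversible-jump proposal ratios are omitted from the acceptance probability, the coefficient prior and hyperparameters (`tau`, `sigma2`) are illustrative choices, and all function names are hypothetical. It shows MARS basis functions as products of hinge functions and a birth proposal drawn "from the prior", i.e. ignoring the current model, as in Denison, Mallick and Smith (1998a).

```python
import numpy as np

rng = np.random.default_rng(0)

def hinge(x, t, s):
    # Truncated linear (hinge) function [s * (x - t)]_+, the MARS building block.
    return np.maximum(s * (x - t), 0.0)

def basis_matrix(X, model):
    # Each basis function is a product of hinges over a subset of predictors;
    # the model is a list of tuples of (predictor, knot, sign) factors.
    n = X.shape[0]
    cols = [np.ones(n)]  # intercept
    for bf in model:
        col = np.ones(n)
        for (j, t, s) in bf:
            col *= hinge(X[:, j], t, s)
        cols.append(col)
    return np.column_stack(cols)

def log_marginal(X, y, model, tau=1.0, sigma2=1.0):
    # Gaussian likelihood with a ridge-type Gaussian prior on coefficients,
    # integrated out analytically (an illustrative choice, not the paper's prior).
    B = basis_matrix(X, model)
    k = B.shape[1]
    A = B.T @ B / sigma2 + np.eye(k) / tau
    m = np.linalg.solve(A, B.T @ y / sigma2)
    _, logdetA = np.linalg.slogdet(A)
    return (-0.5 * y @ y / sigma2 + 0.5 * m @ A @ m
            - 0.5 * logdetA - 0.5 * k * np.log(tau))

def propose_basis_from_prior(X, max_degree=2):
    # "Prior" birth proposal: pick predictors, knots, and signs at random,
    # with no dependence on the current model (the scheme the paper improves on).
    degree = rng.integers(1, max_degree + 1)
    preds = rng.choice(X.shape[1], size=degree, replace=False)
    return tuple((int(j), float(rng.choice(X[:, j])), int(rng.choice([-1, 1])))
                 for j in preds)

def mcmc_step(X, y, model):
    # One birth/death/change sweep. Acceptance uses only the ratio of integrated
    # likelihoods; proposal and prior ratios of the full reversible-jump
    # acceptance probability are omitted for brevity.
    move = rng.choice(["birth", "death", "change"])
    new = list(model)
    if move == "birth":
        new.append(propose_basis_from_prior(X))
    elif move == "death" and model:
        new.pop(rng.integers(len(model)))
    elif move == "change" and model:
        new[rng.integers(len(model))] = propose_basis_from_prior(X)
    log_alpha = log_marginal(X, y, new) - log_marginal(X, y, model)
    return new if np.log(rng.random()) < log_alpha else model

# Tiny demo: the response depends on predictor 0 only, among 10 predictors,
# mimicking the "many useless predictors" setting the paper targets.
X = rng.normal(size=(100, 10))
y = np.maximum(X[:, 0] - 0.2, 0.0) + 0.1 * rng.normal(size=100)
model = []
for _ in range(200):
    model = mcmc_step(X, y, model)
```

The paper's modification would replace `propose_basis_from_prior` in the birth step with a proposal that depends on the current model, e.g. favouring interactions built from predictors already appearing in accepted basis functions, at the cost of extra proposal-ratio terms in the acceptance probability.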

Keywords

Markov chain Monte Carlo · multivariate adaptive regression splines · nonparametric regression · Bayesian inference · high-dimensional regression


References

  1. Albert J.H. and Chib S. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88: 669–679.
  2. Biller C. 2000. Adaptive Bayesian regression splines in semiparametric generalized linear models. Journal of Computational and Graphical Statistics 9: 122–140.
  3. Carlin B.P. and Chib S. 1995. Bayesian model choice via Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Ser. B 57: 473–484.
  4. Chib S. and Greenberg E. 1995. Understanding the Metropolis-Hastings algorithm. The American Statistician 49: 327–335.
  5. Denison D.G.T., Mallick B.K. and Smith A.F.M. 1998a. Bayesian MARS. Statistics and Computing 8: 337–346.
  6. Denison D.G.T., Mallick B.K. and Smith A.F.M. 1998b. Automatic Bayesian curve fitting. Journal of the Royal Statistical Society, Ser. B 60: 333–350.
  7. DiMatteo I., Genovese C.R. and Kass R.E. 2001. Bayesian curve fitting with free-knot splines. Biometrika 88: 1055–1073.
  8. Friedman J.H. 1991. Multivariate adaptive regression splines. The Annals of Statistics 19: 1–141.
  9. Friedman J.H. and Silverman B.W. 1989. Flexible parsimonious smoothing and additive modelling. Technometrics 31: 3–39.
  10. Hastie T., Tibshirani R., and Friedman J. 2001. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
  11. Hoeting J.A., Madigan D., Raftery A.E., and Volinsky C. 1999. Bayesian model averaging: A tutorial (with discussion). Statistical Science 14: 382–417.
  12. Holmes C.C. and Denison D.G.T. 2002. A Bayesian MARS classifier. Machine Learning, to appear.
  13. Hwang J.-N., Lay S.-R., Maechler M., Martin D., and Schimert J. 1994. Regression modelling in back-propagation and projection pursuit learning. IEEE Transactions on Neural Networks 5: 342–353.
  14. Kohn R., Smith M., and Chan D. 2001. Nonparametric regression using linear combinations of basis functions. Statistics and Computing 11: 313–322.
  15. Kooperberg C., Bose S., and Stone C.J. 1997. Polychotomous regression. Journal of the American Statistical Association 92: 117–127.
  16. Smith M. and Kohn R. 1996. Nonparametric regression using Bayesian variable selection. Journal of Econometrics 75: 317–344.
  17. Tierney L. 1994. Markov chains for exploring posterior distributions. Annals of Statistics 22: 1701–1728.

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. Department of Statistics, University of New South Wales, Sydney, Australia
  2. Department of Statistics and Applied Probability, The National University of Singapore, Singapore
  3. New South Wales Environmental Protection Authority, Lidcombe, Australia
