Abstract
We present a new method for classification using a Bayesian version of the Multivariate Adaptive Regression Spline (MARS) model of J.H. Friedman (Annals of Statistics, 19, 1–141, 1991). Special attention is paid to the use of Markov chain Monte Carlo (MCMC) simulation to carry out inference under the model. In particular we discuss three important developments in MCMC methodology. First, we describe the reversible jump MCMC algorithm of P.J. Green (Biometrika, 82, 711–732, 1995), which allows inference on a varying-dimensional, possibly uncountable, model space. This allows us to consider MARS models with differing numbers and positions of splines. Secondly, we discuss marginalisation, which is used to reduce the effective dimension of the parameter space under consideration. Thirdly, we describe the use of latent variables to improve the MCMC computation. Our methods are generic and can be applied to any basis function model, including wavelets, artificial neural networks and radial basis functions. We present examples to show that the Bayesian MARS classifier is competitive with other approaches on a number of benchmark data sets.
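The latent-variable device mentioned in the abstract is the data-augmentation scheme of Albert and Chib (1993), in which binary responses are linked to Gaussian latent variables so that all full conditionals become standard distributions. The sketch below illustrates that idea for a plain probit regression rather than the authors' MARS classifier; the prior variance `tau2`, the naive rejection sampler for the truncated normal, and the toy data are our own illustrative choices, not details from the paper.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def sample_truncated_normal(mu, positive):
    # Naive rejection sampler for z ~ N(mu, 1) restricted to (0, inf)
    # when positive is True, or (-inf, 0] when it is False.
    while True:
        z = rng.normal(mu, 1.0)
        if (z > 0) == positive:
            return z

def probit_gibbs(X, y, n_iter=1000, tau2=100.0):
    """Albert-Chib Gibbs sampler for probit regression.

    Model: y_i = 1{z_i > 0}, z_i ~ N(x_i' beta, 1), prior beta ~ N(0, tau2 I).
    Both full conditionals are standard: truncated normals for z,
    a multivariate normal for beta.
    """
    n, p = X.shape
    V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)  # posterior covariance of beta
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = []
    for _ in range(n_iter):
        # 1. Sample the latent z_i | beta, y_i from truncated normals.
        mu = X @ beta
        z = np.array([sample_truncated_normal(m, bool(yi))
                      for m, yi in zip(mu, y)])
        # 2. Sample beta | z from its Gaussian full conditional N(V X'z, V).
        beta = V @ (X.T @ z) + L @ rng.normal(size=p)
        draws.append(beta.copy())
    return np.array(draws)

# Toy data: intercept plus one covariate, true beta = (0.5, 1.0).
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.5, 1.0])
# Probit link: P(y=1) = Phi(x' beta), written via erf.
p_success = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in X @ true_beta])
y = (rng.random(n) < p_success).astype(float)

draws = probit_gibbs(X, y)
post_mean = draws[500:].mean(axis=0)  # discard burn-in
```

In the paper's setting, the design matrix `X` would hold MARS basis functions whose number and knot positions change across reversible-jump moves; the latent-variable step itself is unchanged.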
References
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.
Andrieu, C., de Freitas, J. F. G., & Doucet, A. (2000). Robust full Bayesian methods for neural networks. In S. A. Solla, T. K. Leen, & K. Muller (Eds.), Advances in neural information processing systems (NIPS 12) (Vol. 12, pp. 379–385). MIT Press.
Denison, D. G. T., Adams, N. A., Holmes, C. C., & Hand, D. J. (2001). Bayesian partition modelling. Computational Statistics and Data Analysis, to appear.
Denison, D. G. T., Mallick, B. K., & Smith, A. F. M. (1998). Bayesian MARS. Statistics and Computing, 8, 337–346.
Draper, D. (1995). Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society, Series B, 57, 45–97.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19, 1–141.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Markov chain Monte Carlo in practice. London: Chapman and Hall.
Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.
Holmes, C. C., & Mallick, B. K. (1998). Bayesian radial basis functions of variable dimension. Neural Computation, 10, 1217–1233.
Holmes, C. C., & Mallick, B. K. (2000). Bayesian wavelet networks for nonparametric regression. IEEE Transactions on Neural Networks, 11, 27–35.
Husmeier, D., Penny, W. D., & Roberts, S. J. (1999). An empirical evaluation of Bayesian sampling with hybrid Monte Carlo for training neural network classifiers. Neural Networks, 12, 677–705.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Kelly, M. (1998). Tackling change and uncertainty in credit scoring. PhD Thesis, The Open University.
Kooperberg, C., Bose, S., & Stone, C. J. (1997). Polychotomous regression. Journal of the American Statistical Association, 93, 117–127.
MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4, 415–447.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1091.
Neal, R. M. (1996). Bayesian learning for neural networks. New York: Springer-Verlag.
O'Hagan, A. (1994). Kendall's advanced theory of statistics: Bayesian inference (Vol. 2b). Cambridge: Arnold.
Rasmussen, C. E. (1996). Evaluation of Gaussian processes and other methods for non-linear regression. PhD Thesis, University of Toronto.
Robert, C. P. (1995). Simulation of truncated normal variables. Statistics and Computing, 5, 121–125.
Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26, 1651–1686.
Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37, 297–336.
Smith, M., & Kohn, R. (1996). Nonparametric regression using Bayesian variable selection. Journal of Econometrics, 75, 317–344.
Spyers-Ashby, J. M. (1996). The recording and analysis of tremor in neurological disorders. PhD Thesis, Imperial College, London University.
Wood, S., & Kohn, R. (1998). A Bayesian approach to robust binary nonparametric regression. Journal of the American Statistical Association, 93, 203–213.
About this article
Cite this article
Holmes, C., Denison, D. Classification with Bayesian MARS. Machine Learning 50, 159–173 (2003). https://doi.org/10.1023/A:1020254013004