Abstract
Enforcing sparsity constraints has been shown to be an effective and efficient way to obtain state-of-the-art results in regression and classification tasks. Unlike the support vector machine (SVM), the relevance vector machine (RVM) explicitly encodes the criterion of model sparsity as a prior over the model weights. However, the lack of an explicit prior structure over the weight variances means that the degree of sparsity is largely controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing, possibly even both at once (e.g. for the multiscale Doppler data). We detail an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. We present an empirical evaluation of the effects of the choice of prior structure on a selection of popular data sets and elucidate the link between Bayesian wavelet shrinkage and RVM regression. Our model encompasses the original RVM as a special case, but our empirical results show that in many cases we can surpass RVM performance in terms of goodness of fit and achieved sparsity as well as computational performance. The code is freely available.
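For readers unfamiliar with the baseline model the abstract extends, the RVM's sparsity mechanism can be sketched as follows. This is an illustrative re-implementation of Tipping's evidence-maximisation (type-II maximum likelihood) updates for RVM regression, not the authors' released code; the function name `rvm_fit`, the pruning threshold, and the parameter defaults are choices made here for the sketch.

```python
import numpy as np

def rvm_fit(Phi, t, noise_var=0.01, n_iter=100, prune=1e6):
    """Sparse Bayesian (RVM-style) regression via evidence maximisation.

    Phi: (N, M) design/kernel matrix; t: (N,) targets.
    Returns the posterior mean weights, with pruned entries set to 0.
    """
    N, M = Phi.shape
    alpha = np.ones(M)        # per-weight precision hyperparameters
    beta = 1.0 / noise_var    # noise precision
    keep = np.arange(M)       # indices of retained basis functions
    mu = np.zeros(M)
    for _ in range(n_iter):
        P = Phi[:, keep]
        # Posterior over the retained weights: covariance Sigma, mean mu_k
        A = np.diag(alpha[keep])
        Sigma = np.linalg.inv(beta * P.T @ P + A)
        mu_k = beta * Sigma @ P.T @ t
        # MacKay-style re-estimation: gamma_i = 1 - alpha_i * Sigma_ii
        gamma = 1.0 - alpha[keep] * np.diag(Sigma)
        alpha[keep] = gamma / (mu_k ** 2 + 1e-12)
        beta = (N - gamma.sum()) / (np.sum((t - P @ mu_k) ** 2) + 1e-12)
        mu = np.zeros(M)
        mu[keep] = mu_k
        # Weights whose precision diverges are pruned from the model
        keep = np.where(alpha < prune)[0]
    return mu
```

Sparsity arises because the precision `alpha_i` of an irrelevant weight is driven towards infinity, pinning that weight to zero; the paper's contribution is a noise-dependent smoothness prior placed on this hyperparameter structure, so that sparsity is no longer governed solely by the kernel choice.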
Action Editor: Dale Schuurmans.
Schmolck, A., Everson, R. Smooth relevance vector machine: a smoothness prior extension of the RVM. Mach Learn 68, 107–135 (2007). https://doi.org/10.1007/s10994-007-5012-z