
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3944)

Included in the following conference series: Machine Learning Challenges Workshop (MLCW 2005)

Abstract

In many regression tasks, in addition to an accurate estimate of the conditional mean of the target distribution, an indication of the predictive uncertainty is also required. There are two principal sources of this uncertainty: the noise process contaminating the data and the uncertainty in estimating the model parameters based on a limited sample of training data. Both can be summarised in the predictive variance, which can then be used to give confidence intervals. In this paper, we present various schemes for providing predictive variances for kernel ridge regression, especially in the case of heteroscedastic regression, where the variance of the noise process contaminating the data is a smooth function of the explanatory variables. The use of leave-one-out cross-validation is shown to eliminate the bias inherent in estimates of the predictive variance. Results obtained on all three regression tasks comprising the predictive uncertainty challenge demonstrate the value of this approach.
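As a rough illustration of the kind of procedure the abstract describes, the sketch below fits a kernel ridge regressor and uses exact leave-one-out residuals (the PRESS formula for a linear smoother) to obtain an approximately unbiased estimate of the noise variance. This is only a minimal sketch of the general idea, not the scheme developed in the paper; the kernel choice, hyperparameters and toy data (rbf_kernel, lam, gamma) are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian RBF kernel matrix between two sets of input vectors."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Kernel ridge regression: fitted values and the hat (smoother) matrix."""
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    H = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))  # hat matrix H = K (K + lam I)^-1
    return H @ y, H

def loo_residuals(y, y_hat, H):
    """Exact leave-one-out residuals of a linear smoother (PRESS formula)."""
    return (y - y_hat) / (1.0 - np.diag(H))

# Toy example: noisy sine wave with input-dependent (heteroscedastic) noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
noise_sd = 0.1 + 0.2 * (X[:, 0] + 3) / 6          # noise level grows with x
y = np.sin(2 * X[:, 0]) + noise_sd * rng.standard_normal(100)

y_hat, H = krr_fit(X, y, lam=1e-2, gamma=2.0)
r_loo = loo_residuals(y, y_hat, H)

# Crude noise-variance estimate from the leave-one-out residuals; unlike the
# in-sample residuals, these are not biased downwards by the model fitting itself.
print("LOO estimate of average noise variance:", np.mean(r_loo ** 2))
print("true average noise variance:           ", np.mean(noise_sd ** 2))
```

In a heteroscedastic setting one would go further and model the local noise variance as a smooth function of the inputs (for example by a second kernel model fitted to the squared leave-one-out residuals); the snippet above only estimates an average noise level.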




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cawley, G.C., Talbot, N.L.C., Chapelle, O. (2006). Estimating Predictive Variances with Kernel Ridge Regression. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds) Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment. MLCW 2005. Lecture Notes in Computer Science, vol. 3944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11736790_5


  • DOI: https://doi.org/10.1007/11736790_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33427-9

  • Online ISBN: 978-3-540-33428-6

  • eBook Packages: Computer Science, Computer Science (R0)
