Estimating Predictive Variances with Kernel Ridge Regression

  • Gavin C. Cawley
  • Nicola L. C. Talbot
  • Olivier Chapelle
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3944)

Abstract

In many regression tasks, in addition to an accurate estimate of the conditional mean of the target distribution, an indication of the predictive uncertainty is also required. There are two principal sources of this uncertainty: the noise process contaminating the data and the uncertainty in estimating the model parameters from a limited sample of training data. Both can be summarised in the predictive variance, which can then be used to give confidence intervals. In this paper, we present various schemes for providing predictive variances for kernel ridge regression, especially in the case of heteroscedastic regression, where the variance of the noise process contaminating the data is a smooth function of the explanatory variables. The use of leave-one-out cross-validation is shown to eliminate the bias inherent in estimates of the predictive variance. Results obtained on all three regression tasks comprising the Predictive Uncertainty Challenge demonstrate the value of this approach.
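For concreteness, the sketch below (Python/NumPy; not the authors' code) illustrates the two ingredients the abstract names: a predictive variance for dual-form kernel ridge regression, obtained by interpreting the ridge parameter as a Gaussian-process noise variance, and closed-form leave-one-out residuals of the kind used to debias the variance estimate. The RBF kernel and the values of `lam` and `gamma` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def fit_krr(X, y, lam=1e-2, gamma=1.0):
    """Dual-form kernel ridge regression: alpha = (K + lam*I)^{-1} y."""
    K = rbf_kernel(X, X, gamma)
    C = K + lam * np.eye(len(X))
    return K, C, np.linalg.solve(C, y)

def predict_with_variance(X, Xs, C, alpha, lam, gamma=1.0):
    """Predictive mean and variance at test points Xs.

    Treating lam as the noise variance gives the usual GP predictive
    variance, combining the noise level with the parameter-uncertainty
    term k(x,x) - k^T (K + lam*I)^{-1} k (k(x,x) = 1 for this kernel).
    """
    Ks = rbf_kernel(Xs, X, gamma)                     # test-train kernel
    mean = Ks @ alpha
    var = 1.0 - np.sum(Ks * np.linalg.solve(C, Ks.T).T, axis=1) + lam
    return mean, var

def loo_residuals(K, C, y):
    """Closed-form leave-one-out residuals e_i / (1 - H_ii),
    with smoother matrix H = K (K + lam*I)^{-1}; averaging their
    squares gives a less biased estimate of the noise variance."""
    H = K @ np.linalg.inv(C)
    e = y - H @ y
    return e / (1.0 - np.diag(H))

# Toy usage on synthetic 1-D data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
K, C, alpha = fit_krr(X, y, lam=1e-2)
mean, var = predict_with_variance(X, X, C, alpha, lam=1e-2)
```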


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gavin C. Cawley (1)
  • Nicola L. C. Talbot (1)
  • Olivier Chapelle (2)
  1. School of Computing Sciences, University of East Anglia, Norwich, U.K.
  2. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
