Extensions of the Informative Vector Machine

  • Neil D. Lawrence
  • John C. Platt
  • Michael I. Jordan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3635)


The informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic for choosing points based on minimizing posterior entropy. This paper extends the IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we apply the IVM to a block-diagonal covariance matrix, for “learning to learn” from related tasks. Third, we modify the IVM to incorporate prior knowledge from known invariances. All of these extensions are tested on artificial and real data.
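
As a concrete illustration of the selection heuristic described in the abstract, the sketch below implements the greedy entropy-reduction loop for the regression case with Gaussian noise, following the IVM updates of Lawrence, Seeger and Herbrich [14]. This is a minimal sketch in Python/NumPy: the RBF kernel choice, the function name ivm_select, and all variable names are illustrative assumptions, not taken from the paper.

    import numpy as np

    def rbf_kernel(X, lengthscale=1.0, variance=1.0):
        # Squared-exponential kernel matrix (an assumed, common choice).
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

    def ivm_select(K, y, noise_var, d):
        # Greedy IVM active-set selection for GP regression with Gaussian
        # noise. Each step adds the point whose inclusion most reduces the
        # posterior entropy, then applies the ADF (rank-one) updates to the
        # posterior marginals of every candidate point.
        N = K.shape[0]
        m = np.zeros(N)           # posterior means at all candidate points
        zeta = np.diag(K).copy()  # posterior marginal variances
        M = np.zeros((d, N))      # low-rank representation of the posterior
        active, inactive = [], list(range(N))
        for j in range(d):
            # For Gaussian noise the entropy reduction from including point
            # n is 0.5 * log(1 + zeta[n] / noise_var), so the greedy choice
            # is simply the most uncertain remaining point.
            i = max(inactive, key=lambda n: zeta[n])
            nu = 1.0 / (noise_var + zeta[i])           # site precision
            g = (y[i] - m[i]) / (noise_var + zeta[i])  # site gradient
            s = K[:, i] - M[:j].T @ M[:j, i]  # column of posterior covariance
            m += g * s                # ADF mean update
            zeta -= nu * s ** 2       # ADF variance update
            M[j] = np.sqrt(nu) * s
            active.append(i)
            inactive.remove(i)
        return active, m, zeta

A call such as ivm_select(rbf_kernel(X), y, noise_var=0.1, d=50) selects 50 points from the N candidates in O(d^2 N) time and O(dN) memory, which is the source of the IVM's computational advantage over full Gaussian process inference.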


References

  1. Baxter, J.: Learning internal representations. In: Proc. COLT, vol. 8, pp. 311–320. Morgan Kaufmann, San Francisco (1995)
  2. Becker, S., Thrun, S., Obermayer, K. (eds.): Advances in Neural Information Processing Systems, vol. 15. MIT Press, Cambridge (2003)
  3. Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998)
  4. Caruana, R.: Multitask learning. Machine Learning 28(1), 41–75 (1997)
  5. Chapelle, O., Weston, J., Schölkopf, B.: Cluster kernels for semi-supervised learning. In: Becker, et al. (eds.) [2]
  6. Cortes, C., Vapnik, V.N.: Support-vector networks. Machine Learning 20, 273–297 (1995)
  7. Csató, L.: Gaussian Processes — Iterative Sparse Approximations. PhD thesis, Aston University (2002)
  8. Csató, L., Opper, M.: Sparse representation for Gaussian process models. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 444–450. MIT Press, Cambridge (2001)
  9. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis. Chapman and Hall, Boca Raton (1995)
  10. Kass, R.E., Steffey, D.: Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). Journal of the American Statistical Association 84, 717–726 (1989)
  11. Lawrence, N.D., Jordan, M.I.: Semi-supervised learning via Gaussian processes. In: Advances in Neural Information Processing Systems, vol. 17. MIT Press, Cambridge (2005) (to appear)
  12. Lawrence, N.D., Platt, J.C.: Learning to learn with the informative vector machine. In: Greiner, R., Schuurmans, D. (eds.) Proceedings of the International Conference on Machine Learning, vol. 21, pp. 512–519. Morgan Kaufmann, San Francisco (2004)
  13. Lawrence, N.D., Schölkopf, B.: Estimating a kernel Fisher discriminant in the presence of label noise. In: Brodley, C., Danyluk, A.P. (eds.) Proceedings of the International Conference on Machine Learning, vol. 18. Morgan Kaufmann, San Francisco (2001)
  14. Lawrence, N.D., Seeger, M., Herbrich, R.: Fast sparse Gaussian process methods: The informative vector machine. In: Becker, et al. (eds.) [2], pp. 625–632
  15. MacKay, D.J.C.: Bayesian Methods for Adaptive Models. PhD thesis, California Institute of Technology (1991)
  16. Minka, T.P.: A family of algorithms for approximate Bayesian inference. PhD thesis, Massachusetts Institute of Technology (2001)
  17. Nabney, I.T.: Netlab: Algorithms for Pattern Recognition. Advances in Pattern Recognition. Springer, Berlin (2001). Code available from http://www.ncrg.aston.ac.uk/netlab/
  18. Schölkopf, B., Burges, C.J.C., Vapnik, V.N.: Incorporating invariances in support vector learning machines. In: Vorbrüggen, J.C., von Seelen, W., Sendhoff, B. (eds.) ICANN 1996. LNCS, vol. 1112, pp. 47–52. Springer, Heidelberg (1996)
  19. Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2001)
  20. Seeger, M.: Covariance kernels from Bayesian generative models. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems, vol. 14, pp. 905–912. MIT Press, Cambridge (2002)
  21. Seeger, M.: Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations. PhD thesis, The University of Edinburgh (2004)
  22. Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. In: Conference on Computational Learning Theory, vol. 10, pp. 287–294. Morgan Kaufmann, San Francisco (1992)
  23. Sollich, P.: Probabilistic interpretation and Bayesian methods for support vector machines. In: Proceedings of the 1999 International Conference on Artificial Neural Networks (ICANN 1999), London, U.K., pp. 91–96. The Institution of Electrical Engineers (1999)
  24. Thrun, S.: Is learning the n-th thing any easier than learning the first? In: Touretzky, et al. (eds.) [25], pp. 640–646
  25. Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.): Advances in Neural Information Processing Systems, vol. 8. MIT Press, Cambridge (1996)
  26. Vapnik, V.N.: Statistical Learning Theory. John Wiley and Sons, New York (1998)
  27. Williams, C.K.I.: Computing with infinite networks. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge (1997)
  28. Williams, C.K.I., Rasmussen, C.E.: Gaussian processes for regression. In: Touretzky, et al. (eds.) [25], pp. 514–520

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Neil D. Lawrence, Department of Computer Science, University of Sheffield, Sheffield, U.K.
  • John C. Platt, Microsoft Research, Microsoft Corporation, Redmond, U.S.A.
  • Michael I. Jordan, Computer Science and Statistics, University of California, Berkeley, U.S.A.
