Missing Data in Kernel PCA

  • Guido Sanguinetti
  • Neil D. Lawrence
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


Kernel Principal Component Analysis (KPCA) is a widely used technique for visualisation and feature extraction. Despite its success and flexibility, the lack of a probabilistic interpretation means that some problems, such as handling missing or corrupted data, are very hard to deal with. In this paper we exploit the probabilistic interpretation of linear PCA together with recent results on latent variable models in Gaussian Processes in order to introduce an objective function for KPCA. This in turn allows a principled approach to the missing data problem. Furthermore, this new approach can be extended to reconstruct corrupted test data using fixed kernel feature extractors. The experimental results show strong improvements over widely used heuristics.
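
For orientation, the sketch below illustrates the kind of baseline the abstract contrasts against: standard (non-probabilistic) KPCA applied after missing entries have been filled in heuristically. It is not the method proposed in the paper, and the abstract does not say which heuristics were used in the experiments; mean imputation is assumed here as one common choice. The RBF kernel, the `gamma` value, the NaN encoding of missing entries, and all function names are likewise assumptions made for the example.

```python
import numpy as np


def mean_impute(X):
    """Replace NaN entries with the mean of the observed values in each column."""
    X = np.array(X, dtype=float)  # copy so the caller's data is untouched
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_pca(K, n_components=2):
    """Project training points onto the leading kernel principal components."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n    # centring matrix
    Kc = H @ K @ H                          # kernel centred in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training point i onto component k is sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Toy data (assumed for illustration): 20 points in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[rng.random(X.shape) < 0.1] = np.nan   # hide roughly 10% of the entries
Z = kernel_pca(rbf_kernel(mean_impute(X), gamma=0.1), n_components=2)
print(Z.shape)
```

Because the imputation step happens before the kernel is computed and is never revisited, the filled-in values are treated as if they had been observed. That is the limitation a probabilistic objective function is meant to address.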


Test Point; Reconstruction Error; Probabilistic Interpretation; Latent Variable Model; Kernel Principal Component Analysis
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Guido Sanguinetti (1)
  • Neil D. Lawrence (1)
  1. Department of Computer Science, University of Sheffield, Sheffield, U.K.
