Local Dimensionality Reduction for Non-Parametric Regression

Abstract

Locally-weighted regression is a computationally efficient technique for non-linear regression. For high-dimensional data, however, it becomes numerically brittle and computationally too expensive when many local models must be maintained simultaneously. Combining local linear dimensionality reduction with locally-weighted regression is therefore a promising solution. In this context, we review linear dimensionality-reduction methods, compare their performance on non-parametric locally-linear regression, and discuss their suitability for incremental learning. The methods considered fall into three groups: (1) methods that reduce the dimensionality of the input data only, (2) methods that model the joint input-output distribution, and (3) methods that optimize the correlation between projection directions and the output data. Group 1 contains principal component regression (PCR); group 2 contains principal component analysis (PCA) in joint input and output space, factor analysis, and probabilistic PCA; and group 3 contains reduced-rank regression (RRR) and partial least squares (PLS) regression. Among the tested methods, only those in group 3 achieved robust performance even with a non-optimal number of components (factors or projection directions). In contrast, groups 1 and 2 failed when too few components were used, since these methods rely on a correct estimate of the true intrinsic dimensionality. Within group 3, PLS is the only method for which a computationally efficient incremental implementation exists. Thus, PLS appears ideally suited as a building block for a locally-weighted regressor in which projection directions are added incrementally on the fly.
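
The Python sketch below illustrates the basic idea motivated in the abstract: fit a local model around a query point using kernel weights, and regress on a small number of PLS projection directions instead of the full input space. This is a minimal, illustrative sketch only, not the algorithm evaluated in the article; the function name, the Gaussian kernel, the bandwidth parameter, and the fixed number of components are our own assumptions.

```python
import numpy as np


def locally_weighted_pls_predict(X, y, x_query, n_components=2, bandwidth=1.0):
    """Predict y at x_query from a PLS model fitted to locally weighted data.

    Illustrative sketch: Gaussian kernel weights centred on the query,
    projection directions extracted one at a time (PLS1 / NIPALS style)
    from the weighted, centred data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_query = np.asarray(x_query, dtype=float)

    # Gaussian kernel: points near the query dominate the local fit.
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)

    # Weighted centring of inputs and outputs.
    x_mean = np.average(X, axis=0, weights=w)
    y_mean = np.average(y, weights=w)
    Xc = (X - x_mean) * np.sqrt(w)[:, None]
    yc = (y - y_mean) * np.sqrt(w)

    y_pred = y_mean
    xq = x_query - x_mean
    for _ in range(n_components):
        # Projection direction: weighted input-output covariance.
        u = Xc.T @ yc
        norm = np.linalg.norm(u)
        if norm < 1e-12:               # no correlation left to exploit
            break
        u /= norm
        s = Xc @ u                     # scores of the data along this direction
        beta = (s @ yc) / (s @ s)      # univariate regression of y on the scores
        t_q = xq @ u                   # score of the query point
        y_pred += beta * t_q

        # Deflate data and query before extracting the next direction.
        p = (Xc.T @ s) / (s @ s)
        Xc -= np.outer(s, p)
        yc -= beta * s
        xq = xq - t_q * p
    return y_pred
```

Calling this function once per query point yields a non-parametric fit whose local complexity is controlled by the kernel bandwidth and the number of projection directions; adding directions one at a time mirrors the incremental use of PLS discussed in the abstract.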

Keywords

Correlation · Dimensionality reduction · Factor analysis · Incremental learning · Kernel function · Locally-weighted regression · Partial least squares · Principal component analysis · Principal component regression · Reduced-rank regression

Copyright information

© Springer Science+Business Media, LLC. 2009

Authors and Affiliations

  • Heiko Hoffmann (1, 2)
  • Stefan Schaal (3)
  • Sethu Vijayakumar (1)
  1. IPAB, School of Informatics, University of Edinburgh, Edinburgh, UK
  2. Biomedical Engineering, University of Southern California, Los Angeles, USA
  3. Computer Science and Neuroscience, University of Southern California, Los Angeles, USA
