Volume 85, Issue 4, pp 267–299

Principal manifold learning by sparse grids

  • Christian Feuersänger
  • Michael Griebel


In this paper, we deal with the construction of lower-dimensional manifolds from high-dimensional data, an important task in data mining, machine learning and statistics. Here, we consider principal manifolds as the minimum of a regularized, non-linear empirical quantization error functional. For the discretization, we use a sparse grid method in latent parameter space. To some extent, this approach avoids the curse of dimensionality inherent in conventional grids such as those used in the GTM approach. The arising non-linear problem is solved by a descent method which resembles the expectation maximization algorithm. We present our sparse grid principal manifold approach, discuss its properties and report on the results of numerical experiments for one-, two- and three-dimensional model problems.
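To make the EM-like descent scheme concrete, here is a minimal sketch, not the authors' implementation. It assumes a one-dimensional latent space (where a sparse grid coincides with an ordinary uniform grid), piecewise linear hat functions on m + 1 nodes, and the penalty ∫||f'(t)||² dt as the regularizer, so that the functional reads (1/M) Σ_i min_t ||f(t) − x_i||² + λ ∫_0^1 ||f'(t)||² dt. The scheme alternates an exact projection step (fix f, find each latent parameter t_i) with a linear least-squares fitting step (fix the t_i, solve a tridiagonal system for the node values). All function names and the parameters `m`, `lam` and `iters` are hypothetical.

```python
import math


def _sub(a, b):
    return [p - q for p, q in zip(a, b)]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def project_onto_polyline(x, nodes):
    """Exact closest point to x on the piecewise linear curve through `nodes`.
    Returns (t, squared distance) with the latent parameter t in [0, 1]."""
    m = len(nodes) - 1
    best_t, best_d2 = 0.0, math.inf
    for k in range(m):
        d = _sub(nodes[k + 1], nodes[k])
        dd = _dot(d, d)
        s = 0.0 if dd == 0.0 else min(1.0, max(0.0, _dot(_sub(x, nodes[k]), d) / dd))
        r = _sub(x, [c + s * di for c, di in zip(nodes[k], d)])
        d2 = _dot(r, r)
        if d2 < best_d2:
            best_t, best_d2 = (k + s) / m, d2
    return best_t, best_d2


def functional(data, nodes, lam):
    """(1/M) sum_i min_t ||f(t) - x_i||^2 + lam * int_0^1 ||f'(t)||^2 dt."""
    m = len(nodes) - 1
    fit = sum(project_onto_polyline(x, nodes)[1] for x in data) / len(data)
    reg = m * sum(_dot(_sub(nodes[k + 1], nodes[k]), _sub(nodes[k + 1], nodes[k]))
                  for k in range(m))
    return fit + lam * reg


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric positive definite tridiagonal matrix
    (main diagonal `diag`, off-diagonal `off`) with vector right-hand sides."""
    n, dim = len(diag), len(rhs[0])
    cp, dp = [0.0] * n, [[0.0] * dim for _ in range(n)]
    for i in range(n):
        lower = off[i - 1] if i > 0 else 0.0
        denom = diag[i] - (lower * cp[i - 1] if i > 0 else 0.0)
        cp[i] = off[i] / denom if i < n - 1 else 0.0
        prev = dp[i - 1] if i > 0 else [0.0] * dim
        dp[i] = [(r - lower * p) / denom for r, p in zip(rhs[i], prev)]
    x = [row[:] for row in dp]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = [a - cp[i] * b for a, b in zip(dp[i], x[i + 1])]
    return x


def fit_nodes(data, ts, m, lam):
    """Fitting step: minimize the functional over node values for fixed t_i."""
    n, dim, M = m + 1, len(data[0]), len(data)
    diag, off = [0.0] * n, [0.0] * m
    rhs = [[0.0] * dim for _ in range(n)]
    for x, t in zip(data, ts):
        k = min(int(t * m), m - 1)
        s = t * m - k
        w0, w1 = 1.0 - s, s  # values of the two hat functions active at t
        diag[k] += w0 * w0 / M
        diag[k + 1] += w1 * w1 / M
        off[k] += w0 * w1 / M
        for j in range(dim):
            rhs[k][j] += w0 * x[j] / M
            rhs[k + 1][j] += w1 * x[j] / M
    for k in range(m):  # lam * m * ||c_{k+1} - c_k||^2 equals lam * int ||f'||^2
        diag[k] += lam * m
        diag[k + 1] += lam * m
        off[k] -= lam * m
    return solve_tridiagonal(diag, off, rhs)


def init_nodes(data, m):
    """Start with a straight segment along the coordinate axis of largest spread."""
    dim, M = len(data[0]), len(data)
    mean = [sum(x[j] for x in data) / M for j in range(dim)]
    var = [sum((x[j] - mean[j]) ** 2 for x in data) for j in range(dim)]
    a = var.index(max(var))
    lo, hi = min(x[a] for x in data), max(x[a] for x in data)
    nodes = []
    for i in range(m + 1):
        p = mean[:]
        p[a] = lo + (hi - lo) * i / m
        nodes.append(p)
    return nodes


def principal_curve(data, m=8, lam=1e-3, iters=20):
    """Alternate projection and fitting steps; return nodes and functional history."""
    nodes = init_nodes(data, m)
    history = [functional(data, nodes, lam)]
    for _ in range(iters):
        ts = [project_onto_polyline(x, nodes)[0] for x in data]  # projection step
        nodes = fit_nodes(data, ts, m, lam)  # fitting step
        history.append(functional(data, nodes, lam))
    return nodes, history


if __name__ == "__main__":
    pts = [[math.cos(a), math.sin(a)] for a in (i * math.pi / 98 for i in range(50))]
    nodes, hist = principal_curve(pts)
    print(f"functional: {hist[0]:.4f} -> {hist[-1]:.4f}")
```

Because each step minimizes the functional exactly over one group of variables while the other is held fixed, the recorded values cannot increase (up to floating-point rounding), which mirrors the descent property of EM. For two- or three-dimensional latent spaces, the uniform grid would be replaced by a sparse grid basis; this sketch does not attempt that.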


Keywords: Sparse grids, regularized principal manifolds, high-dimensional data

Mathematics Subject Classification (2000)

65N30 65F10 65N22 41A29 41A63 65D15 65D10 



Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Institute for Numerical Simulation, University of Bonn, Bonn, Germany
