Machine Learning, Volume 56, Issue 1–3, pp. 209–239

Semi-Supervised Learning on Riemannian Manifolds

  • Mikhail Belkin
  • Partha Niyogi


We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high-dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question, rather than on the total ambient space. Using the Laplace-Beltrami operator, one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square-integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set.
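The fitting step described above can be written compactly. The following is a sketch consistent with the abstract; the symbols $e_i$, $a_i$, $p$, and $s$ (basis functions, coefficients, basis size, number of labeled points) are illustrative notation, not taken verbatim from the paper:

```latex
% Eigenfunctions of the Laplace-Beltrami operator on the submanifold M
% form an orthonormal basis of L^2(M):
\Delta_{\mathcal{M}}\, e_i = \lambda_i e_i,
\qquad 0 = \lambda_1 \le \lambda_2 \le \cdots

% Given s labeled pairs (x_j, y_j), fit a truncated expansion by least squares:
\min_{a_1,\dots,a_p} \;
\sum_{j=1}^{s} \Big( y_j - \sum_{i=1}^{p} a_i\, e_i(x_j) \Big)^{2}

% and classify an arbitrary point x by the sign of the recovered function:
\hat{y}(x) = \operatorname{sign}\Big( \sum_{i=1}^{p} a_i\, e_i(x) \Big)
```

Only the least-squares step uses labels; the basis itself is estimated entirely from unlabeled data.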

Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification.
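The pipeline just described (adjacency graph, graph Laplacian, eigenvector basis, least-squares fit on the labeled points) can be sketched in a few lines of NumPy. This is a minimal illustration under assumed defaults, not the authors' implementation; the function name and parameters (`n_neighbors`, `p`) are hypothetical:

```python
import numpy as np

def laplacian_eigenmap_classifier(X, y_labeled, labeled_idx, n_neighbors=5, p=2):
    """Sketch of semi-supervised classification via the graph Laplacian.

    X           : (n, d) array of all points, labeled and unlabeled.
    y_labeled   : +/-1 labels for the rows listed in labeled_idx.
    n_neighbors : size of the k-nearest-neighbor adjacency graph (assumed).
    p           : number of Laplacian eigenvectors used as a basis (assumed).
    """
    n = X.shape[0]
    # 1. Adjacency graph: symmetric kNN graph with 0/1 edge weights.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]  # skip the point itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)  # symmetrize
    # 2. Graph Laplacian L = D - W approximates the Laplace-Beltrami operator.
    L = np.diag(W.sum(axis=1)) - W
    # 3. Basis: the p smoothest eigenvectors (smallest eigenvalues) of L.
    _, eigvecs = np.linalg.eigh(L)
    E = eigvecs[:, :p]
    # 4. Least-squares fit of the labels in this basis, labeled rows only.
    a, *_ = np.linalg.lstsq(E[labeled_idx], y_labeled, rcond=None)
    # 5. Classify every point by the sign of the reconstructed function.
    return np.sign(E @ a)
```

With two well-separated clusters and a handful of labels per cluster, the label information propagates to all points through the smooth eigenvectors, which is the behavior the paper's experiments demonstrate at larger scale.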

Keywords: semi-supervised learning; manifold learning; graph regularization; Laplace operator; graph Laplacian



Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Mikhail Belkin (1)
  • Partha Niyogi (1)

  1. Department of Computer Science, University of Chicago, Chicago, USA
