Machine Learning, Volume 106, Issue 5, pp. 627–650

A unified probabilistic framework for robust manifold learning and embedding



This paper focuses on learning a smooth skeleton structure from noisy data, an emerging topic in computer vision and computational biology. Many dimensionality reduction methods have been proposed, but none is specifically designed for this purpose. To achieve this goal, we propose a unified probabilistic framework that directly models the posterior distribution of data points in an embedding space, so as to suppress data noise and reveal the smooth skeleton structure. Within the proposed framework, a sparse positive similarity matrix is obtained by solving a box-constrained convex optimization problem, in which the sparsity pattern of the matrix represents the learned neighborhood graph and the positive weights stand for the new similarities. Embedded data points are then obtained by applying maximum a posteriori estimation to the posterior distribution expressed by the learned similarity matrix. The embedding process naturally provides a probabilistic interpretation of Laplacian eigenmaps and maximum variance unfolding. Extensive experiments on various datasets demonstrate that the proposed method produces embedded points that accurately uncover inherent smooth skeleton structures in data visualization, and that it yields superior clustering performance compared to various baselines.
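The abstract connects the MAP embedding step to Laplacian eigenmaps: given a learned nonnegative similarity matrix, the embedding can be read off from the bottom eigenvectors of the graph Laplacian. The sketch below illustrates only that generic spectral-embedding step, not the paper's actual framework; the function name and the toy similarity matrix `W` are invented for illustration.

```python
import numpy as np

def laplacian_embedding(W, d=2):
    """Embed points from a symmetric nonnegative similarity matrix W
    via the bottom eigenvectors of the unnormalized graph Laplacian.

    Illustrative sketch of a Laplacian-eigenmap-style embedding; this
    is not the paper's algorithm, which learns W by convex optimization.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # unnormalized graph Laplacian
    # eigh returns eigenvalues in ascending order; skip the trivial
    # constant eigenvector (eigenvalue 0 for a connected graph).
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:d + 1]

# Toy similarity matrix: points {0, 1} and {2, 3} form two strongly
# connected pairs, weakly linked across pairs.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
Y = laplacian_embedding(W, d=1)
# The 1-D coordinates separate the two pairs: points 0 and 1 land
# close together, far from points 2 and 3.
```

The sparsity of the learned matrix determines which entries of `W` are nonzero, i.e. the neighborhood graph; the spectral step then only has to respect those learned similarities.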


Keywords: Dimensionality reduction · Probabilistic model · Manifold embedding · Structure learning



Copyright information

© The Author(s) 2016

Authors and Affiliations

  1. HERE North America, Chicago, USA
  2. Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, USA
  3. Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia
