Statistical Methods

  • René Vidal
  • Yi Ma
  • S. Shankar Sastry
Part of the Interdisciplinary Applied Mathematics book series (IAM, volume 40)


The algebraic-geometric approach to subspace clustering described in the previous chapter provides a fairly complete characterization of the algebra and geometry of multiple subspaces, which leads to simple and elegant subspace clustering algorithms. However, while these methods can tolerate some noise in the data, they make no explicit assumptions about the distribution of the noise or of the data inside the subspaces. As a result, the estimated subspaces need not be optimal from a statistical perspective, e.g., in a maximum likelihood (ML) sense.
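The statistical viewpoint alluded to above can be illustrated for a single subspace: under an isotropic Gaussian noise model, the ML estimate of the subspace is spanned by the top principal components of the data. The following minimal sketch (an illustration under these assumptions, not code from the chapter) verifies this numerically; all variable names are chosen here for exposition.

```python
import numpy as np

# Sketch: generate data x = U y + e with e ~ N(0, sigma^2 I). Under this
# model, the ML estimate of a d-dimensional subspace of R^D is the span of
# the top-d eigenvectors of the sample covariance (i.e., PCA).
rng = np.random.default_rng(0)
D, d, N = 5, 2, 500
U_true = np.linalg.qr(rng.standard_normal((D, d)))[0]   # true orthonormal basis
X = U_true @ rng.standard_normal((d, N)) + 0.05 * rng.standard_normal((D, N))

# ML / PCA estimate: top-d eigenvectors of the sample covariance.
cov = (X @ X.T) / N
eigvals, eigvecs = np.linalg.eigh(cov)                  # ascending eigenvalues
U_hat = eigvecs[:, -d:]                                 # estimated basis

# Compare subspaces via cosines of the principal angles; values near 1
# indicate the estimated and true subspaces nearly coincide.
cosines = np.linalg.svd(U_true.T @ U_hat, compute_uv=False)
```

The statistical methods in this chapter, such as mixtures of probabilistic PCA fitted by expectation maximization, extend this single-subspace ML estimate to a mixture of subspaces.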


Keywords: Subspace Clustering · Mixture of Probabilistic Principal Component Analysis (MPPCA) · Compression-Based Clustering · PPCA Model · Code Length Function



Copyright information

© Springer-Verlag New York 2016

Authors and Affiliations

  • René Vidal (1)
  • Yi Ma (2)
  • S. Shankar Sastry (3)
  1. Center for Imaging Science, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
  2. School of Information Science and Technology, ShanghaiTech University, Shanghai, China
  3. Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, USA
