International Journal of Computer Vision, Volume 114, Issue 2–3, pp 113–136

Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds

  • Mehrtash Harandi
  • Richard Hartley
  • Chunhua Shen
  • Brian Lovell
  • Conrad Sanderson

Abstract

Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown to be useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning on Grassmann manifolds, i.e., spaces of linear subspaces. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose an algorithm for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into higher-dimensional Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as the kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis.
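
The embedding idea can be illustrated with a small sketch. The abstract does not name the mapping, so the code assumes the standard projection embedding, which sends a subspace with orthonormal basis X to the symmetric matrix X Xᵀ. The helper names (`transpose`, `matmul`, `embed`, `frob_dist`) and the toy subspaces are hypothetical, chosen only for illustration, and plain Python lists are used to keep the example free of dependencies.

```python
import math

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def embed(x):
    """Assumed embedding: map an n x p orthonormal basis X to the n x n matrix X X^T."""
    return matmul(x, transpose(x))

def frob_dist(a, b):
    """Frobenius distance between two matrices of equal shape."""
    return math.sqrt(sum((u - v) ** 2
                         for ra, rb in zip(a, b)
                         for u, v in zip(ra, rb)))

# Two 2-D subspaces of R^3: span{e1, e2} and span{e1, e3}.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

# The same subspace as X, described by a basis rotated 30 degrees in the plane.
t = math.radians(30)
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
X_rot = matmul(X, R)

d_same = frob_dist(embed(X), embed(X_rot))  # same subspace, different basis
d_diff = frob_dist(embed(X), embed(Y))      # two different subspaces
```

Under this assumption, `d_same` vanishes up to rounding because X Xᵀ does not depend on which orthonormal basis represents the subspace, while `d_diff` is nonzero. This basis invariance is what lets Euclidean tools such as sparse coding operate on the embedded symmetric matrices.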

Keywords

  • Riemannian geometry
  • Grassmann manifolds
  • Sparse coding
  • Dictionary learning

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. College of Engineering and Computer Science, Australian National University, Canberra, Australia
  2. NICTA, Canberra, Australia
  3. School of Computer Science, The University of Adelaide, Adelaide, Australia
  4. The University of Queensland, Brisbane, Australia
