On Unifying Multi-view Self-Representations for Clustering by Tensor Multi-rank Minimization

Abstract

In this paper, we address the multi-view subspace clustering problem. Our method uses the circulant algebra for tensors: the tensor is constructed by stacking the subspace representation matrices of the different views and then rotating the result. This construction captures the low-rank tensor subspace, so that the view-specific subspaces can be refined and the high-order correlations underlying multi-view data can be explored. By introducing a recently proposed tensor factorization, namely the tensor-Singular Value Decomposition (t-SVD) (Kilmer et al. in SIAM J Matrix Anal Appl 34(1):148–172, 2013), we impose a new type of low-rank tensor constraint on the rotated tensor to ensure consensus among multiple views. Unlike traditional unfolding-based tensor norms, this low-rank tensor constraint has optimality properties similar to those of the matrix rank derived from the SVD, so the complementary information can be explored and propagated among all the views more thoroughly and effectively. The resulting model, called t-SVD based Multi-view Subspace Clustering (t-SVD-MSC), falls within the scope of the augmented Lagrangian method, and its minimization problem can be solved efficiently with a theoretical convergence guarantee and relatively low computational complexity. Extensive experiments on eight challenging image datasets show that the proposed method achieves highly competitive performance compared with several state-of-the-art multi-view clustering methods.

Notes

  1. The tensor rotation in Matlab can be achieved by using the command “shiftdim”.

  2. A similar discussion about the optimization of the TNN regularized low-rank tensor completion problem can be found in Zhang et al. (2014).

  3. https://cvc.yale.edu/projects/yalefaces/yalefaces.html.

  4. https://cvc.yale.edu/projects/yalefacesB/yalefacesB.html.

  5. http://www.uk.research.att.com/facedatabase.html.

  6. http://www-cvr.ai.uiuc.edu/ponce_grp/data/.

  7. This feature was extracted using the VLFeat toolbox (Vedaldi and Fulkerson 2008).

  8. http://www.cs.columbia.edu/CAVE/software/softlib/.
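
As a rough illustration of the tensor rotation mentioned in Note 1, the same operation can be mimicked in NumPy. This is a minimal sketch rather than the authors' code: it assumes a left shift by one dimension (the behaviour of Matlab's `shiftdim(Z, 1)`), and the function name `stack_and_rotate`, the variable `Z_views`, and the toy sizes are hypothetical.

```python
import numpy as np

def stack_and_rotate(Z_views):
    """Stack per-view N x N matrices into an N x N x V tensor, then rotate it.

    The rotation moves every axis one position to the left, which is what
    Matlab's shiftdim(Z, 1) does: an N x N x V array becomes N x V x N.
    """
    Z = np.stack(Z_views, axis=2)        # shape (N, N, V)
    return np.transpose(Z, (1, 2, 0))    # shape (N, V, N)

# Hypothetical toy data: V = 3 views, N = 4 samples.
rng = np.random.default_rng(0)
Z_views = [rng.standard_normal((4, 4)) for _ in range(3)]
Z_rot = stack_and_rotate(Z_views)
print(Z_rot.shape)  # (4, 3, 4)
```

Under this assumed shift, the lateral slice `Z_rot[:, v, :]` holds the transposed representation matrix of view `v`, so the third dimension of the rotated tensor indexes samples rather than views.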

References

  1. Bickel, S., & Scheffer, T. (2004). Multi-view clustering. In Proceedings of the IEEE international conference on data mining (pp. 19–26).

  2. Blaschko, M. B., & Lampert, C. H. (2008). Correlational spectral clustering. In Proceedings of the IEEE computer vision and pattern recognition (pp. 1–8).

  3. Bosch, A., Zisserman, A., & Munoz, X. (2007). Image classification using random forest and ferns. In Proceedings of the IEEE international conference on computer vision.

  4. Cai, J., Candes, E., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.

  5. Cai, D., He, X., & Han, J. (2005). Document clustering using locality preserving indexing. IEEE Transactions on Knowledge and Data Engineering, 17(12), 1624–1637.

  6. Cao, X., Zhang, C., Fu, H., Liu, S., & Zhang, H. (2015). Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE computer vision and pattern recognition.

  7. Cao, X., Zhang, C., Zhou, C., Fu, H., & Foroosh, H. (2015). Constrained multi-view video face clustering. IEEE Transactions on Image Processing, 24(11), 4381–4393.

  8. Chaudhuri, K., Kakade, S. M., Livescu, K., & Sridharan, K. (2009). Multi-view clustering via canonical correlation analysis. In Proceedings of the international conference on machine learning (pp. 129–136).

  9. Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1). Cambridge: Cambridge University Press.

  10. de Sa, V. R. (2005). Spectral clustering with two views. In Proceedings of the international conference on machine learning.

  11. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: a large-scale hierarchical image database. In Proceedings of the IEEE computer vision and pattern recognition.

  12. Eckstein, J., & Bertsekas, D. (1992). On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55, 293–318.

  13. Elhamifar, E., & Vidal, R. (2013). Sparse subspace clustering: Algorithm, theory, and application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2765–2781.

  14. Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In Proceedings of the IEEE computer vision and pattern recognition (pp. 524–531).

  15. Fei-Fei, L., Fergus, R., & Perona, P. (2007). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1), 59–70.

  16. Gao, H., Nie, F., Li, X., & Huang, H. (2015). Multi-view subspace clustering. In Proceedings of the IEEE international conference on computer vision.

  17. Gui, L., & Morency, L. P. (2015). Learning and transferring deep ConvNet representations with group-sparse factorization. In Proceedings of the IEEE international conference on computer vision workshops.

  18. Kernfeld, E., Aeron, S., & Kilmer, M. (2014). Clustering multi-way data: a novel algebraic approach. arXiv preprint, arXiv:1412.7056.

  19. Kilmer, M., Braman, K., Hao, N., & Hoover, R. (2013). Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM Journal on Matrix Analysis and Applications, 34(1), 148–172.

  20. Kilmer, M. E., & Martin, C. D. (2011). Factorization strategies for third-order tensors. Linear Algebra and Its Applications, 435(3), 641–658.

  21. Kolda, T., & Bader, B. (2009). Tensor decompositions and applications. SIAM Review, 51(3), 455–500.

  22. Kumar, A., & Daumé, H. (2011). A co-training approach for multi-view spectral clustering. In Proceedings of the international conference on machine learning.

  23. Kumar, A., Rai, P., & Daumé, H. (2011). Co-regularized multiview spectral clustering. In Proceedings of the neural information processing systems.

  24. Lades, M., Vorbruggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C., Wurtz, R. P., et al. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3), 300–311.

  25. Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

  26. Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE computer vision and pattern recognition (pp. 2169–2178).

  27. Lin, Z., Chen, M., & Ma, Y. (2009). The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical Report UILU-ENG-09-2215, UIUC.

  28. Lin, Z., Liu, R., & Su, Z. (2011). Linearized alternating direction method with adaptive penalty for low-rank representation. In Proceedings of the neural information processing systems (pp. 612–620).

  29. Lin, T., Ma, S., & Zhang, S. (2016). Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity. Journal of Scientific Computing, 69(1), 52–81.

  30. Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., & Ma, Y. (2013). Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 171–184.

  31. Liu, J., Musialski, P., Wonka, P., & Ye, J. (2013). Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 208–220.

  32. Lu, C., Feng, J., Chen, Y., Liu, W., & Lin, Z. (2016). Tensor robust principal component analysis: Exact recovery of corrupted low-rank tensors via convex optimization. In Proceedings of the IEEE computer vision and pattern recognition.

  33. Luo, Y., Tao, D., Ramamohanarao, K., Xu, C., & Wen, Y. (2015). Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Transactions on Knowledge and Data Engineering, 27(11), 3111–3124.

  34. Lu, C., Yan, S., & Lin, Z. (2016). Convex sparse spectral clustering: Single-view to multi-view. IEEE Transactions on Image Processing, 25(6), 2833–2843.

  35. Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Proceedings of the neural information processing systems.

  36. Ojala, T., Pietikainen, M., & Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.

  37. Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

  38. Piao, X., Hu, Y., Gao, J., Sun, Y., Lin, Z., & Yin, B. (2016). Tensor sparse and low-rank based submodule clustering method for multi-way data. arXiv preprint, arXiv:1601.00149.

  39. Qi, X., Xiao, R., Li, C., Qiao, Y., Guo, J., & Tang, X. (2014). Pairwise rotation invariant co-occurrence local binary pattern. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11), 2199–2213.

  40. Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In Proceedings of the IEEE computer vision and pattern recognition (pp. 413–420).

  41. Semerci, O., Hao, N., Kilmer, M., & Miller, E. (2014). Tensor-based formulation and nuclear norm regularization for multienergy computed tomography. IEEE Transactions on Image Processing, 23(4), 1678–1693.

  42. Shu, L., & Latecki, L. J. (2015). Integration of single-view graphs with diffusion of tensor product graphs for multi-view spectral clustering. In Proceedings of the Asian conference on machine learning.

  43. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. In Proceedings of the international conference on learning representations.

  44. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015). Rethinking the inception architecture for computer vision. arXiv preprint, arXiv:1512.00567.

  45. Tang, W., Lu, Z., & Dhillon, I. S. (2009). Clustering with multiple graphs. In Proceedings of the IEEE international conference on data mining.

  46. Tzortzis, G., & Likas, A. (2012). Kernel-based weighted multi-view clustering. In Proceedings of the IEEE international conference on data mining (pp. 675–684).

  47. Vedaldi, A., & Fulkerson, B. (2008). VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/.

  48. Vedaldi, A., & Lenc, K. (2015). MatConvNet: Convolutional neural networks for MATLAB. http://www.vlfeat.org/matconvnet/.

  49. Wang, W., Arora, R., Livescu, K., & Bilmes, J. (2015). On deep multi-view representation learning. In Proceedings of the international conference on machine learning.

  50. White, M., Zhang, X., Schuurmans, D., & Yu, Y. I. (2012). Convex multi-view subspace learning. In Proceedings of the neural information processing systems.

  51. Wu, J., & Rehg, J. M. (2011). Centrist: A visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1489–1501.

  52. Xia, R., Pan, Y., Du, L., & Yin, J. (2014). Robust multi-view spectral clustering via low-rank and sparse decomposition. In Proceedings of the AAAI conference on artificial intelligence.

  53. Xu, C., Tao, D., & Xu, C. (2013). A survey on multi-view learning. arXiv preprint, arXiv:1304.5634.

  54. Xu, C., Tao, D., & Xu, C. (2014). Large-margin multi-view information bottleneck. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8), 1559–1572.

  55. Xu, C., Tao, D., & Xu, C. (2015). Multi-view intact space learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12), 2531–2544.

  56. Zhang, Z., Ely, G., Aeron, S., Hao, N., & Kilmer, M. (2014). Novel methods for multilinear data completion and de-noising based on tensor-SVD. In Proceedings of the IEEE computer vision and pattern recognition.

  57. Zhang, C., Fu, H., Liu, S., Liu, G., & Cao, X. (2015). Low-rank tensor constrained multiview subspace clustering. In Proceedings of the IEEE international conference on computer vision (pp. 2439–2446).

  58. Zhang, Y., Xu, C., Lu, H., & Huang, Y. M. (2009). Character identification in feature-length films using global face-name matching. IEEE Transactions on Multimedia, 11(7), 1276–1288.

  59. Zhou, D., & Burges, C. (2007). Spectral clustering and transductive learning with multiple views. In Proceedings of the international conference on machine learning.

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers, whose valuable suggestions helped to improve the quality of the paper. This work was supported in part by the National Natural Science Foundation of China under Grants 61772524, 61402480, 61373077, and 61602482; by the Beijing Natural Science Foundation under Grant 4182067; by the Australian Research Council Projects FL-170100117, DP-180103424, DP-140102164, and LP-150100671; and by the HK RGC General Research Fund (PolyU 152135/16E).

Author information

Correspondence to Yuan Xie.

Additional information

Communicated by T.E. Boult.

Appendix

Proof of Theorem 2:

Proof

In Fourier domain, the optimization problem of Eq. (29) can be reformulated as

$$\begin{aligned} \varvec{{\mathcal {G}}}_{f}&= {{\mathrm{argmin}}}_{\varvec{{\mathcal {G}}}_{f}} ~ \tau ||\mathrm {bdiag}(\varvec{{\mathcal {G}}}_{f})||_{*} + \frac{1}{2n_{3}}||\varvec{{\mathcal {G}}}_{f} - \varvec{{\mathcal {F}}}_{f}||_{F}^{2} \end{aligned}$$
(39)
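$$\begin{aligned}&={{\mathrm{argmin}}}_{\varvec{{\mathcal {G}}}_{f}} ~ \sum _{j=1}^{n_{3}}\left( \tau '||\varvec{{\mathcal {G}}}_{f}^{(j)}||_{*} + \frac{1}{2}||\varvec{{\mathcal {G}}}_{f}^{(j)} - \varvec{{\mathcal {F}}}_{f}^{(j)}||_{F}^{2}\right) , \end{aligned}$$
(40)

where \(\tau '=n_{3}\tau \). Eq. (40) follows from Eq. (39) because the nuclear norm of a block-diagonal matrix is the sum of the nuclear norms of its blocks, the squared Frobenius norm splits over the frontal slices, and multiplying the objective by the positive constant \(n_{3}\) does not change the minimizer. Since the summands share no variables, Eq. (40) can be separated into \(n_{3}\) independent subproblems,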

$$\begin{aligned} \varvec{{\mathcal {G}}}_{f}^{(j)} = {{\mathrm{argmin}}}_{\varvec{{\mathcal {G}}}_{f}^{(j)}} ~ \tau '||\varvec{{\mathcal {G}}}_{f}^{(j)}||_{*} + \frac{1}{2}||\varvec{{\mathcal {G}}}_{f}^{(j)} - \varvec{{\mathcal {F}}}_{f}^{(j)}||_{F}^{2}, \end{aligned}$$
(41)

where \(j = 1,2,\ldots ,n_{3}\). Note that Eq. (41) is a nuclear norm regularized low-rank matrix approximation problem with a Frobenius-norm fidelity term, posed in the Fourier domain. According to the result on subgradients of unitarily invariant norms, Eq. (41) has a closed-form solution given by the singular value soft-thresholding operation (Cai et al. 2010),

$$\begin{aligned} \varvec{{\mathcal {G}}}_{f}^{(j)} = {\mathcal {D}}_{\tau '}(\varvec{{\mathcal {F}}}_{f}^{(j)})= \varvec{{\mathcal {U}}}_{f}^{(j)}\varvec{{\mathcal {S}}}_{f,\tau '}^{(j)}\varvec{{\mathcal {V}}}_{f}^{(j)^{\mathrm {T}}}, \end{aligned}$$
(42)

Here, \(\varvec{{\mathcal {F}}}_{f}^{(j)} = \varvec{{\mathcal {U}}}_{f}^{(j)}\varvec{{\mathcal {S}}}_{f}^{(j)}\varvec{{\mathcal {V}}}_{f}^{(j)^{\mathrm {T}}}\) is the SVD of \(\varvec{{\mathcal {F}}}_{f}^{(j)}\), \({\mathcal {D}}_{\tau '}(\cdot )\) is the SVT operator with threshold \(\tau '\) (see Sect. 3), and \(\varvec{{\mathcal {S}}}_{f,\tau '}^{(j)}= \mathrm {diag}\{(\varvec{{\mathcal {S}}}_{f}^{(j)}(i,i)-\tau ' )_{+}\}\). Then, we can get

$$\begin{aligned} \varvec{{\mathcal {G}}}_{f} = \mathrm {bdfold}\left\{ \mathrm {bdiag}(\varvec{\mathcal{U}}_{f})\mathrm {bdiag}(\varvec{\mathcal{S}}_{f,\tau '}) \mathrm {bdiag}(\varvec{\mathcal{V}}_{f})^{\mathrm {T}}\right\} \end{aligned}$$
(43)

and

$$\begin{aligned} \varvec{{\mathcal {G}}} = \varvec{{\mathcal {U}}}*\varvec{\mathcal {{\tilde{S}}}}*\varvec{{\mathcal {V}}^{\mathrm {T}}} \end{aligned}$$
(44)

where \(\varvec{\mathcal {{\tilde{S}}}} = \mathrm {ifft}(\varvec{\mathcal{S}}_{f,\tau '},[~], 3)\). Suppose that \(\varvec{\mathcal{J}}\) is an \(n_{1} \times n_{2} \times n_{3}\) f-diagonal tensor whose diagonal elements in the Fourier domain are \(\varvec{\mathcal{J}}_{f}(i,i,j) = (1 - \frac{\tau '}{\varvec{\mathcal{S}}_{f}^{(j)}(i,i)})_{+}\). Then \(\varvec{\mathcal{S}}_{f,\tau '}(i,i,:) = \varvec{\mathcal{S}}_{f}(i,i,:)\varvec{\mathcal{J}}_{f}(i,i,:)\) holds elementwise in the Fourier domain, and hence \(\varvec{\mathcal{{\tilde{S}}}}(i,i,:) = \varvec{\mathcal{S}}(i,i,:) \circ \varvec{\mathcal{J}}(i,i,:)\) in the original domain, where \(\circ \) denotes circular convolution of tubes. Because both \(\varvec{\mathcal{S}}\) and \(\varvec{\mathcal{J}}\) are f-diagonal, \(\varvec{\mathcal{{\tilde{S}}}}\) can be written as \(\varvec{\mathcal{{\tilde{S}}}} = \varvec{\mathcal{S}} * \varvec{\mathcal{J}}\). Therefore, the convolution-based tubal-shrinkage operator in the original domain is equivalent to the tensor SVT in the Fourier domain. \(\square \)
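
The slice-wise computation in Eqs. (39)–(44) can be sketched numerically as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the names `tensor_svt` and `tnn` and the toy sizes are hypothetical, and the conjugate transpose returned by `np.linalg.svd` is used for the complex Fourier-domain slices where the text writes a transpose.

```python
import numpy as np

def tnn(X):
    """Sum of the nuclear norms of the Fourier-domain frontal slices,
    i.e. ||bdiag(X_f)||_* as it appears in Eq. (39)."""
    Xf = np.fft.fft(X, axis=2)
    return sum(np.linalg.norm(Xf[:, :, j], ord="nuc") for j in range(X.shape[2]))

def tensor_svt(F, tau):
    """Singular value thresholding applied slice by slice in the Fourier domain."""
    n3 = F.shape[2]
    tau_p = n3 * tau                          # tau' = n3 * tau, as in Eq. (40)
    Ff = np.fft.fft(F, axis=2)                # transform along the third mode
    Gf = np.zeros_like(Ff)
    for j in range(n3):                       # n3 independent subproblems, Eq. (41)
        U, s, Vh = np.linalg.svd(Ff[:, :, j], full_matrices=False)
        s_thr = np.maximum(s - tau_p, 0.0)    # (S(i,i) - tau')_+, Eq. (42)
        Gf[:, :, j] = (U * s_thr) @ Vh
    # For real input the thresholded slices keep conjugate symmetry, so the
    # inverse transform is real up to rounding error.
    return np.real(np.fft.ifft(Gf, axis=2))

# Hypothetical toy tensor of size 4 x 3 x 5.
rng = np.random.default_rng(0)
F = rng.standard_normal((4, 3, 5))
G = tensor_svt(F, 0.1)
print(G.shape, tnn(G) < tnn(F))
```

With `tau = 0` the operator returns its input unchanged, and a sufficiently large `tau` shrinks every singular value to zero, which gives two quick sanity checks on the sketch.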

About this article

Cite this article

Xie, Y., Tao, D., Zhang, W. et al. On Unifying Multi-view Self-Representations for Clustering by Tensor Multi-rank Minimization. Int J Comput Vis 126, 1157–1179 (2018). https://doi.org/10.1007/s11263-018-1086-2

Keywords

  • T-SVD
  • Tensor multi-rank
  • Multi-view features
  • Subspace clustering