Abstract
Impressive performance has been achieved by learning graphs from data in clustering tasks. However, real data often contain considerable noise, which leads to unreliable or inaccurate constructed graphs. In this paper, we propose adaptive data correction-based graph clustering (ADCGC), which adaptively removes errors and noise from raw data and improves clustering performance. The ADCGC method has three main advantages. First, we design the weighted truncated Schatten p-norm (WTSpN), instead of the nuclear norm, to recover the low-rank clean data. Second, we choose clean data samples that represent the essential properties of the data as the vertices of the undirected graph, rather than using all the data feature points. Third, we adopt the block-diagonal regularizer to define the edge weights of the graph, which helps to learn an ideal affinity matrix and improve clustering performance. In addition, an efficient iterative scheme based on the generalized soft-thresholding operator and alternating minimization is developed to directly solve the nonconvex optimization model. Experimental results show that ADCGC outperforms existing advanced methods both quantitatively and visually.
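The WTSpN recovery step described above can be sketched numerically: shrink all but the r largest singular values of a noisy matrix with a generalized soft-thresholding (GST) rule for the nonconvex p-power penalty. This is a minimal illustration under assumed choices (the GST fixed-point solver of Zuo et al., a scalar weight w, and p = 0.7), not the paper's ADCGC implementation:

```python
import numpy as np

def gst(y, w, p, iters=10):
    # Generalized soft-thresholding (Zuo et al., 2013): approximately solves
    # argmin_x 0.5*(x - y)**2 + w*|x|**p for 0 < p < 1.
    tau = (2 * w * (1 - p)) ** (1 / (2 - p)) \
        + w * p * (2 * w * (1 - p)) ** ((p - 1) / (2 - p))
    if abs(y) <= tau:          # below the threshold, the minimizer is 0
        return 0.0
    x = abs(y)
    for _ in range(iters):     # fixed-point iteration toward the minimizer
        x = abs(y) - w * p * x ** (p - 1)
    return np.sign(y) * x

def wtspn_shrink(M, r, w, p=0.7):
    # Truncation: leave the r largest singular values unpenalized and
    # shrink the rest -- the "weighted truncated" idea in miniature.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_new = s.copy()
    for i in range(r, len(s)):
        s_new[i] = gst(s[i], w, p)
    return U @ np.diag(s_new) @ Vt
```

Applied to a rank-3 matrix plus small Gaussian noise, the shrinkage zeroes the trailing (noise) singular values while leaving the dominant structure intact.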
References
Abhadiomhen SE, Wang Z, Shen X (2022) Coupled low rank representation and subspace clustering. Appl Intell 52(1):530–546
Anvari R, Siahsar MAN, Gholtashi S, Kahoo AR, Mohammadi M (2017) Seismic random noise attenuation using synchrosqueezed wavelet transform and low-rank signal matrix approximation. IEEE Trans Geosci Remote Sens 55(11):6574–6581
Cai X, Huang D, Wang CD, Kwoh CK (2020) Spectral clustering by subspace randomization and graph fusion for high-dimensional data. In: Pacific-asia conference on knowledge discovery and data mining, Springer, pp 330–342
Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J ACM (JACM) 58(3):1–37
Chen B, Sun H, Xia G, Feng L, Li B (2018) Human motion recovery utilizing truncated schatten p-norm and kinematic constraints. Inf Sci 450:89–108
Chen J, Yang J (2013) Robust subspace segmentation via low-rank representation. IEEE Trans Cybernet 44(8):1432–1445
Chen Y, Zhou Y, Chen W, Zu S, Huang W, Zhang D (2017) Empirical low-rank approximation for seismic noise attenuation. IEEE Trans Geosci Remote Sens 55(8):4696–4711
Doneva M, Amthor T, Koken P, Sommer K, Börnert P (2017) Matrix completion-based reconstruction for undersampled magnetic resonance fingerprinting data. Magn Reson Imaging 41:41–52
Elhamifar E, Vidal R (2013) Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell 35(11):2765–2781
Fazel M (2002) Matrix rank minimization with applications. PhD thesis, Stanford University
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
Gu S, Zhang L, Zuo W, Feng X (2014) Weighted nuclear norm minimization with application to image denoising. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2862–2869
Gu S, Xie Q, Meng D, Zuo W, Feng X, Zhang L (2017) Weighted nuclear norm minimization and its applications to low level vision. Int J Comput Vis 121(2):183–208
Guo L, Zhang X, Liu Z, Xue X, Wang Q, Zheng S (2021) Robust subspace clustering based on automatic weighted multiple kernel learning. Inf Sci 573:453–474
Han Y, Zhu L, Cheng Z, Li J, Liu X (2018) Discrete optimal graph clustering. IEEE Trans Cybernet 50(4):1697–1710
Hu Y, Zhang D, Ye J, Li X, He X (2012) Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Trans Pattern Anal Mach Intell 35(9):2117–2130
Huang T, Wang S, Zhu W (2020) An adaptive kernelized rank-order distance for clustering non-spherical data with high noise. Int J Mach Learn Cybernet 11(8):1735–1747
Ji P, Reid I, Garg R, Li H, Salzmann M (2017) Low-rank kernel subspace clustering. arXiv:1707.04974
Kang Z, Pan H, Hoi SC, Xu Z (2019a) Robust graph learning from noisy data. IEEE Trans Cybernet 50(5):1833–1843
Kang Z, Wen L, Chen W, Xu Z (2019b) Low-rank kernel learning for graph-based clustering. Knowl-Based Syst 163:510–517
Lang K (1995) Newsweeder: learning to filter netnews. In: Machine Learning Proceedings 1995, Elsevier, pp 331–339
Li J, Liu H, Tao Z, Zhao H, Fu Y (2020) Learnable subspace clustering. IEEE Trans Neural Netw Learn Syst 33:1119–1133
Li S, Li W, Hu J, Li Y (2022) Semi-supervised bi-orthogonal constraints dual-graph regularized nmf for subspace clustering. Appl Intell 52(3):3227–3248
Li T, Cheng B, Ni B, Liu G, Yan S (2016) Multitask low-rank affinity graph for image segmentation and image annotation. ACM Trans Intell Syst Technol (TIST) 7(4):1–18
Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y (2012) Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 35(1):171–184
Liu M, Wang Y, Sun J, Ji Z (2020) Structured block diagonal representation for subspace clustering. Appl Intell 50(8):2523–2536
Lu C, Feng J, Lin Z, Mei T, Yan S (2018) Subspace clustering by block diagonal representation. IEEE Trans Pattern Anal Mach Intell 41(2):487–501
Lu G-F, Wang Y, Tang G (2022) Robust low-rank representation with adaptive graph regularization from clean data. Appl Intell 52(5):5830–5840
Lyons MJ, Akamatsu S, Kamachi M, Gyoba J, Budynek J (1998) The japanese female facial expression (jaffe) database. In: Proceedings of third international conference on automatic face and gesture recognition, pp 14–16
Martinez A, Benavente R (1998) The AR face database. CVC Technical Report 24
Nikolova M, Ng MK (2005) Analysis of half-quadratic minimization methods for signal and image recovery. SIAM J Sci Comput 27(3):937–966
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Nene SA, Nayar SK, Murase H (1996) Columbia object image library (COIL-20). Technical Report CUCS-005-96, Columbia University
Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. In: Proceedings of 1994 IEEE workshop on applications of computer vision, IEEE, pp 138–142
Schmitz MA, Heitz M, Bonneel N, Ngole F, Coeurjolly D, Cuturi M, Peyré G, Starck J L (2018) Wasserstein dictionary learning: optimal transport-based unsupervised nonlinear dictionary learning. SIAM J Imaging Sci 11(1):643–678
Sim T, Baker S, Bsat M (2002) The cmu pose, illumination, and expression (pie) database. In: Proceedings of fifth IEEE international conference on automatic face gesture recognition, IEEE, pp 53–58
Singh D, Singh B (2019) Hybridization of feature selection and feature weighting for high dimensional data. Appl Intell 49(4):1580–1596
Vidal R (2011) Subspace clustering. IEEE Signal Proc Mag 28(2):52–68
Wang L, Huang J, Yin M, Cai R, Hao Z (2020) Block diagonal representation learning for robust subspace clustering. Inf Sci 526:54–67
Xu Y, Chen S, Li J, Luo L, Yang J (2021) Learnable low-rank latent dictionary for subspace clustering. Pattern Recogn 120:108142
Xue X, Zhang X, Feng X, Sun H, Chen W, Liu Z (2020) Robust subspace clustering based on non-convex low-rank approximation and adaptive kernel. Inf Sci 513:190–205
Xue Z, Dong J, Zhao Y, Liu C, Chellali R (2019) Low-rank and sparse matrix decomposition via the truncated nuclear norm and a sparse regularizer. Vis Comput 35(11):1549–1566
Li X, Cui G, Dong Y (2017) Graph regularized non-negative low-rank matrix factorization for image clustering. IEEE Trans Cybernet 47(11):3840–3853
Yin M, Xie S, Wu Z, Zhang Y, Gao J (2018) Subspace clustering via learning an adaptive low-rank graph. IEEE Trans Image Process 27(8):3716–3728
Yuan C, Zhong Z, Lei C, Zhu X, Hu R (2021) Adaptive reverse graph learning for robust subspace learning. Inf Process Manag 58(6):102733
Zhang GY, Chen XW, Zhou YR, Wang CD, Huang D, He XY (2022) Kernelized multi-view subspace clustering via auto-weighted graph learning. Appl Intell 52(1):716–731
Zhang T, Tang Z, Liu Q (2017a) Robust subspace clustering via joint weighted schatten-p norm and lq norm minimization. J Electron Imaging 26(3):033021
Zhang X, Chen B, Sun H, Liu Z, Ren Z, Li Y (2019a) Robust low-rank kernel subspace clustering based on the schatten p-norm and correntropy. IEEE Trans Knowl Data Eng 32(12):2426–2437
Zhang Z, Jiang W, Qin J, Zhang L, Li F, Zhang M, Yan S (2017b) Jointly learning structured analysis discriminative dictionary and analysis multiclass classifier. IEEE Trans Neural Netw Learn Syst 29(8):3798–3814
Zhang Z, Zhang Y, Liu G, Tang J, Yan S, Wang M (2019b) Joint label prediction based semi-supervised adaptive concept factorization for robust data representation. IEEE Trans Knowl Data Eng 32(5):952–970
Zheng R, Li M, Liang Z, Wu FX, Pan Y, Wang J (2019) Sinnlrr: a robust subspace clustering method for cell type detection by non-negative and low-rank representation. Bioinformatics 35(19):3642–3650
Zheng R, Liang Z, Chen X, Tian Y, Cao C, Li M (2020) An adaptive sparse subspace clustering for cell type identification. Front Genet 11:407
Zheng Y, Zhang X, Yang S, Jiao L (2013) Low-rank representation with local constraint for graph construction. Neurocomputing 122:398–405
Zhu P, Zhu W, Hu Q, Zhang C, Zuo W (2017) Subspace clustering guided unsupervised feature selection. Pattern Recogn 66:364–374
Acknowledgements
This research was supported by the following funds: 62102331, 2020YJ0432 and 2018TZDZX002. We thank Canyi Lu, Pan Ji, Zhao Kang and Xuqian Xue for providing the code for BDR, LRKSC, RGC and LAKRSC, respectively. Finally, we thank the anonymous reviewers for their comments, which helped improve this work.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
We set H = RΛL^T to be the SVD of H, and
can be rewritten as
Equation (40) can be solved in an alternating fashion.
(1) Updating \((\hat {{R}},\hat {{L}})\):
According to [12], we have
The optimal L and R are the row and column bases of the SVD of Λ, respectively.
(2) Updating Λ:
Since Λ is a diagonal matrix and L and R are permutation matrices, RΛL^T is a diagonal matrix with entries arranged in nonascending order. Equation (41) can thus be expressed as
Inspired by [12], we apply the soft-thresholding operation to each component of the matrix RΛ^pL^T, which yields \(\hat {{\varLambda }} = {R^{T}}{S_{w}}({{\varSigma }^{p}})L\).
Equation (39) can be solved by
We have \(\mathbf {Q} = {\varGamma } {\hat {R}^{T}}{S_{w}}({{\varSigma }^{p}})\hat {L}{{\varUpsilon }^{T}}\). When the weight vector is in nondescending order (i.e., \(0 \le {w_{1}} \le {w_{2}} \le {\cdots } \le {w_{{\min \limits } (m,n)}}\)), we initialize Λ0 in (42) and obtain
Therefore, we can obtain the solution \(\mathbf {Q} = {\varGamma } {S_{w}}({{\varSigma }^{p}}){{\varUpsilon }^{T}}\) of the WTSpNM problem in (19).
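The ordering argument above can be checked numerically: with nonascending singular values and nondescending weights, the elementwise operation \(S_w({\varSigma}^p)\) (taken here as \(\max (\sigma_i^p - w_i, 0)\), an illustrative assumption for the thresholding rule) produces a nonascending sequence, so the result is again a valid diagonal SVD factor:

```python
import numpy as np

def s_w(sigma, w, p):
    # Weighted soft-thresholding of the powered singular values,
    # assumed here to act elementwise as max(sigma_i**p - w_i, 0).
    return np.maximum(sigma ** p - w, 0.0)

sigma = np.array([5.0, 3.0, 1.5, 0.4])   # nonascending singular values
w = np.array([0.1, 0.5, 1.0, 2.0])       # nondescending weights
out = s_w(sigma, w, p=0.8)
```

Because sigma**p is nonascending and w is nondescending, their difference (clipped at zero) remains nonascending, which is exactly the condition needed for Γ S_w(Σ^p) Υ^T to be a valid SVD.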
Appendix B
Set the objective function to f(Zt,Ct,Qt,Et,Tt), where t denotes the tth iteration. Let \({J_{1}} = \{ \mathbf {C}\left | {{\mathbf {C}_{ii}} = 0,\mathbf {C} \ge 0,\mathbf {C} = {\mathbf {C}^{T}}} \right .\} \) and \({J_{2}} = \{ \mathbf {W}\left | {Tr(\mathbf {W}) = k,0 \le \mathbf {W} \le \mathbf {I}} \right .\}\), and define the indicator functions of \({J_{1}}\) and \({J_{2}}\) as \({l_{{j_{1}}}}\) and \({l_{{j_{2}}}}\), respectively.
Inspired by [27], the sequence (Zt,Ct,Qt,Et,Tt) obtained via Algorithm 1 has the following properties:
(1) The objective f(Zt,Ct,Qt,Et,Tt) is monotonically decreasing:

$$ \begin{array}{@{}rcl@{}} &&{}f({\mathbf{Z}^{t + 1}},{\mathbf{C}^{t + 1}},{\mathbf{Q}^{t + 1}},{\mathbf{E}^{t + 1}},{\mathbf{T}^{t + 1}}) + {l_{{j_{1}}}}({\mathbf{C}^{t + 1}}) + {l_{{j_{2}}}}({\mathbf{W}^{t + 1}}) \\ &&{}\le f({\mathbf{Z}^{t}},{\mathbf{C}^{t}},{\mathbf{Q}^{t}},{\mathbf{E}^{t}},{\mathbf{T}^{t}}) + {l_{{j_{1}}}}({\mathbf{C}^{t}}) + {l_{{j_{2}}}}({\mathbf{W}^{t}}) - \frac{\mu}{2}(\left\| {\mathbf{Z}^{t + 1}}\right.\\ &&\left.- {\mathbf{Z}^{t}} \right\|_{F}^{2} + \left\| {{\mathbf{C}^{t + 1}} - {\mathbf{C}^{t}}} \right\|_{F}^{2}\\ &&+ \left\| {{\mathbf{Q}^{t + 1}} - {\mathbf{Q}^{t}}} \right\|_{F}^{2} + \left\| {{\mathbf{E}^{t + 1}} - {\mathbf{E}^{t}}} \right\|_{F}^{2} + \frac{1}{\mu}\left\| {{\mathbf{T}^{t + 1}} - {\mathbf{T}^{t}}} \right\|_{F}^{2}) \end{array} $$

(2) Zt+ 1 −Zt → 0, Ct+ 1 −Ct → 0, Qt+ 1 −Qt → 0, Et+ 1 −Et → 0, Tt+ 1 −Tt → 0.

(3) The sequences {Zt}, {Ct}, {Qt}, {Et} and {Tt} are bounded.
Therefore, according to Theorem 7 in [27], any limit point (Z∗,C∗,Q∗,E∗,T∗) of the sequence (Zt,Ct,Qt,Et,Tt) is a stationary point of (17).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, L., Zhang, X., Zhang, R. et al. Robust graph representation clustering based on adaptive data correction. Appl Intell 53, 17074–17092 (2023). https://doi.org/10.1007/s10489-022-04268-8