
Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision

Abstract

As a convex relaxation of the rank minimization model, the nuclear norm minimization (NNM) problem has been attracting significant research interest in recent years. The standard NNM regularizes each singular value equally, yielding an easily computable convex norm. However, this restricts its capability and flexibility in dealing with many practical problems, where the singular values have clear physical meanings and should be treated differently. In this paper we study the weighted nuclear norm minimization (WNNM) problem, which adaptively assigns weights to different singular values. As the key step in solving general WNNM models, the theoretical properties of the weighted nuclear norm proximal (WNNP) operator are investigated. Albeit nonconvex, we prove that WNNP is equivalent to a standard quadratic programming problem with linear constraints, which facilitates solving the original problem with off-the-shelf convex optimization solvers. In particular, when the weights are sorted in non-descending order, the optimal solution can be easily obtained in closed form. With WNNP, solving strategies for multiple extensions of WNNM, including robust PCA and matrix completion, can be readily constructed under the alternating direction method of multipliers paradigm. Furthermore, inspired by the reweighted sparse coding scheme, we present an automatic weight setting method, which greatly facilitates the practical implementation of WNNM. The proposed WNNM methods achieve state-of-the-art performance in typical low level vision tasks, including image denoising, background subtraction and image inpainting.



Notes

  1. A general proximal operator is defined on a convex problem to guarantee an accurate projection. Although the problem here is nonconvex, we strictly prove in Sect. 3 that it is equivalent to a convex quadratic programming problem. We thus still call it a proximal operator throughout the paper for convenience.

  2. http://www.cs.tut.fi/foi/GCF-BM3D/BM3D.zip

  3. http://people.csail.mit.edu/danielzoran/noiseestimation.zip

  4. http://lear.inrialpes.fr/people/mairal/software.php

  5. http://www4.comp.polyu.edu.hk/~cslzhang/code/NCSR.rar

  6. http://www.csee.wvu.edu/xinl/demo/saist.html

  7. The SAR image was downloaded at http://aess.cs.unh.edu/radar%20se%20Lecture%2018%20B.html.

  8. The color image was used in previous work (Portilla 2004).

  9. http://www.cs.cmu.edu/ftorre/codedata.html

  10. http://winsty.net/brmf.html

  11. http://sites.google.com/site/yinqiangzheng/

  12. http://www.cs.cmu.edu/~deyum/Publications.htm

  13. The color versions of images #3, #5, #6, #7, #9, #11 are used in this MC experiment.

  14. http://www.gris.informatik.tu-darmstadt.de/sroth/research/foe

  15. http://gpi.upf.edu/static/vnli/interp/interp.html

  16. http://people.ee.duke.edu/mz1/Softwares

  17. http://www.imm.dtu.dk/pcha/mxTV/

References

  • Arias, P., Facciolo, G., Caselles, V., & Sapiro, G. (2011). A variational framework for exemplar-based image inpainting. International Journal of Computer Vision, 93(3), 319–347.


  • Babacan, S. D., Luessi, M., Molina, R., & Katsaggelos, A. K. (2012). Sparse Bayesian methods for low-rank matrix estimation. IEEE Transactions on Signal Processing, 60(8), 3964–3977.


  • Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.


  • Buades, A., Coll, B., & Morel, J. M. (2005). A non-local algorithm for image denoising. In CVPR.

  • Buades, A., Coll, B., & Morel, J. M. (2008). Nonlocal image and movie denoising. International Journal of Computer Vision, 76(2), 123–139.


  • Buchanan, A. M., & Fitzgibbon, A. W. (2005). Damped Newton algorithms for matrix factorization with missing data. In CVPR.

  • Cai, J. F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.


  • Candès, E. J., & Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772.


  • Candès, E. J., Wakin, M. B., & Boyd, S. P. (2008). Enhancing sparsity by reweighted \(l_1\) minimization. Journal of Fourier Analysis and Applications, 14(5–6), 877–905.


  • Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58(3), 11.


  • Chan, T. F., & Shen, J. J. (2005). Image processing and analysis: Variational, PDE, wavelet, and stochastic methods. Philadelphia: SIAM Press.


  • Chartrand, R. (2007). Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10), 707–710.


  • Chartrand, R. (2012). Nonconvex splitting for regularized low-rank + sparse decomposition. IEEE Transactions on Signal Processing, 60(11), 5810–5819.


  • Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8), 2080–2095.


  • Dahl, J., Hansen, P. C., Jensen, S. H., & Jensen, T. L. (2010). Algorithms and software for total variation image reconstruction via first-order methods. Numerical Algorithms, 53(1), 67–92.


  • De La Torre, F., & Black, M. J. (2003). A framework for robust subspace learning. International Journal of Computer Vision, 54(1–3), 117–142.


  • Ding, X., He, L., & Carin, L. (2011). Bayesian robust principal component analysis. IEEE Transactions on Image Processing, 20(12), 3419–3430.


  • Dong, W., Zhang, L., & Shi, G. (2011). Centralized sparse representation for image restoration. In ICCV.

  • Dong, W., Shi, G., & Li, X. (2013). Nonlocal image restoration with bilateral variance estimation: A low-rank approach. IEEE Transactions on Image Processing, 22(2), 700–711.


  • Dong, W., Shi, G., Li, X., Ma, Y., & Huang, F. (2014). Compressive sensing via nonlocal low-rank regularization. IEEE Transactions on Image Processing, 23(8), 3618–3632.


  • Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3), 613–627.


  • Eriksson, A., & Van Den Hengel, A. (2010). Efficient computation of robust low-rank matrix approximations in the presence of missing data using the \(l_1\) norm. In CVPR.

  • Fazel, M. (2002). Matrix rank minimization with applications. PhD thesis, Stanford University.

  • Fazel, M., Hindi, H., & Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. In American Control Conference (ACC).

  • Gu, S., Zhang, L., Zuo, W., & Feng, X. (2014). Weighted nuclear norm minimization with application to image denoising. In CVPR.

  • Jain, P., Netrapalli, P., & Sanghavi, S. (2013). Low-rank matrix completion using alternating minimization. In ACM symposium on theory of computing.

  • Ji, H., Liu, C., Shen, Z., & Xu, Y. (2010). Robust video denoising using low rank matrix completion. In CVPR.

  • Ji, S., & Ye, J. (2009). An accelerated gradient method for trace norm minimization. In ICML (pp. 457–464).

  • Ke, Q., & Kanade, T. (2005). Robust \(l_1\) norm factorization in the presence of outliers and missing data by alternative convex programming. In CVPR.

  • Kwak, N. (2008). Principal component analysis based on \(l_1\)-norm maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(9), 1672–1680.


  • Levin, A., & Nadler, B. (2011). Natural image denoising: Optimality and inherent bounds. In CVPR.

  • Levin, A., Nadler, B., Durand, F., & Freeman, W.T. (2012). Patch complexity, finite pixel correlations and optimal denoising. In ECCV.

  • Li, L., Huang, W., Gu, I. H., & Tian, Q. (2004). Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing, 13(11), 1459–1472.


  • Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., & Ma, Y. (2009). Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. In International Workshop on Computational Advances in Multi-Sensor Adaptive Processing.

  • Lin, Z., Liu, R., & Su, Z. (2011). Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS.

  • Lin, Z., Liu, R., & Li, H. (2015). Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Machine Learning, 99(2), 287–325.


  • Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., & Ma, Y. (2010). Robust subspace segmentation by low-rank representation. In ICML.

  • Liu, R., Lin, Z., De la Torre, F., & Su, Z. (2012). Fixed-rank representation for unsupervised visual learning. In CVPR.

  • Lu, C., Tang, J., Yan, S., & Lin, Z. (2014a). Generalized nonconvex nonsmooth low-rank minimization. In CVPR.

  • Lu, C., Zhu, C., Xu, C., Yan, S., & Lin, Z. (2014b). Generalized singular value thresholding. arXiv preprint arXiv:1412.2231.

  • Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2009). Non-local sparse models for image restoration. In ICCV.

  • Meng, D., & De la Torre, F. (2013). Robust matrix factorization with unknown noise. In ICCV.

  • Mirsky, L. (1975). A trace inequality of John von Neumann. Monatshefte für Mathematik, 79(4), 303–306.


  • Mnih, A., & Salakhutdinov, R. (2007). Probabilistic matrix factorization. In NIPS.

  • Mohan, K., & Fazel, M. (2012). Iterative reweighted algorithms for matrix rank minimization. The Journal of Machine Learning Research, 13(1), 3441–3473.


  • Moreau, J. J. (1965). Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France, 93, 273–299.


  • Mu, Y., Dong, J., Yuan, X., & Yan, S. (2011). Accelerated low-rank visual recovery by random projection. In CVPR.

  • Nie, F., Huang, H., & Ding, C. H. (2012). Low-rank matrix recovery via efficient Schatten \(p\)-norm minimization. In AAAI.

  • Oh, T. H., Kim, H., Tai, Y. W., Bazin, J. C., & Kweon, I. S. (2013). Partial sum minimization of singular values in RPCA for low-level vision. In ICCV.

  • Peng, Y., Ganesh, A., Wright, J., Xu, W., & Ma, Y. (2012). RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2233–2246.


  • Portilla, J. (2004). Blind non-white noise removal in images using Gaussian scale mixtures. In Proceedings of the IEEE Benelux Signal Processing Symposium.

  • Rhea, D. (2011). The case of equality in the von Neumann trace inequality. Preprint.

  • Roth, S., & Black, M. J. (2009). Fields of experts. International Journal of Computer Vision, 82(2), 205–229.


  • Salakhutdinov, R., & Srebro, N. (2010). Collaborative filtering in a non-uniform world: Learning with the weighted trace norm. In NIPS.

  • She, Y. (2012). An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Computational Statistics & Data Analysis, 56(10), 2976–2990.


  • Srebro, N., & Jaakkola, T. (2003). Weighted low-rank approximations. In ICML.

  • Srebro, N., Rennie, J., & Jaakkola, T.S. (2004). Maximum-margin matrix factorization. In NIPS.

  • Tipping, M. E., & Bishop, C. M. (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611–622.


  • Wang, N., & Yeung, D.Y. (2013). Bayesian robust matrix factorization for image and video processing. In ICCV.

  • Wang, S., Zhang, L., & Liang, Y. (2012). Nonlocal spectral prior model for low-level vision. In ACCV.

  • Wright, J., Peng, Y., Ma, Y., Ganesh, A., & Rao, S. (2009). Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In NIPS.

  • Zhang, D., Hu, Y., Ye, J., Li, X., & He, X. (2012a). Matrix completion by truncated nuclear norm regularization. In CVPR.

  • Zhang, Z., Ganesh, A., Liang, X., & Ma, Y. (2012b). TILT: Transform invariant low-rank textures. International Journal of Computer Vision, 99(1), 1–24.


  • Zhao, Q., Meng, D., Xu, Z., Zuo, W., & Zhang, L. (2014). Robust principal component analysis with complex noise. In ICML.

  • Zheng, Y., Liu, G., Sugimoto, S., Yan, S., & Okutomi, M. (2012). Practical low-rank matrix approximation under robust \(l_1\) norm. In CVPR.

  • Zhou, M., Chen, H., Ren, L., Sapiro, G., Carin, L., & Paisley, J. W. (2009). Non-parametric Bayesian dictionary learning for sparse image representations. In NIPS.

  • Zhou, X., Yang, C., Zhao, H., & Yu, W. (2014). Low-rank modeling and its applications in image analysis. arXiv preprint arXiv:1401.3409.

  • Zoran, D., & Weiss, Y. (2011). From learning models of natural image patches to whole image restoration. In ICCV.


Acknowledgments

This work is supported by the Hong Kong RGC GRF grant (PolyU 5313/13E).

Author information


Corresponding author

Correspondence to Lei Zhang.

Additional information

Communicated by Jean-Michel Morel.

Appendix

In this appendix, we provide the proof details of the theoretical results in the main text.

Proof of Theorem 1

Proof

For any \({{\varvec{X}}}, {{\varvec{Y}}}\in \mathfrak {R}^{m\times {n}}\) \((m>n)\), denote by \(\bar{{{\varvec{U}}}}{{\varvec{D}}}\bar{{{\varvec{V}}}}^T\) and \( {{\varvec{U}}}\varvec{\varSigma }{{\varvec{V}}}^T\) the singular value decompositions of matrices \({{\varvec{X}}}\) and \({{\varvec{Y}}}\), respectively, where \(\varvec{\varSigma }=\left( \begin{array}{cc} diag(\sigma _1,\sigma _2,...,\sigma _n)\\ \mathbf 0 \end{array} \right) \in \mathfrak {R}^{m\times {n}}\) and \({{\varvec{D}}}=\left( \begin{array}{cc} diag(d_1,d_2,...,d_n)\\ \mathbf 0 \end{array} \right) \) are the diagonal singular value matrices. By the properties of the Frobenius norm, the following derivations hold:

$$\begin{aligned}&\Vert {{\varvec{Y}}}-{{\varvec{X}}}\Vert _F^2+\Vert {{\varvec{X}}}\Vert _{w,*}\\&\quad = Tr\left( {{\varvec{Y}}}^T{{\varvec{Y}}}\right) -2Tr\left( {{\varvec{Y}}}^T{{\varvec{X}}}\right) +Tr\left( {{\varvec{X}}}^T{{\varvec{X}}}\right) +\sum _i^n w_i d_i\\&\quad =\sum _i^n\sigma _i^2-2Tr\left( {{\varvec{Y}}}^T{{\varvec{X}}}\right) +\sum _i^nd_i^2+\sum _i^n w_id_i. \end{aligned}$$

Based on the von Neumann trace inequality in Lemma 1, we know that \(Tr\left( {{\varvec{Y}}}^T{{\varvec{X}}}\right) \) achieves its upper bound \(\sum _i^n\sigma _i d_i\) if \({{\varvec{U}}} = \bar{{{\varvec{U}}}}\) and \({{\varvec{V}}} = \bar{{{\varvec{V}}}}\). Then, we have

$$\begin{aligned}&\min _{{\varvec{X}}}\Vert {{\varvec{Y}}}-{{\varvec{X}}}\Vert _F^2+\Vert {{\varvec{X}}}\Vert _{w,*}\\&\quad \Leftrightarrow \min _{{\varvec{D}}}\sum _i^n\sigma _i^2-2\sum _i^n\sigma _i d_i+\sum _i^nd_i^2+\sum _i^n w_id_i\\&\quad s.t. d_1\ge d_2 \ge ...\ge d_n \ge 0 \\&\quad \Leftrightarrow \min _{{{\varvec{D}}}}\sum _{i}(d_i-\sigma _i)^2+w_id_i\\&\quad s.t. ~d_1\ge d_2 \ge ...\ge d_n \ge 0. \end{aligned}$$

From the above derivation, we can see that the optimal solution of the WNNP problem in (5) is

$$\begin{aligned} {{\varvec{X}}}^*= {{\varvec{U}}}{{\varvec{D}}}{{\varvec{V}}}^T, \end{aligned}$$

where \({{\varvec{D}}}\) is the optimum of the constrained quadratic optimization problem in (6).

End of proof. \(\square \)

Proof of Corollary 1

Proof

If the constraint is ignored, the optimization problem (6) reduces to the following unconstrained problems:

$$\begin{aligned}&\min _{d_i\ge 0}(d_i-\sigma _i)^2+w_id_i\\&\quad \Leftrightarrow \min _{d_i\ge 0}\left( d_i-(\sigma _i-\frac{w_i}{2})\right) ^2. \end{aligned}$$

It is not difficult to derive its global optimum as:

$$\begin{aligned} \bar{d}_i = max\left( \sigma _i-\frac{w_i}{2},0\right) ,~i=1,2,...,n. \end{aligned}$$
(15)

Since \(\sigma _1 \ge \sigma _2 \ge ... \ge \sigma _n\) and the weight vector is in non-descending order \(w_1\le w_2 \le ... \le w_n\), it is easy to see that \(\bar{d}_1 \ge \bar{d}_2 \ge ... \ge \bar{d}_n\). Thus, \(\{\bar{d}_i\}_{i=1,2,...,n}\) satisfy the constraint of (6), and the solution in (15) is then the globally optimal solution of the original constrained problem in (6).

End of proof. \(\square \)
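To make the closed form concrete, here is a small numerical sketch (our illustration, not code from the paper) assuming NumPy: the WNNP operator computes the SVD of \({{\varvec{Y}}}\) and shrinks each singular value by half its weight, exactly as in (15).

```python
import numpy as np

def wnnp(Y, w):
    """Closed-form WNNP solution of Corollary 1 for
    min_X ||Y - X||_F^2 + ||X||_{w,*},
    valid when the weights w are sorted in non-descending order."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # s is non-ascending
    d = np.maximum(s - w / 2.0, 0.0)                   # bar{d}_i = max(sigma_i - w_i/2, 0)
    return U @ (d[:, None] * Vt)                       # X* = U D V^T

def objective(Y, X, w):
    # ||Y - X||_F^2 + sum_i w_i * sigma_i(X), with w_i paired to sorted sigma_i
    return np.sum((Y - X) ** 2) + np.sum(w * np.linalg.svd(X, compute_uv=False))
```

Since the singular values of \({{\varvec{Y}}}\) are non-ascending while the weights are non-descending, the shrunken values stay non-ascending, so the ordering constraint of (6) holds automatically; a quick sanity check is that no small perturbation of the returned matrix decreases the objective.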

Proof of Theorem 2

Proof

Denote by \({{\varvec{U}}}_k\varvec{\varLambda }_k{{\varvec{V}}}_k^T\) the SVD of the matrix \(\{{{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_k-{{\varvec{E}}}_{k+1}\}\) in the \((k+1)\)-th iteration, where \(\varvec{\varLambda }_k = \{diag(\sigma _k^1, \sigma _k^2 , ..., \sigma _k^n)\}\) is the diagonal singular value matrix. Based on the conclusion of Corollary 1, we have

$$\begin{aligned} {{\varvec{X}}}_{k+1}={{{\varvec{U}}}_k\varvec{\varSigma }_k{{\varvec{V}}}_k^T}, \end{aligned}$$
(16)

where \(\varvec{\varSigma }_k = {\mathcal {S}}_\mathbf{w /\mu _k}(\varvec{\varLambda }_k)\) is the singular value matrix after weighted shrinkage. Based on the Lagrange multiplier updating method in step 5 of Algorithm 1, we have

$$\begin{aligned} \begin{aligned} \Vert {{\varvec{L}}}_{k+1}\Vert _F&=\Vert {{\varvec{L}}}_k+\mu _k({{\varvec{Y}}}-{{\varvec{X}}}_{k+1}-{{\varvec{E}}}_{k+1})\Vert _F\\&=\mu _k\Vert \mu _k^{-1}{{\varvec{L}}}_k+{{\varvec{Y}}}-{{\varvec{X}}}_{k+1}-{{\varvec{E}}}_{k+1}\Vert _F\\&=\mu _k\Vert {{\varvec{U}}}_k\varvec{\varLambda }_k{{\varvec{V}}}_k^T-{{\varvec{U}}}_k\varvec{\varSigma }_k{{\varvec{V}}}_k^T\Vert _F\\&=\mu _k\Vert \varvec{\varLambda }_k-\varvec{\varSigma }_k\Vert _F\\&=\mu _k\Vert \varvec{\varLambda }_k - \mathcal {S}_\mathbf{w /\mu _k}(\varvec{\varLambda }_k)\Vert _F\\&\le \mu _k\sqrt{\sum _i \left( \frac{w_i}{\mu _k}\right) ^2}\\&=\sqrt{\sum _i w_i^2}. \end{aligned} \end{aligned}$$
(17)

Thus, \(\{{{\varvec{L}}}_{k}\}\) is bounded.

To analyze the boundedness of \(\varGamma ({{\varvec{X}}}_{k+1},{{\varvec{E}}}_{k+1},{{\varvec{L}}}_{k},\mu _k)\), first note that the following inequality holds, because in steps 3 and 4 we obtain the globally optimal solutions of the \({{\varvec{X}}}\) and \({{\varvec{E}}}\) subproblems:

$$\begin{aligned} \varGamma ({{\varvec{X}}}_{k+1},{{\varvec{E}}}_{k+1},{{\varvec{L}}}_{k},\mu _k)\le \varGamma ({{\varvec{X}}}_{k},{{\varvec{E}}}_{k},{{\varvec{L}}}_{k},\mu _k). \end{aligned}$$

Then, based on the way we update \({{\varvec{L}}}\):

$$\begin{aligned} {{\varvec{L}}}_{k+1} = {{\varvec{L}}}_k+\mu _k({{\varvec{Y}}}-{{\varvec{X}}}_{k+1}-{{\varvec{E}}}_{k+1}), \end{aligned}$$

there is

$$\begin{aligned}&\varGamma (X_k,E_k,L_k,\mu _k) \\&\quad = \varGamma (X_k,E_k,L_{k-1},\mu _{k-1})\\&\qquad +\frac{\mu _k-\mu _{k-1}}{2}\left\| Y-X_{k}-E_{k}\right\| _F^2\\&\qquad +\langle L_k-L_{k-1},Y-X_k-E_k\rangle \\&\quad = \varGamma (X_k,E_k,L_{k-1},\mu _{k-1})\\&\qquad +\frac{\mu _k - \mu _{k-1}}{2}\left\| \mu ^{-1}_{k-1}\left( L_k-L_{k-1}\right) \right\| _F^2\\&\qquad +\left\langle L_k-L_{k-1},\mu ^{-1}_{k-1}\left( L_k-L_{k-1}\right) \right\rangle \\&\quad = \varGamma (X_k,E_k,L_{k-1},\mu _{k-1})\\&\qquad +\frac{\mu _k+\mu _{k-1}}{2\mu ^{2}_{k-1}}\left\| L_k-L_{k-1}\right\| _F^2. \end{aligned}$$

Denote by \(\varTheta \) the upper bound of \(\Vert {{\varvec{L}}}_k-{{\varvec{L}}}_{k-1}\Vert _F^2\) for all \(k\ge 1\). We have

$$\begin{aligned} \varGamma ({{\varvec{X}}}_{k+1},{{\varvec{E}}}_{k+1},{{\varvec{L}}}_{k},\mu _k)\le&\varGamma ({{\varvec{X}}}_{1},{{\varvec{E}}}_{1},{{\varvec{L}}}_{0},\mu _0)\\&+\varTheta \sum _{k=1}^\infty \frac{\mu _k+\mu _{k-1}}{2\mu _{k-1}^{2}}. \end{aligned}$$

Since the penalty parameter \(\{\mu _k\}\) satisfies \(\sum _{k=1}^\infty \mu _k^{-2}\mu _{k+1}<+\infty \), we have

$$\begin{aligned} \sum _{k=1}^\infty \frac{\mu _k+\mu _{k-1}}{2\mu _{k-1}^{2}}\le \sum _{k=1}^\infty \mu _{k-1}^{-2}\mu _{k}<+\infty . \end{aligned}$$

Thus, we know that \(\varGamma ({{\varvec{X}}}_{k+1},{{\varvec{E}}}_{k+1},{{\varvec{L}}}_{k},\mu _k)\) is also upper bounded.

The boundedness of \(\{{{\varvec{X}}}_{k}\}\) and \(\{{{\varvec{E}}}_{k}\}\) can then be deduced as follows:

$$\begin{aligned}&\Vert {{\varvec{E}}}_{k}\Vert _1+\Vert {{\varvec{X}}}_{k}\Vert _{w,*}\\&\quad =\varGamma ({{\varvec{X}}}_{k},{{\varvec{E}}}_{k},{{\varvec{L}}}_{k-1},\mu _{k-1})+\frac{\mu _{k-1}}{2}( \frac{1}{\mu ^2_{k-1}}\Vert {{\varvec{L}}}_{k-1}\Vert _F^2\\&\qquad - \Vert {{\varvec{Y}}}-{{\varvec{X}}}_k-{{\varvec{E}}}_k+ \frac{1}{\mu _{k-1}}{{\varvec{L}}}_{k-1}\Vert _F^2)\\&\quad = \varGamma ({{\varvec{X}}}_{k},{{\varvec{E}}}_{k},{{\varvec{L}}}_{k-1},\mu _{k-1})-\frac{1}{2\mu _{k-1}}(\Vert {{\varvec{L}}}_{k}\Vert _F^2-\Vert {{\varvec{L}}}_{k-1}\Vert _F^2). \end{aligned}$$

Thus, \(\{{{\varvec{X}}}_{k}\}\), \(\{{{\varvec{E}}}_{k}\}\) and \(\{{{\varvec{L}}}_{k}\}\) generated by the proposed algorithm are all bounded. There exists at least one accumulation point for \(\{{{\varvec{X}}}_{k},{{\varvec{E}}}_{k},{{\varvec{L}}}_{k}\}\). Specifically, we have

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert {{\varvec{Y}}}-{{\varvec{X}}}_{k+1}-{{\varvec{E}}}_{k+1}\Vert _F&=\lim _{k\rightarrow \infty }\frac{1}{\mu _k}\Vert {{\varvec{L}}}_{k+1}-{{\varvec{L}}}_{k}\Vert _F =0, \end{aligned}$$

and the accumulation point is a feasible solution to the objective function.

We then prove that the change of the variables between adjacent iterations tends to zero. For the \({{\varvec{E}}}\) subproblem in step 3, we have

$$\begin{aligned}&\lim _{k\rightarrow \infty }\Vert {{\varvec{E}}}_{k+1}-{{\varvec{E}}}_{k}\Vert _F\\&\quad =\lim _{k\rightarrow \infty }\Vert \mathcal {S}_{\frac{1}{\mu _k}}\left( {{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{X}}}_{k}\right) -\left( {{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{X}}}_{k}\right) \\&\qquad -2\mu _k^{-1}{{\varvec{L}}}_{k}-\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}\Vert _F\\&\quad \le \lim _{k\rightarrow \infty }\frac{mn}{\mu _k}+\Vert 2\mu _k^{-1}{{\varvec{L}}}_{k}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}\Vert _F=0, \end{aligned}$$

in which \(\mathcal {S}_{\frac{1}{\mu _k}}(\cdot )\) is the soft-thresholding operation with parameter \(\frac{1}{\mu _k}\), and m and n are the dimensions of matrix \({{\varvec{Y}}}\).

To prove \(\lim _{k\rightarrow \infty }\Vert {{\varvec{X}}}_{k+1}-{{\varvec{X}}}_{k}\Vert _F=0\), we recall the updating strategy in Algorithm 1, from which the following equalities hold:

$$\begin{aligned}&{{\varvec{X}}}_{k}={{{\varvec{U}}}_{k-1}\mathcal {S}_\mathbf{w /\mu _{k-1}}(\varvec{\varLambda }_{k-1}){{\varvec{V}}}_{k-1}^T},\\&{{\varvec{X}}}_{k+1}={{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{E}}}_{k+1}-\mu _k^{-1}{{\varvec{L}}}_{k+1}, \end{aligned}$$

where \({{\varvec{U}}}_{k-1}\varvec{\varLambda }_{k-1}{{\varvec{V}}}_{k-1}^T\) is the SVD of the matrix \(\{{{\varvec{Y}}}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}-{{\varvec{E}}}_{k}\}\) in the k-th iteration. We then have

$$\begin{aligned}&\lim _{k\rightarrow \infty }\Vert {{\varvec{X}}}_{k+1}-{{\varvec{X}}}_{k}\Vert _F\\&\quad =\lim _{k\rightarrow \infty }\Vert ({{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{E}}}_{k+1}-\mu _k^{-1}{{\varvec{L}}}_{k+1})-{{\varvec{X}}}_{k}\Vert _F\\&\quad =\lim _{k\rightarrow \infty }\Vert ({{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{E}}}_{k+1}-\mu _k^{-1}{{\varvec{L}}}_{k+1})-{{\varvec{X}}}_{k}\\&\qquad +({{\varvec{E}}}_{k}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1})-({{\varvec{E}}}_{k}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1})\Vert _F\\&\quad \le \lim _{k\rightarrow \infty }\Vert {{\varvec{Y}}}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}-{{\varvec{E}}}_{k}-{{\varvec{X}}}_{k}\Vert _F+\Vert {{\varvec{E}}}_{k}\\&\qquad -{{\varvec{E}}}_{k+1}+\mu _{k}^{-1}{{\varvec{L}}}_{k}-\mu _k^{-1}{{\varvec{L}}}_{k+1}-\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}\Vert _F\\&\quad \le \lim _{k\rightarrow \infty }\Vert \varvec{\varLambda }_{k-1} - \mathcal {S}_\mathbf{w /\mu _{k-1}}(\varvec{\varLambda }_{k-1})\Vert _F+\Vert {{\varvec{E}}}_{k}-{{\varvec{E}}}_{k+1}\Vert _F\\&\qquad +\Vert \mu _{k}^{-1}{{\varvec{L}}}_{k}-\mu _k^{-1}{{\varvec{L}}}_{k+1}-\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}\Vert _F\\&\quad = 0. \end{aligned}$$

End of proof. \(\square \)
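The bounds established in this proof can be observed empirically. Below is a minimal NumPy sketch (our illustration under simplifying assumptions — a fixed non-descending weight vector and the penalty update \(\mu_{k+1}=\rho \mu_k\), not the authors' code) of the WNNM-RPCA iteration analyzed above: the \({{\varvec{E}}}\) step is elementwise soft-thresholding, the \({{\varvec{X}}}\) step is weighted singular value thresholding, followed by the multiplier update.

```python
import numpy as np

def soft(A, tau):
    # elementwise soft-thresholding: argmin_E ||E||_1 + (1/(2*tau)) * ||E - A||_F^2
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def wsvt(A, w):
    # weighted singular value thresholding (Corollary 1; w non-descending)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ (np.maximum(s - w, 0.0)[:, None] * Vt)

def wnnm_rpca(Y, w, mu=1.0, rho=1.05, iters=300):
    X = np.zeros_like(Y); E = np.zeros_like(Y); L = np.zeros_like(Y)
    for _ in range(iters):
        E = soft(Y + L / mu - X, 1.0 / mu)   # E subproblem
        X = wsvt(Y + L / mu - E, w / mu)     # X subproblem (shrink by w_i / mu)
        L = L + mu * (Y - X - E)             # Lagrange multiplier update
        mu *= rho                            # penalty update
    return X, E, L
```

After every iteration the multiplier satisfies \(\Vert {{\varvec{L}}}_{k+1}\Vert _F\le \sqrt{\sum _i w_i^2}\) as in (17), and since \(\mu_k\rightarrow\infty\) the feasibility residual \(\Vert {{\varvec{Y}}}-{{\varvec{X}}}-{{\varvec{E}}}\Vert _F\) vanishes, which is easy to check numerically.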

Proof of Remark 1

Proof

Based on the conclusion of Theorem 1, the WNNM problem can be equivalently transformed into a constrained singular value optimization problem. Furthermore, when the reweighting strategy \(w_i^{\ell +1}=\frac{C}{\sigma _i^\ell ({{\varvec{X}}})+\varepsilon }\) is used, the singular values of \({{\varvec{X}}}\) are always sorted in non-ascending order, so the weight vector is in non-descending order. It is then easy to see that the sorted orders of the sequences \(\{\sigma _i({{\varvec{Y}}}), \sigma _i({{\varvec{X}}}_\ell ),w_i^\ell ; i=1,2,\cdots ,n\}\) remain unchanged throughout the iterations. Thus, the optimization of each singular value \(\sigma _i({{\varvec{X}}})\) can be analyzed independently. For simplicity, in the following development we omit the subscript i; we denote by y a singular value of matrix \({{\varvec{Y}}}\), and by x and w the corresponding singular value of \({{\varvec{X}}}\) and its weight.

For the weighting strategy \(w^\ell =\frac{C}{x^{\ell -1}+\varepsilon }\), we have

$$\begin{aligned} x^\ell =max\left( y-\frac{C}{x^{\ell -1}+\varepsilon },0\right) . \end{aligned}$$

Since we initialize \(x^0\) as a singular value of the matrix \({{\varvec{X}}}_0={{\varvec{Y}}}\), and each \(x^\ell \) results from a soft-thresholding operation on the positive value \(y=\sigma _i({{\varvec{Y}}})\), \(\{x^\ell \}\) is a non-negative sequence. The convergence value \(\lim _{\ell \rightarrow \infty } x^\ell \) is analyzed under the following two conditions.

  1. (1)

    \(c_2<0\): From the definition of \(c_1\) and \(c_2\), we have \((y+\varepsilon )^2-4C<0\). In this case, the quadratic equation \(x^2+(\varepsilon -y)x+C-y\varepsilon =0\) has no real root, and the function \(f(x) = x^2+(\varepsilon -y)x+C-y\varepsilon \) attains its positive minimum value \(C-y\varepsilon -\frac{(y-\varepsilon )^2}{4}\) at \(x=\frac{y-\varepsilon }{2}\). Hence, \(\forall \tilde{x}\ge 0\), the following inequalities hold

    $$\begin{aligned}&f(\tilde{x})\ge f\left( \frac{y-\varepsilon }{2}\right) \\&\tilde{x}^2+(\varepsilon -y)\tilde{x}\ge -\frac{(y-\varepsilon )^2}{4}\\&\tilde{x}-\frac{C-y\varepsilon -\frac{(y-\varepsilon )^2}{4}}{\tilde{x}+\varepsilon }\ge y-\frac{C}{\tilde{x}+\varepsilon }. \end{aligned}$$

    Hence the sequence \(x^{\ell +1}=max\left( y-\frac{C}{x^{\ell }+\varepsilon },0\right) \) with initialization \(x^0=y\) is monotonically decreasing. We thus have \(x^\ell <y\) for all \(\ell \ge 1\), and

    $$\begin{aligned} x^\ell -\left( y-\frac{C}{x^\ell +\varepsilon }\right) >\frac{C-y\varepsilon -\frac{(y-\varepsilon )^2}{4}}{y+\varepsilon }. \end{aligned}$$

    If \(x^\ell \le \frac{C-y\varepsilon }{y}\), we have \(y-\frac{C}{x^\ell +\varepsilon }\le 0\) and thus \(x^{\ell +1} = max\left( y-\frac{C}{x^{\ell }+\varepsilon },0\right) =0\). If \(x^\ell >\frac{C-y\varepsilon }{y}\), then \(\exists N\in \mathbb {N}\) such that \(x^{\ell +N}<x^\ell -N\cdot \frac{C-y\varepsilon -\frac{(y-\varepsilon )^2}{4}}{y+\varepsilon }\) falls below \(\frac{C-y\varepsilon }{y}\). The sequence \(\{x^\ell \}\) therefore shrinks to 0 monotonically.

  2. (2)

    \(c_2\ge 0\): In this case we must have \(y>0\), because if \(y=0\) then \(c_2=(y+\varepsilon )^2-4C=\varepsilon ^2-4C<0\). For positive C and a sufficiently small \(\varepsilon \), \(c_1\) is also non-negative:

    $$\begin{aligned}&c_2 = (y+\varepsilon )^2-4C\ge 0\\&(y+\varepsilon )^2\ge 4C\\&y-\varepsilon \ge 2(\sqrt{C}-\varepsilon )\\&c_1=y-\varepsilon \ge 0. \end{aligned}$$

    Having \(c_2\ge 0\), \(c_1\ge 0\), we have

    $$\begin{aligned} \bar{x}_2 = \frac{y-\varepsilon +\sqrt{(y-\varepsilon )^2-4(C-\varepsilon y)}}{2}>0. \end{aligned}$$

    For any \(x>\bar{x}_2>0\), the following inequalities hold:

    $$\begin{aligned}&f(x) = x^2+(\varepsilon -y)x+C-y\varepsilon>0\\&\left[ x-\left( y-\frac{C}{x+\varepsilon }\right) \right] (x+\varepsilon )>0\\&x>y-\frac{C}{x+\varepsilon } . \end{aligned}$$

    Furthermore, we have

    $$\begin{aligned} x>y-\frac{C}{x+\varepsilon }>y-\frac{C}{\bar{x}_2+\varepsilon }=\bar{x}_2. \end{aligned}$$

    Thus, for \(x^0=y>\bar{x}_2\), we always have \(x^\ell>x^{\ell +1}>\bar{x}_2\): the sequence is monotonically decreasing and bounded below by \(\bar{x}_2\), so it converges, and we now show by contradiction that the limit is \(\bar{x}_2\). Suppose \(\{x^\ell \}\) converges to \(\hat{x}\ne \bar{x}_2\); then \(\hat{x}>\bar{x}_2\) and \(f(\hat{x})>0\). By the definition of convergence, \(\forall \epsilon >0\), \(\exists N\in \mathbb {N}\) s.t. \(\forall \ell \ge N\), the following inequality must be satisfied

    $$\begin{aligned} |x^\ell -\hat{x}|<\epsilon . \end{aligned}$$
    (18)

    We also have the following inequalities

    $$\begin{aligned}&f(x^N) \ge f(\hat{x})\\&\left[ x^N-\left( y-\frac{C}{x^N+\varepsilon }\right) \right] (x^N+\varepsilon ) \ge f(\hat{x})\\&\left[ x^N-\left( y-\frac{C}{x^N+\varepsilon }\right) \right] (y+\varepsilon ) \ge f(\hat{x})\\&x^N-\left( y-\frac{C}{x^N+\varepsilon }\right) \ge \frac{f(\hat{x})}{y+\varepsilon }\\&x^{N}-x^{N+1}>\frac{f(\hat{x})}{y+\varepsilon } \end{aligned}$$

    If we take \(\epsilon =\frac{f(\hat{x})}{2(y+\varepsilon )}\), then \( x^{N}-x^{N+1}> 2\epsilon \), and we can thus obtain

    $$\begin{aligned} |x^{N+1}-\hat{x}|&= |x^{N+1}-x^N+x^N-\hat{x}|\\&\ge \left| |x^{N+1}-x^N|-|x^N-\hat{x}|\right| \\&> 2\epsilon -\epsilon =\epsilon . \end{aligned}$$

    This, however, contradicts (18), and thus \(\{x^\ell \}\) converges to \(\bar{x}_2\).

End of proof. \(\square \)
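These two cases can be verified numerically. The following sketch (an illustration we add, not from the paper) iterates \(x^{\ell +1}=max\left( y-\frac{C}{x^{\ell }+\varepsilon },0\right) \) from \(x^0=y\) and compares the limit against the prediction: 0 when \(c_2=(y+\varepsilon )^2-4C<0\), and \(\bar{x}_2=\frac{c_1+\sqrt{c_2}}{2}\) with \(c_1=y-\varepsilon \) otherwise.

```python
import math

def reweighted_limit(y, C, eps, iters=2000):
    # fixed-point iteration x^{l+1} = max(y - C / (x^l + eps), 0), x^0 = y
    x = y
    for _ in range(iters):
        x = max(y - C / (x + eps), 0.0)
    return x

def predicted_limit(y, C, eps):
    # closed-form limit from Remark 1
    c1, c2 = y - eps, (y + eps) ** 2 - 4 * C
    return 0.0 if c2 < 0 else (c1 + math.sqrt(c2)) / 2
```

For instance, with \(y=3\), \(C=1\), \(\varepsilon =10^{-4}\) the iteration settles at \(\bar{x}_2\), while with \(y=1\), \(C=1\) it collapses to 0, matching the two cases of the proof.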


About this article


Cite this article

Gu, S., Xie, Q., Meng, D. et al. Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision. Int J Comput Vis 121, 183–208 (2017). https://doi.org/10.1007/s11263-016-0930-5


Keywords

  • Low rank analysis
  • Nuclear norm minimization
  • Low level vision