Abstract
In this paper, we study the popularly dubbed matrix completion problem, where the task is to “fill in” the unobserved entries of a matrix from a small subset of observed entries, under the assumption that the underlying matrix is of low rank. Our contributions herein enhance our prior work on nuclear norm regularized problems for matrix completion (Mazumder et al. in J Mach Learn Res 11:2287–2322, 2010) by incorporating a continuum of nonconvex penalty functions between the convex nuclear norm and the nonconvex rank function. Inspired by Soft-Impute (Mazumder et al. 2010; Hastie et al. in J Mach Learn Res, 2016), we propose NC-Impute—an EM-flavored algorithmic framework for computing a family of nonconvex penalized matrix completion problems with warm starts. We present a systematic study of the associated spectral thresholding operators, which play an important role in the overall algorithm, and we study the convergence properties of the algorithm. Using structured low-rank SVD computations, we demonstrate the computational scalability of our proposal for problems up to the Netflix size (approximately a 500,000 \(\times \) 20,000 matrix with \(10^8\) observed entries). We demonstrate that, on a wide range of synthetic and real data instances, our proposed nonconvex regularization framework leads to low-rank solutions with better predictive performance than those obtained from nuclear norm problems. Implementations of the algorithms proposed herein, written in the R language, are made available on GitHub.
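To give a concrete feel for the iterative scheme described above, the following is a minimal R sketch of a Soft-Impute-style update with an MC+-type spectral threshold. The function names (mcp_threshold, nc_impute_sketch), the simplified update rule, and the use of a dense SVD are illustrative assumptions made here for exposition; this is not the NC-Impute implementation accompanying the paper.

```r
# Minimal sketch (not the authors' implementation): Soft-Impute-style iterations
# with an MC+-type spectral threshold applied to the singular values.
mcp_threshold <- function(sigma, lambda, gamma) {
  # Scalar MC+ thresholding, applied elementwise to singular values (gamma > 1)
  ifelse(sigma > gamma * lambda, sigma,
         pmax(sigma - lambda, 0) / (1 - 1 / gamma))
}

nc_impute_sketch <- function(Y, Omega, lambda, gamma, X0 = NULL,
                             maxit = 100, tol = 1e-4) {
  # Y: matrix carrying the observed entries; Omega: logical matrix of observed positions
  X <- if (is.null(X0)) matrix(0, nrow(Y), ncol(Y)) else X0  # warm start if supplied
  for (k in seq_len(maxit)) {
    Z <- X
    Z[Omega] <- Y[Omega]            # P_Omega(Y) + P_Omega^perp(X_k)
    s <- svd(Z)                     # dense SVD; large problems need low-rank solvers
    d <- mcp_threshold(s$d, lambda, gamma)
    X_new <- s$u %*% (d * t(s$v))   # U diag(thresholded sigma) V'
    if (sum((X_new - X)^2) / max(1, sum(X^2)) < tol) { X <- X_new; break }
    X <- X_new
  }
  X
}
```

Warm starts across a grid of \((\lambda ,\gamma )\) values then simply amount to passing the previous solution as X0.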
Notes
We say that a function is a spectral function of a matrix X if it depends only upon the singular values of X. The state-of-the-art algorithmics for mixed integer semidefinite optimization problems is in a nascent stage and is not comparable to the technology for mixed integer quadratic optimization.
Since the problems under consideration are nonconvex, our methods are not guaranteed to reach the global minimum—we thus refer to the solutions obtained as upper bounds. In many synthetic examples, however, the solutions are indeed seen to be globally optimal. We do show rigorously that these solutions are first-order stationary points for the optimization problems under consideration.
Note that we consider \(\tau \ge 0\) in the definition so that it includes the case of (nonstrong) convexity.
This follows from the simple observation that \(s_{a\lambda , \gamma }(ax)=a s_{\lambda , \gamma }(x)\) and \(s'_{a\lambda , \gamma }(ax)=s'_{\lambda , \gamma }(x)\); a worked check of these identities for the MC+ thresholding operator is given after these notes.
Since the penalty function is bounded, the boundedness of the objective function does not necessarily imply that the sequence \(\varvec{\sigma }(X_k)\) remains bounded.
We note that the \({X}_k\)'s are not guaranteed to be of low rank across the iterations of the algorithm for \(k \ge 1\), even if they eventually are for k sufficiently large. However, in the presence of warm starts across \((\lambda ,\gamma )\), they are indeed empirically found to have low rank as long as the regularization parameters are large enough to result in a solution of small rank. Typically, as we have observed in our experiments, with warm starts the rank of \(X_{k}\) remains low across all iterations.
Available at http://grouplens.org/datasets/movielens/.
Note that we do not assume that the sequence \(\varvec{\sigma }_{k}\) has a limit point.
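For concreteness, here is a worked check of the scaling identities in the note on \(s_{a\lambda ,\gamma }\) above, assuming the scalar MC+ (MCP) thresholding operator takes its usual piecewise-linear form (this specific form is an assumption here; the operator is defined in the main text):
\[
s_{\lambda ,\gamma }(x)=
\begin{cases}
0, & 0\le x\le \lambda ,\\
\dfrac{x-\lambda }{1-1/\gamma }, & \lambda < x\le \gamma \lambda ,\\
x, & x>\gamma \lambda ,
\end{cases}
\qquad \gamma >1.
\]
Replacing \((\lambda , x)\) by \((a\lambda , ax)\) for any \(a>0\) leaves each case condition unchanged and scales each branch by \(a\), so \(s_{a\lambda ,\gamma }(ax)=a\,s_{\lambda ,\gamma }(x)\); differentiating both sides with respect to \(x\) and dividing by \(a\) gives \(s'_{a\lambda ,\gamma }(ax)=s'_{\lambda ,\gamma }(x)\) wherever the derivative exists.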
References
Alquier, P.: A Bayesian approach for noisy matrix completion: optimal rate under general sampling distribution. Electron. J. Stat. 9(1), 823–841 (2015)
Bai, Z., Silverstein, J.W.: Spectral Analysis of Large Dimensional Random Matrices. Springer, Berlin (2010)
Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. Ann. Stat. 44(2), 813–852 (2016)
Bhojanapalli, S., Neyshabur, B., Srebro, N.: Global optimality of local search for low rank matrix recovery. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 3873–3881 (2016)
Borwein, J., Lewis, A.: Convex Analysis and Nonlinear Optimization. Springer, New York (2006)
Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)
Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2009)
Candès, E.J., Plan, Y.: Matrix completion with noise. Proc. IEEE 98, 925–936 (2010a)
Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theory 56, 2053–2080 (2010b)
Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5–6), 877–905 (2008)
Candès, E., Sing-Long, C., Trzasko, J.D.: Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Trans. Signal Process. 61(19), 4643–4657 (2013)
Chen, Y., Bhojanapalli, S., Sanghavi, S., Ward, R.: Coherent matrix completion. In: Proceedings of the 31st International Conference on Machine Learning, JMLR, pp. 674–682 (2014)
Chen, Y.: Incoherence-optimal matrix completion. IEEE Trans. Inf. Theory 61(5), 2909–2923 (2015)
Chen, Y., Wainwright, M.J.: Fast low-rank estimation by projected gradient descent: general statistical and algorithmic guarantees (2015). arXiv preprint arXiv:1509.03025
Chen, J., Liu, D., Li, X.: Nonconvex rectangular matrix completion via gradient descent without \(\ell _{2,\infty }\) regularization (2019a). arXiv preprint arXiv:1901.06116
Chen, Y., Chi, Y., Fan, J., Ma, C., Yan, Y.: Noisy matrix completion: understanding statistical guarantees for convex relaxation via nonconvex optimization (2019b). arXiv preprint arXiv:1902.07698
Chi, Y., Lu, Y.M., Chen, Y.: Nonconvex optimization meets low-rank matrix factorization: an overview. IEEE Trans. Signal Process. 67(20), 5239–5269 (2019)
Chistov, A.L., Grigor’ev, D.Y.: Complexity of quantifier elimination in the theory of algebraically closed fields. In: Proceedings of the 11th International Symposium on Mathematical Foundations of Computer Science, pp. 17–31. Springer (1984)
Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.S.: Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 63(1), 1–38 (2010)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38 (1977)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression (with discussion). Ann. Stat. 32(2), 407–499 (2004)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fazel, M.: Matrix rank minimization with applications. Ph.D. thesis, Stanford University (2002)
Feng, L., Zhang, C.H.: Sorted concave penalized regression (2017). arXiv preprint arXiv:1712.09941
Fornasier, M., Rauhut, H., Ward, R.: Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM J. Optim. 21(4), 1614–1640 (2011)
Frank, I.E., Friedman, J.H.: A statistical view of some chemometrics regression tools. Technometrics 35, 109–135 (1993)
Freund, R.M., Grigas, P., Mazumder, R.: An extended Frank-Wolfe method with “In-Face” directions, and its application to low-rank matrix completion (2015). arXiv preprint arXiv:1511.02204
Ge, R., Lee, J.D., Ma, T.: Matrix completion has no spurious local minimum. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 2973–2981 (2016)
Ge, R., Jin, C., Zheng, Y.: No spurious local minima in nonconvex low rank problems: a unified geometric analysis. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, JMLR.org, pp. 1233–1242 (2017)
Golub, G., Van Loan, C.: Matrix Computations. Johns Hopkins University Press, Baltimore (1983)
Gross, D.: Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inf. Theory 57(3), 1548–1566 (2011)
Gu, S., Xie, Q., Meng, D., Zuo, W., Feng, X., Zhang, L.: Weighted nuclear norm minimization and its applications to low level vision. Int. J. Comput. Vis. 121(2), 183–208 (2017)
Hardt, M.: Understanding alternating minimization for matrix completion. In: IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 651–660. IEEE (2014)
Hardt, M., Wootters, M.: Fast matrix completion without the condition number. In: Conference on Learning Theory, pp. 638–678 (2014)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009)
Hastie, T., Mazumder, R., Lee, J.D., Zadeh, R.: Matrix completion and low-rank SVD via fast alternating least squares. J. Mach. Learn. Res. 16(1), 3367–3402 (2016)
Hazimeh, H., Mazumder, R.: Fast best subset selection: coordinate descent and local combinatorial optimization algorithms. Oper. Res. (2019) (accepted)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (2012)
Jaggi, M., Sulovský, M.: A simple algorithm for nuclear norm regularized problems. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 471–478 (2010)
Jain, P., Meka, R., Dhillon, I.S.: Guaranteed rank minimization via singular value projection. In: Advances in Neural Information Processing Systems, pp. 937–945 (2010)
Jain, P., Netrapalli, P., Sanghavi, S.: Low-rank matrix completion using alternating minimization. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 665–674. ACM (2013)
Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from noisy entries. J. Mach. Learn. Res. 11, 2057–2078 (2010)
Klopp, O.: Noisy low-rank matrix completion with general sampling distribution. Bernoulli 20(1), 282–303 (2014)
Koltchinskii, V., Lounici, K., Tsybakov, A.B.: Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Stat. 39(5), 2302–2329 (2011)
Larsen, R.: PROPACK: software for large and sparse SVD calculations (2004). http://sun.stanford.edu/~rmunk/PROPACK
Lecué, G., Mendelson, S.: Regularization and the small-ball method I: sparse recovery. Ann. Stat. 46(2), 611–641 (2018)
Lewis, A.S.: The convex analysis of unitarily invariant matrix functions. J. Convex Anal. 2, 173–183 (1995)
Loh, P.L., Wainwright, M.J.: Regularized m-estimators with nonconvexity: statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16, 559–616 (2015)
Lv, J., Fan, Y.: A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat. 37, 3498–3528 (2009)
Ma, C., Wang, K., Chi, Y., Chen, Y.: Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution (2017). arXiv preprint arXiv:1711.10467
Mazumder, R., Hastie, T., Tibshirani, R.: Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11, 2287–2322 (2010)
Mazumder, R., Friedman, J.H., Hastie, T.: Sparsenet: coordinate descent with nonconvex penalties. J. Am. Stat. Assoc. 106, 1125–1138 (2011)
Mazumder, R., Radchenko, P.: The discrete dantzig selector: estimating sparse linear models via mixed integer linear optimization (2015). arXiv preprint arXiv:1508.01922
Mazumder, R., Radchenko, P., Dedieu, A.: Subset selection with shrinkage: sparse linear modeling when the SNR is low (2017). arXiv preprint arXiv:1708.03288
Mohan, K., Fazel, M.: Reweighted nuclear norm minimization with application to system identification. In: Proceedings of the 2010 American Control Conference, pp. 2953–2959. IEEE (2010)
Mohan, K., Fazel, M.: Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res. 13(Nov), 3441–3473 (2012)
Negahban, S.N., Wainwright, M.J.: Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Stat. 39, 1069–1097 (2011)
Negahban, S.N., Wainwright, M.J.: Restricted strong convexity and weighted matrix completion: optimal bounds with noise. J. Mach. Learn. Res. 13, 1665–1697 (2012)
Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM J. Appl. Math. 61, 633–658 (2000)
Recht, B.: A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rohde, A., Tsybakov, A.B.: Estimation of high-dimensional low-rank matrices. Ann. Stat. 39, 887–930 (2011)
Shapiro, A., Xie, Y., Zhang, R.: Matrix completion with deterministic pattern: a geometric perspective. IEEE Trans. Signal Process. 67(4), 1088–1103 (2018)
ACM SIGKDD, Netflix: Proceedings of KDD Cup and Workshop (2007)
Stein, C.M.: Estimation of the mean of a multivariate normal distribution. Ann. Stat. 9(6), 1135–1151 (1981)
Stewart, G.W., Sun, J.G.: Matrix Perturbation Theory. Computer Science and Scientific Computing. Academic Press, Cambridge (1990)
Sun, R., Luo, Z.Q.: Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inf. Theory 62(11), 6535–6579 (2016)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., Altman, R.B.: Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525 (2001)
Wang, S., Weng, H., Maleki, A.: Which bridge estimator is the best for variable selection? Ann. Stat. (2019) (accepted)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhang, C.H., Zhang, T.: A general theory of concave regularization for high-dimensional sparse estimation problems. Stat. Sci. 27(4), 576–593 (2012)
Zheng, Q., Lafferty, J.: Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent (2016). arXiv preprint arXiv:1605.07051
Zheng, L., Maleki, A., Weng, H., Wang, X., Long, T.: Does \(\ell _p\)-minimization outperform \(\ell _1\)-minimization? IEEE Trans. Inf. Theory 63(11), 6896–6935 (2017)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)
Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 36(4), 1509–1533 (2008)
Appendix
1.1 Additional technical material
Lemma 1
(Marchenko–Pastur law (Bai and Silverstein 2010)). Let \(X\in {\mathbb {R}}^{m \times n}\), where the \(X_{ij}\) are iid with \({\mathbb {E}}(X_{ij})=0\), \({\mathbb {E}}(X_{ij}^2)=1\), and \(m>n\). Let \(\lambda _1\le \lambda _2 \le \dots \le \lambda _n\) be the eigenvalues of \(Q_m=\frac{1}{m}X'X\). Define the random spectral measure
\[
\mu _n(A) = \frac{1}{n}\, \#\{1 \le i \le n : \lambda _i \in A\}, \qquad A \subseteq {\mathbb {R}}.
\]
Then, assuming \(n/m \rightarrow \alpha \in (0,1]\), we have
\[
\mu _n \rightarrow \mu \quad \text {weakly, almost surely,}
\]
where \(\mu \) is a deterministic measure with density
\[
\frac{\mathrm {d}\mu }{\mathrm {d}x}(x) = \frac{\sqrt{(\alpha _+ - x)(x - \alpha _-)}}{2\pi \alpha x}\, \mathbb {1}\left( \alpha _- \le x \le \alpha _+\right) .
\]
Here, \(\alpha _+=(1+\sqrt{\alpha })^2\,\) and \(\, \alpha _-=(1-\sqrt{\alpha })^2\).
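As an aside, the limiting law in Lemma 1 is easy to visualize numerically. The short R snippet below (an illustration added here, not part of the paper's experiments) compares the empirical eigenvalues of \(Q_m\) with the Marchenko–Pastur density for Gaussian entries.

```r
# Empirical check of the Marchenko-Pastur law (illustrative only):
# eigenvalues of (1/m) X'X for iid N(0,1) entries vs. the limiting density.
set.seed(1)
m <- 4000; n <- 1000; alpha <- n / m
X <- matrix(rnorm(m * n), m, n)
ev <- eigen(crossprod(X) / m, symmetric = TRUE, only.values = TRUE)$values
a_minus <- (1 - sqrt(alpha))^2; a_plus <- (1 + sqrt(alpha))^2
mp_density <- function(x)
  ifelse(x >= a_minus & x <= a_plus,
         sqrt((a_plus - x) * (x - a_minus)) / (2 * pi * alpha * x), 0)
hist(ev, breaks = 50, freq = FALSE, xlab = "eigenvalue",
     main = "Eigenvalues of Q_m vs. Marchenko-Pastur density")
curve(mp_density(x), from = a_minus, to = a_plus, add = TRUE, lwd = 2)
```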
1.1.1 Proof of Proposition 5
Proof
In the following proof, we make use of the notation \(\varTheta _1(\cdot )\) and \(\varTheta _2(\cdot )\), defined as follows. For two positive sequences \(a_{k}\) and \(b_k\), we say \(a_k= \varTheta _2(b_k)\) if there exists a constant \(c>0\) such that \(a_k \ge c b_k\), and we say \(a_k = \varTheta _1(b_k)\) whenever \(a_k = \varTheta _2(b_k)\) and \(b_k=\varTheta _2(a_k)\).
We first consider the case \(\lambda _n=\varTheta _1(\sqrt{m})\). For simplicity, we assume \(\lambda _n=\zeta \sqrt{m}\) for some constant \(\zeta >0\). Denote \(df(S_{\lambda _n,\gamma }(Z))= D_{\lambda _n,\gamma }\), and use \({\mathcal {T}}_{t_1,t_2}\) to represent
Adopting the notation from Lemma 1, it is not hard to verify that
where \(t_1, t_2 \overset{\text {iid}}{\sim } \mu _n\). A quick check of the relation between \(s_{\lambda _n,\gamma }\) and \(g_{\zeta ,\gamma }\) yields
Due to the Lipschitz continuity of the functions \(s_{\lambda _n,\gamma }(x)\) and \(xg_{\zeta , \gamma }(x)\), we obtain
Hence, there exists a positive constant \(C_{\alpha }\), such that for sufficiently large n,
Let \(T_1,T_2\) be two independent random variables generated from the Marchenko–Pastur distribution \(\mu \). If we can show
then by the dominated convergence theorem (DCT), we conclude the proof in the \(\lambda _n=\varTheta _1(\sqrt{m})\) regime. Note immediately that
Moreover, given that \(g_{\zeta ,\gamma }(\cdot )\) is bounded and continuous, the Marchenko–Pastur theorem in Lemma 1 implies
Since \((t_1, t_2) \overset{d}{\rightarrow } (T_1, T_2)\), and the discontinuity set of the function \(\frac{t_1g_{\zeta ,\gamma }(t_1)-t_2g_{\zeta ,\gamma }(t_2)}{t_1-t_2}\mathbb {1}(t_1\ne t_2)\) has zero probability under the measure induced by \((T_1,T_2)\), by the continuous mapping theorem,
Also, due to the boundedness of \(\frac{t_1g_{\zeta ,\gamma }(t_1)-t_2g_{\zeta ,\gamma }(t_2)}{t_1-t_2}\mathbb {1}(t_1\ne t_2)\), it holds that
Combining (34)–(36) completes the proof for the \(\lambda _n=\varTheta _1(\sqrt{m})\) case.
When \(\lambda _n=o(\sqrt{m})\), we can readily see that
Using that both \(\frac{s_{\lambda _n,\gamma }(\sqrt{mt_1})}{\sqrt{mt_1}}\,\) and \({\mathcal {T}}_{t_1,t_2}\) are bounded, we have, almost surely
and
Invoking DCT completes the proof. Similar arguments hold for the case \(\lambda _n=\varTheta _2(\sqrt{m})\). \(\square \)
Fig. 10 Random orthogonal model (ROM) simulations with \(\text {SNR}=1\). The optimal nonconvex penalties are obtained at \(\gamma =30\) and \(\gamma =20\) under the two scenarios, respectively. The integers from 1 to 100 on the x-axis index the grid of 100 values of \(\lambda \) (from largest to smallest) as described in Sect. 4.1
Fig. 11 Random orthogonal model (ROM) simulations with \(\text {SNR}=5\). The optimal nonconvex penalties are obtained at \(\gamma =30\) and \(\gamma =5\) under the two scenarios, respectively. The integers from 1 to 100 on the x-axis index the grid of 100 values of \(\lambda \) (from largest to smallest) as described in Sect. 4.1
Fig. 12 Coherent and nonuniform sampling (NUS) simulations with \(\text {SNR}=10\). The optimal nonconvex penalties are both obtained at \(\gamma =5\) under the two scenarios. The integers from 1 to 100 on the x-axis index the grid of 100 values of \(\lambda \) (from largest to smallest) as described in Sect. 4.1
1.1.2 Proof of Proposition 10
Proof
Observe that R as defined in Proposition 9 can be written as:
where above we have used the fact that \({\widetilde{A}}{\widetilde{V}}_{1} = {\widetilde{U}}_{1}{\widetilde{\varSigma }}_{1}\), which follows from the definition of the SVD of \({\widetilde{A}}\). By a simple inequality, it follows that
where we have used the fact that \(\Vert {\widetilde{V}}_{1}\Vert _2 = 1\). Similarly, we have an analogous result for Q:
Note that (38) and (39) together imply that if \(\Vert {\widetilde{A}} - A \Vert _2\) is small, then so are \(\Vert R\Vert _2, \Vert Q\Vert _2\).
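For intuition, if \(R\) and \(Q\) are the standard Wedin-type residuals \(R = A{\widetilde{V}}_{1} - {\widetilde{U}}_{1}{\widetilde{\varSigma }}_{1}\) and \(Q = A'{\widetilde{U}}_{1} - {\widetilde{V}}_{1}{\widetilde{\varSigma }}_{1}\) (an assumption about the definitions in Proposition 9, in the spirit of Stewart and Sun 1990), then the two bounds read
\[
R = A{\widetilde{V}}_{1} - {\widetilde{A}}{\widetilde{V}}_{1} = (A - {\widetilde{A}}){\widetilde{V}}_{1}, \qquad \Vert R\Vert _2 \le \Vert {\widetilde{A}} - A\Vert _2\, \Vert {\widetilde{V}}_{1}\Vert _2 = \Vert {\widetilde{A}} - A\Vert _2,
\]
and analogously \(\Vert Q\Vert _2 \le \Vert {\widetilde{A}} - A\Vert _2\, \Vert {\widetilde{U}}_{1}\Vert _2 = \Vert {\widetilde{A}} - A\Vert _2\).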
We now apply (31) (Proposition 9) with \(A = X_{k}\) and \({\widetilde{A}} = X_{k+1}\) and \(r_{1} = p\), to arrive at the proof of Proposition 10. \(\square \)
1.1.3 Proof of Proposition 11
Proof
Proof of Part (a):
Let us write the stationarity conditions for every update. To this end, we set the subdifferential of the map \(X \mapsto F_{\ell }(X;X_{k})\) to zero at \(X = X_{k+1}\):
where \(X_{k+1} = U_{k+1} \mathrm {diag}(\varvec{\sigma }_{k+1})V'_{k+1}\) is the SVD of \(X_{k+1}\). Note that the term \(U_{k+1} \nabla _{k+1} V_{k+1}'\) in (40) is a subdifferential (Lewis 1995) of the spectral function:
where \(\nabla _{k+1}\) is a diagonal matrix with the ith diagonal entry being a derivative of the map \(\sigma _{i} \mapsto P(\sigma _{i}; \lambda ,\gamma )\) (on \(\sigma _{i} \ge 0\)), denoted by \(\partial P(\sigma _{k+1, i}; \lambda ,\gamma )/\partial \sigma _{i}\) for all i. Note that (40) can be rewritten as:
As \(k \rightarrow \infty \), term (a) converges to zero (see Proposition 7), and thus we have:
Let us denote the ith column of \(U_{k}\) by \({u}_{k,i}\), and, similarly, the ith column of \(V_{k}\) by \(v_{k,i}\). Let \(r_{k+1}\) denote the rank of \(X_{k+1}\). Hence, we have:
Multiplying the left- and right-hand sides of the above by \(u'_{k+1,j}\) and \(v_{k+1,j}\), we have the following:
for \(j = 1, \ldots , r_{k+1}.\) Let \(\left\{ {\bar{U}}, {\bar{V}} \right\} \) denote a limit point of the sequence \(\left\{ U_{k},V_{k}\right\} \) (which exists since the sequence is bounded), and let r be the rank of \({\bar{U}}\) and \({\bar{V}}\). Let us now study the following equations:
Using the notation \({\bar{\theta }}_{j} = \text {vec} \left( {\mathcal {P}}_{\varOmega }({\bar{u}}_{j}{\bar{v}}'_{j}) \right) \) and \({\bar{y}} = \text {vec}({\mathcal {P}}_{\varOmega }(Y))\), we note that (41) are the first-order stationary conditions for a point \(\bar{\varvec{\sigma }}\) for the following penalized regression problem:
with \(\varvec{\sigma } \ge \mathbf {0}\).
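In this notation, the penalized regression problem referred to above presumably takes the form
\[
\min _{\varvec{\sigma } \ge \mathbf {0}} \;\; \frac{1}{2}\Big \Vert \, {\bar{y}} - \sum _{j=1}^{r} \sigma _{j}\, {\bar{\theta }}_{j} \Big \Vert _{2}^{2} \;+\; \sum _{j=1}^{r} P(\sigma _{j}; \lambda ,\gamma );
\]
this is our reading from the surrounding definitions rather than a verbatim reproduction of (42).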
If the matrix \({\bar{\varTheta }} = [{{\bar{\theta }}}_{1}, \ldots , {{\bar{\theta }}}_{r}]\) (note that \({\bar{\varTheta }} \in {\mathbb {R}}^{mn \times r}\)) has rank r, then any \(\varvec{\sigma }\) that satisfies (41) is finite—in particular, the sequence \(\varvec{\sigma }_{k}\) is bounded and has a limit point \(\bar{\varvec{\sigma }}\), which satisfies the first-order stationary condition (41).
Proof of Part (b):
Furthermore, if we assume that
then (42) admits a unique solution \(\bar{\varvec{\sigma }}\), which implies that \(\varvec{\sigma }_{k}\) has a unique limit point, and hence, the sequence \(\varvec{\sigma }_{k}\) necessarily converges. \(\square \)
1.2 Additional simulation results
Fig. 9 The y-axis denotes the number of iterations NC-Impute takes to stabilize the rank. The integers on the x-axis index some values on a grid of \(\lambda \) (from largest to smallest) as described in Sect. 4.1. The six plots represent the six scenarios considered in Sect. 4.1: (a)–(d) correspond to the four scenarios of Example A; (e) covers Example B; (f) is for Example C. Each procedure is repeated 10 times
This section contains additional numerical results from the simulation study in Sect. 4.1.
To demonstrate the variability of the procedures in the experiments, we plot the average and standard error of both the test error and the rank for some representative nonconvex penalty functions. Specifically, under each scenario considered in Sect. 4.1, we pick the nonconvex penalty that yields the best prediction and rank estimation performance. For each selected penalty, we plot the average test error and rank, along with the associated standard errors, against the tuning parameter \(\lambda \). The results are shown in Figs. 10, 11, and 12. As is clear from the figures, the standard error is typically (at least) one order of magnitude smaller than the average. Moreover, the general patterns of test error and rank along the solution path are as expected, except for a few points corresponding to very small values of \(\lambda \). The irregularity at these few points likely occurs because the solutions become unstable as the nonconvex regularization weakens when \(\lambda \) is very small.
To examine the rank dynamics of the updates in NC-Impute, we compute the number of iterations the algorithm takes for the rank to converge. We choose the same six nonconvex penalties as above and evaluate the rank stabilization for several values of \(\lambda \). The results are summarized in Fig. 9. One clearly observes that, except for a few instances, it takes fewer than 10 iterations for the rank to stabilize. Moreover, when the penalty is more “nonconvex” (i.e., \(\gamma \) is smaller), the rank stabilizes earlier. These empirical results complement the theoretical analysis of rank stabilization in Sect. 3.1.1.
Keywords
- Matrix completion
- Low rank
- Spectral nonconvex penalties
- MC+ penalty
- Optimization
- Degrees of freedom