
Robust matrix completion


Abstract

This paper considers the problem of estimation of a low-rank matrix when most of its entries are not observed and some of the observed entries are corrupted. The observations are noisy realizations of a sum of a low-rank matrix, which we wish to estimate, and a second matrix having a complementary sparse structure such as elementwise sparsity or columnwise sparsity. We analyze a class of estimators obtained as solutions of a constrained convex optimization problem combining the nuclear norm penalty and a convex relaxation penalty for the sparse constraint. Our assumptions allow for the simultaneous presence of random and deterministic patterns in the sampling scheme. We establish rates of convergence for the low-rank component from partial and corrupted observations in the presence of noise, and we show that these rates are minimax optimal up to logarithmic factors.


Notes

  1. This statement actually appears as an intermediate step in the proof of this lemma.

References

  1. Agarwal, A., Negahban, S., Wainwright, M.J.: Noisy matrix decomposition via convex relaxation: optimal rates in high dimensions. Ann. Stat. 40(2), 1171–1197 (2012)

  2. Aubin, J.P., Ekeland, I.: Applied Nonlinear Analysis. Pure and Applied Mathematics. Wiley, New York (1984)

  3. Bauer, F.L., Stoer, J., Witzgall, C.: Absolute and monotonic norms. Numer. Math. 3, 257–264 (1961)

  4. Bühlmann, P., van de Geer, S.: Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, New York (2011)

  5. Cai, T.T., Zhou, W.: Matrix completion via max-norm constrained optimization. Electron. J. Stat. 10(1), 1493–1525 (2016)

  6. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(1), 1–37 (2009)

  7. Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theory 56(5), 2053–2080 (2010)

  8. Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2011)

  9. Chen, Y., Xu, H., Caramanis, C., Sanghavi, S.: Robust matrix completion with corrupted columns. In: Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 873–880 (2011)

  10. Chen, Y., Jalali, A., Sanghavi, S., Caramanis, C.: Low-rank matrix recovery from errors and erasures. IEEE Trans. Inf. Theory 59(7), 4324–4337 (2013)

  11. de la Peña, V.H., Giné, E.: Decoupling: From Dependence to Independence. Randomly Stopped Processes, \(U\)-Statistics and Processes, Martingales and Beyond. Probability and Its Applications. Springer, New York (1999)

  12. Foygel, R., Srebro, N.: Concentration-based guarantees for low-rank matrix reconstruction. In: Proceedings of the 24th Annual Conference on Learning Theory (COLT) (2011)

  13. Giné, E., Latała, R., Zinn, J.: Exponential and moment inequalities for \(U\)-statistics. In: High Dimensional Probability II (Seattle, 1999). Progress in Probability, vol. 47, pp. 13–38. Birkhäuser, Boston (2000)

  14. Gross, D.: Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inf. Theory 57(3), 1548–1566 (2011)

  15. Hsu, D., Kakade, S.M., Zhang, T.: Robust matrix decomposition with sparse corruptions. IEEE Trans. Inf. Theory 57(11), 7221–7234 (2011)

  16. Xu, H., Caramanis, C., Sanghavi, S.: Robust PCA via outlier pursuit. IEEE Trans. Inf. Theory 58(5), 3047–3064 (2012)

  17. Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from noisy entries. J. Mach. Learn. Res. 11, 2057–2078 (2010)

  18. Klopp, O.: Noisy low-rank matrix completion with general sampling distribution. Bernoulli 20(1), 282–303 (2014)

  19. Koltchinskii, V.: Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lectures from the 38th Probability Summer School held in Saint-Flour, 2008. Lecture Notes in Mathematics, vol. 2033. Springer, Heidelberg (2011)

  20. Koltchinskii, V., Lounici, K., Tsybakov, A.B.: Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Stat. 39(5), 2302–2329 (2011)

  21. Negahban, S., Wainwright, M.J.: Restricted strong convexity and weighted matrix completion: optimal bounds with noise. J. Mach. Learn. Res. 13, 1665–1697 (2012)

  22. Recht, B.: A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)

  23. Rohde, A., Tsybakov, A.: Estimation of high-dimensional low-rank matrices. Ann. Stat. 39(2), 887–930 (2011)

  24. Rudelson, M., Vershynin, R.: Hanson–Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab. 18(82), 1–9 (2013)

  25. Tsybakov, A.B.: Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York (2009)

  26. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. arXiv:1011.3027 (2010)

  27. Li, X.: Compressed sensing and matrix completion with constant proportion of corruptions. Constr. Approx. 37(1), 73–99 (2013)


Acknowledgments

The work of O. Klopp was conducted as part of the project Labex MME-DII (ANR11-LBX-0023-01). The work of K. Lounici was supported in part by Simons Grant 315477 and by NSF Career Grant DMS-1454515. The work of A.B. Tsybakov was supported by GENES and by the French National Research Agency (ANR) under the Grants IPANEMA (ANR-13-BSH1-0004-02), Labex ECODEC (ANR-11-LABEX-0047), ANR-11-IDEX-0003-02, and by the "Chaire Economie et Gestion des Nouvelles Données", under the auspices of Institut Louis Bachelier, Havas-Media and Paris-Dauphine. The authors thank the anonymous referee for extremely valuable remarks.

Author information

Correspondence to Olga Klopp.

Appendices

Appendix A: Proofs of Theorem 1 and of Corollary 7

1.1 A.1: Proof of Theorem 1

The proofs of the upper bounds share some techniques with the methods developed in [18] for noisy matrix completion, but the presence of corruptions in our setting requires a new approach, in particular for proving the "restricted strong convexity" property (Lemma 15), which is the main difficulty in the proof.

Recall that our estimator is defined as

$$\begin{aligned} ({\hat{L}},{\hat{S}})\in \mathop {\hbox {arg min}}_{\Vert L\Vert _{\infty }\le {\mathbf {a}},\; \Vert S\Vert _{\infty }\le {\mathbf {a}}}\left\{ \dfrac{1}{N}\sum ^{N}_{i=1} \left( Y_i-\left\langle X_i,L+S\right\rangle \right) ^{2}+\lambda _{1} \Vert L\Vert _* +\lambda _{2}{\mathcal {R}}(S)\right\} \end{aligned}$$

and our goal is to bound from above the Frobenius norms \(\Vert L_0-{\hat{L}}\Vert _2^{2}\) and \(\Vert S_0-{\hat{S}}\Vert _2^{2}\).
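For readers who want to experiment with the estimator, the following is a minimal numerical sketch of this convex program (illustrative only, not the authors' implementation), assuming elementwise corruptions so that \({\mathcal {R}}(S)=\sum _{i,j}|S_{ij}|\), each entry sampled at most once, and placeholder data and tuning parameters:

```python
# Minimal sketch of the penalized least-squares estimator (hat L, hat S).
# Assumptions: elementwise sparsity (R = entrywise l1 norm), each entry
# observed at most once; lam1, lam2 and the data below are placeholders.
import numpy as np
import cvxpy as cp

m1, m2, N = 30, 30, 400
rng = np.random.default_rng(0)

mask = np.zeros((m1, m2))                          # 1 on observed entries
mask.flat[rng.choice(m1 * m2, N, replace=False)] = 1.0
Y = rng.standard_normal((m1, m2)) * mask           # placeholder observations

a, lam1, lam2 = 1.0, 0.2, 0.05                     # bound a, lambda_1, lambda_2

L = cp.Variable((m1, m2))
S = cp.Variable((m1, m2))
data_fit = cp.sum_squares(cp.multiply(mask, L + S - Y)) / N
objective = data_fit + lam1 * cp.normNuc(L) + lam2 * cp.sum(cp.abs(S))
problem = cp.Problem(cp.Minimize(objective),
                     [cp.abs(L) <= a, cp.abs(S) <= a])
problem.solve()                                    # SCS handles the SDP cone
L_hat, S_hat = L.value, S.value
```

The max-entry constraints and the two penalties translate directly; for columnwise sparsity one would replace the elementwise \(\ell _1\)-penalty by the sum of the Euclidean norms of the columns of S.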

(1):

Set \({\mathcal {F}}(L,S)=\frac{1}{N}\sum ^{N}_{i=1} \left( Y_i-\left\langle X_i,L+S\right\rangle \right) ^{2}+\lambda _1 \Vert L\Vert _*+\lambda _2 {\mathcal {R}}( S)\), \(\Delta L=L_0-{\hat{L}}\) and \(\Delta S=S_0-{\hat{S}}\). Using the inequality \({\mathcal {F}}({\hat{L}},{\hat{S}})\le {\mathcal {F}}(L_0,S_0)\) and (1) we get

$$\begin{aligned}&\dfrac{1}{N}\sum ^{N}_{i=1} \left( \left\langle X_i,\Delta L+\Delta S\right\rangle +\xi _{i}\right) ^{2}+\lambda _{1} \Vert {\hat{L}}\Vert _*+\lambda _{2}{\mathcal {R}}({\hat{S}})\nonumber \\&\qquad \le \dfrac{1}{N}\sum ^{N}_{i=1} \xi _{i}^{2}+\lambda _{1} \Vert L_0\Vert _* +\lambda _{2}{\mathcal {R}}(S_0). \end{aligned}$$

After some algebra this implies

$$\begin{aligned} \dfrac{1}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta L+\Delta S\right\rangle ^{2}\le & {} \underset{{\mathbf {I}}}{\underbrace{\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \left| \left\langle \xi _iX_i,\Delta L+\Delta S\right\rangle \right| -\dfrac{1}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle X_i, \Delta L+\Delta S\right\rangle ^{2}}}\nonumber \\&+\, \underset{{\mathbf {II}}}{\underbrace{2\left| \left\langle \Sigma ,\Delta L\right\rangle \right| +\lambda _{1} \left( \Vert L_0\Vert _* -\Vert {\hat{L}}\Vert _*\right) }}\nonumber \\&+\,\underset{{\mathbf {III}}}{\underbrace{2\left| \left\langle \Sigma ,\Delta S_{\mathcal {I}}\right\rangle \right| +\lambda _{2}\left( \mathcal {R}(S_0)-\mathcal {R}({\hat{S}})\right) }} \end{aligned}$$
(25)

where \(\Sigma =\frac{1}{N}\sum _{i\in \Omega } \xi _iX_i\) and we have used the equality \(\left\langle \Sigma ,\Delta S\right\rangle =\left\langle \Sigma ,\Delta S_{\mathcal {I}}\right\rangle \). We now estimate each of the three terms on the right hand side of (25) separately. This will be done on the random event

$$\begin{aligned} {\mathcal {U}}=\left\{ \underset{1\le i\le N}{\max }\vert \xi _{i}\vert \le C_*\sigma \sqrt{\log d}\right\} \end{aligned}$$
(26)

where \(C_*>0\) is a suitably chosen constant. Using a standard bound on the maximum of sub-Gaussian variables and the constraint \(N\le m_1m_2\), we get that there exists an absolute constant \(C_*>0\) such that \({\mathbb {P}}({\mathcal {U}})\ge 1-\frac{1}{2d}\). In what follows, we take this constant \(C_*\) in the definition of \({\mathcal {U}}\).
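For completeness, here is a sketch of the standard bound invoked above, assuming (as in the model assumptions) that the variables \(\xi _i/\sigma \) are sub-Gaussian with a uniform sub-Gaussian norm. By the union bound and the sub-Gaussian tail estimate,

$$\begin{aligned} {\mathbb {P}}\left( \underset{1\le i\le N}{\max }\vert \xi _{i}\vert > C_*\sigma \sqrt{\log d}\right) \le \sum ^{N}_{i=1}{\mathbb {P}}\left( \vert \xi _{i}\vert > C_*\sigma \sqrt{\log d}\right) \le 2N\,d^{-c\,C_*^{2}}\le 2\,d^{2-c\,C_*^{2}} \end{aligned}$$

where \(c>0\) depends only on the sub-Gaussian norm and we used \(N\le m_1m_2\le d^{2}\); taking \(C_*\) large enough makes the right hand side smaller than \(\frac{1}{2d}\).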

We start by estimating \({\mathbf {I}}\). On the event \({\mathcal {U}}\), we get

$$\begin{aligned} \mathbf{I}\le & {} \frac{1}{N}\sum _{i\in {\tilde{\Omega }}} \xi ^{2}_i \le \dfrac{C\,\sigma ^{2}\vert {\tilde{\Omega }}\vert \log (d)}{N}. \end{aligned}$$
(27)

Now we estimate \({\mathbf {II}}\). For a linear subspace S of a Euclidean space, let \(P_S\) denote the orthogonal projector onto S and let \(S^\bot \) denote the orthogonal complement of S. For any \(A\in {\mathbb {R}}^{m_1\times m_2}\), let \(u_j(A)\) and \(v_j(A)\) be the left and right orthonormal singular vectors of A, respectively. Denote by \(S_1(A)\) the linear span of \(\{u_j(A)\}\), and by \(S_2(A)\) the linear span of \(\{v_j(A)\}\). We set

$$\begin{aligned} {\mathbf {P}}_A^{\bot }(B)= & {} P_{S_1^{\bot }(A)}BP_{S_2^{\bot }(A)} \quad \text {and}\quad {\mathbf {P}}_A(B)=B- {\mathbf {P}}_A^{\bot }(B). \end{aligned}$$

By definition of \({\mathbf {P}}_{L_0}^{\bot }\), for any matrix B the singular vectors of \({\mathbf {P}}_{L_0}^{\bot }(B)\) are orthogonal to the space spanned by the singular vectors of \(L_0\). This implies that \(\left\| L_0+{\mathbf {P}}_{L_0}^{\bot }(\Delta L) \right\| _*=\left\| L_0 \right\| _*+\left\| {\mathbf {P}}_{L_0}^{\bot }(\Delta L) \right\| _*\). Thus,

$$\begin{aligned} \Vert {\hat{L}}\Vert _*= & {} \left\| L_0 +\Delta L\right\| _*\\= & {} \left\| L_0 +{\mathbf {P}}_{L_0}^{\bot }(\Delta L) +{\mathbf {P}}_{L_0}(\Delta L)\right\| _*\\\ge & {} \left\| L_0 +{\mathbf {P}}_{L_0}^{\bot }(\Delta L)\right\| _* -\left\| {\mathbf {P}}_{L_0}(\Delta L)\right\| _*\\= & {} \left\| L_0 \right\| _*+\left\| {\mathbf {P}}_{L_0}^{\bot }(\Delta L)\right\| _*-\left\| {\mathbf {P}}_{L_0}(\Delta L)\right\| _*, \end{aligned}$$

which yields

$$\begin{aligned} \Vert L_0 \Vert _*-\Vert {\hat{L}}\Vert _*\le \left\| {\mathbf {P}}_{L_0}(\Delta L)\right\| _*-\left\| {\mathbf {P}}_{L_0}^{\bot }(\Delta L)\right\| _*. \end{aligned}$$
(28)

Using (28) and the duality between the nuclear and the operator norms, we obtain

$$\begin{aligned} {\mathbf {II}}\le 2 \Vert \Sigma \Vert \Vert \Delta L\Vert _{*}+\lambda _{1}\left( \left\| {\mathbf {P}}_{L_0}(\Delta L)\right\| _*-\left\| {\mathbf {P}}_{L_0}^{\bot }(\Delta L)\right\| _*\right) . \end{aligned}$$

The assumption that \(\lambda _1\ge 4\Vert \Sigma \Vert \) and the triangle inequality imply

$$\begin{aligned} {\mathbf {II}} \le \dfrac{3}{2}\lambda _1\left\| {\mathbf {P}}_{L_0}(\Delta L)\right\| _*\le \dfrac{3}{2}\lambda _1\sqrt{2r}\left\| \Delta L\right\| _2 \end{aligned}$$
(29)

where \(r=\mathrm{rank}(L_0)\) and we have used that \(\mathrm{rank}({\mathbf {P}}_{L_0}(\Delta L))\le 2\,\mathrm{rank}(L_0)\).

For the third term in (25), we use the duality between the \(\mathcal {R}\) and \(\mathcal {R}^{*}\), and the identity \(\Delta S_{\mathcal {I}}=-{\hat{S}}_{\mathcal {I}}\):

$$\begin{aligned} {\mathbf {III}}\le 2\mathcal {R}^{*}(\Sigma )\mathcal {R}({\hat{S}}_{\mathcal {I}})+\lambda _{2}( \mathcal {R}(S_0)-\mathcal {R}({\hat{S}})). \end{aligned}$$

This and the assumption that \(\lambda _2\ge 4\mathcal {R}^{*}(\Sigma )\) imply

$$\begin{aligned} {\mathbf {III}}\le \lambda _2\mathcal {R}( S_0). \end{aligned}$$
(30)

Plugging (29), (30) and (27) in (25) we get that, on the event \({\mathcal {U}}\),

$$\begin{aligned} \dfrac{1}{n}\sum _{i\in \Omega } \left\langle X_i,\Delta L+\Delta S\right\rangle ^{2}\le & {} \frac{3\,\ae \,\lambda _1}{\sqrt{2}}\sqrt{r}\left\| \Delta L\right\| _2 +\ae \lambda _2\mathcal {R}( S_0) +\dfrac{C\sigma ^{2}\vert {\tilde{\Omega }} \vert \log (d)}{n}\nonumber \\ \end{aligned}$$
(31)

where \(\ae =N/n\).

(2):

Second, we will show that a kind of restricted strong convexity holds for the random sampling operator given by \((X_i)\) on a suitable subset of matrices. In words, we prove that the observation operator captures a substantial component of any pair of matrices \((L,S)\) belonging to a properly chosen constrained set (cf. Lemma 15(ii) below for the exact statement). This will imply that, with high probability,

$$\begin{aligned} \dfrac{1}{n}\sum _{i\in \Omega } \left\langle X_i,\Delta L+\Delta S\right\rangle ^{2}\ge \left\| \Delta L+\Delta S\right\| _{L_2(\Pi )}^{2}-{\mathcal {E}} \end{aligned}$$
(32)

with an appropriate residual \({\mathcal {E}}\), whenever we prove that \((\Delta L,\Delta S)\) belongs to the constrained set. This will be a substantial element of the remaining part of the proof. The result of the theorem will then be deduced by combining (31) and (32).

We start by defining our constrained set. For positive constants \(\delta _1\) and \(\delta _2\), we first introduce the following set of matrices where \(\Delta S\) should lie:

$$\begin{aligned} {\mathcal {B}}(\delta _1,\delta _2)=\{B\in {\mathbb {R}}^{m_1\times m_2} \,:\, \left\| B\right\| _{L_2(\Pi )}^{2}\le \delta ^{2}_1\;\text {and}\; \mathcal {R}(B)\le \delta _2\}. \end{aligned}$$
(33)

The constants \(\delta _1\) and \(\delta _2\) define the constraints on the \(L_2(\Pi )\)-norm and on the sparsity of the component S. The error term \({\mathcal {E}}\) in (32) depends on \(\delta _1\) and \(\delta _2\). We will specify the suitable values of \(\delta _1\) and \(\delta _2\) for the matrix \(\Delta S\) later. Next, we define the following set of pairs of matrices:

$$\begin{aligned} {\mathcal {D}}(\tau ,\kappa )= & {} \Biggl \{(A,B)\in {\mathbb {R}}^{m_1\times m_2}\times {\mathbb {R}}^{m_1\times m_2}\,:\, \left\| A+B\right\| _{L_2(\Pi )}^{2}\ge \sqrt{\dfrac{64\,\log (d)}{\log \left( 6/5\right) \,n}},\\&\qquad \left\| A+B\right\| _{\infty }\le 1,\left\| A\right\| _{*}\le \sqrt{\tau } \left\| A_{\mathcal {I}}\right\| _{2}+\kappa \Biggr \} \end{aligned}$$

where \(\kappa \) and \(\tau <m_1\wedge m_2\) are some positive constants. This will be used for \(A=\Delta L\) and \(B=\Delta S\). If the \(L_2(\Pi )\)-norm of the sum of the two matrices is too small, the right hand side of (32) is negative; the first inequality in the definition of \({\mathcal {D}}(\tau ,\kappa )\) rules this case out. The condition \(\left\| A\right\| _{*}\le \sqrt{\tau } \left\| A_{\mathcal {I}}\right\| _{2}+\kappa \) is a relaxed form of the condition \(\left\| A\right\| _{*}\le \sqrt{\tau } \left\| A\right\| _{2}\) satisfied by matrices of rank \(\tau \), as shown below. We will show that, with high probability, the matrix \(\Delta L\) satisfies this condition with \(\tau =C\,\mathrm{rank}(L_0)\) and a small \(\kappa \). To prove it, we need the bound \(\mathcal {R}(B)\le \delta _2\) on the corrupted part.
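For completeness, the condition for rank-\(\tau \) matrices follows from the Cauchy–Schwarz inequality applied to the singular values \(\sigma _1(A),\ldots ,\sigma _{\tau }(A)\):

$$\begin{aligned} \left\| A\right\| _{*}=\sum _{j=1}^{\tau }\sigma _j(A)\le \sqrt{\tau }\left( \sum _{j=1}^{\tau }\sigma _j^{2}(A)\right) ^{1/2}=\sqrt{\tau }\left\| A\right\| _{2}. \end{aligned}$$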

Finally, define our constrained set as the intersection

$$\begin{aligned} {\mathcal {D}}(\tau ,\kappa )\cap \left\{ {\mathbb {R}}^{m_1\times m_2}\times {\mathcal {B}}(\delta _1,\delta _2)\right\} . \end{aligned}$$

We now return to the proof of the theorem. To prove (11), we bound separately the norms \(\left\| \Delta L\right\| _2\) and \(\left\| \Delta S\right\| _2\). Note that

$$\begin{aligned} \Vert \Delta L\Vert _2^{2}\le & {} \Vert \Delta L_{\mathcal {I}}\Vert _2^{2}+\Vert \Delta L_{{\tilde{\mathcal {I}}}}\Vert _2^{2} \le \Vert \Delta L_{\mathcal {I}}\Vert _2^{2}+4{\mathbf {a}}^{2}\vert {\tilde{\mathcal {I}}}\vert \nonumber \\\le & {} \mu \vert \mathcal {I}\vert \Vert \Delta L_{\mathcal {I}}\Vert _{L_2(\Pi )}^{2}+4{\mathbf {a}}^{2}\vert {\tilde{\mathcal {I}}}\vert \end{aligned}$$
(34)

and similarly,

$$\begin{aligned} \Vert \Delta S\Vert _2^{2}\le & {} \mu \vert \mathcal {I}\vert \Vert \Delta S_{\mathcal {I}}\Vert _{L_2(\Pi )}^{2}+4{\mathbf {a}}^{2}\vert {\tilde{\mathcal {I}}}\vert . \end{aligned}$$

In view of these inequalities, it is enough to bound the quantities \(\Vert \Delta S_{\mathcal {I}}\Vert _{L_2(\Pi )}^{2}\) and \(\Vert \Delta L_{\mathcal {I}}\Vert _2^{2}\). A bound on \(\Vert \Delta S_{\mathcal {I}}\Vert _{L_2(\Pi )}^{2}\) with the rate as claimed in (11) is given in Lemma 14 below. In order to bound \(\Vert \Delta L_{\mathcal {I}}\Vert _{L_2(\Pi )}^{2}\) (or \(\Vert \Delta L_{\mathcal {I}}\Vert _2^{2}\) according to cases), we will need the following argument.

Case 1 Suppose that \(\left\| \Delta L+\Delta S\right\| _{L_2(\Pi )}^{2} < 16{\mathbf {a}}^{2}\sqrt{\dfrac{64\,\log (d)}{\log \left( 6/5\right) \,n}}\). Then a straightforward inequality

$$\begin{aligned} \Vert \Delta L+\Delta S\Vert ^{2}_{L_2(\Pi )}\ge \frac{1}{2} \Vert \Delta L\Vert ^{2}_{L_2(\Pi )}-\Vert \Delta S\Vert ^{2}_{L_2(\Pi )} \end{aligned}$$
(35)

together with Lemma 14 below implies that, with probability at least \(1-2.5/d\),

$$\begin{aligned} \left\| \Delta L\right\| ^{2}_{L_2(\Pi )}\le \Delta _{1} \end{aligned}$$
(36)

where

$$\begin{aligned} \Delta _1=C\Psi _4/\mu= & {} C\left\{ {\mathbf {a}}^{2}\, \sqrt{\dfrac{\log (d)}{n}}+{\mathbf {a}}\, \mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})\left[ \ae \,\lambda _{2}+ {\mathbf {a}}\, {\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) \right] \right. \\&\qquad \left. +\left( \dfrac{{\mathbf {a}}\,{\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) }{\lambda _2} +\ae \right) \frac{\vert {\tilde{\Omega }}\vert \left( {\mathbf {a}}^{2}+\sigma ^{2}\log (d)\right) }{N} \right\} . \end{aligned}$$

Note also that \(\Psi _4\le C(\Psi _1+\Psi _2 +\Psi _3)\). In view of (34), (36) and of the fact that \(\vert \mathcal {I}\vert \le m_1m_2\), the bound on \(\Vert \Delta L\Vert _2^{2}\) stated in the theorem holds with probability at least \(1-2.5/d\).

Case 2 Assume now that \(\left\| \Delta L+\Delta S\right\| _{L_2(\Pi )}^{2} \ge 16{\mathbf {a}}^{2}\sqrt{\dfrac{64\,\log (d)}{\log \left( 6/5\right) \,n}}\). We will show that in this case and with an appropriate choice of \(\delta _1,\delta _2,\tau \) and \(\kappa \), the pair \(\frac{1}{4{\mathbf {a}}}(\Delta L, \Delta S)\) belongs to the intersection \({\mathcal {D}}(\tau ,\kappa )\cap \{{\mathbb {R}}^{m_1\times m_2}\times {\mathcal {B}}(\delta _1,\delta _2)\}\).

Lemma 13 below and (27) imply that, on the event \({\mathcal {U}}\),

$$\begin{aligned} \Vert \Delta L\Vert _{*}\le & {} 4\sqrt{2r}\Vert \Delta L\Vert _{2}+\frac{\lambda _2\, {\mathbf {a}}}{\lambda _1}\,\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})+\dfrac{C\sigma ^{2} \vert {\tilde{\Omega }}\vert \log (d)}{N\lambda _1}\nonumber \\\le & {} 4\sqrt{2r}\Vert \Delta L_{\mathcal {I}}\Vert _{2}+8{\mathbf {a}}\sqrt{2r\vert {\tilde{\mathcal {I}}}\vert }+\frac{\lambda _2\, {\mathbf {a}}}{\lambda _1}\,\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})+\dfrac{C\sigma ^{2}\vert {\tilde{\Omega }}\vert \log (d)}{N\lambda _1}.\qquad \end{aligned}$$
(37)

Lemma 14 yields that, with probability at least \(1-2.5\,d^{-1}\),

$$\begin{aligned} \dfrac{\Delta S}{4{\mathbf {a}}}\in {\mathcal {B}}\left( \dfrac{\sqrt{\Delta _1}}{4{\mathbf {a}}}, 2\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})+\frac{C\vert {\tilde{\Omega }}\vert \left( {\mathbf {a}}^{2} +\sigma ^{2}\log (d)\right) }{4{\mathbf {a}}N\lambda _2}\right) ={\bar{\mathcal {B}}}. \end{aligned}$$

This property and (37) imply that \(\frac{1}{4{\mathbf {a}}}\left( \Delta L,\Delta S\right) \in {\mathcal {D}}(\tau ,\kappa )\cap \{{\mathbb {R}}^{m_1\times m_2}\times {\bar{\mathcal {B}}}\}\) with probability at least \(1-2.5\,d^{-1}\), where

$$\begin{aligned} \tau =32r\quad \text {and}\quad \kappa =2\sqrt{2r\vert {\tilde{\mathcal {I}}}\vert } +\frac{\lambda _2}{4\lambda _1}\,\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }}) +\dfrac{C\sigma ^{2}\vert {\tilde{\Omega }}\vert \log (d)}{4{\mathbf {a}}\,N\lambda _1}. \end{aligned}$$

Therefore, we can apply Lemma 15(ii). From Lemma 15(ii) and (31) we obtain that, with probability at least \(1-4.5\,d^{-1}\),

$$\begin{aligned} \dfrac{1}{2}\Vert \Delta L+\Delta S\Vert ^{2}_{L_2(\Pi )}\le & {} \frac{3\,\ae \,\lambda _1}{\sqrt{2}}\sqrt{r}\left\| \Delta L\right\| _2 +C{\mathcal {E}} \end{aligned}$$
(38)

where

$$\begin{aligned} {\mathcal {E}}= & {} \mu \,{\mathbf {a}}^{2}\,r\, \vert \mathcal {I}\vert \left( {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \right) ^{2} +8{\mathbf {a}}^{2}\sqrt{2r\vert {\tilde{\mathcal {I}}}\vert }\, {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \nonumber \\&+\,\lambda _2\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }}) \left( \dfrac{{\mathbf {a}}^{2}{\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) }{\lambda _1}+{\mathbf {a}}\,\ae \right) \nonumber \\&+\, \dfrac{\vert {\tilde{\Omega }}\vert \left( {\mathbf {a}}^{2}+\sigma ^{2}\log (d)\right) }{N}\left( \dfrac{{\mathbf {a}}\, {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) }{\lambda _1}+\dfrac{{\mathbf {a}}\,{\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) }{\lambda _2}+\ae \right) +\Delta _1.\qquad \end{aligned}$$
(39)

Using the elementary inequality \(2xy\le x^{2}+y^{2}\) and then (34) we find

$$\begin{aligned} \frac{3\,\ae }{\sqrt{2}}\lambda _1\sqrt{r}\left\| \Delta L\right\| _2\le & {} \dfrac{9\,\ae ^{2}\,\mu \,m_1\,m_2\,r\,\lambda _1^{2}}{2}+\dfrac{\left\| \Delta L\right\| ^{2}_2}{4\mu m_1 m_2}\\\le & {} \dfrac{9\,\ae ^{2}\,\mu \,m_1\,m_2\,r\,\lambda _1^{2}}{2}+\dfrac{\left\| \Delta L_{\mathcal {I}}\right\| ^{2}_2}{4\mu m_1 m_2}+\dfrac{{\mathbf {a}}^{2}\vert {\tilde{\mathcal {I}}}\vert }{\mu \,m_1m_2}. \end{aligned}$$

This inequality and (38) yield

$$\begin{aligned} \Vert \Delta L+\Delta S\Vert ^{2}_{L_2(\Pi )}\le & {} \dfrac{9\,\ae ^{2}\,\mu \,m_1\,m_2\,r\,\lambda _1^{2}}{4}+\dfrac{\left\| \Delta L_{\mathcal {I}}\right\| ^{2}_2}{4\mu m_1 m_2}+\dfrac{{\mathbf {a}}^{2}\vert {\tilde{\mathcal {I}}}\vert }{\mu \,m_1m_2} +C{\mathcal {E}}. \end{aligned}$$

Using again (35), Lemma 14, (9) and the bound \(\vert \mathcal {I}\vert \le m_1m_2\) we obtain

$$\begin{aligned} \dfrac{\Vert \Delta L_{\mathcal {I}}\Vert ^{2}_2}{\mu m_1 m_2}\le & {} C\left\{ \ae ^{2}\,\mu \,m_1\,m_2\,r\,\lambda _1^{2}+\dfrac{{\mathbf {a}}^{2}\vert {\tilde{\mathcal {I}}}\vert }{\mu \,m_1m_2} +{\mathcal {E}}\right\} . \end{aligned}$$

This and the inequality \(\sqrt{2r\vert {\tilde{\mathcal {I}}}\vert }\,{\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \le \dfrac{\vert {\tilde{\mathcal {I}}}\vert }{\mu \,m_1m_2}+\mu \,m_1m_2\,r\,\left( {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \right) ^{2}\) imply that, with probability at least \(1-4.5\,d^{-1}\),

$$\begin{aligned} \dfrac{\Vert \Delta L_{\mathcal {I}}\Vert _2^{2}}{m_1 m_2}\le & {} C\left\{ \Psi _1+ \Psi _2+\Psi _3\right\} . \end{aligned}$$
(40)

In view of (40) and (34), \(\Vert \Delta L\Vert _2^{2}\) is bounded by the right hand side of (11) with probability at least \(1-4.5\,d^{-1}\). Finally, inequality (12) follows from Lemma 14, (9) and the identity \(\Delta S_{\mathcal {I}}=-{\hat{S}}_{\mathcal {I}}\).

Lemma 12

Assume that \(\lambda _2\ge 4\left( \mathcal {R}^{*}(\Sigma )+2{\mathbf {a}}\mathcal {R}^{*}(W)\right) \). Then, we have

$$\begin{aligned} \mathcal {R}(\Delta S_{\mathcal {I}})\le 3\mathcal {R}(\Delta S_{{\tilde{\Omega }}})+\frac{1}{N\lambda _2}\left[ 4{\mathbf {a}}^{2}\vert {\tilde{\Omega }}\vert +\sum _{i\in {\tilde{\Omega }}} \xi _i^{2}\right] \end{aligned}$$

Proof

Let \(\partial \Vert \cdot \Vert _{*}\), and \(\partial {\mathcal {R}}\) denote the subdifferentials of \(\Vert \cdot \Vert _{*}\) and of \({\mathcal {R}}\), respectively. By the standard condition for optimality over a convex set (see [2, Chapter 4, Section 2, Corollary 6]), we have

$$\begin{aligned}&-\dfrac{2}{N}\sum ^{N}_{i=1} (Y_i-\langle X_i,{\hat{L}} +{\hat{S}}\rangle )\langle X_i,L+S-{\hat{L}} -{\hat{S}}\rangle \nonumber \\&+\lambda _{1}\langle \partial \Vert {\hat{L}}\Vert _{*},L-{\hat{L}}\rangle +\lambda _{2}\langle \partial {\mathcal {R}}({\hat{S}}),S-{\hat{S}}\rangle \ge 0 \end{aligned}$$
(41)

for all feasible pairs (LS). In particular, for \(({\hat{L}},S_0)\) we obtain

$$\begin{aligned} -\dfrac{2}{N}\sum ^{N}_{i=1} (Y_i-\langle X_i,{\hat{L}}+{\hat{S}}\rangle )\langle X_i,\Delta S\rangle +\lambda _{2}\langle \partial {\mathcal {R}}({\hat{S}}),\Delta S\rangle\ge & {} 0, \end{aligned}$$

which implies

$$\begin{aligned}&-\dfrac{2}{N}\sum ^{N}_{i=1} \left\langle X_i,\Delta S\right\rangle ^{2} -\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle X_i,\Delta L\right\rangle \left\langle X_i,\Delta S\right\rangle -\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \xi _i\left\langle X_i,\Delta S\right\rangle \\&-\dfrac{2}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta L\right\rangle \left\langle X_i,\Delta S\right\rangle -2\left\langle \Sigma ,\Delta S\right\rangle +\lambda _{2}\langle \partial {\mathcal {R}}({\hat{S}}),\Delta S\rangle \ge 0. \end{aligned}$$

Using the elementary inequality \(2ab\le a^2+b^2\) and the bound \(\Vert \Delta L\Vert _{\infty }\le 2{\mathbf {a}}\) we find

$$\begin{aligned}&-\dfrac{2}{N}\sum ^{N}_{i=1} \left\langle X_i,\Delta S\right\rangle ^{2} -\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle X_i,\Delta L\right\rangle \left\langle X_i,\Delta S\right\rangle -\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \xi _i\left\langle X_i,\Delta S\right\rangle \\&\le \dfrac{1}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle X_i, \Delta L\right\rangle ^{2}+\dfrac{1}{N}\sum _{i\in {\tilde{\Omega }}}\xi _i^{2}\\&\le \dfrac{4{\mathbf {a}}^{2}\vert {\tilde{\Omega }}\vert }{N}+\frac{1}{N}\sum _{i\in {\tilde{\Omega }}}\xi _i^{2}. \end{aligned}$$

Combining the last two displays we get

$$\begin{aligned} \lambda _{2}\langle \partial {\mathcal {R}}({\hat{S}}),{\hat{S}}-S_0\rangle\le & {} 2\left| \left\langle \dfrac{1}{N}\sum _{i\in \Omega } \left\langle X_i, \Delta L\right\rangle X_i,\Delta S\right\rangle \right| +2\left| \left\langle \Sigma ,\Delta S\right\rangle \right| \nonumber \\&\qquad +\,\dfrac{4{\mathbf {a}}^{2}\vert {\tilde{\Omega }}\vert }{N}+\frac{1}{N} \sum _{i\in {\tilde{\Omega }}}\xi _i^{2}\nonumber \\\le & {} 2\mathcal {R}^{*}\left( \dfrac{1}{N}\sum _{i\in \Omega } \left\langle X_i, \Delta L\right\rangle X_i\right) \mathcal {R}(\Delta S)+2\mathcal {R}^{*}( \Sigma )\mathcal {R}(\Delta S)\nonumber \\&\qquad +\,\dfrac{4{\mathbf {a}}^{2}\vert {\tilde{\Omega }}\vert }{N}+\frac{1}{N}\sum _{i\in {\tilde{\Omega }}}\xi _i^{2}. \end{aligned}$$
(42)

By Lemma 18,

$$\begin{aligned} \mathcal {R}^{*}\left( \frac{1}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta L\right\rangle X_i\right)\le & {} 2{\mathbf {a}}\mathcal {R}^{*}(W) \end{aligned}$$
(43)

where \(W=\frac{1}{N}\sum _{i\in \Omega }X_{i}\). On the other hand, the convexity of \(\mathcal {R}(\cdot )\) and the definition of subdifferential imply

$$\begin{aligned} \mathcal {R}(S_0)\ge \mathcal {R}({\hat{S}})+\langle \partial {\mathcal {R}}({\hat{S}}),\Delta S\rangle . \end{aligned}$$
(44)

Plugging (43) and (44) in (42) we obtain

$$\begin{aligned} \lambda _{2}(\mathcal {R}({\hat{S}})-\mathcal {R}(S_0))\le 4{\mathbf {a}}\mathcal {R}^{*}(W)\mathcal {R}(\Delta S)+2\mathcal {R}^{*}( \Sigma )\mathcal {R}(\Delta S)+\dfrac{4{\mathbf {a}}^{2}\vert {\tilde{\Omega }}\vert }{N}+\frac{1}{N}\sum _{i\in {\tilde{\Omega }}}\xi _i^{2}. \end{aligned}$$

Next, the decomposability of \(\mathcal {R}(\cdot )\), the identity \((S_0)_{\mathcal {I}}=0\) and the triangle inequality yield

$$\begin{aligned} \mathcal {R}(S_0-\Delta S)-\mathcal {R}(S_0)= & {} \mathcal {R}\left( (S_0-\Delta S)_{{\tilde{\mathcal {I}}}}\right) +\mathcal {R}\left( (S_0-\Delta S)_{\mathcal {I}}\right) -\mathcal {R}\left( (S_0)_{{\tilde{\mathcal {I}}}}\right) \\\ge & {} \mathcal {R}\left( (\Delta S)_{\mathcal {I}}\right) -\mathcal {R}\left( (\Delta S)_{{\tilde{\mathcal {I}}}}\right) . \end{aligned}$$

Since \(\lambda _2\ge 4\left( 2{\mathbf {a}}\mathcal {R}^{*}(W)+\mathcal {R}^{*}( \Sigma )\right) \) the last two displays imply

$$\begin{aligned}&\lambda _2\left( \mathcal {R}\left( (\Delta S)_{\mathcal {I}}\right) -\mathcal {R}\left( (\Delta S)_{{\tilde{\mathcal {I}}}}\right) \right) \\&\quad \le \dfrac{\lambda _2}{2}\left( \mathcal {R}\left( \Delta S_{{\tilde{\mathcal {I}}}}\right) +\mathcal {R}\left( (\Delta S)_{\mathcal {I}}\right) \right) +\dfrac{4{\mathbf {a}}^{2}\vert {\tilde{\Omega }}\vert }{N}+\frac{1}{N}\sum _{i\in {\tilde{\Omega }}}\xi _i^{2}. \end{aligned}$$

Thus,

$$\begin{aligned} \mathcal {R}\left( \Delta S_{\mathcal {I}}\right)\le & {} 3\mathcal {R}\left( \Delta S_{{\tilde{\mathcal {I}}}}\right) +\frac{1}{N\lambda _2}\left[ 4{\mathbf {a}}^{2}\vert {\tilde{\Omega }}\vert +\sum _{i\in {\tilde{\Omega }}} \xi _i^{2}\right] . \end{aligned}$$
(45)

Since we assume that all unobserved entries of \(S_0\) are zero, we have \((S_0)_{\tilde{\mathcal {I}}}=(S_0)_{\tilde{\Omega }}\). On the other hand, \({\hat{S}}_{\tilde{\mathcal {I}}}={\hat{S}}_{\tilde{\Omega }}\) as \(\mathcal {R}(\cdot )\) is a monotonic norm (see [3]): adding to S a non-zero entry on the non-observed part increases \(\mathcal {R}(S)\) but does not modify \(\frac{1}{N}\sum ^{N}_{i=1} \left( Y_i-\left\langle X_i,L+S\right\rangle \right) ^{2}\). To conclude, we have \(\Delta S_{{\tilde{\mathcal {I}}}}=\Delta S_{{\tilde{\Omega }}}\), which together with (45) implies the lemma. \(\square \)

Lemma 13

Suppose that \(\lambda _1\ge 4\Vert \Sigma \Vert \) and \(\lambda _2\ge 4\mathcal {R}^{*}(\Sigma )\). Then,

$$\begin{aligned} \left\| {\mathbf {P}}_{L_0}^{\bot }(\Delta L)\right\| _*\le 3\left\| {\mathbf {P}}_{L_0}(\Delta L)\right\| _*+\frac{\lambda _2\,{\mathbf {a}}}{\lambda _1}\, \mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})+\frac{1}{N\lambda _1}\sum _{i\in {\tilde{\Omega }}} \xi _i^{2}. \end{aligned}$$

Proof

Using (41) for \((L,S)=(L_0,S_0)\) we obtain

$$\begin{aligned}&-\dfrac{2}{N}\sum ^{N}_{i=1}\left\langle X_i,\Delta S+\Delta L\right\rangle ^{2} -\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle \xi _iX_i,\Delta L+\Delta S\right\rangle \nonumber \\&-2\left\langle \Sigma ,\left( \Delta S\right) _{\mathcal {I}}\right\rangle -2\left\langle \Sigma ,\Delta L\right\rangle +\lambda _{1}\langle \partial \Vert {\hat{L}}\Vert _{*},\Delta L\rangle +\lambda _{2}\langle \partial {\mathcal {R}}({\hat{S}}),\Delta S\rangle \ge 0.\qquad \end{aligned}$$
(46)

The convexity of \(\Vert \cdot \Vert _{*}\) and of \(\mathcal {R}(\cdot )\) and the definition of the subdifferential imply

$$\begin{aligned} \Vert L_0\Vert _{*}\ge & {} \Vert {\hat{L}}\Vert _{*}+\langle \partial \Vert {\hat{L}}\Vert _{*},\Delta L\rangle \\ \mathcal {R}(S_0)\ge & {} \mathcal {R}({\hat{S}})+\langle \partial {\mathcal {R}}({\hat{S}}),\Delta S\rangle . \end{aligned}$$

Together with (46), this yields

$$\begin{aligned} \lambda _{1}(\Vert {\hat{L}}\Vert _{*}-\Vert L_0\Vert _{*}) +\lambda _{2}(\mathcal {R}({\hat{S}})-\mathcal {R}(S_0))\le & {} 2\Vert \Sigma \Vert \Vert \Delta L\Vert _{*}+2\mathcal {R}^{*}( \Sigma )\mathcal {R}\left( \Delta S_{\mathcal {I}}\right) \\&+\,\frac{1}{N}\sum _{i\in {\tilde{\Omega }}} \xi _i^{2}. \end{aligned}$$

Using the conditions \(\lambda _1\ge 4\Vert \Sigma \Vert \), \(\lambda _2\ge 4\mathcal {R}^{*}(\Sigma )\), the triangle inequality and (28) we get

$$\begin{aligned}&\lambda _1\left( \left\| {\mathbf {P}}_{L_0}^{\bot }(\Delta L)\right\| _* - \left\| {\mathbf {P}}_{L_0}(\Delta L)\right\| _*\right) +\lambda _2 (\mathcal {R}({\hat{S}})-\mathcal {R}(S_0))\\&\quad \le \dfrac{\lambda _1}{2}\left( \left\| {\mathbf {P}}_{L_0}^{\bot }(\Delta L)\right\| _* + \left\| {\mathbf {P}}_{L_0}(\Delta L)\right\| _* \right) +\dfrac{\lambda _2}{2}\mathcal {R}({\hat{S}}_{\mathcal {I}})+\frac{1}{N} \sum _{i\in {\tilde{\Omega }}} \xi _i^{2}. \end{aligned}$$

Since we assume that all unobserved entries of \(S_0\) are zero, we obtain \(\mathcal {R}(S_0)\le {\mathbf {a}}\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})\). Using this inequality in the last display proves the lemma. \(\square \)

Lemma 14

Let \(n>m_1\) and \(\lambda _2\ge 4\left( \mathcal {R}^{*}(\Sigma )+2{\mathbf {a}}\mathcal {R}^{*}(W)\right) \). Suppose that the distribution \(\Pi \) on \({\mathcal {X}}'\) satisfies Assumptions 1 and 2. Let \(\left\| S_0\right\| _{\infty }\le {\mathbf {a}}\) for some constant \({\mathbf {a}}\) and let Assumption 3 be satisfied. Then, with probability at least \(1-2.5\,d^{-1}\),

$$\begin{aligned} \Vert \Delta S\Vert _{L_2(\Pi )}^{2}\le & {} C \Psi _4/\mu , \end{aligned}$$
(47)

and

$$\begin{aligned} \mathcal {R}(\Delta S)\le & {} 8{\mathbf {a}}\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})+\frac{\vert {\tilde{\Omega }}\vert (4{\mathbf {a}}^{2}+C\sigma ^{2}\log (d))}{N\lambda _2}. \end{aligned}$$
(48)

Proof

Using the inequality \({\mathcal {F}}({\hat{L}},{\hat{S}})\le {\mathcal {F}}({\hat{L}},S_0)\) and (1) we obtain

$$\begin{aligned}&\dfrac{1}{N}\sum ^{N}_{i=1} \left( \left\langle X_i,\Delta L +\Delta S\right\rangle +\xi _{i}\right) ^{2}+\lambda _{2}{\mathcal {R}}({\hat{S}})\\&\quad \le \dfrac{1}{N}\sum ^{N}_{i=1} \left( \left\langle X_i,\Delta L\right\rangle +\xi _{i}\right) ^{2}+\lambda _{2}{\mathcal {R}}(S_0) \end{aligned}$$

which implies

$$\begin{aligned}&\dfrac{1}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta S\right\rangle ^{2}+\dfrac{1}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle X_i,\Delta S\right\rangle ^{2}+\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle X_i,\Delta L\right\rangle \left\langle X_i,\Delta S\right\rangle +\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle \xi _iX_i,\Delta S\right\rangle \\&\quad +\,\dfrac{2}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta L\right\rangle \left\langle X_i,\Delta S_{\mathcal {I}}\right\rangle +2\left\langle \Sigma ,\Delta S_{\mathcal {I}}\right\rangle +\lambda _{2} {\mathcal {R}}({\hat{S}})\le \lambda _{2} {\mathcal {R}}(S_0). \end{aligned}$$

From Lemma 18 and the duality between \(\mathcal {R}\) and \(\mathcal {R}^{*}\) we obtain

$$\begin{aligned} \dfrac{1}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta S\right\rangle ^{2}\le & {} 2(2{\mathbf {a}}\,\mathcal {R}^{*}(W)+\mathcal {R}^{*}( \Sigma ))\mathcal {R}(\Delta S_{\mathcal {I}}) + \lambda _{2} ({\mathcal {R}}(S_0)-{\mathcal {R}}({\hat{S}}))\\&+\, \dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle X_i,\Delta L\right\rangle ^{2}+\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}}\xi _i^{2}. \end{aligned}$$

Since here \(\Delta S_{\mathcal {I}}=-{\hat{S}}_{\mathcal {I}}\) and \(\lambda _2\ge 4\left( \mathcal {R}^{*}(\Sigma )+2{\mathbf {a}}\mathcal {R}^{*}(W)\right) \) it follows that

$$\begin{aligned} \dfrac{1}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta S\right\rangle ^{2}\le \lambda _{2}\mathcal {R}\left( S_0\right) +\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}} \left\langle X_i,\Delta L\right\rangle ^{2}+\dfrac{2}{N}\sum _{i\in {\tilde{\Omega }}}\xi _i^{2}. \end{aligned}$$
(49)

Now, Lemma 12 and the bound \(\Vert \Delta S\Vert _{\infty }\le 2{\mathbf {a}}\) imply that, on the event \({\mathcal {U}}\) defined in (26),

$$\begin{aligned} \mathcal {R}(\Delta S)\le & {} 4\mathcal {R}(\Delta S_{{\tilde{\Omega }}})+\frac{\vert {\tilde{\Omega }}\vert (4{\mathbf {a}}^{2}+C\sigma ^{2}\log (d))}{N\lambda _2}\nonumber \\\le & {} 8{\mathbf {a}}\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})+\frac{\vert {\tilde{\Omega }}\vert (4{\mathbf {a}}^{2}+C\sigma ^{2}\log (d))}{N\lambda _2}. \end{aligned}$$
(50)

Thus, (48) is proved. To prove (47), consider the following two cases.

Case I \(\Vert \Delta S\Vert ^{2}_{L_{2}(\Pi )}< 4{\mathbf {a}}^{2}\sqrt{\frac{64\log (d)}{\log (6/5)\,n}}\). Then (47) holds trivially.

Case II \(\Vert \Delta S\Vert ^{2}_{L_{2}(\Pi )}\ge 4{\mathbf {a}}^{2}\sqrt{\frac{64\log (d)}{\log (6/5)\,n}}\). Then inequality (50) and the bound \(\Vert \Delta S\Vert _{\infty }\le 2{\mathbf {a}}\) imply that, on the event \({\mathcal {U}}\),

$$\begin{aligned} \dfrac{\Delta S}{2{\mathbf {a}}}\in {\mathcal {C}}\left( 4\,\mathcal {R}({\mathbf {Id}}_{\tilde{\Omega }})+\,\frac{\vert {\tilde{\Omega }}\vert \left( 8{\mathbf {a}}^{2}+C\sigma ^{2}\log (d)\right) }{2{\mathbf {a}}\,N\lambda _2}\right) \end{aligned}$$

where, for any \(\delta >0\), the set \({\mathcal {C}}(\delta )\) is defined as:

$$\begin{aligned} {\mathcal {C}}(\delta )=\left\{ A\in {\mathbb {R}}^{m_1\times m_2}:\left\| A\right\| _{\infty }\le 1, \left\| A\right\| _{L_2(\Pi )}^{2}\ge \sqrt{\dfrac{64\,\log (d)}{\log \left( 6/5\right) \,n}}, \mathcal {R}(A)\le \delta \right\} .\qquad \end{aligned}$$
(51)

Thus, we can apply Lemma 15(i) below. In view of this lemma, the inequalities (49), (27), \(\Vert \Delta L\Vert _{\infty }\le 2{\mathbf {a}}\) and \(\mathcal {R}\left( S_0\right) \le {\mathbf {a}}\mathcal {R}({\mathbf {Id}}_{\tilde{\mathcal {I}}})\) imply that (47) holds with probability at least \(1-2.5\,d^{-1}\). \(\square \)

Lemma 15

Let the distribution \(\Pi \) on \({\mathcal {X}}'\) satisfy Assumptions 1 and 2. Let \(\delta , \delta _1, \delta _2, \tau \), and \(\kappa \) be positive constants. Then, the following properties hold.

  1. (i)

    With probability at least \(1-\dfrac{2}{d}\),

    $$\begin{aligned} \frac{1}{n}\sum _{i\in \Omega } \left\langle X_i,S\right\rangle ^{2}\ge \dfrac{1}{2}\Vert S\Vert _{L_2(\Pi )}^{2}-8\delta {\mathbb {E}}( \mathcal {R}^{*}(\Sigma _R)) \end{aligned}$$

    for any \(S\in {\mathcal {C}}(\delta )\).

  2. (ii)

    With probability at least \(1-\dfrac{2}{d}\),

    $$\begin{aligned} \frac{1}{n}\sum _{i\in \Omega } \left\langle X_i,L+S\right\rangle ^{2}\ge & {} \dfrac{1}{2}\Vert L+S\Vert _{L_2(\Pi )}^{2}- \{360\mu \,\vert \mathcal {I}\vert \,\tau \left( {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \right) ^{2}\\&+4\delta _1^{2}+8\delta _2\,{\mathbb {E}}( \mathcal {R}^{*}(\Sigma _R))+8\kappa {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \} \end{aligned}$$

    for any pair \((L,S)\in {\mathcal {D}}(\tau ,\kappa )\cap \{{\mathbb {R}}^{m_1\times m_2}\times {\mathcal {B}}(\delta _1,\delta _2)\}\).

Proof

We give a unified proof of (i) and (ii). Let \(A=S\) for (i) and \(A=L+S\) for (ii). Set

$$\begin{aligned} {\mathcal {E}}= \left\{ \begin{array}{ll} 8\delta {\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) &{}\quad \hbox {for (i)} \\ 360\mu \,\vert \mathcal {I}\vert \,\tau \left( {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \right) ^{2}+4\delta _1^{2} +8\delta _2\,{\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) +8\kappa {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) &{}\quad {\text {for (ii)}} \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} {\mathcal {C}}= \left\{ \begin{array}{ll} {\mathcal {C}}(\delta ) &{} \quad \hbox {for (i)} \\ {\mathcal {D}}(\tau ,\kappa )\cap ({\mathbb {R}}^{m_1\times m_2}\times {\mathcal {B}}(\delta _1,\delta _2)) &{}\quad {\text {for (ii).}} \end{array} \right. \end{aligned}$$

To prove the lemma it is enough to show that the probability of the random event

$$\begin{aligned} {\mathcal {B}}=\left\{ \exists \,A\in {\mathcal {C}}\,\text {such that}\,\left| \frac{1}{n}\sum _{i\in \Omega } \left\langle X_i,A\right\rangle ^{2}-\Vert A\Vert _{L_2(\Pi )}^{2}\right| > \dfrac{1}{2}\Vert A\Vert _{L_2(\Pi )}^{2}+ {\mathcal {E}}\right\} \end{aligned}$$

is smaller than \(2/d\). In order to estimate the probability of \({\mathcal {B}}\), we use a standard peeling argument. Set \(\nu =\sqrt{\dfrac{64\,\log (d)}{\log \left( 6/5\right) \,n}}\) and \(\alpha =\dfrac{6}{5}\). For \(l\in {\mathbb {N}}\), define

$$\begin{aligned} S_l=\{A\in {\mathcal {C}}\,:\,\alpha ^{l-1}\nu \le \Vert A\Vert _{L_2(\Pi )}^{2}\le \alpha ^{l}\nu \}. \end{aligned}$$

If the event \({\mathcal {B}}\) holds, there exist \(l\in {\mathbb {N}}\) and a matrix \(A\in {\mathcal {C}}\cap S_l\) such that

$$\begin{aligned} \left| \frac{1}{n}\sum _{i\in \Omega } \left\langle X_i,A\right\rangle ^{2} -\Vert A\Vert _{L_2(\Pi )}^{2}\right|&{>} \dfrac{1}{2}\Vert A\Vert _{L_2(\Pi )}^{2} + {\mathcal {E}}\nonumber \\&> \dfrac{1}{2}\alpha ^{l-1}\nu + {\mathcal {E}}\nonumber \\&= \dfrac{5}{12}\alpha ^{l}\nu + {\mathcal {E}}. \end{aligned}$$
(52)

For each \(l\in {\mathbb {N}}\), consider the random event

$$\begin{aligned} {\mathcal {B}}_l=\left\{ \exists \,A\in {\mathcal {C}}'(\alpha ^{l}\nu )\,:\,\left| \frac{1}{n}\sum _{i\in \Omega } \left\langle X_i,A\right\rangle ^{2}-\Vert A\Vert _{L_2(\Pi )}^{2}\right| > \dfrac{5}{12}\alpha ^{l}\nu + {\mathcal {E}}\right\} \end{aligned}$$

where

$$\begin{aligned} {\mathcal {C}}'(T)=\{A\in {\mathcal {C}} \,:\, \left\| A\right\| _{L_2(\Pi )}^{2}\le T \}, \quad \forall T>0. \end{aligned}$$

Note that \(A\in S_l\) implies \(A\in {\mathcal {C}}'(\alpha ^{l}\nu )\). This and (52) yield the inclusion \({\mathcal {B}}\subset \cup _{l=1}^\infty \,{\mathcal {B}}_l\). By Lemma 16, \({\mathbb {P}}\left( {\mathcal {B}}_l\right) \le \exp (-c_5\,n\,\alpha ^{2l}\nu ^{2})\) where \(c_5=1/128\). Using the union bound we find

$$\begin{aligned} \displaystyle {\mathbb {P}}\left( {\mathcal {B}}\right)\le & {} \sum _{l=1}^{\infty }{\mathbb {P}}\left( {\mathcal {B}}_l\right) \\ \displaystyle\le & {} \sum _{l=1}^{\infty }\exp (-c_5\,n\,\alpha ^{2l}\,\nu ^{2})\\ \displaystyle\le & {} \sum _{l=1}^{\infty }\exp (-(2\,c_5\,n\,\log (\alpha )\, \nu ^{2})l) \end{aligned}$$

where we have used the inequality \(e^{x}\ge x\). We finally obtain, for \(\nu =\sqrt{\dfrac{64\,\log (d)}{\log \left( 6/5\right) \,n}}\),

$$\begin{aligned} {\mathbb {P}}\left( {\mathcal {B}}\right) \le \dfrac{\exp \left( -2\,c_5\,n\,\log (\alpha )\,\nu ^{2}\right) }{1-\exp \left( -2\,c_5\,n\, \log (\alpha )\,\nu ^{2}\right) }=\dfrac{\exp \left( -\log (d)\right) }{1-\exp \left( -\log (d)\right) }. \end{aligned}$$

The right-hand side equals \(1/(d-1)\), which does not exceed \(2/d\) for \(d\ge 2\). \(\square \)

Let

$$\begin{aligned} Z_T=\underset{A\in {\mathcal {C}}'(T)}{\sup }\left| \frac{1}{n}\sum _{i\in \Omega } \left\langle X_i,A\right\rangle ^{2}-\Vert A\Vert _{L_2(\Pi )}^{2}\right| . \end{aligned}$$

Lemma 16

Let the distribution \(\Pi \) on \({\mathcal {X}}'\) satisfy Assumptions 1 and 2. Then,

$$\begin{aligned}{\mathbb {P}}\left( Z_T>\dfrac{5}{12}T+{\mathcal {E}}\right) \le \exp (-c_5\,n\,T^{2}) \end{aligned}$$

where \(c_5=\dfrac{1}{128}\).

Proof

We follow a standard approach: first we show that \(Z_T\) concentrates around its expectation and then we bound the expectation from above. Since \(\left\| A\right\| _{\infty }\le 1\) for all \(A\in {\mathcal {C}}'(T)\), we have \(\left| \left\langle X_i,A\right\rangle \right| \le 1\). We first use a Talagrand-type concentration inequality, cf. [4, Theorem 14.2], implying that

$$\begin{aligned} {\mathbb {P}}\left( Z_T\ge {\mathbb {E}}\left( Z_T\right) +\dfrac{1}{9}\left( \dfrac{5}{12}T\right) \right) \le \exp (-c_5\,n\,T^{2}) \end{aligned}$$
(53)

where \(c_5=\dfrac{1}{128}\). Next, we bound the expectation \({\mathbb {E}}\left( Z_T\right) \). By a standard symmetrization argument (see e.g. [19, Theorem 2.1]) we obtain

$$\begin{aligned} {\mathbb {E}}\left( Z_T\right)= & {} {\mathbb {E}}\left( \underset{A\in {\mathcal {C}}'(T)}{\sup } \left| \frac{1}{n}\sum _{i\in \Omega } \left\langle X_i,A\right\rangle ^{2} -{\mathbb {E}}(\left\langle X,A\right\rangle ^{2})\right| \right) \\\le & {} 2{\mathbb {E}}\left( \underset{A\in {\mathcal {C}}'(T)}{\sup }\left| \dfrac{1}{n} \sum _{i\in \Omega } \epsilon _i\left\langle X_i,A\right\rangle ^{2} \right| \right) \end{aligned}$$

where \(\{\epsilon _i\}_{i=1}^{n}\) is an i.i.d. Rademacher sequence. Then, the contraction inequality (see e.g. [19]) yields

$$\begin{aligned} {\mathbb {E}}\left( Z_T\right) \le 8{\mathbb {E}}\left( \underset{A\in {\mathcal {C}}'(T)}{\sup }\left| \dfrac{1}{n} \sum _{i\in \Omega } \epsilon _i\left\langle X_i,A\right\rangle \right| \right) =8{\mathbb {E}}\left( \underset{A\in {\mathcal {C}}'(T)}{\sup }\left| \left\langle \Sigma _R,A\right\rangle \right| \right) \end{aligned}$$

where \(\Sigma _R=\dfrac{1}{n} {\sum \nolimits ^{n}_{i=1}} \epsilon _i X_i\). Now, to obtain a bound on \({\mathbb {E}}({\sup }_{A\in {\mathcal {C}}'(T)}\left| \left\langle \Sigma _R,A\right\rangle \right| )\) we will consider separately the cases \({\mathcal {C}}={\mathcal {C}}(\delta )\) and \({\mathcal {C}}= {\mathcal {D}}(\tau ,\kappa )\cap \{{\mathbb {R}}^{m_1\times m_2}\times {\mathcal {B}}(\delta _1,\delta _2)\}\).

Case I \(A\in {\mathcal {C}}(\delta )\) and \(\left\| A\right\| _{L_2(\Pi )}^{2}\le T\). By the definition of \({\mathcal {C}}(\delta )\) we have \(\mathcal {R}( A) \le \delta .\) Thus, by the duality between \(\mathcal {R}\) and \(\mathcal {R}^{*}\),

$$\begin{aligned} {\mathbb {E}}\left( Z_T\right) \le 8{\mathbb {E}}\left( \underset{\mathcal {R}( A)\le \delta }{\sup }\left| \left\langle \Sigma _R,A\right\rangle \right| \right) \le 8\delta \,{\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) . \end{aligned}$$

This and the concentration inequality (53) imply

$$\begin{aligned} {\mathbb {P}}\left( Z_T>\dfrac{5}{12}T+ {\mathcal {E}}\right) \le \exp (-c_5\,n\,T^{2}) \end{aligned}$$

with \(c_5=\dfrac{1}{128}\) and \({\mathcal {E}}=8\delta \,{\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) \) as stated.

Case II \(A=L+S\) where \((L,S)\in {\mathcal {D}}(\tau ,\kappa )\), \(S\in {\mathcal {B}}(\delta _1,\delta _2)\), and \(\Vert L+S\Vert ^{2}_{L_2(\Pi )}\le T\). Then, by the definition of \({\mathcal {B}}(\delta _1,\delta _2)\), we have \(\mathcal {R}(S)\le \delta _2.\) On the other hand, the definition of \({\mathcal {D}}(\tau ,\kappa )\) yields

$$\begin{aligned} \Vert L\Vert _{*}\le \sqrt{\tau }\Vert L_{\mathcal {I}}\Vert _{2}+\kappa \end{aligned}$$

and

$$\begin{aligned} \Vert L\Vert _{L_2(\Pi )}\le \Vert L+S\Vert _{L_2(\Pi )}+\Vert S\Vert _{L_2(\Pi )}\le \sqrt{T}+\delta _1. \end{aligned}$$

The last two inequalities imply

$$\begin{aligned} \Vert L\Vert _{*}\le \sqrt{\mu \,\vert \mathcal {I}\vert \,\tau }(\sqrt{T}+\delta _1)+\kappa :=\Gamma _{1}. \end{aligned}$$

Therefore we can write

$$\begin{aligned} {\mathbb {E}}\left( \underset{A\in {\mathcal {C}}'(T)}{\sup }\left| \left\langle \Sigma _R,A\right\rangle \right| \right)\le & {} 8{\mathbb {E}}\left( \underset{\Vert L\Vert _{*}\le \Gamma _{1} }{\sup }\left| \left\langle \Sigma _R,L\right\rangle \right| +\underset{\mathcal {R}( S)\le \delta _{2} }{\sup }\left| \left\langle \Sigma _R,S\right\rangle \right| \right) \\\le & {} 8\left\{ \Gamma _{1}\,{\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) +\delta _{2}\,{\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) \right\} . \end{aligned}$$

Combining this bound with the following elementary inequalities:

$$\begin{aligned} \frac{1}{9}\left( \frac{5}{12}T\right) +8\sqrt{\mu \,\vert \mathcal {I}\vert \,\tau \,T}\,{\mathbb {E}}\left( \Vert \Sigma _R\Vert \right)\le & {} \left( \frac{1}{9}+\frac{8}{9}\right) \frac{5}{12}T +44\mu \,\vert \mathcal {I}\vert \,\tau \left( {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \right) ^{2},\\ \delta _1\sqrt{\mu \,\vert \mathcal {I}\vert \,\tau }\,{\mathbb {E}}\left( \Vert \Sigma _R\Vert \right)\le & {} \mu \,\vert \mathcal {I}\vert \,\tau \left( {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \right) ^{2}+\frac{\delta _1^{2}}{2} \end{aligned}$$

and using the concentration bound (53) we obtain

$$\begin{aligned} {\mathbb {P}}\left( Z_T>\dfrac{5}{12}T+ {\mathcal {E}}\right) \le \exp (-c_5\,n\,T^{2}) \end{aligned}$$

with \(c_5=\dfrac{1}{128}\) and

$$\begin{aligned} {\mathcal {E}}= & {} 360\mu \,\vert \mathcal {I}\vert \,\tau \left( {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \right) ^{2}+4\delta _1^{2}+8\delta _2\,{\mathbb {E}}\left( \mathcal {R}^{*}(\Sigma _R)\right) +8\kappa {\mathbb {E}}\left( \Vert \Sigma _R\Vert \right) \end{aligned}$$
(54)

as stated. \(\square \)

1.2 A.2: Proof of Corollary 7

With \(\lambda _1\) and \(\lambda _2\) given by (16) we obtain

$$\begin{aligned} \Psi _1= & {} \mu ^{2}\ae ^{2}(\sigma \vee {\mathbf {a}})^{2}\dfrac{M\,r\,\log d}{N},\\ \Psi '_2\le & {} \mu ^{2}\ae ^{2}(\sigma \vee {\mathbf {a}})^{2}\log (d) \dfrac{\vert {\tilde{\Omega }}\vert }{N}+\dfrac{{\mathbf {a}}^{2}s}{m_2},\\ \Psi '_3= & {} \dfrac{\mu \,\ae \vert {\tilde{\Omega }}\vert ({\mathbf {a}}^{2}+\sigma ^{2}\log (d))}{N} +\dfrac{{\mathbf {a}}^{2}s}{m_2},\\ \Psi '_4\le & {} \dfrac{\mu \,\ae ^{2}\vert {\tilde{\Omega }}\vert ({\mathbf {a}}^{2}+\sigma ^{2}\log (d))}{N} +{\mathbf {a}}^{2}\,\sqrt{\dfrac{\log (d)}{n}}+\dfrac{{\mathbf {a}}^{2}s}{m_2}. \end{aligned}$$

Appendix B: Proof of Theorems 2 and 3

Note that the assumption \(\ae \le 1+s/m_2\) implies that

$$\begin{aligned} \dfrac{\vert {\tilde{\Omega }}\vert }{n}\le \dfrac{s}{m_2}. \end{aligned}$$
(55)

Assume w.l.o.g. that \(m_1\ge m_2\). For a constant \(\gamma \le 1\), define

$$\begin{aligned} {{\tilde{\mathcal {L}}}} =\Big \{{\tilde{L}} = (l_{ij})\in {\mathbb {R}}^{m_1\times r}: l_{ij}\in \Big \{0, \gamma (\sigma \wedge {\mathbf {a}}) \Big (\frac{ r M}{n}\Big )^{1/2}\Big \},\quad \forall 1\le i \le m_1,\, 1\le j\le r\Big \}, \end{aligned}$$

and consider the associated set of block matrices

$$\begin{aligned} {\mathcal {L}} \ =\ \{ L=({\tilde{L}}\quad \cdots \quad {\tilde{L}}\quad O) \in {\mathbb {R}}^{m_1\times m_2}: {\tilde{L}}\in {\tilde{\mathcal {L}}}\}, \end{aligned}$$

where O denotes the \(m_1\times (m_2-r\lfloor m_2/(2r) \rfloor )\) zero matrix, and \(\lfloor x \rfloor \) is the integer part of x.

We define similarly the set of matrices

$$\begin{aligned} {\tilde{\mathcal {S}}} \, =\{ {\tilde{S}}=(s_{ij})\in {\mathbb {R}}^{m_1\times s}: s_{ij}\in \big \{0, \gamma (\sigma \wedge {\mathbf {a}})\big \},\quad \forall 1\le i \le m_1,\, 1\le j\le s\}, \end{aligned}$$

and

$$\begin{aligned} {\mathcal {S}}\ =\ \{ S=({\tilde{O}}\quad {\tilde{S}})\in {\mathbb {R}}^{m_1\times m_2}: {\tilde{S}} \in {\tilde{\mathcal {S}}}\}, \end{aligned}$$

where \({\tilde{O}}\) is the \(m_1\times (m_2-s )\) zero matrix. We now set

$$\begin{aligned} {\mathcal {A}} = \left\{ A = L+S\,:\, L \in {\mathcal {L}},\, S \in {\mathcal {S}} \right\} . \end{aligned}$$

Remark 2

In the case \(m_1< m_2\), we only need to change the construction of the low rank component of the test set. We first introduce a matrix \({\tilde{L}}= \left( \begin{array}{c|c}{\bar{L}}&{}O\\ \end{array}\right) \in {\mathbb {R}}^{r \times m_2}\) where \({\bar{L}} \in {\mathbb {R}}^{r \times (m_2/2)}\) with entries in \(\{0, \gamma (\sigma \wedge {\mathbf {a}}) (\frac{ rM}{n} )^{1/2}\}\) and then we replicate this matrix to obtain a block matrix L of size \(m_1 \times m_2\)

$$\begin{aligned} L=\left( \begin{array}{c} {\tilde{L}}\\ \hline \\ \vdots \\ \hline \\ {\tilde{L}}\\ \hline \\ O \end{array} \right) . \end{aligned}$$

By construction, any element of \({\mathcal {A}}\), as well as the difference of any two elements of \({\mathcal {A}}\), can be decomposed into a low rank component L of rank at most r and a group sparse component S with at most s nonzero columns. In addition, the entries of any matrix in \({\mathcal {A}}\) take values in \([0,{\mathbf {a}}]\). Thus, \({\mathcal {A}}\subset {\mathcal {A}}_{GS}(r,s,{\mathbf {a}})\). A numerical illustration of this construction is sketched below.
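The following short sketch (illustrative parameter values only, with \(M=m_1\vee m_2\)) shows how one element of \({\mathcal {A}}\) is assembled in the case \(m_1\ge m_2\):

```python
# Sketch of one element A = L + S of the test set (illustrative values only).
import numpy as np

m1, m2, r, s, n = 40, 30, 3, 4, 500
M = max(m1, m2)                                  # M = m1 v m2
gamma, sigma, a = 0.5, 1.0, 1.0
amp = gamma * min(sigma, a) * np.sqrt(r * M / n) # entry size in L-tilde

rng = np.random.default_rng(1)
L_tilde = amp * rng.integers(0, 2, size=(m1, r)) # entries in {0, amp}
k = m2 // (2 * r)                                # number of replications
L = np.hstack([np.tile(L_tilde, (1, k)),         # (L~  ...  L~  O)
               np.zeros((m1, m2 - r * k))])

S_tilde = gamma * min(sigma, a) * rng.integers(0, 2, size=(m1, s))
S = np.hstack([np.zeros((m1, m2 - s)), S_tilde]) # (O~  S~)

A = L + S    # rank(L) <= r; S has at most s nonzero columns
```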

We first establish a lower bound of the order \(rM/n\). Let \({\tilde{\mathcal {A}}}\subset {\mathcal {A}}\) be such that for any \(A=L+S\in {\tilde{\mathcal {A}}}\) we have \(S={\mathbf {0}}\). The Varshamov–Gilbert bound (cf. Lemma 2.9 in [25]) guarantees the existence of a subset \({\mathcal {A}} ^{0}\subset {\tilde{\mathcal {A}}}\) with cardinality \({\mathrm {Card}}({\mathcal {A}} ^0) \ge 2^{(rM)/8}+1\) containing the zero \(m_{1}\times m_2\) matrix \(\mathbf{0}\) and such that, for any two distinct elements \(A_1\) and \(A_2\) of \({\mathcal {A}} ^{0}\),

$$\begin{aligned} \Vert A_1-A_2\Vert _{2}^2 \ge \frac{Mr}{8} \left( \gamma ^2(\sigma \wedge {\mathbf {a}})^2 \frac{Mr }{n} \right) \left\lfloor \frac{m_2}{r}\right\rfloor \ge \frac{\gamma ^2}{16}(\sigma \wedge {\mathbf {a}})^2\,m_1m_2 \,\frac{Mr}{n}. \end{aligned}$$
(56)

Since \(\xi _i\sim {\mathcal {N}}(0,\sigma ^2)\) we get that, for any \(A\in {\mathcal {A}} ^0\), the Kullback–Leibler divergence \(K\big ({{\mathbb {P}}}_{\mathbf{0}},{{\mathbb {P}}}_{A}\big )\) between \({{\mathbb {P}}}_{\mathbf{0}}\) and \({{\mathbb {P}}}_{A}\) satisfies

$$\begin{aligned} K\big ({{\mathbb {P}}}_{\mathbf{0}},{{\mathbb {P}}}_{A}\big )\ =\ \frac{\vert \Omega \vert }{2\sigma ^2}\Vert A\Vert _{L_2(\Pi )}^2 \le \frac{\mu _1\gamma ^2\,Mr}{2} \end{aligned}$$
(57)

where we have used Assumption 9. From (57) we deduce that the condition

$$\begin{aligned} \frac{1}{{\mathrm {Card}}({\mathcal {A}} ^0)-1} \sum _{A\in {\mathcal {A}} ^0}K({{\mathbb {P}}}_\mathbf{0},{{\mathbb {P}}}_{A})\ \le \ \frac{1}{16} \log \big ({\mathrm {Card}}({\mathcal {A}} ^0)-1\big ) \end{aligned}$$
(58)

is satisfied if \(\gamma >0\) is chosen as a sufficiently small numerical constant. In view of (56) and (58), the application of Theorem 2.5 in [25] implies

$$\begin{aligned} \inf _{({\hat{L}},{\hat{S}})}\sup _{\begin{array}{c} (L_0,S_0)\in \,{\mathcal {A}}_{GS}( r,s,{\mathbf {a}}) \end{array}} {\mathbb {P}}_{A_0}\left( \dfrac{\Vert {\hat{L}}-L_0\Vert _2^{2}}{m_1m_2} +\dfrac{\Vert {\hat{S}}-S_0\Vert _2^{2}}{m_1m_2} > \dfrac{C(\sigma \wedge {\mathbf {a}})^2\,Mr}{n} \right) \ \ge \ \beta \nonumber \\ \end{aligned}$$
(59)

for some absolute constant \(\beta \in (0,1)\).

We now prove the lower bound relative to the corruptions. Let \({\bar{\mathcal {A}}}\subset {\mathcal {A}}\) be such that for any \(A=L+S\in {\bar{\mathcal {A}}}\) we have \(L={\mathbf {0}}\). The Varshamov–Gilbert bound (cf. Lemma 2.9 in [25]) guarantees the existence of a subset \({\mathcal {A}} ^0\subset {\bar{\mathcal {A}}}\) with cardinality \({\mathrm {Card}}({\mathcal {A}} ^0) \ge 2^{(sm_1)/8}+1\) containing the zero \(m_1\times m_2\) matrix \(\mathbf{0}\) and such that, for any two distinct elements \(A_1\) and \(A_2\) of \({\mathcal {A}} ^0\),

$$\begin{aligned} \Vert S_1-S_2\Vert _{2}^2 \ge \frac{sm_1}{8} (\gamma ^2(\sigma \wedge {\mathbf {a}})^2 ) = \frac{\gamma ^2\,(\sigma \wedge {\mathbf {a}})^2\,s}{8m_2}\,m_1m_2. \end{aligned}$$
(60)

For any \(A\in {\mathcal {A}} ^0\), the Kullback–Leibler divergence between \({{\mathbb {P}}}_{\mathbf{0}}\) and \({{\mathbb {P}}}_{A}\) satisfies

$$\begin{aligned} K\big ({{\mathbb {P}}}_{\mathbf{0}},{{\mathbb {P}}}_{A}\big )\ =\ \frac{\vert {\tilde{\Omega }}\vert }{2\sigma ^2}\gamma ^2(\sigma \wedge {\mathbf {a}})^2\le \frac{\gamma ^2\,m_1s}{2} \end{aligned}$$

which implies that condition  (58) is satisfied if \(\gamma >0\) is chosen small enough. Thus, applying Theorem 2.5 in [25] we get

$$\begin{aligned} \inf _{({\hat{L}},{\hat{S}})}\sup _{\begin{array}{c} (L_0,S_0)\in \,{\mathcal {A}}_{GS}( r,s,{\mathbf {a}}) \end{array}} {\mathbb {P}}_{A_0}\left( \dfrac{\Vert {\hat{L}}-L_0\Vert _2^{2}}{m_1m_2} +\dfrac{\Vert {\hat{S}}-S_0\Vert _2^{2}}{m_1m_2} > \dfrac{C(\sigma \wedge {\mathbf {a}})^2\,s}{m_2} \right) \ \ge \ \beta \nonumber \\ \end{aligned}$$
(61)

for some absolute constant \(\beta \in (0,1)\). Theorem 2 follows from inequalities (55), (59) and (61).

The proof of Theorem 3 follows the same lines as that of Theorem 2. The only difference is that we replace \({\tilde{\mathcal {S}}}\) by the following set

$$\begin{aligned} \{ S=(s_{ij})\in {\mathbb {R}}^{m_1\times m_2}: s_{ij}\in \{0, \gamma (\sigma \wedge {\mathbf {a}}) \},\quad \forall 1\le i \le m_1,\, \lfloor m_2/2 \rfloor +1\le j\le m_2\}. \end{aligned}$$

We omit further details here.

Appendix C: Proof of Lemma 6

Part (i) of Lemma 6 is proved in Lemmas 5 and 6 in [18].

Proof of (ii)

For the sake of brevity, we set \(X_i(j,k) = \langle X_i, e_j(m_1)e_k(m_2)^{\top }\rangle \). By definition of \(\Sigma \) and \(\Vert \cdot \Vert _{2,\infty }\), we have

$$\begin{aligned} \Vert \Sigma \Vert _{2,\infty }^2 = \max _{1\le k \le m_2}\sum _{j=1}^{m_1} \left( \frac{1}{N}\sum _{i \in \Omega }\xi _{i}X_i(j,k) \right) ^2. \end{aligned}$$

For any fixed k, we have

$$\begin{aligned} \sum _{j=1}^{m_1}\left( \frac{1}{N}\sum _{i \in \Omega }\xi _{i}X_i(j,k) \right) ^2&= \frac{1}{N^2}\sum _{i_1, i_2\in \Omega }\xi _{i_1}\xi _{i_2}\sum _{j= 1}^{m_1} X_{i_1}(j,k)X_{i_2}(j,k)\nonumber \\&= \Xi ^\top A_k \Xi , \end{aligned}$$
(62)

where \(\Xi = (\xi _1,\ldots ,\xi _n)^\top \) and \(A_k\in {\mathbb {R}}^{|\Omega |\times |\Omega |}\) with entries

$$\begin{aligned} a_{i_1 i_2}(k) = \frac{1}{N^2}\sum _{j=1}^{m_1} X_{i_1}(j,k) X_{i_2}(j,k). \end{aligned}$$
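Although not needed for the proof, identity (62) is easy to check numerically. The following Python sketch does so on a small random instance; the dimensions, the uniform sampling of the positions, and the Gaussian noise level are illustrative assumptions, not quantities prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2, n, sigma = 5, 4, 30, 0.7   # hypothetical small dimensions and noise level
N = n                              # corresponds to taking ae = N/n = 1 here

# Sample positions uniformly at random: X_i = e_{J_i}(m1) e_{K_i}(m2)^T.
J = rng.integers(0, m1, size=n)
K = rng.integers(0, m2, size=n)
xi = rng.normal(0.0, sigma, size=n)

k = 2  # a fixed column index

# Left-hand side of (62): sum_j ( (1/N) sum_{i in Omega} xi_i X_i(j,k) )^2.
lhs = sum((np.sum(xi * (J == j) * (K == k)) / N) ** 2 for j in range(m1))

# Right-hand side of (62): Xi^T A_k Xi with
# a_{i1 i2}(k) = (1/N^2) sum_j X_{i1}(j,k) X_{i2}(j,k).
A = np.zeros((n, n))
for i1 in range(n):
    for i2 in range(n):
        if K[i1] == k and K[i2] == k and J[i1] == J[i2]:
            A[i1, i2] = 1.0 / N ** 2
rhs = xi @ A @ xi

assert np.isclose(lhs, rhs)  # the two expressions coincide
```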

We freeze the \(X_i\) and apply the version of the Hanson–Wright inequality given in [24] to get that there exists a numerical constant \(C\) such that, with probability at least \(1-e^{-t}\),

$$\begin{aligned} | \Xi ^{\top } A_k \Xi - {\mathbb {E}}[\Xi ^\top A_k \Xi \vert X_i] | \le C \sigma ^2 \left( \Vert A_k\Vert _2 \sqrt{t} + \Vert A_k\Vert t \right) . \end{aligned}$$
(63)

Next, we note that

$$\begin{aligned} \Vert A_k\Vert _2^2 = \sum _{i_1,i_2} a_{i_1 i_2}^2(k)&\le \frac{1}{N^4} \sum _{i_1 i_2} \left( \sum _{j_1=1}^{m_1} X_{i_1}^2 (j_1,k) \right) \left( \sum _{j_1=1}^{m_1} X_{i_2}^2 (j_1,k) \right) \\&\le \frac{1}{N^4}\left[ \sum _{i_1 }\sum _{j_1=1}^{m_1} X_{i_1}^2 (j_1,k) \right] ^2= \left[ \frac{1}{N^2}\sum _{i_1 }\sum _{j_1=1}^{m_1} X_{i_1} (j_1,k) \right] ^2, \end{aligned}$$

where we have used the Cauchy–Schwarz inequality in the first line and the relation \(X^{2}_{i} (j,k)=X_{i} (j,k)\).

Note that \(Z_i(k):= \sum _{j= 1}^{m_1} X_i(j,k)\) follows a Bernoulli distribution with parameter \(\pi _{\cdot k}\) and consequently \(Z(k) = \sum _{i\in \Omega } Z_{i}(k)\) follows a Binomial distribution \(B(|\Omega |,\pi _{\cdot k})\). We apply Bernstein’s inequality (see, e.g., [4, page 486]) to get that, for any \(t>0\),

$$\begin{aligned} {\mathbb {P}} \left( |Z(k) - {\mathbb {E}}[ Z(k)]| \ge 2\sqrt{ |\Omega | \pi _{\cdot k} t} + t \right) \le 2 e^{-t}. \end{aligned}$$
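As a quick numerical sanity check (not part of the proof), one can compare the empirical tail of \(Z(k)\) with this Bernstein bound; all parameter values in the following sketch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, pi_k, t = 500, 0.05, 3.0   # hypothetical |Omega|, pi_{.k} and t
reps = 100_000

# Z(k) ~ B(|Omega|, pi_{.k}); compare empirical tail with the bound 2 e^{-t}.
Z = rng.binomial(n_obs, pi_k, size=reps)
dev = 2.0 * np.sqrt(n_obs * pi_k * t) + t
emp_tail = np.mean(np.abs(Z - n_obs * pi_k) >= dev)

print(f"empirical tail: {emp_tail:.2e}   Bernstein bound: {2*np.exp(-t):.2e}")
```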

Combining this with the bound on \(\Vert A_k\Vert _2^2\) above, we get with probability at least \(1-2e^{-t}\) that

$$\begin{aligned} \Vert A_k\Vert _2^2 \le \left( \frac{|\Omega | \pi _{\cdot k} + 2\sqrt{|\Omega | \pi _{\cdot k} t} + t}{N^2}\right) ^2 \end{aligned}$$

and, using \(\Vert A_k\Vert \le \Vert A_k\Vert _{2}\), that

$$\begin{aligned} \Vert A_k\Vert \le \frac{|\Omega | \pi _{\cdot k} + 2\sqrt{|\Omega | \pi _{\cdot k} t} + t}{N^2}. \end{aligned}$$

Note also that, since the \(\xi _i\) are independent, centered with variance \(\sigma ^2\), and independent of the \(X_i\),

$$\begin{aligned} {\mathbb {E}}[\Xi ^\top A_k \Xi \big \vert X_i] = \sigma ^2\,{\mathrm {tr}}(A_k) = \frac{\sigma ^2}{N^2}\sum _{i\in \Omega }\sum _{j=1}^{m_1}X_i^2(j,k) = \frac{\sigma ^2}{N^2}Z(k). \end{aligned}$$

Combining the last three displays with (63) we get, up to a rescaling of the constants, with probability at least \(1-e^{-t}\) that

$$\begin{aligned} \sum _{j=1}^{m_1}\left( \frac{1}{N}\sum _{i \in \Omega }\xi _{i}X_i(j,k) \right) ^2 \le C \frac{\sigma ^2}{N^2} \left( |\Omega | \pi _{\cdot k} + 2\sqrt{|\Omega | \pi _{\cdot k} t} + t\right) (1+\sqrt{t}+t). \end{aligned}$$

Replacing t by \(t+\log m_2\) in the above display and using the union bound gives that, with probability at least \(1-e^{-t}\),

$$\begin{aligned} \Vert \Sigma \Vert _{2,\infty }\le & {} C \frac{\sigma }{N} \left( |\Omega | \pi _{\cdot k} + 2\sqrt{|\Omega | \pi _{\cdot k} (t+\log m_2)} + (t+\log m_2)\right) ^{1/2}\\&\times \,(1+\sqrt{t+\log m_2}+t+\log m_2)^{1/2}\\\le & {} C \frac{\sigma }{N}\left( \sqrt{|\Omega | \pi _{\cdot k}} + \sqrt{t+\log m_2}\right) \left( 1+\sqrt{t+\log m_2}\right) . \end{aligned}$$

Assuming that \(\log m_2 \ge 1\) we get with probability at least \(1-e^{-t}\) that

$$\begin{aligned} \Vert \Sigma \Vert _{2,\infty } \le C \frac{\sigma }{N} \left( \sqrt{ |\Omega | \pi _{\cdot k} (t+\log m_2)} + (t+\log m_2)\right) . \end{aligned}$$

Using (14), we get that there exists a numerical constant \(C>0\) such that, with probability at least \(1 -e^{-t}\),

$$\begin{aligned} \Vert \Sigma \Vert _{2,\infty } \le C \frac{\sigma }{N} \left( \sqrt{ \dfrac{\gamma ^{1/2}n(t+\log m_2)}{m_2}} + (t+\log m_2)\right) . \end{aligned}$$

Finally, we use Lemma 17 to obtain the required bound on \({\mathbb {E}}\Vert \Sigma \Vert _{2,\infty }\).
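In particular, using \(\sqrt{t+\log m_2}\le \sqrt{t}+\sqrt{\log m_2}\) together with \(\log m_2\ge 1\), the previous display takes the form \({\mathbb {P}}(Y>A+a_1\sqrt{t}+a_2t)\le e^{-t}\) required by Lemma 17 (with exponents \(\alpha _1=1/2\) and \(\alpha _2=1\)), and absorbing the Gamma-function factors into the numerical constant yields a bound of the form

$$\begin{aligned} {\mathbb {E}}\Vert \Sigma \Vert _{2,\infty }\ \le \ C\frac{\sigma }{N}\left( \sqrt{\dfrac{\gamma ^{1/2}n\log m_2}{m_2}} + \log m_2\right) . \end{aligned}$$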

Proof of (iii)

We follow the same lines as in the proof of part (ii) above; the only difference is that \(\xi _i\) is replaced by \(\epsilon _i\), \(\sigma \) by 1, and \(N\) by \(n\).

Proof of (iv)

We need to establish a bound on

$$\begin{aligned} \Vert W\Vert _{2,\infty }^2 = \max _{1\le k \le m_2}\sum _{j=1}^{m_1} \left( \frac{1}{N}\sum _{i \in \Omega }X_i(j,k) \right) ^2. \end{aligned}$$

For any fixed k, we have

$$\begin{aligned} \sum _{j=1}^{m_1}\left( \frac{1}{N}\sum _{i \in \Omega }X_i(j,k) \right) ^2&= \frac{1}{N^2}\sum _{i \in \Omega } \sum _{j= 1}^{m_1} X_i^2(j,k) + \frac{1}{N^2}\sum _{i_1\ne i_2}\sum _{j= 1}^{m_1} X_{i_1}(j,k)X_{i_2}(j,k). \end{aligned}$$

The first term on the right hand side of the last display can be written as

$$\begin{aligned} \frac{1}{N^2}\sum _{i \in \Omega } \sum _{j= 1}^{m_1} X_i^2(j,k) = \frac{1}{N^2}\sum _{i \in \Omega } \sum _{j= 1}^{m_1} X_i(j,k)=\frac{Z(k)}{N^2}. \end{aligned}$$

Using the concentration bound on Z(k) in the proof of part (ii) above, we get that, with probability at least \(1-e^{-t}\),

$$\begin{aligned} \frac{1}{N^2}\sum _{i \in \Omega } \sum _{j= 1}^{m_1} X_i^2(j,k) \le \frac{|\Omega |}{N^2} \pi _{\cdot k} + 2 \frac{\sqrt{|\Omega | \pi _{\cdot k}t}}{N^2} + \frac{t}{N^2}. \end{aligned}$$
(64)

Next, the random variable

$$\begin{aligned} U_2 = \frac{1}{N^2}\sum _{i_1\ne i_2}\sum _{j= 1}^{m_1} [X_{i_1}(j,k)X_{i_2}(j,k) - \pi _{j,k}^2] \end{aligned}$$

is a U-statistic of order 2. We now use a Bernstein-type concentration inequality for U-statistics. To this end, we set \(X_i(\cdot ,k) = (X_i(1,k),\ldots , X_i(m_1,k))^\top \) and

$$\begin{aligned} h(X_{i_1}(\cdot ,k),X_{i_2}(\cdot ,k)) = \sum _{j= 1}^{m_1} [X_{i_1}(j,k)X_{i_2}(j,k) - \pi _{j,k}^2]. \end{aligned}$$

Let \(e_0(m_1) = {\mathbf {0}}_{m_1}\) be the zero vector in \({\mathbb {R}}^{m_1}\). Note that \(X_i(\cdot ,k)\) takes values in \(\{ e_j(m_1),\, 0\le j \le m_1\}\). For any function \(g\,:\, \{ e_j(m_1),\, 0\le j \le m_1\}^2 \rightarrow {\mathbb {R}}\), we set \(\Vert g\Vert _{L^{\infty }} = \max _{0\le j_1,j_2\le m_1}|g(e_{j_1}(m_1),e_{j_2}(m_1))|\).

We will need the following quantities to control the tail behavior of \(U_2\):

$$\begin{aligned} {\mathbf {A}}&= \Vert h\Vert _{L^\infty },\\ {\mathbf {B}}^2&= \max \left\{ \left\| \sum _{i_1} {\mathbb {E}} h^2(X_{i_1}(\cdot ,k),\cdot ) \right\| _{L^\infty }, \left\| \sum _{i_2} {\mathbb {E}} h^2(\cdot ,X_{i_2}(\cdot ,k)) \right\| _{L^\infty } \right\} ,\\ {\mathbf {C}}&= \sum _{i_1\ne i_2} {\mathbb {E}} [ h^2(X_{i_1} (\cdot ,k),X_{i_2}'(\cdot ,k))]\quad \text {and}\\ {\mathbf {D}}&= \sup \left\{ {\mathbb {E}} \sum _{i_1\ne i_2} h\left[ X_{i_1}(\cdot ,k), X_{i_2}'(\cdot ,k)\right] f_{i_1} [X_{i_1}(\cdot ,k)] g_{i_2}[X_{i_2}'(\cdot ,k)],\right. \\&\qquad \left. {\mathbb {E}} \sum _{i_1} f_{i_1}^2(X_{i_1}(\cdot ,k))\le 1, {\mathbb {E}} \sum _{i_2} g_{i_2}^2(X_{i_2}'(\cdot ,k))\le 1 \right\} , \end{aligned}$$

where \(X_i'(\cdot ,k)\) are independent replications of \(X_i(\cdot ,k)\) and \(f_{i_1}\), \(g_{i_2}\,:\, {\mathbb {R}}^{m_1} \rightarrow {\mathbb {R}}\).

We now evaluate the above quantities in our particular setting. It is not hard to see that \({\mathbf {A}} =\max \{ \pi _{\cdot k}^{(2)} \,,\,1 - \pi _{\cdot k}^{(2)} \}\le 1\) where \(\pi _{\cdot k}^{(2)}= \sum _{j=1}^{m_1}\pi _{jk}^2 \). We also have that

$$\begin{aligned} {\mathbf {C}}&= \sum _{i_1\ne i_2} \left[ {\mathbb {E}} \left[ \left\langle X_{i_1}(\cdot ,k),X_{i_2}'(\cdot ,k)\right\rangle ^2 \right] - \left( \sum _{j=1}^{m_1}\pi _{jk}^2 \right) ^2\right] \\&= |\Omega |(|\Omega | - 1)\left[ {\mathbb {E}} \left[ \left\langle X_{i_1}(\cdot ,k),X_{i_2}'(\cdot ,k)\right\rangle \right] - \left( \sum _{j=1}^{m_1}\pi _{jk}^2 \right) ^2\right] \\&= |\Omega |(|\Omega | - 1) \left[ \sum _{j=1}^{m_1}\pi _{jk}^2 - \left( \sum _{j=1}^{m_1}\pi _{jk}^2 \right) ^2\right] \le |\Omega |(|\Omega | - 1) \pi _{\cdot k}^{(2)}, \end{aligned}$$

where we have used in the second line that \(\langle X_{i_1}(\cdot ,k),X_{i_2}'(\cdot ,k)\rangle ^2 \!=\! \langle X_{i_1}(\cdot ,k),X_{i_2}'(\cdot ,k)\rangle \) since \(\langle X_{i_1}(\cdot ,k),X_{i_2}'(\cdot ,k)\rangle \) takes values in \(\{0,1\}\).

We now derive a bound on \({\mathbf {D}}\). By Jensen’s inequality, we get

$$\begin{aligned} \sum _{i} \sqrt{{\mathbb {E}} \left[ f_{i}^2(X_i(\cdot ,k))\right] } \le |\Omega |^{1/2} \sqrt{{\mathbb {E}} \left[ \sum _{i}f_{i}^2(X_i(\cdot ,k))\right] } \le |\Omega |^{1/2} \end{aligned}$$

where we used the bound \({\mathbb {E}} [\sum _{i}f_{i}^2(X_i(\cdot ,k))]\le 1\). Thus, the Cauchy–Schwarz inequality implies

$$\begin{aligned} {\mathbf {D}}&\le \sum _{i_1\ne i_2} {\mathbb {E}}^{1/2} \left[ h^2(X_{i_1},X_{i_2}')\right] {\mathbb {E}}^{1/2} \left[ f^{2}_{i_1}(X_{i_1}(\cdot ,k))\right] {\mathbb {E}}^{1/2} \left[ g^{2}_{i_2}(X_{i_2}'(\cdot ,k))\right] \\&\le \max _{i_1\ne i_2} \left\{ {\mathbb {E}}^{1/2} \left[ h^2(X_{i_1},X_{i_2}')\right] \right\} \sum _{i_1,i_2} {\mathbb {E}}^{1/2} \left[ f^{2}_{i_1}(X_{i_1}(\cdot ,k))\right] {\mathbb {E}}^{1/2} \left[ g^{2}_{i_2}(X_{i_2}'(\cdot ,k))\right] \\&\le \max _{i_1\ne i_2} \left\{ {\mathbb {E}}^{1/2} \left[ h^2(X_{i_1},X_{i_2}')\right] \right\} |\Omega | \\&\le |\Omega |\left( \sum _{j=1}^{m_1}\pi _{jk}^2\right) ^{1/2} = |\Omega |\left[ \pi _{\cdot k}^{(2)}\right] ^{1/2}, \end{aligned}$$

where we have used the fact that \({\mathbb {E}} [h^2(X_{i_1},X_{i_2}')] \le \sum _{j=1}^{m_1}\pi _{jk}^2\) following from an argument similar to that used to bound \({\mathbf {C}}\).

Finally, we get a bound on \({\mathbf {B}}\). Set \(\pi _{0k} = 1 - \pi _{\cdot k}\). Note first that

$$\begin{aligned} \left\| \sum _{i_1} {\mathbb {E}} h^2(X_{i_1}(\cdot ,k),\cdot ) \right\| _{L^\infty }&= |\Omega | \max _{0\le j'\le m_1} \left\{ \sum _{j=0}^{m_1} h^{2}(e_j(m_1),e_{j'}(m_1))\pi _{jk} \right\} \\&\le |\Omega | (\pi _{\cdot k}^{(2)} )^2 + |\Omega |\max _{1\le j' \le m_1}\pi _{j',k}. \end{aligned}$$

By symmetry, we obtain the same bound on \(\Vert \sum _{i_2} {\mathbb {E}} h^2(\cdot ,X_{i_2}(\cdot ,k)) \Vert _{L^\infty } \). Thus, using \(\sqrt{x+y}\le \sqrt{x}+\sqrt{y}\), we have

$$\begin{aligned} {\mathbf {B}}&\le |\Omega |^{1/2}\left( \pi _{\cdot k}^{(2)} + \max _{1\le j' \le m_1}\pi ^{1/2}_{j',k}\right) . \end{aligned}$$

Recall from the definition of \(U_2\) that \(N^2U_2 = \sum _{i_1\ne i_2}h(X_{i_1}(\cdot ,k), X_{i_2}(\cdot ,k))\). We apply a decoupling argument (see, for instance, Theorem 3.4.1, page 125 in [11]) to get that there exists a constant \(C>0\) such that, for any \(u>0\),

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i_1\ne i_2}h\left( X_{i_1}(\cdot ,k), X_{i_2}(\cdot ,k)\right) \ge u \right) \le C {\mathbb {P}}\left( \sum _{i_1\ne i_2}h(X_{i_1}(\cdot ,k), X_{i_2}^{'}(\cdot ,k)) \ge u/C\right) , \end{aligned}$$

where \(X_i'(\cdot ,k)\) is independent of \(X_i(\cdot ,k)\) and has the same distribution as \(X_i(\cdot ,k)\). Next, Theorem 3.3 in [13] gives that, for any \(u>0\),

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i_1\ne i_2}h(X_{i_1}(\cdot ,k), X_{i_2}^{'}(\cdot ,k)) \ge u\right) \le C \exp \left[ - \frac{1}{C}\min \left( \frac{u^2}{{\mathbf {C}}^2},\frac{u}{{\mathbf {D}}},\frac{u^{2/3}}{{\mathbf {B}}^{2/3}}, \frac{u^{1/2}}{{\mathbf {A}}^{1/2}}\right) \right] , \end{aligned}$$
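To invert this tail bound at level \(t>0\), one may choose

$$\begin{aligned} u\ =\ C\left( {\mathbf {C}}\,t^{1/2} + {\mathbf {D}}\,t + {\mathbf {B}}\,t^{3/2} + {\mathbf {A}}\,t^{2}\right) , \end{aligned}$$

so that each of the four terms inside the minimum is at least \(t\); up to a rescaling of the constants, the corresponding event then has probability at most \(e^{-t}\).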

for some absolute constant \(C>0\). Combining the last display with our bounds on \({\mathbf {A}},{\mathbf {B}},{\mathbf {C}},{\mathbf {D}}\), we get that for any \(t>0\), with probability at least \(1- 2e^{-t}\),

$$\begin{aligned} \left| \frac{1}{N^2}\sum _{i_1\ne i_2}\sum _{j= 1}^{m_1} X_{i_1}(j,k) X_{i_2}(j,k)\right|&\le \frac{|\Omega |(|\Omega | - 1)}{N^2}\pi _{\cdot k}^{(2)}\\&\qquad + \frac{C}{N^2} \left( {\mathbf {C}} t^{1/2}+ {\mathbf {D}} t + {\mathbf {B}} t^{3/2} + {\mathbf {A}} t^2 \right) \\&\le \frac{|\Omega |(|\Omega | - 1)}{N^2}\pi _{\cdot k}^{(2)}\\&\qquad + C \biggl [ \frac{|\Omega |(|\Omega | - 1)}{N^2}\pi _{\cdot k}^{(2)} t^{1/2} + \frac{|\Omega |}{N^2}\left( \pi _{\cdot k}^{(2)} \right) ^{1/2} t \\&\qquad + \frac{|\Omega |^{1/2}}{N^2}\left( \pi _{\cdot k}^{(2)} + \max _{1\le j' \le m_1}\pi ^{1/2}_{j',k}\right) t^{3/2} +\frac{ t^2}{N^2} \biggr ], \end{aligned}$$

where \(C>0\) is a numerical constant. Combining the last display with (64) we get that, for any \(t>0\) with probability at least \(1-3e^{-t}\),

$$\begin{aligned} \sum _{j=1}^{m_1}\left( \frac{1}{N} \sum _{i \in \Omega } X_i(j,k)\right) ^2&\le \frac{|\Omega |(|\Omega | - 1)}{N^2}\pi _{\cdot k}^{(2)}\\&\qquad +C \biggl [ \left( \frac{|\Omega |(|\Omega | - 1)}{N^2}\pi _{\cdot k}^{(2)} + \frac{2\sqrt{|\Omega | \pi _{\cdot k}}}{N^2}\right) t^{1/2} \\&\qquad + \frac{|\Omega |}{N^2} \pi _{\cdot k} + \left( \frac{|\Omega |}{N^2}\left( \pi _{\cdot k}^{(2)} \right) ^{1/2} + \frac{1}{N^2}\right) t\\&\qquad + \frac{|\Omega |^{1/2}}{N^2}\left( \pi _{\cdot k}^{(2)} + \max _{1\le j' \le m_1}\pi ^{1/2}_{j',k}\right) t^{3/2} +\frac{ t^2}{N^2} \biggr ]. \end{aligned}$$

Set \(\pi _{\max } = \max _{1\le k \le m_2}\{\pi _{\cdot k}\}\) and \(\pi ^{(2)}_{\max } = \max _{1\le k \le m_2}\{\pi _{\cdot k}^{(2)}\}\). Using the union bound and up to a rescaling of the constants, we get that, with probability at least \(1 - e^{-t}\),

$$\begin{aligned} \Vert W\Vert _{2,\infty }^2&\le \frac{|\Omega |(|\Omega | - 1)}{N^2}\pi ^{(2)}_{\max }\\&\qquad + C \left[ \left( \frac{|\Omega |(|\Omega | - 1)}{N^2}\pi ^{(2)}_{\max } + \frac{2\sqrt{|\Omega | \pi _{\max }}}{N^2} \right) (t+\log m_2)^{1/2}\right. \\&\qquad \left. +\frac{|\Omega |}{N^2} \pi _{\max }+ \frac{|\Omega |}{N^2} \left( \pi ^{(2)}_{\max } \right) ^{1/2} (t+\log m_2) \right. \\&\qquad \left. + \frac{|\Omega |^{1/2}}{N^2}\left( \pi ^{(2)}_{\max } + \max _{j,k}\{\pi _{jk}^{1/2}\}\right) (t+\log m_2)^{3/2} +\frac{ (t+\log m_2)^2}{N^2} \right] . \end{aligned}$$

Recall that \(\vert \Omega \vert =n\) and \(\ae =N/n\). Assumption 5 and the fact that \(n\le \vert \mathcal {I}\vert \) imply that there exists a numerical constant \(C>0\) such that, with probability at least \(1 -e^{-t}\),

$$\begin{aligned} \Vert W\Vert _{2,\infty }^2&\le C \left( \frac{\gamma ^{2} }{\ae N m_2}\left( \sqrt{t+\log m_2}+(t+\log m_2)\sqrt{\frac{m_2}{n}}\right) + \frac{(t+\log m_2)^{2}}{N^{2}} \right) \end{aligned}$$

where we have used that \(\pi _{j,k} \le \pi _{\cdot k} \le \sqrt{2}\gamma /m_2\). Finally, the bound on the expectation \({\mathbb {E}} \Vert W\Vert _{2,\infty }\) follows from this result and Lemma 17.

Appendix D: Proof of Lemma 10

With the notation \(X_i(j,k) = \langle X_i, e_j(m_1)e_k(m_2)^{\top }\rangle \) we have

$$\begin{aligned} \Vert \Sigma \Vert _{\infty } = \max _{1\le j\le m_1,1\le k\le m_2} \left| \frac{1}{N}\sum _{i \in \Omega } \xi _i X_i(j,k) \right| . \end{aligned}$$

Under Assumption 3, the Orlicz norm \(\Vert \xi _i\Vert _{\psi _2}= \inf \{x>0: {\mathbb {E}}[\exp ((\xi _i/x)^2)] \le e\}\) satisfies \(\Vert \xi _i\Vert _{\psi _2} \le c\sigma \) for some numerical constant \(c>0\) and all \(i\). This and the relation (see Lemma 5.5 in [26])

$$\begin{aligned} {\mathbb {E}}[|\xi _i|^\ell ] \le \frac{\ell }{2} \Gamma \left( \frac{\ell }{2}\right) \Vert \xi _i\Vert _{\psi _2}^\ell ,\quad \forall \ell \ge 1, \end{aligned}$$

imply that \(N^{-\ell }{\mathbb {E}}[|\xi _i|^\ell X_i^\ell (j,k)] = N^{-\ell } {\mathbb {E}}[X_i(j,k)] {\mathbb {E}}[|\xi _i|^\ell ]\le (\ell !/2) c^2v(c\sigma /N)^{\ell -2}\) for all \(\ell \ge 2\) and \(v=\frac{\sigma ^2\mu _1}{N^2 m_1m_2}\), where we have used the independence between \(\xi _i\) and \(X_i\), and Assumption 9. Thus, for any fixed (jk), we have

$$\begin{aligned} \sum _{i\in \Omega }{\mathbb {E}} \left[ \frac{1}{N^2} \xi _i^2 X_i^2(j,k) \right] \le |\Omega |\frac{c^2\sigma ^2\mu _1}{N^2 m_1m_2}=\frac{c^2\mu _1 \sigma ^2}{\ae N m_1m_2} =:v_1, \end{aligned}$$

and

$$\begin{aligned} \sum _{i\in \Omega }{\mathbb {E}} \left[ \frac{1}{N^\ell } |\xi _i|^\ell X_i^\ell (j,k) \right] \le \frac{\ell !}{2}v_1\left( \frac{c\sigma }{ N}\right) ^{\ell -2}. \end{aligned}$$

Thus, we can apply Bernstein’s inequality (see, e.g. [4, page 486]), which yields

$$\begin{aligned} {\mathbb {P}} \left( \left| \frac{1}{N}\sum _{i \in \Omega }\xi _i X_i(j,k) \right| > C\left( \sqrt{ \frac{\mu _1 \sigma ^2 t}{\ae N m_1m_2}} + \frac{\sigma t}{N}\right) \right) \le 2e^{-t} \end{aligned}$$

for any fixed (jk). Replacing here t by \(t+\log (m_1m_2)\) and using the union bound we obtain

$$\begin{aligned} {\mathbb {P}} \left( \left\| \Sigma \right\| _{\infty } > C\left( \sqrt{ \frac{\mu _1 \sigma ^2(t+\log (m_1m_2))}{\ae N m_1m_2}} + \frac{\sigma (t+\log (m_1m_2))}{N}\right) \right) \le 2e^{-t}. \end{aligned}$$

The bound on \({\mathbb {E}}[\Vert \Sigma \Vert _{\infty }]\) in the statement of Lemma 10 follows from this inequality and Lemma 17. The same argument proves the bounds on \(\Vert \Sigma _R\Vert _{\infty }\) and \({\mathbb {E}} \Vert \Sigma _R\Vert _{\infty }\) in the statement of Lemma 10. By a similar (and even somewhat simpler) argument, we also get that

$$\begin{aligned} {\mathbb {P}} \left( \left\| W - {\mathbb {E}}[ W] \right\| _{\infty } > C\left( \sqrt{ \frac{\mu _1 (t+\log (m_1m_2))}{\ae N m_1m_2}} + \frac{t+\log (m_1m_2)}{ N}\right) \right) \le 2e^{-t} \end{aligned}$$

while Assumption 9 implies that \(\Vert {\mathbb {E}}[W]\Vert _{\infty } \le \frac{\mu _1}{\ae m_1m_2}\).
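The latter bound can be verified directly: writing \(\pi _{jk}={\mathbb {E}}[X_i(j,k)]\) and using, as above, the bound \(\pi _{jk}\le \mu _1/(m_1m_2)\) from Assumption 9 together with \(\vert \Omega \vert =n\) and \(\ae =N/n\),

$$\begin{aligned} \Vert {\mathbb {E}}[W]\Vert _{\infty }\ =\ \max _{j,k}\frac{1}{N}\sum _{i\in \Omega }\pi _{jk}\ =\ \frac{n}{N}\max _{j,k}\pi _{jk}\ \le \ \frac{\mu _1}{\ae \,m_1m_2}. \end{aligned}$$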

Appendix E: Technical Lemmas

Lemma 17

Let \(Y\) be a non-negative random variable. Assume that there exist \(A\ge 0\) and \(a_j>0\), \(\alpha _j>0\), \(1\le j\le m\), such that

$$\begin{aligned} {\mathbb {P}}\left( Y> A + \sum _{j=1}^m a_j t^{\alpha _j}\right) \le e^{-t},\quad \forall t>0. \end{aligned}$$

Then

$$\begin{aligned} {\mathbb {E}} [Y] \le A + \sum _{j=1}^m a_j \alpha _j \Gamma (\alpha _j), \end{aligned}$$

where \(\Gamma (\cdot )\) is the Gamma function.

Proof

Using the change of variable \( u = \sum _{j=1}^m a_j v^{\alpha _j} \) we get

$$\begin{aligned} {\mathbb {E}} [Y]&= \int _{0}^{\infty }{\mathbb {P}}(Y>t)dt \le A + \int _{0}^{\infty }{\mathbb {P}}(Y>A + u)du\\&= A+ \int _{0}^{\infty }{\mathbb {P}}(Y>A + \sum _{j=1}^m a_j v^{\alpha _j}) \left( \sum _{j=1}^m a_j \alpha _j v^{\alpha _j-1}\right) dv\\&\le A+ \int _{0}^{\infty }\left( \sum _{j=1}^m a_j \alpha _j v^{\alpha _j-1}\right) e^{-v} dv = A+ \sum _{j=1}^m a_j \alpha _j \Gamma (\alpha _j). \end{aligned}$$

\(\square \)
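As an illustration of how Lemma 17 is used in the proofs above: if \(Y\ge 0\) satisfies \({\mathbb {P}}(Y>A+a_1\sqrt{t}+a_2t)\le e^{-t}\) for all \(t>0\), then taking \(m=2\), \(\alpha _1=1/2\) and \(\alpha _2=1\) gives

$$\begin{aligned} {\mathbb {E}}[Y]\ \le \ A + \frac{a_1}{2}\,\Gamma \left( \frac{1}{2}\right) + a_2\,\Gamma (1)\ =\ A + \frac{\sqrt{\pi }}{2}\,a_1 + a_2. \end{aligned}$$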

Lemma 18

Assume that \(\mathcal {R}\) is an absolute norm. Then

$$\begin{aligned} \mathcal {R}^{*}\left( \frac{1}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta L\right\rangle X_i\right) \le 2{\mathbf {a}}\mathcal {R}^{*}(W) \end{aligned}$$

where \(W=\frac{1}{N}\sum _{i\in \Omega }X_{i}\).

Proof

In view of the definition of \(\mathcal {R}^{*}\),

$$\begin{aligned} \frac{1}{2{\mathbf {a}}}\mathcal {R}^{*}\left( \dfrac{1}{N}\sum _{i\in \Omega } \left\langle X_i,\Delta L\right\rangle X_i\right)= & {} \underset{\mathcal {R}(B)\le 1}{\sup }\left\langle \dfrac{1}{N}\sum _{i\in \Omega } \frac{\left\langle X_i, \Delta L\right\rangle }{2{\mathbf {a}}}X_i,B\right\rangle \\\le & {} \underset{\mathcal {R}(B')\le 1}{\sup }\left\langle \dfrac{1}{N}\sum _{i\in \Omega } X_i,B'\right\rangle = \mathcal {R}^{*}(W), \end{aligned}$$

where we have used the inequalities \(\vert \left\langle X_i,\Delta L\right\rangle \vert \le \Vert \Delta L\Vert _{\infty }\le 2{\mathbf {a}}\) and the fact that \(\mathcal {R}\) is an absolute, hence monotonic, norm (see [3]). \(\square \)

Cite this article

Klopp, O., Lounici, K. & Tsybakov, A.B. Robust matrix completion. Probab. Theory Relat. Fields 169, 523–564 (2017). https://doi.org/10.1007/s00440-016-0736-y