1 Introduction

Gaussian Elimination is a classical algorithm for solving systems of linear equations [6, Chapter 3], [7, Chapter 9]. The simplest form of the algorithm—the Gaussian Elimination with no pivoting—solves a linear system (SLE) \(Ax=b\) with a square coefficient matrix A by performing the LU factorization: A is represented as the product LU, where L and U are lower and upper triangular matrices, respectively, and x is obtained by a combination of forward and back substitutions \(y:=L^{-1}b\), \(x:=U^{-1}y\). A possible algorithmic representation of this well-known process is given below in Algorithm 1. The procedure produces a sequence of matrices \(A^{(0)}:=A,A^{(1)},\dots ,A^{(n-1)}=:U\), where for every \(k\le n-1\), the \(k\times k\) top left submatrix of \(A^{(k-1)}\) is upper triangular. The elimination process with no pivoting fails if at any step \(k=1,\dots ,n-1\), the kth diagonal element of \(A^{(k-1)}\) is zero.

The computation of \(A^{(k)}\) from \(A^{(k-1)}\) can be represented in matrix form as

$$\begin{aligned} A^{(k)}:= (\textrm{Id}_n-\tau ^{(k)}e_k^\top )A^{(k-1)}, \end{aligned}$$

where

$$\begin{aligned} \tau ^{(k)}:= \sum _{j=k+1}^n \tau ^{(k)}_j e_j, \end{aligned}$$

where \(\tau ^{(k)}_j:=A^{(k-1)}_{j,k}\big /A^{(k-1)}_{k,k}\), \(j=k+1,\dots ,n\), are the multipliers of the kth elimination step, and where \(e_1,\dots ,e_n\) are the standard unit basis vectors in \({\mathbb {R}}^n\). We also note that with this notation

$$\begin{aligned} L=\textrm{Id}_n+\sum _{k=1}^{n-1} \tau ^{(k)}e_k^\top . \end{aligned}$$

The matrices \(\textrm{Id}_n-\tau ^{(k)}e_k^\top \), \(k=1,\dots ,n-1\), are called the Gauss transformations.

Algorithm 1 (LU-factorization)
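
For concreteness, here is a minimal NumPy sketch of the elimination process just described; it is not the pseudocode of Algorithm 1 (which is displayed as a figure), and the function name, dimensions and seed are our own illustrative choices.

```python
import numpy as np

def lu_no_pivoting(A):
    """LU factorization with no pivoting: A = L U, with L unit lower triangular.

    Mirrors the sequence A^(0), A^(1), ..., A^(n-1) = U described above;
    raises an error if a zero pivot is encountered.
    """
    n = A.shape[0]
    L, U = np.eye(n), np.array(A, dtype=float)
    for k in range(n - 1):
        if U[k, k] == 0.0:
            raise ZeroDivisionError("zero pivot: elimination with no pivoting fails")
        tau = U[k + 1:, k] / U[k, k]                 # multipliers tau^(k)_j
        L[k + 1:, k] = tau
        U[k + 1:, k:] -= np.outer(tau, U[k, k:])     # Gauss transformation step
    return L, U

# solving A x = b: y = L^{-1} b, x = U^{-1} y (computed here with a generic solver)
rng = np.random.default_rng(0)
A, b = rng.standard_normal((5, 5)), rng.standard_normal(5)
L, U = lu_no_pivoting(A)
x = np.linalg.solve(U, np.linalg.solve(L, b))
print(np.allclose(A @ x, b))
```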

When considering an implementation in floating point arithmetic, a well-known issue with the Gaussian Elimination is its numerical instability. Recall that the condition number \(\kappa (A)\) of a square matrix A is defined as the ratio of the largest and smallest singular values of A. Even for some well-conditioned matrices (i.e., matrices having a small condition number), solving an SLE with the help of the Gaussian Elimination with no pivoting results in large relative errors in the computed solution vectors [6, Section 3.3].

Several modifications of the elimination procedure are commonly used in matrix computations to address the instability issue [6, Chapter 3], [7, Chapter 9]. In particular, the Gaussian Elimination with Partial Pivoting (GEPP) looks for a representation \(PA=LU\) (called the PLU factorization), where, as before, L and U are lower and upper triangular matrices, while P is a specially constructed permutation matrix. The solution of the corresponding SLE can then be obtained by a combination of forward and back substitutions, and a permutation of the vector’s components (see Algorithm 2; for better readability, we represent the formula for L there in matrix rather than entry-wise form). The GEPP succeeds in exact arithmetic whenever A is non-singular (although it may fail in floating point arithmetic).

Algorithm 2 (PLU-factorization)
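
Similarly, a minimal sketch of partial pivoting (producing \(PA=LU\)) might look as follows; the row-swap bookkeeping and the encoding of the permutation below are our own illustrative choices, not the exact statement of Algorithm 2.

```python
import numpy as np

def plu_gepp(A):
    """GEPP sketch: returns P, L, U with P A = L U (up to roundoff)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L, perm = np.eye(n), np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))      # pivot: largest |entry| in column k
        if p != k:                               # swap rows k and p
            U[[k, p], k:] = U[[p, k], k:]
            L[[k, p], :k] = L[[p, k], :k]
            perm[[k, p]] = perm[[p, k]]
        tau = U[k + 1:, k] / U[k, k]
        L[k + 1:, k] = tau
        U[k + 1:, k:] -= np.outer(tau, U[k, k:])
    return np.eye(n)[perm], L, U                 # P, L, U

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
P, L, U = plu_gepp(A)
print(np.allclose(P @ A, L @ U))
```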

A seminal result of Wilkinson [21] gives an upper bound on the backward error of the Gaussian Elimination when floating point computations are performed. Define the unit roundoff

$$\begin{aligned} \textbf{u}:=\frac{1}{2}\big (\hbox {the gap between } 1 \hbox { and the next floating point number}\big ), \end{aligned}$$

so that for every real number x, its floating point representation \(\textrm{fl}(x)\) satisfies \(|x-\textrm{fl}(x)|\le \textbf{u}|x|\) as long as no underflow or overflow exception arises [6, Section 2.7]. Let A be an invertible \(n\times n\) matrix, assume that GEPP in floating point arithmetic with no underflow and overflow exceptions is performed on the matrix \(\textrm{fl}(A)\), and assume that no error occurs during the computation (i.e., no zero pivots are encountered). Let \({\hat{P}},{\hat{L}},{\hat{U}}\) be the matrices computed by the floating point GEPP of \(\textrm{fl}(A)\), with \({\hat{U}}_{n,n}\ne 0\), and let PLU be the PLU–factorization of A in exact arithmetic. Assume that \({\hat{P}}=P\). Further, let \({\hat{x}}\) denote the computed solution of the SLE \(Ax=b\), whose exact solution is x. Then

$$\begin{aligned} PA={\hat{L}} {\hat{U}}+H,\quad (A+E){\hat{x}}=b \quad \hbox {(equalities hold in exact arithmetic)}, \end{aligned}$$

where

$$\begin{aligned} \Vert H\Vert = O\big (n^2\,\textbf{u}\,(\Vert A\Vert +n\Vert {\hat{U}}\Vert )\big ), \end{aligned}$$

and

$$\begin{aligned} \Vert E\Vert =O\big (n^2\,\textbf{u}\,(\Vert A\Vert +n\Vert {\hat{U}}\Vert )\big ) \end{aligned}$$

(see, in particular, [6, Theorem 3.3.1 and Theorem 3.3.2]). Define the growth factor \(\mathbf{g_{\textrm{GEPP}}}\) as

$$\begin{aligned} \mathbf{g_{\textrm{GEPP}}}(A) :=\frac{ \max _{k,i,j \in [n]} |{\hat{A}}^{(k-1)}_{i,j}|}{\max _{i,j\in [n]} |A_{i,j}|}, \end{aligned}$$
(1)

where \({\hat{A}}^{(1)},\dots ,{\hat{A}}^{(n-1)}\) denote the computed (in floating point arithmetic) matrices \(A^{(1)},\dots ,A^{(n-1)}\). Then, under the above assumptions, the backward error estimate can be written as

$$\begin{aligned} \Vert E\Vert =O\big (n^4 \,\textbf{u}\,g_{\textrm{GEPP}}(A)\,\max _{i,j\in [n]} |A_{i,j}|\big ), \end{aligned}$$

implying, under the additional assumption \(s_{\min }(A)\ge 2\Vert E\Vert \), the forward error bound for the computed solution

$$\begin{aligned} \frac{\Vert {\hat{x}} - x\Vert _2}{\Vert {\hat{x}}\Vert _2} = O\big (n^4 \,\textbf{u}\,\kappa (A)\, \mathbf{g_{\textrm{GEPP}}}(A)\big ), \end{aligned}$$
(2)

where \(\kappa (A)=\Vert A\Vert \,\Vert A^{-1}\Vert \) is the condition number of A. Similar error bounds are available for other versions of the Gaussian Elimination (with no pivoting, with complete or with rook pivoting). We refer, in particular, to Wilkinson’s paper [21] and to modern accounts of the backward error analysis of the different forms of the Gaussian Elimination in [6, Chapter 3], [7, Chapter 9], as well as [8].

It can be checked that \(\textbf{g}_{\textrm{GEPP}}(A)=O(2^n)\) for any \(n\times n\) invertible matrix A, and that this bound is attained. Thus, (2) provides a satisfactory worst-case estimate only under the assumption \(\textbf{u}\ll 2^{-n}\), i.e., when the unit roundoff is exponentially small in the matrix dimension. At the same time, the accumulated empirical evidence suggests that for a “typical” coefficient matrix the loss of precision is much smaller than the worst-case prediction. Let us quote [6, p. 131]: “Although there is still more to understand about [the growth factor], the consensus is that serious element growth in Gaussian Elimination with Partial Pivoting is extremely rare. The method can be used with confidence.”

In [18] Trefethen and Schreiber carried out an empirical study of the Gaussian Elimination with Partial and with Complete Pivoting in the setting when the input coefficient matrix A is random, having i.i.d standard Gaussian entries. Their experiments showed that with high probability the growth factor in GEPP is only polynomially large in n. Further numerical studies by Edelman suggest that \(\textbf{g}_{\textrm{GEPP}}(A)\) of an \(n\times n\) standard Gaussian matrix A is of order \(O(n^{1/2+o(1)})\) with probability close to one (see a remark in [5, p. 182]).
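
A small experiment in this spirit is easy to set up; the sketch below tracks the floating point growth factor (1) along the elimination and reports its median over a few Gaussian matrices per dimension (the dimensions, sample sizes and seed are arbitrary choices of ours).

```python
import numpy as np

def growth_factor_gepp(A):
    """Growth factor (1): max_{k,i,j} |A^(k)_{i,j}| / max_{i,j} |A_{i,j}|,
    tracked over the floating point intermediate matrices of partial pivoting."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    peak = np.abs(U).max()
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))
        U[[k, p], k:] = U[[p, k], k:]
        U[k + 1:, k:] -= np.outer(U[k + 1:, k] / U[k, k], U[k, k:])
        peak = max(peak, np.abs(U).max())
    return peak / np.abs(A).max()

rng = np.random.default_rng(0)
for n in (50, 100, 200, 400):
    g = [growth_factor_gepp(rng.standard_normal((n, n))) for _ in range(20)]
    print(n, round(float(np.median(g)), 2))   # typically grows quite slowly with n
```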

An important step in improving the theoretical understanding of the numerical stability of the Gaussian Elimination was made by Yeung and Chan [22]. Their result implies (although that is not explicitly stated in the paper) that for the Gaussian Elimination with no pivoting applied to the standard \(n\times n\) Gaussian matrix, the relative error of the solution vector can be bounded above by \(\textbf{u}\,n^{O(1)}\) with probability close to one. A vast generalization of their estimate was obtained by Sankar et al. [15] in the context of the smoothed analysis of algorithms. Let M be any non-random \(n\times n\) matrix, and let G be an \(n\times n\) matrix with i.i.d \(N(0,\sigma ^2)\) Gaussian entries. The main result of [15] asserts that the expected number of bits of precision sufficient to solve \((M+G)x = b\) to m bits of accuracy using Gaussian elimination without pivoting is at most \(m+O\big (\log \big (n+\frac{\Vert M\Vert }{\sigma }\big )\big )\). This provides a theoretical justification for the observed performance of GE with no pivoting for structured dense coefficient matrices.

The no-pivoting strategy is crucial for the proofs in [22] or [15]. With partial pivoting, the permutations of the rows after each elimination step introduce complex dependencies into the model which require other arguments to handle. In his PhD thesis [14], Sankar carried out a smoothed analysis of GEPP based on a certain recursive matrix formula (to be discussed in some detail in the next section). Let \(A=M+G\), where G is a Gaussian random matrix with i.i.d \(N(0,\sigma ^2)\) entries, and M is a deterministic matrix of spectral norm at most one. One of the main results of [14] states that, with the above notation,

$$\begin{aligned} {{\mathbb {P}}}\bigg \{ \frac{ \max _{k,i,j \in [n]} |A^{(k-1)}_{i,j}|}{\max _{i,j\in [n]} |A_{i,j}|}\ge t\bigg \} \le \frac{\big (O\big (n\sigma ^{-1}+n^{3/2}\big )\big )^{12\log n}}{t^{(\log n)/21}},\quad t>0, \end{aligned}$$

so that in the mean zero setting \(M=0\), with high probability \(\frac{\max _{k,i,j \in [n]} |A^{(k-1)}_{i,j}|}{\max _{i,j\in [n]} |A_{i,j}|}= n^{O(\log n)}\). Note that the quantity considered in [14] is not the growth factor as defined above but its “exact arithmetic” counterpart. The relation between the matrices \(A^{(k-1)}\) and the corresponding computed matrices \({\hat{A}}^{(k-1)}\) is not trivial and will be discussed later; at this point we note that, assuming that the magnitudes of the ratio \(\frac{\max _{k,i,j \in [n]} |A^{(k-1)}_{i,j}|}{\max _{i,j\in [n]} |A_{i,j}|}\) and of the growth factor \(\textbf{g}_{\textrm{GEPP}}(A)\) match, and in view of (2), the result of Sankar implies that with high probability GEPP results in at most \(O(\log ^2 n)\) lost bits of precision in the obtained solution vector. This bound is worse than the \(O(\log n)\) estimate for GE with no pivoting implied by [22].

To summarize, whereas strong results on the average-case stability of GE with no pivoting have been obtained in the literature, the Gaussian Elimination with Partial Pivoting lacked matching theoretical guarantees, let alone ones justifying the common belief that GEPP tends to be more stable than GE with no pivoting. In this work, we make progress on this problem. To avoid any ambiguity, we recall all the imposed assumptions and notation:

Theorem A

There are universal constants \(C,{\tilde{C}}>1\) and a function \(\tilde{n}:[1,\infty )\rightarrow {\mathbb {N}}\) with the following property. Let \(p\ge 1\), and let \(n\ge {\tilde{n}}(p)\).

  • Assume that the floating point computations with no underflow and overflow exceptions and with a unit roundoff \(\textbf{u}\) are being performed.

  • Let A be the random \(n\times n\) matrix with i.i.d standard Gaussian entries (the real Ginibre Ensemble). Assume that the floating point GEPP is performed on the matrix \(\textrm{fl}(A)\).

Then with probability at least \(1-\textbf{u}^{1/8}\,n^{{\tilde{C}}}\), the GEPP for \(\textrm{fl}(A)\) succeeds in floating point arithmetic and the computed permutation matrix \({\hat{P}}\) agrees with the matrix P from the PLU–factorization of A in exact arithmetic. Furthermore, assuming \(\textbf{u}^{1/8}\,n^{{\tilde{C}}}\le 1/2\),

$$\begin{aligned} {{\mathbb {P}}}\big \{\textbf{g}_{\textrm{GEPP}}(A) \ge n^{t}\;\big |\;\hbox {GEPP succeeds in f.p. arithmetic and }{\hat{P}}=P \big \} \le 40^p\,n^{-p t},\quad t\ge Cp^2. \end{aligned}$$

We do not attempt to compute the constant C in the above theorem explicitly and leave the problem of finding an optimal (up to an \(n^{o_n(1)}\) multiple) estimate of the growth factor \(\textbf{g}_\textrm{GEPP}(A)\) for future research (see Sect. 8). Further, we expect that a much stronger bound holds for the probability that GEPP succeeds in floating point arithmetic and that \({\hat{P}}=P\).

In view of the aforementioned work of Wilkinson and well known estimates for the condition number of the Gaussian matrix [4, 17], the theorem implies that with probability close to one the number of bits of precision sufficient to solve \(Ax = b\) to m bits of accuracy using GEPP is \(m+O(\log n)\). We conjecture that this bound is optimal in the sense that in the same setting \(m+\Omega (\log n)\) bits of precision are necessary with probability close to one.

Let us further apply Theorem A to compare the numerical stability of GEPP with that of GE with no pivoting. As we mentioned at the beginning of the introduction, the Gaussian Elimination with no pivoting can produce arbitrarily large relative errors in floating point arithmetic even for well-conditioned coefficient matrices. As an illustration, consider a \(2\times 2\) standard Gaussian matrix in floating point arithmetic,

$$\begin{aligned} M=\begin{pmatrix} \textrm{fl}(g_{11}) &{}\quad \textrm{fl}(g_{12})\\ \textrm{fl}(g_{21}) &{}\quad \textrm{fl}(g_{22}) \end{pmatrix}. \end{aligned}$$

The Gaussian Elimination with no pivoting yields the computed LU–factorization of M,

$$\begin{aligned} {\hat{L}}=\begin{pmatrix} 1 &{}\quad 0\\ \textrm{fl}\big (\textrm{fl}(g_{21})/\textrm{fl}(g_{11})\big ) &{}\quad 1 \end{pmatrix};\quad {\hat{U}}=\begin{pmatrix} \textrm{fl}(g_{11}) &{}\quad \textrm{fl}(g_{12})\\ 0 &{}\quad \textrm{fl}\big (\textrm{fl}(g_{22})-\textrm{fl}(g_{12})\cdot \textrm{fl}(g_{21})/\textrm{fl}(g_{11})\big ) \end{pmatrix}. \end{aligned}$$

It can be checked that for every \(\varepsilon \in (\textbf{u},1)\), with probability \(\Theta (\varepsilon )\) all of the following holds:

  • The matrix M is well-conditioned, say, \(\kappa (M)\le 100\);

  • \(|\textrm{fl}(g_{11})|\le \varepsilon \), \(|\textrm{fl}(g_{12})|,|\textrm{fl}(g_{21})|,|\textrm{fl}(g_{22})|\in [1/2,2]\);

  • \(\big |\textrm{fl}\big (\textrm{fl}(g_{22})-\textrm{fl}(g_{12})\cdot \textrm{fl}(g_{21})/\textrm{fl}(g_{11})\big ) -\big (\textrm{fl}(g_{22})-\textrm{fl}(g_{12})\cdot \textrm{fl}(g_{21})/\textrm{fl}(g_{11})\big ) \big |\ge \Omega (\textbf{u})\,\big (\textrm{fl}(g_{22})-\textrm{fl}(g_{12})\cdot \textrm{fl}(g_{21})/\textrm{fl}(g_{11})\big )\).

With the above conditions, the bottom right element of the product \({\hat{L}}{\hat{U}}\) differs from \(\textrm{fl}(g_{22})\) by a quantity of order \(\Omega (\textbf{u})\,\big (\textrm{fl}(g_{22})-\textrm{fl}(g_{12})\cdot \textrm{fl}(g_{21})/\textrm{fl}(g_{11})\big )\), that is, the normwise backward error satisfies

$$\begin{aligned} {{\mathbb {P}}}\big \{\Vert {\hat{L}}{\hat{U}}-M\Vert > c\textbf{u}\,\Vert M\Vert /\varepsilon \;\big |\; \kappa (M)\le 100\big \}\ge c\varepsilon ,\quad \varepsilon \in (\textbf{u},1), \end{aligned}$$

for some universal constant \(c>0\) (one may safely take \(c=1/100\), say).
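
The phenomenon is easy to reproduce numerically. The sketch below builds a \(2\times 2\) matrix of the above form with a tiny (1,1) entry and evaluates the (2,2) entry of the residual \({\hat{L}}{\hat{U}}-M\) in exact rational arithmetic, so that the displayed quantity reflects only the rounding committed during the factorization; the entry sizes, the value of \(\varepsilon \) and the seed are arbitrary choices of ours.

```python
import numpy as np
from fractions import Fraction

u = np.finfo(float).eps / 2                     # unit roundoff of double precision
eps = 1e-12                                     # scale of the tiny (1,1) entry

rng = np.random.default_rng(2)
g11 = eps * rng.uniform(0.5, 1.0)
g12, g21, g22 = rng.uniform(0.5, 2.0, size=3)
M = np.array([[g11, g12], [g21, g22]])          # well-conditioned despite tiny g11

# GE with no pivoting, carried out in double precision
l21 = M[1, 0] / M[0, 0]
u22 = M[1, 1] - M[0, 1] * l21
L = np.array([[1.0, 0.0], [l21, 1.0]])
U = np.array([[M[0, 0], M[0, 1]], [0.0, u22]])

# exact (2,2) entry of L U minus M_{2,2}, computed in rational arithmetic
res = Fraction(float(L[1, 0])) * Fraction(float(U[0, 1])) \
      + Fraction(float(u22)) - Fraction(float(M[1, 1]))
print(np.linalg.cond(M))                        # small: M is well conditioned
print(float(abs(res)), u * np.abs(M).max() / eps)   # typically both of order u/eps, far above u
```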

In sharp contrast with the above observation, in the case of GEPP the probability of large deviations for the backward error is much smaller, as Theorem A shows. Indeed, with the notation from the theorem and in view of Wilkinson’s bound, for arbitrary \(p\ge 1\) and assuming n is sufficiently large, we have

$$\begin{aligned} {{\mathbb {P}}}\big \{ \Vert {\hat{L}}{\hat{U}}-{\hat{P}} A\Vert > C'\textbf{u}\,n^{4}\Vert A\Vert /\varepsilon \big \}\le 40^p\varepsilon ^p+\textbf{u}\,n^C,\; \varepsilon \in (0, n^{-Cp^2}], \end{aligned}$$

for a universal constant \(C'>0\). Thus, the tail of the distribution of the backward error of GEPP decays superpolynomially. Informally, the “proportion” of well-conditioned coefficient matrices yielding large backward errors is much smaller for GEPP than for the Gaussian Elimination with no pivoting.

We provide a detailed outline of the argument, as well as a comparison of our techniques with the earlier approach of Sankar, in the next section.

The following notation will be used throughout the paper:

For positive integers \(m\le n\),

\([n]\): the set \(\{1,2,3,\dots , n\}\);

\([m,n]\): the set \(\{m,m+1,\dots , n\}\).

For an \(n\times m\) matrix M, indices \(i\in [n]\), \(j\in [m]\), and non-empty subsets \(I \subset [n]\) and \(J \subset [m]\),

\(M_{I,J}\): the submatrix of M formed by taking rows indexed over I and columns indexed over J. When \(I =\{i\}\) or \(J=\{j\}\), we will use the lighter notations \(M_{i,J}\) and \(M_{I,j}\) in place of \(M_{\{i\},J}\) and \(M_{I,\{j\}}\);

\(M_{i,j}\): the (i,j)th entry of M;

\(s_j(M)\): the jth largest singular value of M;

\({\mathbb {R}}^I\): the |I|-dimensional Euclidean space with components indexed over I;

\(\textrm{dist}(\cdot ,\cdot )\): the Euclidean distance.

2 Outline of the proof

Let A be an \(n\times n\) standard Gaussian matrix, let \(A^{(0)}:=A,A^{(1)},\dots ,A^{(n-1)}\) be the sequence of matrices generated by the GEPP process, and let \(P^{(1)},\dots ,P^{(n-1)}\) be the corresponding permutation matrices (see Algorithm 2). It turns out that in our probabilistic model, estimating the growth factor \(\textbf{g}_{\textrm{GEPP}}(A)\) can be reduced to bounding the exact arithmetic counterpart of the quantity

$$\begin{aligned} \frac{\max _{k,i,j \in [n]} |A^{(k-1)}_{i,j}|}{\max _{i,j\in [n]} |A_{i,j}|}. \end{aligned}$$

Our main focus is to derive Proposition 6.13, which is the exact arithmetic counterpart of the main theorem, and then reduce the setting of floating point arithmetic to exact arithmetic. We provide a rigorous account of the reduction procedure in Sect. 7, and prefer to avoid discussing this technical matter here. We only note that the comparison of the matrices \(A^{(k-1)}\) and \({\hat{A}}^{(k-1)}\), \(1\le k\le n-1\), is based on a well-established inductive argument somewhat similar to the one used to prove Wilkinson’s backward error bound. From now on and until Sect. 7 we work in exact arithmetic unless explicitly stated otherwise.

Define the “unpermuted” matrices \({\mathcal {M}}^{(k)}\) obtained at the kth elimination step, i.e., \({\mathcal {M}}^{(0)}:=A\) and

$$\begin{aligned} {\mathcal {M}}^{(k)} = \big (P^{(1)}\big )^{-1} \big (P^{(2)}\big )^{-1} \cdots \big (P^{(k)}\big )^{-1}\, A^{(k)},\quad 1\le k\le n-1. \end{aligned}$$
(3)

Let \(I_0:=\emptyset \), and for each \(1\le k\le n-1\) let \(I_k=I_k(A)\) be the (random) subset of [n] of row indices of A corresponding to the pivot elements used in the first k steps of the “permutation-free” elimination process. Notice that within the kth column of \({{\mathcal {M}}}^{(k)}\), all components except those in the rows indexed by \(I_{k-1}\) and the kth pivot element are zero. Therefore, the set \(I_k\) can be defined as

$$\begin{aligned} I_k:=I_{k-1}\cup \big \{i\in [n]{\setminus } I_{k-1}:\;{\mathcal {M}}^{(k)}_{ik}\ne 0\big \},\;\;1\le k\le n-1, \end{aligned}$$

where \(\big \{i\in [n]{\setminus } I_{k-1}:\;{\mathcal {M}}^{(k)}_{ik}\ne 0\big \}\) is a singleton. We will further denote by \(i_k=i_k(A)\), \(1\le k\le n-1\), the elements in the singletons \(I_k{\setminus } I_{k-1}\), so that \(I_k = \{i_1,i_2,\dots , i_k \}\), \(1\le k\le n-1\).

For \(1\le k\le n-1\) and \(t \in [k]\), the first \(t-1\) components of \(\textrm{row}_{i_t}({\mathcal {M}}^{(k)})\) are zeros, and \({\mathcal {M}}^{(k)}_{ [n]\backslash I_k, [k] }\) is the zero matrix; more specifically, for each \(1 \le k \le n-1\), and \(j \in [n]\backslash I_k\),

$$\begin{aligned} \textrm{row}_j\big ( {\mathcal {M}}^{(k)}_{[n],[k]} \big )&= 0 \quad \hbox { and } \nonumber \\ \textrm{row}_j\big ({\mathcal {M}}^{(k)}_{[n],[k+1,n]}\big )&= \textrm{row}_j \big ( A_{[n],[k+1,n]} \big ) - \textrm{row}_j(A_{[n],[k]}) \big (A_{I_k,[k]} \big )^{-1} A_{I_k, [k+1,n]} \end{aligned}$$
(4)

(see, in particular, [15, Formula 4.1] for GE with no pivoting, which can be adapted to our setting). Thus, for \(0\le k<n-1\), the index \(i_{k+1}\) is defined as the one corresponding to the largest (in absolute value) of the quantities

$$\begin{aligned} A_{j,k+1} - A_{j,[k]} \big (A_{I_k,[k]}\big )^{-1} A_{I_k, k+1}, \, \, j \in [n]\backslash I_k. \end{aligned}$$
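
The identity (4) and the pivoting rule above are straightforward to check numerically. The following sketch runs the permutation-free elimination on a small Gaussian matrix, tracks the pivot set \(I_k\), and compares the surviving rows of \({\mathcal {M}}^{(k)}\) with the Schur-complement expression in (4); the dimension and seed are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
A = rng.standard_normal((n, n))

M = A.copy()     # the "unpermuted" matrix M^(k), updated in place
I = []           # pivot row indices i_1, i_2, ... in the original numbering

for k in range(n - 1):
    free = [j for j in range(n) if j not in I]
    piv = max(free, key=lambda j: abs(M[j, k]))          # partial pivoting rule
    I.append(piv)
    for j in free:
        if j != piv:
            M[j, k:] -= (M[j, k] / M[piv, k]) * M[piv, k:]
    # formula (4): rows of M^(k) outside I_k are Schur complements built from A
    inv = np.linalg.inv(A[I, :k + 1])                    # (A_{I_k,[k]})^{-1}
    for j in [j for j in range(n) if j not in I]:
        rhs = A[j, k + 1:] - A[j, :k + 1] @ inv @ A[I, k + 1:]
        assert np.allclose(M[j, k + 1:], rhs)

print("formula (4) holds at every step; pivot rows:", I)
```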

Due to strong concentration of Gaussian variables, the operator norms of matrices \(A_{I,J}\), \(I,J\subset [n]\), can be uniformly bounded from above by a polynomial in n. Thus, the principal difficulty in obtaining satisfactory upper bounds on the growth factor \(\textbf{g}_{\textrm{GEPP}}(A)\) is in estimating the norm of vectors \(\textrm{row}_j(A_{[n],[k]}) \big (A_{I_k,[k]} \big )^{-1}\), \(j \in [n]\backslash I_k\). The sets \(I_k\) are random and depend on A in a rather complicated way. At the same time, the trivial upper bound

$$\begin{aligned} \max \limits _{j\in [n]\backslash I_k} \big \Vert \textrm{row}_j(A_{[n],[k]}) \big (A_{I_k,[k]} \big )^{-1}\big \Vert _2 \le \max \limits _{J\subset [n],\,|J|=k;\,j\in [n]{\setminus } J} \big \Vert \textrm{row}_j(A_{[n],[k]}) \big (A_{J,[k]} \big )^{-1}\big \Vert _2 \end{aligned}$$

which completely eliminates the randomness of \(I_k\) from consideration, is vastly suboptimal.

The first part of this section is devoted to the argument of Sankar from [14], which yields a bound \(\textbf{g}_\textrm{GEPP}(A)=O(n^{C\log n})\) with high probability using a certain recursive matrix formula. In the second part, we discuss our approach.

2.1 Sankar’s argument

Consider a block matrix

$$\begin{aligned} \begin{bmatrix} B \\ X \end{bmatrix}= \begin{bmatrix} B_{\textrm{u}} \\ B_{\mathrm{\ell }} \\ X \end{bmatrix} = \begin{bmatrix} B_{ \mathrm u\ell } &{} B_{ \mathrm ur} \\ B_{\mathrm{\ell \ell }} &{} B_{ \mathrm \ell r} \\ X_{\mathrm{\ell }} &{} X_{\textrm{r}} \end{bmatrix}, \end{aligned}$$

where \(B_{\mathrm{u\ell }}\) and \(B_{\mathrm{\ell r}}\) are square non-singular matrices and \(X\) is a row vector. Then, denoting \(B':= B_{\mathrm{\ell r}} - B_{\mathrm{\ell \ell }}B_{\mathrm{u\ell }}^{-1} B_{\textrm{ur}}\) and \(X':= X_{\textrm{r}} - X_{\mathrm{\ell }} B_{\mathrm{u\ell }}^{-1} B_\textrm{ur}\),

$$\begin{aligned} \begin{bmatrix} -XB^{-1}&1 \end{bmatrix} =&\begin{bmatrix} -X'(B')^{-1}&1 \end{bmatrix} \cdot \begin{bmatrix} -\begin{bmatrix} B_{\mathrm{\ell }} \\ X \end{bmatrix} \cdot B_{\textrm{u}}^\dagger&\textrm{Id}\end{bmatrix}, \end{aligned}$$
(5)

where \(B_{\textrm{u}}^\dagger \) is the right pseudoinverse of \(B_{\textrm{u}}\) (see [14, Chapter 3]).
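
Identity (5) is purely algebraic and can be verified numerically for generic blocks; in the sketch below the block sizes and the seed are arbitrary, and the Moore–Penrose pseudoinverse is used as one valid choice of the right pseudoinverse \(B_{\textrm{u}}^\dagger \).

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = 4, 3                                   # B_ul is p x p, B_lr is q x q
Bul, Bur = rng.standard_normal((p, p)), rng.standard_normal((p, q))
Bll, Blr = rng.standard_normal((q, p)), rng.standard_normal((q, q))
X = rng.standard_normal((1, p + q))
Xl, Xr = X[:, :p], X[:, p:]

B  = np.block([[Bul, Bur], [Bll, Blr]])
Bu = np.hstack([Bul, Bur])                    # upper block row of B
Bl = np.hstack([Bll, Blr])                    # lower block row of B
Bp = Blr - Bll @ np.linalg.solve(Bul, Bur)    # B'
Xp = Xr - Xl @ np.linalg.solve(Bul, Bur)      # X'

lhs = np.hstack([-X @ np.linalg.inv(B), [[1.0]]])
rhs = np.hstack([-Xp @ np.linalg.inv(Bp), [[1.0]]]) @ \
      np.hstack([-np.vstack([Bl, X]) @ np.linalg.pinv(Bu), np.eye(q + 1)])
print(np.allclose(lhs, rhs))                  # True: (5) holds
```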

The above formula is applied in [14] in a recursive manner. Assume for simplicity of exposition that we are interested in bounding the Euclidean norm of the vector \(A _{j,[n/2]} (A_{I_{n/2},[n/2]})^{-1}\) for some \(j\in [n]{\setminus } I_{n/2}\) (recall that, in view of (4) and standard concentration estimates for the spectral norm of Gaussian matrices, this would immediately imply an estimate on the components of \({\mathcal {M}}^{(n/2)}_{j,[n/2+1,n]}\)). Fix for a moment any \(0\le v< m < n/2\), and let \(B:= {\mathcal {M}}^{(v)}_{ I_{n/2}{\setminus } I_v, [v+1,n/2]}\) and \(X:= {\mathcal {M}}^{(v)}_{j, [v+1,n/2]}\). We write

$$\begin{aligned} \begin{bmatrix} B \\ X \end{bmatrix}= \begin{bmatrix} B_{\textrm{u}} \\ B_{\mathrm{\ell }} \\ X \end{bmatrix} = \begin{bmatrix} B_{ \mathrm u\ell } &{} B_{ \mathrm ur} \\ B_{\mathrm{\ell \ell }} &{} B_{ \mathrm \ell r} \\ X_{\mathrm{\ell }} &{} X_{\textrm{r}} \end{bmatrix}&= \begin{bmatrix} {\mathcal {M}}^{(v)}_{I_{m}{\setminus } I_v,[v+1,m]} &{} {\mathcal {M}}^{(v)}_{ I_{m}{\setminus } I_v,[m+1,n/2]} \\ {\mathcal {M}}^{(v)}_{I_{n/2}{\setminus } I_m, [v+1,m]} &{} {\mathcal {M}}^{(v)}_{I_{n/2}{\setminus } I_m,[m+1,n/2]} \\ {\mathcal {M}}^{(v)}_{j,[v+1,m] } &{} {\mathcal {M}}^{(v)}_{j, [m+1,n/2]} \end{bmatrix}. \end{aligned}$$

It can be checked that with the above notation, \(B'= {\mathcal {M}}^{(m)}_{I_{n/2}{\setminus } I_m,[m+1,n/2]}\) and \(X' = {\mathcal {M}}^{(m)}_{j,[m+1,n/2]}\) [14]. Relation (5) then implies

$$\begin{aligned}&\left\| \begin{bmatrix} {\mathcal {M}}^{(v)} _{j,[v+1,n/2]} ({\mathcal {M}}^{(v)}_{I_{n/2}{\setminus } I_v,[v+1,n/2]})^{-1}&1 \end{bmatrix}\right\| \\&\quad \le \left\| \begin{bmatrix} {\mathcal {M}}^{(m)}_{j,[m+1,n/2]} ({\mathcal {M}}^{(m)}_{I_{n/2}{\setminus } I_m,[m+1,n/2]})^{-1}&1 \end{bmatrix}\right\| \cdot \\&\qquad \cdot \left\| \begin{bmatrix}{\mathcal {M}}^{(v)}_{ (I_{n/2}{\setminus } I_m)\cup \{j\}, [v+1,n/2]} \big ( {\mathcal {M}}^{(v)}_{I_{m}{\setminus } I_v, [v+1,n/2]} \big )^\dagger&\textrm{Id}_{n/2-m+1} \end{bmatrix}\right\| . \end{aligned}$$

Now, assume that we have constructed a sequence of indices \(0=k_0<k_1<k_2<\cdots<k_s <n/2\), with \(k_s=n/2-1\) and \(k_1\ge n/4\). Applying the last relation recursively s times, we obtain

$$\begin{aligned}&\left\| \begin{bmatrix} A_{j,[n/2]} (A_{I_{n/2},[n/2]})^{-1}&1 \end{bmatrix}\right\| \nonumber \\&\quad \le \left\| \begin{bmatrix} {\mathcal {M}}^{(n/2-1)}_{j,n/2} ({\mathcal {M}}^{(n/2-1)}_{i_{n/2},n/2})^{-1}&1 \end{bmatrix}\right\| \cdot \nonumber \\&\qquad \cdot \prod _{\ell =0}^{s-1} \left\| \begin{bmatrix}{\mathcal {M}}^{(k_\ell )}_{ (I_{n/2}{\setminus } I_{k_{\ell +1}})\cup \{j\}, [k_\ell +1,n/2]} \big ( {\mathcal {M}}^{(k_\ell )}_{I_{k_{\ell +1}}{\setminus } I_{k_\ell }, [k_\ell +1,n/2]} \big )^\dagger&\textrm{Id}_{n/2-k_{\ell +1}+1} \end{bmatrix}\right\| , \end{aligned}$$
(6)

where by the definition of the partial pivoting, \(\big |{\mathcal {M}}^{(n/2-1)}_{j,n/2} ({\mathcal {M}}^{(n/2-1)}_{i_{n/2},n/2})^{-1}\big |\le 1\). Therefore, the problem reduces to estimating the spectral norms of matrices

$$\begin{aligned} {\mathcal {M}}^{(k_\ell )}_{ (I_{n/2}{\setminus } I_{k_{\ell +1}})\cup \{j\}, [k_\ell +1,n/2]} \big ( {\mathcal {M}}^{(k_\ell )}_{I_{k_{\ell +1}}{\setminus } I_{k_\ell }, [k_\ell +1,n/2]} \big )^\dagger ,\;\;0\le \ell <s. \end{aligned}$$
(7)

Sankar shows that as long as \(n/2-k_\ell \) (\(\ell =s,s-1,\dots ,0\)) grow as a geometric sequence (in which case s should be of order logarithmic in n), the norm of each matrix can be bounded by a constant power of n with a large probability. We only sketch this part of the argument. Fix any \(0\le \ell <s\), and define \(Z:= (A_{I_{k_\ell },[k_\ell +1,n/2]})^\dagger A_{I_{k_\ell },[k_\ell ]}\), so that \(ZZ^\dagger = \textrm{Id}\), and

$$\begin{aligned}&{\mathcal {M}}^{(k_\ell )}_{ (I_{n/2}{\setminus } I_{k_{\ell +1}})\cup \{j\}, [k_\ell +1,n/2]} \big ( {\mathcal {M}}^{(k_\ell )}_{I_{k_{\ell +1}}{\setminus } I_{k_\ell }, [k_\ell +1,n/2]} \big )^\dagger \\&\quad = {\mathcal {M}}^{(k_\ell )}_{ (I_{n/2}{\setminus } I_{k_{\ell +1}})\cup \{j\}, [k_\ell +1,n/2]}\, Z\big ( {\mathcal {M}}^{(k_\ell )}_{I_{k_{\ell +1}}{\setminus } I_{k_\ell }, [k_\ell +1,n/2]} Z\big )^\dagger , \end{aligned}$$

where, in view of (4),

$$\begin{aligned}&{\mathcal {M}}^{(k_\ell )}_{ (I_{n/2}{\setminus } I_{k_{\ell +1}})\cup \{j\}, [k_\ell +1,n/2]}\, Z\\&\quad = A_{(I_{n/2}{\setminus } I_{k_{\ell +1}})\cup \{j\}, [k_\ell +1,n/2]}\, (A_{I_{k_\ell },[k_\ell +1,n/2]})^\dagger \, A_{I_{k_\ell },[k_\ell ]} - A_{(I_{n/2}{\setminus } I_{k_{\ell +1}})\cup \{j\}, [k_\ell ]}\, Z^\dagger Z. \end{aligned}$$

Since \(Z^{\dagger }Z\) is a projection and has unit norm, an upper bound on \(\big \Vert (A_{I_{k_\ell },[k_\ell +1,n/2]})^\dagger \big \Vert \) would provide a bound on \(\big \Vert {\mathcal {M}}^{(k_\ell )}_{ (I_{n/2}{\setminus } I_{k_{\ell +1}})\cup \{j\}, [k_\ell +1,n/2]} Z \big \Vert \). The key observation here is that \(A_{I_{k_\ell },[k_\ell +1,n/2]}\) is equidistributed with the standard \(k_\ell \times (n/2-k_\ell )\) Gaussian matrix, so that a satisfactory estimate on the norm of the pseudoinverse follows.

Bounding the operator norm of \(\big ( {\mathcal {M}}^{(k_\ell )}_{I_{k_{\ell +1}}{\setminus } I_{k_\ell }, [k_\ell +1,n/2]} Z\big )^\dagger \) is more involved. Note that, equivalently, it is sufficient to provide a good lower bound on the smallest singular value of the matrix

$$\begin{aligned} \big ({\mathcal {M}}^{(k_\ell )}_{I_{k_{\ell +1}}{\setminus } I_{k_\ell }, [k_\ell +1,n/2]} Z\big )^\top . \end{aligned}$$

We have

$$\begin{aligned} s_{\min }\big (\big ({\mathcal {M}}^{(k_\ell )}_{I_{k_{\ell +1}}{\setminus } I_{k_\ell }, [k_\ell +1,n/2]} Z\big )^\top \big ) \ge \min _{J \subset [n] \backslash I_{k_\ell }, |J|=k_{\ell +1}-k_{\ell }} s_{\min } \big (({\mathcal {M}}_{J,[k_\ell +1,n/2]}^{(k_\ell )}Z)^\top \big ), \end{aligned}$$
(8)

where, again in view of (4), for each admissible \(J\),

$$\begin{aligned} ({\mathcal {M}}_{J,[k_\ell +1,n/2]}^{(k_\ell )}Z)^\top = Z^\top (A_{J,[k_\ell +1,n/2]})^\top - Z^{\dagger }Z (A_{J,[k_\ell ]})^\top , \end{aligned}$$
(9)

and where \(Z^{\dagger }Z\) is a \(k_\ell \times k_\ell \) orthogonal projection matrix of rank \(n/2-k_\ell \). Although \((A_{J,[k_\ell ]})^\top \) is dependent on Z, it can be shown that \(Z^{\dagger }Z (A_{J,[k_\ell ]})^\top \) behaves “almost” like \(Z^{\dagger }Z\) applied to an independent tall rectangular \(k_\ell \times (k_{\ell +1}-k_\ell )\) Gaussian matrix (see [14, Section 3.7]). This allows one to obtain probabilistic estimates on the smallest singular value of the matrix in (9) which, under the assumption that the sequence \(n/2-k_\ell \) (\(\ell =s,s-1,\dots ,0\)) does not grow too fast, turn out to be strong enough to survive the union bound in (8).

To summarize, the above argument gives a polynomial in n estimate for the norms of the matrices in (7), where s is logarithmic in n. Thus, (6) implies a bound \(\Vert {\mathcal {M}}^{(n/2)}_{j,[n/2+1,n]}\Vert _2=n^{O(\log n)}\), \(j\in [n]{\setminus } I_{n/2}\), with high probability. An extension of this argument to all \({\mathcal {M}}^{(k)}\), \(1\le k\le n-1\), yields \(\textrm{g}_{\textrm{GEPP}}(A)=n^{O(\log n)}\). As Sankar notes in [14], a different choice of s and of the sequence \(k_0,k_1,\dots ,k_s\), and a refined analysis of the operator norms of the matrices in (7), may improve the upper estimate on the growth factor, but cannot achieve a polynomial bound.

2.2 High-level structure of the proof of the main theorem

Returning to relation (4), a polynomial bound on the growth factor \(\textrm{g}_{\textrm{GEPP}}(A)\) will follow as long as the norm of \((A_{I_r,[r]})^{-1}\) is bounded by \(n^{O(1)}\) for every \(1\le r\le n-1\) with high probability. We obtain this estimate via an analysis of the entire singular spectrum of \(A_{I_r,[r]}\) rather than attempting to directly bound the smallest singular value of the matrix.

The strategy of the proof can be itemized as follows:

  • Obtaining estimates on the singular values of partially random block matrices. More specifically, we consider matrices of the form

    $$\begin{aligned} B= \begin{bmatrix} F &{}\quad M \\ W &{}\quad Q \end{bmatrix}, \end{aligned}$$
    (10)

    where F is a fixed square matrix with prescribed singular spectrum, and MWQ are independent Gaussian random matrices of compatible dimensions. Our goal here is to derive lower bounds on the intermediate singular values of B in terms of singular values of F.

  • Applying the estimates on the intermediate singular values of partially random block matrices in a recursive manner, together with a union bound argument, derive lower bounds on the “smallish” singular values of the matrices \(A_{I_r,[r]}\). Our argument at this step only allows us to bound the first through \((r-C)\)th singular values of the matrix, for some large constant C.

  • Use the bound on \(s_{r-C}(A_{I_r,[r]})\) together with the information on the Euclidean distances from \(\textrm{row}_{i_\ell }(A_{I_r,[r]})\) to \(\mathrm{span\,}\{\textrm{row}_{i_j}(A_{I_r,[r]}),\;1\le j<\ell \}\), \(\ell =1,\dots ,r\) that can be extracted from the partial pivoting rule, to obtain polynomial in n lower bounds on \(s_{\min }(A_{I_r,[r]})\).

Below, we discuss each component in more detail.

Singular spectrum of partially random block matrices The partially random block matrices are treated in Sect. 3 of the paper. Consider a block matrix B of type (10), where F is a fixed \(r\times r\) matrix, M is \(r\times x\), W is \(x\times r\), Q is \(x\times x\) (with \(x\le r\)), and the entries of M, W, Q are mutually independent standard Gaussian variables. In view of rotational invariance of the Gaussian distribution, we can “replace” F with a diagonal matrix D with the same singular spectrum, and with its singular values on the main diagonal arranged in a non-decreasing order. We fix a small positive parameter \(\tilde{\varepsilon }>0\) and an integer \(i\ge 1\) such that \(\tilde{\varepsilon }(1+\tilde{\varepsilon })^{-i}r\ge 2\). Our goal at this point is to estimate from below the singular value

$$\begin{aligned} s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B). \end{aligned}$$

Having chosen a certain small threshold \(\tau >0\) (which is defined as a function of \(i,r,\tilde{\varepsilon }\), the singular spectrum of D, and some other parameters which we are not discussing here), our estimation strategy splits into two cases depending on whether the number \(\ell _{i+1}'\) of singular values of D less than \(\tau \) is “small” or “large”. In the former case, the matrix D has a well-controlled singular spectrum, and our goal is to show that attaching to it x rows and columns of standard Gaussians cannot deteriorate the singular value estimates. In the latter case, we show that by adding the Gaussian rows and columns we actually improve the control of the singular values, using that the top left \(\ell _{i+1}'\times \ell _{i+1}'\) corner of B is essentially a zero matrix. The main result of Sect. 3—Proposition 3.3—provides a probability estimate on the event that the ratio

$$\begin{aligned} \frac{s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B)}{\tau } \end{aligned}$$

is small assuming certain additional relations between the parameters \(x,r,\tilde{\varepsilon }\).
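
The phenomenon behind this dichotomy can be observed directly: bordering a fixed matrix that has a few very small singular values with independent Gaussian rows and columns typically removes those small singular values. A quick illustration (the dimensions, the spectrum of F and the seed below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(5)
r, x = 60, 20
F = np.diag(np.concatenate([np.ones(r - 5), np.full(5, 1e-8)]))   # 5 tiny singular values
B = np.block([[F, rng.standard_normal((r, x))],
              [rng.standard_normal((x, r)), rng.standard_normal((x, x))]])

print(np.linalg.svd(F, compute_uv=False)[-6:])   # bottom of the spectrum of F
print(np.linalg.svd(B, compute_uv=False)[-6:])   # bottom of the spectrum of B
```

In typical runs the bordered matrix B has no singular values anywhere near \(10^{-8}\); Proposition 3.3 below quantifies this effect for the intermediate singular values.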

A recursive argument to bound \(s_{r-C}(A_{I_r,[r]})\) The treatment of the partially random block matrices allows us to solve the principal problem with estimating the singular spectrum of \(A_{I_r,[r]}\), namely, the complicated dependencies between A and the index set \(I_r\). As we mentioned before, simply bounding the kth smallest singular value of \(A_{I_r,[r]}\) by \(\min \limits _{I\subset [n],\,|I|=r}s_{r-k}(A_{I,[r]})\) produces an unsatisfactory estimate for small k. On the other hand, in view of the strong concentration of intermediate singular values, already for \(k\gg \sqrt{n}\,\textrm{polylog}(n)\) (see Proposition 3.2) this straightforward union bound argument does work. In order to boost the union bound argument to smaller k, we avoid taking the union bound over all \(I\subset [n],\,|I|=r\) and instead condition on a realization of \(I_{r'}\) for a certain \(r'<r\), so that the union bound over all subsets \(I_{r'}\subset I\subset [n]\) of cardinality r runs over only \({n-r'\atopwithdelims ()r-r'}\) admissible subsets rather than \({n\atopwithdelims ()r}\) subsets. The two main issues with this approach are

  • first, we must have estimates for the singular spectrum of \(A_{I_{r'},[r']}\) in order to apply the results of Sect. 3 to obtain bounds for the singular values of \(A_{I_r,[r]}\), and,

  • second, conditioning on a realization of \(I_{r'}\) inevitably destroys Gaussianity and mutual independence of the entries of \(A_{[n]{\setminus } I_{r'},[r']}\).

The first issue is resolved through the inductive argument, when estimates on the spectrum of \(A_{I_{r'},[r']}\) obtained at the last induction step are used to control the singular spectrum of \(A_{I_r,[r]}\) at the next step. Of course, in this argument we must make sure that the total error accumulated throughout the induction process stays bounded by a constant power of n.

The second issue with probabilistic dependencies is resolved by observing that the partial pivoting “cuts” a not too large set of admissible values for the elements in \(A_{[n]{\setminus } I_{r'},[r']}\), i.e., we can continue treating them as independent Gaussians up to a manageable loss in the resulting probability estimate, after conditioning on a certain event of not-too-small probability. This problem is formally treated by studying the random polytopes \(K_{r'}(A)\subset {\mathbb {R}}^n\) defined in Sect. 4 as

$$\begin{aligned} K_{r'}(A):= \big \{ x \in {\mathbb {R}}^n \,:\, \forall s \in [r'],\, | \langle v_s(A),\, x \rangle | \le | \langle v_s(A),\, (A_{i_s,[n]})^\top \rangle | \big \}, \end{aligned}$$

where

$$\begin{aligned} v_s(A):= \left( ((A_{I_{s-1},[s-1]})^{-1} A_{I_{s-1},s})^\top ,\, 1,\, \underbrace{0, \dots , 0}_{ n-s \hbox { components } } \right) ^\top ,\quad s=1,2,\dots ,r'. \end{aligned}$$

By the nature of the partial pivoting process, any row of the submatrix \(A_{[n]{\setminus } I_{r'},[n]}\) necessarily lies within the polytope \(K_{r'}(A)\), and its distribution is a restriction of the standard Gaussian measure in \({\mathbb {R}}^n\) to \(K_{r'}(A)\) (see Sect. 4 for a rigorous description). After showing that the Gaussian measure of \(K_{r'}(A)\) is typically “not very small”, we can work with the rows of \(A_{[n]{\setminus } I_{r'},[n]}\) as if they were standard Gaussian vectors, up to conditioning on an event of a not very small probability. We remark here that Sankar’s work [14] uses random polytopes related to our construction.

Estimating the smallest singular value of \(A_{I_r,[r]}\) To simplify the discussion, we will only describe the idea of showing that with a “sufficiently high” probability, \(\big (s_{\min }(A_{I_r,[r]})\big )^{-1}=n^{O(1)}\), without considering computation of the moments of \(\big (s_{\min }(A_{I_r,[r]})\big )^{-1}\). As a corollary of the lower bound on \(s_{r'-C}(A_{I_{r'},[r']})\) obtained via the recursive argument, we get that with high probability, the inverse of the smallest singular value of the rectangular matrix \(A_{I_{r'},[r'+2{\tilde{C}}]}\) satisfies \(\big (s_{\min }(A_{I_{r'},[r'+2{\tilde{C}}]})\big )^{-1}=n^{O(1)}\), \(r'\in [{\tilde{C}}+1, n-2{\tilde{C}}]\), for some integer constant \(\tilde{C}>0\) (see Corollary 5.3). This corollary is a quantitative version of a rather general observation that, by adding at least \(\ell +1\) independent Gaussian rows to a fixed square matrix with at most \(\ell \) zero singular values, we get a rectangular matrix with a strictly positive \(s_{\min }\) almost surely.

Once a satisfactory bound on \(s_{\min }(A_{I_{r'},[r'+2{\tilde{C}}]})\), \(r'\in [{\tilde{C}}+1, n-2{\tilde{C}}]\), is obtained, we rely on the simple deterministic relation between the smallest singular value and distances to rowspaces: for every \(m\times k\) matrix Q,

$$\begin{aligned} \min _{i\in [m]} \textrm{dist}(H_i(Q),Q_{i,[k]}) \ge s_{\min }(Q^\top ) \ge m^{-1/2} \min _{i\in [m]} \textrm{dist}(H_i(Q),Q_{i,[k]}) \end{aligned}$$

where \(H_i(Q)\) denotes the subspace spanned by the row vectors \(Q_{j,[k]}\) for \(j\ne i\). In our context, a strong probabilistic lower bound on \(\textrm{dist}( \mathrm{span\,}\{A_{i_t,[r]},\,1\le t<s\}, A_{i_s,[r]})\) (\(s\le r\)), guaranteed by the partial pivoting strategy, implies that with high probability, for every \(t\in [r-2{\tilde{C}}]\),

$$\begin{aligned}{} & {} \textrm{dist}\big (\mathrm{span\,}\{A_{i_s,[r]},\,s\in [r-2{\tilde{C}}]{\setminus }\{t\}\}, A_{i_t,[r]}\big ) \\{} & {} \quad \le n^{O(1)}\textrm{dist}\big (\mathrm{span\,}\{A_{i_s,[r]},\,s\in [r]{\setminus }\{t\}\}, A_{i_t,[r]}\big ) \end{aligned}$$

(see the proof of Proposition 6.9), and, via the above deterministic relation between distances and singular values,

$$\begin{aligned} s_{\min }(A_{I_{r-2{\tilde{C}}},[r]})\le n^{O(1)}\textrm{dist}\big (\mathrm{span\,}\{A_{i_s,[r]},\,s\in [r]{\setminus }\{t\}\}, A_{i_t,[r]}\big ). \end{aligned}$$

This, combined with some auxiliary arguments, implies the lower bound on \(s_{\min }(A_{I_r,[r]})\).
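
The deterministic relation between the smallest singular value of \(Q^\top \) and the distances to the spans of the remaining rows, used above, is elementary and can be sanity-checked numerically; the sketch below uses an arbitrary Gaussian test matrix of our choosing.

```python
import numpy as np

rng = np.random.default_rng(6)
m, k = 6, 10
Q = rng.standard_normal((m, k))

def dist_to_other_rows(Q, i):
    """Euclidean distance from row i of Q to the span of the remaining rows."""
    others = np.delete(Q, i, axis=0)
    coeffs, *_ = np.linalg.lstsq(others.T, Q[i], rcond=None)
    return np.linalg.norm(Q[i] - others.T @ coeffs)

d_min = min(dist_to_other_rows(Q, i) for i in range(m))
s_min = np.linalg.svd(Q, compute_uv=False)[-1]     # s_min(Q^T) for an m x k matrix, m <= k
print(d_min >= s_min >= d_min / np.sqrt(m))        # True
```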

3 Intermediate singular values of partially random block matrices

We start with preparatory material to deal with norms and intermediate singular values of random matrices. We first consider a standard deviation estimate for the Hilbert–Schmidt norm of a Gaussian random matrix; see, for example, [1]:

Theorem 3.1

Let G be a \(u\times t\) random matrix with i.i.d standard Gaussian entries. Then

$$\begin{aligned} {{\mathbb {P}}}\big \{\Vert G\Vert _{HS}\ge \sqrt{ut}+s\big \}\le 2\exp (-cs^2),\quad s>0, \end{aligned}$$

where \(c>0\) is a universal constant.

The next proposition was proved in the special case of square random Gaussian matrices by Szarek [17]. In a much more general setting, similar results were obtained earlier by Nguyen [12]; his argument was later reused in [9] to get sharp small ball probability estimates for the condition number of a random square matrix.

Proposition 3.2

(Singular values of random matrices with continuous distributions) Let M be a \(u\times t\) (\(t\ge u\)) random matrix with i.i.d. standard Gaussian entries. Then

$$\begin{aligned} {{\mathbb {P}}}\bigg \{s_{u-i}(M)\le \frac{c'i\,s}{\sqrt{u}}\bigg \}\le u^{i/2}\,s^{i^2/32},\quad 4\le i\le u-1,\quad s\in (0,1], \end{aligned}$$

where \(c'\in (0,1]\) is a universal constant.

We provide a proof of the above proposition in the “Appendix”.
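
As a quick illustration of the scale appearing in Proposition 3.2, the sketch below compares \(s_{u-i}(M)\) over a few samples with \(i/\sqrt{u}\); the dimensions, sample size and seed are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(7)
u = t = 200
i = 10
smallest = [np.linalg.svd(rng.standard_normal((u, t)), compute_uv=False)[u - i - 1]
            for _ in range(50)]                  # s_{u-i}(M) in the 1-based indexing
print(min(smallest), i / np.sqrt(u))             # typically of comparable order
```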

This section deals with a large number of parameters satisfying multiple constraints; we group those constraints into blocks for better readability. We have four “section-wide” scalar parameters:

$$\begin{aligned} r\in {\mathbb {N}},\;\;\tilde{\varepsilon }\in (0,1],\;\;i\in {\mathbb {N}},\;\;\hbox { such that } \tilde{\varepsilon }(1+\tilde{\varepsilon })^{-i}r\ge 2;\;\;\;x\in {\mathbb {N}}. \end{aligned}$$
(11)

The objective of the section is to study singular values of a block matrix of the form

$$\begin{aligned} B= \begin{bmatrix} F &{}\quad M \\ W &{}\quad Q \end{bmatrix}, \end{aligned}$$

where F is a fixed \(r\times r\) matrix with prescribed singular values, M is \(r\times x\), W is \(x\times r\), Q is \(x\times x\), and the entries of M, W, Q are mutually independent standard Gaussians. Let

$$\begin{aligned} \varrho _j : = \lfloor (1-(1+\tilde{\varepsilon })^{-j}) r \rfloor , \quad j=0,1,\dots ,i. \end{aligned}$$

Observe that the relation \(\tilde{\varepsilon }(1+\tilde{\varepsilon })^{-i}r\ge 2\) from (11) yields, for \(j \in [0,i-1]\),

$$\begin{aligned} \varrho _{j+1}-\varrho _j>&(1-(1+\tilde{\varepsilon })^{-j-1}) r -1 - (1-(1+\tilde{\varepsilon })^{-j}) r = \tilde{\varepsilon }(1 + \tilde{\varepsilon })^{-j-1}r -1 > 0, \end{aligned}$$
(12)

which in turn implies that the sequence \((\varrho _j)_{j=0}^i\) is strictly increasing.

Next, let \(g(\cdot ):\{\varrho _j\}_{j \in [i]}\rightarrow (0,\infty )\) be a strictly positive growth function satisfying

$$\begin{aligned} \begin{aligned}&g(\varrho _j) \ge 16\,g(\varrho _{j+1}), \quad j=1,2,\dots ,i-1;\\&g(\varrho _i) \le \sqrt{x}. \end{aligned} \end{aligned}$$
(13)

Now, we assume the matrix F satisfies

$$\begin{aligned} s_{\varrho _j}(F)\ge g\big (\varrho _j\big ), \quad j \in [i]. \end{aligned}$$
(14)

In this section, we deal with an arbitrary growth function satisfying the conditions; a specific choice of \(g(\cdot )\) will be made later in Sect. 5.

Our objective in this section is to derive the following proposition.

Proposition 3.3

There are universal constants \(c\in (0,1]\), \({\tilde{C}}\ge 1\) with the following property. Let

$$\begin{aligned} B= \begin{bmatrix} F &{}\quad M \\ W &{}\quad Q \end{bmatrix}, \end{aligned}$$

where F is a fixed \(r\times r\) matrix, M is \(r\times x\), W is \(x\times r\), Q is \(x\times x\), and the entries of M, W, Q are mutually independent standard Gaussians. Assume that parameters \(\tilde{\varepsilon }\in (0,1]\), \(h\in (0,1]\), r, x, and \(i\in {\mathbb {N}}\) satisfy

$$\begin{aligned}&r- \varrho _i\le x\le r,\;\; \tilde{\varepsilon }x\ge 4, \;\;h\le 2^{-11}(c')^2\tilde{\varepsilon },\\&3(1+\tilde{\varepsilon })^{-i-1}r-(1+\tilde{\varepsilon })^{-i}r\ge x+1+11\tilde{\varepsilon }x,\;\; \tilde{\varepsilon }(1+\tilde{\varepsilon })^{-i}r\ge 2, \end{aligned}$$

where \(c' \in (0,1]\) is the constant from Proposition 3.2. Further, assume (14) for the singular values of F, for a positive function \(g(\cdot )\) satisfying (13). Then with probability at least

$$\begin{aligned} 1-2x^{\tilde{\varepsilon }x/2}\, h^{(\tilde{\varepsilon }x)^2/64} -4\exp \big (-cx^2\,\tilde{\varepsilon }/h^2\big ) -\tilde{C}\exp \big (-c\tilde{\varepsilon }^2 (1+\tilde{\varepsilon })^{-i}r x/h^2\big ) \end{aligned}$$

we have

$$\begin{aligned} s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B)\ge \frac{c' \tilde{\varepsilon }h^5\,g\big (\varrho _i \big )}{32}. \end{aligned}$$

Note that if \(F=UDV\) is a singular value decomposition of F then, in view of the rotational invariance of the Gaussian distribution,

$$\begin{aligned} B= \begin{bmatrix} UDV &{}\quad M \\ W &{}\quad Q \end{bmatrix}= \begin{bmatrix} U &{}\quad 0 \\ 0 &{}\quad \textrm{Id}_x \end{bmatrix} \begin{bmatrix} D &{}\quad U^{-1}M \\ WV^{-1} &{}\quad Q \end{bmatrix} \begin{bmatrix} V &{}\quad 0 \\ 0 &{}\quad \textrm{Id}_x \end{bmatrix}, \end{aligned}$$

where \(U^{-1}M\), \(WV^{-1}\), and Q have mutually independent standard Gaussian entries, and where the singular spectrum of B coincides with that of

$$\begin{aligned} B':= \begin{bmatrix} D &{} U^{-1}M \\ WV^{-1} &{} Q \end{bmatrix}. \end{aligned}$$

We can assume without loss of generality that the diagonal elements (the singular values) of D are arranged in non-decreasing order when moving from the top left to the bottom right corner. We will work with the singular spectrum of \(B'\) as this will allow us to somewhat simplify the computations.

The specific goal is to estimate from below the singular value

$$\begin{aligned} s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B) =s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B') \end{aligned}$$

in terms of \(g(\varrho _i )\). To have better control of the probability estimates, we introduce one more scalar parameter \(h\in (0,1]\) which will allow us to balance the precision of the estimate and the probability with which the estimate holds (the smaller h is, the less precise the estimate is and the stronger the probability bounds are). We set

$$\begin{aligned} \tau := h^4\cdot g(\varrho _i ), \end{aligned}$$
(15)

and let

$$\begin{aligned} \ell _{i+1}'=\,\hbox { the number of singular values of } F \hbox { strictly less than } \tau ;\quad \ell _{i+1}'':=r - \varrho _i-\ell _{i+1}'.\nonumber \\ \end{aligned}$$
(16)

Let us remark that \(\ell _{i+1}' \le r - \varrho _i\) since, by the intermediate singular values assumption (14) on F, we have \(s_{\varrho _i}(F)\ge \tau \).

Set

$$\begin{aligned} I:=[r]{\setminus }[\ell _{i+1}']. \end{aligned}$$
(17)

Our argument to control \(s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B)\) splits into two parts depending on whether \(\ell _{i+1}'\) is “small” or “large”. In the former case (see Lemma 3.4), the matrix F (or D) has a well-controlled singular spectrum, and our goal is to show that attaching to it x rows and columns of standard Gaussians cannot deteriorate the singular value estimates. In this setting, we completely ignore the first \(\ell _{i+1}'\) rows of \(B'\), and work with the matrix \(B'_{I\times [r+x]}\). In the latter case (see Lemma 3.5), we show that by adding the Gaussian rows and columns we actually improve the control of the singular values. The fact that the top right \(\ell _{i+1}'\times x\) corner of \(B'\) is a standard Gaussian matrix plays a crucial role in this setting. The proof of Proposition 3.3 follows from Lemmas 3.4 and 3.5.

The high-level proof strategy for both Lemmas 3.4 and 3.5 is similar. We construct a (random) subspace H of \({\mathbb {R}}^{r+x}\) of dimension at least \((1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\), designed in such a way that, under appropriate assumptions on the singular spectra of certain submatrices of \(U^{-1}M\), \(WV^{-1}\), and Q, \(\Vert B'v\Vert _2\) is large for every unit vector \(v\in H\). By the minimax formula for singular values,

$$\begin{aligned} s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B')\ge \inf \limits _{v\in H,\,\Vert v\Vert _2=1}\Vert B'v\Vert _2. \end{aligned}$$

The “appropriate assumptions” on the singular spectra are encapsulated in a good event \({\mathcal {E}}_{good}\) which, as we show, has a very large probability. In what follows, it will be convenient to use the notation

$$\begin{aligned} \ell _j:= \varrho _j - \varrho _{j-1},\, j \in [i], \quad \ell _{i+1}:= r - \varrho _i. \end{aligned}$$
(18)

We remark that for every \(j=1,2,\dots ,i\), by the same derivation as shown in (12),

$$\begin{aligned} \ell _j\in [\tilde{\varepsilon }(1+\tilde{\varepsilon })^{-j}r-1,\tilde{\varepsilon }(1+\tilde{\varepsilon })^{-j}r+1]. \end{aligned}$$
(19)

Lemma 3.4

There exist universal constants \(c\in (0,1]\), \({\tilde{C}}\ge 1\) with the following property. Assume that \(i \in {\mathbb {N}}\), \(\tilde{\varepsilon }\in (0,1]\), \(h\in (0,1]\), r, and x satisfy the assumptions of Proposition 3.3, and assume additionally that

$$\begin{aligned} \ell _{i+1}'\le (1+\tilde{\varepsilon })^{-i-1}(r+x)-3\tilde{\varepsilon }x, \end{aligned}$$

where \(\ell _{i+1}'\) is defined in (16). Denote

$$\begin{aligned} \beta := \frac{c' \tilde{\varepsilon }h\tau }{32}, \end{aligned}$$

where \(c' \in (0,1]\) is the constant from Proposition 3.2 and where \(\tau \) is defined by (15). Then with probability at least

$$\begin{aligned} 1-2x^{\tilde{\varepsilon }x/2}\, h^{(\tilde{\varepsilon }x)^2/64} -4\exp \big (-cx^2\,\tilde{\varepsilon }/h^2\big ) -\tilde{C}\exp \big (-c\tilde{\varepsilon }^2 (1+\tilde{\varepsilon })^{-i}r x/h^2\big ) \end{aligned}$$

we have

$$\begin{aligned} s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B)\ge \beta . \end{aligned}$$

Proof

Construction of subspace H Denote by \(X_1,X_2,\dots ,X_{r+x}\in {\mathbb {R}}^{r+x}\) an orthonormal basis of the right singular vectors of the matrix \(B_{I\times [r+x]}'\), measurable w.r.t. the \(\sigma \)-field \(\sigma (B_{I\times [r+x]}')\), where \(X_j\) corresponds to \(s_j(B_{I\times [r+x]}')\), \(1\le j\le r+x\), and where I is defined by (17). Note that by the interlacing properties of the singular values (see, for example, [3]), we have

$$\begin{aligned} s_j(B_{I\times [r+x]}')\ge s_{j}(D_{\{\ell _{i+1}'+1,\dots ,r\}\times \{\ell _{i+1}'+1,\dots ,r\}}),\quad 1\le j\le r-\ell _{i+1}'; \end{aligned}$$

in particular, \(s_{r-\ell _{i+1}'}(B_{I\times [r+x]}')\ge \tau \) everywhere on the probability space.

Observe that, conditioned on \(\sigma (B_{[r]\times [r+x]}')\), the \(x\times \ell _{i+1}''\) matrix

$$\begin{aligned} Y^{(i+1)}:= \begin{bmatrix} WV^{-1}&Q \end{bmatrix}\, \begin{bmatrix} X_{r+1-\ell _{i+1}}&\dots&X_{r-\ell _{i+1}'} \end{bmatrix} = \begin{bmatrix} WV^{-1}&Q \end{bmatrix}\, \begin{bmatrix} X_{\varrho _i+1}&\dots&X_{r-\ell _{i+1}'} \end{bmatrix} \end{aligned}$$

has mutually independent standard Gaussian entries. Denote by \(e^{(i+1)}_q\), \(1\le q\le \min (\lfloor \tilde{\varepsilon }x\rfloor ,\ell _{i+1}'')\), a random orthonormal system of right singular vectors of \(Y^{(i+1)}\) corresponding to \(\min (\lfloor \tilde{\varepsilon }x\rfloor ,\ell _{i+1}'')\) largest singular values of \(Y^{(i+1)}\), and let \(E^{(i+1)}\subset {\mathbb {R}}^{\ell _{i+1}''}\) be the subspace

$$\begin{aligned} \mathrm{span\,}\big \{e^{(i+1)}_q,\; 1\le q\le \min (\lfloor \tilde{\varepsilon }x\rfloor ,\ell _{i+1}'')\big \}^\perp . \end{aligned}$$

Similarly, for every \(1\le j\le i\) and for \(\ell _j\) given by (18), we define the \(x\times \ell _j\) matrix

$$\begin{aligned} Y^{(j)}&:= \begin{bmatrix} WV^{-1}&Q \end{bmatrix}\, \begin{bmatrix} X_{r+1-\sum _{d=j}^{i+1}\ell _{d}}&\dots&X_{r-\sum _{d=j+1}^{i+1}\ell _{d}} \end{bmatrix}\\&= \begin{bmatrix} WV^{-1}&Q \end{bmatrix}\, \begin{bmatrix} X_{\varrho _{j-1}+1}&\dots&X_{\varrho _{j}} \end{bmatrix} \end{aligned}$$

(again, conditioned on \(\sigma (B_{[r]\times [r+x]}')\), \(Y^{(j)}\) has mutually independent standard normal entries). Denote by \(e^{(j)}_q\), \(1\le q\le \min (\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor ,\ell _j)\), a random orthonormal system of right singular vectors of \(Y^{(j)}\) corresponding to \(\min (\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor ,\ell _j)\) largest singular values of \(Y^{(j)}\), and let \(E^{(j)}\subset {\mathbb {R}}^{\ell _{j}}\) be the subspace

$$\begin{aligned} \mathrm{span\,}\big \{e^{(j)}_q,\; 1\le q\le \min (\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor ,\ell _j)\big \}^\perp . \end{aligned}$$

Consider the random \(x\times (\ell _{i+1}'+x)\) matrix

$$\begin{aligned} {\hat{Y}}:=\begin{bmatrix} WV^{-1}&Q \end{bmatrix}\, \begin{bmatrix} X_{r+1-\ell _{i+1}'}&\dots&X_{r+x} \end{bmatrix}. \end{aligned}$$

Let \({\hat{e}}_1,{\hat{e}}_2,\dots ,{\hat{e}}_{x-\lfloor \tilde{\varepsilon }x\rfloor }\) be a random orthonormal set of right singular vectors of \({\hat{Y}}\) corresponding to \(x-\lfloor \tilde{\varepsilon }x\rfloor \) largest singular values of \({\hat{Y}}\), and let \({\tilde{E}}\subset {\mathbb {R}}^{\ell _{i+1}'+x}\) be the random subspace of dimension \(x-\lfloor \tilde{\varepsilon }x\rfloor \) defined as

$$\begin{aligned} {\tilde{E}}:=\mathrm{span\,}\{{\hat{e}}_1,{\hat{e}}_2,\dots ,{\hat{e}}_{x-\lfloor \tilde{\varepsilon }x\rfloor }\}. \end{aligned}$$

Now, we construct the (random) subspace \(H\subset {\mathbb {R}}^{r+x}\) as

$$\begin{aligned} H&:=\mathrm{span\,}\Big \{ \begin{bmatrix} X_{r+1-\ell _{i+1}'}&\dots&X_{r+x} \end{bmatrix}({\tilde{E}}),\\&\quad \begin{bmatrix} X_{r+1-\ell _{i+1}}&\dots&X_{r-\ell _{i+1}'} \end{bmatrix}(E^{(i+1)});\\&\quad \begin{bmatrix} X_{\varrho _{j-1}+1 }&\dots&X_{\varrho _j } \end{bmatrix}(E^{(j)}), \;1\le j\le i \Big \}. \end{aligned}$$

Let us check that the constructed subspace satisfies the required lower bound on dimension, that is, \(\dim H\ge (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\). In view of the assumptions on \(\ell _{i+1}'\), we have

$$\begin{aligned} \dim H&\ge x-\lfloor \tilde{\varepsilon }x\rfloor +\ell _{i+1}''-\lfloor \tilde{\varepsilon }x\rfloor +\sum _{j=1}^i \big (\ell _{j}-\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor \big )\\&\ge r+x-\ell _{i+1}'-3\tilde{\varepsilon }x\\&\ge r+x-(1+\tilde{\varepsilon })^{-i-1}(r+x). \end{aligned}$$

Defining a good event Denote by \(\tilde{\mathcal {E}}\) the event

$$\begin{aligned} \bigg \{\big \Vert \begin{bmatrix} WV^{-1}&Q \end{bmatrix} v\big \Vert _2\ge \frac{c' \lfloor \tilde{\varepsilon }x\rfloor \,h}{\sqrt{x}}\hbox { for every unit vector } { v\in \Big (\begin{bmatrix} X_{r+1-\ell _{i+1}'}&\dots&X_{r+x} \end{bmatrix}({\tilde{E}})\Big )}\bigg \}, \end{aligned}$$

where the constant \(c'\) is taken from Proposition 3.2. According to our definition of the subspace \({\tilde{E}}\), for every unit vector v as above we have

$$\begin{aligned} \big \Vert \begin{bmatrix} WV^{-1}&Q \end{bmatrix} v\big \Vert _2\ge s_{x-\lfloor \tilde{\varepsilon }x\rfloor }({\hat{Y}}), \end{aligned}$$

where the matrix \({\hat{Y}}\) is \(x\times (x+\ell _{i+1}')\) standard Gaussian, in view of the independence of \(\begin{bmatrix} WV^{-1}&Q \end{bmatrix}\) from the \(\sigma \)–field \(\sigma (B_{[r]\times [r+x]}')\).

Hence, by Proposition 3.2 applied to \({\hat{Y}}\), we get

$$\begin{aligned} {{\mathbb {P}}}(\tilde{\mathcal {E}})\ge 1-x^{\lfloor \tilde{\varepsilon }x\rfloor /2}\, h^{\lfloor \tilde{\varepsilon }x\rfloor ^2/32}. \end{aligned}$$

Further, let

$$\begin{aligned} {\mathcal {E}}^{(i+1)}:=\big \{s_{\lfloor \tilde{\varepsilon }x\rfloor +1}(Y^{(i+1)})\le \sqrt{x}/h\big \}, \end{aligned}$$

and for every \(1\le j\le i\), let

$$\begin{aligned} {\mathcal {E}}^{(j)}:=\big \{s_{\lfloor 2^{-i-1+j}\tilde{\varepsilon }x\rfloor +1}(Y^{(j)})\le 2^{i+1-j}\sqrt{\ell _j}/h\big \}. \end{aligned}$$

Since, by our assumptions, \(\sqrt{\tilde{\varepsilon }x}/h\ge 2\sqrt{\ell _{i+1}''}\), we have, according to Theorem 3.1,

$$\begin{aligned} {{\mathbb {P}}}\big (\big ({\mathcal {E}}^{(i+1)}\big )^c\big ) \le {{\mathbb {P}}}\big \{\Vert Y^{(i+1)}\Vert _{HS}\ge \sqrt{x}\cdot \sqrt{\tilde{\varepsilon }x}/h\big \} \le 2\exp \big (-cx^2\,\tilde{\varepsilon }/h^2\big ), \end{aligned}$$

for a universal constant \(c>0\). Similarly, since for every \(j=1,2,\dots ,i\), \(\sqrt{2^{i+1-j}\tilde{\varepsilon }x}/h\ge \sqrt{\tilde{\varepsilon }x}/h\ge 2\sqrt{x}\), we have

$$\begin{aligned} {{\mathbb {P}}}\big (\big ({\mathcal {E}}^{(j)}\big )^c\big )\le & {} {{\mathbb {P}}}\big \{\Vert Y^{(j)}\Vert _{HS}\ge 2^{i+1-j}\sqrt{\ell _j}\cdot \sqrt{2^{-i-1+j}\tilde{\varepsilon }x}/h\big \} \\\le & {} 2\exp \big (-c\,2^{i+1-j}\ell _j x\,\tilde{\varepsilon }/h^2\big ). \end{aligned}$$

We define

$$\begin{aligned} {\mathcal {E}}_{\textrm{good}}:=\tilde{\mathcal {E}}\cap \bigcap _{j=1}^{i+1}{\mathcal {E}}^{(j)}. \end{aligned}$$

In view of the above,

$$\begin{aligned} {{\mathbb {P}}}\big ({\mathcal {E}}_{\textrm{good}}\big )&\ge 1-2x^{\lfloor \tilde{\varepsilon }x\rfloor /2}\, h^{\lfloor \tilde{\varepsilon }x\rfloor ^2/32} - 4\exp \big (-cx^2\,\tilde{\varepsilon }/h^2\big )\\&\quad -2\sum _{j=1}^i \exp \big (-c\,2^{i+1-j}\ell _j x\,\tilde{\varepsilon }/h^2\big )\\&\ge 1-2x^{\tilde{\varepsilon }x/2}\, h^{(\tilde{\varepsilon }x)^2/64} -4\exp \big (-cx^2\,\tilde{\varepsilon }/h^2\big )\\&\quad -{\tilde{C}}\exp \big (-c\tilde{\varepsilon }^2 (1+\tilde{\varepsilon })^{-i}r x/h^2\big ), \end{aligned}$$

for a universal constant \({\tilde{C}}>0\).

Checking that H satisfies the required property conditioned on \({\mathcal {E}}_{\textrm{good}}\). Assuming the conditioning, pick any unit vector \(v\in H\). We represent v in terms of the basis \(X_1,\dots ,X_{r+x}\) as

$$\begin{aligned} v=\sum _{q=1}^{r+x}a_q\, X_q, \end{aligned}$$

for some coefficients \(a_1,\dots ,a_{r+x}\) with \(\sum _{q=1}^{r+x}a_q^2=1\). Note that

$$\begin{aligned} \Vert B_{I\times [r+x]}'v\Vert _2^2 =\sum _{q=1}^{r+x}a_q^2 \,s_{q}(B_{I\times [r+x]}')^2. \end{aligned}$$

If the last expression is greater than \(\beta ^2\) then we are done. Otherwise, we have

$$\begin{aligned} \sum _{q=1}^{r+x}a_q^2\, s_{q}(B_{I\times [r+x]}')^2\le \beta ^2, \end{aligned}$$

and hence, in particular,

$$\begin{aligned} \sum _{q=r+1-\ell _{i+1}}^{r-\ell _{i+1}'}a_q^2\le \frac{\beta ^2}{\tau ^2}\le \frac{h^2}{16^2}, \end{aligned}$$
(20)

and for every \(j=1,2,\dots ,i\),

$$\begin{aligned} \sum _{q=\varrho _{j-1}+1 }^{\varrho _j }a_q^2 \le \frac{\beta ^2}{g\big (\varrho _j \big )^2} \le \frac{h^2}{16^2}\cdot 16^{j-i}. \end{aligned}$$
(21)

Observe that the last conditions yield

$$\begin{aligned} \sum _{q=r+1-\ell _{i+1}'}^{r+x}a_q^2\ge \frac{1}{4}. \end{aligned}$$

In view of conditioning on \(\tilde{\mathcal {E}}\), this immediately implies

$$\begin{aligned} \Big \Vert \begin{bmatrix} W&Q \end{bmatrix} \sum _{q=r+1-\ell _{i+1}'}^{r+x}a_q X_q\Big \Vert _2\ge \frac{c' \lfloor \tilde{\varepsilon }x\rfloor }{4\sqrt{x}}. \end{aligned}$$

Further, in view of conditioning on events \({\mathcal {E}}^{(1)},\dots ,{\mathcal {E}}^{(i+1)}\),

$$\begin{aligned} \Big \Vert \begin{bmatrix} W&Q \end{bmatrix} \sum _{q=r+1-\ell _{i+1}}^{r-\ell _{i+1}'}a_q X_q\Big \Vert _2 \le \frac{\beta }{\tau }\cdot \frac{\sqrt{x}}{h}, \end{aligned}$$

and for every \(j=1,2,\dots ,i\),

$$\begin{aligned} \bigg \Vert \begin{bmatrix} W&Q \end{bmatrix} \sum _{q=\varrho _{j-1}+1 }^{\varrho _j }a_q X_q\bigg \Vert _2 \le \frac{\beta }{g\big (\varrho _j \big )}\cdot \frac{2^{i+1-j}\sqrt{\ell _j}}{h}. \end{aligned}$$

Thus, by the triangle inequality,

$$\begin{aligned} \big \Vert \begin{bmatrix} W&Q \end{bmatrix} v\big \Vert _2&\ge \frac{c' \lfloor \tilde{\varepsilon }x\rfloor }{4\sqrt{x}}-\frac{\beta }{\tau }\cdot \frac{\sqrt{x}}{h} -\sum _{j=1}^i \frac{\beta }{g\big (\varrho _j \big )}\cdot \frac{2^{i+1-j}\sqrt{\ell _j}}{h}\\&\ge \frac{c' \tilde{\varepsilon }\sqrt{x}}{8}-\frac{\beta }{\tau }\cdot \frac{\sqrt{x}}{h} -\frac{8\,\beta }{g\big (\varrho _i \big )}\cdot \frac{\sqrt{\tilde{\varepsilon }(1+\tilde{\varepsilon })^{-i}r}}{h}, \end{aligned}$$

where the last relation follows from our assumptions on parameters (19) and (13). The assumption on \(\beta \) then implies the result. \(\square \)

Lemma 3.5

There are universal constants \(c\in (0,1]\), \({\tilde{C}}\ge 1\) with the following property. Assume that \(i \in {\mathbb {N}}\), \(\tilde{\varepsilon }\in (0,1]\), \(h\in (0,1]\), r, and x satisfy the assumptions of Proposition 3.3, and assume additionally that

$$\begin{aligned} \ell _{i+1}'> (1+\tilde{\varepsilon })^{-i-1}(r+x)-3\tilde{\varepsilon }x, \end{aligned}$$

where \(\ell _{i+1}'\) is given in (16). Then with probability at least

$$\begin{aligned} 1-2x^{\tilde{\varepsilon }x/2}\, h^{(\tilde{\varepsilon }x)^2/64}- {\tilde{C}}\exp \big (-c\tilde{\varepsilon }^2 (1+\tilde{\varepsilon })^{-i}r x/h^2\big ) -2\exp \big (-c\,x^2\,\tilde{\varepsilon }/h^2\big ) \end{aligned}$$

we have

$$\begin{aligned} s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(B)\ge \tau , \end{aligned}$$

where \(\tau \) is defined by (15).

Proof

Construction of subspace H. Consider a refinement of the block representation of \(B'\):

$$\begin{aligned} B'= \begin{bmatrix} \begin{bmatrix} D_{i+1}' &{}\quad 0 &{}\quad 0 &{}\quad \dots &{}\quad 0\\ 0 &{}\quad D_{i+1}'' &{}\quad 0 &{}\quad \dots &{}\quad 0\\ 0 &{}\quad 0 &{}\quad D_i &{}\quad \dots &{}\quad 0\\ \dots &{}\quad \dots &{}\quad \dots &{}\quad \dots &{}\quad \dots \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad \dots &{}\quad D_1 \end{bmatrix} &{} \begin{bmatrix} M_{i+1}' \\ M_{i+1}'' \\ M_{i}\\ \dots \\ M_1 \end{bmatrix}\\ \begin{bmatrix} W_{i+1}' &{} W_{i+1}'' &{} W_{i} &{} \dots &{} W_1 \end{bmatrix}&Q \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} U^{-1}M= & {} \begin{bmatrix} M_{i+1}' \\ M_{i+1}'' \\ M_{i}\\ \dots \\ M_1 \end{bmatrix};\; WV^{-1}=\begin{bmatrix} W_{i+1}'&W_{i+1}''&W_{i}&\dots&W_1 \end{bmatrix};\;\\ D= & {} \begin{bmatrix} D_{i+1}' &{}\quad 0 &{}\quad 0 &{}\quad \dots &{}\quad 0\\ 0 &{}\quad D_{i+1}'' &{}\quad 0 &{}\quad \dots &{}\quad 0\\ 0 &{}\quad 0 &{}\quad D_i &{}\quad \dots &{}\quad 0\\ \dots &{}\quad \dots &{}\quad \dots &{}\quad \dots &{}\quad \dots \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad \dots &{}\quad D_1 \end{bmatrix}. \end{aligned}$$

In particular, for every \(1\le j\le i\), the matrix \(D_j\) above is \(\ell _j\times \ell _j\), \(M_j\) is \(\ell _j\times x\), and \(W_j\) is \(x\times \ell _j\), where \(\ell _j\)’s are given by (18). Further, \(D_{i+1}'\) is \(\ell _{i+1}'\times \ell _{i+1}'\), \(M_{i+1}'\) is \(\ell _{i+1}'\times x\), and \(W_{i+1}'\) is \(x\times \ell _{i+1}'\); the dimensions of \(D_{i+1}''\), \(M_{i+1}''\) and \(W_{i+1}''\) are defined accordingly. In this proof, we denote by \(P_{i+1}':{\mathbb {R}}^{r+x}\rightarrow {\mathbb {R}}^{\ell _{i+1}'}\) the coordinate projection onto first \(\ell _{i+1}'\) coordinates, by \(P_x:{\mathbb {R}}^{r+x}\rightarrow {\mathbb {R}}^{x}\) the coordinate projection onto last x coordinates, and, for every \(1\le j\le i\), denote by \(P_j:{\mathbb {R}}^{r+x}\rightarrow {\mathbb {R}}^{\ell _j}\) the coordinate projection onto \(\ell _j\) components starting from \(1+\sum _{d=j+1}^{i+1}\ell _d\).

Denote by \(e^{(i+1)'}_q\), \(1\le q\le \ell _{i+1}'-\lfloor \tilde{\varepsilon }x\rfloor \), a random orthonormal system of right singular vectors of \(W_{i+1}'\) corresponding to \(\ell _{i+1}'-\lfloor \tilde{\varepsilon }x\rfloor \) largest singular values of \(W_{i+1}'\), and let \({\tilde{E}}\subset {\mathbb {R}}^{\ell _{i+1}'}\) be the subspace

$$\begin{aligned} \mathrm{span\,}\big \{e^{(i+1)'}_q,\; 1\le q\le \ell _{i+1}'-\lfloor \tilde{\varepsilon }x\rfloor \big \}. \end{aligned}$$

For every \(1\le j\le i\), denote by \(e^{(j)}_q\), \(1\le q\le \min (\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor ,\ell _j)\), a random orthonormal system of right singular vectors of \(W_j\) corresponding to \(\min (\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor ,\ell _j)\) largest singular values of \(W_j\), and let \(E^{(j)}\subset {\mathbb {R}}^{\ell _{j}}\) be the subspace

$$\begin{aligned} \mathrm{span\,}\big \{e^{(j)}_q,\; 1\le q\le \min (\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor ,\ell _j)\big \}^\perp . \end{aligned}$$

Finally, we construct a random subspace \({\hat{E}}\subset {\mathbb {R}}^x\) as follows. Denote by \(e^{(Q)}_q\), \(1\le q\le \lfloor \tilde{\varepsilon }x\rfloor \), a random orthonormal system of right singular vectors of Q corresponding to \(\lfloor \tilde{\varepsilon }x\rfloor \) largest singular values of Q, and let \(E^{(Q)}\subset {\mathbb {R}}^x\) be the subspace

$$\begin{aligned} \mathrm{span\,}\big \{e^{(Q)}_q,\; 1\le q\le \lfloor \tilde{\varepsilon }x\rfloor \big \}^\perp . \end{aligned}$$

Further, let \(e^{(M_{i+1}')}_q\), \(1\le q\le \ell _{i+1}'-\lfloor \tilde{\varepsilon }x\rfloor \), be a random orthonormal system of right singular vectors of \(M_{i+1}'\) corresponding to \(\ell _{i+1}'-\lfloor \tilde{\varepsilon }x\rfloor \) largest singular values of \(M_{i+1}'\), and let

$$\begin{aligned} E^{(M_{i+1}')}:=\mathrm{span\,}\big \{e^{(M_{i+1}')}_q,\; 1\le q\le \ell _{i+1}'-\lfloor \tilde{\varepsilon }x\rfloor \big \}. \end{aligned}$$

For every \(1\le j\le i\), let \(e^{(M_{j})}_q\), \(1\le q\le \lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor \), be a random orthonormal system of right singular vectors of \(M_j\) corresponding to \(\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor \) largest singular values of \(M_{j}\), and let

$$\begin{aligned} E^{(M_j)}:=\mathrm{span\,}\big \{e^{(M_{j})}_q,\; 1\le q\le \lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor \big \}^\perp \subset {\mathbb {R}}^x. \end{aligned}$$

We then set

$$\begin{aligned} {\hat{E}}:=E^{(M_{i+1}')}\;\cap \;\bigcap _{j=1}^i E^{(M_j)}. \end{aligned}$$

The subspace H is now defined as

$$\begin{aligned} H:=\big \{v\in {\mathbb {R}}^{r+x}:\;P_{i+1}'v\in {\tilde{E}};\;P_{i+1}''v=0;\;P_j v\in E^{(j)},\; 1\le j\le i;\; P_x v\in {\hat{E}}\big \}. \end{aligned}$$

Let us check that H satisfies the required assumptions on the dimension. We have

$$\begin{aligned} \dim H&=\ell _{i+1}'-\lfloor \tilde{\varepsilon }x\rfloor +\sum _{j=1}^i\big (\ell _j-\min (\lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor ,\ell _j)\big ) +\dim {\hat{E}}\\&\ge r-\ell _{i+1}''- 2\tilde{\varepsilon }x +x-\tilde{\varepsilon }x-(x-\ell _{i+1}'+\lfloor \tilde{\varepsilon }x\rfloor ) -\sum _{j=1}^i \lfloor 2^{j-i-1}\tilde{\varepsilon }x\rfloor \\&\ge r+2\ell _{i+1}'-\ell _{i+1}-5\tilde{\varepsilon }x. \end{aligned}$$

Next, we use the assumption on \(\ell _{i+1}'\) and the assumptions on parameters to obtain

$$\begin{aligned} r+2\ell _{i+1}'-\ell _{i+1}-5\tilde{\varepsilon }x&\ge r+2(1+\tilde{\varepsilon })^{-i-1}(r+x)-11\tilde{\varepsilon }x -(1+\tilde{\varepsilon })^{-i}r-1\\&\ge (1-(1+\tilde{\varepsilon })^{-i-1})(r+x). \end{aligned}$$

Defining a good event. Denote by \(\tilde{\mathcal {E}}\) the event

$$\begin{aligned} \bigg \{\big \Vert W_{i+1}'v\big \Vert _2\ge \frac{c' \lfloor \tilde{\varepsilon }x\rfloor \,h}{\sqrt{x}}\hbox { for every unit vector } { v\in {\tilde{E}}}\bigg \}, \end{aligned}$$

where the constant \(c'\) is taken from Proposition 3.2. According to our definition of the subspace \({\tilde{E}}\), for every unit vector v as above we have

$$\begin{aligned} \big \Vert W_{i+1}' v\big \Vert _2\ge s_{\ell _{i+1}'-\lfloor \tilde{\varepsilon }x\rfloor }(W_{i+1}'). \end{aligned}$$

Hence, by Proposition 3.2 applied to \(W_{i+1}'\), we get

$$\begin{aligned} {{\mathbb {P}}}(\tilde{\mathcal {E}})\ge 1-x^{\lfloor \tilde{\varepsilon }x\rfloor /2}\, h^{\lfloor \tilde{\varepsilon }x\rfloor ^2/32}. \end{aligned}$$

Further, for every \(1\le j\le i\), let

$$\begin{aligned} {\mathcal {E}}^{(j)}:=\big \{s_{\lfloor 2^{-i-1+j}\tilde{\varepsilon }x\rfloor +1}(W_j)\le 2^{i+1-j}\sqrt{\ell _j}/h\big \}. \end{aligned}$$

Note that conditioned on \({\mathcal {E}}^{(j)}\), we have

$$\begin{aligned} \big \Vert W_{j} v\big \Vert _2\le 2^{i+1-j}\sqrt{\ell _j}/h\hbox { for every unit vector } v\in E^{(j)}. \end{aligned}$$

Since for every \(j=1,2,\dots ,i\), \(\sqrt{2^{i+1-j}\tilde{\varepsilon }x}/h\ge \sqrt{\tilde{\varepsilon }x}/h\ge 2\sqrt{x}\), we have

$$\begin{aligned} {{\mathbb {P}}}\big (\big ({\mathcal {E}}^{(j)}\big )^c\big )\le & {} {{\mathbb {P}}}\big \{\Vert W_j\Vert _{HS}\ge 2^{i+1-j}\sqrt{\ell _j}\cdot \sqrt{2^{-i-1+j}\tilde{\varepsilon }x}/h\big \} \\\le & {} 2\exp \big (-c\,2^{i+1-j}\ell _j x\,\tilde{\varepsilon }/h^2\big ). \end{aligned}$$

Finally, we define events corresponding to a “good” realization of \({\hat{E}}\). Let \({\mathcal {E}}_{M_{i+1}'}\) be the event

$$\begin{aligned} \bigg \{\big \Vert M_{i+1}'v\big \Vert _2\ge \frac{c' \lfloor \tilde{\varepsilon }x\rfloor \,h}{\sqrt{x}}\hbox { for every unit vector } { v\in E^{(M_{i+1}')}}\bigg \}. \end{aligned}$$

Repeating the argument for \(\tilde{\mathcal {E}}\), we get

$$\begin{aligned} {{\mathbb {P}}}({\mathcal {E}}_{M_{i+1}'})\ge 1-x^{\lfloor \tilde{\varepsilon }x\rfloor /2}\, h^{\lfloor \tilde{\varepsilon }x\rfloor ^2/32}. \end{aligned}$$

Similarly, adjusting the argument for \({\mathcal {E}}^{(j)}\) accordingly, we get that for every \(1\le j\le i\), the event

$$\begin{aligned} {\mathcal {E}}_{M_{j}}:=\big \{s_{\lfloor 2^{-i-1+j}\tilde{\varepsilon }x\rfloor +1}(M_j)\le 2^{i+1-j}\sqrt{\ell _j}/h\big \} \end{aligned}$$

has probability at least \(1-2\exp \big (-c\,2^{i+1-j}\ell _j x\,\tilde{\varepsilon }/h^2\big )\), and that the event

$$\begin{aligned} {\mathcal {E}}_Q:=\big \{s_{\lfloor \tilde{\varepsilon }x\rfloor +1}(Q)\le \sqrt{x}/h\big \} \end{aligned}$$

has probability at least

$$\begin{aligned} 1-2\exp \big (-c\,x^2\,\tilde{\varepsilon }/h^2\big ). \end{aligned}$$

We define

$$\begin{aligned} {\mathcal {E}}_{\textrm{good}}:=\tilde{\mathcal {E}}\cap \; \bigcap _{j=1}^i {\mathcal {E}}^{(j)}\;\cap {\mathcal {E}}_{M_{i+1}'}\cap {\mathcal {E}}_Q\cap \;\bigcap _{j=1}^i {\mathcal {E}}_{M_{j}}. \end{aligned}$$

In view of the above,

$$\begin{aligned} {{\mathbb {P}}}({\mathcal {E}}_{\textrm{good}})&\ge 1-2x^{\tilde{\varepsilon }x/2}\, h^{(\tilde{\varepsilon }x)^2/64} -4\sum _{j=1}^i \exp \big (-c\,2^{i+1-j}\ell _j x\,\tilde{\varepsilon }/h^2\big )\\&\quad -2\exp \big (-c\,x^2\,\tilde{\varepsilon }/h^2\big )\\&\ge 1-2x^{\tilde{\varepsilon }x/2}\, h^{(\tilde{\varepsilon }x)^2/64}- {\tilde{C}}\exp \big (-c\tilde{\varepsilon }^2 (1+\tilde{\varepsilon })^{-i}r x/h^2\big )\\&\quad -2\exp \big (-c\,x^2\,\tilde{\varepsilon }/h^2\big ), \end{aligned}$$

for a universal constant \({\tilde{C}}\ge 1\).

Checking that H satisfies the required property conditioned on \({\mathcal {E}}_{\textrm{good}}\). Assuming the conditioning, pick any unit vector \(v\in H\). First, we observe that

$$\begin{aligned} \Vert D_{i+1}'P_{i+1}'v\Vert _2\le \Vert D_{i+1}'\Vert \le \tau , \end{aligned}$$

whereas, by the definition of \({\hat{E}}\) and the conditioning,

$$\begin{aligned} \Vert M_{i+1}'P_x v\Vert _2\ge \frac{c' \lfloor \tilde{\varepsilon }x\rfloor \,h}{\sqrt{x}}\,\Vert P_x v\Vert _2. \end{aligned}$$

Thus, if \(\Vert P_x v\Vert _2\ge \frac{2\sqrt{x}\tau }{c' \lfloor \tilde{\varepsilon }x\rfloor \,h}\) then \(\Vert B'v\Vert _2\ge \Vert M_{i+1}'P_x v\Vert _2-\Vert D_{i+1}'P_{i+1}'v\Vert _2\ge \tau \), and we are done.

Otherwise, if

$$\begin{aligned} \Vert P_x v\Vert _2< \frac{2\sqrt{x}\tau }{c' \lfloor \tilde{\varepsilon }x\rfloor \,h} \le \frac{4\tau }{c' \tilde{\varepsilon }\sqrt{x}\,h}, \end{aligned}$$
(22)

then, in view of the conditioning (see the definition of \({\mathcal {E}}_{M_{j}}\)),

$$\begin{aligned} \Vert M_j P_x v\Vert _2\le \frac{2^{i+1-j}\sqrt{\ell _j}}{h}\,\frac{2\sqrt{x}\tau }{c' \lfloor \tilde{\varepsilon }x\rfloor \,h} \le \frac{2^{i+3-j}\sqrt{\ell _j/x}\,\,\tau }{c'\tilde{\varepsilon }h^2},\quad 1\le j\le i. \end{aligned}$$

On the other hand, by our assumptions

$$\begin{aligned} \Vert D_j P_j v\Vert _2\ge g\big (\varrho _j \big )\,\Vert P_j v\Vert _2,\quad 1\le j\le i. \end{aligned}$$

Thus, unless \(\Vert B'_{[r]\times [r+x]}v\Vert _2\ge \tau \), we must have

$$\begin{aligned} g\big (\varrho _j \big )\,\Vert P_j v\Vert _2 -\frac{2^{i+3-j}\sqrt{\ell _j/x}\,\,\tau }{c'\tilde{\varepsilon }h^2}\le \Vert D_j P_j v\Vert _2-\Vert M_j P_x v\Vert _2\le \tau ,\quad 1\le j\le i, \end{aligned}$$

implying

$$\begin{aligned} \Vert P_j v\Vert _2\le \frac{2^{i+4-j}\sqrt{\ell _j/x}\,\,\tau }{c'\tilde{\varepsilon }h^2\, g\big (\varrho _j \big )},\quad 1\le j\le i. \end{aligned}$$
(23)

As a final step of the proof, we will show that for any unit vector \(v\in H\) satisfying conditions (22) and (23), one has \(\Vert B'_{\{r+1,\dots ,r+x\}\times [r+x]}v\Vert _2\ge \tau \). First, note that (22) and (23) imply that

$$\begin{aligned} \Vert P_{i+1}'v\Vert _2&\ge 1-\frac{4\tau }{c' \tilde{\varepsilon }\sqrt{x}\,h} -\sum _{j=1}^i \frac{2^{i+4-j}\sqrt{\ell _j/x}\,\,\tau }{c'\tilde{\varepsilon }h^2\, g\big (\varrho _j \big )}\\&\ge 1-4h^2- h\,\sum _{j=1}^i \frac{2^{i+4-j}\sqrt{2\tilde{\varepsilon }(1+\tilde{\varepsilon })^{i-j}}}{4^{i-j}}\\&\ge 1-4h^2-64h\ge 1/2, \end{aligned}$$

whence, in view of conditioning on \(\tilde{\mathcal {E}}\),

$$\begin{aligned} \Vert W_{i+1}'P_{i+1}'v\Vert _2\ge \frac{c' \tilde{\varepsilon }\sqrt{x}\,h}{4} \ge \frac{c' \tilde{\varepsilon }\sqrt{(1+\tilde{\varepsilon })^{-i}r}\,h}{4}. \end{aligned}$$

Now, for every \(1\le j\le i\), by the above and in view of conditioning on \({\mathcal {E}}^{(j)}\),

$$\begin{aligned} \Vert W_jP_j v\Vert _2&\le \frac{2^{i+1-j}\sqrt{\ell _j}}{h} \frac{2^{i+4-j}\sqrt{\ell _j/x}\,\,\tau }{c'\tilde{\varepsilon }h^2\, g\big (\varrho _j \big )}\\&\le \frac{2^{2i+6-2j} (1+\tilde{\varepsilon })^{-j}r\,\,\tau /\sqrt{(1+\tilde{\varepsilon })^{-i}r}}{c' 16^{i-j}\, h^2\, g\big (\varrho _i \big )}, \end{aligned}$$

whence

$$\begin{aligned} \sum _{j=1}^i \Vert W_jP_j v\Vert _2\le \frac{2^{7} \sqrt{(1+\tilde{\varepsilon })^{-i}r}\,\,\tau }{c' h^2\, g\big (\varrho _i \big )} \le \frac{c' \tilde{\varepsilon }\sqrt{(1+\tilde{\varepsilon })^{-i}r}\,h}{16}. \end{aligned}$$

Similarly, in view of conditioning on \({\mathcal {E}}_Q\), we get

$$\begin{aligned} \Vert QP_x v\Vert _2\le \frac{4\tau }{c' \tilde{\varepsilon }\sqrt{x}\,h} \frac{\sqrt{x}}{h} \le \frac{4 h^4\sqrt{x}}{c' \tilde{\varepsilon }h^2}<\frac{c' \tilde{\varepsilon }\sqrt{(1+\tilde{\varepsilon })^{-i}r}\,h}{16}. \end{aligned}$$

Thus,

$$\begin{aligned} \Vert B'_{\{r+1,\dots ,r+x\}\times [r+x]}v\Vert _2\ge & {} \Vert W_{i+1}'P_{i+1}'v\Vert _2-\sum _{j=1}^i \Vert W_jP_j v\Vert _2 -\Vert QP_x v\Vert _2\\\ge & {} \frac{c' \tilde{\varepsilon }\sqrt{(1+\tilde{\varepsilon })^{-i}r}\,h}{8}\ge \tau , \end{aligned}$$

and the proof is complete. \(\square \)
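
The good events in the proofs above are controlled through intermediate singular values of Gaussian blocks: lower bounds on deep intermediate singular values of a wide Gaussian block (via Proposition 3.2) and upper bounds on \(s_{k+1}\) of a thin block through its Hilbert–Schmidt norm (via Proposition 3.1). The following numerical sketch is illustration only and plays no role in the argument; it assumes numpy, and the sample dimensions are ours.

```python
import numpy as np

# Illustration of the two kinds of quantities controlled by the good events:
# a deep intermediate singular value of a wide Gaussian block stays of
# constant order, while s_{k+1} of a thin Gaussian block is much smaller than
# its Hilbert-Schmidt norm. Constants from Propositions 3.1-3.2 are not used.
rng = np.random.default_rng(0)
x, ell, eps = 200, 40, 0.05
k = int(eps * x)

Y = rng.standard_normal((x, x + ell))      # wide Gaussian block
W = rng.standard_normal((x, ell))          # thin Gaussian block

sv_Y = np.linalg.svd(Y, compute_uv=False)  # singular values, descending
sv_W = np.linalg.svd(W, compute_uv=False)

print("s_{x-k}(Y)  =", sv_Y[x - k - 1])    # of constant order
print("s_{k+1}(W)  =", sv_W[k])
print("||W||_HS    =", np.linalg.norm(W))  # Frobenius norm by default
```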

4 Random polytopes, and distances to pivot rows

Let B be an \(n \times m\) matrix with \(m\ge n\), and assume that every square submatrix of B is invertible. We define recursively the sequence of indices \(\{i_r(B)\}_{r \in [n]}\), vectors \(\{v_r(B)\}_{r \in [n]}\) in \({\mathbb {R}}^m\), and polytopes \(\{K_r(B)\}_{r \in [n]}\) in \({\mathbb {R}}^m\) as follows.

Set \(v_1(B):=e_1\) and \(I_0(B):=\emptyset \). For r from 1 to n,

$$\begin{aligned} i_r(B)&:= \hbox {argmax}_{i \in [n]\backslash I_{r-1}(B)} | \langle v_r(B) ,\, (B_{i,[m]})^\top \rangle |, \nonumber \\ I_r(B)&:= \{i_s(B)\}_{s\in [r]}, \nonumber \\ v_r(B)&:= \left( -((B_{I_{r-1},[r-1]})^{-1} B_{I_{r-1},r})^\top ,\, 1 ,\, \underbrace{0, \dots , 0}_{ m-r \hbox { components } } \right) ^\top , \nonumber \\ K_r(B)&:= \big \{ x \in {\mathbb {R}}^m \, :\, \forall s \in [r],\, | \langle v_s(B) ,\, x \rangle | \le | \langle v_s(B),\, (B_{i_s(B),[m]})^\top \rangle | \big \}. \end{aligned}$$
(24)

Observe that \(v_r(B)\) is a null vector of \(B_{I_{r-1}, [r]}\) whose r-th component equals 1, and that \(i_r(B)\) can be viewed as the index of the r-th pivot row in the Gaussian Elimination with Partial Pivoting applied to the [rectangular] input matrix B. Note also that our definition of the sets \(I_r(B)\) is consistent with that of the sets \(I_r(A)\) discussed earlier. The above construction does not provide any tie-breaking rule for the choice of the indices \(i_r(B)\) in the case when the respective expressions have multiple maximizers. In our setting, however (when B is Gaussian), each pivot is unique with probability one, and hence the choice of a tie-breaking rule is irrelevant. We have the immediate relation

$$\begin{aligned} K_r(B) = K_r(B_{[n],[r]}) \times {\mathbb {R}}^{m-r} \hbox { and } \sigma _m(K_r(B)) = \sigma _r(K_r(B_{[n],[r]})),\quad r \in [m-1], \end{aligned}$$
(25)

where \(\sigma _k\) is the standard Gaussian measure for the corresponding dimension.
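
The recursion (24) is straightforward to carry out numerically. The sketch below — assuming numpy; the helper names are ours and not from the text — computes the pivot indices \(i_r(B)\) and vectors \(v_r(B)\) for a sampled Gaussian matrix and checks that every row not selected among the first r pivots lies in \(K_r(B)\), which is exactly the property that Lemma 4.1 below relates to the event \(\{I_r(B)=I\}\).

```python
import numpy as np

def pivot_data(B):
    """Sketch of the recursion (24): pivot indices i_1,...,i_n (0-based here)
    and the vectors v_1,...,v_n for an n-by-m matrix B with n <= m."""
    n, m = B.shape
    I, V = [], []
    for r in range(n):                       # loop index r is the paper's r-1
        v = np.zeros(m)
        v[r] = 1.0
        if r > 0:
            # v_r := (-(B_{I_{r-1},[r-1]}^{-1} B_{I_{r-1},r})^T, 1, 0, ..., 0)^T
            v[:r] = -np.linalg.solve(B[np.ix_(I, range(r))], B[I, r])
        scores = np.abs(B @ v)
        if I:
            scores[I] = -np.inf              # argmax over i in [n] \ I_{r-1}
        I.append(int(np.argmax(scores)))
        V.append(v)
    return I, V

def in_K_r(B, I, V, r, y):
    """Membership y in K_r(B): |<v_s, y>| <= |<v_s, B_{i_s}>| for all s <= r."""
    return all(abs(V[s] @ y) <= abs(V[s] @ B[I[s]]) for s in range(r))

rng = np.random.default_rng(1)
n, m, r = 8, 10, 5
B = rng.standard_normal((n, m))
I, V = pivot_data(B)
# every row not selected among the first r pivots lies in K_r(B); cf. Lemma 4.1
print(all(in_K_r(B, I, V, r, B[j]) for j in range(n) if j not in I[:r]))
```

By the maximality in the definition of the indices \(i_s(B)\), the printed check is True for every input matrix with unique maximizers.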

Suppose we have performed r steps of the GEPP algorithm on the \(n\times n\) Gaussian matrix A. Let \(I\subset [n]\) have size r, and condition on a realization of \(I_r=I\) and \(A_{I,[r]}\), which determines \(K_r(A)\). Then, for every \(j\in [n]{\setminus } I\), the jth row of A is a Gaussian vector conditioned to stay within the polytope \(K_r(A)\). Formally, for every \(I\subset [n]\) of size r, every \(j\in [n]{\setminus } I\), and every Borel subset \({\mathcal {B}}\) of \({\mathbb {R}}^n\),

$$\begin{aligned}{} & {} {{\mathbb {P}}}\big \{(A_{j,[n]})^\top \in {\mathcal {B}}\;|\;A_{I,[r]}\big \}\\{} & {} \quad =\frac{\sigma _n({\mathcal {B}}\cap K_r(A))}{\sigma _n(K_r(A))}\quad \hbox {almost everywhere on the event } \{I_r(A)=I\}. \end{aligned}$$

We will not directly use the above description of the conditional distribution of \((A_{j,[n]})^\top \) given \(A_{I,[r]}\); instead, we will apply a simple decoupling based on Lemma 4.1 which essentially establishes the same property. We provided the above formula only to clarify our argument.

Lemma 4.1

Suppose B is an \(n\times m\) random matrix such that its entries are i.i.d and have continuous distribution. Then, for \(r \in [n]\) and \(I \subset [n]\) with \(|I|=r\), almost surely the following assertions are equivalent:

  1. 1.

    \(I_r(B)=I\),

  2. 2.

    \(\forall s \in [r],\, i_s(B) = i_s(B_{I,[m]}),\, v_s(B)= v_s(B_{I,[m]}), \hbox { and } K_s(B) = K_s(B_{I,[m]}) \),

  3. 3.

    \(\forall j \in [n]\backslash I\), \((B_{j,[m]})^\top \in K_r(B_{I,[m]})\).

Proof

We can assume without loss of generality that everywhere on the probability space, all square submatrices of B are invertible, and for all \(1\le s\le n-1\) and \(I'\subset [n]\) of size s, the expression

$$\begin{aligned} | B_{i,s+1} - B_{i,[s]} (B_{I',[s]})^{-1}B_{I',s+1}| \end{aligned}$$

attains its maximum over \(i\in [n]\backslash I'\) at a unique point. These conditions ensure that the above algorithm for generating \(\{i_r(B)\}_{r \in [n]}\), \(\{v_r(B)\}_{r \in [n]}\), \(\{K_r(B)\}_{r \in [n]}\) has a uniquely determined output, i.e., no ambiguity in the choice of the indices \(i_r(B)\) occurs.

Notice that the implication \(2 \Rightarrow 3\) is straightforward by the above definitions. We will check the implications \(1\Rightarrow 2\) and \(3\Rightarrow 1\) below.

Implication \(1 \Rightarrow 2\). Condition on the event \(\{I_r(B)=I\}\). We have \(v_1(B)=v_1(B_{I,[m]}) = e_1\) and

$$\begin{aligned} i_1(B_{I,[m]})= {\textrm{argmax}}_{i \in I}|\langle v_1(B), (B_{i,[m]})^\top \rangle | ={\textrm{argmax}}_{i \in [n]}|\langle v_1(B), (B_{i,[m]})^\top \rangle |=i_1(B). \end{aligned}$$

Further, assume that \(k<r\) is such that \(\forall s \in [k]\), \(i_s(B)= i_s(B_{I,[m]})\). Since \(I_k(B)=I_k(B_{I,[m]})\), we also have \(v_{k+1}(B)=v_{k+1}(B_{I,[m]})\), and thus,

$$\begin{aligned} i_{k+1}(B_{I,[m]})&= {\textrm{argmax}}_{i \in I\backslash I_k(B_{I,[m]})}|\langle v_{k+1}(B_{I,[m]}) , B_{i,[m]}^\top \rangle |\\&={\textrm{argmax}}_{i \in [n]\backslash I_k(B)}|\langle v_{k+1}(B) , B_{i,[m]}^\top \rangle | =i_{k+1}(B). \end{aligned}$$

Thus, by induction, \(i_s(B)= i_s(B_{I,[m]})\) for all \(s\in [r]\), whence \(v_s(B)=v_s(B_{I,[m]})\), \(K_s(B)=K_s(B_{I,[m]})\), and \(I_s(B)=I_s(B_{I,[m]})\) for all \(s \in [r]\).

Implication \(3 \Rightarrow 1\). The argument is based on induction just as above. We assume that \(\forall j \in [n]\backslash I\), \((B_{j,[m]})^\top \in K_r(B_{I,[m]})\). First, \(v_1(B)=v_1(B_{I,[m]})=e_1\), and since

$$\begin{aligned} K_r(B_{I,[m]})\subset \big \{x \in {\mathbb {R}}^m \,:\, |\langle v_1(B_{I,[m]}),\, x \rangle | \le | \langle v_1(B_{I,[m]}),\, (B_{i_1(B_{I,[m]}),[m]})^\top \rangle |\big \}, \end{aligned}$$

we have

$$\begin{aligned} \max _{j \in [n]\backslash I}| \langle e_1, (B_{j,[m]})^\top \rangle | \le |\langle e_1, (B_{i_1(B_{I,[m]}),[m]})^\top \rangle |. \end{aligned}$$

On the other hand, by the definition of \(i_1(B_{I,[m]})\),

$$\begin{aligned} |\langle e_1, (B_{i_1(B_{I,[m]}),[m]})^\top \rangle |= \max _{i \in I} |\langle e_1, (B_{i,[m]})^\top \rangle |. \end{aligned}$$

As a consequence, \(i_1(B)=i_1(B_{I,[m]})\in I\), completing the base step of the induction. Now, let \(k<r\) be an integer such that \(\forall s \in [k], i_s(B)=i_s(B_{I,[m]})\). Since \(v_{k+1}(B)=v_{k+1}(B_{I,[m]})\) by our construction, and since

$$\begin{aligned} K_r(B_{I,[m]})\subset \big \{x \in {\mathbb {R}}^m \,:\, |\langle v_{k+1}(B_{I,[m]}),\, x \rangle | \le | \langle v_{k+1}(B_{I,[m]}),\, (B_{i_{k+1}(B_{I,[m]}),[m]})^\top \rangle |\big \}, \end{aligned}$$

we get

$$\begin{aligned} \max _{j \in [n]\backslash I}| \langle v_{k+1}(B), (B_{j,[m]})^\top \rangle |&\le |\langle v_{k+1}(B_{I,[m]}), (B_{i_{k+1}(B_{I,[m]}),[m]})^\top \rangle |\\&= \max _{i \in I\backslash I_{k}(B)} |\langle v_{k+1}(B_{I,[m]}), (B_{i,[m]})^\top \rangle |, \end{aligned}$$

which implies that \(i_{k+1}(B)=i_{k+1}(B_{I,[m]})\in I\). Thus, we conclude by induction that \(i_s(B)=i_s(B_{I,[m]})\in I\) for all \(s\in [r]\), and the result follows. \(\square \)

As the first main result of the section, we have a probability estimate for the event that the Gaussian measure of the polytope \(K_r(A)\) is below a given threshold:

Proposition 4.2

(Gaussian measure of \(K_r(A)\)) Let A be an \(n\times n\) Gaussian matrix. Then for any \(r \in [n-1]\) and any \(t \ge 2\),

$$\begin{aligned} {{\mathbb {P}}}\{ \sigma _n(K_r(A)) \le n^{-t} \} \le n^{-t(n-r)/2}. \end{aligned}$$

Proof

We start by writing

$$\begin{aligned} {{\mathbb {P}}}\{ \sigma _n(K_r(A)) \le n^{-t} \}&= \sum _{ I \subset [n],\, |I|=r} {{\mathbb {P}}}\big \{ I_r(A) = I \hbox { and }\sigma _n(K_r(A)) \le n^{-t} \big \}. \end{aligned}$$

For each summand, we apply Lemma 4.1 to get

$$\begin{aligned} {{\mathbb {P}}}&\big \{ I_r(A) = I \hbox { and }\sigma _n(K_r(A)) \le n^{-t} \big \} \nonumber \\&= {{\mathbb {P}}}\big \{ I_r(A) = I \hbox { and } \sigma _n(K_r(A_{I,[n]})) \le n^{-t} \big \} \nonumber \\&= {{\mathbb {P}}}\Big \{ \sigma _n(K_r(A_{I,[n]})) \le n^{-t} \hbox { and } \forall i \in [n]\backslash I,\, (A_{i,[n]})^\top \in K_r(A_{I,[n]}) \Big \}. \end{aligned}$$
(26)

Since \(K_r(A_{I,[n]}) \) and \((A_{i,[n]})^\top \) for \(i\in [n]\backslash I\) are independent, we get

$$\begin{aligned} (26) \;&\le {{\mathbb {P}}}\big \{ \sigma _n(K_r(A_{I,[n]})) \le n^{-t} \big \} \\&\quad \cdot {{\mathbb {P}}}\Big \{ \forall i \in [n]\backslash I,\, (A_{i,[n]})^\top \in K_r(A_{I,[n]}) \; \big \vert \; \sigma _n(K_r(A_{I,[n]})) \le n^{-t} \Big \} \\&\le 1\cdot (n^{-t})^{n-r}. \end{aligned}$$

Finally, in view of the standard bound \({ n \atopwithdelims ()n-r} \le n^{n-r} \) for the number of subsets \(I \subset [n]\) of size r, and by the union bound argument, the result follows. \(\square \)
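
As a numerical illustration of the quantity appearing in Proposition 4.2 (a sketch only, assuming numpy; it plays no role in the proofs), one can estimate \(\sigma _n(K_r(A))\) by Monte Carlo after rebuilding the constraints of (24):

```python
import numpy as np

# Monte Carlo sketch of sigma_n(K_r(A)) for a Gaussian matrix A: rebuild the
# constraints |<v_s, x>| <= |<v_s, A_{i_s}>|, s <= r, and count how many
# standard Gaussian samples satisfy all of them. Illustration only.
rng = np.random.default_rng(2)
n, r, N = 30, 20, 20000

A = rng.standard_normal((n, n))
I, V, t = [], [], []
for s in range(r):
    v = np.zeros(n)
    v[s] = 1.0
    if s > 0:
        v[:s] = -np.linalg.solve(A[np.ix_(I, range(s))], A[I, s])
    scores = np.abs(A @ v)
    if I:
        scores[I] = -np.inf
    i_s = int(np.argmax(scores))
    I.append(i_s)
    V.append(v)
    t.append(abs(v @ A[i_s]))                # the half-width |<v_s, A_{i_s}>|

V, t = np.array(V), np.array(t)
G = rng.standard_normal((N, n))              # samples from the Gaussian measure
inside = np.all(np.abs(G @ V.T) <= t, axis=1)
print("Monte Carlo estimate of sigma_n(K_r(A)):", inside.mean())
```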

Lemma 4.3

Let A be an \(n\times n\) Gaussian matrix, and let \(r\in [n]\) and \(\tau \in (0,1)\) be parameters. Then, conditioned on the event \(\big \{\sigma _n(K_r(A)) \ge \tau \big \}\),

$$\begin{aligned} \textrm{dist}( H , (A_{i_r,[r]})^\top ) \ge \sqrt{\frac{\pi }{2} }\, \tau , \end{aligned}$$

where H is the subspace of \({\mathbb {R}}^r\) spanned by vectors \((A_{i_s,[r]})^\top \), \(s\in [r-1]\).

Proof

Let \(v:= v_r(A)/ \Vert v_r(A)\Vert _2\) and let \(P:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^r\) be the orthogonal projection onto the span of \(\{e_s\}_{s\in [r]}\). From the definition of v, we have that Pv is a unit normal to the hyperplane H in \({\mathbb {R}}^r\). Then

$$\begin{aligned} \textrm{dist}( H, (A_{i_r,[r]})^\top ) = |\langle Pv, (A_{i_r,[r]})^\top \rangle | = |\langle v, (A_{i_r,[n]})^\top \rangle |=:s. \end{aligned}$$

It remains to note that, by the definition of \(K_r(A)\), on the event \(\big \{\sigma _n(K_r(A)) \ge \tau \big \}\) we have

$$\begin{aligned} \tau&\le \sigma _n( K_r(A) ) \le \sigma _n\big ( \{ x \in {\mathbb {R}}^n\,:\, | \langle v_r(A), x \rangle | \le | \langle v_r(A), (A_{i_r,[n]})^\top \rangle | \} \big ) \\&= \sigma _n\big ( \{ x \in {\mathbb {R}}^n \,:\, | \langle v, x \rangle | \le s \} \big ) = \int _{ -s }^{s} \frac{1}{\sqrt{2\pi }} \exp ( - t^2/2) \textrm{d}t \le \frac{2s}{\sqrt{2\pi }}. \end{aligned}$$

\(\square \)
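
The identity \(\textrm{dist}( H, (A_{i_r,[r]})^\top ) = |\langle Pv, (A_{i_r,[r]})^\top \rangle |\) used at the start of the proof is elementary linear algebra. The sketch below (assuming numpy) checks it numerically; for simplicity it takes \(I_{r-1}=\{1,\dots ,r-1\}\) and \(i_r=r\), and the same computation applies verbatim to the pivot rows.

```python
import numpy as np

# Check of dist(H, a_r) = |<P v / ||v||_2, a_r>| from the proof of Lemma 4.3,
# with the "previous" rows taken to be rows 1,...,r-1 and the new row to be
# row r; the algebra is identical for the pivot rows. Illustration only.
rng = np.random.default_rng(3)
n, r = 12, 7
A = rng.standard_normal((n, n))

# P v_r(A): the nonzero part of v_r(A), i.e. its first r coordinates.
v = np.concatenate([-np.linalg.solve(A[:r-1, :r-1], A[:r-1, r-1]), [1.0]])

rows_H = A[:r-1, :r]          # vectors spanning H inside R^r
a_r = A[r-1, :r]              # the new row, truncated to its first r entries

# distance from a_r to H via least squares: minimize ||rows_H^T c - a_r||_2
_, res, _, _ = np.linalg.lstsq(rows_H.T, a_r, rcond=None)
print(np.sqrt(res[0]), abs(v @ a_r) / np.linalg.norm(v))   # the two coincide
```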

As a corollary, we obtain the following probabilistic bound on the distance between \((A_{i_r,[r]})^\top \) and the span of “previous” rows (selected at previous steps of the GEPP process) \((A_{i_s,[r]})^\top \), \(s\in [r-1]\):

Corollary 4.4

Let A be an \(n\times n\) Gaussian matrix. For \( t \ge 2\) and \( r \in [n-1]\), with probability at least \(1-n^{ - t(n-r)/2 }\) we have

$$\begin{aligned} \textrm{dist}( H, (A_{i_r,[r]})^\top ) \ge \sqrt{\frac{\pi }{2} }\, n^{-t}, \end{aligned}$$

where H is the random subspace of \({\mathbb {R}}^r\) spanned by vectors \((A_{i_s,[r]})^\top \), \(s\in [r-1]\).

Proof

In view of Lemma 4.3, the statement would follow as long as \({{\mathbb {P}}}\big \{\sigma _n(K_r(A)) \ge n^{-t}\big \}\ge 1-n^{ - t(n-r)/2 }\). The latter is verified in Proposition 4.2. \(\square \)
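
For illustration (a sketch assuming numpy; it is not used in the sequel), one can follow a run of GEPP on a sampled Gaussian matrix and record, at every step r, the distance from the selected pivot row, truncated to its first r coordinates, to the span of the previously selected pivot rows — the quantity bounded from below in Corollary 4.4:

```python
import numpy as np

# Track the distances of Corollary 4.4 along a run of Gaussian Elimination
# with Partial Pivoting on an n-by-n Gaussian matrix. Illustration only.
rng = np.random.default_rng(4)
n = 40
A = rng.standard_normal((n, n))

U = A.copy()
order = list(range(n))           # order[k] = original index of current row k
dists = []
for r in range(n):
    p = r + int(np.argmax(np.abs(U[r:, r])))       # partial pivoting
    U[[r, p]] = U[[p, r]]
    order[r], order[p] = order[p], order[r]
    if r > 0:
        prev = A[[order[s] for s in range(r)], :r + 1]   # previous pivot rows
        a_r = A[order[r], :r + 1]                        # current pivot row
        _, res, _, _ = np.linalg.lstsq(prev.T, a_r, rcond=None)
        dists.append(float(np.sqrt(res[0])))
    if r < n - 1:                                  # elimination step
        U[r+1:, r:] -= np.outer(U[r+1:, r] / U[r, r], U[r, r:])

print("smallest distance to the span of previous pivot rows:", min(dists))
```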

5 A recursive argument

The goal of this section is to bound from below the intermediate singular values \(s_{r-k}(A_{I_r,[r]})\) for every r greater than some absolute constant and for k of a constant order. We first bound the intermediate singular values in the bulk of the singular spectrum and then recursively apply Proposition 3.3 to obtain lower bounds for smaller and smaller intermediate singular values.

As we mentioned in the overview of the proof, the intermediate singular values \(s_{r-k}(A_{I_r,[r]})\) for \(k \gg r^{1/2}\,\textrm{polylog}(n)\) can be easily estimated from below with high probability by taking the union bound over the estimates of \(s_{r-k}(A_{I,[r]})\) (see Proposition 3.2) for \(I \subset [n]\) with \(|I|=r\). To bound \(s_{r-k}(A_{I_r,[r]})\) from below for smaller values of k we apply the following strategy. We choose an appropriate positive integer \(r'<r\), condition on a realization of \(I_{r'}\) and \(A_{I_{r'},[r']}\), and, for any I with \(I_{r'} \subset I \subset [n]\) and \(|I|=r\), apply Proposition 3.3 with \(B:=A_{I,[r]}\) and \(F:=A_{I_{r'},[r']}\). This way, \(s_{r-k}(A_{I,[r]})\) is bounded below with high probability conditioned on the event that the intermediate singular values \(s_{r-k'}(A_{I_{r'},[r']})\) are well bounded for every \(k'\) slightly bigger than k.

Definition 5.1

For an integer \(k \in [n]\) and parameters \(p,\beta \ge 1\), let \({\mathcal {E}}_{\textrm{is}}(p,k,\beta )\) be the event that

$$\begin{aligned} \forall r \in [k+1,n],\quad \quad s_{r - k}(A_{I_r,[r]}) \ge n^{-\beta /(50p)}, \end{aligned}$$

and \({\mathcal {E}}_{\textrm{rec}}(p,k,\beta )\) be the event that

$$\begin{aligned} \forall r \in [k+1,n-2k],\quad \quad s_{\min }\big ((A_{I_r,[r+2k]})^\top \big ) \ge n^{-\beta /(20p)}. \end{aligned}$$

Note that although n is not mentioned explicitly in the list of parameters for \({\mathcal {E}}_{\textrm{is}}(p,k,\beta )\), the event clearly depends on the underlying matrix dimension.

The next proposition is the main result of this section:

Proposition 5.2

There is a universal constant \(C>0\) with the following property. Let \(p \ge 1\). Then there exist positive integers \(n_0:=n_0(p)\), \(120p\le k_0:=k_0(p)\le Cp\), and a positive real number \(300p \le \beta _0:=\beta _0(p)\le Cp\), so that for any \(n\ge n_0\) and \(\beta \ge \beta _0\),

$$\begin{aligned} {{\mathbb {P}}}\big ( {\mathcal {E}}_{\textrm{is}}(p,k_0(p),\beta )^c \big ) \le n^{-2\beta +o_n(1)}. \end{aligned}$$

We remark that the lower bounds on \(k_0(p)\) and \(\beta _0(p)\) in the assumptions of the proposition are not required in the proof but will be needed later. As a corollary of the proposition (proved at the end of this section), we have

Corollary 5.3

For any \(p\ge 1\), \(\beta \ge \beta _0(p)\),

$$\begin{aligned} {{\mathbb {P}}}\big ( {\mathcal {E}}_{\textrm{rec}}(p,k_0(p), \beta )^c \big ) = n^{-2\beta +o_n(1)}. \end{aligned}$$
(27)

Now, we present a technical version of the above proposition. We introduce several “section-level” parameters. Let \(\tilde{\varepsilon }>0\) be a small constant and L be a large integer to be determined later. The parameter \(\tilde{\varepsilon }\) will play the same role as in Proposition 3.3. Next, let

$$\begin{aligned} m_0:= \lceil L/ \tilde{\varepsilon }^5 \rceil \end{aligned}$$

and let \(s_1\) be the smallest integer such that \( 2^{s_1} m_0 \ge n\). Then we define the finite sequence \(m_1,\dots ,m_{s_1+1}\), where

$$\begin{aligned} \forall s \in [s_1-1],\quad m_s:= 2^s m_0 \quad \hbox { and }\quad m_{s_1+1}:= n. \end{aligned}$$

The main technical result in this section is the following

Lemma 5.4

Fix \(\tilde{\varepsilon }\in (0, 1/100]\) and \(L\ge 1/ \tilde{\varepsilon }\). Then there exists a positive integer \(n_0\) (depending on \(\tilde{\varepsilon }\) and L) such that for any \(n \ge n_0\) and \(s \in [0, s_1-1]\), we have for every \(\alpha \ge 4\):

$$\begin{aligned} {{\mathbb {P}}}\big \{ \exists r \in [m_{s+1}, m_{s+2}] \hbox { s.t.\ } s_{r - \lceil 9L/ \tilde{\varepsilon }\rceil }(A_{I_r,[r]}) \le n^{-C( \tilde{\varepsilon })\alpha } \big \} \le n^{-c( \tilde{\varepsilon }) \alpha L }, \end{aligned}$$
(28)

where \( c( \tilde{\varepsilon })\) and \(C( \tilde{\varepsilon })\) are positive constants which depend on \(c, {\tilde{C}}\) from Proposition 3.3 and on \( \tilde{\varepsilon }\).

Proof of Proposition 5.2

Let \(\tilde{\varepsilon }:= 1/100\). We can safely assume that the constants \( c( \tilde{\varepsilon })\) and \(C( \tilde{\varepsilon })\) from Lemma 5.4 satisfy \(c( \tilde{\varepsilon })\in (0,1]\) and \(C( \tilde{\varepsilon })\ge 1\). Choose

$$\begin{aligned} L:=\max \bigg (\frac{1}{\tilde{\varepsilon }}, 80p\, \frac{ C( \tilde{\varepsilon }) }{ c(\tilde{\varepsilon })}\bigg ). \end{aligned}$$

Let \(\beta _0:= \max \{ 4 c(\tilde{\varepsilon })L,300p\}\), \(k_0(p):= \max \{ \lceil 9\,L/ \tilde{\varepsilon }\rceil , 120p\}\), and let \(\beta \ge \beta _0\). Applying Lemma 5.4 with \(\alpha \ge 4\) satisfying \( \beta /(40p) = C(\tilde{\varepsilon })\alpha \), we get

$$\begin{aligned} {{\mathbb {P}}}\big \{\exists \,\,r\ge m_1\hbox { s.t. } s_{r - \lceil 9L/ \tilde{\varepsilon }\rceil }(A_{I_r,[r]}) \le n^{-\beta /(40p) } \big \} \le (s_1+1) n^{-c( \tilde{\varepsilon }) \alpha L } \le (s_1+1) n^{-2\beta }, \end{aligned}$$

implying the result for large enough n. \(\square \)

For the rest of the section, we fix \(s \in [0, s_1-1]\).

5.1 Choice of parameters and the growth function

Definition 5.5

(Definition of \(i_{th},i_{\max }\), \(f_i\), \(r_i\)) For a given positive integer L and for \( \tilde{\varepsilon }\in (0,1/4]\), let \(i_{th}\) be the integer such that

$$\begin{aligned} ( 1 + \tilde{\varepsilon })^{-i_{th}} m_s \ge \tilde{\varepsilon }m_s/10 > ( 1+ \tilde{\varepsilon })^{-i_{th}-1} m_s, \end{aligned}$$

and let \(i_{\max }\) be the integer such that

$$\begin{aligned} ( 1 + \tilde{\varepsilon })^{-i_{\max }} m_s \ge L/\tilde{\varepsilon }> ( 1+ \tilde{\varepsilon })^{-i_{\max }-1} m_s. \end{aligned}$$
(29)

Note that \(\tilde{\varepsilon }m_s/10\ge L/\tilde{\varepsilon }\), and hence \(i_{th}\le i_{\max }\).

For every \(i \in [i_{th},i_{\max }]\), we define a non-decreasing function

$$\begin{aligned} f_i(r):= \Big \lfloor \frac{r}{1 + (1 + \tilde{\varepsilon })^{-i} } \Big \rfloor ,\quad r\in {\mathbb {N}}. \end{aligned}$$
(30)

Further, we define a collection of integers \(\{r_i\}_{i\in [i_{\max }+1]}\) inductively as follows. Whenever \(i \in [i_{th}]\), we set \(r_i:= m_s \). Further, assuming that \(r_i\) has been defined for some \(i\in [i_{th},i_{\max }]\), we let \(r_{i+1}\) be the smallest integer such that \(f_i(r_{i+1}) \ge r_i\). Note that \(m_s=r_1\le r_2 \le \dots \le r_{i_{\max }+1}\).
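
The quantities in Definition 5.5 are elementary to tabulate. The sketch below computes \(i_{th}\), \(i_{\max }\), the map \(f_i\) and the sequence \(r_i\) in exact rational arithmetic for sample values of \(\tilde{\varepsilon }\), L and the scale \(m_s\) (the sample values are ours, chosen so that \(\tilde{\varepsilon }<1/28\) and \(L\ge \max (4,1/\tilde{\varepsilon })\)), and checks numerically that \(r_{i_{\max }+1}\le 2m_s\), as asserted in Lemma 5.8 below.

```python
from fractions import Fraction
import math

# Tabulate the parameters of Definition 5.5 at the scale m_s = ceil(L / eps^5)
# for sample values of eps (= \tilde{eps}) and L; exact rationals avoid any
# rounding issues in f_i and r_i. Illustration only.
eps, L = Fraction(1, 30), 30               # eps < 1/28 and L >= max(4, 1/eps)
m_s = math.ceil(L / eps**5)

# i_th and i_max: the largest i with (1+eps)^{-i} m_s at least eps*m_s/10,
# respectively at least L/eps.
i_th = math.floor(math.log(float(10 / eps)) / math.log(float(1 + eps)))
i_max = math.floor(math.log(float(eps * m_s / L)) / math.log(float(1 + eps)))

def f(i, r):                               # the function f_i from (30)
    return math.floor(Fraction(r) / (1 + (1 + eps) ** (-i)))

# r_i := m_s for i in [i_th]; r_{i+1} is the smallest integer with
# f_i(r_{i+1}) >= r_i, i.e. r_{i+1} = ceil(r_i * (1 + (1+eps)^{-i})).
r = {i: m_s for i in range(1, i_th + 1)}
for i in range(i_th, i_max + 1):
    r[i + 1] = math.ceil(r[i] * (1 + (1 + eps) ** (-i)))

print(i_th, i_max, r[i_max + 1], 2 * m_s)  # observe r_{i_max+1} <= 2*m_s
print(all(f(i, r[i + 1]) >= r[i] for i in range(i_th, i_max + 1)))  # True
```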

We recall our strategy: to bound the singular value \(s_{\lfloor ( 1 - ( 1+ \tilde{\varepsilon })^{-i-1})r \rfloor }(A_{I_r,[r]})\) from below, we will select an appropriate integer \(r'<r\) and apply Proposition 3.3 with \(B:= A_{I,[r]}\) and \(F:=A_{I_{r'},[r']}\), taking the union bound over all subsets \(I\subset [n]\) with \(I_{r'}\subset I\) and \(|I|=r\). The function \(f_i\) defined above determines the choice of \(r'\); namely, we choose

$$\begin{aligned} r':= f_i(r), \end{aligned}$$

for \(r_i \le r \le m_{s+2}\). The indices \(i_{th}\) and \(i_{\max }\) defined above, determine the range of application for the inductive strategy; namely, \(i_{\max }\) marks the largest index i for which our induction argument can be applied, and \(i_{th}\) indicates a threshold value below which the corresponding singular values \(s_{\lfloor ( 1 - ( 1+ \tilde{\varepsilon })^{-i})r \rfloor }(A_{I_r,[r]})\) concentrate very strongly and can bounded directly with help of Proposition 3.2 and a simple union bound argument.

The goal of this subsection is to verify certain relations between the introduced parameters that need to be satisfied in order to apply the results on the singular values established earlier. Since the results here are of a purely computational nature, we present the proofs in the Appendix.

Lemma 5.6

(Inequalities for \(i_{\max }\)) Let \( \tilde{\varepsilon }\in (0, 1/4)\) and \(L\ge 1/ \tilde{\varepsilon }\). For \(r \in [m_s, m_{s+2} ]\),

$$\begin{aligned} r - \lfloor ( 1 - ( 1 + \tilde{\varepsilon })^{-i_{\max }-1})r \rfloor \le 9L/ \tilde{\varepsilon }. \end{aligned}$$
(31)

Further,

$$\begin{aligned} i_{\max }\le 2\log (m_s) / \tilde{\varepsilon }. \end{aligned}$$
(32)

Lemma 5.7

(Assumptions in Proposition 3.3) Let \(\tilde{\varepsilon }\in (0, \frac{1}{28})\) and \(L\ge 4\). Fix \(i \in [i_{th}, i_{\max }]\) and assume that \({\tilde{r}}\) satisfies \(r_{i+1}\le {\tilde{r}}\le m_{s+2}\). Let \( r: = f_i( {\tilde{r}}) \) and \( x:= {\tilde{r}} - r\). Then,

$$\begin{aligned} ( 1 + \tilde{\varepsilon })^{-i} r \le x \le \frac{21}{20} ( 1 + \tilde{\varepsilon })^{-i} r. \end{aligned}$$
(33)

Moreover, i, r, x, and \(\tilde{\varepsilon }\) satisfy the assumptions in Proposition 3.3; specifically,

$$\begin{aligned}&r - \lfloor (1 - ( 1 + \tilde{\varepsilon })^{-i}) r \rfloor \le x \le r,&\tilde{\varepsilon }x&\ge 4, \\&3 ( 1 + \tilde{\varepsilon })^{-i-1} r - ( 1+ \tilde{\varepsilon })^{-i}r \ge x + 1 + 11 \tilde{\varepsilon }x,&\tilde{\varepsilon }(1 + \tilde{\varepsilon })^{-i} r&\ge 2 . \end{aligned}$$

For a given \(i \in [i_{th}, i_{\max }]\), the number \({\tilde{r}}\) satisfying the assumptions of the above lemma can only be chosen if \(r_{i+1}\le m_{s+2}\). In the next statement, we show that the inequality is satisfied for every admissible i (and in fact verify a slightly stronger bound):

Lemma 5.8

(An upper bound on \(r_{i_{\max }+1}\)) Let \( \tilde{\varepsilon }\in (0, 1/28)\) and \(L\ge 4\). Then, \(r_{i_{\max }+1}\le 2m_s=m_{s+1}\).

To construct the growth function \(g(\cdot )\) from (13), we first define an auxiliary positive function \(g_{s}(\cdot )\), and then set

$$\begin{aligned} g\big (\lfloor (1-(1+\tilde{\varepsilon })^{-j})r\rfloor \big ):=g_s(j) \end{aligned}$$

for all admissible j. The formal definition of \(g_s(\cdot )\) is given below.

Definition 5.9

Let \(\alpha \ge 1\) be a parameter. For \( i \in [i_{th}]\), we set

$$\begin{aligned} g_s (i) := \frac{c'}{2\sqrt{m_s}} 16^{-i}m_s n^{-\alpha }, \end{aligned}$$
(34)

where \(c'\) is the constant from Proposition 3.2.

For \(i \in [i_{th},i_{\max }]\), we apply a recursive definition:

$$\begin{aligned} g_s(i+1):= \frac{c' \tilde{\varepsilon }}{32} h_s(i)^5 g_s(i), \end{aligned}$$

where \(h_s(i)\) is given by

$$\begin{aligned} h_s(i) :=&\exp \Big ( - \max \Big \{ \frac{128 \alpha \,\log n }{ \tilde{\varepsilon }^2 ( 1+ \tilde{\varepsilon })^{-i} m_s } ,\, C_h \Big \} \Big ), \end{aligned}$$
(35)

and where \(C_h\ge -\log \big (2^{-11}(c')^2\tilde{\varepsilon }\big )\) is a constant depending only on \(c, {\tilde{C}}\) (from Proposition 3.3) and \(c'\), which we shall determine in Lemma 5.10.
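
The recursion defining \(g_s(\cdot )\) is easy to trace on a logarithmic scale (the values themselves underflow double precision long before \(i_{\max }\)). In the sketch below the constants \(c'\) and \(C_h\) are placeholders, since the text does not fix them numerically, as are the sample values of the remaining parameters; the resulting exponent is therefore illustrative only. The point is that \(\log g_s(i_{\max }+1)\) is of order \(-\textrm{const}(\tilde{\varepsilon })\,\alpha \log n\), i.e. \(g_s(i_{\max }+1)\) is bounded below by a fixed (if large) negative power of n, in the spirit of Lemma 5.11.

```python
import math

# Trace the recursion of Definition 5.9 on a log scale. The constants c' and
# C_h below are placeholders (they are not specified numerically in the text),
# as are the sample values of eps, L, alpha and n; only the overall shape
# -- log g_s(i_max+1) proportional to alpha*log(n) -- is being illustrated.
c_prime, C_h = 0.1, 25.0
eps, L, alpha = 1 / 30, 30, 4.0
m_s = math.ceil(L / eps**5)
n = 8 * m_s                                     # some admissible matrix size
i_th = math.floor(math.log(10 / eps) / math.log(1 + eps))
i_max = math.floor(math.log(eps * m_s / L) / math.log(1 + eps))

def log_h(i):                                   # log of h_s(i) from (35)
    return -max(128 * alpha * math.log(n) / (eps**2 * (1 + eps) ** (-i) * m_s), C_h)

# log of g_s(i_th) from (34), then the recursion g_s(i+1) = (c'*eps/32) h_s(i)^5 g_s(i)
log_g = (math.log(c_prime / (2 * math.sqrt(m_s))) - i_th * math.log(16)
         + math.log(m_s) - alpha * math.log(n))
for i in range(i_th, i_max + 1):
    log_g += math.log(c_prime * eps / 32) + 5 * log_h(i)

print("log g_s(i_max+1) / log n =", log_g / math.log(n))   # a fixed power of n
```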

The function \(h_s(i)\) corresponds to the parameter h in Proposition 3.3, and is constructed in such a way that a certain union bound argument applied further below works. The next lemma clarifies the choice of the constant \(C_h\) in the above definition:

Lemma 5.10

The constant \(C_h\) can be chosen so that the following holds. For \(i \in [i_{th}, i_{\max }]\) and \( {\tilde{r}} \in [r_{i+1}, m_{s+2}]\), let \(r:= f_i( {\tilde{r}} )\) and \( x:= {\tilde{r}} - r\). Then,

$$\begin{aligned}&2x^{\tilde{\varepsilon }x/2}\, h_s(i)^{(\tilde{\varepsilon }x)^2/64} +4\exp \big (-cx^2\,\tilde{\varepsilon }/h_s(i)^2\big ) +\tilde{C}\exp \big (-c\tilde{\varepsilon }^2 (1+\tilde{\varepsilon })^{-i}r x/h_s(i)^2\big ) \nonumber \\&\quad \le \exp \big ( - \alpha \log (n) x \big ). \end{aligned}$$
(36)

In the next lemma we verify the crucial bound on the growth function which will ultimately guarantee a polynomial in n bound on the intermediate singular values:

Lemma 5.11

There exists \(C( \tilde{\varepsilon })>1\) which depends on \(c', {\tilde{C}}\) from Proposition 3.3, on \(C_h\), and on \( \tilde{\varepsilon }\), such that

$$\begin{aligned} \forall \alpha \ge 1,\,\quad \quad g_s(i_{\max }+1) \ge n^{ - C( \tilde{\varepsilon })\alpha }. \end{aligned}$$
(37)

5.2 Good events, and probability estimates

Definition 5.12

For \( i \in [i_{\max }+1]\) and \(r \in [r_i, n]\), let \( {\mathcal {E}}(r, i)\) be the event that

$$\begin{aligned} s_{\lfloor ( 1 - ( 1+ \tilde{\varepsilon })^{-i})r \rfloor } (A_{I_r, [r]}) \ge g_s(i), \end{aligned}$$

where \(g_s(\cdot )\) is given in Definition 5.9 and \(r_i,i_{\max }\) are taken from Definition 5.5. Further, we denote \( {\mathcal {E}}(r, [i]):= \bigcap _{ j \in [ i ] } {\mathcal {E}}(r,j)\).

Lemma 5.13

For \(\tilde{\varepsilon }\in (0, 1/100]\), \(L\ge 4\) and \( \alpha \ge 4\),

$$\begin{aligned} {{\mathbb {P}}}\Big ( \bigcup \limits _{r \in [m_s, m_{s+2}]} {\mathcal {E}}( r, [i_{th}] )^c \Big ) \le \exp \big ( - \alpha L \log n \big ). \end{aligned}$$
(38)

Proof

Fix \(i \in [i_{th}]\) and \(r \in [m_s, m_{s+2}]\). Let \(q:= r - \lfloor (1 - (1+ \tilde{\varepsilon })^{-i})r \rfloor \). Then

$$\begin{aligned} q \ge ( 1 + \tilde{\varepsilon })^{-i}r \ge (1 + \tilde{\varepsilon })^{-i}m_s \ge 16^{-i}m_s. \end{aligned}$$
(39)

We recall that in view of the definition of \(m_s\) and \(i_{th}\), necessarily \(q<r\); furthermore,

$$\begin{aligned} q \ge (1 + \tilde{\varepsilon })^{-i}m_s \ge (1+ \tilde{\varepsilon })^{-i_{th}}m_s \ge \tilde{\varepsilon }m_s/10 \ge \tilde{\varepsilon }m_0/10 \ge L/ \tilde{\varepsilon }^4 \ge 32. \end{aligned}$$
(40)

For each \(I \subset [n]\) with \(|I|=r\),

$$\begin{aligned} {{\mathbb {P}}}\big \{ s_{r-q}(A_{I,[r]})< g_s(i) \big \}&= {{\mathbb {P}}}\Big \{ s_{r-q}(A_{I,[r]})< \frac{ c' }{ 2 \sqrt{m_s} } 16^{-i}\,m_s\, n^{-\alpha } \Big \} \\&\le {{\mathbb {P}}}\Big \{ s_{r-q}(A_{I,[r]}) < \frac{ c' }{\sqrt{r} } q n^{-\alpha } \Big \} \, \, \\&\qquad (\hbox {by (39) and the definition of } m_s,m_{s+2}) \\&\le \exp \big ( \log (r)q/2 - \log (n)\alpha q^2/32 \big ) \,\, \quad ( \hbox {by Proposition 3.2}). \end{aligned}$$

Applying (40), we conclude that

$$\begin{aligned} {{\mathbb {P}}}\big \{ s_{r - q}(A_{I,[r]}) < g_s(i) \big \}&\le \exp \big ( \log (n)q^2/64 - \log (n) \alpha q^2/32 \big ) \\&\le \exp \Big ( - \log (n) \alpha \Big (\frac{\tilde{\varepsilon }m_s}{10} \Big )^{2}/64 \Big ). \end{aligned}$$

We complete the proof with the union bound argument. There are \({n \atopwithdelims ()r} \le (en/r)^r \le \exp ( r\log n) \) subsets \(I \subset [n]\) with \(|I|=r\). As \(r \le m_{s+2}\le 4m_s\), and in view of the definition of \(m_s\) and our choice of \(\tilde{\varepsilon }\),

$$\begin{aligned} {{\mathbb {P}}}\big \{ s_{r-q}(A_{I_r,[r]} )<g_s(i) \big \}&\le {{\mathbb {P}}}\big \{ \exists I\subset [n] \hbox { with }|I|=r \hbox { such that } s_{r - q}(A_{I,[r]}) < g_s(i) \big \} \\&\le \exp \Big ( - \log (n)\alpha \Big (\frac{\tilde{\varepsilon }m_s}{10} \Big )^{2}/128 \Big ), \end{aligned}$$

By applying the union bound argument again over all \(i \in [i_{th}]\) and all \( r \in [m_s, m_{s+2}]\), the statement of the lemma follows. \(\square \)

Lemma 5.14

Assume \(\tilde{\varepsilon }\in (0, 1/100]\), \(L\ge 4\) and \( \alpha \ge 4\). For \( i \in [i_{th},i_{\max }]\) and \( {\tilde{r}} \in [r_{i+1}, m_{s+2}] \), set \(r:= f_i( {\tilde{r}})\) and \(x:={\tilde{r}}-r\). Then

$$\begin{aligned} {{\mathbb {P}}}\Big ( {\mathcal {E}}(\tilde{r}, i+1)^c \cap {\mathcal {E}}( r, [i] ) \cap \{ \sigma _n( K_r(A) ) \ge n^{-\alpha /2} \} \Big )&\le \exp \Big ( - \frac{1}{4} \alpha \log (n) x \Big ) \\&\le \exp \Big ( - \frac{1}{4} \alpha \log (n) \frac{L}{\tilde{\varepsilon }} \Big ), \end{aligned}$$

where the random polytope \(K_r(A) \subset {\mathbb {R}}^n\) was defined in (24), and where \(\sigma _n\) is the standard Gaussian measure in \({\mathbb {R}}^n\).

Proof

We start by noting that the last inequality in the statement of the lemma follows from the estimate \(x \ge (1 + \tilde{\varepsilon })^{-i_{\max }}m_s \ge L/ \tilde{\varepsilon }\) (see Lemma 5.7 and the definition of \(i_{\max }\)).

We further partition the event in question so that

$$\begin{aligned}&{{\mathbb {P}}}\Big ( {\mathcal {E}}(\tilde{r}, i+1)^c \cap {\mathcal {E}}(r, [i]) \cap \{\sigma (K_r(A)) \ge n^{-\alpha /2}\} \Big ) \nonumber \\&\quad = \sum _{I \subset [n], |I|=r } {{\mathbb {P}}}\Big ( {\mathcal {E}}(\tilde{r}, i+1)^c \cap {\mathcal {E}}(r, [i]) \cap \{\sigma _n(K_r(A)) \ge n^{-\alpha /2}\} \cap \{ I_r(A) = I\} \Big ) . \end{aligned}$$
(41)

For each \(I \subset [n]\) with \(|I|=r\), we define \({\mathcal {E}}(I)\) to be the event

$$\begin{aligned} \forall j \in [i], \, s_{\lfloor ( 1 - ( 1+ \tilde{\varepsilon })^{-j})r \rfloor } (A_{I,[r]} ) \ge g_s(j), \end{aligned}$$

and note that for each admissible I, \({\mathcal {E}}(r, [i])\cap \{ I_r(A) = I\} \subset {\mathcal {E}}(I)\).

For \(I\subset [n]\) with \(|I|=r\) and \(J \subset [n] \backslash I \) with \(|J| = x \), let \({\mathcal {E}}(I, J)\) be the event that

$$\begin{aligned} s_{\lfloor (1 - (1 + \tilde{\varepsilon })^{-i-1})({\tilde{r}}) \rfloor } ( A_{ I\cup J, [{\tilde{r}}]} ) \ge g_s(i+1). \end{aligned}$$

Denote \(K(I):= K_r(A_{I,[r]})\subset {\mathbb {R}}^r\). Then each term in (41) can be bounded as

$$\begin{aligned}&{{\mathbb {P}}}\Big ( {\mathcal {E}}(\tilde{r}, i+1)^c \cap {\mathcal {E}}(r, [i]) \cap \{\sigma _n(K_r(A)) \ge n^{-\alpha /2}\} \cap \{ I_r(A) = I\} \Big ) \nonumber \\&\quad \le \sum _{ J \subset [n] \backslash I ,\, |J| =x } {{\mathbb {P}}}\Big ( {\mathcal {E}}(I,J)^c \cap {\mathcal {E}}(I) \cap \{\sigma _n(K(I)) \ge n^{-\alpha /2}\} \cap \{ I_r(A) = I\} \Big ). \end{aligned}$$
(42)

Now, assume that for every \(I \subset [n]\) and \(J \subset [n] \backslash I\) with \(|I| = r\) and \(|J|=x\),

$$\begin{aligned} {{\mathbb {P}}}\Big ( {\mathcal {E}}(I,J)^c \cap {\mathcal {E}}(I)\; \Big \vert \;\{\sigma _n(K(I)) \ge n^{-\alpha /2}\} \cap \{ I_r(A) = I\} \Big ) \le \exp \Big ( - \frac{1}{2} \alpha \log (n) x \Big ). \end{aligned}$$
(43)

Clearly, for each \(I \subset [n]\) with \(|I|=r\),

$$\begin{aligned} |\{ J \subset [n]\backslash I:\, |J|=x\}| = {n-r \atopwithdelims ()x} \le \Big ( \frac{en}{x}\Big )^x \le \exp ( \log (n)x) \le \exp \Big ( \frac{1}{4}\alpha \log (n)x\Big ). \end{aligned}$$

Together with (42) and (43), this gives

$$\begin{aligned} (41) \le&\sum _{I \subset [n], |I|=r } \exp \Big ( \frac{1}{4}\alpha \log (n)x\Big ) \exp \Big (-\frac{1}{2} \alpha \log (n) x \Big ) \, {{\mathbb {P}}}\{ I_r(A) = I\}\\ =&\exp \Big (-\frac{1}{4} \alpha \log (n) x \Big ), \end{aligned}$$

and the result follows.

Thus, it remains to show (43). By Lemma 4.1, almost everywhere on the probability space we have

$$\begin{aligned} \textbf{1}_{\{ I_r(A) = I\}} = \textbf{1}_{\{ \forall j \in [n]\backslash I,\, (A_{j,[n]})^\top \in K_r(A_{I,[n]})\}} =\textbf{1}_{\{ \forall j \in [n]\backslash I,\, (A_{j,[r]})^\top \in K(I)\}}. \end{aligned}$$

Hence,

$$\begin{aligned}&{{\mathbb {P}}}\Big ( {\mathcal {E}}(I,J)^c \cap {\mathcal {E}}(I) \,\Big \vert \, \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \{ I_r(A) = I\} \Big )\nonumber \\&\quad \le \frac{ {{\mathbb {P}}}\Big ( {\mathcal {E}}(I,J)^c \cap {\mathcal {E}}(I) \cap \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \big \{ \forall j \in [n]\backslash (I\cup J),\, (A_{j,[r]})^\top \in K(I)\big \} \Big ) }{ {{\mathbb {P}}}\Big ( \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \big \{ \forall j \in [n]\backslash I,\, (A_{j,[r]})^\top \in K(I)\big \} \Big ) }. \end{aligned}$$
(44)

In view of the joint independence of the entries of A, we obtain

$$\begin{aligned}&{{\mathbb {P}}}\Big ( {\mathcal {E}}(I,J)^c \cap {\mathcal {E}}(I) \cap \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \big \{ \forall j \in [n]\backslash (I\cup J),\, (A_{j,[r]})^\top \in K(I)\big \} \Big ) \nonumber \\&\quad = {\mathbb {E}}_{A_{I,[r]}} \Big [ \textbf{1}_{ {\mathcal {E}}(I) \cap \{\sigma _r(K(I)) \ge n^{-\alpha /2}\}} \cdot {{\mathbb {P}}}\big \{ {\mathcal {E}}(I,J)^c \,\big |\, A_{I,[r]}\big \} \nonumber \\&\qquad \cdot {{\mathbb {P}}}\big \{ \forall j \in [n]\backslash (I\cup J) ,\, (A_{j,[r]})^\top \in K(I)\,\big |\, A_{I,[r]} \big \} \Big ], \end{aligned}$$
(45)

where the outer expectation is with respect to \(A_{I,[r]}\).

For each realization of \(A_{I,[r]}\) such that the event \({\mathcal {E}}(I)\) holds, we apply Proposition 3.3 with

$$\begin{aligned} \begin{bmatrix} F&{} M \\ W&{} Q \end{bmatrix}:= \begin{bmatrix} A_{I,[r]}&{}\quad A_{I,[r+1,r+x]} \\ A_{J,[r]}&{}\quad A_{J,[r+1,r+x]} \end{bmatrix} \end{aligned}$$

to bound \( {{\mathbb {P}}}( {\mathcal {E}}(I,J)^c\, |\,A_{I,[r]} )\). Let \(g(\cdot )\) be a growth function satisfying

$$\begin{aligned} \forall j \in [i],\quad g(\lfloor ( 1 - ( 1 + \tilde{\varepsilon })^{-j})r \rfloor ) = g_s(j), \end{aligned}$$

where \(g_s(\cdot )\) is given in Definition 5.9. Since \(16\, g_s(j+1) \le g_s(j)\) for \(j \in [i-1]\) and \(g_s(j) \le 1\) for \( j \in [i]\), the function \(g(\cdot )\) defined this way satisfies (13). Recall that on the event \({\mathcal {E}}(I)\) we have

$$\begin{aligned} s_{\lfloor (1 - ( 1 + \tilde{\varepsilon })^{-j})r \rfloor }(A_{I,[r]}) \ge g(\lfloor ( 1 - ( 1 + \tilde{\varepsilon })^{-j})r \rfloor ),\quad j \in [i]. \end{aligned}$$

We apply Proposition 3.3 with the growth function \(g(\cdot )\) and with \(h:=h_s(i)\) (see Definition 5.9) so that

$$\begin{aligned} \frac{c' \tilde{\varepsilon }h^5\,g\big (\lfloor (1-(1+\tilde{\varepsilon })^{-i})r\rfloor \big )}{32} = g_s(i+1) \end{aligned}$$

(observe that our parameters r, x satisfy the assumptions of the proposition due to Lemma 5.7, and that h satisfies the assumption \(h\le 2^{-11} (c')^2 \tilde{\varepsilon }\) in view of the assumptions on the constant \(C_h\) in Definition 5.9). We get

$$\begin{aligned} {{\mathbb {P}}}\big ( {\mathcal {E}}(I,J)\,\big |\, A_{I,[r]}\big )&={{\mathbb {P}}}\Big ( s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i-1})(r+x)\rfloor }(A_{I\cup J, [{\tilde{r}}]})\ge g_s(i+1)\;\big |\; A_{I,[r]} \Big )\\&\ge 1-2x^{\tilde{\varepsilon }x/2}\, h^{(\tilde{\varepsilon }x)^2/64} -4\exp \big (-cx^2\,\tilde{\varepsilon }/h^2\big ) \\&\quad -\tilde{C}\exp \big (-c\tilde{\varepsilon }^2 (1+\tilde{\varepsilon })^{-i}r x/h^2\big ). \end{aligned}$$

In view of Lemma 5.10, this implies

$$\begin{aligned} {{\mathbb {P}}}\big ( {\mathcal {E}}(I,J)^c \,\big |\, A_{I,[r]}\big ) \le \exp \big ( - \alpha \log (n)x\big ). \end{aligned}$$

Combining the last inequality with (45), we obtain

$$\begin{aligned}&{{\mathbb {P}}}\Big ( {\mathcal {E}}(I,J)^c \cap {\mathcal {E}}(I) \cap \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \big \{ \forall j \in [n]\backslash (I\cup J),\, (A_{j,[r]})^\top \in K(I)\big \} \Big ) \nonumber \\&\quad \le \exp \big ( - \alpha \log (n)x\big )\nonumber \\&\qquad \cdot {{\mathbb {P}}}\Big ( \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \big \{ \forall j \in [n]\backslash (I\cup J),\, (A_{j,[r]})^\top \in K(I)\big \} \Big ). \end{aligned}$$
(46)

Next, we will treat the denominator in the estimate (44). By Fubini’s theorem,

$$\begin{aligned}&{{\mathbb {P}}}\Big ( \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \big \{ \forall j \in [n]\backslash I,\, (A_{j,[r]})^\top \in K(I)\big \} \Big )\\&\quad = {\mathbb {E}}_{A_{I,[r]}} \Big [ \textbf{1}_{\{\sigma _r(K(I)) \ge n^{-\alpha /2}\}} \cdot {{\mathbb {P}}}\big ( \forall j \in J,\, (A_{j,[r]})^\top \in K(I)\, \big |\, A_{I,[r]}\big )\cdot \\&\qquad \cdot {{\mathbb {P}}}\big ( \forall j \in [n]\backslash (I\cup J),\, (A_{j,[r]})^\top \in K(I)\, \big | \,A_{I,[r]} \big ) \Big ] . \end{aligned}$$

Almost everywhere on the event \(\{\sigma _r(K(I)) \ge n^{-\alpha /2}\}\) we have

$$\begin{aligned} {{\mathbb {P}}}\big ( \forall j \in J,\, (A_{j,[r]})^\top \in K(I)\, \big |\, A_{I,[r]}\big ) \ge n^{-\alpha x/2}, \end{aligned}$$

whence

$$\begin{aligned}&{{\mathbb {P}}}\Big ( \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \big \{ \forall j \in [n]\backslash I,\, (A_{j,[r]})^\top \in K(I)\big \} \Big )\\&\quad \ge n^{-\alpha x/2}\, {{\mathbb {P}}}\Big ( \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \big \{ \forall j \in [n]\backslash (I\cup J),\, (A_{j,[r]})^\top \in K(I)\big \} \Big ). \end{aligned}$$

Together with (46) and (44), this yields

$$\begin{aligned} {{\mathbb {P}}}\Big ( {\mathcal {E}}(I,J)^c \cap {\mathcal {E}}(I) \,\Big \vert \, \{\sigma _r(K(I)) \ge n^{-\alpha /2}\} \cap \{ I_r(A) = I\} \Big ) \le \exp \Big ( - \frac{1}{2} \alpha \log (n) x\Big ), \end{aligned}$$

and the proof of (43) is complete. \(\square \)

At this point, we are ready to prove the main lemma in this section.

Proof of Lemma 5.4

First, recall that in view of Lemma 5.8, \(r_{i_{\max }+1}\le m_{s+1}\), and that in view of (31) we have \(r - \lceil 9L/ \tilde{\varepsilon }\rceil \le \lfloor (1-(1+\tilde{\varepsilon })^{-i_{\max }-1})r\rfloor \), whence

$$\begin{aligned}&\bigcup _{ r \in [m_{s+1}, m_{s+2}] } \big \{ s_{r - \lceil 9L/ \tilde{\varepsilon }\rceil }(A_{I_r,[r]})< g_s(i_{\max }+1) \big \}\\&\quad \subset \bigcup _{ r \in [r_{i_{\max }+1}, m_{s+2}] } \big \{ s_{\lfloor (1-(1+\tilde{\varepsilon })^{-i_{\max }-1})r\rfloor }(A_{I_r,[r]}) < g_s(i_{\max }+1) \big \}\\&\quad =\bigcup _{r \in [r_{i_{\max }+1}, m_{s+2}] } {\mathcal {E}}^c(r,i_{\max }+1), \end{aligned}$$

where we used the definition of the events \({\mathcal {E}}(r,i)\) (Definition 5.12). To estimate the probability of the union of the events in the last line, we shall embed it into a specially structured collection.

Let \(r':= f_{i_{\max }}(m_{s+2})\), where \(f_{\cdot }(\cdot )\) was defined in (30). We have

$$\begin{aligned}&\bigcup _{r \in [r_{i_{\max }+1}, m_{s+2}] } {\mathcal {E}}^c(r,i_{\max }+1) \subset \bigcup _{r\in [m_s,r']}\{ \sigma _n(K_r(A)) < n^{-\alpha /2} \}\\&\quad \cup \; \bigcup _{r \in [r_{i_{\max }+1}, m_{s+2}] } \Big ({\mathcal {E}}^c(r,i_{\max }+1)\cap \Big ( \bigcap _{r\in [m_s,r']}\{ \sigma _n(K_r(A)) \ge n^{-\alpha /2} \}\Big )\Big ). \end{aligned}$$

To be able to apply a recursive bound from the last lemma, we use the bounds \(r_i\le f_i( {\tilde{r}} ) \le m_{s+2}\), \( {\tilde{r}} \in [r_{i+1}, m_{s+2}]\), \(i \in [i_{th}+1,i_{\max }]\), to write

$$\begin{aligned}&\bigcup _{r \in [r_{i_{\max }+1}, m_{s+2}] } \Big ({\mathcal {E}}^c(r,i_{\max }+1)\cap \Big ( \bigcap _{r\in [m_s,r']}\{ \sigma _n(K_r(A)) \ge n^{-\alpha /2} \}\Big )\Big )\\&\quad \subset \bigcup _{i \in [i_{th}]}\bigcup _{r \in [m_s,m_{s+2}]} {\mathcal {E}}^c(r,i)\\&\quad \cup \bigcup _{ i \in [i_{th},i_{\max }] } \bigcup _{ \tilde{r} \in [r_{i+1}, m_{s+2} ] } \Big ( {\mathcal {E}}^c( {\tilde{r}} , i+1 ) \cap {\mathcal {E}}( f_i( {\tilde{r}} ) , [i] ) \\&\quad \cap \Big ( \bigcap _{r\in [m_s,r']}\{ \sigma _n(K_r(A)) \ge n^{-\alpha /2} \}\Big )\Big ). \end{aligned}$$

Thus, using that \(f_i( {\tilde{r}} )\le r'\) for \(i \in [i_{th}+1,i_{\max }]\) and that \(m_{s+2} - r' \ge L \), we get

$$\begin{aligned}&{{\mathbb {P}}}\Big ( \bigcup _{r \in [r_{i_{\max }+1}, m_{s+2}] } {\mathcal {E}}^c(r,i_{\max }+1) \Big ) \nonumber \\&\quad \le \sum _{ r \in [m_s,r'] } {{\mathbb {P}}}\{ \sigma _n(K_r(A)) < n^{-\alpha /2} \} + \sum _{ i \in [i_{th}] } \sum _{ r \in [m_s,m_{s+2}] } {{\mathbb {P}}}( {\mathcal {E}}^c(r,i) ) \nonumber \\&\qquad + \sum _{ i \in [i_{th},i_{\max }] } \sum _{ \tilde{r} \in [r_{i+1}, m_{s+2} ] } {{\mathbb {P}}}\Big ( {\mathcal {E}}^c( {\tilde{r}} , i+1 ) \cap {\mathcal {E}}( f_i( {\tilde{r}} ) , [i] ) \cap \{ \sigma (K_{ f_i( {\tilde{r}} )}) \ge n^{-\alpha /2} \} \Big ) \nonumber \\&\quad \le \underbrace{n\cdot n^{-\alpha L/4}}_{\hbox { by Proposition 4.2}} + \underbrace{n^2\exp \big ( - \alpha L \log (n) \big )}_{\hbox { by Lemma 5.13}} \nonumber \\&\qquad + \underbrace{ n^2 \exp \Big ( - \frac{1}{4} \alpha \log (n) \frac{L}{\tilde{\varepsilon }} \Big )}_{\hbox { by Lemma 5.14}} \le n^{- c(\tilde{\varepsilon })\alpha L } \end{aligned}$$
(47)

for some \(c( \tilde{\varepsilon })>0\). It remains to note that in view of Lemma 5.11, \(g_s(i_{\max }+1)\ge n^{-C(\tilde{\varepsilon })\alpha }\) for some \(C(\tilde{\varepsilon })\). \(\square \)

Proof of Corollary 5.3

For brevity, we denote \(k_0:=k_0(p)\). We fix \(r \in [k_0+1,n-2k_0]\) and let \(F_{r} \subset {\mathbb {R}}^{I_r}\) be the right singular subspace of the matrix \((A_{I_r,[r]})^\top \) corresponding to the \(k_0\) smallest singular values of \((A_{I_r,[r]})^\top \) (since almost everywhere on the probability space \(I_r\) is unambiguously determined, and the singular values of \((A_{I,[r]})^\top \) are distinct for every \(I\subset [n]\) with \(|I|=r\), \(F_r\) is uniquely defined). Now, let us define the event \(\tilde{\mathcal {E}}_r(\beta )\) that

$$\begin{aligned} s_{\min }\big ((A_{I_r,[r+1,r+2k_0]})^\top \big \vert _{F_r} \big )&\ge n^{-\beta /(40p)}{} & {} \hbox {and}&\big \Vert (A_{I_r,[r+1,r+2k_0]})^\top \big \vert _{F_r^\perp } \big \Vert&\le 3\sqrt{\beta n}, \end{aligned}$$
(48)

where \((A_{I_r,[r+1,r+2k_0]})^\top \big \vert _{F_r} \) and \((A_{I_r,[r+1,r+2k_0]})^\top \big \vert _{F_r^\perp }\) are linear operators obtained by restricting the domain of \((A_{I_r,[r+1,r+2k_0]})^\top \) to \(F_r\) and \(F_r^\perp \), respectively. Then, conditioned on the intersection \(\tilde{\mathcal {E}}_r(\beta ) \cap {\mathcal {E}}_{\textrm{is}}(p,k_0,\beta )\), for any \(v \in {\mathbb {R}}^{I_r} \backslash \{0\}\),

$$\begin{aligned} \big \Vert (A_{I_r,[r+2k_0]})^\top \,v\big \Vert _2&= \Bigg \Vert \Bigg ( \begin{matrix} (A_{I_r,[r]})^\top v \\ (A_{I_r,[r+1,r+2k_0]})^\top v \end{matrix} \Bigg ) \Bigg \Vert _2\\&\ge \max \Big \{ n^{-\beta /(50p)}\,\big \Vert P_{F_r^\perp }\,v\big \Vert _2 ,\, n^{-\beta /(40p)} \,\big \Vert P_{F_r}\,v\big \Vert _2 - 3\sqrt{\beta n} \,\big \Vert P_{F_r^\perp }\,v\big \Vert _2 \Big \}, \end{aligned}$$

where \(P_{F_r}\) and \(P_{F_r^\perp }\) are orthogonal projections onto \(F_r\) and \(F_r^\perp \), respectively. Consider two cases.

  • Suppose \( \Vert P_{F_r^\perp }v\Vert _2 \ge \frac{1}{4} \frac{n^{-\beta /(40p)}}{ 3\sqrt{\beta n } } \Vert P_{F_r}v\Vert _2\). Then,

    $$\begin{aligned} \Vert v\Vert _2&= \sqrt{ \Vert P_{F_r^\perp }v\Vert _2^2 + \Vert P_{F_r} v\Vert _2^2 } \le \Vert P_{F_r^\perp }v\Vert _2 \sqrt{1^2 + \Big ( \frac{1}{4} \frac{n^{-\beta /(40p)}}{ 3\sqrt{\beta n } }\Big )^{-2} } \\&\le O(\sqrt{\beta })\,n^{\beta /(40p) + \frac{1}{2}}\,\Vert P_{F_r^\perp }v\Vert _2, \end{aligned}$$

    which implies

    $$\begin{aligned} n^{-\beta /(50p)} \Vert P_{F_r^\perp }v\Vert _2 \ge O(\beta ^{-1/2})\,n^{-\beta / (50p) - \beta /(40p) - \frac{1}{2}}\,\Vert v\Vert _2 \ge n^{-\beta /(20p)} \Vert v\Vert _2, \end{aligned}$$

    where the last inequality holds because \(\beta \ge \beta _0(p) \ge 300p\) and since n is sufficiently large depending on p.

  • In the case \( \Vert P_{F_r^\perp }v\Vert _2 < \frac{1}{4} \frac{n^{-\beta /(40p)}}{ 3\sqrt{ \beta n } } \Vert P_{F_r}v\Vert _2\), we have

    $$\begin{aligned} n^{-\beta /(40p)} \big \Vert P_{F_r}v\Vert _2 - 3\sqrt{\beta n} \big \Vert P_{F_r^\perp }v\big \Vert _2 \ge \frac{3}{4} n^{-\beta /(40p)} \Vert P_{F_r}v\Vert _2 \ge n^{-\beta /(20p)} \Vert v\Vert _2. \end{aligned}$$

Since the above estimate holds for all \(v \in {\mathbb {R}}^{I_r}\backslash \{0\}\), we conclude that everywhere on the intersection \(\tilde{\mathcal {E}}_r(\beta ) \cap {\mathcal {E}}_{\textrm{is}}(p,k_0,\beta )\),

$$\begin{aligned} s_{\min }\big ( (A_{I_r,[r+2k_0]})^\top \big ) \ge n^{-\beta /(20p)}. \end{aligned}$$

Therefore, for \(p\ge 1\) and \(\beta \ge \beta _0(p)\),

$$\begin{aligned} {\mathcal {E}}_{\textrm{is}}(p,k_0,\beta ) \cap \Big ( \bigcap _{r \in [k_0+1, n-2k_0]} \tilde{\mathcal {E}}_r(\beta ) \Big )\subset {\mathcal {E}}_\textrm{rec}(p,k_0,\beta ), \end{aligned}$$

and thus

$$\begin{aligned} {{\mathbb {P}}}\big ( {\mathcal {E}}_{\textrm{rec}}(p,k_0,\beta )^c \big ) \le {{\mathbb {P}}}\big ( {\mathcal {E}}_{\textrm{is}}(p,k_0,\beta )^c \big )+ \sum _{r \in [k_0+1, n-2k_0]} {{\mathbb {P}}}( \tilde{\mathcal {E}}_r(\beta )^c ). \end{aligned}$$

Since in view of Proposition 5.2, \({{\mathbb {P}}}\big ( {\mathcal {E}}_{\textrm{is}}(p,k_0,\beta )^c \big ) \le n^{-2\beta +o_n(1)}\), the corollary will follow if we show that \({{\mathbb {P}}}( \tilde{\mathcal {E}}_r(\beta )^c )\le n^{-2\beta -1 +o_n(1)}\).

From now on, we fix \(r \in [k_0+1, n-2k_0]\) and condition on a realization of \(A_{[n],[r]}\) such that the set \(I_r\) and the space \(F_r\) are uniquely determined. We will write \(\tilde{{\mathbb {P}}}\) and \(\tilde{\mathbb {E}}\) to denote the corresponding conditional probability and conditional expectation.

The independence of the entries of the matrix A implies that \(Q:=(A_{I_r,[r+1,r+2k_0]})^\top \big \vert _{F_r}\) and \(W:=(A_{I_r,[r+1,r+2k_0]})^\top \big \vert _{F_r^\perp }\) are (standard) Gaussian linear operators from \(F_r\) to \({\mathbb {R}}^{2k_0}\) and from \(F_r^\perp \) to \({\mathbb {R}}^{2k_0}\), respectively. For the purpose of estimating the operator norm and least singular values, we can view W and Q as matrices with i.i.d N(0, 1) entries of dimensions \(2k_0\times (r-k_0)\) and \(2k_0\times k_0\), respectively; more specifically, we can define standard Gaussian matrices \({\tilde{W}}\) and \({\tilde{Q}}\) of dimensions \(2k_0\times (r-k_0)\) and \(2k_0\times k_0\) such that everywhere on the probability space the singular spectrum of W and \({\tilde{W}}\), and of Q and \({\tilde{Q}}\), agree.
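The rotation invariance behind this observation can be checked numerically. The following Python sketch (an illustration only, outside the formal argument; the dimensions and the number of trials are hypothetical) verifies empirically that composing a standard Gaussian matrix with an independent isometry again produces i.i.d. N(0,1) entries.

```python
import numpy as np

# Illustration of the rotation invariance used above: if G has i.i.d. N(0,1) entries and
# V has orthonormal columns spanning a fixed subspace independent of G, then G @ V again
# has i.i.d. N(0,1) entries. Dimensions are hypothetical and serve only for the sketch.
rng = np.random.default_rng(0)
m, ambient_dim, dim_F, trials = 6, 20, 3, 20000
V, _ = np.linalg.qr(rng.standard_normal((ambient_dim, dim_F)))  # orthonormal basis of F
samples = np.array([(rng.standard_normal((m, ambient_dim)) @ V).ravel()
                    for _ in range(trials)])
print("entry means     (expected ~0):", np.round(samples.mean(axis=0)[:4], 3))
print("entry variances (expected ~1):", np.round(samples.var(axis=0)[:4], 3))
```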

It is well known that the expected operator norm of any standard Gaussian matrix is no more than the sum of square roots of its dimensions (see, for example, [20, Section 7.3]). Hence,

$$\begin{aligned} \tilde{\mathbb {E}}\, \Vert W\Vert =\tilde{\mathbb {E}}\, \Vert {\tilde{W}}\Vert \le \sqrt{2k_0} + \sqrt{r-k_0} \le \sqrt{2n}. \end{aligned}$$

Since the spectral norm is 1-Lipschitz, the standard Gaussian concentration inequality (see, for example, [20, Section 5.2]) implies

$$\begin{aligned} \tilde{{\mathbb {P}}}\big \{ \Vert W\Vert \ge 3\sqrt{\beta n} \big \}&\le \tilde{{\mathbb {P}}}\big \{ \Vert W\Vert \ge \tilde{\mathbb {E}}\,\Vert W\Vert + \sqrt{\beta n } \big \} \le 2\exp \Big ( - \frac{ ( \sqrt{\beta n })^2}{2}\Big ) \nonumber \\&= 2\exp (-\beta n/2). \end{aligned}$$
(49)
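The two ingredients combined in (49), the bound on the expected operator norm and its concentration, can be observed directly. The following sketch (not part of the proof; the sizes and the number of trials are hypothetical) samples the operator norm of a standard Gaussian matrix.

```python
import numpy as np

# Monte Carlo illustration: the operator norm of a standard Gaussian matrix concentrates
# around sqrt(#rows) + sqrt(#cols), with O(1) fluctuations, as used in (49).
rng = np.random.default_rng(1)
rows, cols, trials = 40, 400, 200
norms = [np.linalg.norm(rng.standard_normal((rows, cols)), 2) for _ in range(trials)]
print("mean of ||W||          :", round(float(np.mean(norms)), 2))
print("sqrt(rows) + sqrt(cols):", round(float(np.sqrt(rows) + np.sqrt(cols)), 2))
print("std of ||W||           :", round(float(np.std(norms)), 2))
```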

Next, we derive an estimate for \(s_{\min }(Q)=s_{\min }({\tilde{Q}})\). For \(i \in [k_0]\), let \(P_i:{\mathbb {R}}^{2k_0}\rightarrow {\mathbb {R}}^{2k_0}\) be the orthogonal projection onto the subspace which is orthogonal to the column vectors \({\tilde{Q}}_{[2k_0],j}\) for \(j \in [k_0]\backslash \{i\}\). Then,

$$\begin{aligned} s_{\min }(Q)&= \min _{v \in S^{k_0-1}} \Vert {\tilde{Q}}v\Vert \ge \min _{v \in S^{k_0-1}} \max _{i\in [k_0]} \Vert P_i{\tilde{Q}}v\Vert _2 = \min _{v \in S^{k_0-1}} \max _{i\in [k_0]} \Vert P_i({\tilde{Q}}_{[2k_0],i})\Vert _2 |v_i|\\&\ge \frac{\min _{j \in [k_0]}\Vert P_j(\tilde{Q}_{[2k_0],j})\Vert _2}{\sqrt{k_0}}. \end{aligned}$$

Since \(P_j\) and \({\tilde{Q}}_{[2k_0],j}\) are independent, the norm \(\Vert P_j({\tilde{Q}}_{[2k_0],j})\Vert _2\) is equidistributed with that of a \(2k_0-(k_0-1)=(k_0+1)\)–dimensional standard Gaussian vector. Since the probability density function of a \((k_0+1)\)–dimensional Gaussian vector is bounded above by \((2\pi )^{-(k_0+1)/2}\), we obtain

$$\begin{aligned} \tilde{{\mathbb {P}}}\big \{ \Vert P_j ({\tilde{Q}}_{[2k_0],j}) \Vert _2 \le t \big \} \le \Big (\frac{t}{\sqrt{2\pi }}\Big )^{k_0+1} |B_2^{k_0+1}|,\quad t>0, \end{aligned}$$

where \(|B_2^{k_0+1}|\) is the Lebesgue measure of the unit Euclidean ball \(B_2^{k_0+1}\) in \({\mathbb {R}}^{k_0+1}\). Therefore, in view of the previous computations,

$$\begin{aligned} \tilde{{\mathbb {P}}}\big \{ s_{\min }(Q) \le t \big \} \le k_0\Big (\frac{t\sqrt{k_0}}{\sqrt{2\pi }}\Big )^{k_0+1} |B_2^{k_0+1}|, \quad t>0. \end{aligned}$$

Applying the bound \((|B_2^{k_0+1}|)^{1/(k_0+1)} = O( k_0^{-1/2})\), we get that there exists a universal constant \(C_{\textrm{b}}\ge 1\) so that

$$\begin{aligned} \tilde{{\mathbb {P}}}\big \{ s_{\min }(Q) \le t \big \} \le (C_{\textrm{b}} \,t)^{k_0},\quad t>0. \end{aligned}$$

Now, setting \(t:= n^{-\beta /(40p)}\), we get

$$\begin{aligned} \tilde{{\mathbb {P}}}\Big \{ s_{\min }(Q) \le n^{-\beta /(40p)} \Big \} \le n^{ - (1-o_n(1))\beta k_0 /(40p) } \le n^{-2\beta -1+o_n(1)}, \end{aligned}$$
(50)

where the last inequality holds since \(k_0=k_0(p)\ge 120p\).
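The small-ball behaviour of \(s_{\min }(Q)\) quantified in (50) can also be observed empirically. The sketch below (an illustration only, with a hypothetical small value of \(k_0\)) estimates the probability that the smallest singular value of a \(2k_0\times k_0\) standard Gaussian matrix falls below a threshold t.

```python
import numpy as np

# Monte Carlo sketch: the probability that the smallest singular value of a (2*k0) x k0
# standard Gaussian matrix falls below t decays polynomially in t, consistent with an
# upper bound of the form (C_b * t)^{k0}. Parameters are hypothetical.
rng = np.random.default_rng(2)
k0, trials = 3, 50000
smin = np.array([np.linalg.svd(rng.standard_normal((2 * k0, k0)), compute_uv=False)[-1]
                 for _ in range(trials)])
for t in (0.4, 0.2, 0.1):
    print(f"t = {t:3.1f}   empirical P(s_min <= t) = {np.mean(smin <= t):.2e}")
```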

As a final step of the proof, we restate (49) and (50) as conditional bounds which hold almost surely on the entire probability space:

$$\begin{aligned}&{{\mathbb {P}}}\big \{ \Vert W\Vert \ge 3\sqrt{\beta n}\;\big |\;A_{[n],[r]} \big \} \le 2\exp (-\beta n/2)\quad a.s;\\&{{\mathbb {P}}}\Big \{ s_{\min }(Q) \le n^{-\beta /(40p)}\;\big |\;A_{[n],[r]} \Big \} \le n^{-2\beta -1+o_n(1)}\quad a.s. \end{aligned}$$

We conclude that \( {{\mathbb {P}}}( \tilde{\mathcal {E}}_r(\beta )^c ) \le n^{-2\beta -1 +o_n(1)}\), and the result follows. \(\square \)

6 The smallest singular value and the growth factor in exact arithmetic

6.1 Distance to subspaces

Recall that by \(i_t=i_t(A)\), \(1\le t\le n\), we denote the indices of the pivot rows in the GEPP process (see Sect. 4).

Definition 6.1

(Subspaces generated by row vectors of submatrices of A) For \(x,r \in [n]\) with \(1\le x \le r\), let

$$\begin{aligned} H_{r,x} \subset {\mathbb {R}}^r \hbox { be the random subspace spanned by } (A_{ i_t, [r] })^\top \hbox { for } t\in [x], \end{aligned}$$

and let \(H_{r,0}:=\{0\}\). Additionally, for \(s \in [x]\), let

$$\begin{aligned} H_{r,x,s} \subset {\mathbb {R}}^r \hbox { be the random subspace spanned by } (A_{ i_t, [r] })^\top \hbox { for } t \in [x]\backslash \{s\}, \end{aligned}$$

where we set \(H_{r,1,1}:=\{0\}\).

Definition 6.2

For \(\beta >0\), let \({\mathcal {E}}_{\textrm{row}}(r,\beta )\) be the event that

$$\begin{aligned} \textrm{dist}\big ( (A_{ i_r, [r]})^\top ,\, H_{r,r-1} \big ) \ge \sqrt{2/ \pi }\, n^{- 4(1+\beta / (n-r)) }\;\; \hbox { and }\;\; \Vert A_{ i_r, [n]} \Vert _2 \le \sqrt{n}+ 3 \sqrt{ \beta \log (n)} \end{aligned}$$
(51)

and set

$$\begin{aligned} {\mathcal {E}}_{\textrm{row}}(\beta ):= \bigcap _{r \in [n-1 ]} {\mathcal {E}}_{\textrm{row}}(r,\beta ). \end{aligned}$$

Further, let \({\mathcal {E}}_{\textrm{dist}}(\beta )\) be the event that

$$\begin{aligned}&\forall \; r,k,s \in [n-1] \hbox { with } s \le r-k \le r,\\&\quad \textrm{dist}\big ( (A_{i_s, [r]})^\top , H_{r,r-k,s} \big ) \le \exp \Big ( 6k \Big ( 1 + \frac{\beta }{n-r}\Big )\log n\Big ) \textrm{dist}\big ( (A_{i_s, [r]})^\top , H_{r,r,s}\big ) . \end{aligned}$$

The goal in this section is to prove

Proposition 6.3

There exists \(\beta _{6.3} \ge 2\) so that the following holds. For \(\beta \ge \beta _{6.3}\), we have \({\mathcal {E}}_{\textrm{dist}}(\beta ) \supset {\mathcal {E}}_{\textrm{row}}(\beta )\), and

$$\begin{aligned} {{\mathbb {P}}}( {\mathcal {E}}_{\textrm{dist}}(\beta )^c ) \le {{\mathbb {P}}}( {\mathcal {E}}_{\textrm{row}}(\beta )^c ) \le n^{-\beta }. \end{aligned}$$

The statement is obtained as a combination of Lemmas 6.6 and 6.7 below. First, we consider two simple facts from Euclidean geometry.

Lemma 6.4

Let \(u \in {\mathbb {R}}^r\) and let \(H \subset {\mathbb {R}}^r\) be a subspace. Then for any orthogonal projection P in \({\mathbb {R}}^r\), we have

$$\begin{aligned} \textrm{dist}( u,H) \ge \textrm{dist}(Pu,PH). \end{aligned}$$

Proof

The statement follows immediately by observing that P is a contraction. \(\square \)

Lemma 6.5

Let F be a subspace of \({\mathbb {R}}^k\), and let \(v_1, v_2 \in {\mathbb {R}}^k\) be vectors such that

$$\begin{aligned} \dim \mathrm{span\,}(F,v_1,v_2)=\dim (F)+2. \end{aligned}$$

For \(i \in [2]\), let \(F_i\) be the linear span of F and \(v_i\). Then,

$$\begin{aligned} \textrm{dist}(v_1, F) \le \frac{\textrm{dist}(v_1,F_2)\,\Vert v_2\Vert _2}{\textrm{dist}(v_2,F_1) }. \end{aligned}$$

Proof

For any subspace E, we let \(P_E\) be the orthogonal projection onto E. Let \(u_i:= \frac{P_{F^\perp }v_i }{ \Vert P_{F^\perp }v_i \Vert _2}\). Observe that,

$$\begin{aligned} \textrm{dist}(v_2,F_1)= & {} \Vert P_{F_1^\perp }v_2\Vert _2 = \Vert P_{F^\perp }v_2 - \langle P_{F^\perp }v_2, u_1 \rangle u_1 \Vert _2 \\= & {} \Vert P_{F^\perp }v_2\Vert _2\, \Vert u_2 - \langle u_2,u_1\rangle u_1 \Vert _2, \end{aligned}$$

whence

$$\begin{aligned} \Vert u_2 - \langle u_2,u_1\rangle u_1 \Vert _2 = \frac{ \textrm{dist}(v_2,F_1) }{ \Vert P_{F^\perp }v_2\Vert } \ge \frac{ \textrm{dist}(v_2,F_1) }{ \Vert v_2\Vert _2}. \end{aligned}$$

On the other hand,

$$\begin{aligned} \Vert u_2 - \langle u_2,u_1\rangle u_1\Vert _2 = \sqrt{ 1 - \langle u_2,u_1\rangle ^2 } = \Vert u_1 - \langle u_1,u_2 \rangle u_2\Vert _2, \end{aligned}$$

and therefore

$$\begin{aligned} \textrm{dist}(v_1,F)= & {} \Vert P_{F^\perp } v_1\Vert _2 = \frac{ \Vert P_{F^\perp } v_1\Vert _2\, \Vert u_1 - \langle u_1,u_2 \rangle u_2\Vert _2 }{ \Vert u_1 - \langle u_1,u_2 \rangle u_2\Vert _2} = \frac{\textrm{dist}(v_1,F_2)}{\Vert u_1 - \langle u_1,u_2 \rangle u_2\Vert _2} \\\le & {} \frac{\textrm{dist}(v_1,F_2)\,\Vert v_2\Vert _2}{\textrm{dist}(v_2,F_1) }. \end{aligned}$$

\(\square \)
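The following Python sketch (an illustration only, with hypothetical dimensions) verifies the inequality of Lemma 6.5 numerically on random instances.

```python
import numpy as np

# Numerical sanity check of Lemma 6.5 on random data:
#     dist(v1, F) <= dist(v1, F2) * ||v2||_2 / dist(v2, F1).
rng = np.random.default_rng(3)

def dist(v, basis):
    """Euclidean distance from v to the span of the columns of `basis`."""
    Q, _ = np.linalg.qr(basis)
    return np.linalg.norm(v - Q @ (Q.T @ v))

k, dim_F = 7, 3
for _ in range(1000):
    F = rng.standard_normal((k, dim_F))
    v1, v2 = rng.standard_normal(k), rng.standard_normal(k)
    F1 = np.hstack([F, v1[:, None]])       # span(F, v1)
    F2 = np.hstack([F, v2[:, None]])       # span(F, v2)
    assert dist(v1, F) <= dist(v1, F2) * np.linalg.norm(v2) / dist(v2, F1) + 1e-9
print("Lemma 6.5 inequality verified on 1000 random instances")
```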

Lemma 6.6

Let \(s,k,r \in [n-1]\) such that \(s\le r-k < r\). Fix a realization of A such that the event \({\mathcal {E}}_{\textrm{row}}(\beta )\) holds. Then,

$$\begin{aligned} \textrm{dist}\big ( (A_{ i_s, [r] })^\top , {H}_{r,r-k,s} \big ) \le \exp \Big ( 6k\Big ( 1 + \frac{\beta }{n-r} \Big )\log (n) \Big ) \, \textrm{dist}\big ( (A_{ i_s, [r] })^\top , {H}_{r,r,s} \big ) . \end{aligned}$$

Thus, \({\mathcal {E}}_{\textrm{dist}}(\beta ) \supset {\mathcal {E}}_{\textrm{row}}(\beta )\).

Proof

First, we note that for every \(t \in [2,r]\),

$$\begin{aligned} H_{t,t-1} = P_t (H_{r,t-1}) \end{aligned}$$

where \(P_t: {\mathbb {R}}^r \mapsto {\mathbb {R}}^t\) is the coordinate projection onto the first t components. Applying Lemma 6.4 for every \(2\le t\le r\), we obtain

$$\begin{aligned} \textrm{dist}\big ( (A_{i_t , [r]})^\top ,\, H_{r,t-1} \big ) \ge \textrm{dist}\big ( (A_{i_t , [t]})^\top ,\, H_{t,t-1} \big ) \ge \sqrt{2/\pi }\, n^{-4(1+ \beta / (n-t))}, \end{aligned}$$

where in the last inequality we used the definition of \({\mathcal {E}}_{\textrm{row}}(\beta )\). Further, for \(t \in [r-k+1,r]\), we will apply Lemma 6.5 with

$$\begin{aligned} F:= {H}_{r,t-1,s}, \quad v_1:= (A_{i_s, [r] })^\top ,\quad \hbox {and } v_2:= (A_{ i_t, [r] })^\top , \end{aligned}$$

so that \(F_1 = H_{r,t-1}\) and \(F_2 = {H}_{r,t,s}\), and from the previous inequality and the definition of \({\mathcal {E}}_{\textrm{row}}(\beta )\) we have

$$\begin{aligned} \textrm{dist}(v_2,\, F_1) \ge \sqrt{2/\pi }\, n^{-4(1+ \beta / (n-r))}\; \hbox { and }\; \Vert v_2\Vert _2 \le \Vert A_{ i_t, [n] } \Vert _2 \le \sqrt{n} + 3 \sqrt{ \beta \log n }. \end{aligned}$$

Lemma 6.5 implies

$$\begin{aligned} {\textrm{dist}}( (A_{ i_s, [r]})^\top , {H}_{r,t-1,s} ) \le \sqrt{\pi /2 }\, n^{4 (1 + \beta / (n-t))} ( \sqrt{n} + 3 \sqrt{ \beta \log n } )\, \textrm{dist}( (A_{ i_s, [r] })^\top , {H}_{r,t,s} ) \end{aligned}$$

(it is easy to see that in the case \(\dim \mathrm{span\,}\{F,v_1,v_2\}<\dim (F)+2\) when the lemma cannot be applied, the above inequality holds as well). Together with the inequality \( \sqrt{n} + 3 \sqrt{ \beta \log n } \le 2n^{1+ \beta / (n-t) }\) for \(\beta >0\),

$$\begin{aligned} {\textrm{dist}}\big ( {H}_{r,t-1,s} ,\, (A_{ i_s, [r]})^\top \big )&\le \exp \bigg ( \Big ( 1 + \frac{\beta }{n-t} \Big ) 6\log n \bigg ) {\textrm{dist}}\big ( {H}_{r,t,s},\, (A_{ i_s, [r]})^\top \big ),\\&\quad t \in [r-k+1,r]. \end{aligned}$$

Finally, applying the above inequality inductively for t from \(r-k+1\) to r we obtain

$$\begin{aligned} {\textrm{dist}}\big ( {H}_{r,r-k,s},\, (A_{ i_s, [r]})^\top \big ) \le \exp \bigg ( 6k \Big ( 1+ \frac{\beta }{n-r} \Big ) \log n \bigg )\, {\textrm{dist}}\big ( {H}_{r,r,s},\, (A_{ i_s, [r]})^\top \big ). \end{aligned}$$

\(\square \)

Lemma 6.7

For \(\beta \ge 2\) and \(1\le r\le n\), the following probability estimate holds:

$$\begin{aligned} {{\mathbb {P}}}( {\mathcal {E}}_{\textrm{row}}(r,\beta )^c ) \le (1+o_n(1))n^{-2\beta } \end{aligned}$$

and

$$\begin{aligned} {{\mathbb {P}}}( {\mathcal {E}}_{\textrm{row}}(\beta )^c ) \le n^{-\beta }. \end{aligned}$$

Proof

First, in view of Corollary 4.4, we have

$$\begin{aligned} {{\mathbb {P}}}\Big \{ \textrm{dist}\big ((A_{ i_r, [r]})^\top ,\, H_{r,r-1} \big ) \le \sqrt{ \frac{\pi }{2} }\, n^{-4(1+ \beta / (n-r))}\Big \}&\le n^{-2(1+ \beta / (n-r))(n-r)} \\&= n^{-2(n-r+\beta )} \le n^{-2\beta }. \end{aligned}$$

Next, for each \(i \in [n]\), applying the standard concentration inequality for Lipschitz functions of Gaussian variables,

$$\begin{aligned} {{\mathbb {P}}}\big \{ \Vert A_{ i,[n] } \Vert _2 \ge {\mathbb {E}}\,\Vert A_{i,[n]}\Vert _2 + t \big \} \le 2\exp ( -t^2/2), \quad t>0. \end{aligned}$$

With \({\mathbb {E}}\Vert A_{i,[n]}\Vert _2 \le ({\mathbb {E}}\Vert A_{i,[n]}\Vert _2^2)^{1/2}\le \sqrt{n}\), by taking \(t:= 3\sqrt{ \beta \log n }\) we have

$$\begin{aligned} {{\mathbb {P}}}\big \{ \Vert A_{ i, [n] } \Vert _2 \ge \sqrt{n} + 3\sqrt{ \beta \log n } \big \} \le 2\, n^{-9\beta /2}. \end{aligned}$$

Taking the union bound over \(i\in [n]\) and taking into account the condition \(\beta \ge 2\), we get the first assertion of the lemma.

The second assertion follows from another application of the union bound. \(\square \)

6.2 The smallest singular value of \(A_{I_r,[r]}\)

Definition 6.8

For \(k \in [n]\), \(\beta ,p \ge 1\), let \({\mathcal {E}}_{\textrm{sq}}(p,k,\beta )\) be the event that

$$\begin{aligned} \forall r \in [k,n-k],\quad s_{r}(A_{I_r,[r]}) \ge n^{-\beta /(6p)}. \end{aligned}$$

Proposition 6.9

There is a universal constant \(C>0\) with the following property. For any \(p \ge 1\), there exist \(n_0(p)\), \(k_1(p)\le Cp^2\) and \( \beta _{6.3}\le \beta _1(p)\le Cp^2\) such that for \( n \ge n_0(p)\) and \(\beta \ge \beta _1(p)\), we have

$$\begin{aligned} {\mathcal {E}}_{\textrm{sq}}(p,k_1(p),\beta ) \supset {\mathcal {E}}_{\textrm{row}}(\beta ) \cap {\mathcal {E}}_{\textrm{rec}}(p,k_0(p),\beta ), \end{aligned}$$

where \(k_0(p)\) is taken from Proposition 5.2, \(\beta _{6.3}\) is defined in Proposition 6.3, and \({\mathcal {E}}_{\textrm{rec}}(\cdot )\) is taken from Definition 5.1. Moreover,

$$\begin{aligned} {{\mathbb {P}}}( {\mathcal {E}}_{\textrm{sq}}(p,k_1(p),\beta )^c ) \le 2n^{-\beta }. \end{aligned}$$

Proof

Note that if the events’ inclusion above holds then the second assertion of the proposition follows immediately by combining the bounds \({{\mathbb {P}}}( {\mathcal {E}}_{\textrm{rec}}(p,k_0(p),\beta )^c ) \le n^{-2\beta +o_n(1)}\) from Corollary 5.3 and \({{\mathbb {P}}}( {\mathcal {E}}_\textrm{row}(\beta )^c)\le n^{-\beta }\) from Lemma 6.7. Thus, we can focus on proving the first assertion.

Let \(k_1(p)=\beta _1(p) \ge 400 k_0(p)\,p\) where \(k_0(p)\) is taken from Proposition 5.2.

We argue by contradiction. Fix any realization of A such that both \({\mathcal {E}}_{\textrm{row}}(\beta )\) and \({\mathcal {E}}_{\textrm{rec}}(p, k_0(p),\beta )\) hold, and such that for some \(r \in [k_1(p), n-k_1(p)]\), \(s_{r}(A_{I_r,[r]}) < n^{-\beta /(6p)}\), that is, there exists a unit vector \(u \in {\mathbb {R}}^{I_r}\) such that

$$\begin{aligned} \Vert (A_{I_r,[r]})^\top u \Vert _2 \le n^{-\beta /(6p)} \end{aligned}$$

(we assume here that the columns of the matrix \((A_{I_r,[r]})^\top \) are indexed over the set \(I_r\)). Since \(\Vert u\Vert _2=1\), there is an index \(s \in [r]\) such that \( |u_{i_s}|\ge r^{-1/2} \ge n^{-1/2} \), whence

$$\begin{aligned} \Vert (A_{I_r,[r]})^\top u \Vert _2 = \Big \Vert \sum _{t \in [r]} (A_{\{i_t\},[r]})^\top u_{i_t} \Big \Vert _2 \ge n^{-1/2}\, \textrm{dist}\big ( (A_{i_s,[r]})^\top ,\, H_{r,r,s} \big ). \end{aligned}$$

Thus, our realization of A and our choice of s satisfy

$$\begin{aligned} \textrm{dist}\big ( (A_{i_s,[r]})^\top , H_{r,r,s} \big ) \le \exp ( -\beta \log (n) /(6p) + \log (n) /2). \end{aligned}$$
(52)

Set \(k:= \min \{ 2\,k_0(p), r-s\}\).

Assume first that \(s<r\), i.e \(k>0\). In view of the inclusion \( {\mathcal {E}}_{\textrm{row}}(\beta ) \subset {\mathcal {E}}_{\textrm{dist}}(\beta )\) (see Proposition 6.3), we get

$$\begin{aligned} \textrm{dist}( (A_{i_s,[r]})^\top , H_{r,r-k,s} ) \le&\exp \Big ( -\beta \log n/(6p) + \log (n)/2 + 6k \Big ( 1 + \frac{\beta }{n-r} \Big ) \log n \Big ). \end{aligned}$$

Since \(n-r \ge k_1(p)\) and \(\beta \ge \beta _1(p)= k_1(p) \ge 400 k_0(p)\,p \ge 200kp\), we have

$$\begin{aligned} \frac{1}{2}+ 6k\Big ( 1 + \frac{\beta }{n-r}\Big )&\le 7k\Big ( 1 + \frac{\beta }{n-r}\Big ) \le 7 \cdot \frac{k_1(p)}{200 p} \Big ( 1 + \frac{\beta }{n-r}\Big ) \\&\le 7\cdot \frac{k_1(p)}{200 p} + 7\cdot \frac{\beta }{200p} \le \frac{\beta }{12p}, \end{aligned}$$

whence

$$\begin{aligned} \textrm{dist}\big ( (A_{i_s,[r]})^\top , H_{r,r-k,s} \big ) \le n^{-\beta /(12p)}. \end{aligned}$$
(53)

Further, in the situation when \(k=0\), the inequality (53) is still true as can be immediately seen from (52).

Next, we will show that (53) leads to a contradiction. The argument depends on whether \(k=r-s\) or not.

Case 1 \(k=r-s\). By the definition of the event \({\mathcal {E}}_{\textrm{row}}(\beta )\), we have

$$\begin{aligned} \textrm{dist}\big ( (A_{i_s,[r]})^\top , \, H_{r,s,s} \big )&\ge \textrm{dist}\big ( (A_{i_s,[s]})^\top , H_{s,s,s} \big ) = \textrm{dist}\big ( (A_{i_s,[s]})^\top , H_{s,s-1} \big )\\&\ge \sqrt{2/\pi }\, n^{-4(1+ \frac{\beta }{n-s})}\\&\ge \sqrt{2/\pi }\,n^{-4-\beta /(50k_0(p)p)}, \end{aligned}$$

where we used that \(n-s \ge n-r \ge k_1(p)\ge 200k_0(p)\,p\). In view of the condition \(\beta \ge 200k_0(p)\,p\ge 400p\),

$$\begin{aligned} \sqrt{2/\pi }\,n^{-4-\beta /(50k_0(p)p)} \ge n^{-5-\beta /(50p)} \ge n^{ - \beta /(25p) } > n^{- \beta /(12p)}, \end{aligned}$$

which contradicts (53).

Case 2 \(k=2k_0(p)<r-s\). In this case, \((A_{i_s,[r]})^\top \) is a column vector of \((A_{I_{r-k}, [r]})^\top \) and \(H_{r,r-k,s}\) is the span of every other column vector \((A_{i_{s'},[r]})^\top \) for \(s' \in [r-k]\backslash \{s\}\). Hence, in view of (53),

$$\begin{aligned} s_{\min }\big ((A_{I_{r-k}, [r]})^\top \big )\le \textrm{dist}\big ( (A_{i_s,[r]})^\top , H_{r,r-k,s} \big ) \le n^{-\beta /(12p)}. \end{aligned}$$

However, this contradicts the definition of the event \({\mathcal {E}}_{\textrm{rec}}(p, k_0(p),\beta )\):

$$\begin{aligned} \forall r' \in [k_0(p)+1,n-2k_0], \, s_{\min }\big ( (A_{I_{r'},[r'+2k_0]})^\top \big ) \ge n^{-\beta /(20p)}. \end{aligned}$$

The result follows. \(\square \)
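The elementary estimate used at the start of the above proof, relating the smallest singular value of a square matrix to the distances from its columns to the spans of the remaining columns, can be checked numerically. The following sketch (an illustration only, with a hypothetical dimension) verifies that for an \(r\times r\) matrix B one has \(s_{\min }(B)\ge r^{-1/2}\min _j \textrm{dist}(\textrm{col}_j, \mathrm{span\,}\{\textrm{col}_{j'}:\,j'\ne j\})\).

```python
import numpy as np

# Illustration: for an r x r matrix B,
#     s_min(B) >= min_j dist(column_j, span of the remaining columns) / sqrt(r).
rng = np.random.default_rng(4)
r = 8
for _ in range(500):
    B = rng.standard_normal((r, r))
    dists = []
    for j in range(r):
        Q, _ = np.linalg.qr(np.delete(B, j, axis=1))
        v = B[:, j]
        dists.append(np.linalg.norm(v - Q @ (Q.T @ v)))
    s_min = np.linalg.svd(B, compute_uv=False)[-1]
    assert s_min >= min(dists) / np.sqrt(r) - 1e-12
print("inequality verified on 500 random 8 x 8 matrices")
```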

The next simple lemma will be used to show that, with high probability, the indices of the pivot rows obtained in exact arithmetic coincide with those produced by the floating point computations.

Lemma 6.10

There is a universal constant \(C>0\) and a number \(n_0\in {\mathbb {N}}\) such that, assuming \(n\ge n_0\),

$$\begin{aligned} {{\mathbb {P}}}\big \{ s_{\min }(A_{I_r,[r]})\le t\,n^{-C}\hbox { for some } 1\le r\le n-1 \big \}\le t,\quad t>0. \end{aligned}$$

Proof

In view of Proposition 6.9 (say, applied with \(p=1\)), there are constants \(C_1,n_0>0\) such that, assuming \(n\ge n_0\),

$$\begin{aligned} {{\mathbb {P}}}\big \{ s_{\min }(A_{I_r,[r]})\le t\,n^{-C_1}\hbox { for some } k_1(1)\le r\le n-k_1(1) \big \}\le t,\quad t>0. \end{aligned}$$

For indices \(r<k_1(1)\), we use the trivial union bound:

$$\begin{aligned}&{{\mathbb {P}}}\big \{ s_{\min }(A_{I_r,[r]})\le t\;\hbox { for some } 1\le r<k_1(1)\big \}\\&\quad \le \sum _{I\subset [n],\,1\le |I|<k_1(1)} {{\mathbb {P}}}\big \{ s_{\min }(A_{I,[|I|]})\le t\big \} \le n^{k_1(1)}\,t,\quad t>0, \end{aligned}$$

where in the last line we used the standard bound on the smallest singular value of a square Gaussian random matrix [4, 17]. Similarly, we get

$$\begin{aligned} {{\mathbb {P}}}\big \{ s_{\min }(A_{I_r,[r]})\le t\;\hbox { for some } n-k_1(1)<r\le n-1 \big \} \le n^{k_1(1)}\,t,\quad t>0. \end{aligned}$$

Combining the three estimates above, we get the result. \(\square \)

6.3 Estimating the growth factor in exact arithmetic

Definition 6.11

For \(\beta >1\), let \({\mathcal {E}}_{\textrm{col}}(\beta )\) be the event that

$$\begin{aligned} \forall j\in [n],\, \Vert A_{ [n],j} \Vert _2 \le \sqrt{n}+ 3 \sqrt{ \beta \log (n)} \end{aligned}$$
(54)

and for \(\tau >1\), let \({\mathcal {E}}_{\textrm{entry}}(\tau )\) be the event that

$$\begin{aligned} \max _{i,j\in [n]} | A_{ i,j} | \ge n^{-\tau }. \end{aligned}$$
(55)

Lemma 6.12

For any \(\beta \ge 2\), we have

$$\begin{aligned} {{\mathbb {P}}}( {\mathcal {E}}_{\textrm{col}}(\beta )^c ) \le n^{-\beta }; \end{aligned}$$

furthermore, for every \(\tau \ge 1\),

$$\begin{aligned} {{\mathbb {P}}}( {\mathcal {E}}_{\textrm{entry}}(\tau )^c ) \le n^{-\tau \,n^2}. \end{aligned}$$

Proof

The upper bound on \({{\mathbb {P}}}( {\mathcal {E}}_{\textrm{col}}(\beta )^c)\) can be derived exactly the same way as in the argument for \({\mathcal {E}}_\textrm{row}(\beta )\) (see the proof of Lemma 6.7), so we skip the discussion.

To estimate the complement of \({\mathcal {E}}_{\textrm{entry}}(\tau )\), we write

$$\begin{aligned} {{\mathbb {P}}}({\mathcal {E}}_{\textrm{entry}}(\tau )^c) \le {{\mathbb {P}}}\big \{ |A_{i,j}| < n^{-\tau }\hbox { for all } i,j\big \} \le n^{-\tau \,n^2}, \end{aligned}$$

where in the last inequality we used that the probability density function of the standard Gaussian random variable is bounded by \(\frac{1}{\sqrt{2\pi }}\). \(\square \)

At this point, we are ready to prove the “exact arithmetic” counterpart of the main statement of the paper:

Proposition 6.13

There is a universal constant \(C>1\) and a function \(\tilde{n}:[1,\infty )\rightarrow {\mathbb {N}}\) with the following property. Let \(p\ge 1\), and let \(n\ge {\tilde{n}}(p)\). Then

$$\begin{aligned} {{\mathbb {P}}}\bigg \{\frac{\max _{i,j,\ell }|A^{(\ell )}_{i,j}|}{\max _{i,j}|A_{i,j}|} \ge n^{t}\bigg \} \le 5n^{-p t},\quad t\ge Cp^2. \end{aligned}$$

Proof

Recall that the parameter \(k_1(p)=O(p^2)\) was defined in Proposition 6.9. We can take a universal constant \(C_1>0\) large enough so that \(C_1p^3 \ge 600 p\,k_1(p)\) for all \(p \ge 1\). Fix \(\beta \ge C_1p^3\), set \(\tau :=\beta /(100p)\), and assume \(n\ge \sqrt{100p}\). In view of the assertions of Lemma 6.7, Proposition 6.9, and Lemma 6.12, in order to show that

$$\begin{aligned} {{\mathbb {P}}}\bigg \{\frac{\max _{i,j,\ell }|A^{(\ell )}_{i,j}|}{\max _{i,j}|A_{i,j}|} > n^{\beta /(3p)} \bigg \} \le 5n^{-\beta } \end{aligned}$$

(which would imply the statement), it is sufficient to verify that everywhere on the intersection

$$\begin{aligned} {\mathcal {E}}_{\textrm{entry}}(\tau )\cap {\mathcal {E}}_{\textrm{row}}(\beta ) \cap {\mathcal {E}}_{\textrm{col}}(\beta ) \cap {\mathcal {E}}_{\textrm{sq}}(p,k_1(p),\beta ), \end{aligned}$$

we have

$$\begin{aligned} \frac{\max _{i,j,\ell }|A^{(\ell )}_{i,j}|}{\max _{i,j}|A_{i,j}|} \le n^{\beta /(3p)}. \end{aligned}$$

In what follows, we use the notation introduced at the beginning of Sect. 2; in particular, we work with matrices \({\mathcal {M}}^{(\ell )}\), \(0\le \ell \le n-1\), defined in (3).

Recall that

$$\begin{aligned} \forall r \in [n-1],\, \quad \frac{\max _{i,j} |{\mathcal {M}}_{i,j}^{(r)}| }{ \max _{i,j}|{\mathcal {M}}^{(r-1)}_{i,j}|} \le 2. \end{aligned}$$
(56)

For \(r \in [k_1(p)]\), we simply use the bound above to get

$$\begin{aligned} \frac{\max _{i,j} |{\mathcal {M}}_{i,j}^{(r)}| }{ \max _{i,j}|A_{i,j}|} \le 2^{k_1(p)}, \quad r \in [k_1(p)]. \end{aligned}$$

Further, for \(r \in (k_1(p),n-k_1(p)]\), we write

$$\begin{aligned} \frac{ \max _{i,j}|{\mathcal {M}}^{(r)}_{i,j}|}{ \max _{i,j}|A_{i,j}|} = \max \left( \frac{ \max \limits _{s\in [r],j\ge s}|{\mathcal {M}}^{(s-1)}_{i_s,j}|}{ \max _{i,j}|A_{i,j}|} ,\, \frac{\max \limits _{i \in [n]\backslash I_{r},j>r}|{\mathcal {M}}^{(r)}_{i,j}|}{\max _{i,j}|A_{i,j}|} \right) . \end{aligned}$$

In view of formula (4) and our conditioning on the event \({\mathcal {E}}_{\textrm{sq}}(p,k_1(p),\beta )\), for \(s\in [k_1(p),n-k_1(p)]\), \(i \in [n]\backslash I_s\) and \(j >s\), we have

$$\begin{aligned} |{\mathcal {M}}^{(s)}_{i,j}|&\le |A_{i,j}| + \Vert A_{i,[s]}\Vert _2\, \frac{1}{s_{\min }(A_{I_s,[s]})}\, \Vert A_{[s],j}\Vert _2\\&\le \sqrt{n}+3\sqrt{\beta \log n} + (\sqrt{n}+3\sqrt{\beta \log n}) n^{\beta /(6p)} (\sqrt{n}+3\sqrt{\beta \log n}) \\&< 2n^{\beta /(6p)}(\sqrt{n}+3\sqrt{\beta \log n})^2, \end{aligned}$$

and thus

$$\begin{aligned} \frac{\max \limits _{i \in [n]\backslash I_{r},j>r}|{\mathcal {M}}^{(r)}_{i,j}|}{\max _{i,j}|A_{i,j}|} \le 2n^{\beta /(6p)+ \beta /(100p)}(\sqrt{n}+3\sqrt{\beta \log n})^2, \end{aligned}$$

and for every \(s\in [r]\) with \(s>k_1(p)\),

$$\begin{aligned} \frac{\max \limits _{j\ge s}|{\mathcal {M}}^{(s-1)}_{i_s,j}|}{ \max _{i,j}|A_{i,j}|} \le 2n^{\beta /(6p)+ \beta /(100p)}(\sqrt{n}+3\sqrt{\beta \log n})^2. \end{aligned}$$

By our earlier observation,

$$\begin{aligned} \frac{\max \limits _{s\in [k_1(p)],j\ge s}|{\mathcal {M}}^{(s-1)}_{i_s,j}|}{ \max _{i,j}|A_{i,j}|} \le 2^{k_1(p)}. \end{aligned}$$

Combining the estimates together, we conclude that for all \(r \in [n-k_1(p)]\),

$$\begin{aligned} \frac{ \max _{i,j}|{\mathcal {M}}^{(r)}_{i,j}|}{ \max _{i,j}|A_{i,j}|} \le \max \big ( 2^{k_1(p)},\, 2n^{\beta /(6p)+ \beta /(100p)}(\sqrt{n}+3\sqrt{\beta \log n})^2 \big ). \end{aligned}$$

For the “last” \(k_1(p)\) admissible values of r, we rely on (56) again to get

$$\begin{aligned} \forall r \in (n-k_1(p),n],\, \quad \frac{\max _{i,j}|{\mathcal {M}}_{i,j}^{(r)}|}{\max _{i,j}|A_{i,j}|} \le \frac{\max _{i,j}|{\mathcal {M}}_{i,j}^{(n-k_1(p))}|}{\max _{i,j}|A_{i,j}|}\,2^{k_1(p)}. \end{aligned}$$

In the end, we make use of our bound \(\beta /(6p) \ge 100 k_1(p)\) to conclude that for all large n,

$$\begin{aligned} \frac{\max _{i,j,\ell }|A^{(\ell )}_{i,j}|}{\max _{i,j}|A_{i,j}|} =\frac{ \max _{r,i,j}|{\mathcal {M}}^{(r)}_{i,j}|}{ \max _{i,j}|A_{i,j}|} \le n^{\frac{\beta }{3p}}. \end{aligned}$$

This completes the proof. \(\square \)
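The statement of Proposition 6.13 can be illustrated by a direct simulation. The sketch below is an illustration only: double precision arithmetic is used as a stand-in for exact arithmetic, the dimensions and number of trials are hypothetical, and \(\sqrt{n}\) is printed merely as the reference rate suggested by the simulations in [5].

```python
import numpy as np

# Numerical sketch: during GEPP on a Gaussian matrix, the entry growth
# max_{i,j,l} |A^(l)_{i,j}| / max_{i,j} |A_{i,j}| is only polynomially large in n.
rng = np.random.default_rng(5)

def gepp_entry_growth(A):
    """Run GEPP and return the peak absolute entry over all intermediate matrices,
    divided by the maximal absolute entry of the input matrix."""
    M = A.copy()
    n = M.shape[0]
    peak = np.abs(M).max()
    for k in range(n - 1):
        p = k + np.argmax(np.abs(M[k:, k]))                 # partial pivoting
        M[[k, p]] = M[[p, k]]
        M[k + 1:, k + 1:] -= np.outer(M[k + 1:, k] / M[k, k], M[k, k + 1:])
        M[k + 1:, k] = 0.0
        peak = max(peak, np.abs(M).max())
    return peak / np.abs(A).max()

for n in (100, 200, 400):
    growth = [gepp_entry_growth(rng.standard_normal((n, n))) for _ in range(20)]
    print(f"n = {n:3d}   median growth = {np.median(growth):6.2f}   sqrt(n) = {np.sqrt(n):5.1f}")
```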

7 GEPP in floating point arithmetic

In this section we transfer the statement of Proposition 6.13 into the proper context of the floating point arithmetic. We expect a part of the argument in this section (specifically, in the proof of Lemma 7.2) to be rather standard for experts in numerical analysis. Still, we prefer to provide all the details to make the paper self-contained.

Lemma 7.1

Let A be an \(n\times n\) Gaussian matrix and \(A={\mathcal {M}}^{(0)},{\mathcal {M}}^{(1)},\dots , {\mathcal {M}}^{(n-1)}\) be the sequence of matrices generated by the GEPP in exact arithmetic (see (3)). Then, for every \(1\le k\le n-1\),

$$\begin{aligned} \forall \delta \ge 0, \quad {{\mathbb {P}}}\big \{ (1-\delta )|{\mathcal {M}}^{(k-1)}_{i_{k},k}| < \max _{i \notin I_{k}} |{\mathcal {M}}^{(k-1)}_{i,k}| \big \} \le \delta (n-k+1). \end{aligned}$$

Proof

Fix any \(1\le k\le n-1\). With the vector \(v_{k}(A)\) defined at the beginning of Sect. 4 and in view of (4), for every \(i \notin I_{k-1}(A)\) we have

$$\begin{aligned} |{\mathcal {M}}^{(k-1)}_{i,k}| = |\langle v_{k}(A), (A_{i,[n]})^\top \rangle |. \end{aligned}$$

Fix any subset \(I\subset [n]\) of cardinality \(k-1\) and any \((k-1) \times k\) matrix M, and condition on the realizations \(I_{k-1}(A)=I\) and \(A_{I_{k-1},[k]}=M\). In what follows, we denote the conditional probability measure by \(\tilde{{\mathbb {P}}}\). Under this conditioning, \(v_{k}(A)\) and the polytope \(K:=K_{k-1}(A)\) (see Sect. 4; here we adopt the convention \(K_0(A):={\mathbb {R}}^n\)) are fixed. For \(i \notin I\), let

$$\begin{aligned} X_i:= |\langle v_{k}(A), (A_{i,[n]})^{\top } \rangle |. \end{aligned}$$

By Lemma 4.1, under the conditioning the vectors \(A_{i,[n]}\) for \(i\notin I\) are i.i.d., with the probability density function

$$\begin{aligned} \rho (y)= \textbf{1}_{K}(y)\, \frac{\exp ( - \Vert y\Vert _2^2/2) }{ \int _{K} \exp ( - \Vert y'\Vert _2^2/2)\, \textrm{d}y'},\quad y\in {\mathbb {R}}^n, \end{aligned}$$

which is symmetric and log-concave (i.e \( y \mapsto \log (\rho (y))\) is a concave function). Since log-concavity is preserved under taking marginals, the random variable \( \langle \frac{v_{k}(A)}{\Vert v_{k}(A)\Vert _2},(A_{i,[n]})^{\top }\rangle \) is also log-concave and symmetric under the conditioning. This implies, in particular, that the probability density function \(\rho _X\) of \(X_i\)’s (\(i \notin I\)) is non-increasing on the positive semi-axis.

Now, since \(X_{i_{k}} = \max _{i \notin I}X_i\), we have

$$\begin{aligned} \tilde{{\mathbb {P}}}\big \{ (1-\delta ) X_{i_{k}} \ge \max _{i \notin I_{k}}X_i\big \} =(n-k+1)\,\int \limits _0^\infty \Big (\int _0^{(1-\delta )r} \rho _X(t)\,dt \Big )^{n-k}\,\rho _X(r)\,dr, \end{aligned}$$

whereas

$$\begin{aligned} (n-k+1)\,\int \limits _0^\infty \Big (\int _0^{r} \rho _X(t)\,dt \Big )^{n-k}\,\rho _X(r)\,dr=1. \end{aligned}$$

Combining the two identities and using the monotonicity of \(\rho _X\), we get

$$\begin{aligned} \tilde{{\mathbb {P}}}\big \{ (1-\delta ) X_{i_{k}} \ge \max _{i \notin I_{k}}X_i\big \}= & {} \Bigg ( \frac{\int \limits _0^\infty \Big (\int _0^{(1-\delta )r} \rho _X(t)\,dt \Big )^{n-k}\,\rho _X(r)\,dr}{\int \limits _0^\infty \Big (\int _0^{r} \rho _X(t)\,dt \Big )^{n-k}\,\rho _X(r)\,dr} \Bigg ) \\\ge & {} (1-\delta )^{n-k} \ge 1- \delta (n-k+1). \end{aligned}$$

The result follows by averaging over the realizations of \(I_{k-1}(A)\) and \(A_{I_{k-1},[k]}\) (Fubini's theorem). \(\square \)
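Lemma 7.1 can also be illustrated by a direct simulation. The following sketch (outside the formal argument; the dimension, the number of trials and the value of \(\delta \) are hypothetical) runs GEPP on Gaussian matrices and estimates, per elimination step, the probability that the runner-up in the pivot column comes within a factor \(1-\delta \) of the pivot.

```python
import numpy as np

# Monte Carlo illustration of Lemma 7.1: along a run of GEPP on a Gaussian matrix,
# the pivot is rarely within a factor (1 - delta) of the runner-up in its column;
# the lemma gives the per-step upper bound delta * (n - k + 1) <= delta * n.
rng = np.random.default_rng(6)

def pivot_gaps(A):
    """Return 1 - |runner-up| / |pivot| for every elimination step of GEPP."""
    M = A.copy()
    n = M.shape[0]
    gaps = []
    for k in range(n - 1):
        col = np.abs(M[k:, k])
        order = np.argsort(col)[::-1]
        gaps.append(1.0 - col[order[1]] / col[order[0]])
        p = k + order[0]
        M[[k, p]] = M[[p, k]]
        M[k + 1:, k + 1:] -= np.outer(M[k + 1:, k] / M[k, k], M[k, k + 1:])
        M[k + 1:, k] = 0.0
    return np.array(gaps)

n, trials, delta = 50, 200, 1e-3
small = sum((pivot_gaps(rng.standard_normal((n, n))) < delta).sum() for _ in range(trials))
print("empirical per-step P{gap_k < delta}:", small / (trials * (n - 1)))
print("Lemma 7.1 upper bound, delta * n   :", delta * n)
```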

Lemma 7.2

Let M be an \(n\times n\) invertible matrix and \({\hat{M}}:= \textrm{fl}(M)\). Let \(PM=LU\) be the PLU-factorization of M in exact arithmetic, assume that \(P=\textrm{Id}_n\), and let \(M=M^{(0)},M^{(1)},\dots ,\) \(M^{(n-1)}=U\) be the sequence of matrices obtained during the elimination process. Let \(\delta \in (\textbf{u},1/3)\) be a parameter and assume that the matrix M and the unit roundoff \(\textbf{u}\) satisfy

$$\begin{aligned} 8n^2\,\textbf{u}\,\max \limits _{i,j,\ell }|M^{(\ell )}_{i,j}| \le \frac{ 1}{12}\,\frac{s_{\min }(M_{[k],[k]})^3}{\Vert M\Vert ^2} \frac{\delta }{3},\quad k=1,\dots ,n-1, \end{aligned}$$

and

$$\begin{aligned} \forall k \in [n-1], \quad \frac{\max _{i\in [k+1,n]}|M^{(k-1)}_{i,k}|}{|M^{(k-1)}_{k,k}|} \le 1-\delta . \end{aligned}$$
(57)

Then GEPP in floating point arithmetic succeeds for \({\hat{M}}\); the computed permutation matrix \({\hat{P}}=\textrm{Id}_n\), and, denoting by \({\hat{M}} = {\hat{M}}^{(0)}, {\hat{M}}^{(1)},\dots , {\hat{M}}^{(n-1)}\) the sequence of matrices obtained during the elimination process, for every \(k=0, 1,\dots ,n-1\),

$$\begin{aligned} \max \limits _{i,j}|{\hat{M}}^{(k)}_{i,j}|\le 2\,\max \limits _{i,j,\ell }|M^{(\ell )}_{i,j}|. \end{aligned}$$

Proof

We will prove the statement by induction. Fix any \(k\in [n-1]\). Assume that all of the following hold:

  (a) The computed matrix \({\hat{M}}^{(k-1)}\) has been produced by taking the indices of the first \(k-1\) pivot rows to be \(1,2,\dots ,k-1\), and \( |{\hat{M}}^{(k-1)}_{k,k}| > \max _{i \in [k+1,n] }|{\hat{M}}^{(k-1)}_{i,k}|\), so that the index of the k-th computed pivot row is k.

  (b) \({\hat{M}}^{(k-1)} = G_{k-1}\cdots G_1(M+{\tilde{E}}^{(k-1)})\), where \(G_i\) is the Gauss transformation used to eliminate the entries below the diagonal in the i-th column of \({\hat{M}}^{(i-1)}\), \(1\le i\le k-1\), and where the error matrix \({\tilde{E}}^{(k-1)}\) satisfies

    $$\begin{aligned} \Vert {\tilde{E}}^{(k-1)} \Vert \le 8kn\,\textbf{u}\, \max _{i,j,\ell }|M_{i,j}^{(\ell )}|. \end{aligned}$$

  (c) \(\max \limits _{i,j}|{\hat{M}}^{(v)}_{i,j}|\le 2\,\max \limits _{i,j,\ell }|M^{(\ell )}_{i,j}|\) for all \(0\le v\le k-1\).

Note that, by the assumptions on the matrix M, the induction hypothesis for the base case \(k-1=0\) is satisfied.

Let \(G_k=\textrm{Id}_n-\tilde{\tau }^{(k)}\,e_k^\top \) be the Gauss transformation which eliminates entries \({\hat{M}}^{(k-1)}_{i,k}\), \(i=k+1,\dots ,n\), so that in exact arithmetic we have

$$\begin{aligned} \tilde{\tau }^{(k)}_{i}= \frac{ {\hat{M}}^{(k-1)}_{i,k}}{ {\hat{M}}^{(k-1)}_{k,k}},\;\; (G_k{\hat{M}}^{(k-1)})_{i,k}=0,\quad i=k+1,\dots ,n. \end{aligned}$$

The computed matrix \({\hat{M}}^{(k)}\) can be explicitly written as

$$\begin{aligned} {\hat{M}}^{(k)}_{i,j}&= {\left\{ \begin{array}{ll} 0, &{} \hbox {if } \quad i\in [k+1,n] \hbox { and } j=k, \\ \textrm{fl} \Big ( {\hat{M}}^{(k-1)}_{i,j} - \textrm{fl}\Big (\textrm{fl}\Big (\frac{ {\hat{M}}^{(k-1)}_{i,k}}{ {\hat{M}}^{(k-1)}_{k,k}}\Big )\, {\hat{M}}^{(k-1)}_{k,j}\Big ) \Big ), &{}\hbox {if }\quad i,j \in [k+1,n], \\ {\hat{M}}^{(k-1)}_{i,j}, &{} \hbox {otherwise} \end{array}\right. } \end{aligned}$$

(note that we “force” \({\hat{M}}^{(k)}_{i,k}\) to be 0 for \(i\in [k+1,n]\) whereas the f.p. expression \(\textrm{fl}\big (\textrm{fl}\big (\frac{ {\hat{M}}^{(k-1)}_{i,k}}{ {\hat{M}}^{(k-1)}_{k,k}}\big )\, {\hat{M}}^{(k-1)}_{k,k}\big )\) is not necessarily equal to \({\hat{M}}^{(k-1)}_{i,k}\)). Denote

$$\begin{aligned} E^{(k)}:= {\hat{M}}^{(k)}- G_k{\hat{M}}^{(k-1)}. \end{aligned}$$

Since the first k rows of \(E^{(k)}\) are 0, for every \(i \in [k]\) we have \(G_i E^{(k)}=E^{(k)}\), so \({\hat{M}}^{(k)}\) can be expressed in the form

$$\begin{aligned} {\hat{M}}^{(k)} = G_k \big ({\hat{M}}^{(k-1)} + G_{k-1}\cdots G_1 E^{(k)}\big ). \end{aligned}$$

Applying the above equality together with the induction hypothesis, we obtain

$$\begin{aligned} {\hat{M}}^{(k)} = G_k G_{k-1}\cdots G_1(M+{\tilde{E}}^{(k)}), \end{aligned}$$
(58)

where \({\tilde{E}}^{(k)}:={\tilde{E}}^{(k-1)}+E^{(k)}\). Note that non-zero entries of \(E^{(k)}\) are all contained within the bottom right \((n-k)\times (n-k)\) submatrix of \(E^{(k)}\), and for every \(i,j \in [k+1,n]\) we have

$$\begin{aligned} |E^{(k)}_{i,j}|&= \bigg | \textrm{fl} \Big ( {\hat{M}}^{(k-1)}_{i,j} - \textrm{fl}\Big (\textrm{fl}\Big (\frac{ {\hat{M}}^{(k-1)}_{i,k}}{ {\hat{M}}^{(k-1)}_{k,k}}\Big )\, {\hat{M}}^{(k-1)}_{k,j}\Big ) \Big ) -\bigg ({\hat{M}}^{(k-1)}_{i,j} - \frac{ {\hat{M}}^{(k-1)}_{i,k}}{ {\hat{M}}^{(k-1)}_{k,k}} {\hat{M}}^{(k-1)}_{k,j} \bigg )\bigg |\\&\le 3(\textbf{u} + O(\textbf{u}^2)) \max \bigg \{ |{\hat{M}}^{(k-1)}_{i,j}| ,\, \bigg |\frac{ {\hat{M}}^{(k-1)}_{i,k}}{ {\hat{M}}^{(k-1)}_{k,k}} {\hat{M}}^{(k-1)}_{k,j}\bigg | \bigg \} \\&\le 4\textbf{u} \max \big \{ |{\hat{M}}^{(k-1)}_{i,j}|, | {\hat{M}}^{(k-1)}_{k,j}| \big \} \\&\le 4 \textbf{u} \max _{i',j'} |{\hat{M}}_{i',j'}^{(k-1)}| \le 8 \textbf{u} \max _{i',j',\ell } |M_{i',j'}^{(\ell )}|, \end{aligned}$$

since there are 3 floating point operations and \( \frac{ |{\hat{M}}^{(k-1)}_{i,k}|}{ |{\hat{M}}^{(k-1)}_{k,k}|}\le 1\), and where in the last inequality we used the induction assumption (c). Thus,

$$\begin{aligned} \Vert {\tilde{E}}^{(k)}\Vert \le \Vert {\tilde{E}}^{(k-1)}\Vert + \Vert E^{(k)}\Vert\le & {} 8kn \textbf{u} \max _{i,j,\ell } |M_{i,j}^{(\ell )}| + n \max _{i,j}|E^{(k)}_{i,j}| \\\le & {} 8(k+1)n \textbf{u} \max _{i,j,\ell } |M_{i,j}^{(\ell )}|, \end{aligned}$$

confirming condition (b) on the kth step. Moreover, in view of the assumptions on M we then have

$$\begin{aligned} \Vert {\tilde{E}}^{(k)}\Vert \le 8n^2 \textbf{u} \max _{i,j,\ell } |M_{i,j}^{(\ell )}| \le \frac{ 1}{12}\,\frac{\min _{1\le v\le n-1}\,s_{\min }(M_{[v],[v]})^3}{\Vert M\Vert ^2} \frac{\delta }{3}. \end{aligned}$$
(59)

Further, by the assumptions on \(s_{\min }(M_{[k],[k]})\) and in view of the bound on the norm of \({\tilde{E}}^{(k)}\), the matrix \((M+\tilde{E}^{(k)})_{[k],[k]}\) is invertible. Hence, for every \(i\in [k+1,n]\) there is a unique linear combination \(L_i\) of the first k rows of \(M+{\tilde{E}}^{(k)}\) such that the vector \(\textrm{row}_i(M+\tilde{E}^{(k)})-L_i\) has its first k components equal to zero. We conclude that necessarily the matrices \(G_1,G_2,\dots ,G_k\) are the Gauss transformations for \(M+{\tilde{E}}^{(k)}\), whence for every \(j \in [k+1,n]\) we have

$$\begin{aligned} {\hat{M}}^{(k)}_{j,[k+1,n]}&= (M+\tilde{E}^{(k)})_{j,[k+1,n]} - (M+{\tilde{E}}^{(k)})_{j,[k]} \big ((M+\tilde{E}^{(k)})_{[k],[k]} \big )^{-1} \nonumber \\&\quad (M+{\tilde{E}}^{(k)})_{[k], [k+1,n]}, \end{aligned}$$
(60)

whereas

$$\begin{aligned} M^{(k)}_{j,[k+1,n]} = M_{j,[k+1,n]} - M_{j,[k]} \big (M_{[k],[k]} \big )^{-1} M_{[k], [k+1,n]}. \end{aligned}$$
(61)

We will rely on formulas (60) and (61) to show that \({\hat{M}}^{(k)}\) and \(M^{(k)}\) are sufficiently close entry-wise.

In view of (60) and (61), for every \(j\in [k+1,n]\) we have

$$\begin{aligned} \Vert {\hat{M}}^{(k)}_{j,[k+1,n]}-M^{(k)}_{j,[k+1,n]}\Vert&\le \Vert \tilde{E}^{(k)}_{j,[k+1,n]}\Vert + 2\Vert {\tilde{E}}^{(k)}\Vert \,\Vert M\Vert \, \big \Vert \big ((M+{\tilde{E}}^{(k)})_{[k],[k]} \big )^{-1}\big \Vert \\&\quad +\big \Vert {\tilde{E}}^{(k)}\Vert ^2\,\Vert \big ((M+{\tilde{E}}^{(k)})_{[k],[k]} \big )^{-1}\big \Vert \\&\quad +\Vert M\Vert ^2\, \big \Vert \big ((M+{\tilde{E}}^{(k)})_{[k],[k]} \big )^{-1}-\big (M_{[k],[k]} \big )^{-1}\big \Vert . \end{aligned}$$

Note that

$$\begin{aligned} \big ((M+{\tilde{E}}^{(k)})_{[k],[k]} \big )^{-1}-\big (M_{[k],[k]} \big )^{-1} =-\big (M_{[k],[k]} \big )^{-1}\,\tilde{E}^{(k)}_{[k],[k]}\,\big ((M+{\tilde{E}}^{(k)})_{[k],[k]} \big )^{-1}, \end{aligned}$$

and that the bound \(2\Vert {\tilde{E}}^{(k)}\Vert \le s_{\min }(M_{[k],[k]})\) implies

$$\begin{aligned} \Vert \big ((M+{\tilde{E}}^{(k)})_{[k],[k]} \big )^{-1}\big \Vert \le 2\big \Vert (M_{[k],[k]})^{-1}\big \Vert . \end{aligned}$$

Thus, applying (59), for every \(i=k+1,\dots ,n\) we get

$$\begin{aligned}&\Vert {\hat{M}}^{(k)}_{i,[k+1,n]}-M^{(k)}_{i,[k+1,n]}\Vert \nonumber \\&\quad \le \Vert {\tilde{E}}^{(k)}_{i,[k+1,n]}\Vert + 6\Vert \tilde{E}^{(k)}\Vert \,\Vert M\Vert \,\big \Vert (M_{[k],[k]})^{-1}\big \Vert +2\Vert M\Vert ^2\, \big \Vert \big (M_{[k],[k]} \big )^{-1}\big \Vert ^2\, \Vert {\tilde{E}}^{(k)}\Vert \nonumber \\&\quad \le \frac{ 1}{12}\,\frac{\min _{1\le v\le n-1} s_{\min }(M_{[v],[v]})^3}{\Vert M\Vert ^2} \frac{\delta }{3} \bigg ( 1+\frac{6\Vert M\Vert }{s_{\min }(M_{[k],[k]})} +\frac{2\Vert M\Vert ^2}{s_{\min }(M_{[k],[k]})^2} \bigg )\nonumber \\&\quad \le (\delta /3)\, \min _{1\le v\le n-1} s_{\min }(M_{[v],[v]}). \end{aligned}$$
(62)

Immediately we have

$$\begin{aligned} \max _{i,j \in [k+1,n]} | {\hat{M}}^{(k)}_{i,j } | \le \max _{i,j \in [k+1,n]} | M^{(k)}_{i,j}| + s_{\min }(M_{[k],[k]}) \le 2\max _{i,j,\ell }|M^{(\ell )}_{i,j}|. \end{aligned}$$

By the nature of the Gaussian Elimination process, the first k rows of \({\hat{M}}^{(k)}\) coincide with those of \({\hat{M}}^{(k-1)}\), and the bottom left \((n-k)\times k\) submatrix of \({\hat{M}}^{(k)}\) is zero. Thus,

$$\begin{aligned} \max _{i,j} |{\hat{M}}^{(k)}_{i,j}| \le 2 \max _{i,j,\ell }|M^{(\ell )}_{i,j}|, \end{aligned}$$

confirming the condition (c) for the kth step.

It remains to check the condition (a). Note that we only need to consider the case \(k\le n-2\). Using the definition of vectors \(v_k(\cdot )\) from the beginning of Sect. 4, we can write

$$\begin{aligned} |M^{(k)}_{k+1,k+1}|&= \big | M_{k+1,k+1} - M_{k+1,[k]} M^{-1}_{[k],[k]} M_{[k],k+1} \big | \\&= | \langle v_k(M), M_{k+1,[k+1]}^\top \rangle | \\&= \Vert v_k(M)\Vert _2 \cdot \textrm{dist}(H, M_{k+1,[k+1]}^\top ) \ge 1 \cdot s_{\min }(M_{[k+1],[k+1]}), \end{aligned}$$

where \(H \subset {\mathbb {R}}^{k+1}\) is the subspace spanned by the first k rows of \(M_{[k+1],[k+1]}\). Applying (62) and (57), we conclude that

$$\begin{aligned} |{\hat{M}}^{(k)}_{k+1,k+1}|&\ge (1-\delta /3 )|M^{(k)}_{k+1,k+1}| > \max _{i\in [k+2,n]}|M^{(k)}_{i,k+1}| + (\delta / 3) |M^{(k)}_{k+1,k+1}| \\&\ge \max _{i\in [k+2,n]}|M^{(k)}_{i,k+1}| + (\delta /3) s_{\min }(M_{[k+1],[k+1]}) \ge \max _{i \in [k+2,n]} |{\hat{M}}^{(k)}_{i,k+1}|, \end{aligned}$$

and the result follows. \(\square \)
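The entry-wise bound on \(E^{(k)}\) obtained in the proof can be observed directly in a toy computation. In the sketch below (an illustration only), float32 plays the role of the working precision and float64 that of exact arithmetic; the data are taken to be exactly representable in the working precision, as in the proof.

```python
import numpy as np

# Minimal illustration of the entry-wise error of a single elimination update:
# three rounded floating point operations produce an error of order u * max|entry|.
rng = np.random.default_rng(7)
u = np.finfo(np.float32).eps / 2.0                 # unit roundoff of float32
worst_ratio = 0.0
for _ in range(10000):
    a_ij, a_ik, a_kk, a_kj = rng.standard_normal(4).astype(np.float32)
    if abs(a_ik) > abs(a_kk):                      # partial pivoting guarantees
        a_ik, a_kk = a_kk, a_ik                    # |a_ik / a_kk| <= 1
    computed = a_ij - np.float32(np.float32(a_ik / a_kk) * a_kj)   # three float32 operations
    exact = float(a_ij) - (float(a_ik) / float(a_kk)) * float(a_kj)
    bound = 4.0 * float(u) * max(abs(float(a_ij)), abs(float(a_kj)))
    worst_ratio = max(worst_ratio, abs(float(computed) - exact) / bound)
print("largest observed |error| / (4u * max{|a_ij|, |a_kj|}):", worst_ratio)  # typically well below 1
```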

Proof of Theorem A

In view of Lemma 6.10, there are \(C',n_0>0\) such that, assuming \(n\ge n_0\),

$$\begin{aligned} {{\mathbb {P}}}\big \{ s_{\min }(A_{I_r,[r]})\le t\,n^{-C'}\hbox { for some } 1\le r\le n-1 \big \}\le t,\quad t>0. \end{aligned}$$

On the other hand, standard concentration estimates for the spectral norm of Gaussian matrices (see, for example, [20, Chapter 4]) imply that, assuming n is sufficiently large,

$$\begin{aligned} {{\mathbb {P}}}\big \{\Vert A\Vert \ge C''\sqrt{n}\big \}\le 2^{-n}. \end{aligned}$$

Let \(\delta \in (\textbf{u},1/3)\) be a parameter to be chosen later, and define the events

$$\begin{aligned} {\mathcal {E}}_1(\delta )&:=\bigg \{ 8n^2\,\textbf{u}^{1/2} \le \frac{ 1}{12}\,\frac{s_{\min }(A_{I_k,[k]})^3}{\Vert A\Vert ^3} \frac{\delta }{3}, \quad k=1,\dots ,n-1\bigg \},\\ {\mathcal {E}}_2(\delta )&:=\bigg \{ \frac{\max \limits _{i,j,\ell }|A^{(\ell )}_{i,j}|}{\Vert A\Vert }\le \textbf{u}^{-1/2} \bigg \},\\ {\mathcal {E}}_3(\delta )&:=\bigg \{ \frac{\max _{i\notin I_k}|{\mathcal {M}}^{(k-1)}_{i,k}|}{|{\mathcal {M}}^{(k-1)}_{i_k,k}|} \le 1-\delta ,\quad k=1,\dots ,n-1 \bigg \}. \end{aligned}$$

The above observations on the spectral norm and the smallest singular values imply

$$\begin{aligned} {{\mathbb {P}}}({\mathcal {E}}_1(\delta )) \ge 1-2^{-n}-n^{C'}\cdot \big (7C'' n^{7/6}{} \textbf{u}^{1/6}\delta ^{-1/3}\big ). \end{aligned}$$

Further, Proposition 6.13 (applied, say, with \(p=2\)) yields for all sufficiently large n,

$$\begin{aligned} {{\mathbb {P}}}({\mathcal {E}}_2(\delta ))\ge 1-n^{C'''}{} \textbf{u} \end{aligned}$$

for some universal constant \(C'''>0\). Finally, in view of Lemma 7.1, we have

$$\begin{aligned} {{\mathbb {P}}}({\mathcal {E}}_3(\delta ))\ge 1-\delta \,n^2. \end{aligned}$$

Thus, the intersection of the events \({\mathcal {E}}_1(\delta )\cap {\mathcal {E}}_2(\delta )\cap {\mathcal {E}}_3(\delta )\) has probability at least

$$\begin{aligned} 1-2^{-n}-n^{C'}\cdot \big (7C'' n^{7/6}\textbf{u}^{1/6}\delta ^{-1/3}\big )-n^{C'''}{} \textbf{u} -\delta \,n^2. \end{aligned}$$

Taking \(\delta :=\textbf{u}^{1/8}\), we get that for any large enough n,

$$\begin{aligned} {{\mathbb {P}}}\big ({\mathcal {E}}_1(\textbf{u}^{1/8})\cap {\mathcal {E}}_2(\textbf{u}^{1/8}) \cap {\mathcal {E}}_3(\textbf{u}^{1/8})\big )\ge 1-n^{{\tilde{C}}}{} \textbf{u}^{1/8}, \end{aligned}$$

where \({\tilde{C}}>0\) is a universal constant. It remains to note that, in view of Lemma 7.2, everywhere on the intersection \({\mathcal {E}}_1(\textbf{u}^{1/8})\cap {\mathcal {E}}_2(\textbf{u}^{1/8}) \cap {\mathcal {E}}_3(\textbf{u}^{1/8})\) the GEPP in floating point arithmetic succeeds for \({\hat{A}}^{(0)}=\textrm{fl}(A)\); the computed permutation matrix \({\hat{P}}\) coincides with the matrix P from the PLU–factorization of A in exact arithmetic, and

$$\begin{aligned} \textbf{g}_{\textrm{GEPP}}(A)\le \frac{4\,\max \limits _{i,j,\ell }|A^{(\ell )}_{i,j}|}{\max \limits _{i,j}|A_{i,j}|}. \end{aligned}$$

A second application of Proposition 6.13, now to bound \(\textbf{g}_{\textrm{GEPP}}(A)\) conditioned on the intersection \({\mathcal {E}}_1(\textbf{u}^{1/8})\cap {\mathcal {E}}_2(\textbf{u}^{1/8}) \cap {\mathcal {E}}_3(\textbf{u}^{1/8})\), completes the proof. \(\square \)

8 Further questions

In this section, we mention some open questions related to the probabilistic analysis of the Gaussian Elimination with Partial Pivoting.

Sharp estimate of the growth factor Our main result shows that with probability close to one, the growth factor of GEPP is at most polynomial in the matrix dimension, \(\textbf{g}_{\textrm{GEPP}}(A) \le n^C\). Our analysis leaves the constant \(C>0\) unspecified, and it would be of interest to obtain an estimate with a reasonable (single digit) explicit constant. Furthermore, as we mentioned in the introduction, it was suggested in [5] based on numerical simulations that for large n, \(\textbf{g}_{\textrm{GEPP}}(A) =O(n^{1/2+o_n(1)})\) with probability close to one. The problem of finding the optimal power of n in the growth factor estimate seems to require essentially new ideas. At the same time, it is natural to expect that recurrent estimates of the singular spectrum of submatrices obtained in the GEPP process should remain a key element of future refinements of our result.

The probability that the Gaussian Elimination with Partial Pivoting succeeds in the floating point arithmetic Our main result states that, under the assumption that the dimension n is sufficiently large, with probability at least \(1-\textbf{u}^{1/8}\,n^{{\tilde{C}}}\) the GEPP in f.p. arithmetic succeeds for \(\textrm{fl}(A)\), and the computed permutation matrix agrees with that obtained in exact arithmetic. We expect the probability estimate to be much stronger, perhaps of the form \(1-\textbf{u}^{1-o_n(1)}\,n^{{\tilde{C}}}\), and leave this as an open problem.

Smoothed analysis of the growth factor Our proof does not extend to the smoothed analysis setting without incurring significant losses in the upper estimate for the growth factor. In fact, our treatment of the partially random block matrices B in Sect. 3 heavily relies on the assumption that the norm of a submatrix within the “random part” of B is typically of the same order as the square root of the larger dimension of that submatrix. Establishing a polynomial upper bound on the growth factor in the presence of a non-random shift (of polynomial operator norm) is an interesting and challenging problem.