Average-case analysis of the Gaussian Elimination with Partial Pivoting

The Gaussian Elimination with Partial Pivoting (GEPP) is a classical algorithm for solving systems of linear equations. Although in specific cases the loss of precision in GEPP due to roundoff errors can be very significant, empirical evidence strongly suggests that for a {\it typical} square coefficient matrix, GEPP is numerically stable. We obtain a (partial) theoretical justification of this phenomenon by showing that, for a random $n\times n$ standard Gaussian coefficient matrix $A$, the {\it growth factor} of the Gaussian Elimination with Partial Pivoting is at most polynomially large in $n$ with probability close to one. This implies that with probability close to one the number of bits of precision sufficient to solve $Ax = b$ to $m$ bits of accuracy using GEPP is $m+O(\log n)$, which improves an earlier estimate $m+O(\log^2 n)$ of Sankar, and which we conjecture to be optimal up to the order of magnitude. We further provide tail estimates of the growth factor which can be used to support the empirical observation that GEPP is more stable than the Gaussian Elimination with no pivoting.


Introduction
Gaussian Elimination is a classical algorithm for solving systems of linear equations [6, Chapter 3], [7, Chapter 9]. The simplest form of the algorithm, the Gaussian Elimination with no pivoting, solves a system of linear equations (SLE) $Ax = b$ with a square coefficient matrix $A$ by performing the $LU$-factorization: $A$ is represented as the product $LU$, where $L$ and $U$ are lower and upper triangular matrices, respectively, and $x$ is obtained by a combination of forward and back substitutions $y := L^{-1}b$, $x := U^{-1}y$. A possible algorithmic representation of this well known process is given below in Algorithm 1. The procedure produces a sequence of matrices $A^{(0)} := A, A^{(1)}, \ldots, A^{(n-1)} =: U$, where for every $k \le n-1$, the $k \times k$ top left submatrix of $A^{(k-1)}$ is upper triangular. The elimination process with no pivoting fails if at any step $k = 1, \ldots, n-1$, the $k$-th diagonal element of $A^{(k-1)}$ is zero.
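For concreteness, here is a minimal NumPy sketch of the process just described; the function names and the explicit substitution loops are ours (not part of the paper), and no floating point bookkeeping is attempted. It breaks down exactly when a zero pivot $A^{(k-1)}_{k,k}$ is encountered.

```python
import numpy as np

def lu_no_pivoting(A):
    """A sketch of Algorithm 1: LU-factorization without pivoting.
    Fails (division by zero) if some diagonal pivot vanishes."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        tau = U[k + 1:, k] / U[k, k]              # multipliers tau^{(k)}
        L[k + 1:, k] = tau                        # they fill the k-th column of L
        U[k + 1:, k:] -= np.outer(tau, U[k, k:])  # Gauss transformation step
    return L, U

def solve_no_pivoting(A, b):
    """Forward substitution y := L^{-1} b, then back substitution x := U^{-1} y."""
    L, U = lu_no_pivoting(A)
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                            # L has unit diagonal
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((4, 4)), rng.standard_normal(4)
print(np.allclose(solve_no_pivoting(A, b), np.linalg.solve(A, b)))  # True
```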
The computation of $A^{(k)}$ from $A^{(k-1)}$ can be represented in matrix form as $A^{(k)} = (\mathrm{Id}_n - \tau^{(k)} e_k^\top)\, A^{(k-1)}$, where $\tau^{(k)}$ is the vector of multipliers with components $\tau^{(k)}_i = A^{(k-1)}_{i,k}/A^{(k-1)}_{k,k}$ for $i > k$ and zero otherwise, and where $e_1, \ldots, e_n$ are the standard unit basis vectors in $\mathbb{R}^n$. We also note that with this notation $L = \mathrm{Id}_n + \sum_{k=1}^{n-1} \tau^{(k)} e_k^\top$. The matrices $\mathrm{Id}_n - \tau^{(k)} e_k^\top$, $k = 1, \ldots, n-1$, are called the Gauss transformations. When considering an implementation using floating point arithmetic, a well known issue of Gaussian Elimination is its numerical instability. Recall that the condition number $\kappa(A)$ of a square matrix $A$ is defined as the ratio of the largest and smallest singular values of $A$. Even for some well-conditioned matrices (i.e., having a small condition number), solving an SLE with the help of the Gaussian Elimination with no pivoting may result in large relative errors of the computed solution vectors [6, Section 3.3].
Several modifications of the elimination procedure are commonly used in matrix computations to address the instability issue [6, Chapter 3], [7, Chapter 9]. In particular, the Gaussian Elimination with Partial Pivoting (GEPP) looks for a representation $PA = LU$ (called a $PLU$-factorization), where, as before, $L$ and $U$ are lower and upper triangular matrices, while $P$ is a specially constructed permutation matrix. The solution of a corresponding SLE can then be obtained by a combination of forward and back substitutions, and a permutation of the vector's components (see Algorithm 2; for better readability, we represent the formula for $L$ in matrix rather than entry-wise form there). GEPP succeeds in exact arithmetic whenever $A$ is non-singular (although it may fail in floating point arithmetic).
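The following sketch (again NumPy, with our hypothetical naming) implements the $PLU$-factorization with the partial pivoting rule; note that pivoting keeps all multipliers, and hence all entries of $L$, bounded by one in absolute value.

```python
import numpy as np

def gepp(A):
    """A sketch of Algorithm 2: PLU-factorization with partial pivoting.
    At step k the pivot row maximizes |A^{(k-1)}_{i,k}| over i >= k."""
    n = A.shape[0]
    U, L, perm = A.astype(float).copy(), np.eye(n), np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))       # partial pivoting rule
        U[[k, p], :] = U[[p, k], :]               # swap rows of the working matrix
        L[[k, p], :k] = L[[p, k], :k]             # ...and of the built part of L
        perm[[k, p]] = perm[[p, k]]
        tau = U[k + 1:, k] / U[k, k]              # |tau_i| <= 1 by choice of pivot
        L[k + 1:, k] = tau
        U[k + 1:, k:] -= np.outer(tau, U[k, k:])
    return np.eye(n)[perm], L, U                  # P A = L U

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
P, L, U = gepp(A)
print(np.allclose(P @ A, L @ U), np.max(np.abs(L)) <= 1 + 1e-12)  # True True
```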
A seminal result of Wilkinson [21] gives an upper bound on the backward error of Gaussian Elimination when the floating point computations are performed. Define the unit roundoff $u$ as the maximal relative error of rounding a real number to its floating point representation, and let $\hat A^{(1)}, \ldots, \hat A^{(n-1)}$ denote the computed (in the floating point arithmetic) matrices $A^{(1)}, \ldots, A^{(n-1)}$. The growth factor of the elimination process is defined as
$$g_{\mathrm{GEPP}}(A) := \frac{\max_{i,j,k} |\hat A^{(k)}_{i,j}|}{\max_{i,j} |A_{i,j}|}.$$
Then, under the above assumptions, the backward error estimate can be written as
$$(A + E)\hat x = b, \qquad \|E\| \le \mathrm{poly}(n)\, g_{\mathrm{GEPP}}(A)\, u\, \|A\|, \qquad\qquad (2)$$
implying, under the additional assumption $s_{\min}(A) \ge 2\|E\|$, the forward error bound for the computed solution
$$\frac{\|\hat x - x\|_2}{\|x\|_2} = O\Big(\kappa(A)\, \frac{\|E\|}{\|A\|}\Big),$$
where $\kappa(A) = \|A\|\,\|A^{-1}\|$ is the condition number of $A$. Similar error bounds are available for other versions of Gaussian Elimination (with no pivoting, with complete or with rook pivoting). We refer, in particular, to Wilkinson's paper [21] and to modern accounts of the backward error analysis of the different forms of Gaussian Elimination in [6, Chapter 3], [7, Chapter 9], as well as [8].
It can be checked that $g_{\mathrm{GEPP}}(A) = O(2^n)$ for any $n \times n$ invertible matrix $A$, and that this bound is attained. Thus, (2) provides a satisfactory worst-case estimate only under the assumption $u \ll 2^{-n}$, i.e., when the unit roundoff is exponentially small in the matrix dimension. At the same time, the accumulated empirical evidence suggests that for a "typical" coefficient matrix the loss of precision is much smaller than the worst-case prediction. Let us quote [6, p. 131]: "Although there is still more to understand about [the growth factor], the consensus is that serious element growth in Gaussian Elimination with Partial Pivoting is extremely rare. The method can be used with confidence." In [18], Trefethen and Schreiber carried out an empirical study of the Gaussian Elimination with Partial and with Complete Pivoting in the setting when the input coefficient matrix $A$ is random, having i.i.d. standard Gaussian entries. Their experiments showed that with high probability the growth factor of GEPP is only polynomially large in $n$. Further numerical studies by Edelman suggest that $g_{\mathrm{GEPP}}(A)$ of an $n \times n$ standard Gaussian matrix $A$ is of order $O(n^{1/2+o(1)})$ with probability close to one (see a remark in [5, p. 182]).
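Both claims are easy to check numerically. The sketch below computes a double precision proxy of the growth factor under partial pivoting (tracking the largest entry over all intermediate matrices) and evaluates it on the classical worst-case example: a matrix with unit diagonal, $-1$ below the diagonal, and ones in the last column, for which the growth equals $2^{n-1}$. A Gaussian matrix of the same size shows only mild growth.

```python
import numpy as np

def growth_factor_gepp(A):
    """max_{i,j,k} |A^{(k)}_{i,j}| / max_{i,j} |A_{i,j}| along the GEPP steps
    (computed in double precision; a proxy for the exact quantity)."""
    U = A.astype(float).copy()
    n, top = U.shape[0], np.max(np.abs(A))
    g = top
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))
        U[[k, p], :] = U[[p, k], :]
        U[k + 1:, k:] -= np.outer(U[k + 1:, k] / U[k, k], U[k, k:])
        U[k + 1:, k] = 0.0
        g = max(g, np.max(np.abs(U)))
    return g / top

n = 20
W = np.tril(-np.ones((n, n)), -1) + np.eye(n)   # Wilkinson's worst-case example
W[:, -1] = 1.0
print(growth_factor_gepp(W))                     # 2^(n-1) = 524288.0
A = np.random.default_rng(2).standard_normal((n, n))
print(growth_factor_gepp(A))                     # typically a small number
```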
An important step in improving the theoretical understanding of the numerical stability of Gaussian Elimination was made by Yeung and Chan in [22]. Their result implies (although it is not explicitly stated in the paper) that for the Gaussian Elimination with no pivoting applied to the standard $n \times n$ Gaussian matrix, the relative error of the solution vector can be bounded above by $u\, n^{O(1)}$ with probability close to one. A vast generalization of their estimate was obtained by Sankar, Spielman, and Teng in [15] in the context of the smoothed analysis of algorithms. Let $M$ be any non-random $n \times n$ matrix, and let $G$ be an $n \times n$ matrix with i.i.d. $N(0, \sigma^2)$ Gaussian entries. The main result of [15] asserts that the expected number of bits of precision sufficient to solve $(M+G)x = b$ to $m$ bits of accuracy using Gaussian Elimination without pivoting is at most $m + O\big(\log\big(n + \frac{\|M\|}{\sigma}\big)\big)$. This provides a theoretical justification for the observed performance of the GE with no pivoting for structured dense coefficient matrices.
The no-pivoting strategy is crucial for the proofs in [22] and [15]. With partial pivoting, the permutations of the rows after each elimination step introduce complex dependencies into the model which require other arguments to handle. In the PhD thesis [14], Sankar carried out a smoothed analysis of GEPP based on a certain recursive matrix formula (to be discussed in some detail in the next section). Let $A = M + G$, where $G$ is the Gaussian random matrix with i.i.d. $N(0,\sigma^2)$ entries, and $M$ is a deterministic matrix of spectral norm at most one. One of the main results of [14] states that, with the above notation, the element growth of the elimination process is at most $n^{O(\log n)}$ with probability close to one. We note that the quantity considered in [14] is not the growth factor as defined above but its "exact arithmetic" counterpart. The relation between the matrices $A^{(k-1)}$ and the corresponding computed matrices $\hat A^{(k-1)}$ is not trivial and will be discussed later; at this point we note that, assuming that the magnitudes of that ratio and the growth factor $g_{\mathrm{GEPP}}(A)$ match, and in view of (2), the result of Sankar implies that with high probability GEPP results in at most $O(\log^2 n)$ lost bits of precision in the obtained solution vector. This bound is worse than the $O(\log n)$ estimate for GE with no pivoting implied by [22].
To summarize, whereas strong results on the average-case stability of GE with no pivoting have been obtained in the literature, the Gaussian Elimination with Partial Pivoting lacked matching theoretical guarantees, let alone a justification of the common belief that GEPP tends to be more stable than GE with no pivoting. In this work, we make progress on this problem. To avoid any ambiguity, we recall all the imposed assumptions and notation:

Theorem A. There are universal constants $C, \tilde C > 1$ and a function $\tilde n : [1,\infty) \to \mathbb{N}$ with the following property. Let $p \ge 1$, and let $n \ge \tilde n(p)$.
• Assume that the floating point computations with no underflow and overflow exceptions and with a unit roundoff $u$ are being performed.
• Let $A$ be the random $n \times n$ matrix with i.i.d. standard Gaussian entries, and assume that the floating point GEPP is performed on the matrix $\mathrm{fl}(A)$.
Then with probability at least $1 - u^{1/8} n^{\tilde C}$, GEPP for $\mathrm{fl}(A)$ succeeds in floating point arithmetic, and the computed permutation matrix $\hat P$ agrees with the matrix $P$ from the $PLU$-factorization of $A$ in exact arithmetic. Furthermore, assuming $u^{1/8} n^{\tilde C} \le 1/2$,
$$\mathbb{P}\big\{ g_{\mathrm{GEPP}}(A) \ge n^t,\;\text{GEPP succeeds in f.p. arithmetic, and } \hat P = P \big\} \le 40^p\, n^{-pt}, \qquad t \ge Cp^2.$$
We do not attempt to compute the constant $C$ in the above theorem explicitly, and leave the problem of finding an optimal (up to an $n^{o_n(1)}$ multiple) estimate of the growth factor $g_{\mathrm{GEPP}}(A)$ for future research (see Section 8). Further, we expect a much stronger bound on the probability that GEPP succeeds in the floating point arithmetic and that $\hat P = P$.
In view of the aforementioned work of Wilkinson and well known estimates for the condition number of the Gaussian matrix [4, 17], the theorem implies that with probability close to one the number of bits of precision sufficient to solve $Ax = b$ to $m$ bits of accuracy using GEPP is $m + O(\log n)$. We conjecture that this bound is optimal in the sense that, in the same setting, $m + \Omega(\log n)$ bits of precision are necessary with probability close to one.
Let us further apply Theorem A to compare the numerical stability of GEPP with that of GE with no pivoting. As we mentioned at the beginning of the introduction, the Gaussian Elimination with no pivoting can produce arbitrarily large relative errors in floating point arithmetic even for well-conditioned coefficient matrices. As an illustration, consider a $2 \times 2$ standard Gaussian matrix in floating point arithmetic,
$$M = \begin{pmatrix} \mathrm{fl}(g_{11}) & \mathrm{fl}(g_{12}) \\ \mathrm{fl}(g_{21}) & \mathrm{fl}(g_{22}) \end{pmatrix},$$
where $g_{11}, g_{12}, g_{21}, g_{22}$ are i.i.d. standard Gaussian variables. The Gaussian Elimination with no pivoting yields the computed $LU$-factorization of $M$,
$$\hat L = \begin{pmatrix} 1 & 0 \\ \mathrm{fl}\big(\mathrm{fl}(g_{21})/\mathrm{fl}(g_{11})\big) & 1 \end{pmatrix}; \qquad \hat U = \begin{pmatrix} \mathrm{fl}(g_{11}) & \mathrm{fl}(g_{12}) \\ 0 & \mathrm{fl}\big(\mathrm{fl}(g_{22}) - \mathrm{fl}(g_{12}) \cdot \mathrm{fl}(g_{21})/\mathrm{fl}(g_{11})\big) \end{pmatrix}.$$
With the above conditions, the bottom right element of the product $\hat L \hat U$ differs from $\mathrm{fl}(g_{22})$ by a quantity of order $\Omega(u)\,\big|\mathrm{fl}(g_{22}) - \mathrm{fl}(g_{12}) \cdot \mathrm{fl}(g_{21})/\mathrm{fl}(g_{11})\big|$; that is, the normwise backward error satisfies
$$\mathbb{P}\big\{ \|\hat L \hat U - M\| \ge t\, u\, \|M\| \big\} \ge c/t, \qquad t \ge 1,$$
for some universal constant $c > 0$ (one may safely take $c = 1/100$, say). In sharp contrast with the above observation, in the case of GEPP the probability of large deviations of the backward error is much smaller, as Theorem A shows. Indeed, with the notation from the theorem and in view of Wilkinson's bound, for arbitrary $p \ge 1$ and assuming $n$ is sufficiently large, we have
$$\mathbb{P}\big\{ \|\hat L \hat U - \hat P\, \mathrm{fl}(A)\| \ge n^{t+C'}\, u\, \|A\| \big\} \le 40^p\, n^{-pt}, \qquad t \ge Cp^2,$$
for a universal constant $C' > 0$. Thus, the distribution of the backward error of GEPP decays superpolynomially. Informally, the "proportion" of well-conditioned coefficient matrices yielding large backward errors is much smaller for GEPP than for the Gaussian Elimination with no pivoting.
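The contrast is easy to reproduce numerically. In the sketch below we plant a tiny pivot $g_{11}$ (a hypothetical stand-in for the rare event driving the heavy tail); the backward error of the no-pivoting factorization then exceeds the unit roundoff $u \approx 1.1 \cdot 10^{-16}$ by many orders of magnitude, while a single row swap restores it.

```python
import numpy as np

def lu2(M):
    """Computed 2x2 LU factorization without pivoting (double precision)."""
    l21 = M[1, 0] / M[0, 0]
    return (np.array([[1.0, 0.0], [l21, 1.0]]),
            np.array([[M[0, 0], M[0, 1]], [0.0, M[1, 1] - M[0, 1] * l21]]))

def backward_error(M):
    L, U = lu2(M)
    return np.linalg.norm(L @ U - M) / np.linalg.norm(M)

rng = np.random.default_rng(3)
M = rng.standard_normal((2, 2))
M[0, 0] = 1e-13                    # a near-degenerate pivot
print(backward_error(M))           # typically of order u / |g11|, here ~ 1e-3
print(backward_error(M[::-1, :]))  # after the pivoting swap: of order u ~ 1e-16
```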
We provide a detailed outline of the argument, as well as a comparison of our techniques with the earlier approach of Sankar, in the next section.
The following notation will be used throughout the paper:

$M_{i,j}$: the $(i,j)$-th entry of the matrix $M$;
$s_j(M)$: the $j$-th largest singular value of $M$;
$\mathbb{R}^I$: the $|I|$-dimensional Euclidean space with components indexed over $I$;
$\mathrm{dist}(\cdot,\cdot)$: the Euclidean distance.

Outline of the proof

Let $A$ be an $n \times n$ standard Gaussian matrix, let $A^{(0)} := A, A^{(1)}, \ldots, A^{(n-1)}$ be the sequence of matrices generated by the GEPP process, and let $P^{(1)}, \ldots, P^{(n-1)}$ be the corresponding permutation matrices (see Algorithm 2). It turns out that in our probabilistic model, estimating the growth factor $g_{\mathrm{GEPP}}(A)$ can be reduced to bounding its exact arithmetic counterpart
$$\max_{1 \le k \le n-1} \frac{\max_{i,j} |A^{(k)}_{i,j}|}{\max_{i,j} |A_{i,j}|}.$$
We will provide a rigorous justification in Section 7 and avoid discussing this technical matter here. We only note that the comparison of the matrices $A^{(k-1)}$ and $\hat A^{(k-1)}$, $1 \le k \le n-1$, follows a well established inductive argument somewhat similar to the one used to prove Wilkinson's backward error bound. From now on and until Section 7 we work in exact arithmetic unless explicitly stated otherwise.
Define "unpermuted" matrices M (k) obtained at the k-th elimination step, i.e M (0) := A and Let I 0 := ∅, and for each 1 ≤ k ≤ n − 1 let I k = I k (A) be the (random) subset of [n] of row indices of A corresponding to the pivot elements used in the first k steps of the "permutation-free" elimination process, i.e (see, in particular, [15,Formula 4.1] for GE with no pivoting, which can be adapted to our setting). Thus, for 0 ≤ k < n − 1, the index i k+1 is defined as the one corresponding to the largest number among Due to strong concentration of Gaussian variables, the operator norms of matrices A I,J , I, J ⊂ [n], can be uniformly bounded from above by a polynomial in n. Thus, the principal difficulty in obtaining satisfactory upper bounds on the growth factor g GEPP (A) is in estimating the norm of vectors row j ( is greater than any predefined constant power of n. We expect that a much stronger lower bound can be established. The first part of this section is devoted to the argument of Sankar from [14] which yields a bound g GEPP (A) = O(n C log n ) with high probability using certain recursive matrix formula. In the second part, we discuss our approach.

Sankar's argument
Consider a block matrix
$$B = \begin{pmatrix} B_{u\ell} & B_{ur} \\ X & B_{\ell r} \end{pmatrix},$$
where $B_{u\ell}$ and $B_{\ell r}$ are square non-singular matrices and $X$ is a row vector. Then the vector $X(B_{u\ell})^{-1}$ can be expressed through the right pseudoinverse $B_u^{\dagger}$ of the upper block $B_u := (B_{u\ell}\;\, B_{ur})$ (see [14, Chapter 3]). The above formula is applied in [14] in a recursive manner. Assume for simplicity of exposition that we are interested in bounding the Euclidean norm of the vector $A_{j,[n/2]}(A_{I_{n/2},[n/2]})^{-1}$ for some $j \in [n]\setminus I_{n/2}$ (recall that, in view of (4) and standard concentration estimates for the spectral norm of Gaussian matrices, this would immediately imply an estimate on the components of $M^{(n/2)}_{j,[n/2+1,n]}$). It can be checked that, with the above notation, the relevant blocks of $B$ can be identified with submatrices of the unpermuted matrices $M^{(\cdot)}$. Now, assume that we have constructed a sequence of indices $0 = k_0 < k_1 < k_2 < \cdots < k_s < n/2$, with $k_s = n/2 - 1$ and $k_1 \ge n/4$. Applying the last relation recursively $s$ times, we obtain a representation (6) of the vector of interest, where, by the definition of the partial pivoting, $|M^{(n/2-1)}_{j,n/2}|\,|M^{(n/2-1)}_{i_{n/2},n/2}|^{-1} \le 1$. Therefore, the problem reduces to estimating the spectral norms of the matrices in (7). Sankar shows that as long as $n/2 - k_\ell$ ($\ell = s, s-1, \ldots, 0$) grow as a geometric sequence (in which case $s$ should be of order logarithmic in $n$), the norm of each matrix can be bounded by a constant power of $n$ with large probability. We only sketch this part of the argument. Fix any $0 \le \ell < s$, and define $Z := (A_{I_{k_\ell},[k_\ell+1,n/2]})^{\dagger} A_{I_{k_\ell},[k_\ell]}$, so that $ZZ^{\dagger} = \mathrm{Id}$ and, in view of (4), the matrix in question can be expressed through $Z$. Since $Z^{\dagger}Z$ is a projection and has unit norm, an upper bound on the norm of the pseudoinverse suffices. The key observation here is that $A_{I_{k_\ell},[k_\ell+1,n/2]}$ is equidistributed with the standard $k_\ell \times (n/2-k_\ell)$ Gaussian matrix, so that a satisfactory estimate on the norm of the pseudoinverse follows.
Bounding the operator norm of the matrices built from $M^{(k_\ell)}$, or, equivalently, providing a good lower bound on the smallest singular value of the matrix in (9), is the remaining step. In view of (4), we have a representation valid for each admissible $J$ in which $Z^{\dagger}Z$ is a $k_\ell \times k_\ell$ orthogonal projection matrix of rank $n/2 - k_\ell$. Although $(A_{J,[k_\ell]})^{\top}$ is dependent on $Z$, it can be shown that $Z^{\dagger}Z(A_{J,[k_\ell]})^{\top}$ behaves "almost" like $Z^{\dagger}Z$ applied to an independent tall rectangular $k_\ell \times (k_{\ell+1}-k_\ell)$ Gaussian matrix (see [14, Section 3.7]). This allows one to obtain probabilistic estimates on the smallest singular value of the matrix in (9) which, under the assumption that the sequence $n/2 - k_\ell$ ($\ell = s, s-1, \ldots, 0$) does not grow too fast, turn out to be strong enough to survive the union bound in (8).
To summarize, the above argument gives a polynomial in $n$ estimate for the matrices in (7), where $s$ is logarithmic in $n$. Thus, (6) implies a bound $\|M^{(n/2)}_{j,[n/2+1,n]}\|_2 = n^{O(\log n)}$, $j \in [n]\setminus I_{n/2}$, with high probability. An extension of this argument to all $M^{(k)}$, $1 \le k \le n-1$, yields $g_{\mathrm{GEPP}}(A) = n^{O(\log n)}$. As Sankar notes in [14], a different choice of $s$ and of the sequence $k_0, k_1, \ldots, k_s$, and a refined analysis of the operator norms of the matrices in (7), may improve the upper estimate on the growth factor, but cannot achieve a polynomial bound.

High-level structure of the proof of the main theorem
Returning to relation (4), a polynomial bound on the growth factor $g_{\mathrm{GEPP}}(A)$ will follow as long as the norm $\|(A_{I_r,[r]})^{-1}\|$ is bounded by $n^{O(1)}$ for every $1 \le r \le n-1$ with high probability. We obtain this estimate via an analysis of the entire singular spectrum of $A_{I_r,[r]}$ rather than attempting to directly bound the smallest singular value of the matrix.
The strategy of the proof can be itemized as follows:

• Obtaining estimates on the singular values of partially random block matrices. More specifically, we consider matrices of the form
$$B = \begin{pmatrix} F & M \\ W & Q \end{pmatrix}, \qquad\qquad (10)$$
where $F$ is a fixed square matrix with prescribed singular spectrum, and $M, W, Q$ are independent Gaussian random matrices of compatible dimensions. Our goal here is to derive lower bounds on the intermediate singular values of $B$ in terms of the singular values of $F$.
• Applying the estimates on the intermediate singular values of partially random block matrices in a recursive manner together with a union bound argument, we derive lower bounds on the "smallish" singular values of the matrices $A_{I_r,[r]}$. Our argument at this step only allows us to bound the first through $(r-C)$-th singular values of the matrix, for some large constant $C$.
• Use the bound on $s_{r-C}(A_{I_r,[r]})$ together with the information on the Euclidean distances from $\mathrm{row}_{i_\ell}(A_{I_r,[r]})$ to $\mathrm{span}\{\mathrm{row}_{i_j}(A_{I_r,[r]}),\; 1 \le j < \ell\}$, $\ell = 1, \ldots, r$, that can be extracted from the partial pivoting rule, to obtain polynomial in $n$ lower bounds on $s_{\min}(A_{I_r,[r]})$.
Below, we discuss each component in more detail.
Singular spectrum of partially random block matrices. The partially random block matrices are treated in Section 3 of the paper. Consider a block matrix $B$ of type (10), where $F$ is a fixed $r \times r$ matrix, $M$ is $r \times x$, $W$ is $x \times r$, $Q$ is $x \times x$ (with $x \le r$), and the entries of $M$, $W$, $Q$ are mutually independent standard Gaussian variables. In view of the rotational invariance of the Gaussian distribution, we can "replace" $F$ with a diagonal matrix $D$ with the same singular spectrum, with the singular values on the main diagonal arranged in a non-decreasing order. We fix a small positive parameter $\tilde\varepsilon > 0$ and an integer $i \ge 1$ such that $\tilde\varepsilon(1+\tilde\varepsilon)^{-i} r \ge 2$. Our goal at this point is to estimate from below the singular value $s_{\lfloor(1-(1+\tilde\varepsilon)^{-i-1})(r+x)\rfloor}(B)$. Having chosen a certain small threshold $\tau > 0$ (which is defined as a function of $i$, $r$, $\tilde\varepsilon$, the singular spectrum of $D$, and some other parameters which we are not discussing here), our estimation strategy splits into two cases depending on whether the number $\ell'_{i+1}$ of the singular values of $D$ less than $\tau$ is "small" or "large". In the former case, the matrix $D$ has a well controlled singular spectrum, and our goal is to show that attaching to it $x$ rows and columns of standard Gaussians cannot deteriorate the singular value estimates. In the latter case, we show that by adding the Gaussian rows and columns we actually improve the control of the singular values, using that the top left $\ell'_{i+1} \times \ell'_{i+1}$ corner of $B$ is essentially a zero matrix. The main result of Section 3, Proposition 3.5, provides a probability estimate on the event that $s_{\lfloor(1-(1+\tilde\varepsilon)^{-i-1})(r+x)\rfloor}(B)$ is small relative to the singular spectrum of $F$, assuming certain additional relations between the parameters $x$, $r$, $\tilde\varepsilon$.
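The phenomenon behind this case analysis can be observed directly. In the hypothetical setup below (our choice of dimensions and spectrum, for illustration only), $F$ is diagonal with a handful of nearly zero singular values; bordering it with Gaussian blocks lifts the bottom of the spectrum while leaving the bulk intact.

```python
import numpy as np

rng = np.random.default_rng(4)
r, x = 400, 40
spec = np.concatenate([np.full(10, 1e-9), np.linspace(0.1, 2.0, r - 10)])
F = np.diag(spec)                          # rotations do not affect singular values

B = np.block([[F,                           rng.standard_normal((r, x))],
              [rng.standard_normal((x, r)), rng.standard_normal((x, x))]])

sF = np.sort(np.linalg.svd(F, compute_uv=False))   # ascending order
sB = np.sort(np.linalg.svd(B, compute_uv=False))
for k in (1, 5, 11, 50):                   # k-th smallest singular values
    print(k, sF[k - 1], sB[k - 1])         # the Gaussian border lifts the tiny ones
```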
A recursive argument to bound $s_{r-C}(A_{I_r,[r]})$. The treatment of the partially non-random block matrices allows us to address the principal problem in estimating the singular spectrum of $A_{I_r,[r]}$, namely, the complicated dependencies between $A$ and the index set $I_r$. As we mentioned before, simply bounding the $k$-th smallest singular value of $A_{I_r,[r]}$ by the minimum of the corresponding singular value over all index subsets of cardinality $r$ produces an unsatisfactory estimate for small $k$. On the other hand, in view of the strong concentration of the intermediate singular values, already for $k \gg \sqrt{n}\,\mathrm{polylog}(n)$ (see Proposition 3.2) this straightforward union bound argument does work. In order to boost the union bound argument to smaller $k$, we avoid taking the union bound over all $I \subset [n]$, $|I| = r$, and instead condition on a realization of $I_{r'}$ for a certain $r' < r$, so that the union bound over all $I_{r'} \subset I \subset [n]$ of cardinality $r$ runs over only $\binom{n-r'}{r-r'}$ admissible subsets rather than $\binom{n}{r}$ subsets. The two main issues with this approach are

• first, we must have estimates for the singular spectrum of $A_{I_{r'},[r']}$ in order to apply the results of Section 3 to obtain bounds for the singular values of $A_{I_r,[r]}$, and,

• second, conditioning on a realization of $I_{r'}$ inevitably destroys the Gaussianity and mutual independence of the entries of $A_{[n]\setminus I_{r'},[r']}$.

The first issue is resolved through the inductive argument, where the estimates on the spectrum of $A_{I_{r'},[r']}$ obtained at the last induction step are used to control the singular spectrum of $A_{I_r,[r]}$ at the next step. Of course, in this argument we must make sure that the total error accumulated throughout the induction process stays bounded by a constant power of $n$.
The second issue, with the probabilistic dependencies, is resolved by observing that the partial pivoting "cuts out" a not too large set of admissible values for the elements of $A_{[n]\setminus I_{r'},[r']}$, i.e., we can continue treating them as independent Gaussians up to a manageable loss in the resulting probability estimate after conditioning on a certain event of not-too-small probability. This problem is formally treated by studying the random polytopes $K_{r'}(A) \subset \mathbb{R}^n$ defined in Section 4. By the nature of the partial pivoting process, any row of the submatrix $A_{[n]\setminus I_{r'},[n]}$ necessarily lies within the polytope $K_{r'}(A)$, and its distribution is a restriction of the standard Gaussian measure in $\mathbb{R}^n$ to $K_{r'}(A)$ (see Section 4 for a rigorous description). After showing that the Gaussian measure of $K_{r'}(A)$ is typically "not very small", we can work with the rows of $A_{[n]\setminus I_{r'},[n]}$ as if they were standard Gaussian vectors, up to conditioning on an event of a not very small probability. We remark here that Sankar's work [14] uses random polytopes related to our construction.
Estimating the smallest singular value of $A_{I_r,[r]}$. To simplify the discussion, we will only describe the idea of showing that with a "sufficiently high" probability, the smallest singular value of a rectangular extension of $A_{I_r,[r]}$ is bounded below by a negative constant power of $n$, for some integer constant $\tilde C > 0$ (see Corollary 5.3). This corollary is a quantitative version of a rather general observation that, by adding at least $\ell+1$ independent Gaussian rows to a fixed square matrix with at most $\ell$ zero singular values, we get a rectangular matrix with a strictly positive $s_{\min}$ almost surely. Once a satisfactory bound on $s_{\min}$ of the extended matrix is obtained, we rely on the simple deterministic relation between the smallest singular value and the distances to row spaces: for every $m \times k$ matrix $Q$ with $m \le k$,
$$s_{\min}(Q) \ge \frac{1}{\sqrt{m}}\, \min_{i \le m} \mathrm{dist}\big(\mathrm{row}_i(Q),\, H_i(Q)\big),$$
where $H_i(Q)$ denotes the subspace spanned by the row vectors $\mathrm{row}_j(Q)$, $j \ne i$. In our context, a strong probabilistic lower bound on $\mathrm{dist}(\mathrm{span}\{A_{i_t,[r]},\, 1 \le t < s\}, A_{i_s,[r]})$ ($s \le r$) guaranteed by the partial pivoting strategy implies that with high probability the distance from every row of $A_{I_r,[r]}$ to the span of the remaining rows is at least inverse polynomial in $n$ (see the proof of Proposition 6.9), and, via the above deterministic relation to the singular values, the same holds for $s_{\min}$. This, combined with some auxiliary arguments, implies the lower bound on $s_{\min}(A_{I_r,[r]})$.
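The deterministic relation is elementary to verify numerically; the sketch below checks it for a random wide matrix (the dimensions are hypothetical, any $m \times k$ with $m \le k$ works).

```python
import numpy as np

def dist_to_span(v, rows):
    """Euclidean distance from v to the linear span of the given rows."""
    if len(rows) == 0:
        return float(np.linalg.norm(v))
    Qb, _ = np.linalg.qr(np.array(rows).T)         # orthonormal basis of the span
    return float(np.linalg.norm(v - Qb @ (Qb.T @ v)))

rng = np.random.default_rng(5)
m, k = 8, 12
Q = rng.standard_normal((m, k))
d = [dist_to_span(Q[i], [Q[j] for j in range(m) if j != i]) for i in range(m)]
smin = np.linalg.svd(Q, compute_uv=False)[-1]      # s_min(Q) = s_m(Q) since m <= k
print(smin >= min(d) / np.sqrt(m), smin, min(d) / np.sqrt(m))  # True, and the two values
```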

Intermediate singular values of partially random block matrices
We start with preparatory material to deal with norms and intermediate singular values of random matrices. We first state a standard deviation estimate for the Hilbert–Schmidt norm of a Gaussian random matrix (see, for example, [1]):

Proposition 3.1. Let $G$ be an $m \times n$ standard Gaussian matrix. Then
$$\mathbb{P}\big\{ \big|\, \|G\|_{HS} - \sqrt{mn}\, \big| \ge t \big\} \le 2\exp(-c\,t^2), \qquad t > 0,$$
where $c > 0$ is a universal constant.
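A quick simulation illustrates the dimension-free fluctuation scale: for a $300 \times 300$ Gaussian matrix the Hilbert–Schmidt norm stays within $O(1)$ of $\sqrt{mn} = 300$ (the dimensions below are our hypothetical choice).

```python
import numpy as np

rng = np.random.default_rng(6)
m = n = 300
samples = [np.linalg.norm(rng.standard_normal((m, n))) for _ in range(200)]
# The Frobenius norm is a 1-Lipschitz function of the mn entries, so its
# standard deviation is O(1) regardless of how large m and n are.
print(np.sqrt(m * n), np.mean(samples), np.std(samples))
```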
The next proposition was proved in the special case of square random Gaussian matrices by Szarek in [17]. In a much more general setting, similar results were obtained earlier by Nguyen [12]; his argument was later reused in [9] to get sharp small ball probability estimates for the condition number of a random square matrix.

Proposition 3.2 (Singular values of random matrices with continuous distributions).
We provide a proof of the above proposition in the Appendix.
This section deals with a large number of parameters satisfying multiple constraints; we group those constraints into blocks for better readability. We have four "section-wide" scalar parameters: $r$, $x$, $\tilde\varepsilon$, and $i$. The objective of the section is to study the singular values of a block matrix of the form
$$B = \begin{pmatrix} F & M \\ W & Q \end{pmatrix},$$
where $F$ is a fixed $r \times r$ matrix, $M$ is $r \times x$, $W$ is $x \times r$, $Q$ is $x \times x$, and the entries of $M$, $W$, $Q$ are mutually independent standard Gaussians. For the matrix $F$, we will assume that its singular values satisfy the relations (11), where $g(\cdot)$ is a strictly positive growth function satisfying (12). Observe that the relation $\tilde\varepsilon(1+\tilde\varepsilon)^{-i} r \ge 2$ implies that the sequence $\lfloor(1-(1+\tilde\varepsilon)^{-j})r\rfloor$, $j = 0, 1, \ldots, i$, is strictly increasing, so the condition (12) on $g(\cdot)$ is well posed. In this section, we deal with an arbitrary growth function satisfying the conditions; a specific choice of $g(\cdot)$ will be made later in Section 5. We further define numbers $\ell_1, \ldots, \ell_i$ and $\ell'_{i+1}$ partitioning the singular spectrum of $F$ into blocks according to the constraints (11). Note that if $F = UDV$ is a singular value decomposition of $F$ then, in view of the rotational invariance of the Gaussian distribution, the singular spectrum of $B$ coincides with that of
$$B' := \begin{pmatrix} D & U^{-1}M \\ WV^{-1} & Q \end{pmatrix},$$
where $U^{-1}M$, $WV^{-1}$, and $Q$ have mutually independent standard Gaussian entries. We can assume without loss of generality that the diagonal elements (the singular values) of $D$ are arranged in non-decreasing order when moving from the top left to the bottom right corner. We will work with the singular spectrum of $B'$ as this will allow us to somewhat simplify the computations.
The specific goal is to estimate from below the singular value $s_{\lfloor(1-(1+\tilde\varepsilon)^{-i-1})(r+x)\rfloor}(B)$. To have better control of the probability estimates, we introduce one more scalar parameter $h \in (0,1]$, which will allow us to balance the precision of the estimate and the probability with which the estimate holds (the smaller $h$ is, the less precise the estimate is and the stronger are the probability bounds).
Our argument to control $s_{\lfloor(1-(1+\tilde\varepsilon)^{-i-1})(r+x)\rfloor}(B)$ splits into two parts depending on whether $\ell'_{i+1}$ is "small" or "large". In the former case, the matrix $F$ (or $D$) has a well controlled singular spectrum, and our goal is to show that attaching to it $x$ rows and columns of standard Gaussians cannot deteriorate the singular value estimates. In this setting, we completely ignore the first $\ell'_{i+1}$ rows of $B'$ and work with the matrix $B'_{I\times[r+x]}$, $I = [\ell'_{i+1}+1, r+x]$. In the latter case, we show that by adding the Gaussian rows and columns we actually improve the control of the singular values. The fact that the top right $\ell'_{i+1} \times x$ corner of $B'$ is a standard Gaussian matrix plays a crucial role in this setting. The high-level proof strategy in either case is similar. We construct a (random) subspace $H$ of $\mathbb{R}^{r+x}$ of dimension at least $(1-(1+\tilde\varepsilon)^{-i-1})(r+x)$, designed in such a way that, under appropriate assumptions on the singular spectra of certain submatrices of $U^{-1}M$, $WV^{-1}$, and $Q$, $\|B'v\|_2$ is large for every unit vector $v \in H$. By the minimax formula for singular values,
$$s_{\lfloor(1-(1+\tilde\varepsilon)^{-i-1})(r+x)\rfloor}(B') \ge \min\big\{ \|B'v\|_2 :\; v \in H,\; \|v\|_2 = 1 \big\}.$$
The "appropriate assumptions" on the singular spectra are encapsulated in a good event $E_{\mathrm{good}}$ which, as we show, has a very large probability.
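The minimax (Courant–Fischer) inequality invoked here is easy to sanity-check: every unit vector in the span of the top-$k$ right singular vectors satisfies $\|Bv\|_2 \ge s_k(B)$, so any test subspace of dimension $k$ yields a lower bound on $s_k$. A small numerical check with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 40, 25
B = rng.standard_normal((n, n))
s = np.linalg.svd(B, compute_uv=False)
V = np.linalg.svd(B)[2][:k].T                 # basis of the top-k right singular subspace
c = rng.standard_normal((k, 2000))
v = V @ (c / np.linalg.norm(c, axis=0))       # random unit vectors in that subspace
print(s[k - 1], np.min(np.linalg.norm(B @ v, axis=0)))  # the minimum stays above s_k
```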
Lemma 3.3. There exist universal constants $c \in (0,1]$ and $\tilde C \ge 1$ with the following property. Assume that $\tilde\varepsilon \in (0,1]$, $h \in (0,1]$, $r$, and $x$ satisfy the constraints (13), where $c'$ is the constant from Proposition 3.2. Then, with probability at least $1 - \tilde C\exp(-c\,\tilde\varepsilon x/h^2)$, we have $s_{\lfloor(1-(1+\tilde\varepsilon)^{-i-1})(r+x)\rfloor}(B') \ge \beta$, where $\beta > 0$ is the quantity determined by the parameters and the growth function $g(\cdot)$.

Proof. Construction of the subspace $H$. Denote by $X_1, X_2, \ldots, X_{r+x} \in \mathbb{R}^{r+x}$ an orthonormal basis of right singular vectors of the matrix obtained from $B'$ by removing the first $\ell'_{i+1}$ rows. Note that by the interlacing properties of the singular values (see, for example, [3]), passing to this submatrix shifts the singular spectrum in a controlled way, and that the removed top right block, which we denote by $\hat Y$, has mutually independent standard Gaussian entries. Similarly, for every $1 \le j \le i$, we define the $x \times \ell_j$ matrix formed by the columns of $WV^{-1}$ corresponding to the $j$-th block of the singular spectrum of $D$. Let $\hat e_1, \hat e_2, \ldots, \hat e_{x-\lfloor\tilde\varepsilon x\rfloor}$ be a random orthonormal set of right singular vectors of $\hat Y$ corresponding to the $x - \lfloor\tilde\varepsilon x\rfloor$ largest singular values of $\hat Y$, and let $\tilde E \subset \mathbb{R}^{\ell'_{i+1}+x}$ be the random subspace of dimension $x - \lfloor\tilde\varepsilon x\rfloor$ defined as $\tilde E := \mathrm{span}\{\hat e_1, \hat e_2, \ldots, \hat e_{x-\lfloor\tilde\varepsilon x\rfloor}\}$. Now, we construct the (random) subspace $H \subset \mathbb{R}^{r+x}$ from $\tilde E$ and an appropriate subcollection of the vectors $X_q$, and check that the constructed subspace satisfies the required lower bound on the dimension.

Defining a good event. Denote by $\tilde{\mathcal E}$ the event that the singular values of $\hat Y$ admit the lower bounds of Proposition 3.2 with the constant $c'$. According to our definition of the subspace $\tilde E$, for every unit vector $v$ as above the norm of the image under $\hat Y$ is controlled on $\tilde{\mathcal E}$; hence, by Proposition 3.2 applied to $\hat Y$, we get a lower bound on the probability of $\tilde{\mathcal E}$. Since, by our assumptions, $\sqrt{\tilde\varepsilon}\, x/h \ge 2\,\ell''_{i+1}$, we have, according to Proposition 3.1, the corresponding norm estimates with probability at least $1 - 2\exp(-c\,\tilde\varepsilon x/h^2)$ for a universal constant $c > 0$; similar estimates hold for every $j = 1, 2, \ldots, i$. We define $E_{\mathrm{good}}$ as the intersection of all these events. In view of the above, $\mathbb{P}(E_{\mathrm{good}}) \ge 1 - \tilde C\exp(-c\,\tilde\varepsilon x/h^2)$ for a universal constant $\tilde C > 0$.
Checking that $H$ satisfies the required property conditioned on $E_{\mathrm{good}}$. Assuming the conditioning, pick any unit vector $v \in H$. We represent $v$ in terms of the basis $X_1, \ldots, X_{r+x}$ as $v = \sum_{q=1}^{r+x} a_q X_q$ for some coefficients $a_1, \ldots, a_{r+x}$ with $\sum_{q=1}^{r+x} a_q^2 = 1$, and consider the corresponding decomposition of $\|B'v\|_2^2$. If the resulting expression is greater than $\beta^2$ then we are done. Otherwise, the coefficients associated with the upper part of the singular spectrum must be small; in particular, most of the mass of $v$ is carried by the components corresponding to the lower part of the spectrum of $D$, and the analogous bounds hold for every $j = 1, 2, \ldots, i$. Observe that the last conditions, in view of the conditioning on $\tilde{\mathcal E}$, immediately imply a lower bound on the contribution of the Gaussian blocks to $\|B'v\|_2$. Further, in view of the conditioning on the events controlling the norms of the corresponding submatrices, the same holds for every $j = 1, 2, \ldots, i$. Thus, by the triangle inequality, $\|B'v\|_2$ admits the required lower bound, where the last relation follows from our assumptions on the parameters, (13), and (12). The assumption on $\beta$ then implies the result.
Lemma 3.4. There are universal constants $c \in (0,1]$ and $\tilde C \ge 1$ with the following property. Assume that the parameters satisfy the constraints (13), where $c'$ is the constant from Proposition 3.2, and that $\ell'_{i+1}$ satisfies the complementary ("large") condition. Then the conclusion of Lemma 3.3 holds with probability at least $1 - \tilde C\exp(-c\,\tilde\varepsilon x/h^2)$.

Proof. Construction of the subspace $H$. Consider a refinement of the block representation of $B'$ in which the blocks $M'_{i+1}$, $M''_{i+1}$, $W'_{i+1}$, and $W''_{i+1}$ are defined according to the splitting of the first $\ell'_{i+1}$ coordinates. In this proof, we denote by $P'_{i+1} : \mathbb{R}^{r+x} \to \mathbb{R}^{\ell'_{i+1}}$ the coordinate projection onto the first $\ell'_{i+1}$ coordinates, by $P_x : \mathbb{R}^{r+x} \to \mathbb{R}^x$ the coordinate projection onto the last $x$ coordinates, and, for every $1 \le j \le i$, by $P_j : \mathbb{R}^{r+x} \to \mathbb{R}^{\ell_j}$ the coordinate projection onto the $\ell_j$ components of the $j$-th block. For every $1 \le j \le i$, denote by $e^{(j)}_q$, $1 \le q \le \min(\lfloor 2^{j-i-1}\tilde\varepsilon x\rfloor, \ell_j)$, a random orthonormal system of right singular vectors of $W_j$ corresponding to the $\min(\lfloor 2^{j-i-1}\tilde\varepsilon x\rfloor, \ell_j)$ largest singular values of $W_j$, and let $E^{(j)} \subset \mathbb{R}^{\ell_j}$ be the span of this system. Finally, we construct a random subspace $\hat E \subset \mathbb{R}^x$ as follows. Denote by $e^{(Q)}_q$, $1 \le q \le \lfloor\tilde\varepsilon x\rfloor$, a random orthonormal system of right singular vectors of $Q$ corresponding to the $\lfloor\tilde\varepsilon x\rfloor$ largest singular values of $Q$, and let $E^{(Q)} \subset \mathbb{R}^x$ be the span of this system. For every $1 \le j \le i$, let $e^{(M_j)}_q$, $1 \le q \le \lfloor 2^{j-i-1}\tilde\varepsilon x\rfloor$, be a random orthonormal system of right singular vectors of $M_j$ corresponding to the $\lfloor 2^{j-i-1}\tilde\varepsilon x\rfloor$ largest singular values of $M_j$. We then set $\hat E$ to be the orthogonal complement within $\mathbb{R}^x$ of the span of all the systems $e^{(Q)}_q$ and $e^{(M_j)}_q$. The subspace $H$ is now defined as the span of the subspaces constructed above. Let us check that $H$ satisfies the required assumption on the dimension: combining the dimension count with the assumption on $\ell'_{i+1}$ and the assumptions on the parameters yields the required bound.

Defining a good event. Denote by $\tilde{\mathcal E}$ the event that the singular values of $W'_{i+1}$ admit the lower bounds of Proposition 3.2 with the constant $c'$. According to our definition, for every unit vector $v$ as above the norm of the image under $W'_{i+1}$ is controlled on $\tilde{\mathcal E}$; hence, by Proposition 3.2 applied to $W'_{i+1}$, we get a lower bound on its probability. Note that, conditioned on $E^{(j)}$, the corresponding norms are controlled for every $j = 1, 2, \ldots, i$. Finally, we define the events corresponding to a "good" realization of $\hat E$. Let $E_{M'_{i+1}}$ be the event that the restriction of $M'_{i+1}$ to $\hat E$ has appropriately bounded norm. Repeating the argument for $\tilde{\mathcal E}$, we get a lower bound on the probability of $E_{M'_{i+1}}$. Similarly, adjusting the argument for $E^{(j)}$ accordingly, we get that for every $1 \le j \le i$ the corresponding event has probability at least $1 - 2\exp(-c\, 2^{i+1-j}\ell_j x\tilde\varepsilon/h^2)$, and an analogous estimate holds for the event controlling $Q$. We define $E_{\mathrm{good}}$ as the intersection of all the above events. In view of the above, $\mathbb{P}(E_{\mathrm{good}}) \ge 1 - \tilde C\exp(-c\,\tilde\varepsilon x/h^2)$ for a universal constant $\tilde C \ge 1$.
Checking that $H$ satisfies the required property conditioned on $E_{\mathrm{good}}$. Assuming the conditioning, pick any unit vector $v \in H$. First, we observe that the projection of $v$ onto the first $\ell'_{i+1}$ coordinates vanishes, whereas, by the definition of $\hat E$ and the conditioning, the contribution of the last $x$ coordinates is controlled; in view of the conditioning (see the definition of $E_{M_j}$), the contributions of the intermediate blocks are controlled as well. On the other hand, by our assumptions on the parameters, these estimates are compatible with each other. As a final step of the proof, we show that any unit vector $v \in H$ satisfying the conditions (16) and (17) admits the required lower bound on $\|B'v\|_2$. Indeed, (16) and (17) imply a lower bound on the Euclidean norm of the projection of $v$ onto the relevant block of coordinates, whence, in view of the conditioning on $\tilde{\mathcal E}$, the contribution of $W'_{i+1}$ is bounded from below. Now, for every $1 \le j \le i$, by the above and in view of the conditioning on $E^{(j)}$, the analogous bound holds for $W_j$; similarly, in view of the conditioning on $E_Q$, the contribution of $Q$ is controlled. Thus, $\|B'v\|_2$ admits the required lower bound, and the proof is complete.
For convenience, we combine the above two statements into a single proposition, while simplifying some of the constraints on parameters.
Proposition 3.5. There are universal constants $c \in (0,1]$ and $\tilde C \ge 1$ with the following property. Let $B$ be the block matrix (10), where $F$ is a fixed $r \times r$ matrix, $M$ is $r \times x$, $W$ is $x \times r$, $Q$ is $x \times x$, and the entries of $M$, $W$, $Q$ are mutually independent standard Gaussians. Assume that the parameters $\tilde\varepsilon \in (0,1]$, $h \in (0,1]$, $r$, $x$, and $i \in \mathbb{N}$ satisfy the constraints (13), where $c'$ is the constant from Proposition 3.2. Further, assume (11) for the singular values of $F$, for a positive function $g(\cdot)$ satisfying (12). Then, with probability at least $1 - \tilde C\exp(-c\,\tilde\varepsilon x/h^2)$, the singular value $s_{\lfloor(1-(1+\tilde\varepsilon)^{-i-1})(r+x)\rfloor}(B)$ admits the corresponding lower bound in terms of $g(\cdot)$ and $h$.

Random polytopes, and distances to pivot rows

For $r \in [n-1]$, the random polytope $K_r(A) \subset \mathbb{R}^n$ is defined in (18), where $\sigma_k$ denotes the standard Gaussian measure in the corresponding dimension. Suppose we have performed $r$ steps of the GEPP algorithm on the $n \times n$ Gaussian matrix $A$. Let $I \subset [n]$ have size $r$, and condition on a realization of $I_r = I$ and $A_{I,[r]}$, which determines $K_r(A)$. Then, for every $j \in [n]\setminus I$, the $j$-th row of $A$ is a Gaussian vector conditioned to stay within the polytope $K_r(A)$. Formally, for every $I \subset [n]$ of size $r$, every $j \in [n]\setminus I$, and every Borel subset $B$ of $\mathbb{R}^n$,
$$\mathbb{P}\big\{ (A_{j,[n]})^\top \in B \;\big|\; I_r(A) = I,\; A_{I,[r]} \big\} = \frac{\sigma_n(B \cap K_r(A))}{\sigma_n(K_r(A))}$$
almost everywhere on the event $\{I_r(A) = I\}$.
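Rejection sampling gives a direct (if inefficient) way to realize such a conditional law and to estimate the Gaussian measure of a polytope. The polytope below is a hypothetical symmetric slab intersection of our own choosing, not the $K_r(A)$ of (18):

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 6, 10
V = rng.standard_normal((m, n))                   # K := {y : |<y, v_i>| <= 5 for all i}
draws = rng.standard_normal((200000, n))
mask = np.all(np.abs(draws @ V.T) <= 5.0, axis=1)
kept = draws[mask]                                # samples of N(0, Id_n) conditioned on K
print("estimated sigma_n(K):", mask.mean())       # acceptance rate = Gaussian measure of K
print("conditional sample count:", len(kept))
```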
We will not directly use the above description of the conditional distribution of $(A_{j,[n]})^\top$ given $A_{I,[r]}$; instead, we will apply a simple decoupling based on Lemma 4.1 which essentially establishes the same property. We provided the above formula only to clarify our argument. As the first main result of the section, we have a probability estimate for the event that the Gaussian measure of the polytope $K_r(A)$ is below a given threshold (Proposition 4.2); its proof is concluded as follows: in view of the standard bound $\binom{n}{n-r} \le n^{n-r}$ for the number of subsets $I \subset [n]$ of size $r$, and by the union bound argument, the result follows.

Lemma 4.3. Let $A$ be an $n \times n$ Gaussian matrix, and let $r \in [n]$ and $\tau \in (0,1)$ be parameters. Then, conditioned on the event $\sigma_n(K_r(A)) \ge \tau$, the distance from the $r$-th pivot row to the span of the previously selected pivot rows admits the lower bound stated below.

Proof. Let $v := v_r(A)/\|v_r(A)\|_2$ and let $P : \mathbb{R}^n \to \mathbb{R}^r$ be the orthogonal projection onto the span of $\{e_s\}_{s\in[r]}$. From the definition of $v$, we have that $Pv$ is a unit normal to the hyperplane $H$ in $\mathbb{R}^r$.
It remains to note that, by the definition of $K_r(A)$, on the event $\sigma_n(K_r(A)) \ge \tau$ the required distance estimate holds. As a corollary, we obtain the following probabilistic bound on the distance between $(A_{i_r,[r]})^\top$ and the span of the "previous" rows (selected at the previous steps of the GEPP process) $(A_{i_s,[r]})^\top$, $s \in [r-1]$:

Corollary 4.4. Let $A$ be an $n \times n$ Gaussian matrix. For $t \ge 2$ and $r \in [n-1]$, with probability at least $1 - n^{-t(n-r)/2}$ we have the corresponding lower bound on $\mathrm{dist}\big((A_{i_r,[r]})^\top, H\big)$, where $H$ is the random subspace of $\mathbb{R}^r$ spanned by the vectors $(A_{i_s,[r]})^\top$, $s \in [r-1]$.
Proof. In view of Lemma 4.3, the statement would follow as long as $\mathbb{P}\{\sigma_n(K_r(A)) \ge n^{-t}\} \ge 1 - n^{-t(n-r)/2}$. The latter is verified in Proposition 4.2.

A recursive argument
The goal of this section is to bound from below the intermediate singular values $s_{r-k}(A_{I_r,[r]})$ for every $r$ greater than some absolute constant and for $k$ of a constant order. We will start by bounding the intermediate singular values in the bulk of the singular spectrum first and then will recursively apply Proposition 3.5. For parameters $p$, $k$, and $\beta$, let $E_{\mathrm{IntSing}}(p, k, \beta)$ be the event that the corresponding lower bounds on the intermediate singular values of $A_{I_r,[r]}$ hold, and let $E^{\mathrm{rect}}_{\mathrm{smin}}(p, k, \beta)$ be the event that the corresponding lower bound on the smallest singular value of the rectangular extension holds. Note that although $n$ is not mentioned explicitly in the list of parameters for $E_{\mathrm{IntSing}}(p, k, \beta)$, it clearly depends on the underlying matrix dimension.
We remark that the lower bounds on $k_0(p)$ and $\beta_0(p)$ in the assumptions of the proposition are not required in the proof but will be needed later. As a corollary of the proposition (proved at the end of this section), we obtain Corollary 5.3. Now, we present a technical version of the above proposition. We introduce several "section-level" parameters. Let $\tilde\varepsilon > 0$ be a small constant and $L$ be a large integer to be determined later. The parameter $\tilde\varepsilon$ will play the same role as in Proposition 3.5. Next, let $m_s$, $s \ge 0$, be the dyadic sequence of scales satisfying $m_{s+1} = 2m_s$. The main technical result in this section is the following

Lemma 5.4. Fix $\tilde\varepsilon \in (0, 1/100]$ and $L \ge 1/\tilde\varepsilon$. Then there exists a positive integer $n_0$ (depending on $\tilde\varepsilon$ and $L$) such that for any $n \ge n_0$ and $s \in [0, s_1-1]$ we have, for every $\alpha \ge 4$, the corresponding probability estimate, where $c(\tilde\varepsilon)$ and $C(\tilde\varepsilon)$ are positive constants which depend on $c$, $\tilde C$ from Proposition 3.5 and on $\tilde\varepsilon$.
For the rest of the section, we fix $s \in [0, s_1-1]$ and let $i_{\max}$ be the largest integer such that $(1+\tilde\varepsilon)^{-i_{\max}} m_s \ge L/\tilde\varepsilon$.

Choice of parameters and the growth function
Note that $\tilde\varepsilon m_s/10 \ge L/\tilde\varepsilon$, and hence $i_{\mathrm{th}} \le i_{\max}$.
For every $i \in [i_{\mathrm{th}}, i_{\max}]$, we define a non-decreasing function $f_i$ (24). Further, we define a collection of integers $\{r_i\}_{i\in[i_{\max}+1]}$ inductively as follows. Whenever $i \in [i_{\mathrm{th}}]$, we set $r_i := m_s$. Further, assuming that $r_i$ has been defined for some $i \in [i_{\mathrm{th}}, i_{\max}]$, we let $r_{i+1}$ be the smallest integer such that $f_i(r_{i+1}) \ge r_i$. Note that $m_s = r_1 \le r_2 \le \cdots \le r_{i_{\max}+1}$.
We recall our strategy: to bound the singular value $s_{\lfloor(1-(1+\tilde\varepsilon)^{-i-1})r\rfloor}(A_{I_r,[r]})$ from below, we will select an appropriate integer $r' < r$ and apply Proposition 5.2 with $B := A_{I,[r]}$ and $F := A_{I_{r'},[r']}$, taking the union bound over all subsets $I \subset [n]$ with $I_{r'} \subset I$ and $|I| = r$. The function $f_i$ defined above determines the choice of $r'$; namely, we choose $r' := f_i(r)$ for $r_i \le r \le m_{s+2}$. The indices $i_{\mathrm{th}}$ and $i_{\max}$ defined above determine the range of application of the inductive strategy; namely, $i_{\max}$ marks the largest index $i$ for which our induction argument can be applied, and $i_{\mathrm{th}}$ indicates a threshold value below which the corresponding singular values $s_{\lfloor(1-(1+\tilde\varepsilon)^{-i})r\rfloor}(A_{I_r,[r]})$ concentrate very strongly and can be bounded directly with the help of Proposition 3.2 and a simple union bound argument.
The goal of this subsection is to verify certain relations between the introduced parameters that need to be satisfied in order to apply the results on singular values established earlier. Since the results here are of a purely computational nature, we present the proofs in the Appendix.
Moreover, $i$, $r$, $x$, and $\tilde\varepsilon$ satisfy the assumptions in Proposition 3.5. For a given $i \in [i_{\mathrm{th}}, i_{\max}]$, the number $\bar r$ satisfying the assumptions of the above lemma can only be chosen if $r_{i+1} \le m_{s+2}$. In the next statement, we show that this inequality is satisfied for every admissible $i$ (and in fact we verify a slightly stronger bound):

Lemma 5.8 (An upper bound on $r_{i_{\max}+1}$). Let $\tilde\varepsilon \in (0, 1/28)$ and $L \ge 4$. Then $r_{i_{\max}+1} \le 2m_s = m_{s+1}$.
To construct the growth function $g(\cdot)$ from (12), we first define an auxiliary positive function $g_s(\cdot)$, and then set $g\big(\lfloor(1-(1+\tilde\varepsilon)^{-j})r\rfloor\big) := g_s(j)$ for all admissible $j$. The formal definition of $g_s(\cdot)$ is given below.
Definition 5.9. Let $\alpha \ge 1$ be a parameter. For $i \in [i_{\mathrm{th}}]$, we define $g_s(i)$ directly in terms of $\alpha$ and the constant $c'$ from Proposition 3.2.
For $i \in [i_{\mathrm{th}}, i_{\max}]$, we apply a recursive definition in which $g_s(i+1)$ is expressed through $g_s(i)$ and the quantity $h_s(i)$, where $h_s(i)$ is given by an explicit formula and where $C_h \ge -\log\big(2^{-11}(c')^2\tilde\varepsilon\big)$ is a constant depending only on $c$, $\tilde C$ (from Proposition 3.5) and $c'$, which we shall determine in Lemma 5.10.
The function $h_s(i)$ corresponds to the parameter $h$ in Proposition 3.5, and is constructed in such a way that a certain union bound argument that we are going to apply further works. The next lemma clarifies the choice of the constant $C_h$ from the above definition:

Lemma 5.10. The constant $C_h$ can be chosen so that the following holds. For $i \in [i_{\mathrm{th}}, i_{\max}]$ and $r \in [r_{i+1}, m_{s+2}]$, let $\bar r := f_i(r)$ and $x := r - \bar r$. Then the estimate required for the union bound holds.

In the next lemma we verify the crucial bound on the growth function which will ultimately guarantee a polynomial in $n$ bound on the intermediate singular values:

Lemma 5.11. There exists $C(\tilde\varepsilon) > 1$, which depends on $c'$, $\tilde C$ from Proposition 3.5, on $C_h$, and on $\tilde\varepsilon$, such that $g_s(i_{\max}+1) \ge n^{-C(\tilde\varepsilon)\alpha}$.

Good events, and probability estimates
The events of interest are defined in terms of the function $g_s(\cdot)$ given in Definition 5.9. Further, we denote by $E(r, i)$ the corresponding event for an admissible pair $(r, i)$ (Definition 5.12). The next lemma then estimates the probability of its complement, where the random polytope $K_r(A) \subset \mathbb{R}^n$ was defined in (18), and where $\sigma_n$ is the standard Gaussian measure in $\mathbb{R}^n$.
Proof. We start by noting that the last inequality in the statement of the lemma follows from the estimate $x \ge (1+\tilde\varepsilon)^{-i_{\max}} m_s \ge L/\tilde\varepsilon$ (see Lemma 5.7 and the definition of $i_{\max}$). We further partition the event in question so that the resulting expression involves an outer expectation taken with respect to $A_{I,[r]}$.
For each realization of $A_{I,[r]}$ such that the event $E(I)$ holds, we apply Proposition 3.5 with the growth function $g(\cdot)$ built from $g_s(\cdot)$ as in Definition 5.9. Since $16\, g_s(j+1) \le g_s(j)$ for $j \in [i-1]$ and $g_s(j) \le 1$ for $j \in [i]$, the function $g(\cdot)$ defined this way satisfies (12). Recall that on the event $E(I)$ we have the corresponding bounds on the singular spectrum of $A_{I,[r]}$. We apply Proposition 3.5 with $g(t)$ and with $h := h_s(i)$ (see Definition 5.9), so that
$$\frac{c'\,\tilde\varepsilon\, h^5\, g\big(\lfloor(1-(1+\tilde\varepsilon)^{-i})r\rfloor\big)}{32} = g_s(i+1)$$
(observe that our parameters $r$, $x$ satisfy the assumptions of the proposition due to Lemma 5.7, and that $h$ satisfies the assumption $h \le 2^{-11}(c')^2\tilde\varepsilon$ in view of the assumptions on the constant $C_h$ in Definition 5.9). In view of Lemma 5.10, this implies the required probability estimate, and combining the last inequality with (39), we obtain the desired bound. Next, we will treat the denominator in the estimate (38). By Fubini's theorem, almost everywhere on the event $\{\sigma_r(K(I)) \ge n^{-\alpha/2}\}$ we have the corresponding lower bound. At this point, we are ready to prove the main lemma in this section.
Proof of Lemma 5.4. First, recall that in view of Lemma 5.8, $r_{i_{\max}+1} \le m_{s+1}$, and that in view of (25) we have the inclusion of events stated there, where we used the definition of the events $E(r, i)$ (Definition 5.12). To estimate the probability of the union of the events in the last line, we shall embed it into a specially structured collection. Let $r' := f_{i_{\max}}(m_{s+2})$, where $f_\cdot(\cdot)$ was defined in (24). To be able to apply the recursive bound from the last lemma, we use the bounds $f_i(r) \le r'$ for $i \in [i_{\mathrm{th}}+1, i_{\max}]$ and $m_{s+2} - r' \ge L$. Thus we get, for some $c(\tilde\varepsilon) > 0$, the desired probability estimate. It remains to note that in view of Lemma 5.11, $g_s(i_{\max}+1) \ge n^{-C(\tilde\varepsilon)\alpha}$ for some $C(\tilde\varepsilon)$.
Proof of Corollary 5.3. For brevity, we denote $k_0 := k_0(p)$. We fix $r \in [k_0+1, n-2k_0]$ and let $F_r \subset \mathbb{R}^{I_r}$ be the right singular subspace of the matrix $(A_{I_r,[r]})^\top$ corresponding to the $k_0$ smallest singular values of $(A_{I_r,[r]})^\top$ (since almost everywhere on the probability space $I_r$ is unambiguously determined, and the singular values of $(A_{I,[r]})^\top$ are distinct for every $I \subset [n]$ with $|I| = r$, $F_r$ is uniquely defined). Now, let us define the event $\tilde E_r(\beta)$ that the restrictions of $(A_{I_r,[r+1,r+2k_0]})^\top$ to $F_r$ and $F_r^\perp$, respectively, admit the corresponding norm bounds. Then, conditioned on the intersection $\tilde E_r(\beta) \cap E_{\mathrm{IntSing}}(p, k_0, \beta)$, for any $v \in \mathbb{R}^{I_r}\setminus\{0\}$ we obtain a lower bound on the image of $v$, where $P_{F_r}$ and $P_{F_r^\perp}$ are the orthogonal projections onto $F_r$ and $F_r^\perp$, respectively. Consider two cases.
Here, the last inequality holds because $\beta \ge \beta_0(p) \ge 300p$ and since $n$ is sufficiently large depending on $p$.
From now on, we fix $r \in [k_0+1, n-2k_0]$ and condition on a realization of $A_{[n],[r]}$ such that the set $I_r$ and the space $F_r$ are uniquely determined. We will write $\tilde{\mathbb{P}}$ and $\tilde{\mathbb{E}}$ to denote the corresponding conditional probability and conditional expectation. The restrictions $Q$ and $W$ introduced above are (standard) Gaussian linear operators from $F_r$ to $\mathbb{R}^{2k_0}$ and from $F_r^\perp$ to $\mathbb{R}^{2k_0}$, respectively. For the purpose of estimating the operator norm and the least singular values, we can view $W$ and $Q$ as matrices with i.i.d. $N(0,1)$ entries of dimensions $2k_0 \times (r-k_0)$ and $2k_0 \times k_0$, respectively; more specifically, we can define standard Gaussian matrices $\tilde W$ and $\tilde Q$ of dimensions $2k_0 \times (r-k_0)$ and $2k_0 \times k_0$ such that everywhere on the probability space the singular spectra of $W$ and $\tilde W$, and of $Q$ and $\tilde Q$, agree.
It is well known that the expected operator norm of a standard Gaussian matrix is no more than the sum of the square roots of its dimensions (see, for example, [20, Section 7.3]). Hence, $\tilde{\mathbb{E}}\|\tilde W\| \le \sqrt{2k_0} + \sqrt{r-k_0}$. Since the spectral norm is 1-Lipschitz, the standard Gaussian concentration inequality (see, for example, [20, Section 5.2]) implies the corresponding deviation bound. Next, we derive an estimate for $s_{\min}(Q) = s_{\min}(\tilde Q)$. For $i \in [k_0]$, let $P_i : \mathbb{R}^{2k_0} \to \mathbb{R}^{2k_0}$ be the orthogonal projection onto the subspace orthogonal to the column vectors $\tilde Q_{[2k_0],j}$ for $j \in [k_0]\setminus\{i\}$. Then $s_{\min}(\tilde Q)$ is bounded below through the quantities $\|P_i(\tilde Q_{[2k_0],i})\|_2$. Since $P_j$ and $\tilde Q_{[2k_0],j}$ are independent, the norm $\|P_j(\tilde Q_{[2k_0],j})\|_2$ is equidistributed with the norm of a $(2k_0 - (k_0-1)) = (k_0+1)$-dimensional standard Gaussian vector. Since the probability density function of a $(k_0+1)$-dimensional Gaussian vector is bounded above by $(2\pi)^{-(k_0+1)/2}$, we obtain a small ball estimate in terms of $|B_2^{k_0+1}|$, the Lebesgue measure of the unit Euclidean ball $B_2^{k_0+1}$ in $\mathbb{R}^{k_0+1}$. Therefore, in view of the previous computations, we get that there exists a universal constant with the required property. Now, setting $t := n^{-\beta/(40p)}$, we get the stated bound, where the last inequality holds since $k_0 = k_0(p) \ge 120p$.
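Both ingredients of this step admit a quick numerical sanity check: the operator norm of an $m \times n$ Gaussian matrix concentrates below $\sqrt{m}+\sqrt{n}$, and a tall $2k \times k$ Gaussian matrix has smallest singular value on the scale $\sqrt{2k}-\sqrt{k}$ (the dimensions below are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 400, 100
norms = [np.linalg.norm(rng.standard_normal((m, n)), 2) for _ in range(50)]
print(np.sqrt(m) + np.sqrt(n), np.mean(norms), np.std(norms))  # mean below the bound

k = 100                                            # tall 2k x k Gaussian matrices
smins = [np.linalg.svd(rng.standard_normal((2 * k, k)), compute_uv=False)[-1]
         for _ in range(50)]
print(np.sqrt(2 * k) - np.sqrt(k), np.mean(smins))  # s_min stays on this scale
```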
As a final step of the proof, rewriting (43) accordingly, we conclude that $\mathbb{P}(E_r(\beta)^c) \le n^{-2\beta-1+o_n(1)}$, and the result follows.
The smallest singular value and the growth factor in exact arithmetic

Distance to subspaces
Recall that by $i_t = i_t(A)$, $1 \le t \le n$, we denote the indices of the pivot rows in the GEPP process (see Section 4). For $r \in [n-1]$ and $\beta > 0$, let $E_{\mathrm{row}}(r, \beta)$ denote the event that the $r$-th pivot row satisfies the corresponding distance and norm estimates, and let $E_{\mathrm{row}}(\beta)$ be the intersection of the events $E_{\mathrm{row}}(r, \beta)$, $r \in [n-1]$.
Further, let $E_{\mathrm{dist}}(\beta)$ be the event that the corresponding distance estimates hold for all admissible indices. The goal in this section is to prove

Proposition 6.3. There exists $\beta_{6.3} \ge 2$ so that the following holds. For $\beta \ge \beta_{6.3}$, we have $E_{\mathrm{dist}}(\beta) \supset E_{\mathrm{row}}(\beta)$, and $\mathbb{P}(E_{\mathrm{dist}}(\beta)^c) \le n^{-\beta}$.

The statement is obtained as a combination of Lemmas 6.6 and 6.7 below. First, we consider two simple facts from Euclidean geometry.

Lemma 6.4. Let $u \in \mathbb{R}^r$ and let $H \subset \mathbb{R}^r$ be a subspace. Then for any orthogonal projection $P$ in $\mathbb{R}^r$, we have $\mathrm{dist}(u, H) \ge \mathrm{dist}(Pu, PH)$.
Proof. The statement follows immediately by observing that P is a contraction.
Lemma 6.5. Let $F$ be a subspace of $\mathbb{R}^k$, and let $v_1, v_2 \in \mathbb{R}^k$ be vectors with prescribed distances to $F$. For $i \in [2]$, let $F_i$ be the linear span of $F$ and $v_i$. Then $\mathrm{dist}(v_1, F_2)$ is controlled in terms of $\mathrm{dist}(v_1, F)$, $\mathrm{dist}(v_2, F_1)$, and $\|v_2\|_2$.

Proof. For any subspace $E$, we let $P_E$ be the orthogonal projection onto $E$. Let $u_i := v_i - P_F v_i$, $i \in [2]$; a direct comparison of the projections onto $F_1$ and $F_2$ yields the claim.

Lemma 6.6. Let $s, k, r \in [n-1]$ be such that $s \le r-k < r$. Fix a realization of $A$ such that the event $E_{\mathrm{row}}(\beta)$ holds. Then the corresponding distance estimates hold as well.

Proof. First, we note that for every $t \in [2, r]$ the distance of interest can be compared with its projected counterpart, where $P_t : \mathbb{R}^r \to \mathbb{R}^t$ is the coordinate projection onto the first $t$ components. Applying Lemma 6.4 for every $2 \le t \le r$, we obtain the corresponding chain of estimates, where in the last inequality we used the definition of $E_{\mathrm{row}}(\beta)$. Further, for $t \in [r-k+1, r]$, we will apply Lemma 6.5 with $F_1 = H_{r,t-1}$ and $F_2 = H_{r,t,s}$; from the previous inequality and the definition of $E_{\mathrm{row}}(\beta)$ we have $\mathrm{dist}(v_2, F_1) \ge \sqrt{2/\pi}\, n^{-4(1+\beta/(n-r))}$ and $\|v_2\|_2 \le \|A_{i_t,[n]}\|_2 \le \sqrt{n} + 3\sqrt{\beta\log n}$.
Next, for each $i \in [n]$, applying the standard concentration inequality for Lipschitz functions of Gaussian variables and using $(\mathbb{E}\|A_{i,[n]}\|_2^2)^{1/2} \le \sqrt{n}$, by taking $t := 3\sqrt{\beta\log n}$ we obtain the required norm bound. Taking the union bound over $i \in [n]$ and taking into account the condition $\beta \ge 2$, we get the first assertion of the lemma. The second assertion follows from another application of the union bound.

Proposition 6.9. There is a universal constant $C > 0$ with the following property. For any $p \ge 1$, there exist $n_0(p)$, $k_1(p) \le Cp^2$, and $\beta_{6.3} \le \beta_1(p) \le Cp^2$ such that for $n \ge n_0(p)$ and $\beta \ge \beta_1(p)$ the stated inclusion of events holds, where $k_0(p)$ is taken from Proposition 5.2 and where $\beta_{6.3}$ is defined in Proposition 6.3. Moreover, the corresponding probability bound holds.

Proof. Note that if the events' inclusion above holds, then the second assertion of the proposition follows immediately by combining the bounds $\mathbb{P}(E^{\mathrm{rect}}_{\mathrm{smin}}(p, k_0(p), \beta)^c) \le n^{-2\beta+o_n(1)}$ from Corollary 5.3 and $\mathbb{P}(E_{\mathrm{row}}(\beta)^c) \le n^{-\beta}$ from Lemma 6.7. Thus, we can focus on proving the first assertion.
Next, we will show that (47) leads to a contradiction. The argument depends on whether $k = r-s$ or not.
Case 1: $k = r - s$. By the definition of the event $E_{\mathrm{row}}(\beta)$, we have a lower bound on the corresponding distance which contradicts (47).

Case 2: $k = 2k_0(p) < r - s$. In this case, $(A_{i_s,[r]})^\top$ is a column vector of $(A_{I_{r-k},[r]})^\top$ and $H_{r,r-k,s}$ is the span of every other column vector $(A_{i_{s'},[r]})^\top$ for $s' \in [r-k]\setminus\{s\}$. Hence, in view of (47), we obtain an upper bound on the smallest singular value of the corresponding rectangular matrix. However, this contradicts the definition of the event $E^{\mathrm{rect}}_{\mathrm{smin}}(p, k_0(p), \beta)$. The result follows.
The next simple lemma will be used to show that with high probability the indices of the pivot rows obtained in exact arithmetic coincide with the results of the floating point computations.
Lemma 6.10. There is a universal constant $C > 0$ and a number $n_0 \in \mathbb{N}$ such that, assuming $n \ge n_0$,
$$\mathbb{P}\big\{ s_{\min}(A_{I_r,[r]}) \le t\, n^{-C} \;\text{ for some } 1 \le r \le n-1 \big\} \le t, \qquad t > 0.$$
For indices $r < k_1(1)$, we use the trivial union bound, where in the last line we use the standard bound on the smallest singular value of a square Gaussian random matrix [4, 17]. Similarly, we get
$$\mathbb{P}\big\{ s_{\min}(A_{I_r,[r]}) \le t \;\text{ for some } n - k_1(1) < r \le n-1 \big\} \le n^{k_1(1)}\, t, \qquad t > 0.$$
Combining the three estimates above, we get the result.
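The standard bound used above, namely that $\sqrt{n}\, s_{\min}$ of a square Gaussian matrix has a density bounded near zero [4, 17], can be seen in simulation (hypothetical parameters below):

```python
import numpy as np

rng = np.random.default_rng(10)
n, trials = 100, 2000
smins = np.array([np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False)[-1]
                  for _ in range(trials)])
for t in (0.05, 0.1, 0.2):
    print(t, np.mean(np.sqrt(n) * smins <= t))   # empirical probability roughly ~ t
```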

Estimating the growth factor in exact arithmetic
Lemma 6.12. For any $\beta \ge 2$, we have $\mathbb{P}(E_{\mathrm{col}}(\beta)^c) \le n^{-\beta}$; furthermore, for every $\tau \ge 1$, the corresponding bound on $\mathbb{P}(E_{\mathrm{entry}}(\tau)^c)$ holds.

Proof. The upper bound on $\mathbb{P}(E_{\mathrm{col}}(\beta)^c)$ can be derived exactly the same way as in the argument for $E_{\mathrm{row}}(\beta)$ (see the proof of Lemma 6.7), so we skip the discussion.
To estimate the probability of the complement of $E_{\mathrm{entry}}(\tau)$, we write the union bound over the entries, where in the last inequality we used that the probability density function of the standard Gaussian random variable is bounded by $\frac{1}{\sqrt{2\pi}}$. At this point, we are ready to prove the "exact arithmetic" counterpart of the main statement of the paper:

Proposition 6.13. There is a universal constant $C > 1$ and a function $\tilde n : [1,\infty) \to \mathbb{N}$ with the following property. Let $p \ge 1$, and let $n \ge \tilde n(p)$. Then the "exact arithmetic" growth of the GEPP process on $A$ admits a tail estimate matching that of Theorem A.

Proof. Recall that the parameter $k_1(p) = O(p^2)$ was defined in Proposition 6.9. We can take a universal constant $C_1 > 0$ large enough so that $C_1 p^3 \ge 600p\, k_1(p)$ for all $p \ge 1$. Fix $\beta \ge C_1 p^3$, set $\tau := \beta/(100p)$, and assume $n \ge \sqrt{100p}$. In view of the assertions of Lemma 6.7, Proposition 6.9, and Lemma 6.12, in order to show the required probability bound (which would imply the statement), it is sufficient to verify the growth estimate everywhere on the intersection of the corresponding events. In what follows, we use the notation introduced at the beginning of Section 2; in particular, we work with the matrices $M^{(\ell)}$, $0 \le \ell \le n-1$, defined in (3).
Recall that, in view of the partial pivoting rule, the maximal absolute value of the matrix entries can grow at most by a factor of $2$ at each elimination step (50). For $r \in [k_1(p)]$, we simply use this bound to get the required entry estimates. Further, for $r \in (k_1(p), n-k_1(p)]$, we write the representation (4). In view of formula (4) and our conditioning on the event $E_{\mathrm{smin}}(p, k_1(p), \beta)$, for $s \in [k_1(p), n-k_1(p)]$, $i \in [n]\setminus I_s$ and $j > s$, we obtain the corresponding entry bounds, and thus $\max_{i\in[n]\setminus I_r,\, j>r} |M^{(r)}_{i,j}|$ is polynomially bounded; the analogous bounds hold for every $s \in [r]$ with $s > k_1(p)$. By our earlier observation, the maxima over the remaining range are controlled as well. Combining the estimates together, we conclude that the required bound holds for all $r \in [n-k_1(p)]$. For the "last" $k_1(p)$ admissible values of $r$, we rely on (50) again: for $r \in (n-k_1(p), n]$, the maximum can grow by a factor of at most $2^{k_1(p)}$. In the end, we make use of our bound $\beta/(6p) \ge 100 k_1(p)$ to conclude that the growth estimate holds for all large $n$. This completes the proof.

GEPP in floating point arithmetic
In this section we transfer the statement of Proposition 6.13 into the proper context of the floating point arithmetic. We expect a part of the argument in this section (specifically, in the proof of Lemma 7.2) to be rather standard for experts in numerical analysis. Still, we prefer to provide all the details to make the paper self-contained.
Lemma 7.1. Let $A$ be an $n \times n$ Gaussian matrix and $A = M^{(0)}, M^{(1)}, \ldots, M^{(n-1)}$ be the sequence of matrices generated by GEPP in exact arithmetic (see (3)). Then, for every $1 \le k \le n-1$, the pivot entries satisfy the anti-concentration estimate stated below.

Proof. Fix any $1 \le k \le n-1$. With the vector $v_k(A)$ defined at the beginning of Section 4 and in view of (4), for every $i \notin I_{k-1}(A)$ the conditional density of the row $(A_{i,[n]})^\top$ takes the form
$$\rho(y) = \frac{\mathbf{1}_K(y)\,\exp(-\|y\|_2^2/2)}{\int_K \exp(-\|y'\|_2^2/2)\, dy'}, \qquad y \in \mathbb{R}^n,$$
which is symmetric and log-concave (i.e., $y \mapsto \log(\rho(y))$ is a concave function). Since log-concavity is preserved under taking marginals, the random variable $X_i$, defined as the corresponding linear functional of $(A_{i,[n]})^\top$, is also log-concave and symmetric under the conditioning. This implies, in particular, that the probability density function $\rho_X$ of the $X_i$'s ($i \notin I$) is non-increasing on the positive semi-axis.
Combining the two identities and using the monotonicity of $\rho_X$, we get the required comparison of probabilities; the result follows by applying Fubini's theorem.
Lemma 7.2. Assume that $M$ satisfies the assumptions on the smallest singular values and on the entries described above. Then GEPP in floating point arithmetic succeeds for $M$; the computed permutation matrix $\hat P = \mathrm{Id}_n$, and, denoting by $\hat M = \hat M^{(0)}, \hat M^{(1)}, \ldots, \hat M^{(n-1)}$ the sequence of matrices obtained during the elimination process, for every $k = 0, 1, \ldots, n-1$ the computed matrices remain entry-wise close to their exact counterparts.

Proof. We will prove the statement by induction. Fix any $k \in [n-1]$. Assume that all of the following holds:

(a) the computed matrix $\hat M^{(k-1)}$ has been produced by taking the indices of the first $k-1$ pivot rows to be $1, 2, \ldots, k-1$, and the pivoting rule applied to $|\hat M^{(k-1)}|$ gives the index of the $k$-th computed pivot row equal to $k$;

(b) $\hat M^{(k-1)} = G_{k-1} \cdots G_2 G_1 (M + \tilde E^{(k-1)})$, where $G_i$ is the Gauss transformation used to eliminate the $i$-th row of $\hat M^{(i-1)}$, $1 \le i \le k-1$, and where the error matrix $\tilde E^{(k-1)}$ satisfies the entry-wise bounds of the inductive hypothesis.

Further, by the assumptions on $s_{\min}(M_{[k],[k]})$ and in view of the bound on the norm of $\tilde E^{(k)}$, the matrix $(M + \tilde E^{(k)})_{[k],[k]}$ is invertible. Hence, for every $i \in [k+1, n]$ there is a unique linear combination $L_i$ of the first $k$ rows of $M + \tilde E^{(k)}$ such that the vector $\mathrm{row}_i(M + \tilde E^{(k)}) - L_i$ has its first $k$ components equal to zero. We conclude that necessarily the matrices $G_1, G_2, \ldots, G_k$ are Gauss transformations for $M + \tilde E^{(k)}$, whence for every $j \in [k+1, n]$ we have the representations (54) and (55). We will rely on formulas (54) and (55) to show that $\hat M^{(k)}$ and $M^{(k)}$ are sufficiently close entry-wise.
In view of (54) and (55), the inductive computation yields the required entry-wise bound, confirming condition (c) for the $k$-th step. It remains to check condition (a). Note that we only need to consider the case $k \le n-2$. Using the definition of the vectors $v_k(\cdot)$ from the beginning of Section 4, we can write the corresponding comparison of the pivot candidates.

Proof of Theorem A. In view of Lemma 6.10, there are $C', n_0 > 0$ such that, assuming $n \ge n_0$,
$$\mathbb{P}\big\{ s_{\min}(A_{I_r,[r]}) \le t\, n^{-C'} \;\text{ for some } 1 \le r \le n-1 \big\} \le t, \qquad t > 0.$$
On the other hand, standard concentration estimates for the spectral norm of Gaussian matrices (see, for example, [20, Chapter 4]) imply that, assuming $n$ is sufficiently large, $\|A\|$ is at most polynomial in $n$ with probability close to one. Let $\delta \in (u, 1/3)$ be a parameter to be chosen later, and define the events $E_1(\delta)$, $E_2(\delta)$, $E_3(\delta)$ in terms of the smallest singular values, the spectral norm, and the separation of the pivots, the latter requiring that the non-pivot entries in each column are at most $(1-\delta)$ times the pivot entries $|\hat M^{(k)}_{i_k,k}|$, $k = 1, \ldots, n-1$.
Taking $\delta := u^{1/8}$, we get that for any large enough $n$ the intersection $E_1(u^{1/8}) \cap E_2(u^{1/8}) \cap E_3(u^{1/8})$ has probability at least $1 - u^{1/8} n^{\tilde C}$, where $\tilde C > 0$ is a universal constant. It remains to note that, in view of Lemma 7.2, everywhere on the intersection $E_1(u^{1/8}) \cap E_2(u^{1/8}) \cap E_3(u^{1/8})$ the GEPP in floating point arithmetic succeeds for $\hat A^{(0)} = \mathrm{fl}(A)$; the computed permutation matrix $\hat P$ coincides with the matrix $P$ from the $PLU$-factorization of $A$ in exact arithmetic, and the computed intermediate matrices are entry-wise close to their exact counterparts. A second application of Proposition 6.13, now to bound $g_{\mathrm{GEPP}}(A)$ conditioned on the intersection $E_1(u^{1/8}) \cap E_2(u^{1/8}) \cap E_3(u^{1/8})$, completes the proof.

Further questions
In this section, we mention some open questions related to the probabilistic analysis of the Gaussian Elimination with Partial Pivoting.
Sharp estimate of the growth factor. Our main result shows that with probability close to one, the growth factor of GEPP is at most polynomial in the matrix dimension, $g_{\mathrm{GEPP}}(A) \le n^C$. Our analysis leaves the constant $C > 0$ unspecified, and it would be of interest to obtain an estimate with a reasonable (single digit) explicit constant. Furthermore, as we mentioned in the introduction, it was suggested in [5], based on numerical simulations, that for large $n$, $g_{\mathrm{GEPP}}(A) = O(n^{1/2+o_n(1)})$ with probability close to one. The problem of finding the optimal constant power of $n$ in the growth factor estimate seems to require essentially new ideas. At the same time, it is natural to expect that recurrent estimates of the singular spectrum of the submatrices obtained in the GEPP process should remain a key element of future refinements of our result.
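A Monte Carlo experiment in the spirit of Trefethen and Schreiber [18] (repeating, for self-containment, the growth-factor routine sketched in the introduction) is consistent with the conjectured $n^{1/2+o_n(1)}$ scaling; this is an illustration only, not evidence of the optimal exponent.

```python
import numpy as np

def growth_factor_gepp(A):
    """Largest intermediate entry under GEPP divided by the largest entry of A."""
    U = A.astype(float).copy()
    n, top = U.shape[0], np.max(np.abs(A))
    g = top
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))
        U[[k, p], k:] = U[[p, k], k:]
        U[k + 1:, k + 1:] -= np.outer(U[k + 1:, k] / U[k, k], U[k, k + 1:])
        U[k + 1:, k] = 0.0
        g = max(g, np.max(np.abs(U)))
    return g / top

rng = np.random.default_rng(11)
for n in (64, 128, 256, 512):
    g = [growth_factor_gepp(rng.standard_normal((n, n))) for _ in range(20)]
    print(n, np.median(g), np.median(g) / np.sqrt(n))  # the ratio stays roughly flat
```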
The probability that the Gaussian Elimination with Partial Pivoting succeeds in floating point arithmetic. Our main result states that, under the assumption that the dimension $n$ is sufficiently large, with probability at least $1 - u^{1/8} n^{\tilde C}$ the GEPP in f.p. arithmetic succeeds for $\mathrm{fl}(A)$, and the computed permutation matrix agrees with that obtained in exact arithmetic. We expect the probability estimate to be much stronger, perhaps of the form $1 - u^{1-o_n(1)} n^{\tilde C}$, and leave this as an open problem.
Smoothed analysis of the growth factor. Our proof does not extend to the smoothed analysis setting without incurring significant losses in the upper estimate for the growth factor.
In fact, our treatment of the partially random block matrices $B$ in Section 3 heavily relies on the assumption that the norm of a submatrix within the "random part" of $B$ is typically of the same order as the square root of that submatrix's larger dimension. Establishing a polynomial upper bound on the growth factor in the presence of a non-random shift (of polynomial operator norm) is an interesting and challenging problem.
Proof of Proposition 3.2. In view of the interlacing properties of singular values (see, for example, [3, Theorem 1.4]), we can assume without loss of generality that $t = u$. We fix an index $1 \le i \le u-1$ and $s \in (0,1]$. Denote by $E$ the event that $s_{u-i+1}(M) \le \frac{c\, i\, s}{\sqrt{u}}$, where $c = c(\rho) > 0$ will be chosen later. Let $Z^\top$ be the $u \times i$ random orthogonal matrix measurable w.r.t. $\sigma(M)$ and such that $\|M\,\mathrm{col}_j(Z^\top)\|_2 \le \frac{c\, i\, s}{\sqrt{u}}$, $1 \le j \le i$, everywhere on $E$ (one may take $Z^\top$ as the matrix whose columns are the normalized right singular vectors of $M$ corresponding to the $i$ smallest singular values of $M$).
To estimate the probabilities on the right-hand side of the inequality, we apply Theorem A.2. Observe that, in view of Theorem A.