1 Motivation

We consider the following transform on matrix spaces:

$$ \tilde{T}(H)=T\circ H,$$

where ∘ denotes the Hadamard product, \(H\in \mathbb{C}^{n\times n}\), and

$$ T= \begin{pmatrix} 0 & i & \cdots & i \\ -i & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & i \\ -i & \cdots & -i & 0 \end{pmatrix}. $$
(1)

If \(H=L+L^{*}\) (∗ denotes the adjoint) with L being strictly lower triangular, then

$$ L=\frac{L+L^{*}}{2}+i\cdot \frac{L-L^{*}}{2i}=\frac{1}{2}H+ \frac{i}{2}\tilde{T}(H),$$

thus \(\tilde{T}\) simply takes the “real” part of L to its “imaginary” part; because of this, it is reasonable to call \(\tilde{T}\) the conjugate transform.
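To fix ideas, here is a minimal numerical sketch of \(\tilde{T}\) and of the decomposition above (numpy; not part of the original text):

```python
import numpy as np

def conjugate_transform(H):
    """tilde{T}(H) = T ∘ H, with T as in (1): i above, 0 on, -i below the diagonal."""
    n = H.shape[0]
    T = 1j * (np.triu(np.ones((n, n)), 1) - np.tril(np.ones((n, n)), -1))
    return T * H  # Hadamard (entrywise) product

rng = np.random.default_rng(0)
n = 5
# strictly lower triangular L and the Hermitian matrix H = L + L*
L = np.tril(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)), -1)
H = L + L.conj().T

# the decomposition L = H/2 + (i/2) tilde{T}(H)
assert np.allclose(L, H / 2 + 0.5j * conjugate_transform(H))
```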

Another good reason for such a name is the connection of \(\tilde{T}\) to the Hilbert transform on the torus, which is defined as

$$ f\mapsto \tilde{f}(\theta )=\frac{1}{2\pi}\mathrm{p.v.} \int _{0}^{2\pi}f(t) \cot \biggl(\frac{\theta -t}{2} \biggr)\,dt,$$

(p.v. stands for the Cauchy principal value). The Fourier series of f and \(\tilde{f}\) differ by a sign depending on the frequency, i.e., if \(f(\theta )=\sum_{k\in \mathbb{Z}}\hat{f}(k)e^{ik\theta}\), then \(\tilde{f}(\theta )=-i\sum_{k\in \mathbb{Z}}\operatorname{sgn}(k)\hat{f}(k)e^{ik\theta}\) [1, Chap. 6].
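For instance, for \(f(\theta )=\cos k\theta \) with \(k\ge 1\) (a standard worked example, added here for illustration), the multiplier relation gives

$$ \tilde{f}(\theta )=-i \biggl(\frac{e^{ik\theta}}{2}-\frac{e^{-ik\theta}}{2} \biggr)=\sin k\theta ,$$

so the conjugate function turns cosines into the corresponding sines.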

Take \(f\in L^{\infty}(\mathbb{T})\); it induces a bounded multiplication operator on \(L^{2}(\mathbb{T})\) by \(g\mapsto fg\). We can expand f into its Fourier series (recall that \(L^{\infty}(\mathbb{T})\subset L^{2}(\mathbb{T})\), and by Carleson's theorem the series also converges a.e. [2, 3]) and write it as a vector \((\ldots , \hat{f}(-1), \hat{f}(0), \hat{f}(1),\ldots )^{T}\). In this way the multiplication operator induced by f can be represented by a bi-infinite Toeplitz matrix F (i.e., a Laurent operator) with \(F_{ij}=\hat{f}(k)\) if \(j-i=k\) (alternatively see [4, Chap. 1] or [5, Chap. 3]). It then follows that the multiplication operator \(g\mapsto \tilde{f}g\) can be represented by the matrix \(\tilde{T}(F)\), thus \(\tilde{T}\) on matrix forms of Laurent/Toeplitz operators is a way of realizing the Hilbert transform on \(L^{\infty}(\mathbb{T})\) (see also [6] for a different perspective where \(\tilde{T}\) is viewed as the bilinear Hilbert transform on Hankel operators).

Moreover, we have

$$ \frac{1}{2} \bigl(f(\theta )+i\tilde{f}(\theta )-\hat{f}(0) \bigr)=\sum_{k\in \mathbb{N}}\hat{f}(k)e^{ik\theta}.$$

The right-hand side is called the Riesz projection of f; on Toeplitz matrices it corresponds to

$$ \frac{1}{2}\bigl(A+i\tilde{T}(A)-\tilde{D}(A)\bigr)=\tilde{L}(A), $$
(2)

where \(\tilde{D}\) is the diagonal projection that maps A to its main diagonal, and \(\tilde{L}\) is the triangular truncation that maps A to its strict lower triangular part. Since \(\tilde{D}\) is bounded with respect to many norms, the boundedness of \(\tilde{L}\) can essentially be determined by inspecting \(\tilde{T}\).
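Identity (2) is easy to sanity-check numerically; a small numpy sketch (illustrative only):

```python
import numpy as np

def tilde_T(A):
    n = A.shape[0]
    T = 1j * (np.triu(np.ones((n, n)), 1) - np.tril(np.ones((n, n)), -1))
    return T * A

def tilde_D(A):  # diagonal projection
    return np.diag(np.diag(A))

def tilde_L(A):  # strict lower triangular truncation
    return np.tril(A, -1)

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
# identity (2)
assert np.allclose((A + 1j * tilde_T(A) - tilde_D(A)) / 2, tilde_L(A))
```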

The truncation \(\tilde{L}\) appears at various places in mathematics. For example, in numerical analysis, \(\tilde{L}\) enters critically into the iteration matrices of the Gauss–Seidel method and the Kaczmarz method, where the error reduction rate with respect to the spectral condition number can be estimated using the spectral operator norm of \(\tilde{L}\) (see [7, 8]). In functional analysis, \(\tilde{L}\) on finite dimensional spaces is the explicit form of the projection that maps a Schatten class to the subclass of Volterra operators in it (see [9, Chap. 3] or [10]). In harmonic analysis, the norm of the majorant function in the Rademacher–Menshov inequality [11, 12] can be estimated by the norm of \(\tilde{L}\) (see [13]). Therefore, as simple as the form of \(\tilde{L}\) (and \(\tilde{T}\)) is, its rich and profound background motivates us to understand its behavior on \(\mathbb{C}^{n\times n}\).

Of particular interest to us is the Schatten class \(S_{p}\), which consists of compact operators whose singular values are in \(\ell ^{p}\); \(S_{p}\) is a Banach space equipped with the \(\ell ^{p}\) norm of the singular values. The \(S_{1}\), \(S_{2}\), and \(S_{\infty}\) norms are the nuclear, Hilbert–Schmidt, and spectral norms, respectively. We use \(\|\cdot \|_{p}\) to denote the \(S_{p}\) norm of a matrix; if \(p=\infty \), the subscript is omitted.
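For concreteness, a small helper (numpy, illustrative only) that evaluates Schatten norms directly from the singular values:

```python
import numpy as np

def schatten_norm(A, p):
    """S_p norm of A: the l^p norm of its singular values (p=np.inf gives the spectral norm)."""
    s = np.linalg.svd(A, compute_uv=False)
    return s.max() if p == np.inf else (s ** p).sum() ** (1.0 / p)

A = np.arange(9.0).reshape(3, 3)
print(schatten_norm(A, 1))       # nuclear norm
print(schatten_norm(A, 2))       # Hilbert-Schmidt (Frobenius) norm
print(schatten_norm(A, np.inf))  # spectral norm
```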

For integral operators, it is known that if their symbol belongs to particular mixed norm spaces \(L^{p,q}\) (p, q are Hölder conjugates with \(p\ge 2\)), then they are in the Schatten class \(S_{p}\) (see [14–16]). On the other hand, the Hilbert transform is bounded on \(L^{p}(\mathbb{T})\) for \(1< p<\infty \) (known as Marcel Riesz's inequality [1, Chap. 6.17]) and unbounded on \(L^{1}(\mathbb{T})\) (thus also unbounded on \(L^{\infty}(\mathbb{T})\) by duality and its anti-symmetry [17, Theorem 102]); an explicit example of this unboundedness can be found in [18, p. 250].

Such insights suggest that \(\tilde{T}\) acts on \(S_{p}\) the same way as the Hilbert transform behaves on \(L^{p}\), which brings us to the main result of this paper:

Theorem 1

  1. (i)

    The operator norm \(\|\tilde{T}\|_{\infty}\) of \(\tilde{T}\) on \(\mathbb{C}^{n\times n}\) with respect to the \(S_{\infty}\) norm is

    $$ \Vert \tilde{T} \Vert _{\infty}=\frac{1}{n} \Vert T \Vert _{1}=\frac{1}{n}\sum_{k=0}^{n-1} \biggl\vert \cot \frac{(2k+1)\pi}{2n} \biggr\vert \asymp \frac{2}{\pi}\ln n.$$
  2. (ii)

    The operator norm \(\|\tilde{T}\|_{p}\) of \(\tilde{T}\) on \(\mathbb{C}^{n\times n}\) with respect to the \(S_{p}\) norm for \(2\le p<\infty \) satisfies (regardless of the dimension n)

    $$ \Vert \tilde{T} \Vert _{p}\le 4p.$$
  3. (iii)

    The following holds regardless of the size of A:

    $$ \sup_{\operatorname{rank}(A)=r}\frac{ \Vert \tilde{T}(A) \Vert }{ \Vert A \Vert }\le 4e\ln r.$$
  4. (iv)

    For any \(A\in \mathbb{C}^{n\times n}\), there exist a permutation matrix P and a constant C independent of the dimension such that

    $$ \bigl\Vert \tilde{T}\bigl(PAP^{*}\bigr) \bigr\Vert \le C \Vert A \Vert .$$
  5. (v)

    There is a constant C independent of the dimension and the choice of \(A\in \mathbb{C}^{n\times n}\) such that

    $$ \biggl\Vert \frac{1}{n!}\sum_{P}\tilde{T} \bigl(PAP^{*}\bigr) \biggr\Vert \le C \Vert A \Vert ,$$

    where the summation is taken over all possible permutation matrices P.
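As a numerical illustration of (i) (a numpy sketch, not part of the original; the matrix A below is the extremal example that appears later in the proof of (i)):

```python
import numpy as np

n = 64
k = np.arange(n)
tau = 1.0 / np.tan((2 * k + 1) * np.pi / (2 * n))           # cot((2k+1)π/(2n))
W = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)    # Fourier matrix
T = 1j * (np.triu(np.ones((n, n)), 1) - np.tril(np.ones((n, n)), -1))

A = W.conj().T @ np.diag(np.sign(tau)) @ W                  # extremal example from the proof of (i); ||A|| = 1
lhs = np.linalg.norm(T * A, 2)                              # spectral norm of tilde{T}(A)
rhs = np.abs(tau).sum() / n                                 # (1/n)||T||_1
print(lhs, rhs, 2 / np.pi * np.log(n))                      # lhs = rhs up to rounding; both grow like (2/π) ln n
```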

2 Preliminaries

If \(x=(x_{1},x_{2},\ldots ,x_{n})^{T}\in \mathbb{C}^{n}\), then we write

$$ D_{x}=\operatorname{diag}(x_{1},x_{2},\ldots , x_{n}),\qquad P_{x}=xx^{*},$$

in particular, one may verify that

$$ P_{x}\circ A=D_{x}AD_{x}^{*}. $$
(3)
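Indeed, written entrywise, (3) is immediate:

$$ (P_{x}\circ A)_{jk}=x_{j}\overline{x_{k}}A_{jk}=\bigl(D_{x}AD_{x}^{*}\bigr)_{jk}.$$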

Let

$$ \zeta =e^{\frac{\pi i}{n}},\qquad \omega =e^{\frac{2\pi i}{n}}=\zeta ^{2}, $$
(4)

and denote by W the Fourier matrix whose ijth entry is \(W_{ij}=\omega ^{(i-1)(j-1)}/\sqrt {n}\).

Lemma 1

T can be diagonalized as

$$ T=D_{\xi}^{*}W^{*}D_{\tau}WD_{\xi},$$

where

$$ \xi =\bigl(1,\zeta , \zeta ^{2},\ldots , \zeta ^{n-1} \bigr)^{T},\qquad \tau =( \tau _{0},\tau _{1},\ldots , \tau _{n-1})^{T},$$

with

$$ \tau _{k}=\cot \frac{(2k+1)\pi}{2n}.$$

Proof

It is easy to verify that \(D_{\xi}TD_{\xi}^{*}\) is circulant, thus it can be diagonalized by W; the cotangent comes from the further computation

$$ \tau _{k}=-i\sum_{j=1}^{n-1}(z_{k})^{j}=-i \biggl( \frac{1-(z_{k})^{n}}{1-z_{k}}-1\biggr)=-i\biggl(\frac{1+z_{k}}{1-z_{k}}\biggr)=-i\biggl( \frac{z_{k}^{-\frac{1}{2}}+z_{k}^{\frac{1}{2}}}{z_{k}^{-\frac{1}{2}}-z_{k}^{\frac{1}{2}}}\biggr)= \cot \frac{(2k+1)\pi}{2n},$$

where \(z_{k}=\zeta \omega ^{k}\). □
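A quick numerical check of Lemma 1 (a numpy sketch, not part of the original):

```python
import numpy as np

n = 8
k = np.arange(n)
zeta = np.exp(1j * np.pi / n)
xi = zeta ** k                                              # (1, ζ, ζ^2, ..., ζ^{n-1})
W = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)    # Fourier matrix
tau = 1.0 / np.tan((2 * k + 1) * np.pi / (2 * n))

T = 1j * (np.triu(np.ones((n, n)), 1) - np.tril(np.ones((n, n)), -1))
D_xi = np.diag(xi)

# T = D_ξ^* W^* D_τ W D_ξ
assert np.allclose(T, D_xi.conj().T @ W.conj().T @ np.diag(tau) @ W @ D_xi)
```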

Lemma 2

Let H be a Hermitian matrix with vanishing main diagonal. If

$$ c_{p}=\sup_{H}\frac{ \Vert \tilde{T}(H) \Vert _{p}}{ \Vert H \Vert _{p}},$$

with the supremum taken over all such matrices, then

$$ c_{p}\le p.$$

Proof

The inequality is obvious for \(p=2\) since \(c_{2}=1<2\). Now suppose it holds for \(p=k\), and consider the case \(p=2k\). Notice that the following holds:

$$ H\tilde{T}(H)+\tilde{T}(H)H=-i(L+U) (L-U)-i(L-U) (L+U)=-2i\bigl(L^{2}-U^{2} \bigr),$$

where L, U are respectively the strict lower and upper triangular part of H. It follows that

$$ H^{2}+\tilde{T} \bigl(H\tilde{T}(H)+\tilde{T}(H)H \bigr)=(L+U)^{2}-2 \bigl(L^{2}+U^{2}\bigr)=-(L-U)^{2}= \bigl(\tilde{T}(H) \bigr)^{2},$$

thus

$$\begin{aligned} \bigl\Vert \tilde{T}(H) \bigr\Vert _{2k}^{2} =& \bigl\Vert \bigl(\tilde{T}(H) \bigr)^{2} \bigr\Vert _{k} \\ \le& \bigl\Vert H^{2} \bigr\Vert _{k}+ \bigl\Vert \tilde{T} \bigl(H\tilde{T}(H)+\tilde{T}(H)H \bigr) \bigr\Vert _{k} \\ \le& \Vert H \Vert _{2k}^{2}+2c_{k} \Vert H \Vert _{2k} \bigl\Vert \tilde{T}(H) \bigr\Vert _{2k}, \end{aligned}$$

i.e.,

$$ c_{2k}^{2}\le 1+2c_{k}c_{2k},$$

which we may solve and get

$$ c_{2k}\le c_{k}+\sqrt{1+c_{k}^{2}}.$$

By induction, this leads to

$$ c_{2^{n}}\le 2^{n}.$$

For other values of p, simply apply the Riesz–Thorin interpolation theorem. □
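The two operator identities used in this proof are easy to check numerically; a small numpy sketch (illustrative only):

```python
import numpy as np

def tilde_T(A):
    n = A.shape[0]
    T = 1j * (np.triu(np.ones((n, n)), 1) - np.tril(np.ones((n, n)), -1))
    return T * A

rng = np.random.default_rng(2)
n = 6
L = np.tril(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)), -1)
U = L.conj().T
H = L + U                                 # Hermitian, vanishing main diagonal
TH = tilde_T(H)

assert np.allclose(H @ TH + TH @ H, -2j * (L @ L - U @ U))
assert np.allclose(H @ H + tilde_T(H @ TH + TH @ H), TH @ TH)
```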

3 Proof of the main theorem

Proof

  1. (i)

    By (3) and Lemma 1, we have

    $$ \bigl\Vert \tilde{T}(A) \bigr\Vert \le \sum_{k=0}^{n-1} \bigl\Vert \tau _{k}D_{u_{k}}AD_{u_{k}}^{*} \bigr\Vert =\frac{1}{n}\sum_{k=0}^{n-1} \vert \tau _{k} \vert \Vert A \Vert =\frac{1}{n} \Vert T \Vert _{1} \Vert A \Vert ,$$

    where \(u_{k}\) is the \((k+1)\)st column of \(D_{\xi}^{*}W^{*}\). The equality is attainable at, e.g.,

    $$ A=W^{*}D_{\operatorname{sgn}(\tau )}W,$$

    where

    $$ \operatorname{sgn}(\tau )= \bigl(\operatorname{sgn}(\tau _{0}), \operatorname{sgn}(\tau _{1}), \ldots , \operatorname{sgn}(\tau _{n-1}) \bigr)^{T}.$$

    The asymptotic estimate follows by noticing that

    $$ \frac{\pi}{2n}\sum_{k=0}^{n-1} \biggl\vert \cot \frac{(2k+1)\pi}{2n} \biggr\vert \asymp \int _{\frac{\pi}{4n}}^{\frac{(4n-1)\pi}{4n}} \vert \cot x \vert \,dx,$$

    where the left-hand side can be viewed as a quadrature formula (e.g., the midpoint rule) for the integral on the right-hand side, which grows like \(\ln n\).

  2. (ii)

    Denote \(\tilde{A}=A-\tilde{D}(A)\) and apply Lemma 2 to get

    $$\begin{aligned} \bigl\Vert \tilde{T}(A) \bigr\Vert _{p} =& \bigl\Vert \tilde{T}( \tilde{A}) \bigr\Vert _{p} \\ \le& \frac{1}{2} \bigl\Vert \tilde{T} \bigl(\tilde{A}+\tilde{A}^{*}\bigr) \bigr\Vert _{p}+ \frac{1}{2} \bigl\Vert \tilde{T}\bigl(\tilde{A}- \tilde{A}^{*} \bigr) \bigr\Vert _{p} \\ \le& 2p \Vert \tilde{A} \Vert _{p} \le 4p \Vert A \Vert _{p}. \end{aligned}$$
  3. (iii)

    This is a direct consequence of (ii) since

    $$ \bigl\Vert \tilde{T}(A) \bigr\Vert \le \bigl\Vert \tilde{T}(A) \bigr\Vert _{p}\le 4p \Vert A \Vert _{p}\le 4pr^{ \frac{1}{p}} \Vert A \Vert \le 4e\ln r \Vert A \Vert ,$$

    where the bound in the last inequality is attained at \(p=\ln r\) (easily verifiable with elementary calculus).

  4. (iv)

    The proof critically relies on the following celebrated paving conjecture (now a theorem) [19]:

    Paving: For every ϵ with \(1>\epsilon >0\), there exists a number \(\gamma _{\epsilon}\), depending only on ϵ, such that for any \(A\in \mathbb{C}^{n\times n}\) with vanishing main diagonal, one can partition the set \(\{1,2,\ldots , n\}\) into \(\gamma _{\epsilon}\) subsets \(\Lambda _{1}, \Lambda _{2},\ldots , \Lambda _{\gamma _{\epsilon}}\) with the property that

    $$ \bigl\Vert Q_{\Lambda _{i}}AQ_{\Lambda _{i}}^{*} \bigr\Vert \le \epsilon \Vert A \Vert ,\quad i=1,2, \ldots ,\gamma _{\epsilon},$$

    where \(Q_{\Lambda _{i}}\) is the orthogonal projection onto the space spanned by \(\{\vec{e}_{k}\}_{k\in \Lambda _{i}}\) with \(\vec{e}_{k}\) being the kth standard Euclidean basis vector.

    The paving conjecture is an equivalent formulation of the Kadison–Singer problem [20], which was solved in [21]. It suffices to take \(\gamma _{\epsilon}\) to be \((6/\epsilon )^{4}\) for real matrices and \((6/\epsilon )^{8}\) for complex matrices, see the exposition in [22].

    Clearly, for our problem it suffices (since diagonal projections are bounded) to consider only matrices with vanishing main diagonals. The existence of such a permutation can then be established by induction, and we may take

    $$ C=\frac{2(\gamma _{\epsilon}-1)}{1-\epsilon}$$

    for some properly chosen ϵ.

    For \(n=2\), the statement is trivially true for, e.g., \(\epsilon =1/2\). Suppose it holds for all \(n\le m\), and consider the case of \(m+1\). For a matrix A with vanishing main diagonal, we pave A to get the partition \(\Lambda _{1}, \Lambda _{2}, \ldots , \Lambda _{\gamma _{\epsilon}}\) and simultaneously permute (denote the permutation as σ) rows and columns of A so that \(\{Q_{\Lambda _{i}}AQ_{\Lambda _{i}}^{*}\}_{i=1}^{\gamma _{\epsilon}}\) now appears as consecutive diagonal blocks of \(P_{\sigma}AP_{\sigma}^{*}\). Denote \(A_{\sigma}=P_{\sigma}AP_{\sigma}^{*}\).

    Apply the induction assumption on each diagonal block \(Q_{\Lambda _{i}}A_{\sigma}Q_{\Lambda _{i}}^{*}\) to obtain a permutation \(\sigma _{i}\) so that

    $$ \bigl\Vert \tilde{T} \bigl(P_{\sigma _{i}}Q_{\Lambda _{i}}A_{\sigma}Q_{\Lambda _{i}}^{*}P_{ \sigma _{i}}^{*} \bigr) \bigr\Vert \le C \bigl\Vert Q_{\Lambda _{i}}A_{\sigma}Q_{\Lambda _{i}}^{*} \bigr\Vert \le C\epsilon \Vert A \Vert $$

    holds. We combine these permutations \(\sigma _{1}, \sigma _{2}, \ldots , \sigma _{\gamma _{\epsilon}}\) and σ together to get a new matrix Ã. The strategy is best illustrated by Fig. 1, in which each diagonal block of size \(\Lambda _{i}\times \Lambda _{i}\) is denoted by \(\tilde{A}_{i}\). Away from these diagonal blocks, Ã consists of \(\gamma _{\epsilon}-1\) matrices (denoted by \(B_{i}\) in Fig. 1), each of which consists of two rectangular submatrices of Ã located in positions symmetric with respect to the main diagonal. Consequently, applying the induction assumption on the main diagonal blocks and the trivial estimate \(\|B_{i}\|\le 2\|A\|\) elsewhere, we obtain

    $$ \bigl\Vert \tilde{T}(\tilde{A}) \bigr\Vert \le \max_{1\le i\le \gamma _{\epsilon}} \bigl\Vert \tilde{T}(\tilde{A}_{i}) \bigr\Vert +\sum _{i=1}^{\gamma _{\epsilon}-1} \Vert B_{i} \Vert \le C \epsilon \Vert A \Vert +2(\gamma _{\epsilon}-1) \Vert \tilde{A} \Vert =C \Vert A \Vert .$$
    Figure 1: The induction strategy

  5. (v)

    Consider the grand sum (i.e., the sum of all entries) of a matrix

    $$ \operatorname{gs}(A)=\sum_{j,k}A_{jk}. $$
    (5)

    It has a trivial upper bound

    $$ \bigl\vert \operatorname{gs}(A) \bigr\vert = \bigl\vert (A\vec{1}, \vec{1}) \bigr\vert \le n \Vert A \Vert , $$
    (6)

    where \(\vec{1}\) is the all-ones vector. It is easy to see that for any matrix A we have

    $$ \sum_{\sigma}P_{\sigma}AP_{\sigma}^{*}=(n-2)! \bigl(\operatorname{gs}(A)- \operatorname{tr}(A) \bigr)E_{0}+(n-1)!\operatorname{tr}(A)I,$$

    where \(E_{0}=E-I\) with E being the all-ones matrix and I the identity matrix; thus a straightforward estimate shows

    $$ \frac{1}{n!} \biggl\Vert \sum_{\sigma}P_{\sigma}AP_{\sigma}^{*} \biggr\Vert \le c \Vert A \Vert ,$$

    with c being an absolute constant independent of n and A, since both \(|\operatorname{gs}(A)|\) and \(|\operatorname{tr}(A)|\) are trivially bounded by \(n\|A\|\), while \(\|E_{0}\|\le \|E\|+\|I\|\le n+1\). Since \(\tilde{T}\) is linear and annihilates diagonal matrices, applying it to the identity above only replaces \(E_{0}\) by \(\tilde{T}(E_{0})=T\), whose norm \(\cot \frac{\pi}{2n}\) (by Lemma 1) is also at most n, so the same estimate yields the claimed bound for \(\frac{1}{n!} \Vert \sum_{\sigma}\tilde{T} (P_{\sigma}AP_{\sigma}^{*} ) \Vert \); a brute-force check of the identity above is sketched after the proof. □
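As a sanity check (not part of the original argument; numpy, brute force over all permutations, so only for small n), the permutation-average identity above can be verified directly:

```python
import numpy as np
from itertools import permutations
from math import factorial

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))

S = np.zeros((n, n))
for perm in permutations(range(n)):
    P = np.eye(n)[list(perm)]             # permutation matrix
    S += P @ A @ P.T

gs, tr = A.sum(), np.trace(A)
E0 = np.ones((n, n)) - np.eye(n)
assert np.allclose(S, factorial(n - 2) * (gs - tr) * E0 + factorial(n - 1) * tr * np.eye(n))
```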

4 Applications

4.1 Optimal constants in Rademacher–Menshov inequality

The Rademacher–Menshov inequality [11, 12] states that if \(\varphi =\{\varphi _{k}\}_{k\in \mathbb{N}}\) is an orthonormal system on some measure space \((\Omega , \mu )\) and \(a=\{a_{k}\}_{k\in \mathbb{N}}\in \ell ^{2}\) is a scalar sequence, then

$$ \Vert M_{a,\varphi ,n} \Vert _{L^{2}}\le C\ln n\Biggl( \sum_{k=1}^{n} \vert a_{k} \vert ^{2}\Biggr)^{ \frac{1}{2}}, $$
(7)

where C is independent of a, φ, n and

$$ M_{a,\varphi ,n}(x)=\max_{m\le n} \Biggl\vert \sum _{j=1}^{m}a_{j}\varphi _{j}(x) \Biggr\vert $$

is often called the majorant function. With this inequality, one can further establish the Rademacher–Menshov theorem, i.e., if \(\sum_{k=1}^{\infty}|a_{k}|^{2}\ln ^{2} k<\infty \), then \(\sum_{k=1}^{\infty}a_{k}\varphi _{k}\) converges a.e. for all orthonormal systems { φ k } k N . That boundedness of the majorant function implies a.e. convergence of the series is today a standard technique, see, e.g., the exposition in [23].

For convenience, let us denote

$$ R_{n}=\frac{1}{\ln n}\sup_{a,\varphi} \frac{ \Vert M_{a,\varphi ,n} \Vert _{L^{2}}}{ (\sum_{k=1}^{n} \vert a_{k} \vert ^{2} )^{\frac{1}{2}}}. $$
(8)

For fixed n, \(R_{n}\) is the optimal constant in the right-hand side of (7) (so the constant C in (7) upper bounds \(R_{n}\) for all n). With the help of \(\tilde{T}\), \(R_{n}\) can be estimated as follows.

Corollary 1

For fixed n, the optimal constant \(R_{n}\) as defined in (8) in the Rademacher–Menshov inequality (7) satisfies \(R_{n}\to \frac{1}{\pi}\) as \(n\to \infty \).

Proof

Denote

$$ L_{n}=\sup_{A\in \mathbb{C}^{n\times n}}\frac{ \Vert \tilde{L}(A) \Vert }{ \Vert A \Vert },\qquad T_{n}=\sup_{A\in \mathbb{C}^{n\times n}}\frac{ \Vert \tilde{T}(A) \Vert }{ \Vert A \Vert }.$$

That \(R_{n}\ln n=L_{n}\) can be justified in the following way (see also [13] for a different approach in a probabilistic setting):

Let \(\Lambda =\{\Lambda _{j}\}_{j=1}^{n}\) be a partition of Ω where each \(\Lambda _{j}\) is μ-measurable. Form the matrix \(A^{(\Lambda )}\) whose entries are defined as

$$ A^{(\Lambda )}_{ij}=\varphi _{j}|_{\Lambda _{i}},$$

then \(A^{(\Lambda )}\) is a unitary linear map from \(\mathbb{C}^{n}\) to \(L^{2}(\Lambda _{1})\oplus L^{2}(\Lambda _{2})\oplus \cdots \oplus L^{2}(\Lambda _{n})\) since if \(\vec{a}=(a_{1},a_{2},\ldots ,a_{n})^{T}\in \mathbb{C}^{n}\) and \(f=a_{1}\varphi _{1}+a_{2}\varphi _{2}+\cdots +a_{n}\varphi _{n}\), then

$$ \Vert f \Vert _{L^{2}}^{2}= \bigl\Vert A^{(\Lambda )} \vec{a} \bigr\Vert _{L^{2}(\Lambda )}^{2}= \Vert \vec{a} \Vert ^{2},$$

where \(L^{2}(\Lambda )\) denotes \(L^{2}(\Lambda _{1})\oplus L^{2}(\Lambda _{2})\oplus \cdots \oplus L^{2}( \Lambda _{n})\). Now take

$$ g_{\vec{a}}=\sum_{i=1}^{n} \sum _{j=1}^{i}a_{j}\varphi _{j} |_{\Lambda _{i}},$$

then we have

$$ \Vert g_{\vec{a}} \Vert _{L^{2}}^{2}= \bigl\Vert \tilde{L}\bigl(A^{(\Lambda )}\bigr)\vec{a} \bigr\Vert _{L^{2}( \Lambda )}^{2},$$

consequently

$$ L_{n}=\sup_{A^{(\Lambda )}} \frac{ \Vert \tilde{L}(A^{(\Lambda )}) \Vert }{ \Vert A^{(\Lambda )} \Vert }=\sup _{ \substack{A^{(\Lambda )} \\ \Vert \vec{a} \Vert =1 }} \frac{ \Vert \tilde{L}(A^{(\Lambda )})\vec{a} \Vert _{L^{2}(\Lambda )}}{ \Vert A^{(\Lambda )} \Vert }= \sup_{\substack{\varphi \\ \Vert \vec{a} \Vert =1 }} \Vert g_{\vec{a}} \Vert _{L^{2}}\le \sup_{\substack{\varphi \\ \Vert \vec{a} \Vert =1 }} \Vert M_{a,\varphi ,n} \Vert _{L^{2}}=R_{n} \ln n.$$

On the other hand, consider the following particular partition:

$$\begin{aligned} \tilde{\Lambda}_{j}={}&\Biggl\{ x\in \Omega : \text{(i) } \Biggl\vert \sum _{i=1}^{j}a_{i} \varphi _{i}(x) \Biggr\vert \ge \Biggl\vert \sum _{i=1}^{m}a_{i}\varphi _{i}(x) \Biggr\vert , \forall m \le n; \\ &{} \text{(ii) } j< j' \text{ if } \Biggl\vert \sum _{i=1}^{j}a_{i}\varphi _{i}(x) \Biggr\vert = \Biggl\vert \sum_{i=1}^{j'}a_{i} \varphi _{i}(x) \Biggr\vert \Biggr\} , \end{aligned}$$

i.e., x belongs to \(\tilde{\Lambda}_{j}\) if j is the smallest index where the sum \(|\sum_{i=1}^{j}a_{i}\varphi _{i}(x)|\) attains the value of the majorant function \(M_{n}(x)\) at x. Each \(\tilde{\Lambda}_{j}\) is also measurable, since it is the pre-image of the measurable set \(\mathrm{range}(M_{n})\) under the function mapping \(x\mapsto |\sum_{i=1}^{j}a_{i}\varphi _{i}(x)|\), thus we obtain that (with \(\|a\|=1\))

$$ \bigl\Vert M_{a,\varphi ,n} \bigr\Vert _{L^{2}}^{2}=\sum_{j=1}^{n} \Biggl\Vert \sum_{i=1}^{j}a_{i} \varphi _{i} \Biggr\Vert _{L^{2}(\tilde{\Lambda}_{j})}^{2}= \bigl\Vert \tilde{L} \bigl(A^{( \tilde{\Lambda})} \bigr)\vec{a} \bigr\Vert _{L^{2}(\tilde{\Lambda})}^{2}\le L_{n}^{2},$$

hence, taking the supremum over a and φ, \(R_{n}\ln n\le L_{n}\); together we get \(R_{n}\ln n=L_{n}\). It then easily follows from (2) and Theorem 1 (i) that

$$ R_{n}=\frac{1}{\ln n}L_{n}\asymp \frac{1}{2\ln n}T_{n} \asymp \frac{1}{\pi}.$$

 □
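Numerically, the limiting value can be seen from the exact expression for \(T_{n}\) in Theorem 1 (i) together with the relation \(R_{n}\asymp \frac{1}{2\ln n}T_{n}\) used above; a small numpy sketch (illustrative, not part of the original):

```python
import numpy as np

for n in [10, 100, 1000, 10_000, 100_000]:
    k = np.arange(n)
    T_n = np.abs(1.0 / np.tan((2 * k + 1) * np.pi / (2 * n))).sum() / n  # Theorem 1 (i)
    print(n, T_n / (2 * np.log(n)))       # slowly approaches 1/π ≈ 0.318
```

The convergence is logarithmically slow, so even for large n the printed values sit noticeably above \(1/\pi \).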

4.2 Ordering in Gauss–Seidel type methods

Let A be positive definite with diagonal D and strict lower triangular part L. The error reduction matrix for applying the Gauss–Seidel method to a linear system \(Ax=b\) is \(Q=I-(D+L)^{-1}A\), thus with Theorem 1 (i) we can conclude that the per-cycle error reduction factor is at most (see also [24])

$$ 1-\frac{1}{c\kappa (A)\ln n}, $$
(9)

where \(\kappa (A)\) is the spectral condition number of A, and the constant c is independent of n and A and is approximately \(1/\pi \).
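As an illustration (a numpy sketch, not part of the original; the SPD test matrix and its size are arbitrary choices), one may compare the observed per-cycle reduction factor with the bound (9):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                      # a well-conditioned SPD test matrix
DL = np.tril(A)                                  # D + L: diagonal plus strict lower triangular part
Q = np.eye(n) - np.linalg.solve(DL, A)           # Gauss-Seidel iteration matrix

rho = max(abs(np.linalg.eigvals(Q)))             # observed asymptotic per-cycle reduction factor
kappa = np.linalg.cond(A)
bound = 1 - np.pi / (kappa * np.log(n))          # (9) with c ≈ 1/π; an upper bound, usually pessimistic
print(rho, bound)
```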

A similar result holds for the Kaczmarz method [25], an alternating projection method also known as ART ([26]), whose randomized version has drawn much attention in recent years since [27]. Running the Kaczmarz method on \(Ax=b\) is equivalent to running the Gauss–Seidel method implicitly on \(AA^{*}y=b\) (see [28]). The Kaczmarz method converges even for rank deficient A and inconsistent systems (see [29]); thus with Theorem 1 (iii), the error reduction rate in (9) can be improved in the rank deficient case by replacing the \(\ln n\) factor with \(\ln r\). The same also holds for the Gauss–Seidel method on positive semi-definite matrices.
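The equivalence invoked here (one Kaczmarz sweep through the rows of A corresponds, via \(x=A^{*}y\), to one Gauss–Seidel sweep on \(AA^{*}y=b\)) can be checked in a few lines; a real-valued numpy sketch, illustrative only:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# one Kaczmarz sweep through the rows, started from x = 0
x = np.zeros(n)
for i in range(m):
    x += (b[i] - A[i] @ x) / (A[i] @ A[i]) * A[i]

# one Gauss-Seidel sweep for (A A^T) y = b, started from y = 0
M = A @ A.T
y = np.zeros(m)
for i in range(m):
    y[i] += (b[i] - M[i] @ y) / M[i, i]

assert np.allclose(x, A.T @ y)                   # the two sweeps coincide via x = A^T y
```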

An often observed phenomenon in practice is that rearranging the ordering of the equations may (though need not) accelerate the error reduction. Theorem 1 (iv) and (v) provide an explanation: the linear system in its natural (given) ordering may converge slowly in bad cases where the \(\ln n\) factor in (9) is active, but by (iv) there exists some good ordering that gets rid of this \(\ln n\) factor, while (v) shows that shuffling the equations after each sweep will, on average, also remove it.