1 Introduction

1.1 The Operator, Transfer Matrices and Lyapunov Exponents

Let \(W \geqslant 1\). Let \(\{L_x\}_{x \in {\mathbb {Z}}}\) be a sequence of identically distributed \(W \times W\) random matrices in \({\text {GL}}(W, {\mathbb {R}})\), and let \(\{ V_x \}_{x \in {\mathbb {Z}}}\) be a sequence of identically distributed \(W \times W\) real symmetric matrices, so that \(\{L_x\}_{x \in {\mathbb {Z}}}, \{V_x\}_{x \in {\mathbb {Z}}}\) are jointly independent. Denote by \({\mathcal {L}}\) the support of \(L_0\) and by \({\mathcal {V}}\) the support of \(V_0\). Throughout this paper, we assume that

  1. (A)

    there exists \(\eta > 0\) such that

    $$\begin{aligned} {\mathbb {E}} (\Vert V_0 \Vert ^\eta + \Vert L_0\Vert ^\eta + \Vert L_0^{-1}\Vert ^\eta ) < \infty ~; \end{aligned}$$
  2. (B)

    the Zariski closure of the group generated by \({\mathcal {L}} \, {\mathcal {L}}^{-1}\) in \({\text {GL}}(W, {\mathbb {R}})\) intersects \({\mathcal {L}}\) (this holds, for example, when \(\mathbbm {1} \in {\mathcal {L}}\));

  3. (C)

    \({\mathcal {V}}\) is irreducible (i.e. has no common invariant subspaces except for \(\{0\}\) and \({\mathbb {R}}^W\)), and \({\mathcal {V}} - {\mathcal {V}}\) contains a matrix of rank one.

We are concerned with the spectral properties of the random operator H acting on (a dense subspace of) \(\ell _2 ({\mathbb {Z}} \rightarrow {\mathbb {C}}^W)\) via

$$\begin{aligned} (H \psi )(x) = L_x \psi (x+1) + V_x \psi (x) + L_{x-1}^\intercal \psi (x-1), \quad x \in {\mathbb {Z}}. \end{aligned}$$
(1)

This model, often referred to as a quasi-one-dimensional random operator, is the general Hamiltonian describing a quantum particle with W internal degrees of freedom in a random potential and with nearest-neighbour random hopping. The special case \(L_x \equiv \mathbbm {1}\) is known as the block Anderson model; it is in turn a generalisation of the Anderson model on the strip \({\mathbb {Z}} \times \{1, \ldots , W\}\), and, more generally, on \({\mathbb {Z}} \times \Gamma \), where \(\Gamma \) is any connected finite graph (the assumption that \(\Gamma \) is connected ensures that \({\mathcal {V}}\) is irreducible). Another known special case of (1) is the Wegner orbital model.
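To fix ideas, the following minimal numerical sketch assembles a finite block of (1). The distributional choices here (\(L_x \equiv \mathbbm {1}\) and \(V_x\) tridiagonal with i.i.d. Gaussian diagonal, i.e. the Anderson model on the strip) are ours, for illustration only.

```python
import numpy as np

# A finite block of the operator (1) on sites 0..n-1, as a dense matrix.
# Illustrative choices: L_x = identity (block Anderson model), V_x tridiagonal
# with i.i.d. N(0,1) diagonal (Anderson model on the strip Z x {1,...,W}).
rng = np.random.default_rng(0)
W, n = 3, 50

def sample_V():
    V = np.diag(rng.normal(size=W))                                 # random diagonal
    V += np.diag(np.ones(W - 1), 1) + np.diag(np.ones(W - 1), -1)   # strip hopping
    return V

H = np.zeros((n * W, n * W))
for x in range(n):
    H[x*W:(x+1)*W, x*W:(x+1)*W] = sample_V()            # V_x psi(x)
    if x + 1 < n:
        H[x*W:(x+1)*W, (x+1)*W:(x+2)*W] = np.eye(W)     # L_x psi(x+1)
        H[(x+1)*W:(x+2)*W, x*W:(x+1)*W] = np.eye(W)     # L_x^T psi(x-1) term
assert np.allclose(H, H.T)                              # H is real symmetric
print(np.linalg.eigvalsh(H)[:5])                        # bottom of the spectrum
```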

Fix \(E \in {\mathbb {R}}\). If \(\psi : {\mathbb {Z}} \rightarrow {\mathbb {C}}^W\) is a formal solution of the equation

$$\begin{aligned} L_x \psi (x+1) + V_x \psi (x) + L_{x-1}^\intercal \psi (x-1) = E \psi (x), \quad x \in {\mathbb {Z}}, \end{aligned}$$

then

$$\begin{aligned} \left( {\begin{array}{c}\psi (x+1)\\ \psi (x)\end{array}}\right) = T_x \left( {\begin{array}{c}\psi (x)\\ \psi (x-1)\end{array}}\right) , \end{aligned}$$
(2)

where the one-step transfer matrix \(T_x \in {\text {GL}}(2W, {\mathbb {R}})\) is given by

$$\begin{aligned} T_x = \left( \begin{array}{cc} L_x^{-1}(E \mathbbm {1} - V_x) &{} - L_x^{-1} L_{x-1}^\intercal \\ \mathbbm {1} &{} 0 \end{array}\right) . \end{aligned}$$
(3)
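Indeed, since \(L_x\) is invertible, the equation can be solved for \(\psi (x+1)\):

$$\begin{aligned} \psi (x+1) = L_x^{-1} \left( (E \mathbbm {1} - V_x) \psi (x) - L_{x-1}^\intercal \psi (x-1)\right) , \end{aligned}$$

which is the first block row of (2) with \(T_x\) as in (3); the second block row is the identity \(\psi (x) = \psi (x)\).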

The multi-step transfer matrices \(\Phi _{x, y} \in {\text {GL}}(2W, {\mathbb {R}})\), \(x, y \in {\mathbb {Z}}\), are defined by

$$\begin{aligned} \Phi _{x,y} = {\left\{ \begin{array}{ll} T_{x-1} \ldots T_y, &{} x > y \\ \mathbbm {1}, &{} x = y \\ T_{x}^{-1} \ldots T_{y-1}^{-1}, &{} x < y, \end{array}\right. } \end{aligned}$$
(4)

so that

$$\begin{aligned} \Phi _{x,y} \left( {\begin{array}{c}\psi (y)\\ \psi (y-1)\end{array}}\right) = \left( {\begin{array}{c}\psi (x)\\ \psi (x-1)\end{array}}\right) . \end{aligned}$$
(5)
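Note that \(\{\Phi _{x,y}\}\) enjoys the cocycle property

$$\begin{aligned} \Phi _{x,y} \Phi _{y,z} = \Phi _{x,z}, \quad x, y, z \in {\mathbb {Z}}, \end{aligned}$$

as one checks directly from (4) by cancelling adjacent factors.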

In particular, \(T_x = \Phi _{x+1,x}\). We abbreviate \(\Phi _{N} = \Phi _{N,0}\). The Lyapunov exponents \(\gamma _j(E)\), \(1 \leqslant j \leqslant 2W\), are defined as

$$\begin{aligned} \gamma _j(E) = \lim _{N \rightarrow \infty } \frac{1}{N} {\mathbb {E}} \log s_j(\Phi _N(E)), \end{aligned}$$

where \(s_j\) stands for the jth singular value. It is known [11] that (for fixed E) this limit in expectation is also an almost sure limit. The cocycle \(\{\Phi _{x,y}\}\) is conjugate to a symplectic one (see Sect. 3.1), and hence

$$\begin{aligned} \gamma _j (E) = - \gamma _{2W+1-j} (E), \quad j =1,\ldots , W. \end{aligned}$$

Further, as we shall see in Sect. 3.2, one can use the work of Goldsheid [15] to verify the conditions of the Goldsheid–Margulis theorem [16] on the simplicity of the Lyapunov spectrum, and deduce that

$$\begin{aligned} \gamma _1(E)> \gamma _2(E)> \cdots> \gamma _W(E) > 0. \end{aligned}$$

We also mention that the Lyapunov exponents \(\gamma _j(E)\) are continuous functions of E. This was proved by Furstenberg and Kifer in [12]; it can also be deduced from the large deviation estimate (27)—see Duarte and Klein [8].
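The following numerical sketch estimates the Lyapunov exponents by the standard re-orthogonalisation procedure: the logarithms of the diagonal entries of the R factors in repeated QR factorisations of the product \(\Phi _N(E)\) are accumulated, which avoids floating-point overflow. The distributional choices are again ours, for illustration only.

```python
import numpy as np

# Estimate gamma_1 >= ... >= gamma_{2W} for the cocycle (3)-(4).
# Illustrative choices: L_x = identity, V_x tridiagonal with i.i.d. N(0,1)
# diagonal (Anderson model on the strip).
rng = np.random.default_rng(1)
W, E, N = 2, 0.5, 20000

def transfer():
    V = np.diag(rng.normal(size=W))
    V += np.diag(np.ones(W - 1), 1) + np.diag(np.ones(W - 1), -1)
    T = np.zeros((2 * W, 2 * W))
    T[:W, :W] = E * np.eye(W) - V    # L_x^{-1}(E - V_x), with L_x = identity
    T[:W, W:] = -np.eye(W)           # -L_x^{-1} L_{x-1}^T
    T[W:, :W] = np.eye(W)
    return T

Q, logs = np.eye(2 * W), np.zeros(2 * W)
for _ in range(N):
    Q, R = np.linalg.qr(transfer() @ Q)   # re-orthogonalise at every step
    logs += np.log(np.abs(np.diag(R)))    # accumulate logarithmic growth
gamma = logs / N
print(np.round(gamma, 3))   # observe gamma_j = -gamma_{2W+1-j} and gamma_W > 0
```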

1.2 The Main Results

Theorem 1

Assume (A)–(C). Then, the spectrum of H is almost surely pure point. Moreover, if

$$\begin{aligned} {\mathcal {E}}[H] = \left\{ (E, \psi ) \in {\mathbb {R}} \times \ell _2({\mathbb {Z}} \rightarrow {\mathbb {C}}^W) \, : \, \Vert \psi \Vert = 1, \, H \psi = E\psi \right\} \end{aligned}$$

is the collection of eigenpairs of H, then

$$\begin{aligned} {\mathbb {P}} \left\{ \forall (E, \psi ) \in {\mathcal {E}}[H] \,\,\, \limsup _{x \rightarrow \pm \infty } \frac{1}{|x|} \log \Vert \psi (x)\Vert \leqslant - \gamma _W(E)\right\} =1, \end{aligned}$$
(6)

i.e. each eigenfunction decays exponentially, with the rate lower-bounded by the slowest Lyapunov exponent.

Remark 1.1

It is believed that the lower bound is sharp, i.e. the rate of decay cannot be faster than the slowest Lyapunov exponent:

$$\begin{aligned} {\mathbb {P}} \left\{ \forall (E, \psi ) \in {\mathcal {E}}[H] \,\,\, \liminf _{x \rightarrow \pm \infty } \frac{1}{|x|} \log \Vert \psi (x)\Vert \geqslant - \gamma _W(E)\right\} =1. \end{aligned}$$
(7)

We refer to [18] for a discussion and partial results in this direction. For \(W=1\), (7) was proved by Craig and Simon in [7].

The property of having pure point spectrum with exponentially decaying eigenfunctions is a manifestation of Anderson localisation of the random operator H. The mathematical work on Anderson localisation in one dimension was initiated by Goldsheid, Molchanov and Pastur [17], who considered the case \(W=1\), \(L_x \equiv 1\) and established the pure point nature of the spectrum under the assumption that the distribution of \(V_x\) is regular enough (absolutely continuous with bounded density). A different proof of the result of [17] was found by Kunz and Souillard [25]. Under the same assumptions, the exponential decay of the eigenfunctions was established by Molchanov [29]. The case of singular distributions was treated by Carmona, Klein, and Martinelli [6].

The case \(W >1\) was first considered by Goldsheid [14], who established the pure point nature of the spectrum for the case of the Schrödinger operator on the strip, i.e. when \(L_x \equiv \mathbbm {1}\), \(V_x\) is tridiagonal with the off-diagonal entries equal to 1 and the diagonal ones independent and identically distributed, under the assumption that the distribution of the diagonal entries of \(V_x\) is regular. In the same setting, Lacroix [26,27,28] proved that the eigenfunctions decay exponentially. The case of the Anderson model on a strip with general (possibly, singular) distributions was settled by Klein–Lacroix–Speis [23], who established localisation in the strong form (6).

Unlike the earlier, more direct arguments treating regular distributions, the works [6, 23] allowing singular distributions involve a multi-scale argument (as developed in the work of Fröhlich–Spencer [10] on localisation in higher dimension); the theory of random matrix products is used to verify the initial hypothesis of multi-scale analysis. Recently, proofs of the result of [6] avoiding multi-scale analysis were found by Bucaj et al. [5], Jitomirskaya and Zhu [22], and Gorodetski and Kleptsyn [20]; the general one-dimensional case (allowing for random hopping) was settled by Rangamani [30]. Our Theorem 1 can be seen as a generalisation of these works, and especially of [22, 30], to which our arguments are closest in spirit: we give a relatively short and single-scale proof of localisation which applies to arbitrary \(W \geqslant 1\), and allows for rather general distributions of \(V_0\) and \(L_0\) (under no regularity assumptions on the distribution of the potential). In particular, we recover and generalise the result of [23].

In fact, we prove a stronger result pertaining to the eigenfunction correlators, introduced by Aizenman [1] (see further the monograph of Aizenman–Warzel [3]). If \(\Lambda \subset {\mathbb {Z}}\) is a finite set, denote by \(H_\Lambda \) the restriction of H to \(\ell _2(\Lambda \rightarrow {\mathbb {C}}^W)\), i.e.

$$\begin{aligned} H_\Lambda = P_\Lambda H P_\Lambda ^*, \end{aligned}$$

where \(P_\Lambda : \ell _2({\mathbb {Z}} \rightarrow {\mathbb {C}}^W) \rightarrow \ell _2(\Lambda \rightarrow {\mathbb {C}}^W)\) is the coordinate projection. If \(I \subset {\mathbb {R}}\) is a compact interval, denote

$$\begin{aligned}&Q_I^\Lambda (x, y) = \sup \left\{ \Vert f(H_{\Lambda })_{x,y}\Vert \, : \, {\text {supp}} f \subset I, \, |f| \leqslant 1 \right\} ,\\&\quad Q_I(x, y) = \sup _{a \leqslant x, y \leqslant b} Q_I^{[a,b]}(x, y). \end{aligned}$$

Here \(\Vert f(H_{\Lambda })_{x,y}\Vert \) is the operator norm of the (x, y) block of \(f(H_{\Lambda })\), and the functions f in the supremum are assumed to be, say, Borel measurable.
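For intuition: expanding in the orthonormal eigenbasis \(H_\Lambda = \sum _k E_k \psi _k \psi _k^\intercal \), one has \(f(H_\Lambda )_{x,y} = \sum _k f(E_k) \psi _k(x) \psi _k(y)^\intercal \), whence \(Q_I^\Lambda (x, y) \leqslant \sum _{E_k \in I} \Vert \psi _k(x) \psi _k(y)^\intercal \Vert \); for \(W = 1\) and simple spectrum the bound is attained by a suitable choice of signs \(f(E_k) = \pm 1\). A minimal numerical sketch of this bound (the random matrix below is an illustrative stand-in for \(H_\Lambda \)):

```python
import numpy as np

def correlator_bound(H, W, x, y, I):
    """Bound Q_I^Lambda(x, y) by the sum, over eigenvalues E_k in I, of
    || psi_k(x) psi_k(y)^T ||; this dominates ||f(H)_{x,y}|| for every
    |f| <= 1 supported in I (attained for W = 1 with simple spectrum)."""
    E, U = np.linalg.eigh(H)
    bx, by = slice(x * W, (x + 1) * W), slice(y * W, (y + 1) * W)
    ks = np.where((E >= I[0]) & (E <= I[1]))[0]
    return sum(np.linalg.norm(np.outer(U[bx, k], U[by, k]), 2) for k in ks)

# usage on a random symmetric block-tridiagonal matrix (illustrative only)
rng = np.random.default_rng(0)
W, n = 2, 40
H = np.zeros((n * W, n * W))
for s in range(n):
    A = rng.normal(size=(W, W))
    H[s*W:(s+1)*W, s*W:(s+1)*W] = (A + A.T) / 2
    if s + 1 < n:
        H[s*W:(s+1)*W, (s+1)*W:(s+2)*W] = np.eye(W)
        H[(s+1)*W:(s+2)*W, s*W:(s+1)*W] = np.eye(W)
print(correlator_bound(H, W, x=5, y=30, I=(-1.0, 1.0)))
```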

Theorem 2

Assume (A)–(C). For any compact interval \(I \subset {\mathbb {R}}\),

$$\begin{aligned} {\mathbb {P}}\left\{ \limsup _{x \rightarrow \pm \infty } \frac{1}{|x|} \log Q_I(x, y) \leqslant - \inf _{E \in I} \gamma _W(E) \right\} = 1. \end{aligned}$$
(8)

It is known (see [3]) that Theorem 2 implies Theorem 1. By plugging in various choices of f, it also implies (almost sure) dynamical localisation with the sharp rate of exponential decay, the exponential decay of the Fermi projection, etc. (see e.g. [2] and [3]). We chose to state Theorem 1 as a separate result rather than a corollary of Theorem 2 since its direct proof is somewhat shorter than that of the latter.

We refer to Bucaj et al. [5], Jitomirskaya–Zhu [22], and Ge–Zhao [13] for earlier results on dynamical localisation for \(W = 1\).

1.3 Main Ingredients of the Proof

Similarly to many of the previous works, including [6, 23] and also the recent works [5, 20, 22], the two main ingredients of the proof of localisation are a large deviation estimate and a Wegner-type estimate. We state these in the generality required here. Let \(I \subset {\mathbb {R}}\) be a compact interval, and let \(F \subset {\mathbb {R}}^{2W}\) be a Lagrangian subspace (see Sect. 3). Denote by \(\pi _F: {\mathbb {R}}^{2W} \rightarrow F\) the orthogonal projection onto F.

Proposition 1.2

Assume (A)–(C). For any \(\epsilon > 0\) there exist \(C, c> 0\) such that for any \(E \in I\) and any Lagrangian subspace \(F \subset {\mathbb {R}}^{2W}\)

$$\begin{aligned} {\mathbb {P}} \left\{ \left| \frac{1}{N} \log s_W(\Phi _N(E) \pi _F^*) - \gamma _W(E) \right| \geqslant \epsilon \right\} \leqslant C e^{-cN}. \end{aligned}$$
(9)

The proof is essentially given in [23]; we outline the necessary reductions in Sect. 3.1. The second proposition could be also proved along the lines of the special case considered in [23]; we present an alternative (arguably, simpler) argument in Sect. 3.3.

For an operator H and E in the resolvent set of H, we denote by \(G_E[H] = (H-E)^{-1}\) the resolvent of H and by \(G_E[H](\cdot , \cdot )\) its matrix elements. If E lies in the spectrum of H, we set \(G_E[H](\cdot , \cdot ) \equiv \infty \).

Proposition 1.3

Assume (A)–(C). For any \(\epsilon > 0\) there exist \(C, c> 0\) such that for any \(E \in I\) and \(N \geqslant 1\)

$$\begin{aligned} \begin{aligned} {\mathbb {P}} \left\{ \Vert G_E[H_{[-N,N]}](i, i) \Vert \geqslant e^{\epsilon N} \right\} \leqslant C e^{-cN} \quad&(i \in [-N,N])\\ {\mathbb {P}} \left\{ \Vert G_E[H_{[-N,N]}](i, i\pm 1) \Vert \geqslant e^{\epsilon N} \right\} \leqslant C e^{-cN} \quad&(i, i \pm 1\in [-N,N]). \end{aligned} \end{aligned}$$
(10)

Remark 1.4

The arguments which we present can be applied to deduce the following strengthening of (10):

$$\begin{aligned} {\mathbb {P}} \left\{ {\text {dist}}(E, \sigma (H_{[-N,N]})) \leqslant e^{-\epsilon N} \right\} \leqslant C e^{-cN}. \end{aligned}$$

We content ourselves with (10), which suffices for the proof of the main theorems.

Klein, Lacroix and Speis [23] use (special cases of) Propositions 1.2 and 1.3 to verify the assumptions required for multi-scale analysis. We deduce Theorems 1 and 2 directly from these propositions. In this aspect, our general strategy is similar to the cited works [5, 20, 22]. However, several of the arguments employed in these works rely on the special features of the model for \(W=1\); therefore, our implementation of the strategy differs in several crucial aspects.

2 Proof of the Main Theorems

2.1 Resonant Sites; The Main Technical Proposition

Let \(\tau > 0\) be a (small) number. We say that \(x\in {\mathbb {Z}}\) is \((\tau , E, N)\)-non-resonant (\(x \notin {\text {Res}}(\tau , E,N)\)) if

$$\begin{aligned} {\left\{ \begin{array}{ll} \Vert L_x\Vert \leqslant e^{\tau N}, \\ \Vert G_E[H_{[x-N, x+N]}](x, x\pm N)\Vert \leqslant e^{-(\gamma _W(E) - \tau )N}, \end{array}\right. } \end{aligned}$$
(11)

and \((\tau , E,N)\)-resonant (\(x \in {\text {Res}}(\tau , E,N)\)) otherwise. The following proposition is the key step towards the proof of Theorems 1 and 2.

Proposition 2.1

Assume (A)–(C). Let \(I \subset {\mathbb {R}}\) be a compact interval, and let \(\tau > 0\). There exist \(C, c> 0\) such that for any \(N \geqslant 1\)

$$\begin{aligned} {\mathbb {P}} \left\{ \max _{E \in I} {\text {diam}}({\text {Res}}(\tau , E, N) \cap [-N^2, N^2]) > 2N \right\} \leqslant C e^{-cN}. \end{aligned}$$

The remainder of this section is organised as follows. In Sect. 2.2, we express the Green function in terms of the transfer matrices. Using this expression and Propositions 1.2 and 1.3, we show that the probability that \(x \in {\text {Res}}(\tau , E, N)\) (for a fixed \(E \in {\mathbb {R}}\)) is exponentially small. In Sect. 2.3, we rely on this estimate to prove Proposition 2.1. Then, we use this proposition to prove Theorem 1 (Sect. 2.4) and Theorem 2 (Sect. 2.5).

2.2 Reduction to Transfer Matrices

Fix \(N \geqslant 1\). Consider the \(W \times W\) matrices

$$\begin{aligned} \begin{aligned} \Psi _i^+&= (\mathbbm {1} \,\, 0) \, \Phi _{i, N+1} \left( {\begin{array}{c}0\\ \mathbbm {1}\end{array}}\right) = (0 \,\, \mathbbm {1}) \Phi _{i+1,N+1} \left( {\begin{array}{c}0\\ \mathbbm {1}\end{array}}\right) , \\ \Psi _i^-&= (\mathbbm {1} \,\, 0) \, \Phi _{i, -N} \left( {\begin{array}{c}\mathbbm {1}\\ 0\end{array}}\right) = (0 \,\, \mathbbm {1}) \Phi _{i+1,-N} \left( {\begin{array}{c}\mathbbm {1}\\ 0\end{array}}\right) . \end{aligned} \end{aligned}$$
(12)

The Green function of \(H_{[-N,N]}\) can be expressed in terms of these matrices using the following claim, which holds deterministically for any H of the form (1). A similar expression has been employed already in [14].

Claim 2.2

If \(E \notin \sigma (H_{[-N,N]})\), then:

  1. 1.
    $$\begin{aligned} \left( {\begin{array}{c}\Psi _{\pm 1}^\pm \\ \Psi _{0}^\pm \end{array}}\right) G_E[H_{[-N,N]}](0, \pm N) = \left( {\begin{array}{c}G_E[H_{[-N,N]}](0, \pm 1)\\ G_E[H_{[-N,N]}](0,0)\end{array}}\right) ;\end{aligned}$$
  2. 2.

    for any \(i,j \in [-N, N]\),

    $$\begin{aligned} G_E[H_{[-N, N]}](i, j) = {\left\{ \begin{array}{ll} \Psi _j^{-} (\Psi _i^-)^{-1} \left( \Psi _{i+1}^+ (\Psi _i^+)^{-1} - \Psi _{i+1}^- (\Psi _i^-)^{-1}\right) ^{-1} L_i^{-1}, &{}i \geqslant j\\ \Psi _j^{+} (\Psi _i^+)^{-1} \left( \Psi _{i+1}^+ (\Psi _i^+)^{-1} - \Psi _{i+1}^- (\Psi _i^-)^{-1}\right) ^{-1} L_i^{-1}, &{}i \leqslant j. \end{array}\right. } \end{aligned}$$

Proof

Abbreviate \(G_E = G_E[H_{[-N,N]}]\), and set \(G_E(i, j) = 0\) for \(j \notin [-N, N]\). The matrices \(G_E(i, j)\), \(-N \leqslant j \leqslant N\), are uniquely determined by the system of equations

$$\begin{aligned}&L_j G_E(i, j+1) + (V_j - E \mathbbm {1}) G_E(i, j) + L_{j-1}^\intercal G_{E}(i,j-1) = \delta _{j,i} {\mathbbm {1}},\nonumber \\&\quad -N \leqslant j \leqslant N. \end{aligned}$$
(13)

We look for a solution of the form

$$\begin{aligned} G_E(i, j) = {\left\{ \begin{array}{ll} \Psi _j^- \alpha _i^-, &{} j\leqslant i \\ \Psi _j^+ \alpha _i^+, &{} j\geqslant i, \end{array}\right. } \end{aligned}$$
(14)

where

$$\begin{aligned} \Psi _i^- \alpha _i^- - \Psi _i^+ \alpha _i^+&= 0 \end{aligned}$$
(15)
$$\begin{aligned} \Psi _{i+1}^- \alpha _i^- - \Psi _{i+1}^+ \alpha _i^+&= - L_i^{-1}. \end{aligned}$$
(16)

The first equation ensures that (14) defines \(G_E(i,i)\) consistently, while the second one guarantees that (13) holds for \(j=i\). For the other values of j, (13) follows from the construction of the matrices \(\Psi ^{\pm }_j\).

The solution to (15)–(16) is explicitly found by elimination:

$$\begin{aligned} \alpha _i^- = (\Psi _i^-)^{-1} \Psi _i^+ \alpha _i^+, \quad \alpha _i^+ = - ( \Psi _{i+1}^- (\Psi _i^-)^{-1} \Psi _i^+ - \Psi _{i+1}^+)^{-1} L_i^{-1}. \end{aligned}$$

This implies the second part of the claim. For the first part, note that for \(j \geqslant 0\)

$$\begin{aligned} G_E(0,j) = \Psi _j^+ \alpha _0^+ = \Psi _j^+ (\Psi _0^+)^{-1} G_E(0,0) = \Psi _j^+ (\Psi _1^+)^{-1} G_E(0,1).\end{aligned}$$

Observing that \(\Psi _N^+ = \mathbbm {1}\), we conclude that

$$\begin{aligned} G_E(0,N) = (\Psi _0^+)^{-1} G_E(0,0) = (\Psi _1^+)^{-1} G_E(0,1), \end{aligned}$$

as claimed. Similarly,

$$\begin{aligned} G_E(0,-N) = (\Psi _0^-)^{-1} G_E(0,0) = (\Psi _{-1}^-)^{-1} G_E(0,-1). \end{aligned}$$

\(\square \)
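Claim 2.2 is convenient to verify numerically. The following self-contained sketch (with arbitrary illustrative distributions) compares part 2 with direct matrix inversion; per the convention implicit in (13), \(G_E(i,j)\) is the (j, i) block of \((H_{[-N,N]}-E)^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(1)
W, N, E = 2, 3, 0.7
L = {x: rng.normal(size=(W, W)) + 3 * np.eye(W) for x in range(-N - 1, N + 1)}
sym = lambda A: (A + A.T) / 2
V = {x: sym(rng.normal(size=(W, W))) for x in range(-N, N + 1)}

def T(x):                     # one-step transfer matrix (3)
    Li = np.linalg.inv(L[x])
    return np.block([[Li @ (E * np.eye(W) - V[x]), -Li @ L[x - 1].T],
                     [np.eye(W), np.zeros((W, W))]])

def Phi(x, y):                # multi-step transfer matrix (4)
    if x < y:
        return np.linalg.inv(Phi(y, x))
    M = np.eye(2 * W)
    for t in range(y, x):     # Phi(x, y) = T_{x-1} ... T_y
        M = T(t) @ M
    return M

Psi_p = lambda i: Phi(i, N + 1)[:W, W:]   # Psi_i^+ of (12)
Psi_m = lambda i: Phi(i, -N)[:W, :W]      # Psi_i^- of (12)

n = 2 * N + 1                 # direct inversion of H_{[-N,N]} - E
H = np.zeros((n * W, n * W))
for k, x in enumerate(range(-N, N + 1)):
    H[k*W:(k+1)*W, k*W:(k+1)*W] = V[x]
    if x < N:
        H[k*W:(k+1)*W, (k+1)*W:(k+2)*W] = L[x]
        H[(k+1)*W:(k+2)*W, k*W:(k+1)*W] = L[x].T
G = np.linalg.inv(H - E * np.eye(n * W))
blk = lambda a: slice((a + N) * W, (a + N + 1) * W)

def G_claim(i, j):            # Claim 2.2, part 2
    K = np.linalg.inv(Psi_p(i + 1) @ np.linalg.inv(Psi_p(i))
                      - Psi_m(i + 1) @ np.linalg.inv(Psi_m(i)))
    Pj, Pi = (Psi_m(j), Psi_m(i)) if j <= i else (Psi_p(j), Psi_p(i))
    return Pj @ np.linalg.inv(Pi) @ K @ np.linalg.inv(L[i])

for i, j in [(-1, 2), (2, -1), (0, 0)]:
    assert np.allclose(G[blk(j), blk(i)], G_claim(i, j))
print("Claim 2.2(2) verified")
```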

2.3 Proof of Proposition 2.1

Fix a small \(\tau > 0\). Without loss of generality, I is short enough to ensure that

$$\begin{aligned} \max _{E \in I} \gamma _W(E) - \min _{E \in I} \gamma _W(E) \leqslant \frac{\tau }{2} \end{aligned}$$

(this property is valid for short intervals due to the continuity of \(\gamma _W\); the statement for larger intervals I follows by compactness). Fix such I (which will be suppressed from the notation), and let

$$\begin{aligned} \gamma = \frac{1}{2} (\max _{E \in I} \gamma _W(E) + \min _{E \in I} \gamma _W(E)), \quad \text {so that} \quad \sup _{E \in I} | \gamma _W(E) - \gamma | \leqslant \frac{\tau }{4}.\end{aligned}$$

For \(x \in {\mathbb {Z}}\), let

$$\begin{aligned} {\text {Res}}^*(\tau , x, N) = \left\{ E \in I \, : \, \max _\pm \Vert G_E[H_{[x-N,x+N]}](x,x\pm N) \Vert _{1,\infty } \geqslant e^{-(\gamma - \frac{\tau }{2})N} \right\} ,\end{aligned}$$

where \(\Vert A\Vert _{1,\infty } = \max _{1 \leqslant \alpha , \beta \leqslant W} |A_{\alpha ,\beta }|\). For N large enough (\(N \geqslant N_0(\tau )\)),

$$\begin{aligned} \left( \Vert L_x \Vert \leqslant e^{\tau N} \right) \,\, \text {and} \,\, \left( E \notin {\text {Res}}^*(\tau , x, N) \right) \Longrightarrow x \notin {\text {Res}}(\tau , E, N).\end{aligned}$$

By (A) and the Chebyshev inequality

$$\begin{aligned} {\mathbb {P}} \left\{ \exists x \in [-N^2, N^2] \, : \, \Vert L_x\Vert \geqslant e^{\tau N} \right\} \leqslant (2N^2 + 1) \frac{{\mathbb {E}} \Vert L_0\Vert ^\eta }{e^{\tau \eta N}} \leqslant C_1 e^{-c_1N}. \end{aligned}$$

Hence, the proposition boils down to the following statement:

$$\begin{aligned} |x-y| >2N \Longrightarrow {\mathbb {P}} \left\{ {\text {Res}}^*(\tau , x, N) \cap {\text {Res}}^*(\tau , y, N) \ne \varnothing \right\} \leqslant C e^{-cN}. \end{aligned}$$
(17)

The proof of (17) rests on two claims. The first one is deterministic:

Claim 2.3

\({\text {Res}}^*(\tau , x, N)\) is the union of at most \(C_W N\) disjoint closed intervals.

Proof

By Cramer’s rule, for each \(\alpha ,\beta \in \{1, \ldots , W\}\) and each choice of sign, the function

$$\begin{aligned} g_{\alpha ,\beta }^\pm : E \mapsto (G_E[H_{[x-N,x+N]}](x, x\pm N))_{\alpha ,\beta } \end{aligned}$$

is the ratio of two polynomials of degree \(\leqslant W(2N+1)\). Hence, the level set

$$\begin{aligned} \left\{ E \, : \, |g_{\alpha ,\beta }^\pm (E)| = e^{-(\gamma - \frac{\tau }{2})N} \right\} \end{aligned}$$

is of cardinality \(\leqslant W(2N+1)\) (note that the \(\leqslant W(2N+1)\) discontinuity points of \(g_{\alpha ,\beta }^\pm \) are poles; hence, they cannot serve as the endpoints of the superlevel sets of this function). Hence, our set

$$\begin{aligned} \left\{ E \, : \, |g_{\alpha ,\beta }^\pm (E)| \geqslant e^{-(\gamma - \frac{\tau }{2})N} \right\} \end{aligned}$$

is the union of at most \(W(2N+1)/2\) closed intervals, and \({\text {Res}}^*(\tau , x, N)\) is the union of at most

$$\begin{aligned} 2 \, \frac{W(W+1)}{2} \, \frac{W(2N+1)}{2} \leqslant C_W N\end{aligned}$$

closed intervals. \(\square \)

Claim 2.4

Assume (A)–(C). For any compact interval \(I \subset {\mathbb {R}}\) there exist \(C, c> 0\) such that for any \(N \geqslant 1\) and any \(E \in I\),

$$\begin{aligned} {\mathbb {P}}\left\{ E \in {\text {Res}}^*(\tau , x, N) \right\} \leqslant C e^{-cN}.\end{aligned}$$

Proof

According to Claim 2.2,

$$\begin{aligned} \Vert G_E[H_{[-N,N]}](0, \pm N)\Vert \leqslant \left\{ s_W \left( {\begin{array}{c}\Psi _{\pm 1}^\pm \\ \Psi _{0}^\pm \end{array}}\right) \right\} ^{-1} \, \Vert \left( {\begin{array}{c}G_E[H_{[-N,N]}](0, \pm 1)\\ G_E[H_{[-N,N]}](0,0)\end{array}}\right) \Vert ~;\end{aligned}$$

hence

$$\begin{aligned}&{\mathbb {P}} \left\{ \Vert G_E[H_{[-N,N]}](0, \pm N) \Vert \geqslant e^{-(\gamma _W(E) - \frac{\tau }{4})N} \right\} \\&\quad \leqslant {\mathbb {P}}\left\{ s_W \left( {\begin{array}{c}\Psi _{\pm 1}^\pm \\ \Psi _0^\pm \end{array}}\right) \leqslant e^{(\gamma _W(E) - \frac{\tau }{8})N } \right\} + {\mathbb {P}}\left\{ \Vert \left( {\begin{array}{c}G_E[H_{[-N,N]}](0, \pm 1)\\ G_E[H_{[-N,N]}](0,0)\end{array}}\right) \Vert \geqslant e^{\frac{\tau }{8} N} \right\} . \end{aligned}$$

By Propositions 1.2 and 1.3, both terms decay exponentially in N, locally uniformly in E. \(\square \)

Now we can prove (17). By Claim 2.3 both \({\text {Res}}^*(\tau , x, N)\) and \({\text {Res}}^*(\tau , y, N)\) are unions of at most \(C_W N\) closed intervals. If these two sets intersect, then either one of the edges of the intervals composing the first one lies in the second one, or vice versa. The operators \(H_{[x-N, x+N]}\) and \(H_{[y-N, y+N]}\) are independent due to the assumption \(|x-y|> 2N\), hence by Claim 2.4

$$\begin{aligned} {\mathbb {P}} \left\{ {\text {Res}}^*(\tau , x, N) \cap {\text {Res}}^*(\tau , y, N) \ne \varnothing \right\} \leqslant 4 C_W N \times C e^{-cN} \leqslant C_1 e^{-c_1 N}. \end{aligned}$$

This concludes the proof of (17) and of Proposition 2.1. \(\square \)

2.4 Spectral Localisation: Proof of Theorem 1

The proof of localisation is based on Schnol’s lemma, which we now recall (see [21] for a version applicable in the current setting). A function \(\psi : {\mathbb {Z}} \rightarrow {\mathbb {C}}^W\) is called a generalised eigenfunction corresponding to a generalised eigenvalue \(E \in {\mathbb {R}}\) if

$$\begin{aligned}&L_x \psi (x+1) + V_x \psi (x) + L_{x-1}^\intercal \psi (x-1) = E \psi (x), \quad x \in {\mathbb {Z}}, \end{aligned}$$
(18)
$$\begin{aligned}&\limsup _{|x| \rightarrow \infty } \frac{1}{|x|} \log \Vert \psi (x)\Vert = 0. \end{aligned}$$
(19)

Schnol’s lemma asserts that any spectral measure of H is supported on the set of generalised eigenvalues. Thus, we need to show that (with full probability) any generalised eigenpair \((E, \psi )\) satisfies

$$\begin{aligned} \limsup _{|x| \rightarrow \infty } \frac{1}{|x|} \log \Vert \psi (x)\Vert \leqslant - \gamma _W(E). \end{aligned}$$
(20)

Fix a compact interval \(I \subset {\mathbb {R}}\), and \(\tau > 0\). Consider the events

$$\begin{aligned} {\mathcal {G}}_M(I, \tau ) = \left\{ \forall E \in I \,\, \forall N \geqslant M \,\, {\text {diam}}( {\text {Res}}(\tau , E, N) \cap [-N^2, N^2])\leqslant 2N \right\} . \end{aligned}$$

By Proposition 2.1 and the Borel–Cantelli lemma,

$$\begin{aligned} {\mathbb {P}} \left( \bigcup _{M \geqslant 1} {\mathcal {G}}_M(I, \tau )\right) = 1. \end{aligned}$$

We shall prove that on any \({\mathcal {G}}_M(I, \tau )\) every generalised eigenpair \((E, \psi )\) with \(E \in I\) satisfies

$$\begin{aligned} \limsup _{|x| \rightarrow \infty } \frac{1}{|x|} \log \Vert \psi (x)\Vert \leqslant - \gamma _W(E) +4\tau . \end{aligned}$$
(21)

From (18), we have for any x

$$\begin{aligned} \psi (x) = - G_E[H_{[x-N, x+N]}] (x, x-N)\, L_{x-N-1}^\intercal \,\psi (x-N-1) - G_E[H_{[x-N, x+N]}] (x, x+N)\, L_{x+N}\, \psi (x+N+1). \end{aligned}$$

If \(x \notin {\text {Res}}(\tau , E, N)\), this implies

$$\begin{aligned} \begin{aligned} \Vert \psi (x)\Vert&\leqslant e^{-(\gamma _W(E) - 2\tau ) N} (\Vert \psi (x-N-1)\Vert + \Vert \psi (x+N+1)\Vert ) \\&\leqslant 2e^{-(\gamma _W(E) - 2\tau ) N} \max (\Vert \psi (x-N-1)\Vert , \Vert \psi (x+N+1)\Vert ), \end{aligned} \end{aligned}$$

whence \(f_\tau (x) \overset{\text {def}}{=} e^{-\tau |x|} \Vert \psi (x)\Vert \) satisfies

$$\begin{aligned} f_\tau (x) \leqslant 2 e^{-(\gamma _W(E) - 3\tau ) N}\max (f_\tau (x-N-1), f_\tau (x+N+1)). \end{aligned}$$
(22)

The function \(f_\tau \) tends to zero as \(|x| \rightarrow \infty \) due to (19); hence, it achieves its maximum at some \(x_\psi \in {\mathbb {Z}}\). For

$$\begin{aligned} N > \log 2 / (\gamma _W(E) - 3\tau ), \end{aligned}$$

(22) cannot hold at \(x = x_\psi \), thus on \({\mathcal {G}}_M(I, \tau )\) for all

$$\begin{aligned} N \geqslant N_0 \overset{\text {def}}{=} \max (M, \log 2 / (\gamma _W(E) - 3\tau ), |x_\psi |) \end{aligned}$$

we have:

$$\begin{aligned} {\text {Res}}(\tau , E, N) \cap [-N^2, N^2] \subset [x_\psi - 2N, x_\psi + 2N] \subset [-3N, 3N]. \end{aligned}$$

Thus, (22) holds whenever \(x, N\) are such that \(3N < |x| \leqslant N^2\) and \(N \geqslant N_0\).

For each \(x \in {\mathbb {Z}}\), let \(N = N(x)\) be such that \(N^2/10 \leqslant |x| \leqslant N^2 / 5\). If |x| is large enough, \(N(x) \geqslant N_0\). Applying (22) \(\lfloor |x|/(N+1) \rfloor - 4\) times, we obtain

$$\begin{aligned} f_\tau (x) \leqslant \left( 2e^{-(\gamma _W(E)-3\tau )N}\right) ^{\lfloor |x|/(N+1)\rfloor - 4} \max f_\tau \leqslant e^{-(\gamma _W(E)-3\tau )|x| + C(\sqrt{|x|}+1)} \max f_\tau , \end{aligned}$$

which implies (21). \(\square \)

2.5 Eigenfunction Correlator: Proof of Theorem 2

Fix a compact interval \(I \subset {\mathbb {R}}\), and let \(\gamma = \min _{E \in I} \gamma _W(E)\). The proof of (8) relies on the following fact from [9, Lemma 4.1], based on an idea from [3]:

$$\begin{aligned} Q_I^\Lambda (x, y) \leqslant \lim _{\epsilon \rightarrow + 0} \frac{\epsilon }{2} \int _I \! \Vert G_E[H_\Lambda ](x, y)\Vert ^{1-\epsilon } \mathrm{d}E \leqslant W. \end{aligned}$$
(23)

Our goal is to bound this quantity uniformly in the interval \(\Lambda \supset \{x,y\}\). Without loss of generality we can assume that \(x = 0\). Choose N such that \(N^2/10 \leqslant |y| \leqslant N^2/5\). By Proposition 2.1, for any \(\tau \in (0, \gamma )\)

$$\begin{aligned} {\mathbb {P}} \left\{ \forall E \in I \, {\text {diam}}({\text {Res}}(\tau , E, N) \cap [-N^2, N^2]) \leqslant 2N\right\} \geqslant 1 - Ce^{-cN}. \end{aligned}$$

We show that on the event

$$\begin{aligned} \left\{ \forall E \in I \, {\text {diam}}({\text {Res}}(\tau , E, N) \cap [-N^2, N^2]) \leqslant 2N\right\} \end{aligned}$$
(24)

we have

$$\begin{aligned} Q_I^\Lambda (0,y) \leqslant e^{-(\gamma - 2\tau )|y|}, \quad |y| > C_0(\gamma -\tau ). \end{aligned}$$
(25)

Expand the Green function \(G_E[H_\Lambda ](0, y)\) as follows. First, iterate the resolvent identity

$$\begin{aligned} \begin{aligned} G_E[H_\Lambda ](x, y)&= G_E[H_{[x - N, x + N]}](x, x-N)\, L_{x-N-1}^\intercal \, G_E[H_\Lambda ](x-N-1, y) \\&\quad +G_E[H_{[x - N, x + N]}](x, x + N)\, L_{x+N}\, G_E[H_\Lambda ](x+N+1, y) \end{aligned} \end{aligned}$$

starting from \(x = 0\) at most |y|/N times, or until the first argument of \(G_E[H_\Lambda ]\) reaches the set \({\text {Res}}(\tau , E, N)\). Then, apply the identity

$$\begin{aligned}\begin{aligned} G_E[H_\Lambda ](x, u)&= G_E[H_\Lambda ](x, u - N -1)\, L_{u-N-1}\, G_E[H_{[u-N,u+N]}](u-N, u) \\&\quad + G_E[H_\Lambda ](x, u +N +1)\, L_{u+N}^\intercal \, G_E[H_{[u-N, u+N]}](u+N, u) \end{aligned} \end{aligned}$$

starting from \(u = y\) at most |y|/N times, or until the second argument of \(G_E[H_\Lambda ]\) reaches the set \({\text {Res}}(\tau , E, N)\). The resulting expansion has \(\leqslant 2^{2|y|/N}\) addends, each of which has the form

$$\begin{aligned} \begin{aligned}&G_E[H_{[x_0 - N, x_0 + N]}](x_0, x_1) \ldots G_E[H_{[x_{k-1} - N, x_{k-1} + N]}](x_{k-1}, x_k) \\&\quad G_E[H_\Lambda ](x_k, y_\ell ) \\&\quad G_E[H_{[y_{\ell -1} - N, y_{\ell -1} + N]}](y_\ell , y_{\ell -1}) \ldots G_E[H_{[y_0 - N, y_0 + N]}](y_1, y_0), \end{aligned} \end{aligned}$$
(26)

where \(x_0 = 0\), \(x_{j+1} = x_j \pm (N+1)\), \(y_0 = y\), \(y_{j+1} = y_j \pm (N+1)\), and (by the construction of the event (24)) \(k + \ell \geqslant |y|/N-4\). All the terms in the first and third line of (26) are bounded in norm by \(e^{-(\gamma -\tau )N}\), hence

$$\begin{aligned} \Vert G_E[H_\Lambda ](0, y) \Vert \leqslant 64 \left( 4 e^{-(\gamma -\tau )N}\right) ^{|y|/N-4} \sum _{u, v \leqslant 2|y|} \Vert G_E[H_\Lambda ](u, v)\Vert . \end{aligned}$$

Now we raise this estimate to the power \(1 -\epsilon \) and integrate over \(E \in I\):

$$\begin{aligned}&\frac{\epsilon }{2} \int _I \Vert G_E[H_\Lambda ](0, y) \Vert ^{1-\epsilon } \mathrm{d}E \leqslant 64^{1-\epsilon } \left( 4 e^{-(\gamma -\tau )N}\right) ^{(1-\epsilon )(|y|/N-4)}\\&\quad \sum _{u, v \leqslant 2|y|} \frac{\epsilon }{2} \int _I \Vert G_E[H_\Lambda ](u, v)\Vert ^{1-\epsilon } \mathrm{d}E. \end{aligned}$$

It remains to let \(\epsilon \rightarrow + 0\) while making use of the two inequalities in (23). \(\square \)

3 Properties of Transfer Matrices

3.1 Preliminaries

Denote

$$\begin{aligned} J = \left( \begin{array}{cc} 0 &{} - \mathbbm {1} \\ \mathbbm {1} &{} 0 \end{array}\right) \in {\text {GL}}(2W, {\mathbb {R}}). \end{aligned}$$

A matrix \(Q \in {\text {GL}}(2W, {\mathbb {R}})\) is called symplectic, \(Q \in {\text {Sp}}(2W, {\mathbb {R}})\), if \(Q^\intercal J Q = J\).

The matrices \(T_x\) are, generally speaking, not symplectic. However, the cocycle \(\{\Phi _{x,y}\}_{x,y,\in {\mathbb {Z}}}\) is conjugate to a symplectic one. Indeed, observe that

Claim 3.1

If \(L \in {\text {GL}}(W, {\mathbb {R}})\) and Z is \(W \times W\) real symmetric, then \(Q(L, Z) = \left( \begin{array}{cc} L^{-1} Z &{} -L^{-1} \\ L^\intercal &{} 0 \end{array} \right) \) is symplectic.
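For completeness, the verification is a direct computation, using \(Z = Z^\intercal \):

$$\begin{aligned} Q(L,Z)^\intercal J \, Q(L,Z) = \left( \begin{array}{cc} Z L^{-\intercal } &{} L \\ - L^{-\intercal } &{} 0 \end{array}\right) \left( \begin{array}{cc} - L^\intercal &{} 0 \\ L^{-1} Z &{} - L^{-1} \end{array}\right) = \left( \begin{array}{cc} 0 &{} - \mathbbm {1} \\ \mathbbm {1} &{} 0 \end{array}\right) = J, \end{aligned}$$

where the middle factor is \(J \, Q(L,Z)\).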

Denote \(D_x =\left( \begin{array}{cc} \mathbbm {1} &{} 0 \\ 0 &{} L_x^\intercal \end{array} \right) \), then

$$\begin{aligned} {{\widetilde{T}}}_x(E) \overset{\text {def}}{=} D_x T_x(E) D_{x-1}^{-1} = Q(L_x, E\mathbbm {1}-V_x) \in {\text {Sp}}(2W, {\mathbb {R}}). \end{aligned}$$
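This identity is checked directly:

$$\begin{aligned} D_x T_x D_{x-1}^{-1} = \left( \begin{array}{cc} \mathbbm {1} &{} 0 \\ 0 &{} L_x^\intercal \end{array}\right) \left( \begin{array}{cc} L_x^{-1}(E \mathbbm {1} - V_x) &{} - L_x^{-1} L_{x-1}^\intercal \\ \mathbbm {1} &{} 0 \end{array}\right) \left( \begin{array}{cc} \mathbbm {1} &{} 0 \\ 0 &{} L_{x-1}^{-\intercal } \end{array}\right) = \left( \begin{array}{cc} L_x^{-1}(E \mathbbm {1} - V_x) &{} - L_x^{-1} \\ L_x^\intercal &{} 0 \end{array}\right) = Q(L_x, E\mathbbm {1}-V_x). \end{aligned}$$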

Thus, also

$$\begin{aligned} {{\widetilde{\Phi }}}_{x, y}(E) = D_{x-1} \Phi _{x, y}(E) D_{y-1}^{-1} = {\left\{ \begin{array}{ll} {\widetilde{T}}_{x-1} (E)\ldots {\widetilde{T}}_y(E), &{} x > y \\ \mathbbm {1}, &{} x = y \\ {\widetilde{T}}_{x}^{-1} (E)\ldots {\widetilde{T}}_{y-1}^{-1}(E), &{} x < y \end{array}\right. } \,\,\,\,\,\, \in {\text {Sp}}(2W, {\mathbb {R}}). \end{aligned}$$

3.2 Simplicity of the Lyapunov Spectrum and Large Deviations

Goldsheid and Margulis showed [16] that if \(g_j\) are independent, identically distributed random matrices in \({\text {Sp}}(2W, {\mathbb {R}})\), and the group generated by the support of \(g_1\) is Zariski dense in \({\text {Sp}}(2W, {\mathbb {R}})\), then the Lyapunov spectrum of a random matrix product \(\{ g_N \ldots g_1 \}\) is simple, i.e.

$$\begin{aligned} \gamma _1> \cdots> \gamma _W> 0. \end{aligned}$$

Goldsheid showed [15] that if \({\mathcal {V}}\) is irreducible and \({\mathcal {V}} - {\mathcal {V}}\) contains a rank-one matrix, then for any \(E \in {\mathbb {R}}\) the group generated by \(Q(\mathbbm {1}, E \mathbbm {1} - V)\), \(V \in {\mathcal {V}}\), is Zariski dense in \({\text {Sp}}(2W, {\mathbb {R}})\).

Corollary 3.2

Assume (A)–(C). Then, for any \(E \in {\mathbb {R}}\)

$$\begin{aligned} \gamma _1(E)> \cdots> \gamma _W(E) > 0. \end{aligned}$$

Proof

Observe that

$$\begin{aligned} Q(L, E\mathbbm {1}-V) = \left( \begin{array}{cc} L^{-1} &{} 0 \\ 0 &{} L^\intercal \end{array} \right) Q(\mathbbm {1}, E\mathbbm {1}-V), \end{aligned}$$

whence

$$\begin{aligned} Q({\widehat{L}}, E\mathbbm {1}-V)^{-1} Q(L, E\mathbbm {1}-V) = Q(\mathbbm {1}, E\mathbbm {1}-V)^{-1} \left( \begin{array}{cc} {\widehat{L}} L ^{-1} &{} 0 \\ 0 &{} {\widehat{L}}^{-\intercal }L^\intercal \end{array}\right) Q(\mathbbm {1}, E\mathbbm {1}-V). \end{aligned}$$

If the Zariski closure of the group generated by \({\mathcal {L}} {\mathcal {L}}^{-1}\) contains some \(L_0 \in {\mathcal {L}}\), then the Zariski closure of the group generated by \(\{ Q(L, E \mathbbm {1} - V)\}_{L \in {\mathcal {L}}, V \in {\mathcal {V}}}\) contains \(g = Q(\mathbbm {1}, E\mathbbm {1}-V)^{-1} \left( \begin{array}{cc} L_0 &{} 0 \\ 0 &{} L_0^{-\intercal } \end{array}\right) Q(\mathbbm {1}, E\mathbbm {1}-V)\), hence also \(Q(L_0, E\mathbbm {1}-V) \, g = Q(\mathbbm {1}, E\mathbbm {1}-V)\), and therefore it contains that of the group generated by \(\{ Q(\mathbbm {1}, E \mathbbm {1} - V)\}_{V \in {\mathcal {V}}}\). \(\square \)

Having the corollary at hand, we deduce from [23, Proposition 2.7] applied to the matrices \({{\widetilde{\Phi }}}_N(E)\):

Proposition 3.3

Assume (A)–(C). For any \(\epsilon > 0\) there exist \(C, c> 0\) such that for any \(E \in I\) and \(1 \leqslant j \leqslant W\)

$$\begin{aligned} {\mathbb {P}} \left\{ \left| \frac{1}{N} \log s_j({{\widetilde{\Phi }}}_N(E)) - \gamma _j(E) \right| \geqslant \epsilon \right\} \leqslant C e^{-cN}. \end{aligned}$$
(27)

Further, for any Lagrangian subspace \(F \subset {\mathbb {R}}^{2W}\)

$$\begin{aligned} {\mathbb {P}} \left\{ \left| \frac{1}{N} \log s_j({\widetilde{\Phi }}_N(E) \pi _F^*) - \gamma _j(E) \right| \geqslant \epsilon \right\} \leqslant C e^{-cN}. \end{aligned}$$
(28)

Proof

The estimate (28) is a restatement of [23, Proposition 2.7], whereas (27) follows from (28) applied to a \(\delta \)-net on the manifold of Lagrangian subspaces of \({\mathbb {R}}^{2W}\) (the Lagrangian Grassmannian). We note that (27) is also proved directly in [8]. \(\square \)

Note that Proposition 1.2 follows from (28).

Now fix \(\epsilon \) and a Lagrangian subspace F, and let

$$\begin{aligned} \Omega _\epsilon ^F[{\widetilde{\Phi }}_N] = \left\{ \max _{j=1}^W \left[ \left| \frac{1}{N} \log s_j({\widetilde{\Phi }}_N) - \gamma _j \right| + \left| \frac{1}{N} \log s_j({\widetilde{\Phi }}_N \pi _F^*) - \gamma _j \right| \right] \leqslant \frac{\epsilon }{100 W} \right\} . \end{aligned}$$
(29)

According to Proposition 3.3,

$$\begin{aligned} {\mathbb {P}}(\Omega _\epsilon ^{F}[{\widetilde{\Phi }}_N(E)]) \geqslant 1 - C(\epsilon , E) e^{-c(\epsilon , E) N}, \end{aligned}$$

where the constants are locally uniform in E. Let

$$\begin{aligned} {\widetilde{\Phi }}_N(E) = U_N(E) \Sigma _N(E) V_N(E)^\intercal \end{aligned}$$

be the singular value decomposition of \({\widetilde{\Phi }}_N(E)\). Assume that the singular values on the diagonal of \(\Sigma _N(E)\) are arranged in non-increasing order; the choice of the additional degrees of freedom is not essential for the current discussion. Denote

$$\begin{aligned} F_+ = \left\{ \left( {\begin{array}{c}x\\ 0\end{array}}\right) \, :\, x \in {\mathbb {R}}^{W} \right\} \subset {\mathbb {R}}^{2W}, \quad F_- = \left\{ \left( {\begin{array}{c}0\\ y\end{array}}\right) \, :\, y \in {\mathbb {R}}^{W} \right\} \subset {\mathbb {R}}^{2W}. \end{aligned}$$
(30)

Claim 3.4

Let \(F \subset {\mathbb {R}}^{2W}\) be a Lagrangian subspace. For N large enough (depending on \(\epsilon \)), one has (deterministically) on the event \(\Omega _\epsilon ^F[{\widetilde{\Phi }}_N(E)]\) defined in (29)

$$\begin{aligned} s_W(\pi _{F_+ }V_N(E)^\intercal \pi _F^*) \geqslant e^{-\frac{\epsilon }{25} N}. \end{aligned}$$

Remark 3.5

For future reference, we also record the dual version of the claim: on \(\Omega _\epsilon ^F[{\widetilde{\Phi }}_N(E)^\intercal ]\)

$$\begin{aligned} s_W(\pi _{F} U_N(E) \pi _{F_+}^*) \geqslant e^{-\frac{\epsilon }{25} N}. \end{aligned}$$

Proof

We abbreviate \(\Sigma = \Sigma _N(E)\), \(V = V_N(E)\), and \(\gamma _j = \gamma _j(E)\). The constants in whose notation \(\epsilon \) does not explicitly appear will be uniform as \(\epsilon \rightarrow + 0\).

Clearly, \(s_j(\pi _{F_+ }V^\intercal \pi _F^*) \leqslant \Vert \pi _{F_+ }V^\intercal \pi _F^* \Vert \leqslant 1\). Hence, it will suffice to show that on \(\Omega _\epsilon ^F[{\widetilde{\Phi }}_N(E)]\)

$$\begin{aligned} \prod _{k=1}^W s_k(\pi _{F_+ }V^\intercal \pi _F^*) \geqslant e^{-\frac{\epsilon }{25}N}. \end{aligned}$$
(31)

Let \(\Sigma ^+\) be the diagonal matrix obtained by setting the (k, k) matrix entries of \(\Sigma \) to zero for \(k > W\). Then, on \(\Omega _\epsilon ^F[{\widetilde{\Phi }}_N(E)]\) we have

$$\begin{aligned} \Vert \Sigma - \Sigma ^{+} \Vert \leqslant \exp ( - c N) \end{aligned}$$

(with \(c>0\) uniform in \(\epsilon \rightarrow +0\)). Thus, \(s_j({\widetilde{\Phi }}_N \pi _F^*) = s_j(\Sigma V^\intercal \pi _F^*)\) satisfies

$$\begin{aligned} | s_j({\widetilde{\Phi }}_N \pi _F^*) - s_j(\Sigma ^{+} V^\intercal \pi _F^*)| \leqslant e^{-cN}. \end{aligned}$$

Observing that \(s_j(\Sigma ^{+} V^\intercal \pi _F^*) = s_j({\widehat{\Sigma }}^{+}\pi _{F_+} V^\intercal \pi _F^*)\), where \({\widehat{\Sigma }}^+ = \pi _{F_+} \Sigma ^+ \pi _{F_+}^*\), and that

$$\begin{aligned} s_j({\widetilde{\Phi }}_N \pi _F^*) \geqslant e^{(\gamma _j - \frac{\epsilon }{100W})N} \end{aligned}$$

on \(\Omega _\epsilon ^F\), we get (for sufficiently large N):

$$\begin{aligned} s_j({\widehat{\Sigma }}^+ \pi _{F_+} V^\intercal \pi _F^*) \geqslant e^{(\gamma _j - \frac{\epsilon }{50W})N}, \quad \prod _{k=1}^j s_k({\widehat{\Sigma }}^+ \pi _{F_+} V^\intercal \pi _F^*) \geqslant e^{(\gamma _1 + \cdots + \gamma _j -\frac{\epsilon }{50})N}. \end{aligned}$$

On the other hand, using the submultiplicativity of the operator norm and the equality between the norm of the jth wedge power of a matrix and the product of its j top singular values, we have

$$\begin{aligned} \begin{aligned} \prod _{k=1}^j s_k({\widehat{\Sigma }}^+ \pi _{F_+} V^\intercal \pi _F^*)&\leqslant \prod _{k=1}^j s_k ({\widehat{\Sigma }}^+) \times \prod _{k=1}^j s_k(\pi _{F_+} V^\intercal \pi _F^*)\\&\leqslant e^{(\gamma _1 + \cdots + \gamma _j + \frac{\epsilon }{100})N} \prod _{k=1}^j s_k(\pi _{F_+} V^\intercal \pi _F^*), \end{aligned} \end{aligned}$$

whence

$$\begin{aligned} \prod _{k=1}^j s_k(\pi _{F_+} V^\intercal \pi _F^*) \geqslant e^{-\frac{\epsilon }{25}N}, \quad 1 \leqslant j \leqslant W, \end{aligned}$$

thus concluding the proof of (31) and of the claim. \(\square \)

3.3 Wegner-Type Estimate: Proof of Proposition 1.3

Let us first show that for any \(i \in [-N, N]\)

$$\begin{aligned} {\mathbb {P}} \left\{ \Vert G_E[H_{[-N,N]}](i,i) \Vert \geqslant e^{\epsilon N} \right\} \leqslant C_\epsilon e^{-c_\epsilon N }. \end{aligned}$$
(32)

By Claim 2.2,

$$\begin{aligned} G_E[H_{[-N, N]}](i, i) = \left( \Psi _{i+1}^+ (\Psi _i^+)^{-1} - \Psi _{i+1}^- (\Psi _i^-)^{-1}\right) ^{-1} L_i^{-1}, \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \left( {\begin{array}{c}\Psi _{i+1}^+\\ \Psi _i^+\end{array}}\right)&= \Phi _{i+1,N+1} \left( {\begin{array}{c}0\\ \mathbbm {1}\end{array}}\right) = \left( \begin{array}{cc} \mathbbm {1} &{} 0 \\ 0 &{} L_i^{- \intercal } \end{array} \right) {\widetilde{\Phi }}_{i+1,N+1} \left( {\begin{array}{c}0\\ L_N^\intercal \end{array}}\right) ,\\ \left( {\begin{array}{c}\Psi _{i+1}^-\\ \Psi _i^-\end{array}}\right)&= \Phi _{i+1,-N} \left( {\begin{array}{c}\mathbbm {1}\\ 0\end{array}}\right) = \left( \begin{array}{cc} \mathbbm {1} &{} 0 \\ 0 &{} L_i^{- \intercal } \end{array} \right) {\widetilde{\Phi }}_{i+1,-N} \left( {\begin{array}{c}\mathbbm {1}\\ 0\end{array}}\right) . \end{aligned} \end{aligned}$$

Hence,

$$\begin{aligned} G_E[H_{[-N, N]}](i, i) = L_i^{-\intercal } \left( X^+ - X^- \right) ^{-1} L_i^{-1}, \end{aligned}$$

where

$$\begin{aligned} X^+ = ({\widetilde{\Phi }}_{i+1,N+1} )_{12} ({\widetilde{\Phi }}_{i+1,N+1})_{22}^{-1}, \quad X^- = ({\widetilde{\Phi }}_{i+1,-N} )_{11} ({\widetilde{\Phi }}_{i+1,-N})_{21}^{-1}, \end{aligned}$$

and the subscripts 11 and 21 represent extracting the corresponding \(W\times W\) blocks from a \(2W \times 2W\) matrix (i.e. \(Y_{11} = \pi _{F_+} Y \pi _{F_+}^*\), \(Y_{21} = \pi _{F_-} Y \pi _{F_+}^*\), in the notation of (30)). Both matrices \(X^\pm \) are symmetric, as follows from the symplectic property of the transfer matrices.
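Indeed, writing \(S = {\widetilde{\Phi }}_{i+1,-N}\) and extracting the (1,1) block of the relation \(S^\intercal J S = J\) gives

$$\begin{aligned} S_{21}^\intercal S_{11} - S_{11}^\intercal S_{21} = 0, \quad \text {whence} \quad (X^-)^\intercal = S_{21}^{-\intercal } S_{11}^\intercal = S_{11} S_{21}^{-1} = X^-, \end{aligned}$$

and similarly for \(X^+\), using the (2,2) block with \(S = {\widetilde{\Phi }}_{i+1,N+1}\).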

Without loss of generality, we can assume that \(i \geqslant 0\). We shall prove that

$$\begin{aligned} {\mathbb {P}} \left\{ s_W(X^+ - X^-) \leqslant e^{-\epsilon N} \,| \, X^+ \right\} \leqslant C_\epsilon e^{-c_\epsilon N }. \end{aligned}$$

To this end, denote

$$\begin{aligned} F = \left\{ \left( {\begin{array}{c}x\\ y\end{array}}\right) \in {\mathbb {R}}^{2W} \, : \, y = - X^+ x \right\} . \end{aligned}$$

In the notation of Claim 3.4, consider the transfer matrix \({\widetilde{\Phi }}_{i+1,-N}\), and let

$$\begin{aligned} \Omega _\epsilon = \Omega _\epsilon ^F[{\widetilde{\Phi }}^*] \cap \Omega _\epsilon ^{F_+} [{\widetilde{\Phi }}^*] \cap \Omega _\epsilon ^{F_-} [{\widetilde{\Phi }}^*] \cap \Omega _\epsilon ^{F_+} [{\widetilde{\Phi }} ] \end{aligned}$$

(note that \({\widetilde{\Phi }}_{i+1,-N}\) is independent of \(X^+\) and thus also of F). It suffices to show that on \(\Omega _\epsilon \)

$$\begin{aligned} s_W(X^+ - X^-) \geqslant e^{-\frac{\epsilon }{2} N}. \end{aligned}$$
(33)

Let us write the singular value decomposition of \({\widetilde{\Phi }} = {\widetilde{\Phi }}_{i+1,-N}\) in block form:

$$\begin{aligned} \left( \begin{array}{cc} {\widetilde{\Phi }}_{11} &{} {\widetilde{\Phi }}_{12} \\ {\widetilde{\Phi }}_{21} &{} {\widetilde{\Phi }}_{22} \end{array}\right) = \left( \begin{array}{cc} U_{11} &{} U_{12} \\ U_{21} &{} U_{22} \end{array}\right) \left( \begin{array}{cc} {\widehat{\Sigma }}^+ &{} 0 \\ 0 &{} {\widehat{\Sigma }}^- \end{array}\right) \left( \begin{array}{cc} V_{11}^\intercal &{} V_{21}^\intercal \\ V_{12}^\intercal &{} V_{22}^\intercal \end{array}\right) \end{aligned}$$

whence on \(\Omega _\epsilon \)

$$\begin{aligned} \Vert {\widetilde{\Phi }}_{11} - U_{11} {\widehat{\Sigma }}^+ V_{11}^\intercal \Vert , \Vert {\widetilde{\Phi }}_{21} - U_{21} {\widehat{\Sigma }}^+ V_{11}^\intercal \Vert \leqslant e^{-cN}. \end{aligned}$$

Further, by Claim 3.4 we have on \(\Omega _\epsilon \):

$$\begin{aligned} s_W(U_{11}), s_W(U_{21}), s_W(V_{11}) \geqslant e^{-\frac{\epsilon }{25} N}. \end{aligned}$$
(34)

Let us show that

$$\begin{aligned} \Vert X^- - U_{11} U_{21}^{-1} \Vert \leqslant e^{-c'N}. \end{aligned}$$
(35)

To this end, start with the relation

$$\begin{aligned} X^- = (U_{11} {\widehat{\Sigma }}^+ V_{11}^\intercal + E_1) (U_{21} {\widehat{\Sigma }}^+ V_{11}^\intercal + E_2)^{-1}, \quad \Vert E_1\Vert , \Vert E_2\Vert \leqslant e^{-cN}. \end{aligned}$$

In view of the bound

$$\begin{aligned} s_W(U_{21} {\widehat{\Sigma }}^+ V_{11}^\intercal ) \geqslant s_W(U_{21}) s_W({\widehat{\Sigma }}^+) s_W(V_{11}^\intercal ) \geqslant e^{+c_1 N}, \end{aligned}$$

we can set \(E_2' = E_2 (U_{21} {\widehat{\Sigma }}^+ V_{11}^\intercal )^{-1}\) and rewrite

$$\begin{aligned} (U_{21} {\widehat{\Sigma }}^+ V_{11}^\intercal + E_2)^{-1} = (U_{21} {\widehat{\Sigma }}^+ V_{11}^\intercal )^{-1} (\mathbbm {1} + E_2'), \quad \Vert E_2'\Vert \leqslant e^{-c_2N}, \end{aligned}$$

which implies (35).

Now, the matrix \(X^+\) is symmetric, therefore \(x - X^+ y = 0\) for \(\left( {\begin{array}{c}x\\ y\end{array}}\right) \in F^\perp \), whence for any \(\left( {\begin{array}{c}x\\ y\end{array}}\right) \in {\mathbb {R}}^{2W}\)

$$\begin{aligned} x - X^+ y = (\mathbbm {1} \,\,\mid \,\, - X^+) \pi _F^* \pi _F \left( {\begin{array}{c}x\\ y\end{array}}\right) \end{aligned}$$

(where the first factor is a \(1 \times 2\) block matrix). Therefore, we have, by another application of Claim 3.4 (in the form of Remark 3.5):

$$\begin{aligned} \begin{aligned} s_W(U_{11} - X^+ U_{21})&= s_W((\mathbbm {1} \,\,\mid \,\, - X^+) \pi _F^* \pi _F U \pi _{F_+}^*) \\&\geqslant s_W((\mathbbm {1} \,\,\mid \,\, - X^+) \pi _F^*) s_W(\pi _F U \pi _{F_+}^*)\\&\geqslant s_W(\pi _F U \pi _{F_+}^*) \geqslant e^{-\frac{\epsilon }{25}N}. \end{aligned} \end{aligned}$$

This, together with (35) and (34), concludes the proof of (33), and of (32).

Now we consider the elements \(G_E[H_{[-N,N]}](i, i\pm 1)\). We have:

$$\begin{aligned} G_E[H_{[-N,N]}](i, i\pm 1) =\Psi _{i\pm 1}^\pm (\Psi _i^\pm )^{-1} G_E[H_{[-N,N]}](i, i). \end{aligned}$$

The norm of \( G_E[H_{[-N,N]}](i, i)\) is controlled by (32), whereas the factors \(\Psi _{i\pm 1}^\pm (\Psi _i^\pm )^{-1}\) (e.g. \(\Psi _{i+1}^+ (\Psi _i^+)^{-1} = X^+ L_i^\intercal \)) are controlled using (35) and Claim 3.4. \(\square \)

4 On Generalisations

Other distributions. The assumptions (A)–(C) in Theorems 1 and 2 can probably be relaxed. Instead of a finite fractional moment in (A), it should be sufficient to assume the existence of a sufficiently high logarithmic moment:

$$\begin{aligned} \mathbb {E} (\log _+^A \Vert V_0 \Vert + \log _+^A \Vert L_0\Vert + \log _+^A \Vert L_0^{-1}\Vert ) < \infty \end{aligned}$$

for a sufficiently large \(A > 1\). To carry out the proof under this assumption in place of (A), one would need appropriate versions of large deviation estimates for random matrix products.

As we saw in the previous section, the rôle of the assumptions (B)–(C) is to ensure that the conditions of the Goldsheid–Margulis theorem [16] are satisfied. That is, our argument yields the following:

Theorem 3

Let \(I \subset {\mathbb {R}}\) be a compact interval. Assume (A) and that for any \(E \in I\) the group generated by

$$\begin{aligned} \left\{ Q(L, E\mathbbm {1} - V)\right\} _{L \in {\mathcal {L}}, \, V \in {\mathcal {V}}} \end{aligned}$$

is Zariski-dense in \({\text {Sp}}(2W, {\mathbb {R}})\). Then:

  1. 1.

    The spectrum of H in I is almost surely pure point, and

    $$\begin{aligned} {\mathbb {P}} \left\{ \forall (E, \psi ) \in {\mathcal {E}}[H] \,\,\, E \in I \Longrightarrow \limsup _{x \rightarrow \pm \infty } \frac{1}{|x|} \log \Vert \psi (x)\Vert \leqslant - \gamma _W(E)\right\} =1; \end{aligned}$$
    (36)
  2. 2.

    for any compact subinterval \(I' \subset I\) (possibly equal to I) one has:

    $$\begin{aligned} {\mathbb {P}}\left\{ \limsup _{x \rightarrow \pm \infty } \frac{1}{|x|} \log Q_{I'}(x, y) \leqslant - \inf _{E \in I'} \gamma _W(E) \right\} = 1. \end{aligned}$$
    (37)

As we saw in the previous section, the second condition of this theorem is implied by our assumptions (B)–(C). Most probably, weaker assumptions should suffice, and, in fact, we believe that the conclusions of Theorems 1 and 2 hold as stated without the assumption (B). A proof would require an appropriate generalisation of the results of Goldsheid [15].

Another interesting class of models appears when \(V_x \equiv 0\). The complex counterpart of this class, along with a generalisation in which the distribution of \(L_x\) depends on the parity of x, has recently been considered by Shapiro [31], in view of applications to topological insulators. An interesting feature of such models is that the slowest Lyapunov exponent \(\gamma _W(E)\) may vanish at \(E=0\). This circle of questions (in particular, the positivity of the smallest Lyapunov exponent and Anderson localisation) is studied in [31] under the assumption that the distribution of \(L_0\) in \({\text {GL}}(W, {\mathbb {C}})\) is regular. In order to extend the results of [31] (for matrices with complex entries) to singular distributions, one would first need an extension of [16] to the Hermitian symplectic group.

Returning to the (real) setting of the current paper, assume that (B)–(C) are replaced with

(B\('\)):

the group generated by \({\mathcal {L}}\) is Zariski-dense in \({\text {GL}}(W, {\mathbb {R}})\);

(C\('\)):

\(V_x \equiv 0\).

Along the arguments of [31], one can check that the conditions of [16] hold for any \(E \ne 0\). From Theorem 3, one deduces that the conclusion of Theorem 1 holds under the assumptions (A), (B\('\)), (C\('\)), whereas the conclusion (37) of Theorem 2 holds for compact intervals I not containing 0. If \(\gamma _W(0)= 0\), (37) is vacuous for \(I \ni 0\). If \(\gamma _W(0) > 0\), (37) is meaningful and probably true for such intervals; however, additional arguments would be required to establish the corresponding large deviation estimates.

Finally, we note that Theorem 3 remains valid if the independence assumption is relaxed as follows: \(\{(V_x, L_x)\}_{x \in {\mathbb {Z}}}\) are jointly independent (i.e. we can allow dependence between \(V_x\) and the corresponding \(L_x\)).

The half-line. Similar results can be established for random operators on the half-line. For simplicity, we focus on the case \(L_x \equiv \mathbbm {1}\). Fix a Lagrangian subspace \(F \subset {\mathbb {R}}^{2W}\), and consider the space \({\mathcal {H}}_F\) of square-summable sequences \(\psi : {\mathbb {Z}}_+ \rightarrow {\mathbb {C}}^W\) such that \(\left( {\begin{array}{c}\psi (1)\\ \psi (0)\end{array}}\right) \in F\). Define an operator \(H_F\) acting on \({\mathcal {H}}_F\) so that

$$\begin{aligned} (H_F\psi )(x) = L_x \psi (x+1) + V_x \psi (x) + L_{x-1}^\intercal \psi (x-1), \quad x \geqslant 1 \end{aligned}$$

(see e.g. [4] for details).

Theorem 4

Fix a Lagrangian subspace \(F \subset {\mathbb {R}}^{2W}\). Under the assumptions (A) and (C) with \(L_x \equiv \mathbbm {1}\), the spectrum of \(H_F\) in any compact interval I is almost surely pure point, and

$$\begin{aligned} {\mathbb {P}} \left\{ \forall (E, \psi ) \in {\mathcal {E}}[H_F] \,\,\, E \in I \Longrightarrow \limsup _{x \rightarrow \infty } \frac{1}{|x|} \log \Vert \psi (x)\Vert \leqslant - \gamma _W(E)\right\} =1. \end{aligned}$$
(38)

Remark 4.1

  1. 1.

    For general \(L_x\), the boundary condition has to be prescribed in a different way. However, in the Dirichlet case \(F = F_+\) the result holds as stated for general \(L_x\) satisfying (A)–(B).

  2. 2.

    Combining the proof of Theorem 2 with the additional argument described below, one can also prove dynamical localisation.

  3. 3.

    For \(W=1\), a result of Kotani [24] implies that the operator \(H_F\) has pure point spectrum for almost every boundary condition F; a similar statement is valid for \(W>1\). As to fixed (deterministic) boundary conditions, the only published reference known to us is the work of Gorodetski–Kleptsyn [20], treating Schrödinger operators with \(W=1\) and Dirichlet boundary conditions.

  4. 4.

    The event of full probability provided by Theorem 4 depends on the boundary condition F. And indeed, a result of Gordon [19] implies that (almost surely) there exists a residual set of boundary conditions F for which the spectrum of \(H_F\) is not pure point (and in fact has only isolated eigenvalues).

Sketch of proof of Theorem 4

We indicate the necessary modifications with respect to the proof of Theorem 1. First, we modify the definition (11) of \({\text {Res}}(\tau , E, N)\) as follows: \(x \geqslant N+1\) is said to be \((\tau ,E,N)\)-non-resonant (\(x \notin {\text {Res}}(\tau , E,N)\)) under the same condition

$$\begin{aligned} \Vert G_E[H_{[x-N, x+N]}](x, x\pm N)\Vert \leqslant e^{-(\gamma _W(E) - \tau )N}, \end{aligned}$$
(39)

while \(x \in \{1, \ldots , 2 N\}\) is said to be \((\tau ,E,N)\)-non-resonant if

$$\begin{aligned} \det (\pi _F \Phi _N(E)^* \Phi _N(E) \pi _F^*) \geqslant e^{2 (\gamma _1(E) + \cdots + \gamma _{W}(E) - \tau )N} \end{aligned}$$
(40)

(this condition does not depend on x and only depends on the restriction of the operator to [1, N]). We claim that Proposition 2.1 is still valid:

$$\begin{aligned} {\mathbb {P}} \left\{ \max _{E \in I} {\text {diam}}({\text {Res}}(\tau , E, N) \cap [1, N^2]) > 2N \right\} \leqslant C e^{-cN}. \end{aligned}$$
(41)

To prove this estimate, it suffices to show that for any \(1 \leqslant x < y \leqslant N^2\) with \(|y-x|>2N\) one has

$$\begin{aligned} {\mathbb {P}} \left\{ \exists {E \in I} : \, x, y \in {\text {Res}}(\tau , E, N) \right\} \leqslant C e^{-cN}. \end{aligned}$$
(42)

The case \(x, y > 2N\) is covered by the argument of Proposition 2.1. If \(x \leqslant 2N\) and \(y > 2N\), the events \(x \in {\text {Res}}(\tau , E, N)\) and \(y \in {\text {Res}}(\tau , E, N)\) are independent; the probability that \(x \in {\text {Res}}(\tau , E, N)\) is exponentially small due to the large deviation estimate (9), and the collection of E violating (40) is the union of \(\leqslant N^{2W}\) intervals. From this point the proof of (42) mimics the argument in the proof of Proposition 2.1.

As in the proof of Theorem 1, let \(\psi \) be a generalised solution at energy \(E \in I\) given by Schnol’s lemma, \(x^{-1} \log \Vert \psi (x) \Vert \rightarrow 0\). Letting \(u_x = \left( {\begin{array}{c}\psi (x)\\ \psi (x-1)\end{array}}\right) \), we have

$$\begin{aligned} \Vert \Phi _N (E) u_1 \Vert \leqslant e^{\tau N }, \end{aligned}$$

hence for sufficiently large N one has

$$\begin{aligned} s_W(\Phi _N(E) \pi _F^*) \leqslant e^{2 \tau N}. \end{aligned}$$

On the other hand, on an event of full probability one has for all \(E \in I\) and all sufficiently large N

$$\begin{aligned} (s_1 \ldots s_{W-1})(\Phi _N(E) \pi _F^*) \leqslant (s_1 \ldots s_{W-1})(\Phi _N(E)) \leqslant e^{(\gamma _1(E) + \cdots + \gamma _{W-1}(E) + \tau ) N } \end{aligned}$$

due to a version of the Craig–Simon theorem [7] (cf. [18, Lemma 2.2]). This implies

$$\begin{aligned} (s_1 \ldots s_{W})(\Phi _N(E) \pi _F^*) \leqslant e^{(\gamma _1(E) + \cdots + \gamma _{W-1}(E) + 3\tau ) N }, \end{aligned}$$

which contradicts (40) when \(\tau > 0\) is small enough (recall that \(\gamma _W(E) > 0\)). Thus, for N large enough, all the sites \(x \in \{1, \ldots , 2N\}\) are resonant, whence, on the event in (41),

$$\begin{aligned} {\text {Res}}(\tau , E,N) \cap [2N+2, N^2] = \varnothing , \end{aligned}$$

and thus \(\psi \) decays exponentially as in the proof of Theorem 1. \(\square \)