1 Introduction and Results

The Anderson model goes back to the 1958 work of Anderson [2] in condensed matter physics, who argued that the presence of disorder drastically changes the dynamics of electrons in a solid. This has triggered extensive research activity in mathematics and physics over the past 60 years. We refer to the monographs [3, 38, 42, 48] for an overview of the mathematical literature. While Anderson’s original work was on a lattice model, analogous phenomena have since been studied for continuum Schrödinger operators. The prototypical model investigated in this context is the ergodic alloy-type, or continuum Anderson, model

$$\begin{aligned} H_\omega ^{\mathrm{erg}}= - \Delta + V_{\mathrm{per}}+ V_\omega ^{\mathrm{erg}}= H_{\mathrm{per}}+ V_\omega ^{\mathrm{erg}}, \quad V_\omega ^{\mathrm{erg}}= \sum _{j \in {\mathbb {Z}}^d} \omega _j u(\cdot - j), \end{aligned}$$

in \(L^2({\mathbb {R}}^d)\), where \(V_{\mathrm{per}}\) is a \({\mathbb {Z}}^d\)-periodic potential, \(\omega =(\omega _j)_{j \in {\mathbb {Z}}^d}\) is a sequence of independent and identically distributed random variables with bounded density, and u is a bump function modelling the effective potential around a single atom. Under mild assumptions, this random family of self-adjoint operators has an almost sure spectrum, that is, there exists a set \(\Sigma \subset {\mathbb {R}}\) such that for almost every realization of \(\omega \) the random operator \(H_\omega ^{\mathrm{erg}}\) has spectrum \(\Sigma \).
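For concreteness, here is a minimal sketch of the random potential \(V_\omega ^{\mathrm{erg}}\) in Python. It rests on illustrative assumptions that are not part of the model above: d = 1, finitely many sites, \(V_{\mathrm{per}}\) omitted, \(\omega _j\) uniform on [0, 1], and a hypothetical smooth bump u supported in [-0.4, 0.4].

```python
import numpy as np

def bump(x, radius=0.4):
    """Hypothetical single-site potential u: smooth, non-negative, supported in [-radius, radius]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros(x.shape)
    inside = np.abs(x) < radius
    out[inside] = np.exp(-1.0 / (1.0 - (x[inside] / radius) ** 2))
    return out

def alloy_potential(x, omega, sites):
    """V(x) = sum_j omega_j * u(x - j), truncated to the given finitely many sites j (d = 1)."""
    x = np.asarray(x, dtype=float)
    return sum(w * bump(x - j) for w, j in zip(omega, sites))

rng = np.random.default_rng(0)
sites = np.arange(-10, 11)                       # lattice sites j in Z (truncated)
omega = rng.uniform(0.0, 1.0, size=sites.size)   # i.i.d. coupling constants in [0, 1]
grid = np.linspace(-10.0, 10.0, 2001)
V = alloy_potential(grid, omega, sites)
print(V.min() >= 0.0, V.max() <= 1.0)
```

Since the translates \(u(\cdot - j)\) have disjoint supports in this sketch, the potential at a site j is simply \(\omega _j u(0) = \omega _j/\mathrm{e}\).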

The general philosophy is that randomness leads to Anderson localization, at least in a neighbourhood of the edges of \(\Sigma \), or—in dimension one—on the whole of \(\Sigma \). Anderson (or exponential) localization in an interval \(I \subset \Sigma \) means that I almost surely consists of pure point spectrum of \(H_\omega ^{\mathrm{erg}}\), that is,

$$\begin{aligned} I \subset {{\,\mathrm{\sigma }\,}}_{\mathrm{pp}}(H_\omega ^{\mathrm{erg}}),\quad I \cap {{\,\mathrm{\sigma }\,}}_c(H_\omega ^{\mathrm{erg}}) = \emptyset , \end{aligned}$$
(1.1)

and all eigenfunctions corresponding to eigenvalues in I are exponentially decaying. This is in stark contrast to the background operator \(H_{\mathrm{per}}\), which has only absolutely continuous spectrum and no eigenvalues. There also exist stronger notions of localization such as dynamical localization, describing the non-spreading of wave packets, see, e.g. [22] for an overview. Its formally strongest form in [22, Definition 3.1 (vii)] (cf. [15, Eq. (1.6)]) is formulated in terms of a sub-exponential kernel decay in Hilbert-Schmidt norm,

$$\begin{aligned} {\mathbb {E}}\Biggl [\sup _{\Vert { f }\Vert _\infty \le 1} \Vert { \chi _{\Lambda _1(x)} \chi _I(H_\omega ^{\mathrm{erg}}) f(H_\omega ^{\mathrm{erg}}) \chi _{\Lambda _1(y)} }\Vert _{\mathrm{HS}}^2\Biggr ] \le C_\zeta \mathrm{e}^{-|{x-y}|^\zeta } \end{aligned}$$
(1.2)

for all \(x,y \in {\mathbb {Z}}^d\) and \(\zeta \in (0,1)\). Here, the supremum is taken over Borel functions on \({\mathbb {R}}\), \(\Vert { \,\cdot \, }\Vert _{\mathrm{HS}}\) denotes the Hilbert-Schmidt norm, \(\chi _\cdot \) denotes the indicator function of a set, and \(C_\zeta >0\) is a constant depending on \(\zeta \) and various model parameters. This indeed implies Anderson localization in I by the RAGE theorem, see [1, Section 2.5], or, alternatively, via [16, Theorem 4.2] in combination with [15, Theorem 3.11] or [22, Theorem 6.4]. It also yields strong full Hilbert-Schmidt-dynamical localization as in [22, Definition 3.1 (vi)] (cf. [15, Eq. (1.7)]),

$$\begin{aligned} {\mathbb {E}}\Biggl [\sup _{\Vert { f }\Vert _\infty \le 1}\Vert { |{X}|^{q/2} \chi _I(H_\omega ^{\mathrm{erg}}) f(H_\omega ^{\mathrm{erg}}) \chi _{\Gamma } }\Vert _{\mathrm{HS}}^2\Biggr ]<\infty \end{aligned}$$
(1.3)

for all \(q>0\) and all bounded Borel sets \(\Gamma \subset {\mathbb {R}}^d\), where X denotes the multiplication by x; see, e.g. [22, Remark 3.3]. In the current setting of random Schrödinger operators, (1.2) is in fact equivalent to (1.3) by [16, Theorem 4.2]. The standard method to prove (1.2) is the so-called bootstrap multi-scale analysis from [15].

The edge of \(\Sigma \) where localization is most tangible is the bottom of the spectrum. The situation is more challenging when the almost sure spectrum \(\Sigma \) has a band structure. The spectrum of operators of the form \(- \Delta + V_{\mathrm{per}}\) typically has a band structure, as can be seen by Floquet theory, see e.g. [32] for an overview. When an ergodic random potential \(V_\omega \) is added, the almost sure spectrum \(\Sigma \) of \(H_\omega ^{\mathrm{erg}}\) inherits this band structure, provided that \(V_\omega \) is not so large that all spectral gaps close. It is near these band edges of \(\Sigma \) that we prove localization.

In dimension \(d = 1\), randomness will immediately lead to full localization on the whole spectrum [18]. In dimensions two and larger, localization in a neighbourhood of the bottom of the spectrum has been proved in different models [9, 14, 19, 20, 23, 24, 28, 36, 40]. Apart from the approach in [23], which relies on sufficiently high disorder, and the one in [40, Section 4.5] for Bernoulli-Delone-operators, which relies on similar techniques to the ones we employ below, a common strategy in this context is the utilization of the so-called Lifshitz tails at the bottom of the spectrum. These imply an initial scale estimate (ISE), a major ingredient for the above-mentioned multi-scale analysis. At internal band edges, the validity of Lifshitz tails on a general scope is a complicated issue.

Localization at internal band edges was intensively studied in the second half of the 1990s. First results, among them [4] by Barbaroux, Combes, and Hislop and [28] by Kirsch, Stollmann, and Stolz, additionally required a sufficiently fast decay of the distribution of the random variables \(\omega _j\) near their extreme values, for which, however, no physical justification is given and which excludes, for instance, the uniform distribution. It is rather a technical assumption needed in the proposed proofs of an initial scale estimate, which proceed along the following idea: consider the event where all coupling constants in a box are above a certain threshold, thus lifting the eigenvalues, and then tune the probability distributions of the individual coupling constants such that the probability of said event is a priori large. In some regard, our approach (as well as the one in [40]) can be understood as a refinement of the technique of [4, 28], see Remark 3.3 below.

Since then, further progress towards band edge localization has been driven by progress in Lifshitz tails. The fundamental task of understanding Lifshitz tails at internal band edges was approached by Klopp [25]. Lifshitz tails at \(E_0\) mean that the integrated density of states \(N(\cdot )\) of \(H_\omega ^{\mathrm{erg}}\) satisfies

$$\begin{aligned} \lim _{E \searrow E_0}\frac{\ln |\ln \left( N(E) - N(E_0) \right) |}{\ln \left( E - E_0 \right) }=- \frac{d}{2}. \end{aligned}$$
(1.4)
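To see what (1.4) measures, it helps to evaluate the double-logarithmic quotient for two model tails. The Python sketch below uses hypothetical tails only: a stretched exponential \(N(E_0+h) - N(E_0) = \exp (-C h^{-d/2})\), for which the quotient tends to \(-d/2\), and a power law \(h^{d/2}\), typical of a quadratic band edge without disorder, for which it tends to 0. Tails are passed in logarithmic form to avoid floating-point underflow.

```python
import math

def lifshitz_quotient(log_tail, h):
    """ln|ln(N(E0 + h) - N(E0))| / ln(h), with the tail given as log_tail = ln(N(E0 + h) - N(E0))."""
    return math.log(abs(log_tail)) / math.log(h)

def stretched_exponential_log_tail(h, d, C=1.0):
    """Hypothetical Lifshitz-type tail exp(-C * h**(-d/2)), returned in logarithmic form."""
    return -C * h ** (-d / 2)

def power_law_log_tail(h, d):
    """Hypothetical power-law tail h**(d/2), returned in logarithmic form."""
    return (d / 2) * math.log(h)

d = 3
for h in (1e-4, 1e-8, 1e-16):
    print(h,
          lifshitz_quotient(stretched_exponential_log_tail(h, d, C=2.0), h),  # tends to -d/2
          lifshitz_quotient(power_law_log_tail(h, d), h))                     # tends to 0
```

The convergence is only logarithmic in h, which is one reason why (1.4) is stated as a limit rather than as an estimate at a fixed energy.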

Klopp proved that Lifshitz tails occur for the random operator \(H_\omega ^{\mathrm{erg}}\) if the background operator \(H_{\mathrm{per}}\) has regular Floquet eigenvalues near these edges, which means that these edges are generated by a quadratic extremum of an eigenvalue curve in the so-called dispersion relation. This implies an initial scale estimate which, together with a Wegner estimate, can then be used to start the multi-scale analysis and prove localization [47]. In fact, for compactly supported single-site potentials u, regular Floquet eigenvalues are even equivalent to Lifshitz tails in the sense of (1.4), see [25], whereas for non-compactly supported u this equivalence no longer holds, since there exists another mechanism leading to Lifshitz tail behaviour [26]: for sufficiently slowly decaying u, the effective random potential around a particular site is an infinite weighted average which has thin tails and is thus of a similar form as assumed in [4, 28].

There are now two natural questions: firstly, is it actually possible that Floquet eigenvalues of \(H_{\mathrm{per}}\) are not regular? Secondly, what does the integrated density of states of \(H_\omega ^{\mathrm{erg}}\) look like if \(H_{\mathrm{per}}\) exhibits a non-quadratic Floquet minimum?

The first question is answered in dimension one, and in dimension two for “small” potentials, by [7], where it is proved that regularity of Floquet eigenvalues is generic: it holds, in a precise sense, for almost all choices of the potential \(V_{\mathrm{per}}\), but there are exceptional cases where it fails. In higher dimensions there are partial results, e.g. [27], which states that potentials for which band edges are attained by a single Floquet eigenvalue are generic, but the question whether regular Floquet eigenvalues are generic in all dimensions is still open [32, Conjecture 5.25].

The second question was studied by Klopp and Wolff [33]. Therein it is shown in two space dimensions that even if a proper Lifshitz tail does not occur for the integrated density of states of the random operator, a weaker version of (1.4) with \(- d/2\) replaced by \(- \alpha \) for some \(\alpha > 0\) always holds. Such an asymptotic still implies an initial scale estimate and thus localization, see Theorems 0.3 and 0.4 in [33]. While it seems plausible for the statement of [33] to hold also in higher dimensions, the reasoning therein is explicitly two-dimensional and relies on tools from analytic geometry such as the Newton diagram, which would at least introduce additional technical complications in higher dimensions. We are not aware of any progress made following this strategy since the early 2000s. In summary, in higher dimensions band edge localization has been known only under additional assumptions so far.

Our main result, Theorem 1.1, proves that in dimension \(d \ge 2\), band edge localization always occurs, independently of the regularity or degeneracy of the Floquet eigenvalues and of Lifshitz tails. Recall that \(d \ge 2\) is not a restriction here since in dimension \(d = 1\) one has the stronger full localization anyway. The key idea comes from the observation that certain favourable configurations of the random potential have overwhelming probability. The scale-free quantitative unique continuation principle (UCP) for spectral projectors from [36] then allows us to prove that these configurations lift eigenvalues. This results in a robust initial scale estimate whose proof does not rely on Floquet theory and makes no use of periodicity. Quantitative unique continuation was introduced to the random Schrödinger operator community in [5] and has since found various applications in the theory of random Schrödinger operators [6, 11, 13, 23, 35, 36, 41, 44, 45].

Freed from the burdens of periodicity and ergodicity, we can even state our initial scale estimate in a more general context of non-ergodic Schrödinger operators in Theorem 2.2. There has recently been some activity on localization for non-ergodic operators [19, 23, 35, 39, 41, 45]. Our initial scale estimate in Theorem 2.2 might be used as an induction anchor for the multi-scale analysis for such operators, but one would have to combine this with corresponding considerations on the multi-scale analysis in the non-ergodic setting itself and on almost sure statements on the spectrum. This is a subject for future investigations. In particular, the existence of almost sure band edges for such operators is a somewhat delicate matter. In our context this is bypassed by Hypothesis (H3’) below, see also Remark 2.1. At the end of Sect. 2, we point out some situations of non-ergodic random Schrödinger operators where our initial scale estimate can be used as an input.

The paper is organized as follows: In Sect. 1.1, the notation and the ergodic model are introduced, whereas Sect. 1.2 presents the main result, Theorem 1.1, on band edge localization. After that, Sect. 2 presents Theorem 2.2, the initial scale estimate in the non-ergodic setting. Finally, Sect. 3 is devoted to the proof of Theorem 2.2.

1.1 The Model

We always work in dimension \(d \ge 2\). For \(L > 0\) and \(x \in {\mathbb {R}}^d\), we denote by \(\Lambda _L(x)\) the open hypercube in \({\mathbb {R}}^d\) of side length L, centred at x. If \(x = 0\), we simply write \(\Lambda _L\). Similarly, we denote by \(B_\delta (x)\) the open ball of radius \(\delta \), centred at x, and if \(x = 0\) we just write \(B_\delta \). Given a measurable subset \(A \subset {\mathbb {R}}^d\), we write \(\chi _A\) for the characteristic function of this set.

We consider a \({\mathbb {Z}}^d\)-ergodic random Schrödinger operator \(H_\omega ^{\mathrm{erg}}= H_{\mathrm{per}}+ V_\omega ^{\mathrm{erg}}\) in \(L^2({\mathbb {R}}^d)\) satisfying the following hypotheses (cf., e.g. [12, 28]):

  1. (H1)

    \(H_{\mathrm{per}}= - \Delta + V_{\mathrm{per}}\), where \(V_{\mathrm{per}}\in L^\infty ({\mathbb {R}}^d)\) is \({\mathbb {Z}}^d\)-periodic and real-valued.

  2. (H2)

    \(V_\omega ^{\mathrm{erg}}= \sum _{j \in {\mathbb {Z}}^d} \omega _j u(\cdot -j)\), where u is non-negative and compactly supported with \(u \in L^p({\mathbb {R}}^d)\) for \(p = 2\) if \(d \in \{2,3\}\) and some \(p> d/2\) if \(d \ge 4\). Moreover, there are \(c, \delta > 0\) such that

    $$\begin{aligned} c \chi _{B_\delta } \le u. \end{aligned}$$

    The random variables \(\omega _j\) are independent and identically distributed on a probability space \((\Omega , {\mathbb {P}})\) with bounded density and support equal to the interval [0, 1].

  3. (H3)

    There are \(- \infty \le a< b < \infty \) such that \((a,b) \subset \rho (H_\omega ^{\mathrm{erg}})\) and \(b \in \sigma (H_\omega ^{\mathrm{erg}})\) almost surely.

The reason why we assume the potential \(V_{\mathrm{per}}\) in (H1) to be bounded is that this is a requirement of the quantitative unique continuation principle for spectral projectors [36], a major ingredient in the proof. There have been recent works removing this boundedness assumption [30, 31], but since this is not the main focus of this work, we refrain from pursuing this path further here.

1.2 Main Results

The following theorem spells out localization in a neighbourhood of the lower edge of a connected component of the almost sure spectrum. The case of an upper edge can be treated analogously by obvious modifications to Hypothesis (H3) and the proofs.

Theorem 1.1

(Dynamical localization near band edges). Assume (H1), (H2), and (H3). Then there exists \(\epsilon > 0\) such that for all \(\zeta \in (0,1)\) there is a constant \(C_\zeta > 0\) with

$$\begin{aligned} {\mathbb {E}}\Biggl [\sup _{\Vert { f }\Vert _\infty \le 1} \Vert { \chi _{\Lambda _1(x)} \chi _{[b, b + \epsilon ]}(H_\omega ^{\mathrm{erg}}) f(H_\omega ^{\mathrm{erg}}) \chi _{\Lambda _1(y)} }\Vert _{\mathrm{HS}}^2 \Biggr ]\le C_\zeta \mathrm{e}^{-|{x-y}|^\zeta } \end{aligned}$$

for all \(x,y \in {\mathbb {Z}}^d\). Here, the supremum is taken over all Borel functions on \({\mathbb {R}}\), and \(\Vert { \,\cdot \, }\Vert _{\mathrm{HS}}\) denotes the Hilbert-Schmidt norm.

As mentioned in the introduction, Theorem 1.1 yields strong Hilbert-Schmidt-dynamical localization, as well as Anderson localization, in the interval \([b,b+\epsilon ]\), cf. (1.3) and (1.1), respectively. It also includes the essentially well-known particular case where the bottom of the almost sure spectrum is considered, cf. [12, 28]. On the other hand, to the best of our knowledge, Theorem 1.1 provides the first proof of such localization near internal band edges in the continuum without additional assumptions. In particular, it does not require regularity of Floquet eigenvalues as in [47]. The core of the proof of Theorem 1.1 is a so-called initial scale estimate, which in the situation of the theorem takes the following form (see also Corollary 2.3 below):

For all \(q > 0\) and \(\alpha \in (0,1)\) there exists \(L_0 \in {\mathbb {N}}\) such that for all \(L \in {\mathbb {N}}\) with \(L \ge L_0\) we have

$$\begin{aligned} {\mathbb {P}}\bigl [\sigma ( H_{\omega ,L}^{\mathrm{erg}}) \cap [b, b + L^{- \alpha }) = \emptyset \bigr ]\ge 1 - L^{- q}, \end{aligned}$$
(1.5)

where \(H_{\omega ,L}^{\mathrm{erg}}\) denotes the restriction of \(H_\omega ^{\mathrm{erg}}\) onto \(L^2(\Lambda _L)\) with periodic boundary conditions. Theorem 1.1 then follows from (1.5), combined with the well-known Combes-Thomas estimate, and the Wegner estimate [28, Theorem 3.1] via the bootstrap multi-scale analysis, see, e.g. [15, Theorem 3.8]; cf. also [16, Theorems 4.1 and 4.2]. Since this procedure is well understood by now, we omit the explicit treatment of this multi-scale analysis here and just content ourselves with the proof of the initial scale estimate (1.5).

Remark 1.2

Our proof of the initial scale estimate does not rely on the fact that the \(\omega _j\) have a Lebesgue density. This opens the way to consider band edge localization for, e.g. i.i.d. Bernoulli random variables. In this situation, a Wegner estimate is no longer available, but an initial scale estimate still implies localization via the multi-scale analysis developed in [5], see for instance Remark 1.3 of [17] for an explicit reference in this setting with bounded single-site potential u.

Although it is very plausible that dynamical localization also holds when the single-site potential is not compactly supported but decays sufficiently fast at infinity, we are not aware of any proof of the corresponding multi-scale analysis in the literature; we note, however, that a partial result has been stated in [12, Theorem 4.3]. On the other hand, for Anderson localization alone the multi-scale analysis in this long-range setting is well established, see, e.g. [29, 34, 49]. Since the proof of our initial scale estimate is not restricted to compactly supported single-site potentials, we still obtain the following statement:

Remark 1.3

If the single-site potential u in (H2) is not assumed to have compact support but merely to have sufficiently fast decay at infinity, the operator \(H_\omega ^{\mathrm{erg}}\) still exhibits Anderson localization near band edges, cf. [29, 47].

Our proof of the initial scale estimate does also not rely on periodicity or ergodicity. Therefore, we prove a more general initial scale estimate for not necessarily ergodic random Schrödinger operators \(H_\omega = H_0 + V_\omega \) in Theorem 2.2 below.

Let us conclude this section by briefly explaining the main idea of the proof of the initial scale estimate (1.5); the full proof in the more general non-ergodic situation can be found in Sect. 3 below. The quantitative unique continuation principle implies that eigenvalues of \(- \Delta + V\) move up by some positive amount if the potential V is increased by some \(c > 0\) on a (in general disconnected) subset U with typical distance l between its components. However, the price to pay is that the lifting is very small and scales unfavourably with increasing l: the eigenvalues are lifted by an amount proportional to \(c \exp (- l^{4/3+\varepsilon })\). On the one hand, this seems too weak for the polynomial bound in (1.5). On the other hand, by large deviation arguments, favourable configurations of the random potential appear with overwhelming probability for large l: the random potential exceeds c on a set U with typical distance l between its components with probability of order \(1 - \exp (- l^d)\). Since \(\exp (- l^{4/3+\varepsilon })\) decays more slowly than \(\exp (- l^d)\) in dimensions \(d \ge 2\) (for small \(\varepsilon > 0\)), we can trade the large deviation bound against the meagre eigenvalue lifting from unique continuation and conclude the statement.
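For orientation, the trade-off can be quantified in a rough, heuristic way; this is only a back-of-the-envelope sketch and not the argument of Sect. 3. Take the lifting exponent \(4/3+\varepsilon = 7/5\) from (3.2), let \(\eta , c, \kappa \) be as in Hypothesis (H2’) below with \(\kappa \in (0,1)\), let \(q > 0\) be as in (1.5), take \(l = (\alpha \ln L)^{\beta }\) for a free parameter \(\beta > 0\) introduced only for this sketch (ignoring that l has to be an odd integer), and bound the number of boxes of side l meeting \(\Lambda _{2L}\) crudely by \((3L)^d\). Then

$$\begin{aligned} \eta c\, \mathrm{e}^{-l^{7/5}} = \eta c \exp \bigl (-(\alpha \ln L)^{7\beta /5}\bigr ) &\ge L^{-\alpha } \quad \text {for all large } L, \text { provided } 7\beta /5 < 1, \\ (3L)^d (1-\kappa )^{l^d} = \exp \bigl ( d \ln (3L) - |{\ln (1-\kappa )}|\, (\alpha \ln L)^{\beta d} \bigr ) &\le L^{-q} \quad \text {for all large } L, \text { provided } \beta d > 1. \end{aligned}$$

Both conditions hold simultaneously exactly when \(1/d < \beta < 5/7\), which is possible if and only if \(d \ge 2\); the choice \(\beta = 2/3\) works in every dimension \(d \ge 2\).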

2 An Initial Scale Estimate for Non-ergodic Random Schrödinger Operators

For a random operator \(H_\omega = H_0 + V_\omega \), \(L > 0\), and \(x \in {\mathbb {R}}^d\), we denote by \(H_{\omega , L, x}\) the restriction of \(H_\omega \) to \(L^2(\Lambda _L(x))\) with a fixed choice of either Dirichlet, Neumann, or periodic boundary conditions.

We formulate the following hypotheses:

  1. (H1’)

    \(H_0 = - \Delta + V_0\), where \(V_0 \in L^\infty ({\mathbb {R}}^d)\) is real-valued.

  2. (H2’)

    \(V_\omega = \sum _{j \in (G{\mathbb {Z}})^d} \omega _j u_j\), \(G > 0\), where each \(u_j \in L_{\mathrm{loc}}^p({\mathbb {R}}^d)\), \(p > d/2\), is non-negative and satisfies

    $$\begin{aligned} \sup _{k\in (G{\mathbb {Z}})^d} \sum _{j\in (G{\mathbb {Z}})^d} \Vert {u_j}\Vert _{L^p(\Lambda _G(k))} < \infty . \end{aligned}$$
    (2.1)

    In addition, there are \(c > 0\) and \(\delta \in (0,G/2]\) such that for every \(j \in (G {\mathbb {Z}})^d\) there exists \(x_j \in \Lambda _G(j)\) with \(B_\delta (x_j) \subset \Lambda _G(j)\) and

    $$\begin{aligned} c \chi _{B_\delta (x_j)} \le u_j. \end{aligned}$$

    Moreover, the random variables \(\omega _j\) are independent on a probability space \((\Omega , {\mathbb {P}})\), with values contained in the interval [0, 1] almost surely, and there are \(\eta , \kappa > 0\) such that \({\mathbb {P}}[ \omega _j \ge \eta ] \ge \kappa \) for all \(j \in (G {\mathbb {Z}})^d\).

  3. (H3’)

    There are \(b \in {\mathbb {R}}\) and \(M_b \subset (G {\mathbb {N}}) \times (G {\mathbb {Z}})^d\) such that for all \((L,x) \in M_b\) there is \(a < b\) with \((a,b) \subset \rho (H_{0,L,x} + t W_{L,x})\) for all \(t \in [0,1]\), where \(W_{L,x}\) denotes the restriction to \(\Lambda _L(x)\) of

    $$\begin{aligned} W := \sum _{j \in (G {\mathbb {Z}})^d} u_j. \end{aligned}$$

Remark 2.1

  1. (1)

    If the random variables \(\omega _j\) in (H2’) are identically distributed and non-trivial, then one can choose \(\kappa = 1/2\) and \(\eta = {\text {Med}} (\omega _0)\), the median of \(\omega _0\).

  2. (2)

    Condition (2.1) guarantees that W in (H3’) is locally \(L^p\) with a uniform bound on the \(L^p\)-norm on cubes of side length G. Since \(p > d/2\) and \(d \ge 2\), this ensures that W (and consequently also \(V_\omega \) almost surely) belongs to the Kato class in \({\mathbb {R}}^d\) and is thus infinitesimally form bounded with respect to \(H_0\), see, e.g. [8, Section 1.2]. Note that (2.1) is obviously satisfied if \(\Vert {u_j}\Vert _{L^p(\Lambda _G(k))}\) exhibits a sufficiently fast decay in \(|{j-k}|\), say \(\Vert {u_j}\Vert _{L^p(\Lambda _G(k))} \le \text {const}/(1+|{j-k}|)^{m}\) for all \(k,j \in (G{\mathbb {Z}})^d\) and some \(m > d\).

  3. (3)

    Hypothesis (H3’) implies that for every \((L,x) \in M_b\) the number of eigenvalues of \(H_{\omega ,L,x}\) below b is almost surely constant whence the infimum of the spectrum of \(H_{\omega ,L,x}\) in \([b, \infty )\) can experience no “jumps” when the random potential is varied, see the proof of Lemma 3.2 below for more details.

The following theorem is the core of the present paper:

Theorem 2.2

(ISE for non-ergodic random Schrödinger operators). Assume Hypotheses (H1’), (H2’), and (H3’). Then, for all \(q > 0\) and \(\alpha \in (0,1)\) there exists \(L_0 \in G {\mathbb {N}}\) such that for all \((L,x) \in M_b\) with \(L \ge L_0\) we have

$$\begin{aligned} {\mathbb {P}}\bigl [\sigma ( H_{\omega ,L,x}) \cap [b, b + L^{- \alpha }) = \emptyset \bigr ] \ge 1 - L^{- q}. \end{aligned}$$

It is clear that (H1)–(H2) are a particular case of (H1’)–(H2’) with \(V_0=V_{\mathrm{per}}\), \(G=1\), and \(u_j=u(\cdot -j)\). We also note that Hypotheses (H1’)–(H2’) define a generalization of the crooked Anderson model and the Delone-Anderson model, for which Wegner estimates are known [23, 41]. Let us comment on the connection between the gap conditions (H3) and (H3’): with the fixed choice of periodic boundary conditions, (H3) implies (H3’) with \(M_b = {\mathbb {N}}\times {\mathbb {Z}}^d\), and the gap \((a,b)\) in (H3) belongs to the resolvent set of each periodic box restriction in (H3’). Indeed, the function W in (H3’) is then \({\mathbb {Z}}^d\)-periodic and the almost sure spectrum of \(H^{\mathrm{erg}}_\omega \) agrees with \(\Sigma = \bigcup _{t \in [0,1]} {{\,\mathrm{\sigma }\,}}(H_{\mathrm{per}}+ t W)\), see, e.g. [42, Lemma 1.4.1]. It is then a consequence of Floquet theory that the gap \((a,b) \subset {\mathbb {R}}\setminus \Sigma \) also belongs to the resolvent set of the periodic restrictions of \(H_{\mathrm{per}}+ tW\) to the boxes \(\Lambda _L\) of integer side length in (H3’). Note that the latter reasoning also applies with Dirichlet and Neumann boundary conditions if the \({\mathbb {Z}}^d\)-periodic background potential \(V_{\mathrm{per}}\) is additionally symmetric under all coordinate reflections. This can be seen by extending Neumann and Dirichlet eigenfunctions by symmetric and antisymmetric reflections, respectively, to a box of twice the side length, on which they satisfy periodic boundary conditions.
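The reflection argument can be tested in a one-dimensional finite-difference analogue. In the Python sketch below, the grid size, the box length L = 2, and the potential \(V(x) = \cos (2\pi x)\) are hypothetical choices; V is 1-periodic and even, hence symmetric under reflections at integers. Every Dirichlet eigenvalue on [0, L] then reappears among the periodic eigenvalues on the doubled box [0, 2L), because the antisymmetric reflection of a Dirichlet eigenvector is a periodic eigenvector. Only the Dirichlet case is checked; the discrete Neumann case depends on how the boundary condition is discretized.

```python
import numpy as np

def schroedinger_matrix(V, h, periodic):
    """Finite-difference matrix of -d^2/dx^2 + V on a uniform grid with spacing h.

    periodic=False: Dirichlet conditions (the solution vanishes just outside the grid);
    periodic=True:  the last grid point is coupled to the first one.
    """
    n = len(V)
    off = -np.ones(n - 1) / h**2
    M = np.diag(2.0 / h**2 + np.asarray(V, dtype=float)) + np.diag(off, 1) + np.diag(off, -1)
    if periodic:
        M[0, -1] = M[-1, 0] = -1.0 / h**2
    return M

def potential(x):
    """Hypothetical V_per: 1-periodic and even, hence symmetric under reflections at integers."""
    return np.cos(2 * np.pi * x)

L, n = 2, 39                             # box [0, L] with integer L, n interior grid points
h = L / (n + 1)
x_dir = h * np.arange(1, n + 1)          # interior grid points of [0, L]
x_per = h * np.arange(0, 2 * (n + 1))    # grid of the doubled periodic box [0, 2L)

ev_dir = np.linalg.eigvalsh(schroedinger_matrix(potential(x_dir), h, periodic=False))
ev_per = np.linalg.eigvalsh(schroedinger_matrix(potential(x_per), h, periodic=True))

# Each Dirichlet eigenvalue on [0, L] should reappear in the periodic spectrum on [0, 2L).
contained = all(np.min(np.abs(ev_per - lam)) < 1e-8 for lam in ev_dir)
print(contained)
```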

In summary, we have seen that (1.5) is a particular case of Theorem 2.2, so that we obtain the following corollary:

Corollary 2.3

(Ergodic ISE at band edges). Assume (H1), (H2), and (H3). Then for all \(q > 0\) and \(\alpha \in (0,1)\) there exists \(L_0 \in {\mathbb {N}}\) such that for all \(L \in {\mathbb {N}}\) with \(L \ge L_0\) we have

$$\begin{aligned} {\mathbb {P}}\bigl [\sigma ( H_{\omega ,L}^{\mathrm{erg}}) \cap [b, b + L^{- \alpha }) = \emptyset \bigr ]\ge 1 - L^{- q}. \end{aligned}$$

The fact that Theorem 2.2 does not require ergodicity leads to a robustness of localization proofs which is difficult to achieve when relying on Floquet theory. We are not going to engage in the multi-scale analysis in the non-ergodic setting here, but we nevertheless point out some non-ergodic situations where Theorem 2.2 can be used as an input, namely:

Example 2.4

  1. (a)

    The Delone-Anderson model

    $$\begin{aligned} H_\omega = - \Delta + V_{\mathrm{per}}+ \sum _{x \in {\mathcal {D}}} \omega _x u( \cdot - x), \end{aligned}$$

    where \({\mathcal {D}} \subset {\mathbb {R}}^d\) is a so-called Delone set, see [23, 41] for definitions. One may even consider here to have the periodic background potential \(V_{\mathrm{per}}\) be perturbed by a fast decaying potential. We note however that an alternative approach to fast decaying perturbations can be found in [10], where it is proved that localization is robust under such perturbations.

  2. (b)

    More generally, the crooked Anderson model treated in [23, 45], assuming that the random variables do not concentrate too much, so that there exist \(\eta \) and \(\kappa \) as in (H2’).

  3. (c)

    Models of the form

    $$\begin{aligned} H_\omega = - \Delta + V_{{\mathrm{per}},G_1} + \sum _{j \in (G_2 {\mathbb {Z}})^d} \omega _j u(\cdot - j), \end{aligned}$$

    where the periodicity cell of the background potential is of a size \(G_1 > 0\) incommensurate with the ergodicity length \(G_2\), that is, the system exhibits a quasi-periodic structure.

In the above situations, Hypothesis (H3’) may be verified, for instance, by supposing that the random potential is sufficiently small, provided that the background operator admits a suitable spectral gap for the box restrictions \(H_{0,L,x}\).

3 Proof of Theorem 2.2

By scaling, we may assume \(G = 1\). Furthermore, for notational convenience, we assume that \({\mathbb {N}}\times \{ 0 \} \subset M_b\) and therefore only prove the statement for \(x = 0\) and sufficiently large L, writing \(H_{\omega ,L}\) and \(H_{0,L}\) instead of \(H_{\omega ,L,x}\) and \(H_{0,L,x}\).

The proof of Theorem 2.2 relies on the scale-free quantitative unique continuation principle [37] given in Proposition 3.1 below. We start by introducing some notation: Given \(l>0\) and \(\delta \in (0,l/2)\), a sequence \(Z=(y_j)_{j\in (l{\mathbb {Z}})^d}\) in \({\mathbb {R}}^d\) is called \((l,\delta )\)-equidistributed if \(B_\delta (y_j)\subset \Lambda _l(j)\) for all \(j\in (l{\mathbb {Z}})^d\). If Z is \((l, \delta )\)-equidistributed and \(L > 0\), we write

$$\begin{aligned} S_{Z,L} :=\bigcup _{j \in (l{\mathbb {Z}})^d} B_\delta (y_j) \cap \Lambda _L. \end{aligned}$$

Proposition 3.1

(Scale-free unique continuation principle [36, 37]). Let \(V \in L^\infty ({\mathbb {R}}^d)\), \(l \in {\mathbb {N}}_{\mathrm{odd}}\), \(L \in {\mathbb {N}}\) with \(l \le L\), and \(\delta \in (0,l/2)\). Denote the restriction of \(- \Delta + V\) to \(\Lambda _L\) with Dirichlet, Neumann, or periodic boundary conditions by \(H_L\). Let Z be an \((l,\delta )\)-equidistributed sequence. Then, for every \(E\in {\mathbb {R}}\) and every \(\phi \in {{\,\mathrm{Ran}\,}}(\chi _{(-\infty ,E]}(H_L))\) we have

$$\begin{aligned} \Vert \phi \Vert _{L^2 (S_{Z,L})}^2 \ge \left( \frac{\delta }{l} \right) ^{N \bigl (1 + l^{4/3} \Vert V \Vert _\infty ^{2/3} + l \sqrt{\max \{ E,0 \}} \bigr )} \Vert \phi \Vert _{L^2 (\Lambda _L)}^2, \end{aligned}$$
(3.1)

where \(N > 0\) is a constant that only depends on the dimension.

Note also that for fixed \(\delta \in (0, 1/2]\) and sufficiently large l (depending on \(\delta \), \(\Vert V \Vert _\infty \), E, and N only), we can estimate

$$\begin{aligned} \left( \frac{\delta }{l} \right) ^{N \bigl (1 + l^{4/3} \Vert V \Vert _\infty ^{2/3} +l \sqrt{\max \{E,0\}} \bigr )} \ge \exp \bigl (- l^{7/5}\bigr ). \end{aligned}$$
(3.2)
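Ineq. (3.2) is elementary, but "sufficiently large l" can be very large. The Python sketch below is an illustration under assumptions, not part of the proof: taking logarithms twice, (3.2) holds at \(l > \delta \) if and only if \(\ln A + \ln \ln (l/\delta ) \le \tfrac{7}{5} \ln l\), where A is the exponent on the left-hand side. Since N is not specified in Proposition 3.1, the values \(N = 10\), \(\Vert V \Vert _\infty = 1\), \(E = 1\), \(\delta = 0.1\) are hypothetical, and the grid scan over \(\ln l\) is a crude numerical device.

```python
import math

def log_gap(x, N, V_inf, E, delta):
    """ln(A * ln(l / delta)) - ln(l ** (7/5)) at l = exp(x), where
    A = N * (1 + l**(4/3) * V_inf**(2/3) + l * sqrt(max(E, 0))).
    For l > delta, Ineq. (3.2) holds at l if and only if this value is <= 0."""
    A = N * (1.0 + V_inf ** (2 / 3) * math.exp(4 * x / 3) + math.sqrt(max(E, 0.0)) * math.exp(x))
    return math.log(A) + math.log(x - math.log(delta)) - 1.4 * x

def last_violation(N, V_inf, E, delta, x_max=400.0, step=0.01):
    """Largest grid value x = ln(l) in [0, x_max] where (3.2) fails, or None if it never fails."""
    worst = None
    for i in range(int(x_max / step) + 1):
        x = i * step
        if log_gap(x, N, V_inf, E, delta) > 0:
            worst = x
    return worst

# Hypothetical constants: N is not specified in Proposition 3.1.
print(last_violation(N=10.0, V_inf=1.0, E=1.0, delta=0.1))
```

With these hypothetical values the last violation occurs at \(\ln l\) between 100 and 110, driven by the competition between \(l^{4/3}\ln l\) and \(l^{7/5}\). This only affects the size of \(L_0\) in Theorem 2.2, not its validity.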

In the situation of Theorem 2.2, let \(J(\omega ):=\{ k\in {\mathbb {Z}}^d :\omega _k \ge \eta \} \subset {\mathbb {Z}}^d\) for \(\omega \in \Omega \) and consider for \(l \le L\) the event

$$\begin{aligned} A_{l,L} := \bigl \{ \omega \in \Omega :\Lambda _l(j) \cap J(\omega ) \ne \emptyset \ \text { for all }j\in (l{\mathbb {Z}})^d \cap \Lambda _{2L}\bigr \}. \end{aligned}$$
(3.3)

The main idea is that if \(\omega \in A_{l,L}\) then we can pick \(k_j \in \Lambda _l(j) \cap J(\omega )\), where j runs over \((l{\mathbb {Z}})^d \cap \Lambda _{2L}\), such that the corresponding points \(x_{k_j}\) from Hypothesis (H2’) are part of an \((l,\delta )\)-equidistributed sequence. The scale-free unique continuation principle then implies the following eigenvalue lifting estimate:
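The selection step can be sketched in Python for d = 2. The finite window of block centres, the uniform distribution of the \(\omega _k\), and the choice \(x_k = k\) (ball centred at the lattice site, as in the ergodic model (H2) with \(\delta \le 1/2\)) are hypothetical. The helper is_equidistributed checks the defining condition \(B_\delta (y_j) \subset \Lambda _l(j)\) coordinatewise, using that an open ball lies in an open cube if and only if \(|y_i - j_i| + \delta \le l/2\) for every coordinate i.

```python
import itertools
import random

def is_equidistributed(points, l, delta):
    """Check B_delta(y_j) subset Lambda_l(j) for every entry j -> y_j of `points` (open ball in open cube).

    This holds iff |y_i - j_i| + delta <= l/2 in every coordinate i."""
    return all(
        all(abs(yi - ji) + delta <= l / 2 for yi, ji in zip(y, j))
        for j, y in points.items()
    )

def select_sites(omega, eta, l, centres):
    """For each block centre j, pick some k in Lambda_l(j) with integer coordinates and omega[k] >= eta.

    Returns None if some block has no such site, i.e. omega is not in the event A_{l,L}."""
    half = (l - 1) // 2  # for odd l the integer points of Lambda_l(j) are j + {-half, ..., half}^d
    chosen = {}
    for j in centres:
        for offset in itertools.product(range(-half, half + 1), repeat=len(j)):
            k = tuple(ji + oi for ji, oi in zip(j, offset))
            if omega[k] >= eta:
                chosen[j] = k
                break
        else:  # no break: block j contains no site with omega_k >= eta
            return None
    return chosen

d, l, eta, delta = 2, 3, 0.5, 0.5
centres = list(itertools.product(range(-6, 7, l), repeat=d))   # hypothetical finite window of (lZ)^d
rng = random.Random(1)
omega = {k: rng.random() for k in itertools.product(range(-7, 8), repeat=d)}
chosen = select_sites(omega, eta, l, centres)
if chosen is None:
    print("omega not in the event: some block has no site with omega_k >= eta")
else:
    # Hypothetically take x_k = k, i.e. the ball B_delta(x_k) is centred at the lattice site.
    print(is_equidistributed(chosen, l, delta))
```

Oddness of l is what makes every block contain exactly \(l^d\) lattice sites and keeps each unit cube \(\Lambda _1(k)\), \(k \in \Lambda _l(j)\), inside \(\Lambda _l(j)\).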

Lemma 3.2

There is \(l_0 \in {\mathbb {N}}\), depending only on \(\delta \), b, d, c, \(\eta \), and \(\Vert {V_0}\Vert _\infty \), such that for all \(L \in {\mathbb {N}}\) and \(l \in {\mathbb {N}}_{\mathrm{odd}}\) with \(L \ge l \ge l_0\) and all \(\omega \in A_{l,L}\) we have

$$\begin{aligned} \inf \sigma (H_{\omega , L}) \cap [b, \infty ) \ge b + \eta c \exp \bigl ( - l^{7/5} \bigr ). \end{aligned}$$

Proof

If \(\inf \sigma (H_{\omega , L}) \cap [b, \infty ) \ge b + \eta c\), there is nothing to prove. So, from now on assume that

$$\begin{aligned} \inf \sigma (H_{\omega , L}) \cap [b, \infty ) \le b + \eta c =: E. \end{aligned}$$
(3.4)

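Fix \(\omega \in A_{l,L}\) and let Z be an \((l,\delta )\)-equidistributed sequence with \(y_j = x_{k_j}\) for \(j \in (l{\mathbb {Z}})^d \cap \Lambda _{2L}\), chosen as described before the lemma; since \(l \le L\), only these j contribute to \(S_{Z,L}\). Denote by \(W_L\) the restriction of W to \(\Lambda _L\). Since \(\omega _{k_j} \ge \eta \) and \(c \chi _{B_\delta (x_{k_j})} \le u_{k_j}\) for all such j, while all coupling constants lie in [0, 1], we have \(H_{0,L} \le H_{0,L} + \eta c\chi _{S_{Z,L}} \le H_{\omega ,L} \le H_{0,L} + W_L\), and the minimax principle for eigenvalues implies that for every \(k \in {\mathbb {N}}\) the k-th eigenvalues, counted from the bottom of the spectrum, satisfy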

$$\begin{aligned} \lambda _k(H_{0,L}) \le \lambda _k(H_{0,L} + \eta c\chi _{S_{Z,L}}) \le \lambda _k(H_{\omega ,L}) \le \lambda _k(H_{0,L} + W_L). \end{aligned}$$
(3.5)

Observe that, by [21, Theorem VII.4.2], \(t \mapsto H_{0,L} + t W_L\) defines a so-called holomorphic family of operators of type (B) in a sufficiently small complex neighbourhood of [0, 1]. In turn, it follows from [21, Remark VII.4.22] that the eigenvalue curves \(t \mapsto \lambda _k( H_{0,L} + t W_L )\) with the prescribed ordering of the eigenvalues are continuous on [0, 1]. Since Hypothesis (H3’) for fixed L prevents eigenvalues of the family \([0,1] \ni t \mapsto H_{0,L} + t W_L\) from entering the corresponding interval \((a,b)\) during this variation, we conclude from Ineq. (3.5) that no eigenvalues can enter \((a,b)\) when passing from \(H_{0,L} + \eta c\chi _{S_{Z,L}}\) to \(H_{\omega ,L}\) either. In particular, there exists \(k_0 \in {\mathbb {N}}\) such that \(\lambda _{k_0}(H_{0,L})\), \(\lambda _{k_0}(H_{0,L} + \eta c\chi _{S_{Z,L}})\), and \(\lambda _{k_0}(H_{\omega ,L})\) denote the lowest eigenvalue in \([b, \infty )\) of the respective operators. Therefore, it suffices to prove that

$$\begin{aligned} \lambda _{k_0}(H_{0,L} + \eta c\chi _{S_{Z,L}}) \ge \lambda _{k_0}(H_{0,L}) + \eta c \exp (- l^{7/5}). \end{aligned}$$
(3.6)

To this end, observe that by (3.4) and (3.5) we have

$$\begin{aligned} \lambda _{k_0}(H_{0,L} + \eta c\chi _{S_{Z,L}}) \le \lambda _{k_0}(H_{\omega ,L}) \le E. \end{aligned}$$

Hence, for sufficiently large l, Proposition 3.1 and Ineq. (3.2) yield that

$$\begin{aligned} \langle \phi , \eta c\chi _{S_{Z,L}} \phi \rangle = \eta c \Vert \chi _{S_{Z,L}} \phi \Vert ^2 \ge \eta c \exp (- l^{7/5}) \Vert \phi \Vert ^2 \end{aligned}$$

for all \(\phi \in {{\,\mathrm{Ran}\,}}( \chi _{(- \infty ,E]}(H_{0,L} + \eta c\chi _{S_{Z,L}}))\). Ineq. (3.6) now follows from the minimax principle for eigenvalues as, for instance, in Lemma 3.5 of [37]. \(\square \)
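The final step is a minimax lifting argument. In finite dimensions it reads: if \(\langle \phi , B\phi \rangle \ge \kappa \Vert \phi \Vert ^2\) for all \(\phi\) in the spectral subspace of \(A + B\) below E, then \(\lambda _k(A+B) \ge \lambda _k(A) + \kappa\) whenever \(\lambda _k(A+B) \le E\), because the first k eigenvectors of \(A+B\) form a k-dimensional test space for \(\lambda _k(A)\) on which \(\langle \phi , A\phi \rangle \le (\lambda _k(A+B) - \kappa )\Vert \phi \Vert ^2\). We assume this is the form of the argument referred to in [37]. The sketch below checks it numerically; the discrete Dirichlet Laplacian standing in for \(H_{0,L}\), and \(B = \frac{1}{2}\chi _S\) with S every third site standing in for \(\eta c\chi _{S_{Z,L}}\), are hypothetical choices.

```python
import numpy as np

def lifting_check(A, B, E, tol=1e-10):
    """Return (kappa, ok): kappa is the largest constant with
    <phi, B phi> >= kappa ||phi||^2 on Ran chi_{(-inf, E]}(A + B), and ok says whether
    lambda_k(A + B) >= lambda_k(A) + kappa for every k with lambda_k(A + B) <= E."""
    lam_AB, vecs = np.linalg.eigh(A + B)   # ascending eigenvalues, orthonormal eigenvectors
    P = vecs[:, lam_AB <= E]               # orthonormal basis of the spectral subspace
    m = P.shape[1]
    if m == 0:
        raise ValueError("no eigenvalues of A + B below E")
    # Smallest Rayleigh quotient of B on the spectral subspace.
    kappa = np.linalg.eigvalsh(P.T @ B @ P).min()
    lam_A = np.linalg.eigvalsh(A)
    ok = bool(np.all(lam_AB[:m] >= lam_A[:m] + kappa - tol))
    return kappa, ok

n = 40
# Hypothetical stand-in for H_{0,L}: the discrete 1D Dirichlet Laplacian.
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
# Hypothetical stand-in for eta*c*chi_S: value 0.5 on every third site, 0 elsewhere.
B = np.diag(0.5 * (np.arange(n) % 3 == 0))
kappa, ok = lifting_check(A, B, E=0.5)
print(kappa, ok)
```

For a characteristic-function perturbation \(B = \eta c\chi _S\), the hypothesis is exactly a lower bound on \(\eta c\Vert \chi _S\phi \Vert ^2\), which is what a unique continuation estimate on spectral subspaces provides.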

We are ready to prove Theorem 2.2.

Proof of Theorem 2.2

For sufficiently large \(L \in {\mathbb {N}}\) (depending only on \(\alpha \)), we find \(l \in {\mathbb {N}}_{\mathrm{odd}}\) with \(l \le L\) such that

$$\begin{aligned} \frac{1}{2} \ln (L^\alpha )^{2/3} < l \le \ln (L^\alpha )^{2/3}. \end{aligned}$$
(3.7)
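An odd l as in (3.7) exists because the largest odd integer not exceeding \(x = \ln (L^\alpha )^{2/3}\) is larger than \(x - 2\), hence larger than x/2 once \(x \ge 4\) (in fact already for \(x \ge 3\)). A minimal Python sketch of this choice; the function name `choose_scale` is ours:

```python
import math

def choose_scale(L, alpha):
    """Return an odd integer l with x/2 < l <= x, where x = (ln(L**alpha))**(2/3), cf. (3.7)."""
    x = (alpha * math.log(L)) ** (2 / 3)
    l = math.floor(x)
    if l % 2 == 0:
        l -= 1  # largest odd integer <= x; it exceeds x - 2
    if not (x / 2 < l <= x) or l > L:
        raise ValueError("L is too small for this alpha")
    return l

print(choose_scale(10**9, 1.0))   # 7
print(choose_scale(10**20, 1.0))  # 11
```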

Enlarging L if necessary (depending only on \(\alpha , \eta , c\)), we furthermore have \(l^{7/5} \le \ln (L^\alpha )^{\frac{14}{15}} \le \ln (c \eta L^\alpha )\) by (3.7), which implies

$$\begin{aligned} \eta c \exp \bigl ( - l^{7/5} \bigr ) \ge L^{-\alpha }. \end{aligned}$$
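In logarithmic form, the last inequality reads \(\ln (\eta c) - l^{7/5} \ge -\alpha \ln L\). The sketch below evaluates both sides for the odd l from (3.7); the values \(\eta c = 0.1\) and \(\alpha = 1\) are hypothetical and only illustrate that "sufficiently large L" is a genuine restriction.

```python
import math

def odd_scale(L, alpha):
    # Largest odd l <= x = (ln(L**alpha))**(2/3), as in (3.7); assumes L is large enough.
    x = (alpha * math.log(L)) ** (2 / 3)
    l = math.floor(x)
    return l if l % 2 == 1 else l - 1

def bound_holds(L, alpha, eta_c):
    """Check eta*c*exp(-l**(7/5)) >= L**(-alpha) via logarithms, avoiding underflow."""
    l = odd_scale(L, alpha)
    return math.log(eta_c) - l ** 1.4 >= -alpha * math.log(L)

print(bound_holds(10**3, 1.0, 0.1))  # False
print(bound_holds(10**9, 1.0, 0.1))  # True
```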

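Enlarging L once more if necessary, we may also assume \(l \ge l_0\) with \(l_0\) from Lemma 3.2, so that \(L \ge l \ge l_0\). From Lemma 3.2, we then deduce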

$$\begin{aligned}&{\mathbb {P}}\bigl [ \inf \sigma (H_{\omega , L}) \cap [b, \infty ) \ge b + L^{-\alpha } \bigr ]\nonumber \\&\quad \ge {\mathbb {P}}\Bigl [ \inf \sigma (H_{\omega , L}) \cap [b, \infty ) \ge b + \eta c \exp \bigl ( - l^{7/5} \bigr ) \Bigr ] \ge {\mathbb {P}}[A_{l,L}]. \end{aligned}$$
(3.8)

It remains to give a lower bound on \({\mathbb {P}}[A_{l,L}]\). To this end, note that, since \(l \in {\mathbb {N}}_{\mathrm{odd}}\), we have \(\#(\Lambda _l(j) \cap {\mathbb {Z}}^d) = l^d\) for each \(j \in \Lambda _L \cap (l{\mathbb {Z}})^d\). Thus, by the independence of the random variables \(\omega _k\),

$$\begin{aligned} {\mathbb {P}}\bigl [ \{\omega :\Lambda _l(j) \cap J(\omega ) = \emptyset \} \bigr ] = {\mathbb {P}}\bigl [ \{\omega :\omega _k < \eta \ \forall k \in \Lambda _l(j) \cap {\mathbb {Z}}^d \} \bigr ] \le ( 1 - \kappa )^{l^d}. \end{aligned}$$
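The bound \((1-\kappa )^{l^d}\) is a product over the \(l^d\) independent sites of the box. A quick Monte Carlo sketch, assuming (hypothetically) that the \(\omega _k\) are uniform on [0, 1], that \(\eta = 1/2\), and that κ is a lower bound for \({\mathbb {P}}[\omega _k \ge \eta ]\) (here taken equal to it), compares the empirical frequency of the event with \((1-\kappa )^{l^d}\) for \(l = 3\), \(d = 2\):

```python
import numpy as np

rng = np.random.default_rng(2)
l, d, eta = 3, 2, 0.5
kappa = 1.0 - eta   # P[omega_k >= eta] for Uniform[0, 1] variables (hypothetical choice)
n_samples = 200_000

# Each row is one realisation of the l**d coupling constants in a box Lambda_l(j).
omegas = rng.uniform(0.0, 1.0, size=(n_samples, l ** d))
# The event in the display: every coupling constant in the box is below eta.
all_below = np.all(omegas < eta, axis=1)

estimate = all_below.mean()
exact = (1.0 - kappa) ** (l ** d)   # 2**-9, by independence
print(estimate, exact)
```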

Inserting (3.7) yields

$$\begin{aligned} {\mathbb {P}}\bigl [ \{\omega :\Lambda _l(j) \cap J(\omega ) = \emptyset \} \bigr ]&\le ( 1 - \kappa )^{(\frac{1}{2}\ln (L^\alpha )^{2/3})^d}\\&= \exp \Bigl ( \frac{\ln (1-\kappa )}{2^d} \alpha ^{2d/3} \ln (L)^{(2d-3)/3} \cdot \ln L \Bigr )\\&\le \exp \bigl ( -(q + d) \ln L - d \ln 2 \ln L \bigr ) \\&\le \exp \bigl ( -q \ln (L) - d \ln (2 L) \bigr ) = (2L)^{-d} L^{-q}, \end{aligned}$$

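provided that L is so large that \(-\frac{\ln (1-\kappa )}{2^d} \alpha ^{2d/3} \ln (L)^{(2d-3)/3} \ge (q + d) + d \ln 2\) and \(L \ge e\) (the latter is used in the last step); recall here that \(d \ge 2\), so that \((2d-3)/3 > 0\) and the left-hand side of this condition tends to infinity as \(L \to \infty \).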
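Since these quantities underflow in floating point for the relevant L, the chain is easiest to check in logarithmic form, with \(\ln L\) as the variable. The sketch below uses hypothetical parameters \(\kappa = 1/2\), \(d = 2\), \(\alpha = 1\), \(q = 1\) and evaluates the logarithms of the first and last expressions together with the "provided that" condition; for these parameters the condition only holds once \(\ln L\) is of order \(10^4\).

```python
import math

def log_box_bound(ln_L, kappa, d, alpha, q):
    """Log-scale version of the chain of inequalities, with ln_L = ln L.

    Returns (lhs, rhs, condition):
      lhs = ln[(1 - kappa) ** ((x / 2) ** d)] with x = (alpha * ln L) ** (2/3),
      rhs = ln[(2L) ** (-d) * L ** (-q)],
      condition = the 'provided that' requirement on L.
    """
    x = (alpha * ln_L) ** (2 / 3)
    lhs = (x / 2) ** d * math.log(1 - kappa)
    rhs = -d * (math.log(2) + ln_L) - q * ln_L
    coeff = -math.log(1 - kappa) / 2 ** d * alpha ** (2 * d / 3) * ln_L ** ((2 * d - 3) / 3)
    condition = coeff >= (q + d) + d * math.log(2)
    return lhs, rhs, condition

for ln_L in (100.0, 2.0e4):
    lhs, rhs, condition = log_box_bound(ln_L, kappa=0.5, d=2, alpha=1.0, q=1)
    print(ln_L, condition, lhs <= rhs)
# prints:
# 100.0 False False
# 20000.0 True True
```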

Finally, by a union bound we obtain

$$\begin{aligned} {\mathbb {P}}[\Omega \setminus A_{l,L}]&\le \sum _{j \in \Lambda _{2 L} \cap (l{\mathbb {Z}})^d} {\mathbb {P}}\bigl [ \{\omega :\Lambda _l(j) \cap J(\omega ) = \emptyset \} \bigr ]\\&\le (2L)^d \cdot (2L)^{-d} \cdot L^{-q} = L^{-q}, \end{aligned}$$

which, in view of (3.8), proves the claim. \(\square \)
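The union bound only uses that \(\Lambda _{2L}\) contains at most \((2L)^d\) points of \((l{\mathbb {Z}})^d\). Assuming, for this sketch only, the convention \(\Lambda _{2L} = (-L, L)^d\), the count and the resulting bound can be checked directly:

```python
def count_centres(L, l, d):
    """Number of j in (l*Z)^d in the open cube (-L, L)^d (assumed convention for Lambda_{2L})."""
    per_axis = sum(1 for m in range(-L, L + 1) if abs(l * m) < L)
    return per_axis ** d

def union_bound(L, l, d, q):
    # (number of boxes) * (per-box bound (2L)^(-d) L^(-q)); this is at most L^(-q).
    return count_centres(L, l, d) * (2 * L) ** (-d) * L ** (-q)

L, l, d, q = 100, 7, 2, 1
print(count_centres(L, l, d), (2 * L) ** d)  # 841 40000
print(union_bound(L, l, d, q) <= L ** (-q))  # True
```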

Remark 3.3

(1) At the bottom of the spectrum, it suffices to have a quantitative unique continuation principle for eigenfunctions (and not for spectral subspaces) as in [41]. In the setting of Bernoulli random variables, such an argument appears in the PhD thesis of Rojas-Molina, see [40, Section 4.5], and in [35, Lemma 3.8].

(2) Our approach (as well as the one in [40]) can in some respects be understood as a refinement of the technique of [4, 28]: instead of considering the event that all coupling constants exceed a given value, we only require this for the random variables in a sufficiently rich subset. By large deviation arguments, the probability that a configuration fails to have this property decays rapidly as the scale grows, and this eliminates the need to tune the individual probability distributions as in [4, 28].

(3) The proof of Theorem 2.2 relies merely on the fact that, with overwhelming probability, the potential is larger than \(\eta c\) on an \((l, \delta )\)-equidistributed set within \(\Lambda _L\). Therefore, the proof transfers verbatim to other models sharing this feature, such as the random breather model [36, 43, 46].

(4) We use that the probability for the set \(\{ x \in \Lambda _l :V_\omega (x) \ge c \}\) to contain a \(\delta \)-ball is of order \(1 - \exp ( - l^d)\). A careful look at the proof shows that it would indeed suffice for this to be of order \(1 - \exp ( - l^{4/3 + \epsilon })\) for some \(\epsilon > 0\). Thus, for any non-negative random potential with this property, we also obtain an initial scale estimate. This includes, for instance, models based on random fields such as Gaussian processes.