The Fermionic Entanglement Entropy of the Vacuum State of a Schwarzschild Black Hole Horizon

We define and analyze the fermionic entanglement entropy of a Schwarzschild black hole horizon for the regularized vacuum state of an observer at infinity. Using separation of variables and an integral representation of the Dirac propagator, the entanglement entropy is computed to be a prefactor times the number of occupied angular momentum modes on the event horizon.

Black holes behave thermally if one interprets the surface gravity as temperature and the area of the event horizon as entropy [2,13]. The analogy to the second law of thermodynamics suggests that the area of the black hole horizon should only increase. However, this contradicts the discovery of Hawking radiation and the resulting "evaporation" of a black hole [11,12]. This so-called information paradox [14] inspired the holographic principle [28,29] and the current program of attempting to understand the structure of spacetime via information theory, entanglement entropy and gauge/gravity dualities [20,16].
The present work is devoted to the mathematical analysis of the entropy of a horizon of a Schwarzschild black hole of mass M . More precisely, we compute the entanglement entropy of the quasi-free fermionic Hadamard state which is obtained by frequency splitting for the observer in a rest frame at infinity, with an ultraviolet regularization on a length scale ε. We find that, up to a prefactor which depends on εM , this entanglement entropy is given by the number of occupied angular momentum modes, making it possible to reduce the computation of the entanglement entropy to counting the number of occupied states.
Entropy is a measure for the disorder of a physical system. There are various notions of entropy, like the entropy in classical statistical mechanics as introduced by Boltzmann and Gibbs, the Shannon and Rényi entropies in information theory or the von Neumann entropy for quantum systems. Here we focus on the entanglement entropy of a quasi-free fermionic state. In a more physical language, we consider a free Fermi gas formed of non-interacting one-particle Dirac states. Based on formulas derived in [15,18] (for more details see the preliminaries in Section 2.1), the entanglement entropy can be expressed in terms of the reduced one-particle density operator. We choose this one-particle density operator as the regularized projection operator to all negative-frequency solutions of the Dirac equation in the exterior Schwarzschild geometry (where frequency splitting refers to the Schwarzschild time of an observer at rest). Making use of the integral representation of the Dirac propagator in [7] and employing techniques developed in [18,30,25,26,24], it becomes possible to compute the entanglement entropy on the black hole horizon explicitly.
More precisely, denoting the regularized projection operator onto the negative-frequency solutions of the Dirac equation by $\Pi^-_\varepsilon$, we consider the entropic difference operator as introduced in [17, Section 3] (for more details and references see the preliminaries in Section 2.1),
\[ \Delta(\Pi^-_\varepsilon, \tilde\Lambda) := \eta\big( \chi_{\tilde\Lambda}\, \Pi^-_\varepsilon\, \chi_{\tilde\Lambda} \big) - \chi_{\tilde\Lambda}\, \eta(\Pi^-_\varepsilon)\, \chi_{\tilde\Lambda} , \tag{1.1} \]
where $\eta$ is a logarithmic function describing the entanglement entropy of the corresponding Fock state,
\[ \eta(x) := \begin{cases} -x \ln x - (1-x) \ln(1-x) , & x \in (0,1) \\ 0 , & \text{else} \end{cases} \]
(for a plot see Figure 1 on page 5). In order to obtain the entropy of the horizon, we choose $\tilde\Lambda$ as an annular region around the horizon of width $\rho$, i.e. in Regge-Wheeler coordinates
\[ \tilde\Lambda := (u_0 - \rho, u_0) \times S^2 \]
(see also Figure 2 on page 22). In these coordinates, the horizon is located at $u \to -\infty$. Therefore, the fermionic entanglement entropy is obtained from the trace of the entropic difference operator (1.1) in the limit $u_0 \to -\infty$. We shall prove that, to leading order in the regularization length $\varepsilon$, this trace is independent of $\rho$. It turns out that we get equal contributions from the two boundaries at $u_0 - \rho$ and $u_0$ as $u_0 \to -\infty$. Therefore, the fermionic entanglement entropy is given by one half of this trace.

Before stating our main result, we note that the trace of the entropic difference operator can be decomposed into a sum over all occupied angular momentum modes, i.e.
\[ \operatorname{tr} \Delta(\Pi^-_\varepsilon, \tilde\Lambda) = \sum_{(k,n)\ \text{occupied}} \operatorname{tr} \Delta\big( (\Pi^-_\varepsilon)_{kn}, \Lambda \big) , \]
where $(\Pi^-_\varepsilon)_{kn}$ can be thought of as the diagonal block element of $\Pi^-_\varepsilon$ acting on the subspace of the solution space corresponding to the given angular mode. As a consequence of the mode decomposition, the characteristic function $\chi_{\tilde\Lambda}$ goes over to $\chi_\Lambda$ with $\Lambda = (u_0 - \rho, u_0)$. We define the mode-wise entropy $S_{kn}$ of the black hole as one half of the limiting value of $\operatorname{tr} \Delta((\Pi^-_\varepsilon)_{kn}, \Lambda) / f(\varepsilon)$, where $f(\varepsilon)$ is a function describing the highest order of divergence in $\varepsilon$ (we will later see that here $f(\varepsilon) = \ln(M/\varepsilon)$ with $M$ the black hole mass).
Finally, the resulting fermionic entanglement entropy of the black hole can be written as the sum of the entropies of all occupied modes, Our main result shows that S kn has the same numerical value for each angular mode. Theorem 1.1. Let n ∈ Z and k ∈ Z + 1/2 arbitrary then where M is the black hole mass.
In simple terms, this result shows that each occupied angular momentum mode gives the same contribution to the entanglement entropy. This makes it possible to compute the entanglement entropy of the horizon simply by counting the number of occupied angular momentum modes. This is reminiscent of the counting of states in string theory [27] and loop quantum gravity [1]. In order to push the analogy further, assuming a minimal area $\varepsilon^2$ on the horizon, the number of occupied angular modes should scale like $M^2/\varepsilon^2$. In this way, we find that the entanglement entropy is indeed proportional to the area of the black hole. More precisely, the factor $\ln(M/\varepsilon)$ in the above theorem can be understood as an enhanced area law. We point out that, in our case, the counting takes place in the four-dimensional Schwarzschild geometry.
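The counting heuristic can be made concrete with a minimal Python sketch. It assumes that the number of occupied modes equals the horizon area $16\pi M^2$ (a sphere of radius $2M$) divided by the minimal area $\varepsilon^2$. The function names and the parameter `per_mode_prefactor` are hypothetical: the latter is a placeholder for the mode-independent constant of the main theorem, whose value is not taken from the text.

```python
import math


def occupied_mode_count(mass: float, eps: float) -> float:
    """Heuristic mode count: horizon area divided by the minimal area eps^2.

    Assumption: the horizon is a sphere of radius 2M, so its area is 16*pi*M^2.
    """
    horizon_area = 4.0 * math.pi * (2.0 * mass) ** 2
    return horizon_area / eps ** 2


def entropy_estimate(mass: float, eps: float, per_mode_prefactor: float) -> float:
    """Leading-order estimate: prefactor * ln(M/eps) * (number of occupied modes).

    per_mode_prefactor is a placeholder constant, not a value from the paper.
    """
    return per_mode_prefactor * math.log(mass / eps) * occupied_mode_count(mass, eps)


if __name__ == "__main__":
    M = 1.0
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, occupied_mode_count(M, eps), entropy_estimate(M, eps, 1.0))
```

Halving $\varepsilon$ quadruples the mode count, while the logarithmic factor grows only slowly; this is the sense in which the result resembles an area law with a logarithmic enhancement.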
The article is structured as follows. Section 2 provides the necessary preliminaries on entanglement entropy, the Dirac equation, the Dirac propagator in the Schwarzschild geometry and Schatten classes. In Section 3 the regularized projection operator onto the negative-frequency solutions of the Dirac equation is defined and decomposed into angular momentum modes. For each angular momentum mode, the resulting functional calculus is formulated and the corresponding operator is rewritten in the language of pseudo-differential operators. Moreover, the symbol is simplified further at the horizon. After these preparations, we give a mathematical definition of the entanglement entropy (Section 4). In the slightly technical Section 5, we establish some helpful tools for working with pseudo-differential operators. The core of the work begins in Section 6, where we calculate the entropy of a simplified limiting operator (in the sense that the regularization goes to zero) at the horizon. Afterwards (Section 7) we estimate the error caused by using the limiting operator instead of the regularized one. It turns out that this error drops out in the limiting process. In Section 8 we complete the proof of the main result (Theorem 1.1) by combining the results from the previous sections. Finally, we discuss conclusions and open problems (Section 9).
They satisfy the canonical anti-commutation relations, and all other operators anti-commute. A quasi-free fermionic state $\Omega$ is characterized by its two-point distribution, denoted by $\Omega\big(\Psi^\dagger(\phi')\, \Psi(\phi)\big)$. We consider the state where this two-point distribution has the form
\[ \Omega\big(\Psi^\dagger(\phi')\, \Psi(\phi)\big) = (\phi \,|\, \Pi^-_\varepsilon\, \phi')_m , \]
where $\Pi^-_\varepsilon$ is the reduced one-particle density operator. As shown in [15], the von Neumann entropy $S(\Omega)$ of the quasi-free fermionic state can be expressed in terms of the reduced one-particle density operator by
\[ S(\Omega) = \operatorname{tr} \eta(\Pi^-_\varepsilon) , \]
where $\eta$ is the function
\[ \eta(x) := \begin{cases} -x \ln x - (1-x) \ln(1-x) , & x \in (0,1) \\ 0 , & \text{else} \end{cases} \]
(for a plot see Figure 1). For the entanglement entropy we need to assume that the Hilbert space $\mathcal{H}_m$ is formed of wave functions in spacetime. Restricting them to a Cauchy surface, we obtain functions defined on three-dimensional space $N$ (which could be $\mathbb{R}^3$ or, more generally, a three-dimensional manifold). Given a spatial subregion $\Lambda \subset N$, the entanglement entropy $H(\Omega, \Lambda)$ is defined in terms of the entropic difference operator (1.1) (for details see [17, Section 3]). Note that here and in the rest of the paper we sometimes identify the multiplication operator $M_f$ by the function $f$ with the corresponding function. For example, this convention is used when writing $\chi_\Lambda$ in place of $M_{\chi_\Lambda}$.

Figure 1. Plot of the function $\eta$. Note that it is non-differentiable at $x = 0$ and $x = 1$, but smooth everywhere else. Moreover, it vanishes at $x = 0$ and $x = 1$.
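As a finite-dimensional illustration of these formulas, the following Python sketch evaluates the entropy function and the trace of the entropic difference for a two-site toy model. It assumes the standard form $\eta(x) = -x \ln x - (1-x)\ln(1-x)$ on $(0,1)$, a real symmetric $2 \times 2$ matrix as one-particle density operator, and the region $\Lambda$ consisting of the first site; all function names are hypothetical.

```python
import math


def eta(x: float) -> float:
    """eta(x) = -x ln x - (1 - x) ln(1 - x) on (0, 1), and 0 elsewhere."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def eta_of_matrix_00(a: float, b: float, d: float) -> float:
    """(0,0) entry of eta(A) for the real symmetric matrix A = [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    if radius == 0.0:  # A is a multiple of the identity
        return eta(a)
    lam_plus, lam_minus = mean + radius, mean - radius
    # The spectral projector onto lam_plus is (A - lam_minus) / (lam_plus - lam_minus).
    weight_plus = (a - lam_minus) / (lam_plus - lam_minus)
    return weight_plus * eta(lam_plus) + (1.0 - weight_plus) * eta(lam_minus)


def entropic_difference_trace(a: float, b: float, d: float) -> float:
    """tr( eta(chi A chi) - chi eta(A) chi ), with chi the projection onto site 0."""
    return eta(a) - eta_of_matrix_00(a, b, d)


if __name__ == "__main__":
    # Rank-one projection onto (1, 1)/sqrt(2): a state shared equally by both sites.
    print(entropic_difference_trace(0.5, 0.5, 0.5))
    # Projection onto site 0 alone: a state supported entirely inside the region.
    print(entropic_difference_trace(1.0, 0.0, 0.0))
```

In this toy model a state localized entirely inside the region gives no contribution, whereas a state shared equally between the two sites contributes $\ln 2$.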
Remark 2.1. We point out that our definition of entanglement entropy differs from the conventions in [15,17] in that we do not add the entropic difference operator of the complement of $\tilde\Lambda$. This is justified as follows. On the technical level, our procedure is easier, because it suffices to consider compact spatial regions (indeed, we expect that the entropic difference operator on the complement of $\tilde\Lambda$ is not trace class). Conceptually, restricting attention to the entropic difference operator of $\tilde\Lambda$ can be understood from the fact that occupied states which are supported either inside or outside $\tilde\Lambda$ do not contribute to the entanglement entropy. Thus it suffices to consider the states which are non-zero both inside and outside. These "boundary states" are already taken into account in the entropic difference operator (1.1).
This qualitative argument can be made more precise with the following formal computation, which shows that at least the unregularized entropic difference is the same for the inner and the outer parts. First of all, note that $\eta(x)$ vanishes at $x = 0$ and $x = 1$. Since $\Pi^-$ is a projection, this means that $\eta(\Pi^-) = 0$ and therefore
\[ \operatorname{tr} \chi_{\tilde\Lambda}\, \eta(\Pi^-)\, \chi_{\tilde\Lambda} = 0 = \operatorname{tr} \chi_{\tilde\Lambda^c}\, \eta(\Pi^-)\, \chi_{\tilde\Lambda^c} . \]
Due to the symmetry of $\eta$, namely $\eta(x) = \eta(1-x)$ for any $x \in \mathbb{R}$, this then leads to Repeating the same argument as before with $\chi_{\tilde\Lambda^c}\, \Pi^-\, \chi_{\tilde\Lambda^c}$ finally gives As a consequence, the corresponding entanglement entropy is given by Regularizing this expression, we end up with twice the entropic difference with respect to $\Pi^-_\varepsilon$ and $\tilde\Lambda$. ♦
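The equality of the inner and outer contributions can be checked in the simplest possible setting. The sketch below is a toy model and not the computation of the remark itself: it takes a rank-one projection onto the unit vector $(\cos\theta, \sin\theta)$ in $\mathbb{C}^2$, with the "inside" given by the first component and the "outside" by the second. Since $\eta$ vanishes on projections, the two entropic differences reduce to $\eta(\cos^2\theta)$ and $\eta(\sin^2\theta)$, which agree by the symmetry $\eta(x) = \eta(1-x)$. The function names are hypothetical.

```python
import math


def eta(x: float) -> float:
    """eta(x) = -x ln x - (1 - x) ln(1 - x) on (0, 1), and 0 elsewhere."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def boundary_entropies(theta: float) -> tuple:
    """Inner and outer entropic differences for the projection onto (cos theta, sin theta).

    The compression of the projection to the first site is the number cos^2(theta),
    the compression to the second site is sin^2(theta); eta of the projection
    itself vanishes, so only these two terms remain.
    """
    inner = eta(math.cos(theta) ** 2)
    outer = eta(math.sin(theta) ** 2)
    return inner, outer


if __name__ == "__main__":
    for theta in (0.3, 1.0, 1.4):
        print(theta, boundary_entropies(theta))
```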

2.2. The Dirac Equation in Globally Hyperbolic Spacetimes. Since we are ultimately interested in the Schwarzschild spacetime, the abstract setting for the Dirac equation is given as follows (for more details see for example [8]). Our starting point is a four-dimensional, smooth, globally hyperbolic Lorentzian spin manifold $(\mathcal{M}, g)$, with metric $g$ of signature $(+,-,-,-)$. We denote the corresponding spinor bundle by $S\mathcal{M}$. Its fibres $S_x\mathcal{M}$ are endowed with an inner product $\prec .|. \succ_x$ of signature $(2,2)$, referred to as the spin inner product. Moreover, the mapping
\[ \gamma : T_x\mathcal{M} \to L(S_x\mathcal{M}) , \qquad u \mapsto \sum_{j=0}^{3} u^j \gamma_j , \]
where the $\gamma_j$ are the Dirac matrices defined via the anti-commutation relations
\[ \{\gamma_i, \gamma_j\} = 2\, g_{ij}\, \mathbb{1}_{S_x\mathcal{M}} , \]
provides the structure of a Clifford multiplication.
Smooth sections in the spinor bundle are denoted by $C^\infty(\mathcal{M}, S\mathcal{M})$. Likewise, $C^\infty_0(\mathcal{M}, S\mathcal{M})$ are the smooth sections with compact support. We also refer to sections in the spinor bundle as wave functions. The Dirac operator $\mathcal{D}$ takes the form
\[ \mathcal{D} := i \gamma^j \nabla_j , \]
where $\nabla$ denotes the connections on the tangent bundle and the spinor bundle. Then the Dirac equation with parameter $m$ (in the physical context corresponding to the particle mass) reads
\[ (\mathcal{D} - m)\, \psi = 0 . \]
Due to global hyperbolicity, our spacetime admits a foliation by Cauchy surfaces, $\mathcal{M} = (N_t)_{t \in \mathbb{R}}$. Smooth initial data on any such Cauchy surface yield a unique global solution of the Dirac equation. Our main focus lies on smooth solutions with spatially compact support, denoted by $C^\infty_{sc}(\mathcal{M}, S\mathcal{M})$. The solutions in this class are endowed with the scalar product
\[ (\psi \,|\, \phi)_m := \int_N \prec \psi \,|\, \slashed{\nu}\, \phi \succ_x \, d\mu_N(x) , \]
where $N$ is a Cauchy surface with future-directed normal $\nu$ (compared to the conventions in [8], we here preferred to leave out a factor of $2\pi$). This scalar product is independent of the choice of $N$ (for details see [8, Section 2]). Finally, we define the Hilbert space $(\mathcal{H}_m, (.|.)_m)$ by completion of this solution space.

2.3. The Dirac Propagator in the Schwarzschild Geometry.
2.3.1. The Integral Representation of the Propagator. We recall the form of the Dirac equation in the Schwarzschild geometry and its separation, closely following the presentation in [7] and [9]. Given a parameter $M > 0$ (the black hole mass), the exterior Schwarzschild metric reads
\[ ds^2 = \Big(1 - \frac{2M}{r}\Big)\, dt^2 - \Big(1 - \frac{2M}{r}\Big)^{-1} dr^2 - r^2 \big( d\vartheta^2 + \sin^2\vartheta\, d\varphi^2 \big) . \]
Here the coordinates $(t, r, \vartheta, \varphi)$ take values in the intervals
\[ -\infty < t < \infty , \qquad r_1 < r < \infty , \qquad 0 < \vartheta < \pi , \qquad 0 < \varphi < 2\pi , \]
where $r_1 := 2M$ is the event horizon. The explicit form of the Dirac operator in this geometry can be found in [9, Section 2.2]. It is most convenient to transform the radial coordinate to the so-called Regge-Wheeler coordinate $u \in \mathbb{R}$ defined by
\[ u(r) := r + 2M \ln(r - 2M) . \]
In this coordinate, the event horizon is located at $u \to -\infty$, whereas $u \to \infty$ corresponds to spatial infinity, i.e. $r \to \infty$. Then the Dirac equation can be separated with a suitable ansatz, with $k \in \mathbb{Z} + 1/2$, $n \in \mathbb{N}$ and $\omega \in \mathbb{R}$. The angular functions $Y^{kn}_\pm$ can be expressed in terms of spin-weighted spherical harmonics.¹ The radial functions $X^{kn}_\pm$ satisfy a system of partial differential equations; for details see [7, Section 2]. Moreover, employing a further separation ansatz, this system goes over to a system of ordinary differential equations, which admits two two-component fundamental solutions labeled by $a = 1, 2$. We denote the resulting Dirac solution by $X^{kn\omega}_a = (X^{kn\omega}_{a,+}, X^{kn\omega}_{a,-})$ (for more details on the choice of the fundamental solutions see Section 2.3.4 below).

¹ To be more precise, $Y^{kn}_+(\vartheta) = {}_{1/2}Y_{kn}(\vartheta, \varphi)\, e^{-ik\varphi}$ and $Y^{kn}_-(\vartheta) = {}_{-1/2}Y_{kn}(\vartheta, \varphi)\, e^{-ik\varphi}$, where ${}_{\pm 1/2}Y_{kn}$ are the ordinary spin-weighted spherical harmonics. Also note that the factor $e^{-ik\varphi}$ cancels the $\varphi$-dependence. For details see also [10].
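The behavior of the Regge-Wheeler coordinate near the horizon can be explored with a short Python sketch. It assumes the common convention $u(r) = r + 2M \ln(r - 2M)$, so that $du/dr = (1 - 2M/r)^{-1}$; other references fix the additive constant differently. The function names are hypothetical.

```python
import math


def regge_wheeler(r: float, mass: float) -> float:
    """u(r) = r + 2M ln(r - 2M), defined for r > 2M (assumed convention)."""
    if r <= 2.0 * mass:
        raise ValueError("r must lie outside the horizon r = 2M")
    return r + 2.0 * mass * math.log(r - 2.0 * mass)


def du_dr(r: float, mass: float) -> float:
    """Derivative du/dr = 1 / (1 - 2M/r); it diverges as r approaches 2M."""
    return 1.0 / (1.0 - 2.0 * mass / r)


if __name__ == "__main__":
    M = 1.0
    # Approaching the horizon r = 2M, the coordinate u tends to minus infinity.
    for delta in (1.0, 1e-3, 1e-6, 1e-9):
        print(2.0 * M + delta, regge_wheeler(2.0 * M + delta, M))
```

The logarithm stretches the near-horizon region over the whole negative half-line, which is why the annular region $(u_0 - \rho, u_0)$ approaches the horizon as $u_0 \to -\infty$.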
In what follows we will often use the following notation for two-component functions The norm in C 2 will be denoted by | . |, the canonical inner product on L 2 (R, C 2 ) by .|. and the corresponding norm by . . As implied by [7,Theorem 3.6], one can then find the following formula for the mode-wise propagator: 2) can be written as for any u, t ∈ R. The X knω a (x) are the fundamental solutions mentioned before. Here the coefficients t knω ab satisfy the relations where the Hamiltonian H kn is an essentially self-adjoint operator on L 2 (R, C 2 ) with dense domain D(H) = C ∞ 0 (R, C 2 ). This makes it possible to write the solution of the Cauchy problem as Here, the initial data can be an arbitrary vector-valued function in the Hilbert space, i.e. X 0 ∈ L 2 (R, C 2 ). If we specialize to smooth initial data with compact support, i.e. X 0 ∈ C ∞ 0 (R, C 2 ), then the time evolution operator can be written with the help of Theorem 2.2 as We point out that this formula does not immediately extend to general X 0 ∈ L 2 (R, C 2 ); we will come back to this technical issue a few times in this paper.

2.3.3. Connection to the Full Propagator. In this section we explain why it suffices to focus on one angular mode instead of the full propagator and why we can use the ordinary $L^2$-scalar product instead of $(.|.)_m$.
To this end, we introduce the function Moreover, for each fixed k ∈ Z + 1/2, n ∈ Z we denote by (H 0 m ) kn the completion of (again with respect to (.|.) m ), where ((k i , n i )) i∈N is an enumeration of (Z + 1/2) × Z. Furthermore, each space (H 0 m ) kn can be connected with L 2 (R, C 2 ) using the mapping S : Then a direct computation shows the scalar products transform as S ψ |Sφ L 2 = (ψ | φ) m for any φ, ψ ∈ (H 0 m ) kn . This shows thatS is unitary and we can identify the two spaces. Now recall that the Dirac-equation can be separated by solutions of the form and can then be described mode-wise by the Hamiltonian H kn on the space L 2 (R, C 2 ). Therefore denotingH kn :=S −1 H knS , the diagonal block operator (with respect to the decomposition (2.4)) defines an essentially self-adjoint Hamiltonian for the original Dirac equation on the space H 0 m . Moreover, any function ofH is of the same diagonal block operator form. The same holds for any multiplication operator M χŨ , whereŨ is a spherical symmetric set In particular, such an operator has the block operator representation We therefore conclude that when computing traces of operators of the form (for some suitable function f ), we may consider each angular mode separately and then sum over the occupied states (and similarly for Schatten norms of such operators). Moreover we point out that instead of (H 0 m ) kn we can work with the corresponding objects in L 2 (R, C 2 ), as the spaces are unitarily equivalent. Note, that then the multiplication operator M χŨ goes over to M χ U , i.e.
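In particular, this leads to
\[ \operatorname{tr} \chi_{\tilde U}\, f(\tilde H)\, \chi_{\tilde U} = \sum_{(k,n)} \operatorname{tr} \chi_U\, f(H_{kn})\, \chi_U , \]
where the sum runs over the angular modes $(k, n) \in (\mathbb{Z} + 1/2) \times \mathbb{Z}$.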

2.3.4. Asymptotics of the Radial Solutions. We now recall the asymptotics of the solutions of the radial ODEs and specify our choice of fundamental solutions. Since we want to consider the propagator at the horizon, we will need near-horizon approximations of the $X^{kn\omega}$'s. In order to control the resulting error terms, we now state a slightly stronger version of [7, Lemma 3.1], specialized to the Schwarzschild case.

Lemma 2.3. For any $u_2 \in \mathbb{R}$ fixed, in Schwarzschild space every solution $X \equiv X^{kn\omega}$ for $u \in (-\infty, u_2)$ is of the form where the error term $R_0$ decays exponentially in $u$, uniformly in $\omega$. More precisely, writing the vector-valued function $g = (g_+, g_-)$ satisfies the bounds where $\lambda$ is a dimensionless constant depending only on $k$ and $n$.
The proof, which follows the method in [7], is given in detail in Appendix A.
We can now explain how to construct the fundamental solutions X a = (X + a , X − a ) for a = 1 and 2 (for this see also [7, p. 41] and [9, p. 9-10]). In the case |ω| > m we choose X 1 and X 2 such that the corresponding functions f 0 from the previous lemma are of the form In the case |ω| ≤ m we consider the behavior of solutions at infinity (i.e. asymptotically as u → ∞). It turns out that there is (up to a prefactor) a unique fundamental solution which decays exponentially. We denote it by X 1 . Moreover, we choose X 2 as an exponentially increasing fundamental solution. We normalize the resulting fundamental system at the horizon by lim Representing these solutions in the form of the previous lemma we obtain with coefficients f ± 0,1/2 ∈ C. Due to the normalization, we know that |f 0,1/2 | = 1 and in particular |f ± 0,1/2 | ≤ 1 . Note however, that f 0 and R 0 from the previous Lemma may in general also depend on k and n, but we will suppress to corresponding indices for ease of notation.
2.4. Schatten Classes and Norms. As we will later see, we can often estimate traces of functions of operators by their so called Schatten norms, which we now introduce (the following definition and example are based on [26, Section 2.1]).
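Definition 2.4. Let $\mathcal{H}$ and $\mathcal{G}$ be Hilbert spaces and $S_\infty$ the set of compact operators from $\mathcal{H}$ to $\mathcal{G}$.² Moreover, let $S \subseteq S_\infty$ be a two-sided ideal with a functional $\|.\|_S$ on it. Then $\|.\|_S$ is called a quasi-norm if the following three conditions are satisfied:
(1) $\|T\|_S > 0$ for any $T \in S$ with $T \neq 0$,
(2) $\|zT\|_S = |z|\, \|T\|_S$ for any $T \in S$ and $z \in \mathbb{C}$,
(3) $\|T_1 + T_2\|_S \le \kappa\, (\|T_1\|_S + \|T_2\|_S)$ for any $T_1, T_2 \in S$, with a constant $\kappa \ge 1$ independent of $T_1$ and $T_2$.
Moreover, a quasi-norm $\|.\|_S$ is called symmetric if it fulfills the two additional conditions
(4) $\|XTY\|_S \le \|X\|_\infty\, \|T\|_S\, \|Y\|_\infty$ for any $T \in S$, $X \in L(\mathcal{G})$ and $Y \in L(\mathcal{H})$, with $\|.\|_\infty$ denoting the operator norm (throughout the paper),
(5) $\|T\|_S = \|T\|_\infty$ for any $T \in S$ with one-dimensional image.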
Let S as before with a symmetric quasi-norm . S such that S is complete, then the pair (S, . S ) is referred to as a quasi-normed ideal.
A quasi-normed ideal $(S, \|.\|_S)$ is called a $q$-normed ideal if there is an equivalent quasi-norm $|||.|||_S$ such that the so-called $q$-triangle inequality
\[ |||T_1 + T_2|||_S^q \le |||T_1|||_S^q + |||T_2|||_S^q \]
holds for any $T_1, T_2 \in S$. In what follows we will sometimes use the convention that, if we write $\|T\|_S$ for an operator $T \in L(\mathcal{H}, \mathcal{G})$, then we automatically imply that $T$ is already in $S$.
Example 2.5. Let $q \in (0, \infty)$ and $S_q$ be the $q$-th Schatten-von Neumann ideal, i.e. all compact operators $T$ such that
\[ \|T\|_q := \Big( \sum_{k=1}^\infty s_k(T)^q \Big)^{1/q} \]
is finite, where the $s_k(T)$ denote the singular values of $T$, i.e. the square roots of the eigenvalues of $T^* T$ (which are clearly non-negative). Then for $q \ge 1$ the functional $\|.\|_q$ defines a norm and for $q \in (0, 1)$ a $q$-norm (see [26, Section 2.1] with references to [5] and [21]).
Note that for $q = 1$ this coincides with the trace norm. ♦

² Note that $S_\infty$ is a two-sided ideal.
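The difference between the cases $q \ge 1$ and $q < 1$ can be seen already for diagonal matrices, whose singular values are simply the moduli of the diagonal entries. The following Python sketch computes $(\sum_k s_k^q)^{1/q}$ for a finite list of singular values and shows that for $q = 1/2$ the ordinary triangle inequality fails while the $q$-triangle inequality holds. The helper names are hypothetical, and the restriction to diagonal matrices is an assumption made to avoid a singular value decomposition.

```python
def schatten_norm(singular_values, q: float) -> float:
    """(sum_k s_k^q)^(1/q) for a finite list of singular values and q > 0."""
    if q <= 0:
        raise ValueError("q must be positive")
    return sum(s ** q for s in singular_values) ** (1.0 / q)


def diagonal_singular_values(diagonal):
    """Singular values of a diagonal matrix are the moduli of its entries."""
    return [abs(entry) for entry in diagonal]


if __name__ == "__main__":
    q = 0.5
    a = [1.0, 0.0]  # diag(1, 0)
    b = [0.0, 1.0]  # diag(0, 1)
    total = [x + y for x, y in zip(a, b)]  # diag(1, 1)
    norm_a = schatten_norm(diagonal_singular_values(a), q)
    norm_b = schatten_norm(diagonal_singular_values(b), q)
    norm_sum = schatten_norm(diagonal_singular_values(total), q)
    # Ordinary triangle inequality fails for q < 1 ...
    print(norm_sum, "vs", norm_a + norm_b)
    # ... but the q-triangle inequality holds.
    print(norm_sum ** q, "<=", norm_a ** q + norm_b ** q)
```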
Remark 2.6. Note that the $q$-th Schatten norm is invariant under unitary transformations: Let $\mathcal{H}$ and $\mathcal{G}$ be Hilbert spaces, $U \in L(\mathcal{G}, \mathcal{H})$ unitary and $A \in S_q \subseteq L(\mathcal{H})$. Then
\[ (U^{-1} A U)^* (U^{-1} A U) = U^{-1} A^* A\, U , \]
which is unitarily equivalent to $A^* A$ and thus has the same eigenvalues, showing that
\[ \|U^{-1} A U\|_q = \|A\|_q . \]
In particular, in the case $q = 1$ this shows that the trace norm of $A$ is conserved under unitary transformations.
Moreover, we will frequently use the following function norms (see for example [24, p. 5-6] with slight modifications). For $n, m, k \in \mathbb{N}_0$, we denote by $S^{(n,m,k)}(\mathbb{R}^d)$ the space of all complex-valued functions on $(\mathbb{R}^d)^3$ which are continuous, bounded and continuously partially differentiable in the first variable up to order $n$, in the second up to order $m$ and in the third up to order $k$, and whose partial derivatives up to these orders are bounded as well. For $b \in S^{(n,m,k)}(\mathbb{R}^d)$ and $l, r > 0$ we introduce the norm $N^{(n,m,k)}(b; l, r)$. Similarly, $S^{(n,k)}(\mathbb{R}^d)$ with $n, k \in \mathbb{N}_0$ denotes the space of all complex-valued functions on $(\mathbb{R}^d)^2$ which are continuous, bounded and continuously partially differentiable in the first variable up to order $n$ and in the second up to order $k$, and whose partial derivatives up to these orders are bounded. For $b \in S^{(n,k)}(\mathbb{R}^d)$ and $l, r > 0$ we introduce the norm $N^{(n,k)}(b; l, r)$. Note that any symbol $b \in S^{(n,k)}(\mathbb{R}^d)$ may be interpreted as a symbol in $S^{(n,m,k)}(\mathbb{R}^d)$ by the identification $b(u, u', \xi) \equiv b(u, \xi)$ for any $u' \in \mathbb{R}^d$. Then, for any $l, r > 0$ and $m \in \mathbb{N}_0$ one has
\[ N^{(n,m,k)}(b; l, r) = N^{(n,k)}(b; l, r) . \]

3. The Regularized Projection Operator
3.1. Definition and Basic Properties. As previously mentioned, the entropy is computed using the regularized projection operator to the negative frequency space Π − ε . This operator emerges from e −itH kn from Section 2.3.2 by setting t = iε (the "iε"-regularization) and restricting to the negative frequencies. Similar as explained in Section 2.3.3, for operators of this form it suffices to consider the corresponding operator for one angular mode (Π − ε ) kn . So more precisely, for any X ∈ C ∞ 0 (R, C 2 ) the operator (Π − ε ) kn is defined by for any x ∈ R.
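The effect of the regularization on the spectral side can be sketched in a few lines of Python. Setting $t = i\varepsilon$ in $e^{-it\omega}$ gives $e^{\varepsilon\omega}$, so after restricting to negative frequencies one obtains the cutoff function $g(\omega) = e^{\varepsilon\omega}$ for $\omega < 0$ and $g(\omega) = 0$ otherwise. This explicit form of $g$ is inferred from the description above and should be read as an assumption; the function names are hypothetical.

```python
import math


def eta(x: float) -> float:
    """eta(x) = -x ln x - (1 - x) ln(1 - x) on (0, 1), and 0 elsewhere."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def regularized_cutoff(omega: float, eps: float) -> float:
    """g(omega) = exp(eps * omega) for omega < 0 and 0 otherwise (assumed form)."""
    return math.exp(eps * omega) if omega < 0.0 else 0.0


if __name__ == "__main__":
    frequencies = [-10.0, -1.0, -0.1, 0.1, 1.0]
    for eps in (1.0, 0.1, 0.0):
        occupations = [regularized_cutoff(w, eps) for w in frequencies]
        # For eps > 0 the occupations lie strictly between 0 and 1 on the
        # negative frequencies, so eta does not vanish; for eps = 0 the
        # cutoff is a sharp projection and eta vanishes identically.
        print(eps, occupations, [eta(x) for x in occupations])
```

For $\varepsilon > 0$ the function $g$ takes values strictly between $0$ and $1$ on the negative frequencies, so the regularized operator is no longer a projection; as $\varepsilon \to 0$ it approaches the sharp spectral projection.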
Since in this section we focus on one angular mode, we will drop the superscripts kn on the functions X knω a and t knω ab . Moreover, we will sometimes write the ω-dependence of X knω a or t knω ab as an argument, i.e. X knω The asymptotics of the radial solutions at the horizon (Lemma 2.3) yield the following boundedness properties for the functions X ω a : Remark 3.1. Given u 2 ∈ R and a constant C > 0, we consider measurable functions X, Z : R → C 2 with the properties Then the estimate in Lemma 2.3 yields for almost all u, u , ω ∈ R and with constants c, d only depending on k, n and u 2 .
If we assume in addition that φ and ψ are compactly supported, then for any g ∈ is well-defined. Moreover, applying Fubini we may interchange the order of integration arbitrarily.
Furthermore, we will need the following technical Lemma, which tells us that testing with smooth and compactly supported functions suffices to determine if a function is in L 2 and to estimate its L 2 -norm: Lemma 3.2. Let N be a manifold with integration measure µ. Given a function f ∈ L 1 loc (N, C n ) (with n ∈ N), we assume that the corresponding functional on the test functions is bounded with respect to the L 2 -norm, i.e.
Proof. Being bounded, the functional Φ can be extended continuously to L 2 (N, C n ). The Fréchet-Riesz theorem makes it possible to represent this functional by an The fundamental lemma of the calculus of variations (for vector-valued functions on a manifold) yields that f =f almost everywhere.
Now we have all the tools to prove the boundedness of the operator (Π − ε ) kn .
Applying Remark 3.1, we may interchange integrations such that Moreover, from [7, proof of Theorem 3.6] we obtain the estimate

3.2. Functional Calculus for $H_{kn}$. In order to derive some more properties of $(\Pi^-_\varepsilon)_{kn}$, we need to employ the functional calculus of $H_{kn}$, as we want to rewrite
\[ (\Pi^-_\varepsilon)_{kn} = g(H_{kn}) \]
for some suitable function $g$.
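To this end, we construct a specific orthonormal basis of $L^2(\mathbb{R}, \mathbb{C}^2)$ consisting of $C^\infty_0(\mathbb{R}, \mathbb{C}^2)$-functions, making it possible to apply the integral representation.

Definition 3.4. Let $(\xi_l)_{l \in \mathbb{N}} \subseteq C^\infty_0(\mathbb{R}, \mathbb{C})$ be a given sequence of smooth compactly supported functions which is dense in $L^2(\mathbb{R}, \mathbb{C})$ (such a sequence exists because $L^2(\mathbb{R}, \mathbb{C})$ and thus also $C^\infty_0(\mathbb{R}, \mathbb{C})$ are separable metric spaces and because $C^\infty_0(\mathbb{R}, \mathbb{C})$ is dense in $L^2(\mathbb{R}, \mathbb{C})$). Applying the Gram-Schmidt process with respect to the $L^2(\mathbb{R}, \mathbb{C})$-scalar product yields a countable orthonormal basis of $L^2(\mathbb{R}, \mathbb{C})$ consisting of $C^\infty_0(\mathbb{R}, \mathbb{C})$-functions. Finally, we denote by $\Xi \subseteq L^2(\mathbb{R}, \mathbb{C}^2)$ the dense subspace spanned by the resulting two-component basis functions.

We can now state the main result of this subsection.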
Proposition 3.5. Let g ∈ L 1 (R)∩L ∞ (R) be a real valued function. Then the operator g(H kn ) has the following properties: valid for almost any u ∈ R.
In the next lemma we will prove property (iii) of Proposition 3.5 for elements of $\Xi$.
Lemma 3.6. For any X ∈ Ξ and real valued g

5)
Proof. We proceed in two steps.
First step: Proof for g ∈ C ∞ 0 (R): Since the Fourier transform is an automorphism on the Schwartz space, for any g ∈ C ∞ 0 (R) there is a functionĝ ∈ S(R) such that We evaluate the right hand side of (3.4) for X = ξ cl . Note that, when testing this with some Z ∈ C ∞ 0 (R, C 2 ), we may interchange the u-and ω-integrations due to an argument similar as in Remark 3.1 We thus obtain Using the rapid decay ofĝ together with (3.2) (applied to Z and X = ξ cl ), we can make use of the Fubini-Tonelli theorem which leads to It is shown in [7] that Now we can again apply Fubini's theorem due to the rapid decay ofĝ and the boundedness of the operator e −itH kn (which follows from 3.2), leading to Next we use the multiplication operator version of the spectral theorem to rewrite H kn as with a suitable unitary operator U and a Borel function f on the corresponding measure space σ(H kn ), Σ, µ . Then and thus for any X ∈ L 2 (R, C 2 ) and almost any x ∈ A it holds that By linearity we then conclude that for any X ∈ Ξ and Z ∈ C ∞ 0 (R, C 2 ) we have Then Lemma 3.2 (together with similar estimates as before) yields that and therefore almost everywhere.
We can find a sequence of test functions (g n ) n∈N in C ∞ 0 (R) which is uniformly bounded by a constant C > 0 such that 3 g n → g in L 1 (R) and pointwise almost everywhere .
Then with f and U as before (where we applied the spectral theorem to H kn ) we obtain for any X ∈ L 2 (R, C 2 ) Moreover, with the notation ∆g n := g n − g we can estimate for almost all x ∈ σ(H kn ) .
So the function C + g ∞ |U −1 X| ∈ L 2 (A, µ) dominates the sequence of measurable functions M ∆gn•f (U −1 X) n∈N which additionally tends to zero pointwise almost everywhere. Therefore, using Lebesgue's dominated convergence theorem, we conclude that and thus In particular, we conclude that for any X ∈ Ξ and Z ∈ C ∞ 0 (R × S 2 , C 4 ), Next we need to show that the corresponding integral representations converge. To this end, we note that, just as in the first case, we may interchange integrations in the way Now keep in mind that Remark 3.1 also yields the bound 2 a,b=1 which holds uniformly in ω. Using this inequality, we obtain the estimate Combined with (3.6), this finally yields for any We obtain (3.4) just as in the first case using Lemma 3.2. Finally, (3.5) follows by testing with Z and again interchanging the integrals as explained before.
We can now prove Proposition 3.5: Proof of Proposition 3.5. (i) This follows directly from (3.5) together with (3.2), since (ii) Using (3.4), the following computation shows that the operator g(H kn ) is also self-adjoint because for any X, Z ∈ Ξ we have Note that applying Fubini is justified in view of Remark 3.1. From this equation the self-adjointness follows by continuous extension. (iii) Keep in mind that it is a priori not clear if the operator has a similar integral representation for any L 2 -function. However, we can show that this at least holds for any . As g(H kn ) is a bounded operator this then also yields Going over to a subsequence (which we will in an obvious abuse of notation still call X p ), we can assume that these convergences also hold pointwise almost everywhere. Now consider equation (3.7)-(3.8) applied to Z ∈ C ∞ 0 (R, C 2 ) arbitrary and ∆X p := X − X p . Then the expression on the right hand side obviously tends to zero for p → ∞. Due to Lemma 3.2, this implieŝ Combining these estimates, we conclude that for almost any u ∈ R, Now we apply these results to the operator Π − ε : Corollary 3.7. Consider the function then we have (Π − ε ) kn = g(H kn ) Moreover, for η as before we have: Proof. First of all note that (Π − ε ) kn = g(H kn ) , as both operators clearly agree on the dense subset Ξ ⊆ L 2 (R, C 2 ) and are bounded. Equation (3.9) then follows by applying the functional calculus of H kn .

3.3. Representation as a Pseudo-Differential Operator. For technical reasons we now rewrite $\Pi^-_\varepsilon$ as a pseudo-differential operator $\mathrm{op}_\alpha(a)$, defined on $C^\infty_0(\mathbb{R}^d, \mathbb{C}^n)$ by the integral representation (3.10). The symbol $a$ is a suitable matrix-valued map such that $\mathrm{op}_\alpha(a)$ can be extended continuously to all of $L^2(\mathbb{R}^d, \mathbb{C}^n)$. The parameter $d \in \mathbb{N}$ can be thought of as the spatial dimension and the parameter $n \in \mathbb{N}$ as the number of components of the wave function $\psi$. If the integral representation (3.10) of $\mathrm{op}_\alpha(a)$ extends to all Schwartz functions, we write $\mathrm{Op}_\alpha(a)$. Moreover, by $\mathrm{Op}_\alpha(.)$ or $\mathrm{op}_\alpha(.)$ restricted to a Borel set $U \subseteq \mathbb{R}^d$ we mean the operator with the same integral representation, but considered as an operator in $L(L^2(U, \mathbb{C}^n))$. This operator may be identified with $\chi_U \mathrm{Op}_\alpha(a) \chi_U$ or $\chi_U \mathrm{op}_\alpha(a) \chi_U$, respectively. Note that, by choosing $U = \mathbb{R}^d$, we obtain the non-restricted operators.
In order to conveniently compute the entanglement entropy, we will often be interested in the trace of the following operator: where $\Lambda \subseteq \mathbb{R}^d$ is some measurable set which will be specified later. Similarly we set The general idea is to rewrite $\Pi^-_\varepsilon$ in the form of $\mathrm{op}_\alpha(a)$ and to identify $\alpha$ with the inverse regularization constant:
\[ \alpha = \frac{l_0}{\varepsilon} \]
with a suitable reference length $l_0$.
With the help of (3.3), we obtain for any ψ ∈ C ∞ 0 (R, and some error term R 0 (u, u , ω) related to the error term R 0 (u) in Lemma 2.3. A more detailed computation is given in Appendix B. Moreover, the more precise form of R 0 (u, u , ω) can be found in Section 7.1.3.
In order to bring $(\Pi^-_\varepsilon)_{kn}$ into the form of $\mathrm{op}_\alpha(a)$, we need to rescale the $\omega$-integral by a dimensionless parameter $\alpha$. As previously mentioned, the idea is to set $\alpha = l_0/\varepsilon$ with some reference length $l_0$. In the Schwarzschild geometry, the only scaling parameter is the mass $M$ of the black hole. Thus we choose as reference length $l_0 = M$ and rescale the $\omega$-integral by
\[ \alpha := \frac{M}{\varepsilon} . \]
Introducing the notation we thereby obtain and set Note that in the matrix-valued functions a ε and R 0 we use the scaling parameter ε and otherwise α. This is convenient because we will first consider the α → ∞ limit of an operator related to the ε → 0 limit of A ε and then estimate the errors caused by this procedure. In this sense α and ε can at first be considered independent scaling parameters. When considering the limiting case ε → 0 however, one has to keep their relation in mind.

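4. Definition of the Entropy of the Horizon

We now specify what we mean by "entanglement entropy of the horizon". Our starting point is the entropic difference operator from Section 2.1:
\[ \Delta(\Pi^-_\varepsilon, \tilde\Lambda) = \eta\big( \chi_{\tilde\Lambda}\, \Pi^-_\varepsilon\, \chi_{\tilde\Lambda} \big) - \chi_{\tilde\Lambda}\, \eta(\Pi^-_\varepsilon)\, \chi_{\tilde\Lambda} , \]
where for the region we take an annular region $\tilde\Lambda$ around the horizon of width $\rho$, i.e.
\[ \tilde\Lambda := (u_0 - \rho, u_0) \times S^2 , \]
see also Figure 2. Note that in the Regge-Wheeler coordinates the horizon is located at $-\infty$, so ultimately we want to consider the limit $u_0 \to -\infty$ and $\rho \to \infty$.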
As explained in Section 2.3.3 we can compute the trace mode wise going over to the subregions Λ: Thus we define the mode-wise entropy of the black hole as where f (α) is a function describing the highest order of divergence in α (we will later see that here f (α) = ln α).
The complete entanglement entropy of the black hole is then the sum over all occupied modes: In order to compute this in more detail, we will prove that where (We will later see that the operators in (4.2) and (4.3) are well-defined and trace class). The notation A 0 is supposed to emphasize the connection to the ε → 0 limit of A ε .
Since A 0 is diagonal the computation of (4.3) is much easier than the one for (4.2). In fact we have with the scalar functions This reduces the computation of (4.3) to a problem for real-valued symbols for which many results are already established.

Properties of Pseudo-Differential Operators
In this section we will establish a few general results for pseudo-differential operators of the form op α (a) as in (3.10). Proof. Note that where M a(./α,α) denotes the multiplication operator by a(./α, α) and F the unitary extension of the Fourier transform on L 2 (R d , C n ) (since (5.1) holds for C ∞ 0 (R d , C n )-functions and the right-hand side defines a continuous operator on L 2 (R d , C n )). This also shows that op α (a) is bounded (and therefore well-defined) and self-adjoint. Thus, by the multiplicative form of the spectral theorem, we have f (op α (a)) = FM f (a(./α,α)) F −1 = op α (f (a)) .
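The identity f (op α (a)) = op α (f (a)) can be illustrated in a finite-dimensional toy model. The following sketch is not part of the construction above: it assumes that the Fourier transform is replaced by the unitary discrete Fourier matrix on C^8, that the symbol depends only on the momentum variable, and that f is the polynomial f(x) = x^3 − 2x (so that f of a matrix can be formed by matrix products). The ordering F^* M F is a convention chosen for the sketch, and all function names are hypothetical.

```python
import cmath

def dft_matrix(n):
    """Unitary DFT matrix, a finite-dimensional stand-in for the Fourier transform F."""
    scale = n ** -0.5
    return [[scale * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]

def matmul(a, b):
    """Plain product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][l] * b[l][k] for l in range(n)) for k in range(n)]
            for i in range(n)]

def adjoint(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def fourier_multiplier(symbol):
    """Discrete analogue of op(a): F^* M_a F with M_a = diag(symbol)."""
    n = len(symbol)
    f = dft_matrix(n)
    m_a = [[symbol[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(adjoint(f), matmul(m_a, f))

def poly(x):
    """Test function f(x) = x^3 - 2x applied to a number."""
    return x ** 3 - 2.0 * x

def poly_of_matrix(a):
    """The same polynomial applied to a matrix via matrix products."""
    n = len(a)
    a3 = matmul(a, matmul(a, a))
    return [[a3[i][j] - 2.0 * a[i][j] for j in range(n)] for i in range(n)]

def max_abs_diff(a, b):
    """Largest entrywise deviation between two square matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))

symbol = [0.0, 0.5, 1.0, -0.3, 0.7, 0.2, 0.9, -1.1]  # real-valued symbol samples
op_a = fourier_multiplier(symbol)

# f(op(a)) versus op(f(a)): both agree up to rounding errors.
gap = max_abs_diff(poly_of_matrix(op_a),
                   fourier_multiplier([poly(s) for s in symbol]))
# A real-valued symbol gives a self-adjoint operator.
selfadjoint_gap = max_abs_diff(op_a, adjoint(op_a))
print(gap, selfadjoint_gap)
```

Both printed numbers are at the level of floating-point rounding, which mirrors the two statements used later: the operator is self-adjoint for a real symbol, and functions of it are again operators of the same type.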
The next lemma will be needed for consistency reasons when taking the limit u 0 → −∞: Lemma 5.2. Let U, V ⊂ R d be arbitrary Borel sets and c ∈ R d an arbitrary vector. For any u, u', ξ ∈ R d , α > 0 we transform a given symbol a by T c (a)(u, u', ξ, α) := a(u + c, u' + c, ξ, α) .

Then there is a unitary transformation
In particular this leads to as well as for any q > 0, provided that the corresponding norms/traces are well-defined and finite.
Moreover, assuming in addition that op α (a) is self-adjoint, we conclude that for any Borel function f , with similar consequences for the trace and Schatten-norms.
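The mechanism behind this lemma can be made concrete on a finite lattice. The sketch below is only an analogue under stated assumptions: the position variable takes values in {0, ..., N−1}, the operator is given directly by a hypothetical kernel a(u, u') that is not translation invariant, and the translated symbol is T c (a)(u, u') = a(u + c, u' + c) with indices taken modulo N. It compares the localization of the translated kernel to U with the localization of the original kernel to U + c.

```python
import math

N = 12  # number of lattice sites (assumption: finite periodic toy model)

def kernel(u, v):
    """Hypothetical kernel a(u, u'), deliberately not translation invariant."""
    return math.cos(0.3 * u) * math.exp(-0.2 * abs(u - v)) + 0.1 * v

def translated_kernel(c):
    """T_c(a)(u, u') = a(u + c, u' + c), indices taken modulo N."""
    return lambda u, v: kernel((u + c) % N, (v + c) % N)

def localized(kern, region):
    """Matrix of chi_U op(a) chi_U: the kernel restricted to the index set U."""
    return [[kern(u, v) if (u in region and v in region) else 0.0
             for v in range(N)] for u in range(N)]

def trace(m):
    return sum(m[i][i] for i in range(N))

def trace_of_square(m):
    """tr(m^2), a simple nonlinear spectral quantity."""
    return sum(m[i][j] * m[j][i] for i in range(N) for j in range(N))

def hilbert_schmidt_sq(m):
    """Squared Hilbert-Schmidt norm (Schatten 2-norm squared) of a real matrix."""
    return sum(m[i][j] ** 2 for i in range(N) for j in range(N))

c = 5
U = {1, 2, 3, 4}
U_shifted = {(u + c) % N for u in U}

left = localized(translated_kernel(c), U)   # chi_U op(T_c a) chi_U
right = localized(kernel, U_shifted)        # chi_{U+c} op(a) chi_{U+c}

print(trace(left), trace(right))
print(trace_of_square(left), trace_of_square(right))
print(hilbert_schmidt_sq(left), hilbert_schmidt_sq(right))
```

The two matrices differ only by a relabeling of lattice sites, that is, by conjugation with the shift, so traces, traces of powers and Schatten norms coincide. This is the finite-dimensional counterpart of the unitary invariance used in the proof.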
Proof. We will show that the desired unitary operator is given by the translation operator t c : , (which is obviously unitary). Note that for any Borel set W ⊆ R d and therefore By a change of coordinates we obtain for arbitrary Then the result follows by the unitary invariance of the trace and the Schatten-norms. For (5.2) we make use of the multiplication operator version of the spectral theorem. This provides a unitary transformation φ and a suitable function g such that Combined with the previous discussion this implies which is the multiplication operator representation of χ U +c op α T −c (a) χ U +c , because φ t c is also a unitary operator. Therefore du' ‖a(u, u', ξ, α)‖ 2 n×n < ∞ , for any u ∈ R d and α > 0 (where ‖·‖ n×n is the ordinary sup-norm on the n × n-matrices). Then the integral representation of op α (a) may be extended to all L 2 (R d , C n )-functions and the u' and the ξ integrations may be interchanged. Thus for any ψ ∈ L 2 (R d , C n ), α > 0 and almost any u ∈ R d , the following equations hold,
Proof. We first show that, applying the Fubini-Tonelli theorem and Hölder's inequality, the integrations may be interchanged, by estimating Next, we want to show that we can extend the integral representation to all L 2 (R d , C n )-functions, i.e. that the above integral indeed corresponds to op α (a)ψ (u). To this end let (ψ n ) n∈N be a sequence of C ∞ 0 (R d , C n )-functions converging to ψ with respect to the L 2 (R d , C n )-norm. Then op α (a)ψ is by definition given by where the convergence is with respect to the L 2 (R d , C n )-norm. However, going over to a subsequence, we can assume that this convergence also holds pointwise outside of the null set N ⊆ R d . Thus for any u ∈ U \ N and α > 0 we can compute with ∆ψ n := ψ − ψ n .
Remark 5.4. We want to apply Lemma 5.3 to the operator χ Λ (Π − ε ) kn χ Λ with Λ = (u 0 − ρ, u 0 ). By rescaling as before, we see that this operator is of the form op α (a) (restricted to Λ) with Note that, a priori, the integral representation of this operator is well-defined only for C ∞ 0 (R) 2 -functions with support in Λ. In order to extend the integral representation to all L 2 (Λ, C 2 )-functions, it suffices to verify the condition in Lemma 5.3. To this end we note that, due to Lemma 2.3, for given u 2 > u 0 , we have |X kn(αξ) a,i (u)| < 1 + ce du , for any a ∈ {1, 2}, i ∈ {+, −}, u < u 2 and ξ ∈ R, where the constants c, d > 0 are independent of ξ. Also using that the transmission coefficients t ab are always bounded by 1/2, we obtain the estimate ‖a(u, u', ξ, α)‖ n×n ≤ C e M ξ χ (−∞,0) (ξ) χ Λ (u) χ Λ (u') for any α > 0 with C independent of u, u' and ξ. Thus for any u ∈ Λ and α > 0 we have Clearly, the entire expression vanishes for u ∉ Λ. This shows that we can indeed apply Lemma 5.3 to χ Λ (Π − ε ) kn χ Λ , meaning that the corresponding integral representation can be applied to any L 2 (Λ, C 2 )-function, and the u'- and ξ-integrations may be interchanged. Moreover, due to the characteristic functions in the symbol, we can even extend the integral representation to all functions in L 2 (R, C 2 ). ♦
Lemma 5.5. Let a(u, u', ξ, α) = a(u, ξ, α) and b(u, u', ξ, α) = b(u', ξ, α) be symbols such that Op α (a) and Op α (b) are well-defined and the following two conditions hold: may be continuously extended to L 2 (R d , C n ).
Proof. We first note that, due to condition (i), as both sides define continuous operators on L 2 (R d , C n ) and agree on the Schwartz functions (where again F is the continuous extension of the Fourier transform to L 2 (R d , C n )). Similarly, we conclude that for some measurable set U ⊆ R d . Then condition (ii) of Lemma 5.5 is obviously fulfilled, because for any Schwartz function ψ we have
(ii) Moreover, in the following, the symbol a is sometimes independent of u and bounded by a constant C > 0. Then for any ψ ∈ L 2 (R d , C n ) it follows that Therefore, condition (i) in Lemma 5.5 is also fulfilled.
(iii) Another case we will consider later is that a is scalar-valued and continuous with compact support supp a ⊆ B l (v) × B r (µ) .
Then from the following argument we conclude that A also fulfills condition (i) from Lemma 5.5. Take ψ ∈ L 2 (R d ) arbitrary and consider Here we may interchange the order of integration due to the Fubini-Tonelli theorem since where C is a bound for the absolute value of the continuous and compactly supported function a. Note that L 2 (B r (µ), C 2 ) ⊆ L 1 (B r (µ), C 2 ) since B r (µ) is bounded. We then obtain ∫ |Aψ(u)| 2 du = ∫ dξ ψ(ξ) ∫ dξ' ψ(ξ') ∫ du e −iu(ξ'−ξ) a(u, ξ/α, α) a(u, ξ'/α, α) where the function ã is again continuous and compactly supported, which makes the last integral finite. We remark that in the last line we again applied Hölder's inequality. This estimate shows that condition (i) from Lemma 5.5 is again satisfied. ♦

Trace of the Limiting Operator
In this section we will only consider the operator Op α (a 0,1 ) in (4.5). Of course, the same methods apply to Op α (a 0,2 ). Notation 6.1. In the following it might happen that in the symbol a we can factor out a characteristic function in ξ, i.e. a(ξ, u, u', α) = χ Ω (ξ) ã(ξ, u, u', α) .
In this case, we will sometimes denote the characteristic function in ξ corresponding to the set Ω by I Ω . This is to avoid confusion with the characteristic function χ Λ in the variables u or u'. Remark 6.2. Note that, in view of Lemma 5.3, the operator Op α (a 0,1 ) is well-defined when restricted to any bounded subset of R. In particular, χ Λ Op α (a 0,1 )χ Λ is well-defined on all of L 2 (R) and the integral representation also holds on L 2 (R).

Idea for Smooth Functions.
The general idea is to make use of the following one-dimensional result by Widom [30].
Theorem 6.3. Let K, J ⊆ R be intervals, f ∈ C ∞ (R) be a smooth function with f (0) = 0 and a ∈ C ∞ (R 2 ) a complex-valued Schwartz function which we identify with the symbol a(u, u', ξ, α) ≡ a(u, ξ) for any u, u', ξ, α ∈ R. Moreover, for any symbol b we denote its symmetric localization by (recall that I J is the characteristic function corresponding to J ⊆ R with respect to the variable ξ). Then where v i are the vertices of K × J (see Figure 3) and $U(\tilde a; f) := \int_0^1 \frac{f(t\tilde a) - t\, f(\tilde a)}{t\,(1-t)}\, dt$ .
but the results can clearly be transferred using the transformation ξ → −ξ. (ii) Moreover, Widom considers operators of the form Op α (a) whose integral representation extends to all of L 2 (K). We note that, in view of Lemma 5.3, this assumption holds for any operator Op α (a) with Schwartz symbol a = a(u, ξ), even if, a priori, the integral representation holds only when inserting Schwartz functions. ♦
We want to apply the above theorem with J = (−∞, 0) and K = (u 0 − ρ, u 0 ), where we choose f as a suitable approximation of the function η (again with f (0) = 0) and a as an approximation of the diagonal matrix entries a 0,1/2 in (4.4) and (4.5). For ease of notation, we only consider a ≈ a 0,1 , noting that our methods apply similarly to a 0,2 . To be more precise, we first introduce the smooth non-negative cutoff functions Ψ, Φ ∈ C ∞ (R) with and set Φ u 0 (x) := Φ(x − u 0 ). Then we may introduce a as the function For a plot of a see Figure 4. Note that then a is a Schwartz function and Moreover, the resulting symbol clearly fulfills the condition of Lemma 5.3, so we can extend the corresponding integral representation to all L 2 (R, C)-functions. In addition, the operator is self-adjoint, because of Lemma 5.1. This implies that we can leave out the symmetrization in (6.1), i.e.
Furthermore, due to Lemma 5.1, we may pull out any function f as in the above Theorem 6.3 in the sense that where we used that a 0,1 vanishes outside J and that f (0) = 0. In our application the vertices of K × J are (similarly to Figure 3 (B)) given by and thus a(v i ) = 1 , for any i = 1, 2 , leading to Note that, using Lemma 5.2 and the fact that a 0,1 does not depend on u or u', the O(1)-term does not change when varying u 0 , and therefore the result stays the same when we take the limit u 0 → −∞. We need to keep this in mind because we shall take the limit u 0 → −∞ before the limit α → ∞ (cf. (4.1)).

6.2. A Few More Technical Results.
In order to estimate the errors in Theorem 6.3 caused by using a smooth approximation of η, we need a few more technical results. The first lemma shows that Op α (b) is bounded with respect to the operator norm uniformly in α as long as b ∈ S (n,m,k) (R d ).
Finally, making use of the fact that Op α (χ (−∞,0) ) is a projection operator (which follows from Lemma 5.1), we conclude that
Let q, r > 0 be parameters, n ≥ 2 a natural number and f ∈ C n 0 (−r, r). Let S ⊂ L(H) (with a Hilbert space H) be a q-normed ideal such that there is σ ∈ (0, 1] with Moreover, consider a self-adjoint operator A on D(A) ⊆ H and a projection operator P such that P D(A) ⊆ D(A) and |P A(1 − P )| σ ∈ S and P A(1 − P ) extends to a bounded operator. Then with a constant C independent of A, P and f .
In order to also estimate the error caused by a non-differentiable function f in Theorem 6.3, we need the next lemma.
Let S, A and P as in Lemma 6.7 with σ < γ, then S . Example 6.9. Ultimately we want to apply Lemma 6.8 to η (times a cutoff function). Therefore we introduce the function for any x ∈ R , with a smooth non-negative function Ψ 1 such that . Note that f then satisfies the conditions of Lemma 6.8 for any n ∈ N, γ < 1 arbitrary, R = 3/4 and x 0 = 0 .
This gives an idea how Lemma 6.8 can be used to estimate the error in Theorem 6.3 caused by functions like η, which are not differentiable everywhere. ♦ Note that later we want to apply the previous lemmas to A = Op α (a 0,1 ) and P = χ Λ . Moreover, in what follows we will often use the notation P Ω,α := Op α (χ Ω ) for some measurable Ω ⊆ R d , which emphasizes that this is a projection operator (this and that it is well-defined follows from Lemma 5.1).
In order to prove this lemma we need to make use of Theorem 3.2, Theorem 4.2 and Corollary 4.7 from [25]. Therefore, we first state them (applied to the cases needed):
Theorem 6.11. [25, Theorem 3.2, (3.5)] (case t = 0, d = 1) Let q ∈ (0, 1] and h 1 and h 2 be two L ∞ (R)-functions whose supports have distance at least R. Moreover, we choose a symbol b ∈ S (n,m) (R) for m ≥ n with supp b ⊆ B l (v) × B r (µ) for some v, µ ∈ R. Then, for any choice of parameters q ∈ [0, 1), αlr ≥ l 0 > 0 and R ≥ l, the following inequality holds, For any two open bounded intervals K, J as well as numbers q ∈ (0, 1] and α ≥ 2, the following inequality holds,
In preparation of the proof of Lemma 6.10 we will first prove the following modified version of Theorem 6.12.
Lemma 6.14. Let Λ = (u 0 − ρ, u 0 ) as before, q ∈ (0, 1], αlr ≥ l 0 ≥ 1 with 0 < l ≤ ρ, n = q −1 + 1 and m = 2q −1 + 1. Let again b ∈ S (n,m) (R) be a symbol with support contained in B l (v) × B r (µ) for some v, µ ∈ R. Then with C q,m,l 0 independent of u 0 and ρ.
Proof. We first note that, due to Lemma 5.2, We take the q th power and apply the q-triangle inequality to obtain with a constant C > 0 depending only on q. Using again Lemma 5.2, we obtain Using this relation, we can apply Theorem 6.12 to the first and the third term in (6.5), Note that, by definition of N (n,m) , we know that, for any u ∈ R, N (n,m) (T u (b); l, r) = N (n,m) (b; l, r) , which yields that For the second and fourth term, we apply Theorem 6.11, noting that the numbers n, m in the claim clearly satisfy the corresponding conditions in this theorem. Using the inequalities 1/q − m ≤ 0 and αrρ ≥ 1, we obtain the estimate with a constant C̃ q,m,l 0 independent of u 0 and ρ. Putting all the estimates together, the claim follows.
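The q-triangle inequality used in this proof replaces the ordinary triangle inequality, which fails for Schatten quasi-norms with q < 1. As a hedged illustration (not taken from the cited references), the following sketch works with real 2 × 2 matrices, for which the singular values follow from the eigenvalues of A^T A in closed form. The matrices A, B, P1, P2 and the value q = 1/2 are arbitrary choices made for the example, and all function names are hypothetical.

```python
import math

def singular_values_2x2(m):
    """Singular values of a real 2x2 matrix via the eigenvalues of m^T m."""
    (a, b), (c, d) = m
    p = a * a + c * c          # (m^T m)[0][0]
    s = b * b + d * d          # (m^T m)[1][1]
    r = a * b + c * d          # (m^T m)[0][1]
    mean = 0.5 * (p + s)
    radius = math.hypot(0.5 * (p - s), r)
    lam_max = mean + radius
    lam_min = max(mean - radius, 0.0)   # guard against tiny negative rounding
    return math.sqrt(lam_max), math.sqrt(lam_min)

def schatten_q_power(m, q):
    """||m||_q^q, the sum of the q-th powers of the singular values."""
    return sum(s ** q for s in singular_values_2x2(m))

def schatten(m, q):
    """Schatten (quasi-)norm ||m||_q."""
    return schatten_q_power(m, q) ** (1.0 / q)

def add(m1, m2):
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]

def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

q = 0.5
A = [[1.0, 2.0], [0.0, 1.0]]
B = [[0.0, -1.0], [1.0, 3.0]]

# q-triangle inequality: ||A + B||_q^q <= ||A||_q^q + ||B||_q^q for 0 < q <= 1.
lhs = schatten_q_power(add(A, B), q)
rhs = schatten_q_power(A, q) + schatten_q_power(B, q)

# The ordinary triangle inequality fails for q < 1: two orthogonal rank-one projections.
P1 = [[1.0, 0.0], [0.0, 0.0]]
P2 = [[0.0, 0.0], [0.0, 1.0]]
norm_of_sum = schatten(add(P1, P2), q)            # equals 2^(1/q) = 4 for q = 1/2
sum_of_norms = schatten(P1, q) + schatten(P2, q)  # equals 2

print(lhs, rhs, norm_of_sum, sum_of_norms)
```

The same helper also shows numerically that the quasi-norm of a matrix and of its transpose agree, in line with the remark on adjoints in the footnote.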
After these preparations, we can now prove Lemma 6.10: Proof of Lemma 6.10. Without loss of generality we can assume that 0 ∈ B r (µ), because otherwise the product b(u, ξ)χ (−∞,0) (ξ) is either zero or equal to b(u, ξ), and the claim follows immediately from Lemma 6.14.
In total, we obtain This concludes the proof.
4 Note that for any A ∈ S q we can write $\|A\|_q = \big( \sum_i \sqrt{\lambda_i}^{\,q} \big)^{1/q}$, where the λ i are the eigenvalues of A * A (note that since A is compact, there are countably many). Moreover, for any eigenvector ψ of A * A corresponding to a non-zero eigenvalue, the vector Aψ is an eigenvector of AA * with the same eigenvalue, so we see that $\|A^*\|_q \le \|A\|_q$ (as ψ might lie in the kernel of A). Then, symmetry yields the equality.
Moreover, we note that for any j ≤ 2 and numbers n, m ∈ N 0 and l̃, r̃ > 0, N (n,m) (a j ; l̃, r̃) ≤ C M,n,m,l̃,r̃ e j/M .
On the other hand, in the case j > 2 we have a j ≡ 0 and thus N (n,m) (a j ; l̃, r̃) = 0 .
and g(0) = g(1) = 0 . Then We are mainly interested in the case g = η. Therefore, before entering the proof of this theorem, we now verify in detail that the function η satisfies all the conditions of Theorem 6.16. Lemma 6.17. Consider the function η in (2.1). Then and for any γ < 1, x ∈ R, k = 0, 1, 2 and z ∈ {0, 1} there are neighborhoods around z and constants C z,k > 0 such that: (where "l H" denotes the use of L'Hôpital's rule) and Moreover, for any x ∈ (0, 1) we have Thus, for any γ < 1 and obviously Therefore, there exists a neighborhood U 0,0 of z = 0 and a constant C 0,0 such that for any x ∈ U 0,0 , Similarly, yielding a neighborhood U 1,0 of z = 1 and a constant C 1,0 such that for any x ∈ U 1,0 : The other estimates follow analogously by computing the limits This concludes the proof.
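The limits entering this proof can be written out in one place. The following worked computation is a sketch under the assumption that η has the standard form η(x) = −x ln x − (1 − x) ln(1 − x) on (0, 1) and vanishes otherwise (the definition (2.1) is not reproduced in this section). It covers the point z = 0; the point z = 1 follows from the symmetry η(x) = η(1 − x).

```latex
% Assumption: \eta(x) = -x\ln x - (1-x)\ln(1-x) for x \in (0,1), and \eta = 0 otherwise.
\begin{align*}
\lim_{x \searrow 0} \frac{x \ln x}{x^{\gamma}}
  &= \lim_{x \searrow 0} \frac{\ln x}{x^{\gamma-1}}
   \overset{\text{l'H}}{=} \lim_{x \searrow 0} \frac{1/x}{(\gamma-1)\,x^{\gamma-2}}
   = \lim_{x \searrow 0} \frac{x^{1-\gamma}}{\gamma-1} = 0
   \qquad (\gamma < 1), \\
\eta'(x) &= \ln\frac{1-x}{x}, \qquad
\eta''(x) = -\frac{1}{x\,(1-x)} \qquad (0 < x < 1), \\
|\eta'(x)|\, x^{1-\gamma} &\to 0, \qquad
|\eta''(x)|\, x^{2-\gamma} = \frac{x^{1-\gamma}}{1-x} \to 0
   \qquad (x \searrow 0,\ \gamma < 1).
\end{align*}
```

For γ = 1 the first quotient reduces to ln x, which diverges as x tends to 0. This is the reason why only exponents γ < 1 are admissible, and hence why σ < 1 is required later on.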
Proof of Theorem 6.16. Before beginning, we note that the u 0 -limit in (6.9) may be disregarded, because the symbol is translation invariant in position space (see Lemma 5.2, noting that a 0,1 does not depend on u or u'). The remainder of the proof is based on the idea of the proof of [26, Theorem 4.4]. Let a be the symbol in (6.3). By Lemma 6.5, we can assume that the operator norm of Op α (a) is uniformly bounded in α. We want to apply Lemma 5.5 with a as in (6.3) and b = I Ω . In order to verify the conditions of this lemma, we first note that Remark 5.6 (i) yields condition (ii), whereas condition (i) follows from the estimate (which holds for any ψ ∈ L 2 (R)). Now Lemma 5.5 yields Op α (I Ω a) = Op α (a) P Ω,α .
Since P Ω,α is a projection operator, we see that ‖Op α (I Ω a)‖ ∞ ≤ ‖Op α (a)‖ ∞ for all α. In particular, the operator Op α (I Ω a) is bounded uniformly in α. Hence, uniformly in α (recall that A(a) is the symmetric localization from Theorem 6.3). Moreover, the sup-norm of the symbol a 0,1 itself is bounded by a constant C 2 . We conclude that we only need to consider the function g on the interval Therefore, we may assume that with C := max{C 1 , C 2 } + 1 , possibly replacing g by the function g̃ with a smooth cutoff function Ψ C ≥ 0 such that Ψ C | [−C+1,C−1] ≡ 1 and supp Ψ C ⊆ [−C, C]. For ease of notation, we will write g ≡ g̃ in what follows.
Figure 5. Visualization of the cutoffs and approximations in the first step of the proof of Theorem 6.16 for C = 2. We start with a function g, which is first multiplied by the cutoff function Ψ C , giving g̃. This function is then approximated by a polynomial g δ . Multiplying g δ by the cutoff function Ψ C results in a function which is here called g̃ δ (but does not directly appear in the proof). The function f̃ δ is then given by the difference between g̃ and g̃ δ .
We remark that the function η which we plan to consider later already satisfies this property by definition with C = 2. From Lemma 6.8 and Lemma 6.15 we see that D α (g, Λ, a 0,1 ) is indeed trace class. We now compute this trace, proceeding in two steps.

First Step: Proof for g ∈ C 2 (R). To this end, we first apply the Weierstrass approximation theorem as given in [19, Theorem 1.6.2] to obtain a polynomial g δ such that f δ := g − g δ fulfills In order to control the error of the polynomial approximation, we apply Lemma 6.7 with n = 2, r = C, some σ ∈ (0, 1), q = 1 and A = Op α (a 0,1 ) , P = χ Λ , g = f̃ δ := f δ Ψ C (note that here g is the function in Lemma 6.7) where Ψ C is the cutoff function from before (the cutoffs and approximation are visualized in Figure 5). This gives In order to further estimate the last norm, recall that by definition of the symbol a we have Moreover, using Lemma 5.5 we obtain Therefore, also applying Lemma 6.15 with q = σ, we conclude that for α large enough σ σ ≤ c σ,M,ρ,C δ ln α , (with a constant c σ,M,ρ,C depending on σ, M, ρ and C). Using this inequality, we can estimate the trace by In order to compute the remaining trace, we can again apply Theorem 6.3 (exactly as in the example (6.4)). This gives $\operatorname{tr} D_\alpha(g_\delta, \Lambda, a_{0,1}) = \frac{1}{2\pi^2}\, \ln(\alpha)\, U(1; g_\delta) + O(1)$ , and thus $\operatorname{tr} D_\alpha(g, \Lambda, a_{0,1}) \le \frac{1}{2\pi^2}\, \ln(\alpha)\, U(1; g_\delta) + c_{\sigma,M,\rho,C}\, \delta \ln\alpha + O(1)$ , which yields
$\limsup_{\alpha\to\infty} \frac{1}{\ln\alpha} \operatorname{tr} D_\alpha(g, \Lambda, a_{0,1}) \le \frac{1}{2\pi^2}\, U(1; g_\delta) + c_{\sigma,M,\rho,C}\, \delta$ . (6.10)
In the next step we want to take the limit δ → 0. To this end, we need to analyze how the quantity U (1; g δ ) behaves in this limit. In particular, we want to show that lim δ→0 U (1; g δ ) = U (1; g) .

Thus we begin by estimating
.

Second Step: Proof for g as in claim. By choosing a suitable partition of unity and making use of linearity, it suffices to consider the case X = {z}, meaning that g is non-differentiable only at one point z.
and writing g = g (1) R + g (2) R , see also Figure 6. Note that the derivatives of g R satisfy the bounds (with some numerical constants c(n, k)) and therefore the norm . 2 in Lemma 6.8 can be estimated by Noting that on the support of g R 2 ≤ c g 2 with c independent of R (also note that g 2 is bounded by assumption).
For what follows, it is also useful to keep in mind that R (1) . Now we apply (6.11) to the function g (2) R (which clearly is in C (2) (R)), R , Λ, a 0,1 ) = U (1; g R ) .
Just as before, it follows that lim sup α→∞ 1 ln α tr D α (g, Λ, a 0,1 ) ≤ U 1; g The end result follows just as before by taking the limit R → 0, provided that we can show the convergence U (1; g To this end we need to estimate the function .
We first note that, unless z = 0 or z = 1, both integrals are bounded by for R sufficiently small (more precisely, so small that g R vanishes in neighborhoods around 0 and 1; note that the integrand is supported in [z − R, z + R] and bounded uniformly in R). These estimates show that lim R→0 U (1; g − g (2) R ) = 0 in the case that z is neither 0 nor 1. It remains to consider the cases z = 0 and z = 1: 5
(a) Case z = 0: First note that |g (1) R 2 |t| γ ≤ c g 2 |t| γ , where c is again independent of R. This yields for R ≤ 1/2, which vanishes in the limit R → 0. Moreover, the integral (ii) vanishes for R < 1/2.
(b) Case z = 1: Similarly as in the previous case, we now have Moreover, just as in the previous case, one can estimate |g (1) for any t ∈ (1/2, 1) and with c independent of R. This yields for R ≤ 1/2 which again vanishes for R → 0. This concludes the proof.
5 Interestingly, these are the two points where the function η is singular; therefore this step is crucial for us.
We finally apply Theorem 6.16 to the function η and the matrix-valued symbol A 0 (see (2.1) and (4.4)).  This gives the result.
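The numerical constant produced by this step can be checked independently. The sketch below rests on two assumptions that are not restated in this section: that η is the standard entropy function η(t) = −t ln t − (1 − t) ln(1 − t) on (0, 1) and zero otherwise, and that Widom's coefficient has the usual form U(c; f) = ∫ (f(tc) − t f(c)) / (t(1 − t)) dt over t ∈ (0, 1). Under these assumptions U(1; η) = π²/3, so the prefactor 1/(2π²) of ln α in the scalar trace formula gives 1/6 for one diagonal entry. How this combines into S kn depends on the normalization in (4.1), which is not reproduced here. The midpoint rule is used because it avoids the endpoints, where the integrand has integrable logarithmic singularities; the function names are hypothetical.

```python
import math

def eta(t):
    """Assumed entropy function: -t ln t - (1-t) ln(1-t) on (0, 1), zero otherwise."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1.0 - t) * math.log1p(-t)

def widom_coefficient(f, c=1.0, n=200_000):
    """Midpoint-rule approximation of U(c; f) = int_0^1 (f(tc) - t f(c)) / (t (1-t)) dt."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h   # midpoints never hit the singular endpoints 0 and 1
        total += (f(t * c) - t * f(c)) / (t * (1.0 - t))
    return total * h

u_eta = widom_coefficient(eta)
log_prefactor = u_eta / (2.0 * math.pi ** 2)

print(u_eta, math.pi ** 2 / 3.0)   # numerical value versus pi^2 / 3
print(log_prefactor)               # close to 1/6
```

The agreement is limited only by the discretization error of the midpoint rule, which is of the order of the step size because of the logarithmic endpoint singularities.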
Corollary 6.18 already looks quite similar to Theorem 1.1. The remaining task is to show equality in (4.3). To this end, we need to show that all the correction terms drop out in the limits u 0 → −∞ and α → ∞. The next section is devoted to this task.
In preparation, translate Λ to Λ 0 with the help of the unitary operator T u 0 making use of Lemma 5.2. Moreover, we use that the operators (Π − ε ) kn and Op α (A 0 ) are self-adjoint. We thus obtain where A 0 is the kernel of the limiting operator from (4.4). Now we can estimate In the following we will estimate the expressions (I) and (II) separately.
7.1. Estimate of the Error Term (I). We begin with the following simplified result from [26], which is related to Lemma 6.8.
with some x 0 ∈ R and n ≥ 2 be a function such that supp f = [x 0 − R, x 0 + R] for some R > 0 and with some γ > 0. Let S be a q-normed ideal of compact operators on H such that there is a σ ∈ (0, 1] with σ < γ and Let A, B be two bounded self-adjoint operators on H. Suppose that |A − B| σ ∈ S, then with a positive constant C n independent of A, B, f and R.
In order to apply this theorem to the function η, as in the proof of Theorem 6.16 we use a partition of unity. Similarly to Lemma 6.17, we need to choose γ < 1. This gives rise to the constraint σ < 1 . With this in mind, we choose σ as a fixed number smaller than but close to one. In particular, it is useful to choose σ as some fixed number in the range σ ∈ (2/3, 1) .
)χ Λ 0 (which are clearly bounded and self-adjoint) we obtain with C independent of A and B (and thus in particular independent of u 0 and α).
Note that the symbol of the Op α (.) in (7.1) is matrix-valued. Since most estimates for such operators have been carried out for scalar-valued symbols, we now show how to reduce our case to a problem with scalar-valued symbols. To this end, we use the σ-triangle inequality of ‖·‖ σ in the following way, with a constant C > 0 only depending on σ. For the next step, recall that for any operator A its Schatten-norm ‖A‖ σ only depends on the singular values of A * A. Considering a matrix A with only one non-zero entry a, the non-zero singular values of A * A are the same as those of a * a. Applying this to the estimate from before, we obtain with scalar-valued symbols ∆(a ij ) u 0 . We now proceed by estimating the Schatten norms of the operators
7.1.1. Error Terms with Small Support. We use this method for terms which do not depend on u and u' and which in ξ are supported in a small neighborhood of the origin. More precisely, these terms are of the form Since these operators are translation invariant, we do not need to apply the translation operator T u 0 . This also shows that the error corresponding to these terms can be estimated independently of u 0 . For the estimate we will use the following result from [24]: For σ ∈ (0, 2) and g ∈ L 2 loc (R) set Then, given functions a, h ∈ L 2 loc (R) with |a| σ , |h| σ < ∞, it follows that h Op α (a) ∈ S σ with ‖h Op 1 (a)‖ σ ≤ C |h| σ |a| σ .
As an example, consider h := χ Λ , which are both in L 2 loc (R) because |f + 0,1 | is bounded. Moreover, applying a coordinate change we obtain: which is again in L 2 loc (R) for the same reasons as ã. Now we apply Proposition 7.2 to obtain where we used property (4) in Definition 2.4. Next, noting that Λ 0 = (−ρ, 0) ⊆ (−⌈ρ⌉, 0), it follows that Similarly, since |a α (ω)| is bounded by one, Combining the last two inequalities, we conclude that Completely similarly for b̃ The estimates (7.6) and (7.7) show that the error terms with small support are bounded uniformly in u 0 and α. Therefore, dividing by ln α and taking the limit α → ∞, these error terms drop out.
7.1.2. Rapidly Oscillating Error Terms. After translating the symbol by u 0 , these error terms are of the form for some functions g, g̃ which are measurable and bounded. They appear in ∆(a 12 ) u 0 and ∆(a 21 ) u 0 . For simplicity, we restrict attention to the symbols of the form a, but all estimates work in the same way for ã. We will make use of the following results from [5, Theorem 4 on page 273 and p. 254, 263, 273], which, adapted and applied to our case of interest, can be stated as follows.
We want to apply this theorem in the case p = σ, where σ is some number smaller than 1 (which may be chosen arbitrarily close to 1), so it suffices to take l = 1. Moreover, in view of Lemma 5.3, the integral representation corresponding to χ Λ 0 Op α (a)χ Λ 0 may be extended to all of L 2 (R) and we may interchange the dξ and du integrations. Hence we need to estimate the norm θ 2 of the kernel of this operator. To this end, we consider kernels of the form $\frac{1}{2\pi} \int_0^\infty e^{i\omega(u+u'+u_0)}\, e^{-\varepsilon\omega}\, g(\omega)\, d\omega$ .
and f ± as in (A.1). Using this asymptotics in the integral representation, we conclude that the error terms in (7.2)-(7.5) take the form In order to estimate these terms, the idea is to apply Theorem 7.3 (as well as the σ-triangle inequality) to each of these terms (with u and u shifted by u 0 ) and then take the limit u 0 → −∞. We will do this for the first few terms explicitly, noting that the other terms can be estimated similarly.
By Lemma 5.3, the corresponding kernel can be extended to all of L 2 (R), since R + is bounded uniformly in ω when restricted to the compact interval Λ 0 due to Lemma 2.3, and the e M ξ -factor provides exponential decay in ω. Moreover, Lemma 5.3 again implies that we may interchange the dξ and du integrations. In order to estimate the corresponding error term, we apply Theorem 7.3 to the kernel $k_{u_0,\alpha}(u, u') := \frac{1}{2\pi} \int_{-\infty}^{0} e^{-i\omega(u-u')}\, e^{\varepsilon\omega}\, t_{ab}(\omega)\, f^{+}_{0,a}(\omega)\, R_{+,b}(u+u_0, \omega)\, d\omega$ (note that we could leave out the χ Λ 0 -functions because in Theorem 7.3 we consider the operator on L 2 (Λ 0 ); moreover, we rescaled back as before). This kernel is differentiable for similar reasons as before and $\frac{d}{du} k_{u_0,\alpha}(u, u') := \frac{1}{2\pi} \int_{-\infty}^{0} (-i\omega)\, e^{-i\omega(u-u')}\, e^{\varepsilon\omega}\, t_{ab}(\omega)\, f^{+}_{0,a}(\omega)\, R_{+,b}(u+u_0, \omega)\, d\omega$ .
All the other error terms contributing to r ij can be treated in the same way: The absolute value of the corresponding kernels (and their first derivatives) can always be estimated by a factor continuous in u and u times a factor exponentially decaying in u 0 like e du 0 . This makes it possible to estimate θ 2 by a function which decays exponentially as u 0 → −∞.
The terms e −εω ω k with k = 1, 2 are clearly bounded and in L 1 (R). Moreover, showing that as ω → ∞ those terms decay like e −εω . Therefore, these terms are also bounded and in L 1 (R).

Proof of the Main Result
We can now prove our main result.
Proof of Theorem 1.1. Having estimated all the error terms in trace norm and knowing that the limiting operator is trace class (see the proof of Theorem 6.16), we conclude that the operator is trace class. Moreover, we saw that all the error terms vanish after dividing by ln α and taking the limits u 0 → −∞ and α → ∞ (in this order). We thus obtain Recalling that α = M/ε yields the claim.

Conclusions and Outlook
To summarize this article, we introduced the fermionic entanglement entropy of a Schwarzschild black hole horizon based on the Dirac propagator as We have shown that we may treat each angular mode separately. This transition enables us to disregard the angular coordinates, which makes the problem essentially one-dimensional in space. Furthermore, in the limiting case we were able to replace the symbol of the corresponding pseudo-differential operator by A 0 in (4.4). Since this symbol is diagonal matrix-valued, this reduces the problem to one spin dimension. Moreover, because A 0 is also independent of ε, the trace with the replaced symbol can be computed explicitly. It turns out to be a numerical constant independent of the considered angular mode. This leads us to the conclusion that the fermionic entanglement entropy of the horizon is proportional to the number of angular modes occupied at the horizon, $S_{BH} = \sum_{(k,n)\ \text{occupied}} S_{kn} = \frac{1}{6}\, \#\big\{ (k,n) \,\big|\, \text{angular mode } (k,n) \text{ occupied} \big\}$ .
This is comparable to the counting of states in string theory [27] and loop quantum gravity [1]. Furthermore, assuming that there is a minimal area of order ε 2 , the number of occupied modes at the horizon would be given by M 2 /ε 2 , which would lead to Bringing the factor ln(M/ε) in (9.1) to the other side, this would mean that, up to lower orders in ε −1 , we would obtain the enhanced area law (k,n) occupied M 2 ε 2 ln(M/ε) + o M 2 /ε 2 ln(M/ε) . An interesting topic for future research would be to determine the number of occupied angular momentum modes at the horizon in more detail, for example by considering a collapse model.
In order to circumvent this issue, we can use the freedom of coordinate change ω → −ω in the dω integration of the (2,2) and (1,2) components. This yields (3.11).