1 Introduction

A problem of both theoretical and practical interest in quantum information theory is to assess the “complexity” of a quantum state or operation. A natural approach is to take as a measure of complexity the minimum number of operations from an underlying set considered as “basic” [1,2,3]. Typical results in this context include bounds on the growth of complexity under time evolution, see e.g. [4, 5]. There are also proposals in the context of the AdS-CFT correspondence, linking the growth of complexity of a state in the boundary quantum field theory (QFT) to various geometric quantities in the bulk, see e.g. [6,7,8].

One may ask how to define a notion of complexity directly in a relativistic continuum QFT without reference to holographic ideas. In QFT, one immediately faces the problem of identifying a suitable set of basic operations relative to which the complexity of a composite operation is to be assessed. If one wants to maintain a close analogy to ideas such as [4, 5], it appears that one would have to specify a preferred set of local quantum field operators, possibly within some lattice regularization of the theory. For instance, for Gaussian field theories, concrete proposals include [9, 10], whereas [11,12,13] emphasize the role of symmetry operations, especially in theories with very large symmetry algebras. For sufficiently generic QFTs it seems to us that both the operator basis and the lattice regularization would be highly non-unique.

One approach to this issue is to take a broader view of the problem, departing from the notion of basic operation and focussing attention instead on a suitable notion of “distance”, \(D(S\Vert T)\), between two channels S, T. Complexity would then be defined as \(c(T)=D(id\Vert T)\), the distance to the trivial (identity) channel. Of course, D should have specific properties that connect it to the idea of complexity. Natural requirements would be:

  • Subadditivity: \(c(T_1 \circ T_2) \le c(T_1) + c(T_2)\), expressing that the complexity of a composite channel is not larger than the sum of the complexities of its parts. In particular, for a 1-parameter (Markov-) semi-group \(T_t, t \ge 0\) of channels, we automatically get at most linear growth in time, \(c(T_{Nt_0}) \le C_0 N\) with \(C_0 := c(T_{t_0})\).

  • Locality: \(c(T_1 \circ T_2) = c(T_1) + c(T_2)\) if \(T_1\) and \(T_2\) are localized in spacelike related parts of the system. This expresses that the complexity c respects Einstein causality/locality.

  • Convexity: Thinking about performing operations \(T_i\) randomly with some probabilities \(p_i\) it is natural to ask that \(c(\sum p_i T_i) \le \sum p_i c(T_i)\).

One way to obtain a notion of channel divergence, hence c, is to start with a corresponding divergence \(D(\varphi \Vert \psi )\) in the ordinary sense (see e.g. [14]) between quantum statesFootnote 1\(\varphi , \psi \), by considering how much the actions of T and S on a state can deviate as quantified by this divergence. With this idea in mind, a naive guess might be to consider \(\sup _{\psi } D(\psi \circ S \Vert \psi \circ T)\), where the maximization is over all normalized states of the system and \(\psi \circ T\) is the action of the channel on the state \(\psi \), viewed in this paper as an expectation functional on observables, see footnote 1. However, as is well-known, this notion is actually inadequate for quantum systems because one can obtain refined information about the action of channels by coupling the systems in question to an ancillary system and considering states that have a suitably engineered entanglement between the original system and the ancillary system. So one should define insteadFootnote 2

$$\begin{aligned} D(S\Vert T) = \sup _{\psi , {\mathcal {A}}} D(\psi \circ (S \otimes id_{\mathcal {A}}) \Vert \psi \circ (T \otimes id_{\mathcal {A}})) \end{aligned}$$
(1)

where the maximization is now over all states of the original observable algebra \({\mathcal {M}}\) tensored with the ancillary algebra \({\mathcal {A}}\). This is the definition that we shall also adopt, up to some technical caveats related to the fact that we will be dealing with von Neumann algebras of a sufficiently general type as appropriate for QFT: in such a setting, it is most natural to model the ancillary system by another von Neumann algebra, \({\mathcal {A}}\), and it does not seem natural to restrict the nature of that system, e.g. by imposing that \({\mathcal {A}}\) should have a particular type such as I\(_n\). We must then also have an enlarged Hilbert space on which both the original von Neumann algebra and the ancillary algebra \({\mathcal {A}}\) act, i.e. we must consider bi-modules of von Neumann algebras, see e.g. [16, 17] and references therein.

Of course the main question is what D we should start from. One possibility might be a geometric approach along the lines of [2, 3]. In [18], on the other hand, the authors propose to use a particular quantum version [19] of the classical “Wasserstein-distance”, see e.g. [20], and derive several convincing properties of the corresponding notion of complexity, including the ones listed above. The quantum Wasserstein distance of [19] is defined for finite-dimensional systems with Hilbert space of the form \(({\mathbb {C}}^d)^{\otimes N}\). While it may be possible to generalize it to von Neumann algebras of type III appearing in QFT [21], we proceed differently here and work with the so-called Belavkin–Staszewski (BS) divergence [22] \(D_{BS}\). That divergence has been consideredFootnote 3 recently in the context of channel discrimination by [24] and is given by

$$\begin{aligned} D_{BS}(\varphi \Vert \psi ) = \textrm{Tr}(\rho _\varphi ^{} \log [\rho _\varphi ^{1/2} \rho _\psi ^{-1} \rho _\varphi ^{1/2}]) \end{aligned}$$
(2)

for matrix algebras. Here, \(\rho _\psi \) is the density matrix representing the expectation functional \(\psi \), i.e. \(\psi (m) = \textrm{Tr}(m\rho _\psi )\) for all \(m\in {\mathcal {M}}\). A generalization to arbitrary von Neumann algebras is possible [25, 26]. Our reason for considering \(D_{BS}\) is that [27] (see also [28]) have shown in the finite dimensional setting that it gives rise to a channel divergence with a subadditivity property under composition and an additivity property under the tensor product of channels.Footnote 4 This would not be the case for other well-known divergences such as, say, the more commonly used Araki-Umegaki relative entropy [29]. The classic works [30, 31], while somewhat similar in spirit to ours, use that relative entropy to define an entropy-like quantity for the inclusion channel \(\iota : {\mathcal {N}}\rightarrow {\mathcal {M}}\) between two von Neumann algebras possessing a corresponding conditional expectation \(E: {\mathcal {M}}\rightarrow {\mathcal {N}}\). Though this quantity has many useful properties, it does not seem to have the above-mentioned (sub-)additivity properties (at least not in total generality, see Remark 4.5 of [31]).
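To make (2) concrete, the following Python/NumPy sketch evaluates \(D_{BS}\) for full-rank density matrices on \({\mathbb {C}}^d\). It is only an illustration under our own assumptions (faithful states, matrix functions computed by eigendecomposition, hypothetical helper names such as `d_bs`), not part of the constructions of this paper. It checks that \(D_{BS}(\varphi \Vert \varphi )=0\), that for commuting density matrices (2) reduces to the classical Kullback–Leibler divergence, and the well-known fact that \(D_{BS}\) dominates the Araki-Umegaki relative entropy.

```python
import numpy as np

def mfunc(A, f):
    """Spectral calculus for a Hermitian matrix: V diag(f(w)) V^*."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(d, seed):
    """A random full-rank density matrix on C^d (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = G @ G.conj().T
    return R / np.trace(R).real

def d_bs(rho, sigma):
    """BS divergence (2): Tr(rho log[rho^{1/2} sigma^{-1} rho^{1/2}]); full-rank inputs assumed."""
    r = mfunc(rho, np.sqrt)
    X = r @ np.linalg.inv(sigma) @ r
    X = (X + X.conj().T) / 2  # remove rounding asymmetry before eigh
    return np.trace(rho @ mfunc(X, np.log)).real

def d_umegaki(rho, sigma):
    """Araki-Umegaki relative entropy Tr(rho (log rho - log sigma)) for matrices."""
    return np.trace(rho @ (mfunc(rho, np.log) - mfunc(sigma, np.log))).real

rho, sigma = random_density(3, seed=0), random_density(3, seed=1)
print(d_bs(rho, rho))                          # zero up to rounding
print(d_bs(rho, sigma), d_umegaki(rho, sigma))  # BS value is the larger one

# Commuting (diagonal) case: (2) becomes sum_i p_i log(p_i / q_i).
p, q = np.array([0.5, 0.5]), np.array([0.25, 0.75])
print(d_bs(np.diag(p), np.diag(q)), np.sum(p * np.log(p / q)))
```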

In this work, we analyze the channel divergence based on the BS divergence in the context of general von Neumann algebras, and prove that the corresponding notion of complexity c has the above properties in QFT. While we have to leave for future investigations the question of how our c is related to conventional notions related to computational cost à la [2, 3], or even to holographic proposals such as [6,7,8], we prove a number of further properties of our complexity c:

  1.

    If \(T(a) = uau^*\) is the channel corresponding to a non-trivial local unitary, then \(c(T)=\infty \).

  2.

    If \(T(a) = \rho (a)\) is the channel corresponding to a non-trivial representation of the QFT (“charge superselection sector”), then \(c(\rho ) = \infty \).

  3.

    If \(M(a) = \sum _{i=1}^N e_i a e_i\) is the channel corresponding to a local N-ary von Neumann measurement, then \(c(M) = \log N\).

  4.

    Let \(E_\rho \) be the (minimal) conditional expectation from \({\mathcal {A}}(O)\) to \(\rho ({\mathcal {A}}(O))\), where \(\rho \) is a charge superselection sector (charged representation). Then

    $$\begin{aligned} c(E_\rho ) = \log (\text {Jones Index}) = \log d_\rho ^2, \end{aligned}$$
    (3)

    where we mean the Jones index [32] of the inclusion \(\rho ({\mathcal {A}}(O)) \subset {\mathcal {A}}(O)\), and where \(d_\rho \) is the statistical (or “quantum-”) dimension of the sector.Footnote 5

Items (1), (2) are basically negative results, but perhaps not totally unreasonable if we remember that any local operation in a continuum QFT (i.e. an operation in a finite spacetime region) must still involve an infinite number of degrees of freedom. The channels in items (3), (4) are conditional expectations. This suggests that these are to be regarded as the basic operations in QFT.

Particular measurements in item (3) implementing the idea of “setting individual q-bits” can be constructed as follows. Imagine the QFT has a “basic” real scalar field \(\phi \) and consider a cube of side-length \(\delta \) in a time slice. Let f be a non-negative test function supported in the cube, let \(S= \int \phi (0,\textbf{x}) f(\textbf{x}) d^{n-1}\textbf{x}\), where n is the dimension of spacetime, and let \(p_\pm \) be the projectors corresponding to a positive/negative measurement of S. Shifting the cube periodically in the \(n-1\) spatial directions, we obtain a finite lattice \(\Lambda \) with corresponding projections \(p_{\textbf{x},\sigma }, \sigma = \pm , \textbf{x} \in \Lambda \) associated with each point \(\textbf{x}\) of the (dual) lattice. Then we can define projections \(e(\{ \sigma \}) = \prod _{\textbf{x} \in \Lambda } p_{\textbf{x},\sigma (\textbf{x})}\), each corresponding to measuring a particular lattice configuration \(\{ \sigma \}\), e.g.

[Figure: example of a lattice configuration \(\{ \sigma \}\)]

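Since there are \(N = 2^{|\Lambda |}\) such projections, where \(|\Lambda | = \textrm{vol}(\Lambda )/\delta ^{n-1}\) is the number of lattice sites, item (3) gives for the complexity of the corresponding measurement channel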

$$\begin{aligned} c(M) = \frac{\textrm{vol}(\Lambda )}{\delta ^{n-1}} \log 2 \end{aligned}$$
(4)
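The counting behind (4) can be illustrated in a finite-dimensional toy model, which is our own simplification and not the QFT construction above: each lattice site carries a qubit, and the hypothetical projections \(p_\pm \) are the two spectral projections of a diagonal single-site observable. The sketch builds all \(e(\{\sigma \})\) as tensor products, checks that they are mutually orthogonal and sum to the identity, and compares \(\log N\) from item (3) with formula (4).

```python
import itertools
import math
import numpy as np

# Hypothetical single-site projections p_+ and p_- (spectral projections of a qubit observable).
P = {+1: np.diag([1.0, 0.0]), -1: np.diag([0.0, 1.0])}

def configuration_projections(num_sites):
    """All projections e({sigma}) = tensor product over sites of p_{sigma(x)}."""
    projs = []
    for sigma in itertools.product((+1, -1), repeat=num_sites):
        e = np.array([[1.0]])
        for s in sigma:
            e = np.kron(e, P[s])
        projs.append(e)
    return projs

def measurement_complexity(volume, delta, n):
    """Formula (4): (vol / delta^(n-1)) * log 2, i.e. number of sites times log 2."""
    return volume / delta ** (n - 1) * math.log(2)

projs = configuration_projections(3)  # 3 sites, e.g. vol = 3, delta = 1, n = 2
N = len(projs)
print(N, math.log(N), measurement_complexity(3.0, 1.0, 2))
print(np.allclose(sum(projs), np.eye(2 ** 3)))  # the e({sigma}) sum to the identity
```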

As an example of item (4), consider the QFT of an N-component free complex Klein-Gordon quantum field \(\phi _I(x), I=1, \dots , N\). We take as observables the SU(N) singlet operators (gauge invariant observables) under the SU(N)-symmetry. Consider a state \(\Psi \) in the Hilbert space which transforms in some non-trivial representation R of SU(N). Then \(\Psi \) cannot be generated from the vacuum \(\Omega \) by the action of any charge neutral operator a, so the representation of charge neutral operators built on \(\Psi \) is not unitarily equivalent to the vacuum representation. In fact, by DHR theory [34, 35], there exists an endomorphism \(\rho \) of the local algebra generated by SU(N) singlet operators such that

$$\begin{aligned} \langle \Omega , \rho (a) \Omega \rangle = \langle \Psi , a \Psi \rangle \quad \text {for all }SU(N)\text { singlet operators }a, \end{aligned}$$
(5)

and \(\rho \) implements the charged sector with representation R. The statistical dimension \(d_\rho \) of this \(\rho \) equals the dimension \(d_R\) of the representation R in this case, e.g. \(d_\rho = N^2-1\) if R is the adjoint representation. Details of this construction are given in Example 4.10 below. For low-dimensional QFTs, \(d_\rho \) need not be an integer.

In fact, the Jones index in (3) (\(=d_\rho ^2\)) is restricted by Jones’ theorem [32] to the set \(\{ 4\cos ^2(\pi /n): n=3,4,5,\dots \} \cup [4,\infty ]\), the smallest non-trivial value of which is 2, realized e.g. by the sector \(\rho \) of the (4, 3) minimal (Ising) model with quantum dimension \(d_\rho =\sqrt{2}\). We conjecture that for any localized channel TFootnote 6

$$\begin{aligned} \text {Either} \quad c(T) \ge \log 2 \quad \text {or} \quad T=id \quad (\text {conjecture}), \end{aligned}$$
(6)

which is reminiscent of the Landauer bound [17, 36].
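The arithmetic behind the smallest non-trivial index value quoted above can be checked directly. The following short Python sketch (helper names are hypothetical) lists the discrete values \(4\cos ^2(\pi /n)\) allowed below 4 by Jones' theorem, identifies the smallest value above the trivial index 1, and compares it with \(d_\rho ^2\) for \(d_\rho =\sqrt{2}\) and with the bound \(\log 2\) in (6).

```python
import math

def jones_discrete_values(n_max):
    """Index values 4 cos^2(pi/n), n = 3, ..., n_max, allowed below 4 by Jones' theorem."""
    return [4 * math.cos(math.pi / n) ** 2 for n in range(3, n_max + 1)]

values = jones_discrete_values(12)
# n = 3 gives index 1 (the trivial inclusion); the values increase towards 4.
smallest_nontrivial = min(v for v in values if v > 1 + 1e-9)
d_ising = math.sqrt(2)  # quantum dimension of the Ising sector quoted in the text

print([round(v, 4) for v in values[:4]])
print(smallest_nontrivial, d_ising ** 2, math.log(smallest_nontrivial))
```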

This paper is organized as follows. In Sect. 2, we first recall the theory of f-divergences and operator means for states on von Neumann algebras and introduce our main technical tool, a variational characterization of \(D_{BS}\) (Proposition 2.17). In Sect. 3 we introduce \(D_{BS}\) for channels of von Neumann algebras of general type, and prove some basic properties. In Sect. 4, we apply these results to QFT.

2 Preliminaries

2.1 Von Neumann algebra terminology and basic objects

See e.g. [37] as a general reference.

  • Von Neumann algebra: A von Neumann algebra \({\mathcal {M}}\) is a \(*-\)subalgebra of the algebra of bounded operators \(B({\mathscr {H}})\) on a Hilbert space \({\mathscr {H}}\) which contains the identity and is closed in the weak operator topology. The weak topology is defined by the matrix elements, i.e. its basic open neighborhoods are \(N(\xi _i,\eta _i,\varepsilon ,a) = \{b: \ |\langle \xi _i, (b-a) \eta _i \rangle |<\varepsilon , i=1, \dots , n\}\), where \(a \in B({\mathscr {H}}), \xi _i,\eta _i \in {\mathscr {H}}, \varepsilon >0\). All Hilbert spaces appearing in this paper are assumed to be separable. The squared norm \(\Vert m\Vert ^2\) of an operator \(m \in {\mathcal {M}}\) equals the supremum of the spectrum \(\sigma (mm^*)\) of the positive operator \(mm^*\). The set of positive elements of \({\mathcal {M}}\), i.e. of elements of the form \(mm^*\), is denoted by \({\mathcal {M}}_+\) (positive part).

  • Any finite-dimensional von Neumann algebra is isomorphic to \(\oplus _{i=1}^N M_{n_i}({\mathbb {C}})\) for some \(n_i\), where \(M_n({\mathbb {C}})\) is the algebra of complex \(n \times n\) matrices.

  • (Bi-)Commutant: An equivalent characterization of a von Neumann algebra is \({\mathcal {M}}''={\mathcal {M}}\), where \({\mathcal {M}}':=\left\{ x \in B({\mathscr {H}}) \ | \ xm=mx \ \forall m \in {\mathcal {M}}\right\} \) is the commutant of a \(*-\)algebra \({\mathcal {M}}\) in \(B({\mathscr {H}})\), and \({\mathcal {M}}''=({\mathcal {M}}')'\) is the bicommutant. A von Neumann algebra is called a factor if \({\mathcal {M}}\cap {\mathcal {M}}' = {\mathbb {C}}1\). One denotes by \({\mathcal {A}}\vee {\mathcal {B}}= ({\mathcal {A}}\cup {\mathcal {B}})''\) the von Neumann algebra generated by \(*-\)algebras of bounded operators \({\mathcal {A}},{\mathcal {B}}\).

  • States: A state is a linear, positive, normal, normalized functional \(\psi :{\mathcal {M}}\rightarrow {\mathbb {C}}\), where positive means \(\psi (mm^*) \ge 0\) for all \(m \in {\mathcal {M}}\) and normalized means \(\psi (1)=1\). A linear functional \(\psi \) is called normal if it is ultra-weakly continuous, and a positive linear functional \(\psi \) is called faithful if \(\psi (mm^*)=0 \Longrightarrow m=0\). The set of normal states is also denoted by \({\mathcal {M}}_{*,+}\). The existence of a normal faithful positive linear functional is guaranteed since we are assuming that \({\mathscr {H}}\) is separable. On a matrix algebra every state is of the form

    $$\begin{aligned} \psi (m) = \textrm{Tr}(m\rho _\psi ) \end{aligned}$$
    (7)

    for a unique density matrix \(\rho _\psi \).

  • Channels: Generalizing the notion of state, a channel \(T:{\mathcal {M}}\rightarrow {\mathcal {N}}\) is a normal, positive, unital (meaning \(T(1)=1\)) linear map which is also completely positive, meaning that \(T \otimes id: {\mathcal {M}}\odot B({\mathscr {K}}) \rightarrow {\mathcal {N}}\odot B({\mathscr {K}})\) is positive, where \(\odot \) means the algebraic tensor product and where \({\mathscr {K}}\) is any Hilbert space. If \(\psi \) is a state on \({\mathcal {N}}\), then \(\psi \circ T(m):=\psi (T(m))\) is a state on \({\mathcal {M}}\), and \(\psi \mapsto \psi \circ T\) corresponds to the dual action of channels on states (Schrödinger picture). In much of the quantum information theory literature, the Schrödinger picture is considered, but of course this is just a matter of convention. For finite dimensional von Neumann algebras \({\mathcal {N}},{\mathcal {M}}\), the action of T in the Schrödinger picture may also be thought of as an action \(T^+\) on density matrices (7),

    $$\begin{aligned} \textrm{Tr}(T^+(\rho _\psi ) m) = \textrm{Tr}(\rho _\psi T(m)) \equiv \psi \circ T(m) \quad m \in {\mathcal {M}}. \end{aligned}$$
    (8)

    Then \(T^+\) is completely positive and trace preserving (corresponding to \(T(1)=1\)).

  • Standard form: A vector \(\Omega \in {\mathscr {H}}\) is called cyclic if \({\mathcal {M}}\Omega \) is dense in \({\mathscr {H}}\) in the strong topology, and it is called separating if \(m\Omega = 0 \Longrightarrow m=0\). A representation of \({\mathcal {M}}\) on a Hilbert space \({\mathscr {H}}\) together with such a vector can always be obtained from the GNS representation of a faithful normal state \(\omega \). A cyclic and separating vector is also called standard, and a representation of \({\mathcal {M}}\) on a Hilbert space with a standard vector is called a standard representation. Associated with \(\Omega \) is an anti-linear involution J on \({\mathscr {H}}\), called the modular conjugation, such that \(J\Omega = \Omega \) and \(J{\mathcal {M}}J = {\mathcal {M}}'\). The closure of the set of vectors of the form \(aJaJ\Omega , a \in {\mathcal {M}}\) is called the “natural cone” and is also denoted as \(L^2({\mathcal {M}},\Omega )_+ \subset {\mathscr {H}}\).

  • Conditional expectations: If \({\mathcal {N}}\subset {\mathcal {M}}\) is a von Neumann subalgebra, then a conditional expectation \(E:{\mathcal {M}}\rightarrow {\mathcal {N}}\) is a channel such that \(E(n_1mn_2) = n_1 E(m) n_2\) for all \(m \in {\mathcal {M}}, n_i \in {\mathcal {N}}\). The index \(\lambda _E \in [1,\infty ]\) of a conditional expectation is the infimum over all positive real numbers \(\lambda \) such that \(E(mm^*) \ge \lambda ^{-1} mm^*\) for all \(m \in {\mathcal {M}}\).

  • Jones-index: Assume that \({\mathcal {N}}, {\mathcal {M}}\) are factors such that there exists a conditional expectation \(E:{\mathcal {M}}\rightarrow {\mathcal {N}}\). If \(\lambda _E<\infty \), there exists a unique \(E_0\) called “minimal conditional expectation” [38] such that \(\lambda _{E_0}\) is minimal, and in such a case \(\lambda _{E_0} =: [{\mathcal {M}}:{\mathcal {N}}]\) is called the Jones-Kosaki index [32, 39] of the inclusion. Otherwise we set \([{\mathcal {M}}:{\mathcal {N}}]=\infty \).

  • \(L^p\)-space: One can construct so-called “non-commutative \(L^p\) spaces” (\(p \in [1,\infty ]\)) interpolating between the space of normal functionals on \({\mathcal {M}}\) and \({\mathcal {M}}\) itself. They are defined relative to some standard vector \(\Omega \) and denoted as \(L^p({\mathcal {M}},\Omega )\), see [40, 41]. One has \(L^2({\mathcal {M}},\Omega ) = {\mathscr {H}}\). Beyond this, we will only need \(L^\infty ({\mathcal {M}},\Omega )\) which is a linear subspace of \({\mathscr {H}}\). We will mainly use the following characterization of this space [40]: As a vector space \(L^\infty ({\mathcal {M}},\Omega ) = {\mathcal {M}}\Omega \). The Banach space norm is \(\Vert \xi \Vert _{L^\infty ({\mathcal {M}},\Omega )} = \Vert m\Vert \) where \(m \in {\mathcal {M}}\) is the unique element such that \(\xi = m\Omega \).

  • Opposite algebra: The opposite algebra \({\mathcal {M}}^{op}\) of a von Neumann algebra \({\mathcal {M}}\) coincides with \({\mathcal {M}}\) as a vector space with \(*\)-operation, but has the reversed product \(m_1^{op}m_2^{op} = (m_2m_1)^{op}\).
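For matrix algebras, the relation (8) between a channel T and its Schrödinger-picture action \(T^+\) can be made explicit with a Kraus decomposition \(T(m)=\sum _i K_i^* m K_i\), for which \(T^+(\rho )=\sum _i K_i\rho K_i^*\) (a standard finite-dimensional fact that is not used elsewhere in this paper). The Python/NumPy sketch below checks (8), unitality \(T(1)=1\), and trace preservation of \(T^+\) for the von Neumann measurement channel of item (3) and for an amplitude-damping channel; the latter, with a hypothetical damping parameter g, is our own example, chosen because T and \(T^+\) differ for it.

```python
import numpy as np

def heisenberg(kraus, m):
    """T(m) = sum_i K_i^* m K_i, acting on observables."""
    return sum(K.conj().T @ m @ K for K in kraus)

def schroedinger(kraus, rho):
    """T^+(rho) = sum_i K_i rho K_i^*, acting on density matrices."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def check_duality(kraus, seed=0):
    """Return (eq. (8) holds, T is unital, T^+ is trace preserving) on random data."""
    rng = np.random.default_rng(seed)
    d = kraus[0].shape[0]
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    rho = rho / np.trace(rho)                                    # random density matrix
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))   # arbitrary observable
    dual = np.isclose(np.trace(schroedinger(kraus, rho) @ m),
                      np.trace(rho @ heisenberg(kraus, m)))
    unital = np.allclose(heisenberg(kraus, np.eye(d)), np.eye(d))
    trace_preserving = np.isclose(np.trace(schroedinger(kraus, rho)), 1.0)
    return bool(dual), bool(unital), bool(trace_preserving)

# Von Neumann measurement M(a) = sum_i e_i a e_i with the diagonal projections of C^3.
measurement = [np.diag(row) for row in np.eye(3)]
# Amplitude damping with hypothetical damping parameter g.
g = 0.3
damping = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - g)]]),
           np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])]

print(check_duality(measurement), check_duality(damping))
```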

2.2 Maximal f-divergence for bounded operators

See [25, 42,43,44] as general references. Central to the concept of operator mean and to the divergences studied in this paper are the notions of operator monotone and operator convex functions.

Definition 2.1

Let \(I\subset {\mathbb {R}}\) be an interval. \(f:I\rightarrow {\mathbb {R}}\) is said to be

  • operator monotone if \(f(A)\le f(B)\) whenever \(A,B \in B({\mathscr {H}})\) are self adjoint operators on a Hilbert space such that \(A \le B\) and that their spectra satisfy \(\sigma (A),\sigma (B) \subset I\);

  • operator convex if \(f(\lambda A + (1-\lambda )B)\le \lambda f(A) + (1-\lambda )f(B), \ \forall \lambda \in (0,1)\) whenever \(A,B \in B({\mathscr {H}})\), with \(\sigma (A),\sigma (B) \subset I\).

Remark 2.2

Let \(t_0 \in (0,\infty ]\) and \(f:[0,t_0)\rightarrow {\mathbb {R}}\). Then (f is operator convex and \(f(0)\le 0\)) if and only if (\(\frac{f(t)}{t}\) is operator monotone on \((0,t_0)\)). Furthermore, if \(f:[0,t_0)\rightarrow {\mathbb {R}}\) is operator monotone, then it is also operator concave. While the converse is not true, it is the case that (f operator concave and \(f(t)\ge 0\) for all \(t \in [0,\infty )\)) implies (f is operator monotone on \([0,\infty )\)).

Example 2.3

On \((0,\infty )\), the function \(t^\alpha \) is operator monotone if and only if \(\alpha \in [0,1]\), and operator convex if and only if \(\alpha \in [-1,0]\cup [1,2]\). The function \(\log (t)\) is operator monotone on \((0,\infty )\).
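Operator monotonicity is strictly stronger than monotonicity on the real line. The following NumPy sketch uses a standard 2×2 example (our choice, not taken from the text; helper names hypothetical) with \(0\le A\le B\): in accordance with Example 2.3 for \(\alpha =1/2\), one has \(A^{1/2}\le B^{1/2}\), whereas \(A^2\le B^2\) fails, so \(t^2\) is not operator monotone although it is increasing on \([0,\infty )\).

```python
import numpy as np

def mfunc(A, f):
    """Spectral calculus for a Hermitian matrix: V diag(f(w)) V^*."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def is_psd(X, tol=1e-12):
    """True if the Hermitian part of X has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh((X + X.conj().T) / 2).min() >= -tol)

def safe_sqrt(w):
    # Clip tiny negative eigenvalues produced by rounding before taking roots.
    return np.sqrt(np.clip(w, 0.0, None))

A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])

print(is_psd(B - A))                                      # checks A <= B
print(is_psd(mfunc(B, safe_sqrt) - mfunc(A, safe_sqrt)))  # checks A^(1/2) <= B^(1/2)
print(is_psd(B @ B - A @ A))                              # checks A^2 <= B^2
```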

The following well-known representation (9) allows one to reduce many constructions involving operator monotone functions to certain weighted averages of a special operator monotone function. Consider a continuous operator monotone function f on \([0,\infty )\), and let \(a=f(0), \ b=f'(\infty ):=\lim _{t\rightarrow \infty }\frac{f(t)}{t}\). There exists a unique finite positive Radon measure \(\mu \) on \((0,\infty )\) such that

$$\begin{aligned} f(t)=a+bt+\int _{(0,\infty )} \frac{(1+s) t}{t+s}d\mu (s). \end{aligned}$$
(9)

Definition 2.4

(Kubo–Ando means, [45]). Consider a binary operation \(\sigma \) on \(B({\mathscr {H}})_+\) (non-negative self-adjoint bounded operators), i.e. \(\sigma : B({\mathscr {H}})_+ \times B({\mathscr {H}})_+ \rightarrow B({\mathscr {H}})_+\). We write \(\sigma (A\times B)=: A\sigma B \in B({\mathscr {H}})_+\). \(\sigma \) is called a Kubo–Ando connection if, for all \(A,B,C,D \in B({\mathscr {H}})_+\), the following hold

  1.

    Joint monotonicity, i.e. \(A\le C, \ B\le D,\ \text {then} \ A\sigma B\le C\sigma D;\)

  2.

    Transformer inequality, i.e. \(C(A\sigma B)C\le (CAC)\sigma (CBC);\)

  3.

    Upper semicontinuity, i.e. whenever \(A_n \downarrow A\), \(B_n \downarrow B\) strongly, then \(A_n\sigma B_n \downarrow A\sigma B,\) strongly.

Moreover \(\sigma \) is called a (Kubo–Ando operator) mean if the above hold and

  4.

    Normalization, i.e.

    $$I_{\mathscr {H}}\sigma I_{\mathscr {H}}=I_{\mathscr {H}}.$$

The Kubo–Ando theorem establishes a one-to-one correspondence between operator connections and non-negative operator monotone functions on \([0,\infty )\), see [45, Theorems 3.3, 3.4]. The isomorphism is provided by \(\sigma \mapsto f\), where \(f(t)I_{\mathscr {H}}:=I_{\mathscr {H}}\sigma (tI_{\mathscr {H}})\). Its inverse \(f \mapsto \sigma \) is defined by taking the integral expression (9) of a non-negative operator monotone function f on \([0,\infty )\), \(f(t)=a+bt+\int _{(0,\infty )} \frac{(1+s) t}{t+s}d\mu (s)\), and then defining the corresponding \(\sigma \) as

$$\begin{aligned} A\sigma B:= a A + b B + \int _{(0,\infty )}\frac{1+t}{t} [(tA):B] \, d\mu (t) \end{aligned}$$
(10)

Here, A : B is the parallel sum operator connection which is defined as the bounded quadratic form (see e.g. [43, Lemma 3.1.5])

$$\begin{aligned} \langle \xi , (A:B)\xi \rangle :=\inf \{ \langle \zeta ,A\zeta \rangle + \langle \xi -\zeta , B(\xi -\zeta ) \rangle \mid \zeta \in {\mathscr {H}}\}. \end{aligned}$$
(11)

If A and B are positive operators with bounded inverses, then

$$\begin{aligned} A:B = (A^{-1} + B^{-1})^{-1}. \end{aligned}$$
(12)
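The variational definition (11) and the closed form (12) can be compared numerically. In the NumPy sketch below (real symmetric positive definite test matrices and all helper names are our own assumptions), setting the gradient \(A\zeta - B(\xi -\zeta )\) of the expression in (11) to zero gives the minimiser \(\zeta _* = (A+B)^{-1}B\xi \); the minimal value agrees with \(\langle \xi , (A^{-1}+B^{-1})^{-1}\xi \rangle \), and random perturbations of \(\zeta _*\) only increase it.

```python
import numpy as np

def random_pd(n, seed):
    """A random real symmetric, strictly positive definite matrix (test data)."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n))
    return G @ G.T + 0.1 * np.eye(n)

def parallel_sum(A, B):
    """A:B = (A^{-1} + B^{-1})^{-1} for positive A, B with bounded inverses, eq. (12)."""
    return np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))

def objective(A, B, xi, zeta):
    """The expression <zeta, A zeta> + <xi - zeta, B (xi - zeta)> minimised in (11)."""
    r = xi - zeta
    return zeta @ A @ zeta + r @ B @ r

A, B = random_pd(3, seed=0), random_pd(3, seed=1)
xi = np.array([1.0, -2.0, 0.5])

# Stationarity: A zeta - B (xi - zeta) = 0, i.e. zeta = (A + B)^{-1} B xi.
zeta_star = np.linalg.solve(A + B, B @ xi)
value = objective(A, B, xi, zeta_star)
print(value, xi @ parallel_sum(A, B) @ xi)

rng = np.random.default_rng(2)
perturbed = [objective(A, B, xi, zeta_star + rng.normal(size=3)) for _ in range(100)]
print(min(perturbed) >= value)
```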

Example 2.5

(Left and right trivial means). The left trivial mean \(\sigma _1\) is induced by the function \(f(x)\equiv 1\) and gives \(A\sigma _1 B=A\). The right trivial mean \(\sigma _x\) is induced by \(f(x)=x\) and gives \(A\sigma _x B=B\).

Example 2.6

(\(\alpha \)-geometric means). The \(\alpha \)-geometric means are defined in terms of the operator monotone function

$$\begin{aligned} f_\alpha (t):=t^\alpha =\frac{\sin \alpha \pi }{\pi }\int _{0}^{\infty }\frac{t}{s + t}\frac{ds}{s^{1-\alpha }},\quad t \ge 0, \alpha \in (0,1). \end{aligned}$$
(13)

The corresponding measure \(\mu _\alpha \) and constants \(a_\alpha , b_\alpha \) as in (9) are therefore

$$\begin{aligned} d\mu _{f_\alpha } =\frac{\sin \alpha \pi }{\pi } \frac{t^\alpha dt}{t(t+1)}, \quad a_\alpha = b_\alpha = 0. \end{aligned}$$
(14)

The left and right trivial means arise as the limiting cases \(\alpha =0\) and \(\alpha =1\), respectively, and the geometric mean corresponds to \(\alpha = 1/2\).
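The integral formula (13) and the identification (14) of \(\mu _\alpha \) can be checked numerically by inserting (14) into (9) with \(a_\alpha =b_\alpha =0\). The Python sketch below (the function name, grid spacing and cut-off are our own choices) substitutes \(s=e^u\), under which the integrand is smooth and decays exponentially in both directions, and applies the trapezoidal rule.

```python
import math

def t_alpha_via_measure(t, alpha, u_max=150.0, h=0.02):
    """Evaluate (9) with a = b = 0 and the measure mu_alpha of (14).

    With s = e^u (so ds = s du) the integrand is smooth and decays exponentially
    as u -> +-infinity, so the trapezoidal rule on [-u_max, u_max] suffices here.
    """
    c = math.sin(alpha * math.pi) / math.pi
    n = int(round(2 * u_max / h))
    total = 0.0
    for k in range(n + 1):
        s = math.exp(-u_max + k * h)
        dmu_ds = c * s ** alpha / (s * (s + 1))        # density of (14)
        value = (1 + s) * t / (t + s) * dmu_ds * s      # integrand of (9) times ds/du
        total += (0.5 if k in (0, n) else 1.0) * value
    return total * h

print(t_alpha_via_measure(2.0, 0.5), math.sqrt(2.0))
```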

Example 2.7

(Logarithm). The logarithm \(f(t) = \log t\) is operator monotone on \((0,\infty )\) and formally has \(a=-\infty \), \(b=0\) and \(d\mu _{\log } = t^{-1}(1+t)^{-1}dt\). We will typically consider the approximation \(f_n(t):=\log (t+\frac{1}{n}), n \in {\mathbb {N}}\) which is operator monotone on \([0,\infty )\) with \(a_n= -\log n, b_n = 0\).

Consider two bounded positive operators A, B such that \(B \le \lambda A\) for some \(\lambda < \infty \). Then \(f(A^{-1/2} B A^{-1/2}) \le f(\lambda ) I\) is a bounded operator and the Kubo-Ando mean \(\sigma \) corresponding to the operator monotone function f can also be expressed as [45, Theorem 3.3]

$$\begin{aligned} A \sigma B = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}, \end{aligned}$$
(15)

as one may see using the integral representations (9), (10) as well as the expression for parallel sum (12). The following divergences first appeared in [46] and were developed further in [47, 48].

Definition 2.8

Consider a non-negative operator monotone function \(f: [0,\infty ) \rightarrow [0,\infty )\) characterized by (9) and positive trace class operators A, B such that \(B \le \lambda A\) for some \(\lambda > 0\). Then the “maximal quantum f–divergence” of A with respect to B is defined by

$$\begin{aligned} D_f(A\Vert B):=-\log \textrm{Tr}_{\mathscr {H}}(A \sigma B), \end{aligned}$$
(16)

where \(\sigma \) is the operator connection corresponding to f.
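For matrices, (15) and (16) can be evaluated directly. The NumPy sketch below (helper names and random test data are hypothetical) computes \(A\sigma B\) for the \(\alpha \)-geometric means of Example 2.6 and the resulting \(D_f(A\Vert B)\). It checks that \(D_f(A\Vert A) = -\log \textrm{Tr}A\) (since \(f(1)=1\)), that for commuting A, B the trace reduces to \(\sum _i a_i^{1-\alpha } b_i^{\alpha }\), and that the geometric mean (\(\alpha =1/2\)) is symmetric in A and B.

```python
import numpy as np

def mfunc(A, f):
    """Spectral calculus for a Hermitian matrix: V diag(f(w)) V^*."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def kubo_ando_mean(A, B, f):
    """A sigma B = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}, eq. (15); A must be invertible."""
    A_half = mfunc(A, np.sqrt)
    A_minus_half = mfunc(A, lambda w: 1.0 / np.sqrt(w))
    X = A_minus_half @ B @ A_minus_half
    X = (X + X.conj().T) / 2  # symmetrize away rounding errors
    return A_half @ mfunc(X, f) @ A_half

def max_f_divergence(A, B, f):
    """D_f(A||B) = -log Tr(A sigma B), eq. (16)."""
    return -np.log(np.trace(kubo_ando_mean(A, B, f)).real)

def power(alpha):
    """The operator monotone function f_alpha(t) = t^alpha of Example 2.6."""
    return lambda w: np.clip(w, 0.0, None) ** alpha

def random_density(d, seed):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = G @ G.conj().T
    return R / np.trace(R).real

A, B = random_density(3, seed=0), random_density(3, seed=1)
f_half = power(0.5)
print(max_f_divergence(A, A, f_half))            # equals -log Tr A = 0 here
print(np.allclose(kubo_ando_mean(A, B, f_half), kubo_ando_mean(B, A, f_half)))

a, b, alpha = np.array([0.2, 0.8]), np.array([0.6, 0.4]), 0.3
print(max_f_divergence(np.diag(a), np.diag(b), power(alpha)),
      -np.log(np.sum(a ** (1 - alpha) * b ** alpha)))
```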

Remark 2.9

For general positive trace class operators A, B for which \(B \le \lambda A\) possibly does not hold for any \(\lambda < \infty \), one defines

$$\begin{aligned} D_f(A\Vert B):= \lim \limits _{\varepsilon \downarrow 0}D_f(A+ \varepsilon C \Vert B + \varepsilon C) \in (-\infty , +\infty ]. \end{aligned}$$
(17)

Here C is any bounded positive operator such that \(\lambda ^{-1}(A+B) \le C \le \lambda (A+B)\) for some \(\lambda < \infty \). By the monotonicity property of operator connections, the limit exists because the sequence is monotone decreasing. The limit is independent of the particular choice of C as a special case of Lemma 2.12 below.
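The regularization (17) can be watched at work numerically. In the NumPy sketch below (our own test data, \(f(t)=t^{1/2}\) and \(C=A+B\); helper names hypothetical), A is not invertible, so (16) cannot be applied directly. The values \(D_f(A+\varepsilon C\Vert B+\varepsilon C)\) increase as \(\varepsilon \downarrow 0\), in line with the joint monotonicity of the mean, and approach \(\tfrac{1}{2}\log 2\), which for these commuting matrices is \(-\log \sum _i a_i^{1/2}b_i^{1/2}\).

```python
import math
import numpy as np

def mfunc(A, f):
    """Spectral calculus for a Hermitian matrix: V diag(f(w)) V^*."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def kubo_ando_mean(A, B, f):
    """A sigma B via (15); requires A to be invertible."""
    A_half = mfunc(A, np.sqrt)
    A_minus_half = mfunc(A, lambda w: 1.0 / np.sqrt(w))
    X = A_minus_half @ B @ A_minus_half
    X = (X + X.conj().T) / 2
    return A_half @ mfunc(X, f) @ A_half

def max_f_divergence(A, B, f):
    """D_f(A||B) = -log Tr(A sigma B), eq. (16)."""
    return -math.log(np.trace(kubo_ando_mean(A, B, f)).real)

def sqrt_f(w):
    """f(t) = t^(1/2), clipped at 0 against rounding."""
    return np.sqrt(np.clip(w, 0.0, None))

A = np.diag([1.0, 0.0])  # not invertible, so (16) does not apply directly
B = np.diag([0.5, 0.5])
C = A + B                # satisfies lambda^{-1}(A+B) <= C <= lambda(A+B) with lambda = 1

eps_values = [10.0 ** (-k) for k in range(1, 11)]
divergences = [max_f_divergence(A + e * C, B + e * C, sqrt_f) for e in eps_values]
for e, value in zip(eps_values, divergences):
    print(f"eps = {e:.0e}: D_f = {value:.6f}")
print("limit 0.5*log(2) =", 0.5 * math.log(2))
```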

2.3 Maximal f-divergence for von Neumann algebras

Operator connections and maximal f-divergences can be generalized from bounded operators to more general settings such as to suitable classes of unbounded positive quadratic forms [49,50,51]. In this work, we will mainly be interested in the notion of operator connection and maximal f-divergence between two positive normal functionals \(\varphi , \psi \) on a von Neumann algebra \({\mathcal {M}}\). This setting is investigated in great detail in [25, 26] to which we refer as general references. Based on the results [25, Appendix D] one can for instance obtain quite easily a variational characterization of the maximal f-divergence which will be the basis of most developments in this work.

The starting point in the von Neumann algebra setting is the Connes cocycle together with the following well-known result [52], see e.g. [40] for the definitions of the modular operators \(\Delta _\psi \) and Connes cocycles \([D\psi :D\varphi ]_t\).

Lemma 2.10

Let \(\psi ,\varphi \) be normal, positive functionals on \({\mathcal {M}}\), and assume that \(\psi \le \lambda \varphi \) for some \(\lambda >0\). Then the Connes cocycle derivative \([D\psi :D\varphi ]_{t}\) admits an extension to a weakly continuous (\({\mathcal {M}}\)-valued) function \([D\psi :D\varphi ]_z\) for z in the strip \(-1/2\le \Im z\le 0\) which is analytic in the interior. The operator \([D\psi :D\varphi ]_{-i/2} \in {\mathcal {M}}\) has norm at most \(\sqrt{\lambda }\), and \(\Delta _{\psi }^{1/2}=[D\psi :D\varphi ]_{-i/2}\Delta _{\varphi }^{1/2}\).

Using this and the next lemma, one can define [26]:

Definition 2.11

Consider an operator monotone function \(f: [0,\infty ) \rightarrow [0,\infty )\), and let \(\varphi ,\psi \) be normal, positive functionals on \({\mathcal {M}}\) such that \(\lambda ^{-1}\varphi \le \psi \le \lambda \varphi \) for some \(\lambda >0\). Then we have the positive invertible operator \(T_\varphi ^\psi :=([D\psi :D\varphi ]_{-i/2})^*[D\psi :D\varphi ]_{-i/2} \in {\mathcal {M}}\). Define the maximal quantum f–divergence of \(\varphi \) with respect to \(\psi \) by

$$\begin{aligned} D_f(\varphi \Vert \psi ):=-\log (\varphi (f(T_\varphi ^\psi ))). \end{aligned}$$
(18)
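In the finite-dimensional case \({\mathcal {M}}=M_d({\mathbb {C}})\) with faithful states given by density matrices as in (7), the Connes cocycle is \([D\psi :D\varphi ]_t=\rho _\psi ^{it}\rho _\varphi ^{-it}\), so that \([D\psi :D\varphi ]_{-i/2}=\rho _\psi ^{1/2}\rho _\varphi ^{-1/2}\) and \(T_\varphi ^\psi =\rho _\varphi ^{-1/2}\rho _\psi \rho _\varphi ^{-1/2}\). The NumPy sketch below (hypothetical helper names, random test data, \(f(t)=t^{1/2}\)) uses this to evaluate (18), compares the result with an independent evaluation of (15), (16), in line with Remark 2.14, and checks that \(\Vert [D\psi :D\varphi ]_{-i/2}\Vert ^2\) is the smallest \(\lambda \) with \(\psi \le \lambda \varphi \), consistent with the norm bound of Lemma 2.10.

```python
import numpy as np

def mfunc(A, f):
    """Spectral calculus for a Hermitian matrix: V diag(f(w)) V^*."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(d, seed):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = G @ G.conj().T
    return R / np.trace(R).real

def is_psd(X, tol=1e-10):
    return bool(np.linalg.eigvalsh((X + X.conj().T) / 2).min() >= -tol)

def f(w):
    """The operator monotone function f(t) = t^(1/2)."""
    return np.sqrt(np.clip(w, 0.0, None))

rho_phi, rho_psi = random_density(3, seed=0), random_density(3, seed=1)

# [D psi : D phi]_{-i/2} = rho_psi^{1/2} rho_phi^{-1/2} for faithful states on M_d(C).
cocycle = mfunc(rho_psi, np.sqrt) @ mfunc(rho_phi, lambda w: 1.0 / np.sqrt(w))
T = cocycle.conj().T @ cocycle             # T_phi^psi of Definition 2.11
T = (T + T.conj().T) / 2
d_def_2_11 = -np.log(np.trace(rho_phi @ mfunc(T, f)).real)     # eq. (18)

# Independent evaluation via (15) and (16).
phi_half = mfunc(rho_phi, np.sqrt)
phi_minus_half = mfunc(rho_phi, lambda w: 1.0 / np.sqrt(w))
X = phi_minus_half @ rho_psi @ phi_minus_half
X = (X + X.conj().T) / 2
d_eq_16 = -np.log(np.trace(phi_half @ mfunc(X, f) @ phi_half).real)

lam = np.linalg.norm(cocycle, 2) ** 2      # squared operator norm of the cocycle
print(d_def_2_11, d_eq_16)
print(is_psd(lam * rho_phi - rho_psi), is_psd(0.99 * lam * rho_phi - rho_psi))
```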

Lemma 2.12

(See [26]) Let \(\varphi ,\psi \) be normal, positive functionals on \({\mathcal {M}}\). For every \(\phi \sim \varphi +\psi \), i.e. there exists \(\lambda >0\) such that \(\lambda ^{-1}(\varphi +\psi )\le \phi \le \lambda (\varphi +\psi )\), the limit

$$\begin{aligned} \lim \limits _{\varepsilon \downarrow 0}D_f(\varphi + \varepsilon \phi \Vert \psi +\varepsilon \phi ) \in (-\infty , +\infty ] \end{aligned}$$

exists, and it is independent of the choice of \(\phi \) above.

It therefore makes sense to make the following definition [26].

Definition 2.13

Let \(\varphi ,\psi \) be normal, positive functionals on \({\mathcal {M}}\), and let \(f: [0,\infty ) \rightarrow [0,\infty )\) be an operator monotone function. The maximal quantum f-divergence of \(\varphi \) with respect to \(\psi \) is defined as

$$\begin{aligned} D_f(\varphi \Vert \psi ):=\lim \limits _{\varepsilon \downarrow 0}D_f(\varphi + \varepsilon \phi \Vert \psi +\varepsilon \phi ) \end{aligned}$$
(19)

where \(\phi \) is any positive normal functional on \({\mathcal {M}}\) satisfying \(\phi \sim \varphi + \psi \). Note that we may choose \(\phi = \varphi + \psi \).

Remark 2.14

If \({\mathcal {M}}= B({\mathscr {H}})\) is a type I von Neumann factor (or more generally, direct sum of factors), positive normal functionals on \({\mathcal {M}}\) are in one to one correspondence with positive trace class operators on \({\mathscr {H}}\). Under this identification, the above definition of maximal f-divergence reduces to Definition 2.8 and Remark 2.9.

Remark 2.15

The attentive reader will notice that compared to [26], we require f in Definition 2.11 to be operator monotone rather than operator convex, the order of the states is reversed and we have a logarithm. The presence of the logarithm is just for convenience to make the entropy additive under tensor products. If we ignore the logarithm then our definition reduces to \( -{\hat{S}}_{-f}(\psi \Vert \varphi )\) of [26] for non-negative operator monotone functions noting that \(-f\) is operator convex. Of course, the definition of [26] works for the larger class of all operator convex functions.

2.4 Properties of maximal f–divergence

Many properties of \(D_f\) and \(D_{BS}\), and of the corresponding connections \(\sigma \) between states,Footnote 7 in the setting of von Neumann algebras are known, see e.g. [25, Theorem 4.4, Proposition 4.5]. In the present work, a variational formula for \(D_f\) and the BS divergence \(D_{BS}\) will take center stage and from this, many of these properties could be seen directly in retrospect. First, one defines an analogue of the parallel sum (12) for two normal positive functionals \(\varphi ,\psi \) on the von Neumann algebra \({\mathcal {M}}\) by \((z \in {\mathcal {M}})\)

$$\begin{aligned} (\varphi : \psi )(zz^*):= \inf \{ \varphi (xx^*) + \psi (yy^*) \mid x+y = z, x,y \in {\mathcal {M}}\}. \end{aligned}$$
(20)

Then \(\varphi : \psi \) is a positive normal functional on \({\mathcal {M}}_+\), which is extended to all of \({\mathcal {M}}\) by writing a general element as a linear combination of elements of \({\mathcal {M}}_+\). Using the notion of parallel sum, one can next define a notion of operator mean \(\varphi \sigma \psi \) associated with an operator monotone function \(f: [0,\infty ) \rightarrow [0,\infty )\) with representation (9) between two positive normal functionals on \({\mathcal {M}}\) by an analogue of the formula (10):

$$\begin{aligned} (\varphi \sigma \psi )(m):= a\varphi (m) + b\psi (m) + \int _{(0,\infty )} \frac{1+t}{t} [(t \varphi ):\psi ](m) \, d\mu (t), \end{aligned}$$
(21)

where \(a,b,d\mu \) with \(a,b<\infty \) correspond to the operator monotone function \(f: [0,\infty ) \rightarrow [0,\infty )\) as in (9). By combining [25, Theorem D.7, D.8, D.10], it follows that

$$\begin{aligned} D_f(\varphi \Vert \psi ) = -\log \left( a \varphi (1) + b \psi (1) + \int _{(0,\infty )} \frac{1+t}{t} [(t \varphi ):\psi ](1) \, d\mu (t) \right) . \end{aligned}$$
(22)
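As an illustration, consider the simplest commutative situation, an assumption made only for this sketch: \({\mathcal {M}}={\mathbb {C}}^d\), with \(\varphi \) and \(\psi \) given by non-negative weight vectors p and q, so that \(\varphi (zz^*)=\sum _i p_i|z_i|^2\). With \(z=1\), the infimum (20) decouples coordinate-wise; minimizing \(p_i x_i^2+q_i(1-x_i)^2\) gives \(x_i=q_i/(p_i+q_i)\) and \((\varphi :\psi )(1)=\sum _i p_iq_i/(p_i+q_i)\) (restricting to real \(x_i\in [0,1]\) loses nothing here). The sketch compares this closed form with a brute-force grid search and checks the bound \(0\le [(t\varphi ):\psi ](1)\le \min (\psi (1),t\varphi (1))\) used below; the function names are ours and purely illustrative.

```python
# Commutative sketch of the parallel sum (20) with z = 1 (illustrative only):
# M = C^d, phi(g) = sum_i p_i g_i, psi(g) = sum_i q_i g_i with p_i, q_i >= 0.

def parallel_sum_closed(p, q):
    """Closed form (phi:psi)(1) = sum_i p_i q_i / (p_i + q_i)."""
    return sum(pi * qi / (pi + qi) for pi, qi in zip(p, q) if pi + qi > 0)


def parallel_sum_grid(p, q, steps=20001):
    """Brute-force (20): minimise p_i x_i^2 + q_i (1 - x_i)^2 over a grid in [0, 1],
    coordinate by coordinate (y = 1 - x, so that x + y = z = 1)."""
    grid = [k / (steps - 1) for k in range(steps)]
    return sum(min(pi * x * x + qi * (1 - x) ** 2 for x in grid)
               for pi, qi in zip(p, q))


p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(parallel_sum_closed(p, q), parallel_sum_grid(p, q))

# Bound used in the proof of Proposition 2.16:
# 0 <= [(t phi):psi](1) <= min(psi(1), t phi(1)).
t = 3.0
tp = [t * pi for pi in p]
print(parallel_sum_closed(tp, q) <= min(sum(q), t * sum(p)))
```

The grid search is deliberately naive; it only confirms that the infimum in (20) is attained at the closed-form minimizer in this commutative toy case.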

From (22) and (20), the variational formula follows readily; see also [44, Remark 9.5] for a closely related formula in the case \({\mathcal {M}}= B({\mathscr {H}})\):

Proposition 2.16

Let \(f: [0,\infty ) \rightarrow [0,\infty )\) be an operator monotone function with representation (9) where \(a,b<\infty \). Let \(\psi ,\varphi \) be normal, positive functionals on \({\mathcal {M}}\). Then we have

$$\begin{aligned}{} & {} D_f(\varphi \Vert \psi ) \nonumber \\{} & {} = \ -\log \left( a\varphi (1) + b\psi (1) + \inf \limits _{(0,\infty )\xrightarrow {x} {\mathcal {M}}} \int _{(0,\infty )}(1+t)\left[ \varphi (x_t^{}x_t^*) + \frac{1}{t}\psi (y_t^{}y^*_t)\right] d\mu (t) \right) ,\nonumber \\ \end{aligned}$$
(23)

where the infimum is taken over all step functions \(x:(0,\infty )\rightarrow {\mathcal {M}}\) with finite range such that \(x_t=1\) for all sufficiently small t and \(x_t=0\) for all sufficiently large t, and where \(y_t:=1-x_t\).

Proof

Fix \(\varepsilon >0\). Since \(0 \le [(t \varphi ):\psi ](1) \le \textrm{min}(\psi (1), t\varphi (1))\), we can choose \(\delta >0\) and \(K<\infty \) such that

  • \(|\int _{(0,\delta )}(1+t)\varphi (1)d\mu -\int _{(0,\delta )} \frac{1+t}{t} [(t \varphi ):\psi ](1) d\mu | \le \varepsilon \)

  • \(|\int _{(K,\infty )} \frac{1+t}{t}\psi (1)d\mu -\int _{(K,\infty )} \frac{1+t}{t} [(t \varphi ):\psi ](1) d\mu | \le \varepsilon \)

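We then set \(y_t=0\) for \(t<\delta \) and \(y_t=1\) for \(t>K\) (so that \(x_t=1\) and \(x_t=0\) there, respectively). Next, we build a step function \(x: [\delta ,K]\rightarrow {\mathcal {M}}\) with finite range such that \(\int _{[\delta ,K]} (1+t)\left[ \varphi (x_tx^*_t)+\frac{1}{t}\psi (y_ty_t^*)\right] d\mu \) approximates \( \int _{[\delta ,K]}\frac{1+t}{t} [(t \varphi ):\psi ](1) d\mu \) to within tolerance \(\varepsilon \). This can be done using the inner regularity of the Radon measure \(\mu \); for details on this standard procedure see [53]. For the resulting admissible x, the integral in (23) lies within \(3\varepsilon \) of the integral in (22). Conversely, by (20), every admissible x satisfies \((1+t)\left[ \varphi (x_tx_t^*)+\frac{1}{t}\psi (y_ty_t^*)\right] \ge \frac{1+t}{t}[(t\varphi ):\psi ](1)\) for each t, so the infimum in (23) is never smaller than the integral in (22). Since \(\varepsilon \) was arbitrary, (23) follows from (22). \(\square \)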
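Proposition 2.16 can be checked numerically in a commutative toy case. For this sketch only, assume weight vectors p, q with strictly positive entries, finite coefficients a, b, and a hypothetical discrete measure \(\mu =\sum _k w_k\delta _{t_k}\). Evaluating (21) for scalars \(\varphi =1\), \(\psi =s\) gives \(f(s)=a+bs+\sum _k w_k(1+t_k)s/(t_k+s)\), which we take as our reading of (9). Since \(\mu \) has finitely many atoms, a step function only matters at the atoms, and at each atom the infimum in (23) decouples coordinate-wise. The code computes three quantities that should agree: \(-\log \sum _i p_i f(q_i/p_i)\), formula (22) with scalar parallel sums, and formula (23) with each coordinate minimized numerically by ternary search. All names and parameter values are illustrative.

```python
import math

# Hypothetical data: coefficients a, b and a discrete measure
# mu = sum_k w_k delta_{t_k}, given as a list of (t_k, w_k) pairs.
A_COEF, B_COEF = 0.1, 0.2
ATOMS = [(0.5, 0.3), (2.0, 0.4)]


def f(s):
    """f(s) = a + b s + sum_k w_k (1 + t_k) s / (t_k + s), i.e. (21) for scalars."""
    return A_COEF + B_COEF * s + sum(w * (1 + t) * s / (t + s) for t, w in ATOMS)


def d_direct(p, q):
    """-log sum_i p_i f(q_i / p_i)."""
    return -math.log(sum(pi * f(qi / pi) for pi, qi in zip(p, q)))


def d_parallel_sum(p, q):
    """Formula (22), using [(t phi):psi](1) = sum_i t p_i q_i / (t p_i + q_i)."""
    inner = A_COEF * sum(p) + B_COEF * sum(q)
    for t, w in ATOMS:
        par = sum(t * pi * qi / (t * pi + qi) for pi, qi in zip(p, q))
        inner += w * (1 + t) / t * par
    return -math.log(inner)


def ternary_min(g, lo=0.0, hi=1.0, iters=200):
    """Minimum value of a convex function g on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return g((lo + hi) / 2)


def d_variational(p, q):
    """Formula (23): at each atom t, minimise (1+t)[p_i x^2 + q_i (1-x)^2 / t]
    separately for every coordinate i."""
    inner = A_COEF * sum(p) + B_COEF * sum(q)
    for t, w in ATOMS:
        for pi, qi in zip(p, q):
            inner += w * (1 + t) * ternary_min(
                lambda x: pi * x * x + qi * (1 - x) ** 2 / t)
    return -math.log(inner)


p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(d_direct(p, q), d_parallel_sum(p, q), d_variational(p, q))
```

The agreement of the last two numbers is exactly the content of Proposition 2.16 in this toy setting; the first number only rearranges the scalar parallel sums.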

The above proposition does not cover the operator monotone function \(f(t) = \log (t)\) (formally \(a=-\infty \) in the representation (9)). Since this case underlies the Belavkin–Staszewski (BS) divergence, which is of particular interest to us, we treat it explicitly. Consider first a pair of positive normal functionals on \({\mathcal {M}}\) such that \(\varphi \sim \psi \), and define the BS divergence as

$$\begin{aligned} D_{BS}(\varphi \Vert \psi ):=-\varphi (\log (T_\varphi ^\psi )) \end{aligned}$$
(24)

For a general pair of positive normal functionals such that \(\varphi \sim \psi \) does not hold, we define \(D_{BS}(\varphi \Vert \psi )\) analogously to Definition 2.13. The BS divergence can be seen as the limit \(\alpha \rightarrow 1\) of the maximal geometric \(\alpha \)-divergence corresponding to \(f_\alpha (t) = t^\alpha \), \(\alpha \in (0,1)\).

Proposition 2.17

We have

$$\begin{aligned} D_{BS}(\varphi \Vert \psi ) = \sup \sup \left\{ \varphi (1)\log n - \int _{1/n}^{\infty } \left[ \varphi (x^{}_t x^*_t) + \frac{1}{t}\psi (y^{}_ty_t^*)\right] \frac{dt}{t} \right\} , \end{aligned}$$
(25)

where the first \(\sup \) is taken over \(n\in {\mathbb {N}}\), while the second is over finite range step functions x on \((\frac{1}{n},\infty )\) as in Proposition 2.16.

Proof

First we assume \(\varphi \sim \psi \) and consider the approximating sequence \(f_n(t):=\log (t+\frac{1}{n}), t\ge 0\), of operator monotone functions, which have the integral representation \(f_n(t)=-\log n + \int _{1/n}^{\infty }\frac{t}{t+s}\frac{ds}{s}\). Define \(F_n(\varphi \Vert \psi ):=-\varphi (f_n(T^\psi _\varphi ))\). By the spectral theorem, the operator in Lemma 2.10 has a spectral representation \(T^\psi _\varphi =\int _{0}^{\lambda } \lambda ' dE_{\lambda '}\) with \(\lambda <\infty \). Thus, as \(f_n(t) \downarrow \log (t)\) for all \(t>0\), the monotone convergence theorem gives \(F_n(\varphi \Vert \psi ) \uparrow D_{BS}(\varphi \Vert \psi )\). On the other hand, the argument in the proof of Proposition 2.16 applies to \(F_n\) (compare the definition of \(F_n\) with Definition 2.11), i.e.

$$\begin{aligned} F_n(\varphi \Vert \psi )= \sup \left\{ \varphi (1)\log n - \int _{1/n}^{\infty } \left[ \varphi (x^{}_t x^*_t) + \frac{1}{t}\psi (y^{}_ty_t^*)\right] \frac{dt}{t} \right\} \,. \end{aligned}$$
(26)

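Here the supremum is, as before, over finite range step functions. Since \(F_n(\varphi \Vert \psi ) \uparrow D_{BS}(\varphi \Vert \psi )\), taking the supremum over n gives (25) in this case.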

Next, let \(\varphi ,\psi \) be an arbitrary pair of normal positive functionals on \({\mathcal {M}}\). Following Definition 2.13, and since \(D_{BS}(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi )\ge F_n(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi )\) for all \(n\in {\mathbb {N}}\), we have

$$\begin{aligned} D_{BS}(\varphi \Vert \psi ) =\lim _{\varepsilon \downarrow 0} D_{BS}(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi ) \ge \lim _{\varepsilon \downarrow 0}F_n(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi )=F_n(\varphi \Vert \psi ).\nonumber \\ \end{aligned}$$
(27)

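We may take \(\phi =\varphi +\psi \). Then the pair \((\varphi +\varepsilon \phi ,\psi + \varepsilon \phi )\) is the sum of \((\varphi ,\psi )\), \(\varepsilon (\varphi ,\varphi )\) and \(\varepsilon (\psi ,\psi )\). Since the expression in braces in the variational formula (26) is linear in the pair of functionals, and the supremum of a sum is at most the sum of the suprema, (26) applied to \(F_n(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi )\) gives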

$$\begin{aligned} F_n(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi )\le & {} F_n(\varphi \Vert \psi ) + \varepsilon ( F_n(\varphi \Vert \varphi ) + F_n(\psi \Vert \psi )). \end{aligned}$$
(28)

Therefore

$$\begin{aligned} D_{BS}(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi )= & {} \sup _n F_n(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi )\nonumber \\\le & {} \sup _n F_n(\varphi \Vert \psi ) + \varepsilon \sup _n ( F_n(\varphi \Vert \varphi ) + F_n(\psi \Vert \psi ))\,. \end{aligned}$$
(29)

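By (20), \([(t\varphi ):\varphi ](1)=\frac{t}{1+t}\varphi (1)\), so that \(F_n(\varphi \Vert \varphi ) = -\varphi (1)\log (1+\frac{1}{n})\le 0\), and likewise for \(\psi \). Hence \(\sup _n( F_n(\varphi \Vert \varphi )+F_n(\psi \Vert \psi ))\) is finite, and we have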

$$\begin{aligned} \lim _{\varepsilon \downarrow 0} D_{BS}(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi )= \lim _{\varepsilon \downarrow 0} \sup _n F_n(\varphi +\varepsilon \phi \Vert \psi + \varepsilon \phi ) \le \sup _n F_n(\varphi \Vert \psi ). \end{aligned}$$
(30)

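The proof is completed by combining (27), which gives \(D_{BS}(\varphi \Vert \psi )\ge \sup _n F_n(\varphi \Vert \psi )\), with (30) and the first equality in (27). \(\square \)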
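Two ingredients of this proof can be checked numerically in the commutative case with strictly positive weight vectors p, q. For this sketch we assume that \(T_\varphi ^\psi \) reduces to the ratio q/p, so that (24) becomes the Kullback–Leibler divergence \(\sum _i p_i\log (p_i/q_i)\). The sketch checks the integral representation of \(f_n\) by quadrature (after the substitution \(s=(1/n)/u\), the integral becomes \(\int _0^1 t/(tu+1/n)\,du\)), and shows that \(F_n(p\Vert q)=-\sum _i p_i\log (q_i/p_i+1/n)\) increases to \(D_{BS}\). All names are illustrative.

```python
import math


def f_n(t, n):
    """Closed form f_n(t) = log(t + 1/n)."""
    return math.log(t + 1.0 / n)


def f_n_quadrature(t, n, steps=100000):
    """-log n + int_{1/n}^inf t/(t+s) ds/s; with s = (1/n)/u the integral equals
    int_0^1 t/(t u + 1/n) du, approximated here by the midpoint rule."""
    c, h = 1.0 / n, 1.0 / steps
    integral = h * sum(t / (t * (k + 0.5) * h + c) for k in range(steps))
    return -math.log(n) + integral


def d_bs(p, q):
    """Commutative BS divergence -sum_i p_i log(q_i/p_i) (Kullback-Leibler)."""
    return -sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))


def F_n(p, q, n):
    """F_n = -phi(f_n(T)) with T = q/p (commutative assumption)."""
    return -sum(pi * f_n(qi / pi, n) for pi, qi in zip(p, q))


p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
for n in (1, 10, 100, 10**4, 10**6):
    print(n, F_n(p, q, n))
print("D_BS =", d_bs(p, q))
```

The printed values of \(F_n\) increase with n and stay below \(D_{BS}\), mirroring \(F_n\uparrow D_{BS}\) in the first part of the proof.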

We now list some of the main properties of \(D_f\) and \(D_{BS}\), see [25, Theorem 4.4, Proposition 4.5].

  1. (Data processing inequality) Let \(T: {\mathcal {N}}\rightarrow {\mathcal {M}}\) be a positive, normal, unital linear map between von Neumann algebras \({\mathcal {N}}\) and \({\mathcal {M}}\) satisfying the Schwarz property \( T(nn^*) \ge T(n)T(n)^* \) for all \(n \in {\mathcal {N}}\). Then \(D_f(\varphi \circ T\Vert \psi \circ T) \le D_f(\varphi \Vert \psi )\).

  2. (Lower semi-continuity) Let \(\varphi _n, \psi _n\) be sequences of normal positive functionals converging pointwise to normal positive functionals \(\varphi , \psi \) as \(n \rightarrow \infty \). Then \(D_f(\varphi \Vert \psi ) \le \liminf _n D_f(\varphi _n \Vert \psi _n)\).

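  3. (Martingale property) Consider a von Neumann algebra \({\mathcal {M}}\) and an increasing sequence of von Neumann subalgebras \(\{{\mathcal {M}}_n\}\) such that \({\mathcal {M}}=\left( \bigcup _n {\mathcal {M}}_n \right) ''\). Then \(D_f(\varphi |_{{\mathcal {M}}_n}\Vert \psi |_{{\mathcal {M}}_n})\) is non-decreasing in n and

    $$\begin{aligned} \lim _{n\rightarrow \infty } D_f(\varphi |_{{\mathcal {M}}_n}\Vert \psi |_{{\mathcal {M}}_n}) = D_f(\varphi \Vert \psi ). \end{aligned}$$
    (31)
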
  4. (Joint convexity and subadditivity) The functional \({\mathcal {M}}_{*,+}\times {\mathcal {M}}_{*,+} \xrightarrow {D_f} (-\infty ,+\infty ]\) is jointly convex and subadditive.

The analogous properties hold for \(D_{BS}\).
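In the commutative case, items 1 and 4 reduce to familiar facts about classical divergences, which can be spot-checked numerically. The sketch assumes probability vectors with strictly positive entries and that, for commuting functionals, \(D_f(p\Vert q)=-\log \sum _i p_i f(q_i/p_i)\); for the illustrative choice \(f(t)=\sqrt{t}\) this is minus the logarithm of the Bhattacharyya coefficient \(\sum _i\sqrt{p_iq_i}\), and \(D_{BS}\) is taken in its commutative form \(\sum _i p_i\log (p_i/q_i)\). A unital positive map \(T:{\mathbb {C}}^m\rightarrow {\mathbb {C}}^d\) is a row-stochastic matrix K, and \(\varphi \circ T\) corresponds to the weight vector pK. Random instances use a fixed seed; this is a spot check, not a proof.

```python
import math
import random


def pull_back(p, K):
    """Weights of phi o T for T(g)_i = sum_j K[i][j] g_j (K row-stochastic)."""
    return [sum(p[i] * K[i][j] for i in range(len(p))) for j in range(len(K[0]))]


def d_sqrt(p, q):
    """-log sum_i p_i f(q_i/p_i) for f(t) = sqrt(t)."""
    return -math.log(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))


def d_bs(p, q):
    """Commutative BS divergence sum_i p_i log(p_i/q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def random_prob(rng, k):
    """Random probability vector with entries bounded away from zero."""
    w = [rng.random() + 0.05 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


def max_violation(trials=500, seed=0):
    """Largest observed violation of data processing (item 1) and joint
    convexity (item 4) over random commutative instances; should be <= 0."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        p, q, p2, q2 = (random_prob(rng, 4) for _ in range(4))
        K = [random_prob(rng, 3) for _ in range(4)]  # 4 x 3 row-stochastic matrix
        lam = rng.random()
        pm = [lam * a + (1 - lam) * b for a, b in zip(p, p2)]
        qm = [lam * a + (1 - lam) * b for a, b in zip(q, q2)]
        for D in (d_sqrt, d_bs):
            worst = max(worst, D(pull_back(p, K), pull_back(q, K)) - D(p, q))
            worst = max(worst, D(pm, qm) - (lam * D(p, q) + (1 - lam) * D(p2, q2)))
    return worst


print(max_violation())
```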

Remark 2.18

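Item 2 is not established in [25, Theorem 4.4, Proposition 4.5]; indeed, for general von Neumann algebras it is presented there as a conjecture [25, Problem 4.13]. The variational expressions in Propositions 2.16 and 2.17 give an immediate proof for \(D_f\) and \(D_{BS}\): they express each divergence as a supremum of functionals which, for a fixed finite range step function, are continuous under pointwise convergence of \((\varphi ,\psi )\), and a supremum of continuous functions is lower semi-continuous. See for example [54] for the analogous argument for the (Araki) relative entropy.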

3 Bimodules and f-Divergences for Channels

3.1 Definitions

Definition 3.1

(Bimodules). Given von Neumann algebras \({\mathcal {N}}, {\mathcal {M}}\), an \({\mathcal {N}}-{\mathcal {M}}\) bimodule is a triple \(\left( {\mathscr {H}},\ell _{\mathscr {H}}, r_{\mathscr {H}}\right) \) where \({\mathscr {H}}\) is a Hilbert space and \({\mathcal {N}}\xrightarrow {\ell _{\mathscr {H}}} B({\mathscr {H}}) \xleftarrow {r_{\mathscr {H}}}{\mathcal {M}}\) are a normal representation and a normal anti-representation, respectively, such that \(\ell _{\mathscr {H}}({\mathcal {N}})\) and \(r_{\mathscr {H}}({\mathcal {M}})\) commute.

When it is clear from the context, we will denote a bimodule by the underlying Hilbert space \({\mathscr {H}}\). For more details on bimodules see e.g. [16, 17].

Remark 3.2

We will use the natural notation \(n\xi m:= \ell _{\mathscr {H}}(n)r_{\mathscr {H}}(m)\xi , \ n\in {\mathcal {N}}, m\in {\mathcal {M}},\xi \in {\mathscr {H}},\) in particular when the bimodule is the identity bimodule \(L^2({\mathcal {M}},\Omega )\). The latter is the bimodule arising from a standard representation of \({\mathcal {M}}\) with standard vector \(\Omega \), and it is therefore unique up to unitary equivalence. As a Hilbert space, \(L^2({\mathcal {M}},\Omega )\) is realized, up to unitary equivalence, as the GNS Hilbert space \({\mathscr {H}}\) of some chosen faithful normal state \(\omega \), with associated cyclic and separating GNS vector \(\Omega \). The left and right actions defining the \({\mathcal {M}}-{\mathcal {M}}\) bimodule structure of \(L^2({\mathcal {M}},\Omega )\) are

$$\begin{aligned} \ell _{\mathscr {H}}(m) \xi = m\xi , \quad r_{\mathscr {H}}(m) \xi = Jm^*J\xi , \end{aligned}$$
(32)

where \(J=J_\Omega \) is the modular conjugation associated with \(\Omega \); it is antiunitary and satisfies \(J{\mathcal {M}}J={\mathcal {M}}'\).
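For \({\mathcal {M}}=M_n({\mathbb {C}})\) everything in (32) is explicit. It is a standard fact that, for a faithful state \(\omega ={\textrm{tr}}(\rho \,\cdot )\), one can take \({\mathscr {H}}=M_n({\mathbb {C}})\) with the Hilbert–Schmidt inner product \(\langle \xi ,\eta \rangle ={\textrm{tr}}(\xi ^*\eta )\), \(\Omega =\rho ^{1/2}\) and \(J\xi =\xi ^*\); then (32) gives \(r_{\mathscr {H}}(m)\xi =\xi m\). The pure-Python sketch below (n = 2, a diagonal \(\rho \) chosen for illustration, helper names ours) checks that r is an anti-representation commuting with the left action, and that both \(\langle \Omega ,\ell (m)\Omega \rangle \) and \(\langle \Omega ,r(m)\Omega \rangle \) reproduce \(\omega (m)\).

```python
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def hs(A, B):
    """Hilbert-Schmidt inner product tr(A^* B)."""
    return sum(A[i][j].conjugate() * B[i][j]
               for i in range(len(A)) for j in range(len(A[0])))


def J(xi):
    """Modular conjugation for Omega = rho^{1/2}: xi -> xi^* (antiunitary)."""
    return adjoint(xi)


def left(m, xi):
    """l(m) xi = m xi, as in (32)."""
    return matmul(m, xi)


def right(m, xi):
    """r(m) xi = J m^* J xi, as in (32); this equals xi m."""
    return J(matmul(adjoint(m), J(xi)))


def rand_mat(rng, n=2):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


rng = random.Random(1)
rho = [0.7, 0.3]  # illustrative diagonal density matrix of a faithful state
Omega = [[complex(rho[0] ** 0.5), 0j], [0j, complex(rho[1] ** 0.5)]]
m1, m2, xi = rand_mat(rng), rand_mat(rng), rand_mat(rng)
omega_m1 = rho[0] * m1[0][0] + rho[1] * m1[1][1]  # tr(rho m1)

print(close(right(m1, xi), matmul(xi, m1)))                        # r(m) xi = xi m
print(close(right(matmul(m1, m2), xi), right(m2, right(m1, xi))))  # anti-representation
print(close(left(m1, right(m2, xi)), right(m2, left(m1, xi))))     # actions commute
print(abs(hs(Omega, left(m1, Omega)) - omega_m1) < 1e-12,
      abs(hs(Omega, right(m1, Omega)) - omega_m1) < 1e-12)
```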

The following proposition [17, Proposition 2.6] will be referenced below:

Proposition 3.3

Let \(T:{\mathcal {N}}\rightarrow {\mathcal {M}}\) be a channel and let \(\varphi \) be a normal state of \({\mathcal {M}}\) with vector representative \(\xi \in L^2({\mathcal {M}})_+\) in the natural cone. There exist an \({\mathcal {N}}-{\mathcal {M}}\) bimodule \({\mathscr {H}}_T\) and a vector \(\eta \in {\mathscr {H}}_T\) such that

$$\begin{aligned} \langle \xi , T(n) Jm^*J \xi \rangle =\langle \eta ,\ell _{{\mathscr {H}}_T}(n)r_{{\mathscr {H}}_T}(m)\eta \rangle , \end{aligned}$$
(33)

and \(\eta \) is cyclic for \(\ell _{{\mathscr {H}}_T}({\mathcal {N}})\vee r_{{\mathscr {H}}_T}({\mathcal {M}})\). Moreover, such a bimodule and unit vector are unique up to unitary transformations.

In the following, \(f: [0,\infty ) \rightarrow [0,\infty )\) is an operator monotone function (such that \(a,b<\infty \) in its representation (9)).

Definition 3.4

Consider a pair of channels \(S,T:{\mathcal {N}}\rightarrow {\mathcal {M}}\), and a von Neumann algebra \({\mathcal {A}}\). We extend the channels to \(S \otimes \text {id}_{\mathcal {A}}, T \otimes \text {id}_{\mathcal {A}}:{\mathcal {N}}\odot {\mathcal {A}}^{op} \rightarrow {\mathcal {M}}\odot {\mathcal {A}}^{op}\). Let \(\pi \) be any binormal representation of \({\mathcal {M}}\odot {\mathcal {A}}^{op}\). Then for every unit vector \(\xi \in {\mathscr {H}}_\pi \) in the \({\mathcal {M}}-{\mathcal {A}}\) bimodule given by \(\pi \), we can consider the states \(\varphi _{S,\pi ,\xi }= \varphi _{\xi }\circ \pi \circ (S\otimes \text {id}_{\mathcal {A}})\) and \(\varphi _{T,\pi ,\xi }= \varphi _{\xi }\circ \pi \circ (T\otimes \text {id}_{\mathcal {A}})\) on \({\mathcal {N}}\odot {\mathcal {A}}^{op}\). Consider \(D_f(\varphi _{S,\pi ,\xi }\Vert \varphi _{T,\pi ,\xi })\) as defined by the variational formula of Proposition 2.16, where the optimization now runs over finite range step functions x with values in \({\mathcal {N}}\odot {\mathcal {A}}^{op}\). Then we define

$$\begin{aligned} D_f(S \Vert T):= \sup _{({\mathcal {A}},\pi ,\xi )} D_f(\varphi _{S,\pi ,\xi }\Vert \varphi _{T,\pi ,\xi }), \end{aligned}$$
(34)

where the supremum is over all triples \(({\mathcal {A}},\pi ,\xi )\) consisting of a von Neumann algebra \({\mathcal {A}}\), a binormal representation \(\pi \) as above, and a unit vector \(\xi \in {\mathscr {H}}_\pi \). We make the analogous definition for the BS divergence, using Proposition 2.17.

Remark 3.5

When \({\mathcal {M}}\) is finite-dimensional, our definition of \(D_{BS}\) agrees with that of [27]. This follows from Proposition 2.17 and the part of Proposition 3.7 referring to finite-dimensional type I algebras. In fact, [27] also considers the channel divergence for the function \(f(t) = t^\alpha , \alpha \in (1,2]\). That case is not considered in the present work, since this function is operator convex but not operator monotone, and it is not obvious to what extent the variational formula in Proposition 2.16 still applies.

Remark 3.6

Consider a normal homomorphism \(\theta :{\mathcal {A}}\rightarrow {\mathcal {M}}\). Then a new \({\mathcal {M}}-{\mathcal {A}}\) bimodule \({\mathscr {H}}_\theta \) can be constructed by twisting the identity bimodule \(L^2({\mathcal {M}})\) on the right by \(\theta \). More explicitly, \({\mathscr {H}}_\theta =L^2({\mathcal {M}})\) as a Hilbert space, the left action of \({\mathcal {M}}\) is the one coming from the structure of \(L^2({\mathcal {M}})\) as a left \({\mathcal {M}}\)-module, and the right action of \({\mathcal {A}}\) is defined by \(r_\theta (a)\eta :=\eta \theta (a), \ \eta \in L^2({\mathcal {M}}), \ a \in {\mathcal {A}}\). This bimodule is denoted by \(L^2_\theta ({\mathcal {M}},\Omega )\).

Even though we will stick with the above definition in what follows, one may ask to what extent it is necessary to consider all bimodules in the definition of the channel divergence (34).

Proposition 3.7

If \({\mathcal {M}}\) is properly infinite (a direct sum of factors of types I\(_\infty \), II\(_\infty \) or III) or a direct sum of type I\(_n\) factors, then we have

$$\begin{aligned} D_f(S\Vert T)=\sup \limits _{\xi \in L^2({\mathcal {M}})_+} D_f(\varphi _{S,\pi ,\xi }\Vert \varphi _{T,\pi ,\xi }). \end{aligned}$$
(35)

If \({\mathcal {M}}\) is infinite-dimensional and finite (a direct sum of factors of type II\(_1\)), then

$$\begin{aligned} D_f(S\Vert T)=\sup \limits _{\xi \in \left( L^2({\mathcal {M}}) \otimes L^2(\ell ^2\left( {\mathbb {N}}\right) )\right) _+} D_f(\varphi _{S,\pi ,\xi }\Vert \varphi _{T,\pi ,\xi }). \end{aligned}$$

where by \(L^2(\ell ^2\left( {\mathbb {N}}\right) )\) we mean the Hilbert-Schmidt operators on the separable Hilbert space \(\ell ^2({\mathbb {N}})\) and by \(L^2({\mathcal {M}}) \otimes L^2(\ell ^2\left( {\mathbb {N}}\right) )\) we mean the associated \({\mathcal {M}}- {\mathcal {M}}\otimes B(\ell ^2({\mathbb {N}}))\)-bimodule. The same holds for the BS divergence.

Proof

By definition, \(D_f(S\Vert T)\ge \sup \limits _{\xi \in L^2({\mathcal {M}})_+} D_f(\varphi _{S,\pi ,\xi }\Vert \varphi _{T,\pi ,\xi }) \). To prove the reverse inequality, we may assume for simplicity that \({\mathcal {N}},{\mathcal {M}},{\mathcal {A}}\) are all factors; the general case can be treated by the usual decomposition into a direct sum of factors.

Case (1) \({\mathcal {M}}\) is of type I\(_\infty \), II\(_\infty \) or III. Then the supremum in Definition 3.4 can always be realized with a properly infinite \({\mathcal {A}}\), because we can, if necessary, pass to the tensor product \({\mathcal {A}}\otimes B({\mathscr {H}})\) and the corresponding bimodule. Consider an \({\mathcal {M}}-{\mathcal {A}}\) bimodule \({\mathscr {H}}={\mathscr {H}}_\pi \). In this case, [17, Corollary 2.7] implies that there exists a normal homomorphism \(\theta :{\mathcal {A}}\rightarrow {\mathcal {M}}\) such that \({\mathscr {H}}_\pi \) is isomorphic to \(L^2_\theta ({\mathcal {M}})\). In other words, there exists a unitary \(U:{\mathscr {H}}\rightarrow L^2({\mathcal {M}})\) intertwining the left and right actions on \({\mathscr {H}}\) with those of \(L^2_\theta ({\mathcal {M}})\). For a vector \(\xi \in {\mathscr {H}}\), denote \(\eta :=U\xi \). Then \(D_f(\varphi _{S,\pi ,\xi }\Vert \varphi _{T,\pi ,\xi })=D_f(\varphi _{S,\pi _\theta ,\eta }\Vert \varphi _{T,\pi _\theta ,\eta })\), where \(\pi _\theta \) is the bimodule representation of \(L^2_\theta ({\mathcal {M}})\). Now, using the variational formula in Proposition 2.16, we obtain

$$\begin{aligned} \begin{aligned}&D_f(\varphi _{S,\pi _\theta ,\eta }\Vert \varphi _{T,\pi _\theta ,\eta })\\&\quad =\sup \limits _{(0,\infty )\xrightarrow {x} {\mathcal {N}}\odot {\mathcal {A}}} -\log \Bigg ( a\varphi _\eta ((S\otimes \theta )(1))+b\varphi _\eta ((T\otimes \theta )(1))\\&\qquad + \int _{(0,\infty )}(1+t)\{\varphi _\eta ((S\otimes \theta )(x_t^{}x_t^*))+\frac{1}{t}\varphi _\eta ((T\otimes \theta )(y_t^{}y_t^*)) \} d\mu \Bigg ) \\&\quad \le \sup \limits _{(0,\infty )\xrightarrow {v} {\mathcal {N}}\odot {\mathcal {M}}} -\log \Bigg (a\varphi _\eta ((S\otimes \text {id}_{\mathcal {M}})(1))+b\varphi _\eta ((T\otimes \text {id}_{\mathcal {M}})(1))\\&\qquad + \int _{(0,\infty )}(1+t)\{\varphi _\eta ((S\otimes \text {id}_{\mathcal {M}})(v^{}_tv_t^*))+\frac{1}{t}\varphi _\eta ((T\otimes \text {id}_{\mathcal {M}})(w^{}_tw_t^*)) \} d\mu \Bigg ), \end{aligned} \end{aligned}$$
(36)

where the inequality holds because the second supremum is over a larger set: for an admissible x, set \(v_t:=(\text {id}_{\mathcal {N}}\otimes \theta )(x_t)\) and \(w_t:=1-v_t\); since \(\text {id}_{\mathcal {N}}\otimes \theta \) is a homomorphism, \((S\otimes \theta )(x_t^{}x_t^*)=(S\otimes \text {id}_{\mathcal {M}})(v_t^{}v_t^*)\), and similarly for T and \(y_t\). By the variational formula again, the right side is \(D_f(\varphi _{S, \pi , \eta }\Vert \varphi _{T,\pi ,\eta })\), where \(\pi \) now refers to the standard bimodule \(L^2({\mathcal {M}})\). Taking the supremum over unit vectors \(\eta \) in the natural cone then yields the reverse inequality \(D_f(S\Vert T)\le \sup \limits _{\xi \in L^2({\mathcal {M}})_+} D_f(\varphi _{S,\pi ,\xi }\Vert \varphi _{T,\pi ,\xi })\), and we are done.

Case (2) \({\mathcal {M}}\) is of type I\(_n\), i.e. \({\mathcal {M}}= M_n({\mathbb {C}})\). Let \((\xi ,\pi ,{\mathcal {A}})\) be a nearly optimal triple in Definition 3.4 with corresponding bimodule \({\mathscr {H}}\), up to tolerance \(\varepsilon \). By replacing \(r_{{\mathscr {H}}}({\mathcal {A}})\) if necessary with the potentially larger von Neumann algebra \(\ell _{{\mathscr {H}}}({\mathcal {M}})'\) (which is type I), we can assume that \(r_{{\mathscr {H}}}({\mathcal {A}}) = B({\mathscr {K}})\), as well as \({\mathscr {H}}= {\mathbb {C}}^n \otimes {\mathscr {K}}\). Now let P be the orthogonal projection on \({\mathscr {H}}\) with range \(\ell _{\mathscr {H}}({\mathcal {M}})\xi \). Then \(P \in \ell _{{\mathscr {H}}}({\mathcal {M}})'\) so \(P = r_{{\mathscr {H}}}(p)\) for some orthogonal projection \(p \in {\mathcal {A}}\), and by the Schmidt-decomposition theorem, p has rank \(\le n\). Going through the definitions, we then have for \(n \in {\mathcal {N}}, a \in {\mathcal {A}}\):

$$\begin{aligned} \varphi _{S,\pi ,\xi }(n \otimes a)= & {} \langle \xi , S(n)\xi a\rangle = \langle \xi , S(n)\xi pap \rangle \nonumber \\= & {} \varphi _{S,\pi ,\xi }((1_n \otimes p)(n \otimes a)(1_n \otimes p)) \end{aligned}$$
(37)

Let x be a step function valued in \({\mathcal {M}}\odot {\mathcal {A}}\) such that the infimum in Proposition 2.16 is achieved up to tolerance \(\varepsilon \). Observe that

$$\begin{aligned} \varphi _{S,\pi ,\xi }(x_tx_t^*)= & {} \varphi _{S,\pi ,\xi }((1_n \otimes p)x_t x_t^*(1_n \otimes p)) \nonumber \\\ge & {} \varphi _{S,\pi ,\xi }((1_n \otimes p)x_t (1_n \otimes p) x_t^*(1_n \otimes p)), \end{aligned}$$
(38)

and we get a similar relation for \(S \rightarrow T\) and \(x_t \rightarrow y_t = 1-x_t\). Therefore, setting \({{\hat{x}}}_t = (1_n \otimes p)x_t (1_n \otimes p), {{\hat{y}}}_t = (1_n \otimes p)y_t (1_n \otimes p)\), we have that

$$\begin{aligned} D_f(S\Vert T) -2\varepsilon{} & {} \le -\log \left( a + b + \int _{(0,\infty )} (1+t) \{ \varphi _{S,\pi ,\xi }(x_tx_t^*) + \frac{1}{t} \varphi _{T,\pi ,\xi }(y_ty_t^*) \} d\mu \right) \nonumber \\{} & {} \le -\log \left( a + b + \int _{(0,\infty )} (1+t) \{ \varphi _{S,\pi ,\xi }({{\hat{x}}}_t {{\hat{x}}}_t^*) + \frac{1}{t} \varphi _{T,\pi ,\xi }({{\hat{y}}}_t {{\hat{y}}}_t^*) \} d\mu \right) \nonumber \\ \end{aligned}$$
(39)

Now observe that \(1_n \otimes p\) is the unit of \({\mathcal {M}}\odot p{\mathcal {A}}p\) and that \({{\hat{x}}}_t + {{\hat{y}}}_t = 1_n \otimes p\), so \({{\hat{x}}}_t \in {\mathcal {M}}\odot p{\mathcal {A}}p\) is an admissible step function in the variational principle of Proposition 2.16. Furthermore, \(p{\mathcal {A}}p\) is naturally isomorphic to a subalgebra of \(M_n({\mathbb {C}}) = {\mathcal {M}}\) (since the rank of p is \(\le n\)), and \(P{\mathscr {H}}\) (which contains \(\xi = P\xi \)) is isometric to a subspace of \({\mathbb {C}}^n \otimes {\mathbb {C}}^n\) (since \(P = r_{{\mathscr {H}}}(p)\)). Therefore, the right side of (39) is less than or equal to \(D_f(\varphi _{S,{{\hat{\pi }}}, {{\hat{\xi }}}}\Vert \varphi _{T,{{\hat{\pi }}}, {{\hat{\xi }}}})\), where \({{\hat{\pi }}}\) is the representation associated with the standard \({\mathcal {M}}-{\mathcal {M}}\)-bimodule \({\mathbb {C}}^n \otimes {\mathbb {C}}^n\) and \({{\hat{\xi }}}\) is the unit vector in that bimodule corresponding to \(\xi \). Thus, if \({\mathcal {M}}= M_n({\mathbb {C}})\), it is sufficient in the variational Definition 3.4 to consider only \({\mathcal {M}}-{\mathcal {M}}\) bimodules.

Case (3) \({\mathcal {M}}\) is of type II\(_1\). Consider an \({\mathcal {M}}-{\mathcal {A}}\)-bimodule \({\mathscr {H}}\). Then \({\mathscr {H}}\otimes L^2(\ell ^2({\mathbb {N}}))\) is naturally an \({\mathcal {M}}\otimes B(\ell ^2({\mathbb {N}})) - {\mathcal {A}}\otimes B(\ell ^2({\mathbb {N}}))\)-bimodule. Both algebras \({\mathcal {M}}\otimes B(\ell ^2({\mathbb {N}}))\) and \({\mathcal {A}}\otimes B(\ell ^2({\mathbb {N}}))\) are properly infinite. Thus, by the same reasoning as in Case (1), the maximizing bimodule can be taken to be a standard bimodule for \({\mathcal {M}}\otimes B(\ell ^2({\mathbb {N}}))\). Restricting this standard bimodule to an \({\mathcal {M}}- {\mathcal {M}}\otimes B(\ell ^2({\mathbb {N}}))\)-bimodule gives a bimodule whose associated f-divergence as in Definition 3.4 is not smaller than that of the original \({\mathcal {M}}-{\mathcal {A}}\)-bimodule.

For the BS divergence, we consider the variational principle given by Proposition 2.17 instead of Proposition 2.16. \(\square \)

3.2 Basic properties of channel divergence

We now prove some basic properties of the channel divergences. In the next theorem, \(D_f\) is the divergence associated with a non-negative operator monotone function \(f:[0,\infty ) \rightarrow [0,\infty )\) with the representation (9) such that \(a,b<\infty \), and \(D_{BS}\) is the BS divergence (formally corresponding to \(f(t) = \log t\)). In the proof, we cannot directly use the proofs of [25, Theorem 4.4, Proposition 4.5] for \(D_f(\varphi \Vert \psi )\), because our definition of \(D_f(S\Vert T)\) rests on a variational principle for test functions valued in the algebraic tensor product \({\mathcal {N}}\odot {\mathcal {A}}^{op}\), which is not a von Neumann algebra. Fortunately, it is well known [53] (see also [54, Chapter 5]) that variational principles can provide alternative proofs.

Theorem 3.8

  1. (Lower semi-continuity) Let \(T_n, S_n:{\mathcal {N}}\rightarrow {\mathcal {M}}\) be channels such that \(T_n(x) \rightarrow T(x), S_n(x) \rightarrow S(x)\) weakly for every \(x \in {\mathcal {N}}\). Then \(D_f(S \Vert T) \le \liminf _n D_f(S_n \Vert T_n)\), and similarly \(D_{BS}(S \Vert T) \le \liminf _n D_{BS}(S_n \Vert T_n)\).

  2. (Data processing inequality) Let \(S_1, S_2: {\mathcal {N}}\rightarrow {\mathcal {M}}, T:{\mathcal {R}}\rightarrow {\mathcal {N}}\) be channels between von Neumann algebras. Then \(D_f(S_1\Vert S_2) \ge D_f(S_1 \circ T\Vert S_2 \circ T)\). Similarly, \(D_{BS}(S_1\Vert S_2) \ge D_{BS}(S_1 \circ T\Vert S_2 \circ T)\).

  3. (Joint convexity) Let \(p_i, q_j\) be probability distributions over finite index sets and \(S_i, T_j:{\mathcal {N}}\rightarrow {\mathcal {M}}\) channels. Then

    $$\begin{aligned} D_{f}(\sum _i p_i S_i \Vert \sum _j q_j T_j) \le \sum _{i,j} p_i q_j D_{f}(S_i\Vert T_j), \end{aligned}$$
    (40)

    and similarly for \(D_{BS}\).

  4. (Dilation) Given channels \(S,T:{\mathcal {N}}\rightarrow {\mathcal {M}}\), we have \(D_f(S\Vert T)=D_f(S\otimes id_{B(\ell ^2({\mathbb {N}}))}\Vert T\otimes id_{B(\ell ^2({\mathbb {N}}))})\). The same holds for \(D_{BS}(S\Vert T)\).

Proof

  (1) Consider an \({\mathcal {M}}- {\mathcal {A}}\) bimodule \(\pi \) and a unit vector \(\xi \in {\mathscr {H}}_\pi \) that attain the supremum in the variational definition (34) up to a tolerance \(\varepsilon \). Then

    $$\begin{aligned} \begin{aligned} D_f(S \Vert T) - 2\varepsilon&\le D_f(\varphi _{S,\pi ,\xi } \Vert \varphi _{T,\pi ,\xi }) - \varepsilon \\&\le \liminf _n D_f(\varphi _{S_n,\pi ,\xi } \Vert \varphi _{T_n,\pi ,\xi })\\&\le \liminf _n D_f(S_n \Vert T_n). \end{aligned} \end{aligned}$$
    (41)

    The second inequality is proven as follows. Choose an admissible step function \(x: (0,\infty ) \rightarrow {\mathcal {N}}\odot {\mathcal {A}}^{op}\) in Proposition 2.16 for which the variational formula for \(D_f(\varphi _{S,\pi ,\xi } \Vert \varphi _{T,\pi ,\xi })\) is attained up to the tolerance \(\varepsilon \). Then

    $$\begin{aligned}{} & {} D_f(\varphi _{S,\pi ,\xi } \Vert \varphi _{T,\pi ,\xi }) -\varepsilon \nonumber \\{} & {} \quad \le -\log \left( a + b + \int _{(0,\infty )}(1+t)\{\varphi _{S,\pi ,\xi }(x_t^{}x_t^*) + \frac{1}{t}\varphi _{T,\pi ,\xi }(y_t^{}y^*_t)\}d\mu (t) \right) \nonumber \\{} & {} \quad = -\log \left( a+b +\lim _n \int _{(0,\infty )}(1+t)\{\varphi _{S_n,\pi ,\xi }(x_t^{}x_t^*) + \frac{1}{t}\varphi _{T_n,\pi ,\xi }(y_t^{}y^*_t)\}d\mu (t) \right) \nonumber \\{} & {} \quad \le \liminf _n D_f(\varphi _{S_n,\pi ,\xi } \Vert \varphi _{T_n,\pi ,\xi }), \end{aligned}$$
    (42)

    using the variational principle again in the last line. The statement for the BS divergence follows likewise from the corresponding variational formula, Proposition 2.17.

  (2) Let \((\xi , \pi , {\mathcal {A}})\) be a nearly optimal triple as in definition (34) for \(D_f(S_1 \circ T\Vert S_2 \circ T)\), up to tolerance \(\varepsilon \). Consider a step function \(x_t \in {\mathcal {R}}\odot {\mathcal {A}}^{op}\) as in Proposition 2.16 that is nearly optimal in the variational characterization of \(D_f(\varphi _{S_1 \circ T,\pi ,\xi } \Vert \varphi _{S_2 \circ T,\pi ,\xi })\) up to a tolerance \(\varepsilon > 0\). Then \((T \otimes id)(x_t) \in {\mathcal {N}}\odot {\mathcal {A}}^{op}\) is an admissible step function in the variational characterization of \(D_f(\varphi _{S_1,\pi ,\xi } \Vert \varphi _{S_2,\pi ,\xi })\), and so, using the Kadison–Schwarz inequality \((T \otimes id)(zz^*) \ge (T \otimes id)(z)(T \otimes id)(z)^*\) and the unital property \(T(1)=1\), we have

    $$\begin{aligned}{} & {} D_f(S_1 \circ T\Vert S_2 \circ T)- 2\varepsilon \le D_f(\varphi _{S_1 \circ T,\pi ,\xi } \Vert \varphi _{S_2 \circ T,\pi ,\xi }) - \varepsilon \nonumber \\{} & {} \quad \le \ -\log \left( a + b + \int _{(0,\infty )}(1+t)\{\varphi _{S_1,\pi ,\xi } \circ T(x_t^{}x_t^*) + \frac{1}{t}\varphi _{S_2,\pi ,\xi } \circ T(y_t^{}y^*_t)\}d\mu (t) \right) \nonumber \\{} & {} \quad \le \ -\log \left( a + b + \int _{(0,\infty )}(1+t)\{\varphi _{S_1,\pi ,\xi }(T(x_t^{})T(x_t^*)) + \frac{1}{t}\varphi _{S_2,\pi ,\xi }(T(y_t^{})T(y^*_t))\}d\mu (t) \right) \nonumber \\{} & {} \quad \le \ -\log \left( a + b + \inf \limits _{(0,\infty )\xrightarrow {x} {\mathcal {M}}} \int _{(0,\infty )}(1+t)\{\varphi _{S_1,\pi ,\xi }(x_t^{}x_t^*) + \frac{1}{t}\varphi _{S_2,\pi ,\xi }(y_t^{}y^*_t)\}d\mu (t) \right) \nonumber \\{} & {} \quad = \ D_f(\varphi _{S_1,\pi ,\xi } \Vert \varphi _{S_2,\pi ,\xi }) \le \ D_f(S_1\Vert S_2), \end{aligned}$$
    (43)

    and therefore the statement follows because \(\varepsilon \) was arbitrary. The proof for the BS divergence is similar and now based on Proposition 2.17.

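  (3) The variational principles expressed in definition (34) and Proposition 2.16 display \(D_f(S\Vert T)\) as a double supremum of functionals of the pair \((S,T)\), each of the form \(-\log \) of a positive affine functional of \((S,T)\). Since \(-\log \) is convex, each of these functionals is jointly convex in \((S,T)\), and hence so is their supremum. Applying this with the weights \(p_iq_j\) to \(\sum _i p_iS_i=\sum _{i,j}p_iq_jS_i\) and \(\sum _j q_jT_j=\sum _{i,j}p_iq_jT_j\) gives (40). For \(D_{BS}\), the argument is the same, based on Proposition 2.17 (there the functionals are even affine).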

  (4) Let \({{\tilde{{\mathscr {H}}}}}\) be an \({\mathcal {M}}\otimes B(\ell ^2({\mathbb {N}})) - {\mathcal {A}}\) bimodule. Since a left representation of an algebra is a right anti-representation of its opposite algebra, this is also an \({\mathcal {M}}- B(\ell ^2({\mathbb {N}}))^{op} \otimes {\mathcal {A}}\) bimodule, and hence it is included in the maximization in Definition 3.4 of \(D_f(S\Vert T)\). Thus \(D_f(S\Vert T) \ge D_f(S\otimes id_{B(\ell ^2({\mathbb {N}}))}\Vert T\otimes id_{B(\ell ^2({\mathbb {N}}))})\). Conversely, let \((\pi ,{\mathcal {A}},\xi )\) be a nearly optimal triple in Definition 3.4 of \(D_f(S\Vert T)\), up to tolerance \(\varepsilon \), where \(\pi \) corresponds to some \({\mathcal {M}}-{\mathcal {A}}\) bimodule \({\mathscr {H}}\). Then \({{\tilde{{\mathscr {H}}}}} = \ell ^2({\mathbb {N}}) \otimes {\mathscr {H}}\) is an \({\mathcal {M}}\otimes B(\ell ^2({\mathbb {N}}))-{\mathcal {A}}\) bimodule. Let \(\eta \) be any unit vector in \(\ell ^2({\mathbb {N}})\) and set \({{\tilde{\xi }}} = \eta \otimes \xi \), \({{\tilde{S}}} = S\otimes id_{B(\ell ^2({\mathbb {N}}))}\) and \({{\tilde{T}}} = T\otimes id_{B(\ell ^2({\mathbb {N}}))}\). If \(x_t\) is a step function attaining the supremum in the variational formula (Proposition 2.16) for \(D_f(\varphi _{\pi ,S,\xi } \Vert \varphi _{\pi ,T,\xi })\) up to tolerance \(\varepsilon \), then \({{\tilde{x}}}_t:= x_t \otimes 1_{B(\ell ^2({\mathbb {N}}))}\) is a valid step function in the variational formula for \(D_f(\varphi _{{{\tilde{\pi }}}, {{\tilde{S}}}, {{\tilde{\xi }}}} \Vert \varphi _{{{\tilde{\pi }}},{{\tilde{T}}},{{\tilde{\xi }}}})\) achieving the same value. Hence \(D_f(S\Vert T) -2\varepsilon \le D_f(S\otimes id_{B(\ell ^2({\mathbb {N}}))}\Vert T\otimes id_{B(\ell ^2({\mathbb {N}}))})\), and (4) is proven for \(D_f\). For \(D_{BS}\) we use Proposition 2.17 instead.

\(\square \)

Our basic strategy for proving the deeper properties of the channel divergence below will be to reduce the statements to those for finite-dimensional matrix algebras obtained recently in [27]; for this, we need to restrict attention from now on to hyperfinite von Neumann algebras. [A von Neumann algebra \({\mathcal {N}}\) on a Hilbert space \({\mathscr {H}}\) is said to be hyperfinite if there exists an increasing sequence (“filtration”) \(\{{\mathcal {N}}_n\}_{n\in {\mathbb {N}}}\subset {\mathcal {N}}\) of finite-dimensional subalgebras, i.e. \({\mathcal {N}}_n \subset {\mathcal {N}}_{n+1}\), such that \({\mathcal {N}}=\left( \bigcup _n {\mathcal {N}}_n \right) ''\).]
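A standard example of a filtration is \(M_2({\mathbb {C}})\subset M_4({\mathbb {C}})\subset M_8({\mathbb {C}})\subset \cdots \), with \(M_{2^k}({\mathbb {C}})\) embedded into \(M_{2^{k+1}}({\mathbb {C}})\) via \(x\mapsto x\otimes 1_2\); it is well known that the weak closure of the union in the GNS representation of the normalized trace is the hyperfinite II\(_1\) factor. The pure-Python sketch below only checks the finite-dimensional ingredients: the embedding is a unital \(*\)-homomorphism (so the subalgebras are increasing) and it preserves the normalized trace, so that the trace states are compatible along the filtration. Helper names are ours.

```python
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def kron(A, B):
    """Kronecker product: row index i*len(B)+k, column index j*len(B[0])+l."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def embed(x):
    """One filtration step M_{2^k} -> M_{2^{k+1}}, x -> x (x) 1_2."""
    return kron(x, identity(2))


def normalised_trace(x):
    return sum(x[i][i] for i in range(len(x))) / len(x)


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


rng = random.Random(2)
x = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)] for _ in range(4)]
y = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)] for _ in range(4)]
print(close(embed(matmul(x, y)), matmul(embed(x), embed(y))),  # multiplicative
      close(embed(adjoint(x)), adjoint(embed(x))),             # *-preserving
      close(embed(identity(4)), identity(8)),                  # unital
      abs(normalised_trace(embed(x)) - normalised_trace(x)) < 1e-12)  # trace-preserving
```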

Examples of hyperfinite factors are all type I factors, i.e. \(B({\mathscr {H}}), M_n({\mathbb {C}})\), but there also exist hyperfinite factors of types II and III. Let \({\mathcal {N}}, {\mathcal {M}}\) be hyperfinite factors, let \(S,T: {\mathcal {N}}\rightarrow {\mathcal {M}}\) be channels, and let \({\mathcal {N}}_n, {\mathcal {M}}_n\) be filtrations of \({\mathcal {N}}\) and \({\mathcal {M}}\), respectively. Following [55], one can construct for each n a “generalized conditional expectation” \(E_n:{\mathcal {M}}\rightarrow {\mathcal {M}}_n\) as follows. Let \(\pi _n\), \({\mathscr {H}}_n\) and \(\Omega _n\) be the GNS representation, GNS Hilbert space and GNS vector of the restriction to \({\mathcal {M}}_n\) of the faithful normal state \(\omega \) on \({\mathcal {M}}\). Let \(J_n\) be the modular conjugation associated with \(\Omega _n\), and define an isometry \(V_n: {\mathscr {H}}_n \rightarrow {\mathscr {H}}\) by \(V_n \pi _n(x)\Omega _n:= x\Omega \), \(x \in {\mathcal {M}}_n\). One can check that \(V_n^* {\mathcal {M}}' V_n \subset \pi _n({\mathcal {M}}_n)'\). Since moreover \(J_n\pi _n({\mathcal {M}}_n)'J_n=\pi _n({\mathcal {M}}_n)\) and \(\pi _n\) is faithful, it is consistent to define \(E_n(m)\) to be the unique element \(x_n \in {\mathcal {M}}_n\) such that

$$\begin{aligned} \pi _n(x_n) = J_n V_n^* JmJ V_n J_n, \end{aligned}$$
(44)

where J is the modular conjugation associated with \(\Omega \) and \({\mathcal {M}}\). By construction, each \(E_n\) is a channel.

Lemma 3.9

(see [56]). \(E_n(m) \rightarrow m\) strongly as \(n \rightarrow \infty \) for all \(m \in {\mathcal {M}}\).

We get:

Proposition 3.10

(Martingale property). We have . Similarly,

Proof

Consider first the case \(D_f\). Let \((\xi , \pi , {\mathcal {A}})\) be a nearly optimal triple as in the definition (34) of \(D_f(S\Vert T)\). We let \(p_n\) be the units of the algebras \({\mathcal {M}}_n\), which form an increasing sequence of projections in \({\mathcal {M}}\). By the von Neumann density theorem, \(p_n \rightarrow 1\) strongly. By monotonicity, i.e. by the data processing inequality applied to the inclusion channel \({\mathcal {M}}_n \subset {\mathcal {M}}\), we have .

Let \({\mathcal {B}}:= \bigcup _n {\mathcal {M}}_n\), which is a \(*\)-subalgebra of \({\mathcal {M}}\) whose weak closure is \({\mathcal {B}}''={\mathcal {M}}\). By the von Neumann density theorem, \({\mathcal {B}}\) is strongly dense in \({\mathcal {M}}\). Let \(x_t\) be an admissible step function as in Proposition 2.16, valued in \({\mathcal {M}}\odot {\mathcal {A}}^{op}\), which approximates \(D_f(\varphi _{S,\pi ,\xi } \Vert \varphi _{T,\pi ,\xi })\) up to an arbitrarily chosen tolerance \(\varepsilon \), and let \(y_t = 1 - x_t\). Because \(x_t\) has finite range and \({\mathcal {B}}\) is strongly dense in \({\mathcal {M}}\), we can construct a sequence of step functions \(x_{n,t}\) valued in \({\mathcal {M}}_n \odot {\mathcal {A}}^{op}\) such that \(x_{n,t}\) is constant on each interval where \(x_t\) is constant, such that \(x_{n,t} = p_n\) for all t so small that \(x_t = 1\), and such that \(x_{n,t} \rightarrow x_t\) strongly on each such interval as \(n \rightarrow \infty \). Then \(\varphi _{S,\pi ,\xi }(x_{n,t}^{}x_{n,t}^*) \rightarrow \varphi _{S,\pi ,\xi }(x_{t}^{}x_{t}^*)\) and, letting \(y_{n,t} = p_n - x_{n,t} \in {\mathcal {M}}_n \odot {\mathcal {A}}^{op}\), \(\varphi _{T,\pi ,\xi }(y_{n,t}^{}y_{n,t}^*) \rightarrow \varphi _{T,\pi ,\xi }(y_{t}^{}y_{t}^*)\) as \(n \rightarrow \infty \), uniformly in t. We insert the step functions \(x_{n,t}, y_{n,t}\) and the unit \(p_n\) in place of \(x_t, y_t\) and 1 into the right side of the variational formula of Proposition 2.16.

The convergence properties of the step functions \(x_{n,t}, y_{n,t}\) and the unit \(p_n\) mean that, as \(n \rightarrow \infty \), the right side converges to a value of at least \(D_f(\varphi _{S,\pi ,\xi } \Vert \varphi _{T,\pi ,\xi }) -\varepsilon \), thus . Since \(\varepsilon >0\) can be chosen arbitrarily small, this proves that , which demonstrates the proposition.

For the BS divergence, we proceed in a similar way, now using the variational principle of Proposition 2.17. \(\square \)

Combining the previous lemma and the martingale property we get:

Lemma 3.11

We have , and similarly for the BS divergence.

Proof

Consider the channels \(E_m \circ S, E_m \circ T: {\mathcal {N}}\rightarrow {\mathcal {M}}_m\) and an \({\mathcal {M}}_m - {\mathcal {A}}\) bimodule \({\mathscr {H}}_m\), representation \(\pi _m\), and vector \(\xi _m \in {\mathscr {H}}_m\) as in the definition of \(D_f(E_m \circ S\Vert E_m \circ T)\) such that the supremum (34) is achieved (up to an arbitrarily small tolerance). By Lemma 3.7, we may assume the bimodule in question to be the standard \({\mathcal {M}}_m - {\mathcal {M}}_m\) bimodule \(L^2({\mathcal {M}}_m)\). From the channel \(E_m: {\mathcal {M}}\rightarrow {\mathcal {M}}_m\) and the functional \(\langle \xi _m, \ . \xi _m\rangle \), we then get an induced \({\mathcal {M}}- {\mathcal {M}}_m\) bimodule in view of Proposition 3.3. It follows immediately that \(D_f(S \Vert T) \ge D_f(E_m \circ S \Vert E_m \circ T)\): the variational definition (34) of \(D_f(S \Vert T)\) takes the supremum over the larger set of all bimodules, whereas \(D_f(E_m \circ S \Vert E_m \circ T)\) corresponds precisely to the induced \({\mathcal {M}}- {\mathcal {M}}_m\) bimodule just described. Thus \(D_f(S \Vert T) \ge \limsup _m D_f(E_m \circ S \Vert E_m \circ T)\), whereas \(D_f(S \Vert T) \le \liminf _m D_f(E_m \circ S \Vert E_m \circ T)\) follows from the lower semi-continuity of the channel divergence, because \(E_m\) converges pointwise strongly, hence weakly, to the identity by Lemma 3.9. Therefore \(D_f(S \Vert T)= \lim _m D_f(E_m \circ S \Vert E_m \circ T)\). The statement now follows from the martingale property. The proof for the BS divergence is similar and based instead on Proposition 2.17. \(\square \)

The following property, observed and proven first in [27] for matrix algebras, is crucial for this work.

Proposition 3.12

(Internal subadditivity). Let \(S_2, T_2: {\mathcal {N}}\rightarrow {\mathcal {R}}, S_1, T_1: {\mathcal {R}}\rightarrow {\mathcal {M}}\) be channels between hyperfinite von Neumann algebras. Then we have

$$\begin{aligned} D_{BS}(S_1 \circ S_2 \Vert T_1 \circ T_2) \le \sum \limits _{i=1,2} D_{BS}(S_i\Vert T_i). \end{aligned}$$
(45)

Proof

Let \(E_m: {\mathcal {R}}\rightarrow {\mathcal {R}}_m, F_k:{\mathcal {M}}\rightarrow {\mathcal {M}}_k\) be sequences of generalized conditional expectations as described above.

(46)

In the first two lines we used lower semi-continuity, in the third line we used the martingale property, in the fourth line we used the result of [27] for finite-dimensional von Neumann algebras, and in the last step we used the martingale property and Lemma 3.11. \(\square \)

Remark 3.13

A noteworthy special case of the proposition arises when \({\mathcal {M}}= {{\mathbb {C}}}\), i.e. \(S_1, T_1\) are states. In this case the subadditivity corresponds to the “chain rule” of [27].

Consider channels \(S_i, T_i: {\mathcal {N}}_i \rightarrow {\mathcal {M}}_i\) between hyperfinite von Neumann algebras represented on Hilbert spaces \({\mathscr {H}}_i\), where \(i=1,2\). We can form the weak closure of \({\mathcal {N}}_1 \odot {\mathcal {N}}_2\) in \({\mathcal {B}}({\mathscr {H}}_1 \otimes {\mathscr {H}}_2)\) and denote this (hyperfinite) von Neumann algebra by \({\mathcal {N}}_1 {{\bar{\otimes }}} {\mathcal {N}}_2\), and we proceed similarly for \({\mathcal {M}}_i\). Then the map \(S_1 \otimes S_2: {\mathcal {N}}_1 \odot {\mathcal {N}}_2 \rightarrow {\mathcal {M}}_1 \odot {\mathcal {M}}_2\) extends to a channel \(S_1 \otimes S_2: {\mathcal {N}}_1 {{\bar{\otimes }}} {\mathcal {N}}_2 \rightarrow {\mathcal {M}}_1 {{\bar{\otimes }}} {\mathcal {M}}_2\), and similarly for \(T_1 \otimes T_2\). Then we have:

Proposition 3.14

(External additivity). Let \(S_i, T_i: {\mathcal {N}}_i \rightarrow {\mathcal {M}}_i\) be channels between the hyperfinite von Neumann algebras \({\mathcal {N}}_i, {\mathcal {M}}_i, i=1,2\). Then \(D_{BS}(S_1\otimes S_2\Vert T_1\otimes T_2) = \sum \limits _{i=1,2} D_{BS}(S_i\Vert T_i)\).

Proof

The proof is similar to that of internal subadditivity, using again Lemma 3.11 and the fact that, by results of [27], \(D_{BS}\) is additive under tensor products in the finite dimensional case.

\(\square \)

3.3 Channel divergences for Kraus channels

Let \({\mathcal {M}}\) be a von Neumann algebra in standard form acting on the Hilbert space \({\mathscr {H}}\) with cyclic and separating vector \(\Omega \). We consider a class of channels \(T,S: {\mathcal {M}}\rightarrow {\mathcal {M}}\) of so-called “Kraus type” investigated in the context of general von Neumann algebras by [57]. By definition, these are of the form

$$\begin{aligned} \begin{aligned} S(m)&= \sum _{i=1}^N a_i^* m a_i, \quad \sum _{i=1}^N a_i^*a_i^{} = 1, \\ T(m)&= \sum _{i=1}^M b_i^* m b_i, \quad \sum _{i=1}^M b_i^*b_i^{} = 1, \end{aligned} \end{aligned}$$
(47)

with \(m,a_i,b_j \in {\mathcal {M}}\) and \(N,M \in {{\mathbb {N}}}\). Our aim is to give a formula for the channel divergence \(D_{BS}(S \Vert T)\) of two Kraus channels in terms of their “Choi operators”, also introduced in this context by [57]. To this end, we define the Choi operators \(C_S, C_T \in B({\mathscr {H}})\) of such channels as, respectively,

$$\begin{aligned} C_S = \sum _{i=1}^N a_i^{}|\Omega \rangle \langle \Omega | a_i^*, \quad C_T = \sum _{i=1}^M b_i^{}|\Omega \rangle \langle \Omega | b_i^* \ . \end{aligned}$$
(48)

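By construction, \(C_S \in B({\mathscr {H}})\) is a non-negative operator of rank at most N, and since \(\sum _i a_i^*a_i^{} = 1\) and \(\Omega \) is a unit vector, \(\textrm{Tr}_{\mathscr {H}}C_S = \sum _{i=1}^N \Vert a_i \Omega \Vert ^2 = \langle \Omega , \sum _{i=1}^N a_i^*a_i^{}\Omega \rangle = 1\); similarly for \(C_T\).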
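As a minimal numerical sketch of (48), one can take the finite-dimensional case \({\mathcal {M}}= M_d({\mathbb {C}}) \otimes 1\) acting on \({\mathscr {H}}= {\mathbb {C}}^d \otimes {\mathbb {C}}^d\), assuming the standard vector \(\Omega = \sum _k \sqrt{p_k}\, e_k \otimes e_k\) for a probability vector p with all \(p_k > 0\). This setting and all helper names below are hypothetical choices for illustration; the code only checks the normalization \(\textrm{Tr}_{\mathscr {H}}C_S = 1\), the finite rank of \(C_S\), and (in the test) that the restriction of \(\textrm{Tr}_{\mathscr {H}}(\,.\,C_S)\) to \({\mathcal {M}}'\) is the vector state of \(\Omega \).

```python
import numpy as np


def standard_vector(p):
    """Omega = sum_k sqrt(p_k) e_k (x) e_k in C^d (x) C^d (assumed standard form)."""
    d = len(p)
    return np.diag(np.sqrt(p)).reshape(d * d).astype(complex)


def choi_operator(kraus, omega):
    """C_S = sum_i a_i |Omega><Omega| a_i^*, cf. (48); a in M_d acts on H as a (x) 1."""
    d = kraus[0].shape[0]
    C = np.zeros((d * d, d * d), dtype=complex)
    for a in kraus:
        v = np.kron(a, np.eye(d)) @ omega  # the vector a_i Omega
        C += np.outer(v, v.conj())
    return C


def random_kraus(d, n, rng):
    """Hypothetical helper: n Kraus operators with sum_i a_i^* a_i = 1, cut from a random isometry."""
    g = rng.normal(size=(n * d, d)) + 1j * rng.normal(size=(n * d, d))
    q, _ = np.linalg.qr(g)  # reduced QR: q has orthonormal columns, so q^* q = 1_d
    return [q[i * d:(i + 1) * d, :] for i in range(n)]


rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
omega = standard_vector(p)
C = choi_operator(random_kraus(3, 2, rng), omega)
print(np.trace(C).real)  # equal to 1 up to rounding
print(np.linalg.matrix_rank(C))  # at most the number of Kraus operators (here 2)
```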

Let \({\mathcal {C}}\subset B({\mathscr {H}})\) be the \(*\)-subalgebra of all operators of the form \(\sum _{j=1}^{K} c_j |\Omega \rangle \langle \Omega | d_j\) for some \(K \in {{\mathbb {N}}}, c_j, d_j \in {\mathcal {M}}\). By [57, Theorem 4], the spectral projections of the operators \(C_S, C_T\) are in \({\mathcal {C}}\) and consequently this algebra is closed under the spectral calculus. Now suppose \(\sigma \) is the Kubo-Ando connection associated with a non-negative operator monotone function \(f:[0,\infty ) \rightarrow [0,\infty )\) with \(a,b<\infty \) in (9). It follows from (15) that the expressions under the limit in

$$\begin{aligned} C_S \sigma C_T = \lim _{\varepsilon \rightarrow 0} \Big (C_S + \varepsilon (C_S+C_T) \Big )\sigma \Big (C_T + \varepsilon (C_S+C_T) \Big ) \end{aligned}$$
(49)

are non-negative elements of \({\mathcal {C}}\). Since the arguments of the mean \(\sigma \) decrease and converge strongly as \(\varepsilon \rightarrow 0\), the limit exists by the properties of Kubo-Ando connections, and moreover \(C_S \sigma C_T \in {\mathcal {C}}\), see the proof of [57, Theorem 4]. Hence \(C_S \sigma C_T\) is in particular a non-negative finite rank operator in \(B({\mathscr {H}})\).
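The regularization (49) can be illustrated with a small matrix sketch. It assumes the usual formula \(A \sigma B = A^{1/2} f(A^{-1/2}BA^{-1/2})A^{1/2}\) for invertible arguments, applied on the support of \(A+B\) to the regularized operators \(A + \varepsilon (A+B)\) and \(B + \varepsilon (A+B)\); outside that support both vanish and the sketch sets the mean to zero. The function names are hypothetical. The example uses \(f(t)=\sqrt{t}\) and two orthogonal rank-one projections, for which the regularized means shrink to zero as \(\varepsilon \rightarrow 0\).

```python
import numpy as np


def apply_fn(M, func):
    """func(M) for a Hermitian matrix M, via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * func(w)) @ V.conj().T


def kubo_ando_regularized(A, B, f, eps, tol=1e-12):
    """(A + eps(A+B)) sigma_f (B + eps(A+B)) for positive semidefinite A, B, cf. (49)."""
    w, V = np.linalg.eigh(A + B)
    V = V[:, w > tol * max(w.max(), 1.0)]  # orthonormal basis of supp(A + B)
    Ar, Br = V.conj().T @ A @ V, V.conj().T @ B @ V
    Ae = Ar + eps * (Ar + Br)  # invertible on the support when eps > 0
    Be = Br + eps * (Ar + Br)
    Ah = apply_fn(Ae, np.sqrt)
    Aih = apply_fn(Ae, lambda x: 1.0 / np.sqrt(x))
    inner = Aih @ Be @ Aih
    inner = (inner + inner.conj().T) / 2  # remove round-off asymmetry
    return V @ (Ah @ apply_fn(inner, f) @ Ah) @ V.conj().T


P1 = np.diag([1.0, 0.0])
P2 = np.diag([0.0, 1.0])
for eps in (1e-2, 1e-4, 1e-6):
    G = kubo_ando_regularized(P1, P2, np.sqrt, eps)
    print(eps, np.abs(G).max())  # approximately sqrt(eps * (1 + eps)), tending to 0
```

With `f = np.log`, `eps = 0.0` and full-rank density matrices `rho`, `sig`, minus the trace of the output reproduces the familiar finite-dimensional Belavkin-Staszewski expression \(\textrm{Tr}\, \rho \log (\rho ^{1/2}\sigma ^{-1}\rho ^{1/2})\).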

For the operator monotone function \(f(t) = \log t\) on \((0,\infty )\), similar arguments show that the operators \([C_S + \varepsilon (C_S+C_T)]\sigma [C_T + \varepsilon (C_S+C_T)]\) are still in \({\mathcal {C}}\) for \(\varepsilon >0\). The limit \(\varepsilon \rightarrow 0\) of this decreasing sequence exists but possibly only in the sense of an unbounded quadratic form. In fact, as long as \(\varepsilon >0\), one can see e.g. from (15), \(\Vert C_S\Vert , \Vert C_T\Vert \le 1\) together with \(V^* f(A) V \le f(V^*AV)\) for contractions V, positive \(A \in B({\mathscr {H}})_+\) and operator monotone functions \(f: (0,\infty ) \rightarrow {\mathbb {R}}\) (see e.g. [58]), that

$$\begin{aligned} \begin{aligned}&[C_S + \varepsilon (C_S+C_T)]\sigma [C_T + \varepsilon (C_S+C_T)] \\&\quad \le \max \{1, (1+\varepsilon )\Vert C_S \Vert + \varepsilon \Vert C_T\Vert \} \log [C_T + \varepsilon (C_S+C_T)] \\&\quad \le (1+2\varepsilon ) \log (1+2\varepsilon ) \ \xrightarrow {\varepsilon \rightarrow 0} \ 0. \end{aligned} \end{aligned}$$
(50)

Thus, for \(f(t)=\log t\), the corresponding Kubo-Ando mean \(C_S \sigma C_T\) defines a non-positive, possibly unbounded quadratic form, given on its domain by a finite rank operator in \({\mathcal {C}}\). Assume that \(C_S \sigma C_T\) is bounded, hence in \({\mathcal {C}}\). Then the non-negative finite rank operator \(-C_S \sigma C_T\) may be written, using its spectral decomposition, as \(\sum _{j=1}^K c_j |\Omega \rangle \langle \Omega |c_j^*\) for some \(c_j \in {\mathcal {M}}, K \in {\mathbb {N}}_0\), which gives, for \(m' \in {\mathcal {M}}'\) and with \(\Omega _{S,T}\) as in Definition 3.15 below,

$$\begin{aligned} \langle \Omega _{S,T}, m'{}^*m'\Omega _{S,T}\rangle = \sum _{j=1}^K \langle \Omega , m'{}^*c_j^* c_j^{} m'\Omega \rangle \le \left( \sum _{j=1}^K \Vert c_j\Vert ^2 \right) \langle \Omega , m'{}^*m'\Omega \rangle . \end{aligned}$$
(51)

Definition 3.15

Let \(\sigma \) be the Kubo-Ando mean for \(f(t) = \log t\) and assume that \(C_S \sigma C_T\) is bounded (hence in \({\mathcal {C}}\)). Then we define \(\Omega _{S,T} \in L^2({\mathcal {M}},\Omega )_+\) as the unique representer of the positive normal functional on \({\mathcal {M}}'\) associated with the non-negative finite rank operator \(-C_S \sigma C_T\),

$$\begin{aligned} \langle \Omega _{S,T}, m'\Omega _{S,T}\rangle = -\textrm{Tr}_{\mathscr {H}}\Big [ m'(C_S \sigma C_T) \Big ], \quad m' \in {\mathcal {M}}'. \end{aligned}$$
(52)

Remark 3.16

By the Connes–Radon–Nikodym theorem and (51), there must be \(m \in {\mathcal {M}}\) such that \(\Omega _{S,T} = m\Omega \in {\mathcal {M}}\Omega \). We therefore have \(\Omega _{S,T} \in L^\infty ({\mathcal {M}},\Omega ) \cong {\mathcal {M}}\Omega \) by the well-known characterization of this space.

Proposition 3.17

For two Kraus channels \(S, T\) on the finite dimensional or properly infinite hyperfinite von Neumann algebra \({\mathcal {M}}\) standardly represented on \(L^2({\mathcal {M}}, \Omega )\) we have

$$\begin{aligned} D_{BS}(S \Vert T) = \Big \Vert \Omega _{S,T} \Big \Vert _{L^\infty ({\mathcal {M}}, \Omega )}^2 \end{aligned}$$
(53)

with the convention that the right side is \(+\infty \) if \(C_S \sigma C_T\) is unbounded.

Proof

First assume that \(C_S \sigma C_T\) is bounded, so \(\Omega _{S,T} \in L^\infty ({\mathcal {M}},\Omega ) \cong {\mathcal {M}}\Omega \) by the preceding remark. By Proposition 3.7, we can restrict attention to the standard bimodule \({\mathscr {H}}=L^2({\mathcal {M}}, \Omega )\) in the variational definition (34) of the channel divergence. Furthermore, since \({\mathcal {M}}'\Omega \) is dense in \({\mathscr {H}}\) as \(\Omega \) is standard, it is sufficient to restrict to vectors of the form \(\xi = x'\Omega , x' \in {\mathcal {M}}'\), in the variational definition. Using the definitions and the notation \(m' = Jm^{op *} J \in {\mathcal {M}}'\), \(x' = Jx^{op *} J\) and \(X:=\pi (1 \otimes x^{op})\), we get

$$\begin{aligned} \begin{aligned} \varphi _{S,\pi ,\xi }(m \otimes m^{op})&= \langle \xi , \pi (S(m) \otimes m^{op})\xi \rangle \\&= \sum _{i=1}^N \langle \xi , \ell _{{\mathscr {H}}}(a_i^* m a_i^{}) r_{{\mathscr {H}}}(m^{op}) \xi \rangle \\&= \sum _{i=1}^N \langle x'\Omega , a_i^* m a_i^{} m' x' \Omega \rangle \\&= \sum _{i=1}^N \langle x' a_i\Omega , m m' x' a_i \Omega \rangle \\&= \textrm{Tr}_{{\mathscr {H}}}\Big [XC_SX^* \pi (m \otimes m^{op})\Big ] \end{aligned} \end{aligned}$$
(54)

We also have a similar formula replacing S by T and \(a_j\) by \(b_j\). The variational principle for the maximal BS-divergence (Proposition 2.17) thereby gives us

$$\begin{aligned} \begin{aligned}&D_{BS}(\varphi _{S,\pi ,\xi } \Vert \varphi _{T,\pi ,\xi }) \\&\quad = \sup \sup \left( \log n - \int _{1/n}^\infty \{ \varphi _{S,\pi ,\xi }(v_t^{}v_t^*) + t^{-1} \varphi _{T,\pi ,\xi }(w_t^{}w_t^*) \} \frac{dt}{t} \right) \\&\quad = \sup \sup \left( \log n - \int _{1/n}^\infty \{ \textrm{Tr}_{\mathscr {H}}(V_t^*XC_SX^*V_t^{}) + t^{-1} \textrm{Tr}_{\mathscr {H}}(W_t^*XC_TX^*W_t^{}) \} \frac{dt}{t} \right) \end{aligned} \end{aligned}$$
(55)

by (54), where the first supremum is over \(n \in {\mathbb {N}}\), the second supremum is over the finite range step functions \((1/n,\infty ) \xrightarrow {v} {\mathcal {M}}\odot {\mathcal {M}}^{op}\) such that \(v_t = 0\) for sufficiently large t, and where we use the abbreviations \(V_t = \pi (v_t), w_t = 1-v_t, W_t = \pi (w_t)\). Since \(\pi ({\mathcal {M}}\odot {\mathcal {M}}^{op})\) is strongly dense in \(B({\mathscr {H}})\), the step functions \(V_t\) can be used to approximate in the strong topology any given finite range step function \((0,\infty ) \rightarrow B({\mathscr {H}})\) which is zero for sufficiently large t and 1 for sufficiently small t. Let P be any orthogonal projection onto a finite dimensional subspace of \({\mathscr {H}}\) containing the (finite dimensional) ranges of \(XC_SX^*\) and \(XC_TX^*\). Then we may further replace \(V_t\) by \(PV_tP\) and \(W_t\) by \(PW_tP\), and the variational formula [44, Remark 9.2] (or our Proposition 2.17) therefore tells us that

$$\begin{aligned} \begin{aligned} D_{BS}(\varphi _{S,\pi ,\xi } \Vert \varphi _{T,\pi ,\xi })&= - \textrm{Tr}_{\mathscr {H}}\Big [ (XC_SX^*) \sigma (XC_TX^*) P \Big ]\\&= - \textrm{Tr}_{\mathscr {H}}\Big [ (XC_SX^*) \sigma (XC_TX^*) \Big ]\\&= - \textrm{Tr}_{\mathscr {H}}\Big [ X(C_S \sigma C_T)X^* \Big ] \\&= - \textrm{Tr}_{\mathscr {H}}\Big [ \pi (1 \otimes x^{op}) (C_S \sigma C_T)\pi (1 \otimes x^{op})^* \Big ] \\&= - \textrm{Tr}_{\mathscr {H}}\Big [ x'{}^* x' (C_S \sigma C_T) \Big ] \\&= \Big \Vert x' \Omega _{S,T} \Big \Vert ^2, \end{aligned} \end{aligned}$$
(56)

where, to go to the second line, we used that P was arbitrary so long as its range contains the ranges of \(XC_SX^*\) and \(XC_TX^*\), and where we used the transformer equality (see e.g. [25, Lemma D.3]) to go to the third line. The transformer equality applies because we assume, for the moment, that \(x'\), hence X, is invertible. Since we know that \(\Omega _{S,T} \in L^\infty ({\mathcal {M}},\Omega )\), there is \(m \in {\mathcal {M}}\) such that \(\Omega _{S,T} = m\Omega \); therefore

$$\begin{aligned} D_{BS}(\varphi _{S,\pi ,\xi } \Vert \varphi _{T,\pi ,\xi }) = \Big \Vert x' m \Omega \Big \Vert ^2 =\Big \Vert m\xi \Big \Vert ^2. \end{aligned}$$
(57)

If we can show that the vectors \(x'\Omega \), with \(x'\) ranging over the invertible elements of \({\mathcal {M}}'\), are dense in \({\mathscr {H}}\), then this formula holds for all \(\xi \in {\mathscr {H}}\). This in fact follows from hyperfiniteness: invertible elements are norm dense in a finite-dimensional von Neumann algebra, and \({\mathcal {M}}\) is the strong closure of an increasing sequence of finite-dimensional subalgebras. Thus, for any \(x \in {\mathcal {M}}\) we get a strongly convergent sequence \(x_n \rightarrow x\) with \(x_n\) invertible. Applying this to \(x:= Jx'J\) and choosing \(x_n' = Jx_nJ\) gives the claim. Taking the supremum over this dense set of unit vectors \(\xi \) now gives the statement of the proposition, because \(\Vert m\Vert = \Vert \Omega _{S,T}\Vert _{L^\infty ({\mathcal {M}},\Omega )}\).

Let us now assume that \(C_S \sigma C_T\) is not bounded. The completely positive maps \(T_\varepsilon := T+\varepsilon (S+T), S_\varepsilon := S+\varepsilon (S+T)\) do not suffer from this problem for \(\varepsilon >0\); they are (non-normalized) Kraus channels which decrease as \(\varepsilon \rightarrow 0\). By monotonicity of the operator mean \(\sigma \), \(C_{S_\varepsilon } \sigma C_{T_{\varepsilon }}\) is a decreasing family of self-adjoint operators in \({\mathcal {C}}\) whose ranges remain in a fixed finite dimensional subspace of \({\mathscr {H}}\). It converges, in the sense of quadratic forms, to the unbounded \(C_S \sigma C_T\), from which we can see that there must be \(x' \in {\mathcal {M}}'\) such that \(- \textrm{Tr}_{\mathscr {H}}[ x'{}^* x' (C_{S_\varepsilon } \sigma C_{T_{\varepsilon }}) ]\) diverges to \(+\infty \) as \(\varepsilon \rightarrow 0\); hence so does \(D_{BS}(S_\varepsilon \Vert T_\varepsilon )\) by (56). Since \(T_\varepsilon , S_\varepsilon \) decrease to \(T, S\) as \(\varepsilon \rightarrow 0\), monotonicity gives \(D_{BS}(S \Vert T) \ge D_{BS}(S_\varepsilon \Vert T_\varepsilon ) \rightarrow \infty \). \(\square \)
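For matrix algebras, the recipe of (52) and (53) can be sketched numerically. We assume \({\mathcal {M}}= M_d({\mathbb {C}})\otimes 1\) on \({\mathbb {C}}^d \otimes {\mathbb {C}}^d\) with \(\Omega = \sum _k \sqrt{p_k}\, e_k\otimes e_k\), \(p_k>0\); this setting and all function names are our own illustrative choices. Instead of the \(\varepsilon \)-limit, the sketch uses the transpose identity \(C_S \sigma C_T = C_T^{1/2} g(C_T^{-1/2} C_S C_T^{-1/2}) C_T^{1/2}\) with \(g(x) = -x\log x\), taken on the support of \(C_T\). This is valid when the support of \(C_S\) lies in that of \(C_T\); otherwise the sketch treats the mean as unbounded (as in the proof of Corollary 3.18) and returns \(+\infty \), following the convention of Proposition 3.17. Since (52) only determines \(m^*m\) for \(\Omega _{S,T} = m\Omega \), we read off \(\Vert m\Vert ^2\) as the largest eigenvalue of \(m^*m\).

```python
import numpy as np


def apply_fn(M, func):
    """func(M) for a Hermitian matrix M, via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * func(w)) @ V.conj().T


def choi_operator(kraus, omega):
    """C_S = sum_i a_i |Omega><Omega| a_i^*, cf. (48); a acts on C^d (x) C^d as a (x) 1."""
    d = kraus[0].shape[0]
    vs = [np.kron(a, np.eye(d)) @ omega for a in kraus]
    return sum(np.outer(v, v.conj()) for v in vs)


def bs_channel_divergence(kraus_S, kraus_T, p, tol=1e-10):
    """Sketch of D_BS(S||T) = ||Omega_{S,T}||^2 in L^infty, cf. (52) and (53), for M = M_d(C)."""
    d = len(p)
    omega = np.diag(np.sqrt(p)).reshape(d * d).astype(complex)
    CS, CT = choi_operator(kraus_S, omega), choi_operator(kraus_T, omega)
    w, V = np.linalg.eigh(CT)
    keep = w > tol
    V, w = V[:, keep], w[keep]  # orthonormal basis of supp(C_T)
    CSr = V.conj().T @ CS @ V
    if np.trace(CS).real - np.trace(CSr).real > tol:
        return np.inf  # supp(C_S) not contained in supp(C_T): unbounded mean
    s = np.sqrt(w)
    X = CSr / s[:, None] / s[None, :]  # C_T^{-1/2} C_S C_T^{-1/2} on supp(C_T)
    X = (X + X.conj().T) / 2

    def xlogx(x):
        return np.where(x > tol, x * np.log(np.clip(x, tol, None)), 0.0)

    # K = -(C_S sigma C_T) = C_T^{1/2} (X log X) C_T^{1/2}, extended by zero.
    K = V @ (s[:, None] * apply_fn(X, xlogx) * s[None, :]) @ V.conj().T
    # Density on the commutant 1 (x) M_d of the functional m' -> Tr[m' K] (partial trace).
    kappa = np.einsum('ilij->lj', K.reshape(d, d, d, d))
    # <m Omega, (1 (x) x) m Omega> = Tr[x kappa] gives m^* m = W kappa^T W, W = diag(p)^(-1/2).
    W = np.diag(1.0 / np.sqrt(p))
    return float(np.linalg.eigvalsh(W @ kappa.T @ W).max())


rng = np.random.default_rng(1)
d, p = 4, np.array([0.4, 0.3, 0.2, 0.1])
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
measurement = [np.outer(U[:, k], U[:, k].conj()) for k in range(d)]  # N = 4 outcomes
identity = [np.eye(d, dtype=complex)]
print(bs_channel_divergence(identity, measurement, p), np.log(4))  # agree up to rounding
print(bs_channel_divergence(identity, [U], p))  # inf for a non-trivial unitary
```

The first line reproduces, in this toy setting, the value \(\log N\) of Proposition 3.20, and the second the infinite divergence of Remark 3.19.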

3.4 Examples

As a simple special case of Kraus channels, we consider \(S, T\) in (47) of the form

$$\begin{aligned} \sum _{i=1}^N a_i^{} a_i^* = 1, \quad a_i^* a_j = \delta _{i,j}1, \quad \sum _{i=1}^M b_i^{} b_i^* = 1, \quad b_i^* b_j = \delta _{i,j}1. \end{aligned}$$
(58)

In other words, \(\{a_j\}\) and \(\{b_j\}\) generate algebras isomorphic to the Cuntz algebras on N and M isometries, respectively.

Corollary 3.18

Let \({\mathcal {M}}\) be a finite dimensional or properly infinite von Neumann algebra and let \(S, T\) be Kraus channels such that (58) holds for some \(N,M \in {\mathbb {N}}\). Then either \(N=M\) and \(T=S\), or we have \(D_{BS}(S\Vert T) = \infty \), or \(N<M\) and \(D_{BS}(S\Vert T)=0\).

Remark 3.19

In particular, note that if \(T(m) = u m u^*\) with \(u \in {\mathcal {M}}\) unitary and \(S=id\), we have \(D_{BS}(id \Vert T) = D_{BS}(T \Vert id) = \infty \) unless \(u=\lambda 1\).

Proof

It follows from the Cuntz algebra relations that the corresponding Choi operators \(C_S = P\) and \(C_T = Q\) are orthogonal projections on \({\mathscr {H}}\) of rank N and M, respectively. Denote by \(P \wedge Q\) the orthogonal projection onto the intersection of the ranges of P and Q. Consider the operator monotone functions \(f_n(t):=\log (t+\frac{1}{n}), t\ge 0\), which have the integral representations \(f_n(t)=-\log n + \int _{1/n}^{\infty }\frac{t}{t+s}\frac{ds}{s}\), and let \(\sigma _n\) be the corresponding operator means. By the proof of [45, Theorem 3.7], we have \((tP):Q = t(t+1)^{-1} (P \wedge Q)\). By the integral representation (10) for this mean, we therefore get

$$\begin{aligned} \begin{aligned} P \sigma _n Q&= P \log \tfrac{1}{n} + \int _{1/n}^\infty [(tP):Q] \frac{dt}{t^2}\\&= P \log \tfrac{1}{n} + (P \wedge Q) \int _{1/n}^\infty \frac{dt}{t(t+1)}\\&= [(P \wedge Q)-P]\log n - (P \wedge Q)\Bigg [ \log n - (\log t - \log (1+t))_{1/n}^\infty \Bigg ]\\&= [(P \wedge Q)-P]\log n + (P \wedge Q) \log (1+\tfrac{1}{n}) \end{aligned} \end{aligned}$$
(59)

As \(n \rightarrow \infty \), the operator means \(P\sigma _n Q\) are decreasing (hence convergent) to the potentially unbounded quadratic form \(P\sigma Q = [(P \wedge Q)-P]\infty \), where \(\sigma \) corresponds to the operator monotone function \(f(t) = \log t\). Therefore, if \(P \sigma Q\) is to be bounded, we must have \((P \wedge Q)-P = 0\), so P must be a subprojection of Q, or otherwise \(D_{BS}(S\Vert T) = \infty \) by Proposition 3.17. If \(P=Q\), then \(N=M\), and there must be \(R_{ij} \in {\mathbb {C}}\) such that \(a_i \Omega = \sum _{j=1}^N R_{ij} b_j \Omega \), and since \(\Omega \) is separating, we must have \(a_i = \sum _{j=1}^N R_{ij} b_j\). The Cuntz algebra relations then show that \((R_{ij})\) is a unitary matrix and then clearly \(S=T\). If \(P<Q\), then clearly \(N<M\) and it follows that \(P \sigma _n Q\) is decreasing (hence convergent) to 0 and \(D_{BS}(S\Vert T) = 0\). \(\square \)
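The identity \((tP):Q = t(t+1)^{-1}(P\wedge Q)\) behind (59) is easy to test numerically. The sketch below assumes the Anderson-Duffin formula \(A:B = A(A+B)^{+}B\) for positive semidefinite matrices (\(^{+}\) denoting the Moore-Penrose pseudo-inverse) and computes \(P\wedge Q\) as the eigenvalue-2 eigenspace of \(P+Q\); the configuration of P and Q, with a two-dimensional intersection, is an arbitrary choice, and the function names are hypothetical. It then evaluates the closed form (59), whose \(-\log n\) part on \(P - P\wedge Q\) makes \(P\sigma Q\) unbounded unless P is a subprojection of Q.

```python
import numpy as np


def psd_pinv(M, rtol=1e-10):
    """Moore-Penrose pseudo-inverse of a Hermitian positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    keep = w > rtol * w.max()
    inv = np.zeros_like(w)
    inv[keep] = 1.0 / w[keep]
    return (V * inv) @ V.conj().T


def parallel_sum(A, B):
    """A:B = A (A+B)^+ B for positive semidefinite A, B (Anderson-Duffin formula)."""
    return A @ psd_pinv(A + B) @ B


def meet(P, Q, tol=1e-8):
    """P wedge Q: projection onto ran P and ran Q jointly, the eigenvalue-2 eigenspace of P + Q."""
    w, V = np.linalg.eigh(P + Q)
    Vm = V[:, w > 2 - tol]
    return Vm @ Vm.conj().T


def projection(cols):
    """Orthogonal projection onto the span of the given columns."""
    q, _ = np.linalg.qr(cols)
    return q @ q.conj().T


def sigma_n(P, Q, n):
    """Closed form (59): P sigma_n Q = [(P^Q) - P] log n + (P^Q) log(1 + 1/n)."""
    R = meet(P, Q)
    return (R - P) * np.log(n) + R * np.log1p(1.0 / n)


rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5)))
P = projection(U[:, :3])
Q = projection(np.column_stack([U[:, 0], U[:, 1], (U[:, 2] + U[:, 3]) / np.sqrt(2)]))
R = meet(P, Q)  # projection onto span(U[:, 0], U[:, 1])
for t in (0.1, 1.0, 7.0):
    print(t, np.abs(parallel_sum(t * P, Q) - t / (t + 1) * R).max())  # numerically zero
print(np.linalg.eigvalsh(sigma_n(P, Q, 10**6)).min())  # about -log(10**6), unbounded in n
```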

Another very simple but conceptually relevant example is:

Proposition 3.20

Let \({\mathcal {M}}\) be a finite dimensional or properly infinite, hyperfinite von Neumann algebra and let \(e_j \in {\mathcal {M}}\) be N mutually orthogonal projections such that \(\sum _i e_i=1, 0<e_i<1\). We consider the Kraus channel

$$\begin{aligned} M(m) = e_1me_1 + \dots + e_N m e_N \end{aligned}$$
(60)

corresponding to an N-ary measurement. Then

$$\begin{aligned} D_{BS}(id\Vert M) = \log N \end{aligned}$$
(61)

Proof

The Choi operator associated with M is \(C_M = \sum e_i|\Omega \rangle \langle \Omega | e_i\) and that for the identity channel id is \(C_{id} = |\Omega \rangle \langle \Omega |\). We begin by working out the parallel sum \(\langle \xi , [(tC_{id}):C_M] \xi \rangle \) using the variational definition (11). A minimizer \(\zeta _0\) in that definition has to satisfy

$$\begin{aligned} t\langle \Omega , \zeta _0\rangle \Omega - \sum _i \langle e_i\Omega , \xi -\zeta _0\rangle e_i\Omega = 0. \end{aligned}$$
(62)

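The vectors \(e_i\Omega \) are non-zero because \(\Omega \) is separating and \(e_i \ne 0\), and they are mutually orthogonal since \(\langle e_i\Omega , e_j\Omega \rangle = \langle \Omega , e_ie_j\Omega \rangle = 0\) for \(i \ne j\); in particular, they are linearly independent. Since moreover \(\Omega = \sum _i e_i\Omega \), comparing the coefficients of \(e_i\Omega \) in (62) shows that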

$$\begin{aligned} t\langle \Omega , \zeta _0\rangle = \langle e_i\Omega , \xi -\zeta _0\rangle \end{aligned}$$
(63)

for all \(i=1, \dots , N\) and any solution \(\zeta _0\) is a minimizer for the variational problem (11). To find a solution we consider the ansatz \(\zeta _0 = \sum _i a_i \Vert e_i \Omega \Vert ^{-2 }e_i\Omega \), leading to a linear system for the unknown complex coefficients \(a_i\). A solution is

$$\begin{aligned} a_i = \langle e_i \Omega , \xi \rangle - \frac{t}{1+Nt} \langle \Omega , \xi \rangle . \end{aligned}$$
(64)

Substituting the corresponding \(\zeta _0\) into the variational definition (11) yields

$$\begin{aligned} \langle \xi , [(tC_{id}):C_M] \xi \rangle = \frac{t}{Nt+1} | \langle \xi , \Omega \rangle |^2, \end{aligned}$$
(65)

noting that the dependence upon \(e_i\) has cancelled. In other words \([(tC_{id}):C_M] = \tfrac{t}{Nt+1}|\Omega \rangle \langle \Omega |\). Next we use the integral representation (10) for the Kubo-Ando means \(\sigma _n\) associated with the functions \(f_n(t) = \log (\tfrac{1}{n}+t)\). The corresponding measures \(d\mu _n\) are read off from the integral representations \(f_n(t)=-\log n + \int _{1/n}^{\infty }\frac{t}{t+s}\frac{ds}{s}\). This gives, for the Kubo-Ando mean \(C_{id} \sigma C_M\) associated with \(f(t) = \log t\) as required for the BS divergence,

$$\begin{aligned} \begin{aligned} -C_{id} \sigma C_M&= \lim _n -C_{id} \sigma _n C_M \\&= \lim _n \left( (\log n)C_{id} - \int _{(1/n,\infty )} [(tC_{id}):C_M] \frac{dt}{t^2} \right) \\&= \lim _n \lim _K \left( (\log n) |\Omega \rangle \langle \Omega | - \int _{(1/n,K)}\left( \frac{t}{Nt+1} |\Omega \rangle \langle \Omega | \right) \frac{dt}{t^2} \right) \\&= \lim _n \lim _K \left( \log n - \int _{(1/n,K)} \frac{dt}{t(Nt+1)} \right) |\Omega \rangle \langle \Omega | \\&= \lim _n \lim _K \left( \log n - \log (t) \bigg |_{1/n}^K + \log (Nt+1) \bigg |_{1/n}^K \right) |\Omega \rangle \langle \Omega | \\&= (\log N)|\Omega \rangle \langle \Omega |. \end{aligned} \end{aligned}$$
(66)

Next we use the definition (52) for \(S=id, T=M\), giving

$$\begin{aligned} \Omega _{id,M} = (\log N)^{1/2} \Omega . \end{aligned}$$
(67)

By Proposition 3.17, we therefore have \( D_{BS}(id\Vert M) = \log N \) as we wanted to show. \(\square \)
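The key step (65) can be checked numerically in a finite-dimensional stand-in, assuming \({\mathcal {M}}= M_d({\mathbb {C}})\otimes 1\) on \({\mathbb {C}}^d\otimes {\mathbb {C}}^d\) with \(\Omega = \sum _k \sqrt{p_k}\,e_k\otimes e_k\); this setting, the weights p and the helper names are hypothetical choices. The parallel sum is computed with the Anderson-Duffin formula \(A:B = A(A+B)^{+}B\) for positive semidefinite matrices and compared with \(\tfrac{t}{Nt+1}|\Omega \rangle \langle \Omega |\) for a two-outcome measurement; as stated after (65), the result should not depend on p or on the choice of the \(e_i\).

```python
import numpy as np


def psd_pinv(M, rtol=1e-10):
    """Moore-Penrose pseudo-inverse of a Hermitian positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    keep = w > rtol * w.max()
    inv = np.zeros_like(w)
    inv[keep] = 1.0 / w[keep]
    return (V * inv) @ V.conj().T


def parallel_sum(A, B):
    """A:B = A (A+B)^+ B for positive semidefinite A, B."""
    return A @ psd_pinv(A + B) @ B


def choi_operator(kraus, omega):
    """sum_i a_i |Omega><Omega| a_i^*, with a_i acting as a_i (x) 1."""
    d = kraus[0].shape[0]
    vs = [np.kron(a, np.eye(d)) @ omega for a in kraus]
    return sum(np.outer(v, v.conj()) for v in vs)


rng = np.random.default_rng(4)
d, p = 3, np.array([0.6, 0.25, 0.15])
omega = np.diag(np.sqrt(p)).reshape(d * d).astype(complex)
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
e1 = U[:, :2] @ U[:, :2].conj().T  # rank-two projection, 0 < e1 < 1
projs = [e1, np.eye(d) - e1]  # N = 2 outcomes
C_id = np.outer(omega, omega.conj())
C_M = choi_operator(projs, omega)
N = len(projs)
for t in (0.05, 1.0, 20.0):
    lhs = parallel_sum(t * C_id, C_M)
    print(t, np.abs(lhs - t / (N * t + 1) * C_id).max())  # numerically zero
```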

Our final example concerns finite index inclusions of von Neumann factors.

Proposition 3.21

Let \({\mathcal {N}}\subset {\mathcal {M}}\) be a finite index inclusion of von Neumann factors with associated minimal conditional expectation \(E: {\mathcal {M}}\rightarrow {\mathcal {N}}\). Then \(D_{BS}(id\Vert E) = \log [{\mathcal {M}}:{\mathcal {N}}]\).

Proof

a) We let \(d^2 = [{\mathcal {M}}:{\mathcal {N}}]\) and we first show \(D_{BS}(id \Vert E) \ge \log d^2\) using the variational definition (34) for the channel divergence in the case of the BS divergence. We let e be the Jones projection for the inclusion, i.e. \({\mathcal {M}}\) is generated by \({\mathcal {N}}\) and e. Then \(E(e) = d^{-2}1\). Let \(\pi \) be the representation of \({\mathcal {M}}\odot {\mathcal {M}}^{op}\) coming from the standard bimodule \(L^2({\mathcal {M}})\) with underlying Hilbert space \({\mathscr {H}}\). Recall that for \(\xi \in {\mathscr {H}}\) we have by definition \(\varphi _{\xi ,E,\pi }(m \otimes m^{op}) = \langle \xi , E(m)J(m^{op})^*J\xi \rangle \) for the quantity appearing in (34) for the channel \(E:{\mathcal {M}}\rightarrow {\mathcal {N}}\). We use this bimodule in the variational characterization of Proposition 2.17, which involves a supremum over \(n \in {\mathbb {N}}\) and admissible step functions \((1/n,\infty )\xrightarrow {x} {\mathcal {M}}\odot {\mathcal {M}}^{op}\) (as well as \(y_t:=1-x_t\)). We obtain a lower bound by constructing a specific step function \(t \mapsto x_t\) for each \(n \in {\mathbb {N}}\) and showing that, as \(n \rightarrow \infty \), the variational expression in Proposition 2.17 tends to a quantity that is at least \(\log d^2\).

For this, we choose a standard vector \(\Omega \in {\mathscr {H}}\) for \({\mathcal {M}}\) and let \(\xi :=e\Omega /\Vert e\Omega \Vert \). We also let

$$\begin{aligned} x_{t}:= {\left\{ \begin{array}{ll} 1 - \tfrac{t}{t+d^{-2}}e \otimes 1 &{} {1/n \le t \le n,}\\ 0 &{} {t>n} \end{array}\right. } \end{aligned}$$
(68)

and we let \(y_{t} = 1-x_{t}\). Since \(e\xi =\xi \) and \(E(e) = d^{-2}1\), we get

$$\begin{aligned} \varphi _{\xi ,E,\pi }(y_{t}^* y_{t}^{}) = \frac{t^2 d^{-2}}{(t+d^{-2})^2}, \quad \varphi _{\xi ,id,\pi }(x_{t}^* x_{t}^{}) = \frac{d^{-4}}{(t+d^{-2})^2} \end{aligned}$$
(69)

in the range \(t \le n\). This gives us

$$\begin{aligned} \begin{aligned}&\int _{1/n}^\infty \{ \varphi _{\xi ,id,\pi }(x_{t}^* x_{t}^{}) + \frac{1}{t} \varphi _{\xi ,E,\pi }(y_{t}^* y_{t}^{}) \} \frac{dt}{t}\\&\quad = \int _{1/n}^{n} \bigg ( \frac{d^{-4}}{(t+d^{-2})^2} + \frac{d^{-2} t}{(t+d^{-2})^2} \bigg ) \frac{dt}{t} + \frac{1}{n}\\&\quad \le \frac{1}{n} + d^{-2} \int _{1/n}^\infty \frac{dt}{t(t+d^{-2})} \\&\quad =\log n + \log (n^{-1} + d^{-2}) + \frac{1}{n}. \end{aligned} \end{aligned}$$
(70)

Then it follows from the variational characterization of the BS divergence (Proposition 2.17) that

$$\begin{aligned} \begin{aligned}&D_{BS}(\varphi _{\xi ,id,\pi }\Vert \varphi _{\xi ,E,\pi })\\&\quad \ge \sup _n \left( \log n - \int _{1/n}^\infty \{ \varphi _{\xi ,id,\pi }(x_{t}^* x_{t}^{}) + \frac{1}{t} \varphi _{\xi ,E,\pi }(y_{t}^* y_{t}^{}) \} \frac{dt}{t} \right) \\&\quad \ge \lim _n \left( \log n - \int _{1/n}^\infty \{ \varphi _{\xi ,id,\pi }(x_{t}^* x_{t}^{}) + \frac{1}{t} \varphi _{\xi ,E,\pi }(y_{t}^* y_{t}^{}) \} \frac{dt}{t} \right) \\&\quad \ge \lim _n \bigg ( \log n -(\log n + \log (n^{-1} + d^{-2}) + \frac{1}{n}) \bigg ) = \log d^2. \end{aligned} \end{aligned}$$
(71)

By the variational characterization of the channel divergence as a supremum of \(D_{BS}(\varphi _{\xi ,id,\pi }\Vert \varphi _{\xi ,E,\pi })\) over triples \(({\mathcal {A}},\pi ,\xi )\) we therefore have \(D_{BS}(id\Vert E) \ge \log d^2\).

b) The conditional expectation satisfies the Pimsner-Popa bound \(E \ge d^{-2} id\) [38, 59]. Let \(\varepsilon >0\). Then we can choose a triple \((\pi ,{\mathcal {A}},\xi )\) (consisting of a von Neumann algebra \({\mathcal {A}}\), a binormal representation \(\pi \) of \({\mathcal {M}}\odot {\mathcal {A}}^{op}\) on \({\mathscr {H}}\), and a unit vector \(\xi \) in \({\mathscr {H}}\)), an \(n \in {\mathbb {N}}\), and an admissible step function \((1/n,\infty )\xrightarrow {x} {\mathcal {M}}\odot {\mathcal {A}}^{op}\) such that the supremum in the variational definition (34) is saturated up to tolerance \(\varepsilon \):

$$\begin{aligned} \begin{aligned}&D_{BS}(id\Vert E)-\varepsilon \\&\quad \le \log n - \int _{1/n}^\infty \{ \varphi _{\xi ,id,\pi }(x_t^{*}x_t^{})+\frac{1}{t}\varphi _{\xi ,E,\pi }(y_t^*y_t^{}) \} \frac{dt}{t} \\&\quad \le \log n - \int _{1/n}^\infty \{ \varphi _{\xi ,id,\pi }(x_t^{*}x_t^{})+\frac{1}{td^2}\varphi _{\xi ,id,\pi }(y_t^*y_t^{}) \} \frac{dt}{t} \\&\quad = \log d^2 + \log (nd^{-2}) - \int _{1/(nd^{-2})}^\infty \{ \varphi _{\xi ,id,\pi }(x_t^{*}x_t^{})+\frac{1}{t}\varphi _{\xi ,id,\pi }(y_t^*y_t^{}) \} \frac{dt}{t} \end{aligned} \end{aligned}$$
(72)

The right side is \(\le \log d^2 + D_{BS}(id\Vert id) = \log d^2\) using the variational definition again. Since \(\varepsilon > 0\) can be as small as we like we have shown \(D_{BS}(id\Vert E) \le \log d^2\). We have already shown \(D_{BS}(id\Vert E) \ge \log d^2\) in a) so the proof is complete. \(\square \)
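The arithmetic in (70) and (71) can be cross-checked with a few lines of scalar code. For the trial step function (68), the integrand on \([1/n,n]\) simplifies to \(d^{-2}/(t(t+d^{-2}))\), and the region \(t>n\) contributes \(1/n\). The sketch below (hypothetical function names; the index value is an arbitrary illustration) evaluates the resulting variational expression exactly and by a midpoint rule in \(u = \log t\), and compares it with the lower bound in (71) and with \(\log d^2\).

```python
import math


def trial_value(n, index):
    """log n minus the integral in (70) for the step function (68); c = d^{-2} = 1/index."""
    c = 1.0 / index
    # Integral of c/(t(t+c)) over [1/n, n], with antiderivative log t - log(t + c).
    middle = (math.log(n) - math.log(n + c)) - (math.log(1.0 / n) - math.log(1.0 / n + c))
    return math.log(n) - (middle + 1.0 / n)  # 1/n is the contribution of t > n


def trial_value_numeric(n, index, steps=200_000):
    """Same quantity, integrating c/(e^u + c) over u = log t in [-log n, log n] (midpoint rule)."""
    c = 1.0 / index
    a, b = -math.log(n), math.log(n)
    h = (b - a) / steps
    middle = h * sum(c / (math.exp(a + (k + 0.5) * h) + c) for k in range(steps))
    return math.log(n) - (middle + 1.0 / n)


def bound_71(n, index):
    """The lower bound log n - (log n + log(1/n + d^{-2}) + 1/n) used in (71)."""
    return -math.log(1.0 / n + 1.0 / index) - 1.0 / n


index = 4.0  # d^2 = [M:N], an arbitrary illustrative value
for n in (10, 1000, 10**6):
    print(n, trial_value(n, index), bound_71(n, index), math.log(index))
```

Both the exact trial value and the bound approach \(\log d^2\) from below as n grows, in line with part a) of the proof.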

4 Applications to QFT

4.1 Algebraic QFT

We recall the axioms of the algebraic approach to QFT; see [60] as a general reference. In the preceding sections we have described properties of the channel divergence \(D_{BS}\) in the general context of von Neumann algebras. In the context of local QFT, one has additional structure due to spacetime localization, and it turns out that this structure interacts very well with the notion of channel divergence. We restrict to the setting of Minkowski spacetime \(({{\mathbb {R}}}^n,\eta )\) for \(n \ge 2\).

A causal diamond O is the causal completion of an open, simply connected subset U of a Cauchy surface such that U has compact closure, where the causal structure is induced by the Minkowski metric. A QFT in the algebraic setting (‘AQFT’) is an assignment \(O \mapsto {\mathcal {A}}(O)\) of von Neumann factors, all represented on the same Hilbert space \({\mathscr {H}}\), to causal diamonds, subject to the following conditions:

  1. (a1)

    (Isotony) \({\mathcal {A}}(O_1) \subset {\mathcal {A}}(O_2)\) if \(O_1 \subset O_2\). We write \({\mathcal {A}}= \overline{\bigcup _O {\mathcal {A}}(O)}\) with completion in the operator norm.

  2. (a2)

    (Causality) \([{\mathcal {A}}(O_1),{\mathcal {A}}(O_2)]=\{0\}\) if \(O_1\) is space-like related to \(O_2\).

  3. (a3)

    (Relativistic covariance) For each \(g \in \widetilde{\textrm{P}}\) covering (see Footnote 10) a Poincaré transformation \((\Lambda ,a)\), there is an automorphism \(\alpha _g\) of \({\mathcal {A}}\) such that \(\alpha _g {\mathcal {A}}(O) = {\mathcal {A}}(\Lambda O+a)\) for all causal diamonds O, such that \(\alpha _g \alpha _{g'} = \alpha _{gg'}\), and such that \(\alpha _{(1,0)}=id\) is the identity.

  4. (a4)

    (Vacuum) There is a strongly continuous positive energy representation \(g \mapsto U(g)\) on \({\mathscr {H}}\) implementing \(\alpha _g(a) = U(g) a U(g)^*\) for all \(a \in {\mathcal {A}}\). There is a vector \(\Omega \) (the vacuum) which is cyclic for \({\mathcal {A}}\) and such that \(U(g)\Omega = \Omega \) for all \(g \in \widetilde{\textrm{P}}\). Positive energy means that if \(x \in {{\mathbb {R}}}^n \subset \textrm{P}\) is a translation by x, we can write

    $$\begin{aligned} U(x) = \exp (-i \eta (P,x)), \end{aligned}$$
    (73)

    and the vector generator \(P=(P^0,P^1,\dots ,P^{n-1})\) has spectral values p in the closed forward lightcone \(p \in {\bar{V}}^+ = \{ p \in {\mathbb {R}}^n \mid \eta (p,p) \ge 0, p^0 \ge 0\}\).

  5. (a5)

    (Additivity) Let \(O_i\) be a family of causal diamonds such that \(O = \cup _i O_i\). Then \((\cup _i {\mathcal {A}}(O_i))'' = {\mathcal {A}}(O)\).

For technical purposes, we also impose a “nuclearity condition.” The main purpose of that condition is to ensure a certain regularity of the theory, and several closely related versions of such a condition have been proposed. As far as we can see, many of these would serve our purposes equally well. For definiteness, we impose [61]:

  1. (a6)

    (BW-nuclearity) Let A be a ball of radius r in a Cauchy surface, and let \(O_r\) be the corresponding causal diamond. Consider the map

    $$\begin{aligned} \Theta _{\beta ,r}: {\mathcal {A}}(O_r) \rightarrow {\mathscr {H}}, \quad a \mapsto e^{-\beta H}a\Omega , \end{aligned}$$
    (74)

    where \(\beta >0\) and where \(H=P^0\) is the Hamiltonian, i.e. the time-component of P in item (a4). It is required that there exist positive constants \(s>0\) and \(c = c(r)>0\) such that for \(r>0, \beta >0\) we have \( \Vert \Theta _{\beta , r} \Vert _1 \le e^{(c/\beta )^s} \ . \) Here we use the nuclear 1-norm discussed further e.g. in [62].

We now comment on two well-known and important consequences of these axioms for our analysis; see [60] for further details and references. First, by the Reeh–Schlieder theorem, \(\Omega \) is cyclic and separating for each \({\mathcal {A}}(O)\), so the vacuum automatically provides a standard form for each local von Neumann algebra. Second, each \({\mathcal {A}}(O)\) is a hyperfinite factor of type III\(_1\) [63], which is unique up to isomorphism by [64]. As a consequence, we can apply all of our results on the channel divergence \(D_{BS}\) to the local algebras \({\mathcal {A}}(O)\).

It is important to stress that a priori, \({\mathcal {A}}(O)\) is defined only for causal diamonds associated with simply connected subsets of a Cauchy surface. If K is any open, causally complete subset of \({\mathbb {R}}^n\), we could define either

$$\begin{aligned} {\mathcal {A}}(K) = (\vee _{O \subset K} {\mathcal {A}}(O))'', \quad \text {or} \quad {\mathcal {B}}(K) = (\vee _{O_1 \subset K'} {\mathcal {A}}(O_1))'. \end{aligned}$$
(75)

Here a prime on a region O or K denotes its causal complement, while a prime on an algebra denotes its commutant. For topologically trivial causal diamonds O with compact closure it is a result that \({\mathcal {A}}(O') = {\mathcal {A}}(O)'\) (Haag duality), so by (a5), \({\mathcal {A}}(O) = {\mathcal {B}}(O)\) for topologically trivial causal diamonds. Either \({\mathcal {A}}\) or \({\mathcal {B}}\) gives a net in the above sense, with the possible exception of condition (a5) in the case of \({\mathcal {B}}\). For topologically non-trivial regions K, \({\mathcal {B}}(K)\) is in general strictly larger than \({\mathcal {A}}(K)\).

DHR-Representations: See [34, 35, 60, 65]. The Hilbert space \({\mathscr {H}}\) may be considered as the defining (vacuum) representation of the net, but it is physically relevant to also consider other representations. We shall consider representations \(\pi \) of \({\mathcal {A}}\) on Hilbert spaces \({\mathscr {H}}_\pi \) which are ultraweakly continuous when restricted to any \({\mathcal {A}}(O)\) and which satisfy:

  • (DHR-selection criterion) [34, 35] \(\pi |_{{\mathcal {A}}(O)' \cap {\mathcal {A}}}\) is unitarily equivalent to the vacuum representation for some O.

  • (BF-selection criterion) [65] The automorphisms \(\alpha _g\) in (a3) are unitarily implemented in \(\pi \), i.e. there exists a strongly continuous positive energy representation \(U_\pi (g)\) such that \(\pi (\alpha _g(a)) = U_\pi (g) \pi (a) U_\pi (g)^*\), and the generator \(P_\pi \) of translations \(U_\pi (x) = \exp (-i \eta (P_\pi ,x))\) on \({\mathscr {H}}_\pi \) has an isolated mass shell in its spectrum, i.e. \(spec(P_\pi ) \subset \{ p: \eta (p,p) = M^2, p^0>0\} \cup \{ p: \eta (p,p) \ge m^2, p^0>0\}\) for some \(m^2> M^2 >0\).
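The two spectral conditions, the closed forward lightcone in (a4) (which must contain \(p=0\), since the vacuum is translation invariant) and the isolated mass shell of the BF criterion, can be made concrete with a small membership test. This is only an illustrative sketch: it assumes the signature \((+,-,\dots ,-)\) for \(\eta \) (consistent with \(\eta (p,p) \ge 0\) for timelike p), uses a numerical tolerance `tol` of our choosing, and the function names are hypothetical.

```python
from typing import Sequence


def eta(p: Sequence[float], q: Sequence[float]) -> float:
    """Minkowski product with signature (+, -, ..., -) (an assumed convention)."""
    return p[0] * q[0] - sum(a * b for a, b in zip(p[1:], q[1:]))


def in_closed_forward_cone(p: Sequence[float], tol: float = 1e-12) -> bool:
    """Spectrum condition of (a4): eta(p, p) >= 0 and p^0 >= 0 (p = 0, the vacuum, is allowed)."""
    return eta(p, p) >= -tol and p[0] >= -tol


def in_bf_spectrum(p: Sequence[float], M: float, m: float, tol: float = 1e-9) -> bool:
    """BF criterion: p lies on the mass shell eta(p, p) = M^2 or in the continuum
    eta(p, p) >= m^2, in both cases with p^0 > 0, where m > M > 0."""
    if not m > M > 0:
        raise ValueError("the BF criterion requires m > M > 0")
    if p[0] <= 0:
        return False
    mass_sq = eta(p, p)
    return abs(mass_sq - M * M) <= tol or mass_sq >= m * m - tol
```

For example, with \(M=1\), \(m=2\), the rest-frame momentum \((1,0,0,0)\) lies on the mass shell, \((1.5,0,0,0)\) falls into the gap \(1 < \eta (p,p) < 4\) and is excluded, and \(p=0\) belongs to the closed cone of (a4) but not to the spectrum allowed by the BF criterion.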

If we let V be a unitary implementing the unitary equivalence in the first item, then \(\rho (a):= V^* \pi (a) V\) is an endomorphism of \({\mathcal {A}}\) such that

$$\begin{aligned} \rho |_{{\mathcal {A}}(O)' \cap {\mathcal {A}}} = id. \end{aligned}$$
(76)

One says that \(\rho \) is a localized endomorphism (in O) for this reason. Furthermore, \(\rho \) is transportable in the following sense. Let \(O_1:=O\), \(\rho _1:= \rho \) and let \(O_2\) be another causal diamond. Then there exists a unitary \(u_{21} \in {\mathcal {A}}(O_1) \vee {\mathcal {A}}(O_2)\) such that \(Ad_{u_{21}} \circ \rho _1 =: \rho _2\) is an endomorphism satisfying the DHR- and BF-selection criteria that is localized in \(O_2\). We will refer to the endomorphisms arising from the selection criteria above as localized, transportable endomorphisms.

Let \(\rho \) be a transportable irreducible endomorphism localized in some O. As is well known, the selection criteria imply a considerable amount of further algebraic structure associated with \(\rho \). First, we have a so-called conjugate transportable endomorphism \({\bar{\rho }}\) together with solutions \(r,{{\bar{r}}} \in {\mathcal {A}}(O)\) and \(d_\rho \ge 1\) to the intertwining relations

$$\begin{aligned} \rho {\bar{\rho }}(a) {{\bar{r}}} = {{\bar{r}}}a, \quad {\bar{\rho }} \rho (a)r = ra \quad (a \in {\mathcal {A}}) \end{aligned}$$
(77)

and conjugacy relations

$$\begin{aligned} r^* r = d_\rho 1, \quad {{\bar{r}}}^* {{\bar{r}}} = d_\rho 1, \quad r^* {\bar{\rho }}({{\bar{r}}}) = 1 = {{\bar{r}}}^* \rho (r). \end{aligned}$$
(78)

A left inverse of \(\rho \) is given by \(\Psi _\rho (a):=d^{-1}_\rho r^* {\bar{\rho }}(a) r\). The Jones projection for the extension \({\mathcal {A}}(O)\) of \(\rho ({\mathcal {A}}(O))\) is given by \(e_\rho =d^{-1}_\rho {{\bar{r}}} {{\bar{r}}}^*\), and the minimal conditional expectation \(E_\rho : {\mathcal {A}}(O) \rightarrow \rho ({\mathcal {A}}(O))\) is given by \(E_\rho = \rho \circ \Psi _\rho \). The number \(d_\rho \) is called the “statistical dimension” of \(\rho \). By the index-statistics theorem [33], \(d_\rho =[{\mathcal {A}}(O):\rho ({\mathcal {A}}(O))]^{1/2}<\infty \). Similar constructions apply to reducible endomorphisms/representations.
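A finite-dimensional analogue (not part of the text, and only a sketch) shows the mechanism behind \(e_\rho = d^{-1}_\rho {{\bar{r}}} {{\bar{r}}}^*\) and the identity \(E_\rho (e_\rho ) = \rho (\Psi _\rho (e_\rho )) = d_\rho ^{-2} 1\) used in the proof of Theorem 4.5. We replace \(\rho ({\mathcal {A}}(O)) \subset {\mathcal {A}}(O)\) by the index-\(d^2\) inclusion \(M_d \otimes 1 \subset M_d \otimes M_d\) with conditional expectation \(E = id \otimes \textrm{tr}/d\), and \({{\bar{r}}}\) by the unnormalized vector \(r = \sum _i e_i \otimes e_i\); the role of \(d_\rho \) is then played by d. The names `cond_exp` and `jones_data` are hypothetical.

```python
import numpy as np


def cond_exp(x: np.ndarray, d: int) -> np.ndarray:
    """E = id ⊗ (tr/d): M_d ⊗ M_d -> M_d ⊗ 1, the normalized partial trace over the 2nd factor."""
    x4 = x.reshape(d, d, d, d)                   # x4[i, a, j, b] = <i a| x |j b>
    reduced = np.einsum("iaja->ij", x4) / d      # sum over the diagonal of the 2nd factor
    return np.kron(reduced, np.eye(d))


def jones_data(d: int):
    """Return r = sum_i e_i ⊗ e_i and the projection e = d^{-1} r r^*."""
    r = np.eye(d).reshape(d * d)                 # components delta_{ia}
    e = np.outer(r, r.conj()) / d
    return r, e


d = 3
r, e = jones_data(d)
print(r @ r)                                             # r^* r = d, here 3.0
print(np.allclose(e @ e, e))                             # e is a projection
print(np.allclose(cond_exp(e, d), np.eye(d * d) / d**2)) # E(e) = d^{-2} 1
```

In this toy model \(\log d^2\) is the logarithm of the index, the quantity that appears in (85) with \(d_\rho \) in place of d.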

For a variant of this theory for conformal field theories in \(n=2\) spacetime dimensions, see [33, 66].

4.2 Complexity of channels in AQFT

Let T be a completely positive map of the quasi-local algebra \({\mathcal {A}}\) (see Footnote 11) such that, for some sufficiently large causal diamond O, it restricts to a channel of \({\mathcal {A}}(O)\). By [17, Theorem 2.10], we may write

$$\begin{aligned} T(a) = v^*\theta (a)v, \quad a \in {\mathcal {A}}(O) \end{aligned}$$
(79)

where v is an isometry of \({\mathcal {A}}(O)\) and \(\theta \) is an endomorphism of \({\mathcal {A}}(O)\). This motivates the following definition.

Definition 4.1

A channel \(T: {\mathcal {A}}\rightarrow {\mathcal {A}}\) is called localized and transportable if it is of the form (79) for some localized (in some causal diamond O) transportable endomorphism \(\theta \) and some isometry \(v \in {\mathcal {A}}(O)\).

Remark 4.2

  1. (1)

    Note that by definition, \(T|_{{\mathcal {A}}(O)' \cap {\mathcal {A}}} = id\), i.e. T is the identity in the causal complement of O.

  2. (2)

    It is easy to see that the set of localized and transportable channels is stable under composition, i.e. the composition is again of the form (79). It is also closed under convex combinations: Let \(T_i\) be localized, transportable channels of the form (79) with \(v_i, \theta _i\), and \(p_i\) a probability distribution on a finite set. Since \({\mathcal {A}}(O)\) is type III [21], there are isometries \(a_i\) in \({\mathcal {A}}(O)\) satisfying the Cuntz algebra relations (58). Then set \(v = \sum \sqrt{p_i} a_iv_i\) and \(\theta (m) = \sum a_i \theta _i(m) a_i^*, m \in {\mathcal {A}}(O)\). It follows that \(\theta \) is a localized, transportable endomorphism of \({\mathcal {A}}\), that v is an isometry of \({\mathcal {A}}(O)\), and that \(\sum p_iT_i\) is of the form (79).

  3. (3)

    One may generalize the definition to channels between two nets \({\mathcal {A}}, {\mathcal {B}}\).
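The algebra behind the convex-combination construction in Remark 4.2 (2) can be checked in a finite-dimensional stand-in. Cuntz isometries do not exist in finite dimensions, so in this sketch (an assumption of the illustration, not the type III construction of the text) an ancilla register with orthonormal basis \(|i\rangle \) plays the role of the \(a_i\): the relation \(a_i^*a_j = \delta _{ij}\) becomes \(\langle i|j\rangle = \delta _{ij}\), and each \(\theta _i\) is the ampliation \(m \mapsto m \otimes 1_k\). All function names are hypothetical.

```python
import numpy as np


def random_isometry(rows: int, cols: int, rng: np.random.Generator) -> np.ndarray:
    """A random isometry C^cols -> C^rows (orthonormal columns from a reduced QR)."""
    g = rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))
    q, _ = np.linalg.qr(g)
    return q


def dilated_channel(v: np.ndarray, a: np.ndarray, k: int) -> np.ndarray:
    """T(a) = v^* theta(a) v with theta(a) = a ⊗ 1_k, the analogue of (79)."""
    return v.conj().T @ np.kron(a, np.eye(k)) @ v


def convex_dilation(vs, probs) -> np.ndarray:
    """v = sum_i sqrt(p_i) v_i ⊗ |i>; the ancilla vectors |i> stand in for the isometries a_i."""
    basis = np.eye(len(vs))
    return sum(np.sqrt(p) * np.kron(v, basis[:, [i]])
               for i, (p, v) in enumerate(zip(probs, vs)))


rng = np.random.default_rng(1)
m, k, probs = 3, 2, [0.2, 0.5, 0.3]
vs = [random_isometry(m * k, m, rng) for _ in probs]
v = convex_dilation(vs, probs)
a = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
lhs = dilated_channel(v, a, k * len(probs))      # v^* theta(a) v, theta(a) = a ⊗ 1_k ⊗ 1_L
rhs = sum(p * dilated_channel(vi, a, k) for p, vi in zip(probs, vs))
print(np.allclose(v.conj().T @ v, np.eye(m)), np.allclose(lhs, rhs))
```

The cross terms \(i \ne j\) drop out exactly because the ancilla vectors are orthonormal, which is the role played by \(a_i^*a_j = \delta _{ij}\) in the text.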

Let \(x \in {\mathbb {R}}^n\), let \(O+x\) be the translate of O and let \(\alpha _x(a) = U(x) a U(x)^*\) be the translate of an element \(a \in {\mathcal {A}}(O)\) to \({\mathcal {A}}(O+x)\) as in (a3), (a4). We consider \(T_x = \alpha _x \circ T \circ \alpha _{-x}\) as a channel of \({\mathcal {A}}(O+x)\). Then

$$\begin{aligned} T_x(a) = \alpha _x(v)^* U(x)\theta (\alpha _{-x}(a))U(x)^* \alpha _x(v). \end{aligned}$$
(80)

Since \(\theta \) is by assumption an endomorphism satisfying the DHR- and BF-selection criteria, translations are implemented in the sector \(\theta \) by a strongly continuous group of unitaries \(U_\theta (x), x \in {\mathbb {R}}^n\), so we have \(\theta (\alpha _{-x}(a)) = U_\theta (x)^* \theta (a) U_\theta (x)\). Furthermore \(u(x) = U(x) U_\theta (x)^*\) is an element of \({\mathcal {A}}(O) \vee {\mathcal {A}}(O+x)\) transporting \(\theta \) to an endomorphism \(\theta _x = Ad_{u(x)} \circ \theta \) localized in \(O+x\), and we have, with \(v_x = \alpha _x(v) \in {\mathcal {A}}(O+x)\),

$$\begin{aligned} T_x(a) = v_x^* \theta _x^{} (a) v_x^{}, a \in {\mathcal {A}}(O+x), \end{aligned}$$
(81)

i.e. it has the same form as (79) but with \(\theta _x, v_x\) now localized in \(O+x\).

We now make a proposal for the complexity of a channel in algebraic quantum field theory.

Definition 4.3

The complexity of a localized and transportable channel T is defined as

$$\begin{aligned} c(T) = D_{BS}(id|_{{\mathcal {A}}(O)}\Vert T|_{{\mathcal {A}}(O)}), \end{aligned}$$
(82)

where O is any sufficiently large causal diamond such that \(T|_{{\mathcal {A}}(O)' \cap {\mathcal {A}}} = id\).

For this definition to make sense, we must show:

Lemma 4.4

The definition of c(T) does not depend on the sufficiently large causal diamond O chosen in (82).

Proof

Let \(O_1,O_2\) be causal diamonds such that \(T|_{{\mathcal {A}}(O_i)' \cap {\mathcal {A}}} = id\) and let O be a causal diamond containing \(O_1 \cup O_2\). Let \({\mathcal {A}}(O_1)^c:= {\mathcal {A}}(O_1)' \cap {\mathcal {A}}(O)\), let \({\mathcal {M}}_n^c\) be a net of finite-dimensional type I algebras exhausting \({\mathcal {A}}(O_1)^c\), and let \({\mathcal {M}}_n\) be a net of finite-dimensional type I algebras exhausting \({\mathcal {A}}(O_1)\), which exist as a consequence of requirement (a6), see [21]. Then \((\cup _n {\mathcal {M}}_n^c \vee {\mathcal {M}}_n^{})''={\mathcal {A}}(O)\), and by the martingale property for \(D_{BS}\) we have

$$\begin{aligned} \begin{aligned} D_{BS}(id|_{{\mathcal {A}}(O)} \Vert T|_{{\mathcal {A}}(O)})&= \lim _n D_{BS}(id |_{{\mathcal {M}}_n^c \vee {\mathcal {M}}_n^{}} \Vert T|_{{\mathcal {M}}_n^c \vee {\mathcal {M}}_n^{}})\\&= \lim _n D_{BS}(id |_{{\mathcal {M}}_n^{}} \otimes id |_{{\mathcal {M}}^c_n} \Vert T|_{{\mathcal {M}}_n^{}} \otimes id |_{{\mathcal {M}}^c_n})\\&= \lim _n D_{BS}(id |_{{\mathcal {M}}_n^{}} \Vert T|_{{\mathcal {M}}_n^{}})\\&= D_{BS}(id |_{{\mathcal {A}}(O_1)} \Vert T|_{{\mathcal {A}}(O_1)}). \end{aligned} \end{aligned}$$
(83)

In the second equality, we used that \({\mathcal {M}}_n^{} \vee {\mathcal {M}}_n^c \cong {\mathcal {M}}_n^{} \otimes {\mathcal {M}}_n^c\) as von Neumann algebras because \({\mathcal {M}}_n\) and \({\mathcal {M}}_n^c\) are commuting finite-dimensional type I algebras, and that T acts trivially on \({\mathcal {M}}_n^c\) by locality. In the third step we used external additivity of \(D_{BS}\) together with \(D_{BS}(id\Vert id)=0\). In the last step we used the martingale property again. The same argument applies with \(O_1\) replaced by \(O_2\). Thus the definition of c(T) is independent of whether we take \(O_1\) or \(O_2\) in (82). \(\square \)

Theorem 4.5

The complexity c has the following properties (\(T,T_i\) localized, transportable channels):

  1. 1.

    (Identity) \(c(id)=0\).

  2. 2.

    (Internal subadditivity) \(c(T_1 \circ T_2) \le c(T_1) + c(T_2)\).

  3. 3.

    (Convexity) Let \(\{p_i\}\) be a probability distribution on a finite set. Then \(c(\sum p_i T_i) \le \sum p_i c(T_i)\).

  4. 4.

    (Locality) Let \(T_1\) and \(T_2\) be channels localized in spacelike related causal diamonds with strictly positive distance. Then \(c(T_1 \circ T_2) = c(T_1) + c(T_2)\).

  5. 5.

    (N-ary local measurement) Let \(M(a) = \sum _i e_i a e_i\) be the channel describing an N-ary local measurement associated with the N mutually orthogonal non-trivial projections \(e_i \in {\mathcal {A}}(O)\), \(\sum e_i=1\). Then M is localized and transportable and \(c(M) = \log N\).

  6. 6.

    (Net extensions) Let \({\mathcal {B}}\) be a net extending \({\mathcal {A}}\) [67] with corresponding conditional expectation E. Then

    $$\begin{aligned} c(E) = \log [{\mathcal {B}}(O):{\mathcal {A}}(O)]. \end{aligned}$$
    (84)
  7. 7.

    (Localized transportable endomorphisms I) Let \(\rho \) be a localized transportable endomorphism with conditional expectation \(E_\rho \) and statistical dimension \(d_\rho \). Then \(E_\rho \) is a localized transportable channel and

    $$\begin{aligned} c(E_\rho ) = \log d_\rho ^2. \end{aligned}$$
    (85)
  8. 8.

    (Translations) If \(T_x\) is the translate of T by \(x \in {\mathbb {R}}^n\) as in (81), then \(c(T_x) = c(T)\).

  9. 9.

    (Localized transportable endomorphisms II) If \(\rho \ne id\) is a localized transportable endomorphism of \({\mathcal {A}}(O)\), then \(c(\rho ) =\infty \).

  10. 10.

    (Local unitaries) Let \(u \in {\mathcal {A}}(O)\) be a unitary and \(U(a) = u^*au\) be the corresponding channel on \({\mathcal {A}}(O)\). Then if \(U \ne id\), we have \(c(U)=\infty \).

Proof

Items (1)-(3), (6) and (10) follow from the corresponding results of Sect. 3.

(4) Let \(T_i\) be localized in \(O_i\), where \(O_1\) and \(O_2\) are spacelike related with strictly positive distance. By locality, \(T_1 \circ T_2(a_1a_2) = T_1(a_1)T_2(a_2), a_i \in {\mathcal {A}}(O_i)\). The BW-nuclearity assumption (a6) implies the split property for the algebras \({\mathcal {A}}(O_1)\) and \({\mathcal {A}}(O_2)\); see [21] or [60, Chapter V.5.2] as a general reference. So there is a unitary \(W:{\mathscr {H}}\rightarrow {\mathscr {H}}\otimes {\mathscr {H}}\) such that \(W^*(a_1 \otimes a_2)W = a_1 a_2\), and consequently \(T_1 \circ T_2|_{{\mathcal {A}}(O_1)\vee {\mathcal {A}}(O_2)} = Ad_W \circ (T_1 \otimes T_2) \circ Ad_{W^*}\), where \(T_1 \otimes T_2\) is the tensor product channel on \({\mathcal {A}}(O_1) \otimes {\mathcal {A}}(O_2)\) and \(Ad_W X = W^* X W\). In particular, the map \(T_1 \circ T_2\) is normal on \({\mathcal {A}}(O_1) \vee {\mathcal {A}}(O_2)\). Then, by applying internal subadditivity twice,

$$\begin{aligned} D_{BS}(id\Vert T_1 \circ T_2) \ge D_{BS}(Ad_W\Vert Ad_W \circ (T_1 \otimes T_2)) \ge D_{BS}(id\Vert T_1 \otimes T_2) \end{aligned}$$
(86)

Since W is unitary, we have the reverse inequality by the same argument backwards, so \(c(T_1 \circ T_2) = c(T_1 \otimes T_2) = c(T_1) + c(T_2)\), by external additivity.
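As a consistency check between items (4) and (5), consider two measurement channels localized in spacelike separated diamonds. Under the split isomorphism, the product \(T_1 \otimes T_2\) of an \(N_1\)-ary and an \(N_2\)-ary measurement is itself an \(N_1N_2\)-ary measurement with projections \(e_i \otimes f_j\), so item (5) predicts \(\log (N_1N_2)\), in agreement with \(\log N_1 + \log N_2\) from item (4). The sketch below verifies the channel identity with matrix algebras standing in for the local algebras (an illustrative assumption; `block_projections` and `measurement` are hypothetical helper names).

```python
import numpy as np


def block_projections(sizes):
    """Mutually orthogonal coordinate projections with the given ranks, summing to 1."""
    n, start, projs = sum(sizes), 0, []
    for s in sizes:
        e = np.zeros((n, n))
        e[start:start + s, start:start + s] = np.eye(s)
        projs.append(e)
        start += s
    return projs


def measurement(projs, a):
    """M(a) = sum_i e_i a e_i."""
    return sum(e @ a @ e for e in projs)


rng = np.random.default_rng(2)
E1, E2 = block_projections([1, 2]), block_projections([2, 1, 1])   # N1 = 2, N2 = 3
a1, a2 = rng.normal(size=(3, 3)), rng.normal(size=(4, 4))
product = [np.kron(e, f) for e in E1 for f in E2]                  # N1 * N2 projections
lhs = np.kron(measurement(E1, a1), measurement(E2, a2))            # (T1 ⊗ T2)(a1 ⊗ a2)
rhs = measurement(product, np.kron(a1, a2))                         # N1*N2-ary measurement
print(len(product), np.allclose(lhs, rhs))
```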

(5) This follows from Sect. 3; we only need to show that M is localized and transportable. Since \({\mathcal {A}}(O)\) is properly infinite, there are isometries \(a_i, i=1, \dots , N\) in \({\mathcal {A}}(O)\) satisfying the Cuntz algebra relations (58). Then we set \(v^* = \sum _i e_i a_i^*\) and \(\theta (m):=\sum _j a_j m a_j^*, m \in {\mathcal {A}}(O)\). It follows that \(v^* v = 1\), that \(\theta \) is a localized, transportable endomorphism, and that \(M(m) = v^*\theta (m)v\), as desired.
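The dilation used in the proof of (5) can be mimicked in finite dimensions, again replacing the Cuntz isometries \(a_i\) by orthonormal ancilla vectors \(|i\rangle \) (an assumption of this sketch). Then \(v = \sum _i e_i \otimes |i\rangle \), so that \(v^* = \sum _i e_i \otimes \langle i|\) mirrors \(v^* = \sum _i e_i a_i^*\), and \(\theta (m) = m \otimes 1_N\) mirrors \(\theta (m) = \sum _j a_j m a_j^*\). The check confirms \(v^*v = 1\) and \(v^*\theta (m)v = M(m)\); `measurement_dilation` is a hypothetical name.

```python
import numpy as np


def measurement_dilation(projs):
    """v = sum_i e_i ⊗ |i>, an isometry C^n -> C^n ⊗ C^N for projections e_i summing to 1."""
    basis = np.eye(len(projs))
    return sum(np.kron(e, basis[:, [i]]) for i, e in enumerate(projs))


rng = np.random.default_rng(3)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))          # random orthonormal basis of R^4
blocks = [q[:, :1], q[:, 1:3], q[:, 3:]]             # N = 3 groups of basis vectors
projs = [b @ b.T for b in blocks]                     # mutually orthogonal, summing to 1
v = measurement_dilation(projs)
m = rng.normal(size=(4, 4))
theta_m = np.kron(m, np.eye(len(projs)))              # theta(m) = m ⊗ 1_N
print(np.allclose(v.T @ v, np.eye(4)),
      np.allclose(v.T @ theta_m @ v, sum(e @ m @ e for e in projs)))
```

All matrices are real here, so the adjoint is the transpose.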

(7) The formulas \(E_\rho (a) = d_\rho ^{-1} \rho (r)^* \rho {\bar{\rho }}(a) \rho (r)\) and \(r^*r=d_\rho 1\) show that \(E_\rho \) is localized and transportable (with \(\theta = \rho {\bar{\rho }}\), \(v=d^{-1/2}_\rho \rho (r)\)). The formula follows from Proposition 3.21 because \(d_\rho = [{\mathcal {A}}(O):\rho {\mathcal {A}}(O)]^{1/2}\) by the index-statistics theorem [33].

(8) Applying internal subadditivity twice, we see \(D_{BS}(id\Vert \alpha _x \circ T \circ \alpha _{-x}) \ge D_{BS}(\alpha _{-x}\Vert T \circ \alpha _{-x}) \ge D_{BS}(id\Vert T)\). Running this argument backwards gives the reverse inequality, so \(c(T_x) = c(T)\).

(9) We view \(E_\rho , \Psi _\rho , \rho \) as maps on some \({\mathcal {A}}(O)\) such that \(\rho \) is localized within O. By using subadditivity and the formula \(\Psi _\rho \circ \rho = id\) twice:

$$\begin{aligned} D_{BS}(id\Vert \rho ) \ge D_{BS}(\Psi _\rho \Vert \Psi _\rho \circ \rho ) = D_{BS}(\Psi _\rho \Vert id) \ge D_{BS}(\Psi _\rho \circ \rho \Vert \rho ) = D_{BS}(id\Vert \rho ).\nonumber \\ \end{aligned}$$
(87)

So we must have equality in each step.

a) We first assume \(d_\rho >1\). To get a lower bound (actually \(+\infty \)) on \(D_{BS}(\Psi _\rho \Vert id)\), we proceed as in the proof of part a) of Proposition 3.21, noting that the Jones projection is \(e_\rho = d_\rho ^{-1} {{\bar{r}}} {{\bar{r}}}^*\), and \(\Psi _\rho (a) = d^{-1}_\rho r^* {\bar{\rho }}(a) r\), with \(r,{{\bar{r}}}, {\bar{\rho }}\) as in the conjugacy relations (78). Then, as is well-known, \(\Psi _\rho (e_\rho ) = d_{\rho }^{-2} 1\), again by the conjugacy relations. As our trial function, we now choose

$$\begin{aligned} x_{t}:= {\left\{ \begin{array}{ll} \tfrac{t^{-1}}{t^{-1}+d^{-2}_\rho }e_\rho \otimes 1 &{} {1/n \le t \le n,}\\ 0 &{} {t>n} \end{array}\right. } \end{aligned}$$
(88)

and we let \(y_{t} = 1-x_{t}\). The rest proceeds as in the proof of part a) of Proposition 3.21: with this trial function we find

$$\begin{aligned} \varphi _{\xi ,\Psi ,\pi }(x_t^{} x_t^*) + \frac{1}{t} \varphi _{\xi ,id,\pi }(y_t^{} y_t^*) = {\left\{ \begin{array}{ll} \frac{1}{d^2_\rho + t}, &{} \text {for }1/n \le t \le n,\\ \frac{1}{t}, &{} \text {for }t>n, \end{array}\right. } \end{aligned}$$
(89)

and thereby:

$$\begin{aligned} \begin{aligned} D_{BS}(id\Vert \rho )&= D_{BS}(\Psi _\rho \Vert id) \\&= \sup _n \sup _{x,\xi } \left( \log n - \int _{1/n}^\infty \{ \varphi _{\xi ,\Psi ,\pi }(x_t^{} x_t^*) + \frac{1}{t} \varphi _{\xi ,id,\pi }(y_t^{} y_t^*) \} \frac{dt}{t} \right) \\&\ge \sup _n \left( \log n - \int _{1/n}^n \frac{1}{d^2_\rho + t} \frac{dt}{t} - \int _n^\infty \frac{dt}{t^2} \right) \\&= \sup _n \frac{1}{d_\rho ^2} \left( (d_\rho ^2-1) \log n - \log d^2_\rho + \log (1 + \frac{d_\rho ^2}{n}) - \log (1+\frac{1}{d_\rho ^2 n}) - \frac{d_\rho ^2}{n} \right) \\&= \infty \end{aligned} \end{aligned}$$
(90)

since \(d_\rho >1\).
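The elementary computations in (89) and (90) can be checked numerically. In this sketch, `integrand_89` evaluates the scalar identity behind the first case of (89): with \(\lambda _t = d_\rho ^2/(d_\rho ^2+t)\), the coefficient of \(e_\rho \otimes 1\) in (88), one has \(\lambda _t^2 d_\rho ^{-2} + t^{-1}(1-\lambda _t)^2 = (d_\rho ^2+t)^{-1}\). This is what (89) reduces to if the two functionals take the values \(d_\rho ^{-2}\) and 1 on \(e_\rho \otimes 1\), which is our reading of the construction in Proposition 3.21 (not restated here). The other two functions compare the closed form in the fourth line of (90) with a direct quadrature (Simpson's rule in \(u = \log t\)) and exhibit the \((1-d_\rho ^{-2})\log n\) growth. The parameter `d2` stands for \(d_\rho ^2\); all function names are hypothetical.

```python
import math


def integrand_89(t: float, d2: float) -> float:
    """lam^2/d2 + (1 - lam)^2/t with lam = d2/(d2 + t); should equal 1/(d2 + t)."""
    lam = d2 / (d2 + t)
    return lam**2 / d2 + (1.0 - lam) ** 2 / t


def bound_by_quadrature(d2: float, n: float, steps: int = 20_000) -> float:
    """log n - int_{1/n}^n dt/(t (d2 + t)) - int_n^inf dt/t^2, middle integral by Simpson's rule."""
    def f(u: float) -> float:
        return 1.0 / (d2 + math.exp(u))          # integrand after t = e^u, dt/t = du

    a, b = -math.log(n), math.log(n)
    h = (b - a) / steps                          # `steps` must be even for Simpson's rule
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, steps))
    return math.log(n) - s * h / 3.0 - 1.0 / n   # int_n^inf dt/t^2 = 1/n


def bound_closed_form(d2: float, n: float) -> float:
    """Fourth line of (90)."""
    return ((d2 - 1) * math.log(n) - math.log(d2) + math.log1p(d2 / n)
            - math.log1p(1.0 / (d2 * n)) - d2 / n) / d2


for n in (1e2, 1e4, 1e8):
    value = bound_closed_form(4.0, n)
    print(n, value, value / math.log(n))         # ratio approaches 1 - 1/d2 = 0.75
```

For \(d_\rho = 1\) the closed form reduces to \(-1/n\), whose supremum over n is 0, consistent with \(D_{BS}(id\Vert id) = 0\); for \(d_\rho > 1\) it diverges as \(n \rightarrow \infty \).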

b) If \(d_\rho =1\), then \(\rho ({\mathcal {A}}(O))={\mathcal {A}}(O)\) and \(\rho \) is an automorphism. Viewed as an automorphism of \({\mathcal {A}}\), we have \(\rho (b)=b\) for any \(b \in {\mathcal {A}}(K)\) so long as the causal diamond K is contained in \(O'\). Define the state \(\psi = \omega \circ \rho ^{-1}\) on \({\mathcal {A}}(O)\), where \(\omega \) is the vacuum state, and let \(\Psi \) be the representer of \(\psi \) in the natural cone of \(\Omega \). Then \(Ua\Omega := \rho (a)\Psi \) defines a unitary U and if \(b \in {\mathcal {A}}(K)\) then clearly \(bUa\Omega = b\rho (a)\Psi = \rho (ba) \Psi = Uba\Omega \). It follows that \(U \in (\cup _{K \subset O'} {\mathcal {A}}(K))' = {\mathcal {A}}(O')' = {\mathcal {A}}(O)\) by Haag duality. Thus \(\rho \) is inner when restricted to \({\mathcal {A}}(O)\) and hence \(c(\rho )=\infty \) unless \(\rho = id\) by item (10). \(\square \)

Remark 4.6

The specific local structure of QFT enters Theorem 4.5 only indirectly: it entails that the local algebras \({\mathcal {A}}(O)\) are hyperfinite, satisfy the split property, are properly infinite (in fact of type III\(_1\)), and have a cyclic and separating vector, namely the vacuum, by the Reeh–Schlieder theorem. These properties are used in various combinations in the proofs. Nevertheless, the specific form of the axioms given here is probably not essential; in particular, it seems unlikely that the specific properties of Minkowski spacetime are crucial. What is missing in Theorem 4.5 is a property linking the complexity of a channel to notions of energy transfer/cost.

Example 4.7

The simplest setting for an inclusion of nets as in (6) in Theorem 4.5 is when \({\mathcal {A}}= {\mathcal {B}}^G\) is the fixed point net under some finite “internal gauge” group G, say \({\mathbb {Z}}_N\) for definiteness. \({\mathbb {Z}}_N\) acts by unitaries \(U(g), g \in {\mathbb {Z}}_N\), on a common Hilbert space \({\mathscr {H}}\) for both nets, and each U(g) commutes with the unitaries implementing the spacetime symmetries. The conditional expectation is just the group average \(E(b) = N^{-1} \sum _g U(g) b U(g)^*\). Let \(\chi _k\) be a character of \({\mathbb {Z}}_N\). Then \(P_k = N^{-1} \sum _g \chi _k(g) U(g)\) is a projection, and \({\mathscr {H}}_k = P_k {\mathscr {H}}\) is an invariant subspace for each \({\mathcal {A}}(O)\), which carries an irreducible representation \(\pi _k\) of \({\mathcal {A}}\).
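The group-average formulas can be illustrated in a finite-dimensional toy model, which is only a stand-in for the QFT situation: \({\mathbb {Z}}_N\) acts on \({\mathbb {C}}^N\) by the regular representation (cyclic shifts), the matrix algebra \(M_N\) plays the role of \({\mathcal {B}}(O)\), and we take \(\chi _k(g) = e^{2\pi i k g/N}\). The sketch checks that E is idempotent with range in the fixed points, and that the projections \(P_k = N^{-1}\sum _g \chi _k(g) U(g)\) (the factor \(N^{-1}\) is needed for \(P_k^2 = P_k\)) are mutually orthogonal and sum to 1. Function names are hypothetical.

```python
import numpy as np


def regular_rep(N: int):
    """U(g) = S^g, with S the cyclic shift S e_j = e_{j+1 mod N}."""
    S = np.roll(np.eye(N), 1, axis=0)
    return [np.linalg.matrix_power(S, g) for g in range(N)]


def group_average(U, b):
    """E(b) = N^{-1} sum_g U(g) b U(g)^*."""
    return sum(u @ b @ u.conj().T for u in U) / len(U)


def isotypic_projection(U, k: int):
    """P_k = N^{-1} sum_g chi_k(g) U(g), with chi_k(g) = exp(2 pi i k g / N)."""
    N = len(U)
    return sum(np.exp(2j * np.pi * k * g / N) * u for g, u in enumerate(U)) / N


N = 5
U = regular_rep(N)
rng = np.random.default_rng(4)
b = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
Eb = group_average(U, b)
P = [isotypic_projection(U, k) for k in range(N)]
print(np.allclose(group_average(U, Eb), Eb), np.allclose(sum(P), np.eye(N)))
```

In the regular representation each character occurs exactly once, so every \(P_k\) here has rank one.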

Example 4.8

A more complicated class of examples for (6) in Theorem 4.5 arises in two-dimensional CFTs. Start with a conformal net \({\mathcal {V}}_c\) on the circle \(S^1\), e.g. the Virasoro net for some central charge \(c<1\). The two-dimensional conformal net \({\mathcal {A}}\) is obtained as \({\mathcal {V}}_c \otimes {\mathcal {V}}_c^{op}\), identifying causal diamonds in \({\mathbb {R}}^2\) with Cartesian products of intervals. One can then obtain an extension of \({\mathcal {A}}\) from the set of highest weight representations of \({\mathcal {V}}_c\) (which give transportable irreducible endomorphisms \(\mu , \nu , \dots \) of the Virasoro net on the real line) by starting from the representation \(\sum _{\mu ,\nu } Z_{\mu ,\nu } \mu \otimes \nu ^{op}\) of \({\mathcal {V}}_c \otimes {\mathcal {V}}_c^{op}\), with \(Z_{\mu ,\nu }\) the multiplicities in the torus partition function. If \({\mathcal {B}}\) is the corresponding net extending \({\mathcal {A}}\), the index is given by \([{\mathcal {B}}: {\mathcal {A}}] =\sum _\mu d_\mu ^2\). For details, see [68].
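For orientation, the index \(\sum _\mu d_\mu ^2\) can be evaluated for the unitary Virasoro minimal models, \(c = 1 - 6/(m(m+1))\), \(m \ge 3\). This sketch relies on inputs not stated in the text: the standard quantum-dimension formula \(d_{(r,s)} = \frac{\sin (\pi r/m)\sin (\pi s/(m+1))}{\sin (\pi /m)\sin (\pi /(m+1))}\) for Kac labels \((r,s)\), the identification \((r,s) \sim (m-r, m+1-s)\), and the assumption that these numbers coincide with the statistical dimensions \(d_\mu \) of the Virasoro net. As a cross-check, the sum is compared with \(1/S_{00}^2\), where \(S_{00} = \sqrt{8/(m(m+1))}\sin (\pi /m)\sin (\pi /(m+1))\) is the vacuum entry of the modular S-matrix. For the Ising model (\(m = 3\), \(c = 1/2\)) one finds \(d_\mu \in \{1, \sqrt{2}, 1\}\), so the index is 4 and (84) would give \(c(E) = \log 4\). Function names are hypothetical.

```python
import math


def quantum_dimension(r: int, s: int, m: int) -> float:
    """d_(r,s) for the unitary minimal model with c = 1 - 6/(m(m+1)); 1 <= r <= m-1, 1 <= s <= m."""
    return (math.sin(math.pi * r / m) * math.sin(math.pi * s / (m + 1))
            / (math.sin(math.pi / m) * math.sin(math.pi / (m + 1))))


def global_dimension(m: int) -> float:
    """Sum of d_mu^2 over distinct sectors; the Kac table lists each sector twice."""
    kac_sum = sum(quantum_dimension(r, s, m) ** 2
                  for r in range(1, m) for s in range(1, m + 1))
    return kac_sum / 2


def global_dimension_from_s_matrix(m: int) -> float:
    """1 / S_00^2 with S_00 = sqrt(8/(m(m+1))) sin(pi/m) sin(pi/(m+1))."""
    s00 = math.sqrt(8 / (m * (m + 1))) * math.sin(math.pi / m) * math.sin(math.pi / (m + 1))
    return 1 / s00**2


for m in range(3, 7):
    central_charge = 1 - 6 / (m * (m + 1))
    index = global_dimension(m)
    print(f"m={m}, c={central_charge:.4f}, index={index:.4f}, c(E)={math.log(index):.4f}")
```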

Example 4.9

A situation similar to the previous example arises in local gauge theories of Yang-Mills type in \(n=4\) dimensions based on a compact local gauge group G: take K to be the causal completion of a solid torus in a Cauchy surface. Then we have \({\mathcal {A}}(K), {\mathcal {B}}(K)\) as in (75). As argued in [69], we should have \([{\mathcal {B}}(K):{\mathcal {A}}(K)] = |Z(G)|\), the order of the center Z(G) of the gauge group (e.g. \(Z(G) = {\mathbb {Z}}_N\) in the case of \(G=SU(N)\)). Thus we relate a property of the gauge group to the complexity of the conditional expectation \(E:{\mathcal {B}}(K) \rightarrow {\mathcal {A}}(K)\). An intuitive explanation of what E does in terms of ’t Hooft and Wilson loops is given in [69].

Example 4.10

An example of localized endomorphisms \(\rho \) as in (7) in Theorem 4.5 in a free field theory is the following [70, Section 4.7]. Consider an N-component free complex Klein-Gordon quantum field \(\phi _I(x), I=1, \dots , N\), in \(n=4\) dimensions. We get a net \({\mathcal {A}}\) of all observables that are invariant under the obvious action of the SU(N)-symmetry. Consider a tensor \(T^{I_1 \dots I_k}, I_j=1, \dots , N\), whose symmetry properties under index permutations are characterized by some Young tableau \(\mathbf {\lambda }=(\lambda _1,\dots ,\lambda _s)\), where \(\lambda _1 \ge \dots \ge \lambda _s\) and \(\lambda _i\) is the number of boxes in the i-th row. Next, take test functions \(f_1, \dots , f_k\) with support in a causal diamond O. Define

$$\begin{aligned} \Psi = C \sum _{I_1, \dots , I_k=1}^N T^{I_1 \dots I_k} \phi _{I_1}(f_1) \dots \phi _{I_k}(f_k)\Omega \, \end{aligned}$$
(91)

where \(\phi _I(f)=\int \phi _I(x) f(x) d^4 x\) are the smeared Klein-Gordon quantum fields, \(\Omega \) is the vacuum vector, and C is a normalization constant such that \(\Vert \Psi \Vert =1\). Let \(\dim (\mathbf {\lambda })\) be the dimension of the space of tensors with Young-tableau symmetry \(\mathbf {\lambda }\). By DHR theory [34, 35], there exists a localized (within O), transportable endomorphism \(\rho \) of the net such that

$$\begin{aligned} \langle \Omega , \rho (a) \Omega \rangle = \langle \Psi , a \Psi \rangle \quad \text {for }a \in {\mathcal {A}}\end{aligned}$$
(92)

and the statistical dimension \(d_\rho \) of this \(\rho \) equals the Young-tableau dimension \(\dim (\mathbf {\lambda })\). It is given by a standard formula in terms of the shape of the Young tableau, see e.g. [71], so we obtain from (7) in Theorem 4.5 in this example,

$$\begin{aligned} c(E_\rho )= \log \dim (\mathbf {\lambda })^2 = 2 \log \prod _{1 \le i<j \le N} \frac{\lambda _i-\lambda _j-i+j}{j-i}, \qquad \lambda _i:=0 \text { for } i>s. \end{aligned}$$
(93)

In the following example diagram \(\mathbf {\lambda }:\)

[Figure b: example Young diagram \(\mathbf {\lambda }\)]

with \(k=13\) and \(N=10\), the right side is \(2 \log 135\).
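The dimension \(\dim (\mathbf {\lambda })\) entering \(c(E_\rho ) = 2\log \dim (\mathbf {\lambda })\) is easy to evaluate exactly. This sketch assumes, as stated in the text, that \(d_\rho \) equals the dimension of the irreducible SU(N) representation with Young diagram \(\mathbf {\lambda }\). It uses the Weyl dimension formula \(\prod _{i<j}(\lambda _i-\lambda _j+j-i)/(j-i)\) (with \(\lambda _i = 0\) for \(i > s\)) and cross-checks it against the hook-content formula \(\prod _{\square } (N+c(\square ))/h(\square )\), where \(c(\square )\) is the content and \(h(\square )\) the hook length of a box. Shapes are lists of row lengths; the function names are hypothetical.

```python
import math
from fractions import Fraction


def weyl_dimension(shape, N: int) -> int:
    """Dimension of the SU(N) irrep with row lengths `shape` (non-increasing), by Weyl's formula."""
    if len(shape) > N:
        return 0                                      # more than N rows: antisymmetrization vanishes
    lam = list(shape) + [0] * (N - len(shape))
    dim = Fraction(1)
    for i in range(N):
        for j in range(i + 1, N):
            dim *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return int(dim)


def hook_content_dimension(shape, N: int) -> int:
    """Same dimension via the hook-content formula: product over boxes of (N + content)/hook."""
    if not shape:
        return 1
    col_len = [sum(1 for row in shape if row > j) for j in range(shape[0])]
    num, den = 1, 1
    for i, row in enumerate(shape):
        for j in range(row):
            num *= N + j - i                          # content of box (i, j) is j - i
            den *= (row - j) + (col_len[j] - i) - 1   # hook length = arm + leg + 1
    return num // den


def sector_complexity(shape, N: int) -> float:
    """c(E_rho) = 2 log dim(lambda), assuming d_rho = dim(lambda)."""
    return 2 * math.log(weyl_dimension(shape, N))


print(weyl_dimension([2, 1], 3), hook_content_dimension([2, 1], 3))   # adjoint of SU(3): both 8
print(sector_complexity([1], 10))                                      # 2 log 10
```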

5 Conclusions

In this work we have proposed a notion of complexity of a channel based on a specific information theoretic notion of distance to the identity channel. It would clearly be interesting to understand better the uniqueness of our definition within the axiom scheme that we proposed. It would also be interesting to understand better the relation of our proposal, if any, to holographic approaches.