1 Introduction

In 1993, Elitzur and Vaidman proposed their famous bomb-tester experiment [1] to demonstrate that arguably the most intriguing property of quantum theory, superposition, can be exploited to detect an ultra-sensitive bomb in a black box, in such a way that there is a non-vanishing probability that the bomb will not explode. Only two years later, Kwiat et al. [2] showed how to employ another fundamental phenomenon, the quantum Zeno effect [3], to boost the probability that the bomb will not explode as close to 1 as one pleases. These powerful ideas found applications in “interaction-free” imaging [4, 5], counterfactual quantum computation [6, 7], counterfactual communication [8] and cryptography [9], and even complexity theory [10]. Despite this great success, it became apparent that the aforementioned techniques, which we will generically call “interaction-free” measurements, are subject to some fundamental limitations. Notably, it is impossible to learn the outcome of a decision problem solved by a quantum computer [7] without “running” the computer in at least one of the two cases, and two optically semi-transparent objects cannot be discriminated in such a way that no photon gets absorbed [11, 12].

Despite the results mentioned above, there seems to be no framework and analysis sufficiently general to pinpoint which objects can or cannot be discriminated perfectly by “interaction-free” measurements. Encouraged by recent results that generalize the quantum Zeno effect [13,14,15,16], we aim to remedy these shortcomings. To this end, we interpret the Elitzur–Vaidman bomb-tester experiment as a quantum channel discrimination problem and generalize the notion of “interaction-free” measurement to quantum channels via two slightly different, but in the end largely equivalent models. The theory of quantum supermaps [17] then provides the right framework to consider all possible (causally ordered) discrimination strategies, allowing us to decide when it is possible or impossible to discriminate two channels in an “interaction-free” manner.

Organization of the Paper This article is structured as follows: In the remainder of this section, we review the bomb-tester experiment in its versions by Elitzur and Vaidman and by Kwiat et al. We also try to convey the idea of how the general model should look. Armed with this rough understanding, we will be able to state and discuss the major results of this work in Sect. 2. In Sect. 3, we give a detailed derivation of our model. Our main result, a characterization of what is possible and impossible to do with “interaction-free” measurements, is the combination of two pillars: a no-go theorem, in the form of an inequality, that tells us when it is impossible to discriminate two channels in an “interaction-free” manner; and a protocol that discriminates two channels in those cases that are not touched by the no-go theorem. A quantitative treatment of this protocol is given in Sect. 4, while the main content of Sect. 5 is the no-go theorem. Also in Sect. 5, we prove fundamental limits for the achievable decay rate of the “interaction” probability.

Fig. 1 Elitzur–Vaidman bomb-tester experiment

The Bomb-Tester Experiment In the following, we briefly review the bomb-tester experiment in its original version by Elitzur and Vaidman and its iterative version by Kwiat et al. Suppose you have a box and you have been told that inside of this box there is an ultra-sensitive bomb. By ultra-sensitive, we mean that the bomb will explode even if only one photon hits it. As you do not trust the deliverer, you want to check whether there is a bomb inside the box. For some reason, the only way to obtain information about the content of the box is by shining light through it. Doing so, however, might trigger the bomb, which is what we want to avoid. If photons were classical particles, our task would seem to be an impossible one.Footnote 1 To circumvent this problem, Elitzur and Vaidman proposed to put the box into the upper arm of a Mach–Zehnder interferometer, as depicted in Fig. 1. If we work only with a single photon, then this proposal can be stated abstractly as follows: The Hilbert space of the problem is \({\mathcal {H}} = {\mathcal {H}}_U \otimes {\mathcal {H}}_L\), where \({\mathcal {H}}_U = {\mathcal {H}}_L = \mathrm {span}\{v, p\}\) are the Hilbert spaces associated with the upper and lower arm, and the orthogonal unit vectors v and p denote the vacuum and one-photon states, respectively. The 50/50 beamsplitter (BS) can be modeled as a unitary transformation U, defined by

$$\begin{aligned} \begin{aligned} U v\otimes v&= v\otimes v, \\ U p\otimes v&= \cos (\theta )\, p\otimes v + \sin (\theta ) \,v\otimes p,\\ U v\otimes p&= -\sin (\theta ) \,p\otimes v + \cos (\theta ) \,v\otimes p, \end{aligned} \end{aligned}$$
(1.1)

where \(\theta = 45^\circ \). Suppose we start with a photon in the lower input, then the initial state is \(s_0 := {|}v\otimes p{\rangle }{\langle }v\otimes p{|}\). There are two cases to analyze. On the one hand, if there is no bomb in the box, then the two beamsplitters rotate the state by \(90^\circ \). Hence, the photon ends up in the upper output. On the other hand, if there is a bomb in the box, then the bomb acts as a measurement device in the upper path. There are three possible outcomes of the experiment. The first possibility is that the photon takes the upper path and thus causes the bomb to explode. This happens with a probability of 50%. If the bomb does not explode, then, by the measurement postulate, the state of the system is still \(s_0\). Since the second beamsplitter has a 50/50 splitting ratio, the probability that we measure the photon in the upper output equals the probability that we measure the photon in the lower output, i.e., the probability for each of them is 25%. The important point here is that in 25% of the cases the photon ends up in the lower path. In that case, we can conclude that there is a bomb in the box, but the bomb has not been triggered. However, we only get this result in 25% of the cases.
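The probabilities quoted above can be reproduced with a few lines of linear algebra. The following is a minimal sketch, not part of the original analysis. It works only in the one-photon subspace spanned by \(p\otimes v\) (upper path) and \(v\otimes p\) (lower path), assumes that the beamsplitter acts there as a rotation by \(\theta \) (so that a photon in the lower path stays there with amplitude \(\cos (\theta )\)), and models the bomb as a projective measurement of the upper path. The names `beamsplitter` and `elitzur_vaidman` are illustrative.

```python
import numpy as np

# Amplitudes in the one-photon subspace, ordered as (upper path, lower path).
UPPER = np.array([1.0, 0.0])
LOWER = np.array([0.0, 1.0])


def beamsplitter(theta):
    """Rotation by theta on the (upper, lower) amplitudes (assumed convention)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def elitzur_vaidman(theta=np.pi / 4):
    """Outcome probabilities of the single-pass experiment; photon starts in the lower input."""
    bs = beamsplitter(theta)

    # Empty box: the two beamsplitters act back to back.
    out = bs @ bs @ LOWER
    empty = {"upper": out[0] ** 2, "lower": out[1] ** 2}

    # Bomb in the upper arm: it measures the path after the first beamsplitter.
    mid = bs @ LOWER
    p_explode = mid[0] ** 2
    p_survive = mid[1] ** 2
    # If the bomb does not explode, the state is the lower-path state again
    # before it enters the second beamsplitter.
    out = bs @ LOWER
    bomb = {
        "explode": p_explode,
        "upper": p_survive * out[0] ** 2,
        "lower": p_survive * out[1] ** 2,
    }
    return empty, bomb


empty, bomb = elitzur_vaidman()
print(empty)
print(bomb)
```

For \(\theta = 45^\circ \), the second dictionary contains 0.5, 0.25 and 0.25 up to floating-point rounding, which are the three outcomes discussed in the text.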

Kwiat et al.’s Iterative Version To increase the efficiency of this protocol, the crucial idea is to feed the output back to the input (thus letting the photon go through the box many times) and to adjust the splitting ratio of the beamsplitters sensibly (see [2] for the experimental realization). The easiest way to analyze this proposal is to think of the feedback loop in a “rolled out” way. That is, we look at this proposal as if we had N copies of the Mach–Zehnder interferometer (where N is the number of times we let the photon go through the box), in each of which the box is in the upper arm (see Fig. 2).

Fig. 2 Kwiat et al.’s version of the bomb-tester experiment

We further choose the angle \(\theta := \frac{90^\circ }{N}\) in (1.1), which defines the action of the beamsplitters. Let us analyze this protocol: If there is no bomb in the box and the photon starts in the lower path, then the photon travels through N beamsplitters, each of which rotates the state by an angle of \(\frac{90^\circ }{N}\). So overall the state is rotated by \(90^\circ \), which means that the photon will be in the upper output. For the case where there is a bomb in the box, let us calculate the probability that the photon always takes the lower path and therefore does not hit the bomb. For each of the beamsplitters, if the photon is in the lower path before the beamsplitter, then the probability that the photon will be in the lower path after the beamsplitter is given by \(\cos ^{2}(\theta )\). Since the bomb can be viewed as a measurement device, the probability that the photon always takes the lower path is simply the product of the probabilities at each beamsplitter. Hence, \(P(\text {always lower path}) = \cos ^{2N}(\theta )\). For \(N \rightarrow \infty \), we have

$$\begin{aligned} \cos ^{2N}(\theta ) = \left( 1-\frac{\pi ^2}{8N^2} + {\mathcal {O}}(N^{-4})\right) ^{2N} = 1-\frac{\pi ^2}{4N} + {\mathcal {O}}(N^{-2}) \xrightarrow {N \rightarrow \infty } 1. \end{aligned}$$

This simple calculation has the remarkable consequence that, when N is large enough, the photon will end up in the lower path with probability arbitrarily close to 1, and the bomb will almost certainly not explode. Since the photon will always end up in the upper path if there is no bomb in the box, this protocol enables us to tell (with probability approaching 1) whether there is a bomb in the box, while simultaneously keeping the probability that the bomb is triggered as small as we please.
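To get a feeling for the rate of convergence, the survival probability \(\cos ^{2N}(\theta )\) with \(\theta = 90^\circ /N\) can be evaluated numerically and compared with the leading terms \(1-\frac{\pi ^2}{4N}\) of the expansion above. The snippet below is only an illustrative sketch; the function names are not taken from the original work.

```python
import math


def survival_probability(n):
    """Probability cos^(2N)(theta) that the photon stays in the lower path, theta = 90 deg / N."""
    theta = math.pi / (2 * n)
    return math.cos(theta) ** (2 * n)


def first_order_estimate(n):
    """Leading terms 1 - pi^2 / (4N) of the expansion in 1/N."""
    return 1.0 - math.pi ** 2 / (4 * n)


for n in (2, 10, 100, 1000):
    exact = survival_probability(n)
    # Columns: N, survival probability, explosion probability, first-order estimate.
    print(n, exact, 1.0 - exact, first_order_estimate(n))
```

For \(N = 2\) the survival probability is 0.25 (up to rounding), which is the Elitzur–Vaidman value, and the explosion probability \(1 - \cos ^{2N}(\theta )\) decays like \(1/N\).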

Interpretation as a Channel Discrimination Problem We have seen in the previous paragraph how to discriminate between a completely transparent object (empty box) and an opaque object (bomb) such that the probability that a photon gets absorbed by the opaque object can be made as small as one pleases. This problem can be reinterpreted as a channel discrimination problem as follows: The channel corresponding to the transparent object is simply the identity channel (\(T_{\mathrm{empty}} := \mathrm {id}\)), while the action of the opaque object can be identified with the channelFootnote 2\(T_{\mathrm{bomb}} : {\mathcal {B}}_1({\mathcal {H}}_U) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_U)\), defined by

$$\begin{aligned} T_{\mathrm{bomb}}(\cdot ) = \mathrm {tr}\left[ \cdot \right] {|}v{\rangle }{\langle }v{|}. \end{aligned}$$

Fig. 3 N-step discrimination strategy

According to the theory of quantum combs,Footnote 3 the most general (causally ordered) strategy to discriminate channels is given by the sequential scheme, depicted in Fig. 3. That is, if the channels to be discriminated act on the system I (I for interaction), then the most general discrimination strategyFootnote 4 allowed by quantum theory can be described as follows: First, we choose an ancillary system Z (which might be arbitrarily large) and an initial state \(s_0 \in {\mathcal {S}}({\mathcal {H}}_I \otimes {\mathcal {H}}_Z)\). Then, we can apply a channelFootnote 5\(\Lambda _0 : {\mathcal {B}}_1({\mathcal {H}}_I \otimes {\mathcal {H}}_Z) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_I \otimes {\mathcal {H}}_Z)\) to \(s_0\). Afterwards, the unknown channel is applied to the system (i.e., if \(T : {\mathcal {B}}_1({\mathcal {H}}_I) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_I)\) is the unknown channel, then its application transforms the state \(\Lambda _0(s_0)\) to \((T\otimes \mathrm {id})(\Lambda _0(s_0))\)). Then, we can transform the system by applying a channel \(\Lambda _1 : {\mathcal {B}}_1({\mathcal {H}}_I \otimes {\mathcal {H}}_Z) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_I \otimes {\mathcal {H}}_Z)\). Afterwards, we apply the unknown channel again, followed by an application of a channel \(\Lambda _2 : {\mathcal {B}}_1({\mathcal {H}}_I \otimes {\mathcal {H}}_Z) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_I \otimes {\mathcal {H}}_Z)\). We repeat this process N times overall. In the end, our system is in a state \(\rho _N^T \in {\mathcal {S}}({\mathcal {H}}_I \otimes {\mathcal {H}}_Z)\), which depends on T. Hence, by measuring we can obtain information about the identity of T. Kwiat et al.’s protocol can be integrated in this formalism as follows: We identify the upper path with the system I and the lower path with the system Z and choose \(s_0 := {|}v \otimes p{\rangle } {\langle }v \otimes p{|}\). 
For \(0 \le i \le N-1\), the channels \(\Lambda _i\) are defined by \(\Lambda _i(\cdot ) := U\cdot U^\dagger = : {\hat{U}}(\cdot )\), with \(\theta = \frac{90^\circ }{N}\) and we set \(\Lambda _N := \mathrm {id}\). It is then easy to calculate that

$$\begin{aligned} \begin{aligned} \rho _N^{T_{\mathrm{empty}}}&= {\hat{U}}^N({|}v \otimes p{\rangle } {\langle }v \otimes p{|}) = {|}p\otimes v{\rangle }{\langle }p \otimes v{|},\\ \rho _N^{T_{\mathrm{bomb}}}&= \left( (T_{\mathrm{bomb}}\otimes \mathrm {id})\circ {\hat{U}}\right) ^N({|}v \otimes p{\rangle } {\langle }v \otimes p{|}) \\&= \cos ^{2N}(\theta ) {|}v\otimes p{\rangle }{\langle }v\otimes p{|} + (1-\cos ^{2N}(\theta )) {|}v\otimes v{\rangle }{\langle }v\otimes v{|}, \end{aligned} \end{aligned}$$
(1.2)

where \(\rho _N^{T_{\mathrm{empty}}}\) and \(\rho _N^{T_{\mathrm{bomb}}}\) denote the output states of the protocol when the unknown channel is \(T_{\mathrm{empty}}\) or \(T_{\mathrm{bomb}}\). An interesting aspect of the expressions (1.2) is that one can read off the results of the last paragraph, since the states are orthogonal and since the probability that the bomb explodes is simply given by the coefficient of \({|}v\otimes v{\rangle }{\langle }v\otimes v{|}\). To abstract from the bomb-tester experiment, we want to allow for arbitrary quantum channels and for arbitrary discrimination strategies (Fig. 3). In this more general setting, the concept of the output state does not change. What is not a priori clear is what it means that something was “interaction-free”. Since we want to allow for arbitrary strategies (for example, involving many photons in arbitrary superpositions), the output state does not, in general, contain the information whether an interaction occurred. Therefore, we need to model separately what “interaction-free” means for general discrimination strategies. A derivation of such a model based on some axioms takes some effort. We will, therefore, postpone this discussion until Sect. 3. For now, let us just describe the essential constituents. First, for the notion of “interaction-free” to have any meaning, there needs to be some way not to interact with the object in the box. We will thus assume, in analogy to the bomb-tester experiment, the existence of a vacuum state. That is, we assume that for the channels under consideration, there exists a pure state \({|}v{\rangle }{\langle }v{|} \in {\mathcal {S}}({\mathcal {H}}_I)\) such that \({|}v{\rangle }{\langle }v{|}\) gets mapped to a pure state by the channel and that if the channel is applied to \({|}v{\rangle }{\langle }v{|}\), then there is no “interaction” with the object in the box. This concept is formalized by the notion of a channel with vacuum.

Definition 1.1

(Channel with vacuum). A channel with vacuum \(v \in {\mathcal {H}}\) is a channel \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) together with a unit vector \(v \in {\mathcal {H}}\) such that \(T({|}v{\rangle }{\langle }v{|})\) is pure. The unit vector v is called the vacuum, and the state \({|}v{\rangle }{\langle }v{|} \in {\mathcal {S}}({\mathcal {H}})\) is called the vacuum state.
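In finite dimensions, the condition of Definition 1.1 is easy to test numerically: \(T({|}v{\rangle }{\langle }v{|})\) is pure exactly when \(\mathrm {tr}[T({|}v{\rangle }{\langle }v{|})^2] = 1\). The sketch below assumes that the channel is given by a list of Kraus operators (a representation chosen here for convenience, not one used in the text) and that index 0 of the qubit basis is the vacuum; the helper names are hypothetical.

```python
import numpy as np


def apply_channel(kraus, rho):
    """Apply a channel given by Kraus operators: rho -> sum_k K rho K^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)


def is_channel_with_vacuum(kraus, v, tol=1e-10):
    """Check that T(|v><v|) is pure, i.e. that tr[T(|v><v|)^2] = 1."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    out = apply_channel(kraus, np.outer(v, v.conj()))
    purity = np.trace(out @ out).real
    return abs(purity - 1.0) < tol


# Basis convention (assumed): index 0 is the vacuum v, index 1 the one-photon state p.
v = np.array([1.0, 0.0])
p = np.array([0.0, 1.0])

# T_bomb(X) = tr[X] |v><v| has Kraus operators |v><v| and |v><p|.
bomb_kraus = [np.outer(v, v), np.outer(v, p)]

# Completely depolarizing qubit channel X -> tr[X] 1/2, with Kraus operators Pauli / 2.
paulis = [
    np.eye(2),
    np.array([[0, 1], [1, 0]]),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]]),
]
depolarizing_kraus = [m / 2 for m in paulis]

print(is_channel_with_vacuum([np.eye(2)], v))         # identity channel
print(is_channel_with_vacuum(bomb_kraus, v))          # bomb channel
print(is_channel_with_vacuum(depolarizing_kraus, v))  # maps every state to 1/2
```

The identity channel and \(T_{\mathrm{bomb}}\) pass the test with vacuum v, whereas the completely depolarizing channel maps every state to the maximally mixed state and therefore admits no vacuum.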

The notion of an object in the box already suggests that we should look at the given channel in the open system picture. To this end, we imagine a Demon sitting in the box and trying to figure out whether something other than the vacuum was sent through the box. To do so, we allow the Demon to access the object in the box. In more mathematical terms, the Demon has full access to the output of the conjugate channel [20]. An important implicit assumption underlying the discussion above is that the channels we look at can be applied several times (which means that the channel does not change); this is a Markovianity assumption. Given just this Markovianity assumption, it is possible to determine the probability that, for a certain discrimination strategy, the Demon will find out whether, at any point during the execution of the strategy, the channel was applied to something other than the vacuum state. We will call this probability the “interaction” probability (see Definition 3.3), denoted by \(P_I^T(D)\), where T denotes the channel and D the discrimination strategy. The central notion of discrimination in an “interaction-free” manner, as formalized in Definition 3.4, is then defined by demanding that the discrimination error probability as well as the “interaction” probability can be made arbitrarily small simultaneously. We finish this section by formalizing the notion of a discrimination strategyFootnote 6 and by fixing the notation.

Definition 1.2

(Discrimination strategy). An N-step discrimination strategy is a tuple \(({\mathcal {H}}, {\mathcal {H}}_Z, {\mathcal {H}}_i, {\mathcal {H}}_o, s_0, \Lambda )\), where \({\mathcal {H}}\), \({\mathcal {H}}_Z\), \({\mathcal {H}}_i\), and \({\mathcal {H}}_o\) are Hilbert spaces, \(s_0 \in {\mathcal {S}}({\mathcal {H}}_i)\) is the initial state and \(\Lambda := \{\Lambda _0, \Lambda _1, \dots , \Lambda _N\}\) is a set of channels, with \(\Lambda _0 : {\mathcal {B}}_1({\mathcal {H}}_i) \rightarrow {\mathcal {B}}_1({\mathcal {H}} \otimes {\mathcal {H}}_Z)\), \(\Lambda _n : {\mathcal {B}}_1({\mathcal {H}} \otimes {\mathcal {H}}_Z) \rightarrow {\mathcal {B}}_1({\mathcal {H}} \otimes {\mathcal {H}}_Z)\) for \(1 \le n \le N-1\), and \(\Lambda _N : {\mathcal {B}}_1({\mathcal {H}} \otimes {\mathcal {H}}_Z) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_o)\).

An N-step discrimination strategy induces the intermediate state map \(\rho : {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}})) \times \{0, 1, 2, \dots , N\} \rightarrow {\mathcal {B}}_1({\mathcal {H}}\otimes {\mathcal {H}}_Z) \cup {\mathcal {B}}_1({\mathcal {H}}_o)\), defined by

$$\begin{aligned} \begin{aligned} \rho (T, 0)&= \Lambda _0(s_0),\\ \rho (T, n)&= \Lambda _n\circ (T \otimes \mathrm {id}) \circ \rho (T, n-1), \text { for } 1 \le n \le N. \end{aligned} \end{aligned}$$
(1.3)

We will always writeFootnote 7\(\rho ^T_n\) for \(\rho (T, n)\) and omit \({\mathcal {H}}_i\) and \({\mathcal {H}}_o\) if \({\mathcal {H}}_i = {\mathcal {H}}_o = {\mathcal {H}}\otimes {\mathcal {H}}_Z\).
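The recursion (1.3) translates directly into a loop. As a hedged illustration, the sketch below implements it for the special case \({\mathcal {H}}_i = {\mathcal {H}}_o = {\mathcal {H}}\otimes {\mathcal {H}}_Z\) with two-dimensional \({\mathcal {H}}\) and \({\mathcal {H}}_Z\), represents channels as Python functions on density matrices, and feeds in Kwiat et al.’s protocol as an example. The beamsplitter is assumed to rotate \(\mathrm {span}\{p\otimes v, v\otimes p\}\) by \(\theta \) and to act trivially on \(p\otimes p\) (the latter is an assumption made only to obtain a unitary on the full space); all function names are hypothetical.

```python
import numpy as np

V, P = 0, 1  # basis labels of each arm: vacuum v and one-photon state p


def ket(i, z):
    """Basis vector i (x) z of H (x) H_Z, both factors two-dimensional."""
    e = np.zeros(4)
    e[2 * i + z] = 1.0
    return e


def beamsplitter(theta):
    """Unitary rotating span{p(x)v, v(x)p} by theta; identity elsewhere (assumption)."""
    c, s = np.cos(theta), np.sin(theta)
    u = np.eye(4)
    up, low = 2 * P + V, 2 * V + P
    u[up, up], u[low, up] = c, s
    u[up, low], u[low, low] = -s, c
    return u


def unitary_channel(u):
    return lambda rho: u @ rho @ u.conj().T


def bomb_ext(rho):
    """(T_bomb (x) id)(rho) = |v><v| (x) tr_H[rho]."""
    reduced = np.einsum("iziw->zw", rho.reshape(2, 2, 2, 2))
    vac = np.zeros((2, 2))
    vac[V, V] = 1.0
    return np.kron(vac, reduced)


def empty_ext(rho):
    """(T_empty (x) id)(rho) = rho."""
    return rho


def intermediate_states(t_ext, s0, lambdas):
    """States rho_0, ..., rho_N of (1.3); t_ext is T (x) id acting on H (x) H_Z."""
    states = [lambdas[0](s0)]
    for lam in lambdas[1:]:
        states.append(lam(t_ext(states[-1])))
    return states


def kwiat_strategy(n):
    s0 = np.outer(ket(V, P), ket(V, P))
    u_hat = unitary_channel(beamsplitter(np.pi / (2 * n)))
    # Lambda_0, ..., Lambda_{N-1} are the beamsplitter channel, Lambda_N is the identity.
    lambdas = [u_hat] * n + [lambda rho: rho]
    return s0, lambdas


N = 20
s0, lambdas = kwiat_strategy(N)
rho_empty = intermediate_states(empty_ext, s0, lambdas)[-1]
rho_bomb = intermediate_states(bomb_ext, s0, lambdas)[-1]
print(rho_empty[2 * P + V, 2 * P + V])  # weight on p (x) v
print(rho_bomb[2 * V + P, 2 * V + P])   # weight on v (x) p
print(rho_bomb[0, 0])                   # weight on v (x) v
```

Under these assumptions, the three printed weights agree with (1.2): the empty box yields \({|}p\otimes v{\rangle }{\langle }p\otimes v{|}\), while the bomb yields weight \(\cos ^{2N}(\theta )\) on \(v\otimes p\) and \(1-\cos ^{2N}(\theta )\) on \(v\otimes v\).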

Notation Throughout, \({\mathcal {H}}\) (with some subscript) denotes a separable complex Hilbert space and in this paragraph, \({\mathcal {X}}\) and \({\mathcal {Y}}\) are Banach spaces. The range of a map \(f : {\mathcal {X}} \rightarrow {\mathcal {Y}}\) is denoted by \(\mathrm {ran}(f) := \left\{ f(x) \,\big |\, x \in {\mathcal {X}} \right\} \). The kernel of f is \(\mathrm {ker}(f) := \left\{ x \in {\mathcal {X}} \,\big |\, f(x) = 0 \right\} \). The dual space \({\mathcal {X}}^*\) of \({\mathcal {X}}\) is the set of bounded linear functionals on \({\mathcal {X}}\). The orthogonal complement of a linear subspace \({\mathcal {V}} \subseteq {\mathcal {H}}\) is denoted by \({\mathcal {V}}^\bot \). The open \(\epsilon \)-ball around \(x_0\in {\mathcal {X}}\) is defined by \(B_\epsilon (x_0) := \left\{ x \in {\mathcal {X}} \,\big |\, \left\Vert x-x_0 \right\Vert < \epsilon \right\} \) and the closed \(\delta \)-disc around \(z_0 \in {\mathbb {C}}\) is denoted by \({\mathbb {D}}_{\delta }(z_0) := \left\{ z \in {\mathbb {C}} \,\big |\, \left| z-z_0 \right| \le \delta \right\} \).

The Banach space of bounded linear operators \({\mathcal {X}} \rightarrow {\mathcal {X}}\) is denoted by \({\mathcal {B}}({\mathcal {X}})\). The space of trace-class operators \({\mathcal {B}}_1({\mathcal {H}})\) becomes a Banach space with trace-norm \(\left\Vert \cdot \right\Vert _1 := \mathrm {tr}\left[ \left| \cdot \right| \right] \). For \(A \in {\mathcal {B}}({\mathcal {H}})\), the adjoint is denoted by \(A^\dagger \) and the support of A is defined by \(\mathrm {supp}(A) := \mathrm {ker}(A)^\bot \). If \(A^\dagger = A\), then A is called self-adjoint. A is called positive semi-definite, sometimes denoted by \(A \ge 0\), if A is self-adjoint and \({\langle }\psi , A\psi {\rangle } \ge 0\) for all \(\psi \in {\mathcal {H}}\). For a closed subspace \({\mathcal {V}} \subseteq {\mathcal {H}}\), we denote (in a slight abuse of notation) by \({\mathcal {B}}({\mathcal {V}}) \subseteq {\mathcal {B}}({\mathcal {H}})\) the bounded linear operators with range and support in \({\mathcal {V}}\) and by \({\mathcal {B}}_1({\mathcal {V}})\) the trace-class operators with range and support in \({\mathcal {V}}\).

A linear operator \(T \in {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) is called a quantum operation if it is completely positive and trace non-increasing. If T is in addition trace-preserving, then T is called a (quantum) channel. If a quantum channel T is written in the form \(T(\cdot ) = \mathrm {tr}_{E}\left[ V\cdot V^\dagger \right] \), where \(V : {\mathcal {H}} \rightarrow {\mathcal {H}}_E \otimes {\mathcal {H}}\) is an isometry and where \(\mathrm {tr}_E\) is the partial trace, then V is called a Stinespring isometry. The set of (quantum) states on \({\mathcal {H}}\) is given by \({\mathcal {S}}({\mathcal {H}}) := \left\{ \rho \in {\mathcal {B}}_1({\mathcal {H}}) \,\big |\, \rho \ge 0, \mathrm {tr}\left[ \rho \right] = 1 \right\} \). The identity channel is denoted by \(\mathrm {id}\) and the unit matrix by \(\mathbbm {1}\). For positive semi-definite trace-class operators \(\rho \) and \(\sigma \), the fidelity is defined by \(\sqrt{F}(\rho , \sigma ) := \left\Vert \sqrt{\rho }\sqrt{\sigma } \right\Vert _1\).

For \(B \in {\mathcal {B}}({\mathcal {X}})\), the resolvent set is \(\rho (B) := \left\{ z\in {\mathbb {C}} \,\big |\, z-B \text { is invertible} \right\} \) and the spectrum is \(\sigma (B) := {\mathbb {C}}\setminus \rho (B)\). The discrete spectrum of B is the subset of isolated points of \(\sigma (B)\) such that the corresponding Riesz projection has finite rank.

2 Results

To state and discuss our main results, we need one more concept, which is similar to that of a decoherence-free subspace.Footnote 8

Definition 2.1

(Isometric subspace). Let \({\mathcal {V}}\) be a closed linear subspace of a Hilbert space \({\mathcal {H}}\). A channel \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) is said to be isometric on \({\mathcal {V}}\) if there exists an isometry \(V: {\mathcal {V}} \rightarrow {\mathcal {H}}\), such thatFootnote 9

$$\begin{aligned} T\vert _{{\mathcal {B}}_1({\mathcal {V}})}(\cdot ) = V \cdot V^\dagger . \end{aligned}$$
(2.1)

If T is isometric on \({\mathcal {V}}\), we call \({\mathcal {V}}\) an isometric subspace w.r.t. T.

The significance of channels that are isometric on \({\mathcal {V}}\) is that they are the analogue to the identity channel in the bomb-tester case. To see why, note that \(T|_{{\mathcal {B}}_1({\mathcal {V}})}\) satisfies the Knill–Laflamme error-correcting conditions [21]. Hence, by composing \(T|_{{\mathcal {B}}_1({\mathcal {V}})}\) with an appropriate channel, we obtain the identity channel on \({\mathcal {B}}_1({\mathcal {V}})\). Furthermore, as Lemma 3.10 proves in a language adapted to our model, the output of the conjugate channel of T will be the same for all \(\rho \in {\mathcal {B}}_1({\mathcal {V}})\). In particular, if we have \(v \in {\mathcal {V}}\), where v is the vacuum, then even though \(\rho \in {\mathcal {B}}_1({\mathcal {V}})\) might be different from \({|}v{\rangle }{\langle }v{|}\), the Demon (having access to the conjugate channel only) has no chance of telling that something other than the vacuum has been sent through the box.

We are now ready to state our main result, which is an easy-to-check necessary and sufficient criterion that tells us when it is possible (or impossible) to discriminate two quantum channels in an “interaction-free” manner.

Theorem 2.2

(Main result). Let \(\mathrm {dim}({\mathcal {H}}) < \infty \). Two channels \(T_A, T_B : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) with vacuum \(v \in {\mathcal {H}}\) can be discriminated in an “interaction-free” manner if and only if there exists a subspace \({\mathcal {V}} \subseteq {\mathcal {H}}\) with the following three properties:

  1. \(v \in {\mathcal {V}}\).

  2. At least one of the two channels is isometric on \({\mathcal {V}}\).

  3. \(T_A\vert _{{\mathcal {B}}({\mathcal {V}})} \ne T_B\vert _{{\mathcal {B}}({\mathcal {V}})}\).

Note that the central notion “discrimination in an ‘interaction-free’ manner” has only been defined informally in the paragraph following Definition 1.1. The formal definition, as well as the one for the “interaction” probability, can be found in Sect. 3.3, after a derivation of the mathematical form of these quantities from first principles in Sect. 3.1.

Remark 2.3

At first glance it may seem to be hard to check whether such a subspace exists. This is not so, as one only needs to consider two candidates for \({\mathcal {V}}\), the so-called maximal vacuum subspaces \({\mathcal {V}}_{T_A}\) and \({\mathcal {V}}_{T_B}\), which we define and study in 3.9 and 3.10.

Theorem 2.2 is a direct consequence of two results: a protocol to discriminate two channels and a no-go theorem. We discuss these cases separately in the following two subsections.

2.1 The Constructive Case

We consider the case where our main theorem says that we can discriminate the two channels in an “interaction-free” manner. That is, where there is a subspace \({\mathcal {V}}\), such that \({\mathcal {V}}\) contains the vacuum and one of the two channels is isometric on \({\mathcal {V}}\) and \(T_A\vert _{{\mathcal {B}}({\mathcal {V}})} \ne T_B\vert _{{\mathcal {B}}({\mathcal {V}})}\). For this case, we propose a protocol (see Sect. 4) that can discriminate two channels in an “interaction-free” manner. We will discuss the properties of this protocol in the following. It turns out that one does not need complete information about the two channels to perform the discrimination task. To account for this, we consider the more general task, where we want to know to which one of two known, disjoint sets of channels the unknown channel belongs. Of course, Theorem 2.2 puts some restrictions on what these sets may look like. Specifically, we consider the following: Given a channel T with vacuum \(v \in {\mathcal {V}}\) that is isometric on \({\mathcal {V}}\), we take as our first set (a subset of) the set of channels that equal T if we restrict their domains to \({\mathcal {B}}_1({\mathcal {V}})\). The second set is less restricted in that we only assume that all channels must be channels with (the same) vacuum v and that the restrictions to \({\mathcal {B}}_1({\mathcal {V}})\) must not equal \(T|_{{\mathcal {B}}_1({\mathcal {V}})}\). It will then turn out that under these conditions, these two sets can be discriminated in an “interaction-free” manner. Roughly speaking, this tells us that we can test whether the unknown channel is T or some other channel, whose identity is unknown. Putting it yet another way, if the identity channel is interpreted as an empty box and every other channel as a non-empty box, then our result says that one can always find out (in an “interaction-free” manner) if there is something or nothing in the box. 
Before we state this in mathematical terms, we need to define the discrimination error probability for two sets.

Definition 2.4

(Error probability). Let \({\mathcal {C}}_A, {\mathcal {C}}_B \subseteq {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) be two sets of channels. For an N-step discrimination strategy D and a two-valued POVM \(\Pi = \{\pi _A, \pi _B\}\), the discrimination error probability is defined by

$$\begin{aligned} P_e(D,\Pi )&:= \frac{1}{2} \left[ \sup _{T \in {\mathcal {C}}_A} \mathrm {tr}\left[ \pi _B \rho _N^{T}\right] + \sup _{T \in {\mathcal {C}}_B} \mathrm {tr}\left[ \pi _A \rho _N^{T}\right] \right] . \end{aligned}$$
(2.2)
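For finite sets \({\mathcal {C}}_A\) and \({\mathcal {C}}_B\), the suprema in (2.2) are maxima and the error probability can be computed directly from the output states \(\rho _N^T\). The following sketch assumes that these output states have already been computed and are passed in as lists of density matrices; the toy qubit states in the example are made up for illustration and do not come from the text.

```python
import numpy as np


def error_probability(outputs_a, outputs_b, pi_a, pi_b):
    """P_e of (2.2) for finitely many output states, where the suprema become maxima."""
    # Worst case over C_A of answering "B" although the channel lies in C_A.
    err_a = max(np.trace(pi_b @ rho).real for rho in outputs_a)
    # Worst case over C_B of answering "A" although the channel lies in C_B.
    err_b = max(np.trace(pi_a @ rho).real for rho in outputs_b)
    return 0.5 * (err_a + err_b)


# Toy qubit example (made-up output states).
pi_a = np.diag([1.0, 0.0])
pi_b = np.eye(2) - pi_a
outputs_a = [np.diag([1.0, 0.0])]
outputs_b = [np.diag([0.0, 1.0]), np.diag([0.2, 0.8])]
print(error_probability(outputs_a, outputs_b, pi_a, pi_b))
```

In this example the only error comes from the second state in `outputs_b`, which is mistaken for an element of \({\mathcal {C}}_A\) with probability 0.2, so that \(P_e = 0.1\).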

Theorem 2.5

(Discrimination strategy). For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \({\mathcal {C}}_A, {\mathcal {C}}_B \subseteq {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) be two closed sets of channels and \({\mathcal {V}}\) be a subspace of \({\mathcal {H}}\), such that

  1. For all \(T \in {\mathcal {C}}_A \cup {\mathcal {C}}_B\), T is a channel with vacuum \(v \in {\mathcal {V}}\).

  2. For all \(T \in {\mathcal {C}}_A\), T is isometric on \({\mathcal {V}}\).

  3. The set \({\mathcal {C}}_A\vert _{{\mathcal {B}}_1({\mathcal {V}})} := \left\{ T\vert _{{\mathcal {B}}_1({\mathcal {V}})} \,\big |\, T \in {\mathcal {C}}_A \right\} \) contains exactly one element.

  4. \({\mathcal {C}}_A\vert _{{\mathcal {B}}_1({\mathcal {V}})}\) and \({\mathcal {C}}_B\vert _{{\mathcal {B}}_1({\mathcal {V}})} := \left\{ T\vert _{{\mathcal {B}}_1({\mathcal {V}})} \,\big |\, T \in {\mathcal {C}}_B \right\} \) are disjoint.

Then there exist a constant C, and for every \(N \in {\mathbb {N}}\), an N-step discrimination strategy D and a two-valued POVM \(\Pi \), such that

$$\begin{aligned} P_e(D, \Pi ) \le \frac{C}{N^2}, \end{aligned}$$
(2.3)
$$\begin{aligned} P_I^{T_A}(D) = 0 \quad \text { and } \quad P_I^{T_B}(D) \le \frac{C}{N}, \end{aligned}$$
(2.4)

for all \(T_A \in {\mathcal {C}}_A\) and all \(T_B \in {\mathcal {C}}_B\), where \(P_I\) denotes the “interaction” probability. Thus, the sets \({\mathcal {C}}_A\) and \({\mathcal {C}}_B\) can be discriminated in an “interaction-free” manner.

Remark 2.6

The strategy we propose that has the properties stated in Theorem 2.5 only requires one ancillary qubit system in the worst-case scenario (as does the Kwiat et al. protocol) and might thus be implementable in the near future. We also show that one cannot get rid of the ancillary qubit in a naive way.

Remark 2.7

Although Theorem 2.5 is formulated for finite-dimensional spaces, a key part of the proof works also in infinite-dimensional spaces (Theorems 4.5 and 4.6).

Remark 2.8

For two channels \(T_A\) and \(T_B\) with vacuum \(v \in {\mathcal {H}}\), we can define the sets \({\mathcal {C}}_A := \{T_A\}\) and \({\mathcal {C}}_B := \{T_B\}\). If there is a subspace \({\mathcal {V}}\) such that Conditions 1–3 of Theorem 2.2 are fulfilled (and, w.l.o.g., \(T_A\) is isometric on \({\mathcal {V}}\)), then clearly \({\mathcal {C}}_A\) and \({\mathcal {C}}_B\) satisfy the hypotheses of Theorem 2.5 and thus \(T_A\) and \(T_B\) can be discriminated in an “interaction-free” manner. This proves the direct part of Theorem 2.2.

Given the result of Theorem 2.5, it is natural to ask whether the bounds on the error probability and the “interaction” probability have the optimal dependence on N. This is clearly not the case for the error probability, as is already evident from the bomb-tester experiment. For the “interaction” probability, we were able to show (under a mild condition on \({\mathcal {C}}_A\) and \({\mathcal {C}}_B\)) that \(N^{-1}\) is indeed the best possible rate. We state this as a meta theorem (see Theorem 5.9).

Theorem

Subject to a condition stated in Theorem 5.9, there exists a constant \(C > 0\) such that

$$\begin{aligned} \max (P_I^{T_A}(D), P_I^{T_B}(D)) \ge C\,\frac{(1-2P_e(D,\Pi ))^4}{N}, \end{aligned}$$
(2.5)

for all N-step discrimination strategies D and all two-valued POVMs \(\Pi \).

The result above cannot hold unconditionally. If there is a subspace \({\mathcal {V}}\) such that \(v \in {\mathcal {V}}\), both channels are isometric on \({\mathcal {V}}\), and \(T_A\vert _{{\mathcal {B}}({\mathcal {V}})} \ne T_B\vert _{{\mathcal {B}}({\mathcal {V}})}\), then we can restrict ourselves to probing the channel only with states \(\rho \in {\mathcal {B}}_1({\mathcal {V}})\). Since the Demon cannot tell the difference between these states, the “interaction” probability is zero and the remaining problem is to discriminate two isometric channels. This problem can be solved with discrimination error probability equal to zero, in a finite number of steps [22]. We were unable to show that the case described above is the only one where the \(N^{-1}\)-rule can be violated, but this seems plausible.

2.2 The No-Go Case

In this section, we consider the case for which our main theorem tells us that “interaction-free” channel discrimination is impossible; that is, if there exists no subspace satisfying all three properties of Theorem 2.2. In this case the channels \(T_A\) and \(T_B\) must be such that whenever there is a subspace \({\mathcal {V}}\) that contains the vacuum and on which at least one of the two channels is isometric, then the two channels must necessarily be the same on that subspace.Footnote 10 In this case, we were able to establish the following theorem that shows that there is a trade-off between the error probability and the “interaction” probability, in the sense that not both of them can go to zero simultaneously.

Theorem 2.9

(No-go theorem). For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T_A, T_B : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be two channels with vacuum \(v\in {\mathcal {H}}\). Suppose that no subspace satisfies the properties 1, 2, and 3 of Theorem 2.2 simultaneously.

Then, there exists a constant \(C > 0\), such that

$$\begin{aligned} (1-2P_e(D, \Pi ))^2 \le C \max (P_I^{T_A}(D), P_I^{T_B}(D)), \end{aligned}$$
(2.6)

for all finite-dimensional N-step discrimination strategies D and all two-valued POVMs \(\Pi \). Hence, \(T_A\) and \(T_B\) cannot be discriminated in an “interaction-free” manner.

Clearly, this implies the converse in Theorem 2.2.

As a by-product, we obtained an inequality for the fidelity, which might be of independent interest.

Proposition 2.10

For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T_A^\downarrow , T_B^\downarrow : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be quantum operations and let \({\mathcal {V}}\) be a subspace of \({\mathcal {H}}\) such that \(T_A^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}})} = T_B^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}})}\) and \(T_A^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}})}\) is trace-preserving. Then,

$$\begin{aligned} \sqrt{F}(T^\downarrow _A(\rho ), T^\downarrow _B(\sigma )) \ge \sqrt{F}(\rho , \sigma ) - 2 \sqrt{F}(P^\bot \rho P^\bot , P^\bot \sigma P^\bot ), \end{aligned}$$
(2.7)

for all \(\rho , \sigma \ge 0\), where \(P^\bot \) is the orthogonal projection onto \({\mathcal {V}}^\bot \).

3 The Models

In this section, we propose two different, but in the end largely equivalent models that generalize the notion of “interaction-free” measurement to quantum channels. Since the sequential scheme, given in Fig. 3, is the most general causally ordered strategy allowed in quantum theory [17], it suffices to define our notions for this kind of strategy. In both models, we assume the validity of Fig. 3. That is, we assume that the unknown channel T does not change during the execution of the discrimination strategy—the Markovianity assumption. This is a relatively weak assumption, since we are in control of the duration between the individual channel invocations. This section consists of four subsections. In the first two subsections, we derive our two models. The third subsection summarizes the former two by properly defining the quantities of merit and thereby setting the stage for a rigorous analysis in the later sections. In the fourth subsection, we compare the two models by deriving some elementary properties, which will be used later on.

3.1 The “Interaction” Model

In our first model, we interpret the term “interaction-free” in an information-theoretic way. That is, we imagine a Demon sitting in the box who tries to figure out whether we interacted with the interior of the box. In more technical terms, this means that the Demon has full access to the output of the conjugate channel. Since our task would be trivially infeasible otherwise, there must be a way not to interact with the box. Therefore, we only consider channels with vacuum. That is, we assume that for all channels under consideration there exists a distinguished pure state, the vacuum state, \({|}v{\rangle }{\langle }v{|}\). This state is assumed to have the following two important properties: First, if the vacuum state is sent through the channel, then the Demon concludes that no interaction has occurred. Second, we assume that the channels under consideration map the vacuum state to a pure state. This assumption is physically reasonable, as it means that the state of the probe system does not become entangled with the Demon’s system. If, in contrast, the probe system becomes entangled with the Demon’s system, then there must have been an interaction and the term “interaction-free” measurement would be inappropriate. We should mention, however, that the transmission model, which we are going to describe in the next section, does not use the “vacuum maps to pure state” assumption. This comes at the cost that the transmission functional is no longer a property of a channel (as the “interaction” functional will turn out to be) but rather an object that has to be modeled separately. Together, these two assumptions yield the definition of a channel with vacuum (Definition 1.1).
For a given channel T with vacuum \(v \in {\mathcal {H}}_I\) and an N-step discrimination strategy \(D = ({\mathcal {H}}_I, {\mathcal {H}}_Z, {\mathcal {H}}_i, {\mathcal {H}}_o, s_0, \Lambda )\), we want to define the “interaction” probability \(P_I^T(D)\) as the probability that the Demon in the box detects that, during the execution of D, something other than the vacuum state was sent through the channel. To define this probability, we need to specify how the Demon can obtain information about what was sent through the channel.

Fig. 4. General scenario

A natural way to model this is by assuming that for each of the N channel-uses (indexed by n) in the discrimination strategy, the Demon is allowed to implement the channel T via a channel \(D_n : {\mathcal {B}}_1({\mathcal {H}}_{I^\prime }\otimes {\mathcal {H}}_I) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_{I^\prime }\otimes {\mathcal {H}}_I)\), where \({\mathcal {H}}_{I^\prime }\) is the Hilbert space associated with a system \(I^\prime \), which the Demon controls. We further allow the Demon to keep an arbitrarily large memory system M (with Hilbert space \({\mathcal {H}}_M\)) which he can manipulate freely (i.e., he can choose the channels \(M_n\), defined below). The most general (causally ordered) scheme that can be obtained from the above description is depicted in Fig. 4. Mathematically, the Demon’s strategy is completely determined by an initial state \(s_0^D \in {\mathcal {S}}({\mathcal {H}}_M \otimes {\mathcal {H}}_{I^\prime })\) and channels \(M_0, M_1, \dots , M_N : {\mathcal {B}}_1({\mathcal {H}}_M \otimes {\mathcal {H}}_{I^\prime }) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_M \otimes {\mathcal {H}}_{I^\prime })\) and \(D_1, D_2, \dots ,D_N : {\mathcal {B}}_1({\mathcal {H}}_{I^\prime } \otimes {\mathcal {H}}_I) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_{I^\prime } \otimes {\mathcal {H}}_I)\). Given this data, the scheme in Fig. 4 produces the output (or final) state \(\rho _{F} \in {\mathcal {S}}({\mathcal {H}}_M \otimes {\mathcal {H}}_{I^\prime } \otimes {\mathcal {H}}_o)\), defined by

$$\begin{aligned} \rho _{F} := (M_N \otimes \Lambda _N) (\mathrm {id}\otimes D_N \otimes \mathrm {id}) \dots (M_1 \otimes \Lambda _1) (\mathrm {id}\otimes D_1 \otimes \mathrm {id})(M_0 \otimes \Lambda _0)(\chi _0), \end{aligned}$$

where \(\chi _0 := s_0^D \otimes s_0\). In the end, the Demon will measure his system (\(M+I^\prime \)) and decide, based on the measurement outcome, whether an interaction has occurred. The “interaction” probability is then the probability that he detects such an interaction if he chooses his strategy optimally within the given constraints.

Before we can analyze what the Demon’s optimal strategy is, we need to formulate mathematically the assumption that \(D_n\) implements T, and that T must be independent of the Demon’s strategy (Markovianity). Precisely, we assume that \(D_n\) must be such that if the Demon’s system (\(I^\prime \)) and I are uncorrelated, then the action on the system I must be independent of the state of the system \(I^\prime \). In formulas, we assume that

$$\begin{aligned} \mathrm {tr}_{I^\prime }\left[ D_n(\rho _{I^\prime }\otimes \rho _{I})\right] = T(\rho _I) \end{aligned}$$
(3.1)

for all \(\rho _{I^\prime } \in {\mathcal {S}}({\mathcal {H}}_{I^\prime })\), \(\rho _I \in {\mathcal {S}}({\mathcal {H}}_I)\), and \(n \in \{1, 2, \dots , N\}\). We note that (3.1) is exactly the definition of a semicausal channel, as introduced in [23]. A structure theorem by Eggeling et al. [24] tells us that semicausal channels are semilocalizable. That is, \(D_n\) can be written in the form

$$\begin{aligned} D_n(\rho _{I^\prime I}) = \mathrm {tr}_{E_n}\left[ (X_n \otimes \mathrm {id}_I) (\mathrm {id}_{I^\prime } \otimes {\hat{V}}_n)(\rho _{I^\prime I})\right] , \end{aligned}$$

where \({\hat{V}}_n : {\mathcal {B}}_1({\mathcal {H}}_I) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_{E_n} \otimes {\mathcal {H}}_I),\) defined by \({\hat{V}}_n(\cdot ) = V_n \cdot V_n^\dagger \), is the quantum channel associated with a Stinespring isometry \(V_n: {\mathcal {H}}_I \rightarrow {\mathcal {H}}_{E_n} \otimes {\mathcal {H}}_I\) of T and \(X_n : {\mathcal {B}}_1({\mathcal {H}}_{I^\prime } \otimes {\mathcal {H}}_{E_n}) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_{I^\prime } \otimes {\mathcal {H}}_{E_n})\) is some channel. To proceed further in our search for the Demon’s optimal strategy, we make a few simplifying observations and definitions. First, the unitary freedom in the Stinespring dilation \({\hat{V}}_n\) can be absorbed into the channel \(X_n\). We can therefore assume, without loss of generality, that \({\mathcal {H}}_{E_1} = {\mathcal {H}}_{E_2} = \dots = {\mathcal {H}}_{E_N} =: {\mathcal {H}}_E\) and \({\hat{V}}_1 = {\hat{V}}_2 = \dots = {\hat{V}}_N =: {\hat{V}}\). Second, for \(\rho \in {\mathcal {S}}({\mathcal {H}}_M \otimes {\mathcal {H}}_{I^\prime } \otimes {\mathcal {H}}_I)\), we have

$$\begin{aligned} (M_n \otimes \mathrm {id}_{I})D_n(\rho ) = \mathrm {tr}_{E_n}\left[ (\left[ ( M_n \otimes \mathrm {id}_{E_n})(\mathrm {id}_M \otimes X_n)\right] \otimes \mathrm {id}_I) (\mathrm {id}_{M I^\prime } \otimes {\hat{V}}_n)(\rho )\right] , \end{aligned}$$

which motivates the definition \({\underline{X}}_n := (M_n \otimes \mathrm {id}_{E_n})(\mathrm {id}_M \otimes X_n)\). In the following, we adopt the convention that if some channel acts trivially on a tensor factor (i.e., as the identity), then we omit these tensor factors in the notation (e.g., \({\underline{X}}_n \otimes \mathrm {id}_I\) becomes just \({\underline{X}}_n\)). With the newly introduced notation, it follows from the definition of \(\rho _F\) that the state the Demon obtains is

$$\begin{aligned} \mathrm {tr}_{IZ}\left[ \rho _F\right] = \mathrm {tr}_{I Z} \, \Lambda _N \mathrm {tr}_{E_N} {\underline{X}}_N {\hat{V}}_N \Lambda _{N-1} \mathrm {tr}_{E_{N-1}} {\underline{X}}_{N-1} \dots \Lambda _{1} \mathrm {tr}_{E_1} {\underline{X}}_{1}{\hat{V}}_{1} M_0\Lambda _0 (\chi _0). \end{aligned}$$

We can commute the \({\underline{X}}_i\)s and \(\mathrm {tr}_{E_i}\)s to the left. Thus, upon defining the channel \(\Gamma : {\mathcal {B}}_1({\mathcal {H}}_{E_N}\otimes {\mathcal {H}}_{E_{N-1}}\otimes \dots \otimes {\mathcal {H}}_{E_1}) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_{M}\otimes {\mathcal {H}}_{I^\prime })\) by

$$\begin{aligned} \Gamma (\rho ) = \mathrm {tr}_{E_N} {\underline{X}}_N \mathrm {tr}_{E_{N-1}}{\underline{X}}_{N-1} \dots \mathrm {tr}_{E_1 }{\underline{X}}_{1}M_0(s_0^D\otimes \rho ), \end{aligned}$$

we have

$$\begin{aligned} \mathrm {tr}_{IZ}\left[ \rho _F\right] = \Gamma (\mathrm {tr}_{I Z}\, \Lambda _N {\hat{V}}_N \Lambda _{N-1} {\hat{V}}_{N-1} \dots \Lambda _{1} {\hat{V}}_{1}\Lambda _0(s_0)). \end{aligned}$$

To decide if the channel was ever applied to a state different from the vacuum state, the Demon measures his state with a two-valued POVM, \(\{Q_1, Q_2\}\). By convention, he will conclude that an interaction occurred (something other than the vacuum was sent through) if the event corresponding to \(Q_2\) occurs. If the state sent through the channel is always the vacuum state, then the Demon’s final state is

$$\begin{aligned} \Gamma (\left[ \mathrm {tr}_{I}\left[ V{|}v{\rangle }{\langle }v{|}V^\dagger \right] \right] ^{\otimes N}), \end{aligned}$$

where the tensor power is in the space \({\mathcal {H}}_{E_N} \otimes {\mathcal {H}}_{E_{N-1}} \otimes \dots \otimes {\mathcal {H}}_{E_1}\). Since the Demon must not report an interaction if the state was always the vacuum state, we demand

$$\begin{aligned} 0 = \mathrm {tr}\left[ Q_2 \Gamma (\left[ \mathrm {tr}_{I}\left[ V{|}v{\rangle }{\langle }v{|} V^\dagger \right] \right] ^{\otimes N})\right] = \mathrm {tr}\left[ \Gamma ^*(Q_2) \left[ \mathrm {tr}_{I}\left[ V{|}v{\rangle }{\langle }v{|} V^\dagger \right] \right] ^{\otimes N}\right] , \end{aligned}$$

where \(\Gamma ^*\) denotes the channel \(\Gamma \) in the Heisenberg picture. Clearly, if \(\Gamma ^*(Q_2) = \mathbbm {1}^{\otimes N} - P_v^{\otimes N}\), where \(P_v\) is the orthogonal projection onto the support of \(\mathrm {tr}_{I}\left[ V{|}v{\rangle }{\langle }v{|} V^\dagger \right] \), then this requirement is fulfilled. Since we want to choose the optimal strategy the Demon can pursue, we want to set \(\Gamma ^*(Q_2) := \mathbbm {1}^{\otimes N} - P_v^{\otimes N}\). We can always choose \(\Gamma \) and \(Q_2\) to satisfy the last equation, because this corresponds to the strategy where the Demon simply stores all the states he obtains from the Stinespring dilation in each round. This justifies the graphical representation in Fig. 5. Since we defined the “interaction” probability to be the probability that the Demon concludes that an interaction occurred (if he acts optimally), we have

$$\begin{aligned} P_I^T(D) := \mathrm {tr}\left[ (\mathbbm {1}^{\otimes N} - P_v^{\otimes N}) \; \mathrm {tr}_{I Z} {\hat{V}}_N \Lambda _{N-1} {\hat{V}}_{N-1} \dots \Lambda _{1} {\hat{V}}_{1}(\rho _0^T)\right] , \end{aligned}$$
(3.2)

where \(\rho _0^T := \Lambda _0(s_0)\) is the first intermediate state. We remark that the definition of \(P_I^T(D)\) does not depend on the particular choice of the Stinespring dilation, since the unitary freedom in the Stinespring isometries is compensated by the equal and opposite freedom in \(P_v\).

Fig. 5. Scenario when the Demon’s strategy is optimal

We can simplify this expression a bit. We define \(P_v^\bot := \mathbbm {1}- P_v\) and note that

$$\begin{aligned} \mathbbm {1}^{\otimes N} - P_v^{\otimes N}&= \sum _{n = 0}^{N-1} \mathbbm {1}^{\otimes N-n-1} \otimes P_v^\bot \otimes P_v^{\otimes n},\\ P_v^\bot \otimes P_v^{\otimes K}&= \prod _{j = 0}^{K-1} P_v^\bot \otimes \mathbbm {1}^{\otimes j} \otimes P_v \otimes \mathbbm {1}^{\otimes K - j - 1}. \end{aligned}$$
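Both operator identities can be checked numerically for small N. The following Python sketch is only an illustration: the rank-one projection `P`, the local dimension `d`, the values of `N` and `K`, and the helper `kron_all` are our own hypothetical choices and are not part of the construction above.

```python
import numpy as np
from functools import reduce

def kron_all(factors):
    """Tensor product of a list of matrices, taken from left to right."""
    return reduce(np.kron, factors, np.eye(1))

d, N, K = 3, 4, 3                  # hypothetical local dimension and tensor powers
P = np.diag([1.0, 0.0, 0.0])       # a rank-one projection playing the role of P_v
P_perp = np.eye(d) - P
I = np.eye(d)

# First identity: 1^{(x)N} - P^{(x)N} = sum_n 1^{(x)(N-n-1)} (x) P_perp (x) P^{(x)n}
lhs1 = kron_all([I] * N) - kron_all([P] * N)
rhs1 = sum(kron_all([I] * (N - n - 1) + [P_perp] + [P] * n) for n in range(N))
first_identity_holds = np.allclose(lhs1, rhs1)

# Second identity: P_perp (x) P^{(x)K} as a product of K commuting projections
lhs2 = kron_all([P_perp] + [P] * K)
rhs2 = reduce(np.matmul,
              [kron_all([P_perp] + [I] * j + [P] + [I] * (K - j - 1)) for j in range(K)])
second_identity_holds = np.allclose(lhs2, rhs2)

print(first_identity_holds, second_identity_holds)
```

The summands in the first identity are mutually orthogonal projections, which is what allows the sum over n to be read as a sum of probabilities in the derivation that follows.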

Using these two expressions and (several times) that \(\Lambda _n\) is trace-preserving, we obtain our final version for \(P_I^T(D)\),

$$\begin{aligned} P_I^T(D)&= \sum _{n=0}^{N-1} \mathrm {tr}\left[ \mathbbm {1}^{\otimes N-n-1} \otimes P_v^\bot \otimes P_v^{\otimes n} \; \mathrm {tr}_{I Z} {\hat{V}}_N \Lambda _{N-1} {\hat{V}}_{N-1} \dots \Lambda _{1} {\hat{V}}_{1}(\rho _0^T)\right] \\&= \sum _{n=0}^{N-1} \mathrm {tr}\left[ P_v^\bot \otimes P_v^{\otimes n} \; \mathrm {tr}_{I Z} {\hat{V}}_{n+1} \Lambda _{n} {\hat{V}}_{n} \dots \Lambda _{1} {\hat{V}}_{1}(\rho _0^T)\right] \\&= \sum _{n=0}^{N-1} \mathrm {tr}\left[ P_v^\bot \mathrm {tr}_{I Z}( {\hat{V}}_{n+1} ( \Lambda _{n} ( \mathrm {tr}_{E_n} ( (P_v\otimes \mathbbm {1}) {\hat{V}}_{n}( \dots \mathrm {tr}_{E_1}( (P_v\otimes \mathbbm {1}){\hat{V}}_1(\rho _0^T)) \dots )))))\right] \\&= \sum _{n=0}^{N-1} \mathrm {tr}\left[ P_v^\bot \mathrm {tr}_I {\hat{V}}(\mathrm {tr}_Z(\Lambda _n T^\downarrow \Lambda _{n-1} T^\downarrow \dots \Lambda _{1} T^\downarrow (\rho _0^T)))\right] \\&= \sum _{n=0}^{N-1} \mathrm {tr}\left[ P_v^\bot \mathrm {tr}_I {\hat{V}}(\mathrm {tr}_{Z}\left[ \rho _n^{T^\downarrow }\right] )\right] . \end{aligned}$$

In the second to last line, we defined \(T^\downarrow (\cdot ) = \mathrm {tr}_{E}\left[ (P_v\otimes \mathbbm {1}) V \cdot V^\dagger \right] \) and \(\rho _n^{T^\downarrow }\) is determined by the intermediate state map. We have thus succeeded in our goal to define the “interaction” probability.

Remark 3.1

It is immediate from (3.2) that an alternative expression for \(P_I^T(D)\) is given by

$$\begin{aligned} P_I^T(D) = 1-\mathrm {tr}\left[ \rho ^{T^\downarrow }_N\right] . \end{aligned}$$
(3.3)

There are two reasons to prefer the lengthy version derived above. First, it makes the connection between the “interaction” model and the transmission model (defined below) explicit and thus allows us to treat these points of view on an equal footing. Second, it suggests approaching the problem by looking at the inputs of the individual channel uses, which turns out to be fruitful.
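To illustrate that the sum form and the form (3.3) agree, the following Python sketch evaluates both for a toy version of the Zeno-type bomb tester. Everything specific in it is an assumption made for illustration: the qubit probe with vacuum \(|0\rangle\), the Stinespring isometry of a fully absorbing "bomb", the trivial ancilla Z, and the choice of every \(\Lambda _n\) as a rotation by the angle \(\pi /(2N)\).

```python
import numpy as np

# Toy "bomb" channel on a qubit H = span{|0>, |1>} with vacuum v = |0> (an assumption).
# Stinespring isometry V: H -> H_E (x) H with V|0> = e0 (x) |0> and V|1> = e1 (x) |0>,
# i.e. the photon |1> is absorbed and leaves a mark e1 in the environment.
dE, dH = 2, 2
V = np.zeros((dE * dH, dH))
V[0 * dH + 0, 0] = 1.0
V[1 * dH + 0, 1] = 1.0

P_v = np.diag([1.0, 0.0])          # projection onto supp tr_H[V|v><v|V^dagger] = span{e0}
P_v_perp = np.eye(dE) - P_v

def dilate(rho):
    """V rho V^dagger as a tensor with indices (e, h, e', h')."""
    return (V @ rho @ V.conj().T).reshape(dE, dH, dE, dH)

def interaction_functional(rho):
    """i_T(rho) = tr[P_v_perp tr_H[V rho V^dagger]]."""
    env = np.einsum('ahbh->ab', dilate(rho))
    return np.trace(P_v_perp @ env).real

def T_down(rho):
    """T_down(rho) = tr_E[(P_v (x) 1) V rho V^dagger]."""
    return np.einsum('ab,bhak->hk', P_v, dilate(rho))

def interaction_probability(N):
    """Return (sum form, 1 - tr[rho_N]) of P_I for the rotation strategy."""
    theta = np.pi / (2 * N)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    rho = R @ np.diag([1.0, 0.0]) @ R.T        # rho_0 = Lambda_0(|v><v|)
    total = 0.0
    for _ in range(N):
        total += interaction_functional(rho)   # contribution of the n-th channel use
        rho = R @ T_down(rho) @ R.T            # next intermediate state
    return total, 1.0 - np.trace(rho).real

for N in (1, 5, 50):
    print(N, interaction_probability(N))
```

In this toy example both expressions reduce to \(1 - \cos ^{2N}(\pi /(2N))\), which behaves like \(\pi ^2/(4N)\) for large N.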

3.2 The Transmission Model

In our second model, we think of an interaction as something that does damage to the system in the box. As a guiding example, we think of a biological system, say a body cell. For the sake of argument, assume that we want to use high-energy radiation (e.g., X-rays) to resolve the inner structure of the cell. Of course, radiation might damage the cell, which is usually undesirable. A reasonable measure for how much damage has been done to a cell seems to be the number of X-ray photons that were absorbed by the cell. In other words, the damage is quantified by the amount of energy that got transmitted from the probe system (X-ray) to the interior of the box (biological cell). Furthermore, if the cell is exposed to radiation several times, then the damage measure should be the sum of the number of photons that were absorbed each time. Let us now abstract away from this example. Assume that the system in the box is modeled quantum mechanically on a Hilbert space \({\mathcal {H}}_E\) and that the probe system is modeled on \({\mathcal {H}}_I\). Assume that initially the system E is in the state \(\rho _E \in {\mathcal {S}}({\mathcal {H}}_E)\). If we probe the system with a state \(\rho _I \in {\mathcal {S}}({\mathcal {H}}_I)\), then the combined evolution is described by a (not necessarily unitary) channel \(U : {\mathcal {B}}_1({\mathcal {H}}_E\otimes {\mathcal {H}}_I) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_E\otimes {\mathcal {H}}_I)\). Thus, the state of the combined system after the evolution is given by

$$\begin{aligned} \rho ^\prime _{E I} = U(\rho _E\otimes \rho _I). \end{aligned}$$

Now assume that, in analogy to the number of absorbed photons in the example above, there is some physical quantity (an observable) that got transmitted from the probe system to the interior of the box by the above process, and that this quantity is related to the damage done to the object in the box. We further assume that the process above can only cause damage and cannot repair the system in the box. Thus, the observable must be a positive semi-definite operator \(\Theta \) on the Hilbert space \({\mathcal {H}}_E\). Hence, for a single shot experiment, the important object is the positive linear functional \({\mathfrak {t}} : {\mathcal {B}}_1({\mathcal {H}}_I) \rightarrow {\mathbb {C}}\), defined by

$$\begin{aligned} {\mathfrak {t}}(\rho _I) = \mathrm {tr}\left[ \Theta \, \mathrm {tr}_{I}\left[ U(\rho _E\otimes \rho _I)\right] \right] . \end{aligned}$$

For a general N-step discrimination strategy D (with intermediate state map \(\rho \)), we assume that the transmitted quantity is extensive. Since the state of the part of the probe system that interacts with the interior of the box in the nth step is given by \(\mathrm {tr}_{Z}\left[ \rho _n^T\right] \) (T is the channel defined by \(T(\rho _I) = \mathrm {tr}_{E}\left[ U(\rho _E\otimes \rho _I)\right] \)), a good definition for the total transmission \({\mathfrak {T}}_T(D)\) is

$$\begin{aligned} {\mathfrak {T}}_T(D) := \sum _{n = 0}^{N-1} {\mathfrak {t}}_T\left( \mathrm {tr}_{Z}\left[ \rho _n^{T}\right] \right) . \end{aligned}$$

We raise this to a principle by assuming that for every channel T we have a positive linear functional \({\mathfrak {t}}_T\), which we call the transmission functional, that models the damage done to the object. The total transmission then plays the same role for the transmission model as the “interaction” probability does for the “interaction” model.

3.3 Formal Definition

We cast the principles developed in the last sections into formal definitions.

Definition 3.2

(“Interaction” functional). Let \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel with vacuum \(v \in {\mathcal {H}}\) and let \(V : {\mathcal {H}} \rightarrow {\mathcal {H}}_E \otimes {\mathcal {H}}\) be any Stinespring isometry of T. The positive linear functional \({\mathfrak {i}}_T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathbb {C}}\), defined by

$$\begin{aligned} {\mathfrak {i}}_T(\cdot ) := \mathrm {tr}\left[ P_v^\bot \mathrm {tr}_{{\mathcal {H}}}\left[ V \cdot V^\dagger \right] \right] , \end{aligned}$$
(3.4)

is called the “interaction” functional of T, where \(P_v^\bot \) is the orthogonal projection onto the kernel of \(\mathrm {tr}_{{\mathcal {H}}}\left[ V {|}v{\rangle }{\langle }v{|} V^\dagger \right] \).

Definition 3.3

(“Interaction” probability). Let \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel with vacuum \(v \in {\mathcal {H}}\) and let \(D = ({\mathcal {H}}, {\mathcal {H}}_Z, {\mathcal {H}}_i, {\mathcal {H}}_o, s_0, \Lambda )\) be an N-step discrimination strategy. The “interaction” probability is defined by

$$\begin{aligned} P_I^T(D) := \sum _{n = 0}^{N-1} {\mathfrak {i}}_T\left( \mathrm {tr}_{Z}\left[ \rho _n^{T^\downarrow }\right] \right) , \end{aligned}$$
(3.5)

where the quantum operation \(T^\downarrow : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) is defined by

$$\begin{aligned} T^\downarrow (\cdot ) = \mathrm {tr}_{E}\left[ (P_v\otimes \mathbbm {1}) V \cdot V^\dagger \right] , \end{aligned}$$
(3.6)

and where \(V: {\mathcal {H}} \rightarrow {\mathcal {H}}_E \otimes {\mathcal {H}}\) is any Stinespring isometry of T and \(P_v\) is the orthogonal projection onto the support of \(\mathrm {tr}_{{\mathcal {H}}}\left[ V {|}v{\rangle }{\langle }v{|} V^\dagger \right] \).

Definition 3.4

(“Interaction-free” discrimination). Let \(v \in {\mathcal {H}}\) and \({\mathcal {C}}_A, {\mathcal {C}}_B \subseteq {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) be two sets of channels such that for all \(T \in {\mathcal {C}}_A \cup {\mathcal {C}}_B\), T is a channel with vacuum v. We say that \({\mathcal {C}}_A\) and \({\mathcal {C}}_B\) can be discriminated in an “interaction-free” manner if for every \(\epsilon , \delta > 0\) there exists an N-step discrimination strategy D and a two-valued POVM \(\Pi \) such that

$$\begin{aligned} P_e(D, \Pi )< \epsilon \quad \text { and }\quad P_I^T(D) < \delta , \end{aligned}$$
(3.7)

for all \(T \in {\mathcal {C}}_A \cup {\mathcal {C}}_B\).

Definition 3.5

(Channel with transmission functional). A channel with transmission functional \({\mathfrak {t}}_T\) is a channel \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) together with a positive linear functional \({\mathfrak {t}}_T \in \left( {\mathcal {B}}_1({\mathcal {H}})\right) ^*\). We call \({\mathfrak {t}}_T\) the transmission functional.

Definition 3.6

(Total transmission) Let \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel with transmission functional \({\mathfrak {t}}_T\). For an N-step discrimination strategy \(D = ({\mathcal {H}}, {\mathcal {H}}_Z, {\mathcal {H}}_i, {\mathcal {H}}_o, s_0, \Lambda )\), the total transmission is defined by

$$\begin{aligned} {\mathfrak {T}}_T(D) := \sum _{n = 0}^{N-1} {\mathfrak {t}}_T\left( \mathrm {tr}_{Z}\left[ \rho _n^{T}\right] \right) . \end{aligned}$$
(3.8)

Definition 3.7

(Transmission-free discrimination). Let \({\mathcal {C}}_A, {\mathcal {C}}_B \subseteq {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) be two sets of channels such that for all \(T \in {\mathcal {C}}_A \cup {\mathcal {C}}_B\), T is a channel with transmission functional \({\mathfrak {t}}_T\). We say that \({\mathcal {C}}_A\) and \({\mathcal {C}}_B\) can be discriminated in a transmission-free manner if for every \(\epsilon , \delta > 0\) there exists an N-step discrimination strategy D and a two-valued POVM \(\Pi \) such that

$$\begin{aligned} P_e(D, \Pi )< \epsilon \quad \text { and }\quad {\mathfrak {T}}_T(D) < \delta , \end{aligned}$$
(3.9)

for all \(T \in {\mathcal {C}}_A \cup {\mathcal {C}}_B\).

3.4 Comparison of the Models and Elementary Properties

In this section, we clarify the relation between the transmission model and the “interaction” model. As a rule of thumb, the transmission model can be thought of as a generalization of the “interaction” model. Since we admit arbitrary positive linear functionals as transmission functionals, we have a much greater flexibility when modeling. For example, one could decide that out of the two objects to be discriminated, it does not matter (or is even desirable) if the second one gets destroyed. We should therefore set the transmission functional of the second channel to zero. This is something that is not possible in the “interaction” model. On the other hand, the advantage of the “interaction” model is that the “interaction” probability has a very clear interpretation and that the “interaction” functional is an intrinsic property of the channel. For the relation between these models, we note the following lemma.

Lemma 3.8

Let \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel with vacuum \(v \in {\mathcal {H}}\) and let \({\mathfrak {i}}_T\) be its “interaction” functional. If we interpret T as a channel with transmission functional \({\mathfrak {i}}_{T}\), then

$$\begin{aligned} P_I^{T}(D) \le {\mathfrak {T}}_{T}(D), \end{aligned}$$
(3.10)

for all N-step discrimination strategies D.

Proof

Immediate from the definition, since (by induction) \(\rho ^{T^\downarrow }_i \le \rho ^{T}_i\). \(\square \)
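The inequality (3.10) can be observed numerically on a toy example. The Python sketch below is an illustration under our own assumptions: a qubit probe with vacuum \(|0\rangle\), a fully absorbing channel given by a hypothetical Stinespring isometry, a trivial ancilla Z, and rotations by \(\pi /(2N)\) as the maps \(\Lambda _n\). It propagates the intermediate states once with \(T^\downarrow \) and once with T, and evaluates the same functional \({\mathfrak {i}}_T\) on both sequences.

```python
import numpy as np

# Toy absorbing channel on a qubit with vacuum |0> (all choices here are assumptions).
dE, dH = 2, 2
V = np.zeros((dE * dH, dH))
V[0, 0] = 1.0                       # V|0> = e0 (x) |0>
V[2, 1] = 1.0                       # V|1> = e1 (x) |0>
P_v = np.diag([1.0, 0.0])
P_v_perp = np.eye(dE) - P_v

def dilate(rho):
    """V rho V^dagger with indices (e, h, e', h')."""
    return (V @ rho @ V.conj().T).reshape(dE, dH, dE, dH)

def i_T(rho):
    """'Interaction' functional tr[P_v_perp tr_H[V rho V^dagger]]."""
    return np.trace(P_v_perp @ np.einsum('ahbh->ab', dilate(rho))).real

def T(rho):
    """The channel T(rho) = tr_E[V rho V^dagger]."""
    return np.einsum('ahak->hk', dilate(rho))

def T_down(rho):
    """The quantum operation tr_E[(P_v (x) 1) V rho V^dagger]."""
    return np.einsum('ab,bhak->hk', P_v, dilate(rho))

def compare(N):
    """Return (P_I, total transmission with t_T = i_T) for the rotation strategy."""
    theta = np.pi / (2 * N)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    rho_down = rho_full = R @ np.diag([1.0, 0.0]) @ R.T
    p_int, transmission = 0.0, 0.0
    for _ in range(N):
        p_int += i_T(rho_down)
        transmission += i_T(rho_full)
        rho_down = R @ T_down(rho_down) @ R.T
        rho_full = R @ T(rho_full) @ R.T
    return p_int, transmission

for N in (1, 2, 10, 100):
    print(N, compare(N))
```

For this example the total transmission equals \(N \sin ^2(\pi /(2N))\), while the “interaction” probability equals \(1 - \cos ^{2N}(\pi /(2N))\), so the gap between the two models is visible already for N = 2.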

The insight that should be gained from this lemma is that if we want to prove that a certain discrimination task can be done in an “interaction-free” or in a transmission-free manner, then it suffices to tackle the problem in the transmission model. Thus, the results in Sect. 4 will be formulated in terms of the transmission model. On the other hand, if we want to prove a no-go theorem, then it is sufficient to work in the “interaction” model. At this point, there is a little detail that we do not want to hide, which is that it is possible that certain discrimination tasks can be performed with fewer resources if one works in the “interaction” model and not in the transmission model. We will not investigate this possibility any further. We close this section by introducing the concept of a maximal vacuum subspace.

Definition 3.9

(Maximal vacuum subspace). Let \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel with vacuum \(v \in {\mathcal {H}}\) and let \(V : {\mathcal {H}} \rightarrow {\mathcal {H}}_E \otimes {\mathcal {H}}\) be any Stinespring isometry of T. The subspace \({\mathcal {V}}_T\) of \({\mathcal {H}}\), defined by (see Footnote 11)

$$\begin{aligned} {\mathcal {V}}_T := V^{-1} \left[ \mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}\left[ V {|}v{\rangle }{\langle }v{|} V^\dagger \right] ) \otimes {\mathcal {H}} \right] , \end{aligned}$$
(3.11)

is called the maximal vacuum subspace of T.
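In finite dimensions, (3.11) says that \(\phi \in {\mathcal {V}}_T\) if and only if \((P_v^\bot \otimes \mathbbm {1}) V \phi = 0\), so \({\mathcal {V}}_T\) can be computed as a null space. The Python sketch below does this with a singular value decomposition. The function name `maximal_vacuum_subspace`, the tolerance, and the three-level example channel are hypothetical choices made for illustration only.

```python
import numpy as np

def maximal_vacuum_subspace(V, v, dE, dH, tol=1e-10):
    """Orthonormal basis (columns) of V_T for a Stinespring isometry V: H -> H_E (x) H."""
    Vv = (V @ v).reshape(dE, dH)
    env = Vv @ Vv.conj().T                       # tr_H[V|v><v|V^dagger]
    w, U = np.linalg.eigh(env)
    S = U[:, w > tol]                            # basis of the support of env
    P_v_perp = np.eye(dE) - S @ S.conj().T
    M = np.kron(P_v_perp, np.eye(dH)) @ V        # phi is in V_T iff M phi = 0
    _, s, Vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vh[rank:].conj().T                    # right singular vectors of the kernel

# Hypothetical example: levels |0> and |1> pass undisturbed, level |2> is absorbed.
dE, dH = 2, 3
V = np.zeros((dE * dH, dH))
V[0 * dH + 0, 0] = 1.0                           # V|0> = e0 (x) |0>
V[0 * dH + 1, 1] = 1.0                           # V|1> = e0 (x) |1>
V[1 * dH + 0, 2] = 1.0                           # V|2> = e1 (x) |0>
v = np.array([1.0, 0.0, 0.0])

basis = maximal_vacuum_subspace(V, v, dE, dH)
projector = basis @ basis.conj().T
print(np.round(projector.real, 6))
```

For this example the computed projection is the one onto the span of \(|0\rangle\) and \(|1\rangle\), which is the largest subspace containing the vacuum on which the toy channel acts isometrically.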

Lemma 3.10

(Properties of maximal vacuum subspaces). For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel with vacuum \(v \in {\mathcal {H}}\). The maximal vacuum subspace \({\mathcal {V}}_T\) has the following properties:

  1. \(v \in {\mathcal {V}}_T\).

  2. T is isometric on \({\mathcal {V}}_T\).

  3. If T is isometric on a subspace \({\mathcal {V}}^\prime \subseteq {\mathcal {H}}\), then either \({\mathcal {V}}_T \cap {\mathcal {V}}^\prime = \{0\}\) or \({\mathcal {V}}^\prime \subseteq {\mathcal {V}}_T\).

  4. \({\mathcal {V}}_T\) is the union of all subspaces that contain v and on which T is isometric.

  5. There exists a constant \(C_T > 0\) such that \({\mathfrak {i}}_T(\rho ) \ge C_T \mathrm {tr}\left[ P^\bot \rho \right] \) for all \(\rho \ge 0\), where \(P^\bot \) is the projection onto \({\mathcal {V}}_T^\bot \).

  6. For all \(\rho \ge 0\), we have \({\mathfrak {i}}_T(\rho ) \le \mathrm {tr}\left[ P^\bot \rho \right] \), where \(P^\bot \) is the projection onto \({\mathcal {V}}_T^\bot \).

Remark 3.11

The Claims 1–4 and 6 remain true if one lifts the assumption that \({\mathcal {H}}\) is finite-dimensional. Claim 5, however, would then be wrong.
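Claims 5 and 6 together state that \({\mathfrak {i}}_T(\rho )\) is comparable to the weight of \(\rho \) outside \({\mathcal {V}}_T\). The Python sketch below checks the two-sided bound on random states for a toy channel. The channel (a three-level system whose level \(|2\rangle\) is absorbed only with probability \(1-a\)), the value of `a`, and the fact that \({\mathcal {V}}_T\) is spanned by \(|0\rangle\) and \(|1\rangle\) in this example are assumptions of the illustration, not statements from the lemma.

```python
import numpy as np

a = 0.36                                   # hypothetical survival probability of level |2>
dE, dH = 2, 3
V = np.zeros((dE * dH, dH))
V[0 * dH + 0, 0] = 1.0                     # V|0> = e0 (x) |0>
V[0 * dH + 1, 1] = 1.0                     # V|1> = e0 (x) |1>
V[0 * dH + 2, 2] = np.sqrt(a)              # V|2> = (sqrt(a) e0 + sqrt(1-a) e1) (x) |2>
V[1 * dH + 2, 2] = np.sqrt(1.0 - a)

P_hat_perp = np.kron(np.diag([0.0, 1.0]), np.eye(dH))   # (1 - |e0><e0|) (x) 1
A = V.T @ P_hat_perp @ V                   # i_T(rho) = tr[A rho]
P_perp = np.diag([0.0, 0.0, 1.0])          # projection onto V_T^perp in this example
C_T = A[2, 2]                              # smallest eigenvalue of A on V_T^perp (1-dim here)

rng = np.random.default_rng(0)

def random_state(d):
    """A random density matrix G G^dagger / tr[G G^dagger]."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def bounds_hold(rho, eps=1e-12):
    """Check C_T tr[P_perp rho] <= i_T(rho) <= tr[P_perp rho]."""
    i_val = np.trace(A @ rho).real
    weight = np.trace(P_perp @ rho).real
    return C_T * weight - eps <= i_val <= weight + eps

all_ok = all(bounds_hold(random_state(dH)) for _ in range(200))
print(C_T, all_ok)
```

In this example the lower bound is attained with \(C_T = 1 - a\), which shows that the constant in Claim 5 depends on the channel and can be arbitrarily small.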

Proof

We start with the following observation: Let \(V : {\mathcal {H}} \rightarrow {\mathcal {H}}_E \otimes {\mathcal {H}}\) be any Stinespring isometry of T. Since \(T({|}v{\rangle }{\langle }v{|})\) is pure, Vv must be a tensor product. Thus, there are two unit vectors \(v^\prime \in {\mathcal {H}}\) and \(e \in {\mathcal {H}}_E\) such that

$$\begin{aligned} Vv = e\otimes v^\prime . \end{aligned}$$

Hence, \(\mathrm {tr}_{{\mathcal {H}}}\left[ V{|}v{\rangle }{\langle }v{|}V^\dagger \right] = {|}e{\rangle }{\langle }e{|}\) and

$$\begin{aligned} \mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}\left[ V{|}v{\rangle }{\langle }v{|}V^\dagger \right] ) = \mathrm {span}\{e\}. \end{aligned}$$
(3.12)

(1) Clearly, \(V v \in \mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}\left[ V{|}v{\rangle }{\langle }v{|}V^\dagger \right] ) \otimes {\mathcal {H}}\). Thus, \(v \in V^{-1}\left[ V v\right] \subseteq {\mathcal {V}}_T\).

(2) For \(\phi \in {\mathcal {V}}_T\), we have \(V\phi = e\otimes \psi _\phi \) for a uniquely defined \(\psi _\phi \in {\mathcal {H}}\). We define \(U : {\mathcal {V}}_T \rightarrow {\mathcal {H}}\) by \(U\phi := \psi _\phi \). It is easy to check that U is an isometry and that \(T({|}\phi {\rangle }{\langle }\phi {|}) = U{|}\phi {\rangle }{\langle }\phi {|}U^\dagger \). Since this holds for all \(\phi \in {\mathcal {V}}_T\), T is isometric on \({\mathcal {V}}_T\).

(3) Suppose that T is isometric on \({\mathcal {V}}^\prime \), with isometry \(U^\prime : {\mathcal {V}}^\prime \rightarrow {\mathcal {H}}\). If \(\mathrm {dim}({\mathcal {V}}^\prime ) \le 1\) then the claim is trivially true. So we can assume that \(\mathrm {dim}({\mathcal {V}}^\prime ) \ge 2\). Let \(v_1\) and \(v_2\) be two orthogonal unit vectors in \({\mathcal {V}}^\prime \). By assumption,

$$\begin{aligned} T({|}v_i{\rangle }{\langle }v_i{|}) = \mathrm {tr}_{E}\left[ V{|}v_i{\rangle }{\langle }v_i{|}V^\dagger \right] = U^\prime {|}v_i{\rangle }{\langle }v_i{|}U^{\prime \dagger }, \end{aligned}$$

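for \(i \in \{1, 2\}\). As \(U^\prime {|}v_i{\rangle }{\langle }v_i{|}U^{\prime \dagger }\) is pure, there exists a pair of unit vectors \(e_1, e_2 \in {\mathcal {H}}_E\) such that \(Vv_i = e_i \otimes U^\prime v_i\). By linearity, we have

$$\begin{aligned} U^\prime {|}v_1 + v_2{\rangle }{\langle }v_1 + v_2{|}U^{\prime \dagger } = \mathrm {tr}_{E}\left[ V{|}v_1 + v_2{\rangle }{\langle }v_1 + v_2{|}V^\dagger \right] = \sum _{i,j = 1}^{2} {\langle }e_j{|}e_i{\rangle }\, U^\prime {|}v_i{\rangle }{\langle }v_j{|}U^{\prime \dagger }. \end{aligned}$$

This can only be true if \({\langle }e_1{|}e_2{\rangle } = 1\), which is true only if \(e_1 = e_2\). Thus, by transitivity, there is a unit vector \(e^\prime \in {\mathcal {H}}_E\) such that \(V v^\prime = e^\prime \otimes U^\prime v^\prime \) for all \(v^\prime \in {\mathcal {V}}^\prime \). With the definition of U in the proof of 2, we also have \(V\phi = e\otimes U\phi \) for all \(\phi \in {\mathcal {V}}_T\). Assume that \({\mathcal {V}}_T \cap {\mathcal {V}}^\prime \ne \{0\}\). For a unit vector \({\hat{v}} \in {\mathcal {V}}_T \cap {\mathcal {V}}^\prime \), the Cauchy–Schwarz inequality yields

$$\begin{aligned} 1 = {\langle }V{\hat{v}}{|}V{\hat{v}}{\rangle } = {\langle }e\otimes U{\hat{v}}{|}e^\prime \otimes U^\prime {\hat{v}}{\rangle } = {\langle }e{|}e^\prime {\rangle }\,{\langle }U{\hat{v}}{|}U^\prime {\hat{v}}{\rangle } \le \left| {\langle }e{|}e^\prime {\rangle }\right| \le \left\Vert e\right\Vert \left\Vert e^\prime \right\Vert = 1. \end{aligned}$$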

Hence, the Cauchy–Schwarz inequality is satisfied with equality, which implies that the vectors e and \(e^\prime \) differ only by a phase factor. In particular, \(\mathrm {span}\{e\} = \mathrm {span}\{e^\prime \}\). Using (3.12), we have for any \(v^\prime \in {\mathcal {V}}^\prime \) that \(Vv^\prime = e^\prime \otimes U^\prime v^\prime \in \mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}\left[ V {|}v{\rangle }{\langle }v{|} V^\dagger \right] ) \otimes {\mathcal {H}}\). Consequently, \(v^\prime \in {\mathcal {V}}_T\). As \(v^\prime \) was arbitrary, this proves \({\mathcal {V}}^\prime \subseteq {\mathcal {V}}_T\) as claimed.

4) If an isometric subspace \({\mathcal {V}}^\prime \) contains v, then (by 1) the intersection with \({\mathcal {V}}_T\) is non-trivial. Thus, (by 3) \({\mathcal {V}}^\prime \) is a subspace of \({\mathcal {V}}_T\). Hence, \({\mathcal {V}}_T\) contains all isometric subspaces and the claim follows as (by 2) \({\mathcal {V}}_T\) is isometric itself.

The following consideration is needed in the proof of 5 as well as in the proof of 6. We define the projections \({\hat{P}} := P_v\otimes \mathbbm {1}\) and \({\hat{P}}^\bot := \mathbbm {1}- {\hat{P}}\), where \(P_v := {|}v{\rangle }{\langle }v{|}\). We further denote by P the orthogonal projection onto \({\mathcal {V}}_T\) and define \(P^\bot := \mathbbm {1}- P\). In the following, let \(\rho \ge 0\). By definition, we have

$$\begin{aligned} {\mathfrak {i}}_T(\rho )&= \mathrm {tr}\left[ P_v^\bot \mathrm {tr}_{{\mathcal {H}}}\left[ V\rho V^\dagger \right] \right] \\&= \mathrm {tr}\left[ {\hat{P}}^\bot V\rho V^\dagger \right] \\&= \mathrm {tr}\left[ {\hat{P}}^\bot V P\rho P V^\dagger \right] + \mathrm {tr}\left[ {\hat{P}}^\bot V P\rho P^\bot V^\dagger \right] \\&\quad + \mathrm {tr}\left[ {\hat{P}}^\bot V P^\bot \rho P V^\dagger \right] + \mathrm {tr}\left[ {\hat{P}}^\bot V P^\bot \rho P^\bot V^\dagger \right] . \end{aligned}$$

By definition, if \(\psi \in {\mathcal {V}}_T\), then \({\hat{P}}^\bot V \psi = 0\). Thus, \({\hat{P}}^\bot V P = 0\) as an operator. Hence, all summands except the last one vanish. Thus, we have

$$\begin{aligned} {\mathfrak {i}}_T(\rho ) = \mathrm {tr}\left[ {\hat{P}}^\bot V P^\bot \rho P^\bot V^\dagger \right] = \mathrm {tr}\left[ V^\dagger {\hat{P}}^\bot V P^\bot \rho P^\bot \right] . \end{aligned}$$
(3.13)

We can now prove 5. To this end, note that if \(\mathrm {tr}\left[ P^\bot \rho P^\bot \right] = 0\), then the claim follows trivially. Otherwise, \(\frac{P^\bot \rho P^\bot }{\mathrm {tr}\left[ P^\bot \rho P^\bot \right] }\) is a density matrix and the spectral theorem implies that

$$\begin{aligned} \frac{P^\bot \rho P^\bot }{\mathrm {tr}\left[ P^\bot \rho P^\bot \right] } = \sum _i p_i {|}\psi ^\bot _i{\rangle }{\langle }\psi ^\bot _i{|}, \end{aligned}$$

with \(p_i \ge 0\), \(\sum _i p_i = 1\) and \(\psi _i^\bot \in {\mathcal {V}}_T^\bot \). By convexity, we have

$$\begin{aligned} \mathrm {tr}\left[ {\hat{P}}^\bot V P^\bot \rho P^\bot V^\dagger \right]&= \mathrm {tr}\left[ P^\bot \rho \right] \mathrm {tr}\left[ {\hat{P}}^\bot V \frac{P^\bot \rho P^\bot }{\mathrm {tr}\left[ P^\bot \rho P^\bot \right] } V^\dagger \right] \\&\ge \mathrm {tr}\left[ P^\bot \rho \right] \inf _{\begin{array}{c} \psi ^\bot \in {\mathcal {V}}_T^\bot \\ \left\Vert \psi ^\bot \right\Vert = 1 \end{array}} \mathrm {tr}\left[ {\hat{P}}^\bot V {|}\psi ^\bot {\rangle }{\langle }\psi ^\bot {|} V^\dagger \right] . \end{aligned}$$

If the infimum is strictly positive, then this is the \(C_T\) we are looking for. To see that this is indeed the case, note that the set \(\{\psi ^\bot \in {\mathcal {V}}_T^\bot \mid \left\Vert \psi ^\bot \right\Vert = 1\}\) is compact. Thus, the infimum is actually a minimum. Assume for the sake of contradiction that \(\mathrm {tr}\left[ {\hat{P}}^\bot V {|}\psi ^\bot {\rangle }{\langle }\psi ^\bot {|} V^\dagger \right] = 0\), for some unit vector \(\psi ^\bot \in {\mathcal {V}}_T^\bot \). Then \(\left\Vert {\hat{P}}^\bot V \psi ^\bot \right\Vert ^2 = {\langle }V\psi ^\bot , {\hat{P}}^\bot V\psi ^\bot {\rangle } = 0\) and consequently \({\hat{P}}^\bot V \psi ^\bot = 0\). Hence, \(V\psi ^\bot \in \mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}\left[ V {|}v{\rangle }{\langle }v{|} V^\dagger \right] ) \otimes {\mathcal {H}}\) and \(\psi ^\bot \in {\mathcal {V}}_T\). As this is a contradiction, the claim follows.

To prove 6, we use Hölder’s inequality for Schatten norms. Applying this inequality to the RHS of (3.13) yields

$$\begin{aligned} {\mathfrak {i}}_T(\rho ) \le \left\Vert V^\dagger {\hat{P}}^\bot V \right\Vert _\infty \left\Vert P^\bot \rho P^\bot \right\Vert _1 = \mathrm {tr}\left[ P^\bot \rho \right] . \end{aligned}$$

The last equality follows, since \(V^\dagger {\hat{P}}^\bot V\) is an orthogonal projection (and thus has norm 1) and since \(P^\bot \rho P^\bot \ge 0\). This proves the claim. \(\square \)

Remark 3.12

Since by the previous theorem, every subspace that is isometric w.r.t. T and contains the vacuum is contained in \({\mathcal {V}}_T\), checking the conditions in Theorem 2.2 reduces to checking whether

$$\begin{aligned} T_A\vert _{{\mathcal {B}}({\mathcal {V}}_{T_A})} \ne T_B\vert _{{\mathcal {B}}({\mathcal {V}}_{T_A})} \quad \mathrm {or} \quad T_A\vert _{{\mathcal {B}}({\mathcal {V}}_{T_B})} \ne T_B\vert _{{\mathcal {B}}({\mathcal {V}}_{T_B})}. \end{aligned}$$
(3.14)

This can be done efficiently, since \({\mathcal {V}}_{T_A}\) and \({\mathcal {V}}_{T_B}\) can be computed by simple linear algebraic methods.
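As an illustration of the linear algebra involved, the following is a minimal sketch, assuming the characterization \({\mathcal {V}}_T = \{\psi \mid V\psi \in \mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}[V{|}v{\rangle }{\langle }v{|}V^\dagger ]) \otimes {\mathcal {H}}\}\) used in the proofs above and a channel given by finitely many Kraus operators. The function name `isometric_subspace`, the tolerance and the qutrit example are hypothetical choices made only for this sketch.

```python
import numpy as np

def isometric_subspace(kraus, v, tol=1e-10):
    """Orthonormal basis (as columns) of {psi : V psi in supp(sigma_E) (x) H}."""
    d = kraus[0].shape[1]
    # Stinespring isometry V = sum_i e_i (x) K_i, stored as stacked Kraus blocks.
    V = np.vstack(kraus)
    # Environment state sigma_E = tr_H[V |v><v| V^dagger], entries <K_j v, K_i v>.
    Kv = np.array([K @ v for K in kraus])          # row i holds K_i v
    sigma_E = Kv @ Kv.conj().T
    # Q projects onto the orthogonal complement of supp(sigma_E).
    w, U = np.linalg.eigh(sigma_E)
    S = U[:, w > tol]
    Q = np.eye(len(kraus)) - S @ S.conj().T
    # The subspace is the kernel of (Q (x) 1) V; read it off from the SVD.
    M = np.kron(Q, np.eye(d)) @ V
    _, s, Vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vh[rank:].conj().T

# Hypothetical example: a qutrit channel that acts trivially on span{|0>, |1>}
# and damps |2> into |0> with probability g.
g = 0.3
K0 = np.diag([1.0, 1.0, np.sqrt(1 - g)])
K1 = np.zeros((3, 3))
K1[0, 2] = np.sqrt(g)
v = np.array([1.0, 0.0, 0.0])

basis = isometric_subspace([K0, K1], v)
print(basis.shape[1])   # dimension of the computed subspace
```

In this example the computed subspace is spanned by the first two basis vectors, on which the channel indeed acts as the identity. Condition (3.14) could then be tested by comparing the two channels on matrix units built from such a basis.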

4 The Discrimination Protocol

The main goal of this section is to prove Theorem 2.5. This is done in two steps. At first, we show how to discriminate between the identity channel and a compact set of channels, where some additional conditions are imposed on the channels under consideration. In particular, we obtain the following theorem.

Theorem 4.1

For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \({\mathcal {C}} \subseteq {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) be a closed set of channels and let \(v \in {\mathcal {H}}\) be a unit vector such that for all \(T \in {\mathcal {C}}\), the state \({|}v{\rangle }{\langle }v{|}\) is the only state that is a fixed point of T. Then, there exists a constant C and for every \(N \in {\mathbb {N}}\) an N-step discrimination strategy D and a two-valued POVM \(\Pi \) such that

$$\begin{aligned} P_e(D, \Pi ) \le \frac{C}{N^2}, \end{aligned}$$
(4.1)

where the discrimination error probability is w.r.t. the sets \(\{\mathrm {id}\}\) and \({\mathcal {C}}\).

Furthermore, if \(T \in {\mathcal {C}}\) is a channel with transmission functional \({\mathfrak {t}}_{T}\) and \({\mathfrak {t}}_{T}({|}v{\rangle }{\langle }v{|}) = 0\), then the total transmission \({\mathfrak {T}}_T(D)\) is bounded by

$$\begin{aligned} {\mathfrak {T}}_T(D) \le \frac{C\left\Vert {\mathfrak {t}}_T \right\Vert }{N}. \end{aligned}$$
(4.2)

In particular, if \({\mathfrak {t}}_\mathrm {id}= 0\) and for all \(T \in {\mathcal {C}}\), T is a channel with transmission functional \({\mathfrak {t}}_{T}\), with \({\mathfrak {t}}_{T}({|}v{\rangle }{\langle }v{|}) = 0\); and if \(\sup _{T \in {\mathcal {C}}} \left\Vert {\mathfrak {t}}_T \right\Vert < \infty \), then the sets \(\{\mathrm {id}\}\) and \({\mathcal {C}}\) can be discriminated in a transmission-free manner.

Proof

This statement is a direct consequence of Theorem 4.10 and the discussion in the paragraph “Description of the discrimination strategy.” \(\square \)

The second step then is to show how to reduce the general case to Theorem 4.1. This is the main content of Sect. 4.2, in which we also prove Theorem 2.5.

4.1 Empty or Not?

In this section we study a special case of the general discrimination task. That is, we study the case where we want to discriminate between the identity channel (empty box) and a compact set of channels \({\mathcal {C}} \subseteq {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\), which does not contain the identity channel. We show that under some conditions on the spectrum of the channels in \({\mathcal {C}}\) and on the transmission functionals, a Kwiat et al.-like strategy suffices to perform the task in a transmission-free manner, even if the underlying Hilbert space is infinite-dimensional. In the finite-dimensional case, our considerations reduce to Theorem 4.1. Before we go into detail on what we mean by a Kwiat et al.-like strategy, we give an overview of the additional conditions we impose on the channels in \({\mathcal {C}}\).

Outline of the Assumptions. Our first assumption is that there is a pure state \({|}v{\rangle }{\langle }v{|} \in {\mathcal {S}}({\mathcal {H}})\) (vacuum) that is a fixed point of all channels in \({\mathcal {C}}\) and that the transmission functionals satisfy \({\mathfrak {t}}_T({|}v{\rangle }{\langle }v{|}) = 0\) for all \(T \in {\mathcal {C}}\). As a remark, note that if there were no state \(\rho \in {\mathcal {S}}({\mathcal {H}})\) with \({\mathfrak {t}}_T(\rho ) = 0\) for all \(T \in {\mathcal {C}}\), then, of course, the discrimination task would be impossible. On the other hand, if there exists such a state \(\rho \), then, by the spectral theorem and the linearity and positivity of \({\mathfrak {t}}_T\), there exists a pure state \(\rho _v \in {\mathcal {S}}({\mathcal {H}})\), with \({\mathfrak {t}}_T(\rho _v) = 0\) for all \(T \in {\mathcal {C}}\). But then, if \(\rho _v\) is not a fixed point of T, the discrimination task becomes trivial. Thus, assuming a pure fixed point for the current setting is not a strong assumption.

Our second assumption is that all channels in \({\mathcal {C}}\) have a spectral gap. That is, if we exclude 1 from the spectrum of T, then the remaining part must be contained in a disk of radius less than 1 (remember that since T is a channel, its spectral radius is 1 and 1 is part of the spectrum). In Remark 4.11, we show that the spectral gap assumption cannot be waived completely if a Kwiat et al.-like protocol (defined below) should succeed.

Our third assumption is that the spectral gap assumption is compatible in a certain sense with the discrimination strategy. Expression (4.9) in the statement of Theorem 4.5 makes this statement precise. A sufficient condition for the compatibility assumption to be fulfilled (given our second assumption) is that 1 is a simple eigenvalue of every channel in \({\mathcal {C}}\). This is the content of Theorem 4.6. Furthermore, in the finite-dimensional case our second assumption is automatically fulfilled (given our first assumption) if 1 is a simple eigenvalue of every channel in \({\mathcal {C}}\). This is the content of Theorem 4.10.

Our fourth assumption concerns the relation between the channels in \({\mathcal {C}}\) and their associated transmission functionals. Note that the definition of a transmission functional (Definition 3.5) does not impose such a relation. For our current purpose, however, this is problematic since \(\sup _{T \in {\mathcal {C}}} \left\Vert {\mathfrak {t}}_T \right\Vert \) may be infinite. We will thus assume that \(\sup _{T \in {\mathcal {C}}} \left\Vert {\mathfrak {t}}_T \right\Vert \) is finite. This is a very mild assumption, since it is implied if \({\mathfrak {t}}_T\) depends continuously on T (which is very reasonable on physical grounds). Furthermore, note that if \({\mathfrak {t}}_T\) is an “interaction” functional, then, as a consequence of Claim 6 in Lemma 3.10, we have \(\sup _{T \in {\mathcal {C}}} \left\Vert {\mathfrak {t}}_T \right\Vert \le 1\).

Fig. 6: General form of a Kwiat et al.-like strategy

Description of the Discrimination Strategy. The next step is to design a strategy that allows us to discriminate between the identity channel and \({\mathcal {C}}\). An important factor in designing a strategy is the amount of resources that are needed to implement it. To this end, we show that only a bare minimum is required. Let \(H \in {\mathcal {B}}({\mathcal {H}})\) be a self-adjoint operator such that v is not an eigenvector of \(e^{-iH}\). In other words, we assume that \(C_H := |{\langle }v, e^{-iH}v{\rangle }|\) is strictly less than 1. Then, our strategy is to repeat the N-step discrimination strategy, depicted in Fig. 6, a total of K times. More precisely, upon defining the 1-parameter family of channels \(U_t : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) by \(U_t(\cdot ) = e^{-iHt}\,\cdot \, e^{iHt}\), the discrimination strategy is given by the initial state \(s_0 := {|}v{\rangle }{\langle }v{|}\) and the set of channels \(\Lambda \), with \(\Lambda _i := U_\frac{1}{N}\) for \(0 \le i \le N-1\) and \(\Lambda _N := \mathrm {id}\). After each execution of the discrimination strategy, we perform a measurement described by the two-valued POVM \(\{P^\bot , {|}v{\rangle }{\langle }v{|}\}\), where \(P^\bot := \mathbbm {1}- {|}v{\rangle }{\langle }v{|}\). If all K outcomes correspond to the second event, then we decide that the unknown channel is in \({\mathcal {C}}\); otherwise, we decide that the unknown channel is the identity. Of course, this protocol can be cast into the form of an NK-step discrimination strategy by using an ancillary system and the principle of deferred measurement (see [25], p. 186). We call this strategy \(D_{H,N,K}\). By Definition 2.4, the error probability is then given by

$$\begin{aligned} P_e(D_{H, N, K}, \Pi ) = \frac{1}{2} \left( \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \rho _N^\mathrm {id}\right] ^K + \sup _{T \in {\mathcal {C}}} \left\{ \mathrm {tr}\left[ P^\bot \rho _N^T\right] \sum _{k = 0}^{K-1} \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \rho _N^T\right] ^k \right\} \right) , \end{aligned}$$

where \(\rho \) is the intermediate state map and where \(\Pi \) denotes the measurement scheme described above. Explicitly, we have

$$\begin{aligned} \rho _N^\mathrm {id}= U_{\frac{1}{N}}^N({|}v{\rangle }{\langle }v{|}) = e^{-iH}{|}v{\rangle }{\langle }v{|}e^{iH} \quad \text { and } \quad \rho _N^T = (T\circ U_{\frac{1}{N}})^N({|}v{\rangle }{\langle }v{|}). \end{aligned}$$

In general, this leads to the estimate

$$\begin{aligned} P_e(D_{H, N, K}, \Pi ) \le \frac{1}{2} \left( C_H^{2K} + K \sup _{T\in {\mathcal {C}}} \mathrm {tr}\left[ P^\bot \rho _N^T\right] \right) . \end{aligned}$$
(4.3)
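For completeness, the step from the exact error probability to (4.3) can be spelled out. The following sketch reads \(C_H\) as \(|{\langle }v, e^{-iH}v{\rangle }|\) and introduces the shorthand \(q_T := \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \rho _N^T\right] \in [0, 1]\), which is used only here:

$$\begin{aligned} \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \rho _N^\mathrm {id}\right] ^K&= |{\langle }v, e^{-iH}v{\rangle }|^{2K} = C_H^{2K}, \\ \mathrm {tr}\left[ P^\bot \rho _N^T\right] \sum _{k = 0}^{K-1} q_T^k&\le K\, \mathrm {tr}\left[ P^\bot \rho _N^T\right] . \end{aligned}$$

The first line uses \(\rho _N^\mathrm {id}= e^{-iH}{|}v{\rangle }{\langle }v{|}e^{iH}\), and the second uses that each of the K summands \(q_T^k\) is at most 1. Taking the supremum over \(T \in {\mathcal {C}}\) then gives (4.3).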

Now suppose that \(P_M := \sup _{T\in {\mathcal {C}}} \mathrm {tr}\left[ P^\bot \rho _N^T\right] \) approaches zero as \(N \rightarrow \infty \) (we show this below). Then, for given \(\epsilon > 0\), we can choose \(K := \left\lceil \frac{\ln (\epsilon )}{\ln (C_H)}\right\rceil \) and N such that \(K P_M < \epsilon \). It follows from (4.3) that \(P_e(D_{H, N, K}, \Pi ) < \epsilon \). In other words, \(P_e(D_{H, N, K}, \Pi )\) approaches zero if and only if \(P_M\) does. Furthermore, for a channel \(T \in {\mathcal {C}}\), the total transmission is given by

$$\begin{aligned} {\mathfrak {T}}_T(D_{H, N, K}) = K \sum _{n = 0}^{N-1} {\mathfrak {t}}_T\left( \rho _n^{T}\right) = K{\mathfrak {T}}_T(D_{H, N, 1}). \end{aligned}$$

Thus, \({\mathfrak {T}}_T(D_{H, N, K})\) also approaches zero if and only if \({\mathfrak {T}}_T(D_{H, N, 1})\) does. In addition to that, we could always choose H such that \({\langle }v, e^{-iH}v{\rangle } = 0\), i.e., \(C_H = 0\). In that case, it suffices to set \(K = 1\), which yields the simple expression

$$\begin{aligned} P_e(D_{H, N, 1}, \Pi ) = \frac{1}{2} \sup _{T \in {\mathcal {C}}} \mathrm {tr}\left[ P^\bot (T\circ U_{\frac{1}{N}})^N({|}v{\rangle }{\langle }v{|})\right] , \end{aligned}$$

for the error probability. Hence, in order to find a strategy that discriminates between the identity channel and the set \({\mathcal {C}}\), we only need to show that the quantities \(P_M\) and \(\sup _{T\in {\mathcal {C}}}{\mathfrak {T}}_T(D_{H, N, 1})\) approach zero for \(N \rightarrow \infty \). Moreover, since \({\mathfrak {t}}_{T}\) can be written in the form \({\mathfrak {t}}_{T}(\cdot ) = \mathrm {tr}\left[ \Theta _T \cdot \right] \) for some positive semi-definite operator \(\Theta _T \in {\mathcal {B}}({\mathcal {H}})\) and since, by assumption \({\mathfrak {t}}_{T}({|}v{\rangle }{\langle }v{|}) = 0\), we can conclude that for \(\rho \ge 0\),

$$\begin{aligned} {\mathfrak {t}}_{T}(\rho ) \le \left\Vert {\mathfrak {t}}_{T} \right\Vert \mathrm {tr}\left[ P^\bot \rho \right] . \end{aligned}$$
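The parameter choice described above (first K from \(\epsilon \) and \(C_H\), then N large enough that \(K P_M < \epsilon \)) can be sketched as follows. The function names and the numerical values are hypothetical, and the value of \(P_M\) is simply assumed rather than computed from a channel.

```python
import math

def choose_repetitions(eps, C_H):
    """K = ceil(ln(eps) / ln(C_H)), valid for 0 < eps < 1 and 0 < C_H < 1."""
    if not (0.0 < eps < 1.0 and 0.0 < C_H < 1.0):
        raise ValueError("need 0 < eps < 1 and 0 < C_H < 1")
    return math.ceil(math.log(eps) / math.log(C_H))

def error_bound(C_H, K, P_M):
    """Right-hand side of (4.3)."""
    return 0.5 * (C_H ** (2 * K) + K * P_M)

eps = 1e-3
C_H = 0.5                      # assumed overlap |<v, exp(-iH) v>|
K = choose_repetitions(eps, C_H)
P_M = 0.5 * eps / K            # assumed: N was chosen so that K * P_M < eps
print(K, error_bound(C_H, K, P_M) < eps)
```

With this choice \(C_H^{K} \le \epsilon \), so the first term of (4.3) is at most \(\epsilon ^2\) and the bound stays below \(\epsilon \).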

The important conclusion that we draw from the discussion above is that in order to prove Theorem 4.1, it suffices to show (under the hypotheses of Theorem 4.1) that for any self-adjoint \(H \in {\mathcal {B}}({\mathcal {H}})\), there is a constant C such that the inequalities

$$\begin{aligned} \mathrm {tr}\left[ P^\bot \, (T \circ U_{\frac{1}{N}})^N({|}v{\rangle }{\langle }v{|})\right] \le \frac{C}{N^2}, \end{aligned}$$
(4.4)
$$\begin{aligned} \mathrm {tr}\left[ P^\bot \, \sum _{n = 0}^{N-1} (U_{\frac{1}{N}} \circ T)^n({|}v{\rangle }{\langle }v{|})\right] \le \frac{C}{N}, \end{aligned}$$
(4.5)

hold for all \(N \in {\mathbb {N}}\). This is precisely the statement of Theorem 4.10. Taking the validity of Theorem 4.10 for granted, we conclude that Theorem 4.1 holds.
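The scaling in (4.4) and (4.5) can be illustrated numerically. The following is a minimal sketch for one hypothetical example that is not taken from the text: a qubit amplitude-damping channel with damping parameter `gamma`, vacuum \(v = |0\rangle \), and \(H = \frac{\pi }{2}\sigma _x\), for which \(e^{-iH}v\) is orthogonal to v.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
V0 = np.array([[1, 0], [0, 0]], dtype=complex)    # |v><v| with v = |0>
P_PERP = np.eye(2) - V0

def amplitude_damping(rho, gamma):
    """Channel T with |0><0| as its only fixed state (for gamma > 0)."""
    K0 = np.diag([1.0, np.sqrt(1 - gamma)])
    K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

def rotate(rho, t):
    """U_t for H = (pi/2) sigma_x; closed form because sigma_x squares to 1."""
    u = np.cos(np.pi * t / 2) * np.eye(2) - 1j * np.sin(np.pi * t / 2) * SX
    return u @ rho @ u.conj().T

def final_error(N, gamma=0.5):
    """tr[P_perp (T o U_{1/N})^N(|v><v|)], the left-hand side of (4.4)."""
    rho = V0.copy()
    for _ in range(N):
        rho = amplitude_damping(rotate(rho, 1.0 / N), gamma)
    return np.trace(P_PERP @ rho).real

def summed_excitation(N, gamma=0.5):
    """tr[P_perp sum_{n<N} (U_{1/N} o T)^n(|v><v|)], the left-hand side of (4.5)."""
    rho, total = V0.copy(), 0.0
    for _ in range(N):
        total += np.trace(P_PERP @ rho).real
        rho = rotate(amplitude_damping(rho, gamma), 1.0 / N)
    return total

for N in (50, 100, 200, 400):
    # Both rescaled quantities should stay bounded as N grows.
    print(N, N**2 * final_error(N), N * summed_excitation(N))
```

In this example the rescaled quantities \(N^2 \cdot \)(4.4) and \(N \cdot \)(4.5) level off at constants, in line with the claimed \(N^{-2}\) and \(N^{-1}\) rates. The sketch does not compute the constant C of the theorem.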

Technical Theorems. The remainder of this section is devoted to the proof of Theorem 4.10 and its infinite-dimensional versions. The following lemmas serve this purpose.

Lemma 4.2

([26], p. 202). Let \(T \in {\mathcal {B}}({\mathcal {H}})\), let \(z \in {\mathbb {C}}\) be in the unbounded component of the resolvent set \(\rho (T)\), and let X be a closed invariant subspace of T. Then, X is an invariant subspace of \((z-T)^{-1}\).

Lemma 4.3

Let \(T: {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel such that 1 is in the discrete spectrum of T. Then, for any \(n \in {\mathbb {N}}\) and any (rectifiable) path \(\Gamma _1\) inside the resolvent set of T that encloses 1 and separates 1 from \(\sigma (T)\setminus \{1\}\), we have

$$\begin{aligned} \frac{1}{2\pi i} \oint \limits _{\Gamma _1}\frac{z^n}{z-T}\,\mathrm {d}z = \frac{1}{2\pi i} \oint \limits _{\Gamma _1}\frac{1}{z-T}\,\mathrm {d}z. \end{aligned}$$
(4.6)

Proof

See “Appendix A”. \(\square \)
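Identity (4.6) can be checked numerically in a finite-dimensional example. The following is a minimal sketch, assuming a qubit amplitude-damping channel (a hypothetical example), a circular contour of radius 0.1 around 1, and a trapezoidal discretization of the contour integral. The helper names are illustrative only.

```python
import numpy as np

def superoperator(kraus):
    """Matrix of rho -> sum_i K_i rho K_i^dagger w.r.t. row-major vectorization."""
    return sum(np.kron(K, K.conj()) for K in kraus)

def contour_integral(A, f, center, radius, M=400):
    """(1 / (2 pi i)) * integral of f(z) (z - A)^{-1} dz over a circle."""
    I = np.eye(A.shape[0])
    total = np.zeros_like(A, dtype=complex)
    for theta in 2 * np.pi * np.arange(M) / M:
        w = radius * np.exp(1j * theta)        # z = center + w, dz = i w dtheta
        total += f(center + w) * w * np.linalg.inv((center + w) * I - A)
    return total / M

gamma = 0.5
K0 = np.diag([1.0, np.sqrt(1 - gamma)])
K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
T = superoperator([K0, K1])                    # spectrum: 1, sqrt(1-gamma) twice, 1-gamma

lhs = contour_integral(T, lambda z: z**5, 1.0, 0.1)   # integrand z^n / (z - T), n = 5
rhs = contour_integral(T, lambda z: 1.0, 1.0, 0.1)    # integrand 1 / (z - T)
print(np.allclose(lhs, rhs))
```

In this example both integrals reproduce the spectral projection \(\rho \mapsto \mathrm {tr}\left[ \rho \right] {|}0{\rangle }{\langle }0{|}\) belonging to the eigenvalue 1.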

Lemma 4.4

(Invariant subspace lemma). Let \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel, where \({\mathcal {H}}\) can be finite or infinite dimensional. Let \(v \in {\mathcal {H}}\) be such that \({|}v{\rangle }{\langle }v{|}\) is a fixed point of T and set \(V_v := \mathrm {span}\{v\}\). Then, the subspaces

$$\begin{aligned} {\mathcal {B}}_{v \bot }&:= \left\{ {|}v{\rangle }{\langle }\phi {|} \,\big |\, \phi \in V_v^\bot \right\} , \end{aligned}$$
(4.7)
$$\begin{aligned} {\mathcal {B}}_{\bot v}&:= \left\{ {|}\phi {\rangle }{\langle }v{|} \,\big |\, \phi \in V_v^\bot \right\} , \end{aligned}$$
(4.8)

are invariant under T.

Proof

We prove that \({\mathcal {B}}_{v \bot }\) is invariant. The invariance of \({\mathcal {B}}_{\bot v}\) follows as T is Hermiticity-preserving. Let \(\{ K_i \}\) be a set of (non-zero) Kraus-operators of T. By assumption we have

$$\begin{aligned} {|}v{\rangle }{\langle }v{|}&= T({|}v{\rangle }{\langle }v{|}) = \sum _i \mathrm {tr}\left[ K_i^\dagger K_i\right] \frac{K_i {|}v{\rangle }{\langle }v{|} K_i^\dagger }{\mathrm {tr}\left[ K_i^\dagger K_i\right] }, \end{aligned}$$

where the series converges in trace norm. As the pure state \({|}v{\rangle }{\langle }v{|}\) is an extreme point of the closed and convex set of quantum states and the RHS is a convex combination of states, we must have that \(K_i {|}v{\rangle }{\langle }v{|} K_i^\dagger \) is proportional to \({|}v{\rangle }{\langle }v{|}\). Hence, v is an eigenvector of \(K_i\) for all i. We denote the corresponding eigenvalue by \(\lambda _i\). So for \(\psi \in V_v^\bot \), we get

$$\begin{aligned} T({|}v{\rangle }{\langle }\psi {|}) = \sum _i K_i {|}v{\rangle } {\langle }\psi {|}K_i^\dagger = {|}v{\rangle }{\langle }\phi {|}, \end{aligned}$$

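where \(\phi = \sum _i \overline{\lambda _i} K_i \,\psi \). As T is trace-preserving, we have

$$\begin{aligned} {\langle }\phi , v{\rangle } = \mathrm {tr}\left[ {|}v{\rangle }{\langle }\phi {|}\right] = \mathrm {tr}\left[ T({|}v{\rangle }{\langle }\psi {|})\right] = \mathrm {tr}\left[ {|}v{\rangle }{\langle }\psi {|}\right] = {\langle }\psi , v{\rangle } = 0. \end{aligned}$$

Hence, \(\phi \in V_v^\bot \). This proves the claim. \(\square \)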
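The mechanism of the proof can be seen in a small numerical example. The following is a minimal sketch with two hypothetical qubit Kraus operators (chosen for illustration only) that both have \(v = |0\rangle \) as an eigenvector. Each \(K_i\psi \) has a component along v, but these components cancel in \(\phi \), as trace preservation requires.

```python
import numpy as np

# Hypothetical Kraus operators with K0^dagger K0 + K1^dagger K1 = 1 and K_i v = lam_i v.
a = 1 / np.sqrt(2)
K0 = np.array([[a, 0.5], [0.0, 0.5]], dtype=complex)
K1 = np.array([[a, -0.5], [0.0, 0.5]], dtype=complex)
kraus = [K0, K1]

def channel(X):
    return sum(K @ X @ K.conj().T for K in kraus)

v = np.array([1.0, 0.0], dtype=complex)
psi = np.array([0.0, 1.0], dtype=complex)            # orthogonal to v

lam = [np.vdot(v, K @ v) for K in kraus]             # eigenvalues lam_i
phi = sum(np.conj(l) * (K @ psi) for l, K in zip(lam, kraus))
image = channel(np.outer(v, psi.conj()))             # T(|v><psi|)

print(np.allclose(image, np.outer(v, phi.conj())))   # T(|v><psi|) equals |v><phi|
print(abs(np.vdot(v, phi)) < 1e-12)                  # phi is orthogonal to v
```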

The following theorem is the main technical result. In fact, everything else in this section can (to some extent) be regarded as a corollary to this theorem.

Theorem 4.5

Let \(T: {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel such that 1 is in the discrete spectrum of T, and let \(v \in {\mathcal {H}}\) be a unit vector such that \({|}v{\rangle }{\langle }v{|}\) is a fixed point of T. Furthermore, let \(H \in {\mathcal {B}}({\mathcal {H}})\) be self-adjoint, \(\tau > 0\) and \(0< \delta < 1\) such that

$$\begin{aligned} \sigma (U_t \circ T) \subseteq {\mathbb {D}}_{1-\delta }(0) \cup \{1\}, \end{aligned}$$
(4.9)

for \(0 \le t \le \tau \), where \(U_t:{\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) is defined by \(U_t(\cdot ) := e^{-iHt} \cdot e^{iHt}\). Then, the inequalities

$$\begin{aligned} \mathrm {tr}\left[ P^\bot \, (T \circ U_{\frac{1}{N}})^N({|}v{\rangle }{\langle }v{|})\right] \le \frac{C}{N^2}, \end{aligned}$$
(4.10)
$$\begin{aligned} \mathrm {tr}\left[ P^\bot \, \sum _{n = 0}^{N-1} (U_{\frac{1}{N}} \circ T)^n({|}v{\rangle }{\langle }v{|})\right] \le \frac{C}{N}, \end{aligned}$$
(4.11)

hold for all \(N \in {\mathbb {N}}\). Here, \(P^\bot := \mathbbm {1}- {|}v{\rangle }{\langle }v{|}\) and

$$\begin{aligned} C := \max \left\{ \tau ^{-2}, \;18 \delta ^{-1} \left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}^2 \max _{\begin{array}{c} 0 \le t \le \tau \\ z \in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert \left\Vert (z-U_tT)^{-1} \right\Vert \right\} < \infty , \end{aligned}$$

where \(\Gamma := \left\{ z \in {\mathbb {C}} \,\big |\, |z| = 1-\frac{\delta }{2} \right\} \cup \left\{ z \in {\mathbb {C}} \,\big |\, |z-1| = \frac{\delta }{2} \right\} \).

Proof

We need to calculate the quantities (4.10) and (4.11). To do so, we employ the holomorphic functional calculus. For \(0 \le t \le \tau \) and \(n \in {\mathbb {N}}\), we have

$$\begin{aligned} (U_tT)^n&= \frac{1}{2\pi i} \oint \limits _{|z - 1| = \frac{\delta }{2}}\frac{z^n}{z-U_tT}\,\mathrm {d}z + \frac{1}{2\pi i}\oint \limits _{|z| = 1-\frac{\delta }{2}}\frac{z^n}{z-U_tT}\,\mathrm {d}z \end{aligned}$$
(4.12)
$$\begin{aligned}&= \frac{1}{2\pi i} \oint \limits _{|z - 1| = \frac{\delta }{2}}\frac{1}{z-U_tT}\,\mathrm {d}z + \frac{1}{2\pi i}\oint \limits _{|z| = 1-\frac{\delta }{2}}\frac{z^n}{z-U_tT}\,\mathrm {d}z, \end{aligned}$$
(4.13)

where we used Lemma 4.3 to obtain the second line. Under the trace, we can (crudely) estimate this term as follows:

$$\begin{aligned} \begin{aligned} \left| \mathrm {tr}\left[ P^\bot (U_tT)^n({|}v{\rangle }{\langle }v{|})\right] \right|&\le \frac{\delta }{2} \max _{|z-1| = \frac{\delta }{2}} \left| \mathrm {tr}\left[ P^\bot \frac{1}{z-U_tT}({|}v{\rangle }{\langle }v{|})\right] \right| \\&\quad + \left( 1-\frac{\delta }{2}\right) ^{n+1} \max _{|z| = 1-\frac{\delta }{2}} \left| \mathrm {tr}\left[ P^\bot \frac{1}{z-U_tT}({|}v{\rangle }{\langle }v{|})\right] \right| \\&\le \max _{z \in \Gamma } \left| \mathrm {tr}\left[ P^\bot \frac{1}{z-U_tT}({|}v{\rangle }{\langle }v{|})\right] \right| . \end{aligned} \end{aligned}$$
(4.14)

In everything that follows, we assume that \(z \in \Gamma \). To proceed, we need two auxiliary calculations. First, we use the second resolvent identity ([27], p. 84) twice to obtain

$$\begin{aligned} \begin{aligned} \frac{1}{z-U_tT}&= \frac{1}{z-T} + \frac{1}{z-T}(U_t-\mathrm {id})\frac{T}{z-T} \\&\quad + \frac{1}{z-U_tT}(U_t-\mathrm {id})\frac{T}{z-T}(U_t-\mathrm {id})\frac{T}{z-T}. \end{aligned} \end{aligned}$$
(4.15)

Second, an elementary application of Taylor’s formula yields

$$\begin{aligned} \left\Vert U_t - \mathrm {id} \right\Vert&\le 2 \left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})} t, \end{aligned}$$
(4.16)
$$\begin{aligned} (U_t - \mathrm {id})(\rho )&= i[\rho , H]t + {\mathfrak {U}} t^2, \end{aligned}$$
(4.17)

with \(||{\mathfrak {U}}|| \le 2 \left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}^2\). When looking at (4.15), it is clear that the summands are of zeroth, first and second order in t, as \(t \rightarrow 0\). The crucial step is to show that under the trace, the second term is \({\mathcal {O}}(t^2)\). Using (4.17), we get

$$\begin{aligned} \frac{1}{z-T}(U_t-\mathrm {id})\frac{T}{z-T}({|}v{\rangle }{\langle }v{|})&= \frac{1}{z-1}\frac{1}{z-T}(U_t-\mathrm {id})({|}v{\rangle }{\langle }v{|}) \nonumber \\&= \frac{i t}{z-1} \frac{1}{z-T} ({|}v{\rangle }{\langle }H v{|} - {|}H v{\rangle }{\langle }v{|}) \nonumber \\&\quad + \frac{t^2}{z-1} \frac{1}{z-T}({\mathfrak {U}}({|}v{\rangle }{\langle }v{|})). \end{aligned}$$
(4.18)

It is easily verified, using the self-adjointness of H, that \({|}v{\rangle }{\langle }H v{|} - {|}H v{\rangle }{\langle }v{|} = {|}v{\rangle }{\langle }\phi {|} - {|}\phi {\rangle }{\langle }v{|}\), with \(\phi := Hv - {\langle }v, Hv{\rangle }v\). Clearly, \({\langle }v, \phi {\rangle } = 0\). Thus, \({|}\phi {\rangle }{\langle }v{|} \in {\mathcal {B}}_{\bot v}\) and \({|}v{\rangle }{\langle }\phi {|} \in {\mathcal {B}}_{v \bot }\), where \({\mathcal {B}}_{\bot v}\) and \({\mathcal {B}}_{v \bot }\) are both invariant subspaces of T (by Lemma 4.4). As z is in the unbounded component of the resolvent set of T, Lemma 4.2 implies that also \((z-T)^{-1}({|}\phi {\rangle }{\langle }v{|}) \in {\mathcal {B}}_{\bot v}\) and \((z-T)^{-1}({|}v{\rangle }{\langle }\phi {|}) \in {\mathcal {B}}_{v \bot }\). Thus, the first term in (4.18) vanishes under the trace, and we get

$$\begin{aligned} \left| \mathrm {tr}\left[ P^\bot (4.18)\right] \right| \le t^2 \frac{2 \left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}^2}{|z-1|} \left\Vert (z-T)^{-1} \right\Vert . \end{aligned}$$
(4.19)

So under the trace, this term is indeed quadratic in t. For the other two terms in (4.15), we have

$$\begin{aligned} \left| \mathrm {tr}\left[ P^\bot \frac{1}{z-T}({|}v{\rangle }{\langle }v{|})\right] \right| = \frac{1}{|z-1|}\mathrm {tr}\left[ P^\bot {|}v{\rangle }{\langle }v{|}\right] = 0 \end{aligned}$$
(4.20)

and

$$\begin{aligned}&\left| \mathrm {tr}\left[ P^\bot \;\frac{1}{z-U_tT}(U_t-\mathrm {id})\frac{T}{z-T}(U_t-\mathrm {id})\frac{T}{z-T}({|}v{\rangle }{\langle }v{|})\right] \right| \nonumber \\&\quad \le \frac{1}{|z-1|} \left\Vert (z-U_tT)^{-1} \right\Vert \left\Vert U_t - \mathrm {id} \right\Vert ^2 \left\Vert \frac{T}{z-T} \right\Vert \nonumber \\&\quad \le t^2 \frac{4\left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}^2}{|z-1|} \left\Vert (z-U_tT)^{-1} \right\Vert \left\Vert (z-T)^{-1} \right\Vert , \end{aligned}$$
(4.21)

where we used the estimate (4.16) and \(\left\Vert T \right\Vert = 1\) to obtain the last line. We can now use the results obtained in (4.19), (4.20), and (4.21) to estimate the quantity of interest, (4.14). We have

$$\begin{aligned} {}(4.14)&\le 2 t^2 \left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}^2 \max _{z \in \Gamma } \frac{\left\Vert (z-T)^{-1} \right\Vert (1+2\left\Vert (z-U_tT)^{-1} \right\Vert )}{|z-1|}\nonumber \\&\le t^2 \left( 18 \delta ^{-1} \left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}^2 \max _{\begin{array}{c} 0 \le t^\prime \le \tau \\ z \in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert \left\Vert (z-U_{t^\prime }T)^{-1} \right\Vert \right) \nonumber \\&=: t^2 C_0. \end{aligned}$$
(4.22)

To obtain the second estimate, we used \(\max _{z \in \Gamma } |z-1|^{-1} = 2\delta ^{-1}\) and \(\left\Vert (z-U_tT)^{-1} \right\Vert \ge \left\Vert (z-U_tT) \right\Vert ^{-1} \ge (|z| + 1)^{-1} \ge \frac{2}{5}\). Equation (4.22) is a bound for \(t \le \tau \). To prove the theorem, we need a bound for all \(t \ge 0\). To this end, we note that \(\mathrm {tr}\left[ P^\bot (U_tT)^n({|}v{\rangle }{\langle }v{|})\right] \le 1\), since the expression represents a probability. We further define \(C := \max (\tau ^{-2}, C_0)\). If \(t \le \tau \), then by Eq. (4.22),

$$\begin{aligned} \mathrm {tr}\left[ P^\bot (U_tT)^n({|}v{\rangle }{\langle }v{|})\right] \le t^2 C_0 \le Ct^2. \end{aligned}$$

And if \(t > \tau \), then

$$\begin{aligned} \mathrm {tr}\left[ P^\bot (U_tT)^n({|}v{\rangle }{\langle }v{|})\right] \le 1 \le \frac{t^2}{\tau ^2} \le Ct^2. \end{aligned}$$

Hence,

$$\begin{aligned} \mathrm {tr}\left[ P^\bot (U_tT)^n({|}v{\rangle }{\langle }v{|})\right] \le C t^2, \end{aligned}$$

for all \(t \ge 0\). This is a bound independent of n. Inequality (4.11) is then easily obtained by setting \(t := \frac{1}{N}\) and summing over all n, which yields an additional factor N. It remains to show inequality (4.10), in which \(U_t\) and T have switched order. Since \({|}v{\rangle }{\langle }v{|}\) is a fixed point of T, we have \(\mathrm {tr}\left[ P^\bot (TU_t)^N({|}v{\rangle }{\langle }v{|})\right] = \mathrm {tr}\left[ P^\bot T(U_tT)^N({|}v{\rangle }{\langle }v{|})\right] \). We set \(\rho := (U_tT)^N({|}v{\rangle }{\langle }v{|})\) and \(\phi := P^\bot \rho v\) and write

$$\begin{aligned} \rho = {\langle }v, \rho v{\rangle }\, {|}v{\rangle }{\langle }v{|} + {|}v{\rangle }{\langle }\phi {|} + {|}\phi {\rangle }{\langle }v{|} + P^\bot \rho P^\bot . \end{aligned}$$

Clearly, \({|}v{\rangle }{\langle }\phi {|} \in {\mathcal {B}}_{v \bot }\) and \({|}\phi {\rangle }{\langle }v{|} \in {\mathcal {B}}_{\bot v}\). Hence, by Lemma 4.4, we have \(T({|}v{\rangle }{\langle }\phi {|}) \in {\mathcal {B}}_{v \bot }\) and \(T({|}\phi {\rangle }{\langle }v{|}) \in {\mathcal {B}}_{\bot v}\). Thus,

$$\begin{aligned} \mathrm {tr}\left[ P^\bot T(\rho )\right] = \mathrm {tr}\left[ P^\bot T(P^\bot \rho P^\bot )\right] \le \mathrm {tr}\left[ T(P^\bot \rho P^\bot )\right] = \mathrm {tr}\left[ P^\bot \rho \right] . \end{aligned}$$

Hence,

$$\begin{aligned} \mathrm {tr}\left[ P^\bot (TU_t)^N({|}v{\rangle }{\langle }v{|})\right] \le \frac{C}{N^2}. \end{aligned}$$

This finishes the proof. \(\square \)
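The two Taylor estimates (4.16) and (4.17) used in the proof can be checked numerically on a single state. The following is a minimal sketch with a randomly drawn self-adjoint H and density matrix \(\rho \) (both hypothetical test data). It tests the bounds in trace norm on this one state only, which is weaker than the operator-norm statement (4.16).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2                       # random self-adjoint H
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = B @ B.conj().T
rho = rho / np.trace(rho).real                 # random density matrix

evals, W = np.linalg.eigh(H)
norm_H = np.max(np.abs(evals))                 # operator norm of H

def U(t, X):
    """U_t(X) = exp(-iHt) X exp(iHt), using the eigendecomposition of H."""
    u = (W * np.exp(-1j * evals * t)) @ W.conj().T
    return u @ X @ u.conj().T

def trace_norm(X):
    return np.linalg.norm(X, 'nuc')

checks = []
for t in (0.5, 0.1, 0.01):
    first = trace_norm(U(t, rho) - rho)                                   # cf. (4.16)
    second = trace_norm(U(t, rho) - rho - 1j * t * (rho @ H - H @ rho))   # cf. (4.17)
    checks.append((first <= 2 * norm_H * t + 1e-12,
                   second <= 2 * norm_H**2 * t**2 + 1e-12))
    print(t, checks[-1])
```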

Theorem 4.6

Let \({\mathcal {C}} \subseteq {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) be a compact set of channels, and let \(v \in {\mathcal {H}}\) be a unit vector. Assume that

  1. For all \(T \in {\mathcal {C}}\), the quantity

    $$\begin{aligned} r_T := \sup _{z \in \sigma (T) \setminus \{1\}} |z| \end{aligned}$$

    is strictly less than 1. In other words, the spectral gap is nonzero.

  2. For each \(T \in {\mathcal {C}}\), the state \({|}v{\rangle }{\langle }v{|}\) is a fixed point of T.

  3. For all \(T \in {\mathcal {C}}\), the algebraic multiplicity (see Footnote 12) of the isolated point \(1 \in \sigma (T)\) is 1. In other words, 1 is a simple eigenvalue.

Furthermore, let \(H \in {\mathcal {B}}({\mathcal {H}})\) be self-adjoint and \(U_t: {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be defined by \(U_t(\cdot ) = e^{-iHt}\cdot e^{iHt}\). Then, there exists a constant \(C_{\mathcal {C}} < \infty \), such that

$$\begin{aligned} \mathrm {tr}\left[ P^\bot \, (T \circ U_{\frac{1}{N}})^N({|}v{\rangle }{\langle }v{|})\right]&\le \frac{C_{\mathcal {C}} \left\Vert H \right\Vert ^2_{{\mathcal {B}}({\mathcal {H}})}}{N^2}, \end{aligned}$$
(4.23)
$$\begin{aligned} \mathrm {tr}\left[ P^\bot \, \sum _{n = 0}^{N-1} (U_{\frac{1}{N}} \circ T)^n({|}v{\rangle }{\langle }v{|})\right]&\le \frac{C_{\mathcal {C}} \left\Vert H \right\Vert ^2_{{\mathcal {B}}({\mathcal {H}})} }{N}, \end{aligned}$$
(4.24)

for all \(N \in {\mathbb {N}}\), where \(P^\bot := \mathbbm {1}- {|}v{\rangle }{\langle }v{|}\).

Remark 4.7

The distinctive feature of the preceding theorems is the \(N^{-2}\) bound in (4.23). It seems that such a bound cannot be obtained directly from the results in [13,14,15,16], because those results are of the form \(\left( U_{\frac{1}{N}} \circ T\right) ^N \approx {\mathcal {P}}+ {\mathcal {O}}(\frac{1}{N})\) for \(N \rightarrow \infty \), where \({\mathcal {P}}\) denotes the spectral projection on the eigenspace with eigenvalue 1.

Proof

The basic strategy is to reduce the claim to an application of Theorem 4.5. To this end, we basically need to show that Conditions 1–3 imply that condition (4.9) can be satisfied uniformly, i.e., that there exist \(\tau > 0\) and \(0< \delta < 1\) such that (4.9) is satisfied for all \(T \in {\mathcal {C}}\). The main tool to show this is the upper semi-continuity of the spectrum. To use that property, we import the following two theorems.

Theorem 4.8

([28], p. 208). For a Banach space \({\mathcal {X}}\), let \(T, S \in {\mathcal {B}}({\mathcal {X}})\), and let \(\Gamma \) be a compact subset of the resolvent set \(\rho (T)\).

If \(\left\Vert T-S \right\Vert < \min _{z \in \Gamma } \left\Vert (z-T)^{-1} \right\Vert ^{-1}\), then \(\Gamma \subseteq \rho (S)\). Furthermore, for any open set \(V \subseteq {\mathbb {C}}\), with \(\sigma (T) \subset V\), there exists \(\gamma > 0\), such that \(\sigma (S) \subseteq V\) whenever \(\left\Vert S-T \right\Vert < \gamma \).

Theorem 4.9

([29], p. 67). For a Banach space \({\mathcal {X}}\), let \(P, Q \in {\mathcal {B}}({\mathcal {X}})\) be bounded projections with \(\left\Vert P - Q \right\Vert < 1\). Then, there exists an invertible operator \(A \in {\mathcal {B}}({\mathcal {X}})\), such that \(Q = APA^{-1}\). In particular \(\mathrm {ran}(P)\) and \(\mathrm {ran}(Q)\) are isomorphic.

To start, we show that not only \(r_T < 1\) for all \(T \in {\mathcal {C}}\), but that \(\sup _{T\in {\mathcal {C}}} r_T < 1\). To this end, we show that the function \(r : {\mathcal {C}} \rightarrow {\mathbb {R}}, \; T \mapsto r_T\) is upper semi-continuous. That is, we need to show that for every \(T \in {\mathcal {C}}\) and every \(\epsilon > 0\), there is a set \(U \subseteq {\mathcal {C}}\), which is open in the relative topology on \({\mathcal {C}}\), such that \(r_S \le r_T + \epsilon \) for all \(S \in U\). For fixed T and \(\epsilon > 0\), define \(\epsilon ^\prime := \min (\epsilon , \frac{1-r_{T}}{3})\) and the open set \(V_{\epsilon ^\prime } := B_{r_{T}+\epsilon ^\prime }(0) \cup B_{\epsilon ^\prime }(1) \subseteq {\mathbb {C}}\). By construction, \(\sigma (T) \subseteq V_{\epsilon ^\prime }\). Thus, Theorem 4.8 implies that there exists \(\gamma > 0\) such that \(\sigma (S) \subseteq V_{\epsilon ^\prime }\) for all \(S \in B_\gamma (T)\). Thus, for \(S \in B_\gamma (T)\) the projection \(P_S\) onto the spectral subspace associated with the spectral subset \(\sigma (S) \cap B_{\epsilon ^\prime }(1)\) is given by

$$\begin{aligned} P_S := \frac{1}{2\pi i} \oint \limits _{|z - 1| = \frac{1-r_T}{2}}\frac{1}{z-S}\,\mathrm {d}z = P_T + \frac{1}{2\pi i} \oint \limits _{|z - 1| = \frac{1-r_T}{2}}\frac{1}{z-S}(S-T)\frac{1}{z-T}\,\mathrm {d}z, \end{aligned}$$

where we used the second resolvent identity to obtain the last equation. A standard estimate yields

$$\begin{aligned} \left\Vert P_S - P_T \right\Vert \le \frac{1-r_T}{2} \left\Vert S-T \right\Vert \max _{|z-1| = \frac{1-r_T}{2}} \left\{ \left\Vert (z-S)^{-1} \right\Vert \left\Vert (z-T)^{-1} \right\Vert \right\} . \end{aligned}$$

Since the set \(S_0 := \overline{B_{\frac{\gamma }{2}}(T)} \cap {\mathcal {C}}\) is compact, the constant

$$\begin{aligned} C_0 :=\max _{\begin{array}{c} |z-1| = \frac{1-r_T}{2}\\ S \in S_0 \end{array}} \left\{ \left\Vert (z-S)^{-1} \right\Vert \left\Vert (z-T)^{-1} \right\Vert \right\} \end{aligned}$$

is finite. We set \(\gamma ^\prime := \min \{\frac{\gamma }{2}, \frac{1}{(1-r_T) C_0}\}\) and \(U := B_{\gamma ^\prime }(T) \cap {\mathcal {C}}\). By construction, U is open in the relative topology on \({\mathcal {C}}\) and we have \(\sigma (S) \subseteq V_{\epsilon ^\prime }\) and \(\left\Vert P_S - P_T \right\Vert \le \frac{1}{2} < 1\) for all \(S \in U\). By Assumption 3, \(\mathrm {ran}(P_T)\) is 1-dimensional. Thus, by Theorem 4.9, also \(\mathrm {ran}(P_S)\) is one-dimensional, for \(S \in U\). Thus, there can be only one point in \(\sigma (S) \cap B_{\epsilon ^\prime }(1)\), and this point must be 1, as 1 is in the spectrum of every channel. Hence, for \(S \in U\), we have \(\sigma (S)\setminus \{ 1 \} \subseteq B_{r_T+\epsilon ^\prime }(0)\). So \(r(S) = r_{S} \le r_{T} + \epsilon ^\prime \le r_{T} + \epsilon = r(T) + \epsilon \). In other words, r is upper semi-continuous. The upper semi-continuous function r assumes its maximum on the compact set \({\mathcal {C}}\). This maximum cannot be equal to 1, as this would contradict Assumption 1. Thus \(\max _{T\in {\mathcal {C}}} r_T < 1\), as claimed.

In preparation for the application of Theorem 4.5, we define the joint spectral gap

$$\begin{aligned} \delta _J := 1 - \max _{T \in {\mathcal {C}}} r(T). \end{aligned}$$
(4.25)

We have \(0< \delta _J < 1\) and

$$\begin{aligned} \sigma (T) \subseteq {\mathbb {D}}_{1-\delta _J}(0) \cup \{1\}, \end{aligned}$$

for all \(T \in {\mathcal {C}}\). We define \(\Gamma := {\mathbb {D}}_{1+\frac{\delta _J}{3}}(0)\setminus (B_{\frac{\delta _J}{3}}(1) \cup B_{1-\frac{2\delta _J}{3}}(0))\), which is a compact subset of \(\rho (T)\) for all \(T\in {\mathcal {C}}\), and we set

$$\begin{aligned} \tau := \frac{1}{7\left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}} \min _{\begin{array}{c} T \in {\mathcal {C}} \\ z\in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert ^{-2}, \end{aligned}$$

which is nonzero, as the minimization is over a strictly positive function on a compact set. For this particular choice of \(\tau \), we show that

$$\begin{aligned} \sigma (U_tT) \subseteq {\mathbb {D}}_{1-\frac{2\delta _J}{3}}(0) \cup \{1\} \end{aligned}$$

for \(0 \le t \le \tau \) and then use Theorem 4.5. From now on, let \(0 \le t \le \tau \) and \(T\in {\mathcal {C}}\). Using the Taylor estimate (4.16) and the definition of \(\tau \) yields

$$\begin{aligned} \left\Vert T - U_tT \right\Vert&\le \left\Vert U_t-\mathrm {id} \right\Vert \left\Vert T \right\Vert \le 2\left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}t \nonumber \\&\le \frac{2}{7} \min _{\begin{array}{c} T \in {\mathcal {C}} \\ z\in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert ^{-2}. \end{aligned}$$
(4.26)

This inequality has two important implications. First, for \(z \in \Gamma \) we have \(\left\Vert (z-T)^{-1} \right\Vert ^{-1} \le \left\Vert z-T \right\Vert \le |z| + 1 \le \frac{7}{3}\). Hence, (4.26) \(< \min _{\begin{array}{c} T \in {\mathcal {C}} \\ z\in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert ^{-1}\) and we can apply Theorem 4.8, which tells us that \(\Gamma \subseteq \rho (U_tT)\) for all \(T\in {\mathcal {C}}\) and \(0 \le t \le \tau \). Equivalently,

$$\begin{aligned} \sigma (U_tT) \subseteq {\mathbb {D}}_{1-\frac{2\delta _J}{3}}(0) \cup {\mathbb {D}}_{\frac{\delta _J}{3}}(1). \end{aligned}$$

Thus, we only have to show that \(\sigma (U_tT) \cap {\mathbb {D}}_{\frac{\delta _J}{3}}(1) = \{1\}\).

Second, \(\left\Vert (U_tT - T)(z-T)^{-1} \right\Vert \le \frac{2}{7}\min _{\begin{array}{c} T \in {\mathcal {C}} \\ z\in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert ^{-1} \le \frac{2}{3}\). Thus, the series

$$\begin{aligned} \frac{1}{z-T} \sum _{k = 0}^\infty \left[ (U_tT-T)(z-T)^{-1}\right] ^k = (z-U_tT)^{-1} \end{aligned}$$

converges. A term-by-term estimate, using \(\sum _{k=0}^\infty \left( \frac{2}{3}\right) ^k = 3\), yields

$$\begin{aligned} \left\Vert (z-U_tT)^{-1} \right\Vert \le 3\left\Vert (z-T)^{-1} \right\Vert . \end{aligned}$$
(4.27)
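The Neumann-series argument behind (4.27) can be checked numerically in a toy setting. The following is only a minimal sketch and not part of the proof: the real \(2\times 2\) matrices `A` (standing in for \(z-T\)) and `E` (standing in for \(U_tT-T\)) are arbitrary illustrative choices, and all helper names are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def combine(X, Y, sign):
    """Entrywise X + sign * Y."""
    return [[X[i][j] + sign * Y[i][j] for j in range(2)] for i in range(2)]

def inverse(X):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def op_norm(X):
    """Operator (spectral) norm of a real 2x2 matrix.

    Uses s1^2 + s2^2 = ||X||_F^2 and s1 * s2 = |det X| for the singular values.
    """
    frob2 = sum(x * x for row in X for x in row)
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    disc = max(frob2 * frob2 - 4.0 * det * det, 0.0)
    return math.sqrt((frob2 + math.sqrt(disc)) / 2.0)

# Illustrative stand-ins (assumptions): A plays the role of z - T,
# E plays the role of the perturbation U_t T - T.
A = [[2.0, 1.0], [0.0, 3.0]]
E = [[0.3, 0.1], [-0.2, 0.4]]

A_inv = inverse(A)
EA = matmul(E, A_inv)
q = op_norm(EA)  # must be at most 2/3 for the estimate to apply

# Partial sums of A^{-1} * sum_k (E A^{-1})^k.
series = [[0.0, 0.0], [0.0, 0.0]]
term = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(80):
    series = combine(series, term, +1.0)
    term = matmul(term, EA)

neumann = matmul(A_inv, series)
exact = inverse(combine(A, E, -1.0))  # (A - E)^{-1}
err = op_norm(combine(neumann, exact, -1.0))

print(f"||E A^-1|| = {q:.4f}, series error = {err:.2e}")
print(f"||(A-E)^-1|| = {op_norm(exact):.4f} <= 3 ||A^-1|| = {3 * op_norm(A_inv):.4f}")
```

In this toy instance the partial sums converge to \((A-E)^{-1}\) and the factor 3 is far from tight, since \(\left\Vert EA^{-1} \right\Vert \) is well below \(\frac{2}{3}\).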

Let \(P_t := \frac{1}{2\pi i} \oint \limits _{|z - 1| = \frac{\delta _J}{3}}\frac{1}{z-U_tT} \,\mathrm {d}z\) be the corresponding spectral projection. Then,

$$\begin{aligned} \left\Vert P_t - P_0 \right\Vert&= \left\Vert \frac{1}{2\pi i} \oint \limits _{|z - 1| = \frac{\delta _J}{3}}\frac{1}{z-U_tT} - \frac{1}{z-T}\,\mathrm {d}z \right\Vert \\&\le \frac{\delta _J}{3} \max _{|z - 1| = \frac{\delta _J}{3}} \left\Vert (z-U_tT)^{-1} - (z-T)^{-1} \right\Vert \\&= \frac{\delta _J}{3} \max _{|z - 1| = \frac{\delta _J}{3}} \left\Vert (z-U_tT)^{-1} (U_tT-T) (z-T)^{-1} \right\Vert \\&\le \delta _J \left\Vert U_tT-T \right\Vert \max _{z \in \Gamma } \left\Vert (z-T)^{-1} \right\Vert ^{2} \\&\le \frac{2\delta _J}{7} < 1, \end{aligned}$$

where we used the second resolvent identity to obtain the third line, (4.27) for the fourth line and (4.26) for the fifth line. Hence, by Theorem 4.9, the dimension of \(\mathrm {ran}(P_t)\) equals the dimension of \(\mathrm {ran}(P_0)\) for all \(0 \le t \le \tau \), and the latter dimension is 1. Thus, \(\sigma (U_tT) \cap {\mathbb {D}}_{\frac{\delta _J}{3}}(1)\) contains exactly one point, which must be 1, as \(U_tT\) is a channel. In conclusion, we have

$$\begin{aligned} \sigma (U_tT) \subseteq {\mathbb {D}}_{1 - \delta }(0) \cup \{1\}, \end{aligned}$$

for all \(T\in {\mathcal {C}}\) and \(0 \le t \le \tau \), with \(\delta := \frac{2\delta _J}{3}\). Finally, a direct application of Theorem 4.5 proves the claim. We can also get an explicit bound for \(C_{\mathcal {C}}\). To this end, we need to bound the constant that appears in Theorem 4.5. We have

$$\begin{aligned} \tau ^{-2} = 49 \left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}^2 \max _{\begin{array}{c} T \in {\mathcal {C}}\\ z\in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert ^4 \end{aligned}$$

and, by (4.27), the second term can be bounded by

$$\begin{aligned} 36 \delta _J^{-1} \left\Vert H \right\Vert _{{\mathcal {B}}({\mathcal {H}})}^2 \max _{\begin{array}{c} T \in {\mathcal {C}}\\ z \in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert ^2. \end{aligned}$$
(4.28)

Furthermore, by the spectral mapping theorem, the spectral radius of \((z-T)^{-1}\) is given by \((\inf _{s \in \sigma (T)} \left| z-s \right| )^{-1} = (\mathrm {dist}(z, \sigma (T)))^{-1}\). Since the norm of any operator is an upper bound for the spectral radius, we have

$$\begin{aligned} \max _{\begin{array}{c} T \in {\mathcal {C}}\\ z\in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert \ge \max _{\begin{array}{c} T \in {\mathcal {C}}\\ z\in \Gamma \end{array}} \left\{ \mathrm {dist}(z, \sigma (T))^{-1} \right\} \ge 3\delta _J^{-1} \ge 3. \end{aligned}$$

By applying this bound to (4.28), we see that \(\tau ^{-2} \ge \) (4.28). Thus, we can choose

$$\begin{aligned} C_{\mathcal {C}} := 49 \max _{\begin{array}{c} T\in {\mathcal {C}} \\ z \in \Gamma \end{array}} \left\Vert (z-T)^{-1} \right\Vert ^4 < \infty . \end{aligned}$$
(4.29)

\(\square \)

Theorem 4.10

For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \({\mathcal {C}}\) be a closed set of channels \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) and let \(v \in {\mathcal {H}}\) be a unit vector such that for every \(T \in {\mathcal {C}}\), the state \({|}v{\rangle }{\langle }v{|}\) is the only state that is a fixed point of T.

Furthermore, let \(H \in {\mathcal {B}}({\mathcal {H}})\) be self-adjoint and \(U_t: {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be defined by \(U_t(\cdot ) = e^{-iHt}\cdot e^{iHt}\). Then, there exists a constant \(C_{\mathcal {C}} < \infty \), such that for all \(N \in {\mathbb {N}}\),

$$\begin{aligned} \mathrm {tr}\left[ P^\bot \, (T \circ U_{\frac{1}{N}})^N({|}v{\rangle }{\langle }v{|})\right]&\le \frac{C_{\mathcal {C}} \left\Vert H \right\Vert ^2_{{\mathcal {B}}({\mathcal {H}})} }{N^2} \end{aligned}$$
(4.30)
$$\begin{aligned} \mathrm {tr}\left[ P^\bot \, \sum _{n = 0}^{N-1} (U_{\frac{1}{N}} \circ T)^n({|}v{\rangle }{\langle }v{|})\right]&\le \frac{C_{\mathcal {C}} \left\Vert H \right\Vert ^2_{{\mathcal {B}}({\mathcal {H}})} }{N}, \end{aligned}$$
(4.31)

where \(P^\bot := \mathbbm {1}- {|}v{\rangle }{\langle }v{|}\).

Proof

The claim follows from Theorem 4.6 and from results by Burgarth and Giovannetti [30]. In their terminology, a channel T is called \(\textit{ergodic}\) if there is a unique state that is a fixed point of T. Moreover, according to Theorem 7 in [30], T is mixing if 1 is the only eigenvalue with modulus 1 and the eigenvalue 1 is simple. Thus, in particular, if T is mixing, then the spectral gap is nonzero. Theorem 8 in [30] says that ergodic channels are mixing if the unique state that is a fixed point is pure. By assumption, every \(T \in {\mathcal {C}}\) is ergodic and the only state that is a fixed point is the pure state \({|}v{\rangle }{\langle }v{|}\). Thus, all \(T \in {\mathcal {C}}\) are mixing and the conditions in Theorem 4.6 are automatically satisfied. This proves the claim. \(\square \)

Remark 4.11

In the previous theorem, it is important that \({|}v{\rangle }{\langle }v{|}\) is the only state that is a fixed point. To demonstrate this, we define the Hamiltonian on a qubit system, \({\mathcal {H}}_Q := \mathrm {span}\{v, q_1\}\), as \(H := \frac{\pi }{2} \sigma _y\), where \(\sigma _y\) is the Pauli matrix (see Footnote 13). So, \(U_t(\cdot ) := e^{-iHt} \cdot e^{iHt}\). The channel \(T : {\mathcal {B}}_1({\mathcal {H}}_Q) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_Q)\) is then defined by

$$\begin{aligned} T(\cdot ) := \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|}\, \cdot \,\right] {|}v{\rangle }{\langle }v{|} + \mathrm {tr}\left[ {|}q_1{\rangle }{\langle }q_1{|}\, \cdot \,\right] {|}q_1{\rangle }{\langle }q_1{|}. \end{aligned}$$

It is not hard to verify by induction that

$$\begin{aligned} (U_{\frac{1}{N}}\circ T)^n({|}v{\rangle }{\langle }v{|}) = U_{\frac{1}{N}}\left( \frac{1}{2}(1+\cos ^n(2\theta )) {|}v{\rangle }{\langle }v{|} + \frac{1}{2}(1-\cos ^n(2\theta )) {|}q_1{\rangle }{\langle }q_1{|}\right) , \end{aligned}$$

where \(\theta := \frac{\pi }{2N}\). The formula for the sum of the geometric progression yields

$$\begin{aligned} \sum _{n = 0}^{N-1} (U_{\frac{1}{N}} \circ T)^n({|}v{\rangle }{\langle }v{|}) = U_{\frac{1}{N}}\left( \frac{1}{2}\left( N + \lambda \right) {|}v{\rangle }{\langle }v{|} + \frac{1}{2}\left( N - \lambda \right) {|}q_1{\rangle }{\langle }q_1{|} \right) , \end{aligned}$$

with \(\lambda := \frac{1-\cos ^N(2\theta )}{2\sin ^2(\theta )}\). It is an exercise in elementary calculus (or a query in your favorite computer algebra system) to show that

$$\begin{aligned} \lim _{N\rightarrow \infty }(N-\lambda ) = \frac{\pi ^2}{4}. \end{aligned}$$
(4.32)

Since \(U_{\frac{1}{N}} \rightarrow \mathrm {id}\) as \(N \rightarrow \infty \), it follows that the quantity on the LHS of (4.31) converges to \(\frac{\pi ^2}{8}\) and hence does not vanish as \(N \rightarrow \infty \), so the bound (4.31) cannot hold for this channel. Our example thus shows that the Kwiat et al.-like protocol cannot be applied naively, and that the reduction process described in the next section is needed in some cases.
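The behaviour described in this remark can be reproduced numerically. The following is a minimal sketch under stated assumptions: states are real \(2\times 2\) density matrices in the basis \((v, q_1)\), conjugation by \(e^{-i\theta \sigma _y}\) is implemented as the real rotation matrix with rows \((\cos \theta , -\sin \theta )\) and \((\sin \theta , \cos \theta )\), and the "reset" channel \(\rho \mapsto \mathrm {tr}[\rho ]\,{|}v{\rangle }{\langle }v{|}\) is a hypothetical comparison channel that is not taken from the text. All function names are illustrative.

```python
import math

def rotate(rho, theta):
    """U_{1/N}: conjugation by exp(-i*theta*sigma_y), a real rotation R, as R rho R^T.

    Only valid for real density matrices, which is all this example needs.
    """
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]
    tmp = [[sum(R[i][k] * rho[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return [[sum(tmp[i][k] * R[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dephase(rho):
    """The channel T of this remark: keep the diagonal in the basis (v, q1)."""
    return [[rho[0][0], 0.0], [0.0, rho[1][1]]]

def reset(rho):
    """Hypothetical comparison channel with |v><v| as its only fixed state."""
    return [[rho[0][0] + rho[1][1], 0.0], [0.0, 0.0]]

def summed_weight(channel, N):
    """tr[P_perp * sum_{n=0}^{N-1} (U_{1/N} o T)^n(|v><v|)] by direct iteration."""
    theta = math.pi / (2 * N)
    rho = [[1.0, 0.0], [0.0, 0.0]]  # |v><v|
    total = 0.0
    for _ in range(N):
        total += rho[1][1]          # weight on q1, i.e. tr[P_perp rho]
        rho = rotate(channel(rho), theta)
    return total

def lam(N):
    """lambda = (1 - cos^N(2 theta)) / (2 sin^2(theta)) with theta = pi / (2N)."""
    theta = math.pi / (2 * N)
    return (1.0 - math.cos(2 * theta) ** N) / (2.0 * math.sin(theta) ** 2)

N = 2000
w_dephase = summed_weight(dephase, N)
w_reset = summed_weight(reset, N)

print(f"N - lambda      = {N - lam(N):.4f}   (pi^2/4 = {math.pi ** 2 / 4:.4f})")
print(f"dephasing sum   = {w_dephase:.4f}   (pi^2/8 = {math.pi ** 2 / 8:.4f})")
print(f"reset sum       = {w_reset:.6f}")
print(f"N * reset sum   = {N * w_reset:.4f}")
```

For the dephasing channel the summed weight stays close to \(\frac{\pi ^2}{8}\) no matter how large N is, whereas for the comparison channel with a unique fixed state it decays like \(N^{-1}\), in line with (4.31).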

Remark 4.12

If the channel in Theorem 4.10 is a qubit channel (\({\mathcal {H}} = \mathrm {span}\{v, p\}\)), then one can determine the precise asymptotics in a rather tedious calculation. We only state the result, which is that if \(H := \frac{\pi }{2} \sigma _y\), then

$$\begin{aligned} \begin{aligned} \lim _{N\rightarrow \infty } N^2 \mathrm {tr}\left[ P^\bot \, (T \circ U_{\frac{1}{N}})^N({|}v{\rangle }{\langle }v{|})\right]&= \lim _{N\rightarrow \infty } N \mathrm {tr}\left[ P^\bot \, \sum _{n = 0}^{N-1} (U_{\frac{1}{N}} \circ T)^n({|}v{\rangle }{\langle }v{|})\right] \\&= \frac{\pi ^2}{4}\frac{1- \left| \tau _0 \right| ^2}{(1-\tau ) \left| 1-\tau _0 \right| ^2}, \end{aligned} \end{aligned}$$
(4.33)

where \(\tau := \mathrm {tr}\left[ P^\bot T(P^\bot )\right] \) and \(\tau _0 := \mathrm {tr}\left[ {|}p{\rangle }{\langle }v{|} \,T({|}v{\rangle }{\langle }p{|})\right] \).

This result contains as a special case the result for semi-transparent objects [31, 32].

Remark 4.13

It is a direct consequence of the results in the next section that the \(N^{-1}\) form of the bound is optimal.

4.2 The Reduction Protocol

In this section, in which we assume that all Hilbert spaces are finite-dimensional, we want to transform our given channel in such a way that the Kwiat et al.-like strategy, which was described in the previous section, can be applied. The general idea is that instead of inserting the unknown channel directly into the circuit of Fig. 6, we preprocess and postprocess the states that go in and out of the channel. In other words, we replace the channel T in Fig. 6 by the construction that is depicted on the RHS of Fig. 7. In Fig. 7, \({\mathcal {H}}_Q\) and \({\mathcal {H}}_A\) are Hilbert spaces and \(R_0 : {\mathcal {B}}_1({\mathcal {H}}_Q) \rightarrow {\mathcal {B}}_1({\mathcal {H}}\otimes {\mathcal {H}}_A)\) and \(R_0^\prime : {\mathcal {B}}_1({\mathcal {H}}\otimes {\mathcal {H}}_A) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_Q)\) are channels. The resulting transformation can be viewed as a map \(R : {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}})) \rightarrow {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}_Q))\), defined by \(R(T) := R_0^\prime (T\otimes \mathrm {id}) R_0\). Maps of this kind are usually called superchannels [33]. Clearly, if T is a channel with transmission functional \({\mathfrak {t}}_T\), then R(T) is a channel with transmission functional \({\mathfrak {t}}_{R(T)} := {\mathfrak {t}}_{T} \circ \mathrm {tr}_A \circ R_0\). We say that the superchannel R transforms the transmission functional \({\mathfrak {t}}_T\) to \({\mathfrak {t}}_{R(T)}\). 
For consistency reasons, we also remark the following: As is shown in [33], for any superchannel \(S : {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}})) \rightarrow {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}_Q))\), there exists a Hilbert space \({\mathcal {H}}_{A^\prime }\) and channels \(S_0 : {\mathcal {B}}_1({\mathcal {H}}_Q) \rightarrow {\mathcal {B}}_1({\mathcal {H}}\otimes {\mathcal {H}}_{A^\prime })\) and \(S_0^\prime : {\mathcal {B}}_1({\mathcal {H}}\otimes {\mathcal {H}}_{A^\prime }) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_Q)\) such that \(S(T) = S_0^\prime (T\otimes \mathrm {id}) S_0\) for all \(T \in {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\). Of course, the choice of \({\mathcal {H}}_{A^\prime }\), \(S_0\), and \(S_0^\prime \) is not unique. The transformation of the transmission functional, however, is unique. To see this, assume that we apply S to the map \(T_B\), defined by \(T_B(\cdot ) = \mathrm {tr}\left[ B\,\cdot \,\right] \rho _0\), where \(\rho _0 \in {\mathcal {S}}({\mathcal {H}})\) and \(B\in {\mathcal {B}}({\mathcal {H}})\) are arbitrary. Since \(S_0^\prime \) is trace-preserving, we have for \(\sigma \in {\mathcal {B}}({\mathcal {H}}_Q)\), that \(\mathrm {tr}\left[ S(T_B)(\sigma )\right] = \mathrm {tr}\left[ (T_B\otimes \mathrm {id})S_0(\sigma )\right] = \mathrm {tr}\left[ B \mathrm {tr}_{A^\prime }\left[ S_0(\sigma )\right] \right] \). Since B and \(\sigma \) were arbitrary, it follows that \(\mathrm {tr}_{A^\prime }\circ S_0\) is independent of the choice of \({\mathcal {H}}_{A^\prime }\), \(S_0\), and \(S_0^\prime \). Hence, the transformation of the transmission functional is independent of the particular implementation of a superchannel. Formally, the replacement described above yields a transformation of the discrimination strategy. 
That is, given a discrimination strategy \(D = ({\mathcal {H}}_Q, {\mathcal {H}}_Z, {\mathcal {H}}_i, {\mathcal {H}}_o, s_0, \Lambda )\), with \(\Lambda = \{\Lambda _1, \Lambda _2, \dots , \Lambda _N\}\), we obtain the transformed discrimination strategy \(D^R := ({\mathcal {H}}, {\mathcal {H}}_A\otimes {\mathcal {H}}_Z, {\mathcal {H}}_i, {\mathcal {H}}_o, s_0, \Lambda ^R)\), with \(\Lambda ^R_0 := (R_0 \otimes \mathrm {id}_Z)\Lambda _0\), \(\Lambda ^R_N := \Lambda _N(R_0^\prime \otimes \mathrm {id}_Z)\), and \(\Lambda ^R_n := (R_0 \otimes \mathrm {id}_Z)\Lambda _n(R_0^\prime \otimes \mathrm {id}_Z)\), for \(1 \le n \le N-1\).

Fig. 7. General transformation scheme: a superchannel

The task of this section is to show the existence of a superchannel such that the general discrimination task reduces to the one described in the last section. It will be evident from the proof of the following theorem that such a superchannel can be implemented by using only one ancillary qubit and classical resources. Furthermore, we show in Remark 4.18 that in general the implementation of such a superchannel is impossible without using an ancillary qubit.

Theorem 4.14

(Reduction superchannel). For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be a channel and let \({\mathcal {V}} \subseteq {\mathcal {H}}\) be a subspace such that T is isometric on \({\mathcal {V}}\). Furthermore, let \(v \in {\mathcal {V}}\) be a unit vector. Then, there exists a two-dimensional Hilbert space \({\mathcal {H}}_Q\), with orthonormal basis \(\{q_0, q_1\}\) and a superchannel \(R : {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}})) \rightarrow {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}_Q))\) with the following properties:

  1. If \(T^\prime \in {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) satisfies \(T\vert _{{\mathcal {B}}_1({\mathcal {V}})} = T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})}\), then \(R(T^\prime ) = \mathrm {id}\).

  2. If \(T^\prime \in {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) is a channel such that \(T\vert _{{\mathcal {B}}_1({\mathcal {V}})} \ne T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})}\), then the only state that is a fixed point of \(R(T^\prime )\) is \({|}q_0{\rangle }{\langle }q_0{|}\).

  3. If \(T^\prime \in {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) is a channel with transmission functional \({\mathfrak {t}}_{T^\prime }\) and \({\mathfrak {t}}_{T^\prime }({|}v{\rangle }{\langle }v{|}) = 0\), then the transformed transmission functional \({\mathfrak {t}}_{R(T^\prime )}\) is given by

    $$\begin{aligned} {\mathfrak {t}}_{R(T^\prime )}(\cdot ) = \bigg \{\begin{array}{lr} \frac{1}{2} {\mathfrak {t}}_{T^\prime }(\frac{P^\bot }{d-1}) \mathrm {tr}\left[ {|}q_1{\rangle }{\langle }q_1{|}\,\cdot \,\right] &{} \text {if } d > 1\\ 0 &{} \text {if } d = 1 \end{array}, \end{aligned}$$
    (4.34)

    where \(d := \mathrm {dim}({\mathcal {V}})\) and where \(P^\bot \) denotes the orthogonal projection onto the orthogonal complement of v in \({\mathcal {V}}\).

Before we prove the theorem, let us explore its consequences. First, we establish the analog of Theorem 2.5 for the transmission functional model.

Corollary 4.15

For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \({\mathcal {C}}_A, {\mathcal {C}}_B \subseteq {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) be two closed sets of channels. Furthermore, let \({\mathcal {V}}\) be a subspace of \({\mathcal {H}}\) and let \(v \in {\mathcal {V}}\) be a unit vector such that

  1. For all \(T \in {\mathcal {C}}_A \cup {\mathcal {C}}_B\), T is a channel with transmission functional \({\mathfrak {t}}_{T}\).

  2. For all \(T \in {\mathcal {C}}_A\), T is isometric on \({\mathcal {V}}\).

  3. For all \(T \in {\mathcal {C}}_A\), \({\mathfrak {t}}_{T}\vert _{{\mathcal {B}}_1({\mathcal {V}})} = 0\).

  4. For all \(T \in {\mathcal {C}}_B\), \({\mathfrak {t}}_{T}({|}v{\rangle }{\langle }v{|}) = 0\).

  5. \(\sup _{T \in {\mathcal {C}}_B} \left\Vert {\mathfrak {t}}_T\vert _{{\mathcal {B}}_1({\mathcal {V}})} \right\Vert < \infty \).

  6. The set \({\mathcal {C}}_A\vert _{{\mathcal {B}}_1({\mathcal {V}})} := \left\{ T\vert _{{\mathcal {B}}_1({\mathcal {V}})} \,\big |\, T \in {\mathcal {C}}_A \right\} \) contains exactly one element.

  7. \({\mathcal {C}}_A\vert _{{\mathcal {B}}_1({\mathcal {V}})}\) and \({\mathcal {C}}_B\vert _{{\mathcal {B}}_1({\mathcal {V}})} := \left\{ T\vert _{{\mathcal {B}}_1({\mathcal {V}})} \,\big |\, T \in {\mathcal {C}}_B \right\} \) are disjoint.

Then, there exist a constant C and for every \(N \in {\mathbb {N}}\), an N-step discrimination strategy D and a two-valued POVM \(\Pi \) such that

$$\begin{aligned} P_e(D, \Pi )\le & {} \frac{C}{N^2}, \end{aligned}$$
(4.35)
$$\begin{aligned} {\mathfrak {T}}_{T_A}(D)= & {} 0 \quad \text { and } \quad {\mathfrak {T}}_{T_B}(D) \le \frac{C}{N}, \end{aligned}$$
(4.36)

for all \(T_A \in {\mathcal {C}}_A\) and all \(T_B \in {\mathcal {C}}_B\), where the discrimination error probability is w.r.t. the sets \({\mathcal {C}}_A\) and \({\mathcal {C}}_B\). Hence, the sets \({\mathcal {C}}_A\) and \({\mathcal {C}}_B\) can be discriminated in a transmission-free manner.

Proof

We combine Theorems 4.1 and 4.14. Let us fix some \(T_A \in {\mathcal {C}}_A\). From Theorem 4.14 (with \(T = T_A\)), we obtain the map R, with the properties (1), (2), and (3). We want to apply Theorem 4.1 with \({\mathcal {C}} := R({\mathcal {C}}_B)\). Since \({\mathcal {C}}_B\) is (as a closed subset of the compact set of channels) compact and R is continuous, \({\mathcal {C}}\) is compact and hence closed. Furthermore, since by Assumption 7, the sets \({\mathcal {C}}_A\vert _{{\mathcal {B}}_1({\mathcal {V}})}\) and \({\mathcal {C}}_B\vert _{{\mathcal {B}}_1({\mathcal {V}})}\) are disjoint, we have \(T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})} \ne T_A\vert _{{\mathcal {B}}_1({\mathcal {V}})}\) for all \(T^\prime \in {\mathcal {C}}_B\). Hence, property (2) implies that for all \(T \in {\mathcal {C}}\), the state \({|}q_0{\rangle }{\langle }q_0{|}\) is the only state that is a fixed point of T. In particular, \(\mathrm {id}\notin {\mathcal {C}}\). Furthermore, Assumption 6 implies that \(T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})} = T_A\vert _{{\mathcal {B}}_1({\mathcal {V}})}\), for all \(T^\prime \in {\mathcal {C}}_A\). Hence, by property (1), \(R({\mathcal {C}}_A) = \{\mathrm {id}\}\). Thus, Theorem 4.1 yields a discrimination strategy \({\tilde{D}}\) and a two-valued POVM \(\Pi \) such that \(P_e({\tilde{D}}, \Pi ) \le {\tilde{C}}N^{-2}\), for some constant \({\tilde{C}}\). By construction, \(P_e({\tilde{D}}, \Pi )\) is the discrimination error probability w.r.t. the sets \({\mathcal {C}}\) and \(\{\mathrm {id}\}\), but since we have for \(T^\prime \in {\mathcal {C}}_A \cup {\mathcal {C}}_B\) that \(R(T^\prime ) \in \{\mathrm {id}\}\) iff \(T^\prime \in {\mathcal {C}}_A\) and \(R(T^\prime ) \in {\mathcal {C}}\) iff \(T^\prime \in {\mathcal {C}}_B\), it follows that \(P_e({\tilde{D}}^R, \Pi ) = P_e({\tilde{D}}, \Pi )\), where \({\tilde{D}}^R\) is the transformed discrimination strategy, as defined in the main text. 
For \(T^\prime \in {\mathcal {C}}_A\), Assumption 3 and property (3) imply that the transformed transmission functional satisfies \({\mathfrak {t}}_{R(T^\prime )} = 0\). Thus, \({\mathfrak {T}}_{T^\prime }({\tilde{D}}^R) = 0\). Furthermore, for \(T^\prime \in {\mathcal {C}}_B\) with transmission functional \({\mathfrak {t}}_{T^\prime }\), property (3) implies that the norm of the transformed transmission functional satisfies \(\left\Vert {\mathfrak {t}}_{R(T^\prime )} \right\Vert = \frac{1}{2} {\mathfrak {t}}_{T^\prime }\left( \frac{P^\bot }{d-1}\right) \le \frac{1}{2}\left\Vert {\mathfrak {t}}_{T^\prime }\vert _{{\mathcal {B}}_1({\mathcal {V}})} \right\Vert \). Since we have \({\mathfrak {T}}_{T^\prime }({\tilde{D}}^R) = {\mathfrak {T}}_{R(T^\prime )}({\tilde{D}})\), Theorem 4.1 implies that \({\mathfrak {T}}_{T^\prime }({\tilde{D}}^R) \le \frac{{\tilde{C}}\left\Vert {\mathfrak {t}}_{T^\prime }\vert _{{\mathcal {B}}_1({\mathcal {V}})} \right\Vert }{2N}\). We finish the proof by identifying D with \({\tilde{D}}^R\) and defining

$$\begin{aligned} C := \max \left[ {\tilde{C}}, \frac{{\tilde{C}}}{2}\sup _{T^\prime \in {\mathcal {C}}_B}\left\Vert {\mathfrak {t}}_{T^\prime }\vert _{{\mathcal {B}}_1({\mathcal {V}})} \right\Vert \right] < \infty . \end{aligned}$$
(4.37)

\(\square \)

As a direct consequence of the previous result, we get the validity of Theorem 2.5.

Proof

(Theorem 2.5) We interpret every channel T with “interaction” functional \({\mathfrak {i}}_{T}\) as a channel with transmission functional \({\mathfrak {i}}_{T}\). By Lemma 3.8, it suffices to check Conditions 1-7 of Corollary 4.15. Conditions 1, 2, 6, and 7 hold by assumption, and Conditions 3, 4, and 5 follow directly from Lemma 3.10 (6). \(\square \)

The remainder of this section is devoted to the proof of Theorem 4.14. We show that the transformation depicted in Fig. 8 has the desired properties. We define this superchannel precisely in the proof of Theorem 4.14. An important part is the so-called twirling operation, which we study here for a special group.

Fig. 8. The reduction superchannel for \(\mathrm {dim}({\mathcal {V}}) > 1\)

Lemma 4.16

(Twirling). For \(2 \le d := \mathrm {dim}({\mathcal {H}}) < \infty \), let \(v \in {\mathcal {H}}\) be a unit vector and set \(V_v := \mathrm {span}\{v\}\). We define the group

$$\begin{aligned} G := \left\{ g = \mathbbm {1}_{V_v} \oplus U_g \in {\mathcal {B}}(V_v\oplus V_v^\bot ) \,\big |\, U_g \in {\mathcal {B}}(V_v^\bot ) \text { is unitary} \right\} \end{aligned}$$
(4.38)

and the twirling superchannel \(S : {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}})) \rightarrow {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) by

$$\begin{aligned} S(T) = \int {\hat{U}}_g \circ T \circ {\hat{U}}_g^{-1} \,\mathrm {d}\mu _G(g), \end{aligned}$$
(4.39)

where \(\mu _G\) is the Haar measure on G and \({\hat{U}}_g : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) is the quantum channel obtained by conjugation with the group element \(g \in G\), i.e., \({\hat{U}}_g(\cdot ) = g \cdot g^{-1}\). Then, the following statements hold.

  • Let \(\psi \in V_v^\bot \) be any unit vector and \(\phi := \frac{1}{\sqrt{2}}\left( v + \psi \right) \). If \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) is a channel and \({|}\phi {\rangle }{\langle }\phi {|}\) is a fixed point of S(T), then \(T = \mathrm {id}\). Conversely, \(S(\mathrm {id}) = \mathrm {id}\) and thus \({|}\phi {\rangle }{\langle }\phi {|}\) is a fixed point of \(S(\mathrm {id})\).

  • For a functional \({\mathfrak {t}} : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathbb {C}}\), we have

    $$\begin{aligned} \int {\mathfrak {t}} \circ {\hat{U}}_g^{-1} \,\mathrm {d}\mu _G(g) = {\mathfrak {t}}\left( \frac{P^\bot }{d-1}\right) \, \mathrm {tr}\left[ P^\bot \,\cdot \, \right] + {\mathfrak {t}}({|}v{\rangle }{\langle }v{|}) \,\mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|}\,\cdot \,\right] . \end{aligned}$$
    (4.40)

Remark 4.17

The integration over the Haar measure can be replaced by a unitary t-design [34]. We can thus implement the superchannel S without using an ancillary quantum system.
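In the smallest case \(d = 2\), the group G consists of the phases \(\mathrm {diag}(1, e^{i\varphi })\) in the basis \((v, \psi )\), and the Haar integral in (4.40) can be replaced by a finite average. The following is a minimal sketch of this special case only: the functional is assumed to be of the form \({\mathfrak {t}}(X) = \mathrm {tr}[BX]\), the matrices `B` and `X` are arbitrary illustrative choices, and averaging over K equally spaced phases is an assumption made for this example rather than the general t-design construction of [34].

```python
import cmath

def functional(B, X):
    """t(X) = tr[B X] for 2x2 complex matrices given as nested lists."""
    return sum(B[i][k] * X[k][i] for i in range(2) for k in range(2))

def conj_inverse(phi, X):
    """hat U_g^{-1}(X) = g^{-1} X g for g = diag(1, exp(i*phi))."""
    g = [1 + 0j, cmath.exp(1j * phi)]
    return [[g[j].conjugate() * X[j][k] * g[k] for k in range(2)]
            for j in range(2)]

def twirled(B, X, K):
    """Average of t o hat U_g^{-1} over K equally spaced phases."""
    phases = [2 * cmath.pi * m / K for m in range(K)]
    return sum(functional(B, conj_inverse(phi, X)) for phi in phases) / K

def rhs_4_40(B, X):
    """Right-hand side of (4.40) for d = 2, where P_perp = |psi><psi|."""
    t_perp = B[1][1]  # t(P_perp / (d - 1)) = tr[B P_perp]
    t_v = B[0][0]     # t(|v><v|) = tr[B |v><v|]
    return t_perp * X[1][1] + t_v * X[0][0]

# Arbitrary illustrative data (assumptions), with nonzero off-diagonal entries.
B = [[0.2, 0.5 + 0.1j], [0.3 - 0.4j, 0.7]]
X = [[0.6, 0.1 - 0.2j], [0.1 + 0.2j, 0.4]]

two_point = twirled(B, X, 2)  # phases 0 and pi
no_twirl = twirled(B, X, 1)   # K = 1 is just t(X) itself
target = rhs_4_40(B, X)

print("two-point average:", two_point)
print("RHS of (4.40):    ", target)
print("untwirled t(X):   ", no_twirl)
```

Here two phases already cancel the off-diagonal contributions of the functional. Twirling a channel as in (4.39) involves higher harmonics of the phase, so this two-point average should not be read as sufficient for S itself.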

Proof

We start by showing that the range of S is spanned by the following seven operators:

$$\begin{aligned} \begin{aligned} \begin{aligned}&\mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \,\cdot \, \right] {|}v{\rangle }{\langle }v{|}, \qquad&\mathrm {tr}\left[ P^\bot \,\cdot \, \right] {|}v{\rangle }{\langle }v{|}, \qquad&\mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \,\cdot \,\right] \frac{P^\bot }{d-1}, \\&\mathrm {tr}\left[ P^\bot \,\cdot \, \right] \frac{P^\bot }{d-1},&P^\bot \cdot {|}v{\rangle }{\langle }v{|},&{|}v{\rangle }{\langle }v{|} \cdot P^\bot , \end{aligned}\\ P^\bot \cdot P^\bot - \mathrm {tr}\left[ P^\bot \,\cdot \,\right] \frac{P^\bot }{d-1}. \end{aligned} \end{aligned}$$
(4.41)

Using the invariance of the Haar measure, we obtain that the range of S consists of precisely those operators \(T : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) that commute with \({\hat{U}}_g\) for all \(g \in G\). We calculate the commutant on the level of Choi matrices. To this end, we identify \({\mathcal {B}}_1({\mathcal {H}})\) with \({\mathcal {H}} \otimes {\mathcal {H}}\) via the Choi isomorphism (\({|}h_i{\rangle }{\langle }h_j{|} \leftrightarrow h_i \otimes h_j\)), where \(h_0, h_1, \dots , h_{d-1}\) is an orthonormal basis of \({\mathcal {H}}\) such that \(h_0 = v\). The operator corresponding to \({\hat{U}}_g\) is \(g \otimes {\overline{g}}\), where the complex conjugation is w.r.t. the aforementioned basis. We can rewrite this operator as:

$$\begin{aligned} g\otimes {\overline{g}}&= (\mathbbm {1}_{V_v} \oplus U_g) \otimes (\mathbbm {1}_{V_v} \oplus {\overline{U}}_g) \\&= (\mathbbm {1}_{V_v} \otimes \mathbbm {1}_{V_v}) \oplus (\mathbbm {1}_{V_v} \otimes {\overline{U}}_g) \oplus (U_g \otimes \mathbbm {1}_{V_v}) \oplus (U_g\otimes {\overline{U}}_g). \end{aligned}$$

The maps \(g \mapsto \mathbbm {1}_{V_v} \otimes \mathbbm {1}_{V_v}\), \(g \mapsto \mathbbm {1}_{V_v} \otimes {\overline{U}}_g\), and \(g \mapsto U_g \otimes \mathbbm {1}_{V_v}\) are inequivalent irreducible representations of G. If \(d = 2\), the representation \(g \mapsto (U_g\otimes {\overline{U}}_g)\) is the trivial 1-dimensional representation, so that the trivial representation occurs with multiplicity 2. A simple consequence of Schur’s lemma is that the commutant then is \(2^2 + 1^2 + 1^2 = 6\) dimensional (see [35], p. 60 for the dimension formula). For \(d=2\), the span of the operators in (4.41) is also 6-dimensional (\(P^\bot \cdot P^\bot - \mathrm {tr}\left[ P^\bot \cdot \right] \frac{P^\bot }{d-1} = 0\)). So in this case, we have proven the claim. If \(d \ge 3\), then the representation \(g \mapsto (U_g\otimes {\overline{U}}_g)\) is the direct sum of the trivial 1-dimensional representation and an irreducible \(((d-1)^2 - 1)\)-dimensional representation (see [36]). Hence, the dimension of the commutant is \(2^2+1^2+1^2+1^2 = 7\). Also, the span of the operators in (4.41) is 7-dimensional. This proves that the range of S is indeed given by the span of the operators in (4.41).
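The dimension count above is an instance of the formula \(\dim (\text {commutant}) = \sum _i m_i^2\), where \(m_i\) is the multiplicity of the i-th inequivalent irreducible representation. The following short sketch only redoes this arithmetic and checks that the dimensions of the blocks add up to \(d^2\); the labels of the representations are illustrative names, not notation from the text.

```python
def irreps(d):
    """Irreducible constituents of g -> g (x) conj(g) as (label, dimension, multiplicity).

    The labels are hypothetical names for the blocks in the decomposition above.
    """
    if d < 2:
        raise ValueError("the lemma assumes d >= 2")
    blocks = [
        ("trivial", 1, 2),                 # 1 (x) 1 and the trivial part of U (x) conj(U)
        ("conjugate_fundamental", d - 1, 1),  # 1 (x) conj(U)
        ("fundamental", d - 1, 1),            # U (x) 1
    ]
    if d >= 3:
        # For d = 2 this block has dimension 0 and is absent.
        blocks.append(("adjoint", (d - 1) ** 2 - 1, 1))
    return blocks

def commutant_dimension(d):
    """Sum of squared multiplicities, as given by Schur's lemma."""
    return sum(mult ** 2 for _, _, mult in irreps(d))

def total_dimension(d):
    """Sum of dimension times multiplicity; must equal d**2."""
    return sum(dim * mult for _, dim, mult in irreps(d))

for d in (2, 3, 5):
    print(d, commutant_dimension(d), total_dimension(d) == d * d)
```

The consistency check `total_dimension(d) == d * d` confirms that no block of the decomposition has been dropped or double-counted.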

For our first claim, we clearly have \(S(\mathrm {id}) = \mathrm {id}\). Conversely, let T be a channel such that \({|}\phi {\rangle }{\langle }\phi {|}\) is a fixed point of S(T). Let \(\alpha _1, \alpha _2, \dots ,\alpha _7\) be the coefficients of an expansion of S(T) in terms of the operators in (4.41). Note that for \(d = 2\), this expansion is not unique but can be made that way by demanding \(\alpha _7 := 1\). As \({|}\phi {\rangle }{\langle }\phi {|}\) is a fixed point of S(T), we have

$$\begin{aligned} \begin{aligned} {|}\phi {\rangle }{\langle }\phi {|}&= \frac{1}{2}\left( {|}v{\rangle }{\langle }v{|} + {|}v{\rangle }{\langle }\psi {|} + {|}\psi {\rangle }{\langle }v{|} + {|}\psi {\rangle }{\langle }\psi {|} \right) \\&= S(T)({|}\phi {\rangle }{\langle }\phi {|})\\&\begin{aligned} \;=\frac{1}{2} \bigg ( (\alpha _1 + \alpha _2){|}v{\rangle }{\langle }v{|} + (\alpha _3 + \alpha _4 - \alpha _7)\frac{P^\bot }{d-1} + \alpha _5 {|}\psi {\rangle }{\langle }v{|}&+ \alpha _6 {|}v{\rangle }{\langle }\psi {|} \\ {}&+ \alpha _7 {|}\psi {\rangle }{\langle }\psi {|} \bigg ). \end{aligned} \end{aligned} \end{aligned}$$

By comparing the second and the last expression, it follows that \(\alpha _1 + \alpha _2 = 1\) and \(\alpha _5 = \alpha _6 = 1\). If \(d = 2\), then \(P^\bot = {|}\psi {\rangle }{\langle }\psi {|}\) and \(\alpha _3 + \alpha _4 = 1\). Otherwise, we have \(\alpha _7 = 1\) and \(\alpha _3 + \alpha _4 - \alpha _7 = 0\), hence also \(\alpha _3 + \alpha _4 = 1\). Furthermore,

$$\begin{aligned} S(T)({|}v{\rangle }{\langle }v{|})&= \alpha _1 {|}v{\rangle }{\langle }v{|} + \alpha _3 \frac{P^\bot }{d-1},\nonumber \\ S(T)({|}\psi {\rangle }{\langle }\psi {|})&= \alpha _2 {|}v{\rangle }{\langle }v{|} + (\alpha _4 - \alpha _7) \frac{P^\bot }{d-1} + \alpha _7 {|}\psi {\rangle }{\langle }\psi {|}. \end{aligned}$$
(4.42)

As S(T) is trace-preserving, taking the trace of both lines of (4.42) yields \(\alpha _1 + \alpha _3 = 1\) and \(\alpha _2 + \alpha _4 = 1\). Our equations imply that \(\alpha _2 = 1-\alpha _1, \alpha _3 = 1-\alpha _1\), and \(\alpha _4 = \alpha _1\). Positivity of S(T) in (4.42) implies that \(\alpha _1 \ge 0\) and \(\alpha _3 \ge 0\). Thus, \(0\le \alpha _1 \le 1\). We want to show that complete positivity of S(T) even implies \(\alpha _1 = 1\). To this end, we define \({\mathcal {H}}_A := \mathrm {span}\{v, \psi \}\) and \(\Omega ^+, \Omega ^- \in {\mathcal {H}}_A \otimes {\mathcal {H}}\) by

$$\begin{aligned} \Omega ^+ := v\otimes v + \psi \otimes \psi , \qquad \Omega ^- := v\otimes v - \psi \otimes \psi . \end{aligned}$$

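As S(T) is completely positive, we have

$$\begin{aligned} 0&\le {\langle }\Omega ^-{|} \left( \mathrm {id}_A \otimes S(T)\right) \left( {|}\Omega ^+{\rangle }{\langle }\Omega ^+{|}\right) {|}\Omega ^-{\rangle } \\&= \alpha _1 + \alpha _7 + \frac{\alpha _4 - \alpha _7}{d-1} - \alpha _5 - \alpha _6 \\&= \frac{d}{d-1}\left( \alpha _1 - 1\right) , \end{aligned}$$

where the last step uses \(\alpha _5 = \alpha _6 = \alpha _7 = 1\) and \(\alpha _4 = \alpha _1\).

Thus, \(\alpha _1 \ge 1\). This further implies that \(\alpha _1 = 1, \alpha _2 = 0, \alpha _3 = 0\), and \(\alpha _4 = 1\). Together with the earlier result that \(\alpha _5= \alpha _6 = \alpha _7 = 1\), we obtain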

$$\begin{aligned} S(T)&= \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \,\cdot \, \right] {|}v{\rangle }{\langle }v{|} + \mathrm {tr}\left[ P^\bot \,\cdot \, \right] \frac{P^\bot }{d-1}+ P^\bot \cdot {|}v{\rangle }{\langle }v{|}+{|}v{\rangle }{\langle }v{|} \cdot P^\bot \\&\quad +P^\bot \cdot P^\bot - \mathrm {tr}\left[ P^\bot \,\cdot \,\right] \frac{P^\bot }{d-1} \\&= \mathrm {id}. \end{aligned}$$

Thus, we have shown that if \({|}\phi {\rangle }{\langle }\phi {|}\) is a fixed point of S(T), then \(S(T) = \mathrm {id}\). To see that this also implies that \(T = \mathrm {id}\), we note that S(T) is a convex combination of the channels \({\hat{U}}_g \circ T \circ {\hat{U}}^{-1}_g\). But as the identity is an extremal element of the convex set of quantum channels, \({\hat{U}}_g \circ T \circ {\hat{U}}^{-1}_g\) must be proportional to the identity \(\mu _G\)-almost everywhere. In particular, \({\hat{U}}_g \circ T \circ {\hat{U}}^{-1}_g = \mathrm {id}\), for some \(g \in G\). Thus, \(T = \mathrm {id}\). This proves the first claim.

It remains to prove the second claim. For \({\mathfrak {t}}(\cdot ) = \mathrm {tr}\left[ L\, \cdot \,\right] \) and \(\rho \in {\mathcal {B}}_1({\mathcal {H}})\), we have

$$\begin{aligned} S^\prime ({\mathfrak {t}})(\rho ) := \int {\mathfrak {t}} \circ {\hat{U}}_g^{-1}(\rho ) \,\mathrm {d}\mu _G(g) = \mathrm {tr}\left[ \int g L g^{-1} \,\mathrm {d}\mu _G(g) \; \rho \right] . \end{aligned}$$

By the definition of the Haar measure, the integral must commute with all \(g \in G\). The representation \(g \mapsto \mathbbm {1}_{V_v} \oplus U_g\) is the sum of two inequivalent irreducible representations of G. Thus, the commutant is 2-dimensional. It is easy to check that \(P^\bot \) and \({|}v{\rangle }{\langle }v{|}\) are in the commutant. Thus,

$$\begin{aligned} \int g L g^{-1} \,\mathrm {d}\mu _G(g) = \lambda _1 P^\bot + \lambda _2 {|}v{\rangle }{\langle }v{|}, \end{aligned}$$

for some \(\lambda _1, \lambda _2 \in {\mathbb {C}}\). Therefore, we can write

$$\begin{aligned} S^\prime ({\mathfrak {t}})(\rho ) = \lambda _1 \mathrm {tr}\left[ P^\bot \rho \right] + \lambda _2 \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \rho \right] . \end{aligned}$$

Substituting \(P^\bot \) and \({|}v{\rangle }{\langle }v{|}\) for \(\rho \) yields \(\lambda _1 = (d-1)^{-1} \,S^\prime ({\mathfrak {t}})(P^\bot )\) and \(\lambda _2 = S^\prime ({\mathfrak {t}})({|}v{\rangle }{\langle }v{|})\). As \(P^\bot \) and \({|}v{\rangle }{\langle }v{|}\) commute with all \(g \in G\), we have

$$\begin{aligned} S^\prime ({\mathfrak {t}})(P^\bot )&= {\mathfrak {t}} \left( \int g^{-1} P^\bot g \,\mathrm {d}\mu _G(g) \right) = {\mathfrak {t}}(P^\bot ),\\ S^\prime ({\mathfrak {t}})({|}v{\rangle }{\langle }v{|})&= {\mathfrak {t}} \left( \int g^{-1} {|}v{\rangle }{\langle }v{|} g \,\mathrm {d}\mu _G(g) \right) = {\mathfrak {t}}({|}v{\rangle }{\langle }v{|}). \end{aligned}$$

We plug this into (4.43) and obtain the desired result, Eq. (4.40). Thus, we have proven our last claim. \(\square \)
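
Written out, the substitutions in the last part of the proof combine as follows. This is a sketch that uses only the identities displayed above, together with \(\mathrm {tr}\left[ P^\bot \right] = d-1\) and \(\mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} P^\bot \right] = 0\):

$$\begin{aligned} S^\prime ({\mathfrak {t}})(P^\bot )&= \lambda _1\, \mathrm {tr}\left[ P^\bot \right] + \lambda _2\, \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} P^\bot \right] = (d-1)\, \lambda _1,\\ S^\prime ({\mathfrak {t}})({|}v{\rangle }{\langle }v{|})&= \lambda _1\, \mathrm {tr}\left[ P^\bot {|}v{\rangle }{\langle }v{|} \right] + \lambda _2 = \lambda _2,\\ S^\prime ({\mathfrak {t}})(\rho )&= \frac{{\mathfrak {t}}(P^\bot )}{d-1}\, \mathrm {tr}\left[ P^\bot \rho \right] + {\mathfrak {t}}({|}v{\rangle }{\langle }v{|})\, \mathrm {tr}\left[ {|}v{\rangle }{\langle }v{|} \rho \right] . \end{aligned}$$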

We are now ready to prove Theorem 4.14.

Proof

As already mentioned, the proof consists of an explicit construction of the superchannel R. The construction is depicted in Fig. 8. We start by defining the components of this circuit from left to right. For the definition of the first component, we define \({\mathcal {H}}_A\) to be a two-dimensional Hilbert space with orthonormal basis \(\{a_0, a_1 \}\). The channel \({\hat{W}} : {\mathcal {B}}_1({\mathcal {H}}_Q) \rightarrow {\mathcal {B}}_1({\mathcal {V}} \otimes {\mathcal {H}}_A)\) is defined by \({\hat{W}}(\cdot ) = W\cdot W^\dagger \), with isometry \(W : {\mathcal {H}}_Q \rightarrow {\mathcal {V}} \otimes {\mathcal {H}}_A\) defined by

$$\begin{aligned} W q_0&= v\otimes a_0,\\ W q_1&= \bigg \{\begin{array}{lr} \frac{1}{\sqrt{2}}\left( v + \psi \right) \otimes a_1, &{} \text {if } \mathrm {dim}({\mathcal {V}}) > 1\\ v\otimes a_1, &{} \text {if } \mathrm {dim}({\mathcal {V}}) = 1 \end{array}, \end{aligned}$$

where \(\psi \in {\mathcal {V}}\) is any unit vector that is orthogonal to v. This channel is designed in order to exhibit the second conclusion of Lemma 4.16.
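
That W is indeed an isometry can be checked directly on the basis vectors. The following sketch assumes that \(\{q_0, q_1\}\) is an orthonormal basis of \({\mathcal {H}}_Q\) and that v is a unit vector, and treats the case \(\mathrm {dim}({\mathcal {V}}) > 1\):

$$\begin{aligned} {\langle }W q_0{|}W q_0{\rangle }&= {\langle }v{|}v{\rangle } {\langle }a_0{|}a_0{\rangle } = 1,\\ {\langle }W q_1{|}W q_1{\rangle }&= \frac{1}{2} {\langle }v + \psi {|}v + \psi {\rangle } {\langle }a_1{|}a_1{\rangle } = \frac{1}{2}(1 + 0 + 0 + 1) = 1,\\ {\langle }W q_0{|}W q_1{\rangle }&= \frac{1}{\sqrt{2}} {\langle }v{|}v + \psi {\rangle } {\langle }a_0{|}a_1{\rangle } = 0. \end{aligned}$$

Hence \(W^\dagger W = \mathbbm {1}\) on \({\mathcal {H}}_Q\). The orthogonality of \(W q_0\) and \(W q_1\) comes entirely from the ancilla, which is why the non-orthogonal vectors v and \(\frac{1}{\sqrt{2}}(v + \psi )\) can both be used on \({\mathcal {V}}\).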

The second component is the twirling operation \(S: {\mathcal {B}}({\mathcal {B}}_1({\mathcal {V}})) \rightarrow {\mathcal {B}}({\mathcal {B}}_1({\mathcal {V}}))\), which is a superchannel on its own and which we only define for \(\mathrm {dim}({\mathcal {V}}) > 1\). This operation is depicted by the two unitary channels \({\hat{U}}_g\) and \({\hat{U}}_g^{-1}\) connected by a dashed line and acts as

$$\begin{aligned} S(\cdot ) := \int {\hat{U}}_g \circ (\cdot ) \circ {\hat{U}}_g^{-1} \,\mathrm {d}\mu _G(g), \end{aligned}$$
(4.43)

where \(\mu _G\) is the Haar measure on the compact group G, defined by (cf. Lemma 4.16)

$$\begin{aligned} G := \left\{ g = \mathbbm {1}_{V_v} \oplus U_g \in {\mathcal {B}}(V_v\oplus V_v^\bot ) \,\big |\, U_g \in {\mathcal {B}}(V_v^\bot ) \text { is unitary} \right\} , \end{aligned}$$

with \(V_v := \mathrm {span}\{v\}\). The channels \({\hat{U}}_g, {\hat{U}}^{-1}_g : {\mathcal {B}}_1({\mathcal {V}}) \rightarrow {\mathcal {B}}_1({\mathcal {V}})\) are defined by

$$\begin{aligned} {\hat{U}}_g(\cdot )&:= (\mathbbm {1}_{V_v} \oplus U_g) (\cdot ) (\mathbbm {1}_{V_v} \oplus U_g^\dagger )&\text {and}&{\hat{U}}_g^{-1}(\cdot ) := (\mathbbm {1}_{V_v} \oplus U_g^\dagger ) (\cdot ) (\mathbbm {1}_{V_v} \oplus U_g). \end{aligned}$$

The channel \(\mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}} : {\mathcal {B}}_1({\mathcal {V}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}}), \rho \mapsto \rho \) embeds \({\mathcal {B}}_1({\mathcal {V}})\) into \({\mathcal {B}}_1({\mathcal {H}})\).

To define the channel \({\hat{V}}^{-1} : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\), we use that by assumption, T is isometric on \({\mathcal {V}}\). This means that there exists an isometry \({\tilde{V}} : {\mathcal {V}} \rightarrow {\mathcal {H}}\) such that \(T\vert _{{\mathcal {B}}_1({\mathcal {V}})}(\cdot ) = {\tilde{V}} \cdot {\tilde{V}}^\dagger \). This isometry can be extended (in a non-unique way) to a unitary and therefore invertible operation \(V: {\mathcal {H}} \rightarrow {\mathcal {H}}\). We then define

$$\begin{aligned} {\hat{V}}^{-1}(\cdot ) := V^\dagger \cdot V. \end{aligned}$$

We define the channel \({\hat{P}}_{\mathcal {V}} : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {V}})\) by

$$\begin{aligned} {\hat{P}}_{\mathcal {V}}(\cdot ) := P_{\mathcal {V}} \cdot P_{\mathcal {V}}^\dagger + \mathrm {tr}\left[ (\mathbbm {1}-P_{\mathcal {V}}^\dagger P_{\mathcal {V}}) (\cdot )\right] {|}v{\rangle }{\langle }v{|}, \end{aligned}$$

where \(P_{\mathcal {V}} : {\mathcal {H}} \rightarrow {\mathcal {V}}\) is the orthogonal projection onto \({\mathcal {V}}\). To finish the channel definitions, we define the channel \(P_W : {\mathcal {B}}_1({\mathcal {V}}\otimes {\mathcal {H}}_A) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_Q)\) by

$$\begin{aligned} P_W(\cdot ) := W^\dagger \cdot W + \mathrm {tr}\left[ (\mathbbm {1}- W W^\dagger ) (\cdot )\right] {|}q_0{\rangle }{\langle }q_0{|}. \end{aligned}$$

We can now define the superchannel R. If \(\mathrm {dim}({\mathcal {V}}) > 1\), we define

$$\begin{aligned} R(\cdot ) := P_W\circ \left( \left[ \int {\hat{U}}_g \circ {\hat{P}}_{\mathcal {V}} \circ {\hat{V}}^{-1} \circ (\cdot ) \circ \mathrm {id}_{{\mathcal {V}}\rightarrow {\mathcal {H}}} \circ {\hat{U}}_g^{-1} \,\mathrm {d}\mu _G(g)\right] \otimes \mathrm {id}_A \right) \circ {\hat{W}}, \end{aligned}$$
(4.44)

and if \(\mathrm {dim}({\mathcal {V}}) = 1\), we define

$$\begin{aligned} R(\cdot ) := P_W \circ {\hat{V}}^{-1} \circ (\cdot ) \circ \mathrm {id}_{{\mathcal {V}}\rightarrow {\mathcal {H}}} \circ {\hat{W}}. \end{aligned}$$
(4.45)

With the definition in place, it only remains to show that the superchannel R has the claimed properties. To prove the first claim, let \(T^\prime \in {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) be a channel such that \(T\vert _{{\mathcal {B}}_1({\mathcal {V}})} = T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})}\). For \(\mathrm {dim}({\mathcal {V}}) > 1\), we use that by construction \({\hat{V}}^{-1}\circ T^\prime |_{{\mathcal {B}}_1({\mathcal {V}})} = \mathrm {id}_{\mathcal {V}}\) and that operators in \({\mathcal {B}}_1({\mathcal {V}})\) are fixed points of \({\hat{P}}_{\mathcal {V}}\). We get

$$\begin{aligned} R(T^\prime )&= P_W\circ \left( \left[ \int {\hat{U}}_g \circ \mathrm {id}_{\mathcal {V}} \circ {\hat{U}}_g^{-1} \,\mathrm {d}\mu _G(g)\right] \otimes \mathrm {id}_A \right) \circ {\hat{W}} \\&= P_W \circ {\hat{W}}\\&= \mathrm {id}_Q. \end{aligned}$$
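
The two simplifications in this chain can be verified in a few lines. The sketch below uses only the definitions of S, \(P_W\), and \({\hat{W}}\), the isometry property \(W^\dagger W = \mathbbm {1}\), and the normalization \(\mu _G(G) = 1\) of the Haar measure:

$$\begin{aligned} \int {\hat{U}}_g \circ \mathrm {id}_{\mathcal {V}} \circ {\hat{U}}_g^{-1} \,\mathrm {d}\mu _G(g)&= \int \mathrm {id}_{\mathcal {V}} \,\mathrm {d}\mu _G(g) = \mathrm {id}_{\mathcal {V}},\\ P_W\left( W \rho W^\dagger \right)&= W^\dagger W \rho W^\dagger W + \mathrm {tr}\left[ (\mathbbm {1}- W W^\dagger ) W \rho W^\dagger \right] {|}q_0{\rangle }{\langle }q_0{|} \\&= \rho + \mathrm {tr}\left[ \left( W^\dagger W - (W^\dagger W)^2\right) \rho \right] {|}q_0{\rangle }{\langle }q_0{|} = \rho . \end{aligned}$$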

By means of a similar argument, it follows that the claim also holds for \(\mathrm {dim}({\mathcal {V}}) = 1\). To prove the second claim, we start by showing that \({|}q_0{\rangle }{\langle }q_0{|}\) is a fixed point of \(R(T^\prime )\), for every channel \(T^\prime \). For \(\mathrm {dim}({\mathcal {V}}) > 1\), we have

$$\begin{aligned} R(T^\prime )({|}q_0{\rangle }{\langle }q_0{|})&= P_W\circ \left( S({\hat{P}}_{\mathcal {V}} \circ {\hat{V}}^{-1} \circ T^\prime \circ \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}}) \otimes \mathrm {id}_A\right) \circ {\hat{W}}({|}q_0{\rangle }{\langle }q_0{|}) \\&= P_W \left( S({\hat{P}}_{\mathcal {V}} \circ {\hat{V}}^{-1} \circ T^\prime \circ \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}})({|}v{\rangle }{\langle }v{|}) \otimes {|}a_0{\rangle }{\langle }a_0{|} \right) \\&= {|}q_0{\rangle }{\langle }q_0{|}, \end{aligned}$$

where the last line follows as \(P_W\) maps every state of the form \(\sigma \otimes {|}a_0{\rangle }{\langle }a_0{|}\) to \({|}q_0{\rangle }{\langle }q_0{|}\). An analogous argument yields that \({|}q_0{\rangle }{\langle }q_0{|}\) is also a fixed point of \(R(T^\prime )\) if \(\mathrm {dim}({\mathcal {V}}) = 1\). Conversely, assume that \(T^\prime \in {\mathcal {B}}({\mathcal {B}}_1({\mathcal {H}}))\) is a channel such that \(T\vert _{{\mathcal {B}}_1({\mathcal {V}})} \ne T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})}\) and \(\rho \in {\mathcal {S}}({\mathcal {H}}_Q)\) is a fixed point of \(R(T^\prime )\). We prove that \(\rho = {|}q_0{\rangle }{\langle }q_0{|}\). We do so by first showing that if \(\rho \ne {|}q_0{\rangle }{\langle }q_0{|}\), then \({|}q_1{\rangle }{\langle }q_1{|}\) is also a fixed point of \(R(T^\prime )\), which will lead to a contradiction. By part 1 of the theorem, \({|}q_0{\rangle }{\langle }q_0{|}\) is a fixed point of \(R(T^\prime )\). Hence, Lemma 4.4 implies that \(\mathrm {span}\{ {|}q_0{\rangle }{\langle }q_1{|} \}\) and \(\mathrm {span}\{ {|}q_1{\rangle }{\langle }q_0{|} \}\) are invariant subspaces of \(R(T^\prime )\). Thus,

We then have

Hence,

If \({\langle }q_1{|}\rho {|}q_1{\rangle } = 0\), then positivity of \(\rho \) implies that \(\rho = {|}q_0{\rangle }{\langle }q_0{|}\), which contradicts the assumption that \(\rho \ne {|}q_0{\rangle }{\langle }q_0{|}\). It follows that

Positivity of \(R(T^\prime )(\rho )\) yields \(R(T^\prime )({|}q_1{\rangle }{\langle }q_1{|}) = {|}q_1{\rangle }{\langle }q_1{|}\), which shows that \({|}q_1{\rangle }{\langle }q_1{|}\) is a fixed point of \(R(T^\prime )\). We now show that this leads to a contradiction. With the abbreviations \({\tilde{S}} := S({\hat{P}}_{\mathcal {V}} \circ {\hat{V}}^{-1} \circ T^\prime \circ \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}})\) and \(\phi := \frac{1}{\sqrt{2}}(v + \psi )\), we get

$$\begin{aligned} {|}q_1{\rangle }{\langle }q_1{|}&= R(T^\prime )({|}q_1{\rangle }{\langle }q_1{|}) \\&= P_W \left( {\tilde{S}}({|}\phi {\rangle }{\langle }\phi {|}) \otimes {|}a_1{\rangle }{\langle }a_1{|} \right) \\&= \mathrm {tr}\left[ {|}\phi {\rangle }{\langle }\phi {|} \,{\tilde{S}}({|}\phi {\rangle }{\langle }\phi {|}) \right] {|}q_1{\rangle }{\langle }q_1{|} + \mathrm {tr}\left[ (\mathbbm {1}- W W^\dagger )\, \left( {\tilde{S}}({|}\phi {\rangle }{\langle }\phi {|}) \otimes {|}a_1{\rangle }{\langle }a_1{|}\right) \right] {|}q_0{\rangle }{\langle }q_0{|}. \end{aligned}$$

Comparing the last line with the first implies that \(\mathrm {tr}\left[ {|}\phi {\rangle }{\langle }\phi {|} \,{\tilde{S}}({|}\phi {\rangle }{\langle }\phi {|}) \right] = 1\). Observe that the latter equation says that the Cauchy–Schwarz inequality (w.r.t. the Hilbert–Schmidt inner product) is satisfied with equality. Thus, \({\tilde{S}}({|}\phi {\rangle }{\langle }\phi {|}) = {|}\phi {\rangle }{\langle }\phi {|}\). Lemma 4.16 then implies

$$\begin{aligned} {\hat{P}}_{\mathcal {V}} \circ {\hat{V}}^{-1} \circ T^\prime \circ \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}} = \mathrm {id}_{\mathcal {V}}. \end{aligned}$$
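
The Cauchy–Schwarz step that gives \({\tilde{S}}({|}\phi {\rangle }{\langle }\phi {|}) = {|}\phi {\rangle }{\langle }\phi {|}\) can be spelled out as follows. In this sketch, \(X := {\tilde{S}}({|}\phi {\rangle }{\langle }\phi {|})\) and the Hilbert–Schmidt norm \(\Vert \cdot \Vert _2\) are notation introduced here, and X is a state because \({\tilde{S}}\) is built from compositions and averages of channels:

$$\begin{aligned} 1 = \mathrm {tr}\left[ {|}\phi {\rangle }{\langle }\phi {|}\, X \right] \le \left\Vert {|}\phi {\rangle }{\langle }\phi {|} \right\Vert _2 \left\Vert X \right\Vert _2 = \sqrt{\mathrm {tr}\left[ X^2 \right] } \le \mathrm {tr}\left[ X \right] = 1. \end{aligned}$$

Equality in the first inequality forces \(X = c\, {|}\phi {\rangle }{\langle }\phi {|}\) for some scalar c, and \(\mathrm {tr}\left[ X \right] = 1\) then gives \(c = 1\).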

Note that \({\hat{P}}_{\mathcal {V}}\) is the sum of the two completely positive trace non-increasing maps, \(P_1(\cdot ) := P_{\mathcal {V}} \cdot P_{\mathcal {V}}^\dagger \) and \(P_2(\cdot ) := \mathrm {tr}\left[ (\mathbbm {1}-P_{\mathcal {V}}^\dagger P_{\mathcal {V}}) (\cdot )\right] {|}v{\rangle }{\langle }v{|}\). Thus, with the appropriate normalization, the extremal point of the convex set of completely positive maps, \(\mathrm {id}_{\mathcal {V}}\), can be written as a convex combination of \(P_i \circ {\hat{V}}^{-1} \circ T^\prime \circ \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}}\). By extremality, each of these maps is proportional to \(\mathrm {id}_{\mathcal {V}}\). Since the range of \(P_2\) is spanned by \({|}v{\rangle }{\langle }v{|}\) and \(\mathrm {dim}({\mathcal {V}}) > 1\), the term with \(i = 2\) must vanish. Hence,

$$\begin{aligned} {\hat{V}}^{-1} \circ T^\prime \circ \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}} = \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}}. \end{aligned}$$
(4.46)

As \({\hat{V}}^{-1}\) is invertible and \(T^\prime \circ \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}} = T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})}\), identity (4.46) is equivalent to

$$\begin{aligned} T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})} = {\hat{V}}\vert _{{\mathcal {B}}_1({\mathcal {V}})}. \end{aligned}$$

By construction of \({\hat{V}}\), the RHS equals \(T\vert _{{\mathcal {B}}_1({\mathcal {V}})}\). But this contradicts the assumption that \(T\vert _{{\mathcal {B}}_1({\mathcal {V}})} \ne T^\prime \vert _{{\mathcal {B}}_1({\mathcal {V}})}\). Thus, \({|}q_1{\rangle }{\langle }q_1{|}\) cannot be a fixed point of \(R(T^\prime )\). Consequently, \(\rho = {|}q_0{\rangle }{\langle }q_0{|}\), which proves that \({|}q_0{\rangle }{\langle }q_0{|}\) is the only state that is a fixed point of \(R(T^\prime )\). This proves the second claim. To prove the third claim, we must calculate how our protocol transforms the transmission functional. For \(\mathrm {dim}({\mathcal {V}}) = 1\), we get directly from the definition (4.45) that \({\mathfrak {t}}_{R(T)}(\cdot ) = \mathrm {tr}\left[ \cdot \right] {\mathfrak {t}}_T({|}v{\rangle }{\langle }v{|}) = 0\). For \(\mathrm {dim}({\mathcal {V}}) > 1\), the transmission functional \({\mathfrak {t}}_T\) transforms to \({\mathfrak {t}}_{R(T)}\), given by

$$\begin{aligned} {\mathfrak {t}}_{R(T)} := \int {\mathfrak {t}}_T \circ \mathrm {id}_{{\mathcal {V}} \rightarrow {\mathcal {H}}} \circ {\hat{U}}_g^{-1} \circ \mathrm {tr}_A \circ {\hat{W}} \,\mathrm {d}\mu _G(g). \end{aligned}$$
(4.47)

To evaluate (4.47), we use (4.40) and get

$$\begin{aligned} {\mathfrak {t}}_{R(T)}(\cdot ) = {\mathfrak {t}}_T\circ \mathrm {id}_{{\mathcal {V}}\rightarrow {\mathcal {H}}}\left( \frac{P^\bot }{d-1}\right) \mathrm {tr}\left[ P^\bot \mathrm {tr}_{A}\left[ {\hat{W}}(\cdot )\right] \right] . \end{aligned}$$

A direct calculation then yields the claim. \(\square \)
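
The direct calculation can be sketched as follows for \(\mathrm {dim}({\mathcal {V}}) > 1\) and \(\rho \in {\mathcal {B}}_1({\mathcal {H}}_Q)\). It uses only the definition of W and the abbreviation \(\phi = \frac{1}{\sqrt{2}}(v + \psi )\), so that \(P^\bot \phi = \frac{1}{\sqrt{2}} \psi \). The cross terms vanish under the partial trace because \(a_0\) and \(a_1\) are orthogonal:

$$\begin{aligned} \mathrm {tr}_{A}\left[ {\hat{W}}(\rho )\right]&= {\langle }q_0{|}\rho {|}q_0{\rangle }\, {|}v{\rangle }{\langle }v{|} + {\langle }q_1{|}\rho {|}q_1{\rangle }\, {|}\phi {\rangle }{\langle }\phi {|},\\ \mathrm {tr}\left[ P^\bot \mathrm {tr}_{A}\left[ {\hat{W}}(\rho )\right] \right]&= \frac{1}{2} {\langle }q_1{|}\rho {|}q_1{\rangle },\\ {\mathfrak {t}}_{R(T)}(\rho )&= \frac{1}{2}\, {\mathfrak {t}}_T\circ \mathrm {id}_{{\mathcal {V}}\rightarrow {\mathcal {H}}}\left( \frac{P^\bot }{d-1}\right) {\langle }q_1{|}\rho {|}q_1{\rangle }. \end{aligned}$$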

Remark 4.18

With our protocol, we achieved a transformation from channels on \({\mathcal {H}}\) to qubit channels with certain properties. This was achieved by using classical communication and one ancillary qubit. To demonstrate that our implementation of this transformation uses the quantum resources in the most economical way possible, we show that in general one cannot use only classical communication to implement a transformation which has the desired properties. To this end, we consider the following procedure. First, we use an instrument to transform the state and to obtain classical information. Then, we apply the channel that is to be transformed. Afterwards, we apply some quantum channel, where the choice of the channel may depend on the classical information that we obtained in the first step. Our instrument is described by a collection of nonzero quantum operations \(I_1, I_2, \dots , I_N\), such that \(\sum _i I_i\) is trace-preserving. We denote the associated channels that are applied in the last step by \(\Lambda _1, \Lambda _2, \dots , \Lambda _N\). Our protocol then implements the following transformation:

$$\begin{aligned} T \mapsto \sum _i \Lambda _i \circ T \circ I_i. \end{aligned}$$
(4.48)

Assume that the channel T of Theorem 4.14 is the identity and \(\mathrm {dim}({\mathcal {H}}) = \mathrm {dim}({\mathcal {V}}) = 2\). Our first requirement is that \(\mathrm {id}\mapsto \mathrm {id}\). Thus,

$$\begin{aligned} \mathrm {id}= \sum _i \Lambda _i\circ I_i. \end{aligned}$$
(4.49)

Since \(\mathrm {id}\) is an extreme point of the convex set of quantum operations, there must be non-negative coefficients \(p_1, p_2, \dots , p_N\), such that

$$\begin{aligned} \Lambda _i\circ I_i = p_i\cdot \mathrm {id}, \text { for } i = 1, 2, \dots , N. \end{aligned}$$
(4.50)

This implies that \(\Lambda _i\) and \(I_i\) must be proportional to a unitary conjugation, i.e., \(\Lambda _i(\cdot ) = U_i^\dagger \cdot U_i\) and \(I_i(\cdot ) = p_i U_i \cdot U_i^\dagger \), for some unitary operator \(U_i\). Our second requirement is that (since \({\mathcal {V}} = {\mathcal {H}}\)) every channel except \(\mathrm {id}\) must be transformed to a channel whose only fixed point among the states is \({|}q_0{\rangle }{\langle }q_0{|} =: P_0\). In particular, for the pinching channel, defined by \(T_P(\cdot ) = P_0\cdot P_0 + P_1\cdot P_1\), with \(P_1 := \mathbbm {1}- P_0\), we have

$$\begin{aligned} P_0 = \sum _i \sum _{j=0}^1 p_i (U_i^\dagger P_j U_i) P_0 (U_i^\dagger P_j U_i). \end{aligned}$$
(4.51)

Since \(P_0\) is an extremal point of the convex set \(\left\{ \rho \ge 0 \,\big |\, \mathrm {tr}\left[ \rho \right] \le 1 \right\} \), we get that

$$\begin{aligned} (U_i^\dagger P_j U_i) P_0 (U_i^\dagger P_j U_i) = \lambda _{ij} P_0, \end{aligned}$$
(4.52)

for some \(\lambda _{ij} \ge 0\). From this, we conclude that either \(U_i^\dagger P_j U_i = P_0\) or \(U_i^\dagger P_j U_i = P_1\). But then the application of the transformed channel to \(P_1\) yields

$$\begin{aligned} \sum _i \sum _{j=0}^1 p_i (U_i^\dagger P_j U_i) P_1 (U_i^\dagger P_j U_i) = P_1. \end{aligned}$$
(4.53)
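
The step from (4.52) to (4.53) can be made explicit as follows. In this sketch, \(Q_{ij} := U_i^\dagger P_j U_i\) is shorthand introduced here. Since each \(Q_{ij}\) equals \(P_0\) or \(P_1\) and \(Q_{i0} + Q_{i1} = U_i^\dagger (P_0 + P_1) U_i = \mathbbm {1}\), the pair \(\{Q_{i0}, Q_{i1}\}\) coincides with \(\{P_0, P_1\}\) for every i. Moreover, \(\sum _i p_i = 1\) because \(\sum _i I_i\) is trace-preserving. Therefore,

$$\begin{aligned} \sum _i \sum _{j=0}^1 p_i\, Q_{ij} P_1 Q_{ij} = \sum _i p_i \left( P_0 P_1 P_0 + P_1 P_1 P_1 \right) = \sum _i p_i\, P_1 = P_1. \end{aligned}$$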

Thus, \(P_0\) is not the only state that is a fixed point of the transformed channel. Hence, to achieve our transformation, an ancillary system is needed.

5 No-Go Results

In this section, we consider the case for which we claimed in our main theorem that it is impossible to discriminate two channels in an “interaction-free” manner. There are two major results in this section: Theorem 5.7, which establishes an inequality between the error probability and the “interaction” probability; and Theorem 5.9, which states that, under a certain condition, the best achievable rate (in terms of the number of channel uses, N) for the “interaction” probability is proportional to \(N^{-1}\). Both theorems are consequences of our main technical results: Propositions 5.2 and 5.3. The proof techniques for these results are inspired by the techniques used in two papers by Mitchison, Massar, and Pironio [11, 12], who proved an analogous no-go result for the special case of a semitransparent object. Before we state the first proposition, we define a quantity that will appear as a proportionality constant in the results of this section. As its definition may seem complicated, we want to stress that in all relevant cases, \(C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}\) can be bounded by 2.

Definition 5.1

For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T_A^\downarrow , T_B^\downarrow : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be two linear maps, let \({\mathcal {V}}\) be a linear subspace of \({\mathcal {H}}\), and let \({\mathcal {W}} = \{{\mathcal {W}}_1, {\mathcal {W}}_2, \dots , {\mathcal {W}}_K\}\) be a collection of mutually orthogonal subspaces of \({\mathcal {V}}^\bot \) with the property that \({\mathcal {V}}^\bot = {\mathcal {W}}_1 \oplus {\mathcal {W}}_2 \oplus \dots \oplus {\mathcal {W}}_K\). Furthermore, let P and \(P_1, P_2, \dots , P_K\) be the orthogonal projections onto \({\mathcal {V}}\) and \({\mathcal {W}}_1, {\mathcal {W}}_2, \dots {\mathcal {W}}_K\).

We define the quantity \(C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}\) to be the infimum of the (possibly empty) set of real numbers r with the property that there exists a finite-dimensional Hilbert space \({\mathcal {H}}_E\), isometries \(V_A, V_B : {\mathcal {H}} \rightarrow {\mathcal {H}}_E\otimes {\mathcal {H}}\), and orthogonal projections \(P_A, P_B : {\mathcal {H}}_E \rightarrow {\mathcal {H}}_E\) such that

$$\begin{aligned} r&= \max _{1\le k \le K} \left\Vert P_k ({V_A}^\dagger (P_A P_B \otimes \mathbbm {1}) V_B - \mathbbm {1}) P_k \right\Vert , \end{aligned}$$
(5.1a)
$$\begin{aligned} V_A P&= V_B P , \end{aligned}$$
(5.1b)
$$\begin{aligned} T_X^\downarrow (\cdot )&= \mathrm {tr}_{E}\left[ (P_X\otimes \mathbbm {1})V_X \cdot V_X^\dagger \right] , \end{aligned}$$
(5.1c)

for \(X \in \{A, B\}\).

We are now ready to state the first important proposition, which establishes, for a single channel use, an uncertainty relation between the “information-gain” (LHS of (5.2)) about the identity of the channel (is it \(T_A\) or \(T_B\)?) and a quantity that depends on the probability that, if we were to measure the input states, we would find them supported in the orthogonal complement of a subspace \({\mathcal {V}}\). Later on, this subspace will be chosen to be a maximum vacuum subspace.

Proposition 5.2

(Information-interaction tradeoff). For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T_A^\downarrow , T_B^\downarrow : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be quantum operations and let \({\mathcal {V}}\) be a subspace of \({\mathcal {H}}\) such that \(T_A^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}}\) is trace-preserving and \(T_A^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}} = T_B^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}}\). Let \({\mathcal {W}} = \{{\mathcal {W}}_1, {\mathcal {W}}_2, \dots , {\mathcal {W}}_K\}\) be a collection of mutually orthogonal subspaces of \({\mathcal {V}}^\bot \), such that \({\mathcal {V}}^\bot = {\mathcal {W}}_1 \oplus {\mathcal {W}}_2 \oplus \dots \oplus {\mathcal {W}}_K\). Denote the orthogonal projections onto these subspaces by \(P_1, P_2, \dots , P_K\). Then, \(C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \le 2\) and

$$\begin{aligned} \sqrt{F}(\rho , \sigma ) - \sqrt{F}(T^\downarrow _A(\rho ), T^\downarrow _B(\sigma ))&\le C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{k=1}^K \sqrt{\mathrm {tr}\left[ P_k \rho \right] \mathrm {tr}\left[ P_k \sigma \right] }, \end{aligned}$$
(5.2)

for all \(\rho , \sigma \ge 0\).
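
As a sanity check of Definition 5.1 and of (5.2), consider the degenerate case \(T_A^\downarrow = T_B^\downarrow = T\) for a channel T. This is a sketch in which the Stinespring isometry \(V_T : {\mathcal {H}} \rightarrow {\mathcal {H}}_E\otimes {\mathcal {H}}\) of T is notation introduced here. Choosing \(V_A = V_B = V_T\) and \(P_A = P_B = \mathbbm {1}\) satisfies (5.1b) and (5.1c), and (5.1a) gives

$$\begin{aligned} r = \max _{1\le k \le K} \left\Vert P_k \left( V_T^\dagger V_T - \mathbbm {1}\right) P_k \right\Vert = 0, \qquad \text {hence} \qquad C_{{\mathcal {V}}, {\mathcal {W}}}^{(T, T)} = 0. \end{aligned}$$

In this case, (5.2) reduces to \(\sqrt{F}(\rho , \sigma ) \le \sqrt{F}(T(\rho ), T(\sigma ))\), which is the familiar monotonicity of the fidelity under quantum channels.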

Before proving the proposition, let us remark that Proposition 2.10 is a direct consequence thereof.

Proof

(Proposition 2.10) This follows directly from the fact that the fidelity can be characterized in terms of the minimum over measurements of expressions of the form given on the RHS of (5.2) (see [25], p. 412). \(\square \)

Proof

(Proposition 5.2) We first establish that \(C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \le 2\). Let P, \(P^\bot \) be the orthogonal projections onto \({\mathcal {V}}\) and \({\mathcal {V}}^\bot \). By applying the triangle inequality and the sub-multiplicativity of the operator norm to the definition of \(C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}\), it follows that if there exist \({\mathcal {H}}_E, V_A, V_B, P_A\), and \(P_B\) with the properties of Definition 5.1, then \(C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \le 2\). Therefore, we start our proof by showing the existence of the aforementioned quantities. It is a basic property of completely positive trace non-increasing maps (see [25], p. 365) that there exist finite-dimensional Hilbert spaces \({\mathcal {H}}_{E_A}\) and \({\mathcal {H}}_{E_B}\), isometries \({\tilde{V}}_A : {\mathcal {H}} \rightarrow {\mathcal {H}}_{E_A} \otimes {\mathcal {H}}\) and \({\tilde{V}}_B : {\mathcal {H}} \rightarrow {\mathcal {H}}_{E_B} \otimes {\mathcal {H}}\), and orthogonal projections \({\tilde{P}}_A : {\mathcal {H}}_{E_A} \rightarrow {\mathcal {H}}_{E_A}\) and \({\tilde{P}}_B : {\mathcal {H}}_{E_B} \rightarrow {\mathcal {H}}_{E_B}\), such that \(T_A^\downarrow (\cdot ) = \mathrm {tr}_{E_A}\left[ ({\tilde{P}}_A \otimes \mathbbm {1}) {\tilde{V}}_A \cdot {\tilde{V}}_A^\dagger \right] \) and \(T_B^\downarrow (\cdot ) = \mathrm {tr}_{E_B}\left[ ({\tilde{P}}_B \otimes \mathbbm {1}) {\tilde{V}}_B \cdot {\tilde{V}}_B^\dagger \right] \). By enlarging the smaller of the two ancillary Hilbert spaces and identifying two orthonormal bases, we can achieve that \({\mathcal {H}}_{E_A}\) and \({\mathcal {H}}_{E_B}\) are the same space, \({\mathcal {H}}_E\). By assumption, \(T_A^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}})}\) and \(T_B^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}})}\) are trace-preserving. It follows that \(({\tilde{P}}_A\otimes \mathbbm {1}) {\tilde{V}}_A|_{\mathcal {V}}\) and \(({\tilde{P}}_B\otimes \mathbbm {1}) {\tilde{V}}_B|_{\mathcal {V}}\) are isometries and thus \(({\tilde{P}}_A\otimes \mathbbm {1}) {\tilde{V}}_A|_{\mathcal {V}} = {\tilde{V}}_A|_{\mathcal {V}}\) and \(({\tilde{P}}_B\otimes \mathbbm {1}) {\tilde{V}}_B|_{\mathcal {V}} = {\tilde{V}}_B|_{\mathcal {V}}\). Hence, \({\tilde{V}}_A|_{\mathcal {V}}\) and \({\tilde{V}}_B|_{\mathcal {V}}\) are Stinespring isometries of the same channel and thus are related by a unitary operator on \({\mathcal {H}}_E\). Precisely, there exists a unitary operator \(W : {\mathcal {H}}_E \rightarrow {\mathcal {H}}_E\) such that \({\tilde{V}}_B|_{\mathcal {V}} = (W\otimes \mathbbm {1}){\tilde{V}}_A|_{\mathcal {V}}\). Equivalently, \({\tilde{V}}_BP = (W\otimes \mathbbm {1}){\tilde{V}}_A P\). It is then easy to verify that the operators \(V_A := (W\otimes \mathbbm {1}){\tilde{V}}_A\), \(V_B := {\tilde{V}}_B\) and \(P_A := W {\tilde{P}}_A W^{-1}, P_B := {\tilde{P}}_B\) satisfy the requirements (5.1c) and (5.1b). In particular, we have

$$\begin{aligned} (P_A\otimes \mathbbm {1})V_AP&= V_AP = V_BP = (P_B\otimes \mathbbm {1})V_BP. \end{aligned}$$
(5.3)
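
The estimate by 2 invoked at the beginning of this proof can be spelled out as follows. The sketch uses only that isometries and orthogonal projections have operator norm at most 1:

$$\begin{aligned} \left\Vert P_k \left( {V_A}^\dagger (P_A P_B \otimes \mathbbm {1}) V_B - \mathbbm {1}\right) P_k \right\Vert&\le \left\Vert P_k \right\Vert ^2 \left( \left\Vert {V_A}^\dagger \right\Vert \left\Vert P_A P_B \otimes \mathbbm {1}\right\Vert \left\Vert V_B \right\Vert + \left\Vert \mathbbm {1}\right\Vert \right) \\&\le 1 + 1 = 2. \end{aligned}$$

Every admissible r in Definition 5.1 is therefore at most 2, and so is the infimum as soon as the set of admissible r is non-empty.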

This finishes the proof of the first part of the proposition. For the second part, we fix \(V_A, V_B, P_A\), and \(P_B\) such that the conditions (5.1c) and (5.1b) are satisfied. In particular, this implies that (5.3) holds. To prove the inequality, we proceed as follows: for two positive operators \(\rho , \sigma \ge 0\), Uhlmann’s theorem implies that there exists a finite-dimensional Hilbert space \({\mathcal {H}}_Q\) and two vectors \(\psi , \phi \in {\mathcal {H}}_Q \otimes {\mathcal {H}}\) (purifications) such that \(\mathrm {tr}_{Q}\left[ {|}\psi {\rangle }{\langle }\psi {|}\right] = \rho \) and \(\mathrm {tr}_{Q}\left[ {|}\phi {\rangle }{\langle }\phi {|}\right] = \sigma \) and \(\sqrt{F}(\rho , \sigma ) = \left| {\langle }\psi {|}\phi {\rangle } \right| \). We further note that \((\mathbbm {1}_Q \otimes (P_A\otimes \mathbbm {1}) V_A){|}\psi {\rangle }\) and \((\mathbbm {1}_Q \otimes (P_B\otimes \mathbbm {1}) V_B){|}\phi {\rangle }\) are purifications of \(T_A^\downarrow (\rho )\) and \(T_B^\downarrow (\sigma )\). Hence, Uhlmann’s theorem implies that

$$\begin{aligned} \sqrt{F}(T_A^\downarrow (\rho ), T_B^\downarrow (\sigma )) \ge \left| {\langle }\psi {|} \left( \mathbbm {1}_Q \otimes {V_A}^\dagger (P_A P_B \otimes \mathbbm {1}) V_B \right) {|}\phi {\rangle } \right| . \end{aligned}$$
(5.4)

By inserting \(\mathbbm {1}_{\mathcal {Q}}\otimes P + \mathbbm {1}_{\mathcal {Q}}\otimes P^\bot \) (which is equal to the identity) and expanding the scalar product, we obtain

(5.5a)
(5.5b)
(5.5c)
(5.5d)

It is not hard to see from (5.3) that the terms (5.5b) and (5.5c) vanish. Explicitly, we have

and similarly for (5.5c). Adding and subtracting and using the inverse triangular inequality yields

(5.6)

We further use \(P^\bot P = 0\) (thus ) and some rearrangement to arrive at

(5.7)

As by assumption, \(P^\bot = \sum _k P_k\) and \(P_kP_l = 0\) for \(k\ne l\), we get

(5.8)

where we used the Cauchy–Schwarz inequality and the sub-multiplicativity of the matrix norm to get from the first to the second line. For the last line, we used

As the only constraints that \(V_A, V_B, P_A, P_B\), and \({\mathcal {H}}_E\) have to satisfy are the ones of Definition 5.1, we conclude that

$$\begin{aligned} {}(5.8) \ge \sqrt{F}(\rho , \sigma ) - C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{k=1}^K \sqrt{\mathrm {tr}\left[ P_k\rho \right] \mathrm {tr}\left[ P_k\sigma \right] }. \end{aligned}$$

This proves the claim. \(\square \)

Proposition 5.2 does not allow for ancillary systems. In the following proposition, which is an iterated refinement of the preceding one, we show that this problem can be solved by applying Proposition 5.2 to \(T^\downarrow \otimes \mathrm {id}\).

Proposition 5.3

(Technical no-go theorem). For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T_A^\downarrow , T_B^\downarrow : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be two completely positive trace non-increasing maps. Let \({\mathcal {V}}\) be a subspace of \({\mathcal {H}}\) such that \(T_A^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}}\) is trace-preserving and \(T_A^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}} = T_B^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}}\). Let \({\mathcal {W}} = \{{\mathcal {W}}_1, {\mathcal {W}}_2, \dots , {\mathcal {W}}_K\}\) be a collection of mutually orthogonal subspaces of \({\mathcal {V}}^\bot \), such that \({\mathcal {V}}^\bot = {\mathcal {W}}_1 \oplus {\mathcal {W}}_2 \oplus \dots \oplus {\mathcal {W}}_K\). We denote the orthogonal projections onto these subspaces by \(P_1, P_2, \dots , P_K\). Furthermore, let \(T_A, T_B : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be completely positive maps such that \(T_A - T_A^\downarrow \) and \(T_B - T_B^\downarrow \) are also completely positive. Then, for every finite-dimensional N-step discrimination strategy \(D = ({\mathcal {H}}, {\mathcal {H}}_Z, s_0, \Lambda )\), we have

$$\begin{aligned} 1 - \sqrt{F}(\rho _N^{T_A}, \rho _N^{T_B}) \le C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{i = 0}^{N-1}\sum _{k = 1}^{K} \sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right] \cdot \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right] }, \end{aligned}$$
(5.9)

where \(\rho \) is the intermediate state map of D. Furthermore, \(C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \le 2\).

Corollary 5.4

For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T_A^\downarrow , T_B^\downarrow : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be two completely positive trace non-increasing maps. Let \({\mathcal {V}}\) be a subspace of \({\mathcal {H}}\) such that \(T_A^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}}\) is trace-preserving and \(T_A^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}} = T_B^\downarrow \vert _{{\mathcal {B}}_1(\mathcal {V)}}\). Then,

$$\begin{aligned} 1 - \sqrt{F}(\rho _N^{T_A}, \rho _N^{T_B}) \le C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{i = 0}^{N-1}\sum _{k = 1}^{K} \sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right] \cdot \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right] }. \end{aligned}$$
(5.10)

Proof

To reduce the overhead in notation, we define \(\rho _i := \rho _i^{T_A}\), \(\rho _i^\downarrow := \rho _i^{T_A^\downarrow }\) and \(\sigma _i := \rho _i^{T_B}\), \(\sigma _i^\downarrow := \rho _i^{T_B^\downarrow }\). We begin the proof of the proposition by showing that

$$\begin{aligned} 1-\sqrt{F}(\rho _N, \sigma _N) \le 1 - \sqrt{F}(\rho _N^\downarrow , \sigma _N^\downarrow ). \end{aligned}$$
(5.11)

This inequality follows from the strong concavity of the fidelity and the observation that \(\rho _N - \rho _N^\downarrow \ge 0\) and \(\sigma _N - \sigma _N^\downarrow \ge 0\). The latter statement follows inductively, as \(\rho _0 - \rho _0^\downarrow = 0 \ge 0\) and

$$\begin{aligned} \rho _{i+1} - \rho ^\downarrow _{i+1}&= \Lambda _i((T_A\otimes \mathrm {id})(\rho _{i}) - (T_A^\downarrow \otimes \mathrm {id})(\rho ^\downarrow _{i}) )\\&= \Lambda _i((T_A\otimes \mathrm {id})(\rho _i - \rho _i^\downarrow ) +((T_A-T_A^\downarrow )\otimes \mathrm {id})(\rho _i^\downarrow )) \\&\ge 0. \end{aligned}$$

The last line follows, as by induction \(\rho _i - \rho _i^\downarrow \ge 0\) and \(T_A-T_A^\downarrow \) is, by assumption, completely positive. Replacing \(\rho \) by \(\sigma \) and A by B in the argument above shows that also \(\sigma _N - \sigma _N^\downarrow \ge 0\). We write \(\Delta \rho := \rho _N-\rho ^\downarrow _N\) and \(\Delta \sigma := \sigma _N-\sigma ^\downarrow _N\) and use the strong concavity (see [25], p. 414) and the non-negativity of the fidelity, to obtain the following inequality:

$$\begin{aligned} \sqrt{F}(\rho _N, \sigma _N)&= \sqrt{F}(\rho _N^\downarrow + \Delta \rho , \sigma _N^\downarrow + \Delta \sigma )\\&\ge \sqrt{F}(\rho _N^\downarrow , \sigma _N^\downarrow ) + \sqrt{F}(\Delta \rho , \Delta \sigma ) \\&\ge \sqrt{F}(\rho _N^\downarrow , \sigma _N^\downarrow ), \end{aligned}$$

which is equivalent to (5.11). To prove (5.9), it remains to show that

$$\begin{aligned} 1 - \sqrt{F}(\rho _N^\downarrow , \sigma _N^\downarrow ) \le C^{(T_A^\downarrow , T_B^\downarrow )}_{{\mathcal {V}}, {\mathcal {W}}} \sum _{i = 0}^{N-1}\sum _{k = 1}^{K} \sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^\downarrow \right] \right] \cdot \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \sigma _i^\downarrow \right] \right] }. \end{aligned}$$
(5.12)

To this end, notice that if \(T^\downarrow _A\vert _{{\mathcal {B}}_1({\mathcal {V}})} = T^\downarrow _B\vert _{{\mathcal {B}}_1({\mathcal {V}})}\), then \((T^\downarrow _A\otimes \mathrm {id})\vert _{{\mathcal {B}}_1({\mathcal {V}}\otimes {\mathcal {H}}_Z)} = (T^\downarrow _B\otimes \mathrm {id})\vert _{{\mathcal {B}}_1({\mathcal {V}}\otimes {\mathcal {H}}_Z)}\). Hence, \(T_A^\prime := (T^\downarrow _A \otimes \mathrm {id})\), \(T_B^\prime := (T^\downarrow _B \otimes \mathrm {id})\), \({\mathcal {V}}^\prime := {\mathcal {V}} \otimes {\mathcal {H}}_Z\) and \({\mathcal {W}}^\prime := \{{\mathcal {W}}_1 \otimes {\mathcal {H}}_Z, \dots , {\mathcal {W}}_K \otimes {\mathcal {H}}_Z\}\) satisfy the assumptions of Proposition 5.2. Furthermore, as the fidelity is non-decreasing under the action of the channel \(\Lambda _i\) (see [25], p. 414), we have

$$\begin{aligned} \sqrt{F}(\rho ^\downarrow _i, \sigma _i^\downarrow ) - \sqrt{F}(\rho ^\downarrow _{i+1}, \sigma ^\downarrow _{i+1})&= \sqrt{F}(\rho ^\downarrow _i, \sigma ^\downarrow _i) - \sqrt{F}(\Lambda _i\circ T_A^\prime (\rho ^\downarrow _{i}), \Lambda _i\circ T_B^\prime (\sigma ^\downarrow _{i})) \\&\le \sqrt{F}(\rho ^\downarrow _i, \sigma ^\downarrow _i) - \sqrt{F}(T_A^\prime (\rho ^\downarrow _{i}), T_B^\prime (\sigma ^\downarrow _{i})). \end{aligned}$$

We want to apply Proposition 5.2 to the RHS of this expression. To do this correctly, we should notice that the projections, appearing in (5.2), project onto \({\mathcal {W}}_k\otimes {\mathcal {H}}_Z\), and hence are equal to \(P_k \otimes \mathbbm {1}\). Also, if \(V_A, V_B, P_A\), and \(P_B\) satisfy Conditions (5.1b) and (5.1c), then \(V_A\otimes \mathbbm {1}, V_B\otimes \mathbbm {1}, P_A\otimes \mathbbm {1}\), and \(P_B\otimes \mathbbm {1}\) satisfy the Conditions (5.1b) and (5.1c) for \(T_A^\prime \) and \(T_B^\prime \). If we plug this into (5.1a) and use that in general \(\left\Vert X\otimes \mathbbm {1} \right\Vert = \left\Vert X \right\Vert \), we obtain

$$\begin{aligned} C_{{\mathcal {V}}^\prime , {\mathcal {W}}^\prime }^{(T_A^\downarrow \otimes \mathrm {id}, T_B^\downarrow \otimes \mathrm {id})} \le C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}. \end{aligned}$$

Using these observations, we get

$$\begin{aligned} \sqrt{F}(\rho ^\downarrow _i, \sigma ^\downarrow _i) - \sqrt{F}(\rho ^\downarrow _{i+1}, \sigma ^\downarrow _{i+1})&\le C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{k = 1}^K \sqrt{\mathrm {tr}\left[ (P_k\otimes \mathbbm {1}) \rho ^\downarrow _i\right] \mathrm {tr}\left[ (P_k \otimes \mathbbm {1}) \sigma ^\downarrow _i\right] } \\&= C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{k = 1}^K\sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho ^\downarrow _i\right] \right] \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \sigma ^\downarrow _i\right] \right] }. \end{aligned}$$

Equivalently,

$$\begin{aligned} \sqrt{F}(\rho ^\downarrow _{i+1}, \sigma ^\downarrow _{i+1}) \ge \sqrt{F}(\rho ^\downarrow _i, \sigma ^\downarrow _i) - C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{k = 1}^K\sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho ^\downarrow _i\right] \right] \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \sigma ^\downarrow _i\right] \right] }. \end{aligned}$$

If we iterate this inequality, we obtain

$$\begin{aligned} \sqrt{F}(\rho ^\downarrow _N, \sigma ^\downarrow _N) \ge \sqrt{F}(\rho ^\downarrow _0, \sigma ^\downarrow _0) - C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{i = 0}^{N-1}\sum _{k = 1}^K \sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho ^\downarrow _i\right] \right] \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \sigma ^\downarrow _i\right] \right] }. \end{aligned}$$

Using \(\sqrt{F}(\rho ^\downarrow _0, \sigma ^\downarrow _0) = \sqrt{F}(s_0, s_0) = 1\) and some rearrangement establishes (5.12) and completes the proof of the proposition. \(\square \)
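
The iteration step in the proof above is a telescoping sum. Spelled out in the notation of the proof (a sketch added for orientation, not a new estimate):

$$\begin{aligned} \sqrt{F}(\rho ^\downarrow _0, \sigma ^\downarrow _0) - \sqrt{F}(\rho ^\downarrow _N, \sigma ^\downarrow _N)&= \sum _{i = 0}^{N-1} \left[ \sqrt{F}(\rho ^\downarrow _i, \sigma ^\downarrow _i) - \sqrt{F}(\rho ^\downarrow _{i+1}, \sigma ^\downarrow _{i+1}) \right] \\&\le C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{i = 0}^{N-1}\sum _{k = 1}^K \sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho ^\downarrow _i\right] \right] \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \sigma ^\downarrow _i\right] \right] }. \end{aligned}$$

Since \(\sqrt{F}(\rho ^\downarrow _0, \sigma ^\downarrow _0) = 1\), the left-hand side equals \(1 - \sqrt{F}(\rho ^\downarrow _N, \sigma ^\downarrow _N)\), which is the bound (5.12).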

To connect this technical result with the main results of this section, we need two auxiliary lemmas.

Lemma 5.5

For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T_A, T_B : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be two channels and let D be a finite-dimensional N-step discrimination strategy and \(\Pi \) be a two-valued POVM. Then,

$$\begin{aligned} \frac{(1 - 2 P_e(D, \Pi ))^2}{2} \le 1-\sqrt{F}(\rho _N^{T_A}, \rho _N^{T_B}), \end{aligned}$$
(5.13)

where \(\rho \) is the intermediate state map of D.

Proof

By definition,

$$\begin{aligned} P_e(D,\Pi ) = \frac{1}{2} \left[ \mathrm {tr}\left[ \pi _B \rho _N^{T_A}\right] + \mathrm {tr}\left[ \pi _A \rho _N^{T_B}\right] \right] . \end{aligned}$$
(5.14)

If we minimize over the possible two-valued POVMs \(\Pi ^\prime \), the famous Holevo-Helstrom formula reads

$$\begin{aligned} P^m_e(D) := \min _{\Pi ^\prime } P_e(D, \Pi ^\prime ) = \frac{1}{2}\left[ 1 - \frac{1}{2}\left\Vert \rho _N^{T_A} - \rho _N^{T_B} \right\Vert _1 \right] . \end{aligned}$$

Since \(0 \le P_e(D, \Pi ) \le \frac{1}{2}\), we have \(1 - 2 P_e(D, \Pi ) \ge 0\). As \(P_e^m(D) \le P_e(D, \Pi )\) by definition of the minimum, it follows that

$$\begin{aligned} \frac{(1 - 2 P_e(D, \Pi ))^2}{2} \le \frac{(1 - 2 P_e^m(D))^2}{2}. \end{aligned}$$
(5.15)

By the Fuchs–van de Graaf inequality (see [25], p. 416),

$$\begin{aligned} \frac{1}{2}\left\Vert \rho - \sigma \right\Vert _1 \le \sqrt{1-\sqrt{F}(\rho , \sigma )^2}. \end{aligned}$$

Thus,

$$\begin{aligned} \frac{(1 - 2 P_e^m(D))^2}{2}&= \frac{\left( \frac{1}{2} \left\Vert \rho _N^{T_A} - \rho _N^{T_B} \right\Vert _1 \right) ^2}{2}\\&\le \frac{1-\sqrt{F}(\rho _N^{T_A}, \rho _N^{T_B})^2}{2} \\&= (1-\sqrt{F}(\rho _N^{T_A}, \rho _N^{T_B})) \frac{1+\sqrt{F}(\rho _N^{T_A}, \rho _N^{T_B})}{2} \\&\le 1-\sqrt{F}(\rho _N^{T_A}, \rho _N^{T_B}). \end{aligned}$$

Together with (5.15), this proves the claim. \(\square \)
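
The chain of inequalities in this proof can be checked numerically in a simple special case. The following is a minimal sketch, assuming that both final states commute, so that they are described by probability vectors `p` and `q`; in that case the root fidelity is \(\sum _j \sqrt{p_j q_j}\) and half the trace norm is \(\frac{1}{2}\sum _j |p_j - q_j|\). The function names are hypothetical and chosen for illustration only.

```python
import math
import random

def sqrt_fidelity(p, q):
    """Root fidelity of two commuting states given by their eigenvalue lists."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def half_trace_distance(p, q):
    """(1/2)||rho - sigma||_1 for commuting states."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_distribution(dim, rng):
    """Random probability vector (assumption: stands in for a diagonal state)."""
    weights = [rng.random() for _ in range(dim)]
    total = sum(weights)
    return [w / total for w in weights]

def lemma_5_5_gap(p, q):
    """Right-hand side minus left-hand side of (5.13) for the optimal POVM."""
    # Holevo-Helstrom: minimal error probability for equal priors.
    p_err_min = 0.5 * (1.0 - half_trace_distance(p, q))
    lhs = (1.0 - 2.0 * p_err_min) ** 2 / 2.0
    rhs = 1.0 - sqrt_fidelity(p, q)
    return rhs - lhs

rng = random.Random(0)
gaps = [
    lemma_5_5_gap(random_distribution(4, rng), random_distribution(4, rng))
    for _ in range(1000)
]
min_gap = min(gaps)
print(min_gap >= -1e-12)
```

A non-negative gap on random samples is consistent with (5.13). It is of course not a proof, and it does not cover non-commuting states.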

Lemma 5.6

For \(\mathrm {dim}({\mathcal {H}}) < \infty \), let \(T_A, T_B : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be two channels with vacuum \(v \in {\mathcal {H}}\). Let \({\mathcal {V}}_{T_A}\) and \({\mathcal {V}}_{T_B}\) be the respective maximal vacuum subspaces and let \(T_A^\downarrow \) and \(T_B^\downarrow \) be as in Definition 3.3 (Eq. 3.6). Furthermore, let \({\mathcal {V}}\) be a subspace such that \(v \in {\mathcal {V}}\) and \({\mathcal {V}} \subseteq {\mathcal {V}}_{T_A} \cap {\mathcal {V}}_{T_B}\). Let \({\mathcal {W}} = \{{\mathcal {W}}_1, {\mathcal {W}}_2, \dots , {\mathcal {W}}_K\}\) be a collection of mutually orthogonal subspaces of \({\mathcal {V}}^\bot \), such that \({\mathcal {V}}^\bot = {\mathcal {W}}_1 \oplus {\mathcal {W}}_2 \oplus \dots \oplus {\mathcal {W}}_K\). Denote the orthogonal projections onto these subspaces by \(P_1, P_2, \dots , P_K\). Then,

$$\begin{aligned} \frac{(1 - 2 P_e(D, \Pi ))^2}{2} \le C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{i = 0}^{N-1}\sum _{k = 1}^{K} \sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right] \cdot \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right] }, \end{aligned}$$
(5.16)

for all finite-dimensional N-step discrimination strategies \(D = ({\mathcal {H}}, {\mathcal {H}}_Z, s_0, \Lambda )\) and all two-valued POVMs, \(\Pi \).

Proof

By Lemma 5.5, we have for any finite-dimensional N-step discrimination strategy D and any two-valued POVM, \(\Pi \), that

$$\begin{aligned} \frac{(1 - 2 P_e(D, \Pi ))^2}{2} \le 1-\sqrt{F}(\rho _N^{T_A}, \rho _N^{T_B}). \end{aligned}$$
(5.17)

We want to apply Proposition 5.3 to the RHS of this inequality. To this end, we have to define the quantities appearing in that proposition. We identify \(T_A, T_B, {\mathcal {V}}\), and \({\mathcal {W}}\) with the objects that bear the same name. In the following let \(X \in \{A, B\}\). We define \(T_X^\downarrow \) as in Definition 3.3 and need to check that \(T_X - T^\downarrow _X\) is completely positive and that \(T_X^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}})}\) is trace-preserving. To this end, we fix a Stinespring isometry \(V_X : {\mathcal {H}} \rightarrow {\mathcal {H}}_E \otimes {\mathcal {H}}\) of \(T_X\). Then, \(T^\downarrow _X\) is defined by

$$\begin{aligned} T_X^\downarrow (\cdot ) = \mathrm {tr}_{E}\left[ (P_v^{(X)}\otimes \mathbbm {1}) V_X \cdot V_X^\dagger \right] , \end{aligned}$$

where \(P_v^{(X)}\) is the projection onto the support of \(\mathrm {tr}_{{\mathcal {H}}}\left[ V_X {|}v{\rangle }{\langle }v{|} V_X^\dagger \right] \). It follows immediately from this expression that \(T_X-T_X^\downarrow \) is completely positive. To see that \(T_X^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_X})}\) is trace-preserving, note that by Definition 3.9

$$\begin{aligned} {\mathcal {V}}_{T_X} = V_X^{-1} \left[ \mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}\left[ V_X {|}v{\rangle }{\langle }v{|} V_X^\dagger \right] ) \otimes {\mathcal {H}} \right] . \end{aligned}$$

Thus, for any \(\rho \in {\mathcal {B}}_1({\mathcal {V}}_{T_X})\),

$$\begin{aligned} V_X \rho V_X^\dagger \in {\mathcal {B}}_1(\mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}\left[ V_X {|}v{\rangle }{\langle }v{|} V_X^\dagger \right] ) \otimes {\mathcal {H}}). \end{aligned}$$

As \(P_v^{(X)}\otimes \mathbbm {1}\) is the projection onto \(\mathrm {supp}(\mathrm {tr}_{{\mathcal {H}}}\left[ V_X {|}v{\rangle }{\langle }v{|} V_X^\dagger \right] ) \otimes {\mathcal {H}}\), we have

$$\begin{aligned} T_X^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_X})}(\cdot )&= \mathrm {tr}_{E}\left[ (P_v^{(X)}\otimes \mathbbm {1}) V_X \cdot V_X^\dagger \right] = \mathrm {tr}_{E}\left[ V_X \cdot V_X^\dagger \right] = T_X\vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_X})}(\cdot ). \end{aligned}$$

Thus, \(T_X^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_X})}\) is trace-preserving, as \(T_X\vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_X})}\) is. As \({\mathcal {V}}\) is a subspace of \({\mathcal {V}}_{T_X}\), also \(T_X^\downarrow \vert _{{\mathcal {B}}_1({\mathcal {V}})}\) is trace-preserving. This is what we have claimed. As all assumptions are satisfied, we can invoke Proposition 5.3, which directly yields the desired inequality. \(\square \)

The next result has already been stated in the results section.

Theorem 5.7

(No-go theorem). For \(\mathrm {dim({\mathcal {H}})} < \infty \), let \(T_A, T_B : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be two channels with vacuum \(v\in {\mathcal {H}}\). If there exists no subspace \({\mathcal {V}} \subseteq {\mathcal {H}}\) such that \(v \in {\mathcal {V}}\), at least one of the channels \(T_A\) or \(T_B\) is isometric on \({\mathcal {V}}\) and \(T_A\vert _{{\mathcal {B}}_1({\mathcal {V}})} \ne T_B\vert _{{\mathcal {B}}_1({\mathcal {V}})}\), then there exists a constant \(C < \infty \), such that

$$\begin{aligned} (1-2P_e(D, \Pi ))^2 \le C \sqrt{P_I^{T_A}(D)\cdot P_I^{T_B}(D)} \le C \max (P_I^{T_A}(D), P_I^{T_B}(D)), \end{aligned}$$
(5.18)

for all finite-dimensional N-step discrimination strategies D and all two-valued POVMs, \(\Pi \). Hence, \(T_A\) and \(T_B\) cannot be discriminated in an “interaction-free” manner.

Remark 5.8

The assumption “The statement that \(T_A\) or \(T_B\) is isometric on a subspace \({\mathcal {V}}\), with \(v \in {\mathcal {V}}\), already implies that \(T_A\vert _{{\mathcal {B}}_1({\mathcal {V}})} = T_B\vert _{{\mathcal {B}}_1({\mathcal {V}})}\)” can be rephrased in two equivalent ways. The first one is that Conditions 1, 2, and 3 in the Main Theorem (Sect. 2) cannot be fulfilled simultaneously. The second reformulation is that for the maximal vacuum subspaces \({\mathcal {V}}_{T_A}\) and \({\mathcal {V}}_{T_B}\), we have \({\mathcal {V}}_{T_A} = {\mathcal {V}}_{T_B}\) and \(T_A\vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_A})} = T_B\vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_B})}\). The equivalence follows directly from the characterization of maximal vacuum subspaces in Lemma 3.10 (4). This second reformulation is important not only in the proof, but also if one wants to check this criterion, as \({\mathcal {V}}_{T_A}\) and \({\mathcal {V}}_{T_B}\) are efficiently computable directly from Definition 3.9.

Proof

We use the second characterization in Remark 5.8. That is, \({\mathcal {V}}_{T_A} = {\mathcal {V}}_{T_B}\) and \(T_A\vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_A})} = T_B\vert _{{\mathcal {B}}_1({\mathcal {V}}_{T_B})}\). We set \({\mathcal {V}} := {\mathcal {V}}_{T_A}\) and let \(T_A^\downarrow \) and \(T_B^\downarrow \) be as in Definition 3.3. Furthermore, we define \({\mathcal {W}} := \{{\mathcal {W}}_1\}\), with \({\mathcal {W}}_1 := \mathcal {V^\bot }\). Then, by Lemma 5.6, we have

$$\begin{aligned} (1 - 2 P_e(D, \Pi ))^2 \le 2C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{i = 0}^{N-1} \sqrt{\mathrm {tr}\left[ P^\bot \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right] \cdot \mathrm {tr}\left[ P^\bot \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right] }, \end{aligned}$$
(5.19)

where \(P^\bot \) is the orthogonal projection onto \({\mathcal {W}}_1 = {\mathcal {V}}^\bot \). As \({\mathcal {V}}\) is the maximal vacuum subspace of \(T_A\) and \(T_B\), Lemma 3.10 (5) implies that for \(X \in \{A, B\}\), there is a constant \(C_{T_X} > 0\) such that \({\mathfrak {i}}_{T_X}(\rho ) \ge C_{T_X} \mathrm {tr}\left[ P^\bot \, \rho \right] \) for all \(\rho \ge 0\). As \(\mathrm {tr}_{Z}\left[ \rho _i^{T_X^\downarrow }\right] \ge 0\), we get

$$\begin{aligned} {}(5.19)&\le \frac{2C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}}{\sqrt{C_{T_A} C_{T_B}}} \sum _{i = 0}^{N-1} \sqrt{{\mathfrak {i}}_{T_A}\left( \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right) \,{\mathfrak {i}}_{T_B}\left( \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right) } \\&\le \frac{2C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}}{\sqrt{C_{T_A} C_{T_B}}} \sqrt{\left( \sum _{i = 0}^{N-1} {\mathfrak {i}}_{T_A}\left( \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right) \right) \,\left( \sum _{i = 0}^{N-1}{\mathfrak {i}}_{T_B}\left( \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right) \right) } \\&= \frac{2C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}}{\sqrt{C_{T_A} C_{T_B}}} \sqrt{P_I^{T_A}(D)\cdot P_I^{T_B}(D)}, \end{aligned}$$

where we used the Cauchy–Schwarz inequality (on \({\mathbb {C}}^{N}\)) to obtain the second line and the definition of the “interaction” probability in the last line. We note that the last inequality in the statement of the theorem is trivial. Thus, by setting \(C := \frac{2C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}}{\sqrt{C_{T_A} C_{T_B}}}\), we have proven the claim. \(\square \)

The following theorem is the technical version of the result stated in the results section.

Theorem 5.9

(Rate limit theorem). For \(\mathrm {dim({\mathcal {H}})} < \infty \), let \(T_A, T_B : {\mathcal {B}}_1({\mathcal {H}}) \rightarrow {\mathcal {B}}_1({\mathcal {H}})\) be two channels with vacuum \(v\in {\mathcal {H}}\). Let \({\mathcal {V}}_{T_A}\) and \({\mathcal {V}}_{T_B}\) be the respective maximal vacuum subspaces of \(T_A\) and \(T_B\). Set \({\mathcal {V}} := {\mathcal {V}}_{T_A} \cap {\mathcal {V}}_{T_B}\). Suppose that \(T_A\vert _{{\mathcal {B}}_1({\mathcal {V}})} = T_B\vert _{{\mathcal {B}}_1({\mathcal {V}})}\) and that \({\mathcal {V}}^\bot \cap {\mathcal {V}}_{T_A}\) and \({\mathcal {V}}^\bot \cap {\mathcal {V}}_{T_B}\) are orthogonal.

Then there exists a constant \(C > 0\) such that

$$\begin{aligned} \max (P_I^{T_A}(D), P_I^{T_B}(D)) \ge C\,\frac{(1-2P_e(D,\Pi ))^4}{N}, \end{aligned}$$
(5.20)

for all finite-dimensional N-step discrimination strategies D, and any two-valued POVM \(\Pi \).

Proof

The proof is similar to the one of the no-go theorem. Let \(T_A^\downarrow \) and \(T_B^\downarrow \) be as in Definition 3.3 and set \({\mathcal {V}} := {\mathcal {V}}_{T_A} \cap {\mathcal {V}}_{T_B}\). Furthermore, define \({\mathcal {W}} := \{{\mathcal {W}}_1, {\mathcal {W}}_2, {\mathcal {W}}_3 \}\) with \({\mathcal {W}}_1 := {\mathcal {V}}^\bot \cap {\mathcal {V}}_{T_A}\), \({\mathcal {W}}_2 := {\mathcal {V}}^\bot \cap {\mathcal {V}}_{T_B}\) and \({\mathcal {W}}_3 := ({\mathcal {W}}_1\oplus {\mathcal {W}}_2)^\bot \cap {\mathcal {V}}^\bot \). Clearly, \({\mathcal {W}}_1, {\mathcal {W}}_2\), and \({\mathcal {W}}_3\) are mutually orthogonal and their direct sum is \({\mathcal {V}}^\bot \). Furthermore, \({\mathcal {W}}_2\oplus {\mathcal {W}}_3 = {\mathcal {V}}_{T_A}^\bot \) and \({\mathcal {W}}_1\oplus {\mathcal {W}}_3 = {\mathcal {V}}_{T_B}^\bot \). Thus, by Lemma 5.6, we have

$$\begin{aligned} (1 - 2 P_e(D, \Pi ))^2 \le 2C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{i = 0}^{N-1}\sum _{k= 1}^{3} \sqrt{\mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right] \cdot \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right] }, \end{aligned}$$
(5.21)

where for \(k \in \{1, 2, 3\}\), \(P_k\) is the orthogonal projection onto \({\mathcal {W}}_k\). Using the Cauchy–Schwarz inequality (on \({\mathbb {C}}^{3}\)), and the fact that probabilities are less than one, and afterwards the Cauchy–Schwarz inequality on \({\mathbb {C}}^N\), we get

$$\begin{aligned} {}(5.21)&\le \sqrt{12}C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sum _{i = 0}^{N-1} \sqrt{\mathrm {tr}\left[ (P_2+P_3) \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right] + \mathrm {tr}\left[ (P_1+P_3) \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right] }\nonumber \\&\le \sqrt{12N} C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sqrt{\sum _{i = 0}^{N-1} \mathrm {tr}\left[ P_{{\mathcal {V}}_{T_A}}^\bot \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right] + \mathrm {tr}\left[ P_{{\mathcal {V}}_{T_B}}^\bot \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right] }, \end{aligned}$$
(5.22)

where \(P_{{\mathcal {V}}_{T_A}}^\bot \) and \(P_{{\mathcal {V}}_{T_B}}^\bot \) are the projections onto \({\mathcal {V}}_{T_A}^\bot \) and \({\mathcal {V}}_{T_B}^\bot \). Lemma 3.10 (5) implies that for \(X \in \{A, B\}\), there is a constant \(C_{T_X} > 0\) such that \({\mathfrak {i}}_{T_X}(\rho ) \ge C_{T_X} \mathrm {tr}\left[ P_{{\mathcal {V}}_{T_X}}^\bot \, \rho \right] \) for all \(\rho \ge 0\). As \(\mathrm {tr}_{Z}\left[ \rho _i^{T_X^\downarrow }\right] \ge 0\), we get

$$\begin{aligned} {}(5.22)&\le \sqrt{12N} C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sqrt{C_{T_A}^{-1}\sum _{i = 0}^{N-1}{\mathfrak {i}}_{T_A}\left( \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right) + C_{T_B}^{-1}\sum _{i = 0}^{N-1}{\mathfrak {i}}_{T_B}\left( \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right) }\\&\le C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )} \sqrt{\frac{24}{\min (C_{T_A}, C_{T_B})}} \sqrt{N\,\max (P_I^{T_A}(D), P_I^{T_B}(D))}. \end{aligned}$$

Taking the square and defining \(C := \frac{\min (C_{T_A}, C_{T_B})}{24 {C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}}^2}\) proves the claim. \(\square \)
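
One way to trace the constant \(\sqrt{12}\) in the first line of (5.22) is sketched below. The shorthand \(a_k\), \(b_k\) is introduced here for illustration only and is not part of the original argument. For fixed i, set \(a_k := \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_A^\downarrow }\right] \right] \) and \(b_k := \mathrm {tr}\left[ P_k \mathrm {tr}_{Z}\left[ \rho _i^{T_B^\downarrow }\right] \right] \), so that \(0 \le a_k, b_k \le 1\). Then

$$\begin{aligned} \sum _{k = 1}^{3} \sqrt{a_k b_k}&\le \sqrt{b_1} + \sqrt{a_2} + \sqrt{a_3 + b_3} \\&\le \sqrt{3}\, \sqrt{(a_2 + a_3) + (b_1 + b_3)}, \end{aligned}$$

where the first step uses \(a_1 \le 1\), \(b_2 \le 1\) and \(a_3 b_3 \le a_3 \le a_3 + b_3\), and the second step is the Cauchy–Schwarz inequality on \({\mathbb {C}}^{3}\). Multiplying by the prefactor \(2C_{{\mathcal {V}}, {\mathcal {W}}}^{(T_A^\downarrow , T_B^\downarrow )}\) from (5.21) gives \(2\sqrt{3} = \sqrt{12}\), and \(a_2 + a_3\) and \(b_1 + b_3\) are the two traces appearing under the square root in (5.22).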

6 Related Work

In this section we compare our setup and results to selected other works in the literature. We start with a detailed comparison with the work on counterfactual computation (CFC) by Mitchison and Jozsa [7]. CFC aims to determine the outcome of a quantum computation without switching on the computer. Expressed in a language closer to ours, CFC aims to discriminate (counterfactually, the term analogous to “interaction-free”) between two unitaries \(U_0\) and \(U_1\) defined on a bipartite system \({\mathcal {H}}_O \otimes {\mathcal {H}}_S\) (where “O” stands for output and “S” for switch).

Before defining what counterfactual means, we need to discuss the allowed discrimination strategies. Here, it is allowed to use the unknown unitary many times while performing unitary operations and measurements in between. It is also allowed to add an ancillary system \({\mathcal {H}}_Z\) of arbitrary size and to let the unitary operations act on the space \({\mathcal {H}}_O \otimes {\mathcal {H}}_S \otimes {\mathcal {H}}_Z\). This implies that measurements can be deferred until the unknown unitary has been applied for the last time. Thus, if the unknown unitary \(U_r\), \(r\in \{0, 1\}\), is used N times, the initial state is \(\psi _I \in {\mathcal {H}}_O \otimes {\mathcal {H}}_S \otimes {\mathcal {H}}_Z\) and the intermediary unitaries are \(V_1, V_2, \dots ,V_{N-1} \in {\mathcal {B}}({\mathcal {H}}_O \otimes {\mathcal {H}}_S \otimes {\mathcal {H}}_Z)\), then the (r-dependent) state before the final measurement is

$$\begin{aligned} \psi _F^r = (U_r \otimes \mathbbm {1}_Z) V_{N-1} (U_r \otimes \mathbbm {1}_Z) V_{N-2} \cdots (U_r \otimes \mathbbm {1}_Z) V_{1} (U_r \otimes \mathbbm {1}_Z) \psi _I. \end{aligned}$$
(6.1)

In preparation for defining the term counterfactual, one assumes that for each \(r \in \{0,1\}\) we can split the switch space into two orthogonal spaces \({\mathcal {H}}_S = {\mathcal {H}}_S^{r,\text {off}} \oplus {\mathcal {H}}_S^{r,\text {on}}\), called the off and on subspaces, respectively. The interpretation here is that if we apply \(U_r\) to a state in \({\mathcal {H}}_O \otimes {\mathcal {H}}_S^{r,\text {off}}\), then the computer does not run. Consistent with this interpretation, it is also assumed that

$$\begin{aligned} U_r \psi = \psi , \text { for all } \psi \in {\mathcal {H}}_O \otimes {\mathcal {H}}_S^{r,\text {off}}. \end{aligned}$$
(6.2)

One then introduces a decomposition into so-called histories. To this end, one imagines that after each application of \(U_r\) a measurement was performed, projecting either onto \({\mathcal {H}}_O \otimes {\mathcal {H}}_S^{r,\text {off}} \otimes {\mathcal {H}}_Z\) or onto \({\mathcal {H}}_O \otimes {\mathcal {H}}_S^{r,\text {on}} \otimes {\mathcal {H}}_Z\). We denote the corresponding projections by \(P^{r}_{\text {off}}\) and \(P^{r}_{\text {on}}\). One can then decompose \(\psi _F^r\) as:

$$\begin{aligned} \begin{aligned} \psi _F^r&= \sum _{h \in \{\text {on}, \text {off}\}^N} v_h^r, \text { with}\\ v_h^r&= P^{r}_{h_N} (U_r \otimes \mathbbm {1}_Z) V_{N-1} \cdots P^{r}_{h_2} (U_r \otimes \mathbbm {1}_Z) V_{1} P^{r}_{h_1}(U_r \otimes \mathbbm {1}_Z) \psi _I. \end{aligned} \end{aligned}$$
(6.3)

Each of the on/off sequences h in (6.3) is called a history.
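
The history decomposition (6.3) can be illustrated on a toy model. The following is a minimal sketch under strong simplifying assumptions that are not taken from [7]: \({\mathcal {H}}_O\) and \({\mathcal {H}}_Z\) are trivial, the switch space is two-dimensional and real, the off subspace is spanned by the first basis vector, and the unitaries `U` and `vs` are arbitrary illustrative choices. It enumerates all \(2^N\) histories and checks that the history vectors sum to the final state.

```python
import itertools
import math

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) with a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def rotation(theta):
    """Real 2x2 rotation, used as an illustrative intermediate unitary V_i."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Toy model (assumptions): off subspace = span{e_0}, on subspace = span{e_1}.
P = {"off": [[1.0, 0.0], [0.0, 0.0]], "on": [[0.0, 0.0], [0.0, 1.0]]}
U = [[1.0, 0.0], [0.0, -1.0]]  # acts as the identity on the off subspace, cf. (6.2)

def final_state(psi_i, vs):
    """psi_F = U V_{N-1} U ... V_1 U psi_I, cf. (6.1)."""
    psi = mat_vec(U, psi_i)
    for v in vs:
        psi = mat_vec(U, mat_vec(v, psi))
    return psi

def history_vector(psi_i, vs, history):
    """v_h = P_{h_N} U V_{N-1} ... P_{h_1} U psi_I, cf. (6.3)."""
    psi = mat_vec(P[history[0]], mat_vec(U, psi_i))
    for v, h in zip(vs, history[1:]):
        psi = mat_vec(P[h], mat_vec(U, mat_vec(v, psi)))
    return psi

N = 3
vs = [rotation(0.3), rotation(1.1)]  # V_1, ..., V_{N-1}
psi_i = [math.cos(0.4), math.sin(0.4)]

psi_f = final_state(psi_i, vs)
histories = list(itertools.product(["on", "off"], repeat=N))
total = [0.0, 0.0]
for h in histories:
    v_h = history_vector(psi_i, vs, h)
    total = [t + x for t, x in zip(total, v_h)]

# Largest componentwise deviation between the sum of histories and psi_F.
max_dev = max(abs(t - f) for t, f in zip(total, psi_f))
print(len(histories), max_dev < 1e-12)
```

The decomposition works because the on and off projections sum to the identity after every application of the unknown unitary, so inserting them does not change the total.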

Suppose we perform a projective measurement on the final state with possible outcomes \(m \in \{1, 2, \dots , M\}\) and associated projections \(\{Q_1, Q_2, \dots , Q_M\}\). Mitchison and Jozsa (Definition 5.1 in [7]) then define an outcome m to be a counterfactual outcome of type \(r \in \{0, 1\}\), if

  1. \(Q_m v_h^r = 0\), if h is not the all-off history,

  2. \(Q_m \psi _F^{1-r} = 0\).

The first condition says that the only history consistent with the outcome m must be the all-off history and the second condition demands that the outcome m can only occur if the unknown unitary is \(U_r\) (and not \(U_{1-r}\)).

Now, how does CFC relate to “interaction-free” channel discrimination? First, one can interpret “interaction-free” channel discrimination in terms of CFC after some modifications, as follows. Consider a channel T with vacuum \(v \in {\mathcal {H}}_I\), given by \(T(\cdot ) = \mathrm {tr}_{E}\left[ V \cdot V^\dagger \right] \). In Sect. 3.1, we determined that the Demon’s optimal strategy is to perform a two-outcome measurement on E (with corresponding projections \(P_v\) and \(P_v^\bot \)). After extending V to a unitary U, we can interpret the whole space \({\mathcal {H}}_E \otimes {\mathcal {H}}_I\) as the switch space \({\mathcal {H}}_S\) and set \({\mathcal {H}}_O := {\mathbb {C}}\). A natural way to introduce the splitting of \({\mathcal {H}}_S\) into on and off subspaces is then to define \({\mathcal {H}}_S^{\text {off}} = \mathrm {range}(P_v) \otimes {\mathcal {H}}_I\) and \({\mathcal {H}}_S^{\text {on}} = \mathrm {range}(P_v^\bot ) \otimes {\mathcal {H}}_I\). Note, however, that this definition does not satisfy (6.2). A violation of assumption (6.2) does not prevent us from defining histories, nor does it interfere with the definition of a counterfactual outcome as above. So, one might consider broadening the definition of CFC by dropping it. However, upon closer inspection one finds that (the proofs of) all theorems in [7] rely crucially on that assumption. In any case, even after dropping that assumption, the definition of a counterfactual outcome above is still too restrictive to fully cover “interaction-free” channel discrimination, since we do not require that the “interaction” probability or the error probability are exactly zero (as demanded by CFC) but rather that they can be made arbitrarily small. This requires a probabilistic modification of the definition of a counterfactual outcome, such as the one suggested in the discussion section in [7].
We therefore conclude that “interaction-free” channel discrimination is consistent with a sufficiently broadened definition of CFC. Unfortunately, however, we do not think that this point of view has any important direct implications for the feasibility of the “interaction-free” channel discrimination task. The main reasons for this belief are that even after reformulation into the language of CFC, the allowed discrimination strategies differ considerably and that the only result in [7] that goes beyond the qubit case is that the number of insertions of \(U_r\) must tend to infinity for an optimal success probability.

What about implications of our results for CFC? We believe that a conceptual weakness of CFC is that there are (in general) no observable consequences, in the sense that (the surroundings of) the apparatus changes, regardless of whether a computation was performed counterfactually or not. This is so because the imagined measurements after each application of the unknown unitary are not actually performed. We think that the question about a change of (the surroundings of) the apparatus is the relevant one for technical applications, which is our main focus. If one demands that the imagined measurements are actually performed, then CFC becomes a special case of “interaction-free” channel discrimination by assigning to the unitary \(U_r \in {\mathcal {B}}({\mathcal {H}}_O \otimes {\mathcal {H}}_S)\) the channel \(T_r : {\mathcal {B}}_1({\mathcal {H}}_O \otimes {\mathcal {H}}_S) \rightarrow {\mathcal {B}}_1({\mathcal {H}}_O \otimes {\mathcal {H}}_S)\) given by

$$\begin{aligned} T_r(\rho ) = (\mathbbm {1}_O \otimes P^{r, \text {off}}_S) U_r\rho U_r^\dagger (\mathbbm {1}_O \otimes P^{r, \text {off}}_S) + (\mathbbm {1}_O \otimes P^{r, \text {on}}_S) U_r\rho U_r^\dagger (\mathbbm {1}_O \otimes P^{r, \text {on}}_S), \end{aligned}$$
(6.4)

for all \(\rho \in {\mathcal {B}}_1({\mathcal {H}}_O \otimes {\mathcal {H}}_S)\), where \(P^{r, \text {off}}_S\) and \(P^{r, \text {on}}_S\) are the projections according to the splitting of \({\mathcal {H}}_S\) into on and off subspaces. It follows from (6.2) that \(T_r\) is a channel with vacuum, where the vacuum can be taken to be any vector in \({\mathcal {H}}_O \otimes {\mathcal {H}}_S^{r, \text {off}}\). Hence, our results apply to this setting.
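
The construction (6.4) can be made concrete in a small example. The sketch below assumes a trivial output space, a three-dimensional real switch space with off subspace spanned by the first basis vector, and an illustrative unitary `U` that acts as the identity on the off subspace; all of these choices are hypothetical. Because the matrices are real, the transpose plays the role of the adjoint. The code checks that the resulting map preserves the trace, leaves the vacuum state unchanged, and removes coherences between the on and off subspaces.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Toy model (assumptions): off subspace = span{e_0}, on subspace = span{e_1, e_2}.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)
U = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]  # identity on the off subspace, cf. (6.2)
P_OFF = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
P_ON = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def channel(rho):
    """T_r(rho) = P_off U rho U^T P_off + P_on U rho U^T P_on, cf. (6.4)."""
    evolved = mat_mul(mat_mul(U, rho), transpose(U))
    off_part = mat_mul(mat_mul(P_OFF, evolved), P_OFF)
    on_part = mat_mul(mat_mul(P_ON, evolved), P_ON)
    return [[off_part[i][j] + on_part[i][j] for j in range(3)] for i in range(3)]

vacuum = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # projector onto e_0

# A normalized pure state with coherence between the off and on subspaces.
psi = [math.sqrt(0.5), 0.5, 0.5]
rho = [[x * y for y in psi] for x in psi]

out_vacuum = channel(vacuum)
out_rho = channel(rho)
vacuum_dev = max(abs(out_vacuum[i][j] - vacuum[i][j]) for i in range(3) for j in range(3))
print(abs(trace(out_rho) - 1.0) < 1e-12, vacuum_dev < 1e-12, abs(out_rho[0][1]) < 1e-12)
```

In this toy model the vacuum passes through unchanged, whereas any input with a component in the on subspace loses its off-on coherences, which is the trace the actually performed measurement leaves on the state.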

From our technological point of view, some interpretational discussions in the literature can be avoided. For example, in [37], Hosten et al. claimed that they could discriminate counterfactually between four unitaries associated with the result of a Grover search. We agree with [38, 39] that the proposal in [37] does not constitute a CFC for all possible outcomes in the sense of [7]. However, from the point of view of our model, this is a rather artificial debate. Since a unitarily evolving system does not interact with its surroundings (the Demon), there is no way to tell whether a computation has been performed or not by looking at the surroundings. In that sense, the task in [37] was (as every other discrimination task involving only unitary operations) performed in an “interaction-free” manner.

A work with a title similar to ours is “Interaction-free measurement as quantum channel discrimination” by Zhou and Yung [32]. The objective of their work was to determine if the Kwiat et al. protocol for detecting a semitransparent object can be enhanced by using an entangled initial state. The study was conducted by employing tools from quantum channel theory, but no attempts were made to generalize the notion of “interaction-free” measurements. Generalizing this notion, however, is the main focus of the present work.

7 Conclusion and Open Problems

In our work, we have characterized when it is possible and when it is impossible to discriminate quantum channels in an “interaction-free” manner. This answers the question of what can be done perfectly with “interaction-free” measurements. However, some questions remain open. One question that follows directly from our work is under which conditions two channels can be discriminated such that the “interaction” probability decays faster than \(~N^{-1}\). Another question calls for a more quantitative treatment: even though one might not be able to discriminate two channels in an “interaction-free” manner, there might still be a significant quantum advantage over classical strategies. A related question, suggested to us by an anonymous reviewer, is what kind of information about the discriminator’s strategy the Demon can obtain. In this context, we showed that the Demon cannot distinguish (under our conditions) between a strategy that always sends the vacuum through the channels and our proposed one. However, the more general question remains open. A further major question concerns the influence of noise and decoherence. We note that noise may influence what can or cannot be done in both directions, since the noise can also be on the Demon’s side and hence weaken his detection skills. Before the no-go results for semitransparent objects were established [11, 12], one anticipated application of “interaction-free” measurement was to eliminate the exposure of humans to radiation in medical applications such as X-ray scans. In light of those results, this is not possible. However, our no-go theorem does not touch the case of asymmetric “interaction-free” discrimination. That is, we may allow one of the two objects to be discriminated to get destroyed (for example, by simply setting its transmission functional to zero). This might even be a desirable effect. For example, in a medical context, we would love to design a procedure such that a tumor gets destroyed, while the healthy tissue stays intact.