1 Introduction

The task of discriminating states via ‘one-shot’ measurements is fundamental to a number of processes in many scenarios: classically, the result of a calculation performed on a computer cannot be read, nor can a private encryption key be shared, unless we are able to correctly identify the resulting bits. This remains true within the quantum realm, where any quantum process is only ever as good as our ability to distinguish the possible states the system may be in. However, due to the nature of states within quantum mechanics, we are subject to difficulties not seen classically: whereas a single measurement is generally sufficient to fully distinguish between a set of possible pure states in a classical system, a single measurement cannot perfectly distinguish a collection of pure quantum states unless they are mutually orthogonal. As such, the aim is to maximise our chances of distinguishing the states via clever choices of measurement.

As is the case with a number of notions within quantum theory, there is no clear-cut ‘best choice’ for what one wishes to optimise over in a state discrimination task, and by extension no obvious choice of measurement to perform in any given circumstance. In the 1970s, Helstrom [1], Holevo [2, 3], and Yuen et al. [4] began work on minimum error state discrimination, which aimed to minimise the average error in determining each possible state in a set via their corresponding measurement outcome. In the late 1980s, Ivanovic [5] (followed by Dieks [6] and Peres [7]) considered so-called unambiguous state discrimination, where the goal was no longer to minimise error, but rather to maximise the chance of unambiguously distinguishing states where possible. More recently an additional figure of merit referred to as “maximum confidence” was introduced by Kosut et al. [8] and Croke et al. [9] (who coined the term), which quantifies how confident we may be that we started with a particular state, given that we obtained its corresponding measurement outcome. Each of these figures of merit comes with shortcomings: in minimum error state discrimination, we will still find some instances of incorrect measurement outcomes for a given state, whereas the latter two admit inconclusive measurement results where no state can be inferred. As such, one generally settles on a figure of merit according to the preferred trade-off. For more in-depth summaries of state discrimination tasks and their applications, we refer the reader to [10,11,12].

It is well known that for distinguishing a set of mutually orthogonal pure states (with regard to any figure of merit) the obvious choice is a sharp observable containing the states in question as projections. However, in this work we consider minimum error state discrimination in the case where one is unable to use such a sharp observable, and instead only some unsharp variant is available. This is in contrast to the standard problem, where a set of non-orthogonal states is to be distinguished, but our scenario is physically justified: unsharp measurements can allow for the possibility of sequential or repeated measurements without complete loss of the initial state, and furthermore, they encapsulate the reality that any measurement will contain some inherent noise, be it the result of mechanical or human error. In fact, it is impossible to perform projective measurements on quantum systems using finite resources [13]. Whilst this may seem to be a hindrance, we shall see that the aforementioned admission of repeated measurements works in our favour.

In [14] it was shown that in the limit of the number of repetitions going to infinity, the information content of an unsharp binary observable tends to that of its spectral counterpart, but no consideration was given to intermediate steps. This is the point of interest for this paper, as it is not only a more realistic scenario, but it also allows us to see where the trade-off point lies, beyond which an increased number of measurements provides diminishing returns in terms of increased success probability. In a realistic scenario one would aim to perform a finite number of measurement rounds and in doing so reach a sufficiently high confidence about the initial state.

The use of repeated measurements in state discrimination tasks has previously been adopted in the context of unambiguous state discrimination [15,16,17], and for minimum error discrimination [18, 19]. However, in the studied scenarios the observables measured differ over the course of the measurement process (with, for example [18] relying on sequential binary measurements to approximate an N-valued observable as described in [20]). In another example [15] there are multiple agents aiming to gather the same information, solely based on their individually obtained measurement outcomes. In this paper, we instead restrict ourselves to performing repetitions of the same N-valued observable on the system, thereby receiving a string of outcomes from which we can post-process the data to determine the most likely state amongst N alternatives. Mathematically, this translates to constructing a joint observable whose effects we partition to form a new N-valued observable, which is then used to calculate the success probability.

We begin this paper with an overview of the necessary concepts in Sect. 2; these comprise the mathematical descriptions of finite-outcome observables and repeated measurements, as well as the formulation of minimum error state discrimination and the post-processing of observables. We motivate our main work in Sect. 3, where we explicitly calculate the success probability for distinguishing two eigenvectors of the spin operator \(\sigma _x\) by repeated measurements of the unsharp spin-x observable \(\mathsf {X}_t\). This example introduces us to the “rule of three” that will be prevalent in the cases considered later: the second measurement does not lead to an increase in the success probability, but the third does. The case of binary observables is considered in Sect. 4 for an arbitrary number of repetitions, and we find a general pattern whereby the success probability only increases for odd numbers of repetitions. We extend the results of Sect. 4 by providing some partial results for commutative N-valued observables (\(N>2\)) in Sect. 5, where the rule of three is found to still hold, although the odd-even pattern of the binary case does not. Finally, we discuss our results in Sect. 6.

2 General framework for state discrimination via repeated measurements

2.1 Quantum measurements

In this section we recall the basic formalism of quantum measurements. (For a more detailed discussion see, for example [21].) We shall generally consider an arbitrary separable Hilbert space \(\mathcal {H}\) of (possibly infinite) dimension d and its space \(\mathcal {L(H)}\) of bounded linear operators. The operators we will primarily work with are the so-called effects on \(\mathcal {H}\): \(\mathcal {E(H)}= \{E\in \mathcal {L(H)}\ |\ 0\le E \le I\}\), where the partial order is in terms of the expectation values of vectors of \(\mathcal {H}\), i.e. \(E\le F\) means that \(\left\langle \psi | E\psi \right\rangle \le \left\langle \psi | F\psi \right\rangle \) for all \(\psi \in \mathcal {H}\). The state space of \(\mathcal {H}\), denoted by \(\mathcal {S(H)}\), consists of positive operators of unit trace: \(\mathcal {S(H)}= \{\rho \in \mathcal {T(H)}\ |\ \rho \ge 0\ \mathrm {and}\ \mathrm {tr}_{}\!\left[ \rho \right] =1 \}\) where \(\mathcal {T(H)}\subset \mathcal {L(H)}\) denotes the set of trace-class operators on \(\mathcal {H}\).

Quantum observables are mathematically described by positive operator-valued measures (POVMs), and we will restrict ourselves to N-valued observables, where N is a finite integer. An N-valued POVM \(\mathsf {A}\) is a map from the finite outcome set \(\Omega _\mathsf {A}=\{1,2,\dots ,N\}\) to the set of effects; specifically, \(\mathsf {A}:x \mapsto \mathsf {A}(x)\in \mathcal {E(H)}\) with \(\sum _x \mathsf {A}(x)=I\). For a system prepared in a state \(\rho \), the probability of obtaining the outcome x when measuring observable \(\mathsf {A}\) is given by the Born rule \(p_\rho ^\mathsf {A}(x) =\mathrm {tr}_{}\!\left[ \rho \mathsf {A}(x)\right] \).
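As an illustrative aside (not part of the original formalism), the Born rule for a finite-outcome POVM is straightforward to evaluate numerically; the following Python sketch uses a hypothetical unsharp two-outcome qubit observable chosen only for demonstration.

```python
import numpy as np

# Hypothetical unsharp binary qubit observable: effects A(1), A(2) summing to I
t = 0.7
sz = np.diag([1.0, -1.0])
povm = [(np.eye(2) + t * sz) / 2, (np.eye(2) - t * sz) / 2]

# An arbitrary state (positive, unit trace)
rho = np.array([[0.75, 0.25], [0.25, 0.25]])

# Born rule: p_rho^A(x) = tr[rho A(x)]
probs = [np.trace(rho @ E).real for E in povm]
print(probs, sum(probs))   # the probabilities are non-negative and sum to 1
```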

An observable describes the measurement outcome statistics, but leaves open how the quantum state is transformed during the measurement process; this task is accomplished by an instrument, a mathematical object that describes both. For a finite set \(\Omega =\{1,\dots ,N\}\), an instrument \(\mathcal {I}\) is a map from \(\Omega \) to the set of operations on \(\mathcal {T(H)}\); that is, \(\mathcal {I}:x \mapsto \mathcal {I}_{x}\) where

(i) \(\mathcal {I}_{x}:\mathcal {T(H)}\rightarrow \mathcal {T(H)}\) is completely positive and linear for all \(x\in \Omega \);

(ii) \(\mathcal {I}_{x}\) is trace non-increasing for all \(x\in \Omega \);

(iii) \(\sum _x \mathcal {I}_{x}\) is trace preserving.

The dual instrument \(\mathcal {I}^*:\mathcal {L(H)}\rightarrow \mathcal {L(H)}\) defines a unique observable \(\mathsf {A}_\mathcal {I}:\Omega \rightarrow \mathcal {E(H)}\) via \(\mathsf {A}_\mathcal {I}(x) = \mathcal {I}^*_{x}(I)\) and thus \(\mathrm {tr}_{}\!\left[ \mathcal {I}_{x}(\rho )\right] = p_\rho ^{\mathsf {A}_\mathcal {I}}(x)\). For any observable \(\mathsf {A}\) there exist infinitely many instruments \(\mathcal {I}\) such that \(\mathsf {A}_\mathcal {I}=\mathsf {A}\), each describing some particular way of measuring it. If the observable \(\mathsf {A}\) is measured in a way described by the instrument \(\mathcal {I}\) on a system in state \(\rho \) and outcome x is recorded, then the unnormalised state of the system after the measurement will be given by \(\mathcal {I}_{x}(\rho )\).

Fig. 1: We consider the situation where we subject the system of interest to a repeated measurement of an observable \(\mathsf {A}\) n times. After each measurement we record the result \(x_i\) and the system experiences a state change \(\rho \mapsto \mathcal {I}_{x_i}(\rho )\). Once each measurement is performed we are left with an n-tuple \((x_1,\dots ,x_n)\) of measurement results, which can be considered the results of an n-round observable \(\mathsf {A}^{(n)}_{\mathcal {I}}\).

2.2 Repeated quantum measurements

In our investigation, we restrict ourselves to situations where the same measurement apparatus is used repeatedly on the same system; see Fig. 1. Suppose that \(\mathcal {I}\) is the instrument describing the measurement of an observable \(\mathsf {A}\). Then, after n repeated measurements we get an outcome array \((x_1,\ldots ,x_n)\in \Omega _\mathsf {A}^n\) with probability

$$\begin{aligned} \mathrm {tr}_{}\!\left[ \rho \mathsf {A}^{(n)}_{\mathcal {I}}(x_1,\ldots ,x_n)\right] :=\mathrm {tr}_{}\!\left[ (\mathcal {I}_{x_n}\circ \cdots \circ \mathcal {I}_{x_1})(\rho )\right] , \end{aligned}$$
(1)

where \(\mathcal {I}_{x_n}\circ \cdots \circ \mathcal {I}_{x_1}\) is the functional composition of the corresponding operations. This equation, assumed to be valid for all input states \(\rho \), determines the observable \(\mathsf {A}^{(n)}_{\mathcal {I}}\), which we shall refer to as an n-round observable. The mathematical structure of \(\mathsf {A}^{(n)}_{\mathcal {I}}\) depends crucially on the specific form of \(\mathcal {I}\). For instance, if \(\mathcal {I}\) is of the measure-and-prepare form, then no subsequent measurement can give anything more than was already found in the first measurement. Further examples of repeated measurements that do not provide new information after the first, or some finite number of, repetitions are given in [14].

In the current work we limit ourselves to Lüders instruments, i.e. we choose \(\mathcal {I}_{x}(\rho ) = \sqrt{\mathsf {A}(x)}\rho \sqrt{\mathsf {A}(x)}\). For ease of notation we denote the corresponding n-round observable \(\mathsf {A}^{(n)}_\mathcal {I}\) simply by \(\mathsf {A}^{(n)}\) (since no other type of instrument will be considered in what follows). Hence,

$$\begin{aligned} \mathsf {A}^{(n)}(x_1,\ldots ,x_n)=\sqrt{\mathsf {A}(x_1)}\cdots \sqrt{\mathsf {A}(x_{n-1})}\mathsf {A}(x_n)\sqrt{\mathsf {A}(x_{n-1})}\cdots \sqrt{\mathsf {A}(x_1)} . \end{aligned}$$
(2)

If \(\mathsf {A}\) is commutative (i.e. \(\left[ \mathsf {A}(x),\mathsf {A}(y)\right] =0\) for all x, y), then the previous expression reduces to

$$\begin{aligned} \mathsf {A}^{(n)}(x_1,\ldots ,x_n)=\mathsf {A}(x_1)\mathsf {A}(x_2)\cdots \mathsf {A}(x_n) . \end{aligned}$$
(3)

We observe that in this case the probability of getting a particular outcome array \((x_1,\ldots ,x_n)\) in a state \(\rho \) does not depend on the order of the outcomes.
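As a small numerical sketch (our addition), the effects of Eq. (2) can be built directly from matrix square roots; for a commutative observable, such as the hypothetical diagonal qubit example below, they coincide with the plain products of Eq. (3).

```python
import itertools
import numpy as np
from scipy.linalg import sqrtm

def n_round_effect(povm, outcomes):
    """Effect A^(n)(x_1,...,x_n) of Eq. (2) for the Lueders instrument."""
    *head, last = outcomes
    effect = povm[last]
    for x in reversed(head):
        root = sqrtm(povm[x])
        effect = root @ effect @ root
    return effect

# Hypothetical commutative binary qubit observable with diagonal effects
lam = 0.8
A = [np.diag([lam, 1 - lam]), np.diag([1 - lam, lam])]

for xs in itertools.product(range(2), repeat=3):
    product = np.linalg.multi_dot([A[x] for x in xs])          # Eq. (3)
    assert np.allclose(n_round_effect(A, xs), product)

# The effects of A^(3) form a valid observable: they sum to the identity
total = sum(n_round_effect(A, xs) for xs in itertools.product(range(2), repeat=3))
print(np.allclose(total, np.eye(2)))   # True
```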

2.3 State discrimination in n rounds

We are considering a particular quantum information task, namely, the discrimination of states. We assume that the initial observable \(\mathsf {A}\) with outcome set \(\Omega _\mathsf {A}=\{1,\dots ,N\}\) discriminates N states \(\{\rho _1, \dots ,\rho _N\}\) with some error that is larger than in the optimal discrimination of these states. The observable \(\mathsf {A}\) can be, for instance, a noisy version of the observable that optimally discriminates the N states. The success probability \(P_\mathrm {succ}^{(1)}\) for successfully distinguishing the states with \(\mathsf {A}\) is given by

$$\begin{aligned} P_\mathrm {succ}^{(1)}=\frac{1}{N}\sum _{j=1}^N \mathrm {tr}_{}\!\left[ \rho _j \mathsf {A}(j)\right] , \end{aligned}$$
(4)

where we have assumed uniform a priori distribution of the states.

By repeating the measurement we hope to increase the probability of guessing the correct state. After n Lüders measurements of \(\mathsf {A}\), resulting in the observable \(\mathsf {A}^{(n)}\) with outcome set \(\Omega _\mathsf {A}^n\), we possess statistics for \(N^n\) possible strings of measurement outcomes. To assess the ability of \(\mathsf {A}^{(n)}\) to distinguish the original N states, we need to post-process \(\mathsf {A}^{(n)}\) such that we are left with an N-valued observable, denoted by \(\mathsf {B}^{(n)}\), with outcome set \(\Omega _\mathsf {A}\). This means that for each outcome array \((x_1,\ldots ,x_n)\), we need to decide the most likely state \(\rho _j\) and hence relabel this outcome array into j.

Mathematically, the post-processing is performed by a Markov kernel \(w:\Omega _\mathsf {A}\times \Omega _\mathsf {A}^n \rightarrow [0,1]\) satisfying \(w(j, \varvec{x}) =:w^{j}_{\varvec{x}} \ge 0\) for all \(j\in \Omega _\mathsf {A}\) and \(\varvec{x}\in \Omega _\mathsf {A}^n\), and \(\sum _{j} w^{j}_{\varvec{x}} =1\) for all \(\varvec{x}\in \Omega _\mathsf {A}^n\). The numerical value \(w(j, \varvec{x})\) is the probability that \(\varvec{x}\) is relabelled into j. If each \(\varvec{x}\) determines a unique j (i.e. \(w(j, \varvec{x})\in \{0,1\}\) for all \(j\in \Omega _\mathsf {A}\) and \(\varvec{x}\in \Omega _\mathsf {A}^n\)), then we say that w is deterministic. The post-processed observable is given by

$$\begin{aligned} \mathsf {B}^{(n)}(j) = \sum _{\varvec{x}\in \Omega _\mathsf {A}^{n}} w(j,\varvec{x}) \mathsf {A}^{(n)}(\varvec{x}), \end{aligned}$$
(5)

from which we arrive at the n-round success probability \(P_\mathrm {succ}^{(n)}\):

$$\begin{aligned} P_\mathrm {succ}^{(n)} = \frac{1}{N} \sum _{j=1}^N \mathrm {tr}_{}\!\left[ \rho _j\mathsf {B}^{(n)}(j)\right] . \end{aligned}$$
(6)

This expression clearly depends on the chosen post-processing. As we want to maximise \(P_\mathrm {succ}^{(n)}\), we can restrict to deterministic Markov kernels; other Markov kernels are their convex mixtures [22]. The choice of the optimal post-processing will be studied in Sects. 4 and 5.
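To make the post-processing step concrete, the following sketch (ours; the unsharp observable and the majority-vote kernel are chosen purely for illustration) builds \(\mathsf {B}^{(n)}\) from Eq. (5) with a deterministic Markov kernel and evaluates Eq. (6).

```python
import itertools
import numpy as np

# Hypothetical unsharp binary qubit observable and the two basis states to be discriminated
lam = 0.8
A = [np.diag([lam, 1 - lam]), np.diag([1 - lam, lam])]
states = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

n = 3
arrays = list(itertools.product(range(2), repeat=n))

def w(j, xs):
    """Deterministic Markov kernel: relabel an outcome array to its majority outcome."""
    return 1.0 if xs.count(j) > n / 2 else 0.0

# Post-processed observable B^(n)(j) of Eq. (5), using Eq. (3) for the commutative A
B = [sum(w(j, xs) * np.linalg.multi_dot([A[x] for x in xs]) for xs in arrays)
     for j in range(2)]

# n-round success probability of Eq. (6)
P = sum(np.trace(states[j] @ B[j]).real for j in range(2)) / 2
print(P)   # 0.896 = 3*lam**2 - 2*lam**3 for lam = 0.8; cf. the example of Sect. 3
```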

In performing the previously described method of state discrimination via repeated measurements, we may come across outcome arrays that do not suggest to us one particular state over another. Let \(\mathcal {S}'\subseteq \mathcal {S(H)}\) be a subset of states. We say that a measurement outcome array \(\varvec{x} \in \Omega _\mathsf {A}^n\) is ambiguous with respect to \(\mathcal {S}'\) if \(\mathrm {tr}_{}\!\left[ \rho \mathsf {A}^{(n)}(\varvec{{x}})\right] =\text {const.}\) for all \(\rho \in \mathcal {S}'\). Two illustrative examples of this notion in the case of \(n=1\) are the following. First, if \(\mathcal {S}'\) is the whole state space \(\mathcal {S(H)}\), then a trivial observable \(\mathsf {T}: x\mapsto p_x I\) only possesses outcomes that are ambiguous with respect to \(\mathcal {S}'\). Second, when \(\mathcal {S}'\) is equal to an orthonormal basis \(\{\varphi _i\}\), then any sharp observable in a basis mutually unbiased to \(\{\varphi _i\}\) will only possess outcomes that are ambiguous with respect to the states.

We remark that the method outlined in this section for state discrimination via repeated measurements would work similarly had we chosen another instrument \(\mathcal {I}\) than the Lüders instrument, corresponding to a different physical implementation of the measurement. The forms of \(\mathsf {B}^{(n)}\) and \(P_\mathrm {succ}^{(n)}\) obviously depend on the chosen instrument and, as said before, we omit the instrument from the notation simply because we stick to Lüders instruments.

2.4 Noisy measurement in state discrimination

The general framework discussed in earlier subsections is applicable for any \(\mathsf {A}\) and \(\mathcal {I}\). In this section we consider a more specific setting that will be relevant for what follows. We begin with a collection of N states \(\{\rho _1, \dots ,\rho _N\}\) that are perfectly distinguishable, i.e. there exists an N-outcome observable \(\mathsf {D}\) such that \(\mathrm {tr}_{}\!\left[ \rho _i \mathsf {D}(j)\right] =\delta _{ij}\). This is the case if and only if they are orthogonal pure states or, more generally, mixed states with orthogonal supports. (Note that if \(N<d\), then \(\mathsf {D}\) is not unique.) However, we assume that such \(\mathsf {D}\) is not available, and instead we use a noisy observable \(\mathsf {A}\) to distinguish the states. We assume that \(\mathsf {A}\) is still reasonably good in distinguishing the states \(\{\rho _1, \dots ,\rho _N\}\), which we take to mean that \(\mathrm {tr}_{}\!\left[ \rho _j\mathsf {A}(j)\right] \ge \mathrm {tr}_{}\!\left[ \rho \mathsf {A}(j)\right] \) and \(\mathrm {tr}_{}\!\left[ \rho _i\mathsf {A}(j)\right] \le \mathrm {tr}_{}\!\left[ \rho \mathsf {A}(j)\right] \) if \(i\ne j\) for all states \(\rho \). We further make a simplifying uniformity assumption that \(\mathrm {tr}_{}\!\left[ \rho _j\mathsf {A}(j)\right] \) is the same for all j and similarly \(\mathrm {tr}_{}\!\left[ \rho _i\mathsf {A}(j)\right] \) is the same for all ij.

Proposition 1

Let

$$\begin{aligned} \forall j: \quad \sup _{\rho } \mathrm {tr}_{}\!\left[ \rho \mathsf {A}(j)\right] = \mathrm {tr}_{}\!\left[ \rho _j\mathsf {A}(j)\right] =\lambda \end{aligned}$$
(7)

and

$$\begin{aligned} \forall i \ne j: \quad \inf _{\rho } \mathrm {tr}_{}\!\left[ \rho \mathsf {A}(j)\right] = \mathrm {tr}_{}\!\left[ \rho _i\mathsf {A}(j)\right] =\mu . \end{aligned}$$
(8)

Then

$$\begin{aligned} \mathsf {A}(j)\rho _i = {\left\{ \begin{array}{ll} \lambda \ \rho _i, \quad &{}i=j , \\ \frac{1-\lambda }{N-1}\ \rho _i, \quad &{}i\ne j , \end{array}\right. } \end{aligned}$$
(9)

and \(\lambda \ge \tfrac{1}{N}\).

Proof

It follows from (7) and (8) that \(\lambda \) and \(\mu \) are the maximal and minimal eigenvalues of \(\mathsf {A}(j)\), respectively. Then, as \(\lambda \) is the maximal eigenvalue, a unit vector \(\psi \in \mathcal {H}\) satisfies \(\left\langle \psi | \mathsf {A}(j)\psi \right\rangle =\lambda \) only if \(\mathsf {A}(j)\psi = \lambda \psi \). Using the spectral decomposition of \(\rho _j\) we then conclude that \(\mathsf {A}(j)\rho _j = \lambda \rho _j\). Analogous reasoning shows that \(\mathsf {A}(j)\rho _i = \mu \rho _i\) for \(i\ne j\). Finally,

$$\begin{aligned} 1 = \mathrm {tr}_{}\!\left[ \rho _i I\right] =\sum _{j}\mathrm {tr}_{}\!\left[ \rho _i\mathsf {A}(j)\right] = \lambda + (N-1) \mu , \end{aligned}$$

from which (9) follows. Moreover, since \(\lambda \ge \mu \), the same identity gives \(1 = \lambda + (N-1)\mu \le N\lambda \), i.e. \(\lambda \ge \tfrac{1}{N}\). \(\square \)

We take the setting of Proposition 1 as our starting point in the following investigations. We remark that (9) does not determine \(\mathsf {A}\) uniquely unless \(N=d\). For example, if \(\mathsf {D}\) is any observable that perfectly discriminates the states, then the observable \(\mathsf {A}\) given by

$$\begin{aligned} \mathsf {A}(j) = (\lambda - \mu )\mathsf {D}(j) + \mu I, \end{aligned}$$
(10)

where \(\mu = \frac{1-\lambda }{N-1}\), satisfies condition (9).
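A quick numerical check of this construction (our addition; the dimension, number of states and value of \(\lambda \) are arbitrary choices) confirms that Eq. (10) yields an observable satisfying the conditions (9):

```python
import numpy as np

d, N, lam = 4, 3, 0.7               # hypothetical dimension, number of states, sharpness
mu = (1 - lam) / (N - 1)

# Orthogonal pure states and a sharp observable D that discriminates them perfectly;
# as N < d here, the leftover dimension is absorbed into D(N) (one of many valid choices)
basis = np.eye(d)
states = [np.outer(basis[i], basis[i]) for i in range(N)]
D = [np.outer(basis[i], basis[i]) for i in range(N)]
D[-1] = D[-1] + np.outer(basis[N], basis[N])

# Noisy observable of Eq. (10)
A = [(lam - mu) * D[j] + mu * np.eye(d) for j in range(N)]

assert np.allclose(sum(A), np.eye(d))                 # normalisation
for i in range(N):
    for j in range(N):
        expected = lam if i == j else mu
        assert np.isclose(np.trace(states[i] @ A[j]).real, expected)   # Eq. (9)
print("lambda =", lam, " mu =", mu)
```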

3 Motivating qubit example

To motivate our main result, we first provide an explicit example. Consider the qubit system \(\mathbb C^2\) and suppose that we wish to distinguish between the two eigenstates of the \(\sigma _x\) operator \(P_\pm = |\pm \rangle \!\langle \pm |\). We assume that we must attempt to do so via an unsharp unbiased spin-x measurement that is parametrised by \(t\in [0,1]\); i.e. our observable, denoted \(\mathsf {X}_t\), is given by

$$\begin{aligned} \mathsf {X}_t(\pm ) = \frac{I\pm t \sigma _x}{2} = \frac{1\pm t}{2} P_+ + \frac{1\mp t}{2} P_- . \end{aligned}$$
(11)

The eigenvalues of these effects are \(\lambda _\pm = (1\pm t)/2\), where \(\lambda _+ \ge \lambda _-\) and \(\lambda _+ + \lambda _- =1\). The success probability of distinguishing between \(P_+\) and \(P_-\) with \(\mathsf {X}_t\) is

$$\begin{aligned} P^{(1)}_\mathrm {succ} = \frac{1}{2}\mathrm {tr}_{}\!\left[ P_+\mathsf {X}_t(+) + P_-\mathsf {X}_t(-)\right] =\lambda _+ . \end{aligned}$$
(12)

Performing the Lüders measurement of the observable a second time leads to the sequential observable \(\mathsf {X}^{(2)}_t\), with effects

$$\begin{aligned} \begin{aligned} \mathsf {X}^{(2)}_t(+,+)&= \mathsf {X}_t(+)\mathsf {X}_t(+)=\lambda _+^2 P_+ + \lambda _-^2 P_- ,\\ \mathsf {X}^{(2)}_t(-,-)&= \mathsf {X}_t(-)\mathsf {X}_t(-)=\lambda _-^2 P_+ + \lambda _+^2 P_- ,\\ \mathsf {X}^{(2)}_t(+,-)&= \mathsf {X}^{(2)}_t(-,+) = \mathsf {X}_t(+)\mathsf {X}_t(-)=\lambda _+\lambda _- I .\\ \end{aligned} \end{aligned}$$
(13)

The first two effects can be seen to be confirmatory in nature, since the first and second measurements outcomes are in agreement, whereas the last two are ambiguous as they result in the same values for any state. To assess how capable \(\mathsf {X}^{(2)}_t\) is of distinguishing \(P_\pm \) we must post-process the observable to create a new binary observable. Taking the most general post-processing possible, we let \(\mathsf {B}^{(2)}(+)=\sum _{i,j} w^+_{ij} \mathsf {X}^{(2)}_t (i,j)\), with weights \(0\le w^+_{ij}\le 1\), denote the first effect of this new observable that we consider to be the “+” outcome. The second effect is then \(\mathsf {B}^{(2)}(-)=I- \mathsf {B}^{(2)}(+)\). The success probability at this stage, denoted by \(P_\mathrm {succ}^{(2)}\), is given by

$$\begin{aligned} P_\mathrm {succ}^{(2)} = \frac{1}{2} \mathrm {tr}_{}\!\left[ P_+ \mathsf {B}^{(2)}(+) + P_- \mathsf {B}^{(2)}(-)\right] = \frac{1}{2}(1+2\ \mathrm {tr}_{}\!\left[ P_+\mathsf {B}^{(2)}(+)\right] -\mathrm {tr}_{}\!\left[ \mathsf {B}^{(2)}(+)\right] ), \end{aligned}$$
(14)

where in the last equality we have relied on \(I=P_+ + P_-\). Using the forms in Eq. (13) we find that the success probability is

$$\begin{aligned} P_\mathrm {succ}^{(2)} =&\frac{1}{2}\big [1 + 2 (w^{+}_{++}\lambda _+^2 +w^{+}_{--}\lambda _-^2 +(w^{+}_{+-}+w^{+}_{-+})\lambda _+\lambda _-) \\&-(w^{+}_{++} + w^{+}_{--})(\lambda ^2_+ + \lambda ^2_-)- 2(w^{+}_{+-}+w^{+}_{-+})\lambda _+\lambda _-\big ]\\ =&\frac{1}{2}(1 + (w^{+}_{++}-w^{+}_{--})(\lambda _+^2 - \lambda _-^2)). \end{aligned}$$

Note that, as expected, neither \(w^{+}_{+-}\) nor \(w^{+}_{-+}\) contributes to the success probability, as ambiguous results should not be able to help us draw a conclusion about which state was measured. Since the eigenvalues \(\lambda _\pm \) sum to one, we can rewrite \(\lambda _+^2 - \lambda _-^2 = 2\lambda _+ -1\). Furthermore, since the weights are non-negative, we have \(w^{+}_{++}-w^{+}_{--}\le w^{+}_{++}\le 1\), and so the optimal success probability arises when \(w^{+}_{++}=1\) and \(w^{+}_{--}=0\), which simply leads to \(P_\mathrm {succ}^{(2)}=\lambda _+=P_\mathrm {succ}^{(1)}\). In other words, performing a second measurement of \(\mathsf {X}_t\) does not improve our likelihood of distinguishing between \(P_+\) and \(P_-\).

However, if we repeat the measurement another time then we will notice an improvement. The observable \(\mathsf {X}^{(3)}_t\) is given by

$$\begin{aligned} \begin{aligned} \mathsf {X}_t^{(3)}(+,+,+)&= \lambda _+^3 P_+ + \lambda _-^3 P_- ,\\ \mathsf {X}_t^{(3)}(+,+,-) = \mathsf {X}_t^{(3)}(+,-,+)&= \mathsf {X}_t^{(3)}(-,+,+)= \lambda _+\lambda _-\mathsf {X}_t(+) , \\ \mathsf {X}_t^{(3)}(-,-,+) = \mathsf {X}_t^{(3)}(-,+,-)&= \mathsf {X}_t^{(3)}(+,-,-)= \lambda _+\lambda _-\mathsf {X}_t(-) , \\ \mathsf {X}_t^{(3)}(-,-,-)&= \lambda _-^3 P_+ + \lambda _+^3 P_- . \end{aligned} \end{aligned}$$
(15)

With a similar calculation to before we arrive at the success probability \(P_\mathrm {succ}^{(3)}=3\lambda _+^2 - 2\lambda _+^3\), which is strictly greater than \(P_\mathrm {succ}^{(1)}\) for \(0<t< 1\). We have therefore seen that whilst performing the unsharp measurement twice will not provide us with an advantage in distinguishing the states, we do gain one by performing it a third time. We should stress, however, that this does not mean that the observable \(\mathsf {X}^{(2)}_t\) is in general equivalent to \(\mathsf {X}_t\). Indeed, unless \(t=1\), i.e. the observable is sharp, \(\mathsf {X}^{(2)}_t\) is strictly above \(\mathsf {X}_t\) in the post-processing order [23], since any effect of \(\mathsf {X}_t\) can be obtained by post-processing \(\mathsf {X}^{(2)}_t\) but not vice versa. It is the considered discrimination task for which \(\mathsf {X}_t\) and \(\mathsf {X}^{(2)}_t\) perform equally well.
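These closed-form values are easy to confirm numerically; the short sketch below (ours, using an optimal relabelling of each outcome array to a most likely state) reproduces \(P^{(1)}_\mathrm {succ}=P^{(2)}_\mathrm {succ}=\lambda _+\) and \(P^{(3)}_\mathrm {succ}=3\lambda _+^2-2\lambda _+^3\).

```python
import itertools
import numpy as np

t = 0.6
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
X = [(np.eye(2) + t * sx) / 2, (np.eye(2) - t * sx) / 2]     # X_t(+), X_t(-)
P = [np.array([[0.5, 0.5], [0.5, 0.5]]),                     # P_+ = |+><+|
     np.array([[0.5, -0.5], [-0.5, 0.5]])]                   # P_- = |-><-|

def p_succ(n):
    """Optimal n-round success probability: each outcome array is relabelled
    to a state maximising tr[P_j X^(n)(x)] (the effects of X_t commute)."""
    total = 0.0
    for xs in itertools.product(range(2), repeat=n):
        effect = X[xs[0]]
        for x in xs[1:]:
            effect = effect @ X[x]
        total += max(np.trace(Pj @ effect).real for Pj in P)
    return total / 2

lam_plus = (1 + t) / 2
print(p_succ(1), p_succ(2), lam_plus)                # both rounds give lambda_+
print(p_succ(3), 3 * lam_plus**2 - 2 * lam_plus**3)  # the third round improves
```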

4 Binary observables

Consider a separable (not necessarily finite) Hilbert space \(\mathcal {H}\) and two perfectly distinguishable states \(\rho _+\) and \(\rho _-\). Let \(\mathsf {A}\) be a binary observable satisfying (9) for these states, i.e.

$$\begin{aligned} \mathsf {A}(\pm )\rho _\pm = \lambda \rho _\pm , \quad \mathsf {A}(\pm )\rho _\mp = (1-\lambda ) \rho _\mp \end{aligned}$$
(16)

for some \(\frac{1}{2}\le \lambda \le 1\). To simplify notation we shall adopt the convention \(\mathsf {A}(+) = A\) (and hence \(\mathsf {A}(-) = I-A\)). The success probability \(P_\mathrm {succ}^{(1)}\) of discriminating the two states \(\rho _+\) and \(\rho _-\) via the observable \(\mathsf {A}\) is given by

$$\begin{aligned} P_\mathrm {succ}^{(1)} = \frac{1}{2}\mathrm {tr}_{}\!\left[ \rho _+\mathsf {A}(+) + \rho _-\mathsf {A}(-)\right] = \lambda . \end{aligned}$$
(17)

Since the effects A and \(I-A\) necessarily commute, the n-round observables \(\mathsf {A}^{(n)}\) are of the form given in Eq. (3). For instance, the 2-round observable \(\mathsf {A}^{(2)}\) has effects

$$\begin{aligned} \mathsf {A}^{(2)}(+,+)=A^2 , \quad \mathsf {A}^{(2)}(+,-)=\mathsf {A}^{(2)}(-,+)=A(I-A) , \quad \mathsf {A}^{(2)}(-,-)=(I-A)^2 , \end{aligned}$$

and so forth. Note that for the observables \(\mathsf {A}^{(n)}\) the ordering of the outcomes is not reflected in the form of the effects \(\mathsf {A}^{(n)}(\varvec{x})\); instead, the only relevant fact is the total number of “+” or “-” outcomes in \(\varvec{x}\). We can hence divide post-processing into two steps: first, we group all the arrays with the same number of “+” outcomes, after which we study how these effects should be relabelled to form the final observable \(\mathsf {B}^{(n)}\).

Letting p denote the number of “+” outcomes in a given n-length measurement array, the first step in the post-processing leads to

$$\begin{aligned} \bar{\mathsf {A}}^{(n)}(p) := \sum _{\varvec{x}\in I_{n,p}} \mathsf {A}^{(n)}(\varvec{x}) = \left( {\begin{array}{c}n\\ p\end{array}}\right) A^{p} (I-A)^{n-p} , \end{aligned}$$
(18)

where \(p\in \{ 0,1,\ldots ,n \}\) and \(I_{n,p}\) is the set of all n-arrays containing exactly p “+” outcomes.

In the second step we create the final binary observable \(\mathsf {B}^{(n)}\), where

$$\begin{aligned} \mathsf {B}^{(n)}(+) = \sum _{p=0}^n w_p \bar{\mathsf {A}}^{(n)}(p)=\sum _{p=0}^n w_p \left( {\begin{array}{c}n\\ p\end{array}}\right) A^p(I-A)^{n-p}, \end{aligned}$$
(19)

and \(w_p\in \{0,1\}\) are weights determining if the outcome arrays containing p “+” outcomes are relabelled into “+” or “-”. One might expect a ‘majority rule’, whereby arrays with more “+” than “-” will be relabelled to “+”. In the following we see that this is, indeed, the case and we further analyse the success probability.

The n-round success probability \(P_\mathrm {succ}^{(n)}\) is

$$\begin{aligned} \begin{aligned} P_\mathrm {succ}^{(n)}&= \frac{1}{2}\bigg (\mathrm {tr}_{}\!\left[ \rho _+\mathsf {B}^{(n)}(+)\right] +\mathrm {tr}_{}\!\left[ \rho _-\mathsf {B}^{(n)}(-)\right] \bigg )=\frac{1}{2}\bigg (1+\mathrm {tr}_{}\!\left[ \mathsf {B}^{(n)}(+)(\rho _+-\rho _-)\right] \bigg )\\&= \frac{1}{2}\left( 1 + \sum _{p=0}^n w_p \left( {\begin{array}{c}n\\ p\end{array}}\right) \bigg (\lambda ^p(1-\lambda )^{n-p}-\lambda ^{n-p}(1-\lambda )^p\bigg )\right) , \end{aligned} \end{aligned}$$

and so to maximise the success probability we must choose the weights \(w_p\) appropriately. The optimal choice depends on n, and we will therefore consider the cases of odd and even n separately.

If we have performed an odd number of repetitions, then the sum contains an even number of terms, and so can be neatly split between \(0\le p \le \frac{n-1}{2}\) and \(\frac{n+1}{2}\le p \le n\). Since \(\lambda \ge 1/2\), the value \(\lambda ^p(1-\lambda )^{n-p}-\lambda ^{n-p}(1-\lambda )^p\) is negative for any value of p belonging to the first half of the split, from which we conclude that its corresponding weight ought to be 0. This means that the corresponding outcome arrays are interpreted as “-”. At the same time, for the second half of the split the quantity \(\lambda ^p(1-\lambda )^{n-p}-\lambda ^{n-p}(1-\lambda )^p\) is positive and so the weight ought to be 1. This means that the corresponding outcome arrays are interpreted as “+”. Combining these pieces of information we conclude that in the case of an odd number n of repetitions the maximum success probability is

$$\begin{aligned} P_\mathrm {succ}^{(n)} = \frac{1}{2}\left( 1 + \sum _{p=\frac{n+1}{2}}^n \left( {\begin{array}{c}n\\ p\end{array}}\right) \big (\lambda ^p(1-\lambda )^{n-p}-\lambda ^{n-p}(1-\lambda )^p\big )\right) . \end{aligned}$$
(20)

If we perform an even number of repetitions, then the summation for \(P_\mathrm {succ}^{(n)}\) contains an odd number of terms that can be split into three groups: \(0\le p \le \frac{n}{2}-1\), \(\frac{n}{2}+1\le p \le n\) and \(p=\frac{n}{2}\). By the same logic as in the odd case, \(w_p=0\) for \(0\le p \le \frac{n}{2}-1\) and \(w_p = 1\) for \(\frac{n}{2}+1\le p \le n\). For \(p=\frac{n}{2}\) we observe that \(\mathrm {tr}_{}\!\left[ \rho _+\bar{\mathsf {A}}^{(n)}(p)\right] = \mathrm {tr}_{}\!\left[ \rho _-\bar{\mathsf {A}}^{(n)}(p)\right] \), which means that the corresponding measurement outcome arrays are ambiguous with respect to \(\{ \rho _+,\rho _-\}\). We are therefore free to use any weighting \(w_{n/2}\) as it will not change the success probability, so we set \(w_{n/2}=0\). Hence, we conclude that the maximum success probability for an even n is

$$\begin{aligned} P_\mathrm {succ}^{(n)} = \frac{1}{2}\left( 1 + \sum _{p=\frac{n}{2}+1}^n \left( {\begin{array}{c}n\\ p\end{array}}\right) \big (\lambda ^p(1-\lambda )^{n-p}-\lambda ^{n-p}(1-\lambda )^p\big )\right) . \end{aligned}$$
(21)

Remarkably, according to the following theorem, an even number of repetitions provides no further improvement in success over the odd number that precedes it:

Theorem 1

Let n be an odd integer. The success probability of distinguishing \(\rho _{+}\) and \(\rho _{-}\) after n repeated measurements of \(\mathsf {A}\) is

$$\begin{aligned} P_\mathrm {succ}^{(n)}=P_\mathrm {succ}^{(n+1)} = \sum _{i=\frac{n+1}{2}}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( {\begin{array}{c}i-1\\ \frac{n-1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}} \lambda ^i. \end{aligned}$$
(22)
Fig. 2: The success probability of distinguishing between two eigenstates of a binary observable \(\mathsf {A}\) in arbitrary dimension d after n measurements of \(\mathsf {A}\). a Comparison of the success probability distribution for \(n=1\), 21 and 41. b Comparison of the success probability for \(n=1,\dots ,50\) when \(\lambda =0.6,0.7\) and 0.8. As n increases the success probability nears 1 for lower and lower values of the maximum eigenvalue \(\lambda \).

A plot of this success probability for several values of n is given in Fig. 2, both in terms of its overall form (Fig. 2a) and for particular values of \(\lambda \) (Fig. 2b). As n increases, smaller values of \(\lambda \) (corresponding to noisier observables) suffice to come close to complete success in distinguishing the states. The proof of this result requires us to derive the success probability for both n and \(n+1\), with n odd, and show that the two coincide. To simplify this we make use of the following lemma:

Lemma 1

For odd n and any \(\frac{n+1}{2}\le i \le n\),

$$\begin{aligned} \sum _{j=0}^{i-\frac{n+1}{2}}\left( {\begin{array}{c}i\\ j\end{array}}\right) (-1)^{j}=\left( {\begin{array}{c}i-1\\ \frac{n-1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}}. \end{aligned}$$
(23)

Proof

This follows by evaluating the sum directly and repeatedly collecting common factors:

$$\begin{aligned} \sum _{j=0}^{i-\frac{n+1}{2}}\left( {\begin{array}{c}i\\ j\end{array}}\right) (-1)^{j}&=1- i + \frac{i(i-1)}{2} -\frac{i(i-1)(i-2)}{3!}+\cdots \\&\quad +\frac{i(i-1)(i-2)\dots (i-i+\frac{n-1}{2})}{(i-\frac{n+1}{2})!}(-1)^{i-\frac{n+1}{2}}\\&=-(i-1)\left( 1 -\frac{i}{2}+ \frac{i(i-2)}{3!} -\cdots +\frac{i(i-2)\cdots (i-i+\frac{n-1}{2})}{(i-\frac{n+1}{2})!}(-1)^{i-\frac{n+3}{2}}\right) \\&=\frac{(i-1)(i-2)}{2}\left( 1 - \frac{i}{3} -\cdots +\frac{2i\cdots (i-i+\frac{n-1}{2})}{(i-\frac{n+1}{2})!}(-1)^{i-\frac{n+5}{2}}\right) \\&=\frac{(-1)^{i-\frac{n+1}{2}}}{(i-\frac{n+1}{2})!}\prod _{k=1}^{i-\frac{n+1}{2}}(i-k)= (-1)^{i-\frac{n+1}{2}} \frac{(i-1)!}{(i-\frac{n+1}{2})!\frac{n-1}{2}!}=(-1)^{i-\frac{n+1}{2}} \left( {\begin{array}{c}i-1\\ \frac{n-1}{2}\end{array}}\right) . \end{aligned}$$

\(\square \)
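Although the proof above is purely combinatorial, the identity (23) is also easy to verify directly for small values (a check we add for convenience):

```python
from math import comb

for n in (3, 5, 7, 9, 11):
    for i in range((n + 1) // 2, n + 1):
        lhs = sum(comb(i, j) * (-1)**j for j in range(i - (n + 1) // 2 + 1))
        rhs = comb(i - 1, (n - 1) // 2) * (-1)**(i - (n + 1) // 2)
        assert lhs == rhs
print("Eq. (23) verified for n = 3, 5, ..., 11")
```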

Proof of Theorem 1

We first consider the case for odd n. We have already argued that the observable \(\mathsf {B}^{(n)}\) that optimises the success probability must be of the form

$$\begin{aligned} \mathsf {B}^{(n)}(+)&=\sum _{i=\frac{n+1}{2}}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) A^i (I-A)^{n-i} , \quad \mathsf {B}^{(n)}(-) = \sum _{i=\frac{n+1}{2}}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) A^{n-i} (I-A)^{i} . \end{aligned}$$

Since the states \(\rho _\pm \) satisfy \(\mathrm {tr}_{}\!\left[ A\rho _+\right] =\mathrm {tr}_{}\!\left[ (I-A)\rho _-\right] =\lambda \), we see that \(\mathrm {tr}_{}\!\left[ \rho _+\mathsf {B}^{(n)}(+)\right] =\mathrm {tr}_{}\!\left[ \rho _-\mathsf {B}^{(n)}(-)\right] \), and so the success probability reduces to

$$\begin{aligned} P_\mathrm {succ}^{(n)} = \sum _{i=\frac{n+1}{2}}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \lambda ^i (1-\lambda )^{n-i}. \end{aligned}$$

By making use of the binomial expansion \((1-\lambda )^{n-i}=\sum _{j=0}^{n-i}\left( {\begin{array}{c}n-i\\ j\end{array}}\right) (-\lambda )^j\), we can rewrite the success probability as

$$\begin{aligned} \begin{aligned} P_\mathrm {succ}^{(n)}&= \sum _{i=\frac{n+1}{2}}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \sum _{j=0}^{n-i}\left( {\begin{array}{c}n-i\\ j\end{array}}\right) (-1)^j\lambda ^{i+j}=\sum _{i=\frac{n+1}{2}}^n \sum _{j=0}^{n-i}\left( {\begin{array}{c}n\\ i+j\end{array}}\right) \left( {\begin{array}{c}i+j\\ j\end{array}}\right) (-1)^j\lambda ^{i+j} . \end{aligned} \end{aligned}$$

If we let \(\ell =i+j\), noting that \(\ell = \frac{n+1}{2},\dots ,n\) and \(j=\ell -i = 0,\dots , \ell -\frac{n+1}{2}\), and then relabel \(\ell \) as i, we arrive at

$$\begin{aligned} P_\mathrm {succ}^{(n)}=\sum _{i=\frac{n+1}{2}}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \sum _{j=0}^{i-\frac{n+1}{2}}\left( {\begin{array}{c}i\\ j\end{array}}\right) (-1)^j \lambda ^i \end{aligned}$$

which, by Lemma 1, is our intended result for the odd case.

For the even case, i.e. \(n+1\) rounds with n odd, we start by recalling that the success probability does not depend on which effect contains the operators \(A^{\frac{n+1}{2}}(I-A)^{\frac{n+1}{2}}\), as their contribution is independent of this assignment, and so we can choose

$$\begin{aligned} \mathsf {B}^{(n+1)}(+)= \sum _{i=\frac{n+3}{2}}^{n+1}\left( {\begin{array}{c}n+1\\ i\end{array}}\right) A^i (I-A)^{n+1-i} . \end{aligned}$$

This means that the complement effect is of the form

$$\begin{aligned} \mathsf {B}^{(n+1)}(-)= \left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) A^{\frac{n+1}{2}}(I-A)^{\frac{n+1}{2}} + \sum _{i=\frac{n+3}{2}}^{n+1}\left( {\begin{array}{c}n+1\\ i\end{array}}\right) A^{n+1-i} (I-A)^{i} , \end{aligned}$$

from which we see that the success probability is

$$\begin{aligned} P_\mathrm {succ}^{(n+1)}= \sum _{i=\frac{n+3}{2}}^{n+1}\left( {\begin{array}{c}n+1\\ i\end{array}}\right) \lambda ^{i}(1-\lambda )^{n+1-i} +\frac{1}{2}\left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) \lambda ^{\frac{n+1}{2}}(1-\lambda )^{\frac{n+1}{2}}. \end{aligned}$$
(24)

Again making use of the binomial expansion of \((1-\lambda )^\frac{n+1}{2}\) we see that the second term in Eq. (24) (omitting the factor of \(\frac{1}{2}\)) can be written as

$$\begin{aligned} \left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) \lambda ^{\frac{n+1}{2}}(1-\lambda )^{\frac{n+1}{2}}&= \sum _{i=0}^\frac{n+1}{2} \left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) \left( {\begin{array}{c}\frac{n+1}{2}\\ i\end{array}}\right) (-1)^{i}\lambda ^{i+\frac{n+1}{2}}\\&=\sum _{i=\frac{n+1}{2}}^{n+1} \left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) \left( {\begin{array}{c}\frac{n+1}{2}\\ i-\frac{n+1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}}\lambda ^{i}\\&=\left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) \lambda ^\frac{n+1}{2}+\sum _{i=\frac{n+3}{2}}^{n+1} \left( {\begin{array}{c}n+1\\ i\end{array}}\right) \left( {\begin{array}{c}i\\ \frac{n+1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}}\lambda ^{i}, \end{aligned}$$

which, when combined with the corresponding expansion for the first term

$$\begin{aligned} \sum _{i=\frac{n+3}{2}}^{n+1}\left( {\begin{array}{c}n+1\\ i\end{array}}\right) \lambda ^{i}(1-\lambda )^{n+1-i}&= \sum _{i=\frac{n+3}{2}}^{n+1} \sum _{j=0}^{n+1-i}\left( {\begin{array}{c}n+1\\ i\end{array}}\right) \left( {\begin{array}{c}n+1-i\\ j\end{array}}\right) (-1)^{j}\lambda ^{i+j}\\&= \sum _{i=\frac{n+3}{2}}^{n+1} \sum _{j=0}^{i-\frac{n+3}{2}}\left( {\begin{array}{c}n+1\\ i\end{array}}\right) \left( {\begin{array}{c}i\\ j\end{array}}\right) (-1)^{j}\lambda ^{i}, \end{aligned}$$

leads to the following form of the success probability:

$$\begin{aligned} P_\mathrm {succ}^{(n+1)}= \frac{1}{2}\left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) \lambda ^\frac{n+1}{2}+ \sum _{i=\frac{n+3}{2}}^{n+1}\!\left( {\begin{array}{c}n+1\\ i\end{array}}\right) \!\left( \frac{1}{2}\left( {\begin{array}{c}i\\ \frac{n+1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}}+\sum _{j=0}^{i-\frac{n+3}{2}}\left( {\begin{array}{c}i\\ j\end{array}}\right) (-1)^{j} \right) \lambda ^i. \end{aligned}$$
(25)

To reach \(P_\mathrm {succ}^{(n)}\) we need to remove terms of the order \(\lambda ^{n+1}\), as well as binomial coefficients involving \(n+1\). To resolve the first of these we note that for odd n

$$\begin{aligned} \begin{aligned} \sum _{j=0}^{n+1}\left( {\begin{array}{c}n+1\\ j\end{array}}\right) (-1)^j&= \sum _{j=0}^{\frac{n-1}{2}}\left( {\begin{array}{c}n+1\\ j\end{array}}\right) (-1)^j + \sum _{j=\frac{n+3}{2}}^{n+1}\left( {\begin{array}{c}n+1\\ j\end{array}}\right) (-1)^j +\left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) (-1)^\frac{n+1}{2}\\&= 2 \sum _{j=0}^{\frac{n-1}{2}}\left( {\begin{array}{c}n+1\\ j\end{array}}\right) (-1)^j + \left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) (-1)^\frac{n+1}{2} = 0 , \end{aligned} \end{aligned}$$

since \(\sum _{j=0}^{n+1}\left( {\begin{array}{c}n+1\\ j\end{array}}\right) (-1)^j= (1-1)^{n+1} = 0\). From this we see that the \(\lambda ^{n+1}\) terms cancel. For the binomial coefficients we begin by noting that for a given integer \(i\le n\), \(\left( {\begin{array}{c}n+1\\ i\end{array}}\right) \) can be rewritten:

$$\begin{aligned} \left( {\begin{array}{c}n+1\\ i\end{array}}\right) =\frac{(n+1)!}{i! (n+1-i)!}= \frac{n+1}{n+1-i}\left( {\begin{array}{c}n\\ i\end{array}}\right) = \left( 1 + \frac{i}{n+1-i}\right) \left( {\begin{array}{c}n\\ i\end{array}}\right) , \end{aligned}$$

and in particular \(\left( {\begin{array}{c}n+1\\ \frac{n+1}{2}\end{array}}\right) = 2\left( {\begin{array}{c}n\\ \frac{n+1}{2}\end{array}}\right) \). Combining these we can rewrite Eq. (25) as

$$\begin{aligned} P_\mathrm {succ}^{(n+1)}&= \left( {\begin{array}{c}n\\ \frac{n+1}{2}\end{array}}\right) \lambda ^\frac{n+1}{2} + \sum _{i=\frac{n+3}{2}}^{n}\left( 1 + \frac{i}{n+1-i}\right) \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{2}\left( {\begin{array}{c}i\\ \frac{n+1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}}+\sum _{j=0}^{i-\frac{n+3}{2}}\left( {\begin{array}{c}i\\ j\end{array}}\right) (-1)^{j} \right) \lambda ^i\\&=\left( {\begin{array}{c}n\\ \frac{n+1}{2}\end{array}}\right) \left( {\begin{array}{c}\frac{n+1}{2}\\ 0\end{array}}\right) \lambda ^\frac{n+1}{2} + \sum _{i=\frac{n+3}{2}}^{n}\left( 1 + \frac{i}{n+1-i}\right) \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \sum _{j=0}^{i-\frac{n+1}{2}}\left( {\begin{array}{c}i\\ j\end{array}}\right) (-1)^{j}-\frac{1}{2}\left( {\begin{array}{c}i\\ \frac{n+1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}} \right) \lambda ^i\\&= \sum _{i=\frac{n+1}{2}}^{n}\sum _{j=0}^{i-\frac{n+1}{2}}\left( {\begin{array}{c}n\\ i\end{array}}\right) \left( {\begin{array}{c}i\\ j\end{array}}\right) (-1)^{j}\lambda ^i + \sum _{i=\frac{n+3}{2}}^{n} \frac{i}{n+1-i} \left( \left( {\begin{array}{c}i-1\\ \frac{n-1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}}-\frac{n+1}{2i}\left( {\begin{array}{c}i\\ \frac{n+1}{2}\end{array}}\right) (-1)^{i-\frac{n+1}{2}}\right) \lambda ^i\\&= P_\mathrm {succ}^{(n)}, \end{aligned}$$

where we have again used Lemma 1 between the second and third line, as well as between the third and fourth. \(\square \)
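As a simple cross-check of Theorem 1 (a numerical aside we add), the expressions (20), (21) and (22) can be evaluated and compared directly:

```python
from math import comb

def p_odd(n, lam):      # Eq. (20), n odd
    return 0.5 * (1 + sum(comb(n, p) * (lam**p * (1 - lam)**(n - p)
                                        - lam**(n - p) * (1 - lam)**p)
                          for p in range((n + 1) // 2, n + 1)))

def p_even(n, lam):     # Eq. (21), n even
    return 0.5 * (1 + sum(comb(n, p) * (lam**p * (1 - lam)**(n - p)
                                        - lam**(n - p) * (1 - lam)**p)
                          for p in range(n // 2 + 1, n + 1)))

def p_closed(n, lam):   # Eq. (22), n odd
    return sum(comb(n, i) * comb(i - 1, (n - 1) // 2)
               * (-1)**(i - (n + 1) // 2) * lam**i
               for i in range((n + 1) // 2, n + 1))

lam = 0.7
for n in (1, 3, 5, 7, 9):
    print(n, p_odd(n, lam), p_even(n + 1, lam), p_closed(n, lam))   # three equal values per row
```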

5 Higher outcome observables

In this section, \(\{\rho _i\}_{i=1}^N\) is a set of mutually orthogonal pure states and \(\mathsf {A}\) is a commutative N-valued observable satisfying the conditions (9). We further assume that \(1/N< \lambda < 1\) since the boundary values are not interesting. We recall that \(\mathsf {A}^{(n)}\) denotes the n-round observable of the form in Equation (3), i.e.

$$\begin{aligned} \mathsf {A}^{(n)}(x_1,\ldots ,x_n)=\mathsf {A}(x_1)\mathsf {A}(x_2)\cdots \mathsf {A}(x_n) . \end{aligned}$$
(26)

We start by characterising the ambiguous measurement arrays for the given task. For any \(\varvec{x}\in \Omega _\mathsf {A}^n\) and \(j\in \Omega _\mathsf {A}\), we denote by \(m(\varvec{x},j)\) the multiplicity of j in \(\varvec{x}\), i.e. the number of occurrences of j in \(\varvec{x}\). For instance, \(m((1,2,1),1)=2\), \(m((1,2,1),2)=1\) and \(m((1,2,1),3)=0\).

Proposition 2

Let \(\varvec{x}\in \Omega _\mathsf {A}^n\). The following are equivalent:

(i) \(\varvec{x}\) is ambiguous with respect to \(\{ \rho _{i_1},\ldots ,\rho _{i_k}\}\);

(ii) \(m(\varvec{x},i_1)=m(\varvec{x},i_2)=\cdots = m(\varvec{x},i_k)\).

Proof

Fix an outcome array \(\varvec{x}\). From Equations (3) and (9), we see that for any state \(\rho _j\),

$$\begin{aligned} \mathrm {tr}_{}\!\left[ \rho _j\mathsf {A}^{(n)}(\varvec{x})\right] = \lambda ^{m_j} \left( \frac{1-\lambda }{N-1}\right) ^{n-m_j} , \end{aligned}$$
(27)

where we have denoted \(m_j:=m(\varvec{x},j)\).

Let us assume that (i) holds. For ease of notation, and without loss of generality, let us assume that \(\varvec{x}\) is ambiguous with regard to the first k states, i.e. \(\mathrm {tr}_{}\!\left[ \rho _j\mathsf {A}^{(n)}(\varvec{x})\right] = \text {const.}\) for \(j\in \{1,\dots ,k\}\). By equation (27) this means that

$$\begin{aligned} \lambda ^{m_i} \left( \frac{1-\lambda }{N-1}\right) ^{n-m_i} = \lambda ^{m_j} \left( \frac{1-\lambda }{N-1}\right) ^{n-m_j} \end{aligned}$$
(28)

for \(i,j\in \{1,\dots ,k\}\). We make a counter assumption that (ii) does not hold, i.e. \(m_i\ne m_j\) for some distinct \(i,j\in \{1,\dots ,k\}\). Without loss of generality, we can assume that \(m_i>m_j\), and hence \(n-m_j > n-m_i\). We can therefore rearrange (28) to arrive at

$$\begin{aligned} \lambda ^{m_i - m_j} = \left( \frac{1-\lambda }{N-1}\right) ^{m_i - m_j}. \end{aligned}$$
(29)

Since \(m_i - m_j>0\), we can see that \(\lambda = (1-\lambda )/(N-1)\), and so \(\lambda = 1/N\). However, this is in contradiction to the initial assumption \(\lambda > 1/N\) and so \(m_i=m_j\) for all \(i,j\in \{1,\dots ,k\}\).

To prove that (ii) implies (i), it is sufficient to see from Eq. (27) that any set of k elements of \(\Omega _\mathsf {A}\) with the same multiplicity in a given outcome array \(\varvec{x}\) will produce the same probabilities for their respective states. \(\square \)
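Proposition 2 can likewise be checked by direct enumeration; the sketch below (ours, with arbitrarily chosen N, n and \(\lambda \)) confirms the equivalence for pairs of states.

```python
import itertools
from math import isclose

N, n, lam = 3, 4, 0.6
mu = (1 - lam) / (N - 1)

def prob(j, xs):
    """tr[rho_j A^(n)(x)] from Eq. (27)."""
    m = xs.count(j)
    return lam**m * mu**(n - m)

for xs in itertools.product(range(N), repeat=n):
    for i, j in itertools.combinations(range(N), 2):
        ambiguous = isclose(prob(i, xs), prob(j, xs))       # condition (i) for {rho_i, rho_j}
        equal_multiplicity = xs.count(i) == xs.count(j)     # condition (ii)
        assert ambiguous == equal_multiplicity
print("Proposition 2 verified for N =", N, "and n =", n)
```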

The previous result gives another confirmation of our earlier proof of the ‘rule of three’ in the case \(N=2\). The situation is different for \(N>2\), as in that case an outcome array \((i_1,i_2)\) is not ambiguous with respect to the full set \(\{\rho _i\}_{i=1}^N\). Hence, one might expect to gain some partial information from a second measurement; that is, after obtaining the outcome array (1, 2) one may assume that the measured state is either \(\rho _1\) or \(\rho _2\) but no other. However, as we will see, this partial information does not improve our likelihood of success in discrimination, and it is only by performing the third measurement that such results become more informative. The ‘rule of three’ therefore also applies in this case.

Proposition 3

The success probability of distinguishing the states \(\{\rho _i\}_{i=1}^N\) does not increase if a second measurement of \(\mathsf {A}\) is performed, i.e. \(P_\mathrm {succ}^{(2)}=P_\mathrm {succ}^{(1)}\).

Proof

For a given effect \(\mathsf {A}^{(2)}(i,j)\) and state \(\rho _k,\) we obtain one of three possible values:

$$\begin{aligned} \mathrm {tr}_{}\!\left[ \rho _k\mathsf {A}^{(2)}(i,j)\right] ={\left\{ \begin{array}{ll} \lambda ^2, \quad &{}i=j=k,\\ \lambda \left( \frac{1-\lambda }{N-1} \right) , \quad &{}i=k\ne j \ \mathrm {or} \ j=k\ne i,\\ \left( \frac{1-\lambda }{N-1} \right) ^2, \quad &{} i\ne k \ne j. \end{array}\right. } \end{aligned}$$
(30)

We post-process \(\mathsf {A}^{(2)}\) via the Markov kernel \(w:(k,(i,j))\mapsto w^k_{ij}\), \(i,j,k=1,\dots ,N\), to form the N-ary observable \(\mathsf {B}^{(2)}\) with effects \(\mathsf {B}^{(2)}(k) = \sum _{i,j} w^k_{ij} \mathsf {A}^{(2)}(i,j)\). The success probability \(P_\mathrm {succ}^{(2)}\) takes the form

$$\begin{aligned} P_\mathrm {succ}^{(2)}&=\frac{1}{N}\sum _{k=1}^N\mathrm {tr}_{}\!\left[ \rho _k \mathsf {B}^{(2)}(k)\right] \nonumber \\&= \frac{\lambda ^2}{N}\sum _{k=1}^N w^k_{kk} +\frac{\lambda (1-\lambda )}{N(N-1)} \sum _{k=1}^N\sum _{i\ne k}(w^k_{ik}+w^k_{ki}) + \frac{1}{N} \left( \frac{1-\lambda }{N-1} \right) ^2\sum _{k=1}^N\sum _{i,j\ne k}w^k_{ij} . \end{aligned}$$
(31)

The final term in Eq. (31) can be decomposed as follows:

$$\begin{aligned} \sum _{k=1}^N \sum _{i,j\ne k} w^k_{ij}&= \sum _{k=1}^N\left( \sum _{i\ne k} w^k_{ii} + \sum _{i\ne k}\sum _{j\ne k,i} w^k_{ij}\right) = \sum _{k=1}^N\left( \sum _{i} w^k_{ii} -w^k_{kk} + \sum _{i\ne k}\sum _{j\ne k,i} w^k_{ij}\right) \nonumber \\&= \sum _{k=1}^N\sum _{i} w^k_{ii} + \sum _{k=1}^N\left( - w^k_{kk} + \sum _{i\ne k}\sum _{j\ne k,i} w^k_{ij}\right) \end{aligned}$$
(32)
$$\begin{aligned}&= N + \sum _{k=1}^N\left( - w^k_{kk} + \sum _{i\ne k}\sum _{j\ne k,i} w^k_{ij}\right) , \end{aligned}$$
(33)

where the last equality is a consequence of the normalisation of the kernel: \(\sum _k \sum _i w^k_{ii} = \sum _i \sum _k w^k_{ii} = N\).

Next we elaborate on the middle term in Eq. (31). For a fixed k we have

$$\begin{aligned} \sum _{i\ne k}(w^k_{ik}+w^k_{ki})+ \sum _{i\ne k} \sum _{j\ne k, i} w^k_{ij} = \sum _{i} \sum _{j\ne i} w^k_{ij} \end{aligned}$$
(34)

and therefore

$$\begin{aligned} \begin{aligned} \sum _{k=1}^N\sum _{i\ne k}(w^k_{ik}+w^k_{ki})&= \sum _{k=1}^N\sum _{i} \sum _{j\ne i} w^k_{ij} - \sum _{k=1}^N\sum _{i\ne k} \sum _{j\ne k, i} w^k_{ij} \\&=N(N-1) - \sum _{k=1}^N\sum _{i\ne k} \sum _{j\ne k, i} w^k_{ij} . \end{aligned} \end{aligned}$$
(35)

Making use of Eqs. (32) and (35), we can rearrange Eq. (31):

$$\begin{aligned} P_\mathrm {succ}^{(2)} =&\left( \frac{1-\lambda }{N-1} \right) ^2 + \lambda (1-\lambda ) + \frac{1}{N} \left( \lambda ^2 - \left( \frac{1-\lambda }{N-1} \right) ^2\right) \sum _{k=1}^N w^k_{kk} \nonumber \\&+ \frac{1}{N} \left( \left( \frac{1-\lambda }{N-1} \right) ^2 - \frac{\lambda (1-\lambda )}{N-1} \right) \sum _{k=1}^N \sum _{i\ne k} \sum _{j\ne k,i} w^k_{ij}\nonumber \\ =&\left( \frac{1-\lambda }{N-1} \right) ^2 + \lambda (1-\lambda ) + \frac{(N\lambda -1)(1+(N-2)\lambda )}{N(N-1)^2}\sum _{k=1}^N w^k_{kk} \nonumber \\&+ \frac{(1-\lambda )(1-N\lambda )}{N(N-1)^2}\sum _{k=1}^N \sum _{i\ne k} \sum _{j\ne k,i} w^k_{ij}. \end{aligned}$$
(36)

For the range of \(\lambda \) considered, namely \(\frac{1}{N}< \lambda < 1\), we have \((N\lambda -1)(1+(N-2)\lambda )>0\) whereas \((1-\lambda )(1-N\lambda ) < 0\). Hence, to maximise \(P_\mathrm {succ}^{(2)}\) we set the weights \(w^k_{kk}=1\) for all k and \(w^k_{ij}=0\) whenever all indices ijk are different. These choices lead to a valid Markov kernel if we further set \(w^i_{jj}=0\), \(w^i_{ij}=1\) and \(w^j_{ij}=0\) for all \(i\ne j\). In doing so Eq. (36) reduces to

$$\begin{aligned} P_\mathrm {succ}^{(2)} = \left( \frac{1-\lambda }{N-1} \right) ^2 + \lambda (1-\lambda )+ \lambda ^2 - \left( \frac{1-\lambda }{N-1} \right) ^2= \lambda =P_\mathrm {succ}^{(1)} . \end{aligned}$$
(37)

\(\square \)
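The kernel chosen in the proof assigns every outcome array \((i,j)\) with \(i\ne j\) to its first entry, and a direct evaluation (our numerical sketch, with arbitrary N and \(\lambda \)) recovers \(P_\mathrm {succ}^{(2)}=\lambda \):

```python
import itertools

N, lam = 4, 0.55
mu = (1 - lam) / (N - 1)

def prob(k, i, j):
    """tr[rho_k A^(2)(i, j)] from Eq. (30)."""
    m = int(i == k) + int(j == k)
    return lam**m * mu**(2 - m)

# Kernel from the proof: w(k, (i, j)) = 1 iff k = i, i.e. diagonal arrays keep their label
# and an off-diagonal array (i, j) is relabelled to i
P2 = sum(prob(i, i, j) for i, j in itertools.product(range(N), repeat=2)) / N
print(P2, lam)   # equal
```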

We proceed to higher rounds of repetitions.

Proposition 4

The success probability of distinguishing the states \(\{\rho _i\}_{i=1}^N\) after three and four repeated measurements of \(\mathsf {A}\) is given by

$$\begin{aligned} P_\mathrm {succ}^{(3)}=&\frac{1}{N-1}\lambda \big ((N-2)+(N+1)\lambda - N\lambda ^2\big ), \end{aligned}$$
(38)
$$\begin{aligned} P_\mathrm {succ}^{(4)}=&\frac{1}{(N-1)^2}\lambda \big ((N-2)(N-3) + 3(N^2-3)\lambda \nonumber \\&+ (4+7N-5N^2)\lambda ^2 + 2N(N-2)\lambda ^3\big ). \end{aligned}$$
(39)

Proof

We provide the derivation of \(P_\mathrm {succ}^{(3)}\); the same method applies to \(P_\mathrm {succ}^{(4)}\) (with some additional care needed). Let \(\varvec{x}=(x_1,x_2,x_3)\in \{1,\dots ,N\}^3\) be an array of measurement outcomes, with the corresponding effect of the observable \(\mathsf {A}^{(3)}\) being \(\mathsf {A}^{(3)}(\varvec{x})=\mathsf {A}(x_1)\mathsf {A}(x_2)\mathsf {A}(x_3)\). Fix a state \(\rho _j\in \{\rho _i\}_{i=1}^N\); then, letting \(m_j = m(\varvec{x},j)\), the probability of obtaining the outcome array \(\varvec{x}\) is

$$\begin{aligned} \mathrm {tr}_{}\!\left[ \rho _j \mathsf {A}^{(3)}(\varvec{x})\right] = \lambda ^{m_j} \left( \frac{1-\lambda }{N-1}\right) ^{3-m_j}. \end{aligned}$$

We now post-process \(\mathsf {A}^{(3)}\) via the deterministic kernel \(w:\{1,\dots ,N\} \times \{1,\dots ,N\}^3 \rightarrow \{0,1\}\) to arrive at the observable \(\mathsf {B}^{(3)}\) with effects \(\mathsf {B}^{(3)}(y)= \sum _{\varvec{x}} w(y,\varvec{x})\mathsf {A}^{(3)}(\varvec{x})\). Let \(j\in \{1,\dots ,N\}\) and \(k=0,\dots ,3\), and define the subsets \(X^j_k = \{\varvec{x}\in \{1,\dots ,N\}^3\ |\ m(\varvec{x},j) =k \}\). For \(k=0,1,2,3\), these subsets have order \((N-1)^3, 3(N-1)^2, 3(N-1)\) and 1, respectively. We therefore have the following decomposition:

$$\begin{aligned} \begin{aligned} \mathrm {tr}_{}\!\left[ \rho _j \mathsf {B}^{(3)}(j)\right] =&w(j, \varvec{j}) \lambda ^3 + \lambda ^2 \frac{1-\lambda }{N-1} \sum _{\varvec{x}\in X^j_2} w(j,\varvec{x}) \\&+ \lambda \left( \frac{1-\lambda }{N-1}\right) ^2 \sum _{\varvec{x}\in X^j_1} w(j,\varvec{x}) + \left( \frac{1-\lambda }{N-1}\right) ^3 \sum _{\varvec{x}\in X^j_0} w(j,\varvec{x}), \end{aligned} \end{aligned}$$
(40)

where we introduce the notation \(\varvec{j}=(j,j,j)\).

The set \(X^j_1\) has the following decomposition

$$\begin{aligned} X^j_1 = Y^j_2 \cup Y^j_{1,1}, \end{aligned}$$
(41)

where \(Y^j_2 = \{\varvec{x}\in X^j_1\ |\ m(\varvec{x},k) =2, k\ne j\}\) and \(Y^j_{1,1} = \{\varvec{x}\in X^j_1\ |\ m(\varvec{x},k) =1 \ \mathrm {and} \ m(\varvec{x},\ell ) =1, k,\ell \ne j\}\). The subsets \(Y^j_2\) and \(Y^j_{1,1}\) have orders \(3(N-1)\) and \(3(N-1)(N-2)\), respectively, for each j. In a similar fashion, we can decompose \(X^j_0\) in the following way:

$$\begin{aligned} X^j_0 = Z^j_3 \cup Z^j_{2,1} \cup Z^j_{1,1,1} \end{aligned}$$
(42)

where \(Z^j_3 = \{\varvec{x}\in X^j_0\ |\ m(\varvec{x},k)=3, k\ne j\}\), \(Z^j_{2,1} = \{\varvec{x}\in X^j_0\ |\ m(\varvec{x},k) =2 \ \mathrm {and} \ m(\varvec{x},\ell ) =1, k,\ell \ne j\}\) and \(Z^j_{1,1,1} = \{\varvec{x}\in X^j_0\ |\ m(\varvec{x},k) =1, m(\varvec{x},\ell ) =1 \ \mathrm {and} \ m(\varvec{x},r)=1, k,\ell ,r\ne j\}\).

Making use of Eqs. (40), (41) and (42), the success probability can be expressed as

$$\begin{aligned} P_\mathrm {succ}^{(3)} =&\frac{\lambda ^3}{N}\sum _j w(j,\varvec{j}) + \frac{\lambda ^2(1-\lambda )}{N(N-1)}\sum _j \sum _{\varvec{x}\in X^j_2} w(j,\varvec{x}) \nonumber \\&+ \frac{\lambda (1-\lambda )^2}{N(N-1)^2} \sum _j\sum _{\varvec{x}\in X^j_1} w(j,\varvec{x})+ \frac{(1-\lambda )^3}{N(N-1)^3} \sum _j\sum _{\varvec{x}\in X^j_0} w(j,\varvec{x})\nonumber \\ =&\frac{\lambda ^3}{N}\sum _j w(j,\varvec{j}) + \frac{\lambda ^2(1-\lambda )}{N(N-1)}\sum _j \sum _{\varvec{x}\in X^j_2} w(j,\varvec{x})\nonumber \\&+ \frac{\lambda (1-\lambda )^2}{N(N-1)^2} \sum _j\left( \sum _{\varvec{x}\in Y^j_2} w(j,\varvec{x})+\sum _{\varvec{x}\in Y^j_{1,1}} w(j,\varvec{x})\right) \nonumber \\&+ \frac{(1-\lambda )^3}{N(N-1)^3} \sum _j \left( \sum _{\varvec{x}\in Z^j_3} w(j,\varvec{x})+\sum _{\varvec{x}\in Z^j_{2,1}} w(j,\varvec{x})+\sum _{\varvec{x}\in Z^j_{1,1,1}} w(j,\varvec{x})\right) . \end{aligned}$$
(43)

By noticing that \(Z^j_3 = \{\varvec{k}\ | \ k\ne j\}\), we can see that \(\sum _{\varvec{x}\in Z^j_3} w(j,\varvec{x}) = \sum _k w(j,\varvec{k}) - w(j,\varvec{j})\), and hence

$$\begin{aligned} \sum _j \sum _{\varvec{x}\in Z^j_3} w(j,\varvec{x}) = \sum _j \sum _k w(j,\varvec{k}) - \sum _j w(j,\varvec{j}) = N- \sum _j w(j,\varvec{j}), \end{aligned}$$
(44)

where we again utilise the normalisation of the kernel: \(\sum _j \sum _k w(j,\varvec{k}) = \sum _k \big (\sum _j w(j,\varvec{k})\big ) = N\). If we now denote by \(S_2 = \{\varvec{x} \ | \ \exists \ j, m(\varvec{x},j)=2 \}\) the set of vectors containing an element of multiplicity two, then we find that for each j, \(\sum _{\varvec{x}\in Z^j_{2,1}} w(j,\varvec{x}) = \sum _{\varvec{x}\in S_2} w(j,\varvec{x}) - \sum _{\varvec{x}\in X^j_2} w(j,\varvec{x}) - \sum _{\varvec{x}\in Y^j_{2}} w(j,\varvec{x})\). In a similar way to Eq. (44) we find that

$$\begin{aligned} {\begin{matrix} \sum _j\sum _{\varvec{x}\in Z^j_{2,1}} w(j,\varvec{x}) &{}= \sum _j \sum _{\varvec{x}\in S_2} w(j,\varvec{x}) - \sum _j\sum _{\varvec{x}\in X^j_2} w(j,\varvec{x}) - \sum _j\sum _{\varvec{x}\in Y^j_{2}} w(j,\varvec{x})\\ &{}= 3N(N-1) - \sum _j\sum _{\varvec{x}\in X^j_2} w(j,\varvec{x}) - \sum _j\sum _{\varvec{x}\in Y^j_{2}} w(j,\varvec{x}), \end{matrix}} \end{aligned}$$
(45)

where \(3N(N-1)=\left| S_2 \right| \). Finally, we let \(S_{1} = \{(i,j,k)\ | \ i\ne j\ne k\ne i\}\) be the set of vectors where each component has multiplicity one. From this we have that \(\sum _{\varvec{x}\in Z^j_{1,1,1}} w(j,\varvec{x}) = \sum _{\varvec{x}\in S_{1}} w(j,\varvec{x}) - \sum _{\varvec{x}\in Y^j_{1,1}} w(j,\varvec{x})\) for each j, and so

$$\begin{aligned} \begin{aligned} \sum _j\sum _{\varvec{x}\in Z^j_{1,1,1}} w(j,\varvec{x})&= \sum _j \sum _{\varvec{x}\in S_1} w(j,\varvec{x}) - \sum _j\sum _{\varvec{x}\in Y^j_{1,1}} w(j,\varvec{x})\\&= N(N-1)(N-2) - \sum _j\sum _{\varvec{x}\in Y^j_{1,1}} w(j,\varvec{x}), \end{aligned} \end{aligned}$$
(46)
Fig. 3: A comparison of \(P_\mathrm {succ}^{(1)}\), \(P_\mathrm {succ}^{(3)}\) and \(P_\mathrm {succ}^{(4)}\) for an \(N=10\)-outcome observable of the form given in this section, acting on a Hilbert space of arbitrary dimension \(d\ge 10\). For any N-ary observable of this form, and for \(\lambda \in [\frac{1}{N},1]\), the success probability of distinguishing the N states after three measurements is greater than when a single measurement is performed and, contrary to the binary case, the success probability after four measurements is greater than after three.

where \(N(N-1)(N-2)=\left| S_1 \right| \). Inserting Eqs. (44) to (46) into Eq. (43), we arrive at

$$\begin{aligned} P^{(3)}_\mathrm {succ}&=\frac{(1-\lambda )^3}{(N-1)^3} + \frac{3(1-\lambda )^3}{(N-1)^2} + \frac{(N-2)(1-\lambda )^3}{(N-1)^2} + \left( \frac{\lambda ^3}{N} -\frac{(1-\lambda )^3}{N(N-1)^3}\right) \sum _j w(j,\varvec{j})\nonumber \\&\quad + \left( \frac{\lambda ^2(1-\lambda )}{N(N-1)} - \frac{(1-\lambda )^3}{N(N-1)^3} \right) \sum _j \sum _{\varvec{x}\in X^j_2} w(j,\varvec{x})\nonumber \\&\quad +\left( \frac{\lambda (1-\lambda )^2}{N(N-1)^2} - \frac{(1-\lambda )^3}{N(N-1)^3}\right) \sum _j\left( \sum _{\varvec{x}\in Y^j_2} w(j,\varvec{x})+\sum _{\varvec{x}\in Y^j_{1,1}} w(j,\varvec{x})\right) . \end{aligned}$$
(47)

Since \(\lambda \ge 1/N\), it follows that \((1-\lambda )/(N-1)\le 1/N\le \lambda \), hence every factor in front of a summation in Eq. (47) is positive, which suggests that each kernel value in the summations should be equal to one. However, this would lead to certain outcome arrays being counted more than once: for a given array (i, j, j), say, one could clearly let \(w(i,(i,j,j))=1\) since \((i,j,j)\in Y^i_2\), or \(w(j,(i,j,j))=1\) since \((i,j,j)\in X^j_2\), but not both, because \(w(i,(i,j,j))+w(j,(i,j,j))\le 1\). Because the elements in \(X^j_2\) provide a larger contribution, and \(\cup _j X^j_2 = \cup _j Y^j_2\), we set \(w(j,\varvec{x})=1\) for \(\varvec{x}\in X^j_2\) and \(w(j,\varvec{x})=0\) for \(\varvec{x}\in Y^j_2\). Similarly, a given outcome array (i, j, k) is included three times since \((i,j,k)\in Y^i_{1,1}\cap Y^j_{1,1}\cap Y^k_{1,1}\). As such, the summation over the \(Y^j_{1,1}\) subsets needs to be divided by three (this may alternatively be seen as randomly assigning the outcome array (i, j, k) to \(\mathsf {B}^{(3)}(i)\), \(\mathsf {B}^{(3)}(j)\) or \(\mathsf {B}^{(3)}(k)\)).

Combining these results, Eq. (47) reduces to

$$\begin{aligned} \begin{aligned} P^{(3)}_\mathrm {succ} =&\frac{(1-\lambda )^3}{(N-1)^3} + \frac{3(1-\lambda )^3}{(N-1)^2} + \frac{(N-2)(1-\lambda )^3}{(N-1)^2} + \left( \frac{\lambda ^3}{N} -\frac{(1-\lambda )^3}{N(N-1)^3}\right) \left| \bigcup _j X^j_3 \right| \\&+ \left( \frac{\lambda ^2(1-\lambda )}{N(N-1)} - \frac{(1-\lambda )^3}{N(N-1)^3} \right) \left| \bigcup _j X^j_2 \right| +\frac{1}{3}\left( \frac{\lambda (1-\lambda )^2}{N(N-1)^2} - \frac{(1-\lambda )^3}{N(N-1)^3}\right) \left| \bigcup _j Y^j_{1,1} \right| \\ =&\frac{(1-\lambda )^3}{(N-1)^3} + \frac{3(1-\lambda )^3}{(N-1)^2} + \frac{(N-2)(1-\lambda )^3}{(N-1)^2} + \left( \lambda ^3 -\frac{(1-\lambda )^3}{(N-1)^3}\right) \\&+ 3\left( \lambda ^2(1-\lambda ) - \frac{(1-\lambda )^3}{(N-1)^2} \right) +(N-2)\left( \frac{\lambda (1-\lambda )^2}{N-1} - \frac{(1-\lambda )^3}{(N-1)^2}\right) \\ =&\lambda ^3 + 3 \lambda ^2(1-\lambda ) +(N-2)\frac{\lambda (1-\lambda )^2}{N-1} = \frac{1}{N-1}\lambda \big ((N-2)+(N+1)\lambda - N\lambda ^2\big ). \end{aligned} \end{aligned}$$
(48)

\(\square \)

For the values of \(\lambda \) considered, namely \(\frac{1}{N}< \lambda < 1\), the probability \(P_\mathrm {succ}^{(3)}\) is strictly larger than \(P_\mathrm {succ}^{(1)}\) and \(P_\mathrm {succ}^{(4)}\) strictly larger than \(P_\mathrm {succ}^{(3)}\), as shown for the case of \(N=10\) in Fig. 3. This is in contrast to the binary case, where they coincide.
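Equations (38) and (39) can also be cross-checked against a brute-force evaluation of the optimal relabelling, which assigns each outcome array to a state of maximal multiplicity (a numerical sanity check we add; the parameters are arbitrary):

```python
import itertools

def p_succ_brute(N, n, lam):
    """(1/N) * sum_x max_j tr[rho_j A^(n)(x)], with the traces given by Eq. (27)."""
    mu = (1 - lam) / (N - 1)
    total = 0.0
    for xs in itertools.product(range(N), repeat=n):
        m_max = max(xs.count(j) for j in range(N))
        total += lam**m_max * mu**(n - m_max)
    return total / N

def p3(N, lam):   # Eq. (38)
    return lam * ((N - 2) + (N + 1) * lam - N * lam**2) / (N - 1)

def p4(N, lam):   # Eq. (39)
    return lam * ((N - 2) * (N - 3) + 3 * (N**2 - 3) * lam
                  + (4 + 7 * N - 5 * N**2) * lam**2
                  + 2 * N * (N - 2) * lam**3) / (N - 1)**2

N, lam = 5, 0.4
print(p_succ_brute(N, 3, lam), p3(N, lam))   # agree
print(p_succ_brute(N, 4, lam), p4(N, lam))   # agree
```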

6 Conclusions

This paper has considered the task of minimum error state discrimination for a set of mutually orthogonal states, under the restriction that a single unsharp observable may be used as many times as desired. In the case of binary observables distinguishing two states we encountered a ‘rule of three’, whereby performing the measurement twice provides no advantage over a single instance, but a third measurement led to an increase for all unsharpness values \(\lambda \in (\tfrac{1}{2},1)\). As the number of repetitions increased, it was shown that the success probability would only improve when an odd number of steps were performed, which may be seen as an overcoming of the ambiguous results that are present in the even cases. Whilst this shows that there are benefits in performing repeated measurements, one must be cautious about the number of iterations performed, as an even number proves redundant compared to the odd number preceding it.

In the case of commutative N-valued observables with \(N>2\), the rule of three was still shown to hold, but the step-like increase in success probability found for binary observables was no longer present. Since there are more possible outcomes we should not be surprised at such a result, as ambiguity will not arise at the same points. However, as Proposition 4 shows, we do not suddenly see steps occurring every three iterations for ternary observables either. This suggests that we need not be as cautious about accidentally performing a redundant additional measurement as in the binary case, but, as is seen in Fig. 3, the increase between successive rounds for \(N>2\) is perhaps less dramatic.