Quantum State Discrimination via Repeated Measurements and the Rule of Three

The task of state discrimination for a set of mutually orthogonal pure states is trivial if one has access to the corresponding sharp (projection-valued) measurement, but what if we are restricted to an unsharp measurement? Given that any realistic measurement device will be subject to some noise, such a problem is worth considering. In this paper we consider minimum error state discrimination for mutually orthogonal states with a noisy measurement. We show that by considering repetitions of commutative L\"uders measurements on the same system we are able to increase the probability of successfully distinguishing states. In the case of binary L\"uders measurements we provide a full characterisation of the success probabilities for any number of repetitions. This leads us to identify a 'rule of three', where no change in probability is obtained from a second measurement but there is noticeable improvement after a third. We also provide partial results for $N$-valued commutative measurements where the rule of three remains, but the general pattern present in binary measurements is no longer satisfied.


I. INTRODUCTION
The task of discriminating states via 'one-shot' measurements is fundamental to a number of processes in many scenarios: classically, the solution of a calculation performed on a computer cannot be read, nor can a private encryption key be shared, unless we are able to correctly identify the resulting bits. This remains true within the quantum realm, where any quantum process is only ever as good as our ability to distinguish the possible states the system may be in. However, due to the nature of states within quantum mechanics, we are subject to difficulties not seen classically: whereas a single measurement is generally sufficient to fully distinguish between a set of possible pure states in a classical system, a collection of pure quantum states cannot be perfectly distinguished in such a setting unless the states are mutually orthogonal. As such, the aim is to maximise our chances of distinguishing the states via clever choices of measurement.
As is the case with a number of notions within quantum theory, there is no clear-cut 'best choice' for what one wishes to optimise over in a state discrimination task, and by extension no obvious choice of measurement to perform in any given circumstance. In the 1970s Helstrom [1], Holevo [2,3], and Yuen et al. [4] began work on minimum error state discrimination, which aims to minimise the average error in determining each possible state in a set via its corresponding measurement outcome. In the late 1980s Ivanovic [5] (followed by Dieks [6] and Peres [7]) considered so-called unambiguous state discrimination, where the goal is no longer to minimise error, but rather to maximise the chance of unambiguously distinguishing states where possible. More recently an additional figure of merit referred to as "maximum confidence" was introduced by Kosut et al. [8] and Croke et al. [9] (who coined the term), which refers to how confident we may be that we started with a particular state, given that we obtained its corresponding measurement outcome. Each of these figures of merit comes with shortcomings: in minimum error state discrimination we will still find some instances of incorrect measurement outcomes for a given state, whereas the latter two admit inconclusive measurement results from which no state can be inferred. As such, one generally chooses a particular figure of merit based on the preferred trade-off. For more in-depth summaries of state discrimination tasks and their applications we refer the reader to [10][11][12].
It is well known that for distinguishing a set of mutually orthogonal pure states (with regard to any figure of merit) the obvious choice is a sharp observable containing the states in question as projections. However, in this work we consider minimum error state discrimination in the case where one is unable to use such a sharp observable, and instead some unsharp variant is available. This construct is in contrast to the standard problem presented, where a set of non-orthogonal states are distinguished, but considering our scenario is physically justified: unsharp measurements can allow for the possibility of sequential or repeated measurements without complete loss of the initial state, and furthermore encapsulate the reality that any measurement will contain some inherent noise, be it the result of mechanical or human error. In fact, it is impossible to perform projective measurements on quantum systems using finite resources [13]. While this may seem to be a hindrance, we shall see that the aforementioned admission of repeated measurements works in our favour.
In [14] it was shown that in the limit of number of repetitions going to infinity, the information content of an unsharp binary observable tends to that of its spectral counterpart, but no consideration was given to intermediate steps. This is the point of interest for this paper, as it is not only a more realistic scenario, but allows us to see where the trade-off point lies where an increased number of measurements provides diminishing returns in terms of increased success probability. In a realistic scenario one would aim to perform a finite number of measurement rounds and in doing so reach a sufficiently high confidence about the initial state.
The use of repeated measurements in state discrimination tasks has previously been adopted in the context of unambiguous state discrimination [15][16][17], and for minimum error discrimination [18,19]. However, in the studied scenarios the observables measured differ over the course of the measurement process (with, for example, [18] relying on sequential binary measurements to approximate an $N$-valued observable as described in [20]). In another example [15] there are multiple agents aiming to gather the same information, solely based on their individually obtained measurement outcomes. In this paper we instead restrict ourselves to performing repetitions of the same $N$-valued observable on the system, thereby receiving a string of outcomes which we post-process to determine the most likely state among $N$ alternatives. Mathematically, this translates to constructing a joint observable whose effects we partition to form a new $N$-valued observable, which is then used to calculate the success probability.
We begin this paper with an overview of the necessary concepts in Section II; these contain the mathematical descriptions of finite outcome observables and repeated measurements, as well as the formulation of minimum error state discrimination and post-processing of observables. We motivate our main work in Section III, where we explicitly calculate the success probability for distinguishing two eigenvectors of the x-direction operator $\sigma_x$ by repeated measurements of the unsharp spin-x observable $X_t$. This example introduces us to the "rule of three" that will be prevalent in the cases considered later: the second measurement does not lead to an increase in the success probability, but the third does. The case of binary observables is considered in Section IV for an arbitrary number of repetitions, and we find a general pattern whereby the success probability only increases for odd numbers of repetitions. We extend the results of Section IV by providing some partial results for commutative $N$-valued observables ($N > 2$) in Section V, where the rule of three is found to still hold, though the general pattern for binary observables no longer does. Finally, we discuss our results in Section VI.

A. Quantum measurements
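In this section we recall the basic formalism of quantum measurements. (For a more detailed discussion see, for example, [21].) We shall generally consider an arbitrary separable Hilbert space $\mathcal{H}$ of (possibly infinite) dimension $d$ and its space $\mathcal{L}(\mathcal{H})$ of bounded linear operators. The operators we will primarily work with are the so-called effects on $\mathcal{H}$, i.e., operators $E \in \mathcal{L}(\mathcal{H})$ satisfying $0 \le E \le I$. Quantum observables are mathematically described by positive operator-valued measures (POVMs), and we will restrict ourselves to $N$-valued observables, where $N$ is a finite integer. An $N$-valued POVM $A$ is a map from the finite outcome set $\Omega_A = \{1, 2, \dots, N\}$ to the set of effects; specifically, $A : x \mapsto A(x)$ with $\sum_{x \in \Omega_A} A(x) = I$. The probability of obtaining the outcome $x$ when the system is in the state $\rho$ is $\mathrm{tr}[\rho A(x)]$.
To describe the state of the system after a measurement we use instruments. An instrument is a map $x \mapsto \mathcal{I}_x$ from a finite outcome set $\Omega$ to the set of linear maps on the trace class operators such that (i) $\mathcal{I}_x$ is completely positive for all $x \in \Omega$; (ii) $\mathcal{I}_x$ is trace non-increasing for all $x \in \Omega$; (iii) $\sum_x \mathcal{I}_x$ is trace preserving. An instrument $\mathcal{I}$ describes a measurement of the observable $A$ if $\mathrm{tr}[\mathcal{I}_x(\rho)] = \mathrm{tr}[\rho A(x)]$ for all outcomes $x$ and states $\rho$. An observable admits many such instruments, each corresponding to some particular way of measuring it. If the observable $A$ is measured in a way described by the instrument $\mathcal{I}$ on a system in state $\rho$ and outcome $x$ is recorded, then the unnormalised state of the system after the measurement will be given by $\mathcal{I}_x(\rho)$.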

B. Repeated quantum measurements
In our investigation, we restrict ourselves to situations where the same measurement apparatus is used repeatedly on the same system; see Figure 1. Suppose that $\mathcal{I}$ is the instrument describing the measurement of an observable $A$. Then, after $n$ repeated measurements we get an outcome array $(x_1, \dots, x_n) \in \Omega_A^n$ with probability
$$\mathrm{tr}\big[\rho A^{(n)}_{\mathcal{I}}(x_1, \dots, x_n)\big] = \mathrm{tr}\big[(\mathcal{I}_{x_n} \circ \cdots \circ \mathcal{I}_{x_1})(\rho)\big], \tag{1}$$
where $\mathcal{I}_{x_n} \circ \cdots \circ \mathcal{I}_{x_1}$ is the functional composition of the corresponding operations. This equation, assumed to be valid for all input states $\rho$, determines the observable $A^{(n)}_{\mathcal{I}}$, which we shall refer to as an $n$-round observable. The mathematical structure of $A^{(n)}_{\mathcal{I}}$ depends crucially on the specific form of $\mathcal{I}$. For instance, if $\mathcal{I}$ is of the measure-and-prepare form, then no subsequent measurement can give anything more than was already found in the first measurement. Further examples of repeated measurements that do not provide new information after the first, or some finite number of, repetitions are given in [14].
In the current work we limit ourselves to Lüders instruments, i.e., we choose $\mathcal{I}_x(\rho) = \sqrt{A(x)}\,\rho\,\sqrt{A(x)}$. For ease of notation we denote the corresponding $n$-round observable $A^{(n)}_{\mathcal{I}}$ simply by $A^{(n)}$ (since no other type of instrument will be considered in what follows). Hence,
$$A^{(n)}(x_1, \dots, x_n) = \sqrt{A(x_1)} \cdots \sqrt{A(x_{n-1})}\, A(x_n)\, \sqrt{A(x_{n-1})} \cdots \sqrt{A(x_1)}. \tag{2}$$
If $A$ is commutative (i.e., $[A(x), A(y)] = 0$ for all $x, y$), then the previous expression reduces to
$$A^{(n)}(x_1, \dots, x_n) = A(x_1) A(x_2) \cdots A(x_n). \tag{3}$$
We observe that in this case the probability of getting a particular outcome array $(x_1, \dots, x_n)$ in a state $\rho$ does not depend on the order of the outcomes.
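The construction of the $n$-round observable can be checked numerically. The following is a minimal sketch, assuming the standard Lüders form with operator square roots and using an unsharp spin-x qubit observable as a test case; the helper names `psd_sqrt`, `unsharp_spin_x` and `n_round_effect` are illustrative and not taken from the paper.

```python
import itertools
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def unsharp_spin_x(t):
    """Binary commutative observable with effects (I +/- t*sigma_x)/2."""
    sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])
    return [(np.eye(2) + t * sigma_x) / 2, (np.eye(2) - t * sigma_x) / 2]

def n_round_effect(povm, outcomes):
    """Effect of the n-round Lueders observable for the array `outcomes`.

    Starts from A(x_n) and sandwiches it with sqrt(A(x_{n-1})), ...,
    sqrt(A(x_1)), which is the Heisenberg-picture composition.
    """
    effect = povm[outcomes[-1]]
    for x in reversed(outcomes[:-1]):
        root = psd_sqrt(povm[x])
        effect = root @ effect @ root
    return effect

povm = unsharp_spin_x(0.6)
# For a commutative observable the effect depends only on outcome counts.
same = np.allclose(n_round_effect(povm, (0, 1, 0)),
                   n_round_effect(povm, (0, 0, 1)))
print(same)
```

For a commutative observable the two arrays above should give the same effect, and summing the effects over all arrays of a fixed length should return the identity.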

C. State discrimination in n rounds
We are considering a particular quantum information task, namely, the discrimination of states. We assume that the initial observable $A$ with outcome set $\Omega_A = \{1, \dots, N\}$ discriminates $N$ states $\{\rho_1, \dots, \rho_N\}$ with some error that is larger than in the optimal discrimination of these states. The observable $A$ can be, for instance, a noisy version of the observable that optimally discriminates the $N$ states. The success probability $P^{(1)}_{\mathrm{succ}}$ for successfully distinguishing the states with $A$ is given by
$$P^{(1)}_{\mathrm{succ}} = \frac{1}{N} \sum_{j=1}^{N} \mathrm{tr}[\rho_j A(j)],$$
where we have assumed a uniform a priori distribution of the states.
By repeating the measurement we hope to increase the probability of guessing the correct state. After $n$ Lüders measurements of $A$, resulting in the observable $A^{(n)}$ with outcome set $\Omega_A^n$, we possess statistics for $N^n$ possible strings of measurement outcomes. In order to assess the ability of $A^{(n)}$ to distinguish the original $N$ states, we need to post-process $A^{(n)}$ such that we are left with an $N$-valued observable, denoted by $B^{(n)}$, with outcome set $\Omega_A$. This means that for each outcome array $(x_1, \dots, x_n)$, we need to decide the most likely state $\rho_j$ and hence relabel this outcome array as $j$.
Mathematically, the post-processing is performed by a Markov kernel $w : \Omega_A \times \Omega_A^n \to [0, 1]$ satisfying $\sum_{j} w(j, \mathbf{x}) = 1$ for all $\mathbf{x} \in \Omega_A^n$. If each $\mathbf{x}$ determines a unique $j$ (i.e., $w(j, \mathbf{x}) \in \{0, 1\}$ for all $j \in \Omega_A$ and $\mathbf{x} \in \Omega_A^n$), then we say that $w$ is deterministic. The post-processed observable is given by
$$B^{(n)}(j) = \sum_{\mathbf{x} \in \Omega_A^n} w(j, \mathbf{x})\, A^{(n)}(\mathbf{x}),$$
from which we arrive at the $n$-round success probability $P^{(n)}_{\mathrm{succ}}$:
$$P^{(n)}_{\mathrm{succ}} = \frac{1}{N} \sum_{j=1}^{N} \mathrm{tr}\big[\rho_j B^{(n)}(j)\big].$$
This expression clearly depends on the chosen post-processing. As we want to maximise $P^{(n)}_{\mathrm{succ}}$, we can restrict to deterministic Markov kernels; other Markov kernels are their convex mixtures [22]. The choice of the optimal post-processing will be studied in Sections IV and V.
In performing the previously described method of state discrimination via repeated measurements, we may come across outcome arrays that do not suggest to us one particular state over another. Let $\mathcal{S} \subseteq \mathcal{S}(\mathcal{H})$ be a subset of states. We say that a measurement outcome array $\mathbf{x} \in \Omega_A^n$ is ambiguous with respect to $\mathcal{S}$ if $\mathrm{tr}[\rho A^{(n)}(\mathbf{x})]$ takes the same value for all $\rho \in \mathcal{S}$. Two illustrative examples of this notion in the case of $n = 1$ are the following. Firstly, if $\mathcal{S}$ is the whole state space $\mathcal{S}(\mathcal{H})$, then a trivial observable $T : x \mapsto p_x I$ only possesses outcomes that are ambiguous with respect to $\mathcal{S}$. Secondly, when $\mathcal{S}$ is equal to an orthonormal basis $\{\varphi_i\}$, then any sharp observable in a basis mutually unbiased to $\{\varphi_i\}$ will only possess outcomes that are ambiguous with respect to the states.
We remark that the previous framework for state discrimination via repeated measurements would work similarly had we chosen an instrument $\mathcal{I}$ other than the Lüders instrument. The forms of $B^{(n)}$ and $P^{(n)}_{\mathrm{succ}}$ obviously depend on the chosen instrument and, as said before, we do not mark the instrument simply because we stick to Lüders instruments.
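The post-processing step can be illustrated with a short numerical sketch. It assumes a commutative observable, so that each $n$-round effect is a plain product of single-round effects, and it relabels every outcome array to a state with the largest likelihood, which is one optimal deterministic kernel under a uniform prior. The function name `post_processed_success` and the diagonal test observable are illustrative choices, not the paper's notation.

```python
import itertools
from functools import reduce
import numpy as np

def post_processed_success(states, povm, n):
    """Return the relabelled observable B^(n) and its success probability.

    Assumes a commutative POVM, so A^(n)(x) is the product of the A(x_k).
    Each array x is assigned to the label j maximising tr[rho_j A^(n)(x)];
    ties (ambiguous arrays) go to the first maximiser.
    """
    num = len(states)
    dim = states[0].shape[0]
    B = [np.zeros((dim, dim)) for _ in range(num)]
    for x in itertools.product(range(num), repeat=n):
        effect = reduce(np.matmul, [povm[k] for k in x])
        likelihoods = [np.trace(rho @ effect).real for rho in states]
        j = int(np.argmax(likelihoods))
        B[j] = B[j] + effect
    p_succ = sum(np.trace(rho @ Bj).real for rho, Bj in zip(states, B)) / num
    return B, p_succ

# Two orthogonal pure states and a noisy binary observable (lambda = 0.7).
lam = 0.7
states = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
povm = [np.diag([lam, 1 - lam]), np.diag([1 - lam, lam])]
for n in (1, 2, 3):
    print(n, post_processed_success(states, povm, n)[1])
```

The effects of the relabelled observable should again sum to the identity, since a Markov kernel only redistributes the effects of $A^{(n)}$ among the $N$ labels.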

D. Noisy measurement in state discrimination
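The general framework discussed in earlier subsections is applicable for any $A$ and $\mathcal{I}$. In this section we consider a more specific setting that will be relevant for what follows. We begin with a collection of $N$ states $\{\rho_1, \dots, \rho_N\}$ that are perfectly distinguishable, i.e., there exists an $N$-outcome observable $D$ such that $\mathrm{tr}[\rho_i D(j)] = \delta_{ij}$. This is the case if and only if they are orthogonal pure states or, more generally, mixed states with orthogonal supports. (Note that if $N < d$, then $D$ is not unique.) However, we assume that such $D$ is not available, and instead we use a noisy observable $A$ to distinguish the states. We assume that $A$ is still reasonably good in distinguishing the states $\{\rho_1, \dots, \rho_N\}$, which we take to mean that $\mathrm{tr}[\rho_j A(j)] \ge \mathrm{tr}[\rho A(j)]$ and $\mathrm{tr}[\rho_i A(j)] \le \mathrm{tr}[\rho A(j)]$ if $i \ne j$, for all states $\rho$. We further make a simplifying uniformity assumption that $\mathrm{tr}[\rho_j A(j)]$ is the same for all $j$ and similarly $\mathrm{tr}[\rho_i A(j)]$ is the same for all $i \ne j$.
Proposition 1. Let $A$ be an $N$-valued observable such that, for all $j$, all $i \ne j$ and all states $\rho$,
$$\mathrm{tr}[\rho_j A(j)] = \lambda \ge \mathrm{tr}[\rho A(j)] \tag{7}$$
and
$$\mathrm{tr}[\rho_i A(j)] = \mu \le \mathrm{tr}[\rho A(j)]. \tag{8}$$
Then
$$A(j)\rho_j = \lambda \rho_j, \qquad A(j)\rho_i = \frac{1 - \lambda}{N - 1}\, \rho_i \quad (i \ne j), \tag{9}$$
and $\lambda \ge \frac{1}{N}$.
Proof. It follows from (7) and (8) that $\lambda$ and $\mu$ are the maximal and minimal eigenvalues of $A(j)$, respectively. Then, as $\lambda$ is the maximal eigenvalue, a unit vector $\psi \in \mathcal{H}$ satisfies $\langle \psi | A(j) \psi \rangle = \lambda$ only if $A(j)\psi = \lambda\psi$. By using the spectral decomposition of $\rho_j$ we then conclude that $A(j)\rho_j = \lambda\rho_j$. Analogous reasoning shows that $A(j)\rho_i = \mu\rho_i$ for $i \ne j$. Finally,
$$1 = \sum_{i=1}^{N} \mathrm{tr}[\rho_j A(i)] = \lambda + (N - 1)\mu,$$
so that $\mu = (1 - \lambda)/(N - 1)$; since $\lambda \ge \mu$ this also gives $\lambda \ge 1/N$, from which (9) follows.
We take the setting of Prop. 1 as our starting point in the following investigations. We remark that (9) does not determine $A$ uniquely unless $N = d$. For example, if $D$ is any observable that perfectly discriminates the states, then the observable $A$ given by
$$A(j) = \lambda D(j) + \mu \big(I - D(j)\big),$$
where $\mu = \frac{1 - \lambda}{N - 1}$, satisfies condition (9).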
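As a sanity check of this setting, the sketch below builds a noisy observable from a sharp one and tests the eigenvalue relations of condition (9). It assumes the mixture form $A(j) = \lambda D(j) + \mu(I - D(j))$ with $\mu = (1-\lambda)/(N-1)$ and takes $D$ to be the projections onto the computational basis with $d = N$; both choices, and the name `noisy_observable`, are assumptions made for illustration.

```python
import numpy as np

def noisy_observable(lam, num_outcomes):
    """Effects A(j) = lam*D(j) + mu*(I - D(j)) with D(j) = |j><j| (assumed form)."""
    mu = (1.0 - lam) / (num_outcomes - 1)
    identity = np.eye(num_outcomes)
    effects = []
    for j in range(num_outcomes):
        D_j = np.outer(identity[j], identity[j])
        effects.append(lam * D_j + mu * (identity - D_j))
    return effects

N, lam = 4, 0.55
mu = (1.0 - lam) / (N - 1)
A = noisy_observable(lam, N)
rho = [np.outer(np.eye(N)[j], np.eye(N)[j]) for j in range(N)]

# Normalisation and the two eigenvalue relations in condition (9).
print(np.allclose(sum(A), np.eye(N)))
print(all(np.allclose(A[j] @ rho[j], lam * rho[j]) for j in range(N)))
print(all(np.allclose(A[j] @ rho[i], mu * rho[i])
          for j in range(N) for i in range(N) if i != j))
```

With $d > N$ the sharp observable $D$ is not unique, so a different choice of $D$ would give a different $A$ that still satisfies (9).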

III. MOTIVATING QUBIT EXAMPLE
In order to motivate our main result, we first provide an explicit example. Consider the qubit system $\mathbb{C}^2$ and suppose that we wish to distinguish between the two eigenstates $P_\pm = |\pm\rangle\langle\pm|$ of the $\sigma_x$ operator. We assume that we must attempt to do so via an unsharp unbiased spin-x measurement that is parametrised by $t \in [0, 1]$; i.e., our observable, denoted $X_t$, is given by
$$X_t(\pm) = \tfrac{1}{2}(I \pm t\sigma_x) = \lambda_\pm P_+ + \lambda_\mp P_-.$$
The eigenvalues of these effects are $\lambda_\pm = (1 \pm t)/2$, where $\lambda_+ \ge \lambda_-$ and $\lambda_+ + \lambda_- = 1$. The success probability of distinguishing between $P_+$ and $P_-$ with $X_t$ is
$$P^{(1)}_{\mathrm{succ}} = \tfrac{1}{2}\big(\mathrm{tr}[P_+ X_t(+)] + \mathrm{tr}[P_- X_t(-)]\big) = \lambda_+.$$
Performing the Lüders measurement of the observable a second time leads to the sequential observable $X^{(2)}_t$ with effects
$$X^{(2)}_t(+,+) = \lambda_+^2 P_+ + \lambda_-^2 P_-, \quad X^{(2)}_t(-,-) = \lambda_-^2 P_+ + \lambda_+^2 P_-, \quad X^{(2)}_t(+,-) = X^{(2)}_t(-,+) = \lambda_+\lambda_- I. \tag{13}$$
The first two effects can be seen to be confirmatory in nature, since the first and second measurement outcomes are in agreement, whereas the last two are ambiguous as they result in the same values for any state. In order to assess how capable $X^{(2)}_t$ is of distinguishing $P_\pm$ we must post-process the observable to create a new binary observable. Taking the most general post-processing possible, we let
$$B^{(2)}(+) = \sum_{i,j \in \{+,-\}} w^+_{ij}\, X^{(2)}_t(i, j),$$
with weights $0 \le w^+_{ij} \le 1$, denote the first effect of this new observable, which we consider to be the "+" outcome. The second effect is then $B^{(2)}(-) = I - B^{(2)}(+)$. The success probability at this stage, denoted by $P^{(2)}_{\mathrm{succ}}$, is given by
$$P^{(2)}_{\mathrm{succ}} = \tfrac{1}{2}\big(\mathrm{tr}[P_+ B^{(2)}(+)] + \mathrm{tr}[P_- B^{(2)}(-)]\big) = \tfrac{1}{2}\big(1 + \mathrm{tr}[(P_+ - P_-) B^{(2)}(+)]\big),$$
where in the last equality we have relied on $I = P_+ + P_-$. Using the forms in Equation (13) we find that the success probability is
$$P^{(2)}_{\mathrm{succ}} = \tfrac{1}{2}\big(1 + (w^+_{++} - w^+_{--})(\lambda_+^2 - \lambda_-^2)\big).$$
Note that, as expected, neither $w^+_{+-}$ nor $w^+_{-+}$ contributes to the success probability, as ambiguous results should not be able to help us draw a conclusion about which state was measured. Since the eigenvalues $\lambda_\pm$ sum to one, we can rewrite $\lambda_+^2 - \lambda_-^2 = 2\lambda_+ - 1$. Furthermore, since the weights are non-negative, we have $w^+_{++} - w^+_{--} \le w^+_{++} \le 1$, and so the optimal success probability arises when $w^+_{++} = 1$ and $w^+_{--} = 0$, which simply leads to $P^{(2)}_{\mathrm{succ}} = \lambda_+ = P^{(1)}_{\mathrm{succ}}$.
In other words, performing a second measurement of $X_t$ does not improve our likelihood of distinguishing between $P_+$ and $P_-$.
However, if we repeat the measurement another time then we will notice an improvement. The observable $X^{(3)}_t$ is given by
$$X^{(3)}_t(\mathbf{x}) = \lambda_+^{p}\lambda_-^{3-p} P_+ + \lambda_-^{p}\lambda_+^{3-p} P_-,$$
where $p \in \{0, 1, 2, 3\}$ is the number of "+" outcomes in the array $\mathbf{x}$. With a similar calculation to before we arrive at the success probability $P^{(3)}_{\mathrm{succ}} = 3\lambda_+^2 - 2\lambda_+^3$, which is strictly greater than $P^{(1)}_{\mathrm{succ}}$ for $0 < t < 1$. We have therefore seen that whilst performing the unsharp measurement twice will not provide us with an advantage in distinguishing the states, we do gain one by performing it a third time. We should stress, however, that this does not mean that the observable $X^{(2)}_t$ is in general equivalent to $X_t$. Indeed, unless $t = 1$ and the observable is sharp, $X^{(2)}_t$ is strictly higher in the post-processing ordering [23] than $X_t$, since any effect of $X_t$ can be reached from post-processings of $X^{(2)}_t$ but not vice versa. It is only for the considered discrimination task that $X_t$ and $X^{(2)}_t$ perform equally well.
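The role of the weights in the two-round case can be made concrete with a small numerical sketch. It works in the $\sigma_x$ eigenbasis, where $P_\pm$ and $X_t(\pm)$ are diagonal, and evaluates the two-round success probability for an arbitrary choice of weights; the function name `two_round_success` and the dictionary encoding of the weights are illustrative assumptions.

```python
import numpy as np

def two_round_success(t, w):
    """Two-round success probability for weights w[(a, b)] in [0, 1].

    w[(a, b)] is the probability of relabelling the array (a, b) as '+',
    with a, b in {+1, -1}. Everything is written in the sigma_x eigenbasis.
    """
    lam_p, lam_m = (1 + t) / 2, (1 - t) / 2
    P = {+1: np.diag([1.0, 0.0]), -1: np.diag([0.0, 1.0])}
    X = {+1: np.diag([lam_p, lam_m]), -1: np.diag([lam_m, lam_p])}
    B_plus = sum(w[(a, b)] * X[a] @ X[b] for a in (+1, -1) for b in (+1, -1))
    B_minus = np.eye(2) - B_plus
    return 0.5 * (np.trace(P[+1] @ B_plus) + np.trace(P[-1] @ B_minus))

t = 0.4
best = {(+1, +1): 1.0, (+1, -1): 0.0, (-1, +1): 0.0, (-1, -1): 0.0}
other = {(+1, +1): 1.0, (+1, -1): 0.3, (-1, +1): 0.9, (-1, -1): 0.0}
print(two_round_success(t, best), two_round_success(t, other), (1 + t) / 2)
```

Changing the weights of the two ambiguous arrays should leave the value unchanged, while lowering the weight of $(+,+)$ or raising that of $(-,-)$ should only reduce it.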

IV. BINARY OBSERVABLES
Consider a separable (not necessarily finite-dimensional) Hilbert space $\mathcal{H}$ and two perfectly distinguishable states $\rho_+$ and $\rho_-$. Let $A$ be a binary observable satisfying (9) for these states, i.e.,
$$A(+)\rho_+ = \lambda\rho_+, \qquad A(+)\rho_- = (1 - \lambda)\rho_-,$$
for some $\frac{1}{2} \le \lambda \le 1$. To simplify notation we shall adopt the convention $A(+) = A$ (and hence $A(-) = I - A$). The success probability $P^{(1)}_{\mathrm{succ}}$ of discriminating the two states $\rho_+$ and $\rho_-$ via the observable $A$ is given by
$$P^{(1)}_{\mathrm{succ}} = \tfrac{1}{2}\big(\mathrm{tr}[\rho_+ A] + \mathrm{tr}[\rho_-(I - A)]\big) = \lambda.$$
Since the effects $A$ and $I - A$ necessarily commute, the $n$-round observables $A^{(n)}$ are of the form given in Equation (3). For instance, the 2-round observable $A^{(2)}$ has effects
$$A^{(2)}(+,+) = A^2, \qquad A^{(2)}(+,-) = A^{(2)}(-,+) = A(I - A), \qquad A^{(2)}(-,-) = (I - A)^2,$$
and so forth. Note that for the observables $A^{(n)}$ the ordering of the outcomes is not reflected in the form of the effects $A^{(n)}(\mathbf{x})$; instead, the only relevant fact is the total number of "+" or "-" outcomes in $\mathbf{x}$. We can hence divide post-processing into two steps: first we group all the arrays with the same number of "+" outcomes, after which we study how these effects should be relabelled to form the final observable $B^{(n)}$.
Letting $p$ denote the number of "+" outcomes in a given $n$-length measurement array, the first step in the post-processing leads to
$$\bar{A}^{(n)}(p) = \sum_{\mathbf{x} \in I_{n,p}} A^{(n)}(\mathbf{x}) = \binom{n}{p} A^{p} (I - A)^{n-p},$$
where $p \in \{0, 1, \dots, n\}$ and $I_{n,p}$ is the set of all $n$-arrays containing exactly $p$ "+" outcomes.
In the second step we create the final binary observable $B^{(n)}$, where
$$B^{(n)}(+) = \sum_{p=0}^{n} w_p\, \bar{A}^{(n)}(p), \qquad B^{(n)}(-) = \sum_{p=0}^{n} (1 - w_p)\, \bar{A}^{(n)}(p),$$
and $w_p \in \{0, 1\}$ are weights determining if the outcome arrays containing $p$ "+" outcomes are relabelled as "+" or "-". One might expect a 'majority rule', whereby arrays with more "+" than "-" will be relabelled as "+". In the following we see that this is, indeed, the case and we further analyse the success probability. The $n$-round success probability $P^{(n)}_{\mathrm{succ}}$ is
$$P^{(n)}_{\mathrm{succ}} = \tfrac{1}{2}\Big(1 + \sum_{p=0}^{n} w_p \binom{n}{p} \big[\lambda^{p}(1 - \lambda)^{n-p} - \lambda^{n-p}(1 - \lambda)^{p}\big]\Big),$$
and so in wanting to maximise the success probability we must decide the appropriate weights $w_p$. The suitable solution for this depends on $n$, and we will therefore consider the cases of odd and even $n$ separately. If we have performed an odd number of repetitions, then the sum contains an even number of terms, and so can be neatly split between $0 \le p \le \frac{n-1}{2}$ and $\frac{n+1}{2} \le p \le n$. Since $\lambda \ge 1/2$, we can see that the value $\lambda^{p}(1 - \lambda)^{n-p} - \lambda^{n-p}(1 - \lambda)^{p}$ is non-positive for any value of $p$ belonging to the first half of the split, from which we conclude that its corresponding weight ought to be 0. This means that the corresponding outcome arrays are interpreted as "-". At the same time, for the second half of the split the quantity $\lambda^{p}(1 - \lambda)^{n-p} - \lambda^{n-p}(1 - \lambda)^{p}$ is non-negative and so the weight ought to be 1. This means that the corresponding outcome arrays are interpreted as "+". Combining these pieces of information we can conclude that for an odd number $n$ of repetitions the maximum success probability is
$$P^{(n)}_{\mathrm{succ}} = \tfrac{1}{2}\Big(1 + \sum_{p=\frac{n+1}{2}}^{n} \binom{n}{p} \big[\lambda^{p}(1 - \lambda)^{n-p} - \lambda^{n-p}(1 - \lambda)^{p}\big]\Big).$$
If we perform an even number of repetitions, then the summation for $P^{(n)}_{\mathrm{succ}}$ contains an odd number of terms that can be split into three groups: $0 \le p \le \frac{n}{2} - 1$, $\frac{n}{2} + 1 \le p \le n$ and $p = \frac{n}{2}$. By the same logic as in the odd case, $w_p = 0$ for $0 \le p \le \frac{n}{2} - 1$ and $w_p = 1$ for $\frac{n}{2} + 1 \le p \le n$. For $p = \frac{n}{2}$ we observe that $\mathrm{tr}[\rho_+ \bar{A}^{(n)}(p)] = \mathrm{tr}[\rho_- \bar{A}^{(n)}(p)]$, which means that the corresponding measurement outcome arrays are ambiguous with respect to $\{\rho_+, \rho_-\}$.
We are therefore free to use any weighting w n/2 as it will not change the success probability, so we set w n/2 = 0. Hence, we conclude that the maximum success probability for an even n is Remarkably, according to the following theorem, an even number of repetitions provides no further improvement in success over the odd number that proceeds it: Theorem 1. Let n be an odd integer. The success probability of distinguishing ρ + and ρ − after n repeated measurements of A is A plot of this success probability for several values of n is given in Figure 2 in terms of its overall form ( Figure 2a) and for particular values of λ ( Figure 2b). As we increase n we need smaller values of λ (corresponding to noisier observables) in order to near complete success in distinguishing the states. The proof of this result requires us to derive the success probability for both the case of n and n + 1 with odd n and show that they coincide. In order to simplify this we make use of the following lemma: Lemma 1. For odd n and any n+1 2 ≤ i ≤ n, Proof. This is a direct consequence of the sum and found by collecting common factors: Proof of Theorem 1. We first consider the case for odd n. We have already argued that the observable B (n) that optimises the success probability must be of the form Since the states ρ ± satisfy tr[Aρ + ] = tr[(I − A)ρ − ] = λ, we see that tr ρ + B (n) (+) = tr ρ − B (n) (−) , and so the success probability reduces to By making use of the binomial expansion (1 − λ) n−i = n−i j=0 n−i j (−λ) j , we can rewrite the success probability as If we let = i + j, noting that = n+1 2 , . . . , n, and j = − i = 0, . . . , − n+1 2 , then we arrive at which, by Lemma 1, is our intended result for the odd case.
For the even case we start by recalling that the success probability does not depend on which effect contains the operators A as they do not contribute to the final success probability, and so we can choose This means that the complement effect is of the form from which we see that the success probability is Again making use of the binomial expansion of (1 − λ) n+1 2 we see that the second term in Equation (24) (omitting the factor of 1 2 ) can be written as which, when combined with the corresponding expansion for the first term leads to the following form of the success probability: In order to reach P (n) succ we need to remove terms of the order λ n+1 , as well as binomial coefficients involving n + 1.
In order to resolve the first of these we note that for odd n n+1 j=0 n + 1 j From this we see that the λ n+1 terms cancel. For the binomial coefficients we begin by noting that for a given integer i ≤ n, n+1 i can be rewritten: and in particular n+1 . Combining these we can rewrite Equation (25) as where we have again used Lemma 1 between the second and third line, as well as between the third and fourth.
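The statement that an even number of rounds gives no improvement over the preceding odd number can be checked numerically from the majority-rule sum. The sketch below assumes the optimal weights argued for in the text ($w_p = 1$ exactly when "+" outcomes form a strict majority) and evaluates the resulting success probability directly; the function name `binary_success` is illustrative, and the check is a numerical sanity test rather than a replacement for the proof.

```python
from math import comb

def binary_success(lam, n):
    """Success probability after n rounds under majority-rule relabelling.

    Sums over p = number of '+' outcomes with p > n/2; for even n the tied
    arrays (p = n/2) are ambiguous and contribute nothing to the sum.
    """
    total = 0.0
    for p in range(n // 2 + 1, n + 1):
        total += comb(n, p) * (lam**p * (1 - lam)**(n - p)
                               - lam**(n - p) * (1 - lam)**p)
    return 0.5 * (1.0 + total)

lam = 0.8
for n in range(1, 8):
    print(n, round(binary_success(lam, n), 6))
```

In the printed table the values for $n = 1, 2$, for $n = 3, 4$ and for $n = 5, 6$ should coincide pairwise, with the $n = 3$ value equal to $3\lambda^2 - 2\lambda^3$.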

V. HIGHER-OUTCOME OBSERVABLES
In this section $\{\rho_i\}_{i=1}^{N}$ is a set of mutually orthogonal pure states and $A$ is a commutative $N$-valued observable satisfying the conditions (9). We further assume that $1/N < \lambda < 1$ since the boundary values are not interesting. We recall that $A^{(n)}$ denotes the $n$-round observable of the form in Equation (3), i.e.,
$$A^{(n)}(x_1, \dots, x_n) = A(x_1) A(x_2) \cdots A(x_n).$$
We start by characterising the ambiguous measurement arrays for the given task. For any $\mathbf{x} \in \Omega_A^n$ and $j \in \Omega_A$, we denote by $m(\mathbf{x}, j)$ the multiplicity of $j$ in $\mathbf{x}$, i.e., the number of occurrences of $j$ in $\mathbf{x}$. For instance, $m((1, 2, 1), 1) = 2$, $m((1, 2, 1), 2) = 1$ and $m((1, 2, 1), 3) = 0$.
Let $\mathbf{x} \in \Omega_A^n$ and let $i_1, \dots, i_k \in \Omega_A$ be distinct. The following are equivalent: (i) $\mathbf{x}$ is ambiguous with respect to $\{\rho_{i_1}, \dots, \rho_{i_k}\}$; (ii) $m(\mathbf{x}, i_1) = m(\mathbf{x}, i_2) = \cdots = m(\mathbf{x}, i_k)$.
Proof. Fix an outcome array $\mathbf{x}$. From Equations (3) and (9), we can see that for any state $\rho_j$ and result array $\mathbf{x}$,
$$\mathrm{tr}\big[\rho_j A^{(n)}(\mathbf{x})\big] = \lambda^{m_j} \Big(\frac{1 - \lambda}{N - 1}\Big)^{n - m_j}, \tag{27}$$
where we have denoted $m_j := m(\mathbf{x}, j)$.
Let us assume that (i) holds. For ease of notation, and without loss of generality, let us assume that $\mathbf{x}$ is ambiguous with regard to the first $k$ states, i.e., $\mathrm{tr}[\rho_j A^{(n)}(\mathbf{x})]$ takes the same value for all $j \in \{1, \dots, k\}$. By Equation (27) this means that
$$\lambda^{m_i} \Big(\frac{1 - \lambda}{N - 1}\Big)^{n - m_i} = \lambda^{m_j} \Big(\frac{1 - \lambda}{N - 1}\Big)^{n - m_j} \tag{28}$$
for $i, j \in \{1, \dots, k\}$. We make the counter-assumption that (ii) does not hold, i.e., $m_i \ne m_j$ for some distinct $i, j \in \{1, \dots, k\}$. Without loss of generality, we can assume that $m_i > m_j$, and hence $n - m_j > n - m_i$. We can therefore rearrange (28) to arrive at
$$\lambda^{m_i - m_j} = \Big(\frac{1 - \lambda}{N - 1}\Big)^{m_i - m_j}.$$
Since $m_i - m_j > 0$, we can see that $\lambda = (1 - \lambda)/(N - 1)$, and so $\lambda = 1/N$. However, this is in contradiction to the initial assumption $\lambda > 1/N$ and so $m_i = m_j$ for all $i, j \in \{1, \dots, k\}$.
To prove that (ii) implies (i), it is sufficient to see from Equation (27) that any set of $k$ elements of $\Omega_A$ with the same multiplicity in a given outcome array $\mathbf{x}$ will produce the same probabilities for their respective states.
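The multiplicity criterion lends itself to a direct check. The following sketch, under the assumption that the likelihood of an array depends on a state only through the multiplicity of its label (with off-diagonal value $\mu = (1-\lambda)/(N-1)$), compares the multiplicity test with a direct comparison of likelihoods; the function names are illustrative.

```python
from collections import Counter
from math import isclose

def likelihood(x, j, lam, num_outcomes):
    """tr[rho_j A^(n)(x)] expressed through the multiplicity of j in x."""
    mu = (1.0 - lam) / (num_outcomes - 1)
    m = Counter(x)[j]
    return lam**m * mu**(len(x) - m)

def ambiguous_by_multiplicity(x, labels):
    """Condition (ii): all listed labels occur equally often in x."""
    counts = Counter(x)
    return len({counts[j] for j in labels}) == 1

def ambiguous_by_likelihood(x, labels, lam, num_outcomes):
    """Condition (i): all listed states give the same probability for x."""
    values = [likelihood(x, j, lam, num_outcomes) for j in labels]
    return all(isclose(v, values[0], rel_tol=1e-12) for v in values)

N, lam = 3, 0.5
for x, labels in [((1, 2), (1, 2)), ((1, 2), (1, 2, 3)), ((1, 2, 3), (1, 2, 3))]:
    print(x, labels, ambiguous_by_multiplicity(x, labels),
          ambiguous_by_likelihood(x, labels, lam, N))
```

For $1/N < \lambda < 1$ the two tests should agree on every array, which is the content of the equivalence.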
The previous result gives another confirmation of our earlier proof of the 'rule of three' in the case N = 2. The situation is different for N > 2, as in that case an outcome array (i 1 , i 2 ) is not ambiguous with respect to the full set {ρ i } N i=1 . Hence, one might expect to gain some partial information from having a given outcome in a second measurement; that is, after obtaining (1,2) outcome array one may assume that the measured state is either ρ 1 or ρ 2 but no other. However, as we will see, this partial information does not improve our likelihood of success in discrimination, and it is only by performing the third measurement that such results become more informative. The 'rule of three' therefore also applies in this case. Proof. For a given effect A (2) (i, j) and state ρ k we obtain one of three possible outcomes: We post-process A (2) via the Markov kernel w : (k, (i, j)) → w k ij , i, j, k = 1, . . . , N , to form the N -ary observable B (2) with effects B (2) (k) = i,j w k ij A (2) (i, j). The success probability P (2) succ takes the form The final term in Equation (31) can be decomposed as follows: where the last equality is a consequence of the normalisation of the kernel: k i w k ii = i k w k ii = N . Next we elaborate on the middle term in Equation (31). For a fixed k we have i =k and therefore Making use of Equations (32) and (34) we can rearrange Equation (31): For the range of λ considered, namely 1 Hence, in order to maximise P (2) succ we set the weights w k kk = 1 for all k and w k ij = 0 whenever all indices i, j, k are different. These choices lead to a valid Markov kernel if we further set w i jj = 0, w i ij = 1 and w j ij = 0 for all i = j. In doing so Equation (35) reduces to We proceed to higher rounds of repetitions.
The set X j 1 has the following decomposition where Y j 2 = {x ∈ X j 1 | m(x, k) = 2, k = j} and Y j 1,1 = {x ∈ X j 1 | m(x, k) = 1 and m(x, ) = 1, k, = j}. The subsets Y j 2 and Y j 1,1 have orders 3(N − 1) and 3(N − 1)(N − 2), respectively, for each j. In a similar fashion, we can decompose X j 0 in the following way: where Z j 3 = {x ∈ X j 0 | m(x, k) = 3, k = j}, Z j 2,1 = {x ∈ X j 0 | m(x, k) = 2 and m(x, ) = 1, k, = j} and Z j 1,1,1 = {x ∈ X j 0 | m(x, k) = 1, m(x, ) = 1 and m(x, r) = 1, k, , r = j}. Making use of Equations (39), (40) and (41), the success probability can be expressed as By noticing that Z j 3 = {k | k = j}, we can see that x∈Z j 3 w(j, x) = k w(j, k) − w(j, j), and hence where we again utilise the normalisation of the kernel: j k w(j, k) = k j w(j, k) = N . If we now denote by S 2 = {x | ∃ j, m(x, j) = 2} the set of vectors containing an element of multiplicity two, then we find that for each j, . In a similar way to Equation (43) we find that where 3N (N − 1) = |S 2 |. Finally, we let S 1 = {(i, j, k) | i = j = k = i} be the set of vectors where each component has multiplicity one. From this we have that x∈Z j 1,1,1 w(j, x) = x∈S1 w(j, x) − x∈Y j 1,1 w(j, x) for each j, and so j x∈Z j where N (N − 1)(N − 2) = |S 1 |. Inserting Equations (43) to (45) into Equation (42) we arrive at Since the λ ≥ 1/N , it follows that (1 − λ)/(N − 1) ≤ 1/N ≤ λ, hence every factor in front of a summation in Equation (46) is positive, which suggests that each such kernel value in the summations should be equal to one. However, this would lead to certain outcome arrays being counted more than once: for a given array (i, j, j), say, one could clearly let w(i, (i, j, j)) = 1 since (i, j, j) ∈ Y i 2 , or w(j, (i, j, j)) = 1 since (i, j, j) ∈ X j 2 , but not both because w(i, (i, j, j)) + w(j, (i, j, j)) ≤ 1. 
Because the elements in $X^j_2$ provide a larger contribution, and $\cup_j X^j_2 = \cup_j Y^j_2$, we set $w(j, \mathbf{x}) = 1$ for $\mathbf{x} \in X^j_2$ and $w(j, \mathbf{x}) = 0$ for $\mathbf{x} \in Y^j_2$. Similarly, a given outcome array $(i, j, k)$ with distinct entries is included three times, since $(i, j, k) \in Y^i_{1,1} \cap Y^j_{1,1} \cap Y^k_{1,1}$. As such, the summation over the $Y^j_{1,1}$ subsets needs to be divided by three (this may be alternatively seen as randomly assigning the outcome array $(i, j, k)$ to $B^{(3)}(i)$, $B^{(3)}(j)$ or $B^{(3)}(k)$).
Figure 3. Success probabilities after one, three and four measurements for an $N = 10$-outcome observable of the form given in this section, acting on a Hilbert space of arbitrary dimension $d \ge 10$. For any $N$-ary observable of this form, and for $\lambda \in [\frac{1}{N}, 1]$, the success probability of distinguishing the $N$ states after three measurements is greater than when a single measurement is performed and, contrary to the binary case, the success probability after 4 measurements is greater than after 3.
Combining these results, Equation (46) reduces to
$$P^{(3)}_{\mathrm{succ}} = \lambda^3 + 3\lambda^2(1 - \lambda) + \frac{N - 2}{N - 1}\, \lambda (1 - \lambda)^2.$$
For the values of $\lambda$ considered, namely $\frac{1}{N} < \lambda < 1$, the probability $P^{(3)}_{\mathrm{succ}}$ is strictly larger than $P^{(1)}_{\mathrm{succ}}$, and $P^{(4)}_{\mathrm{succ}}$ is strictly larger than $P^{(3)}_{\mathrm{succ}}$, as shown for the case of $N = 10$ in Figure 3. This is in contrast to the binary case, where the latter two coincide.
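The behaviour for $N > 2$ can be explored by brute-force enumeration. The sketch below assumes, as in the setting of this section, that the likelihood of an array for state $\rho_j$ is $\lambda^{m_j}\mu^{n-m_j}$ with $\mu = (1-\lambda)/(N-1)$, and that each array is relabelled to a label of maximal multiplicity, which maximises the likelihood when $\lambda > \mu$. The closed form used for comparison is the one obtained from the kernel described above; the function name `n_valued_success` is illustrative.

```python
import itertools
from collections import Counter

def n_valued_success(lam, num_outcomes, n):
    """Optimal n-round success probability by enumerating all outcome arrays.

    Each array contributes the largest likelihood among the N states, i.e.
    lam**m_max * mu**(n - m_max), with m_max the largest multiplicity.
    """
    mu = (1.0 - lam) / (num_outcomes - 1)
    total = 0.0
    for x in itertools.product(range(num_outcomes), repeat=n):
        m_max = max(Counter(x).values())
        total += lam**m_max * mu**(n - m_max)
    return total / num_outcomes

N, lam = 10, 0.5
p = {n: n_valued_success(lam, N, n) for n in (1, 2, 3, 4)}
closed_p3 = lam**3 + 3 * lam**2 * (1 - lam) + (N - 2) / (N - 1) * lam * (1 - lam)**2
print(p, closed_p3)
```

For $N = 10$ the enumeration visits at most $10^4$ arrays, so it runs quickly; for larger $n$ one would group arrays by their multiplicity pattern instead of listing them individually.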

VI. CONCLUSIONS
This paper has considered the task of minimum error state discrimination for a set of mutually orthogonal states, under the restriction of using one unsharp observable as many times as desired. In the case of binary observables distinguishing two states we encountered a 'rule of three', whereby performing the measurement twice provides no further advantage over a single instance, but a third measurement leads to an increase for all unsharpness values $\lambda \in (\frac{1}{2}, 1)$. As the number of repetitions increases, the success probability was shown to improve only when an odd number of steps is performed, which may be seen as an overcoming of the ambiguous results that are present in the even cases. While this shows that there exist benefits in performing repeated measurements, one must be cautious in the number of iterations performed, as an even number proves redundant compared to the preceding odd number.
In the case of commutative $N$-valued observables with $N > 2$, the rule of three was still shown to hold, but the step increase in success probability found for binary observables was no longer present. Since there are more possible outcomes we should not be surprised at such a result, as such ambiguity will not arise at the same points. However, as Equation (46) shows, we do not suddenly see steps occurring every three iterations for trinary observables. This suggests that we need not be as cautious about accidentally performing a redundant additional measurement as in the binary case, but, as is seen in Figure 3, the increase between steps for $N > 2$ is perhaps less dramatic.