1 Introduction

Uncertainty is an intrinsic property of quantum physics: Typically, a measurement of an observable can yield different results for two identically prepared states. This indeterminacy can be studied by considering the probability distribution of measurement outcomes given by the Born rule, and quantified by a number that characterizes the randomness of this distribution. The Shannon entropy is the most natural tool for this purpose. Obviously, the value of this quantity is determined by the choice of the initial state of the system before the measurement. When the number of possible measurement outcomes is finite and equals k, it varies from 0, if the measurement outcome is determined, to \(\ln k\), if all outcomes are equiprobable. If the measured observable is represented by a normalized rank-1 positive-operator-valued measure (POVM) on a d-dimensional complex Hilbert space, where \(d \le k\), then the upper bound is achieved for the maximally mixed state \(\mathbb I/d\). On the other hand, the Shannon entropy of measurement cannot be 0 unless the POVM is a projection-valued measure (PVM) representing projection (Lüders–von Neumann) measurement with \(k=d\), since it is bounded from below by \(\ln \left( k/d\right) \). Thus, in the general case, the following questions arise: how to choose the input state to minimize the uncertainty of the measurement outcomes, and what is the minimum value of the Shannon entropy for the distribution of measurement results in this case? In the present paper, we call this number the entropy of measurement.

The entropy of measurement has been widely studied by many authors since the 1960s [124], also in the context of entropic uncertainty principles [44], as well as in quantum information theory under the name of minimum output entropy of a quantum-classical channel [106]. Subtracting this quantity from \(\ln k\), we get the relative entropy of measurement (with respect to the uniform distribution), which may vary from 0 to \(\ln d\). In consequence, the optimization problem now reduces to finding its maximum value. Either way, we are looking for the “least quantum” or “most classical” states in the sense that the measurement of the system prepared in such a state gives the most defined results. The answer is immediate for a PVM, consisting of projections onto the elements of an orthonormal basis that are at the same time “most classical” with respect to this measurement. Such an obvious solution is not available for a general POVM. Because of the concavity of the entropy of measurement as a function of state, we only know that the optimal states must be pure.

Like many other optimization problems where the Shannon function \(\eta \left( x\right) =-x\ln x, x>0\) is involved, the minimization of the entropy of measurement seems to be too difficult to be solved analytically in the general case. In fact, analytical solutions have been found so far only for a few two-dimensional (qubit) cases, where the Bloch vectors of POVM elements constitute an n-gon [6, 52, 103], a tetrahedron [90] or an octahedron [32, 98]. All these POVMs are symmetric (group covariant), but, as we shall see, symmetry alone is not enough to solve the problem analytically. However, for symmetric rank-1 POVMs, the relative entropy of measurement gains an additional interpretation. It follows from [90] that it is equal to the informational power of measurement [6, 7], viz., the classical capacity of a quantum–classical channel generated by the POVM [64]. To distinguish the class of measurements for which the entropy minimization problem is feasible, we define highly symmetric (HS) normalized rank-1 POVMs as the symmetric subsets of the state space without non-trivial factors. The primary aim of this paper is to present a general method of attacking the minimization problem for such POVMs and to illustrate it, entirely solving the issue in the two-dimensional case.

Note that our method is not confined to qubits, and it works also in higher dimensions, at least for some important cases, though it is true that for dimension 3 or larger, it seems more difficult to apply, mainly because the image of the Bloch representation of pure states is only a proper subset of the (generalized) Bloch sphere. However, one of us (A.S.) has recently published a paper [115], where the technique developed in an earlier version of the present paper has been used to find the minimum of the entropy for group covariant SIC-POVMs in dimension 3, including the Hesse HS SIC-POVM. The same method works also for a POVM consisting of four MUBs, again in dimension 3 (this result was first obtained by a different method in [4]), as well as for the 64 Hoggar lines HS SIC-POVM in dimension 8 [109]. After some additional work, one can prove that the technique developed in this paper for searching the minima can be also used to find the maximum entropy of the distribution in question among pure pre-measurement states for an arbitrary SIC-POVM in any dimension, as well as the maximum entropy for pure initial states for all HS-POVMs in dimension 2 [116]. Summarizing, this seems to be a quite universal technique of finding extrema, limited neither to qubits nor to the Shannon entropy, as it can be applied to various “entropy-like” quantities obtained with the help of other functions with properties similar to those of \(\eta \), such as power functions leading to the Rényi entropy or its variant, the Tsallis–Havrda–Charvát entropy [36], and even to more general “information functionals” considered in the same context in [23].

Going back to dimension 2, we first classify all HS-POVMs, proving that their Bloch sphere representations must be either one of the five Platonic solids or the two quasiregular Archimedean solids (the cuboctahedron and icosidodecahedron), or belong to an infinite series of regular polygons. For such POVMs, we show that their entropy is minimal (and so the relative entropy is maximal), if and only if the input state is orthogonal to one of the states constituting a POVM. We present a unified proof of this fact for all eight cases, and for five of them (the cube, icosahedron, dodecahedron, cuboctahedron, and icosidodecahedron), the result seems to be new. Let us emphasize that commonly used methods of minimizing entropy, e.g., based on majorization, cannot be applied in all these cases.

The proof strategy is as follows. We consider a set \(S=\left\{ \sigma _{j}:j=1,\ldots ,k\right\} \) contained in the space of pure states (one-dimensional projections) \({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \) representing a normalized rank-1 HS-POVM. The entropy of measurement H is given by \(H\left( \rho \right) =\sum \nolimits _{j=1}^{k}\eta \left( p_{j}\left( \rho \right) \right) \), where the probabilities of the measurement outcomes are \(p_{j}\left( \rho \right) =(d/k)\hbox {tr} \left( \sigma _{j}\rho \right) \) for \(\rho \in {\mathcal {P}}\left( {\mathbb {C}} ^{d}\right) \) and \(j=1,\ldots ,k\). We start from analyzing the group action of \(\hbox {Sym}(S)\), the group of unitary–antiunitary symmetries of S, on \({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \). We identify points lying in the maximal stratum for this action, called inert states in physical literature. As the POVM is highly symmetric, this set contains S itself. According to the Michel theory of critical orbits of group actions [86, 88], the elements of the maximal stratum, being critical points for the entropy of measurement H, which is a \(\hbox {Sym} (S)\)-invariant function, are natural candidates for the minimizers. Studying their character, we see that H has local minima at the inert states \(\sigma _{j}^{\perp }\) (\(j=1,\ldots ,k\)) orthogonal to the elements of S. To prove that these minima are indeed global, we look for a simpler (polynomial) \(\hbox {Sym}(S)\)-invariant function P such that: \(P\le H\), and \(P=H\) at \(\sigma _{j}^{\perp }\) (\(j=1,\ldots ,k\)). To construct such a polynomial function, we define it as \(P\left( \rho \right) =\sum \nolimits _{j=1} ^{k}p\left( p_{j}\left( \rho \right) \right) \), for a polynomial p being a suitable Hermite approximation of \(\eta \) at values \(p_{j}\left( \sigma _{i}^{\perp }\right) \) (\(j=1,\ldots ,k\)) for some, and hence for all, \(i=1,\ldots ,k\). 
Now, it is enough to prove that these “suspicious” points are global minimizers for P, which is an apparently easier task. Proving that P has minimizers at \(\sigma _{j}^{\perp }\) (\(j=1,\ldots ,k\)), we use the fact that the structure of invariant polynomials for any finite subgroup of the projective unitary–antiunitary group is well known for \(d=2\). Employing a priori estimates for the degree of p, and hence for the degree of P, we can show that P is either constant, which completes the proof, or it is a low-degree polynomial function of known \(\hbox {Sym}(S)\)-invariant polynomials, which reduces the proof to a relatively easy algebraic problem. The following two points seem to be crucial to the proof: the form of the function \(\eta \) that guarantees that the Hermite interpolation polynomial p bounds \(\eta \) from below, and the knowledge of the subgroups of unitary–antiunitary group that may act as symmetry groups of the sets representing HS-POVMs as well as their invariant polynomials.
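The strategy can be illustrated numerically in the simplest case. The following Python sketch is an illustration only, not the proof given later in the paper. It assumes the tetrahedral POVM in dimension 2, standard unit Bloch vectors n (so that \(p_{j}=(1+n_{j}\cdot n)/4\)), and a quadratic Hermite polynomial p that matches \(\eta \) at 0 and matches \(\eta \) together with its derivative at 1/3, the two values taken by \(p_{j}\left( \sigma _{i}^{\perp }\right) \) in this example. All function and variable names are hypothetical.

```python
import numpy as np

def eta(x):
    """Shannon function eta(x) = -x ln x, extended by eta(0) = 0."""
    x = np.asarray(x, dtype=float)
    return -x * np.log(np.where(x > 0, x, 1.0))

# Unit Bloch vectors of the tetrahedral POVM (orientation chosen for convenience).
TETRA = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

def probabilities(n):
    """Outcome probabilities p_j = (1 + n_j . n) / 4 for a pure state with unit Bloch vector n."""
    return (1.0 + TETRA @ n) / 4.0

# Hermite conditions for p(x) = a x^2 + b x (so that p(0) = eta(0) = 0):
#   p(1/3) = eta(1/3)   and   p'(1/3) = eta'(1/3) = -ln(1/3) - 1.
x0 = 1.0 / 3.0
a, b = np.linalg.solve(
    np.array([[x0**2, x0], [2 * x0, 1.0]]),
    np.array([float(eta(x0)), -np.log(x0) - 1.0]),
)

def p(x):
    """Quadratic Hermite interpolant of eta at the nodes 0 and 1/3."""
    return a * x**2 + b * x

def H(n):
    """Entropy of measurement for the pure state with Bloch vector n."""
    return float(np.sum(eta(probabilities(n))))

def P(n):
    """Polynomial lower bound: sum of p over the outcome probabilities."""
    return float(np.sum(p(probabilities(n))))

# Sample pure states uniformly on the Bloch sphere.
rng = np.random.default_rng(0)
samples = rng.normal(size=(2000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)

H_vals = np.array([H(n) for n in samples])
P_vals = np.array([P(n) for n in samples])
H_at_antipode = H(-TETRA[0])  # state orthogonal to the first POVM element
H_at_element = H(TETRA[0])    # the first POVM element itself

print("a, b =", a, b)
print("spread of P over samples:", P_vals.max() - P_vals.min())
print("min sampled H:", H_vals.min(), "H at orthogonal state:", H_at_antipode)
print("H at POVM element:", H_at_element)
```

In this particular example, P turns out to be constant on pure states, since both \(\sum _{j}p_{j}\) and \(\sum _{j}p_{j}^{2}\) are constant for the tetrahedron; this is the first of the two alternatives mentioned above, and the sampled values of H never fall below the value at the orthogonal state.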

The problem considered in the present paper has a well-known continuous counterpart: the minimization of the Wehrl entropy over all pure states, see Sect. 6.4, where the (approximate) quantum measurement is described by an infinite family of group coherent states generated by a unitary and irreducible action of a linear group on a highly symmetric fiducial vector representing the vacuum. More than thirty years ago, Lieb [79], and quite recently, Lieb and Solovej [80] proved for harmonic oscillator and spin coherent states, respectively, that the minimum value of the Wehrl entropy is attained, when the state before the measurement is also a coherent state. Surprisingly, an analogous theorem need not be true in the discrete case, since the entropy of measurement need not be minimal for the states constituting the POVM. This discrepancy requires further study.

In Sect. 6.3, we show that the minimization of the entropy of measurement is also closely related to entropic uncertainty principles [123]. Indeed, every such principle leads to a lower bound for the entropy of some measurement, and conversely, such bounds may yield new uncertainty principles for single or multiple measurements. Moreover, in Sect. 6.5, we reveal the connection between the entropy of measurement and the quantum dynamical entropy with respect to this measurement [110], the quantity introduced independently by different authors to analyze the results of consecutive quantum measurements intertwined with a given unitary evolution.

The rest of this paper is organized as follows. In Sect. 2, we review some of the standard material on quantum states and measurements including the generalized Bloch representation. In Sect. 3, we analyze the general notion of highly symmetric sets in metric spaces, and in Sect. 4, we apply this universal notion to normalized rank-1 POVMs. Section 5 contains the classification of all HS-POVMs in dimension 2. Section 6 provides a detailed exposition of entropy and relative entropy of quantum measurement as well as their relations to the notions of informational power and Wehrl entropy, and their connections with entropic uncertainty principles and quantum symbolic dynamics. In Sect. 7, we study local minima for the entropy of measurement in dimension 2, and in Sect. 8, we use Hermite interpolation and group-invariant polynomial techniques to derive our main theorem and to find the global minima in this case. Finally, in Sect. 9, we apply the obtained results to give a formula for the informational power of HS-POVMs in dimension 2.

2 Quantum states and POVMs

In this section, we collect all the necessary definitions and facts about quantum states and measurements that can be found, e.g., in [15] or [59]. Consider a quantum system for which the associated complex Hilbert space \(\mathcal {H}\) is finite dimensional, that is, \(\mathcal {H} ={\mathbb {C}}^{d}\) for some \(d=2,3,\ldots \). The pure states of the system can be described as the elements of the complex projective space \(\mathbb {P}\mathcal {H}=\mathbb {CP}^{d-1}\) endowed with the Fubini-Study (called also procrustean after Procrustes) Kähler metric given by \(D_{FS}\left( \left[ \varphi \right] ,\left[ \psi \right] \right) :=\arccos \frac{\left| \left\langle \varphi |\psi \right\rangle \right| }{\left\| \varphi \right\| \left\| \psi \right\| }\) for \(\varphi ,\psi \in \mathcal {H}\) [15, 49]. In this metric, there is only one geodesic between two pure states unless they are maximally remote [74, Theorem 1]. We can also identify \(\mathbb {P}\mathcal {H}\) with the set \({\mathcal {P}}\left( \mathcal {H}\right) \) of one-dimensional projections in \(\mathcal {H}\) by sending \(\left[ \varphi \right] \rightarrow P_{\varphi }:=\left| \varphi \right\rangle \left\langle \varphi \right| /\left\langle \varphi |\varphi \right\rangle \), where \(\left| \varphi \right\rangle \left\langle \varphi \right| \) denotes the orthogonal projection operator onto the subspace generated by \(\varphi \in \mathcal {H}\) (Dirac notation). The transferred metric on \({\mathcal {P}}\left( \mathcal {H}\right) \), also called the Fubini-Study metric, is given by \(D_{FS}\left( \rho ,\sigma \right) :=\arccos \sqrt{\hbox {tr}\left( \rho \sigma \right) }\) for \(\rho ,\sigma \in {\mathcal {P}}\left( \mathcal {H}\right) \). 
By \(\mathcal {S}\left( \mathcal {H}\right) \), we denote the convex closure of \({\mathcal {P}}\left( \mathcal {H}\right) \), that is, the set of density (positive semi-definite and trace one) operators on \(\mathcal {H}\), interpreted as mixed states of the system. Note that \(\dim _{\mathbb {R}}{\mathcal {P}}\left( \mathcal {H}\right) =2d-2\) and \(\dim _{\mathbb {R}}\mathcal {S}\left( \mathcal {H}\right) =d^{2}-1\). By \(m_{FS}\), we denote the unique unitarily invariant measure on \(\mathbb {CP}^{d-1}\) or, equivalently, on \({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \).

The mixed states can be also described as elements of a (\(d^{2}-1\))-dimensional real Hilbert space (in fact, a Lie algebra) \(\mathcal L_s^0(\mathcal H)\) of Hermitian traceless operators on \(\mathcal {H}\), endowed with the Hilbert–Schmidt product given by \(\left\langle \left\langle \sigma ,\tau \right\rangle \right\rangle _{HS}:=\hbox {tr}\left( \sigma \tau \right) \) for \(\sigma ,\tau \in \mathcal L_s^0(\mathcal H)\). Namely, the map defined by \(b:\mathcal {S}\left( \mathcal {H}\right) \ni \rho \rightarrow \rho -I/d \in \mathcal L_s^0(\mathcal H)\) gives us an affine embedding (the generalized Bloch representation) of the set of mixed (respectively, pure) states into the ball (respectively, sphere) in \(\mathcal L_s^0(\mathcal H)\) of radius \(\sqrt{1-d^{-1}}\), called the generalized Bloch ball (respectively, the Bloch sphere). Note that the map \(A\mapsto iA\) allows us to identify \(\mathcal L_s^0(\mathcal H)\) with \(\mathfrak {su}(d)\), the Lie algebra of \(\text {SU}(d)\), consisting of traceless skew-adjoint operators. Only for \(d=2\), the map is onto, and for \(d>2\), its image (the Bloch vectors) constitutes a “thick” though proper subset of (\(d^{2}-1\))-dimensional ball, containing the (maximal) ball of radius \(1/\sqrt{d\left( d-1\right) }\) centered at 0. On the other hand, for \(d>2\), \(B\left( d\right) :=b\left( {\mathcal {P}}\left( \mathcal {H} \right) \right) \), the image of the space of pure states via b, constitutes a “thin” (\(2d-2\))-dimensional submanifold of the (\(d^{2} -2\))-sphere. 
The metric spaces \(\left( \mathbb {P}\mathcal {H},D_{FS}\right) \) and \(\left( B\left( d\right) ,D_{B}\right) \), where \(D_{B}\) is the great arc distance on the Bloch sphere, though non-isometric for \(d>2\), nevertheless are ordinally equivalent, as the distances \(D_{FS}\) and \(D_{B}\) are related by the formula \(D_{B}\left( b\left( \left| \varphi \right\rangle \left\langle \varphi \right| \right) ,b\left( \left| \psi \right\rangle \left\langle \psi \right| \right) \right) =\gamma \left( D_{FS}\left( \left[ \varphi \right] ,\left[ \psi \right] \right) \right) \) (\(\varphi ,\psi \in \mathcal {H}\)), where a convex function \(\gamma :\left[ 0,\pi /2\right] \rightarrow \mathbb {R}^{+}\) is given by \(\gamma \left( x\right) =\sqrt{1-d^{-1}}\arccos \frac{d\cos ^{2}x-1}{d-1}\) for \(0\le x\le \pi /2\). In other words, scalar products of state vectors in \({\mathbb {C}}^{d}\) and their images in \(\mathbb {R}^{d^{2}-1}\) fulfill the relation: \(\left| \left\langle \varphi |\psi \right\rangle \right| ^{2}=\left\langle \left\langle b\left( \left| \varphi \right\rangle \left\langle \varphi \right| \right) ,b\left( \left| \psi \right\rangle \left\langle \psi \right| \right) \right\rangle \right\rangle _{HS}+1/d\).
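Both relations can be checked numerically. The sketch below is a sanity check under stated assumptions rather than part of the argument: it takes \(d=3\), draws random state vectors by Gaussian sampling (an illustrative choice), and uses hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
R = np.sqrt(1.0 - 1.0 / d)  # radius of the generalized Bloch sphere

def random_vector(dim):
    """Random normalized vector in C^dim (Gaussian sampling)."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def bloch(v):
    """Generalized Bloch representation b(|v><v|) = |v><v| - I/d for a unit vector v."""
    return np.outer(v, v.conj()) - np.eye(len(v)) / len(v)

def hs(x, y):
    """Hilbert-Schmidt product tr(xy) of two Hermitian operators."""
    return np.trace(x @ y).real

def gamma(x):
    """Function relating the Fubini-Study distance to the great arc distance."""
    arg = (d * np.cos(x) ** 2 - 1.0) / (d - 1.0)
    return R * np.arccos(np.clip(arg, -1.0, 1.0))

max_err_product = 0.0
max_err_distance = 0.0
for _ in range(1000):
    phi, psi = random_vector(d), random_vector(d)
    overlap = abs(np.vdot(phi, psi)) ** 2
    inner = hs(bloch(phi), bloch(psi))
    # Scalar product relation: |<phi|psi>|^2 = <<b(phi), b(psi)>> + 1/d.
    max_err_product = max(max_err_product, abs(overlap - (inner + 1.0 / d)))
    # Distance relation: D_B = gamma(D_FS).
    d_fs = np.arccos(np.sqrt(min(overlap, 1.0)))
    d_b = R * np.arccos(np.clip(inner / R**2, -1.0, 1.0))
    max_err_distance = max(max_err_distance, abs(d_b - gamma(d_fs)))

print("max error, scalar product relation:", max_err_product)
print("max error, distance relation:", max_err_distance)
```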

With a measurement of the system with a finite number k of possible outcomes, one can associate a positive-operator-valued measure (POVM) defining the probabilities of the outcomes. A finite POVM is an ensemble of positive semi-definite nonzero operators \(\varPi _{j}\) (\(j=1,\ldots ,k\)) on \(\mathcal {H}\) that sum to the identity operator, i.e., \(\sum \nolimits _{j=1} ^{k}\varPi _{j}=\mathbb {I}\). If the state of the system before the measurement (the input state) is \(\rho \), then the probability \(p_{j}\left( \rho \right) \) of the j-th outcome is given by the Born rule, \(p_{j}\left( \rho \right) =\hbox {tr}\left( \rho \varPi _{j}\right) \). In general, there is an infinite number of completely positive maps (measurement instruments in the sense of Davies and Lewis [40]) describing conditional state changes due to the measurement and producing the same measurement statistics, see [59, Ch. 5]. Among them, the efficient instruments [51] have a particularly simple form: They are given by the solutions of the set of equations \(\varPi _{j}=A_{j}^{*}A_{j}\) (\(j=1,\ldots ,k\)), where \(A_{j}\) are bounded operators on \(\mathcal {H}\). If \(\rho \) is the input state and the measurement outcome is j, then the state of the system after the measurement is \(\rho _{j}^{post}=A_{j}\rho A_{j}^{*}/p_{j}\left( \rho \right) \). If, additionally, \(A_{j}=\sqrt{\varPi _{j}}\), we get the so-called generalized Lüders instrument, which disturbs the initial state in the minimal way [41, p. 404].

A special class of POVMs are normalized rank-1 POVMs, where \(\varPi _{j}\) (\(j=1,\ldots ,k\)) are rank-1 operators and \(\hbox {tr}\left( \varPi _{j}\right) =\hbox {const}(j)=d/k\). Necessarily, \(k\ge d\) in this case, and there exists an ensemble of pure states \(\sigma _{j}\in {\mathcal {P}}\left( \mathcal {H}\right) \) (\(j=1,\ldots ,k\)) such that \(\varPi _{j}=\left( d/k\right) \sigma _{j}\). Thus, \(\sum \nolimits _{j=1} ^{k}\sigma _{j}=\left( k/d\right) \mathbb {I}\), and so a normalized rank-1 POVM can be also defined as a (multi-)set of points in \({\mathcal {P}}\left( \mathcal {H}\right) \) that constitutes a uniform (or normalized) tight frame in \({\mathcal {P}}\left( \mathcal {H}\right) \) [14, 26, 47], that is, an ensemble that fulfills \(\sum \nolimits _{j=1}^{k}\hbox {tr}\left( \sigma _{j} \rho \right) =k/d\) for every \(\rho \in {\mathcal {P}}\left( \mathcal {H} \right) \). In this case, we shall say that \(\sigma _{j}\) (\(j=1,\ldots ,k\)) constitute a POVM. Equivalently, we can define normalized rank-1 POVMs as complex projective 1-designs, where by a complex projective t-design (\(t\in \mathbb {N}\)) we mean an ensemble \(\left\{ \sigma _{j}:j=1,\ldots ,k\right\} \) such that

$$\begin{aligned} \frac{1}{k^{2}}\sum _{j,m=1}^{k}f\left( \hbox {tr}\left( \sigma _{j} \sigma _{m}\right) \right) =\int _{{\mathcal {P}}\left( {\mathbb {C}}^{d}\right) }\int _{{\mathcal {P}}\left( {\mathbb {C}}^{d}\right) }f\left( \hbox {tr} \left( \rho \sigma \right) \right) \hbox {d}m_{FS}\left( \rho \right) \hbox {d}m_{FS}\left( \sigma \right) \end{aligned}$$
(1)

for every \(f:\mathbb {R\rightarrow R}\) polynomial of degree t or less [104]. The equality \(\sum \nolimits _{j=1}^{k}\sigma _{j}=\left( k/d\right) \mathbb {I}\) is in turn equivalent to \(\sum \nolimits _{j=1} ^{k}b\left( \sigma _{j}\right) =0\), which gives the following simple characterization of normalized rank-1 POVMs in the language of Bloch vectors:

Proposition 1

The generalized Bloch representation gives a one-to-one correspondence between finite normalized rank-1 POVMs and finite (multi-)sets of points in \(B\left( d\right) \) with its center of mass at 0.
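As a concrete check of the design condition (1) and of Proposition 1, the following Python sketch treats the tetrahedral configuration in dimension 2. It is a sketch under assumptions not stated in the text above: the standard qubit parametrization \((\mathbb {I}+n\cdot \vec {\sigma })/2\) with unit vectors n, and the standard values \(1/d\) and \(2/(d(d+1))\) of the right-hand side of (1) for \(f(x)=x\) and \(f(x)=x^{2}\). Variable names are hypothetical.

```python
import numpy as np

# Pauli matrices, used to build qubit states from unit Bloch vectors.
PAULI = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def state_from_bloch(n):
    """Pure qubit state (I + n . sigma)/2 for a unit vector n."""
    return (np.eye(2) + np.tensordot(n, PAULI, axes=1)) / 2

tetra = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
sigmas = [state_from_bloch(n) for n in tetra]
d, k = 2, len(sigmas)

# Tight frame condition and its Bloch-vector form (Proposition 1).
frame_sum = sum(sigmas)                               # expected: (k/d) I
bloch_sum = sum(s - np.eye(d) / d for s in sigmas)    # expected: 0

# Left-hand side of (1) for f(x) = x and f(x) = x^2.
overlaps = np.array([[np.trace(s @ t).real for t in sigmas] for s in sigmas])
moment_1 = overlaps.mean()
moment_2 = (overlaps ** 2).mean()

print("sum of states:\n", np.round(frame_sum, 12))
print("norm of summed Bloch vectors:", np.linalg.norm(bloch_sum))
print("first moment:", moment_1, "expected:", 1 / d)
print("second moment:", moment_2, "expected:", 2 / (d * (d + 1)))
```

The first moment confirms the 1-design (normalized rank-1 POVM) property; the second moment matching as well reflects the fact that this particular configuration is also a 2-design.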

The probabilities of the measurement outcomes in the generalized Bloch representation take the form

$$\begin{aligned} p_{j}\left( \rho \right) =(d/k)\hbox {tr}\left( \sigma _{j}\rho \right) =(d\cdot \left\langle \left\langle b\left( \sigma _{j}\right) ,b\left( \rho \right) \right\rangle \right\rangle _{HS}+1)/k \end{aligned}$$
(2)

for \(\rho \in {\mathcal {P}}\left( {\mathbb {C}} ^{d}\right) \) and \(j=1,\ldots ,k\). Obviously, the probability of obtaining the j-th outcome varies from 0, when the initial state is orthogonal to \(\sigma _j\), to d/k, when it coincides with \(\sigma _j\). In consequence, no outcome can be certain for a given input state unless the measurement is projective (in which case \(k=d\)).
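Formula (2) and the range of the outcome probabilities can be illustrated on a small example. The sketch below assumes a specific POVM not singled out in the text at this point, namely three real qubit states whose Bloch vectors form a regular triangle (\(d=2\), \(k=3\)); the helper names are hypothetical.

```python
import numpy as np

d, k = 2, 3

def projector(v):
    """One-dimensional projection onto the span of v."""
    v = np.asarray(v, dtype=complex)
    return np.outer(v, v.conj()) / np.vdot(v, v).real

# Three real state vectors at angles 0, 60 and 120 degrees
# (Bloch vectors at 0, 120 and 240 degrees).
angles = np.pi * np.arange(k) / 3
sigmas = [projector([np.cos(t), np.sin(t)]) for t in angles]

def b(x):
    """Generalized Bloch representation b(x) = x - I/d."""
    return x - np.eye(d) / d

def probs_trace(rho):
    """First form of (2): p_j = (d/k) tr(sigma_j rho)."""
    return np.array([(d / k) * np.trace(s @ rho).real for s in sigmas])

def probs_bloch(rho):
    """Second form of (2): p_j = (d <<b(sigma_j), b(rho)>> + 1) / k."""
    return np.array([(d * np.trace(b(s) @ b(rho)).real + 1) / k for s in sigmas])

rng = np.random.default_rng(2)
rho = projector(rng.normal(size=d) + 1j * rng.normal(size=d))
rho_same = sigmas[0]               # input state equal to sigma_1
rho_orth = projector([0.0, 1.0])   # input state orthogonal to sigma_1

print("trace form:", probs_trace(rho))
print("Bloch form:", probs_bloch(rho))
print("p_1 at sigma_1:", probs_trace(rho_same)[0], "(d/k =", d / k, ")")
print("p_1 at orthogonal state:", probs_trace(rho_orth)[0])
```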

3 Symmetric, resolving and highly symmetric sets in metric spaces

In this section, we present a framework to investigate the concept of symmetry in metric spaces. Let us start from a general definition. Let S be a subset of a metric space (X, r) which is homogeneous, i.e., the group of all isometries (surjective maps preserving metric r) acts transitively on X, that is, for every \(x,y\in X\), there exists an isometry \(f:X\rightarrow X\) such that \(f(x)=y\). By \(\hbox {Sym}\left( S\right) \), we denote the group of symmetries of S, that is, the group of all isometries leaving S invariant. We call S symmetric if \(\hbox {Sym}\left( S\right) \) acts transitively on S.

We say that S is a resolving set [45] if and only if \(r\left( a,x\right) = r\left( b,x\right) \) for every \(x\in S\) implies \(a=b\), for \(a,b\in X\). The following proposition belongs to folklore:

Proposition 2

If S is resolving, then \(f|_{S}=g|_{S}\) implies \(f=g\) for every \(f,g\in \hbox {Sym}\left( S\right) \). Moreover, if S is finite, then \(\hbox {Sym}\left( S\right) \) is finite.

Proof

Let \(f,g\in \hbox {Sym}\left( S\right) \), \(f|_{S}=g|_{S}\), and \(a\in X\). Then, for every \(x\in S\), we have \(r(fa,x)=r\left( a,f^{-1}x\right) =r\left( a,g^{-1}x\right) =r\left( ga,x\right) \). Hence, \(fa=ga\). Now, if \(\left| S\right| =k\), then \(\hbox {Sym}\left( S\right) \) is a subgroup of the symmetric group \(S_{k}\), and so is finite.

To single out sets of higher symmetry, we have to recall some notions from the general theory of group action, see, e.g., [48]. Let G be a group acting on X. For \(x\in X\), we define its orbit as \(Gx:=\left\{ gx:g\in G\right\} \) and its stabilizer (or isotropy subgroup) \(G_{x}\) as the set of elements in G that fix x, i.e., \(G_{x}:=\left\{ g\in G:gx=x\right\} \). Obviously, two points lying on the same orbit have conjugate stabilizers, since \(G_{gx}=gG_{x}g^{-1}\) for \(x\in X\) and \(g\in G\). The points of X with the same stabilizers up to conjugacy are said to be of the same isotropy type, which is a measure of symmetry of points (orbits). The points of the same isotropy type as x form the orbit stratum \(\Sigma _{x}\). The decomposition of X into orbit strata is called the orbit stratification. Clearly, it induces a stratification of the orbit space X/G. The natural partial order on the set of all conjugacy classes of subgroups of G induces the order on the set of strata, namely \(\Sigma _{x}\prec \Sigma _{y}\) if and only if there exists \(g\in G\) such that \(G_{x}\subset gG_{y}g^{-1}\) for \(x,y\in X\), so that the maximal strata consist of points with maximal stabilizers.
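Orbits and stabilizers are easy to compute for a finite group of matrices. The following Python sketch uses, as an assumed example not taken from this section, the group of 48 signed permutation matrices acting on \(\mathbb {R}^{3}\) (the full symmetry group of the cube and the octahedron); the function names are hypothetical.

```python
import itertools
import numpy as np

def signed_permutation_matrices():
    """All 3x3 matrices with exactly one entry +1 or -1 in each row and column."""
    mats = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            m = np.zeros((3, 3))
            for row, (col, s) in enumerate(zip(perm, signs)):
                m[row, col] = s
            mats.append(m)
    return mats

def stabilizer(group, x):
    """Elements of the group that fix the point x."""
    return [g for g in group if np.allclose(g @ x, x)]

def orbit(group, x):
    """Orbit of x, stored as a set of rounded coordinate tuples."""
    return {tuple(np.round(g @ x, 9) + 0.0) for g in group}

G = signed_permutation_matrices()

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

points = {
    "octahedron vertex": unit([0, 0, 1]),
    "cube vertex": unit([1, 1, 1]),
    "edge midpoint": unit([1, 1, 0]),
    "generic point": unit([1, 2, 3]),
}

# For each point: (order of the stabilizer, size of the orbit).
summary = {
    name: (len(stabilizer(G, x)), len(orbit(G, x)))
    for name, x in points.items()
}

for name, (stab, orb) in summary.items():
    # Orbit-stabilizer relation: |G_x| * |Gx| = |G|.
    print(f"{name}: |G_x| = {stab}, |Gx| = {orb}, product = {stab * orb}")
```

Points with larger stabilizers have smaller orbits, and the generic point, whose stabilizer is trivial, lies in the lowest stratum.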

Assume now that a non-empty finite set \(S\subset X\) is symmetric and consider the action of the group \(\hbox {Sym}\left( S\right) \) on X. Clearly, the whole set S is contained in one orbit and hence in one stratum. We shall say that S is highly symmetric if and only if this stratum is maximal. The following proposition gives a simple sufficient condition for the high symmetry.

Proposition 3

If \(\hbox {Sym}(S)\) acts primitively on S (i.e., the only \(\hbox {Sym}(S)\)-invariant partitions of S are trivial), and the set of its common fixed points in X is empty, then S is highly symmetric.

Proof

Put \(G:=\hbox {Sym}\left( S\right) \). Since a primitive action of G on S must be transitive, S is symmetric. Assume that S is not highly symmetric. Then, there exist \(x \in S\) and \(y\in X {\setminus } S\) such that \(G_{x}\subsetneq G_{y}\). It follows from the primitivity of G that \(G_{x}\) is its maximal subgroup [66, Corollary 8.14]. Hence, \(G_{y}=G\), a contradiction.

If \(\hbox {Sym}\left( S\right) \) acts doubly transitively on S [66, p. 225], i.e., if for every \(x_{1},x_{2},y_{1},y_{2}\in S\), \(x_{1}\ne x_{2}\) and \(y_{1}\ne y_{2}\), there is \(g\in \hbox {Sym}(S)\) such that \(g\left( x_{i}\right) = y_{i}\) for \(i=1,2\), we shall call such a set super-symmetric after [133, 135]. It is well known that doubly transitive group action is primitive [66, Lemma 8.16]. Hence, we get

Corollary 1

If \(S\subset X\) is super-symmetric and the set of common fixed points of \(\hbox {Sym}(S)\) is empty, then S is highly symmetric.

Let \(S\subset X\) be symmetric. We say that \(\kappa :S\rightarrow X\) is \(\hbox {Sym}\left( S\right) \)-equivariant if and only if \(g\kappa (x)=\kappa (gx)\) for all \(g\in \hbox {Sym}\left( S\right) \) and for some (and hence for all) \(x\in S\). For such \(\kappa \), we call \(\kappa \left( S\right) \) a factor of S. Note that in this case, \(\hbox {Sym}\left( S\right) \subset \hbox {Sym}\left( \kappa \left( S\right) \right) \) and \(G_{x}\subset G_{\kappa \left( x\right) }\) for every \(x\in S\). A symmetric set is highly symmetric if and only if it does not have a non-trivial factor:

Proposition 4

Let \(S\subset X\) be symmetric. Then, S is highly symmetric if and only if every \(\hbox {Sym}\left( S\right) \)-equivariant map \(\kappa :S\rightarrow X\) is one to one.

Proof

If \(\left| S\right| =1\), then the proposition is trivial, as every singleton is highly symmetric. Assume that \(\left| S\right| \ge 2\) and put \(G:=\hbox {Sym}\left( S\right) \). If S is not highly symmetric, then there exist \(x\in S=Gx\) and \(y\notin S\) such that \(G_{x}\subsetneq G_{y}\). Put \(\kappa (gx):=g(y)\) for every \(g\in G\). Clearly, \(\kappa \) is well defined, \(\hbox {Sym}\left( S\right) \)-equivariant, and it is not one to one, since otherwise \(G_{y}\subset G_{x}\), which is a contradiction. On the other hand, take a G-equivariant map \(\kappa :S\rightarrow X\) that is not one to one. Then, there exist \(x\in S\) and \(g\in \hbox {Sym}\left( S\right) \) such that \(x\ne gx\) and \(\kappa \left( x\right) =\kappa \left( gx\right) =g \kappa \left( x\right) \), and so \(G_{x}\subsetneq G_{\kappa \left( x\right) }\), a contradiction.

It is interesting that an analogous idea was explored almost 50 years ago by Zajtz, who defined so-called primitive geometric objects in a manner quite similar to the highly symmetric sets defined above and proved a fact parallel to Proposition 4 [129, Theorem 1].

4 Symmetric, informationally complete and highly symmetric normalized rank-1 POVMs

To apply these general definitions to normalized rank-1 POVMs, note that from the celebrated Wigner theorem [125], it follows that for every separable Hilbert space \(\mathcal {H}\), the group of isometries of homogeneous metric space \(\left( {\mathcal {P}}\left( \mathcal {H}\right) ,D_{FS}\right) \) (quantum symmetries) is isomorphic to the projective unitary–antiunitary group \(\hbox {PUA}\left( \mathcal {H}\right) \), consisting of unitary and antiunitary transformations of \(\mathcal {H}\) defined up to phase factors, see also [27, 28, 49, 54, 73]. To be more precise, each such isometry is given by the map \(\sigma _{U}:{\mathcal {P}}\left( \mathcal {H}\right) \ni \rho \rightarrow U\rho U^{*}\in {\mathcal {P}}\left( \mathcal {H}\right) \) for a unitary or antiunitary U, and two such isometries coincide if and only if the corresponding transformations differ only by a phase. Equivalence classes of unitary isometries form a normal subgroup of \(\hbox {PUA}\left( \mathcal {H}\right) \) of index 2, namely the projective unitary group \(\hbox {PU}\left( \mathcal {H}\right) \). Clearly, every such isometry can be uniquely extended to a continuous affine map on \(\mathcal {S}\left( \mathcal {H}\right) \).

If \(\mathcal {H}={\mathbb {C}}^{d}\), then the generalized Bloch representation gives a one-to-one correspondence between the compact group \(\hbox {PUA}\left( d\right) \) and the group of isometries of the unit sphere in \((d^{2}-1)\)-dimensional real vector space \(\mathcal L_s^0\left( \mathbb {C}^{d} \right) \) endowed with the Hilbert–Schmidt product, whose action leaves the Bloch set \(b(\mathcal S\left( \mathbb {C}^{d} \right) ) \) invariant. This correspondence is given by \([U] \rightarrow \left\{ \rho \rightarrow U\rho U^{*}:\rho \in \mathcal L_s^0\left( \mathbb {C}^{d} \right) \right\} \) for \(U\in \hbox {UA}\left( d\right) \) (the unitary case is considered in [9], and it can be easily generalized to the antiunitary case). Hence, \(\hbox {PUA}\left( d\right) \) is isomorphic to a subgroup of the orthogonal group \(O\left( d^{2}-1\right) \). Moreover, \(m_{FS}\) is the unique \(\hbox {PUA}\left( d\right) \)-invariant measure on \({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \simeq \mathbb {CP}^{d-1} \). In particular, for \(d=2\), we have \(\hbox {PUA}\left( 2\right) \simeq O\left( 3\right) \), and so all quantum symmetries of qubit states can be interpreted as rotations (for unitary symmetries, as \(\hbox {PU} \left( 2\right) \simeq SO\left( 3\right) \)), reflections or rotoreflections of the three-dimensional Euclidean space.

Taking this into account, we can transfer the notions of symmetry and high symmetry from \({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \) to finite normalized rank-1 POVMs in \({\mathbb {C}}^{d}\). Let \(\varPi =(\varPi _{j} )_{j=1,\ldots ,k}\) be a finite normalized rank-1 POVM in \({\mathbb {C}}^{d}\), and S be a corresponding set of pure quantum states. We say that

  • \(\varPi \) is a symmetric POVM \(\Leftrightarrow \) S is symmetric in \(({{\mathcal {P}}}\left( {\mathbb {C}}^{d}\right) ,D_{FS})\);

  • \(\varPi \) is a highly symmetric POVM (HS-POVM) \(\Leftrightarrow \) S is highly symmetric in \(({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) ,D_{FS})\).

For finite normalized rank-1 measurements, symmetric POVMs coincide with group covariant POVMs introduced by Holevo [61] and studied since then by many authors. We say that a measurement \(\varPi =(\varPi _{j} )_{j=1,\ldots ,k}\) is G-covariant for a group G if and only if there exists \(G\ni g\rightarrow \sigma _{U_{g}}\in \hbox {PUA}\left( d\right) \), a projective unitary–antiunitary representation of G (i.e., a homomorphism from G to \(\hbox {PUA}\left( d\right) \)), and a surjection \(s:G\rightarrow \left\{ 1,\ldots ,k\right\} \) such that \(\sigma _{U_{g_1}}(\varPi _{s( g_2)})=U_{g_1}\varPi _{s(g_2)} U_{g_1}^{*}=\varPi _{s(g_1g_2)}\) for all \(g_1,g_2\in G\). For greater convenience, we can assume that \(\varPi \) is a multiset, and so we can label its elements by g instead of s(g): \(\varPi =(\varPi _g)_{g\in G}\). In order to guarantee that \(\sum _{g\in G}\varPi _g=\mathbb I\), we need to put \(\varPi _g=(|s(G)|/|G|)\varPi _{s(g)}\). Let \(\varPi \) be a finite normalized rank-1 POVM in \({\mathbb {C}}^{d}\), and S be a corresponding set of pure quantum states. It is clear that a symmetric finite normalized rank-1 POVM is \(\hbox {Sym}\left( S\right) \)-covariant, and conversely, if a finite normalized rank-1 POVM is G-covariant, then \(( \sigma _{U_{g}}) _{g\in G}\) is a subgroup of the group of isometries of \(({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) ,D_{FS})\), acting transitively on the corresponding (multi-)set of pure states. We call the representation irreducible if and only if \(\mathbb {I}/d\) is the only element of \(\mathcal {S}\left( {\mathbb {C}}^{d}\right) \) invariant under the action of the representation. It follows from the version of Schur’s lemma for unitary–antiunitary maps [46, Theorem II] that this definition coincides with the classical one. 
Irreducibility of the representation can be also equivalently expressed as follows: For any pure state \(\tau \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \), its orbit under the action of the representation generates a rank-1 G-covariant POVM, i.e., \(\frac{1}{|G|}\sum _{g\in G}\sigma _{U_{g}}\left( \tau \right) =\mathbb {I}/d\), see also [118].
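The last characterization can be checked numerically. The following minimal sketch averages the orbit of a random pure qubit state over the four Pauli matrices, which form a projective unitary representation of \(\mathbb {Z}_{2}\times \mathbb {Z}_{2}\) acting irreducibly on \({\mathbb {C}}^{2}\), and compares it with the reducible two-element group generated by the Pauli Z matrix. The choice of groups, the helper names `random_pure_state` and `orbit_average`, and the use of NumPy are assumptions made for this illustration only.

```python
import numpy as np

# Pauli matrices: a projective unitary representation of Z2 x Z2 on C^2
# (an illustrative choice, not taken from the text).
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def random_pure_state(d, rng):
    """Return a random rank-1 projector |psi><psi| on C^d."""
    psi = rng.normal(size=d) + 1j * rng.normal(size=d)
    psi /= np.linalg.norm(psi)
    return np.outer(psi, psi.conj())


def orbit_average(tau, group):
    """Compute (1/|G|) * sum_g U_g tau U_g^* for a list of unitaries."""
    return sum(U @ tau @ U.conj().T for U in group) / len(group)


rng = np.random.default_rng(0)
tau = random_pure_state(2, rng)

# Irreducible action: the orbit average is the maximally mixed state.
avg = orbit_average(tau, [I2, X, Y, Z])

# Reducible action generated by Z alone: only the off-diagonal part is removed.
avg_reducible = orbit_average(tau, [I2, Z])
```

For the Pauli group, `avg` agrees with \(\mathbb {I}/2\) up to rounding, whereas for the reducible group the average generically retains the diagonal of \(\tau \).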

In the next section, we shall describe all HS-POVMs in dimension 2. From Corollary 1 and [133, Theorem 1], we already know that the SIC-POVM in dimension 2 (represented by a tetrahedron), the Hesse SIC-POVM in dimension 3, and the set of 64 Hoggar lines in dimension 8 are highly symmetric POVMs, see also [134]. Note that our definition of highly symmetric POVMs resembles the definition of highly symmetric frames introduced by Broome and Waldron [20, 21, 121]. However, they consider subsets of \({\mathbb {C}}^{d}\) rather than \(\mathbb {CP} ^{d-1}\) and unitary symmetries rather than projective unitary–antiunitary symmetries.

The next proposition clarifies the relations between the properties of the set of pure states constituting a finite normalized rank-1 POVM and the properties of its Bloch representation. We call a normalized rank-1 POVM \(\varPi =(\varPi _{j})_{j=1,\ldots ,k}\) informationally complete (respectively, purely informationally complete) if and only if the probabilities \(p_{j}\left( \rho \right) \) (\(j=1,\ldots ,k\)) determine uniquely every input state \(\rho \in \mathcal {S}\left( {\mathbb {C}}^{d}\right) \) (respectively, \({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \)). Since we need \(d^2-1\) independent parameters to describe uniquely a quantum state, any IC-POVM must contain at least \(d^2\) elements. The following result provides necessary and sufficient conditions for informational completeness and purely informational completeness:

Proposition 5

Let \(\varPi =(\varPi _{j})_{j=1,\ldots ,k}\) be a finite normalized rank-1 POVM in \({\mathbb {C}}^{d}\) and \(S:=\left\{ \sigma _{j}:j=1,\ldots ,k\right\} \) be a corresponding set of pure quantum states, i.e., \(\sigma _{j}\in {\mathcal {P}}\left( \mathbb {C}^{d} \right) \) and \(\varPi _{j}=\left( d/k\right) \sigma _{j}\) for \(j=1,\ldots ,k\). Let us consider the following properties:

  (a) S is a complex projective 2-design;

  (b) \(b\left( S\right) \) is a normalized tight frame in \(\mathcal L_s^0\left( \mathbb {C}^{d} \right) \);

  (c) \(b\left( S\right) \) is a spherical 2-design in \(\mathcal L_s^0\left( \mathbb {C}^{d} \right) \);

  (d) \(\varPi \) is informationally complete;

  (e) \(b\left( S\right) \) generates \(\mathcal L_s^0\left( \mathbb {C}^{d} \right) \);

  (f) \(b\left( S\right) \) is a frame in \(\mathcal L_s^0\left( \mathbb {C}^{d} \right) \);

  (g) \(\varPi \) is purely informationally complete;

  (h) S is a resolving set in \(({\mathcal {P}}\left( \mathbb {C}^{d} \right) ,D_{FS})\);

  (i) \(b\left( S\right) \) is a resolving set in \(\left( B\left( d\right) ,D_{B}\right) \).

Then, \((a)\Leftrightarrow (b)\Leftrightarrow (c)\Rightarrow (d)\Leftrightarrow (e) \Leftrightarrow (f)\Rightarrow (g)\Leftrightarrow (h)\Leftrightarrow (i)\). Moreover, if \(d=2\), then \((g)\Rightarrow (d)\).

Proof

It is obvious that \((b)\Rightarrow (f)\) and \((d)\Rightarrow (g)\). The proof of \((a)\Leftrightarrow (b)\) can be found in [104, Proposition 13], \((b)\Leftrightarrow (c)\) in [120, p. 5] and \((d)\Leftrightarrow (e)\) in [59, Proposition 3.51]. It is well known that in finite-dimensional spaces, frames are generating sets, hence \((e)\Leftrightarrow (f)\). Furthermore, \((g)\Leftrightarrow (h)\Leftrightarrow (i)\) follows from the fact that the distances \(D_{FS}\) and \(D_{B}\) are ordinally equivalent, and from the equality \(\hbox {tr}\left( \rho \sigma \right) =\cos ^{2}D_{FS}\left( \rho ,\sigma \right) \) for \(\rho ,\sigma \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \). Moreover, for \(d=2\), the notions of purely informational completeness and informational completeness coincide [58, Remark 1].

A POVM that satisfies (a) (or, equivalently, (b) or (c)) is called tight informationally complete POVM [104]. Note that (d) does not imply (b), even if S is symmetric and \(d=2\). To show this, consider \(S\subset {\mathcal {P}}\left( {\mathbb {C}}^{2}\right) \) such that \(b(S)=\{ 2^{-1/2}(e_{1} \pm e_{2}),\) \( 2^{-1/2}(- e_{1} \pm e_{3} )\} \), where \(\left\{ e_{1},e_{2},e_{3}\right\} \) is any orthonormal basis of \(\mathcal L_s^0\left( \mathbb {C}^{2} \right) \). Then, \(b\left( S\right) \) is a tetragonal disphenoid with the antiprismatic symmetry group \(D_{2d}\). Clearly, \(b\left( S\right) \) is a frame in \(\mathcal L_s^0\left( \mathbb {C}^{2} \right) \), but simple calculations show that it is not tight. On the other hand, one can prove \((d)\Rightarrow (b)\), under the additional assumption that the natural action of \(\hbox {Sym}\left( S\right) \) on \(\mathcal L_s^0\left( \mathbb {C}^{d} \right) \) is irreducible, applying [118, Theorem 6.3]. Moreover, as we shall see in the next section, all the conditions above are equivalent if S is highly symmetric and \(d=2\).
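The "simple calculations" for the disphenoid can be reproduced with a few lines of code. The sketch below writes the four vectors of \(b(S)\) in the coordinates of \(\left\{ e_{1},e_{2},e_{3}\right\} \), forms the frame operator \(\sum _{x}xx^{T}\), and tests whether it is a multiple of the identity. The helper name `frame_operator` and the reliance on NumPy are assumptions of this example.

```python
import numpy as np

s = 2 ** -0.5
# Coordinates of the four vectors of b(S) in the orthonormal basis (e1, e2, e3).
B = np.array([
    [s, s, 0.0],    # 2^{-1/2} ( e1 + e2)
    [s, -s, 0.0],   # 2^{-1/2} ( e1 - e2)
    [-s, 0.0, s],   # 2^{-1/2} (-e1 + e3)
    [-s, 0.0, -s],  # 2^{-1/2} (-e1 - e3)
])


def frame_operator(vectors):
    """Return sum_x x x^T for the rows x of `vectors`."""
    return vectors.T @ vectors


F = frame_operator(B)
eigenvalues = np.linalg.eigvalsh(F)

# A frame in a finite-dimensional space is a spanning set.
is_frame = np.linalg.matrix_rank(B) == 3
# A tight frame has a frame operator proportional to the identity.
is_tight = np.allclose(eigenvalues, eigenvalues[0])
```

The frame operator is diagonal with entries 2, 1, 1 in this basis, so the set spans the space but is not tight.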

5 Classification of highly symmetric POVMs in dimension 2

Theorem 1

There are only eight types of HS-POVMs in two dimensions: seven exceptional informationally complete HS-POVMs, represented in \(\mathbb {R}^{3}\) by the five Platonic solids (convex regular polyhedra), namely the tetrahedron, cube, octahedron, icosahedron and dodecahedron, and by two convex quasi-regular polyhedra, the cuboctahedron and icosidodecahedron; and an infinite series of non-informationally complete HS-POVMs represented in \(\mathbb {R}^{3}\) by regular polygons, including the digon.

Proof

Let \(S=\left\{ \sigma _{j}:j=1,\ldots ,k\right\} \subset {\mathcal {P}}\left( {\mathbb {C}}^{2}\right) \simeq \mathbb {CP}^{1}\) constitute an HS-POVM, and let \(B:=b(S)\subset S^{2}\). Put \(G:=\hbox {Sym}\left( B\right) \). Then, it follows from the equivalence \((d)\Leftrightarrow (e)\) in Proposition 5 that either B is contained in a proper (one- or two-dimensional) subspace of \(\mathbb {R}^{3}\), or the POVM is informationally complete and, according to the implication \((d)\Rightarrow (h)\) in Proposition 5 and Proposition 2, G is finite.

If G is infinite, then necessarily the stabilizer of any element \(x\in B\) has to be infinite, since otherwise the whole orbit of x would be infinite, contradicting the finiteness of B. The only linear isometries of \(\mathbb {R}^{3}\) that leave x invariant are rotations about the axis \(l_{x}\) through x and reflections in planes containing \(l_{x}\), so the stabilizer \(G_{x}\) has to contain an infinite subgroup of rotations about \(l_{x}\). Thus, the orbit under G of any point not lying on \(l_{x}\) must be infinite. In consequence, \(B=\left\{ -x,x\right\} \), and \(G=D_{\infty h}\simeq O\left( 2\right) \times C_{2}\).

If G is finite, it must be one of the point groups, i.e., finite subgroups of \(O\left( 3\right) \). The complete characterization of such subgroups has been known for a very long time [105]. There exist seven infinite families of axial (or prismatic) groups \(C_{n}\), \(C_{nv}\), \(C_{nh}\), \(S_{2n}\), \(D_{n}\), \(D_{nd}\), and \(D_{nh}\), as well as seven additional polyhedral (or spherical) groups: T (chiral tetrahedral), \(T_{d}\) (full tetrahedral), \(T_{h}\) (pyritohedral), O (chiral octahedral), \(O_{h}\) (full octahedral), I (chiral icosahedral), and \(I_{h}\) (full icosahedral). Analyzing their standard action on \(S^{2}\) (see, e.g., [81, 88, 93, 96, 132]), one can find in all cases the orbits with maximal stabilizers. Gathering this information together, we get all highly symmetric finite subsets of \(S^{2}\), and so all HS-POVMs in two dimensions. These sets are listed in Table 1 together with their symmetry groups and the stabilizers of their elements with respect to these symmetry groups. For all but the first two types of HS-POVMs, the symmetry group G is a polyhedral group, and so it acts irreducibly on \(\mathbb {R}^{3}\). Hence, B must be a tight frame in all these cases. \(\square \)

Table 1 HS-POVMs in dimension 2, with their cardinalities, symmetry groups, and stabilizers of elements (in Schoenflies notation)

We have just shown that if \(S\subset {{\mathcal {P}}}\left( \mathbb {C}^{2} \right) \) constitutes an informationally complete HS-POVM in dimension 2, then b(S) is a spherical 2-design. However, it follows from [33, Theorem 2] and the form of corresponding group-invariant polynomials (listed in Sect. 8.2) that if \(\text {Sym}(b(S))=O_h\), then b(S) is a spherical 3-design, and if \(\text {Sym}(b(S))=I_h\), then b(S) is a spherical 5-design.
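The design strengths quoted above can be verified numerically for the Platonic solids. The sketch below relies on a standard characterization that is not taken from the text: a finite set \(X\subset S^{2}\) is a spherical t-design if and only if \(\sum _{x,y\in X}P_{n}\left( \left\langle x,y\right\rangle \right) =0\) for \(n=1,\ldots ,t\), where \(P_{n}\) denotes the n-th Legendre polynomial. The function name `design_strength`, its parameters, and the vertex coordinates are assumptions of this example.

```python
import numpy as np
from numpy.polynomial import legendre


def design_strength(points, max_degree=8, tol=1e-9):
    """Largest t <= max_degree with sum_{x,y} P_n(<x, y>) = 0 for n = 1..t."""
    X = np.asarray(points, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    gram = np.clip(X @ X.T, -1.0, 1.0)
    t = 0
    for n in range(1, max_degree + 1):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0  # coefficient vector selecting the Legendre polynomial P_n
        if abs(legendre.legval(gram, coeffs).sum()) > tol * len(X) ** 2:
            break
        t = n
    return t


g = (1 + np.sqrt(5)) / 2  # golden ratio, used for the icosahedron vertices
tetrahedron = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
octahedron = np.vstack([np.eye(3), -np.eye(3)])
cube = np.array([[x, y, z] for x in (1, -1) for y in (1, -1) for z in (1, -1)])
icosahedron = np.array([p for a in (1, -1) for b in (g, -g)
                        for p in ((0, a, b), (a, b, 0), (b, 0, a))])

strengths = {
    "tetrahedron": design_strength(tetrahedron),
    "octahedron": design_strength(octahedron),
    "cube": design_strength(cube),
    "icosahedron": design_strength(icosahedron),
}
```

Under these assumptions, the tetrahedron comes out as a 2-design only, the octahedron and cube (symmetry group \(O_{h}\)) as 3-designs, and the icosahedron (symmetry group \(I_{h}\)) as a 5-design.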

Classification of all finite symmetric subsets of \(S^{2}\) and, in consequence, all symmetric normalized rank-1 POVMs in two dimensions, is of course more complicated than in the highly symmetric case. In particular, the number of such non-isometric subsets is uncountable. However, since each symmetric subset generates a vertex-transitive polyhedron in three-dimensional Euclidean space (and each such polyhedron is a symmetric set generating a symmetric normalized rank-1 POVM), the task reduces to classifying such polyhedra, which was done by Robertson and Carter in the 1970s, see [34, 95–97]. They proved that the transitive polyhedra in \(\mathbb {R}^{3}\) can be parameterized (up to isometry) by a metric space (with the Hausdorff distance under the action of Euclidean isometries related closely to the Gromov–Hausdorff distance, see [85]), which is a two-dimensional CW complex with 0-cells corresponding exactly to highly symmetric subsets of \(S^{2}\).

Note that not only “regular polygonal” POVMs (e.g., the trine or “Mercedes-Benz” measurement for \(k=3\) [71] and the “Chrysler” measurement for \(k=5\) [126]), but also “Platonic solid” POVMs have been considered earlier by several authors in various quantum mechanical contexts, including quantum tomography, at least since 1989 [68, 70], see, for instance, [22, 25, 29, 42].

6 Entropy and relative entropy of measurement

6.1 Definition

Let \(\varPi =(\varPi _{j})_{j=1,\ldots ,k}\) be a finite POVM in \({\mathbb {C}}^{d}\). We shall look for the most “classical” (with respect to a given measurement) or “coherent” quantum states, i.e., for the states that minimize the uncertainty of the outcomes of the measurement. This uncertainty can be measured by the quantity called the entropy of measurement given by

$$\begin{aligned} H(\rho ,\varPi ):=\sum _{j=1}^{k}\eta \left( p_{j}\left( \rho ,\varPi \right) \right) , \end{aligned}$$
(3)

for \(\rho \in \mathcal {S}\left( {\mathbb {C}}^{d}\right) \), where the probability \(p_{j}\left( \rho ,\varPi \right) \) of the j-th outcome (\(j=1,\ldots ,k\)) is given by \(p_{j}\left( \rho ,\varPi \right) :=\hbox {tr}\left( \rho \varPi _{j}\right) \), and the Shannon entropy function \(\eta :\left[ 0,1\right] \rightarrow \mathbb {R}^{+}\) by \(\eta \left( x\right) :=-x\ln x\) for \(x>0\), and \(\eta \left( 0\right) :=0\). (In the sequel, we shall frequently use the identity \(\eta \left( xy\right) =\eta \left( x\right) y+\eta \left( y\right) x\), \(x,y\in \left[ 0,1\right] \).) Thus, the entropy of measurement \(H(\rho ,\varPi )\) is just the Boltzmann–Shannon entropy of the probability distribution of the measurement outcomes, assuming that the state of the system before the measurement was \(\rho \). This quantity (as well as its continuous analogue) has been considered by many authors, first in the 1960s under the name of Ingarden–Urbanik entropy or A-entropy, then, since the 1980s, in the context of entropic uncertainty principles [44, 76, 84, 123], and also quite recently for more general statistical theories [107, 108]. Wilde called it the Shannon entropy of POVM [126]. For a history of this notion, see [124] and [10].

The function \(H(\cdot ,\varPi ):\mathcal {S}\left( {\mathbb {C}}^{d}\right) \rightarrow \mathbb {R}\) is continuous and concave. In consequence, it attains its minimum on the set of pure states. Moreover, it is obviously bounded from above by \(\ln k\), the entropy of the uniform distribution, and for a normalized rank-1 POVM the upper bound is achieved for the maximally mixed state \(\rho _{*}:=\mathbb I/d\). The general bound from below is expressed with the help of the von Neumann entropy of the state \(\rho \) given by \(S(\rho ):=-\hbox {tr}(\rho \ln \rho )\) [78, Sect. 2.3]:

$$\begin{aligned} S\left( \rho \right) -\sum _{j=1}^k p_j\ln (\hbox {tr} (\varPi _j)) \le H(\rho ,\varPi )\le \ln k. \end{aligned}$$
(4)

Since for a normalized rank-1 POVM \(\hbox {tr}(\varPi _j)=d/k\) for all \(j=1,\ldots ,k\), we get

$$\begin{aligned} S\left( \rho \right) +\ln (k/d)\le H(\rho ,\varPi )\le \ln k. \end{aligned}$$
(5)

(Moreover, for \(\rho \in \mathcal {S}\left( {\mathbb {C}}^{d}\right) \), \(S\left( \rho \right) =\min H(\rho ,\varPi )\), where the minimum is taken over all normalized rank-1 POVMs \(\varPi \), see, e.g., [126, Sect. 11.1.2].) In consequence, for \(\rho \in {{\mathcal {P}}}\left( \mathbb {C}^{d} \right) \), we have

$$\begin{aligned} \ln (k/d) \le H(\rho ,\varPi ) \le \ln k. \end{aligned}$$
(6)

The first inequality in (6) follows also from the inequalities \(p_j(\rho ,\varPi ) \le d/k\) for every \(j=1,\ldots ,k\), and from the fact that \(\ln \) is an increasing function.
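The definition (3) and the bounds (6) are easy to test numerically. The following minimal sketch evaluates \(H(\rho ,\varPi )\) for the tetrahedral POVM on a qubit (\(d=2\), \(k=4\)) on a sample of random pure states and on the maximally mixed state. The helper names (`eta`, `bloch_to_state`, `measurement_entropy`), the sample size, and the vertex coordinates are assumptions made for this example.

```python
import numpy as np

PAULIS = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)


def eta(x):
    """Shannon function eta(x) = -x ln x, with eta(0) = 0."""
    x = np.asarray(x, dtype=float)
    return -x * np.log(np.where(x > 0, x, 1.0))


def bloch_to_state(n):
    """Density matrix (I + n . sigma) / 2 of a qubit with Bloch vector n."""
    return 0.5 * (np.eye(2) + np.einsum("a,aij->ij", n, PAULIS))


def measurement_entropy(rho, povm):
    """H(rho, Pi) = sum_j eta(tr(rho Pi_j)), as in (3)."""
    probs = np.array([np.trace(rho @ effect).real for effect in povm])
    return float(eta(probs).sum())


# Tetrahedral POVM: Pi_j = (d/k) sigma_j with d = 2 and k = 4.
d, k = 2, 4
tetrahedron = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
povm = [(d / k) * bloch_to_state(x) for x in tetrahedron]

rng = np.random.default_rng(1)
directions = rng.normal(size=(1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

pure_entropies = np.array(
    [measurement_entropy(bloch_to_state(n), povm) for n in directions]
)
mixed_entropy = measurement_entropy(np.eye(2) / d, povm)
```

In this sample, all values in `pure_entropies` lie between \(\ln (k/d)=\ln 2\) and \(\ln k=\ln 4\), and `mixed_entropy` equals \(\ln 4\) up to rounding.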

It is sometimes much more convenient to work with the relative entropy of measurement (with respect to the uniform distribution) [55, p. 67] that measures non-uniformity of the distribution of the measurement outcomes and is given by

$$\begin{aligned} \widetilde{H}(\rho ,\varPi ):=\ln k-H(\rho ,\varPi ), \end{aligned}$$
(7)

and to look for the states that maximize this quantity. Clearly, it follows from (5) that the relative entropy of measurement is bounded from below by 0, and from above by the relative von Neumann entropy of the state \(\rho \) with respect to the maximally mixed state \(\rho _{*}=\mathbb I/d\):

$$\begin{aligned} 0\le \widetilde{H}(\rho ,\varPi )\le S\left( \rho |\rho _{*}\right) \le \ln d. \end{aligned}$$
(8)

6.2 Relation to informational power

The problem of minimizing entropy (and so maximizing relative entropy) is connected with the problem of maximization of the mutual information between ensembles of initial states (classical-quantum states) and the POVM \(\varPi \).

Let us consider an ensemble \(\mathcal {E}=\left( \left( \tau _{i}\right) _{i=1}^{m},\left( p_{i}\right) _{i=1}^{m}\right) \), where \(p_{i}\ge 0\) are a priori probabilities of density matrices \(\tau _{i}\in \mathcal {S} \left( {\mathbb {C}}^{d}\right) \), where \(i=1,\ldots ,m\), and \(\sum \nolimits _{i=1}^{m}p_{i}=1\). The mutual information between \(\mathcal {E}\) and \(\varPi \) is given by:

$$\begin{aligned} I\left( \mathcal {E},\varPi \right) :=\sum \limits _{i=1}^{m}\eta \left( \sum \limits _{j=1}^{k}P_{ij}\right) +\sum \limits _{j=1}^{k}\eta \left( \sum \limits _{i=1}^{m}P_{ij}\right) -\sum \limits _{j=1}^{k}\sum \limits _{i=1} ^{m}\eta \left( P_{ij}\right) \end{aligned}$$
(9)

where \(P_{ij}:=p_{i}\hbox {tr}\left( \tau _{i}\varPi _{j}\right) \) is the probability that the initial state of the system is \(\tau _{i}\) and the measurement result is j for \(i=1,\ldots ,m\) and \(j=1,\ldots ,k\).

The problem of maximization of \(I(\mathcal {E},\varPi )\) consists of two dual aspects [7, 63, 65]: maximization over all possible measurements, providing the ensemble \(\mathcal {E}\) is given, see, e.g., [39, 60, 101, 114], and (less explored) maximization over all ensembles, when the POVM \(\varPi \) is fixed [6, 90]. In the former case, the maximum is called accessible information. In the latter case, Dall’Arno et al. [6, 7] introduced the name informational power of \(\varPi \) for the maximum and denoted it by \(W\left( \varPi \right) \). Dall’Arno et al. [6] and, independently, Oreshkov et al. [90] showed that there always exists a maximally informative ensemble (i.e., ensemble that maximizes the mutual information) consisting of pure states only. Note that a POVM \(\varPi \) generates a quantum–classical channel \(\varPhi :\mathcal {S}\left( {\mathbb {C}}^{d}\right) \rightarrow \mathcal {S}\left( {\mathbb {C}}^{k}\right) \) given by \(\varPhi \left( \rho \right) =\sum _{j=1} ^{k}\hbox {tr}\left( \rho \varPi _{j}\right) \left| e_{j}\right\rangle \left\langle e_{j}\right| \), where \(\left( \left| e_{j}\right\rangle \right) _{j=1}^{k}\) is any orthonormal basis in \({\mathbb {C}}^{k}\). The minimum output entropy of \(\varPhi \) is equal to the minimum entropy of \(\varPi \), i.e., \(\min _{\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) } S(\varPhi (\rho ))=\min _{\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) } H(\rho ,\varPi )\) [106]. On the other hand, the informational power of \(\varPi \) can be identified [6, 64, 90] as the classical (Holevo) capacity \(\chi (\varPhi )\) of the channel \(\varPhi \), i.e.,

$$\begin{aligned} W(\varPi )=\chi (\varPhi ):=\max _{\left( \left( \tau _{i}\right) _{i=1}^{m},\left( p_{i}\right) _{i=1}^{m}\right) }\left\{ S\left( \sum \limits _{i=1}^{m} p_{i}\varPhi (\tau _{i})\right) -\sum \limits _{i=1}^{m}p_{i}S(\varPhi (\tau _{i}))\right\} . \end{aligned}$$

What is the relation between informational power and entropy minimization? It follows from [64, p. 2] that

$$\begin{aligned} I\left( \mathcal {E},\varPi \right) =H\left( \sum \limits _{i=1}^{m}p_{i}\tau _{i},\varPi \right) -\sum \limits _{i=1}^{m}p_{i}H\left( \tau _{i},\varPi \right) . \end{aligned}$$

Clearly, \(H\left( \sum \nolimits _{i=1}^{m}p_{i}\tau _{i},\varPi \right) \le \ln k\), and for \(i=1,\ldots ,m\), we have \(\min \limits _{\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) }H\left( \rho ,\varPi \right) \le H\left( \tau _{i},\varPi \right) \). Hence,

$$\begin{aligned} W\left( \varPi \right) \le \ln k-\min \limits _{\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) }H\left( \rho ,\varPi \right) = \max \limits _{\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) }\widetilde{H}\left( \rho ,\varPi \right) . \end{aligned}$$
(10)

In consequence, the equality in (10) holds if and only if there exists an ensemble \(\mathcal {E}=\left( \left( \tau _{i}\right) _{i=1} ^{m},\left( p_{i}\right) _{i=1}^{m}\right) \) such that \(\hbox {tr} \left( \left( \sum \nolimits _{i=1}^{m}p_{i}\tau _{i}\right) \varPi _{j}\right) =1/k\) for \(j=1,\ldots ,k\) and \(\tau _{1},\ldots ,\tau _{m}\in \arg \min H\).
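The decomposition of the mutual information used above can be confirmed numerically. The sketch below computes \(I(\mathcal {E},\varPi )\) directly from (9) and compares it with \(H(\sum _{i}p_{i}\tau _{i},\varPi )-\sum _{i}p_{i}H(\tau _{i},\varPi )\) for the trine POVM on a qubit. The particular ensemble, the prior probabilities, and the function names are assumptions of this example; for qubit pure states with Bloch vectors \(n_{i}\) and \(x_{j}\), it uses \(\hbox {tr}\left( \tau _{i}\varPi _{j}\right) =\left( 1+\left\langle n_{i},x_{j}\right\rangle \right) /k\).

```python
import numpy as np


def eta(x):
    """Shannon function eta(x) = -x ln x, with eta(0) = 0."""
    x = np.asarray(x, dtype=float)
    return -x * np.log(np.where(x > 0, x, 1.0))


def mutual_information(joint):
    """Formula (9) for the joint distribution P[i, j] = p_i tr(tau_i Pi_j)."""
    return float(
        eta(joint.sum(axis=1)).sum()
        + eta(joint.sum(axis=0)).sum()
        - eta(joint).sum()
    )


# Trine POVM on a qubit (k = 3): Bloch vectors at angles 0, 2pi/3, 4pi/3.
k = 3
angles = 2 * np.pi * np.arange(k) / k
povm_vectors = np.stack([np.cos(angles), np.zeros(k), np.sin(angles)], axis=1)

# Hypothetical ensemble of m = 2 pure states with prior probabilities.
ensemble_vectors = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, -0.8]])
prior = np.array([0.3, 0.7])

# cond[i, j] = tr(tau_i Pi_j) for qubit pure states.
cond = (1.0 + ensemble_vectors @ povm_vectors.T) / k
joint = prior[:, None] * cond

lhs = mutual_information(joint)
entropies = eta(cond).sum(axis=1)  # H(tau_i, Pi) for each ensemble state
rhs = float(eta(prior @ cond).sum() - prior @ entropies)
```

Both expressions agree, and the common value does not exceed \(\ln k-\min _{i}H(\tau _{i},\varPi )\), in line with the reasoning that leads to (10).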

Assume now that \(\varPi \) is a normalized rank-1 POVM with \(\varPi _{j}=(d/k)\sigma _{j}\), for \(j=1,\ldots ,k\), where \(S:=\left\{ \sigma _{j}:j=1,\ldots ,k\right\} \subset {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \). Then, applying the Carathéodory convexity theorem, we can characterize the situation in which the two maximization problems coincide:

Proposition 6

The following two conditions are equivalent:

  (1) \(\left\langle b\left( S\right) \right\rangle ^{\perp }\cap \hbox {conv}\left( b\left( \arg \min H\right) \right) \ne \emptyset \);

  (2) the equality in (10) holds.

Moreover, if \(\varPi \) is informationally complete, then (1) can be replaced by

\(0\in \hbox {conv}\left( b\left( \arg \min H\right) \right) \).

In particular, condition (1) is fulfilled if \(\varPi \) (and so S) is symmetric or if \(\hbox {Sym}(S)\) acts irreducibly on \({{\mathcal {P}}}\left( \mathbb {C}^{d} \right) \). (The latter is true, if, e.g., \(d=2\), and S is a union of pairs of orthogonal states.) To see this, it is enough to consider the ensemble consisting of equiprobable elements of the orbit of any minimizer of H under the action of \(\hbox {Sym}(S)\) and use the fact that \((1/\left| \hbox {Sym}(S)\right| )\sum \nolimits _{g\in \mathrm{Sym}(S)}\left( g\sigma \right) =I/d\) for all \(\sigma \in S\). This fact was observed already by Holevo in [62].

6.3 Relation to entropic uncertainty principles

The entropic uncertainty principles form another area of research related to quantifying the uncertainty in quantum theory. They were introduced by Białynicki-Birula and Mycielski [17], who showed that they are stronger than the “standard” Heisenberg uncertainty principle, and Deutsch [44], who provided the first lower bound for the sum of entropic uncertainties of two observables independent of the initial state. This bound was later improved by Maassen and Uffink [82, 117], and de Vicente and Sánchez-Ruiz [119]. The generalizations for POVMs (all previous results referred to PVMs) have been formulated subsequently in [56, 76, 84] and [94]. A more detailed survey of the topic can be found in [123].

The entropic uncertainty relations are closely connected with entropy minimization. In fact, any lower bound for the entropy of measurement can be regarded as an entropic uncertainty relation for a single measurement [76]. Moreover, combining m normalized rank-1 k-element POVMs \(\varPi ^{i}=(\varPi _{j}^{i})_{j=1,\ldots ,k}\) (\(i=1,\ldots ,m\)), we obtain another normalized rank-1 km-element POVM \(\varPi :=(\frac{1}{m}\varPi _{j} ^{i})_{j=1,\ldots ,k}^{i=1,\ldots ,m}\). Now, from an entropic uncertainty principle for \(\left( \varPi ^{i}\right) _{i=1,\ldots ,m}\) written in the form \(\frac{1}{m}\sum \nolimits _{i=1}^m H(\rho ,\varPi ^{i})\ge C>0\) [123, p. 3], we get automatically a lower bound for the entropy of \(\varPi \), namely

$$\begin{aligned} H(\rho ,\varPi )=\frac{1}{m}\sum _{i=1}^m H(\rho ,\varPi ^{i})+\ln m\ge C+\ln m \end{aligned}$$
(11)

for \(\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \), and vice versa, proving a lower bound for entropy of \(\varPi \) we get immediately an entropic uncertainty principle.
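The equality in (11) can be illustrated for \(m=2\) projective measurements on a qubit. The sketch below combines the Z-basis and X-basis PVMs into a four-element POVM and compares both sides of (11) for a random pure state; it also checks the resulting lower bound with \(C=\frac{1}{2}\ln 2\), the Maassen–Uffink constant for two mutually unbiased qubit bases. The choice of bases and all identifiers are assumptions of this example.

```python
import numpy as np


def eta(x):
    """Shannon function eta(x) = -x ln x, with eta(0) = 0."""
    x = np.asarray(x, dtype=float)
    return -x * np.log(np.where(x > 0, x, 1.0))


def measurement_entropy(rho, povm):
    """H(rho, Pi) = sum_j eta(tr(rho Pi_j))."""
    probs = np.array([np.trace(rho @ effect).real for effect in povm])
    return float(eta(probs).sum())


# Two qubit PVMs (m = 2, k = 2): the Z basis and the X basis.
zero, one = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (zero + one) / np.sqrt(2), (zero - one) / np.sqrt(2)
pvms = [[np.outer(v, v) for v in basis] for basis in ([zero, one], [plus, minus])]
m = len(pvms)

# Combined POVM with km elements (1/m) Pi_j^i.
combined = [effect / m for pvm in pvms for effect in pvm]

rng = np.random.default_rng(2)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

lhs = measurement_entropy(rho, combined)
rhs = np.mean([measurement_entropy(rho, pvm) for pvm in pvms]) + np.log(m)
lower_bound = 0.5 * np.log(2) + np.log(m)  # C + ln m with C = (1/2) ln 2
```

The two sides coincide up to rounding, and the entropy of the combined POVM stays above \(\frac{3}{2}\ln 2\).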

To be more specific, assume now that \(m=2\) and \(\varPi _{j}^{i}=\left( d/k\right) \sigma _{j}^{i}\), where \(\sigma _{j}^{i}=\left| \varphi _{j}^{i}\right\rangle \left\langle \varphi _{j}^{i}\right| \in {\mathcal {P}}\left( {\mathbb {C}} ^{d}\right) \), denoting their Bloch vectors by \(x_{j}^{i}:=b\left( \sigma _{j}^{i}\right) \in \mathcal L_s^0\left( \mathbb {C}^{d} \right) \simeq \mathbb {R}^{d^{2}-1}\) for \(j=1,\ldots ,k\), \(i=1,2\). The Krishna-Parthasarathy entropic uncertainty principle [76, Corollary 2.6], combined with (11) gives us

$$\begin{aligned} H(\rho ,\varPi )&\ge \ln \left( 2k/d\right) -\ln \max _{j,l=1,\ldots ,k}\left| \left\langle \varphi _{j}^{1}|\varphi _{l}^{2}\right\rangle \right| \\&=\ln \left( 2k/d\right) -\frac{1}{2}\ln \left( \max _{j,l=1,\ldots ,k} \left\langle \left\langle x_{j}^{1},x_{l}^{2}\right\rangle \right\rangle _{HS} +1/d\right) \nonumber \end{aligned}$$
(12)

for \(\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \). In consequence, taking into account that the radius of the Bloch sphere is \(\sqrt{1-1/d}\), we get an upper bound for relative entropy

$$\begin{aligned} \widetilde{H}(\rho ,\varPi )&\le \ln d+\frac{1}{2}\ln \left( \max _{j,l=1,\ldots ,k} \left\langle \left\langle x_{j}^{1},x_{l}^{2}\right\rangle \right\rangle _{HS}+1/d\right) \\&=\ln d+\frac{1}{2}\ln \left( ( 1-1/d) \left( \max _{j,l=1,\ldots ,k} \cos \left( 2\theta _{jl}\right) \right) +1/d\right) ,\nonumber \end{aligned}$$
(13)

where \(\theta _{jl}:=\measuredangle \left( x_{j}^{1},x_{l}^{2}\right) /2\) for \(j,l=1,\ldots ,k\). As this upper bound does not depend on the input state \(\rho \), it gives us also an upper bound for the informational power of \(\varPi \).

If \(d=2\), this inequality takes a simple form

$$\begin{aligned} \widetilde{H}(\rho ,\varPi )\le \ln 2+\ln \max _{j,l=1,\ldots ,k} \left| \cos \theta _{jl}\right| . \end{aligned}$$
(14)

We may use this bound, e.g., for the “rectangle” POVM analyzed in Sect. 7, that can be treated as the aggregation of two pairs of antipodal points on the sphere representing two PVM measurements. In this case, we deduce from (14) that \(\widetilde{H}\le \ln 2+\ln \max \left( \left| \sin \left( \alpha /2\right) \right| ,\left| \cos \left( \alpha /2\right) \right| \right) \), where \(\alpha \) is the measure of the angle between the diagonals of the rectangle. In particular, for the “square” POVM, we get \(\widetilde{H}\le \frac{1}{2}\ln 2\). As we shall see in Sect. 9, this bound is actually reached for each of four states constituting the POVM and represented by the vertices of the square.
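A quick numerical check of (14) for the rectangle POVM is sketched below. It places the rectangle in the \(xz\)-plane of the Bloch sphere, evaluates \(\widetilde{H}\) on random pure states together with the four vertices for several values of \(\alpha \), and compares the largest value found with the bound. The parametrization, the sample size, and the function names are assumptions of this example, and a random search of this kind can only fail to find a violation; it does not prove the inequality.

```python
import numpy as np


def eta(x):
    """Shannon function eta(x) = -x ln x, with eta(0) = 0."""
    x = np.asarray(x, dtype=float)
    return -x * np.log(np.where(x > 0, x, 1.0))


def rectangle_vertices(alpha):
    """Bloch vectors +-u, +-v whose diagonals meet at the angle alpha."""
    half = alpha / 2
    u = np.array([np.cos(half), 0.0, np.sin(half)])
    v = np.array([np.cos(half), 0.0, -np.sin(half)])
    return np.array([u, -u, v, -v])


def relative_entropy(n, vertices):
    """ln k - H(rho, Pi) for the pure qubit state with Bloch vector n."""
    k = len(vertices)
    probs = (1.0 + vertices @ n) / k
    return float(np.log(k) - eta(probs).sum())


def bound_14(alpha):
    """Right-hand side of (14) specialized to the rectangle POVM."""
    return float(np.log(2) + np.log(max(abs(np.sin(alpha / 2)), abs(np.cos(alpha / 2)))))


rng = np.random.default_rng(3)
states = rng.normal(size=(2000, 3))
states /= np.linalg.norm(states, axis=1, keepdims=True)

violations = 0
for alpha in np.linspace(0.1, np.pi - 0.1, 15):
    vertices = rectangle_vertices(alpha)
    candidates = np.vstack([states, vertices])
    best = max(relative_entropy(n, vertices) for n in candidates)
    if best > bound_14(alpha) + 1e-12:
        violations += 1

# Square POVM: the value at a vertex is compared with the bound.
square = rectangle_vertices(np.pi / 2)
square_value = relative_entropy(square[0], square)
```

For the square, `square_value` coincides with `bound_14(np.pi / 2)`, that is, with \(\frac{1}{2}\ln 2\).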

6.4 Relation to Wehrl entropy minimization

Let us consider now a normalized rank-1 POVM \(\varPi =(\varPi _{j})_{j=1,\ldots ,k}\) in \({\mathbb {C}}^{d}\) with \(\varPi _{j}=\left( d/k\right) \sigma _{j}\) for \(j=1,\ldots ,k\), where \(S:=\left\{ \sigma _{j}:j=1,\ldots ,k\right\} \subset {{\mathcal {P}}}\left( \mathbb {C}^{d} \right) \). Then, we get after simple calculations

$$\begin{aligned} H(\rho ,\varPi )=\frac{d}{k}\sum _{j=1}^{k}\eta \left( \hbox {tr}\left( \rho \sigma _{j}\right) \right) -\ln \left( d/k\right) \end{aligned}$$
(15)

and so

$$\begin{aligned} \widetilde{H}(\rho ,\varPi )=\ln d-\frac{d}{k}\sum _{j=1}^{k}\eta \left( \hbox {tr}\left( \rho \sigma _{j}\right) \right) \end{aligned}$$
(16)

for \(\rho \in {\mathcal {P}}\left( \mathbb {C}^{d} \right) \). Assume now that \(\varPi \) is symmetric (group covariant) and put \(G:=\hbox {Sym}\left( S\right) \). Then, for each \(\tau \in S\), we have \(S=\left\{ g\tau :g\in G\right\} \) and

$$\begin{aligned} \widetilde{H}(\rho ,\varPi )&= \ln d-\frac{d}{\left| S\right| }\sum _{\left[ g\right] \in G/G_{\tau } }\eta \left( \hbox {tr}\left( \rho \left( g\tau \right) \right) \right) \nonumber \\&= \ln d-\frac{d}{\left| G\right| }\sum _{g\in G}\eta \left( \hbox {tr}\left( \rho \left( g\tau \right) \right) \right) \end{aligned}$$
(17)

for \(\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \). The same formulae are true for any subgroup of \(\hbox {Sym}\left( S\right) \) acting transitively on S. Note that the behavior of the functions \(H(\cdot ,\varPi ), \widetilde{H}(\cdot ,\varPi ):{\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \rightarrow \mathbb {R}^{+}\) depends only on the choice of the fiducial state \(\tau \). Moreover, observe that both functions are G-invariant, as for \(\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \) and \(g \in G\) we have \(\hbox {tr}(\rho (g \tau )) = \hbox {tr} (\tau (g^{-1} \rho ))\), and so from (17), we get

$$\begin{aligned} H(\rho ,\varPi )=\frac{d}{\left| G\right| }\sum _{g\in G}\eta \left( \hbox {tr}\left( \tau \left( g\rho \right) \right) \right) -\ln (d/k) \end{aligned}$$
(18)

and

$$\begin{aligned} \widetilde{H}(\rho ,\varPi )=\ln d-\frac{d}{\left| G\right| }\sum _{g\in G} \eta \left( \hbox {tr}\left( \tau \left( g\rho \right) \right) \right) . \end{aligned}$$
(19)
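Formula (15) and the G-invariance of \(H(\cdot ,\varPi )\) can be checked on the tetrahedral POVM in dimension 2. In the sketch below, states are represented by their Bloch vectors, so that \(\hbox {tr}\left( \rho \sigma _{j}\right) =\left( 1+\left\langle n,x_{j}\right\rangle \right) /2\), and the symmetry is the rotation by \(2\pi /3\) about the axis \((1,1,1)\), which permutes the vertices. These choices and the function names are assumptions of this example.

```python
import numpy as np


def eta(x):
    """Shannon function eta(x) = -x ln x, with eta(0) = 0."""
    x = np.asarray(x, dtype=float)
    return -x * np.log(np.where(x > 0, x, 1.0))


d, k = 2, 4
tetrahedron = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)


def entropy_direct(n):
    """H(rho, Pi) from the outcome probabilities p_j = (d/k) tr(rho sigma_j)."""
    overlaps = (1.0 + tetrahedron @ n) / 2  # tr(rho sigma_j) for qubits
    return float(eta((d / k) * overlaps).sum())


def entropy_formula_15(n):
    """Right-hand side of (15)."""
    overlaps = (1.0 + tetrahedron @ n) / 2
    return float((d / k) * eta(overlaps).sum() - np.log(d / k))


# Rotation by 2*pi/3 about (1, 1, 1): (x, y, z) -> (z, x, y).
rotation = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)

rng = np.random.default_rng(4)
n = rng.normal(size=3)
n /= np.linalg.norm(n)

h_direct = entropy_direct(n)
h_formula = entropy_formula_15(n)
h_rotated = entropy_direct(rotation @ n)
```

The first two values agree, which reflects the identity \(\eta (xy)=\eta (x)y+\eta (y)x\), and the third equals them because the rotation maps the vertex set onto itself.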

Now, one can observe that the relative entropy of a symmetric POVM is closely related to the semiclassical quantum entropy introduced in 1978 by Wehrl for the harmonic oscillator coherent states [124] and named later after him. The definition was generalized by Schroeck [102], who analyzed its basic properties. Let G be a compact topological group acting unitarily and irreducibly on \({\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \). Fixing a fiducial state \(\tau \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \), we get the family of states \(\left( g\tau \right) _{g\in G/G_{\tau }}\) called (generalized or group) coherent states [1, 92] that fulfills the identity: \(\int _{G/G_{\tau }}g\tau d\mu ([g]_{G/G_{\tau }})=\mathbb {I}\), where \(\mu \) is the G-invariant measure on \(G/G_{\tau }\) such that \(\mu \left( G/G_{\tau }\right) =d\). Then, for \(\rho \in \mathcal {S}\left( {\mathbb {C}}^{d}\right) \), we define the generalized Wehrl entropy of \(\rho \) by

$$\begin{aligned} S_{\mathrm{Wehrl}}\left( \rho \right) :=\int _{G/G_{\tau }}\eta \left( \hbox {tr}\left( \rho \left( g\tau \right) \right) \right) d\mu (\left[ g\right] _{G/G_{\tau }}). \end{aligned}$$
(20)

It is just the Boltzmann–Gibbs entropy for the density function on \(\left( G/G_{\tau },\mu \right) \) called the Husimi function of \(\rho \) and given by \(G/G_{\tau }\ni \left[ g\right] _{G/G_{\tau }}\rightarrow \hbox {tr}\left( \rho \left( g\tau \right) \right) \in \mathbb {R} ^{+}\) that represents the probability density of the results of an approximate coherent states measurement (or in other words continuous POVM) [24, 38]. Then, the relative Boltzmann–Gibbs entropy of the Husimi distribution of \(\rho \) with respect to the Husimi distribution of the maximally mixed state \(\rho _{*}\), that is, the constant density on \(\left( G/G_{\tau },\mu \right) \) equal to \(1/d\), given by

$$\begin{aligned} S_{\mathrm{Wehrl}}\left( \rho |\rho _{*}\right) :=\ln d-S_{\mathrm{Wehrl}}\left( \rho \right) \end{aligned}$$
(21)

is a continuous analogue of \(\widetilde{H}(\cdot ,\varPi )\) given by (17). What is more, the relative entropy of measurement is just a special case of such transformed Wehrl entropy, when we consider the discrete coherent states (i.e., POVM) generated by a finite group. On the other hand, the entropy of measurement \(H(\cdot ,\varPi )\) has no continuous analogue, as it may diverge to infinity when \(k\rightarrow \infty \). In principle, to define coherent states, we can use an arbitrary fiducial state. However, to obtain coherent states with sensible properties, one has to choose the fiducial state \(\tau \) to be the vacuum state, that is, the state with maximal symmetry with respect to G [75], [92, Sect. 2.4].
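To make the definitions (20) and (21) concrete, the following sketch approximates the generalized Wehrl entropy for \(SU(2)\) coherent states in dimension 2 by a midpoint rule on the sphere. It assumes that \(G/G_{\tau }\simeq S^{2}\), that the Husimi function of a qubit state with Bloch vector r at the unit vector m equals \(\left( 1+\left\langle r,m\right\rangle \right) /2\), and that \(\mu \) is the rotation-invariant measure of total mass \(d=2\). The function name `wehrl_entropy_qubit` and the grid sizes are choices made for this example only.

```python
import numpy as np


def eta(x):
    """Shannon function eta(x) = -x ln x, with eta(0) = 0."""
    x = np.asarray(x, dtype=float)
    return -x * np.log(np.where(x > 0, x, 1.0))


def wehrl_entropy_qubit(r, n_theta=400, n_phi=400):
    """Midpoint-rule approximation of (20) for a qubit with Bloch vector r."""
    d = 2
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    phi = (np.arange(n_phi) + 0.5) * 2 * np.pi / n_phi
    T, P = np.meshgrid(theta, phi, indexing="ij")
    # Unit vectors m labelling the coherent states on the sphere.
    m = np.stack([np.sin(T) * np.cos(P), np.sin(T) * np.sin(P), np.cos(T)], axis=-1)
    husimi = (1.0 + m @ np.asarray(r, dtype=float)) / 2
    cell = (np.pi / n_theta) * (2 * np.pi / n_phi)
    density = d / (4 * np.pi)  # normalizes the total mass of mu to d
    return float((eta(husimi) * np.sin(T)).sum() * cell * density)


coherent = wehrl_entropy_qubit([0.0, 0.0, 1.0])  # rho is itself a coherent state
mixed = wehrl_entropy_qubit([0.0, 0.0, 0.0])     # rho is the maximally mixed state
relative_coherent = np.log(2) - coherent          # (21) with d = 2
relative_mixed = np.log(2) - mixed
```

Under these assumptions, the integral for a coherent state reduces to \(d\int _{0}^{1}\eta (u)\,du=1/2\), while the maximally mixed state gives \(\ln 2\), so that its relative Wehrl entropy vanishes.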

To investigate the Wehrl entropy, it is enough to require that G should be locally compact. In fact, Wehrl defined this quantity for the harmonic oscillator coherent states, where G is the Heisenberg–Weyl group \(H_{4}\) acting on projective (infinite dimensional and separable) Hilbert space, \(G_{\tau }\simeq U\left( 1\right) \times U\left( 1\right) \), and \(G/G_{\tau }\simeq {\mathbb {C}}\). This notion was generalized by Lieb [79] to spin (Bloch) coherent states, with \(G=SU(2)\) acting on \(\mathbb {CP}^{d-1}\) (\(d\ge 2\)), \(G_{\tau }\simeq U(1)\) and \(G/G_{\tau }\simeq S^{2}\). In that paper, Lieb proved that for harmonic oscillator coherent states, the minimum value of the Wehrl entropy is attained for coherent states themselves. (It follows from the group invariance that this quantity is the same for each coherent state.) He also conjectured that the statement is true for spin coherent states, but, despite many partial results, the problem, called the Lieb conjecture, remained open for the next 35 years until it was finally proved by Lieb and Solovej in 2012 [80]. They also expressed the hope that the same result holds for SU(N) coherent states for arbitrary \(N\in \mathbb {N}\), or even for any compact connected semisimple Lie group (the generalized Lieb conjecture), see also [53, 108]. Bandyopadhyay recently obtained some partial results in this direction for \(G=SU(1,1)\) coherent states [11], where \(G_{\tau }\simeq U(1)\) and \(G/G_{\tau }\) is the hyperbolic plane.

For finite groups and covariant POVMs, the minimization of Wehrl entropy is equivalent to the maximization of the relative entropy of measurement, which is in turn equivalent to the minimization of the entropy of measurement. Consequently, one could expect that the entropy of measurement should be minimal for the states constituting the POVM that are already known to be critical as inert states. We shall see in Sect. 8 that this need not always be the case. In particular, it is not true for the tetrahedral POVM or in the situation where the states constituting a POVM form a regular polygon with an odd number of vertices. Thus, it is conceivable that to prove the “generalized Lieb conjecture,” some additional assumptions will be necessary.

6.5 Relation to quantum dynamical entropy

As in the preceding section, let \(\varPi =(\varPi _{j})_{j=1,\ldots ,k}\) be a finite normalized rank-1 POVM in \({\mathbb {C}}^{d}\), and let \(S=\left\{ \sigma _{j}:j=1,\ldots ,k\right\} \) be a corresponding (multi-)set of pure quantum states. Set \(\sigma _{j}=\left| \varphi _{j}\right\rangle \left\langle \varphi _{j}\right| \), where \(\varphi _{j}\in {\mathbb {C}}^{d}\), \(\left\| \varphi _{j}\right\| =1\). Assume that successive measurements described by the generalized Lüders instrument connected with \(\varPi \), where \(\sigma _i\) serve as the “output states,” are performed on an evolving quantum system, and that the motion of the system between two subsequent measurements is governed by a unitary matrix U. Clearly, the sequence of measurements introduces a non-unitary evolution, and the complete dynamics of the system can be described by a quantum Markovian stochastic process, see [110].

The results of consecutive measurements are represented by finite strings of letters from a k-element alphabet. The probability of obtaining the string \(\left( i_{1},\ldots ,i_{n}\right) \), where \(i_{j}=1,\ldots ,k\) for \(j=1,\ldots ,n\) and \(n\in \mathbb {N}\), is then given by

$$\begin{aligned} P_{i_{1},\ldots ,i_{n}}\left( \rho \right) :=p_{i_{1}}\left( \rho \right) \cdot {\textstyle \prod \nolimits _{m=1}^{n-1}} p_{i_{m}i_{m+1}}, \end{aligned}$$
(22)

where \(\rho \) is the initial state of the system, \(p_{i}\left( \rho \right) :=\left( d/k\right) \hbox {tr}\left( \rho \sigma _{i}\right) \) is the probability of obtaining i in the first measurement, and \(p_{ij}:=\left( d/k\right) \hbox {tr}\left( U\sigma _{i}U^{*}\sigma _{j}\right) =\left( d/k\right) \left| \left\langle \varphi _{j}|U|\varphi _{i}\right\rangle \right| ^{2}\) is the probability of getting j as the result of the measurement, provided that the result of the preceding measurement was i, for \(i,j=1,\ldots ,k\) [108, 110]. The randomness of the measurement outcomes can be analyzed with the help of (quantum) dynamical entropy, a quantity introduced for the Lüders–von Neumann measurement independently by Srinivas [112], Pechukas [91], Beck and Graudenz [13] and many others, see [110, p. 5685], then generalized by Życzkowski and one of the present authors (W.S.) to arbitrary classical or quantum measurements and instruments [77, 108, 110, 111], and recently rediscovered by Crutchfield and Wiesner under the name of quantum entropy rate [35].
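Formula (22) can be illustrated numerically. The following is only a minimal sketch under assumptions that are not part of the text: we take \(d=2\), a hypothetical "trine" POVM (\(k=3\)) whose Bloch vectors form an equilateral triangle, and a unitary U that acts on the Bloch sphere as a rotation R about the y-axis, so that \(\hbox {tr}\left( U\sigma _{i}U^{*}\sigma _{j}\right) =(1+Rv_{i}\cdot v_{j})/2\) (the standard qubit identity). All function names are illustrative, and indices run from 0 rather than from 1.

```python
import math
from itertools import product

K = 3  # number of outcomes; the dimension is d = 2
# Bloch vectors v_j of the hypothetical trine POVM (indices run from 0 here).
BLOCH = [(math.sin(2 * math.pi * j / K), 0.0, math.cos(2 * math.pi * j / K))
         for j in range(K)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotate_y(v, angle):
    """Rotation about the y-axis: the assumed Bloch-sphere action of U."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def outcome_prob(u, j):
    """p_j for a qubit state with Bloch vector u: (d/k) tr(rho sigma_j) = (1 + u.v_j)/k."""
    return (dot(u, BLOCH[j]) + 1.0) / K

def transition_matrix(angle):
    """p_ij: probability of j given that the preceding result was i."""
    return [[outcome_prob(rotate_y(BLOCH[i], angle), j) for j in range(K)]
            for i in range(K)]

def string_prob(u, string, p):
    """Probability (22) of the outcome string (i_1, ..., i_n)."""
    prob = outcome_prob(u, string[0])
    for a, b in zip(string, string[1:]):
        prob *= p[a][b]
    return prob

p = transition_matrix(0.3)
mixed = (0.0, 0.0, 0.0)  # Bloch vector of the maximally mixed state
# Probabilities of all strings of length 3 should add up to 1.
total = sum(string_prob(mixed, s, p) for s in product(range(K), repeat=3))
row_sums = [sum(row) for row in p]
```

In this sketch, each row of the transition matrix sums to one because the Bloch vectors of a normalized rank-1 POVM add up to zero; this is what makes (22) a consistent family of probability distributions.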

The definition of (quantum) dynamical entropy of U with respect to \(\varPi \) mimics its classical counterpart, the Kolmogorov–Sinai entropy:

$$\begin{aligned} H\left( U,\varPi \right) :=\lim _{n\rightarrow \infty }(H_{n+1}-H_{n})=\lim _{n\rightarrow \infty }H_{n}/n, \end{aligned}$$
(23)

where \(H_{n}:=\sum _{i_{1},\ldots ,i_{n}=1}^{k}\eta \left( P_{i_{1},\ldots ,i_{n} }\left( \rho _{*}\right) \right) \) for \(n\in \mathbb {N}\). The maximally mixed state \(\rho _{*}=\mathbb I/d\) plays here the role of the “stationary state” for the combined evolution. It is easy to show that the quantity is given by

$$\begin{aligned} H\left( U,\varPi \right)&=\frac{1}{k}\sum _{i,j=1}^{k}\eta \left( \left( d/k\right) \hbox {tr}\left( U\sigma _{i}U^{*}\sigma _{j}\right) \right) \nonumber \\&=\ln \left( k/d\right) +\frac{d}{k^{2}}\sum _{i,j=1}^{k}\eta \left( \hbox {tr}\left( U\sigma _{i}U^{*}\sigma _{j}\right) \right) \nonumber \\&=\ln \left( k/d\right) +\frac{d}{k^{2}}\sum _{i,j=1}^{k}\eta \left( \left| \left\langle \varphi _{i}|U|\varphi _{j}\right\rangle \right| ^{2}\right) , \end{aligned}$$
(24)

which is a special case of a much more general integral entropy formula [108]. Using (15) and (24), we see that the dynamical entropy of U is expressed as the mean entropy of measurement over output states transformed by U:

$$\begin{aligned} H\left( U,\varPi \right) =\frac{1}{k}\sum _{i=1}^{k}H(U\sigma _{i}U^{*},\varPi ) \end{aligned}$$
(25)
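The equality between the first and the last line of (24) is easy to check numerically. The sketch below is an illustration under our own assumptions (a qubit, the hypothetical trine POVM with \(k=3\), and U acting on the Bloch sphere as a rotation about the y-axis); the helper names are not taken from the text. Note that the first line of (24) is exactly the mean in (25), since \(H(U\sigma _{i}U^{*},\varPi )=\sum _{j}\eta \left( p_{ij}\right) \).

```python
import math

def eta(x):
    """Shannon function eta(x) = -x ln x, extended by eta(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log(x)

def rotate_y(v, angle):
    """Rotation about the y-axis: the assumed Bloch-sphere action of U."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def overlaps(bloch, angle):
    """t[i][j] = tr(U sigma_i U* sigma_j) = (1 + (R v_i).v_j)/2 for qubit pure states."""
    rotated = [rotate_y(v, angle) for v in bloch]
    return [[(1.0 + sum(a * b for a, b in zip(w, v))) / 2.0 for v in bloch]
            for w in rotated]

def entropy_first_line(t, d=2):
    """First line of (24), i.e. the mean entropy of measurement (25)."""
    k = len(t)
    return sum(eta(d / k * x) for row in t for x in row) / k

def entropy_last_line(t, d=2):
    """Last line of (24): ln(k/d) + (d/k^2) * sum of eta over all overlaps."""
    k = len(t)
    return math.log(k / d) + d / k ** 2 * sum(eta(x) for row in t for x in row)

TRINE = [(math.sin(2 * math.pi * j / 3), 0.0, math.cos(2 * math.pi * j / 3))
         for j in range(3)]

t = overlaps(TRINE, 0.3)
first, last = entropy_first_line(t), entropy_last_line(t)
# U = identity gives the measurement entropy H_meas of the trine POVM.
h_meas = entropy_last_line(overlaps(TRINE, 0.0))
```

For the identity rotation, the overlaps are 1 on the diagonal and 1/4 elsewhere, so this sketch should return \(\ln (3/2)+\frac{1}{3}\ln 4\) for the measurement entropy of the trine.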

There are two sources of randomness in this model: the underlying unitary dynamics and the measurement process. The latter can be measured by the quantity \(H_{\mathrm{meas}}\left( \varPi \right) :=H\left( \mathbb I,\varPi \right) \) called (quantum) measurement entropy. From (25), we get

$$\begin{aligned} H_{\mathrm{meas}}\left( \varPi \right) =\frac{1}{k}\sum _{i=1}^{k}H(\sigma _{i},\varPi ). \end{aligned}$$
(26)

If \(\varPi \) is symmetric, then all the summands in (26) are the same. Hence, in this case, the measurement entropy \(H_{\mathrm{meas}}\left( \varPi \right) \) is equal to the entropy of measurement \(H(\rho ,\varPi )\), where the input state \(\rho \) is one of the output states from S.

6.6 Entropy in the Bloch representation

Using the Bloch representation for states and normalized rank-1 POVMs (see Sect. 2), one can reformulate the problems of entropy minimization and relative entropy maximization as the problems of finding the global extrema of the corresponding function on \(B(d)\subset S^{d^2-2}\). Such reformulation significantly reduces the complexity of the problem, especially in dimension 2, since in this case B(2) is isomorphic with \(S^2\). Let \(\varPi =(\varPi _{j} )_{j=1,\ldots ,k}\) be a normalized rank-1 POVM in \({\mathbb {C}}^{d}\) such that \(\varPi _{j}=\left( d/k\right) \sigma _{j}\), \(\sigma _{j}\in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) \), and let \(B := \left\{ v_{j}|j=1,\ldots ,k\right\} \), where \(v_{j} := \sqrt{d/(d-1)}b(\sigma _{j}) \in S^{d^2-2}\) (\(j=1,\ldots ,k\)). For \(\rho \in \mathcal {S}\left( {\mathbb {C}}^{d}\right) \), \(u := \sqrt{d/(d-1)}b(\rho ) \in B^{d^2-2}\) and \(j=1,\ldots ,k\), we get from (2)

$$\begin{aligned} p_j(\rho ,\varPi )=((d-1)u\cdot v_j+1)/k. \end{aligned}$$
(27)

Applying (27), (15), and (16), we obtain

$$\begin{aligned} H_{B}(u):=H(\rho ,\varPi )=\sum _{j=1}^{k}\eta \left( \frac{(d-1)u\cdot v_{j}+1}{k}\right) =\ln \frac{k}{d}+\frac{d}{k}\sum _{j=1}^{k}h\left( u\cdot v_{j}\right) \end{aligned}$$
(28)

and

$$\begin{aligned} \widetilde{H}_{B}(u):=\widetilde{H}(\rho ,\varPi )=\ln d-\frac{d}{k}\sum _{j=1} ^{k}h\left( u\cdot v_{j}\right) , \end{aligned}$$
(29)

where the function \(h:\left[ -1/(d-1),1\right] \rightarrow \mathbb {R}^{+}\) is given by

$$\begin{aligned} h\left( t\right) :=\eta \left( \frac{(d-1)t+1}{d}\right) \end{aligned}$$
(30)

for \(-1/(d-1)\le t\le 1\). It is clear that the functions \(H_{B}\) and \(\widetilde{H}_{B}\) restricted to \(\sqrt{d/(d-1)}B(d)\) are of \(C^{2}\) class (even analytic) except at the points “orthogonal” to the points from B, in the sense that they represent states orthogonal to the states \(\sigma _j\), \(j=1,\ldots ,k\). For \(d=2\), they are just the points antipodal to the points from B. Despite the fact that the function h is non-differentiable at \(-1/(d-1)\), some standard calculations show that in dimension 2, \(H_B\) and \(\widetilde{H}_B\) are at these points of \(C^{1}\) class but not twice differentiable.
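The two expressions for \(H_{B}\) in (28) can be compared on a concrete example. The following sketch assumes \(d=2\) and the tetrahedral POVM, with Bloch vectors proportional to (1, 1, 1), \((1,-1,-1)\), \((-1,1,-1)\), \((-1,-1,1)\); the function names are ours and purely illustrative.

```python
import math

def eta(x):
    """Shannon function eta(x) = -x ln x, extended by eta(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log(x)

def h(t, d=2):
    """The function h from (30)."""
    return eta(((d - 1) * t + 1.0) / d)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

S = 1.0 / math.sqrt(3.0)
TETRAHEDRON = [(S, S, S), (S, -S, -S), (-S, S, -S), (-S, -S, S)]

def entropy_from_probabilities(u, vectors, d=2):
    """H(rho, Pi) as the Shannon entropy of the distribution (27)."""
    k = len(vectors)
    return sum(eta(((d - 1) * dot(u, v) + 1.0) / k) for v in vectors)

def entropy_from_h(u, vectors, d=2):
    """The right-hand side of (28)."""
    k = len(vectors)
    return math.log(k / d) + d / k * sum(h(dot(u, v), d) for v in vectors)

v1 = TETRAHEDRON[0]
minus_v1 = tuple(-x for x in v1)
u_generic = (0.6, 0.0, 0.8)  # an arbitrary unit vector used for the comparison

direct = entropy_from_probabilities(u_generic, TETRAHEDRON)
via_h = entropy_from_h(u_generic, TETRAHEDRON)
at_fiducial = entropy_from_h(v1, TETRAHEDRON)
at_antipode = entropy_from_h(minus_v1, TETRAHEDRON)
```

In this example, the entropy at the point antipodal to a POVM element equals \(\ln 3\), which is smaller than the value \(\frac{1}{2}\ln 12\) at the element itself; this agrees with the remark in Sect. 6.4 that the states constituting the tetrahedral POVM do not minimize the entropy.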

For \(d=2\), it follows from (28) that if B is contained in a plane L, then \(H_{B}\) attains global minima on this plane. Indeed, for \(u \in S^2\), we have \(H_B(u)=H_B(\widetilde{u})\), where \(\widetilde{u}\) is the orthogonal projection of u onto L, as \(u\cdot v_{j}=\widetilde{u}\cdot v_{j}\) for \(j=1,\ldots ,k\).

For a symmetric POVM, there exists a finite group \(G\subset O\left( d^2-1\right) \) acting transitively on B. It follows from (28) and (29) that \(H_{B}\) and \(\widetilde{H}_{B}\) are G-invariant functions on B(d) given by

$$\begin{aligned} H_{v}(u):=H_{B}(u)=\ln \frac{\left| Gv\right| }{d}+\frac{d}{\left| G\right| }\sum _{g\in G}h\left( gu\cdot v\right) \end{aligned}$$
(31)

and

$$\begin{aligned} \widetilde{H}_{v}(u):=\widetilde{H}_{B}(u)=\ln d-\frac{d}{\left| G\right| }\sum _{g\in G}h\left( gu\cdot v\right) , \end{aligned}$$
(32)

for \(u\in B(d)\), where \(v \in B=\left\{ gv:g\in G\right\} \) is the normalized Bloch vector of an arbitrary fiducial state. This fact allows us to use the theory of solving symmetric variational problems developed by Louis Michel and others in the 1970s, and applied since then in many physical contexts, especially in analyzing the spontaneous symmetry breaking phenomenon [87].

7 Local extrema of entropy for symmetric POVMs in dimension 2

We start by quoting several theorems concerning smooth actions of finite groups on finite-dimensional manifolds. They are usually formulated for compact Lie groups, but since finite groups are zero-dimensional Lie groups, the results apply equally well in this case.

Let G be a finite group of \(C^{1}\) maps acting on a compact finite-dimensional manifold M. In the set of strata, consider the order \(\prec \) introduced in Sect. 3. Then,

Theorem

(Montgomery and Yang [88, Theorem 4a], [48, 89]) The set of strata is finite. There exists a unique minimal stratum, comprising elements of trivial stabilizers, that is open and dense in M, called generic or principal. For every \(x\in M\), the set \(\bigcup \left\{ \Sigma _{u}:u\in M,~\Sigma _{x}\preceq \Sigma _{u}\right\} \) is closed in M; in particular, the maximal strata are closed.

The next result tells us where we should look for the critical points of an invariant function, i.e., the points where its gradient vanishes: We have to focus on the maximal strata of G action on M.

Theorem

(Michel [88, Corollary 4.3], [86]) Let \(F:M\rightarrow \mathbb {R}\) be a G-invariant \(C^{1}\) map, and let \(\Sigma \) be a maximal stratum. Then,

  1. \(\Sigma \) contains some critical points of F;

  2. if \(\Sigma \) is finite, then all its elements are critical points of F.

Such points are called inert states in the physics literature; they are critical regardless of the exact form of F, see, e.g., [127]. Of course, an invariant function can have other critical points than those guaranteed by the above theorem (non-inert states). However, we shall see that for highly symmetric POVMs in dimension 2, the global minima of the entropy function \(H_{B}\) always lie on maximal strata. Although Michel’s theorem indicates a special character of the points with maximal stabilizer, it does not give us any information about the nature of these critical points. In some cases, we can apply the following result:

Theorem

(Modern Purkiss Principle [122, p. 385]) Let \(F:M\rightarrow \mathbb {R}\) be a G-invariant \(C^{2}\) map, and let \(u\in M\). Assume that the action of the linear isotropy group \(\left\{ T_{u}h:h\in G_{u}\right\} \) on \(T_{u}M\) is irreducible. Then, u is a critical point of F, which is either degenerate (i.e., the Hessian of F is singular at u) or a local extremum of F.

For a finite group \(G\subset \text {O}(3)\) acting irreducibly on the sphere \(S^2\), points lying on its rotation axes form the maximal strata, and so, it follows from Michel’s theorem that they are critical for entropy functions. We can divide them into three categories depending on whether they are antipodal to the elements of the fiducial vector’s orbit (type I) and, if not, whether their stabilizers act irreducibly on the tangent space (type II) or not (type III). In the first case, as well as, generically, in the second case, we can determine the character of the critical point using the following proposition:

Proposition 7

Let \(u\in S^{2}\) be a point lying on a rotation axis of the group \(G\subset \text {O}(3)\) acting irreducibly on the sphere \(S^{2}\), and let v denote the normalized Bloch vector of an arbitrary fiducial state for a rank-1 G-covariant POVM. Then:

  1. If there exists \(g\in G\) such that \(u=-gv\), then u is a local minimizer (respectively, maximizer) for \(H_{v}\) (respectively, \(\widetilde{H}_{v}\));

  2. If \(u\ne -gv\) for every \(g\in G\) and the linear isotropy group \(\left\{ T_{u}g:g\in G_{u}\right\} \) acts irreducibly on \(T_{u}S^{2}\) (or, equivalently, \(G_{u}\) contains a cyclic subgroup of order greater than 2), then:

    (a) if

      $$\begin{aligned} \frac{2}{|G/G_{u}|}\sum _{\left[ h\right] \in G/G_{u}}(hu\cdot v)\ln (1+hu\cdot v)>1, \end{aligned}$$
      (33)

      then u is a local minimizer (respectively, maximizer) for \(H_{v}\) (respectively, \(\widetilde{H}_{v}\)),

    (b) if

      $$\begin{aligned} \frac{2}{|G/G_{u}|}\sum _{\left[ h\right] \in G/G_{u}}(hu\cdot v)\ln (1+hu\cdot v)<1, \end{aligned}$$
      (34)

      then u is a local maximizer (respectively, minimizer) for \(H_{v}\) (respectively, \(\widetilde{H}_{v}\)).

Proof

Fix any geodesic (i.e., a great circle) passing through u. Let q be one of two vectors lying on the intersection of the plane orthogonal to u passing through 0 and the geodesic. As 0 is the only G-invariant vector in \(\mathbb {R}^{3}\) and, at the same time, the only \(G_{u}\)-invariant element orthogonal to u, we have \((1/|G_u|)\sum _{g\in G}gu=\sum _{\left[ h\right] \in G/G_{u}}hu=0={\textstyle \sum _{g\in G_{u}}}gq\). Consider a natural parametrization of the great circle \(\gamma :(-\pi ,\pi )\rightarrow S^{2}\) (throwing away \(-u\)) given by \(\gamma \left( \delta \right) :=(\sin \delta )q+(\cos \delta )u\) for \(\delta \in (-\pi ,\pi )\), where \(\delta \) is the measure of the angle between vectors u and \(\gamma \left( \delta \right) \). Put \(w:=\gamma \left( \delta \right) \). Then, it follows from (30), (31) and the equality \(\sum _{g\in G} gw=0\) that

$$\begin{aligned} \left( H_{v}\circ \gamma \right) \left( \delta \right)&=H_{v}(w)\\&= \ln \frac{|Gv|}{2}+\frac{2}{|G|} \sum _{g \in G} \eta ( (1+gw\cdot v)/2)\\&=\ln {|Gv|}+\frac{1}{|G|}\sum _{g\in G}\eta (1+gw\cdot v)\\&=\ln {|Gv|}+\frac{1}{|G|}\sum _{\left[ h\right] \in G/G_{u}}\sum _{g\in G_u}\eta (1+hgw\cdot v)\\&=\ln {|Gv|}+\frac{1}{|G|}\sum _{\left[ h\right] \in G/G_{u}}\sum _{g\in G_u}\eta (1+(\sin \delta )hgq\cdot v+(\cos \delta )hu\cdot v)\\&=\ln {|Gv|}+\frac{1}{|G/G_{u}|}\sum _{\left[ h\right] \in G/G_{u}} f_{h}(\delta ), \end{aligned}$$

where

$$\begin{aligned} f_{h}(\delta ):=\frac{1}{|G_{u}|}\sum _{g\in G_{u}}\eta (1+(\sin \delta )hgq\cdot v+(\cos \delta )hu\cdot v). \end{aligned}$$

Let \(h\in G\) be such that \(hu\ne -v\). Then, for \(\delta \) small enough, we get

$$\begin{aligned} f_{h}^{\prime }(\delta )&=\frac{1}{|G_{u}|}\sum _{g\in G_{u}}\eta ^{\prime }(1+(\sin \delta )hgq\cdot v+(\cos \delta )hu\cdot v)\\&\quad \times ((\cos \delta )hgq\cdot v-(\sin \delta )hu\cdot v). \end{aligned}$$

In particular, \(f_{h}^{\prime }(0)=0\). Moreover,

$$\begin{aligned} f_{h}^{\prime \prime }(\delta )&=\frac{1}{|G_{u}|}\sum _{g\in G_{u}} \Big (\eta ^{\prime \prime }(1+(\sin \delta )hgq\cdot v+(\cos \delta )hu\cdot v)\\&\quad \times ((\cos \delta )hgq\cdot v-(\sin \delta )hu\cdot v)^{2}\\&\quad +\eta ^{\prime }(1+(\sin \delta )hgq\cdot v+(\cos \delta )hu\cdot v)(-(\sin \delta )hgq\cdot v-(\cos \delta )hu\cdot v)\Big ). \end{aligned}$$

(1) Let \(\widetilde{h}u=-v\) for some \(\widetilde{h}\in G\). Then, \(\widetilde{h}gq\cdot v=0\) and so \(f_{\widetilde{h}}(\delta )\) reduces to

$$\begin{aligned} f_{\widetilde{h}}(\delta )=\frac{1}{|G_{u}|} \sum _{g\in G_u}\eta (1-\cos \delta ). \end{aligned}$$

In consequence, for \(\delta \ne 0\)

$$\begin{aligned} f_{\widetilde{h}}^{\prime }(\delta )=-(\ln (1-\cos \delta )+1) \sin \delta , \end{aligned}$$

and so

$$\begin{aligned} f_{\widetilde{h}}^{\prime \prime }(\delta )=-1-(\cos \delta )(\ln (1-\cos \delta )+2). \end{aligned}$$

Since \(f_{\widetilde{h}}^{\prime }(\delta )\rightarrow 0\) as \(\delta \rightarrow 0\), we have \(f_{\widetilde{h}}^{\prime }(0)=0\). Moreover, \(f_{\widetilde{h}}^{\prime \prime }(\delta )\rightarrow \infty \) as \(\delta \rightarrow 0\). Let us observe that there exists \(1>c>0\), such that the inequality \(hu\cdot v\ge -1+c\) holds for any \([h]\in G/G_{u}\),\(\ \left[ h\right] \ne [\widetilde{h}]\). Now, we can estimate \(|f_{h}^{\prime \prime }(\delta )|\) as follows:

$$\begin{aligned} |f_{h}^{\prime \prime }(\delta )|&\le \frac{1}{|G_{u}|}\sum _{g\in G_{u} }\left| \frac{((\cos \delta )hgq\cdot v-(\sin \delta )hu\cdot v)^{2}}{1+hgw\cdot v}\right| \\&\quad +\frac{1}{|G_{u}|}\sum _{g\in G_{u}}\left| (\ln (1+hgw\cdot v)+1)(hgw\cdot v)\right| \\&\le \frac{1}{|G_{u}|}\sum _{g\in G_{u}}\left( \frac{1}{\left| 1+hgw\cdot v\right| }+(|\ln (1+hgw\cdot v)|+1)\right) \\&\le f\left( 1-|\sin \delta |+(c-1)\cos \delta \right) , \end{aligned}$$

for \(\left| \delta \right| <c\), where \(f\left( x\right) :=\frac{1}{\left| x\right| }+\left| \ln x\right| +1\) for \(x>0\). The last inequality follows from the fact that f is decreasing in (0, 1), \(1 + hgw\cdot v \ge 1 - |\sin \delta | + (c-1) \cos \delta \) and \(1 - |\sin \delta | + (c-1) \cos \delta \ge 0\) for \(|\delta | < c\).

Thus,

$$\begin{aligned} (H_{v}\circ \gamma )^{\prime \prime }(\delta )&=\frac{1}{|G/G_{u} |}\Big (f_{\widetilde{h}}^{\prime \prime }(\delta )+\sum _{[h]\in G/G_{u},[h]\ne [\widetilde{h}]}f_{h}^{\prime \prime }(\delta )\Big )\\&\ge g\left( \delta \right) \overset{\delta \rightarrow 0}{\longrightarrow }+\infty , \end{aligned}$$

where

$$\begin{aligned} g\left( \delta \right) :=-\frac{1+(\cos \delta )(\ln (1{-}\cos \delta )+2)+(|G/G_{u}|{-}1)f(1{-}\sin \delta +(c{-}1)\cos \delta )}{|G/G_{u}|} \end{aligned}$$

for \(\delta >0\). In particular, \((H_v \circ \gamma )'(0) = 0\) and there is \(\varepsilon >0\) such that \((H_{v} \circ \gamma )^{\prime \prime }(\delta )>0\) for \(0<\left| \delta \right| <\varepsilon \). Hence, one can find a neighborhood \(\mathcal {V}\subset S^{2}\) of u such that for any geodesic passing through u, \(H_{v}\) is strictly convex on its part contained in \(\mathcal {V}\) and has a minimum at u. Consequently, \(H_{v}(u)<H_{v}(w)\) for every \(w\in \mathcal {V}\), \(w\ne u\), which completes the proof of (1).

(2) Assume that \(u\ne -gv\) for every \(g\in G\) and the linear isotropy group \(\left\{ T_{u}g:g\in G_{u}\right\} \) acts irreducibly on \(T_{u}S^{2}\). Then, for every \(h\in G\), we have

$$\begin{aligned} f_{h}^{\prime \prime }(0)&=\frac{1}{|G_{u}|}\sum _{g\in G_{u}}(\eta ^{\prime \prime }(1+hu\cdot v)(hgq\cdot v)^{2}+\eta ^{\prime }(1+hu\cdot v)(-hu\cdot v))\\&=\eta ^{\prime }(1+hu\cdot v)(-hu\cdot v)+\eta ^{\prime \prime }(1+hu\cdot v)\frac{1}{|G_{u}|}\sum _{g\in G_{u}}(hgq\cdot v)^{2}\\&=(hu\cdot v)(\ln (1+hu\cdot v)+1)-\frac{1}{1+hu\cdot v}\frac{1}{2}(1-(hu\cdot v)^{2})\\&=(hu\cdot v)(\ln (1+hu\cdot v)+3/2)-1/2, \end{aligned}$$

where the penultimate equality follows from the fact that \(\{hgq:g\in G_{u}\}\) is a normalized tight frame in \(S^{2}\) contained in the plane orthogonal to hu for each \(h\in G\). Thus, we obtain

$$\begin{aligned} (H_{v}\circ \gamma )^{\prime \prime }(0)=\frac{1}{|G/G_{u}|}\sum _{\left[ h\right] \in G/G_{u}}(hu\cdot v)\ln (1+hu\cdot v)-1/2 \end{aligned}$$

and (2) follows from the Modern Purkiss principle.
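Conditions (33) and (34) are straightforward to evaluate numerically. The sketch below is only an illustration: it assumes the octahedral POVM with fiducial Bloch vector \(v=(0,0,1)\) and takes for u a vertex of the cube (a point on a threefold axis that is not antipodal to any element of the orbit of v). Since the cosets in \(G/G_{u}\) correspond bijectively to the points of the orbit Gu, the sum over \(\left[ h\right] \in G/G_{u}\) is replaced by a sum over the eight cube vertices. The function names are ours.

```python
import math
from itertools import product

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def second_order_indicator(orbit_of_u, v):
    """Left-hand side of (33) and (34), summed over the orbit Gu instead of G/G_u."""
    total = 0.0
    for w in orbit_of_u:
        t = dot(w, v)
        total += t * math.log1p(t)  # requires w != -v, i.e. t > -1
    return 2.0 * total / len(orbit_of_u)

V = (0.0, 0.0, 1.0)  # assumed fiducial Bloch vector: a vertex of the octahedron
S = 1.0 / math.sqrt(3.0)
# Orbit of u: the eight vertices of the cube inscribed in the unit sphere.
CUBE = [(a * S, b * S, c * S) for a, b, c in product((1, -1), repeat=3)]

lhs = second_order_indicator(CUBE, V)
if lhs > 1.0:
    kind = "local minimizer"
elif lhs < 1.0:
    kind = "local maximizer"
else:
    kind = "undecided"
```

In this case, the left-hand side reduces to \(\frac{1}{\sqrt{3}}\ln \frac{1+1/\sqrt{3}}{1-1/\sqrt{3}}\), which is smaller than 1, so by Proposition 7 the cube vertices are local maximizers of \(H_{v}\) for the octahedral POVM.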

If \(\varPi \) is an HS-POVM, we can assume that G is one of the following groups: \(D_{nh}\), \(T_{d}\), \(O_{h}\) or \(I_{h}\), and the Bloch vector v of the fiducial vector lies in the maximal strata, consisting of points where the rotation axes of the group intersect the Bloch sphere. For the \(D_{nh}\) group, we have one n-fold and n twofold rotation axes (\(2n+2\) points: a digon and two regular n-gons); for the \(T_{d}\) group: three twofold, four threefold rotation axes (14 points: an octahedron and two dual tetrahedra); for the \(O_{h}\) group: six twofold, four threefold, three fourfold rotation axes (26 points: a cuboctahedron, a cube and an octahedron); for the \(I_{h}\) group: fifteen twofold, ten threefold, six fivefold rotation axes (62 points: an icosidodecahedron, a dodecahedron, and an icosahedron). The character of these singularities is described by the following proposition.

Proposition 8

In the situation above, singular points of type I are minima (respectively, maxima), of type II maxima (respectively, minima), and of type III saddle points for \(H_{B}\) (respectively, \(\widetilde{H}_{B}\)).

The proof of this fact is quite elementary. From Proposition 7.1, we deduce the character of singular points of type I. For type II, it is enough to use Proposition 7.2. For type III, one has to indicate two great circles such that the second derivatives along these curves have different signs. As we will not use this fact in the sequel, we omit the details.

Hence, the points of type I are the natural candidates for minimizing \(H_{B}\) (respectively, maximizing \(\widetilde{H}_{B}\)), and indeed, we will show in the next section that they are global minimizers (respectively, maximizers). However, if a POVM is merely symmetric, the global extrema of entropy functions may also occur at other (non-inert) points. An example of this phenomenon can be found in [52], see also [19, 136]. Let us consider a symmetric (but not highly symmetric) POVM generated by the set of four Bloch vectors forming a rectangle \(B=\left\{ v_{1},-v_{1},v_{2},-v_{2}\right\} \), where \(v_{1},v_{2}\in S^{2}\), \(v_{1}\notin \left\{ -v_{2},v_{2}\right\} \), and \(v_{1}\cdot v_{2}\ne 0\), with \(\hbox {Sym}\left( B\right) \simeq D_{2h}\) having three mutually perpendicular twofold rotation axes. In this way, we get six vectors in \(S^{2}\) that are necessarily critical for \(H_{B}\) and \(\widetilde{H}_{B}\): two perpendicular both to \(v_{1}\) and to \(v_{2}\), and four lying in the plane generated by \(v_{1}\) and \(v_{2}\), proportional to \(\pm v_{1}\pm v_{2}\). The former are local maxima of \(H_{B}\), and the latter either local minima or saddle points, depending on the value of the parameter \(\alpha :=\arccos \left( v_{1}\cdot v_{2}\right) \in \left( 0,\pi \right) \), \(\alpha \ne \pi /2\). Let \(\overline{\alpha }\approx 1.17056\) be the unique solution of the equation \((\cos (\overline{\alpha }/2))\ln (\tan ^{2}(\overline{\alpha }/4)) =-2\) in the interval \(\left( 0,\pi /2\right) \). In [52], the authors showed that for \(\alpha \in \left( 0,\overline{\alpha }\right] \), the function \(H_{B}\) (respectively, \(\widetilde{H}_{B}\)) attains the global minimum (respectively, maximum) at the points \(\pm \left( v_{1} +v_{2}\right) /\left( 2\left| \cos \left( \alpha /2\right) \right| \right) \), whereas \(\pm \left( v_{1}-v_{2}\right) /\left( 2\left| \sin \left( \alpha /2\right) \right| \right) \) are saddle points, and for \(\alpha \in \left[ \pi -\overline{\alpha },\pi \right) \), the situation is reversed. However, for \(\alpha \in \left( \overline{\alpha },\pi -\overline{\alpha }\right) \), all these inert states become saddles, and two pairs of new global minimizers emerge, lying symmetrically with respect to the old ones. The appearance of this pitchfork bifurcation phenomenon shows also that, in general, one cannot expect an analytic solution of the minimization problem in a merely symmetric case. This is why we restrict our attention to highly symmetric POVMs.
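The threshold value \(\overline{\alpha }\) quoted above can be reproduced with a few lines of code. The following sketch uses plain bisection on the interval \(\left( 0,\pi /2\right) \); the bracket endpoints, the tolerance and the function names are our own choices and not part of [52].

```python
import math

def threshold_equation(alpha):
    """cos(alpha/2) * ln(tan^2(alpha/4)) + 2, which vanishes at the threshold."""
    return math.cos(alpha / 2.0) * math.log(math.tan(alpha / 4.0) ** 2) + 2.0

def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0.0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# The left endpoint is shifted slightly away from 0, where the logarithm diverges.
alpha_bar = bisect(threshold_equation, 1e-6, math.pi / 2.0)
```

The computed root should agree with the value \(\overline{\alpha }\approx 1.17056\) given in the text to the quoted precision.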

Note also that for highly symmetric POVMs, we can use, instead of the full symmetry group \(\hbox {Sym}\left( B\right) \), the rotational symmetry group of B that acts transitively on B, i.e., \(C_{n}\) for the regular n-gon, T for the tetrahedron, O for the cuboctahedron, cube and octahedron, and I for the icosidodecahedron, dodecahedron, and icosahedron.

8 Global minima of entropy for highly symmetric POVMs in dimension 2

8.1 The minimization method based on the Hermite interpolation

In order to prove that the points antipodal to the Bloch vectors of POVM elements are not only local but also global minimizers, we shall use a method based on the Hermite interpolation.

Consider a sequence of points \(a\le t_{1}<t_{2}<\cdots <t_{m}\le b\), a sequence of positive integers \(k_{1},k_{2},\ldots ,k_{m}\), and a real-valued function \(f\in C^{D}([a,b])\), where \(D:=k_{1}+k_{2}+\cdots +k_{m}\). We are looking for a polynomial p of degree less than D that agrees with f at \(t_{i}\) up to a derivative of order \(k_{i}-1\) (for \(1\le i\le m\)), that is,

$$\begin{aligned} p^{(k)}(t_{i})=f^{(k)}(t_{i}),\qquad 0\le k<k_{i}. \end{aligned}$$
(35)

The existence and uniqueness of such a polynomial follow from the injectivity (and hence also the surjectivity) of a linear map \(\varPhi :\mathbb {R}_{<D}\left[ X\right] \rightarrow \mathbb {R}^{D}\) given by \(\varPhi \left( p\right) :=(p(t_{1}),p^{\prime }(t_{1}),\ldots ,p^{(k_{1}-1)}(t_{1}),\ldots ,p(t_{m}),\ldots ,p^{(k_{m}-1)}(t_{m}))\). We will also use the following well-known formula for the error in Hermite interpolation [113, Sect. 2.1.5]:

Lemma 1

For each \(t\in \left( a,b\right) \), there exists \(\xi \in \left( a,b\right) \) such that \(\min \{t,t_{1}\}<\xi <\max \{t,t_{m}\}\) and

$$\begin{aligned} f(t)-p(t)=\frac{f^{(D)}(\xi )}{D!}\prod _{i=1}^{m}(t-t_{i})^{k_{i}}. \end{aligned}$$
(36)

Now, we apply this general method in our situation. We will interpolate the function \(h:\left[ -1,1\right] \rightarrow \mathbb {R}^{+}\) defined by (30), choosing the interpolation points from the set \(T:=\{-gv\cdot v|g\in G\}\subset [-1,1]\), where v is the Bloch vector representation of the fiducial vector, and \(-v\) is supposed to be the Bloch vector of a global minimizer. We must distinguish two situations: either the inversion \(-I\in G\) (equivalently \(-v\in B\)) or not. The former is the case for \(G=D_{nh}\) (for even n), \(O_{h}\), \(I_{h}\), and then \(1\in T\), the latter for \(G=D_{nh}\) (for odd n), \(T_{d}\), and then \(1\notin T\). After reordering the elements of T, we obtain an increasing sequence \(\{t_{i}\}_{i=1}^{m}\), where \(m := \left| T\right| \). In particular, \(t_{1}=-1\). We are looking for a polynomial \(p_{v}\) that matches the values of h at all points from T and the values of \(h^{\prime }\) at all points but \(-1\) and, possibly, 1, if \(1\in T\), i.e., such that (35) holds for \(f=h\) with

$$\begin{aligned} k_{i}:=\left\{ \begin{array}{l@{\quad }l} 1, &{} \hbox {if}\,t_{i}\in \left\{ {-1,1}\right\} \\ 2, &{} \hbox {otherwise} \end{array}. \right. \end{aligned}$$
(37)

Then, \(\deg p_{v}<D(v):=2m-2\), if \(1\in T\), and \(\deg p_{v}<D(v):=2m-1\), otherwise. Though h is not differentiable at \(t_1\), we can still use (36) to estimate the interpolation error, as the proof of Lemma 1 is based on repeated use of Rolle’s theorem.

If \(1\in T\), then \(t_{m}=1\), and we have

$$\begin{aligned} \prod _{i=1}^{m}(t-t_{i})^{k_{i}}=(t+1)(t-1) \prod _{i=2}^{m-1}(t-t_{i})^{2} \le 0, \end{aligned}$$
(38)

for \(t\in \left[ -1,1\right] \). Similarly, if \(1\notin T\), then \(t_{m}<1\) and

$$\begin{aligned} \prod _{i=1}^{m}(t-t_{i})^{k_{i}}=(t+1) \prod _{i=2}^{m}(t-t_{i})^{2}\ge 0 \end{aligned}$$
(39)

for \(t\in \left[ -1,1\right] \). Moreover, the inequalities above turn into equalities only for \(t\in T\). Furthermore, as all the derivatives of h of even order are strictly negative in \(\left( -1,1\right) \) and those of odd order greater than 1 are strictly positive, we get

$$\begin{aligned} h^{(D\left( v\right) )}(\xi )=\left\{ \begin{array}{l@{\quad }l} h^{(2m-2)}(\xi )<0, &{} \hbox {if}\, 1\in T\\ h^{(2m-1)}(\xi )>0, &{} \hbox {if}\, 1\notin T \end{array} \right. \end{aligned}$$
(40)

for each \(\xi \in \left( -1,1\right) \). Hence, and from Lemma 1, the interpolating polynomial \(p_{v}\) fulfills \(p_{v}(t)=h(t)\) if and only if \(t\in T\) and it interpolates h from below, see the illustration of this for the octahedral POVM in Fig. 1.

Fig. 1 Cubic polynomial function \(p_{v}\) (gray) interpolating h (black) from below for the octahedral measurement, with \(t_{1}=-1\), \(t_{2}=0\) and \(t_{3}=1\)
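The interpolation from below can be checked numerically for the octahedral measurement, where \(T=\{-1,0,1\}\) and the multiplicities (37) are 1, 2, 1. The sketch below builds the Hermite interpolant by Newton divided differences with repeated nodes, which is one standard construction and not necessarily the one used by the authors; the derivative \(h^{\prime }(t)=-(\ln ((1+t)/2)+1)/2\) for \(d=2\) is computed by us from (30), and all names are illustrative.

```python
import math

def eta(x):
    """Shannon function eta(x) = -x ln x, extended by eta(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log(x)

def h(t):
    """The function h from (30) for d = 2."""
    return eta((1.0 + t) / 2.0)

def dh(t):
    """Derivative of h for d = 2, valid for t > -1."""
    return -(math.log((1.0 + t) / 2.0) + 1.0) / 2.0

def hermite_coefficients(points, multiplicities):
    """Newton coefficients of the Hermite interpolant of h (multiplicities 1 or 2)."""
    z = [t for t, k in zip(points, multiplicities) for _ in range(k)]
    n = len(z)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        table[i][0] = h(z[i])
    for j in range(1, n):
        for i in range(j, n):
            if z[i] == z[i - j]:
                # Repeated node (only possible for j = 1 here): use the derivative.
                table[i][j] = dh(z[i])
            else:
                table[i][j] = (table[i][j - 1] - table[i - 1][j - 1]) / (z[i] - z[i - j])
    return z, [table[j][j] for j in range(n)]

def evaluate(z, coeffs, t):
    """Evaluate the Newton form by a Horner-like scheme."""
    result = coeffs[-1]
    for j in range(len(coeffs) - 2, -1, -1):
        result = result * (t - z[j]) + coeffs[j]
    return result

T = [-1.0, 0.0, 1.0]
z, c = hermite_coefficients(T, [1, 2, 1])
grid = [-1.0 + 2.0 * i / 400 for i in range(401)]
# Largest value of p_v - h on the grid; it should not be positive up to rounding.
max_excess = max(evaluate(z, c, t) - h(t) for t in grid)
```

Solving the four interpolation conditions by hand gives, in this case, \(p_{v}(t)=(1-t^{2})\left( \frac{\ln 2}{2}+\frac{\ln 2-1}{2}\,t\right) \), which the sketch reproduces.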

Let us define now a G-invariant polynomial function \(P_{v}:\mathbb {R} ^{3}\rightarrow \mathbb {R}\) replacing h in (31) by its interpolation polynomial \(p_{v}\), i.e.,

$$\begin{aligned} P_{v}(u):=\ln \frac{|Gv|}{2}+\frac{2}{|G|}\sum _{g\in G}p_{v}(gv\cdot u) \end{aligned}$$
(41)

for \(u\in \mathbb {R}^{3}\). Combining the above facts, we get

$$\begin{aligned} H_{v}(u)=\ln \frac{|Gv|}{2}+\frac{2}{|G|}\sum _{g\in G}h(gv\cdot u)\ge \ln \frac{|Gv|}{2}+\frac{2}{|G|}\sum _{g\in G}p_{v}(gv\cdot u)=P_{v}(u) \end{aligned}$$
(42)

for \(u\in S^{2}\), and \(H_{v}(-gv)=P_{v}(-gv)\) for \(g\in G\).

In consequence, now it is enough to show that \(-v\) is a global minimizer of \(P_{v}\) (and hence all the elements of its orbit \(\left\{ -gv:g\in G\right\} \) are), because then we have \(H_{v}(u)\ge P_{v}(u)\ge P_{v}(-v)=H_{v}(-v)\) for all \(u\in S^{2}\). This method of finding global minima was inspired by the one used in [90] for \(G=T_{d}\), where h is constant. Note, however, that a similar technique was used by Cohn, Kumar, and Woo [30, 31] to solve the problem of potential energy minimization on the unit sphere. The whole idea can be traced back even further to [128] and [2]. Our method has already been adapted in [5] to provide some upper bounds for the informational power of t-design POVMs for \(1 \le t \le 5\).
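The chain of inequalities \(H_{v}(u)\ge P_{v}(u)\ge P_{v}(-v)=H_{v}(-v)\) can be illustrated for the octahedral POVM. The sketch below rests on our own assumptions: the cubic \(p_{v}(t)=(1-t^{2})(a+bt)\) with \(a=h(0)\) and \(b=h^{\prime }(0)\) is obtained by solving the interpolation conditions by hand, the sum over G in (41) and (42) is replaced by a sum over the orbit \(B=Gv\) (each orbit point is hit \(\left| G_{v}\right| \) times), and the sphere is sampled on a simple latitude-longitude grid.

```python
import math

A = math.log(2.0) / 2.0          # h(0)
B = (math.log(2.0) - 1.0) / 2.0  # h'(0), computed by us from (30) with d = 2

def eta(x):
    """Shannon function eta(x) = -x ln x, extended by eta(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log(x)

def h(t):
    return eta((1.0 + t) / 2.0)

def p_v(t):
    """Assumed interpolating cubic for T = {-1, 0, 1} (octahedral case)."""
    return (1.0 - t * t) * (A + B * t)

OCTAHEDRON = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orbit_sum(func, u):
    """ln(|Gv|/2) + (2/|Gv|) * sum of func(b.u) over the orbit B = Gv."""
    k = len(OCTAHEDRON)
    return math.log(k / 2.0) + 2.0 / k * sum(func(dot(b, u)) for b in OCTAHEDRON)

def sphere_grid(n_theta=40, n_phi=80):
    for i in range(n_theta + 1):
        theta = math.pi * i / n_theta
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            yield (math.sin(theta) * math.cos(phi),
                   math.sin(theta) * math.sin(phi),
                   math.cos(theta))

points = list(sphere_grid())
min_gap = min(orbit_sum(h, u) - orbit_sum(p_v, u) for u in points)  # H_v - P_v
p_values = [orbit_sum(p_v, u) for u in points]
minus_v = (0.0, 0.0, -1.0)
h_at_minus_v = orbit_sum(h, minus_v)
p_at_minus_v = orbit_sum(p_v, minus_v)
```

In this particular example, the odd part of \(p_{v}\) cancels over the octahedron and \(P_{v}\) turns out to be constant on the sphere, equal to \(\ln 3+\frac{2}{3}\ln 2\), so the bound \(H_{v}(u)\ge H_{v}(-v)\) follows at once.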

Of course, the lower the degree of the interpolating polynomial \(p_{v}\) is, the easier it is to find the minima of \(P_{v}\), as \(\deg P_{v}\le \deg p_{v}\). The latter quantity in turn depends on the cardinality of \(T:=\{-gv\cdot v|g\in G\}\) that can be calculated by analyzing double cosets of isotropy subgroups of any subgroup \(K\subset G\cap SO(3)\) acting transitively on B, because \(T=\{-gv\cdot v|g\in K\}\) and for \(h,g\in K\), if h is in a double coset \(K_{v}gK_{v}\) or \(K_{v}g^{-1}K_{v}\), then \(hv\cdot v=gv\cdot v\). Hence, \(\left| T\right| \le n(v):=n_{s}(v)+\frac{1}{2}n_{a}(v)\), where \(n_{s}(v)\) is the number of self-inverse double cosets of \(K_{v}\), i.e., the cosets fulfilling \(K_{v}gK_{v}=K_{v}g^{-1}K_{v}\), and \(n_{a}(v)\) is the number of non-self-inverse ones. Thus,

$$\begin{aligned} \deg p_{v}\le \left\{ \begin{array}{l@{\quad }l} 2n(v)-3, &{} \hbox {if}\, -v\in Kv\\ 2n(v)-2, &{} \hbox {if}\, -v\notin Kv \end{array} \right. . \end{aligned}$$
(43)

Moreover, for \(g\in K\), using the well-known formula for the cardinality of a double coset, see, e.g., [18, Prop. 5.1.3], we have \(\left| K_{v}gK_{v}\right| =\left| K_{v}\right| \left| K_{v}/\left( K_{v}\cap K_{gv}\right) \right| =\left| K_{v}\right| \), if \(gv=v\) or \(gv=-v\), and \(\left| K_{v}\right| ^{2}\), otherwise. Hence, if \(-v\in Kv\), then \(\left| Kv\right| \left| K_{v}\right| =\left| K\right| =2\left| K_{v}\right| +\left( n_{s}(v)-2\right) \left| K_{v}\right| ^{2}+n_{a}(v)\left| K_{v}\right| ^{2}\), and so \(n_{s}(v)+n_{a}(v)=\left( \left| Kv\right| -2\right) /\left| K_{v}\right| +2\). Analogously, if \(-v\notin Kv\), then we have \(n_{s}(v)+n_{a}(v)=\left( \left| Kv\right| -1\right) /\left| K_{v}\right| +1\). Using these facts and (43), we get finally

$$\begin{aligned} \deg p_{v}\le \left\{ \begin{array}{l@{\quad }l} \frac{\left| Kv\right| -2}{\left| K_{v}\right| }+n_{s}(v)-1, &{} \hbox {if}\, -v\in Kv \\ \frac{\left| Kv\right| -1}{\left| K_{v}\right| }+n_{s}(v)-1, &{} \hbox {if}\, -v\notin Kv \end{array} \right. . \end{aligned}$$
(44)

Applying (44) to HS-POVMs in dimension 2, we get the upper bounds for the degree of interpolating polynomials gathered in Table 2.
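The bound (44) is a simple arithmetic expression, and the sketch below evaluates it for two cases. The input data are our own counts and should be read as assumptions rather than as entries of Table 2: for the octahedron under the rotation group O we take \(\left| Kv\right| =6\), \(\left| K_{v}\right| =4\), \(n_{s}(v)=3\) and \(-v\in Kv\); for the tetrahedron under T we take \(\left| Kv\right| =4\), \(\left| K_{v}\right| =3\), \(n_{s}(v)=2\) and \(-v\notin Kv\). The function name is hypothetical.

```python
from fractions import Fraction

def degree_bound(orbit_size, stabilizer_order, n_self_inverse, antipode_in_orbit):
    """Right-hand side of (44) for given |Kv|, |K_v|, n_s(v) and the case -v in Kv or not."""
    offset = 2 if antipode_in_orbit else 1
    return Fraction(orbit_size - offset, stabilizer_order) + n_self_inverse - 1

# Octahedron, K = O (assumed data): T = {-1, 0, 1}, so three self-inverse double cosets.
octahedron_bound = degree_bound(6, 4, 3, True)
# Tetrahedron, K = T (assumed data): T = {-1, 1/3}, so two self-inverse double cosets.
tetrahedron_bound = degree_bound(4, 3, 2, False)
```

Under these assumptions, the bound for the octahedron equals 3, in agreement with the cubic interpolating polynomial shown in Fig. 1.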

Table 2 HS-POVMs in dimension 2: upper bounds for the number of interpolating points (n(v)) and the degree of interpolating polynomial

To find global minimizers of \(P_{v}\), we can express the polynomial in terms of primary and secondary invariants for the corresponding ring of G-invariant polynomials. In fact, as we will see in the next section, only the former will be used.

8.2 Group-invariant polynomials

The material of this subsection is taken from [43, Ch. 3] and [67], see also [50]. Let G be a finite subgroup of the general linear group \(GL_{n}\left( \mathbb {R}\right) \). By \(\mathbb {R}\left[ x_{1},\ldots ,x_{n}\right] ^{G}\), we denote the ring of G-invariant real polynomials in n variables. Its properties were studied by Hilbert and Noether at the beginning of the twentieth century. In particular, they showed that \(\mathbb {R} \left[ x_{1},\ldots ,x_{n}\right] ^{G}\) is finitely generated as an \(\mathbb {R}\)-algebra. Later, it was proven that it is possible to represent each G-invariant polynomial in the form \({\sum _{j=1}^{m}} P_{j}\left( \theta _{1},\ldots ,\theta _{n}\right) \eta _{j}\), where \(\theta _{1},\ldots ,\theta _{n}\) are algebraically independent homogeneous G-invariant polynomials called primary invariants, forming a so-called homogeneous system of parameters, \(\eta _{1}=1,\ldots ,\eta _{m}\) are G-invariant homogeneous polynomials called secondary invariants, and \(P_{j}\) (\(j=1,\ldots ,m\)) are elements from \(\mathbb {R}\left[ x_{1} ,\ldots ,x_{n}\right] \). Moreover, \(\eta _{1},\ldots ,\eta _{m}\) can be chosen in such a way that they generate \(\mathbb {R}\left[ x_{1},\ldots ,x_{n}\right] ^{G}\) as a free module over \(\mathbb {R}\left[ \theta _{1},\ldots ,\theta _{n}\right] \). Both sets of polynomials combined form a so-called integrity basis. Note that neither primary nor secondary invariants are uniquely determined. If \(m=1\), we call the basis regular and the group G coregular. The invariant polynomial functions on \(\mathbb {R}^{n}\) separate the G-orbits. In consequence, the map \(\mathbb {R}^{n}/G\ni Gx\rightarrow \left( \theta _{1}\left( x\right) ,\ldots ,\theta _{n}\left( x\right) , \eta _{2}\left( x\right) ,\ldots ,\eta _{m}\left( x\right) \right) \in \mathbb {R}^{n+m-1}\) maps bijectively the orbit space onto an n-dimensional connected closed semialgebraic subset of \(\mathbb {R}^{n+m-1}\). 
There is also a correspondence between the orbit stratification of \(\mathbb {R}^{n}/G\) and the natural stratification of this semialgebraic set into the primary strata, i.e., connected semialgebraic differentiable varieties. If \(G\subset O\left( n\right) \) is a coregular group acting irreducibly on \(\mathbb {R}^{n}\), we may assume that \(\theta _{1}\left( x\right) =\sum _{i=1}^{n}x_{i}^{2}\) is a non-constant invariant polynomial of the lowest degree. Then, the orbit map \(\omega :S^{n-1}/G\ni Gx\rightarrow \left( \theta _{2}\left( x\right) ,\ldots ,\theta _{n}\left( x\right) \right) \in \mathbb {R}^{n-1}\) is also one to one, and its range is a semialgebraic \(\left( n-1\right) \)-dimensional set. In consequence, minimizing a G-invariant polynomial \(P\left( x_{1},\ldots ,x_{n}\right) \) on \(S^{n-1}\) is equivalent to minimizing the respective polynomial \(P_{1}\left( \theta _{1},\ldots ,\theta _{n}\right) \) on the range of \(\omega \). In the 1980s, Abud and Sartori proposed a general procedure for finding the algebraic equations and inequalities defining this set and its strata, and thus also a general scheme for finding the minima of \(P_{1}\) on the range of the orbit map, see [99, 100].

Let us now take a closer look at the G-invariant polynomials of three real variables, which will be of particular interest when we consider HS-POVMs in dimension 2. An element from \(GL_{n}\left( \mathbb {R}\right) \) is called a pseudo-reflection if its fixed-point space has codimension 1. The classical Chevalley–Shephard–Todd theorem says that every pseudo-reflection group (i.e., a group generated by pseudo-reflections) is coregular. As all the symmetry groups of polyhedra representing HS-POVMs in dimension 2 (\(D_{nh}\), \(T_{d}\), \(O_{h}\), \(I_{h}\)) are pseudo-reflection groups, the interpolating polynomials can be expressed by their primary invariants listed below. Put \(\rho :=x^{2}+y^{2}\), \(\gamma _{n}:=\mathfrak {R}\left( x+iy\right) ^{n}\), \(I_{2}:=x^{2}+y^{2}+z^{2}\), \(I_{3}:=xyz\), \(I_{4}:=x^{4}+y^{4}+z^{4}\), \(I_{6}:=x^{6}+y^{6}+z^{6}\), \(I_{6}^{\prime } :=(\tau ^{2}x^{2}-y^{2})(\tau ^{2}y^{2}-z^{2})(\tau ^{2}z^{2}-x^{2})\) and \(I_{10}:=(x+y+z)(x-y-z)(y-z-x) (z-y-x)(\tau ^{-2}x^{2}-\tau ^{2}y^{2})(\tau ^{-2}y^{2}-\tau ^{2}z^{2})(\tau ^{-2}z^{2}-\tau ^{2}x^{2})\), where \(\tau :=(1+\sqrt{5})/2\) (the golden ratio). Note that the indices coincide with the degrees of the invariant polynomials. Then (notation and results are taken from [67]), for the canonical representations of these groups, i.e., if the coordinates x, y, and z are chosen so that the origin is the fixed point for the group action and: the x and z axes are two- and n-fold axes, respectively (\(D_{nh}\)); the threefold axes pass through vertices of a tetrahedron at (1, 1, 1), \((1,-1,-1)\), \((-1,1,-1)\), \((-1,-1,1)\) (\(T_{d}\)); x, y, and z axes are the fourfold axes (\(O_{h}\)); the fivefold axes pass through the vertices of an icosahedron at \((\pm \tau ,\pm 1,0)\), \((0,\pm \tau ,\pm 1)\), \((\pm 1,0,\pm \tau )\) (\(I_h\)), we get the primary invariants listed in Table 3:

Table 3 Primary invariants for four point groups

In [67], the stratification of the range of the orbit map is analytically described in all these cases.
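As a quick numerical sanity check of the invariants defined in this subsection, the following sketch verifies that \(\rho \) and \(\gamma _{n}\) are unchanged under two generators of the dihedral group acting in the plane (the rotation by \(2\pi /n\) and the reflection \(y\rightarrow -y\)), and that \(I_{4}\) and \(I_{6}\) are unchanged under all signed coordinate permutations, which realize \(O_{h}\) in its canonical representation. The function names and the test points are our own illustrative choices and do not come from [67].

```python
import itertools
import math

def rho(x, y):
    return x * x + y * y

def gamma(n, x, y):
    # Real part of (x + i y)^n
    return (complex(x, y) ** n).real

def power_sum(p, point):
    # I_p = x^p + y^p + z^p
    return sum(c ** p for c in point)

def dihedral_images(n, x, y):
    """Images of (x, y) under the rotation by 2*pi/n and the reflection y -> -y."""
    a = 2 * math.pi / n
    rotated = (x * math.cos(a) - y * math.sin(a),
               x * math.sin(a) + y * math.cos(a))
    return [rotated, (x, -y)]

def signed_permutations(point):
    """All 48 signed coordinate permutations of a point in R^3."""
    for perm in itertools.permutations(point):
        for signs in itertools.product((1, -1), repeat=3):
            yield tuple(s * c for s, c in zip(signs, perm))

def dihedral_deviation(n, x, y):
    """Largest change of rho and gamma_n under the two generators."""
    return max(
        max(abs(rho(u, w) - rho(x, y)), abs(gamma(n, u, w) - gamma(n, x, y)))
        for (u, w) in dihedral_images(n, x, y)
    )

def octahedral_deviation(point):
    """Largest change of I_4 and I_6 over all signed permutations."""
    return max(
        max(abs(power_sum(4, q) - power_sum(4, point)),
            abs(power_sum(6, q) - power_sum(6, point)))
        for q in signed_permutations(point)
    )
```

Both deviations should vanish up to rounding for any test point, e.g., `dihedral_deviation(5, 0.3, -0.7)` and `octahedral_deviation((0.2, -0.5, 0.9))`.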

8.3 The main theorem

Theorem 2

For HS-POVMs in dimension 2, the points lying on the orbit of the point antipodal to the Bloch vector of the fiducial vector (that is the Bloch vector of the state orthogonal to the fiducial vector) are the only global minimizers (respectively, maximizers) for the entropy of measurement (respectively, the relative entropy of measurement).

Proof

We will give a proof of the theorem in two steps. Firstly, we show that the antipodal points to the Bloch vectors of POVM elements, i.e., the points \(\left\{ -gv:g\in G\right\} \) are the global minima of the G-invariant polynomial \(P_{v}\) constructed in Sect. 8.1. (In particular, this is true if \(P_v\) is constant.) Then, we prove the uniqueness of designated global minimizers of the entropy of measurement.

We shall use the a priori estimates for \(\deg P_{v}\) that can be read from Table 2 and the primary invariants of G listed in Table 3. We may exclude the trivial case when the HS-POVM in question is a PVM represented by two antipodal points on the Bloch sphere (digon), as in this situation the minimal value of H, equal to 0, is achieved at these points and the assertion follows. The proof is divided into four cases according to the symmetry group of the HS-POVM.

Case I (prismatic symmetry)

Regular n-gon In Sect. 6.6, we showed that in this case, it is enough to look for the global minimizers on the circle \(S^{1}:=\left\{ (x,y)\in \mathbb {R}^{2}:x^{2}+y^{2}=1\right\} \) containing the n-gon. Its symmetry group acts on the plane \(z=0\) as the dihedral group \(D_{n}\), and so the interpolating polynomial \(P_{v}\) restricted to the circle \(S^{1}\) can be expressed in terms of its primary invariants, i.e., \(\rho =x^{2}+y^{2}\) and \(\gamma _{n}=\mathfrak {R}(x+iy)^{n}\). Since \(\deg P_{v}<n\), it follows that \(P_{v}|_{S^{1}}\) has to be a linear combination of \(\rho ^{m}\), \(0\le 2m<n\), and hence constant.

Case II (tetrahedral symmetry)

Tetrahedron This case is immediate, as \(\deg P_{v}\le \deg p_{v} \le 2\), and so \(P_{v}\) has to be constant on the sphere \(S^{2}\).

Case III (octahedral symmetry)

For \(O_{h}\), we have inert states at the \(O_{h}\)-orbits of the points: \(x_{1}:=(0,0,1)\) (vertices of an octahedron), \(x_{2}:=\frac{1}{\sqrt{2} }(0,1,1)\) (vertices of a cuboctahedron), and \(x_{3}:=\frac{1}{\sqrt{3} }(1,1,1)\) (vertices of a cube). Using the Lagrange multipliers, it is easy to check that these points are the only critical points for \(I_{4}\) and \(I_{6}\) restricted to the sphere \(S^{2}\). By comparing the values of \(I_{4}\) and \(I_{6}\) (which are \(I_{4}\left( x_{1}\right) =1\), \(I_{4}\left( x_{2}\right) =1/2\), \(I_{4}\left( x_{3}\right) =1/3\), \(I_{6}\left( x_{1}\right) =1\), \(I_{6}\left( x_{2}\right) =1/4\), \(I_{6}\left( x_{3}\right) =1/9\)), we find that the points lying on the orbit of \(x_{3}\) are global minimizers both for \(I_{4}\) and \(I_{6}\).
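The values of \(I_{4}\) and \(I_{6}\) quoted above can be reproduced exactly. The sketch below works with the squared coordinates of \(x_{1}\), \(x_{2}\), and \(x_{3}\), so that rational arithmetic suffices; the dictionary layout and function names are our own and serve only as an illustration.

```python
from fractions import Fraction as F

# Squared coordinates of the representatives x1, x2, x3 (unit vectors).
SQUARED = {
    "x1": (F(0), F(0), F(1)),
    "x2": (F(0), F(1, 2), F(1, 2)),
    "x3": (F(1, 3), F(1, 3), F(1, 3)),
}

def i4(squared):
    # x^4 + y^4 + z^4 expressed through the squared coordinates
    return sum(s ** 2 for s in squared)

def i6(squared):
    # x^6 + y^6 + z^6 expressed through the squared coordinates
    return sum(s ** 3 for s in squared)

VALUES = {name: (i4(sq), i6(sq)) for name, sq in SQUARED.items()}

# Orbit representative with the smallest value of each invariant.
MIN_I4 = min(VALUES, key=lambda name: VALUES[name][0])
MIN_I6 = min(VALUES, key=lambda name: VALUES[name][1])
```

Among the three inert orbits, both `MIN_I4` and `MIN_I6` single out \(x_{3}\), in agreement with the comparison made in the text.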

Octahedron This case is straightforward, as for \(v=x_{1}\), we have \(\deg P_{v}\le \deg p_{v}\le 3\), and so \(P_{v}\) has to be constant on the sphere \(S^{2}\).

Cube In this case, we have \(v=x_{3}\) and \(\deg P_{v}\le \deg p_{v}\le 5\). In consequence, \(P_{v}\) must be a linear combination of 1, \(I_{2}\), \(I_{4}\), and \(I_{2}^{2}\). After the restriction to the sphere, \(P_{v}|_{^{S^{2}}}\) can be expressed as \(A+BI_{4}\), for some \(A,B\in \mathbb {R}\). Thus, all we need to know now is the sign of B. Calculating the values of \(P_{v}\) at two points from different orbits (e.g., \(x_{1}\) and \(x_{3}\)) and solving the system of two linear equations, we get \(B=(3/8)\ln (27/16)>0\). Thus, the global minimizers for \(P_{v}\) are the same as for \(I_{4} \), i.e., they lie on the orbit of v or, equivalently, \(-v\), as required.

Cuboctahedron For the cuboctahedral measurement, we have \(v=x_{2}\) and \(\deg P_{v}\le \deg p_{v}\le 7\). Consequently, \(\deg P_{v}\le 6\) and \(P_{v}\) is a linear combination of 1, \(I_{2}\), \(I_{4}\), \(I_{2}^{2}\), \(I_{6} \), \(I_{4}I_{2}\), and \(I_{2}^{3}\). Hence, after the restriction to the sphere \(S^{2}\), we get \(P_{v}|_{^{S^{2}}}=A+BI_{4}+CI_{6}\), for some \(A,B,C\in \mathbb {R}\). Put \(\beta :=-B/(3C)\). Clearly, all inert states are critical for \(P_{v}|_{^{S^{2}}}\) with \(P_{v}\left( x_{1}\right) =A+C(1-3\beta )\), \(P_{v}\left( x_{2}\right) =A+C(1-6\beta )/4\), \(P_{v}\left( x_{3}\right) =A+C(1-9\beta )/9\). One can show easily that they are the only critical points unless \(1/4<\beta <1/2\). In the latter case, there are additional critical points, namely the orbit of the point \(x_{4}:=\left( \sqrt{4\beta -1},\sqrt{1-2\beta },\sqrt{1-2\beta }\right) \) with \(P_{v}\left( x_{4}\right) =A+C\left( 1-9\beta +24\beta ^{2}-24\beta ^{3}\right) \). To find B and C, we need to calculate the values of \(P_{v}\) at three points from different orbits (e.g., \(x_{1}\), \(x_{2}\), and \(x_{3}\)) and to solve the system of three linear equations. In this way, we get \(B=\frac{520}{9}\ln 2-37\ln 3<0\), \(C=-\frac{364}{9}\ln 2+26\ln 3>0\) and \(\beta \approx 0.3775\). Comparison of the values that \(P_{v}\) achieves at the points \(x_{1}\), \(x_{2}\), \(x_{3}\), and \(x_{4}\) leads to the conclusion that the global minima are achieved at the vertices of the cuboctahedron, which form the orbit of v and thus also of \(-v\).
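The final comparison in the cuboctahedral case can be checked numerically from the quoted values of B and C alone. The following sketch evaluates \(P_{v}-A=BI_{4}+CI_{6}\) at \(x_{1}\), \(x_{2}\), \(x_{3}\), and \(x_{4}\); the additive constant A is left out because it does not affect the location of the minimum. Variable and function names are ours.

```python
import math

B = 520 / 9 * math.log(2) - 37 * math.log(3)
C = -364 / 9 * math.log(2) + 26 * math.log(3)
BETA = -B / (3 * C)

def shifted_value(point):
    """P_v - A = B*I_4 + C*I_6 at a point of the unit sphere."""
    i4 = sum(c ** 4 for c in point)
    i6 = sum(c ** 6 for c in point)
    return B * i4 + C * i6

POINTS = {
    "x1": (0.0, 0.0, 1.0),
    "x2": (0.0, 1 / math.sqrt(2), 1 / math.sqrt(2)),
    "x3": (1 / math.sqrt(3),) * 3,
    # x4 exists only because 1/4 < BETA < 1/2
    "x4": (math.sqrt(4 * BETA - 1),
           math.sqrt(1 - 2 * BETA),
           math.sqrt(1 - 2 * BETA)),
}

VALUES = {name: shifted_value(p) for name, p in POINTS.items()}
MINIMIZER = min(VALUES, key=VALUES.get)
```

With these inputs, `MINIMIZER` is `"x2"`, i.e., the orbit of the cuboctahedron vertices, and `VALUES["x4"]` agrees with \(C\left( 1-9\beta +24\beta ^{2}-24\beta ^{3}\right) \) up to rounding.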

Case IV (icosahedral symmetry)

The inert states for \(I_{h}\), that is, the \(I_{h}\)-orbits of points: \(x_{1}=(0,0,1)\) (vertices of an icosidodecahedron), \(x_{5}:=\frac{1}{\sqrt{\tau +2}}(0,\tau ,1)\) (vertices of an icosahedron), and \(x_{6}:=\frac{1}{\sqrt{3}}(0,\frac{1}{\tau },\tau )\) (vertices of a dodecahedron), are the only critical points for \(I_{6}^{\prime }\). They are, correspondingly, saddle, minimum, and maximum points with values: 0, \(-(2+\sqrt{5})/5\), and \((2+\sqrt{5})/27\), respectively. For \(I_{10}\), the \(I_{h}\)-orbit of \(x_{6}\) also coincides with the set of the global maxima, and we have local maxima at the \(I_{h}\)-orbit of \(x_{5}\) and saddle points at the orbit of \(x_{1}\), but there are also non-inert critical points, namely 60 minima at the vertices of a non-Archimedean vertex truncated icosahedron (Fig. 14 in [131]), and 60 saddles at the vertices of an edge truncated Archimedean vertex truncated icosahedron (Fig. 5 in [130]), see [67, p. 26].
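The values of \(I_{6}^{\prime }\) at the inert points, and the vanishing of \(I_{10}\) at \(x_{1}\), follow directly from the definitions in Sect. 8.2. A minimal floating-point check, with function names of our own choosing, might look as follows.

```python
import math

TAU = (1 + math.sqrt(5)) / 2  # the golden ratio

def i6_prime(x, y, z):
    t2 = TAU ** 2
    return (t2 * x * x - y * y) * (t2 * y * y - z * z) * (t2 * z * z - x * x)

def i10(x, y, z):
    t2, s2 = TAU ** 2, TAU ** -2
    linear = (x + y + z) * (x - y - z) * (y - z - x) * (z - y - x)
    quadratic = ((s2 * x * x - t2 * y * y)
                 * (s2 * y * y - t2 * z * z)
                 * (s2 * z * z - t2 * x * x))
    return linear * quadratic

X1 = (0.0, 0.0, 1.0)                                       # icosidodecahedron
X5 = tuple(c / math.sqrt(TAU + 2) for c in (0.0, TAU, 1.0))  # icosahedron
X6 = tuple(c / math.sqrt(3) for c in (0.0, 1 / TAU, TAU))    # dodecahedron

SADDLE = i6_prime(*X1)
MINIMUM = i6_prime(*X5)
MAXIMUM = i6_prime(*X6)
```

Here `MINIMUM` and `MAXIMUM` should agree with \(-(2+\sqrt{5})/5\) and \((2+\sqrt{5})/27\), respectively, which are also the endpoints of the admissible interval for \(\theta _{1}\) in the description of the orbit-map range given later in the proof.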

Icosahedron This case is immediate, as \(v=x_{5}\) and \(\deg P_{v} \le \deg p_{v}\le 5\). Hence, \(P_{v}\) restricted to \(S^{2}\) is constant.

Dodecahedron In this case, \(v=x_{6}\) and \(\deg P_{v}\le \deg p_{v}\le 9\). Therefore, \(P_{v}\) must be a linear combination of 1, \(I_{2}\), \(I_{2}^{2}\), \(I_{2}^{3}\), \(I_{6}^{\prime }\), \(I_{2}^{4}\), and \(I_{6}^{\prime }I_{2}\). After restriction to \(S^{2}\), we obtain \(P_{v}|_{^{S^{2}}} =A+BI_{6}^{\prime }\), for some \(A,B\in \mathbb {R}\). We can calculate B using the same method as in the cubical case. As it turns out to be negative (\(B\approx -0.06509\)), the global minimizers coincide with the global maximizers for \(I_{6}^{\prime }\), i.e., they are the vertices of the dodecahedron.

Icosidodecahedron The icosidodecahedral case (\(v=x_{1}\)) is the most complicated one. Since \(\deg P_{v}\le \deg p_{v}\le 15\), \(P_{v}\) must be a linear combination of the polynomials 1, \(I_{2}\), \(I_{2}^{2}\), \(I_{2}^{3}\), \(I_{6}^{\prime }\), \(I_{2}^{4}\), \(I_{6}^{\prime }I_{2}\), \(I_{2}^{5}\), \(I_{6}^{\prime }I_{2}^{2}\), \(I_{10}\), \(I_{2}^{6}\), \(I_{6}^{\prime }I_{2}^{3}\), \((I_{6}^{\prime })^{2}\), \(I_{10}I_{2}\), \(I_{2}^{7}\), \(I_{6}^{\prime }I_{2}^{4}\), \((I_{6}^{\prime })^{2}I_{2}\), and \(I_{10}I_{2}^{2}\). Restriction to \(S^{2}\) gives us: \(P_{v}|_{^{S^{2}}}=A+BI_{6}^{\prime }+CI_{10}+D(I_{6}^{\prime })^{2}\), for some \(A,B,C,D\in \mathbb {R}\). Both of the polynomials \(I_{6}^{\prime }\) and \(I_{10}\) take the value 0 at \(x_{1}\), which is obviously a critical point for \(P_{v}|_{^{S^{2}}}\). As we have conjectured that the vertices of the icosidodecahedron are the global minimizers of \(P_{v}|_{^{S^{2}}}\), it is enough to prove that \(\tilde{P}:=P_{v}|_{^{S^{2}}}-A\) is nonnegative. Proceeding as in the previous cases, we obtain the following formulae for B, C, and D:

$$\begin{aligned} B =&-(1/50) (-2 + \sqrt{5}) (7122 \sqrt{5} {{\mathrm{arcoth}}}(\sqrt{5}) + 3 (-3728 + 2773 \sqrt{5}) \ln 2 \\&+ 39575 \ln 3 - 4700 \ln 5 - 8319 \sqrt{5} \ln (7 + 3 \sqrt{5})),\\ C =&\ (1/180) (-108414 {{\mathrm{arcoth}}}(3/\sqrt{5}) + 47970 {{\mathrm{arcoth}}}(\sqrt{5}) + \sqrt{5} (-16352 \ln 2\\&+ 51120 \ln 3 - 5265 \ln 5)),\\ D =&\ (29/900) (9 - 4 \sqrt{5}) (53766 \sqrt{5} {{\mathrm{arcoth}}}(3/\sqrt{5}) - 23418 \sqrt{5} {{\mathrm{arcoth}}}(\sqrt{5})\\&+ 34816 \ln 2 - 126450 \ln 3 + 15075 \ln 5). \end{aligned}$$

The range \(\varOmega \) of the orbit map \(\omega :S^{2}/I_{h}\ni I_{h}w\rightarrow \left( I_{6}^{\prime }\left( w\right) ,I_{10}\left( w\right) \right) \in \mathbb {R}^{2}\) is the curvilinear triangle (see Fig. 2) defined by the following inequalities imposed on the coordinates \(\left( \theta _{1}, \theta _{2} \right) \in \mathbb {R}^{2}\):

$$\begin{aligned}&-\frac{2\tau +1}{5}\le \theta _{1}\le \frac{2\tau +1}{27},\quad (7-4\tau )\theta _{1}\le \theta _{2},\\&0 \le J_{15}^{2}:=4\theta _{1}^{2}-8(3+4\tau )\theta _{1}\theta _{2}-91(3-2\tau )\theta _{1}^{3}+4(5+8\tau )\theta _{2}^{2}\\&+ 159(1-2\tau )\theta _{1}^{2}\theta _{2}+688(13-8\tau )\theta _{1}^{4} +325(1+2\tau )\theta _{1}\theta _{2}^{2} \\&-720(7-4\tau )\theta _{1}^{3}\theta _{2}-1728(55-34\tau )\theta _{1} ^{5}-25(11+18\tau )\theta _{2}^{3}, \end{aligned}$$

where \(J_{15}\) is the only secondary invariant for the icosahedral group I [67, Tab. IIIb].

Fig. 2 Zero-level set for \(P_{1}\) (gray) and for \(J_{15}^{2}\) (black)

Define \(P_{1}\left( \theta _{1},\theta _{2}\right) :=B\theta _{1}+C\theta _{2}+D\theta _{1}^{2}\) for \(\left( \theta _{1},\theta _{2}\right) \in \varOmega \). Then, \(\tilde{P}\left( w\right) = P_{1}\left( \omega \left( \left( I_{h}\right) w\right) \right) \) for \(w\in S^{2}\). The level sets of \(P_{1}\) are parabolas, and the zero-level parabola given by \(\theta _{2}=-\left( B/C\right) \theta _{1}-(D/C)\theta _{1}^{2}\) (the gray curve in Fig. 2) divides the plane into two regions: \(\left\{ P_{1} \ge 0\right\} \) and \(\left\{ P_{1}<0\right\} \). Now, it is enough to show that the zero-level set of \(P_{1}\) meets the zero-level set of \(J_{15}^{2}\) (the black curve in Fig. 2), which defines the boundary of \(\varOmega \), only at \(\left( \theta _{1},\theta _{2}\right) = (0,0)\), since in this case \(P_{1}\) has the same sign over the whole \(\varOmega \) and, in consequence, \(\tilde{P}\) is nonnegative on the whole unit sphere. This approach reduces the complexity of the problem by lowering the degree of the polynomial equation to be solved. In fact, now it is enough to show that the polynomial \(Q(\theta _1):=J_{15}^{2}\left( \theta _{1},-\left( B/C\right) \theta _{1}-(D/C)\theta _{1}^{2}\right) /\theta _{1}^{2}\) of degree 4 has no real roots. This can be done in a standard way by using Sturm’s theorem, a method which we recall briefly below.

The Sturm chain for a polynomial q is a sequence \(q_0, q_1,\ldots , q_m\), where \(q_0=q\), \(q_1=q'\), \(q_i =-\text {rem}(q_{i-2},q_{i-1})\) for \(i=2,\ldots ,m\), and \(m\le \deg q\) is the minimal number i such that \(\text {rem}(q_{i-1},q_{i})=0\) (by \(\text {rem}(r,s)\) we denote the remainder of the division of r by s). Sturm’s theorem states that the number of distinct real roots of q in \((a,b)\) for \(-\infty \le a<b\le +\infty \) equals the difference between the numbers of sign changes in the Sturm chain for q evaluated at a and at b (for more details, see, e.g., [12, Sect. 2.2]). Thus, to finish the proof for the icosidodecahedron, we calculate the Sturm chain for Q, evaluate it at \(\pm \infty \), and show that the numbers of sign changes do not differ (see Footnote 1).
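The root-counting procedure just described can be sketched in a few lines. The code below is an illustration on toy polynomials with rational coefficients, not the computation for Q itself: the coefficients of Q involve logarithms and \(\sqrt{5}\), so an actual verification would need symbolic or certified high-precision arithmetic. Polynomials are represented as coefficient lists with the highest degree first and are assumed to have degree at least 1; all function names are ours.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients, keeping at least one entry."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(r, s):
    """Remainder of the division of r by s (s must be nonzero)."""
    r = [Fraction(c) for c in trim(r)]
    s = [Fraction(c) for c in trim(s)]
    while len(r) >= len(s) and any(c != 0 for c in r):
        factor = r[0] / s[0]
        for i in range(len(s)):
            r[i] -= factor * s[i]
        r = r[1:]  # the leading coefficient has just been cancelled
    return trim(r) if r else [Fraction(0)]

def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def sturm_chain(q):
    """q_0 = q, q_1 = q', q_i = -rem(q_{i-2}, q_{i-1}) until the remainder is 0."""
    chain = [trim([Fraction(c) for c in q])]
    chain.append(derivative(chain[0]))
    while True:
        rem = poly_rem(chain[-2], chain[-1])
        if all(c == 0 for c in rem):
            return chain
        chain.append([-c for c in rem])

def sign_at_infinity(p, plus):
    """Sign of p(x) as x -> +infinity (plus=True) or x -> -infinity."""
    sign = (p[0] > 0) - (p[0] < 0)
    odd_degree = (len(p) - 1) % 2 == 1
    return -sign if (odd_degree and not plus) else sign

def sign_changes(signs):
    signs = [s for s in signs if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def count_real_roots(q):
    """Number of distinct real roots of q on the whole real line."""
    chain = sturm_chain(q)
    at_minus = sign_changes([sign_at_infinity(p, False) for p in chain])
    at_plus = sign_changes([sign_at_infinity(p, True) for p in chain])
    return at_minus - at_plus
```

For example, `count_real_roots([1, 0, 0, 0, 1])` treats \(x^{4}+1\), a quartic without real roots, which is the situation to be established for Q, whereas `count_real_roots([1, 0, -1, 0])` treats \(x^{3}-x\), whose three real roots are all detected.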

We end the proof by showing that there are no other (global) minimizers of the entropy.

It follows from (42) that if \(w\in S^{2}\) is a global minimizer for \(H_{v}\), then it is also a global minimizer for \(P_{v}\), since \(P_{v}(-v)\le P_{v}(w)\le H_{v}(w)=H_{v}(-v)=P_{v}(-v)\). The same argument gives us \(h(w\cdot u)=p(w\cdot u)\) for every \(u\in Gv\), and so \(\{w\cdot u:u\in Gv\}\subset T=\{-v\cdot u:u\in Gv\}\).

Put \(a_{u}:=w\cdot u\) for \(u\in Gv\) and \(k:=\left| Gv\right| \). Now, it is enough to show that \(-1\in T_{w}:=\{a_{u}:u\in Gv\}\), since then \(w\in G\left( -v\right) \). We know that \(\sum _{u\in Gv}a_{u}=0\). For informationally complete HS-POVMs, we have additionally \(\sum _{u\in Gv}a_{u}^{2}=k/3\) (as Gv is a 2-design), and, for the icosahedral group, \(\sum _{u\in Gv}a_{u}^{4}=k/5\) (as Gv is a 4-design). Moreover, \(1\in T_{w}\) implies \(-1\in T_{w}\) for the octahedral and icosahedral groups. Using all these facts and the form of the interpolating set for the respective informationally complete HS-POVMs (see Table 4), we see that in all seven cases, the assumption \(-1\notin T_{w}\) leads to an immediate contradiction. On the other hand, for a regular polygon, w must lie on the circle containing this polygon (see Sect. 6.6). Then, \(w\cdot \left( -v\right) \in T\) implies \(w\in G\left( -v\right) \), as desired. \(\square \)
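The moment identities used in the uniqueness argument are easy to test numerically. The sketch below evaluates \(\sum _{u}a_{u}^{p}\) for the cube and for the icosahedron with vertices as in Sect. 8.2, at a pseudo-randomly chosen unit vector w; the helper names and the sampling procedure are our own assumptions.

```python
import itertools
import math
import random

TAU = (1 + math.sqrt(5)) / 2

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def cube_vertices():
    return [normalize(v) for v in itertools.product((1, -1), repeat=3)]

def icosahedron_vertices():
    """Vertices (+-tau, +-1, 0), (0, +-tau, +-1), (+-1, 0, +-tau), normalized."""
    vertices = []
    for s1, s2 in itertools.product((1, -1), repeat=2):
        vertices.append((s1 * TAU, s2, 0.0))
        vertices.append((0.0, s1 * TAU, s2))
        vertices.append((s2, 0.0, s1 * TAU))
    return [normalize(v) for v in vertices]

def moment(vertices, w, p):
    """Sum of (w . u)^p over all vertices u."""
    return sum(sum(a * b for a, b in zip(w, u)) ** p for u in vertices)

def random_unit_vector(rng):
    # Gaussian components give a uniformly distributed direction
    return normalize(tuple(rng.gauss(0.0, 1.0) for _ in range(3)))

RNG = random.Random(0)
W = random_unit_vector(RNG)
CUBE = cube_vertices()
ICOSAHEDRON = icosahedron_vertices()
```

One expects `moment(CUBE, W, 1)` to vanish, `moment(CUBE, W, 2)` to equal \(8/3\), and `moment(ICOSAHEDRON, W, 4)` to equal \(12/5\), independently of W. The fourth moment of the cube, by contrast, depends on W (it equals \(8/9\) at \(w=(0,0,1)\)), which is why the fourth-moment identity is invoked only for the icosahedral group.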

Table 4 Interpolating sets for HS-POVMs in dimension 2

Remark 1

Let us observe that without any additional calculations, we get that the theorem holds true for POVMs represented by regular polygons, tetrahedron, octahedron, and icosahedron if the Shannon entropy is replaced by the Havrda–Charvát–Tsallis \(\alpha \)-entropy or the Rényi \(\alpha \)-entropy for \(\alpha \in (0,2]\). It follows from the fact that the degree of the polynomial \(P_v\) interpolating the generalized entropy or its increasing function from below (see Sect. 8.1) does not depend on the entropy function. Since in all these cases it is at most 2, \(P_v\) is constant.

Remark 2

In this paper, we presented a universal method of determining the global extrema of the entropy of POVM. However, in some cases, it is possible to give proofs that appear to be more elementary.

Let us recall that for tight informationally complete POVMs, the sum of squared probabilities of the measurement outcomes (known as the index of coincidence) is the same for each initial pure state and equal to \(2d/(k(d+1))\). The problem of finding the minimum and maximum of the Shannon entropy under the assumption that the index of coincidence is constant has been analyzed in [57] (some generalizations and related topics can be found also in [16, 137]). By [57, Theorem 2.5.], we get that the minimum is achieved for the probability distribution \((p,\ldots ,p,q,0,\ldots ,0) \), where there are \(\lfloor k(d+1)/(2d)\rfloor \) probabilities equal to p, and both p and q are uniquely determined by the value of the index of coincidence.

One would not suppose this fact to be useful in a general setting, since the possible probability distributions of the measurement outcomes for initial pure states form just a \((2d-2)\)-dimensional subset of a \((k-2)\)-dimensional intersection of a \((k-1)\)-sphere and the simplex \(\Delta _k\). If \(\varPi \) is informationally complete, then \(d^2\le k\). Hence, \(2d < k\) unless \(d=2\) and \(k=4\), and so, in the general situation, these extremal points need not belong to this subset. However, this method can be used for the tetrahedral POVM, where \(d=2\) and \(k=4\).
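For the tetrahedral POVM, the statements above can be illustrated numerically. The sketch assumes that the outcome probabilities of a normalized rank-1 qubit POVM with Bloch vectors \(u_{j}\) are \(p_{j}=(1+w\cdot u_{j})/k\) for an input pure state with Bloch vector w, which corresponds to \(p_{j}=(d/k)\,\hbox {tr}(\rho \sigma _{j})\) with \(d=2\); the function names are ours.

```python
import math

TETRAHEDRON = [
    tuple(c / math.sqrt(3) for c in v)
    for v in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
]

def probabilities(w, bloch_vectors):
    """p_j = (1 + w . u_j) / k, clamped at 0 to absorb rounding errors."""
    k = len(bloch_vectors)
    return [
        max(0.0, (1 + sum(a * b for a, b in zip(w, u))) / k)
        for u in bloch_vectors
    ]

def index_of_coincidence(p):
    return sum(q * q for q in p)

def shannon_entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

# Input state antipodal to the first POVM element.
ANTIPODAL = tuple(-c for c in TETRAHEDRON[0])
P_MIN = probabilities(ANTIPODAL, TETRAHEDRON)

# A generic pure input state, for comparison.
GENERIC = (0.6, 0.0, 0.8)
P_GENERIC = probabilities(GENERIC, TETRAHEDRON)
```

Both distributions have index of coincidence \(2d/(k(d+1))=1/3\). At the antipodal point, the distribution is \((0,1/3,1/3,1/3)\) up to ordering, i.e., it has the form \((p,p,p,q)\) with \(\lfloor k(d+1)/(2d)\rfloor =3\) entries equal to \(p=1/3\) and \(q=0\), and its entropy equals \(\ln 3\).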

On the other hand, one can ask whether it is possible to derive a proof of Theorem 2 for HS-POVMs with octahedral and icosahedral symmetry using the fact that the corresponding Bloch vectors are spherical 3-designs and 5-designs, respectively. The question consists of two problems. The first one is to find the probability distributions that minimize the Shannon entropy under the assumption that the Rényi \(\alpha \)-entropies are fixed for \(\alpha =2,3\) and \(\alpha =2,3,4,5\), respectively. The second one is whether the obtained extremal probability distribution belongs to the “allowed” set, as the conditions on the Rényi entropies do not give a complete characterization of this set.

Remark 3

Usually, the simplest way to find the entropy minimizers leads through the majorization technique. However, we shall see that this method fails in general here and can be useful just in special cases. To show that this is the case, recall that if a normalized rank-1 POVM \(\varPi =\{\varPi _j\}_{j=1}^k\) is tight informationally complete (i.e., the set of corresponding pure states is a complex projective 2-design), then for any \(\rho \in \mathcal {S}\left( {\mathbb {C}}^{d}\right) \), the probability distribution of measurement outcomes \((p_1(\rho ,\varPi ),\ldots ,p_k(\rho ,\varPi ))\) fulfills an additional constraint \(p_1(\rho ,\varPi )^2+\cdots +p_k(\rho ,\varPi )^2=2d/(k(d+1))\). Thus, the set of all possible probability distributions is a \((2d-2)\)-dimensional subset of the \((k-1)\)-dimensional sphere of radius \(\sqrt{2d/(k(d+1))}\) intersected with the probability simplex \(\Delta _k\). That intersection is a \((k-2)\)-dimensional sphere in the affine hyperplane containing \(\Delta _k\) that is centered at the uniform distribution and, possibly, cut to fit in the positive hyperoctant (compare Fig. 4 in [3]). Now, from the fact that the set of probability distributions majorized by a given \(P\in \Delta _k\) is a convex hull of its orbit under permutations (see, e.g., [15, Ch. 2.1] or [83, Ch. 1.A]), it follows that the only probability distributions from the sphere indicated above that majorize (or are majorized by) any probability distribution from the same sphere need to be its permutations. Hence, we deduce that if the distribution of measurement outcomes for one state majorizes that for another one, both distributions must be equivalent, and in particular, the measurement entropies at these points are equal. These facts imply that the minimization problem cannot be solved in full generality via majorization.

9 Informational power and the average value of relative entropy

While we know the minimum and maximum values of the relative entropy of some POVMs, it would be worth taking a look at its average. Surprisingly, the average value of relative entropy over all pure states does not depend on the measurement \(\varPi \), but only on the dimension d. This can be proved using (16) and the formula (21) from Jones [69]. Namely, we have

$$\begin{aligned} \left\langle \widetilde{H}(\rho ,\varPi )\right\rangle _{\rho \in {\mathcal {P}}\left( {\mathbb {C}}^{d}\right) }&=\int _{{\mathcal {P}}\left( {\mathbb {C}}^{d}\right) }\left( \ln d-\frac{d}{k}\sum _{j=1}^{k}\eta \left( \hbox {tr} \left( \rho \sigma _{j}\right) \right) \right) \text {d}m_{FS}\left( \rho \right) \nonumber \\&=\ln d-d\left( \int _{{\mathcal {P}}\left( {\mathbb {C}}^{d}\right) }\eta \left( \hbox {tr}\left( \rho \sigma _{1}\right) \right) \text {d}m_{FS}\left( \rho \right) \right) \nonumber \\&=\ln d-\sum _{j=2}^{d}\frac{1}{j}\rightarrow 1-\gamma \quad \left( d\rightarrow \infty \right) , \end{aligned}$$
(45)

where \(\gamma \approx 0.57722\) is the Euler–Mascheroni constant. This average is also equal to the maximum value (in dimension d) of an entropy-like quantity called subentropy, which provides a lower bound for the accessible information [37, 72]. Moreover, applying (10), Proposition 6, and (45), we immediately get a lower bound for the informational power of \(\varPi \):

$$\begin{aligned} \ln d-\sum _{j=2}^{d}\frac{1}{j} \ \le \ W\left( \varPi \right) , \end{aligned}$$
(46)

provided that condition (1) in Proposition 6 is fulfilled. This bound was found independently, but in a general setting, in [8].
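The closed form on the last line of (45), which is also the lower bound in (46), is straightforward to evaluate. The following sketch computes \(\ln d-\sum _{j=2}^{d}1/j\) and compares it with the limit \(1-\gamma \); the hard-coded value of the Euler–Mascheroni constant and the function name are our own additions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def average_relative_entropy(d):
    """ln d - sum_{j=2}^{d} 1/j, the last line of (45)."""
    return math.log(d) - sum(1 / j for j in range(2, d + 1))

QUBIT_AVERAGE = average_relative_entropy(2)
LARGE_D_AVERAGE = average_relative_entropy(10 ** 5)
LIMIT = 1 - EULER_GAMMA
```

For \(d=2\), the function returns \(\ln 2-1/2\approx 0.19315\), and for large d, the values approach \(1-\gamma \approx 0.42278\) from below.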

In particular, the average value of relative entropy is the same for every HS-POVM \(\varPi \) in dimension 2 and equals \(\ln 2-1/2\approx 0.19315\). It follows from Theorem 2 and (32) that its maximal value, that is, the informational power of \(\varPi \), is given by the formula

$$\begin{aligned} W\left( \varPi \right) =\ln 2-\frac{2}{|G/G_{v}|}\sum _{\left[ g\right] \in G/G_{v}}\eta \left( \frac{1-gv\cdot v}{2}\right) , \end{aligned}$$
(47)

where G is any group acting transitively on the set of Bloch vectors representing \(\varPi \). Recall that the number of different summands in (47) is bounded by the number of self-inverse double cosets of \(G_{v}\) plus half of the number of non-self-inverse ones.

Table 5 Approximate values of informational power (maximum relative entropy) for all types of HS-POVMs in dimension 2 (up to five digits)
Fig. 3 Relative entropy of highly symmetric qubit measurements, where their Bloch vectors form: (a) an equilateral triangle; (b) a regular pentagon; (c) a tetrahedron; (d) an octahedron; (e) a cube; (f) a cuboctahedron; (g) an icosahedron; (h) a dodecahedron; (i) an icosidodecahedron. The colors range from light (maximum) to dark (minimum)

Applying the above formula to the n-gonal POVM, we get

$$\begin{aligned} W\left( \varPi \right) =\ln 2-\frac{2}{n}\sum _{j=1}^{n}\eta \left( \sin ^{2}\frac{\pi j}{n}\right) \rightarrow 1-\ln 2 \, \approx \, 0.30685 \quad \left( n\rightarrow \infty \right) . \end{aligned}$$
(48)
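Formulas (47) and (48) can be evaluated directly. In the sketch below, we assume that \(\eta (x)=-x\ln x\) with \(\eta (0)=0\), as the function is not redefined in this section, and we replace the sum over cosets in (47) by the equivalent sum over the orbit points \(u=gv\); the function names are ours.

```python
import math

def eta(x):
    """Assumed definition: eta(x) = -x ln x, with eta(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0

def informational_power(bloch_vectors, v):
    """Formula (47), summed over the orbit points u = gv."""
    k = len(bloch_vectors)
    total = sum(
        eta(max(0.0, (1 - sum(a * b for a, b in zip(u, v))) / 2))
        for u in bloch_vectors
    )
    return math.log(2) - 2 / k * total

def polygon_power(n):
    """Formula (48) for the regular n-gon."""
    total = sum(eta(math.sin(math.pi * j / n) ** 2) for j in range(1, n + 1))
    return math.log(2) - 2 / n * total

TETRAHEDRON = [
    tuple(c / math.sqrt(3) for c in v)
    for v in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
]
OCTAHEDRON = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]

W_TRIANGLE = polygon_power(3)
W_TETRAHEDRON = informational_power(TETRAHEDRON, TETRAHEDRON[0])
W_OCTAHEDRON = informational_power(OCTAHEDRON, OCTAHEDRON[0])
```

Under this assumption, the triangle gives \(\ln (3/2)\), the tetrahedron \(\ln (4/3)\), and the octahedron \((\ln 2)/3\), while `polygon_power(n)` approaches \(1-\ln 2\) as n grows.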

The approximate values of informational power for other HS-POVMs in dimension 2 can be found in Table 5. It follows from [5, Corollaries 7-9] that the informational power of tetrahedral, octahedral, and icosahedral POVMs is maximal among POVMs generated by, respectively, 2-, 3-, and 5-designs in dimension 2.

Comparing these values to the average value of relative entropy, we see that the larger the number of elements in the HS-POVM, the flatter the graph of \(\widetilde{H}\); see also Fig. 3, where the graphs in spherical coordinates are presented.