1 Introduction

1.1 Background

The permanent is a classical object of intense interest in the study of counting problems. For a matrix \(A \in {\mathbb {C}}^{n\times n}\), the permanent is defined as

$$\begin{aligned} {{\,\textrm{Perm}\,}}(A)=\sum _{\sigma \in S_n}\prod _{i=1}^n A_{i,\sigma (i)} \end{aligned}$$
(1)

where the sum runs over all permutations \(\sigma \) of \([n]\), so that each term is a product of matrix elements \(A_{i,j}\) with one element from each row and column. While directly evaluating the expression in Eq. (1) takes O(n!) time, Ryser’s formula [1] gives an \(O(2^n n)\) time algorithm. Valiant showed in 1979 that computing the permanent exactly is #P-hard, even for 0–1 matrices [2, 3]. However, it is amenable to efficient approximation in particular settings. In 2001, Jerrum, Sinclair and Vigoda [4] gave a fully-polynomial randomized approximation scheme (FPRAS) for permanents of nonnegative matrices. In 2002, Gurvits and Samorodnitsky [5] gave a polynomial-time \(e^n\) multiplicative approximation to PSD mixed discriminants, which include permanents of nonnegative matrices as a special case.
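For concreteness, here is a minimal Python sketch of Ryser’s inclusion–exclusion formula (our own illustration, not code from this paper; this plain version costs \(O(2^n n^2)\), while the Gray-code variant achieves the \(O(2^n n)\) bound cited above):

from itertools import combinations, permutations
import numpy as np

def permanent_ryser(A):
    """Permanent via Ryser's inclusion-exclusion formula.
    This direct version is O(2^n * n^2); Gray-code ordering gives O(2^n * n)."""
    n = A.shape[0]
    total = 0
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):
            # product over rows of the row-sum restricted to the chosen columns
            total += (-1) ** (n - r) * np.prod(A[:, list(cols)].sum(axis=1))
    return total

# sanity check against the O(n!) definition in Eq. (1)
A = np.random.rand(5, 5)
brute = sum(np.prod(A[np.arange(5), list(p)]) for p in permutations(range(5)))
assert np.isclose(permanent_ryser(A), brute)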

When the matrix is Hermitian positive semidefinite (HPSD, or if purely real, PSD), the permanent is necessarily nonnegative, and this offers hope of efficient multiplicative approximation. HPSD permanents are of particular interest to the quantum information community, for reasons historically unrelated to quantum state tomography: namely, thermal BosonSampling experiments [6,7,8]. Computing PSD permanents exactly remains #P-hard [9]. It is known, via Stockmeyer counting [7, 9, 10], that computing multiplicative approximations to PSD permanents is contained in \(\textsf {FBPP}^\textsf {NP}\). In 1963, Marcus [11] observed that the product of the diagonal entries of a PSD matrix immediately gives an n! approximation ratio for the permanent. In 2017, Anari et al. gave a polynomial-time approximation to PSD permanents within a ratio of \(c^n\), with \(c = e^{1+\gamma } \approx 4.85\) [12]. Yuan and Parrilo [13] described a similar approach with the same approximation ratio. Chakhmakhchyan et al. [14] and Barvinok [15] gave algorithms for approximation when the spectrum of the matrix is small in radius, that is, when \(\lambda _{min}/\lambda _{max}\) is not too small.

1.2 Main Results

Our main result shows that efficient approximation of PSD permanents is impossible unless P=NP. Precisely, we show that it is NP-hard to approximate the permanent within a particular subexponential factor.

Theorem 1

(Theorem 5, restated) For any constant \(\epsilon > 0\), it is NP-hard to approximate the permanent of \(n\times n\) HPSD matrices within a factor of \(2^{n^{1-\epsilon }}\).

This implies the absence of a polynomial time approximation scheme (PTAS) or polynomial randomized approximation scheme (PRAS).

Corollary 1

(of Theorem 5) There is no PTAS for HPSD permanents unless P=NP, and there is no PRAS for HPSD permanents unless RP=NP.

In Sect. 3.5, we show that these theorems also hold for (purely real) PSD matrices.

Our work provides a lower bound on the difficulty of approximating PSD permanents that almost matches the known upper bounds. The algorithm of Anari et al. [12] shows that the singly exponential approximation ratio \(4.85^n\) is achievable in polynomial time, while we show that a subexponential approximation ratio \(2^{n^{1-\epsilon }}\) is intractable. This leaves open primarily the question of whether a \((1+\epsilon )^n\) ratio is achievable in polynomial time for every \(\epsilon > 0\). The algorithms of Chakhmakhchyan et al. [14] and Barvinok [15] fail on the hard instances that we construct: the matrices we construct are highly rank deficient, and therefore have \(\lambda _{min}=0\).

The key connection in our work is between the permanent and a particular integral over the unit sphere. If a matrix M is Hermitian positive semidefinite of rank d, then it admits a factorization \(M = VV^\dagger \); if M is \(n\times n\), then V may be taken to be \(n\times d\) (for instance, from the Hermitian square root of M, keeping only its d linearly independent columns). We will show (Theorem 3) that

$$\begin{aligned} {{\,\textrm{Perm}\,}}(M)=\frac{(n+d-1)!}{2\pi ^d} \int _{\vec {{x}} \in {\mathbb {C}}^d,\,|x|=1}\!\!d\vec {x}\,\, \prod _{k=1}^n |\vec {{x}}\cdot V_k|^2 \,. \end{aligned}$$
(2)

Here \(V_i\) are the rows of V, and the integral is taken with respect to the uniform measure on the unit sphere in \({\mathbb {C}}^d\). As we will see, this integral occurs naturally in the context of Bayesian inference for quantum state tomography. In that context, the rows \(V_i\) correspond to an observation history, and the variable of integration \(\vec {{x}}\) represents an unknown quantum state. The intuition gained from viewing it as a quantum state tomography problem guided us towards finding our hard instances M. We analyze the problem by first establishing a concentrating construction (Lemmas 2 and 3). Informally, when V contains many copies of the basis vectors \(\vec {e}_{{j}}\) and of vectors of the form \(\frac{\vec {e}_{{j}}\pm i\vec {e}_{{k}}}{\sqrt{2}}\), the integral concentrates at the points (up to a phase) of an appropriately scaled hypercube:

$$\begin{aligned} \int _{\vec {{x}} \in {\mathbb {C}}^d,\,|x|=1}\!\!d\vec {x}\,\, \prod _{k=1}^n |\vec {{x}}\cdot V_k|^2 \propto \!\!\!\sum _{\vec {{x}} \in \{-1,+1\}^d} \prod _{k=1}^n |\vec {{x}}\cdot V_k|^2\, \end{aligned}$$
(3)

with some exponentially small error, and a simple constant of proportionality that depends only on d and n. This concentration will let us relate permanents to combinatorial problems (Lemma 4), specifically counting solutions to Not-All-Equal-3SAT, and ultimately let us prove hardness.

The connection to quantum state tomography means we also get results about the hardness of estimating quantum states given measurements.

Definition 1

(Maximum Pure State Likelihood) For a quantum system with Hilbert space dimension n and \({{\,\textrm{poly}\,}}(n)\) observations, the maximum pure state likelihood is the highest likelihood of those observations attainable over any pure state \(\vert \psi \rangle \).

Theorem 2

(Theorem 9, informal) For any constant \(\epsilon > 0\), the following task is NP-hard: given a series of quantum observations, find a pure state with likelihood at least \(2^{-n^{1-\epsilon }}\) times the maximum pure state likelihood.

Unless RP=NP, this implies that there is no PRAS for maximum likelihood estimation (MLE) quantum state tomography; in fact, the problem is not even in APX. We have similar statements about the NP-hardness of computing the Bayesian average state and Bayesian average observables (Theorem 8). These results are unusual in that they imply exponential difficulty in the dimension n of the Hilbert space \({\mathbb {C}}^n\). Most quantum problems are only considered tractable if they have efficient algorithms in the number of particles \(q = \log (n)\), and have trivially polynomial solutions in n; whereas we show that (assuming ETH [16]) quantum state tomography takes time exponential in n.

We stress that although our work has connections to quantum information through BosonSampling and tomography, our discussion of complexity is focused on classical computers. The NP-hardness results are statements about classical hardness, and the algorithm described in Sect. 4.3 for tomography in fixed dimension is a polynomial-time classical algorithm. Unless NP\(\subseteq \) BQP, however, our results rule out efficient permanent computations on quantum computers as well.

2 Key Ideas of the Proof

We start with a lemma relating symmetric, multilinear functions to permanents. Similar lemmas have appeared in [15, 17], and they can broadly be viewed as alternate forms of Wick’s Theorem [18].

Lemma 1

Suppose \(f: ({\mathbb {C}}^d)^{2n} \rightarrow {\mathbb {R}}\) is a function of 2n vectors of dimension d, that is:

  • Multilinear in its first n arguments

  • Conjugate multilinear in its latter n arguments

  • Symmetric in its first n arguments, and its latter n arguments

  • Invariant under unitary change of basis: for any unitary \(U \in {\mathbb {C}}^{d\times d}\),

    $$\begin{aligned} f(v_1,\dots ; v_{n+1}, \dots ) = f(Uv_1,\dots ; U v_{n+1}, \dots ) \end{aligned}$$

Then f is determined up to an overall constant C by the formula,

$$\begin{aligned} f(v_1,\dots ; v_{n+1},\dots ) = C {{\,\textrm{Perm}\,}}(A_{ij}),\quad \text { where } A_{ij} = v_i \cdot v_{n+j}^* \end{aligned}$$
(4)

and the constant C can be determined by

$$\begin{aligned} C = \frac{f(\vec {e}_{{1}},\vec {e}_{{1}},\vec {e}_{{1}},\dots )}{n!} \end{aligned}$$
(5)

where \(\vec {e}_{{1}}\) is the unit basis vector in the first coordinate.

Proof

Because f is invariant under a unitary change of basis, f can depend on its inputs only through inner products of vectors, \(\langle v_i,v_j\rangle \). Since f is multilinear, it can be written as a sum of terms \(t_k\), where each \(t_k\) is a product of such inner products, using each argument exactly once. The separate linearity and conjugate linearity mean that the only permitted inner products pair a covariant (first n) vector with a contravariant (latter n) vector. This means every term in the sum must be a product of the form \(\prod _{i \in [n]} v_i \cdot v_{n+\sigma (i)}^*\) for some permutation \(\sigma \) of [n]. Then by the symmetry among the first n arguments and among the latter n arguments, every permutation \(\sigma \) must appear with the same coefficient. This leaves only a single form, the result above.

C can be computed by substituting \(\vec {e}_{{1}}\) for every argument in Eq. (4), so that all inner products become 1. The permanent of the all-1’s matrix is just n!, so this becomes the normalizing factor. \(\square \)

This lets us relate the permanent to a particular integral over unit-norm complex vectors:

Theorem 3

Let \(L, R\in {\mathbb {C}}^{n\times d}\) be complex matrices. Denoting the kth row of L as \(L_k\) and the kth row of R as \(R_k\),

$$\begin{aligned} \int _{\vec {{x}} \in {\mathbb {C}}^d,\,|x|=1}\!\!\!\!d\vec {x}\, \left( \prod _{k=1}^n \vec {{x}}^\dagger L_k\right) \left( \prod _{k=1}^n R_k^\dagger \vec {{x}} \right) = \frac{2\pi ^d{{\,\textrm{Perm}\,}}(LR^\dagger )}{(n+d-1)!} \end{aligned}$$
(6)

Note that when \(L=R\), the product in the integral becomes \(\prod _k |\langle L_k, \vec {{x}}\rangle |^2\), and the matrix \(M = L L^\dagger \) is HPSD.

Proof

Viewing the left side as a function f of the n rows of each L and R, we can see that it satisfies all the hypotheses of Lemma 1. It is linear in each row of L, conjugate linear in each row of R, and symmetric under permuting the rows of L or the rows of R. It is also invariant under a unitary change of basis:

$$\begin{aligned} f(UL,UR)&= \int _{\vec {{x}} \in {\mathbb {C}}^d,\,|x|=1}\!\!\!\!d\vec {x}\,\, \prod _{k=1}^n \vec {{x}}^\dagger (UL_k)\,\,\prod _{k=1}^n (UR_k)^\dagger \vec {{x}} \end{aligned}$$
(7)
$$\begin{aligned}&= \int _{\vec {{x}} \in {\mathbb {C}}^d,\,|x|=1}\!\!\!\!d\vec {x}\,\, \prod _{k=1}^n (U^\dagger \vec {{x}})^\dagger L_k\,\,\prod _{k=1}^n R_k^\dagger (U^\dagger \vec {{x}}) \end{aligned}$$
(8)
$$\begin{aligned}&= \int _{\vec {{u}} \in {\mathbb {C}}^d,\,|u|=1} \!\!\!\!d\vec {u}\,\,\prod _{k=1}^n \vec {{u}}^\dagger L_k\prod _{k=1}^n R_k^\dagger \vec {{u}} = f(L,R) \end{aligned}$$
(9)

where we have used the symmetry of the unit sphere in \({\mathbb {C}}^d\) to remove the unitary via the substitution \(\vec {{u}} = U^\dagger \vec {{x}}\). Setting each \(L_k = R_k = \vec {e}_{{1}}\), the spherical integral can be computed with standard formulae (e.g. [19]) to find the normalizing constant

$$\begin{aligned} C = \frac{1}{n!}\int _{\vec {{x}} \in {\mathbb {C}}^d,\,|x|=1}\!\!\!\!d\vec {x}\,\, (\vec {{x}}^\dagger \vec {e}_{{1}})^n(\vec {e}_{{1}}^\dagger \vec {{x}})^n = \frac{2\pi ^d}{(n+d-1)!}. \end{aligned}$$
(10)

\(\square \)
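As a sanity check, Theorem 3 can be verified numerically. The sketch below (our own, assuming numpy; all helper names are ours) estimates the spherical integral by Monte Carlo and compares it against a brute-force permanent of \(M = VV^\dagger \), using the fact that the surface area of the unit sphere in \({\mathbb {C}}^d\) is \(2\pi ^d/(d-1)!\):

import itertools
import math
import numpy as np

def permanent(A):
    """Brute-force permanent from Eq. (1); fine for small n."""
    n = A.shape[0]
    return sum(np.prod(A[np.arange(n), list(s)])
               for s in itertools.permutations(range(n)))

rng = np.random.default_rng(0)
n, d = 3, 2
V = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
M = V @ V.conj().T                                # n x n HPSD matrix of rank <= d

N = 500_000
x = rng.normal(size=(N, d)) + 1j * rng.normal(size=(N, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)     # uniform on the unit sphere of C^d
G = np.prod(np.abs(x @ V.conj().T) ** 2, axis=1)  # prod_k |<V_k, x>|^2

# integral = (surface area 2 pi^d/(d-1)!) * mean(G), so Theorem 3 gives:
est = math.factorial(n + d - 1) / math.factorial(d - 1) * G.mean()
print(est, permanent(M).real)                     # agree up to Monte Carlo error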

2.1 Outline of the Proof

Before diving into the proof of hardness itself, we aim to provide some intuition for the construction. We focus on the integral \(F = \int _{\vec {{x}}} \prod _k |\langle V_k, \vec {{x}}\rangle |^2\) over the sphere of unit (complex) vectors, and build up a set of vectors V with desirable properties. The proof will involve gradually adding vectors to a list V, in turn modifying the integrand \(G_V(\vec {{x}}) = \prod _k |\langle V_k, \vec {{x}}\rangle |^2\). This integrand \(G_V(\vec {{x}})\) is nonnegative, so there cannot be any cancellation in the integral. Our goal will be to show that certain regions have exponentially small magnitude, so that only particular regions with appreciable contribution remain, and these are primarily responsible for the overall value of F. The magnitude of F will then be used to understand the values of \(G_V\) on those particular regions, where large values of F indicate solutions to an NP-hard problem. And since F can be computed from an HPSD permanent, computing that permanent must be hard as well.

How are we to choose the V in order to make an interesting function \(G_V\)? Each vector \(V_k\) introduces zeroes on the sphere at all vectors orthogonal to \(V_k\). All points approximately orthogonal to \(V_k\) will have a very small magnitude, and so contribute very little to the integral. We will start our collection of vectors by taking several copies of each standard basis vector \(\vec {e}_{{k}}\). This creates high-degree zeros along each of d distinct perpendicular directions, slicing the sphere so that the only regions with appreciable magnitude form the corners of a cube.

Fig. 1

Schematic of how we can create “corners” on the sphere by repeatedly cutting with planes. Blue represents lower magnitude. This shows only purely real \(\vec {{x}}\)

After adding one copy of each basis vector \(\vec {e}_{{k}}\), the magnitude at a given point \(\vec {{x}}=(\alpha _1,\dots \alpha _d)\) is the product of the squared magnitudes of its entries in that basis: \(G_V(\vec {{x}}) = \prod _j |\alpha _j|^2\). This is maximized when \(|\alpha _j| = |\alpha _k|=\frac{1}{\sqrt{d}}\) for all j, k. If we then subsequently add several vectors of the form \(\frac{\vec {e}_{{j}}+i\vec {e}_{{k}}}{\sqrt{2}}\) and \(\frac{\vec {e}_{{j}}-i\vec {e}_{{k}}}{\sqrt{2}}\), together these rule out a purely imaginary relative phase between the j and k components, so that the maxima are at \(\frac{\vec {e}_{{j}}\pm \vec {e}_{{k}}}{\sqrt{2}}\). After adding these two vectors for each pair \(j< k\), \(G(\vec {{x}})\) will peak near \(\vec {{x}} = \frac{e^{i\theta }}{\sqrt{d}}(1,\pm 1,\pm 1\dots )\). Up to an overall phase of \(\vec {{x}}\), we’ve focused G onto a set of \(2^{d-1}\) distinct points. These \(2^{d-1}\) circles of “binarized” vectors will be our focus, and we call this set \(B_0 = \{\frac{e^{i\theta }}{\sqrt{d}}(1,\pm 1,\pm 1\dots )\}\). To force \(G_V\) to concentrate on \(B_0\), we had to add \(d + 2\binom{d}{2} = d^2\) vectors to our running list V. By analogy with quantum information, we will refer to these as the Z vectors and Y vectors respectively. Together, this set of \(d^2\) vectors will form one “basic set”, “basic” in the sense of “enforcing the basis”.

This is visualized in Fig. 1, which plots \(G(\vec {x})\) for \(\vec {x} = (x,y,z)\). The unit vectors form a sphere. Adding one basic set creates zero planes at \(x=0\), \(y=0\), and \(z=0\); these planes are drawn as squares intersecting the sphere. The orange/white circles show the points that maximize \(G(\vec {x})\): \(\frac{1}{\sqrt{3}}(\pm 1,\pm 1,\pm 1)\). Taking many copies of the basic set is equivalent to raising \(G(\vec {x})\) to a high power, and the function then falls off rapidly away from the eight white hotspots, the binarized points that we concentrate to.
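The sketch below (our own illustration, assuming numpy) builds one basic set for d = 3 and evaluates the integrand, confirming that a binarized point attains the value \(1/d^{d^2}\) quoted at the start of Sect. 3.1, while a generic point scores lower:

import itertools
import numpy as np

def basic_set(d):
    """One basic set: d basis vectors (Z) and, for each pair j < k,
    the two Y vectors (e_j +- i e_k)/sqrt(2) -- d^2 vectors in total."""
    I = np.eye(d)
    vecs = [I[j] for j in range(d)]
    for j, k in itertools.combinations(range(d), 2):
        vecs.append((I[j] + 1j * I[k]) / np.sqrt(2))
        vecs.append((I[j] - 1j * I[k]) / np.sqrt(2))
    return np.array(vecs)

def G(V, x):
    """Integrand G_V(x) = prod_k |<V_k, x>|^2."""
    return np.prod(np.abs(V.conj() @ x) ** 2)

d = 3
V = basic_set(d)
b0 = np.ones(d) / np.sqrt(d)          # a binarized point of B_0
print(G(V, b0), d ** -(d * d))        # both equal 1/d^{d^2} = 3^-9

rng = np.random.default_rng(1)
x = rng.normal(size=d) + 1j * rng.normal(size=d)
x /= np.linalg.norm(x)
print(G(V, x) < G(V, b0))             # a generic point scores strictly lower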

Once we have our basic vectors to concentrate G at these binarized points \(B_0\), we want to add vectors that will penalize some of these \(2^{d-1}\) points, so that finding the optimum becomes a search problem over exponentially many points. Our functional G is only sensitive to the relative phases between components of a vector, and not to the signs of the components themselves. This is the same symmetry that appears in quantum mechanics, where multiplying a quantum state \(\vert \psi \rangle \) by a phase yields a physically identical state \(e^{i\theta }\vert \psi \rangle \). Such a factor \(\theta \) is a global phase, which we can and will neglect later. When we restrict to the binarized points \(B_0\), these phases are just sign differences, and the phase symmetry says that we don’t care about any individual sign: only the pattern of relative signs in the vector.

This sign-flipping symmetry leads us most naturally to the problem of Not-All-Equal 3-Satisfiability, or NAE3SAT [20]:

Definition 2

(NAE3SAT) Given n boolean variables and a set of clauses, each of which is a triple of variables \((v_1,v_2,v_3)\), is there an assignment such that each clause contains at least one true variable and at least one false variable?

NAE3SAT is known to be NP-complete. It also has the same global symmetry where all variables in an assignment can be negated, and the satisfaction of the clauses remains unchanged; this reflects our global phase symmetry. So now consider the impact of adding a triple of “clause vectors”,

$$\begin{aligned} \vec {{v}}_1&= (\sqrt{6})^{-1}(-2\vec {e}_{{1}}+\vec {e}_{{2}}+\vec {e}_{{3}}) \end{aligned}$$
(11)
$$\begin{aligned} \vec {{v}}_2&= (\sqrt{6})^{-1}(\vec {e}_{{1}}-2\vec {e}_{{2}}+\vec {e}_{{3}}) \end{aligned}$$
(12)
$$\begin{aligned} \vec {{v}}_3&= (\sqrt{6})^{-1}(\vec {e}_{{1}}+\vec {e}_{{2}}-2\vec {e}_{{3}}). \end{aligned}$$
(13)

Each is orthogonal to \(\frac{1}{\sqrt{3}}(\vec {e}_{{1}}+\vec {e}_{{2}}+\vec {e}_{{3}})\), the point in which all the relative signs are positive (or equivalently, all negative). We call this collection of three vectors a “clause set”. It effectively rules out the possibility of all signs being the same. There are three not-all-equal points (up to phase):

$$\begin{aligned} \vec {{p}}_1 = (\sqrt{3})^{-1}(-\vec {e}_{{1}}+\vec {e}_{{2}}+\vec {e}_{{3}}) \end{aligned}$$
(14)
$$\begin{aligned} \vec {{p}}_2 = (\sqrt{3})^{-1}(\vec {e}_{{1}}-\vec {e}_{{2}}+\vec {e}_{{3}}) \end{aligned}$$
(15)
$$\begin{aligned} \vec {{p}}_3 = (\sqrt{3})^{-1}(\vec {e}_{{1}}+\vec {e}_{{2}}-\vec {e}_{{3}}) \end{aligned}$$
(16)

while we forbid the fourth point

$$\begin{aligned} \vec {{q}} = (\sqrt{3})^{-1}(\vec {e}_{{1}}+\vec {e}_{{2}}+\vec {e}_{{3}}). \end{aligned}$$
(17)

We see that \(|\vec {{p}}_i \cdot \vec {{v}}_j|^2 = \frac{2 + 6\delta _{i,j}}{9}\). So, when V consists of just the three \(\vec {{v}}_i\), then \(G_V(\vec {{p}}_i) = \frac{32}{729}\), while \(G_V(\vec {{q}}) = 0\).
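These overlaps are easy to check directly; a small numerical verification (our own sketch, assuming numpy):

import numpy as np

e = np.eye(3)
v = np.array([-2*e[0] + e[1] + e[2],
              e[0] - 2*e[1] + e[2],
              e[0] + e[1] - 2*e[2]]) / np.sqrt(6)   # the clause set
p = np.array([-e[0] + e[1] + e[2],
              e[0] - e[1] + e[2],
              e[0] + e[1] - e[2]]) / np.sqrt(3)     # not-all-equal points
q = (e[0] + e[1] + e[2]) / np.sqrt(3)               # forbidden all-equal point

print(np.round(9 * (p @ v.T) ** 2, 6))     # 2 + 6*delta_ij: 8 on diagonal, 2 off
print(np.prod((v @ p[0]) ** 2), 32 / 729)  # G_V(p_i) = 32/729
print(np.prod((v @ q) ** 2))               # G_V(q) = 0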

Fig. 2

Illustration of the effect of a clause. The eight corners \(\{-1/\sqrt{3},1/\sqrt{3}\}^3\) represent four essentially distinct assignments, where opposite points are equivalent. The clause has three zero lines passing through one of the four points, while leaving the other three points untouched

This is visualized in Fig. 2, which shows \(G(\vec {x})\) in the same coordinates as Fig. 1, from three different viewpoints. The basic set from Fig. 1 is still present, and forms the axis-aligned cuts visible in the first plot. The clause set adds three more planes, which eliminate two of the eight orange spots; they form the six-way intersection in the second plot. That six-way intersection was one of the eight orange spots, but G(x) has now been driven to zero there by the three incoming zero planes. The other six points are essentially unaffected.

By adding appropriate clause sets, the only remaining points with large values will be those satisfying an NAE3SAT instance, and deciding NAE3SAT satisfiability is NP-hard. The other points will be too small to contribute to the integral, so that evaluating the integral tells us about the satisfiability of the NAE3SAT problem. With the outline complete, we now begin the steps of the proof, starting with concentration.

3 Proof of Hardness

3.1 Concentration

After one basic set, each point in \(B_0\) has a value \(G_V(\vec {{x}})\) of \(1/d^{d^2}\) (by direct calculation). We would like to show that any point far away from \(B_0\) has a significantly lower value. For this reason (and with the intuition that the integrand \(G_V\) represents likelihood values), we talk about relative values of \(G_V\): by the value of \(G_V(a)\) relative to \(G_V(b)\), we simply mean \(G_V(a)/G_V(b)\).

Any unit vector \(\vec {{x}} \in {\mathbb {C}}^d\) can be written as

$$\begin{aligned} \vec {{x}} = \frac{e^{i\Theta }}{\sqrt{d}} \sum _{k=1}^d \sqrt{\alpha _k} e^{i\pi (\theta _k+n_k)} \vec {{e}}_k \end{aligned}$$
(18)

where \(\Theta \), \(\alpha _k\) and \(\theta _k\) are all real, \(\alpha _k\ge 0\), \(\sum _k \alpha _k = d\), \(\theta _1=0\), \(\Theta , \theta _k \in [-1/2,1/2)\), and \(n_k \in \{0,1\}\). The \(\vec {{\alpha }}\), \(\vec {{\theta }}\), and \(\vec {{n}}\) respectively indicate the amplitudes, the phases relative to the first component, and the signs of the real part. This polar representation is unique except when one of the \(\alpha _k = 0\), which is a measure-zero set. (We can neglect measure-zero sets, as our concern is only with the integral \(F(V) = \int G_V(x)\).)
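A short sketch (our own, assuming numpy; the helper name is ours) that recovers this polar representation from a unit vector:

import numpy as np

def polar_rep(x):
    """Recover (alpha, theta, n) of Eq. (18) from a unit vector x in C^d.
    alpha_k = d |x_k|^2 (so sum(alpha) = d); theta_k is the phase of x_k
    relative to x_1 in units of pi, reduced to [-1/2, 1/2); n_k is the sign
    bit. Assumes no component is exactly zero (a measure-zero event)."""
    d = len(x)
    alpha = d * np.abs(x) ** 2
    rel = np.angle(x / x[0]) / np.pi           # relative phases in (-1, 1]
    n = (np.abs(rel) > 0.5).astype(int)        # closer to a phase of pi?
    theta = (rel - n) % 2
    theta = np.where(theta >= 1, theta - 2, theta)
    return alpha, theta, n

# a binarized point (times a random global phase) has alpha = 1, theta = 0:
d = 4
x = np.exp(2.3j) * np.array([1, -1, -1, 1]) / np.sqrt(d)
print(polar_rep(x))   # alpha ~ (1,1,1,1), theta ~ 0, n = (0,1,1,0)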

Lemma 2

Let \(\vec {{x}}\) be a unit vector with polar representation \(\Theta \), \(\vec {{\alpha }}\), \(\vec {{\theta }}\), and \(\vec {{n}}\). Let \(\epsilon _\alpha \) be the 2-norm distance of \(\vec {{\alpha }}=(\alpha _1,\dots \alpha _d)\) from \(\vec {{1}}\). Then when V is one basic set, the value of \(G_V(\vec {{x}})\) relative to any point in \(B_0\) is at most \(1-\frac{\epsilon _\alpha ^2}{4d}\). If \(\epsilon _\alpha \le 1/2\), then the value is also at most \(1-3\theta _i^2\) for every component \(\theta _i\) of \(\vec {{\theta }}\).

Proof

The set \(B_0\) consists of the points with \(\alpha _k = 1\) for all k and every \(\theta _k\) an integer (i.e. \(\theta _k = 0\) in our representation; the sign bits \(n_k\) are unconstrained). If \(\vec {{x}}\) has significant distance from all elements of \(B_0\), then either the amplitudes \(\alpha _k\) or the phases \(\theta _k\) must differ significantly from these conditions. The value after the basic set is

$$\begin{aligned} G_V(\vec {{x}})= & {} \left( \prod _k \left| \sqrt{\frac{\alpha _k}{d}}\right| ^2 \right) \left( \prod _{j< k}\left| \frac{ \sqrt{\alpha _j} e^{i\pi (\theta _j+n_j)}+i\sqrt{\alpha _k} e^{i\pi (\theta _k+n_k)} }{\sqrt{2d}}\right| ^2\, \left| \frac{ \sqrt{\alpha _j} e^{i\pi (\theta _j+n_j)}-i\sqrt{\alpha _k} e^{i\pi (\theta _k+n_k)} }{\sqrt{2d}}\right| ^2\right) \\= & {} \left( \frac{1}{d^d}\prod _k \alpha _k \right) \left( \prod _{j< k} \frac{\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k \cos (2\pi (\theta _j - \theta _k + n_j - n_k))}{4d^2}\right) \\= & {} \frac{1}{d^d(2d)^{d^2-d}} \left( \prod _k \alpha _k \right) \prod _{j< k} \Big (\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k \cos (2\pi (\theta _j - \theta _k))\Big ) \end{aligned}$$

The first factor comes from the Z vectors \(\vec {e}_{{k}}\) in the basic set, and the remaining factors come from the Y vectors \(\frac{\vec {e}_{{j}}\pm i\vec {e}_{{k}}}{\sqrt{2}}\), one pair for each \(j < k\), in the basic set.

The first step is to bound the value in terms of the magnitudes \(\alpha _k\). The Z vectors contribute the factor \(\prod _{k=1}^d \alpha _k\), a function on the standard \((d-1)\)-simplex \(\sum _k \alpha _k = d\) that is maximized at \({\vec {\alpha }}_{opt} = (1,1,1,\dots 1)\), where it evaluates to 1. Suppose that our \(\vec {{x}}\)’s associated \(\alpha \)-vector, \(\vec {\alpha } = (\alpha _1,\dots \alpha _d)\), is a distance at least \(\epsilon _\alpha \) away from \(\vec {\alpha }_{opt}\), and that \(\epsilon _\alpha \le 1\). Then one of the coordinates must be at least \(\epsilon _\alpha /\sqrt{d}\) away from 1. Without loss of generality, let this coordinate be \(\alpha _1\). If \(\alpha _1 \le 1 - \epsilon _\alpha /\sqrt{d}\), then the product is largest when the other \(\alpha _k\) are all equal to \(1 + \frac{\epsilon _\alpha }{\sqrt{d}(d-1)}\). Multiplying these together, the resulting value is upper-bounded by \(1-\frac{\epsilon _\alpha ^2}{2(d-1)}\). If \(\alpha _1\) has instead been increased so that \(\alpha _1 \ge 1 + \epsilon _\alpha /\sqrt{d}\), then the product is maximized when the other \(\alpha _k\) all equal \(1 - \frac{\epsilon _\alpha }{\sqrt{d}(d-1)}\). Multiplying these together, the resulting value is upper-bounded by \(1-\frac{\epsilon _\alpha ^2}{4d}\). Since the latter of these bounds is looser, we see that any state whose \(\vec {\alpha }\) is at least \(\epsilon _\alpha \) away from the all-ones vector has a value of at most \(1 - \frac{\epsilon _\alpha ^2}{4d}\) from these vectors.

This bounds the Z vectors’ contribution to the value. To keep this bound when the Y vectors are added, we need to check that their contribution is also maximized at \(\vec {\alpha } = \vec {1}\). The factor

$$\begin{aligned} \prod _{j< k} \Big (\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k \cos (2\pi (\theta _j - \theta _k))\Big ) \end{aligned}$$

is maximized when each \(\theta _j - \theta _k\) is an integer, at which point it becomes \(\prod _{j< k} (\alpha _j+\alpha _k)^2 = \Big ( \prod _{j<k} (\alpha _j + \alpha _k)\Big )^2\). This is in turn globally maximized by \(\alpha _j = \alpha _k = 1\), so the error bound on \(\vec {\alpha }\) holds.

The next step is to bound the value in terms of the \(\vec {\theta }\). We only care about the extent to which \(\theta _j - \theta _k\) fails to be an integer, so let \(r_{jk}\) be \(\theta _j - \theta _k\) reduced to the nearest integer, \(r_{jk} \in [-1/2,1/2]\). Given that \(\cos (2\pi r) \le 1 - 8r^2\) for all \(r\in [-1/2,1/2]\), the Y vectors for the pair (j, k) contribute a relative value of

$$\begin{aligned} \frac{\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k \cos (2\pi r_{jk})}{\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k} \le \frac{\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k (1-8r_{jk}^2)}{\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k} = 1 - \frac{16\alpha _j\alpha _k}{(\alpha _j+\alpha _k)^2}r_{jk}^2 \end{aligned}$$

Let us assume that each \(\alpha _j\) is in the interval [1/2, 3/2], which is implied by \(\vec {\alpha }\) being sufficiently close to the all-ones vector, namely \(\epsilon _\alpha \le 1/2\). Then the expression \(\frac{16\alpha _j\alpha _k}{(\alpha _j+\alpha _k)^2}\) is at least 3; since the remaining factors, with their cosines set to 1, contribute a relative value of at most 1, we obtain

$$\begin{aligned} \frac{G_V(\vec {{x}})}{G_V(B_0)} \le 1 - 3r_{jk}^2 \end{aligned}$$

which tells us that every difference \(\theta _j - \theta _k\) should be close to an integer for \(G_V\) to be large, or else suffer a \(1-3r_{jk}^2\) penalty in the value. In particular, taking \(k=1\) (recall \(\theta _1 = 0\)) bounds each individual phase, giving the claimed \(1-3\theta _i^2\). \(\square \)

This sets upper bounds on the integrand at points which are not close to \(B_0\). We will also need lower bounds on the integrand at points close to \(B_0\):

Lemma 3

If a vector \(\vec {{x}}\) is within distance \(\epsilon \le 0.1\) of some point b in \(B_0\), and V is one basic set, then \(\vec {{x}}\) has value at least

$$\begin{aligned} G_V(\vec {{x}}) \ge \frac{1-2\epsilon d^{5/2}}{d^{d^2}}. \end{aligned}$$

or in terms of the relative value,

$$\begin{aligned} G_V(\vec {{x}})/G_V(B_0) \ge 1-2\epsilon d^{5/2}. \end{aligned}$$

Proof

We will again use polar representation for \(\vec {{x}}\):

$$\begin{aligned} G_V(\vec {{x}}) = \frac{1}{d^d(2d)^{d^2-d}} \left( \prod _k \alpha _k \right) \prod _{j< k} \Big (\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k \cos (2\pi (\theta _j - \theta _k))\Big ) \end{aligned}$$

If our point \(\vec {{x}}\) is within distance \(\epsilon < 1\) of \(B_0\), then each of the \(\alpha _i\) must individually be within \(\epsilon \sqrt{d}\) of 1, and each \(\theta _i\) satisfies

$$\begin{aligned} \cos (\pi \theta _i) > \sqrt{1-\epsilon ^2} \, \implies \, |\theta _i| < \epsilon /2 \end{aligned}$$

and so

$$\begin{aligned} \cos (2\pi (\theta _j-\theta _k)) \ge 1 - \frac{(2\pi (\theta _j-\theta _k))^2}{2} \ge 1 - \frac{(2\pi \epsilon )^2}{2}. \end{aligned}$$

Then the value is bounded by,

$$\begin{aligned}{} & {} G_V(\vec {{x}}) \ge \frac{1}{d^d(2d)^{d^2-d}} \left( \prod _k (1-\epsilon \sqrt{d}) \right) \\ {}{} & {} \quad \left( \prod _{j< k} \Big ((1-\epsilon \sqrt{d})^2 + (1-\epsilon \sqrt{d})^2 + 2 (1-\epsilon \sqrt{d})^2 \left( 1 - \frac{(2\pi \epsilon )^2}{2}\right) \Big )\right) \\{} & {} \ge \frac{1}{d^d(2d)^{d^2-d}}(1-\epsilon \sqrt{d})^{d^2}\left( 4 - 4\pi ^2\epsilon ^2\right) ^{(d^2-d)/2}\\{} & {} \ge \frac{1}{d^{d^2}}\left( 1-\epsilon \sqrt{d}\,d^2-\pi ^2\epsilon ^2(d^2-d)/2\right) \end{aligned}$$

If \(\epsilon < \sqrt{d}/\pi ^2\), which is implied by \(\epsilon < 0.1\), then the term \(\pi ^2\epsilon ^2(d^2-d)/2\) is smaller than \(\epsilon \sqrt{d}\,d^2\), so we can combine the two:

$$\begin{aligned} G_V(\vec {{x}}) \ge \frac{1}{d^{d^2}}\left( 1-2 \epsilon \sqrt{d}\,d^2\right) = \frac{1-2\epsilon d^{5/2}}{d^{d^2}} \end{aligned}$$

\(\square \)

Together, these two lemmas establish a form of concentration: points close to \(B_0\) have large (lower-bounded) values of \(G_V\), and points far from \(B_0\) have small (upper-bounded) values of \(G_V\).

3.2 Restricting to Neighborhoods of \(B_0\)

Now we consider the effect of clause sets. A clause \({\textsf{C}}\) is defined by a triple of integers \(({\textsf{C}}_{1}, {\textsf{C}}_{2}, {\textsf{C}}_{3})\). A point \(b \in B_0\) with coordinates \((b_1, b_2, \dots b_d)\), each \(b_k = \pm e^{i\Theta }/\sqrt{d}\), is “good” for the clause \({\textsf{C}}\) if \(\{b_{{\textsf{C}}_1}, b_{{\textsf{C}}_{2}}, b_{{\textsf{C}}_{3}}\}\) are not all equal. A point in \(B_0\) is “good” for a set of clauses if it is good for each of them, and a point is “bad” if it is not good. Each clause \({\textsf{C}}\) has an associated set of three clause vectors

$$\begin{aligned} \vec {{v}}_1= & {} (\sqrt{6})^{-1}(-2\vec {e}_{{{\textsf{C}}_1}} +\vec {e}_{{{\textsf{C}}_2}}+\vec {e}_{{{\textsf{C}}_3}})\\ \vec {{v}}_2= & {} (\sqrt{6})^{-1}(\vec {e}_{{{\textsf{C}}_1}}-2\vec {e}_{{{\textsf{C}}_2}}+\vec {e}_{{{\textsf{C}}_3}})\\ \vec {{v}}_3= & {} (\sqrt{6})^{-1}(\vec {e}_{{{\textsf{C}}_1}}+\vec {e}_{{{\textsf{C}}_2}}-2\vec {e}_{{{\textsf{C}}_3}}). \end{aligned}$$

Lemma 4

Take a clause \({\textsf{C}} = ({\textsf{C}}_{1}, {\textsf{C}}_{2}, {\textsf{C}}_{3})\) and let V be its three clause vectors. Nowhere does \(G_V\) exceed 1. At any point \(\vec {{x}}\) within a distance \(\epsilon \) of a good point, \(G_V(\vec {{x}}) \ge \frac{32}{27d^3}\left( 1-12\epsilon \sqrt{d}\right) \). At any point \(\vec {{x}}\) within a distance \(\epsilon \) of a bad point, \(G_V(\vec {{x}}) \le \frac{4096}{27}\epsilon ^6\).

Proof

To see that 1 is an upper bound on \(G_V\), note that \(G_V\) is a product of squared magnitudes of inner products of unit vectors, each of which is at most 1, so that \(G_V \le 1\).

For the second claim, we have a point \(\vec {{x}}\) close to a good point \(\vec {{g}}\). Since we only care about the value of \(G_V\) and the distance \(|\vec {{x}}-\vec {{g}}|\), we may adjust the phases of \(\vec {{x}}\) and \(\vec {{g}}\) jointly so that \(\vec {{g}}\) is entirely real, with all of its entries equal to \(\pm 1/\sqrt{d}\). We decompose \(\vec {{x}}\) in the form

$$\begin{aligned} \vec {{x}} = \alpha \vec {e}_{{{\textsf{C}}_1}} + \beta \vec {e}_{{{\textsf{C}}_2}} + \gamma \vec {e}_{{{\textsf{C}}_3}} + \vec {{x}}_{\perp } \end{aligned}$$

where \(\vec {{x}}_{\perp }\) is the component of \(\vec {{x}}\) supported on all basis vectors besides the three appearing in the clause. Then the likelihood factor due to the three clause vectors is,

$$\begin{aligned} G_V(\vec {{x}}) = \frac{1}{6^3} |\alpha +\beta -2\gamma |^2 \cdot |\alpha -2\beta +\gamma |^2 \cdot |-2\alpha +\beta +\gamma |^2 \end{aligned}$$

We seek to bound this value in the vicinity of good points. A good point of \(B_0\) does not have all three of these signs equal. Since we can permute the elements of \({\textsf{C}}\) without affecting the value of \(G_V\), a general good point \(\vec {{g}}\) can be written as

$$\begin{aligned} \vec {{g}} = \frac{1}{\sqrt{d}}\left( -\vec {e}_{{{\textsf{C}}_1}} +\vec {e}_{{{\textsf{C}}_2}}+\vec {e}_{{{\textsf{C}}_3}} + \sqrt{d-3}\,\vec {{g}}_{\perp }\right) \end{aligned}$$

where \(\vec {{g}}_{\perp }\) contains the support on all the other basis vectors. It has \(G_V(\vec {{g}}) = \frac{32}{27d^3}\), by direct computation. Then for our other point \(\vec {{x}}\) within a distance \(\epsilon \) of \(\vec {{g}}\), each coordinate must also be within \(\epsilon \) of the corresponding coordinate in \(\vec {{g}}\). So

$$\begin{aligned} \Re [\alpha + \beta - 2\gamma ]\le & {} \frac{1}{\sqrt{d}}\left( (-1+\epsilon \sqrt{d}) + (1+\epsilon \sqrt{d}) - 2(1-\epsilon \sqrt{d})\right) \\ {}= & {} -2(1-2\epsilon \sqrt{d})/\sqrt{d} \end{aligned}$$

and similarly

$$\begin{aligned} \Re [-2\alpha + \beta +\gamma ]\ge & {} \frac{1}{\sqrt{d}}\left( -2(-1+\epsilon \sqrt{d}) + (1-\epsilon \sqrt{d}) + (1-\epsilon \sqrt{d})\right) \\ {}= & {} 4(1-\epsilon \sqrt{d})/\sqrt{d} \ge 4(1-2\epsilon \sqrt{d})/\sqrt{d}. \end{aligned}$$

Putting together the six factors,

$$\begin{aligned} G_V(\vec {{x}})&= \frac{1}{6^3} |\alpha +\beta -2\gamma |^2 \cdot |\alpha -2\beta +\gamma |^2 \cdot |-2\alpha +\beta +\gamma |^2\end{aligned}$$
(19)
$$\begin{aligned}&\ge \frac{1}{6^3} \Re [\alpha +\beta -2\gamma ]^2 \Re [\alpha -2\beta +\gamma ]^2 \Re [-2\alpha +\beta +\gamma ]^2\end{aligned}$$
(20)
$$\begin{aligned}&\ge \frac{32}{27d^3}\times \left( 1-2\epsilon \sqrt{d}\right) ^6\end{aligned}$$
(21)
$$\begin{aligned}&\ge \frac{32}{27d^3}\times \left( 1-12\epsilon \sqrt{d}\right) \end{aligned}$$
(22)

which is the second claim. For the third claim, take a bad point \(\vec {{h}}\) in \(B_0\), whose phase we can correct to put it in the form

$$\begin{aligned} \vec {{h}} = \frac{1}{\sqrt{d}}\left( +\vec {e}_{{{\textsf{C}}_1}} +\vec {e}_{{{\textsf{C}}_2}}+\vec {e}_{{{\textsf{C}}_3}} + \sqrt{d-3}\,\vec {{h}}_{\perp }\right) \end{aligned}$$

Then for a nearby point only \(\epsilon \) away, each coordinate is at most \(\epsilon \) away. This means

$$\begin{aligned}{} & {} \Re [\alpha + \beta - 2\gamma ] \le \left( \frac{1}{\sqrt{d}}+\epsilon \right) +\left( \frac{1}{\sqrt{d}}+\epsilon \right) +\left( \frac{-2}{\sqrt{d}}+2\epsilon \right) = 4\epsilon \\{} & {} \Im [\alpha + \beta - 2\gamma ] \le 4\epsilon \\{} & {} \implies |\alpha +\beta -2\gamma |^2 \le 32\epsilon ^2 \end{aligned}$$

and similarly for the other two permutations, so that

$$\begin{aligned} G_V(\vec {{x}}) \le \frac{1}{6^3}(32\epsilon ^2)^3 = \frac{4096}{27}\epsilon ^6. \end{aligned}$$

\(\square \)

3.3 \(F = \int _x G_V(\vec {{x}})\) Detects NAE3SAT

With these bounds, we will be able to relate the number of solutions to a NAE3SAT instance to the integral \(F = \int _x G_V(\vec {{x}})\).

Theorem 4

Given an instance of NAE3SAT with d variables and k clauses, let the set of vectors V be given by \(K_1 = 1600d^7 \ln ^2(d)\) copies of the basic vectors (Z and Y vectors), together with \(K_2 = d^2 \ln (d)\) copies of the clause vectors for each clause. For sufficiently large d, there is a function p(n, k) such that, if there is at least one solution to the NAE3SAT instance, then \(F = \int _x G_V(\vec {{x}}) \ge pd^{-22d}\), and if there are no solutions, then \(F \le pd^{-d^2}\).

Proof

The theorem will hold if we take p as the value of \(G_V\) at a good point, or

$$\begin{aligned} p = d^{-K_1 d^2} \left( \frac{32}{27d^3}\right) ^{K_2}. \end{aligned}$$

If the original NAE3SAT instance has a satisfying assignment \((1,0,0,1,\dots )\), there is a corresponding good point

$$\begin{aligned} \vec {{g}} = \frac{1}{\sqrt{d}}\left( +\vec {e}_{{1}}-\vec {e}_{{2}}-\vec {e}_{{3}}+\vec {e}_{{4}}\dots \right) \end{aligned}$$

with a large value of \(G_V(\vec {{g}})\). Each set of basic vectors introduces a factor of \(1/d^{d^2}\) in G, and each set of clause vectors introduces a factor of \(32/27d^3\). Thus

$$\begin{aligned} G_V(\vec {{g}}) = d^{-K_1 d^2} \left( \frac{32}{27d^3}\right) ^{K_2} = p \end{aligned}$$

Further, we want to show that around this good point \(\vec {{g}}\), there is an appreciable volume with large \(G_V\), that will contribute substantially to F. Around each good point, take the ball of radius

$$\begin{aligned} \epsilon _g = \frac{1}{3200d^9(1+d)}. \end{aligned}$$

Then by Lemma 3, each set of basic observations gives a factor in G of at least

$$\begin{aligned} G_1 \ge \frac{1-2\epsilon _gd^{5/2}}{d^{d^2}} \end{aligned}$$

and by Lemma 4, each set of clause observations gives a factor at least

$$\begin{aligned} G_2 \ge \frac{32}{27d^3}(1-12\epsilon _g\sqrt{d}) \end{aligned}$$

so that the final \(G_V\) value of each point in the ball is at least

$$\begin{aligned} G_0&= G_1^{K_1} G_2^{K_2} \ge p(1-2\epsilon _gd^{5/2})^{K_1}(1-12\epsilon _g\sqrt{d})^{K_2}\end{aligned}$$
(23)
$$\begin{aligned}&\ge p(1-2\epsilon _gK_1d^{5/2})(1-12K_2\epsilon _g\sqrt{d})\end{aligned}$$
(24)
$$\begin{aligned}&= p\left( 1-2\frac{1}{3200d^9(1+d)}(1600d^7\ln ^2(d))d^{5/2}\right) \left( 1-12(d^2\ln (d))\frac{1}{3200d^9(1+d)}\sqrt{d}\right) \end{aligned}$$
(25)
$$\begin{aligned}&\ge p\left( 1-\frac{\ln ^2 d}{\sqrt{d}}\right) \end{aligned}$$
(26)

This means the total contributed to F by the ball around this good point is at least \(p(1-\ln ^2 d/\sqrt{d})\) times the volume of this ball around \(\vec {{g}}\). The ball is not actually a flat ball in \({\mathbb {R}}^{2d}\), as it lies on the manifold of normalized states, which is curved; it is the intersection of a ball centered at \(\vec {{g}}\) with the unit sphere. But since \(\epsilon _g < 1/2\), this deformation reduces the volume by at most a factor of 2, and then we can use the standard volume of the \((2d-1)\)-dimensional ball. So the volume obeys

$$\begin{aligned} \text {Vol} \ge \frac{1}{2}\cdot \frac{2(d-1)!(4\pi )^{(d-1)}}{(2d-1)!}\epsilon _g^{2d-1} \end{aligned}$$

and a single good point contributes to the integral F(V) a total of at least

$$\begin{aligned} \text {Vol}\cdot G_0 \ge pc_1c_2^{-d}d^9d^{-21d} \end{aligned}$$

for some particular constants \(c_1, c_2 > 1\); the \(d^{-21d}\) term clearly dominates for large d. For sufficiently large d, then, we can write

$$\begin{aligned} F \ge \text {Vol}\cdot G_0 \ge pd^{-22d} \end{aligned}$$

which establishes the first claim. The second claim concerns the case where there are no good points. Suppose for contradiction that there is some point \(\vec {{x}}\) (not necessarily in \(B_0\)) so that \(G_V(\vec {{x}}) > p/d^{d^2}\). Applying Lemma 2, we know that it must have \(\epsilon _\alpha = |\vec {\alpha } - \vec {1}| < 0.1/d^2\), otherwise it would have at most

$$\begin{aligned} G_V(\vec {{x}}) \le \left( d^{-d^2}(1-0.1^2/4d^5)\right) ^{K_1}&< d^{-K_1d^2}\exp (-K_1/400d^5) \end{aligned}$$
(27)
$$\begin{aligned}&= d^{-K_1d^2}\exp (-4d^2\ln ^2 d) \end{aligned}$$
(28)
$$\begin{aligned}&< d^{-K_1d^2}\exp \left( -4d^2\ln ^2 d+d^2\ln d\ln (32/27)\right) \end{aligned}$$
(29)
$$\begin{aligned}&= d^{-K_1d^2}\exp \left( -d^2\ln ^2 d+d^2\ln d\ln (32/27d^3)\right) \end{aligned}$$
(30)
$$\begin{aligned}&= d^{-K_1d^2}\exp \left( -d^2\ln ^2 d+\ln \left( \left( \frac{32}{27d^3}\right) ^{K_2}\right) \right) \end{aligned}$$
(31)
$$\begin{aligned}&= d^{-K_1d^2}\left( \frac{32}{27d^3}\right) ^{K_2}/{d^{d^2\ln d}} \, = p/d^{d^2 \ln d} \end{aligned}$$
(32)
$$\begin{aligned}&< p/d^{d^2} \end{aligned}$$
(33)

Since \(\epsilon _\alpha \le 1/2\), we can also apply the second part of Lemma 2 and check that all phases satisfy \(|\theta _i| < 0.1/d\); otherwise our point would have \(G_V\) at most

$$\begin{aligned} \left( d^{-d^2}(1-3\theta _i^2)\right) ^{K_1}&< \left( d^{-d^2}(1-0.03/d^2)\right) ^{K_1}\end{aligned}$$
(34)
$$\begin{aligned}&< \left( d^{-d^2}(1-0.1^2/4d^5)\right) ^{K_1}\end{aligned}$$
(35)
$$\begin{aligned}&< p/d^{d^2} \end{aligned}$$
(36)

Since the amplitudes are all within \(\epsilon _\alpha \) of \(1/\sqrt{d}\), and the phases are all within 0.1/d of 0, the point’s distance to the nearest point b in \(B_0\) is at most

$$\begin{aligned}{} & {} \text {dist}_{B_0} \le \sqrt{d}\left( \epsilon _\alpha + \left( \frac{1}{\sqrt{d}}+\epsilon _\alpha \right) \left( (1-\cos (\theta _i))^2+\sin ^2(\theta _i)\right) \right) \\{} & {} \le \sqrt{d}\left( \frac{0.1}{d^2} + \left( \frac{1}{\sqrt{d}}+\frac{0.1}{d^2}\right) \left( 2-2\cos (0.1/d)\right) \right) \le \sqrt{d}\left( \frac{0.1}{d^2} + \frac{2}{\sqrt{d}}\left( 0.1/d\right) ^2\right) \\{} & {} \le \frac{0.11}{d^{3/2}} \end{aligned}$$

If that point b is bad, then by Lemma 4 our point would have \(G_V\) at most

$$\begin{aligned}{} & {} d^{-K_1d^2}\left( \frac{4096}{27}\left( \frac{0.11}{d^{3/2}}\right) ^6\right) ^{K_2} {= d^{-K_1d^2}\left( \frac{0.00027}{d^{9}}\right) ^{K_2}}\\ {}{} & {} \quad = d^{-K_1d^2}\left( \frac{32}{27d^3}\right) ^{K_2} \left( \frac{0.00027}{(32/27)d^6}\right) ^{K_2}\\{} & {} \quad< p \times 0.00025^{K_2} = p / 4000^{d^2 \ln d}< p / (4000d)^{d^2} < p/d^{d^2}. \end{aligned}$$

We’ve shown that all points have \(G_V \le p/d^{d^2}\). The total volume of integration (the surface area of the unit sphere in \({\mathbb {C}}^d\)) is less than 1 for large d, so the total integral F is less than \(p/d^{d^2}\). \(\square \)
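To make the construction concrete, here is a sketch (our own, assuming numpy, with toy values \(K_1 = K_2 = 1\) rather than the polynomially large copy counts the proof requires) that assembles V for a small NAE3SAT instance and scans all binarized points. \(G_V\) vanishes exactly at sign patterns violating a clause, and is maximal at NAE-satisfying ones:

import itertools
import numpy as np

def basic_set(d):
    """d^2 vectors: basis vectors (Z) plus (e_j +- i e_k)/sqrt(2) (Y) for j < k."""
    I = np.eye(d)
    vecs = [I[j] for j in range(d)]
    for j, k in itertools.combinations(range(d), 2):
        vecs.append((I[j] + 1j * I[k]) / np.sqrt(2))
        vecs.append((I[j] - 1j * I[k]) / np.sqrt(2))
    return np.array(vecs)

def clause_set(d, c):
    """Three clause vectors for clause c = (i, j, k), with 0-indexed variables."""
    e = np.eye(d)
    i, j, k = c
    return np.array([-2*e[i] + e[j] + e[k],
                     e[i] - 2*e[j] + e[k],
                     e[i] + e[j] - 2*e[k]]) / np.sqrt(6)

def G(V, x):
    return np.prod(np.abs(V.conj() @ x) ** 2)

d, clauses = 4, [(0, 1, 2), (1, 2, 3)]
V = np.vstack([basic_set(d)] + [clause_set(d, c) for c in clauses])
# scan binarized points: G_V is zero exactly at NAE-violating sign patterns
for signs in itertools.product([1, -1], repeat=d - 1):
    b = np.array((1,) + signs) / np.sqrt(d)
    nae = all(len({b[i] > 0 for i in c}) == 2 for c in clauses)
    print(signs, nae, G(V, b))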

As an aside, we might hope that F actually detects MAX3SAT as well: that is, if there are no solutions to the original constraint problem, perhaps the value of F would be determined by the maximum number of clauses satisfiable at once. This is considerably harder to analyze, however, since \(G_V\) is exactly zero at every binarized point: at least one constraint is always violated. Accordingly, the value of F will be determined by the relatively small values of \(G_V\) away from binarized points. The fact that we cannot reasonably analyze this case does not hurt our results, of course, since MAX3SAT can still be reduced to NAE3SAT by virtue of the latter’s NP-completeness.

3.4 NP Hardness

We can now prove our main result.

Theorem 5

For any constant \(C<1\), it is NP-Hard to approximate the permanent of an \(n\times n\) Hermitian positive semidefinite matrix within a factor of \(2^{n^C}\).

Proof

We reduce from NAE3SAT. Given an NAE3SAT instance on d variables, we can use the set of vectors V described in Theorem 4 and examine the resulting value F. As we have \(O(d^9)\) vectors in V, the quantity F can be represented as a permanent of a matrix of size \(O(d^9)\). The NAE3SAT instance is satisfiable if \(F \ge pd^{-22d}\) and unsatisfiable if \(F \le pd^{-d^2}\), which can be distinguished by approximating within a factor of \(d^{d^2-22d}\); a factor of \(2^{d^2}\) therefore suffices. If we had an oracle that could approximate permanents of \(n\times n\) PSD matrices within a factor of \(2^{n^C}\) for some \(C<1\), then we could apply the replica trick: take the matrix corresponding to F, and repeat it \(M = d^{(9C-2)/(1-C)}\) many times along the diagonal, so that the permanent of the block-diagonal result is the Mth power of the original. We assume that \(9C-2 \ge 0\), so that the exponent on M is positive; otherwise we can freely increase C up past 2/9, which only makes the problem easier. The result is a matrix of size \(Md^9\), which is then approximated within a factor of \(2^{(Md^9)^C}\). The resulting matrix size \(Md^9\) is still poly(d) for any fixed C. Then we raise this approximate answer to the power 1/M to recover an approximation to the original permanent, with multiplicative error

$$\begin{aligned} \left( 2^{(Md^9)^C}\right) ^{1/M} = 2^{d^{9C}M^{C-1}} = 2^{d^{9C}d^{2-9C}} = 2^{d^2} \end{aligned}$$

which is sufficient to distinguish between satisfiable and unsatisfiable instances. As NAE3SAT is NP hard, so is approximating HPSD permanents with this accuracy. \(\square \)
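The exponent arithmetic in the replica trick can be checked symbolically; a quick sketch assuming sympy:

import sympy as sp

C, d = sp.symbols('C d', positive=True)
M_exp = (9*C - 2) / (1 - C)          # M = d**M_exp copies along the diagonal
# log_2 of the final error, 2**(d**(...)): exponent is C*(M_exp + 9) - M_exp
log2_err_exp = C * (M_exp + 9) - M_exp
print(sp.simplify(log2_err_exp))     # -> 2, i.e. the error is 2^(d^2)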

This result is complementary to one of Anari et al. [12], who show that one can approximate within a factor of \(\exp ((1+\gamma +o(1))n)\), where \(\gamma \) is the Euler-Mascheroni constant, while we have shown that permanents cannot be approximated within subexponential error. Our hard instances circumvent the fast approximation schemes of Barvinok [15] and Chakhmakhchyan et al. [14], which both place requirements on the spectrum of the matrix and perform more favorably when \(\lambda _{max}/\lambda _{min}\) is smaller. Our instances are of low rank (only rank d, which is much smaller than the matrix size n), so that \(\lambda _{min} = 0\). Finally, we conjecture that the reduction above is approximation preserving: that each good point contributes an equal amount to the integral that can easily be estimated beforehand. Showing this would require tighter error bounds.

Conjecture 1

With an appropriate choice of polynomial-scaling \(K_1\) and \(K_2\), the construction used in Theorem 4 is an approximation-preserving reduction from #NAE3SAT to HPSD permanents, such that approximating HPSD Permanents within a factor C is as hard as approximating #NAE3SAT (or #3SAT) within a factor C.

It is known that by Stockmeyer counting [7, 9, 10] computing multiplicative approximations to PSD permanents is contained in \(\textsf {FBPP}^\textsf {NP}\). If approximating PSD permanents is indeed as hard as approximating #3SAT, it seems unlikely to be significantly easier than \(\textsf {FBPP}^\textsf {NP}\).

3.5 Real Matrices

The arguments above all involve complex vectors, complex matrices, and integrals over the complex unit sphere. They can, however, easily be adapted to show that PSD permanents remain hard even for purely real matrices. We could have proved the results only for the real case, which would of course imply hardness for the more general complex case, but the proof for the real case is less symmetric, aesthetic, and intuitive than the complex one, which is why we delayed it to this section.

Theorem 6

For any constant \(C<1\), it is NP-Hard to approximate the permanent of an \(n\times n\) real positive semidefinite matrix within a factor of \(2^{n^C}\).

Proof

The construction proceeds very similarly to the one above, by reducing from NAE3SAT. However, we now use one more dimension in the space: a d-variable NAE3SAT problem is mapped to a \((d+1)\)-dimensional spherical integral \(\int G_V(\vec {x})\). The clauses are mapped, as before, with \(K_2\) many sets of clause vectors, connecting the variables 1 through d in the original problem with dimensions 1 through d in the spherical integral. The “basic sets” still include \(K_1\) many instances of the unit vectors \(\vec {e}_{{k}}\) in each basis direction \(k \in [d+1]\), what we previously referred to as the Z vectors.

The Y vectors were, in the previous proofs, of the form \(\frac{\vec {e}_{{j}} \pm i\vec {e}_{{k}}}{\sqrt{2}}\), for \(j\ne k\). This was the sole source of complex entries in our vectors, and the reason the resulting matrices were complex. Instead, we now use four copies of each of \(\frac{\vec {e}_{{j}} \pm \vec {e}_{{d+1}}}{\sqrt{2}}\). These softly enforce the constraint that the components of \(\vec {x}\) in the j direction and the \(d+1\) direction have relative phase \(\pm i\) (that is, \(\pm \sqrt{-1}\)). Since each j has relative phase \(\pm i\) with respect to \(d+1\), every pair \(j\ne k\) with \(j,k\le d\) has relative phase \(\pm 1\).

To make this quantitative and precise, we refer to the proof of Lemma 2. The bound of \(1-\frac{\epsilon _\alpha ^2}{4d}\) applies as before, since the \(\vec {e}_{{k}}\) vectors occur just as before. As proved in Lemma 2, if the relative phase of components j and \(d+1\) deviates from the enforced value by \(\Delta \theta _j = \theta _j - \theta _{d+1}\) (up to \(\pm 1\)), then each pair of Y vectors reduces the likelihood \(G_V(x)\) by a factor of \(1 - 3 \Delta \theta _j^2\); since we use four copies of each pair, this becomes \((1-3\Delta \theta _j^2)^4\). Then for two \(j\ne k\), \(j,k \le d\), the likelihood is at most

$$\begin{aligned} (1-3\Delta \theta _j^2)^4(1-3\Delta \theta _k^2)^4\le & {} \left( 1-3\left( \frac{|\Delta \theta _j|+|\Delta \theta _k|}{2}\right) ^2\right) ^{8}\\ {}\le & {} \left( 1-3\left( \frac{|\theta _j - \theta _k|}{2}\right) ^2\right) ^{8} \le 1 - 3(\theta _j - \theta _k)^2 \end{aligned}$$

which gives us the same bound on the relative phases as before, so that an analogue of Lemma 2 also holds for our new basic set. The proof of Lemma 3 also holds with few modifications: in the proof above, the Y terms

$$\begin{aligned} \prod _{j< k} \Big (\alpha _j^2 + \alpha _k^2 + 2 \alpha _j \alpha _k \cos (2\pi (\theta _j - \theta _k))\Big ) \end{aligned}$$

were bounded below by the factor

$$\begin{aligned}{} & {} \prod _{j< k} \Big ((1-\epsilon \sqrt{d})^2 + (1-\epsilon \sqrt{d})^2 + 2 (1-\epsilon \sqrt{d})^2 \left( 1 - \frac{(2\pi \epsilon )^2}{2}\right) \Big )\\ {}{} & {} \quad = \left( (1-\epsilon \sqrt{d})^2\left( 4 - 4\pi ^2\epsilon ^2\right) \right) ^{(d^2-d)/2} \ge 4^{(d^2-d)/2}\left( 1-(2\epsilon \sqrt{d}+\pi ^2\epsilon ^2)\frac{d^2-d}{2}\right) . \end{aligned}$$

Here instead we have four copies of each phase constraint, but only between \(j \le d\) and \(d+1\). So the penalty from

$$\begin{aligned} \prod _{j\le d} \Big (\alpha _j^2 + \alpha _{d+1}^2 + 2 \alpha _j \alpha _{d+1} \cos (2\pi (\theta _j - \theta _{d+1}))\Big )^4 \end{aligned}$$

becomes

$$\begin{aligned}{} & {} \prod _{j\le d} \left( (1-\epsilon \sqrt{d})^2 + (1-\epsilon \sqrt{d})^2 + 2 (1-\epsilon \sqrt{d})^2 \left( 1 - \frac{(2\pi \epsilon )^2}{2}\right) \right) ^4\\ {}{} & {} \quad = \left( (1-\epsilon \sqrt{d})^2\left( 4 - 4\pi ^2\epsilon ^2\right) \right) ^{4d} \ge 4^{4d}\left( 1-(2\epsilon \sqrt{d}+\pi ^2\epsilon ^2)(4d)\right) \\{} & {} \quad \ge 4^{4d}\left( 1-(2\epsilon \sqrt{d}+\pi ^2\epsilon ^2)\frac{d^2-d}{2}\right) \end{aligned}$$

as before, as long as \(d\ge 9\). The resulting conclusion of the lemma, that the relative value \(G_V(\vec {{x}})/G_V(B_0) \ge 1 - 2\epsilon d^{5/2}\), thus still holds.

Finally, Lemma 4 remains unmodified in this setting, as the form of the clause vectors is unchanged. As all the necessary lemmas hold as before, and the proofs of Theorems 4 and 5 only care about relative values, they all hold in the real-valued PSD setting. \(\square \)

4 Quantum State Tomography

The author initially found the above construction while investigating the worst-case hardness of quantum state tomography, and the hardness implies that several problems in the context of tomography are NP-hard as well.

Quantum State Tomography (QST) is the procedure of estimating an unknown quantum state from a set of measurements on an identically prepared ensemble. The procedure can encompass both the choice of measurement bases and the estimation of the state from the measurement results; in adaptive settings, the running estimate is also used to inform future measurement choices [21, 22]. We focus on the latter task, of building an estimate of the state. We consider four related forms of what “estimation” can mean:

  1. Finding the Maximum Likelihood Estimator (MLE): the pure state \(\rho \) most likely to produce the observations.

  2. Finding the Bayesian expected state \(\rho _{Avg}\): assuming a prior over the possible pure states, finding the mixed state representing the appropriately weighted mixture of possible states.

  3. Computing the expectation value of some future observation(s).

  4. Finding the probability that the unknown state is in fact some particular \(\rho _0\). (As there are infinitely many different pure states, we are actually asking for the probability density at \(\rho _0\).)

The first three estimation problems have all been extensively studied with various heuristics. MLE can be attempted by linear inversion [23, 24], iterative search [25, 26], or even neural networks [27]. Bayesian estimation can be accomplished by direct numerical integration [28] or particle-based sampling [21], possibly with neural networks guiding the particles [22]. Directly estimating future samples has also been attempted with neural networks [29] or classical shadows [30,31,32]. The author is not aware of any prior work on computing estimation problem 4.

We can show that estimation problems 2, 3, and 4 are essentially as hard as approximating PSD permanents, and that task 1 is also NP-hard. The exponential difficulty (assuming ETH [16]) is in fact in the dimension d of the underlying Hilbert space. Many questions in quantum information appear to be “exponentially” hard, in the sense that it is hard to analyze a system of q qubits faster than \(O(2^q)\). But here \(d=2^q\), so that even when the number of qubits is a logarithmically small \(q=\log (d)\), the problem of state estimation remains exponentially hard in d.

4.1 Outline of Tomography Results

Of the four forms above, we focus first on estimation problem 4. Although it is likely the question least relevant to experiment, it is the easiest to manipulate algebraically. We call it Quantum-Bayesian-Update, or simply QBU, and define it in Sect. 4.2. In Sect. 4.3, we give an exponential-time algorithm for QBU, showing that it is at least possible. In Sect. 4.4 we show that estimation problems 2, 3, and 4 are equivalent. In Sect. 4.5 we explain QBU’s connection to HPSD permanents, and show it is NP-hard to approximate within subexponential error. In Sect. 4.6 we show how the construction of difficult PSD permanents can be modified to show that the MLE problem (estimation problem 1) is also NP-hard to approximate: it is NP-hard to find a pure state whose likelihood is within a subexponential factor of the maximum.

4.2 Quantum Bayesian Update

We define the QBU problem as follows: given a series of observations \({\mathcal {O}}_i\) each taken from a copy of \(\rho \), and a guess \(\rho _0\), what is the probability density that \(\rho = \rho _0\)? The actual probability of equality is zero—unless we have some other powerful information about the state—which is why we ask for the probability density in the space of candidate density matrices.

Bayes’ theorem lets us compute the probability density of a true state \(\rho \) in terms of the likelihood of the observations \(P({\mathcal {O}}|\rho )\), a prior belief distribution \(P(\rho )\), and the total probability of the sequence of observations \(P({\mathcal {O}})\). It reads,

$$\begin{aligned} P(\rho _0|{\mathcal {O}}) = \frac{P({\mathcal {O}}|\rho _0)P(\rho _0)}{P({\mathcal {O}})} \end{aligned}$$

In order for the equation to be meaningful and not identically zero on both sides, we read \(\rho \) as representing a small volume in the space of density matrices. While there are many natural priors on the space of density matrices, we focus on the case where we know the unknown state \(\rho \) is pure. This models, for instance, the case where we are trying to identify the output of a unitary quantum channel. The most natural prior is then the uniform distribution over all pure states, given by the Haar measure, under which all the prior weights \(P(\rho )\) are equal. The likelihood of a given observation \({\mathcal {O}}_i\) is simply \({{\,\textrm{Tr}\,}}[{\mathcal {O}}_i\rho ]\), so our goal is to compute

$$\begin{aligned} P(\rho |{\mathcal {O}}) = \frac{\prod _{i\in [n]} {{\,\textrm{Tr}\,}}[{\mathcal {O}}_i \rho ]}{P({\mathcal {O}})} \end{aligned}$$

In general, the \({\mathcal {O}}_i\) could be operators of any rank, and could belong to POVMs. For hardness, it will suffice to consider only observations with rank 1 and trace 1, but for now we allow them to be general. For any particular \(\rho \) and sequence \({\mathcal {O}}_i\), the likelihood \(\prod {{\,\textrm{Tr}\,}}[{\mathcal {O}}_i \rho ]\) can be evaluated directly in \(O(nd^2)\) operations. The difficulty then lies in the normalizing factor \(p_{norm} = P({\mathcal {O}})\), with

$$\begin{aligned} P(\rho |{\mathcal {O}}) = p_{norm}^{-1}\prod _{i\in [n]} {{\,\textrm{Tr}\,}}[{\mathcal {O}}_i \rho ] \end{aligned}$$

This indicates the probability of an entire sequence of observations. While a single observation has the simple form of \(P({\mathcal {O}}_i) = {{\,\textrm{Tr}\,}}[{\mathcal {O}}_i]\), the expression rapidly becomes more complicated as we consider sequences of observations.

A brief example is useful for understanding what \(p_{norm}\) represents. Suppose that we measure a qubit 1000 times along each of the X, Y, and Z axes: we expect to see a particular amount of bias. Observing 1000 results each of +X, +Y, and +Z would be very unlikely, as the qubit cannot be in the +1 eigenstate of all three axes at once. It would be similarly surprising to see exactly 500 counts each of \(+\)X, −X, \(+\)Y, −Y, \(+\)Z, and −Z: this state shows no tendency of a particular orientation, but a pure qubit state must show a bias towards some orientation. This would have a small value of \(p_{norm}\), as there is no good state to explain the sequence observed. A sequence of 1000 \(+\)Z observations, and 500 each of \(+\)X, −X, \(+\)Y, and −Y is much more likely, as it can be well explained by the \(\vert \uparrow \rangle \) state, and so has a larger value of \(p_{norm}\).

As we just saw, computing the probability density \(P_{density}(\rho = \rho _0|{\mathcal {O}})\) is easy if \(p_{norm}\) is known, and conversely \(p_{norm}\) can be easily computed from the probability density. \(p_{norm}\) is a more attractive goal for our problem, as it doesn’t depend on \(\rho _0\). It can be computed by summing up all unnormalized probabilities:

$$\begin{aligned} p_{norm} = \int _{\vec {x} \in {\mathbb {C}}^d_{1}} \prod _{i\in [n]} {{\,\textrm{Tr}\,}}[{\mathcal {O}}_i xx^\dagger ]\,dx \end{aligned}$$

where the integral is over the Hilbert space \({\mathbb {C}}^d\) restricted to length-1 vectors. This leads to the definition,

Definition 3

(Quantum-Bayesian-Update) Given a collection of observations \({\mathcal {O}} = ({\mathcal {O}}_1,\dots {\mathcal {O}}_n)\) in a Hilbert space of dimension d, compute

$$\begin{aligned} p_{norm} = \frac{\int _{\vec {x} \in {\mathbb {C}}^d_{1}} \prod _{i\in [n]} {{\,\textrm{Tr}\,}}[{\mathcal {O}}_i xx^\dagger ]\,dx}{\prod _{i\in [n]}{{\,\textrm{Tr}\,}}[{\mathcal {O}}_i]} \end{aligned}$$
(37)

4.3 Polynomial Time QBU for Fixed d

The space of state vectors \({\mathbb {C}}_1^d\) has the geometry of a real \((2d-1)\)-sphere, and the entries of \(\rho \) are quadratic in the Cartesian coordinates of this sphere. Thus, \(p_{norm}\) becomes an integral over a \((2d-1)\)-sphere of a homogeneous polynomial of degree 2n in the 2d real variables. Expanding the polynomial into monomials takes \(O((2n)^{2d})\) time, and each monomial can then be immediately integrated over the sphere using the formula [19]

$$\begin{aligned} \int _{S^{k-1}} x_1^{\alpha _1}x_2^{\alpha _2}\dots x_k^{\alpha _k} = {\left\{ \begin{array}{ll} 0 &{} \text { if any } \alpha _i \hbox { are odd }\\ \frac{2\prod _i\Gamma (\frac{1}{2}(\alpha _i+1))}{\Gamma (\sum _i \frac{1}{2}(\alpha _i+1))} &{} \text { if all } \alpha _i \hbox { are even }\\ \end{array}\right. } \end{aligned}$$
(38)

where \(\Gamma \) is the gamma function; for even \(\alpha \), \(\Gamma (\frac{1}{2}(\alpha +1)) = \sqrt{\pi /2^\alpha }\,(\alpha -1)!!\). This gives a polynomial time algorithm for evaluating \(p_{norm}\) when d is fixed. It is functionally equivalent to the \(O(n^d)\) algorithm for permanents described in [33]: Barvinok describes the more general form that applies to any matrix (not just PSD) of rank d. This fact puts the (exact) permanent calculation in the class XP, or slicewise polynomial time, when parameterized by rank.
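As a sanity check on the formula in Eq. (38), here is a direct transcription into a short routine; the function name is ours, and the normalization is that of the unnormalized surface measure.

```python
import math

def sphere_monomial_integral(alphas):
    """Integral of x_1^a1 * ... * x_k^ak over the unit sphere in R^k,
    per Eq. (38): zero whenever any exponent is odd."""
    if any(a % 2 for a in alphas):
        return 0.0
    betas = [(a + 1) / 2 for a in alphas]
    return 2 * math.prod(math.gamma(b) for b in betas) / math.gamma(sum(betas))

# The constant monomial recovers the surface area: k = 2 variables
# (the circle) gives 2*pi, and k = 3 gives 4*pi.
assert abs(sphere_monomial_integral([0, 0]) - 2 * math.pi) < 1e-9
assert abs(sphere_monomial_integral([0, 0, 0]) - 4 * math.pi) < 1e-9
```

The \(O((2n)^{2d})\) cost in the algorithm above comes entirely from expanding the degree-2n polynomial into monomials; each monomial is then integrated in constant time by this formula.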

4.4 Relationship Between Estimation Problems

Since QBU is not itself of direct interest for practical tomography tasks, we show it is equivalent (under polynomial-time reductions) to the more realistic tasks 2 and 3 above, of estimating observables or the state itself: these are just as difficult (or just as easy) as the Bayesian update step.

4.4.1 Computing \(\rho _{Avg}\)

Given that there will always be room for uncertainty, we cannot meaningfully ask for a single pure state as an answer, but we can ask for \(\rho _{Avg}\): the mixed state representing the correctly updated mixture over all possible true states, given by \(\int P(\rho ) \rho \,d\rho \). The (generally mixed) \(\rho _{Avg}\) reflects the expectation of all observables given our current information. We parameterize the space of pure states by a single vector \(\psi \in S^{2d-1}\); given some completed observations \({\mathcal {O}}\), the Bayesian expected state is

$$\begin{aligned} \rho _{Avg}&= \int _{\psi \in S^{2d-1}} P\Big (\vert \psi \rangle \langle \psi \vert \,\Big |\,{\mathcal {O}}\Big )\, \vert \psi \rangle \langle \psi \vert \,d\psi \\&= \int _{\psi \in S^{2d-1}} p_{norm}^{-1}\, \vert \psi \rangle \langle \psi \vert \prod _{O\in {\mathcal {O}}} {\langle \psi |O|\psi \rangle } \,d\psi \end{aligned}$$

whose individual matrix elements are

$$\begin{aligned} {\langle i|\rho _{Avg}|j\rangle } = p_{norm}^{-1}\int _{\psi \in S^{2d-1}} {\langle i|\psi \rangle }{\langle \psi |j\rangle }\prod _{O\in {\mathcal {O}}} {\langle \psi |O|\psi \rangle } \,d\psi \end{aligned}$$

We have already discussed computing \(p_{norm}\) as a spherical integral of a polynomial. For any given i and j, the remaining integral is also a spherical integral of a polynomial, and can be computed in the same fashion as in Sect. 4.3. Since each of the \(d^2\) matrix elements can be computed in \(O((2n)^{2d})\) time, the Bayesian average state \(\rho _{Avg}\) is also computable in polynomial time for fixed d.
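As an illustration, the whole of \(\rho _{Avg}\) can also be estimated at once by Monte Carlo rather than by the exact polynomial expansion above. This sketch reuses the hypothetical `haar_random_state` helper from earlier; note that the \(p_{norm}\) normalization cancels in the ratio.

```python
import numpy as np

def rho_avg_mc(observations, d, n_samples=100_000, seed=1):
    """Estimate rho_Avg as a ratio of Haar averages:
    E[ |psi><psi| * prod_i <psi|O_i|psi> ] / E[ prod_i <psi|O_i|psi> ].
    Any overall normalization of the spherical measure cancels."""
    rng = np.random.default_rng(seed)
    num = np.zeros((d, d), dtype=complex)
    den = 0.0
    for _ in range(n_samples):
        psi = haar_random_state(d, rng)
        L = 1.0
        for O in observations:
            L *= np.vdot(psi, O @ psi).real
        num += L * np.outer(psi, np.conj(psi))
        den += L
    return num / den  # Hermitian, PSD, unit trace (up to sampling error)
```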

On the other hand, a diagonal element \({\langle i|\rho _{Avg}|i\rangle }\) gives

$$\begin{aligned} {\langle i|\psi \rangle }{\langle \psi |i\rangle }\prod _{O\in {\mathcal {O}}} {\langle \psi |O|\psi \rangle } = \langle \psi \vert \Big (\vert i\rangle \langle i\vert \Big )\vert \psi \rangle \prod _{O\in {\mathcal {O}}} {\langle \psi |O|\psi \rangle } = \prod _{O\in ({\mathcal {O}} \cup \{\vert i\rangle \langle i\vert \})} {\langle \psi |O|\psi \rangle } \end{aligned}$$

which is the same integrand as for \(p_{norm}\), only with one additional observation \(\vert i\rangle \langle i\vert \) added.

If we had an algorithm to compute \(\rho _{Avg}\) efficiently, we could use it to solve the Bayesian update problem on a set of observations \({\mathcal {O}}\), as follows. First, set aside the last observation \(O_{last}\), compute \(\rho _{Avg}\) on the other \(n-1\) observations, and compute \(p_{norm,n-1}\) for those \(n-1\) as well, by calling the procedure recursively on one fewer observation. Then the desired answer is

$$\begin{aligned} p_{norm} = p_{norm,n-1} {{\,\textrm{Tr}\,}}[O_{last} \rho _{Avg}] \end{aligned}$$

That is to say, the probability of all n observations is the probability of the first \(n-1\) observations multiplied by the probability of the last observation (conditioned on the first \(n-1\)). This recursive approach makes n calls to the computation of \(\rho _{Avg}\), which shows that the QBU problem is at most n times as hard as state estimation.
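The reduction can be stated in a few lines. In this sketch, `rho_avg_oracle` is a hypothetical subroutine returning the exact Bayesian average state for a list of observations, and the base case depends on the chosen normalization of \(p_{norm}\), which we take to be 1 here.

```python
import numpy as np

def p_norm_via_rho_avg(observations, rho_avg_oracle):
    """Sect. 4.4.1 reduction: p_norm(O_1..O_n) =
    p_norm(O_1..O_{n-1}) * Tr[O_n * rho_Avg(O_1..O_{n-1})].
    Makes n calls to the (hypothetical) rho_avg_oracle."""
    if not observations:
        return 1.0  # empty record; value depends on normalization convention
    rest, last = observations[:-1], observations[-1]
    rho = rho_avg_oracle(rest)
    return p_norm_via_rho_avg(rest, rho_avg_oracle) * np.trace(last @ rho).real
```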

4.4.2 Computing Observable Expectations

We could instead try to find only the expectation of a particular observable A, conditioned on our observations, rather than the whole state \(\rho _{Avg}\). We can write this as \(E[A|{\mathcal {O}}]\). This is just as hard: density matrices form a \((d^2-1)\)-dimensional affine space, and expectations of observables are linear in \(\rho \), so by computing the exact expectation of \(d^2-1\) independent observables, we can find \(\rho _{Avg}\) exactly by solving a linear system. This is of course precisely the idea behind least-squares quantum state estimation, and it shows that computing expectation values is as hard as computing \(\rho _{Avg}\).

Finally, if we could compute a Bayesian update, we could compute the expectation values of observables. Take the eigendecomposition of our operator A: write \(A = \sum \lambda _i \vert i\rangle \langle i\vert \), and evaluate

$$\begin{aligned} E[A|{\mathcal {O}}]&= \sum \lambda _i\, E\big [{\langle i|\psi \rangle }{\langle \psi |i\rangle }\,\big |\,{\mathcal {O}}\big ] = \sum \lambda _i\, p_{norm}^{-1} \int _{\psi \in S^{2d-1}} {\langle i|\psi \rangle }{\langle \psi |i\rangle }\prod _{O\in {\mathcal {O}}} {\langle \psi |O|\psi \rangle } \,d\psi \\&= \sum \lambda _i\, p_{norm}^{-1} \int _{\psi \in S^{2d-1}} \prod _{O\in ({\mathcal {O}} \cup \{\vert i\rangle \langle i\vert \})} {\langle \psi |O|\psi \rangle } \,d\psi \end{aligned}$$

Computing \(p_{norm}\) and each of the d spherical integrals is a Bayesian update problem. We have reductions (Bayesian update) \(\rightarrow \) (Compute \(\rho _{Avg}\)) \(\rightarrow \) (Compute \(E[A|{\mathcal {O}}]\)) \(\rightarrow \) (Bayesian update), so these are equivalent in difficulty. Note that not all of these are many-one reductions, which is unavoidable: \(\rho _{Avg}\) is a matrix-valued function problem while the other two are scalar-valued.
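The last leg of this cycle is equally short. In the sketch below, `p_norm_oracle` is a hypothetical QBU subroutine; since the appended projector \(\vert i\rangle \langle i\vert \) has rank 1 and trace 1, it leaves the denominator of Eq. (37) unchanged, so the ratio directly matches the expression above.

```python
import numpy as np

def expectation_via_qbu(A, observations, p_norm_oracle):
    """Sect. 4.4.2 reduction: E[A|O] from d + 1 calls to a QBU oracle,
    one per eigenvector of A plus one for the bare normalization."""
    eigvals, eigvecs = np.linalg.eigh(A)
    base = p_norm_oracle(observations)
    return sum(
        lam * p_norm_oracle(observations + [np.outer(v, np.conj(v))]) / base
        for lam, v in zip(eigvals, eigvecs.T)
    )
```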

4.5 NP-Hardness of QBU and \(\rho _{Avg}\)

We now state the main hardness results on quantum tomography.

Theorem 7

For any \(C < 1\), it is NP-hard to compute the value \(p_{norm}\) for Quantum-Bayesian-Update with an approximation factor of at most \(2^{n^C}\).

Proof

When the \({\mathcal {O}}_i\) are all rank-1 operators, the numerator in Eq. (37) is of the form in Theorem 3, and the denominator in Eq. (37) can be efficiently computed by direct calculation. Thus any PSD permanent can be efficiently reduced to a problem of computing \(p_{norm}\) with an approximation-preserving reduction, and QBU is NP-hard to approximate to the same degree. \(\square \)

Theorem 8

For any \(C < 1\), it is NP-hard to compute a diagonal matrix entry of \(\rho _{Avg}\), in any basis, with an approximation factor of at most \(2^{n^C}\). It is also NP-hard to compute the expectation of a positive semidefinite operator \({\mathcal {O}}\) with an approximation factor of at most \(2^{n^C}\).

Proof

A diagonal element of \(\rho _{Avg}\) is the expectation value of the rank-1 PSD operator projecting onto that basis element, so the first statement is a special case of the second. As described above, both of these quantities take the form of a PSD permanent, and any PSD permanent can be turned into these problems by taking the desired matrix element (in the first case) or observable \({\mathcal {O}}\) (in the second case) to be \(V_1^\dagger V_1\), built from the first vector. These are also approximation-preserving reductions, so both problems are NP-hard to approximate. \(\square \)

4.6 NP-Completeness of Maximum Likelihood Estimation

In the case of MLE state tomography, we are not so demanding as to require knowledge of the full average state; we are content with finding just one good explanatory state \(\vert \psi \rangle \). Accordingly, we do not consider a permanent \(\int _x G_V(x)\) (a problem of counting solutions to 3-SAT), but just the question of maximizing \(G_V(x)\) (a problem of finding a solution to 3-SAT). This allows us to show that the problem actually lies in NP, while this is unlikely to be true for the other problems in this paper unless \(\textsf {BPP}^\textsf {NP} = \textsf {NP}\).

Formulating the MLE problem as a decision problem:

Definition 4

(C-Approximate-Quantum-MLE) Given a collection of observations \({\mathcal {O}}_i\) of an unknown quantum state \(\vert \psi \rangle \), and a real number p, decide whether there is a \(\vert \psi \rangle \) whose likelihood \(L(\psi ) = \prod _i {\langle \psi |{\mathcal {O}}_i|\psi \rangle }\) is at least p, or if \(L(\psi ) < p/C\) for all \(\psi \), being promised that one of these is the case.
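The verifier’s task is simply to evaluate \(L(\psi )\) from the supplied witness; the subtlety, addressed in the proof below, lies entirely in the required precision. A minimal sketch of the evaluation (in exact-arithmetic spirit, ignoring the precision analysis):

```python
import numpy as np

def likelihood(psi, observations):
    """L(psi) = prod_i <psi|O_i|psi> for a candidate witness psi.
    The NP verifier must additionally check <psi|psi> ~ 1,
    per the proof below."""
    L = 1.0
    for O in observations:
        L *= np.vdot(psi, O @ psi).real
    return L
```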

We will show that even the approximate problem is NP-complete, for any C.

Theorem 9

For any \(C > 1\), the C-Approximate-Quantum-MLE problem is NP-complete.

Proof

Containment in NP is straightforward: one can supply a description of the state \(\vert \psi \rangle \), which requires only 2d real numbers, and then \(L(\psi )\) can be directly evaluated. For containment in NP, we must also show that only poly(n) bits of precision are needed, or equivalently, a \(1+2^{-poly(n)}\) approximation ratio to the exact value. Every factor of \(L(\psi )\), a single \({\langle \psi |{\mathcal {O}}_i|\psi \rangle }\), is a sum of products of numbers from the witness \(\psi \) and the given \({\mathcal {O}}_i\). Each individual product is accurate to a \(1+2^{-poly(n)}\) ratio, but when we add them there may be catastrophic cancellation: if \({\langle \psi |{\mathcal {O}}_i|\psi \rangle }\) is very small, we get a large multiplicative error. But we know such a \(\vert \psi \rangle \) cannot be optimal: it would mean that \(\vert \psi \rangle \) is exponentially close to the kernel of \({\mathcal {O}}_i\) (within an angle of \(2^{-poly(n)}\)). We can instead choose another point \(\vert \psi \rangle \) that is not exponentially close to the kernel of any \({\mathcal {O}}_i\), and get at least as good a score. Such a point is guaranteed to exist, because n hyperplanes can only divide the d-sphere into \(2^{poly(n,d)}\) many sections; by taking a larger polynomial in our precision, we avoid catastrophic cancellation and compute each \({\langle \psi |{\mathcal {O}}_i|\psi \rangle }\) to within a \(1+2^{-poly(n)}\) ratio. Finally, the multiplicative accuracies of the factors \({\langle \psi |{\mathcal {O}}_i|\psi \rangle }\) multiply together, so we can guarantee the constant approximation ratio C necessary to distinguish the two sides of the promise.

We also need to check that the witness is normalized, \({\langle \psi |\psi \rangle }\approx 1\). As long as this is accurate to within \(1+2^{-n}\), rescaling \(\vert \psi \rangle \) by a factor f just scales \(L(\psi )\) by \(f^{2n}\), and so also preserves the good multiplicative approximation. This shows containment in NP.

To show hardness, we use the same NAE3SAT construction as in Theorem 4. As was shown in the proof of that theorem, any good point (thus, a solution to the underlying NAE3SAT problem) has

$$\begin{aligned} L(\psi ) = G_0(x) \ge p\left( 1 - \frac{\ln ^2 d}{\sqrt{d}}\right) . \end{aligned}$$

We also show in that proof that, if there are no good points (and thus no solutions) then

$$\begin{aligned} L(\psi ) = G_0(x) \le p/d^{d^2} \end{aligned}$$

for all points. Thus, the existence of a point with high likelihood, even approximated to within any factor \(C < d^{d^2}\), implies the existence of a solution. \(\square \)

4.7 Practical Difficulty of Tomography

Although the above results imply that several approaches to quantum state tomography may be difficult to compute exactly, these difficult instances are somewhat artificial and unlikely to occur in practice. Difficult instances such as the ones constructed in the above proofs could be readily addressed in practice by the addition of measurements in, e.g., the X measurement basis, which would directly probe the relative signs in the state vector and allow relatively efficient readout of the state. Additionally, the constraint that we only search for pure states (a useful prior that could be relevant once high-fidelity quantum computers exist) makes the search space highly nonconvex. If we relax this and take a prior with uniform measure over the space of density matrices, then the resulting likelihood function is logarithmically concave and the resulting MLE problem can be solved in time polynomial in d. Thus, these results should not be taken as a statement that quantum state tomography is actually exponentially hard in the Hilbert space dimension d. Rather, any analysis of quantum state tomography procedures will need at least one of: careful choice of measurement basis, only probabilistic guarantees on convergence, or (if doing MLE) a convex prior.
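For the relaxed, mixed-state problem, the log-likelihood \(\sum _i \log {{\,\textrm{Tr}\,}}[{\mathcal {O}}_i\rho ]\) is concave over the convex set of density matrices, so off-the-shelf convex solvers apply. Below is a minimal sketch using cvxpy (our choice of tool, not one used in this paper; it assumes a solver supporting the exponential cone, such as SCS):

```python
import cvxpy as cp

def mle_mixed_state(observations, d):
    """Maximize sum_i log Tr[O_i rho] over density matrices rho:
    a log-concave objective over the convex PSD, trace-one set."""
    rho = cp.Variable((d, d), hermitian=True)
    objective = cp.Maximize(
        sum(cp.log(cp.real(cp.trace(O @ rho))) for O in observations))
    constraints = [rho >> 0, cp.trace(rho) == 1]
    cp.Problem(objective, constraints).solve(solver=cp.SCS)
    return rho.value
```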

This work has little implication for the learnability of quantum states, which asks how many (or what kind of) samples we need in order to constrain a state. The optimal sample complexity of states is already known [34, 35] to scale as O(d) for pure states and \(O(d^2)\) for mixed states. It is also known that simple reconstruction algorithms, given this quantity of data, converge with high probability. Our work, in contrast, shows that the optimal answer given a fixed set of data is hard to compute in general.