Complex phase retrieval from subgaussian measurements

Phase retrieval refers to the problem of reconstructing an unknown vector $x_0 \in \mathbb{C}^n$ or $x_0 \in \mathbb{R}^n$ from $m$ measurements of the form $y_i = \big\vert \langle \xi^{\left(i\right)}, x_0 \rangle \big\vert^2$, where $\left\{ \xi^{\left(i\right)} \right\}^m_{i=1} \subset \mathbb{C}^n$ are known measurement vectors. While Gaussian measurements allow for recovery of arbitrary signals provided the number of measurements scales at least linearly in the number of dimensions, it has been shown that ambiguities may arise for certain other classes of measurements $\left\{ \xi^{\left(i\right)} \right\}^{m}_{i=1}$, such as Bernoulli measurements or Fourier measurements. In this paper, we prove that even when a subgaussian vector $\xi^{\left(i\right)} \in \mathbb{C}^n$ does not fulfill a small-ball probability assumption, the PhaseLift method is still able to reconstruct a large class of signals $x_0 \in \mathbb{R}^n$ from the measurements. This extends recent work by Krahmer and Liu from the real-valued to the complex-valued case. However, our proof strategy is quite different, and we expect some of the new proof ideas to be useful in several other measurement scenarios as well. We then extend our results to $x_0 \in \mathbb{C}^n$ up to an additional assumption which, as we show, is necessary.


Introduction
Phase retrieval refers to the problem of reconstructing an unknown vector $x_0 \in \mathbb{C}^n$ from $m$ measurements of the form
$$ y_i = \big\vert \langle \xi^{(i)}, x_0 \rangle \big\vert^2 + w_i, \qquad i \in [m], \qquad (1) $$
where the $\xi^{(i)} \in \mathbb{C}^n$ are known measurement vectors and $w_i \in \mathbb{R}$ represents additive noise. Such problems are ubiquitous in many areas of science and engineering such as X-ray crystallography [22,29], astronomical imaging [17], ptychography [31], and quantum tomography [27]. The foundational papers [7,13,4] proposed to reconstruct $x_0$ via the PhaseLift method, a convex relaxation of the original problem. These papers have triggered many follow-up works, since they were the first to establish rigorous recovery guarantees under the assumption that the measurement vectors $\xi^{(i)}$ are sampled uniformly at random from the sphere. Since then, several papers have analyzed scenarios where the measurement vectors possess a significantly reduced amount of randomness, in particular spherical designs [20] and coded diffraction patterns [5,21]. However, the theoretical results for coded diffraction patterns rely on the assumption that the modulus of the illumination patterns is varying. Indeed, it was shown in [16] that for certain illumination patterns with constant modulus ambiguities can arise, i.e., it is not possible to determine $x_0$ uniquely from the measurements $y_i$. In fact, such ambiguities can already arise in much simpler settings, where the measurement vectors $\xi^{(i)}$ are i.i.d. subgaussian. For example, consider the case that the entries $\xi^{(i)}_j$ are i.i.d. Rademacher random variables, i.e., they take the values $+1$ and $-1$ with probability $\tfrac{1}{2}$ each. In this case the vector $x_0 := e_1 = (1, 0, \ldots, 0)$ can never be distinguished from the vector $\tilde{x}_0 := e_2 = (0, 1, 0, \ldots, 0)$, as both yield $y_i = 1$ for every realization of $\xi^{(i)}$. Note that in this scenario $\big\langle \xi^{(i)}, \tfrac{1}{\sqrt{2}}(e_1 - e_2) \big\rangle = 0$ holds with probability $\tfrac{1}{2}$ and, hence, the vector $\xi^{(i)}$ does not fulfill a small-ball probability assumption, which means that there is no constant $c > 0$ such that for all $\varepsilon > 0$ and for all vectors $x$ it holds that $\mathbb{P}\big( \vert \langle x, \xi^{(i)} \rangle \vert \le \varepsilon \Vert x \Vert \big) \le c \varepsilon$.
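The Rademacher ambiguity described above can be checked directly in a few lines; the following sketch (our own illustration, not part of the paper) confirms that $e_1$ and $e_2$ produce identical measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 50

# Rademacher measurement vectors: entries are +1 or -1 with probability 1/2.
xi = rng.choice([-1.0, 1.0], size=(m, n))

e1 = np.zeros(n); e1[0] = 1.0
e2 = np.zeros(n); e2[1] = 1.0

# Phaseless measurements y_i = |<xi^(i), x>|^2.
y1 = np.abs(xi @ e1) ** 2
y2 = np.abs(xi @ e2) ** 2

# Both signals yield y_i = 1 for every measurement, so they cannot
# be told apart from the observations.
assert np.allclose(y1, 1.0) and np.allclose(y2, 1.0)
```
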
When the signals are complex, additional classes of ambiguities can arise. For example, when the measurement vectors $\xi^{(i)}$ are real, any signal $x$ and its complex conjugate $\overline{x}$ will result in identical observations.
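This conjugation ambiguity is also easy to verify numerically; the sketch below (our own illustration, not from the paper) uses real Gaussian measurement vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 40

# Real measurement vectors.
xi = rng.standard_normal((m, n))

# A complex signal and its entrywise complex conjugate.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# For real xi^(i): |<xi^(i), conj(x)>| = |conj(<xi^(i), x>)| = |<xi^(i), x>|.
y = np.abs(xi @ x) ** 2
y_conj = np.abs(xi @ np.conj(x)) ** 2
assert np.allclose(y, y_conj)
```
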
For these reasons, previous works on phase retrieval from subgaussian measurements (see, e.g., [11]) work with real signals and require that all entries of the vector $\xi^{(i)}$ fulfill a moment condition of the type (2) for all $j \in [n]$, or make even stronger assumptions. The only exception is [25], which shows for the real-valued case ($x_0 \in \mathbb{R}^n$ and $\xi^{(i)} \in \mathbb{R}^n$) that PhaseLift recovers a large class of signals from subgaussian measurements even if estimates of the type (2) are not satisfied. More precisely, one obtains that all signals $x_0$ with $\Vert x_0 \Vert_\infty \le \mu \Vert x_0 \Vert$, for some absolute constant $\mu > 0$, can be recovered with high probability as long as $m \gtrsim n$. However, as the approach in [25] is intrinsically based on arguments in [15], it cannot be generalized to the complex case in a straightforward manner. This paper provides an analysis both for real-valued and complex-valued signals. We believe that this understanding will be of importance for the subsequent study of structured scenarios such as coded diffraction patterns, which are also intrinsically complex in nature.
While the proofs in previous papers [5,20,21,25] relied on the construction of a so-called dual certificate, our paper employs a more geometric approach based on Mendelson's small ball method [24,28]. This is motivated by recent work [27,23,26], which showed that a geometric analysis based on the descent cone of the trace norm can often yield additional insights compared to an approach based on dual certificates. We think that some of the techniques developed in this paper will also be useful for the analysis of other interesting measurement scenarios, such as the case of heavy-tailed measurement vectors $\xi^{(i)}$ or the case that $\xi^{(i)}$ has only entries $0$ and $1$.

Notation
$S^n$ denotes the real vector space of all Hermitian matrices in $\mathbb{C}^{n \times n}$. By $S^n_+ \subset S^n$ we denote the set of all positive semidefinite Hermitian matrices. For $A, B \in S^n$ the Hilbert-Schmidt inner product is defined by $\langle A, B \rangle_{HS} := \mathrm{Tr}(A^* B)$. The corresponding norm will be denoted by $\Vert \cdot \Vert_{HS}$. For a matrix $Z \in S^n$ we denote its eigenvalues by $\lambda_1(Z), \lambda_2(Z), \ldots, \lambda_n(Z)$, which are assumed to be arranged in decreasing order, i.e., $\lambda_1(Z) \ge \lambda_2(Z) \ge \ldots \ge \lambda_n(Z)$. If no confusion can arise, we suppress the dependence on $Z$ and write $\lambda_i$ instead of $\lambda_i(Z)$. By $\Vert Z \Vert_1$ we denote the Schatten-1 norm of $Z$, i.e., $\Vert Z \Vert_1 := \sum_{i=1}^n \vert \lambda_i(Z) \vert$. By $\mathrm{diag}(Z) \in S^n$ we denote the matrix obtained by setting all off-diagonal entries of $Z$ equal to zero. We write $a \lesssim b$ or $b \gtrsim a$ if there is a universal constant $C > 0$ such that $a \le C b$.
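For readers who prefer to experiment, the main objects of this notation translate to numpy as follows (an illustrative sketch, not part of the paper):

```python
import numpy as np

# A small Hermitian matrix Z in S^n.
Z = np.array([[2.0, 1 + 1j], [1 - 1j, -1.0]])
assert np.allclose(Z, Z.conj().T)

# Hilbert-Schmidt inner product <A, B>_HS = Tr(A* B); its norm is the
# Frobenius norm.
hs = lambda A, B: np.trace(A.conj().T @ B)
assert np.isclose(hs(Z, Z).real, np.linalg.norm(Z, 'fro') ** 2)

# Eigenvalues arranged in decreasing order; the Schatten-1 norm is the
# sum of their absolute values (= nuclear norm for Hermitian matrices).
lam = np.sort(np.linalg.eigvalsh(Z))[::-1]
schatten1 = np.abs(lam).sum()
assert np.isclose(schatten1, np.linalg.norm(Z, 'nuc'))

# diag(Z): set all off-diagonal entries of Z equal to zero.
diag_Z = np.diag(np.diag(Z))
```
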

PhaseLift
The PhaseLift method was first introduced in [7]. In this paper we focus on a variant [4,13] based on the observation that the measurements $y_i$ can be rewritten in the form
$$ y_i = \big\langle \xi^{(i)} (\xi^{(i)})^*, X_0 \big\rangle_{HS} + w_i, $$
where $X_0 = x_0 x_0^*$ is a rank-1 matrix encoding the signal to be recovered up to the true inherent phase ambiguity. From this observation, PhaseLift relaxes the constraint that $X_0$ is of rank 1 to obtain the convex problem
$$ \text{minimize } \sum_{i=1}^m \big\vert y_i - \big\langle \xi^{(i)} (\xi^{(i)})^*, X \big\rangle_{HS} \big\vert \quad \text{subject to } X \in S^n_+. \qquad (3) $$
In order to simplify notation we introduce the linear operator $\mathcal{A} : S^n \to \mathbb{R}^m$ as
$$ \mathcal{A}(X) := \big( \langle \xi^{(i)} (\xi^{(i)})^*, X \rangle_{HS} \big)_{i=1}^m. \qquad (4) $$
Hence, setting $y := (y_1, \ldots, y_m) \in \mathbb{R}^m$, (3) can be rewritten as
$$ \text{minimize } \Vert \mathcal{A}(X) - y \Vert_{\ell_1} \quad \text{subject to } X \in S^n_+. \qquad (5) $$
We note that while understanding the relaxation (5) is an important benchmark and (5) can be solved in polynomial time, it is typically not practical for applications, as lifting increases the number of optimization variables. For this reason, a very active line of research studies recovery guarantees for algorithms that operate in the natural parameter domain, such as alternating minimization (see, e.g., [30,39]), gradient-descent based formulations (see, e.g., [6,9,34,35,10]), and anchored regression [2,1,3,19]. However, most of these guarantees have been shown under the assumption that the measurement vectors $\{\xi^{(i)}\}_{i=1}^m$ are sampled i.i.d. from the unit sphere, so it will be a natural follow-up of this work to study to which extent our results generalize to the more practical nonconvex algorithms.
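The key identity behind this lifting, $\vert \langle \xi, x_0 \rangle \vert^2 = \langle \xi \xi^*, x_0 x_0^* \rangle_{HS}$, can be checked numerically; the following sketch is our own illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
xi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Lifted signal X0 = x0 x0^* (rank-1, Hermitian, positive semidefinite).
X0 = np.outer(x0, x0.conj())

# The phaseless measurement is linear in X0:
# |<xi, x0>|^2 = Tr(xi xi^* X0) = <xi xi^*, X0>_HS.
y_direct = np.abs(np.vdot(xi, x0)) ** 2
y_lifted = np.trace(np.outer(xi, xi.conj()) @ X0).real
assert np.isclose(y_direct, y_lifted)
```
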

Subgaussian measurements
We consider random measurement vectors $\{\xi^{(i)}\}_{i=1}^m$ given as independent copies of a random vector $\xi$, whose entries $\xi_j$ are assumed to be i.i.d. subgaussian random variables with parameter $K$, expectation $\mathbb{E}[\xi_j] = 0$, and variance $\mathbb{E}\vert\xi_j\vert^2 = 1$. Recall that a random variable $X$ is subgaussian with parameter $K$ if and only if
$$ \Vert X \Vert_{\psi_2} := \inf\big\{ t > 0 : \mathbb{E}\exp\big( \vert X \vert^2 / t^2 \big) \le 2 \big\} \le K < +\infty. $$
It is well known (see, e.g., [38]) that this definition implies $\Vert X \Vert_{L_p} \lesssim \sqrt{p}\, K$ for every $p \ge 1$.
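As a concrete instance of this definition (our own worked example): for a Rademacher variable $X \in \{-1, +1\}$ one has $\mathbb{E}\exp(X^2/t^2) = \exp(1/t^2)$, so the subgaussian norm is the smallest $t$ with $\exp(1/t^2) \le 2$, i.e. $\Vert X \Vert_{\psi_2} = 1/\sqrt{\ln 2}$:

```python
import math

# For Rademacher X, E exp(X^2 / t^2) = exp(1 / t^2), so the psi_2 norm
# is the smallest t with exp(1 / t^2) <= 2, namely t = 1 / sqrt(ln 2).
psi2 = 1.0 / math.sqrt(math.log(2.0))
assert math.isclose(math.exp(1.0 / psi2**2), 2.0)
print(round(psi2, 3))  # 1.201
```
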

Previous work
A number of previous works have studied phase retrieval with subgaussian measurements in the real-valued setting, i.e., $x_0 \in \mathbb{R}^n$ and $\xi \in \mathbb{R}^n$. For measurements fulfilling $\mathbb{E}\,\xi_j^4 > \big( \mathbb{E}\,\xi_j^2 \big)^2$, [11] showed that PhaseLift admits order-optimal uniform recovery guarantees. (In [11], instead of (5), the original PhaseLift approach as in [7] is analysed.) Without this assumption, in [25] the following result was proven, again for the real-valued case.

Theorem 1 ([25]). Let $\xi = (\xi_1, \ldots, \xi_n) \in \mathbb{R}^n$ be a random vector with i.i.d. subgaussian entries. Then there exist constants $C_1$, $C_2$, $C_3$, and $0 < \mu < 1$, which depend only on the distribution of $\xi_1$, such that whenever $m \ge C_1 n$, the following statement holds with probability at least $1 - \exp(-C_2 n)$: For all signals $x_0 \in \mathbb{R}^n$ with $\Vert x_0 \Vert_\infty \le \mu \Vert x_0 \Vert$ and all noise vectors $w \in \mathbb{R}^m$ any minimizer $\hat{X}$ of (5) fulfills
$$ \big\Vert \hat{X} - x_0 x_0^* \big\Vert_1 \le C_3 \frac{\Vert w \Vert_{\ell_1}}{m}. $$

Complex signals and complex measurement vectors
While arguably the scenario covered by Theorem 3, where $x_0$ is real-valued, is of most interest for applications, we find it interesting from a mathematical point of view to understand under which assumptions recovery is possible for complex-valued signals. Our first result deals with this case.
As we have explained in Section 1, there are subgaussian distributions for which we cannot achieve uniform recovery of all signals $x_0 \in \mathbb{C}^n$. For this reason we define for all $0 < \mu \le 1$ the set
$$ X_\mu := \big\{ x \in \mathbb{C}^n : \Vert x \Vert_\infty \le \mu \Vert x \Vert \big\}. $$
Now we are prepared to state the following theorem, which is our first main result.

Theorem 2. Let the observation vector $y$ be given as in (1), where the random measurement vectors $\{\xi^{(i)}\}_{i=1}^m$ are as defined in Section 2.3. Assume that $\vert \mathbb{E}\,\xi_1^2 \vert^2 \le 1 - \beta$ for some $\beta \in (0,1)$ and consider an additional parameter $\mu \in (0,1]$. Furthermore, assume that one of the following two conditions is satisfied:

1. It holds that $\mu < \frac{1}{81}$.
2. It holds that $\mathbb{E}\vert\xi_1\vert^4 \ge 1 + \beta$. In this case we set $\mu = 1$.

Moreover, assume that $m \ge C_1 \frac{K^{16}}{\beta^4}\, n$. Then with probability at least $1 - \mathcal{O}\big( \exp\big( -\frac{m \beta^4}{C_2 K^{16}} \big) \big)$ the following statement holds: For all vectors $x_0 \in X_\mu$ and any noise vector $w \in \mathbb{R}^m$ any solution $\hat{X}$ of (5) satisfies
$$ \big\Vert \hat{X} - x_0 x_0^* \big\Vert_1 \le C_3 \frac{\Vert w \Vert_{\ell_1}}{m}. \qquad (9) $$
Here $C_1$, $C_2$, and $C_3$ are universal constants.
The first case of Theorem 2, where one makes no assumption on the fourth moment of $\xi_1$, can be applied also to certain scenarios where unique recovery is not possible without this assumption. One important example is that the entries $\xi_j$ are drawn from $\{ z \in \mathbb{C} : \vert z \vert = 1 \}$ uniformly at random. Note that these measurements will always yield the same observations $y$ for the two 1-sparse signals $x_0 = e_1$ and $\tilde{x}_0 = e_2$. Such very sparse signals are exactly prevented by Condition 1, so there is no contradiction to the theorem's conclusion that unique recovery can be achieved via (5) for all $x_0 \in X_\mu$. Note that in the second scenario, where assumptions on the fourth moment of $\xi_1$ are available, we obtain a uniform recovery result over all $x_0 \in \mathbb{C}^n$. In the real-valued case a similar result has been shown in [11].
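For measurement vectors with entries on the complex unit circle, the ambiguity for very sparse signals can again be observed directly; in this sketch (our own illustration) we compare the 1-sparse signals $e_1$ and $e_2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 30

# Entries drawn uniformly from the complex unit circle {z : |z| = 1}.
phase = rng.uniform(0.0, 2.0 * np.pi, size=(m, n))
xi = np.exp(1j * phase)

e1 = np.zeros(n); e1[0] = 1.0
e2 = np.zeros(n); e2[1] = 1.0

# |<xi^(i), e_k>|^2 = |xi^(i)_k|^2 = 1 for every i and k, so the
# 1-sparse signals e1 and e2 cannot be distinguished.
assert np.allclose(np.abs(xi @ e1) ** 2, np.abs(xi @ e2) ** 2)
```
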
Remark 1. The assumption $\vert \mathbb{E}\,\xi_1^2 \vert^2 \le 1 - \beta$ in Theorem 2 cannot be dropped: by the equality case of the Cauchy-Schwarz inequality, $\vert \mathbb{E}\,\xi_1^2 \vert = \mathbb{E}\vert\xi_1\vert^2 = 1$ forces $\xi_1$ to take values in a fixed line $e^{i\varphi}\mathbb{R}$ almost surely, so that after a global phase rotation the measurement vectors are real. Consequently, $x_0$ and its complex conjugate $\overline{x_0}$ will always lead to the same measurements.

Real signals and complex measurement vectors
We have seen in Remark 1 that the assumption $\vert \mathbb{E}\,\xi_1^2 \vert^2 \le 1 - \beta$ is necessary to distinguish between a signal $x_0$ and its complex conjugate $\overline{x_0}$. However, if, as in many practical applications, it is known a priori that the signal $x_0$ is real-valued, then this ambiguity cannot arise and we can uniquely recover without additional assumptions via the following natural variant of the PhaseLift method, where we restrict the search space to real-valued matrices:
$$ \text{minimize } \Vert \mathcal{A}(X) - y \Vert_{\ell_1} \quad \text{subject to } X \in S^n_+ \cap \mathbb{R}^{n \times n}. \qquad (10) $$
The following theorem shows that in this scenario the assumption $\vert \mathbb{E}\,\xi_1^2 \vert^2 \le 1 - \beta$ can indeed be dropped.

Theorem 3. Let the observation vector $y$ be given as in (1), where the random measurement vectors $\{\xi^{(i)}\}_{i=1}^m$ are as defined in Section 2.3, and consider an additional parameter $\mu \in (0,1]$. Furthermore, assume that one of the following two conditions is satisfied:

1. It holds that $\mu < \frac{1}{81}$. In this case we set $\beta = 1$.
2. It holds that $\mathbb{E}\vert\xi_1\vert^4 \ge 1 + \beta$ for some $\beta \in (0,1]$. In this case we set $\mu = 1$.

Moreover, assume that $m \ge C_1 \frac{K^{16}}{\beta^4}\, n$. Then with probability at least $1 - \mathcal{O}\big( \exp\big( -\frac{m \beta^4}{C_2 K^{16}} \big) \big)$ the following statement holds: For all vectors $x_0 \in X_\mu \cap \mathbb{R}^n$ and any noise vector $w \in \mathbb{R}^m$ any solution $\hat{X}$ of (10) satisfies
$$ \big\Vert \hat{X} - x_0 x_0^* \big\Vert_1 \le C_3 \frac{\Vert w \Vert_{\ell_1}}{m}. $$
Here $C_1$, $C_2$, and $C_3$ are universal constants.
Remark 2. In comparison to Theorem 1, the probability bound in Theorem 2 and Theorem 3 is slightly better, as it improves from $1 - \exp(-\Omega(n))$ to $1 - \exp(-\Omega(m))$. Moreover, note that in contrast to Theorem 1, the dependence on the subgaussian distribution of $\xi$ is not hidden in the constants. Also note that in our result the dependence on $\beta$ is stated explicitly. However, we do not know whether these bounds are optimal with respect to $K$ and $\beta$.

Proof of Theorem 2
Our goal is to show that with high probability the matrix $x_0 x_0^*$ is close to the minimizer $\hat{X}$ of the expression $\Vert \mathcal{A}(W) - y \Vert_{\ell_1}$ over all $W \in S^n_+$. A common proof strategy, which we will also follow, is to establish that all $X \in S^n_+$ with
$$ \Vert \mathcal{A}(X) - y \Vert_{\ell_1} \le \Vert w \Vert_{\ell_1} \qquad (11) $$
are sufficiently close to the true solution in the $\Vert \cdot \Vert_1$-norm. More precisely, a sufficient condition for inequality (9) is that every $X$ fulfilling condition (11) satisfies
$$ \big\Vert X - x_0 x_0^* \big\Vert_1 \le C_3 \frac{\Vert w \Vert_{\ell_1}}{m}. \qquad (12) $$
Indeed, by the triangle inequality, (11) implies that
$$ \big\Vert \mathcal{A}\big( X - x_0 x_0^* \big) \big\Vert_{\ell_1} \le \Vert \mathcal{A}(X) - y \Vert_{\ell_1} + \Vert y - \mathcal{A}(x_0 x_0^*) \Vert_{\ell_1} \le 2 \Vert w \Vert_{\ell_1}. $$
Hence, the upper bound (12) that we aim to establish directly follows from an appropriate lower bound for $\Vert \mathcal{A}(Z) \Vert_{\ell_1} / \Vert Z \Vert_1$. Here $Z \in S^n$ ranges over those matrices for which $x_0 x_0^* + Z$ is positive semidefinite. This set is convex, so it is locally well-approximated by a convex cone. To establish a uniform recovery result over all $x_0 \in X_\mu$, we need to study the union of the corresponding cones, given by
$$ M_\mu := \big\{ Z \in S^n : x_0 x_0^* + t Z \in S^n_+ \text{ for some } x_0 \in X_\mu \cap S^{n-1} \text{ and some } t > 0 \big\}. $$
We will refer to this set as the cone of admissible directions.
With this notation, our proof strategy can be summarized as establishing a lower bound for
$$ \lambda_{\min}(\mathcal{A}; M_\mu) := \inf_{Z \in M_\mu \setminus \{0\}} \frac{\Vert \mathcal{A}(Z) \Vert_{\ell_1}}{\Vert Z \Vert_1}, $$
which in the literature is commonly referred to as the minimum conic singular value (see, e.g., [36,26]). Except for the precise nature of the cone under consideration, this strategy is exactly analogous to a number of works in the recent literature on linear inverse problems [8,27]. In particular, the following lemma, which summarizes our motivating considerations above, can be seen as a variant of [8, Proposition 2.2].
Lemma 1. Let $\mathcal{A}$ be the operator defined in (4). Assume that $y = \mathcal{A}(x_0 x_0^*) + w$. Then the minimizer $\hat{X}$ of (5) satisfies
$$ \big\Vert \hat{X} - x_0 x_0^* \big\Vert_1 \le \frac{2 \Vert w \Vert_{\ell_1}}{\lambda_{\min}(\mathcal{A}; M_\mu)}. $$
In the following, our goal will be to derive an appropriate lower bound for $\lambda_{\min}(\mathcal{A}; M_\mu)$. One difficulty in the analysis is that not all matrices belonging to $M_\mu$ are positive semidefinite; if they were, one could directly use that for positive semidefinite matrices an approximate $\ell_1$-isometry holds (see, e.g., [7, Section 3]). While not all matrices in $M_\mu$ are positive semidefinite, the following lemma states that each matrix belonging to $M_\mu$ possesses at most one negative eigenvalue.
Lemma 2. Suppose that Z ∈ M µ . Then Z has at most one strictly negative eigenvalue.
Proof. Let $Z \in M_\mu$. By definition of $M_\mu$ we can find $x_0 \in X_\mu$ and $t > 0$ such that
$$ x_0 x_0^* + t Z \in S^n_+. \qquad (13) $$
Suppose now by contradiction that $Z$ has two (strictly) negative eigenvalues with corresponding eigenvectors $z_1, z_2 \in \mathbb{C}^n$. Then we can find a vector $u \in \mathrm{span}\{z_1, z_2\} \setminus \{0\}$ such that $\langle u, x_0 \rangle = 0$. This implies that for any $t > 0$ we have that
$$ u^* \big( x_0 x_0^* + t Z \big) u = t\, u^* Z u < 0, $$
which is a contradiction to (13).
Recall that for a matrix $Z \in S^n$ we denote its eigenvalues by $\{\lambda_i(Z)\}_{i=1}^n$, arranged in decreasing order. By the previous lemma it holds that $\lambda_i(Z) \ge 0$ for all $i \in [n-1]$ and all $Z \in M_\mu$.
For the proof we will partition $M_\mu$ into two sets. Namely, for $\alpha > 0$ we define
$$ M_{1,\mu,\alpha} := \Big\{ Z \in M_\mu : -\lambda_n(Z) \le \alpha \sum_{i=1}^{n-1} \lambda_i(Z) \Big\}, \qquad M_{2,\mu,\alpha} := \Big\{ Z \in M_\mu : -\lambda_n(Z) > \alpha \sum_{i=1}^{n-1} \lambda_i(Z) \Big\}. $$
The two sets can be interpreted in the following way. If we set $\alpha = 1$, it follows that $\mathrm{Tr}(Z) \le 0$ for all matrices $Z \in M_{2,\mu,\alpha}$. In particular, this implies that there is $x_0 \in X_\mu$ such that $Z$ is in the descent cone of the function $\mathrm{Tr}(\cdot)$ at the point $x_0 x_0^*$. Hence, for $\alpha < 1$ we can interpret $M_{2,\mu,\alpha}$ as a slightly enlarged union of descent cones. In order to bound $\inf_{Z \in M_{2,\mu,\alpha}} \Vert \mathcal{A}(Z) \Vert_{\ell_1} / \Vert Z \Vert_1$ from below we will rely on the following lemma, which is proven in Section 6.
The proof of Lemma 3 makes use of the fact that the set $M_{2,\mu,\alpha}$ has low complexity in the sense that its matrices are approximately low-rank.
In contrast, the set $M_{1,\mu,\alpha}$ has rather high complexity. For example, note that $S^n_+ \subset M_{1,\mu,\alpha}$. Nevertheless, the quantity $\inf_{Z \in M_{1,\mu,\alpha}} \Vert \mathcal{A}(Z) \Vert_{\ell_1} / \Vert Z \Vert_1$ can be bounded from below, because the measurement matrices $\xi^{(i)} (\xi^{(i)})^*$ are positive semidefinite and the matrices in $M_{1,\mu,\alpha}$ also have a dominant positive semidefinite component. This is achieved by the following lemma, whose proof can be found in Section 5.
We remark that Lemma 4 would no longer hold if the measurement matrices $\xi^{(i)} (\xi^{(i)})^*$ were replaced by symmetric matrices with i.i.d. Gaussian entries (see [33, Proposition 1]).
Having gathered all the necessary ingredients we can prove the main result of this manuscript.
Proof of Theorem 2. Set $\alpha = 4/5$. By Lemma 4 and assumption (8) it follows that with probability at least $1 - 2\exp(-cm)$ the bound (16) holds. Furthermore, by Lemma 3 we have that with probability at least $1 - 2\exp\big( -\frac{m \beta^4}{C K^{16}} \big)$ the bound (15) holds. On the intersection of these two events we obtain a lower bound for $\lambda_{\min}(\mathcal{A}; M_\mu)$, and the claim follows from Lemma 1.

Proof of Theorem 3
The proof of Theorem 3 is in large parts analogous to the proof of Theorem 2. For this reason, we will only highlight the main differences. Replacing $X_\mu$ by $X_\mu \cap \mathbb{R}^n$ and $M_\mu$ by $M_\mu \cap \mathbb{R}^{n \times n}$, we can argue analogously to Section 4.1, with the only difference that Lemma 3 has to be replaced by the following variant.

Lemma 5. Assume that one of the following two conditions is satisfied:

1. It holds that $\mu < \frac{1}{81}$. In this case we set $\beta = 1$.
2. It holds that $\mathbb{E}\vert\xi_1\vert^4 \ge 1 + \beta$ for some $\beta \in (0,1]$. In this case we set $\mu = 1$.

In order to prove Lemma 5 we can proceed similarly as in the proof of Lemma 3; in Section 6 we have highlighted the necessary modifications within that proof.

Proof of Lemma 4
Proof. Note that for any $z \in \mathbb{C}^n$ we have that
$$ \Vert \mathcal{A}(z z^*) \Vert_{\ell_1} = \sum_{i=1}^m \big\vert \langle \xi^{(i)}, z \rangle \big\vert^2 = \Vert A z \Vert_2^2, $$
where $A \in \mathbb{C}^{m \times n}$ is the matrix whose rows are given by $\{(\xi^{(i)})^*\}_{i=1}^m$. It follows from [38, Theorem 4.6.1] that, due to our assumption on $m$, with probability at least $1 - 2\exp(-cm)$ all singular values of $A$ are of order $\sqrt{m}$. Due to the observation above this is equivalent to
$$ \Vert \mathcal{A}(z z^*) \Vert_{\ell_1} = \Vert A z \Vert_2^2 \asymp m \Vert z \Vert^2 \qquad (19) $$
for all $z \in \mathbb{C}^n$. We will assume in the following that (19) holds for all $z \in \mathbb{C}^n$.
By Lemma 2 we know that $Z$ has at most one negative eigenvalue. If all eigenvalues $\lambda_i(Z)$ are nonnegative, then $Z = \sum_{i=1}^n \lambda_i v_i v_i^*$ for an orthonormal eigenbasis $\{v_i\}_{i=1}^n$, all entries of $\mathcal{A}(Z)$ are nonnegative, and (19) implies that
$$ \Vert \mathcal{A}(Z) \Vert_{\ell_1} = \sum_{i=1}^n \lambda_i \Vert \mathcal{A}(v_i v_i^*) \Vert_{\ell_1} \gtrsim m \sum_{i=1}^n \lambda_i = m \Vert Z \Vert_1, $$
which shows (16). Now suppose that $\lambda_n(Z) < 0$. By (19) and $-\lambda_n(Z) \le \alpha \sum_{i=1}^{n-1} \lambda_i(Z)$, which is due to $Z \in M_{1,\mu,\alpha}$, we obtain the lower bound (20) for $\Vert \mathcal{A}(Z) \Vert_{\ell_1}$. Again using the relation $-\lambda_n(Z) \le \alpha \sum_{i=1}^{n-1} \lambda_i(Z)$ we can also observe that
$$ \Vert Z \Vert_1 = \sum_{i=1}^{n-1} \lambda_i(Z) - \lambda_n(Z) \le (1 + \alpha) \sum_{i=1}^{n-1} \lambda_i(Z). \qquad (21) $$
Combining (20) and (21) shows (16), which finishes the proof.
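The identity $\Vert \mathcal{A}(zz^*) \Vert_{\ell_1} = \Vert Az \Vert_2^2$ used at the start of this proof can be checked numerically; the sketch below is our own illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 12

# Complex measurement vectors xi^(i); their conjugates form the rows of A.
Xi = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Entries of A(zz^*): <xi_i xi_i^*, zz^*>_HS = |<xi_i, z>|^2 >= 0.
A_zz = np.array([np.trace(np.outer(xi, xi.conj()) @ np.outer(z, z.conj())).real
                 for xi in Xi])

# Since all entries are nonnegative, the l1 norm of A(zz^*) equals the
# squared Euclidean norm of Az, where (Az)_i = <xi_i, z>.
Az = Xi.conj() @ z
assert np.isclose(np.abs(A_zz).sum(), np.linalg.norm(Az) ** 2)
```
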

Proof of Lemma 3 and Lemma 5
In order to prove Lemma 3 and Lemma 5 we will use the following version of Mendelson's small ball method [24,28], a tool for deriving lower bounds for nonnegative empirical processes.

Lemma 6 ([14, Lemma 1]). Let $\mathcal{Z} \subset S^n$ and let $\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(m)}$ be i.i.d. random vectors. Let $u > 0$ and $t > 0$ and define
$$ Q_{\mathcal{Z}}(u) := \inf_{Z \in \mathcal{Z}} \mathbb{P}\big( \vert \langle \xi \xi^*, Z \rangle_{HS} \vert \ge u \big), \qquad (22) $$
$$ W_m(\mathcal{Z}) := \mathbb{E} \sup_{Z \in \mathcal{Z}} \Big\langle \frac{1}{\sqrt{m}} \sum_{i=1}^m \varepsilon_i\, \xi^{(i)} (\xi^{(i)})^*, Z \Big\rangle_{HS}, $$
where $\{\varepsilon_i\}_{i=1}^m$ is a sequence of i.i.d. Rademacher random variables, independent of everything else. Then, with probability at least $1 - 2\exp(-2t^2)$, it holds that
$$ \inf_{Z \in \mathcal{Z}} \sum_{i=1}^m \big\vert \big\langle \xi^{(i)} (\xi^{(i)})^*, Z \big\rangle_{HS} \big\vert \ge m u\, Q_{\mathcal{Z}}(2u) - 2\sqrt{m}\, W_m(\mathcal{Z}) - u t \sqrt{m}. $$
Our goal is to apply Lemma 6 to $\mathcal{Z} = M_{2,\mu,\alpha} \cap \{ Z \in S^n : \Vert Z \Vert_{HS} = 1 \}$. The following key lemma shows that matrices in $M_{2,\mu,\alpha}$ have two favorable properties: They are approximately low-rank, and their mass with respect to the Frobenius norm is not concentrated on the diagonal when $\mu$ is small. The first property follows directly from the fact that the positive eigenvalues are rather small compared to the negative one; the second property requires the spectral flatness of $x_0$, i.e., that $\mu$ is small.
In order to prove the second inequality, note that by definition of $M_{2,\mu,\alpha} \subset M_\mu$ we can choose $x_0 \in X_\mu \cap S^{n-1}$ such that there exists $t > 0$ with $x_0 x_0^* + t Z$ positive semidefinite. For this choice of $x_0$ we can decompose $Z$ uniquely into
$$ Z = Z_1 + Z_2, \qquad Z_1 = -\lambda x_0 x_0^* + u x_0^* + x_0 u^*, $$
where $\lambda \in \mathbb{R}$, $\langle u, x_0 \rangle = 0$, and $Z_2 x_0 = 0$. We observe that
$$ \Vert \mathrm{diag}(Z) \Vert_{HS} \le \Vert \mathrm{diag}(Z_1) \Vert_{HS} + \Vert \mathrm{diag}(Z_2) \Vert_{HS}. $$
We will bound the two summands separately. We begin with $\Vert \mathrm{diag}(Z_1) \Vert_{HS}$ and observe that
$$ \Vert \mathrm{diag}(Z_1) \Vert_{HS} \le \vert \lambda \vert\, \Vert \mathrm{diag}(x_0 x_0^*) \Vert_{HS} + 2 \Vert u \Vert\, \Vert x_0 \Vert_\infty \le \mu \big( \vert \lambda \vert + 2 \Vert u \Vert \big) \le 3 \mu\, \Vert Z_1 \Vert_{HS} \le 3 \mu\, \Vert Z \Vert_{HS}. $$
In the first inequality we used the triangle inequality, and in the second we used that $\Vert x_0 \Vert_\infty \le \mu \Vert x_0 \Vert = \mu$ due to $x_0 \in X_\mu \cap S^{n-1}$, which also implies $\Vert \mathrm{diag}(x_0 x_0^*) \Vert_{HS} \le \Vert x_0 \Vert_\infty \le \mu$. In the third inequality we used that $\vert \lambda \vert \le \Vert Z_1 \Vert_{HS}$ and $\Vert u \Vert \le \Vert Z_1 \Vert_{HS}$, which follows from the fact that the summands of $Z_1 = -\lambda x_0 x_0^* + u x_0^* + x_0 u^*$ are orthogonal to each other. In the last inequality we used that $\Vert Z_1 \Vert_{HS} \le \Vert Z \Vert_{HS}$, as $Z$ is decomposed orthogonally into $Z = Z_1 + Z_2$.
In order to bound $\Vert \mathrm{diag}(Z_2) \Vert_{HS}$ we note first that $Z_2$ is positive semidefinite. Indeed, suppose by contradiction that $Z_2$ is not positive semidefinite. Then there would exist a vector $v \in \mathbb{C}^n$ such that $\langle v, x_0 \rangle = 0$ and $v^* Z_2 v < 0$. In particular, this would imply that $v^* (x_0 x_0^* + t Z) v < 0$ for all $t > 0$, which is a contradiction to our choice of $x_0$. Now let $w \in \mathbb{C}^n$ be the normalized (i.e., $\Vert w \Vert = 1$) eigenvector corresponding to the eigenvalue $\lambda_n(Z)$. Then we obtain that
$$ -\lambda_n(Z) = -w^* Z w = -w^* Z_1 w - w^* Z_2 w \le -w^* Z_1 w \le \Vert Z_1 \Vert_{HS}, $$
where the first inequality follows from the fact that $Z_2$ is positive semidefinite. Using this observation we obtain the desired estimate for $\Vert \mathrm{diag}(Z_2) \Vert_{HS}$; in its derivation we use in the fourth line that $-\lambda_n(Z) \ge \frac{1}{1 + \alpha^{-1}} \Vert Z \Vert_1$, which is a consequence of the first inequality of (24). Combining this estimate with (25) and (26) shows part (b), which finishes the proof.
In analogy to [24] we bound $Q_{\mathcal{Z}}(2u)$ using the following lemma, whose proof is based on the Paley-Zygmund inequality. A key difference is that we use the Hanson-Wright inequality to control the fourth moment $\mathbb{E}\vert \xi^* A \xi \vert^4$ appropriately.
Lemma 8. Let $A \in S^n$ and let $\xi = (\xi_1, \ldots, \xi_n)$ be a random vector with independent and identically distributed entries $\xi_i$ taking values in $\mathbb{C}$ such that $\mathbb{E}\,\xi_i = 0$, $\mathbb{E}\vert\xi_i\vert^2 = 1$, and $\Vert \xi_i \Vert_{\psi_2} \le K$. Then we have that
$$ \mathbb{P}\Big( \vert \xi^* A \xi \vert^2 \ge \tfrac{1}{2}\, \mathbb{E}\vert \xi^* A \xi \vert^2 \Big) \ge \frac{\big( \mathbb{E}\vert \xi^* A \xi \vert^2 \big)^2}{C K^8 \big( \Vert A \Vert_{HS}^4 + (\mathrm{Tr}(A))^4 \big)}. $$
Here $C > 0$ is an absolute constant.

Proof. Note that by the Paley-Zygmund inequality (see, e.g., [12]) we have that for all $0 < t < \mathbb{E}\vert \xi^* A \xi \vert^2$
$$ \mathbb{P}\big( \vert \xi^* A \xi \vert^2 > t \big) \ge \frac{\big( \mathbb{E}\vert \xi^* A \xi \vert^2 - t \big)^2}{\mathbb{E}\vert \xi^* A \xi \vert^4}. $$
In particular, setting $t = \mathbb{E}\vert \xi^* A \xi \vert^2 / 2$ yields that
$$ \mathbb{P}\Big( \vert \xi^* A \xi \vert^2 > \tfrac{1}{2}\, \mathbb{E}\vert \xi^* A \xi \vert^2 \Big) \ge \frac{\big( \mathbb{E}\vert \xi^* A \xi \vert^2 \big)^2}{4\, \mathbb{E}\vert \xi^* A \xi \vert^4}. \qquad (28) $$
To estimate $\mathbb{E}\vert \xi^* A \xi \vert^4$ from above we note that, since $\mathbb{E}[\xi^* A \xi] = \mathrm{Tr}(A)$, the triangle inequality yields that
$$ \big( \mathbb{E}\vert \xi^* A \xi \vert^4 \big)^{1/4} \le \big( \mathbb{E}\big\vert \xi^* A \xi - \mathbb{E}[\xi^* A \xi] \big\vert^4 \big)^{1/4} + \vert \mathrm{Tr}(A) \vert. \qquad (29) $$
In order to estimate the first summand we will use that $\xi^* A \xi - \mathbb{E}[\xi^* A \xi]$ has a mixed subgaussian/subexponential tail. We can bound the tail probability using the Hanson-Wright inequality (in the version of [32]), which states that there is a numerical constant $c > 0$ such that for all $t > 0$ it holds that
$$ \mathbb{P}\big( \vert \xi^* A \xi - \mathbb{E}[\xi^* A \xi] \vert \ge t \big) \le 2 \exp\Big( -c \min\Big( \frac{t^2}{K^4 \Vert A \Vert_{HS}^2}, \frac{t}{K^2 \Vert A \Vert} \Big) \Big). $$
This yields that
$$ \mathbb{E}\big\vert \xi^* A \xi - \mathbb{E}[\xi^* A \xi] \big\vert^4 = \int_0^\infty 4 t^3\, \mathbb{P}\big( \vert \xi^* A \xi - \mathbb{E}[\xi^* A \xi] \vert \ge t \big)\, \mathrm{d}t \lesssim K^8 \Vert A \Vert_{HS}^4, $$
where we evaluated the integral after a change of variables. Combining this inequality chain with (29) we obtain that $\mathbb{E}\vert \xi^* A \xi \vert^4 \lesssim K^8 \Vert A \Vert_{HS}^4 + (\mathrm{Tr}(A))^4$.
Inserting this into (28) finishes the proof.
In order to apply Lemma 8 we need a lower bound for $\mathbb{E}\vert \xi^* A \xi \vert^2$. The next lemma computes this quantity.
Lemma 9. Let $\xi = (\xi_1, \ldots, \xi_n)$ be a random vector with independent and identically distributed entries $\xi_i$ taking values in $\mathbb{C}$ such that $\mathbb{E}\,\xi_i = 0$ and $\mathbb{E}\vert\xi_i\vert^2 = 1$. Then for all matrices $A \in S^n$ it holds that
$$ \mathbb{E}\vert \xi^* A \xi \vert^2 = (\mathrm{Tr}(A))^2 + \Vert A - \mathrm{diag}(A) \Vert_{HS}^2 + \big( \mathbb{E}\vert\xi_1\vert^4 - 1 \big) \Vert \mathrm{diag}(A) \Vert_{HS}^2 + \big\vert \mathbb{E}\,\xi_1^2 \big\vert^2 \sum_{i \ne j} \Big( \big( \mathrm{Re}\, A_{i,j} \big)^2 - \big( \mathrm{Im}\, A_{i,j} \big)^2 \Big). \qquad (30) $$
Proof. First, we observe that
$$ \mathbb{E}\vert \xi^* A \xi \vert^2 = \sum_{i,j,k,l} A_{i,j} \overline{A_{k,l}}\, \mathbb{E}\big[ \overline{\xi_i}\, \xi_j\, \xi_k\, \overline{\xi_l} \big] = \underbrace{\sum_{i,k} A_{i,i} \overline{A_{k,k}}\, \mathbb{E}\big[ \vert \xi_i \vert^2 \vert \xi_k \vert^2 \big]}_{=: (I)} + \underbrace{\sum_{i \ne j} \Big( \big\vert A_{i,j} \big\vert^2\, \mathbb{E}\big[ \vert \xi_i \vert^2 \vert \xi_j \vert^2 \big] + A_{i,j} \overline{A_{j,i}}\, \mathbb{E}\big[ \overline{\xi_i}^{\,2} \big] \mathbb{E}\big[ \xi_j^2 \big] \Big)}_{=: (II)}, $$
where we used that $\mathbb{E}[\xi_i] = 0$ and that the entries of $\xi$ are independent, which implies that there are no summands where one index appears exactly three times. The first summand can be computed by
$$ (I) = \sum_{i} A_{i,i}^2\, \mathbb{E}\vert \xi_i \vert^4 + \sum_{i \ne k} A_{i,i} A_{k,k} = \big( \mathbb{E}\vert\xi_1\vert^4 - 1 \big) \Vert \mathrm{diag}(A) \Vert_{HS}^2 + (\mathrm{Tr}(A))^2, $$
where we have used that $A_{i,i} = \overline{A_{i,i}}$ for all $i \in [n]$ and $\mathbb{E}\vert\xi_i\vert^2 = 1$. The second summand can be computed by
$$ (II) = \Vert A - \mathrm{diag}(A) \Vert_{HS}^2 + \big\vert \mathbb{E}\,\xi_1^2 \big\vert^2 \sum_{i \ne j} A_{i,j}^2 \overset{(a)}{=} \Vert A - \mathrm{diag}(A) \Vert_{HS}^2 + \big\vert \mathbb{E}\,\xi_1^2 \big\vert^2 \sum_{i \ne j} \Big( \big( \mathrm{Re}\, A_{i,j} \big)^2 - \big( \mathrm{Im}\, A_{i,j} \big)^2 \Big). $$
For equation (a) we used the observation that, since $A_{j,i} = \overline{A_{i,j}}$, the symmetric pairs satisfy $A_{i,j}^2 + A_{j,i}^2 = 2 \big( (\mathrm{Re}\, A_{i,j})^2 - (\mathrm{Im}\, A_{i,j})^2 \big)$. By summing up $(I)$ and $(II)$ we obtain equality (30).
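For entries drawn uniformly from the quaternary phases $\{1, i, -1, -i\}$ one has $\mathbb{E}\,\xi_1^2 = 0$ and $\mathbb{E}\vert\xi_1\vert^4 = 1$, and a direct computation gives $\mathbb{E}\vert \xi^* A \xi \vert^2 = (\mathrm{Tr}(A))^2 + \Vert A - \mathrm{diag}(A) \Vert_{HS}^2$, consistent with Lemma 9. This can be verified by exhaustive enumeration for a small Hermitian matrix (our own sanity check, not part of the proof):

```python
import itertools
import numpy as np

# Small Hermitian test matrix.
A = np.array([[1.0, 1 + 1j], [1 - 1j, 2.0]])

# Enumerate all vectors with entries in {1, i, -1, -i} (uniform distribution).
phases = [1, 1j, -1, -1j]
vals = [abs(np.vdot(xi, A @ xi)) ** 2
        for xi in itertools.product(phases, repeat=2)]
lhs = np.mean(vals)                      # E |xi^* A xi|^2 by enumeration

# For these entries: E xi^2 = 0 and E |xi|^4 = 1, so the identity
# reduces to (Tr A)^2 + ||A - diag(A)||_HS^2.
off = A - np.diag(np.diag(A))
rhs = np.trace(A).real ** 2 + np.linalg.norm(off) ** 2
assert np.isclose(lhs, rhs)
print(lhs)  # 13.0
```
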
The lemmas above allow us to find a lower bound for $Q_{\mathcal{Z}}(2u)$ in Lemma 6. We still need an upper bound for the Rademacher complexity $\mathbb{E} \sup_{Z \in \mathcal{Z}} \big\langle \sum_{i=1}^m \varepsilon_i\, \xi^{(i)} (\xi^{(i)})^*, Z \big\rangle_{HS}$. The next lemma provides such a bound. In [27] a version of this lemma has already been presented; nevertheless, we include a proof in the appendix for completeness.
Now we have gathered all the ingredients to complete the proof.
Proof of Lemma 3 and Lemma 5. We will start by showing that $\mathbb{E}\vert \langle \xi \xi^*, Z \rangle_{HS} \vert^2 \gtrsim \beta \Vert Z \Vert_{HS}^2$ for all $Z \in M_{2,\mu,\alpha}$ in the case of Lemma 3, or for all $Z \in M_{2,\mu,\alpha} \cap \mathbb{R}^{n \times n}$ in the case of Lemma 5, respectively.
We first consider the second case and assume that the condition $\mathbb{E}\vert\xi_i\vert^4 \ge 1 + \beta$ is satisfied for some $\beta > 0$. By Lemma 9 we obtain that for all $Z \in M_{2,\mu,\alpha}$ under the conditions of Lemma 3
$$ \mathbb{E}\vert \langle \xi \xi^*, Z \rangle_{HS} \vert^2 \ge \beta\, \Vert \mathrm{diag}(Z) \Vert_{HS}^2 + \big( 1 - \vert \mathbb{E}\,\xi_1^2 \vert^2 \big) \Vert Z - \mathrm{diag}(Z) \Vert_{HS}^2 \ge \beta\, \Vert Z \Vert_{HS}^2. $$
Under the assumptions of Lemma 5 we observe that $\sum_{i \ne j} \big( \mathrm{Im}\, Z_{i,j} \big)^2 = 0$ and $\sum_{i \ne j} \big( \mathrm{Re}\, Z_{i,j} \big)^2 = \Vert Z - \mathrm{diag}(Z) \Vert_{HS}^2$. Hence, a similar argument as before also leads to $\mathbb{E}\vert \langle \xi \xi^*, Z \rangle_{HS} \vert^2 \ge \beta\, \Vert Z \Vert_{HS}^2$.
Under the first assumption, $\mu < \frac{1}{81}$, we obtain by Lemma 9 that for all $Z \in M_{2,\mu,\alpha}$
$$ \mathbb{E}\vert \langle \xi \xi^*, Z \rangle_{HS} \vert^2 \ge \big( 1 - \vert \mathbb{E}\,\xi_1^2 \vert^2 \big) \Vert Z - \mathrm{diag}(Z) \Vert_{HS}^2 \ge \beta\, \Vert Z - \mathrm{diag}(Z) \Vert_{HS}^2. $$
Similarly, under the assumptions of Lemma 5 we can again use that $\sum_{i \ne j} \big( \mathrm{Im}\, Z_{i,j} \big)^2 = 0$ and $\sum_{i \ne j} \big( \mathrm{Re}\, Z_{i,j} \big)^2 = \Vert Z - \mathrm{diag}(Z) \Vert_{HS}^2$ to obtain by an analogous argument that $\mathbb{E}\vert \langle \xi \xi^*, Z \rangle_{HS} \vert^2 \ge \beta\, \Vert Z - \mathrm{diag}(Z) \Vert_{HS}^2$.
The remainder of the proof will be the same for Lemma 3 and Lemma 5. By Lemma 8 we can now bound $Q_{\mathcal{Z}}(2u)$ from below.
Note that for all $Z \in M_{2,\mu,\alpha}$, by Lemma 7 and $\alpha = \frac{4}{5}$, the quantity $\Vert Z - \mathrm{diag}(Z) \Vert_{HS}$ is comparable to $\Vert Z \Vert_{HS}$, so in all cases $\mathbb{E}\vert \langle \xi \xi^*, Z \rangle_{HS} \vert^2 \gtrsim \beta\, \Vert Z \Vert_{HS}^2$. This shows that for all $Z \in M_{2,\mu,\alpha}$ it holds that
$$ \mathbb{P}\Big( \vert \langle \xi \xi^*, Z \rangle_{HS} \vert \ge \frac{\sqrt{\beta}\, \Vert Z \Vert_{HS}}{\sqrt{2C}} \Big) \gtrsim \frac{\beta^2}{K^8}, $$
where we used that $K \gtrsim 1$ due to (7). Now recall that $\mathcal{Z} := M_{2,\mu,\alpha} \cap \{ Z \in S^n : \Vert Z \Vert_{HS} = 1 \}$. Thus we have shown that
$$ Q_{\mathcal{Z}}(2u) \ge \frac{\beta^2}{C'' K^8}, $$
where $Q_{\mathcal{Z}}(\cdot)$ is defined in (22), we have set $u = \frac{\sqrt{\beta}}{2\sqrt{2C}}$, and $C'' > 0$ is a constant chosen large enough. From Lemma 10 we obtain an upper bound for $W_m(\mathcal{Z})$. Combining this bound with our choice of $u$ and choosing the constant in assumption (14) large enough, Lemma 6 yields that with probability at least $1 - 2\exp(-2t^2)$ the quantity $\inf_{Z \in \mathcal{Z}} \sum_{i=1}^m \vert \langle \xi^{(i)} (\xi^{(i)})^*, Z \rangle_{HS} \vert$ is bounded from below by a constant multiple of $m u\, Q_{\mathcal{Z}}(2u) - u t \sqrt{m}$. Setting $t = \frac{\sqrt{m}\, \beta^2}{4 C'' K^8}$ it follows that with probability at least $1 - 2\exp\big( -\frac{m \beta^4}{8 C''^2 K^{16}} \big)$
$$ \inf_{Z \in \mathcal{Z}} \sum_{i=1}^m \big\vert \big\langle \xi^{(i)} (\xi^{(i)})^*, Z \big\rangle_{HS} \big\vert \gtrsim \frac{m u\, \beta^2}{K^8}. \qquad (32) $$
Hence, by the definition of $\mathcal{A}$ and $\mathcal{Z}$, (32) yields a lower bound for $\inf_{Z \in M_{2,\mu,\alpha}} \Vert \mathcal{A}(Z) \Vert_{\ell_1} / \Vert Z \Vert_{HS}$. Due to $\alpha = \frac{4}{5}$ and Lemma 7 we have that $\Vert Z \Vert_1 \le \frac{9}{4} \Vert Z \Vert_{HS}$ for all $Z \in M_{2,\mu,\alpha}$. Combined with (32) this shows (15); in the setting of Lemma 5, combining with inequality (33) finishes the proof.