Convex combination of alternating projection and Douglas-Rachford operators for phase retrieval

We present the convergence analysis of convex combination of the alternating projection and Douglas-Rachford operators for solving the phase retrieval problem. New convergence criteria for iterations generated by the algorithm are established by applying various schemes of numerical analysis and exploring both physical and mathematical characteristics of the phase retrieval problem. Numerical results demonstrate the advantages of the algorithm over the other widely known projection methods in practically relevant simulations.

many scientific and engineering fields, including astronomy imaging [10,21], X-ray crystallography [22,43], microscopy [2,26] and adaptive optics [1,11,12,45]. An important application of phase retrieval in optics is to quantify the properties of an imaging system via its generalized pupil function [5,13,25,47]. The fundamental advantage of this approach compared to those using intensity point spread functions (PSFs) or intensity optical transfer functions is that it is modifiable and automatically takes the specific characteristics of the imaging system under investigation into account. In adaptive optics, one needs to know the phase of the optical field in the system aperture to be able to compensate for an optical aberration, and the phase retrieval is a basis for a wide class of focal-plane based wavefront sensors.
Since the fundamental work [50] of Sayre in 1952, which reveals that the phase of a scattered wave can be recovered from the recorded images at and between Bragg peaks of a diffracted wavefront, a wide variety of solution methods for phase retrieval has been proposed and developed. For an overview of phase retrieval algorithms, we refer the reader to the papers [16,38,40,51]. Direct methods usually require insights about the crystallographic structure to recover the missing phase [23]. Such a structural information is not only costly in terms of computational complexity but also sensitive to noise and approximation, for example, due to physical limitation or model deviation. As a consequence, this approach lacks practicability and becomes less popular in practice. The second class of solution algorithms relies on the fact that phase retrieval problems can be reformulated as linear equations with rank and positive semidefinite constraints in higher dimensional spaces. Well known examples of this algorithm class are MaxCut [18], PhaseCut [54] and PhaseLift [6,7]. This convex relaxation approach requires the matrix lifting step which is computationally demanding and hence not suitable for largescale problems. The most popular class of phase retrieval methods is based on projections and pioneered by the work of Gerchberg and Saxton [17], which deals with phase retrieval given a single PSF image and the amplitude of the complex signal, which in the sequel will be referred to as the amplitude constraint in order to clearly differentiate it from the intensity constraints determined by data images. The need to deal with more and more phase retrieval models, for example, incorporating various types of a priori constraint [15], being given multiple images and involving regularization schemes, has given rise to a wide range of solution methods in this class. It was recently observed by Luke et al. [40] that this class of methods actually outperforms the other classes of phase retrieval algorithms.
In light of [3,32,39], phase retrieval can be interpreted as mathematical feasibility problems and, as a consequence, all algorithmic schemes for set feasibility can be adapted for phase retrieval. The current research is devoted to that topic. The alternating projection (AP) and the Douglas-Rachford (DR) algorithms are perhaps the most widely known solution methods for set feasibility and have served as a basis for a wide range of modifications and regularizations, see, for example, [4,29]. It has been observed that AP is stable, always convergent and to some extent able to suppress noise but it may get stuck at undesired local minima and the convergence speed can be very slow [15]. In contrast, DR can be faster in convergence and better in escaping from bad local minima but less robust against noise and model deviation [36]. As a result, this algorithm can not be naively applied to practical problems which intrinsically involve noise and model approximation. This fact has motivated a number of its efficient relaxation schemes such as the usage of the Krasnoselski-Mann relaxation, the Fienup's hybrid input-output (HIO) algorithm [15], the relaxed averaged alternating reflections (RAAR) algorithm [35,36] and the DRAP algorithm [52].
In this paper, we analyze the DRAP algorithm for solving the phase retrieval problem for the first time after having observed that it appears to be the most efficient algorithm for the problem setting under consideration, see Section 5. Interestingly, DRAP mathematically coincides with the convex combination of the AP and DR operators in the phase retrieval setting. As a result, DRAP admits two mathematically equivalent descriptions (see (19) and (20) in Section 3). The first one ensures that its computational complexity is only approximate to that of each of the constituent operators and thus it is used for numerical implementation. The second description as a convex combination of the AP and the DR operators exhibits a concrete connection to the fundamental projection algorithms and hence it is intuitively better situated on the map of projection methods (see Remark 6).
The main contribution of this paper is the convergence analysis of the DRAP algorithm for solving the phase retrieval problem. First, using the analysis approach initiated by Chen and Fannjiang [8], we establish a convergence criterion for DRAP (Theorem 1), which extends the convergence result of the DR algorithm formulated in that paper. It is worth mentioning here that extending a convergence criterion for DR to a corresponding one for its relaxations such as HIO, RAAR and DRAP algorithms is not trivial 1 . Proposition 2 extends the applicable scope of this type of convergence results 2 to cover also phase retrieval problems with amplitude constraint. Second, applying the analysis scheme developed by Luke et al. [42], we establish another convergence criterion for the DRAP algorithm (Theorem 2) by integrating the physical properties of the phase retrieval problem [36] into the earlier known results for DRAP [52]. Recall that the analysis of the latter article involves only abstract mathematical notions in the general setting of set feasibility. As a comparison, we make an attempt on connecting the two convergence criteria by linking their key mathematical assumptions to a single physical condition on the phase diversities which are the almost only adjustable figures of the phase retrieval problem (see Remark 15).
The paper is organized as follows. In the last part of this introductory section, we introduce the mathematical notation used in the paper. Section 2 is devoted to formulating the phase retrieval problem and addressing in details the key steps towards its solutions using projection algorithms. A discussion on projection methods for phase retrieval is presented in Section 3. In Section 4, convergence results of the DRAP algorithm are established using two different analysis approaches: 1) spectral analysis in Section 4.1 and 2) variational analysis in Section 4.2. Numerical simulation is presented in Section 5.
Mathematical notation. The underlying space in this paper is a finite dimensional Hilbert space denoted by H. The element-wise multiplication is denoted by . The element-wise division · · , absolute value | · |, square ( · ) 2 and square root √ · operations are also frequently used but without need for extra notation. Re(·) and Im(·) denote the real and the imaginary parts of a complex object in th brackets, respectively. The imaginary unit is j = √ −1. Id denotes the identity mapping while I n denotes the identity matrix of size n. The distance to a set Ω ⊂ H is defined by x − w and the set-valued mapping is the projector on Ω. A selection w ∈ P Ω (x) is called a projection of x on Ω. The reflection operator associated with Ω is accordingly defined by R Ω := 2P Ω − Id. Given a subset Ω ⊂ H, the Fréchet and limiting normal cones to Ω at a pointx ∈ Ω are defined, respectively, as follows: where x Ω →x means that x →x and x ∈ Ω. The set of fixed points of an operator T : H ⇒ H is defined by Fix T := {x ∈ H | x ∈ T (x)}. Our other basic notation is standard; cf. [44,49]. B δ (x) stands for the open ball with radius δ > 0 and center x. For a linear subspace V of H, is the orthogonal complement subspace of V .

Phase retrieval
Phase diversities and the Fourier transform are key ingredients of the phase retrieval problem studied in this paper. Recall that adding a phase diversity to the phase of a complex signal is a unitary transform and the (discrete) Fourier transform is also a unitary operator. Since unitary transforms are one-to-one represented as unitary matrices, the phase retrieval problem can be formulated in the form of matrix-vector-multiplication as follows. For an unknown complex objectx ∈ C n , let M ∈ C N ×n be the propagation matrix which is normalized to be isometric, and r ∈ R N + be the measured data of |Mx| 2 . The phase retrieval problem is to find an (approximate) solution to the equation: where w ∈ R N represents unknown noise 3 .
Remark 1 To formulate the phase retrieval problem in the matrix-vectormultiplication form (2) or any feasibility model in Section 2.2, we need to vectorize all array objects in a consistent manner and rewrite all linear mappings as matrix multiplication operations in higher dimensional spaces, see, for example, [13, section 2A]. This one-to-one conversion allows us to do the theoretical analysis in the simple matrix-vector-multiplication formulation without loss of generality.
In this paper, we study the phase retrieval setting with several phase diversities, and the propagation matrix M takes the following form: where m ≥ 2 is the number of data images, F ∈ C n×n is the unitary matrix representing the discrete Fourier transform, and D d ∈ C n×n are unitary matrices representing the phase diversities which will be denoted by φ d in the sequel (d = 1, 2, . . . , m). Note that N = mn.
Remark 2 (phase modulators versus out-of-focus measurements) There are two widely used techniques of acquiring the PSF images for phase-diversity phase retrieval. First, a phase modulator is used for introducing phase diversities in the pupil plane corresponding to which the images are measured in the focal plane. Second, the images are registered in out-of-focus planes along the optical axis (i.e, parallel to the focal plane at some known distances) without the use of phase modulator. It is well known that the two techniques are mathematically equivalent [19].
When a priori knowledge of the solutions is available, that isx ∈ χ for some known subset χ ⊂ C n , one can expect more accurate phase retrieval. The formulation (2) is naturally modified as follows: Following the background developed in [3,15,17,39], we are going to address the problem (4) using projection algorithms. The main steps for this solution process will be detailed next.

Feasibility models
Several feasibility models of phase retrieval have been formulated in either the physical domain 4 [32,39] or the Fourier domain 5 [8]. Viewing the Fourier transform and phase-diversity addition as unitary transforms, we clarify the relationship between various feasibility models of the phase retrieval problem.
In the physical domain, for each d = 1, 2, . . . , m, let us denote r d the measurement of the PSF image |F D d (x)| 2 . Define the intensity constraint sets as follows [3,39]: Then, the problem (4) can be approached via the following feasibility problem involving multiple sets: where Ω 0 := χ captures a priori knowledge of the solutions.
Remark 3 (nonconvexity feasibility) All the problem models appearing in this paper are nonconvex due to the nonconvexity of the intensity constraints Ω d defined in (5).
When addressing the phase retrieval problem with noise and model deviation, an appropriate averaging process is essential for suppressing noise. For this, we consider the following feasibility model in the product space: where The equivalence between (6) and (7) in the general setting of set feasibility finds its root in [46]. Without a priori constraint, i.e., χ = C n , the set D is the (n-dimensional subspace) diagonal of the product space C nm . The counterpart of (7) in the Fourier domain is as follows: where A := M (χ) and B := {y ∈ C N | |y| 2 = r}.
The 2-set feasibility models (7) and (9) allow us to adapt various algorithmic schemes including flexible relaxation and regularization for the phase retrieval problem.
The relationships between models (6), (7) and (9) in the noiseless setting are as follows.
Proposition 1 (equivalences of feasibility models) Letx ∈ C n andŷ = Mx. The following statements are equivalent: is a solution to (7); (iii)ŷ is a solution to (9).
Proof The equivalence between (i) and (ii) is widely known [46], while the equivalence between (i) and (iii) follows from the unitarity property of the matrix M given in (3), that is, M * M = I n .
Remark 4 (inconsistent feasibility) In practical circumstances, for example, due to the presence of noise and model deviation, the intersection in (6), (7) and (9) is likely to be empty. There are natural interpretations of inconsistent feasibility in terms of minimization involving indicator and distance functions. For example, let us interpret the AP method for solving the (possibly inconsistent) feasibility (9) in terms of classical algorithms for minimization. The worrisome issue regarding the emptiness of the intersection would be eased when one associates (9) with the following minimization problem: In view of Proposition 2 (which is proved later in Section 4.1), the set A defined in (10) can be assumed to be convex, and hence the objective function f in (11) is differentiable with the gradient given by ∇f (y) = y − P A (y) for every point y [48]. Then, alternating projection for solving (9) is precisely the projected gradient method for solving (11).

Projectors
The decisive step of solving the feasibility problem (9) by projection algorithms is to calculate the two projectors on the sets A and B defined in (10). Since B is geometrically the product of a number of circles of the complex number plane, an explicit form of the projector P B , which is in general a set-valued mapping, is available [3,39]: with the convention that yi |yi| = S whenever y i = 0, where S denotes the complex unit circle 6 . In numerical computation, the (single-valued) selection of P B corresponding yi |yi| = 1 whenever y i = 0 is sufficient.
Remark 5 (projector on regularized sets) In view of Remark 4, the set B can have no common point with the set A. For ways of handling such a feasibility gap, one can think of regularizing or approximating the set B. For example, Luke [37] proposed to enlarge the set B to where ε ≥ 0 can be viewed as the radius of enlargement, dist φ is the Bregman distance, associated with a strictly convex function φ : R N → (−∞, ∞] which is differentiable on the interior of its domain, given by The function φ should be chosen in accordance with the statistical model of the noise w in (4). More specifically, let us consider the Gaussian and Poisson models of noise, which are perhaps the most relevant to phase retrieval. The Bregman distance associated with the half energy kernel operator φ = 1 2 · 2 is simply the Euclidean norm, and it is appropriate for Gaussian noise. Let us define the function φ : The Bregman distance associated with the function φ given by (13) is the Kullback-Leibler divergence, and it is appropriate for Poisson noise. The projector on the regularized set B ε can be viewed as an approximation of the projector on B, and hence it can be used in the framework of projection methods. The cyclic projection algorithm using approximate projectors of this type has been analyzed by Luke [37], and in fact his idea can also be extended to other projection methods. However, since the projector on a regularized set is often much more complicated to be computed than the one on the original set, we can instead treat the latter one as an approximation of the former one [40, page 22]. This insight about approximate projectors for inconsistent feasibility allows us to simply use the formula (12) for both analytical and numerical purposes without any worrisome issue.
The projector on the set A can also be explicitly described 7 . We make use of the following notation: Lemma 1 For the propagation matrix M given in (3), it holds that 7 Note that convexity of A is not required in Lemma 1.
Proof Let us first define the unitary matrix based on the matrix M as follows: This block diagonal matrix is unitary since all of its constituent blocks are so. By the structure of M and U , we have that Since U is unitary, it holds that Since [C n ] m is a subspace containing 1 √ m [χ] m , by the properties of the metric projection, we have that We next calculate U * y. Note that U * is also a block diagonal matrix whose blocks are the conjugate transpose of the corresponding blocks of U . Let us denote c k the column vector whose entries taken from y correspond to the block Since [C n ] m is the n-dimensional diagonal of the product space C nm , we obtain by solving the minimizing problem (1) that Plugging (18) and (17) into (16) yields that The proof is complete.
The formula (14) shows that the complexity of P A heavily depends on that of P χ .

Projection algorithms
Projection algorithms for phase retrieval can be considered as descendants of the well known Gerchberg-Saxton (GS) algorithm [17] which deals with phase retrieval given the amplitude constraint and a single PSF image. Their introduction has been motivated by the rapidly growing application of phase retrieval originated from a wide variety of physical settings. For example, the famous input-output, output-output and hybrid-input-output algorithms [15] arose up when dealing with the support and the real and nonnegative constraints instead of the amplitude constraint as the GS method. Extensions for solving problems given multiple images and for obtaining better restoration have been among the main objectives of this class of phase retrieval algorithms. In light of the groundwork [3], in Section 2.2 we have interpreted the phase retrieval problem (4) as a feasibility problem in one of the equivalent forms (6), (7) and (9). Having calculated the projectors P A and P B in Section 2.3, we are now ready to discuss algorithmic schemes for the solutions. From now on, we analyze the feasibility model (9).
AP and DR are perhaps the most widely known solution methods for feasibility and have been the basis for a wide variety of modification and regularization schemes. We refer the reader to, for example, [4,29] for an overview of these basic methods in the setting of set feasibility. For an early discussion in the context of phase retrieval, we refer the reader to the surveys [3,39]. It has been observed that AP is stable, always convergent and to some extent able to suppress noise, but the convergence speed can be very slow [15]. In contrast, DR can be fast in convergence, but sensitive to noise and model deviation [36]. Indeed, only relaxations of DR can be used for problems in the presence of noise and model mismatch.
The use of the Krasnoselski-Mann relaxation is perhaps the most widely known. Mathematically, it is the convex combination of the DR operator T DR := 1 2 (R A R B + Id) and the identity mapping: where β ∈ (0, 1] is the relaxation parameter. The Fienup's hybrid-input-output (HIO) method [15] can be viewed as a relaxation of DR: where β ∈ (0, 1] is the relaxation parameter. Another relaxation of DR known as the relaxed averaged alternating reflections (RAAR) algorithm was proposed and analyzed in [35,36] for phase retrieval. It is the convex combination of the DR operator and one of the projectors: where β ∈ (0, 1] is the relaxation parameter. Inexact versions of RAAR were also proposed and analyzed in [36]. The DRAP algorithm [52] is another relaxation of DR: where λ ∈ [0, 1] is the relaxation parameter 8 . Interestingly, in the phase retrieval setting (9), T DRAP coincides with the convex combination of the AP and DR operators provided that χ is an affine set. The latter condition implies that the set A = M (χ) given by (10) is affine. Hence, the projector P A is linear and we obtain the the following expression: where T AP := P A P B is the AP operator.

Remark 6
The two expressions (19) and (20) play their own role in explaining interesting features of DRAP 9 . On the one hand, only two projections are required for computing an iteration of (19) (P B once and P A once) compared to three projections for (20) (P B once and P A twice). This means that the computational complexity of DRAP is at the same level as that of the other projection methods if (19) is used in numerical implementation. On the other hand, the expression (20) as a convex combination of T AP and T DR explains better the idea leading to the introduction of DRAP as a relaxation of DR compared to the less intuitive form (19).
Plugging the two projectors (12) and (14) into (19), we come up with the following explicit form of DRAP for addressing the feasibility problem (9): where y and y + stand for the two consecutive iterations of DRAP. In the case χ = C n , (21) further reduces to In the remainder of this paper, we analyze the DRAP algorithm in the phase retrieval setting (9) and demonstrate its advantages over the other algorithms.

Convergence analysis
In this section, we study convergence properties of DRAP using two different analysis schemes. Since the problem (9) is nonconvex, we can only obtain local convergence criteria though it is observed from numerical results that the quality of phase retrieval is not affected by the starting point for the algorithm.

A result from spectral analysis
The analysis in this section is based on the observation that the projector P A given by (14) is linear, and the projector P B given by (12) also has a good first order approximation around any solution of (9). We follow the analysis approach initiated by Chen and Fannjiang [8] where they established a local linear convergence result for the DR algorithm. The mentioned result of [8] was later extended for the RAAR algorithm in [34]. We will show that DRAP also enjoys that kind of convergence result 10 . In this section, we assume that the lowest intensity of the images is strictly positive: Remark 7 When the phase diversities φ d are assumed to be continuous random variables, condition (23) is satisfied almost surely [8].
We first analyze DRAP in the form (22) for solving (9) with χ = C n . Let us denote whereŷ is a solution to (9) and diag(·) denotes the diagonal matrix with elements on its diagonal taken from the vector in the brackets. Since r = |ŷ| 2 vanishes nowhere 11 by (23), for all y sufficiently close toŷ, |y| also vanishes nowhere. In particular, for a fixed vector v ∈ C N , the vector |ŷ + εv| vanishes nowhere provided that ε is sufficiently small. The next lemma establishes the first order approximation of T DRAP as a complex vector valued function around y in a given direction.
In view of (22), we have that Then The following formula for the first order approximation of (Y ε − Y )r can be calculated directly: Substituting (26) into (25) yields The proof is complete.
The next step is to analyze the spectrum of the real decomposition of the complex matrix L as follows: Note that L is isometric since L is so. Define also the real decomposition of a complex vector by Let 1 ≥ σ 1 ≥ σ 2 · · · ≥ σ 2n ≥ σ 2n+1 = · · · = σ N = 0 be the singular values of L with the corresponding right singular vectors v k ∈ R 2n : k = 1, . . . , 2n and the left singular vectors u k ∈ R N : k = 1, . . . , N . We have by the definition of the singular value decomposition (SVD) that The next technical result regarding the spectrum of L is crucial.
Thanks to Lemma 3 and the definition of the SVD, one has the following expression of the second largest singular value of L: The following theorem establishes linear convergence of the DRAP algorithm for solving (9). Since phase retrieval is ambiguous (at least) up to a global phase shift 12 , the following distance between two complex vectors is of interest: Theorem 1 (linear convergence of DRAP) In the setting of (9) with χ = C n , suppose that Let y (k+1) ∈ T DRAP y (k) be a sequence generated by T DRAP in the form of (22) with where x (k) := M * y (k) (k = 1, 2, . . .).
Proof First, the optimal global phase shift defined by (28) is given by [34]: Let us denote η (k) := Y * (α (k) y (k) −ŷ). Thanks to Lemma 2, we have that . 12 That is the first element of the orthogonal basis of Zernike polynomials.
Multiplying both sides of the above equality by L * = M * Y and taking the isometry property of L into account, we obtain that Due to (30) and the fact that |ŷ|, j|ŷ| = 0 we have In other words, η (k) ⊥ j|ŷ|. By basic properties of the Hermitian inner product, one has Re η (k) ⊥ j|ŷ|. As a result, Im(η (k) ) ⊥ |ŷ|. Taking Lemma 3 into account, we have just shown that Im(η (k) ) is orthogonal to u 1 = |ŷ| which is the first left singular vector of L. This together with the expression (27) of σ 2 implies that Combining (28), (31) and (32) yields that Since σ 2 < 1 by assumption (29), there exists a number c ∈ (σ 2 , 1) such that for all η (k) with η (k) sufficiently small, it holds that Combining (33), (34) and the definition of η (k) yields The proof is complete.
Remark 9 (region of convergence) Since the algorithm operates in the underlying space C N , for the sake of brevity, let us speak of region aroundŷ = M (x) instead ofx. In view of Theorem 1, such a convergence region, if exists, is mutually dependent on the constant c. More specifically, given a number c ∈ (σ 2 , 1), it is the region in which the first order approximation (24) of T DRAP aroundŷ is valid and condition (34) is satisfied for all k ∈ N. Note that the latter involves not only σ 2 and c but also the sequence y (k) itself. The intersection of the regions over all possible sequences complied with T DRAP can be taken as the region of convergence. Obviously, such a statement is not informative and hence it has not ever been an objective of local convergence analysis.
Remark 10 (influence of λ on convergence) In view of Theorem 1, the relaxation parameter λ obviously has influence on the region in which the first order approximation of T DRAP (Lemma 2) is valid and condition (34) is satisfied, however, its influences on the convergence speed of DRAP is unclear 13 .
We have analyzed the DRAP algorithm in the phase retrieval setting (9) with χ = C n . The latter condition limits the effectiveness of Theorem 1 to phase retrieval without a priori constraint. In the remainder of this section, we will show that the convergence criterion can also be applicable to phase retrieval problems with an amplitude constraint, which is a helpful prior information and often available in practice 14 .
The amplitude constraint is described by where a ∈ R n + is the known amplitude of the complex signal. The next result shows that the problem (9) with an amplitude constraint can equivalently be reformulated as a problem without a priori constraint in a higher dimensional space.
Proposition 2 The problem (9) with the amplitude constraint (35) can equivalently be reformulated as: where A := M(C n ), B := {y ∈ C N +n | |y| 2 = r} ⊂ C N +n with 13 Of course, the influence is clearly observed from numerical computation. 14 This is because the light distribution in the pupil plane is often known, for example, it can be uniform or truncated Gaussian.
Proof For convenience, let us recall that N = nm according to (3). We first observe that M is isometric if and only if M is isometric since Letŷ ∈ C N be a solution to (9). That is,ŷ = Mx with |x| = a and |ŷ| 2 = r.
The proof is complete.
Remark 11 Proposition 2 shows that the convergence criterion formulated in Theorem 1 is indeed applicable to not only phase retrieval problems without a priori constraint but also those involving an amplitude constraint. Compared to the earlier convergence results for the DR algorithm in [8,Theorem 5.1] and the RAAR algorithm [34,Theorem 3], this new observation widens the applicable scope of this type of convergence results.

A result from variational analysis
We recall a number of mathematical notions needed for formulating a local linear convergence criterion for the DRAP algorithm using the analysis scheme of [42] and discuss the validity of the imposed assumptions in the setting of phase retrieval.
Definition 1 (prox-regularity of sets) [48] A set Ω is called prox-regular at a pointŷ ∈ Ω if the projector P Ω is single-valued aroundŷ.
Prominent example of prox-regularity is that a closed and convex set is prox-regular at every of its points. In particular, the set A = M (χ) in (10) has this property whenever χ is convex. The next statement finds its root in the original work [36, Section 3.1].
Lemma 4 (prox-regularity of B) [53, Lemma 6.2(i)] The set B defined in (10) is prox-regular at every of its points.
Definition 2 (pointwise almost averaged operators) [42, Definition 2.2 and Proposition 2.1] A (not necessarily nonexpansive) fixed point operator T : H ⇒ H is called pointwise almost averaged on a set Ω at a point y ∈ Ω with violation ε and averaging constant α > 0 if for all z ∈ U , z + ∈ T (z) and y + ∈ T (y), T is called almost averaged on Ω with violation ε and averaging constant α if it is pointwise almost averaged on Ω at every point y ∈ Ω with that violation and averaging constant. When the violation ε is zero, the quantifier 'almost' is dropped.
For the meaning of the quantifiers 'pointwise' and 'almost' appearing in Definition 2 as well as the motivation of the property, we refer the reader to the original work on pointwise almost averaged operators [42]. The following statement claims this property for T DRAP as a fixed point operator.
Proof Let ε > 0 be a positive number, which can be arbitrarily small. Since the set B is prox-regular atŷ by Lemma 4, thanks to [24,Theorem 2.14] there exists a neighborhood ofŷ on which P B is almost averaged with violation ε and averaging constant 1/2. Also, P A is averaged since A is convex. The statement then follows from [52,Proposition 2].
Definition 3 (metric subregularity) [14] A set-valued mapping Ψ : H ⇒ H is called metrically subregular atŷ ∈ H forẑ ∈ Ψ (ŷ) if there exist numbers δ > 0 and κ > 0 such that Metric subregularity is one of the cornerstones of variational analysis and optimization theory with many important applications, particularly as constraint qualifications for establishing calculus rules for generalized subdifferentials and coderivatives [44,49] and for analyzing stability and convergence of numerical algorithms [14,27,42].
We are now ready to formulate another local linear convergence criterion for DRAP in the setting of phase retrieval.
Theorem 2 (linear convergence of DRAP) Letŷ ∈ C N be a solution to (9) and suppose that the set-valued mapping Ψ := T DRAP − Id is metrically subregular atŷ for 0. Then every sequence generated by T DRAP converges linearly to a fixed point of T DRAP provided that the initial point is sufficiently close toŷ.
Compared to [52,Theorem 2], Theorem 2 additionally takes the proxregularity of the sets A and B into account. The proof is omitted for brevity.
Remark 12 (necessity of metric subregularity) There are two types of regularity conditions often required to obtain a convergence result in the nonconvex optimization literature. The geometry of the phase retrieval problem yields one type of regularity, that is, the prox-regularity of the sets. The second type of regularity, termed as metric subregularity, is difficult to verify, but as been recently shown in [41] this condition is not only sufficient but also necessary for local linear convergence.
Remark 13 (analysis for inconsistent feasibility) To analyze convergence properties of DRAP in the more challenging setting of inconsistent feasibility (i.e., phase retrieval with noise and model deviation), more technical details are required. This task can be done by following the lines of [42], however, the technical assumption of metric subregularity again remain unverifiable in the setting of phase retrieval. Hence, we chose to formulate the result in the simpler consistent setting.
Our goals in the remainder of this section are 1) to link the abstract metric subregularity condition imposed in Theorem 2 to the physical figures of phase retrieval, and 2) to connect the two convergence criteria formulated in Theorems 1 & 2 by showing that their key assumptions to some extent can be traced back to a common condition on the phase diversities which are the almost only adjustable figures of phase retrieval.
The subsequent analysis is valid only for the phase retrieval setting with two images, that is, we consider m = 2 in (3). We first recall the concept of transversality.
Definition 4 (transversality) [9, page 99] A pair of sets {A, B} is transversal at a pointŷ in their intersection if The origin of this concept can be traced back to at least the 19th century in differential geometry which deals with smooth manifolds [20]. We refer the reader to, for example, [28][29][30][31] for various characterizations of transversality and its application in feasibility problem. The following result shows that the metric subregularity condition in Theorem 2 can be deduced from the transversality property.
Proposition 3 (transversality implies metric subregularity) Letŷ ∈ C N be a solution to (9) and suppose that the pair of sets {A, B} defined in (10) is transversal atŷ. Then, the set-valued mapping Ψ := T DRAP − Id is metrically subregular atŷ for 0.
This yields metric subregularity of Ψ atŷ for 0 as claimed.

Remark 14
The restriction m = 2 involves in Proposition 3 only in an implicit manner. A further analysis 15 reveals that m = 2 is a necessary condition for having the transversality assumption fulfilled.
Proposition 4 Letŷ = Mx ∈ C N be a solution to (9). Then, the pair of sets {A, B} defined in (10)  In view of Propositions 3 & 4, the metric subregularity condition in Theorem 2 is guaranteed by the transversality of {Ω 1 , Ω 2 } atx. In view of (5), the latter sets are tied to the phase diversities {φ 1 , φ 2 } which are represented as unitary matrices {D 1 , D 2 } in (5). Unfortunately, the question of choosing {φ 1 , φ 2 } such that {Ω 1 , Ω 2 } satisfies the transversality atx or some weaker property but sufficient for the metric subregularity condition in Theorem 2 is not trivial and open.
We conclude this section with an overall remark on the obtained convergence results.

Remark 15
Convergence criteria for the DRAP algorithm in Theorems 1 & 2 are derived from two different analysis approaches 16 , however, their key assumptions to some extent can be related to the choice of phase diversities.

Numerical simulation
We simulate an imaging system with physical parameters as summarized in Table 1. The simulation phase Φ is shown in Figure 1 (right). For the forward model, the amplitude χ is constant over the pixels in the aperture 17 . Five PSF images corresponding to the five phase diversities φ d = π · z d · Z 0 2 , (z d = −2, −1, 0, 1, 2) are respectively calculated by p d = F χ · e j(Φ+φ d ) 2 , (d = 1, . . . , 5).
We consider a practically relevant case where the images are corrupted with both Poisson and Gaussian noise. After normalizing the five images generated by (39) such that their highest intensities are unity, the normalized images are corrupted with Poisson noise using the Matlab function imnoise. Then, after scaling these noisy images to have the highest intensities of the original ones, we introduce additive white Gaussian noise with signal-to-noise ratio (SNR) 10 dB (decibel) using the Matlab function awgn. The input data images, which are denoted by r d (d = 1, 2, . . . , 5), are finally obtained by replacing all the negative entries of the corrupted images by zeros.  We formulate the phase retrieval problem in the form (9) and restore the phase Φ given the five noisy PSF images r d and the physical parameters specified above using DRAP. The quality of phase retrieval is measured by the relative root mean square (RMS) error of the estimated phase relative to the true phase. The phase retrieved using 100 iterations of DRAP and 20 itera-tions of AP 18 is shown in Figure 1 (left). The restored phase is then smoothed using the first 37 Zernike polynomials (in Fringe order convention) and the obtained phase Φ is shown in middle figure compared to the data Φ on the right. The relative RMS error is Φ − Φ / Φ = 0.1108. Since phase retrieval is ambiguous up to (at least) a piston term (global phase shift), the piston terms of the phases are removed before calculating the norms.
We compare the performance of DRAP with the other projection algorithms for solving (9) including AP, KMDR, HIO and RAAR. The overall results are summarized in Figure 2. For brevity, we do not show the results for KMDR and HIO since their performance is far worse than that of RAAR and DRAP in both accuracy and stability. As shown in Figure 2, phase retrieval by RAAR and DRAP is at almost the same level of accuracy as well as convergence speed, however, DRAP (the blue curve) is more stable than RAAR (the red curve). The relaxation parameters used for RAAR and DRAP are 0.8 and 0.45, respectively 19 . For completeness, AP (the black curve) is more stable than the other algorithms as expected, however, it is incomparable to RAAR and DRAP in both accuracy and convergence speed.