Tensor-Free Proximal Methods for Lifted Bilinear/Quadratic Inverse Problems with Applications to Phase Retrieval

We propose and study a class of novel algorithms that aim at solving bilinear and quadratic inverse problems. Using a convex relaxation based on tensorial lifting, and applying first-order proximal algorithms, these problems could be solved numerically by singular value thresholding methods. However, a direct realization of these algorithms for, e.g., image recovery problems is often impracticable, since computations have to be performed on the tensor-product space, whose dimension is usually tremendous. To overcome this limitation, we derive tensor-free versions of common singular value thresholding methods by exploiting low-rank representations and incorporating an augmented Lanczos process. Using a novel reweighting technique, we further improve the convergence behavior and rank evolution of the iterative algorithms. Applying the method to the two-dimensional masked Fourier phase retrieval problem, we obtain an efficient recovery method. Moreover, the tensor-free algorithms are flexible enough to incorporate a-priori smoothness constraints that greatly improve the recovery results.


. Introduction
The theory of inverse problems is nowadays one of the main tools to deal with recovery problems in medicine, engineering, and life sciences. The real-world applications of this theory embrace for instance computed tomography, magnetic resonance imaging, and deconvolution problems in microscopy, see [BB , MS , Ram , SW , Uhl , Uhl ]. Besides these recent monographs, which are only a small selection, there exist many further publications about applications, regularization, and numerical solvers.
In this paper, we consider the subclass of bilinear and quadratic inverse problems [BB ]. Problem formulations of these kinds originate form real-world applications in imaging and physics [SGG + ] like blind deconvolution [BS , JR ], deautoconvolution [GH , GHB + , ABHS ], phase retrieval [DF , Mil , SSD + ], parallel imaging in MRI [BBM + ], and parameter identi cation in EIT [MS ]. Although these problems may be seen as non-linear inverse problems and can be solved by appropriate non-linear solvers, the question arises if we can exploit the speci c bilinear or quadratic structure of these problem. One of the most famous approaches of this kind is PhaseLift [CSV ], where the generic phase retrieval problem is lifted to a linear inverse problem with rank-one constraint using the universal property of the tensor product. After a relaxation the lifted problem is solved by semi-de nite programming. From the theoretical side, one has proved that this methodology yields the true solution with very high probability. Based on PhaseLift, we will derive convex lifting methods for general bilinear and quadratic inverse problems.
Similarly to PhaseLift, the relaxations here require the minimization of a nuclear norm functional on the tensor product, which can be done by applying proximal minimization methods, see for instance [CP ] and references therein. Since the dimension of the lifted problem can become huge, the question arises how one can implement these methods numerically. Therefore, we develop tensor-free, equivalent versions that can be performed in an e cient and memory-saving manner. The main bene t is here that the tensorfree reformulations of the required tensorial operations are performed exactly without additional error. To improve the convergence and solution behaviour, we additionally propose a novel reweighting technique The paper is organized as follows: In Section , we introduce the considered bilinear and quadratic inverse problems in more detail. The focus here lies on the bilinear setting since quadratic formulations are based on underlying bilinear structures. Based on the universal property of bilinear mappings and the nuclear norm heuristic, we then derive a relaxed convex minimization problem with linear lifted forward operator. To stabilize the lifted problem regarding noise and measurement errors, we additionally consider a T approach. In Section , we develop a proximal solver based on the rst-order primal-dual method of C and P [CP ] to solve the lifted problem numerically. The primal-dual iteration is here only one explicit example and can be replaced by any other proximal method. In so doing, we obtain a singular value thresholding depending on the actual H spaces building the domain of the original problem. Although the tensorial lifting allows us to apply linear methods, the dimension of the relaxed minimization problem becomes tremendous.
To overcome this issue, we derive a tensor-free representation of the suggested algorithm. The e cient computation of the required singular value thresholding is here ensured by exploiting an orthogonal power iteration or, alternatively, an augmented L process, see Section . Moreover, in Section , we introduce a novel H space reweighting to promote low-rank iterations and solutions. We complete the paper with a numerical study, where we consider the masked F phase retrieval, see Section . The contribution of the paper is twofold: Firstly, we derive a novel tensor-free proximal algorithm based on a convex lifting and relaxation. Secondly, we introduce a novel phase retrieval technique, which solves high-dimensional instances of the masked phase retrieval problem. Moreover, our approach allows us to incorporate smoothness constraints or relation between di erent pixels to improve the convergence of the algorithm.

. Convex li ings of bilinear and quadratic inverse problems
Bilinear and quadratic problem formulations arise in a wide range of applications in imaging and physics [SGG + ] like blind deconvolution [BS , JR ], deautoconvolution [GH , GHB + , ABHS ], phase retrieval [DF , Mil , SSD + ], parallel imaging in MRI [BBM + ], and parameter identi cation in EIT [MS ]. Since we are mainly interested in computing a numerical solution, we restrict ourselves to nite-dimensional bilinear problems of the form where B is a bilinear operator from H × H into K, and where H R N , H R N , and K R M are real nite-dimensional H spaces equipped with some inner product. Simultaneously, we study nite-dimensional quadratic problems of the form A quadratic operator Q : H → K is here the restriction of an associate bilinear operator B Q : H × H → K to its diagonal and is thus given by Q(u) B Q (u, u). Again, the H spaces H and K are nite dimensional. Without loss of generality, one may assume that the associated bilinear operator is symmetric, i.e. B Q (u, v) = B Q (v, u) for all u, v ∈ H, which can be enforced by considering / B Q (u, v) + / B Q (v, u). However, we will make no use of this restriction.
For the sake of simplicity, we use the following notation and write an arbitrary vector u ∈ H in the form u (u n ) N − n= ∈ R N , which may be interpreted as the coe cient vector with respect to some nite basis. The inner product of H can now be stated in the form ·, · H H ·, · = ·, H · , where H is some symmetric, positive de nite matrix in R N ×N , and where the inner products on the right-hand side denote the usual E ian inner product u, v v * u. Here · * labels the transposition of a vector or matrix. For the remaining spaces H and K, we proceed in the same manner with the associate matrices H ∈ R N ×N and K ∈ R M ×M , respectively.
Although we restrict ourselves to the real-valued setting, all following algorithms and statements remain valid for the complex-valued setting, where one considers sesquilinear mappings B : H × H → K with H C N , H C N , and K C M . Changing the notation to u (u n ) N − n= ∈ C N , replacing the property 'symmetric' by 'H ian,' and using the real part of the inner products, e.g. u, v = [v * u], where · * is the transposition and conjugation, all considerations translate one to one. In the complex case, the associate matrices H , H , and K may also be complex-valued, H ian, and positive de nite.
Inspired by PhaseLift [CESV , CSV ], which exploits the solution strategy developed for matrix completion problems [CCS , MGC ], we suggest to tackle the general bilinear and quadratic inverse problem (B) and (Q) by convex liftings and relaxations. Our approach is here based on the so-called universal property of the tensor product with respect to bilinear mappings, see for instance [Rya , Theorem . ]. In the nite-dimensional setting, the lifting may be stated as follows. T . (B , [R ]). For every bilinear mapping B : H × H → K, there exists a unique linear mappingB : In the nite-dimensional setting considered by us, the tensor product H ⊗ H can be simply identi ed with the matrix space R N ×N , where the rank-one tensor u ⊗ v corresponds to the matrix vu * = ( n u n ) N − , N − n = ,n = . The universal property is also applicable to quadratic mappings, where the tensor product H ⊗ H is restricted to the subspace of symmetric tensors H ⊗ sym H which is isomorphic to the space of symmetric matrices.

C
. (Q ). For every bounded quadratic mapping Q : H× → K, there exists a unique linear mappingQ : Proof. Since the liftingB of the associate bilinear mapping B is unique, the restriction Q B | H⊗ sym H yields a unique linear mapping with the asserted properties.
Due to the uniqueness of the lifting, the bilinear inverse problem (B) and the quadratic inverse problem (Q) are equivalent to the linear inverse problems where the additional constraint w means that w is positive semi-de nite. Although the positive semi-de niteness of the lifted quadratic problem is not mandatory, it strongly reduces the space of possible solutions. The central bene t of these reformulations is the shift of the non-linearity of the forward operator into the rank-one constraint. Although the problem is now linear, we have to deal with an additional non-convex side condition.
In order to eliminate the non-linear constraint for bilinear operators, we rst rewrite (B) into the rank minimization problem minimize rank(w) withB(w) = † T / and then relax the non-convex objective function by replacing it with the nuclear or projective norm || · || H ⊗ π H of the tensor w. Depending on the H spaces H and H , this norm is de ned by where the in mum is taken over all nite representations of the tensor w. In so doing, we nally obtain the convex minimization problem ). For w ∈ H ⊗ H , the projective norm is given by ||w || H ⊗ π H = R− n= σ n , where σ n denotes the n-th singular value and R the rank of w.
The main idea behind the nuclear norm heuristic is that the projective norm is the convex envelope of the rank on the unit ball with respect to the spectral norm. Usually, the nuclear norm heuristic empirically yields low-rank solutions of the matrix equation B(w) = † , see for example [CCS , MGC , RFP ] and references therein. If the lifted operatorB ful lls an appropriate restricted isometry property, one can rigorously prove that the bilinear inverse problem (B) and the relaxed nuclear norm minimization problem (B ) are equivalent, see [RFP , Theorem . ].
For quadratic inverse problems, we follow a similar approach. This means that we rst rewrite (Q) into a rank minimization problem over the positive semi-de nite (symmetric) matrix cone. In order to obtain a convex minimization problem, we again replace the objective function by the projective norm. Since the singular values σ n of a symmetric tensor w coincide with the absolute value of its eigenvalues λ n , the positive semi-de niteness may be incorporated into the projective norm by where the indicator function χ [ ,∞) is equal to for arguments in [ , ∞) and +∞ otherwise; so the modi ed projective "norm" || · || + H⊗ π H sums up the non-negative eigenvalues for positive semi-de nite tensors and is in nity otherwise. To solve the quadratic inverse problem (Q) numerically, we thus consider the convex minimization problem Up to this point, the given data † have been known exactly. A rst approach to incorporate noisy measurements into the inverse problems (B) and (Q) could be to extend the subspace of exact solutions with respect to a supposed error level. More precisely, one may consider the minimization problems and minimize ||w || + In other words, we minimize over all solutions that approximate the given noisy data ϵ with || − ϵ || K ≤ ϵ up to the error level ϵ.
Another approach is to incorporate the data delity of the possible solutions directly into the objective function. Following this approach, we may minimize a T functional with projective norm regularization to solve (B) and (Q). In more detail, we consider the problems All proposed convex relaxations of the lifted problems (B) and (Q) have in common that the minimization of the projective norm usually promotes low-rank tensors and, in the best case, yields a rank-one solutions. The later case forms our basic premise to derive numerical solutions.

A
. (B ). Suppose that the solutions w of the relaxation (B ), (B ϵ ), or (B α ) are at most rank-one tensors u ⊗ v such that (u, v) (approximately) solves the bilinear inverse problem (B). Likewise, suppose that the solutions w of the relaxations (Q ), (Q ϵ ), or (Q α ) have at most rank one so that they (approximately) solve the quadratic inverse problem (Q).

. Proximal algorithms for the li ed problem
To exploit the nuclear norm heuristic, we have to solve the minimization problem (B ), (B ϵ ), (B α ), (Q ), (Q ϵ ), and (Q α ) in an e cient manner. Looking back at the comprehensive literature about matrix completion [CR , CCS , MGC ], low-rank solutions of matrix equations [RFP ], and PhaseLift [CESV , CSV ], there exists several numerical methods like interior-point methods for semi-de nite programming, xed point iterations, singular value thresholding algorithm, projected subgradient methods, and low-rank parametrization approaches.
In order to solve the lifted bilinear and quadratic inverse problems, we follow another approach. Let us rst consider the actual structure of the six derived minimization problems in Section , which is given by where A : H ⊗ H → K denotes the lifted bilinear or quadratic forward operator, +∞} describes the data delity, and G : H × H → R is the (modi ed) projective norm. Since the regularization mapping G and, in some circumstances, the data delity mapping F are non-smooth but convex functions, we may apply proximal rst-order methods like the forward-backward splitting, the primal-dual method by C -P , the alternating directions method of multipliers (ADMM), the D -R splitting, and several variants of these and other algorithms, see for instance [CP ].
Although we can apply any of these algorithms, we exemplarily consider the forwardbackward splitting [LM , CW ] with xed parameters τ , σ > and θ ∈ [ , ]. The details of these methods are given below.
First, the forward-backward splitting and the primal-dual method are originally de ned for linear forward operators between nite-dimensional H spaces; so we have to equip the tensor product H ⊗ H with a corresponding structure. In the following, we always assume that this structure arises from the inner product de ned by where the inner products on the right-hand side denote the H -S inner product for matrices. For the quadratic setting, the symmetric H ian tensor product H ⊗ sym H is the related subspace embracing the symmetric tensors.
Next, the above stated methods are mainly based on concepts from convex analysis, which is re ected in the presuppositions; so the data delity mapping F and, similarly, the regularization mapping G have to be convex and lower semi-continuous. Commonly, a function f : X → R on the real H space X is called convex when for all x , x ∈ X and all t ∈ [ , ], and lower semi-continuous when for all sequences (x n ) in X with x n → x. Since F and G in the relaxed minimization problems of Section represent norms or indicator functions on closed convex sets, here the assumptions for the primal-dual method are always ful lled. The forward-backward splitting additionally requires a di erentiable data delity F with L -continuous derivative; so this method can only be applied to the T relaxations.
For the primal-dual iteration ( ), the rst proximity operator prox σ F * is computed with respect to the L -F conjugate F * . For any function f : X → R, where X denotes an arbitrary H space, the L -F conjugate f * : X → R is de ned by f * (x ) sup x ∈X x , and is always convex and lower semi-continuous, see [Roc ]. If the function f : X → R is lower semi-continuous and convex, the subdi erential ∂f at a certain point x ∈ X is given by and guratively consists of all linear minorants, see [Roc ]. Finally, the proximation prox f of a lower semi-continuous, convex function f : X → R is de ned as the unique minimizer Using the subdi erential calculus, one can show that the proximation coincides with the T / resolvent, i.e. prox f (x) = (I + ∂f ) − (x), see for instance [Roc , CP ]. The most crucial step in the forward-backward splitting ( ) and the primal-dual iteration ( ) is the application of the proximal projective norm prox τ | | · | | H ⊗π H , whereas the computation of proximal conjugated data delity prox σ F * is usually much simpler. To determine the proximal projective norm explicitly, we exploit the singular value decomposition of the argument with respect to the underlying H spaces H and H , which can be derived by an adaption of the classical singular value decomposition for matrices with respect to the E ian inner product.
L . (S ). Let w be a tensor in H ⊗ H . The singular value decomposition of w with respect to H and H is given by Allowing also non-symmetric but invertible factorizations, the root H / and H / are here non-unique. Possible candidates are the symmetric positive de nite square root or the C decomposition of H and H . In the following, the roots are solely required to derive the proximal projective norm mathematically. Their actual computation is not necessary in the nal tensor-free algorithm.
Proof of Lemma . . By assumption the possibly non-symmetric square roots H / and H / are invertible. Considering the classical E ian singular value decomposition of the matrix H / w (H / ) * , we immediately obtain The last arrangement may be easily validated by using the matrix notation v n u * n of the rank-one tensor u n ⊗ v n . Due to the identity u n , u m H = u n , (H − / ) * H H − / u m = u n , u m for all n, m ∈ { , . . . , R − }, the singular vectors { u n : n = , . . . , R − } form an orthonormal system with respect to H as well as their counterparts { v n : n = , . . . , R− } with respect to H .
Remark . . The singular value decomposition can also be interpreted as a matrix factorization of the tensor w. In this case, we have the factorization For the quadratic setting, we will rely on the eigenvalue decomposition instead of the singular value decomposition.

C
. (E ). Let w be a tensor in H ⊗ sym H. The eigenvalue decomposition of w with respect to H is given by where R− n= λ n (u n ⊗ v n ) is the classical eigenvalue decomposition of H / w (H / ) * with respect to the E ian inner product.
Proof. Due to the close relation between singular value decomposition and eigenvalue decomposition for symmetric matrices, the assertion follows from Lemma . with λ n σ n u n , v n H .
With the adaption in Lemma . , we can apply any numerical singular value method to compute the singular value decomposition of a given tensor. The next ingredient is the subdi erential of the nuclear norm based on H and H with respect to H ⊗ H . In the following, the set-valued signum function sgn is de ned by L . (S ). Let w be a tensor in H ⊗ H . Then the subdi erential of the projective norm || · || H ⊗ π H at w with respect to H ⊗ H is given by is a valid singular value decomposition of w with respect to H and H .
Proof. The central idea to compute the subdi erential is to rely on the corresponding statement for the E ian setting in [Lew ]. More precisely, if H = R N and H = R N are equipped with the E ian inner product, then the subdi erential ∂ HS with respect to the H -S inner product on R N ⊗ R N is given by Corollary . ]. The upper bound R is here some number less than or equal to min{N , N }, and the singular value decomposition of w may contain zero as singular value.
Next, we adapt this result to our speci c case. Therefore, we exploit that the projective norm is the sum of the singular values. Using Lemma . , we notice that the generalized projective norm of a tensor w is given by where the norm on the right-hand side is the usual projective norm with respect to the E ian inner product. In order to consider the inner product of H ⊗ H in the subdi erential, we exploit that where the inner product on the right-hand side is the usual H -S scalar product for matrices, see ( ). Thus, the subdi erential with respect to the H ⊗ H scalar product can be expressed in terms of ∂ HS by ( ) Plugging ( ) into ( ), and using the chain rule, we obtain the assertion.
With the characterization of the subdi erential, we are ready to determine the proximity operator for the projective norm with respect to the underlying H spaces H and H . In the following, the soft-thresholding operator with respect to the level τ is de ned by otherwise.
T . (P ). Let w be a tensor in H ⊗ H . The proximation of the projective norm is given by is a singular value decomposition of w with respect to the H spaces H and H .
Proof. In order to establish the statement, we only have to convince ourselves that w Sincew is already represented by its singular value decomposition, Lemma . implies If σ n > τ , the related summand becomes σ n ( u n ⊗ v n ). Otherwise, the summand is Since the singular value decomposition of w obviously has this form, the proof is completed.
Remark . (Singular value thresholding). The proximation of the projective norm with respect to H and H is simply a soft thresholding of the corresponding singular values. In the following, we denote the matrix-valued operation The proximation prox τ | | · | | + H⊗π H associated to the modi ed projective norm used in the quadratic setting may be computed analogously. As preliminary step, we determine the subdi erential of the modi ed norm with respect to the symmetric H ian tensor product. In a nutshell, we have simply to replace the set-valued signum function by the modi ed signum sgn + de ned by Since the truncated absolute value | · | + χ [ ,∞) is not an absolutely symmetric function, one cannot apply the subdi erential characterization in [Lew ]; there we compute the subdi erential directly.

L . (S ).
Let w be a positive semi-de nite tensor in H ⊗ sym H. The subdi erential of the modi ed projective norm || · || + H⊗ π H at w with respect to H ⊗ sym H is given by is an eigenvalue decomposition of w with respect to H. If w is not positive semi-de nite, then the subdi erential is empty.
Proof. Similarly to Lemma . , we rely on the corresponding statement for the E ian setting in [Lew ]. If we endow H R N with the E ian inner product, then the subdi erential of the modi ed projective norm is determined by where w = R− n= λ n (u n ⊗u n ) is a eigenvalue decomposition of w, see [Lew , Theorem ]. The transformation to an arbitrary H space H works exactly as in the proof of Lemma . .
Together with the positive soft-thresholding operator S + τ de ned by the computed subdi erential leads us to the following proximity operator.
T . (P ). Let w be a tensor in H ⊗ H. Then the proximation of the modi ed projective norm is given by where R− n= λ n ( u n ⊗ u n ) is an eigenvalue decomposition of w with respect to H.
Proof. The assertion follows similarly to Theorem . by replacing the subdi erential in Lemma . by Lemma . and the singular value decomposition by an eigenvalue decomposition.
Remark . . Analogously to the tensor-valued singular value thresholding operator S τ , we de ne the positive eigenvalue thresholding operator S + τ , which additionally projects the argument to the positive semi-de nite cone, by Knowing the proximation of the (modi ed) projective norm, we are now able to perform proximal algorithms to solve the minimization problem in Section . Although we can use any of the mentioned method, here we restrict ourselves the primal-dual iteration ( ). First, we consider the bilinear minimization problems (B ), (B ϵ ), and (B α ).
For exactly given data † corresponding to the minimization problem (B ), the data delity functional corresponds to F : . A simple computation shows that the conjugate F * is given by F * ( ) , † K and the associate proximal mapping by Thus, we obtain the following algorithm.
(ii) Iteration: For n > , update w (n) ,w (n) , and (n) by Remark . . If the projective norm in (B ) is weighed with a parameter α > in order to control the in uence of the data delity and the regularization, cf. (B α ), then the iteration in Algorithm . changes slightly. More precisely, one has to replace S τ by S α τ .
For inexact data ϵ , we rst consider the T minimization (B α ), whose data delity corresponds to F ( ) / || − ϵ || K . Here the conjugate F * is given by Again a simple computation leads to the proximation which yields the following algorithm. (ii) Iteration: For n > , update w (n) ,w (n) , and (n) by Remark . . Since the data delity F for the T functional is di erentiable, one may here apply the forward-backward splitting as an alternative for the primal-dual iteration. In so doing, the whole iteration in Algorithm . .ii becomes Analogously, one can here apply FISTA [BT ] in order to improve the convergence.
If we incorporate the measurement errors by extending the solution space as in (B ϵ ), then the data delity is chosen by F ( ) Since the conjugation of the unit ball yields the corresponding norm, we obtain F * ( ) = ϵ || || K + , ϵ with subdi erential cf. [RW , Exercise . ]. Since the proximation is not as simple as in the previous cases, we give a more detailed computation.
. The proximation of F * is then given by Proof. The vector˘ is the resolvent which is an immediate consequence of ( ). Bringing σ ϵ to the left-hand side, we are looking for a˘ such that For || − σ ϵ || K ≤ σϵ, the last condition is ful lled for˘ = . Otherwise, it follows that˘ = γ ( − σ ϵ ) for some γ > . With the notation z − σ ϵ , the rst condition becomes Since ||z || K > σϵ, we obtain γ = − (σ ϵ ) /| | z | | K , and consequently, the assertion.
Remark . . The central part of the resolvent in Lemma . is given by the operator Pictorially, this operator may be interpreted as shrinkage or contraction around the origin.
After this small digression to compute the proximation of the conjugated data delity, the minimization problem (B ϵ ) may be solved by the following primal-dual iteration. (ii) Iteration: For n > , update w (n) ,w (n) , and (n) by The weighing between data delity and regularization in Remark . analogously holds for Algorithm . .
The central di erences between the primal-dual iterations in Algorithm . , . , and . for the minimization problems (B ), (B α ), and (B ϵ ) are contained in the dual update of (n+ ) . If the parameter σ is chosen close to zero, the three iterations nearly coincide. Thus, all three iterations should yield similar results; so Algorithm . should also be able to deal with noisy measurements.
Remark . . For the relaxations (Q ), (Q α ), and (Q ϵ ) of the quadratic inverse problem (Q), we can derive analogous algorithms. We do not state these variants more closely since the only di erences are the application of the eigenvalue thresholding S + τ instead of the singular value thresholding S τ and the usage of the quadratic liftingQ instead of the bilinear liftingB. Up to these two small modi cations the methods completely coincide with the derived algorithms for bilinear inverse problems.
. Tensor-free singular value thresholding Each of the proposed methods solving bilinear or quadratic inverse problems is based on a singular value thresholding on the tensor product H ⊗ H or H ⊗ sym H respectively. If the dimension of the original space H ×H or H is already enormous, then the dimension of the tensor product literally explodes, which makes the computation of the required singular value decomposition impracticable. This di culty occurs for nearly all bilinear and quadratic image recovery problems. However, since the tensor w (n) is generated by a singular value thresholding, the iterations w (n) usually possesses a very low rank. Hence, the involved tensors can be stored in an e cient and storage-saving manner. In order to determine this low-rank representation, we only compute a partial singular value decomposition of the argument w of S τ by deriving iterative algorithms only requiring the left-and right-hand actions of w.
Our rst algorithm is based on the orthogonal iteration with R acceleration, see [Ste , GV ]. In order to compute the leading singular values, the main idea is here a joint power iteration over two -dimensional subspaces U n ⊂ H and V n ⊂ H alternately generated by U n w * H V n− and V n wH U n . These subspaces are represented by (i) Choose V ∈ R N × , whose columns are orthonormal with respect to H .
(ii) For n > , repeat: and set U n E n Z n and V n F n Y n .
until singular vectors have converged, which means where || · || L(H ,H ) denotes the operator norm estimated by σ (n) .
Here, reorthonormalization means that for each applicable m, the span of the rst m columns of the matrix and its reorthonormalization coincide, and that the reorthonormalized matrix has orthonormal columns. This can, for instance, be achieved by the well-known G -S procedure. Under mild conditions on the subspace associated to V , the matrices U n , V n , and Σ n diag(σ (n) , . . . , σ (n) − ) converge to leading singular vectors as well as to the leading singular values of a singular value decomposition w = R− n= σ n ( u n ⊗ v n ).

T . (S ).
If none of the basis vectors in V is orthogonal to the leading singular vectors v , . . . , v − , and if σ − > σ , then the singular values σ (n) ≥ · · · ≥ σ (n) − in Algorithm . converge to σ ≥ · · · ≥ σ − with a rate of Proof. By the construction in steps (a) and (b), the columns in E n and F n form orthonormal systems in H and H . In this proof, we denote the corresponding subspaces by E n and F n , which are related by E n = w * H F n− and F n = wH E n . Due to the basis transformation in (c), the columns of U n and V n also form orthonormal bases of E n and F n . Next, we exploit that the projection P n V n V * n H onto F n acts as identity on wH E n by construction. Since U n is a basis of E n , and since V * n H wH U n = Σ n by the singular value decomposition in step (c), we have and U n diagonalizes H w * H wH on the subspace E n .
Using the substitutions we notice that the iteration in Algorithm . is composed of two main steps. First, in (a) and (b), we compute an orthonormal basis E n of Secondly, ( ) implies that we determine an E ian eigenvalue decomposition on the subspace E n by E * n (H / w * (H / ) * )(H / w (H / ) * )E n = Z n Σ n Z * n and U n E n Z n . This two-step iteration exactly coincides with the orthogonal iteration with R acceleration for the matrix (H / w * (H / ) * )(H / w (H / ) * ), see [Ste , GV ]. Under the given assumptions, this iteration converges to the leading eigenvalues and eigenvectors with the asserted rates. In view of Lemma . , the columns in U n and V n together with Σ n converge to the leading components of the singular value decomposition of w with respect to H and H .
Considering the subspace iteration (Algorithm . ), notice that the algorithm does not need an explicit representation of its argument w but the left-and right-hand actions of w as a matrix-vector multiplication. We may thus use the subspace iteration to compute the singular value thresholding S τ (w) without a tensor representation of w.
(i) Apply Algorithm . with the following modifications: • If σ (n) m > τ for all ≤ m < , increase and extend V n by further orthonormal columns, unless = rank w, i.e., when the columns of E n would become linearly dependent. • Additionally, stop the subspace iterations when the first + singular values with < have converged and σ (n) + < τ . Otherwise, continue the iteration until all non-zero singular values converge and set = .
If the non-zero singular values of w are distinct, and if none of the columns in V n is orthogonal to the singular vectors with σ n > τ , then Algorithm . computes the low-rank representation of S τ (w).
Although Algorithm . for generic start values always yields the singular value thresholding, the convergence of the subspace iteration is rather slow. Therefore, we now derive an algorithm that is based on the L -based bidiagonalization method proposed by G and K in [GK ] and the R approximation in [GV ]. This method again only require the left-hand and right-hand action of w with respect to a given vector. For simplifying the following considerations, we initially present the employed L process with respect to the E ian singular value decomposition. The central idea is here to construct, for xed k, orthonormal matrices F k = [f , . . . , f k − ] ∈ R N ×k and E k = [e , . . . , e k− ] ∈ R N ×k such that the transformed matrix is bidiagonal, and then to compute the singular value decomposition of B k by determining orthogonal matrices Y k , Z k , and Σ k in R k×k such that we nally obtain a set of approximate right-hand and left-hand singular vectors, see [GK , BR , GV ]. The values β n and γ n of the bidiagonal matrix B k and the related vectors e n and f n can be determined by the following iterative procedure [GK ]: Choose an arbitrary unit vector p − ∈ R N with respect to the E ian norm, and compute For the rst iteration, we set γ − and f − . If γ m+ vanishes, then we stop the L process since we have found an invariant K subspace such that the computed singular values become exact.
In order to compute an approximate singular value decomposition with respect to the H spaces H and H , we exploit Lemma . and perform the L bidiagonalization regarding the transformed matrix H / w (H / ) * . Moreover, we incorporate the back transformation in Lemma . with the aid of the substitutions w * H f m+ − β m+ e m+ , and set γ m+ || p m+ ||.
(iii) Compute the E ian singular value decomposition of B k according to ( ), i.e.
Remark . . The bidiagonalization by G and K is based on a L -type process, which is numerically unstable in the computation of e n and f n . For this reason, we have to reorthogonalize all newly generated vectors e n and f n with the previously generated vectors, see [GK ]. This amounts to projecting e m+ to the orthogonal complement of the span of { e , . . . , e m } and the analog for f m+ , for instance, via the G -S procedure.
Remark . . The computation of the last p k − seems to be super uous since it is not needed for the determination of the matrix B k . On the other side, this vector represents the residuals of the approximate singular value decomposition. More precisely, we have Since the bidiagonalization method by G and K is based on the L process for symmetric matrices, one can apply the related convergence theory to show that the approximate singular values and singular vectors -for increasing k -converge to the wanted singular value decomposition of w, see [GV ]. Since we are only interested in the leading singular values and singular vectors, and since we want to choose the matrix B k as small as possible, this convergence theory does not apply to our setting.
In order to improve the quality of the approximate singular value decomposition computed by Algorithm . , we here use a restarting technique proposed by B and R [BR ]. The central idea is to adapt the L bidiagonalization such that the method can be restarted by a set of previously computed R vectors. For this purpose, B and R suggest a modi ed bidiagonalization of the form where the rst columns of the orthonormal matrices T / are prede ned by the R vectors of the previous iteration. For the computation of the rst < k leading singular values and singular vectors, we employ the following algorithm [BR ], which has been adapted to our setting by incorporating Lemma . and the substitution ( ).

A
.
(f) Calculate the remaining values of B k,n by applying step (ii) of Algorithm . with m = , . . . , k − . (g) Compute the E ian singular value decomposition of B k,n in ( ), i.e. B k,n = Y k,n Σ k,n Z * k,n , and set U k,n E k,n Z k,n and V k,n F k,n Y k,n .
Remark . . The stopping criterion in step (ii) originates from the error representation in ( ). For the operator norm || w || L(H ,H ) , one may use the maximal leading singular values of the former iterations, which usually gives a su ciently good approximation, see [BR ].
Although the numerical e ort of the restarted augmented L process is enormously reduced compared with the subspace iteration, we are unfortunately not aware of a convergence and error analysis for this speci c variant of L -type method. Nevertheless, we can employ the obtained partial singular value decomposition to determine the singular value thresholding.
(i) Apply Algorithm . with the following modifications: • If σ (n) m > τ for all ≤ m < , increase and k with < k, unless k = rank w, i.e., when γ (n) k in Algorithm . vanishes. • Additionally, stop the augmented L method when the first + singular values with < have converged and σ (n) + < τ . Otherwise, continue the iteration until all non-zero singular values converge and set = .
Remark . (Tensor-free eigenvalue thresholding). Using the relation between eigenvalues and singular values, we apply Algorithm . and . in the same manner to compute the positive eigenvalue thresholding. More precisely, with λ m σ m u m , v m and Λ n diag(λ , . . . , λ − ), we obtain the eigenvalue decomposition from the singular value decomposition, i.e., U * n HwH U n = Λ n from V * n HwH U n = Σ n .
Analogously, we can transfer Algorithm . and . to the quadratic setting.
Besides the singular value thresholding, the proximal methods in Section to solve the lifted and relaxed bilinear and quadratic problems in Section require the application of the lifted operatorsB andQ as well as their adjointsB * andQ * . Both operations can be computed in a tensor-free manner. Assuming that w has a low rank, one may compute the lifted bilinear forward operator with the aid of the universal property in Theorem . . Likewise, one may apply the lifted quadratic forward operator by using Corollary . .

C . (T ).
Let Q : H → K be a quadratic mapping. If w ∈ H ⊗ sym H has the representation w = U Λ U * with U [ u , . . . , u − ], Λ diag(λ , . . . , λ − ), then the lifted forward operatorQ acts by Considering the proximal methods, we see that the adjoint lifting only occurs in the argument of the singular value thresholding. If one applies the subspace iteration or the augmented L process, it is hence enough to study the left-hand and right-hand actions of the adjoint liftings. In the bilinear setting, these actions can be expressed by the left-hand or right-hand adjoint of the original bilinear mapping B. Proof. Similarly to before, we test the action of the imageQ * ( ) on u ∈ H with an arbitrary vector v ∈ H. Exploiting the symmetry, we obtain

Remark . (Composed tensor-free adjoint lifting). Since the left-hand and right-hand actions of the tensor w
. Analogously, the actions in the quadratic setting are given by Now we are ready to rewrite the proximal methods in Section into tensor-free variants. Exemplarily, we consider the primal-dual method for bilinear operators and exact data, see Algorithm . . (ii) Iteration: For n ≥ , update w (n) and (n) : (a) Using the tensor-free computations in Corollary . , determine with Algorithm . (or . ). The required actions are given in ( ) and ( ).
Remark . . As starting value for the augmented L bidiagonalization according to Algorithm . required for step (ii.b) of Algorithm . , we suggest a linear combination of the right-hand singular vectors of the previous iteration w (n) in the hope that they are good approximations of the new singular vectors.
Adapting the computation of (n+ ) , one may analogously apply Algorithm . and . in a complete tensor-free manner. Using Corollary . , Remark . , and Equation ( ), we obtain the corresponding tensor-free proximal methods for quadratic inverse problems.
Because the singular value thresholding can be computed with arbitrary high accuracy, the convergence results for the primal-dual algorithm translates to our setting. The convergence analysis [CP , Theorem ] yields the following convergence guarantee, where the norm of the bilinear operator B is de ned by . Under Assumption . and the parameter choice rule θ = as well as τ σ || B || < , the iteration (w (n) , (n) ) in Algorithm . converges to a point (w † , † ), where w † = u † ⊗ v † corresponds to a solution (u † , v † ) of the bilinear inverse problem (B).
Proof. For the general minimization problem ( ), the related saddle-point problem is given by minimize . Hence, the bilinear relaxation with exact data (B ) corresponds to the primaldual formulation Due to [Roc , Theorem . ], the rst components w of the saddle-points ( w, ) of ( ) are solutions of (B ). Vice versa, [Roc , Corollary . . ] implies that the solutions w of (B ) are saddle-points of ( ). In particular, the saddle-point problem ( ) has at least one solution since the given data are exact. Now, [CP , Theorem ] yields the convergence (w (n) , (n) ) → (w † , † ) of the primaldual iteration in Algorithm . , where the limit (w † , † ) denotes a saddle-point of ( ), and w † thus a solution of (B ). Under Assumption . , this solution has at most rank one; so every (u † , v † ) with u † ⊗ v † = w † is a solution of the bilinear problem (B).
Similar convergence guarantees can be obtained for the bilinear relaxations (B ϵ ) and (B α ) as well as for the quadratic variants (Q ), (Q ϵ ), and (Q α ). Depending on the considered problem -the bilinear or quadratic forward operator -and on the applied proximal algorithm, one may even obtain explicit convergence rates.

. Reducing rank by Hilbert space reweighting
The projective norm heuristic usually ensures that the solutions of the relaxed lifted minimization problems in Section have low rank, but how may we decrease the rank of the iteration w (n) even further to speed up the convergence, and how can we obtain meaningful solutions if Assumption . is not ful lled?
If we look back at the employed methods in compressed sensing, where one like to recover a sparse vector from a set of linear measurements, one idea to improve the sparsity of the solution is to reweight the -norm. Since the nuclear norm coincides with the -norm of the singular values, one can try to incorporate this reweighting approach, which however is not possible. The main reason here is that the singular values do not have a special ordering. If a tensor, for instance, possesses a multiple singular value, then the reweighting will be ambiguous.
Instead of reweighting the projective norm itself, we propose to modify the H norms of the underlying spaces H and H , where we initially consider the bilinear setting. From a heuristic point of view, we may interpret the leading singular value σ together with corresponding singular vectors u ⊗ v of the variable w (n) as an approximate solution of the inverse problem (B); so we may adapt the inner products of H and H in a way that promote vectors in this direction.
More generally, we want to reweight the norms of H and H with respect to some orthonormal bases, which, for instance, result from the singular value decomposition of the primal variable w (n) . In the following, we only consider the reweighting of H . The reweighting of H can be done completely analogously. If Φ [ϕ , . . . , ϕ N − ] denotes an arbitrary orthonormal basis of H , then P 's identity states In other words, the matrix H corresponding to the inner product on H can be written in the form H = H ΦΦ * H , which incidentally shows ΦΦ * = H − . To reweight the Hnorm ( ) with respect to the basis Φ, we introduce the weights Ξ diag(ξ , . . . , ξ N − ) and the adapted norm || · || H (Ξ) de ned by In so doing, we obtain the updated inner product matrix H (Ξ) = H ΦΞΦ * H with inverse H − (Ξ) = ΦΞ − Φ * . Depending on the weights, some directions are more promoted or penalized than others.
Unfortunately, for large-scale bilinear inverse problems, the proposed approach is impractical since we have to store a complete orthonormal basis. To overcome this disadvantages, we make a slightly re nement of our approach, where we only reweight a small set of basis vectors that we want to promote; so we choose the weights Ξ as ξ n = − λ n with λ n ∈ ( , ) for n = , . . . , S − and λ n = otherwise, and thus make the approach The inverse is here given by Hence, to update the inner product matrices, we only require the original matrices H and H − , the (transformed) promoted vectors ϕ n and ϕ n , and the weights λ n . For the second H space H , we proceed completely analogously. The reweighting of the H spaces has consequences for the proposed algorithms. On the one hand, notice that the adjoint of the lifted operatorB directly depends on the actual H spaces H and H . In order to update the adjoint, we rst determine the standardized adjointB * HS, E with respect to the H -S and E ian inner product. Afterwards, we transform this adjoint to the actual spaces by the following lemma.

L
. (A ). The adjoint operatorB * H ⊗H ,K with respect to the H spaces H ⊗ H and K is given by whereB * HS, E denotes the adjoint with respect to H -S and E ian inner product.
Proof. The assertion immediately follows from On the one side, Lemma . allows us to transform the adjointB * HS, E , which can usually be determined more easily, to the H spaces H , H , and K. On the other side and more generally, we may transform the adjoint lifted operator between arbitrary H space structures. For our speci c setting in ( ) and ( ), for instance, we obtain the following transformation rule, where T (Ξ) denotes the transformation matrix Analogously, one can apply the reweighting technique in Algorithm . .ii.c to Algorithm . and . . All observations and heuristics in this section also hold true or may be easily translated for the quadratic setting.

. Masked F phase retrieval
In this section, we apply the developed algorithm to the phase retrieval problem. Generally, the phase retrieval problem consists in the recovery of an unknown signal from its F intensity. Problems of this kind occur, for instance, in crystallography [Mil , Hau ], astronomy [BS , DF ], and laser optics [SST , SSD + ]. To be more precise, in the following, we consider the two-dimensional masked F phase retrieval problem [LCL + , CSV , CESV , CLS , GKK ], where the true signal u ∈ C N ×N is rstly pointwise multiplied with a set of known masks d ∈ C N ×N and afterwards transformed by the two- Denoting by the H (or pointwise) product, the masked F phase retrieval problem can be stated as follows. In general, phase retrieval problems are ill-posed due to the loss of the phase information in the frequency domain. In one dimension, the problem usually possesses an enormous set of non-trivial solutions, which heavily di er from the true signal, see for instance [BP ] and references therein. In the two-dimensional setting considered by us, the situation changes dramatically since here almost every signal can be uniquely recovered up to a global phase and up to re ection and conjugation, see [HM , Hay , BBE ].
Before considering some numerical simulations, we study the quadratic nature of the masked F phase retrieval problem, where we employ the complex notation as mentioned in the introduction of Section . For this purpose, we interpret both, the domain C N ×N and the image C L×M ×M of the measurement operator in Problem . , as real H spaces. To simplify the notation, we vectorize the unknown image u ∈ C N ×N and the given F intensities | F M ×M [d u] | ∈ C M ×M for a xed mask columnwise. Henceforth, the vectorized variables are labeled withì ·. On the F side, we additionally attach the measurements for di erent masks to each other. Thus, the domain of the measurement operator in Problem . becomes H = C N N and the image K = C LM M . At the moment, the endowed real inner product is not speci ed in detail.
In order to derive an explicit representation of the (vectorized) forward operator, we write the two-dimensional (M × M )-point F transform as with the F matrices In the same manner, we write the pointwise multiplication d u as matrix vector multiplication diag(d ) ì u, where diag(d ) denotes the matrix with diagonal d . Combining the interference with the given masks into one operator, we de ne the matrix . . .
The action D L ì u thus attaches the single masked signals to each other.
Composing the two operations, and squaring the measurements, we notice that Problem . is equivalent to where I L ∈ C L×L denotes the identity matrix, and † ∈ R LM M the vectorized exact squared, masked F intensities of the looked for signal. The associate bilinear operator B F of the quadratic forward operator Q F in (F) is now given by where the function diag extracts the diagonal of a matrix. Since the last right-hand side is linear in ì The last missing ingredient in order to apply our proximal algorithms is the action of the adjointQ * F (ì ) for a xed ì ∈ K.
L . (T ). If the H spaces H and K are endowed with the real E ian inner product, i.e. H = I and K = I , then the action of the adjoint operatorQ * F (ì ) with xed ì ∈ K is given by Proof. We compute the action of the adjoint operator with the aid of Lemma . . For this purpose, we rst determine the left adjoint of B F by considering for all ì f ∈ H and xed ì e ∈ H. For the right adjoint, we analogously obtain where diagonal in the middle is not conjugated. Summation of the left and right adjoint as in Lemma . yields the assertion.
Remark . . Using Lemma . , we can now transform the actions of the E ian adjoint to our actual H spaces. More precisely, we havȇ One of the central reasons to choose the masked F phase retrieval problem as application of the proposed algorithms and heuristics is that the phase retrieval problem (F), although severely ill posed, behaves nicely under convex relaxation. More precisely, one can show that under certain conditions the solution of the convex relaxed problem minimize ||w || + H⊗ π H subject toQ F (w) = † is unique and coincides with the true lifted solution ì u ì u * with high probability, see [CSV , CESV , CLS , GKK ]. Therefore, we expect that the proposed tensor-free proximal algorithms converge to a unique rank-one solution. .

. E ects of bidiagonalization and reweighting
In the rst numerical example, we want to study the e ect of the applied augmented L bidiagonalization and of the reweighting heuristic to the computation time and the convergence behavior. For all simulations in this paper, the proposed methods have been implemented in MATLAB ® (R a, -bit) and are performed using an Intel © Core™ i -CPU ( × . GHz) and GiB main memory. The employed true two-dimensional signal consists in a synthetic image referring to transmission electron microscopy experiments with nanoparticles. The test image u is rather small and is composed of × pixels. The corresponding tensor w uu * is already of dimension .
Based on the true image, we compute synthetic and noise-free data by applying the masked F transform. The entries of the eight employed masks have been randomly generated with respect to independent R random variables. More precisely, the entries of the masks are distributed with respect to the model with probability / , with probability / , − √ with probability / .
( ) Masks of this kind have been studied in [CLS , GKK ] in order to de-randomize the generic phase retrieval problem considered in [CSV , CESV ]. In our experiment, we employ the × -point F transform such that the complete autocorrelation of the masked signals is encoded in the given F intensities. A rst reconstruction of the true signal based on Algorithm . is shown in Figure , where we compute the singular value threshold with the aid of a full singular value decomposition of the tensor w (n) . For the involved H spaces H and K, we simply choose the corresponding E ian spaces. In other words, we employ the H -S inner product for the (vectorized) matrices in H = C N N .
Although the reconstruction is quite accurate, the main drawbacks of a direct application of Algorithm . are the computation time and memory requirements. For iterations, we need about . minutes to recover the true signal. Next, we employ the tensor-free variant of the primal-dual iteration in Algorithm . , where we apply the augmented L bidiagonalization to determine the singular value thresholding   (Algorithm . ). Using this modi cation, we merely need about seconds to perform the reconstruction. Since we can control the accuracy of the partial singular value decomposition, the performed iteration essentially coincides with the original iteration.
The in uence of the augmented L bidiagonalization on the computation time is presented in Table . The parameter k here describes the maximal size of the bidiagonal matrix B k . Further, denotes the number of xed R pairs in the augmented restarting procedure. Since the approximation property of the incomplete L method becomes worse for small k, we require more restarts in order to observe an accurate partial singular value decomposition. For higher-dimensional input images, the time saving aspect becomes much more important.
Considering the evolution of the non-zero singular values of w (n) during the iteration, see Table , we observe that the projective norm heuristic here enforces a very low rank. After iterations, we nearly obtain a rank-one tensor such that the additional reconstruction error caused by the rank-one projection of w (n) to extract the recovered image becomes negligible. After iterations, the tensor w (n) has converged to a rank-one tensor.
In order to promote the rank-one solutions of the masked F phase retrieval problem even further, we next exploit the reweighting approach proposed in Section . For our current simulation, this means that we replace the inner product matrices H = I Table : Evolution of the non-zero singular values using Algorithm . with augmented L process. by the modi ed matrices H (Ξ) de ned in ( ), where we build up the basis {ϕ k } from the singular value decomposition of w (n) . In the E ian setting considered in this simulation, the transformed directions ϕ k simply coincide with ϕ k . The reweighting is here applied every ten iterations with relative weight λ / . The results of the tensor-free masked F phase retrieval with augmented L process and H space reweighting (Algorithm . ) are shown in Figure . Although the reconstruction looks comparable, we want to point out that the absolute errors are several magnitudes smaller. If we compare the evolution of the ranks, see Figure , we can see that the proposed reweighting heuristic reduces the rank quite e ciently. More precisely, most of the iterations have rank one. Due to this reduction, the reweighting has also a positive e ect on the computation time and the average number of restarts of the L process, see Table . Moreover, we may notice that the data delity term ||Q F (w (n) ) − † || decreases with a higher rate. Here the convergences stops after about iterations due to numerical reasons.

. . Incorporating smoothness properties
One of the central di erence between our tensor-free reweighting algorithm and Phase-Lift [CSV , CESV ] consists in the modeling of the domain and image space of the phase retrieval problem. Where PhaseLift is based on the standard E ian setting,

Figure :
Evolution of the rank and data fidelity term during the masked phase retrieval problem based on Algorithm . and Algorithm . .
we rely on arbitrary discrete H spaces H and K. Especially, in two-dimensional phase retrieval, we may thus exploit relationships between neighbored pixels like nite di erences. More precisely, we here study the in uence of the a-priori smoothness property formulated in terms of the two-dimensional discretized S norm.
In order to discretize the (weighted) S space W , , we employ the forward di erences In comparison, the discretized L -norm corresponds to the standard E ian setting associated with the identity matrix.
The masked F intensities of the second example has been created on the basis of a transmission electron microscopy (TEM) reconstruction in [LSP + a]. The image has a dimension of × pixels such that the related tensor possesses complex-valued entries and requires GiB memory (double precision complex numbers). Further, we apply four random masks of R -type ( ). Since the masks are generated entirely random, about a one-sixteenth of the pixel are blocked by all masks. The test  image together with the number of masks covering a certain pixel are shown in Figure . To solve the corresponding masked phase retrieval problem, we apply Algorithm . , where we reweight the H spaces every steps with the relative weight λ / . Since the algorithm tends to higher-rank tensors in the starting phase, we only compute a partial singular value decomposition with at most ve leading singular values using ten L vectors e n and f n . Hence, we perform Algorithm . in an inexact manner. After a few iterations the rank of w (n) decreases such that the method becomes again exact, and that the convergence is ensured.
The reconstructions for the E ian and S setting are presented in Figure  and , respectively. Due to the small number of four masks, the convergence of the algorithm using the discretized L -norm is very problematic. Although the method converges for the chosen parameters, the convergence rate is very low. Moreover, pixels that are not covered by any masks cannot be recovered and cause reconstruction defects characterized by black holes. Using instead the discretized S norm with weight µ ( / , , ) and the same parameters, we obtain a much faster convergence and rank reduction, see Figure . Further, the required number of dual updates in order to produce a non-zero primal update is reduced. A small drawback is that the S norm tends to smooth out the edges of the particles in the reconstruction. One the other side, the a-priori smoothness condition allows us to recover pixels not covered by the given data.

. . Phase retrieval for large-scale images
Using the proposed reweighting heuristic to reduce the rank of the iteration w (n) , we are able to perform Algorithm . for much larger test images. In this numerical experiment, we consider an × pixel image, whose F data are again based on a TEM micrograph of gold nanoparticles [LSP + a]. The lifted image here already requires  . The pre-image space C N N is equipped with the discrete L inner product. The reconstruction is terminated a er iterations. In order to compare the retrieval with the true signal, the pixels are presented in the same range as the true image, resulting in the truncation of higher intensities.   (b) Evolution of the data fidelity term.

Figure :
Evolution of the rank and data fidelity term during the masked phase retrieval based on Algorithm . . Both terms are compared for the discrete L norm (E ) and the discrete S norm with weight µ = ( . , . , . ) * .
TiB memory in order to hold the complex-valued entries with double precision. Di erently from the previous examples, we here apply eight G ian masks following the standard normal distribution d [n , n ] ∼ N( , ).
The recovered signal for the E ian inner product is shown in Figure together with the evolution of the rank and data delity in Figure . Analogously to the above experiments, the H space H is reweighted every ten iterations with a relative weight λ / .

. . Corruption by noise
In the last numerical example, we study the in uence of noise to the proposed tensor-free primal-dual algorithm. For simplicity, we only study the behavior of the proposed method with respect to white or G ian noise of the form ϵ † + ζ where ζ is a normal distributed random vector. For the noise level ϵ || † − ϵ ||, we consider di erent percentages of the norm || † ||. Similarly to the rst numerical examples, we again apply four R -type masks of the form ( ). The synthetic data † for the × test image are again based on a TEM reconstruction of gold nanoparticles [LSP + b, Figure S B] and the × -point F transform. The domain H is again equipped with the discretized S norm with weight µ ( / , , ) * . Due to the noise, we have adapted Algorithm . to the T regularization in Algorithm . , which simply means that we have to multiply (n+ ) in step (ii.a) with /σ + additionally and to scale the threshold in step (ii.b) to S τ α . Since α a ects the in uence of the projective norm heuristic, this parameter has to be chosen relatively large. In this brief test run with noisy

Figure :
Evolution of the rank and data fidelity term during the masked phase retrieval problem based on Algorithm . using eight G ian masks and terminating a er iterations.  measurements, we chose α independent of the noise level. Surely, the results can be improved by more sophisticated parameter choice rules.
The recovered signals and the evolution of the rank and data delity are shown in Figure and , respectively. Due to the noise, we cannot ensure that the recovered tensor has rank one and corresponds to a meaningful approximation of the true signal. If we endow the domain H with the E ian inner product in analogy to classical PhaseLift, then we observed that the rank of the iteration w (n) increases uncontrollable, and the proposed algorithm diverges because of the limited data provided by only four masks. Using instead the weighted S norm and the H space reweighting, the rank becomes one after a short starting phase, where the maximal rank is restricted by ve. Because of the same starting value and the same regularization parameter, the primal-dual iteration initially performs nearly identical for the three considered noise levels such that the rank evolutions coincide. Further, for all three cases, we here obtain reasonable reconstructions. Moreover, the pixels not covered by any mask are lled up, and the in uence of the noise to the reconstruction is smoothed out. Further numerical experiments suggest that Algorithm . in combination with a-priori smoothness properties recovers the unknown signal in a stable manner. ) − g † 5% 10% 25% (b) Evolution of the data fidelity term.

Figure :
Evolution of the rank and data fidelity term during masked phase retrieval for noisy measurements. The noise level is given with respect to the norm || † || of the noise-free measurements.

. Conclusion
In this paper, we developed a novel proximal algorithm to solve bilinear and quadratic inverse problems. The basic idea was to exploit the universal property to lift the considered problem to linear inverse problems on the tensor product. In order to deal with the rank-one constraint, we applied a nuclear or projective norm heuristic, which is known to produce low-rank solutions. The relaxation of the lifted problem yields a constrained minimization problem, which has been solved by applying the rst-order primal-dual algorithm proposed by C and P . Since the choice of the underlying algorithm is somehow arbitrary, there are several further options to develop minimization methods for the lifted and relaxed problem. For instance, one may apply the alternating direction method of multipliers (ADMM) [BPC + ], forward-backward splitting [LM , CW ], or FISTA [BT ].
The exibility to adapt the domain of the bilinear or quadratic operator allows us to incorporate smoothness assumptions or neighborhood relations. As demonstrated for the masked F phase retrieval problem, this freedom enables us to recover pixels that are blocked by the applied masks such that they do not contribute to the given measurements. Further, the smoothing properties of the discretized S norm greatly improve the numerical observed convergence rates. Moreover, one can exploit this exibility to reweight the pre-image spaces in order to promote low-rank solutions. In the moment, we rely on a projective norm based on the corresponding H norms. Here, the question arises if one can employ nuclear "norms" based on semi-norms like the total variation or total generalized variation.
For the masked F phase retrieval problem, we have shown that the developed algorithms seem to be stable under noise. More precisely, we have studied the in uence of white noise, which can be treated by choosing the squared E ian or discretized L -norm as data delity term of the T functional in (B α ) and (Q α ). In order to incorporate more realistic noise models like P noise into phase retrieval, one can, for instance, replace the data delity by the K -L divergence. In so doing, one only has to update the proximal mapping to compute the dual variable.