Nanoscale Photonic Imaging, pp. 165–202
Proximal Methods for Image Processing
Abstract
In this tutorial on proximal methods for image processing we provide an overview of proximal methods for a general audience, and illustrate via several examples the implementation of these methods on data sets from the collaborative research center at the University of Göttingen. The ProxToolbox is a collection of routines that process simulated and raw image data. We use the toolbox not only for research purposes, but also as a teaching tool, to give beginning students access to relevant laboratory data, upon which they can test their understanding of algorithms and experiment with new ideas.
6.1 All Together Now
A major challenge in building and maintaining collaborations across disciplines is to establish a common language. Sounds simple enough, but even a common language is not helpful without a common understanding of the basic elements. When it comes to the day-to-day exchange of data and software, this means building a common data management and processing environment. Try to do this, however, and you learn very quickly that even for something as concrete as building software that everyone can use, there are different ways of interpreting and understanding what the software does. In the context of X-ray diffraction, for instance, what a physicist might understand as a software routine that simulates the propagation of an X-ray through an optical device, a mathematician would understand as an operator with certain mathematical properties.
The first successful algorithms for phase retrieval were developed and understood by physicists as iterative procedures that simulate the forward and backward propagation of a wave through an optical device, where in each iteration the computed wave is adjusted to fit either measurement data or some experimental constraint, like the shape of the aperture or the illuminating beam. Later, mathematicians reinterpreted these operations in terms of the application of projectors to iterates of a fixed point mapping. Of most recent vintage is an effort by a new generation of applied mathematicians to sidestep the more interesting aspects of the physicists’ algorithms (namely, that they occasionally don’t work) by lifting the problem to a space that is too high-dimensional for any practical purposes, and then relaxing the underlying problem to something with theoretically nicer properties, but whose solution bears little meaningful relationship to the problem at hand. At this point, one is reminded of Richard Courant’s lamentation, “the broad stream of scientific development may split into smaller and smaller rivulets and dry out.” [2] Here’s to swimming against the current.
6.1.1 What Seems to Be the Problem Here?
Ptychography was briefly mentioned in Chap. 2. Ptychography is harder to say than to describe. A quilt is a ptychogram of sorts. Or put in more technical terms, ptychography is a combination of blind deconvolution and computed tomography for phase retrieval. The original idea, proposed by Hegerl and Hoppe [4], was just the computed tomography part: to stitch together the original object \(\psi \) from many measurements at different settings of the instrument, modeled by \({\mathcal D}_{j}\), the j indexing the setting. One of the implicit complications of conventional ptychography, which differs from the original, is that the illuminating beam is also unknown; this is the blind deconvolution part.
The problem in blind ptychography is to reconstruct simultaneously the object \(\psi \) and illuminating beam u from a given ptychographic dataset. Near-field, or in-line, ptychography [9] involves moving the imaging plane along the axis of propagation of the beam (i.e., away from the plane where the object lies). This is very similar to phase diversity in astronomy [10], but there one changes the defocus in the far field instead of the imaging plane in the near field (also, one does not have to recover the beam in astronomy applications). Mathematically, however, the two instances have the same structure. In conventional far-field ptychographic experiments the beam is much smaller than the specimen. The different measurements consist of scans of \(\psi \) in the lateral direction with sufficient overlap between successive images. Lateral translations and translations along the axis of propagation were combined in [11]. Note that the last case is the least restrictive in terms of probe properties; see also [12] for a detailed comparison and discussion.
The issue of existence of a \(\psi \) that satisfies all the equations is discussed in Chap. 23. For this chapter, existence is recast as consistency of the measurements and the physical model. The data is exact.^{2} What is not entirely accurate is the model for the data and the computational bandwidth, e.g. finite precision arithmetic.
6.1.2 What Is an Algorithm?
One obvious exception is when \(\left( {\mathcal D}_{F}(\psi )\right) _{i}=0\). If you are a physicist you might argue that \(\left( {\mathcal D}_{F}(\psi )\right) _{i} = 0\) on a set of measure zero, a fancy way of saying never, with infinite precision arithmetic. Except that electronic processors operate with finite precision arithmetic and zero is therefore enormous on a computer. In fact, with double precision, zero is not smaller than \(10^{-16}\). To see what kind of error this can lead to, suppose that \(\left( {\mathcal D}_{F}(\psi )\right) _{i} = 10^{-15}\) with infinite precision arithmetic, but because of roundoff, the computer returns \(-10^{-15}\). A very small difference locally. But suppose that, at this pixel, the measured intensity is \(I_i=10\). In computing the projection, the computer returns a point with \({\widehat{y}}_i = -10\) instead of \({\widehat{y}}_i = 10\). This makes an enormous difference. The typical user won’t see this kind of error often, but in numerical studies in [14] it happened about \(12\%\) of the time, which is not insignificant. Anyone with experience in programming knows to be careful about dividing. Without thinking much more about it, one usually codes some exception to avoid problems when division gets too dicey.
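The sensitivity described above is easy to reproduce. The following is a minimal sketch, assuming the projection onto a magnitude constraint at a single pixel is the rescaling \(b\,x/|x|\); the function name `project_onto_magnitude`, the threshold `eps`, and the fallback phase are illustrative choices, not part of the ProxToolbox.

```python
def project_onto_magnitude(x, b, eps=1e-12):
    """Project the complex number x onto the circle {y : |y| = b}.

    When |x| <= eps the phase of x is dominated by roundoff, so instead of
    dividing we fall back to a fixed phase (zero). The fallback is a choice.
    """
    r = abs(x)
    if r <= eps:
        return complex(b, 0.0)
    return b * x / r


b = 10.0
# The same pixel value, differing only by a roundoff-sized sign error.
exact = complex(1e-15, 0.0)
rounded = complex(-1e-15, 0.0)

# Unguarded (eps=0): a perturbation of size 2e-15 moves the result by about 2*b.
jump = abs(project_onto_magnitude(exact, b, eps=0.0)
           - project_onto_magnitude(rounded, b, eps=0.0))

# Guarded: both inputs count as numerically zero and give the same point.
guarded_jump = abs(project_onto_magnitude(exact, b)
                   - project_onto_magnitude(rounded, b))
```

Any fixed phase would serve as the fallback; what matters is that the result no longer depends on the sign of a number that is indistinguishable from zero.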
Returning to iterative algorithms just in the context of phase retrieval (the probe \(z\) is known and there is only a single measurement \(M=M_1\) so that the variables \(\mathbf{{u}}\) are not needed), the procedure (6.9) is a natural way to think of a numerical procedure when approaching things physically: one makes a guess for the object \(\psi ^0\), propagates it through the optical system according to the model given by (6.10) and updates this guess to \(\psi ^1\) depending on whether the elements satisfy the data and some a priori constraint. And repeat. The user would stop the iteration either when he needed to go for coffee or when the iterates stop making progress, in some loosely defined way. The subtle point here is that (6.9) is not a fixed point iteration, but rather a simulation of a physical process.
Whether or not the iterations above converge to a fixed point, and what this point has to do with the problem at hand, is the subject of Chap. 23. For the purposes of this tutorial, the algorithm will simply be run with the given mappings and the user will be left to interpret the result. A few tools are provided within the ProxToolbox to monitor the iterates according to mathematical, as opposed to physical, criteria. For the beginning reader all that is important to keep in mind is that, first of all, the algorithms don’t always converge, and second of all, when they converge, the limit point is not a solution to the problem you thought you were solving, but you can usually get there easily from the limiting fixed point. Another issue to keep in mind is that the physical criteria that scientists apply to judge the quality of a computed solution usually do not correspond to the mathematical criteria used to characterize and quantify convergence of an algorithm. It is not uncommon to see pictures of “solutions” returned by various algorithms at iteration k, or to see a comparison of a root-mean-squared error estimate of an iterate of various algorithms. This is, from a mathematical perspective, not really meaningful for several reasons. The first reason is that, unless the underlying optimization problem is to minimize the root-mean-squared error of something, there is no reason to expect that an algorithm should do this. The second reason is that, as already mentioned, the iterates of some algorithms, like Douglas-Rachford, are not the points that approximate solutions to the desired optimization problem, but their shadows, defined as the projection of these points onto a relevant set, are. Comparison of the quality of solutions returned by algorithms is common, but it should be recognized that such comparisons are not mathematical, but rather phenomenological, if of any scientific significance at all.
The identification of the HIO algorithm with Douglas-Rachford, at least for one parameter value, made a lot of sense when it was first discovered. The HIO algorithm is famously unstable. The way most people use it today is to get themselves in the neighborhood of a good solution by running 10–40 iterations of HIO, at which point they switch to a more stable algorithm to clean up their images. The value of HIO or Douglas-Rachford is that they rarely get stuck in local minima. The identification with Douglas-Rachford makes this phenomenon clear since it can be proved that, if the sets \(\mathfrak {S}\) and M do not intersect, then Douglas-Rachford does not possess fixed points. If M were convex, then you could even prove that the iterates must diverge to infinity in the direction of the gap vector between best approximation pairs between the sets [19]. For nonconvex problems like non-crystallographic phase retrieval, the iterates need not diverge, but they cannot converge. In Chap. 23 the convergence theory is discussed in some detail.
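The divergence along the gap vector can be seen on a toy problem. The sketch below assumes the standard form of the Douglas-Rachford operator, \(T = \tfrac{1}{2}(R_A R_B + \mathrm{Id})\) with reflectors \(R = 2P - \mathrm{Id}\), and uses two parallel lines in the plane as hypothetical stand-ins for the sets \(\mathfrak {S}\) and M; none of the names below come from the toolbox.

```python
def reflect(project, u):
    """Reflector R = 2P - Id for a projector P acting on (x, y) tuples."""
    px, py = project(u)
    return (2.0 * px - u[0], 2.0 * py - u[1])


def douglas_rachford_step(P_A, P_B, u):
    """One application of T = (R_A R_B + Id) / 2."""
    rx, ry = reflect(P_A, reflect(P_B, u))
    return (0.5 * (rx + u[0]), 0.5 * (ry + u[1]))


def P_A(u):
    """Projection onto the line y = 0."""
    return (u[0], 0.0)


def P_B(u):
    """Projection onto the line y = 1 (disjoint from the line y = 0)."""
    return (u[0], 1.0)


u = (0.3, 0.3)
iterates = [u]
for _ in range(50):
    u = douglas_rachford_step(P_A, P_B, u)
    iterates.append(u)

# Shadows: projections of the iterates onto one of the sets.
shadows = [P_B(w) for w in iterates]
```

In this example every step moves the iterate by the gap vector \((0,-1)\), so the sequence has no fixed point, while the shadows stay put at \((0.3, 1)\).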
As should be clear by now, there is no equation being solved here, but rather some point is sought, any point, that satisfies an equation and any other kind of requirement one might like to add. It is high time to bring the main character of this story to the stage. In the most general format (ptychography) this has the form
The behavior of these algorithms in the presence of model inconsistency tends to be mistaken for another bugbear of inverse problems, namely nonuniqueness. For example, around the turn of the millennium, oversampling in the image domain was proposed to overcome the nonuniqueness of phase reconstructions for non-crystallographic observations [20]. A few moments’ reflection on elementary Fourier analysis, and a careful reading of Chap. 2, are all you need to convince yourself, however, that oversampling has less to do with uniqueness than with inconsistency. Increased, but still finite, sampling in the image domain just pushes the inconsistency, or gap, between the sets \(\mathfrak {S}\) and M to some level below either your numerical or experimental precision. It might look like the iteration has converged, but what has really happened is that the movement of the iterates has become so small that it is no longer detectable with a fixed arithmetic precision. This is just a physical manifestation of the fact from Fourier analysis that objects with compact support do not have compactly supported Fourier transforms, and vice versa. Since the measurements are finite, the object that is recovered cannot be finite. This means that the only time phase retrieval can be consistent is when imaging periodic crystals.
6.1.3 What Is a Proximal Method?
It is worthwhile spending a few moments to marvel at \({{\,\mathrm{prox}\,}}\). This is a mapping from points in a space X to points in the same space; using mathematical notation \({{\,\mathrm{prox}\,}}_{f, \lambda }:\,X\rightarrow X\,\). But, look again at (6.29): this is the solution to another optimization problem. There are two important things to notice about this observation, first of which is that a mapping has been created out of an optimization problem. This is what a mathematician might call pretty. The second thing to notice is that the value of the optimization problem—“the answer to the ultimate question of life, the universe and everything” [24]—is beside the point.^{5}
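To make the idea of a mapping built from an optimization problem concrete, here is a small sketch on the real line. It assumes the common scaling \({{\,\mathrm{prox}\,}}_{f,\lambda }(x) = \mathrm{argmin}_{y}\{ f(y) + \tfrac{1}{2\lambda }|y-x|^{2}\}\), which may differ from the convention in (6.29) by how \(\lambda \) enters; the function names are illustrative.

```python
def prox_abs(x, lam):
    """Closed-form prox of f(y) = |y| (soft thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def prox_numeric(f, x, lam, lo=-5.0, hi=5.0, steps=20001):
    """Brute-force prox: minimize f(y) + (y - x)^2 / (2*lam) over a grid."""
    best_y, best_val = lo, float("inf")
    for i in range(steps):
        y = lo + (hi - lo) * i / (steps - 1)
        val = f(y) + (y - x) ** 2 / (2.0 * lam)
        if val < best_val:
            best_y, best_val = y, val
    return best_y


def indicator_interval(y, a=-1.0, b=1.0):
    """Indicator function of [a, b]: zero inside, +infinity outside."""
    return 0.0 if a <= y <= b else float("inf")


# The minimizer is what is returned; the minimal value is discarded.
shrunk = prox_numeric(abs, 2.3, 0.5)                    # close to prox_abs(2.3, 0.5)
projected = prox_numeric(indicator_interval, 2.3, 0.5)  # close to the projection onto [-1, 1]
```

With an indicator function in place of f, the prox mapping is the projector onto the set, which is how the projection algorithms of this chapter fit into the proximal framework.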
6.1.4 On Your Mark. Get Set...
There are three groups of readers envisioned for this tutorial. The first group is students, of either physics or mathematics, wishing to get hands-on numerical experience with classical algorithms for real-world problems in the physical sciences. The second group is optical scientists who already know what they want to do, but would like a repository of algorithms to see what works for their problem and what does not work. The third group is applied mathematicians who have new algorithmic ideas, but need to see how they perform in comparison to other known methods on real data. A stripped-down version of the ProxToolbox is used at the University of Göttingen to teach graduate and undergraduate courses in numerical optimization and mathematical imaging. What is omitted from the student version is the repository of algorithms and some of the prox mappings; the students are expected to write these themselves, with some guidance. Experienced researchers, it is expected, will extract the parts of the toolbox they need and incorporate these into their own software. To make it easy to identify the pieces, the toolbox has been organized in a highly modular structure. The modularity comes at the cost of an admittedly labyrinthine structure, which is the hardest thing to master and the main goal of the rest of this tutorial.
6.2 Algorithms
The two different models discussed above, feasibility and constrained optimization (6.19), lead to a natural classification of categories of algorithms. The development presented here follows [25]. To underscore the fact that the algorithms can be applied to problems other than X-ray imaging, the sets involved are denoted by \(\varOmega _j\) for \(j=0,1,2,\dots ,m\) and the points of interest are denoted with a u, instead of the context-specific notation for a wavefield \(\psi \). The sets \(\varOmega _j\) are subsets of the model space \(\mathbb {C}^n\) (or \(\mathbb {R}^{2n}\)) and, since there can be more than just two sets, as in the case of phase diversity or ptychography, the integer m is just a stand-in for the number of images and other qualitative constraints involved in an experiment.
6.2.1 Model Category I: Multiset Feasibility
The multiset feasibility problem is:
$$\begin{aligned} \text{Find } u \in \bigcap _{j = 0}^{m} \varOmega _{j}. \end{aligned}$$
The numerical experience is that this model format leads to the most effective methods for solving phase-type problems. It is important to keep in mind, however, that for all practical purposes the intersection above is empty, so the algorithm is not really solving the problem since it has no solution.
The easiest iterative procedure of all is the Cyclic Projections algorithm
Algorithm 6.2.1
Initialization. Choose \(u^{0} \in \left( \mathbb {C}^{n}\right) \).
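As an illustration, the following is a minimal sketch of cyclic projections, assuming the usual update in which the projectors are applied one after another in a fixed order. The two sets (a horizontal line and the unit circle in \(\mathbb {C}\)) and all function names are hypothetical choices for the example, with the circle playing the role of a nonconvex magnitude constraint.

```python
def cyclic_projections(projectors, u0, n_iter=100):
    """Apply the projectors one after another, n_iter sweeps in total."""
    u = u0
    for _ in range(n_iter):
        for P in projectors:
            u = P(u)
    return u


def P_line(z):
    """Projection onto the affine set {z : Im z = 0.5}."""
    return complex(z.real, 0.5)


def P_circle(z):
    """Projection onto the unit circle; the origin is sent to 1 by convention."""
    r = abs(z)
    return z / r if r > 0.0 else complex(1.0, 0.0)


u = cyclic_projections([P_line, P_circle], complex(2.0, 2.0))
# Residuals with respect to the two sets at the final iterate.
residual_circle = abs(abs(u) - 1.0)
residual_line = abs(u.imag - 0.5)
```

In this consistent toy problem both residuals shrink to roundoff level; with inconsistent sets the iterates would instead settle into a cycle between nearby points of the sets.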
To extend this to more than two sets, Borwein and Tam [31, 32] proposed the following variant:
Algorithm 6.2.2
(Cyclic Douglas-Rachford, CDR)
Initialization. Choose \(u^{0} \in \mathbb {C}^{n}\).
Different sequencing strategies than the one presented above are possible. In [33] one of the pair of sets is held fixed. This has some theoretical advantages in a convex setting, though no advantage was observed for phase retrieval.
Algorithm 6.2.3
(Cyclic Relaxed Douglas-Rachford, CDR\(\varvec{\lambda }\))
Initialization. Choose \(u^{0} \in \mathbb {C}^{n}\) and \(\lambda \in \left[ 0 , 1\right] \).
The analysis for RAAR and its precursor, Douglas-Rachford, is contained in [30, Sect. 3.2.2] and [34].
Algorithm 6.2.4
(Nonsmooth ADMM\(_{\mathbf {1}}\))
Initialization. Choose \(x^{0} , u_{j}^{0} , v_{j}^{0} \in \mathbb {C}^{n}\) and fix \(\eta >0\).
 1. Update $$\begin{aligned} x^{k + 1}&\in {{\,\mathrm{argmin\,}\,}}_{x \in \mathbb {C}^{n}} \left\{ \iota _{\varOmega _{0}}\left( x\right) + \sum _{j = 1}^{m} \left( \left\langle v_{j}^{k},~ x - u_{j}^{k}\right\rangle + \frac{\eta }{2}{\Vert }x - u_{j}^{k}{\Vert }^{2}\right) \right\} \nonumber \\&= {\mathcal P}_{\varOmega _{0}}\left( \frac{1}{m}\sum _{j = 1}^{m} \left( u_{j}^{k} - \frac{1}{\eta }v_{j}^{k}\right) \right) . \end{aligned}$$ (6.38)
 2. For all \(j = 1 , 2 , \ldots , m\) update (in parallel) $$\begin{aligned} u_{j}^{k + 1}&\in {{\,\mathrm{argmin\,}\,}}_{u_{j} \in \mathbb {C}^{n}} \left\{ \iota _{\varOmega _{j}}\left( u_{j}\right) + \left\langle v_{j}^{k},~ x^{k + 1} - u_{j}\right\rangle + \frac{\eta }{2}{\Vert }x^{k + 1} - u_{j}{\Vert }^{2} \right\} \nonumber \\&= {\mathcal P}_{\varOmega _{j}}\left( x^{k + 1} + \frac{1}{\eta }v_{j}^{k}\right) . \end{aligned}$$ (6.39)
 3. For all \(j = 1 , 2 , \ldots , m\) update (in parallel) $$\begin{aligned} v_{j}^{k + 1} = v_{j}^{k} + \eta \left( x^{k + 1} - u_{j}^{k + 1}\right) . \end{aligned}$$ (6.40)
This can be written as a fixed point iteration on triplets \((x^k, u^k, v^k)\in \mathbb {C}^n\times \mathbb {C}^{mn}\times \mathbb {C}^n\), but it is not very convenient to see things this way. Note that the projections in Step 2 of the algorithm can be computed in parallel, while the Cyclic Projections and Cyclic Douglas-Rachford Algorithms must be executed sequentially. Note also that the update of the block \(u^{k+1}\) incorporates the newest information from the block \(x^{k+1}\) together with the old data \(v^{k}\), while the update of the block \(v^{k+1}\) incorporates the newest information from both blocks \(x^{k+1}\) and \(u^{k+1}\). This is in the same spirit as the Gauss-Seidel method for systems of linear equations. Obviously, there is an increase (by a factor of \(3+m\)) of the number of variables, but this is a mild increase in complexity in comparison to some recent proposals for phase retrieval which involve squaring the number of variables! Indeed ADMM is the starting point for just about all the most successful methods for large-scale optimization with linear constraints (see, for instance, [40] and references therein).
An ADMM scheme for phase retrieval has appeared in [41]. This is a terrible algorithm for phase retrieval. It is included here, however, as a point of reference to the Douglas-Rachford Algorithm.
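The three updates can be sketched on a toy feasibility problem. The example assumes intervals on the real line for the sets \(\varOmega _{j}\) (a hypothetical, convex and consistent choice) and writes the u-update as the projection of \(x^{k+1} + v_{j}^{k}/\eta \), which is what minimizing the subproblem in Step 2 gives; the names `admm_feasibility` and `clip` are illustrative.

```python
def clip(lo, hi):
    """Return the projector onto the interval [lo, hi]."""
    return lambda t: min(max(t, lo), hi)


def admm_feasibility(P0, Ps, x0, eta=1.0, n_iter=50):
    """Nonsmooth ADMM for finding a point in the intersection of the sets."""
    m = len(Ps)
    x = x0
    u = [x0] * m
    v = [0.0] * m
    for _ in range(n_iter):
        # Step 1: project the average of u_j - v_j / eta onto Omega_0.
        x = P0(sum(u[j] - v[j] / eta for j in range(m)) / m)
        # Step 2: the m projections are independent of one another.
        u = [Ps[j](x + v[j] / eta) for j in range(m)]
        # Step 3: multiplier update using the newest x and u.
        v = [v[j] + eta * (x - u[j]) for j in range(m)]
    return x, u, v


# Omega_0 = [0, 2], Omega_1 = [1, 3], Omega_2 = [1.5, 5]; intersection [1.5, 2].
x, u, v = admm_feasibility(clip(0.0, 2.0), [clip(1.0, 3.0), clip(1.5, 5.0)], 0.0)
```

On this example the iteration settles after a handful of steps at a point of the intersection with all multipliers equal to zero; nothing of the sort should be expected for nonconvex, inconsistent phase retrieval problems.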
6.2.2 Model Category II: Product Space Formulations
Algorithm 6.2.5
(Alternating Projections—AP)
Initialization. Choose \(\mathbf{{u}}^{0} \in \mathbb {C}^{n(m + 1)}\).
Algorithm 6.2.6
(Projected Gradient—PG)
Initialization. Choose \(\mathbf{{u}}^{0} \in \mathbb {C}^{n(m + 1)}\).
Algorithm 6.2.7
(Fast Projected Gradient—FPG)
Initialization. Choose \(\mathbf{{u}}^{0} , \mathbf{{y}}^{1}\in \mathbb {C}^{n(m + 1)}\) and \(\alpha _{k} =\frac{k - 1}{k + 2}\) for all \(k=0,1,2,\dots \).
There is no theory for the choice of acceleration parameter \(\alpha _{k}\), \(k =0,1,2,\dots \), in Algorithm 6.2.7 for nonconvex problems, but numerical experience [25, 44] indicates that this works pretty well. All that is missing is an explanation.
Algorithm 6.2.8
(Relaxed Douglas-Rachford, DR\(\varvec{\lambda }\)/RAAR)
Initialization. Choose \(\mathbf{{u}}^{0} \in \mathbb {C}^{n(m + 1)}\) and \(\lambda \in \left[ 0 , 1\right] \).
A different kind of relaxation to the Douglas-Rachford algorithm was recently proposed and studied in [45]. This appears to be better than Algorithm 6.2.8. When the sets involved are affine, the algorithm is a convex combination of Douglas-Rachford and Alternating Projections, but generally it takes the form
Algorithm 6.2.9
(Douglas-Rachford-Alternating-Projections)
Initialization. Choose \(\mathbf{{u}}^{0} \in \mathbb {C}^{n(m + 1)}\) and \(\lambda \in \left[ 0 , 1\right] \).
This algorithm is denoted by DRAP in the demonstrations below.
6.2.3 Model Category III: Smooth Nonconvex Optimization
The next algorithm, Averaged Projections, could be motivated purely from the feasibility framework detailed above. But there is a more significant smooth interpretation of this model, which motivates the smooth model class.
Algorithm 6.2.10
(Averaged Projections—AvP)
Initialization. Choose \(u^{0} \in \mathbb {C}^{n}\).
The analysis of averaged projections for problems with this structure is covered by the analysis of nonlinear/nonconvex gradient descent. This is classical and can be found throughout the literature, but it is limited to guarantees of convergence to critical points [46, 47]. For phase retrieval it is not known how to guarantee that all critical points are global minima, though this is a topic of intense interest at the moment.
Although, in general, averaged projections has a slower convergence rate than its sequential counterpart [48], there are two features that recommend this method. First, it can be run in parallel. Secondly, it appears to be more robust to problem inconsistency. Indeed, the Averaged Projections algorithm is equivalent to gradient-based schemes when applied to an adequate smooth and nonconvex objective function. This well-known fact goes back to [49] when the sets \(\varOmega _{j}\), \(j = 0 , 1 , \ldots , m\), are closed and convex. In particular, two very prevalent schemes are in fact equivalent to AvP.
The objective in (6.45) is as nice as one could hope for: it has full domain, is smooth and nonnegative and has the value zero at points of intersection. These kinds of models are for obvious reasons favored in applications; unfortunately, these reasons are a little old-fashioned considering today’s mathematical technology for dealing with nonsmooth objectives like (6.33).
Problem (6.48) always has an optimal solution (the objective is continuous and the constraint is a closed and bounded set, so by a theorem of Weierstrass the minimum is attained). The optimization problem (6.48) consists of constraint sets which are separable over the variables \(u_{j}\), \(j = 0 , 1 , \ldots , m\); this can be exploited to divide the optimization problem into a sequence of easier subproblems. Alternating Minimization (AM) does just this, and involves updating each variable sequentially:
Algorithm 6.2.11
(Alternating Minimization—AM)
Initialization. Choose \(\left( y^{0} , u_{0}^{0} , u_{1}^{0} , \ldots , u_{m}^{0}\right) \in \left( \mathbb {C}^{n}\right) ^{m + 2}\).
 1. Update $$\begin{aligned} y^{k + 1} = {{\,\mathrm{argmin\,}\,}}_{y \in \mathbb {C}^{n}} \sum _{j = 0}^{m} \frac{1}{2}{\Vert }y - u_{j}^{k}{\Vert }^{2} = \frac{1}{m + 1} \sum _{j = 0}^{m} u_{j}^{k}. \end{aligned}$$ (6.49)
 2. For all \(j = 0 , 1 , \ldots , m\) update (in parallel) $$\begin{aligned} u_{j}^{k + 1} \in {{\,\mathrm{argmin\,}\,}}_{u_{j} \in \varOmega _{j}} \frac{1}{2}{\Vert }u_{j} - y^{k + 1}{\Vert }^{2} = {\mathcal P}_{\varOmega _{j}}y^{k + 1}. \end{aligned}$$ (6.50)
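The two steps translate almost line for line into code. The sketch below uses two disjoint intervals on the real line as hypothetical sets, so that the behavior on an inconsistent problem is visible; the function names are illustrative.

```python
def clip(lo, hi):
    """Return the projector onto the interval [lo, hi]."""
    return lambda t: min(max(t, lo), hi)


def alternating_minimization(projectors, u0, n_iter=20):
    """Alternate between averaging the u_j and projecting the average."""
    u = list(u0)
    y = None
    for _ in range(n_iter):
        # Step 1: the minimizer over y is the average of the u_j.
        y = sum(u) / len(u)
        # Step 2: each u_j is the projection of y onto its own set.
        u = [P(y) for P in projectors]
    return y, u


# Two sets with empty intersection: [0, 1] and [2, 3].
y, u = alternating_minimization([clip(0.0, 1.0), clip(2.0, 3.0)], [0.0, 0.0])
```

Here y settles at the midpoint of the gap between the sets and the \(u_{j}\) at the nearest points of each set, which is a minimizer of the sum of squared distances even though the feasibility problem has no solution.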
Algorithm 6.2.12
(AvP\(^{\mathbf {2}}\))
Initialization. Choose any \(y^{0} \in \mathbb {C}^{n}\) and \(\rho _{j} > 0\), \(j = 0 , 1 , \ldots , m\). Compute \(u_{j}^{1} \in {\mathcal P}_{\varOmega _{j}}\left( y^{0}\right) \) (\(j = 0 , 1 , \ldots , m)\) and \(y^{1} \equiv \left( 1/\left( m + 1\right) \right) \sum _{j = 0}^{m} u_{j}^{1}\).
 Compute $$\begin{aligned} y^{k + 1} = \frac{1}{m}\sum _{j = 1}^{m} \left( u_{j}^{k} + \frac{1}{\rho _{j}}\left( y^{k} - y^{k - 1}\right) \right) . \end{aligned}$$ (6.55)
 For each \(j = 1 , 2 , \ldots , m\), compute $$\begin{aligned} u_{j}^{k + 1} = {\mathcal P}_{\varOmega _{j}}\left( u_{j}^{k} + \frac{1}{\rho _{j}}\left( 2y^{k} - y^{k - 1}\right) \right) . \end{aligned}$$ (6.56)
This algorithm can be viewed as a smoothed/relaxed version of Algorithm 6.2.4, but, when you look at it for the first time, the most obvious thing that jumps out at you is that this is averaged projections with a two-step recursion. This is why it has been called AvP\(^2\) in [25].
The more general PHeBIE Algorithm 6.2.13 applied to the problem of blind ptychography [54] reduces to the Averaged Projections Algorithm for phase retrieval when the illuminating field is known. To derive this method, note that for any fixed y and \(\mathbf{{u}}\), the function \(z \mapsto \mathbf{{F}}\left( z, y , \mathbf{{u}}\right) \) given by (6.21) is continuously differentiable and its partial gradient, \(\nabla _{z} \mathbf{{F}}\left( z, y , \mathbf{{u}}\right) \), is Lipschitz continuous with modulus \(L_{z}\left( y , \mathbf{{u}}\right) \). The same assumption holds for the function \(y \mapsto \mathbf{{F}}\left( z, y , \mathbf{{u}}\right) \) when \(z\) and \(\mathbf{{u}}\) are fixed. In this case, the Lipschitz modulus is denoted by \(L_{y}\left( z, \mathbf{{u}}\right) \). Define \(L_{z}'\left( y , \mathbf{{u}}\right) \equiv \max \left\{ L_{z}\left( y , \mathbf{{u}}\right) , \eta _{z} \right\} \) where \(\eta _{z}\) is an arbitrary positive number. Similarly define \(L_{y}'\left( z, \mathbf{{u}}\right) \equiv \max \left\{ L_{y}\left( z, \mathbf{{u}}\right) , \eta _{y} \right\} \) where \(\eta _{y}\) is an arbitrary positive number. The constants \(\eta _{z}\) and \(\eta _{y}\) are used to address the following issue: if the Lipschitz constants \(L_{z}\left( y , \mathbf{{u}}\right) \) and/or \(L_{y}\left( z, \mathbf{{u}}\right) \) are zero then one should replace them with positive numbers (for the sake of well-definedness of the algorithm). In practice, it is better to choose them to be small numbers, but for the analysis they can be chosen arbitrarily.
Algorithm 6.2.13
(Proximal Heterogeneous Block Implicit-Explicit)
Initialization. Choose \(\alpha , \beta > 1\), \(\gamma > 0\) and \(\left( z^{0} , y^{0} , \mathbf{{u}}^{0}\right) \in X \times O \times M\).
 1. Set \(\alpha ^{k} = \alpha L_{z}'\left( y^{k} , \mathbf{{u}}^{k}\right) \) and select $$\begin{aligned} z^{k + 1} \in {{\,\mathrm{argmin\,}\,}}_{z\in X} \left\{ \left\langle z - z^{k},~ \nabla _{z} \mathbf{{F}}\left( z^{k} , y^{k} , \mathbf{{u}}^{k}\right) \right\rangle + \frac{\alpha ^{k}}{2}{\Vert }z - z^{k}{\Vert }^{2} \right\} , \end{aligned}$$ (6.57)
 2. Set \(\beta ^{k} = \beta L_{y}'\left( z^{k + 1} , \mathbf{{u}}^{k}\right) \) and select $$\begin{aligned} y^{k + 1} \in {{\,\mathrm{argmin\,}\,}}_{y \in O} \left\{ \left\langle y - y^{k},~ \nabla _{y} \mathbf{{F}}\left( z^{k + 1} , y^{k} , \mathbf{{u}}^{k}\right) \right\rangle + \frac{\beta ^{k}}{2}{\Vert }y - y^{k}{\Vert }^{2} \right\} , \end{aligned}$$ (6.58)
 3. Select $$\begin{aligned} \mathbf{{u}}^{k + 1} \in {{\,\mathrm{argmin\,}\,}}_{\mathbf{{u}}\in M} \left\{ \mathbf{{F}}\left( z^{k + 1} , y^{k + 1} , \mathbf{{u}}\right) + \frac{\gamma }{2}{\Vert }\mathbf{{u}} - \mathbf{{u}}^{k}{\Vert }^{2} \right\} . \end{aligned}$$ (6.59)
Algorithm 6.2.13, referred to as PHeBIE in Sect. 6.3.3, can be interpreted as a combination of the algorithm proposed in [55] and a slight generalization of the PALM Algorithm [47]. In the context of blind ptychography (6.4), the block of variables y is replaced with the object \(\psi \) and the function \(\mathbf{{F}}\) is the least squares objective (6.21). A partially preconditioned version of PALM was studied in [56] for phase retrieval, with improved performance over PALM. The regularization parameters \(\alpha ^{k}\) and \(\beta ^{k}\), \(k =0,1,2,\dots \), are discussed in [54]. These parameters are inversely proportional to the step size in Steps (6.57) and (6.58) of the algorithm. Since \(\alpha ^{k}\) and \(\beta ^{k}\), \(k =0,1,2,\dots \), are directly proportional to the respective partial Lipschitz moduli, the larger the partial Lipschitz moduli the smaller the step size, and hence the slower the algorithm progresses.
This brings to light an advantage of blocking strategies that is discussed in Chap. 12: algorithms that exploit block structures inherent in the objective function achieve better numerical performance by taking heterogeneous step sizes optimized for the separate blocks. There is, however, a price to be paid in the blocking strategies that are explored here: namely, they result in procedures that pass sequentially between operations on the blocks, and as such are not immediately parallelizable. The ptychography application is very generous in that it permits parallel computations on highly segmented blocks.
Other methods based on the quartic objective have gained popularity in the newer generation of phase retrieval studies in the applied mathematics community. Notable among these are methods called Wirtinger flow. Smoothness makes the analysis easier, but the quartic objective has almost no curvature around critical points, which makes convergence of first order methods much slower than first order methods applied to nonsmooth objectives. See [14, Sect. 5.2] for a discussion of this and [25] for numerical comparisons.
Algorithm 6.2.14
(Dynamically Reweighted Averaged Projections)
Initialization. Choose \(u^{0} \in \mathbb {C}^{n}\) and \(c > 0\).
The smoothness of the sum of squared distances (almost everywhere) opens the door to higher-order techniques from nonlinear optimization that accelerate the basic gradient descent method. Quasi-Newton methods, for instance, would do the trick, and as observed in [61], they work unexpectedly well even on nonsmooth problems.
Algorithm 6.2.15
(Limited Memory BFGS with Trust Region)
 1. (Initialization) Choose \(\tilde{\eta } > 0\), \(\zeta > 0\), \(\overline{\ell } \in \{ 1 , 2 ,\ldots , n \}\), \(u^{0} \in \mathbb {C}^{n}\), and set \(\nu = \ell = 0\). Compute \(\nabla f\left( u^{0}\right) \) and \({\Vert }\nabla f\left( u^{0}\right) {\Vert }\) for $$\begin{aligned} f\left( u\right) \equiv \frac{1}{2\left( m + 1\right) }\sum _{j = 0}^{m} {{\,\mathrm{dist}\,}}^{2}\left( u , \varOmega _{j}\right) , \quad \nabla f\left( u\right) \equiv \frac{1}{m + 1}\sum _{j = 0}^{m} \left( {{\,\mathrm{Id}\,}} - {\mathcal P}_{\varOmega _{j}}\right) \left( u\right) . \end{aligned}$$
 2. (L-BFGS step) For each \(k = 0 , 1 , 2 , \ldots \), if \(\ell = 0\) compute \(u^{k + 1}\) by some line search algorithm; otherwise compute $$\begin{aligned} s^{k} = -\left( M^{k}\right) ^{-1}\nabla f\left( u^{k}\right) , \end{aligned}$$ where \(M^{k}\) is the L-BFGS update [62], \(u^{k + 1} = u^{k} + s^{k}\), \(f\left( u^{k + 1}\right) \), and the predicted change (see, for instance [63]).
 3. (Trust Region) If \(\rho \left( s^{k}\right) < \tilde{\eta }\), where $$\begin{aligned} \rho \left( s^{k}\right) = \frac{\text{actual change at step } k}{\text{predicted change at step } k}, \end{aligned}$$ reduce the trust region \(\Delta ^{k}\), solve the trust region subproblem for a new step \(s^{k}\) [64], and return to the beginning of Step 2. If \(\rho \left( s^{k}\right) \ge \tilde{\eta }\) compute \(u^{k + 1} = u^{k} + s^{k}\) and \(f\left( u^{k + 1}\right) \).
 4. (Update) Compute \(\nabla f\left( u^{k + 1}\right) \), \({\Vert }\nabla f\left( u^{k + 1}\right) {\Vert }\), $$\begin{aligned} y^{k} \equiv \nabla f\left( u^{k + 1}\right) - \nabla f\left( u^{k}\right) , \quad s^{k}\equiv u^{k + 1} - u^{k}, \end{aligned}$$ and \({s^{k}}^{T}y^{k}\). If \({s^{k}}^{T}y^{k} \le \zeta \), discard the vector pair \(\{s^{k - \ell } , y^{k - \ell }\}\) from storage, set \(\ell = \max \{\ell - 1 , 0 \}\), \(\Delta ^{k + 1} = \infty \), \(\mu ^{k + 1} = \mu ^{k}\) and \(M^{k + 1} = M^{k}\) (i.e. shrink the memory and don’t update); otherwise set \(\mu ^{k + 1} = \frac{{y^{k}}^{T}y^{k}}{{s^{k}}^{T}y^{k}}\) and \(\Delta ^{k + 1} = \infty \), add the vector pair \(\{ s^{k} , y^{k} \}\) to storage, and, if \(\ell = \overline{\ell }\), discard the vector pair \(\{ s^{k - \ell } , y^{k - \ell } \}\) from storage. Update the Hessian approximation \(M^{k + 1}\) [62]. Set \(\ell = \min \{ \ell + 1 , \overline{\ell } \}\), \(\nu = \nu + 1\) and return to Step 1.
This looks complicated but is standard in nonlinear optimization. Convergence is still unexplained for the limited memory implementation.
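The objective and gradient used in the initialization step can be checked numerically on a small example. The sketch below assumes real intervals for the sets \(\varOmega _{j}\) (a hypothetical choice), compares the gradient formula with a central finite difference, and confirms that a gradient step of length one is the same as one step of averaged projections.

```python
def clip(lo, hi):
    """Return the projector onto the interval [lo, hi]."""
    return lambda t: min(max(t, lo), hi)


def f(u, projectors):
    """Scaled sum of squared distances; len(projectors) plays the role of m + 1."""
    return sum((u - P(u)) ** 2 for P in projectors) / (2.0 * len(projectors))


def grad_f(u, projectors):
    """Average of (Id - P_j)(u) over the sets."""
    return sum(u - P(u) for P in projectors) / len(projectors)


projectors = [clip(0.0, 1.0), clip(2.0, 3.0)]
u = 0.25
h = 1e-6

# Central finite difference approximation of the derivative of f at u.
fd = (f(u + h, projectors) - f(u - h, projectors)) / (2.0 * h)
g = grad_f(u, projectors)

# One averaged projections step, for comparison with u - g.
avp = sum(P(u) for P in projectors) / len(projectors)
```

Any quasi-Newton or trust region code that accepts a function and its gradient could be driven with `f` and `grad_f` in this form; the limited memory bookkeeping of Algorithm 6.2.15 is not reproduced here.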
6.3 ProxToolbox—A Platform for Creative Hacking

- data transfer
- sharing data processing algorithms
- comparing the performance of different algorithmic approaches
- teaching
- innovation.
The ProxToolbox has been used within the Collaborative Research Center Nanoscale Photonic Imaging (SFB 755) at the University of Göttingen for each of the points above. It is written to be able to incorporate new problems, data, and algorithms without abandoning the old knowledge. This type of built-in knowledge retention requires a structure that is burdensome for single-purpose users. Most colleagues and students prefer to cannibalize the ProxToolbox, and hacking is positively encouraged. This tutorial and the demos in the toolbox are intended to put the user on a fast track to successfully disassembling and repurposing the basic elements.
Our presentation of the toolbox here is without specific reference to commands and code to prevent this tutorial from being outdated within a few months. Certain aspects of the code will change as new applications and new features get added to the toolbox, but what will not change is the compartmentalization of various mathematically and computationally distinct tools.

Nanoscale_Photonic_Imaging_demos
Algorithms
ProxOperators
Drivers/Problems
    \(\dots \)
    Phase
        Demos
        DataProcessors
        ProxOperators
    Ptychography
        Demos
        DataProcessors
        ProxOperators
    \(\dots \)
Utilities
InputData
Documentation
The Algorithms folder. This folder contains a general algorithm wrapper that loops through the iterations, calling the desired algorithm. This is exactly T in (6.26), and the \(T_{*}\) indicates which specific algorithm is run, from Algorithm 6.2.1 through Algorithm 6.2.9. After the specific fixed point operator is applied, a specialized iterate monitor is called; which monitor is used depends on both the problem and the algorithm being run. The default is a generic iterate monitor that merely checks the distance between successive iterates, and the default stopping criterion for the fixed point iteration is that the step between successive iterates falls below a tolerance given by the user. For some algorithms and some problems, however, this may not be the best or most informative measure of the progress of the iteration. For instance, if the problem is a feasibility problem (6.32), then the feasibility iterate monitor computes not only the difference between successive iterates, but also the distance between the sets (the gap) at a given iterate. In this context, a reasonable basis for comparing algorithms is not the step size but the gap that each algorithm achieves. If one is running a Douglas-Rachford-type algorithm on a feasibility problem, then as explained above, the iterates themselves don’t have to converge, but their shadows, defined as the projections of the iterates onto one of the sets, will give a good indication of convergence of some form. Still other algorithms, like ADMM 6.2.12, generate several sequences of iterates (three in the case of ADMM), only two of which converge nicely when everything goes well [65]. As much as possible, the iterate monitoring is automated so that the user does not have to bother with this. But users who are interested in algorithm development will want to pay close attention to this.
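To make the division of labor between the wrapper and the iterate monitor concrete, here is a minimal sketch in Python. It is not the ProxToolbox code: the names `run`, `generic_monitor` and `feasibility_monitor`, the two toy sets (a horizontal line and the unit circle in the plane, which do not intersect), and the definition of the gap as the distance between a point of the first set and its projection onto the second are all assumptions made for illustration.

```python
import numpy as np

def project_A(u):
    """Projection onto the horizontal line A = {(x, 2)} in R^2."""
    return np.array([u[0], 2.0])

def project_B(u):
    """A projection onto the unit circle B (nonconvex); the origin is sent to (1, 0)."""
    norm = np.linalg.norm(u)
    return u / norm if norm > 0 else np.array([1.0, 0.0])

def alternating_projections(u):
    """One application of the fixed point operator T = P_A P_B."""
    return project_A(project_B(u))

def generic_monitor(u_old, u_new):
    """Default monitor: only the step between successive iterates."""
    return {"step": float(np.linalg.norm(u_new - u_old))}

def feasibility_monitor(u_old, u_new):
    """Feasibility monitor: the step and, in addition, the gap between the sets."""
    stats = generic_monitor(u_old, u_new)
    a = project_A(u_new)
    stats["gap"] = float(np.linalg.norm(a - project_B(a)))
    return stats

def run(T, u0, monitor, tol=1e-10, max_iter=500):
    """Generic wrapper: apply T, call the monitor, stop when the step is below tol."""
    u, history = np.asarray(u0, dtype=float), []
    for _ in range(max_iter):
        u_new = T(u)
        history.append(monitor(u, u_new))
        u = u_new
        if history[-1]["step"] < tol:
            break
    return u, history

u, history = run(alternating_projections, [3.0, 2.0], feasibility_monitor)
```

In this toy problem the sets are a distance 1 apart, so the step shrinks to zero while the recorded gap settles near 1. The gap, not the step, is the quantity that says something about the quality of the computed point.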
The ProxOperators folder. Some prox operators are generic, like the projector onto the diagonal of a product space \({\mathcal I}\) (otherwise known as averaging), the prox of the \(\ell _1\)-norm (soft thresholding), or the prox of the \(\ell _0\)-function (hard thresholding). Such general prox operators are stored here. They always map an input to another point in the same space, but how they do this depends on the structure of the input u (array, dimension, etc.). Some problems, like Sudoku, involve prox mappings that are specific to that problem. These specific prox mappings are stored under the Drivers/Problems folder.
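The three generic operators named above are short enough to write out. The following Python sketch assumes real-valued NumPy arrays and the convention \({\text {prox}}_{\tau g}(u) = {\text {argmin}}_x\, \tau g(x) + \tfrac{1}{2}\Vert x - u\Vert ^2\). The function names are hypothetical and are not the ones used in the toolbox.

```python
import numpy as np

def prox_l1(u, tau):
    """Soft thresholding: prox of tau * ||.||_1, applied entrywise."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def prox_l0(u, tau):
    """Hard thresholding: a prox of tau * ||.||_0 (the number of nonzero entries).

    An entry is kept when keeping it costs less than zeroing it,
    that is, when u_i**2 / 2 > tau.
    """
    return np.where(np.abs(u) > np.sqrt(2.0 * tau), u, 0.0)

def project_diagonal(blocks):
    """Projection onto the diagonal of a product space (averaging).

    blocks has shape (m, ...): m copies of the same space. Every block is
    replaced by the average of all blocks.
    """
    blocks = np.asarray(blocks, dtype=float)
    mean = blocks.mean(axis=0)
    return np.broadcast_to(mean, blocks.shape).copy()

u = np.array([3.0, -0.5, 1.5])
soft = prox_l1(u, 1.0)   # shrinks every entry toward zero by 1
hard = prox_l0(u, 1.0)   # keeps entries larger than sqrt(2) in magnitude
avg = project_diagonal([[1.0, 2.0], [3.0, 4.0]])
```

All three map an array to an array of the same shape, which is the only structural requirement the text places on a generic prox operator.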
The Drivers/Problems folder. This is the folder where the specific problem instances are stored. The problems that are of interest for this chapter are phase and ptychography, though there are other problems, like computed tomography, sensor localization and Sudoku. The Phase subfolder contains a general problem family handler called “phase”. Since all phase problems have similar features, this problem handler makes sure that all the inputs and outputs are processed in the same way. The toolbox works through input files stored in the demos subfolder. The input files contain names of data sets, data processors, algorithm names and parameters, and other user defined parameters like stopping tolerances, output choices and so forth. The input files might be augmented by a graphical user interface in the future. The link between the experimentalist and the mathematician is through the data processor. The data processor for the Göttingen data sets is easily identified and contains all the required parameter values for specific experiments conducted at the Institute for Physics in Göttingen. The data that the processor manipulates is not contained in the ProxToolbox release, but is stored separately and must be downloaded from the links provided on the ProxToolbox homepage. Prox operators specific to phase retrieval, such as the projection onto the intensity data (6.11), are also stored at this level.
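As an illustration of a problem-specific prox operator, the sketch below implements a standard magnitude projection of the kind used in phase retrieval. It is written under assumptions and is not a transcription of (6.11): the propagator is taken to be the unitary discrete Fourier transform as a stand-in for whatever mapping the experiment calls for, and the name `project_magnitude` and the tolerance `eps` are hypothetical.

```python
import numpy as np

def project_magnitude(u, measured_amplitude, eps=1e-12):
    """Replace the Fourier amplitude of u by the measured one, keeping the phase.

    measured_amplitude is the square root of the measured intensity.
    Where the propagated field (numerically) vanishes, its phase is
    undefined; this sketch picks phase 1 there, which is one valid selection.
    """
    U = np.fft.fft2(u, norm="ortho")
    mag = np.abs(U)
    phase = np.where(mag > eps, U / np.maximum(mag, eps), 1.0)
    return np.fft.ifft2(measured_amplitude * phase, norm="ortho")

rng = np.random.default_rng(0)
truth = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(8, 8)))
amplitude = np.abs(np.fft.fft2(truth, norm="ortho"))
guess = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
projected = project_magnitude(guess, amplitude)
```

After the projection the Fourier amplitude of `projected` matches `amplitude`, and applying the operator a second time changes nothing.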
The InputData folder. The data, which is intended to be stored or linked in the directory “InputData”, is not included in the software toolbox in the interest of portability. This tutorial will only cover demonstrations with the Phase datasets and the Ptychography datasets. As these sets grow and develop, the links may change to reflect different hosts.
6.3.1 Coffee Break
6.3.2 Star Power
The next demonstration is of the reconstruction of a test object (the Siemens star) from near-field X-ray data provided by Tim Salditt’s laboratory at the Institute for X-ray Physics at the University of Göttingen. Here the structured illumination shown in Fig. 6.2a left is modeled by \({\mathcal D}_{j}\), \(j = 1 , 2 , \ldots , m\), in problem (6.3) with \(m = 1\). The image shown in Fig. 6.2a right is in the near field, so the mapping \({\mathcal D}_F\) in problem (6.2) is the near-field Fresnel transform [66]. In the model (6.3) this image is represented by \(I_{ij}\), \(j = 1 , 2 , \ldots , m\), with \(m = 1\). The qualitative constraint is that the object is a pure phase object, that is, the field in the object domain has amplitude 1 at each pixel.
A reconstruction with this data that does not take noise into account is shown in Fig. 6.3. What is remarkable here is that if one only looks at the convergence of the algorithms and judges by the achieved gap before termination, it appears that the quasi-Newton accelerated average projections algorithm (QNAvP) is clearly the best (Fig. 6.3b–c). But when you look at the reconstructions (Fig. 6.3a), the QNAvP reconstruction is the worst. The problem here is that the noise has also been faithfully recovered.
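The pure phase constraint has a simple projector: each pixel is moved to the nearest point on the unit circle in the complex plane. A minimal Python sketch follows. The function name is hypothetical, and the choice of the value 1 for pixels that are exactly zero is an assumption, since every point of the unit circle is equally near in that case.

```python
import numpy as np

def project_phase_object(u):
    """Project a complex field onto the set of fields with amplitude 1 at each pixel."""
    u = np.asarray(u, dtype=complex)
    mag = np.abs(u)
    out = np.ones_like(u)           # pixels that are exactly zero are sent to 1
    nonzero = mag > 0
    out[nonzero] = u[nonzero] / mag[nonzero]
    return out

field = np.array([[2j, -3.0], [0.0, 1.0 + 1.0j]])
phase_only = project_phase_object(field)
```

The phase of every nonzero pixel is preserved, and only the amplitude is reset to 1.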
6.3.3 E Pluribus Unum
6.4 Last Word
One of the most rewarding things about participating in the Nanoscale Photonic Imaging Collaborative Research Center at the University of Göttingen has been working with scientists from different disciplines with different sensibilities and intuition. Collaboration starts with mutual respect and an openness to new ways of thinking about things. This has resulted in better mathematics and better science, both grounded in real-world experience but with an attention to abstract structures. This has forced the examination of aspects of abstract models that, at first glance, don’t seem that important, but turn out to be decisive in practice.
Footnotes
 1.
The issue of whether to represent the vectors as points in \(\mathbb {R}^{2n}\) or points in \(\mathbb {C}^{n}\) is notational. The representation as points in \(\mathbb {C}^n\) is more convenient for the purpose of explanation, but on a computer you will need to work with \(\mathbb {R}^{2n}\).
 2.
This is a minority opinion, but it seems to be the height of hubris to think that an empirical observation is an approximation to a theoretical model rather than the other way around. The only thing that is indisputable is that the instrument behavior and the predicted behavior don’t match up as well as desired.
 3.
The names for these algorithms have evolved since their first introduction. In [19] the procedure that is today known as the Douglas-Rachford algorithm was called the Averaged Alternating Reflections algorithm, which then explains the genesis of the name RAAR for (6.22). Since Douglas-Rachford is more or less the accepted name for (6.15), (6.22) is called DR\(\lambda \) in more recent mathematical articles. Nevertheless, RAAR is more common in the physics literature, so that is the nomenclature used here. In the ProxToolbox, however, the DR\(\lambda \) nomenclature is used.
 4.
(6.24) corrects a sign error in the lower half of [21, Eq (14)].
 5.
In case you forgot, it’s 42.
Acknowledgements
This work was supported by DFG grant SFB755.
References
 1.Krantz, S.G.: Mathematical Apocrypha Redux. Mathematical Association of America, UK edition (2006)
 2.Courant, R., Hilbert, D.: Methods of Mathematical Physics. Interscience Publishers, New York (1953)
 3.Luke, D.R.: Proxtoolbox. http://num.math.uni-goettingen.de/proxtoolbox/ (2017)
 4.Hegerl, R., Hoppe, W.: Dynamische Theorie der Kristallstrukturanalyse durch Elektronenbeugung im inhomogenen Primärstrahlwellenfeld. Ber. Bunsenges. Phys. Chem. 74(11), 1148–1154 (1970)
 5.Maiden, A.M., Rodenburg, J.M.: An improved ptychographical phase retrieval algorithm for diffractive imaging. Ultramicroscopy 109, 1256–1262 (2009)
 6.Rodenburg, J.M., Bates, R.H.T.: The theory of super-resolution electron microscopy via Wigner-distribution deconvolution. Philos. Trans. R. Soc. Lond. Ser. A 339, 521–553 (1992)
 7.Rodenburg, J.M., Hurst, A.C., Cullis, A.G., Dobson, B.R., Pfeiffer, F., Bunk, O., David, C., Jefimovs, K., Johnson, I.: Hard-X-ray lensless imaging of extended objects. Phys. Rev. Lett. 98, 034801 (2007)
 8.Thibault, P., Dierolf, M., Bunk, O., Menzel, A., Pfeiffer, F.: Probe retrieval in ptychographic coherent diffractive imaging. Ultramicroscopy 109, 338–343 (2009)
 9.Robisch, A.L., Salditt, T.: Phase retrieval for object and probe using a series of defocus near-field images. Opt. Express 21(20), 23345–23357 (2013)
 10.Gonsalves, R.A.: Phase retrieval and diversity in adaptive optics. Opt. Eng. 21(5), 829–832 (1982)
 11.Robisch, A.L., Kröger, K., Rack, A., Salditt, T.: Near-field ptychography using lateral and longitudinal shifts. New J. Phys. 17(7), 073033 (2015)
 12.Robisch, A.L.: Phase retrieval for object and probe in the optical near-field. Ph.D. thesis, Universität Göttingen (2016)
 13.Fienup, J.R.: Phase retrieval algorithms: a comparison. Appl. Opt. 21(15), 2758–2769 (1982)
 14.Luke, D.R., Burke, J.V., Lyon, R.G.: Optical wavefront reconstruction: Theory and numerical methods. SIAM Rev. 44, 169–224 (2002)
 15.Bauschke, H.H., Combettes, P.L., Luke, D.R.: A hybrid projection reflection method for phase retrieval. J. Opt. Soc. Amer. A 20(6), 1025–1034 (2003)
 16.Bauschke, H.H., Combettes, P.L., Luke, D.R.: Phase retrieval, error reduction algorithm and Fienup variants: a view from convex feasibility. J. Opt. Soc. Amer. A 19(7), 1334–1345 (2002)
 17.Douglas Jr., J., Rachford Jr., H.H.: On the numerical solution of heat conduction problems in two or three space variables. Trans. Amer. Math. Soc. 82, 421–439 (1956)
 18.Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
 19.Bauschke, H.H., Combettes, P.L., Luke, D.R.: Finding best approximation pairs relative to two closed convex sets in Hilbert spaces. J. Approx. Theory 127, 178–192 (2004)
 20.Miao, J., Charalambous, P., Kirz, J., Sayre, D.: Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens. Nature 400, 342–344 (1999)
 21.Luke, D.R.: Relaxed averaged alternating reflections for diffraction imaging. Inverse Probl. 21, 37–50 (2005)
 22.Luke, D.R.: Finding best approximation pairs relative to a convex and a prox-regular set in Hilbert space. SIAM J. Optim. 19(2), 714–739 (2008)
 23.Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace Hilbertien. Comptes Rendus de l’Académie des Sciences de Paris 255, 2897–2899 (1962)
 24.Adams, D.: The Hitchhikers Guide to the Galaxy. Pan Books, New York (1980)
 25.Luke, D.R., Sabach, S., Teboulle, M.: Optimization on spheres: models and proximal algorithms with computational performance comparisons. SIAM J. Math. Data Sci. 1(3), 408–445 (2019)
 26.Gerchberg, R.W., Saxton, W.O.: A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik 35, 237–246 (1972)
 27.Censor, Y.: Row-action methods for huge and sparse systems and their applications. SIAM Rev. 23, 444–466 (1981)
 28.Censor, Y., Cegielski, A.: Projection methods: an annotated bibliography of books and reviews. Optimization 64, 2343–2358 (2015). https://doi.org/10.1080/02331934.2014.957701
 29.Censor, Y., Zaknoon, M.: Algorithms and convergence results of projection methods for inconsistent feasibility problems: a review. Pure Appl. Funct. Anal. (2019). https://arxiv.org/abs/1802.07529 (to appear)
 30.Luke, D.R., Thao, N.H., Tam, M.K.: Quantitative convergence analysis of iterated expansive, set-valued mappings. Math. Oper. Res. 43(4), 1143–1176 (2018). https://doi.org/10.1287/moor.2017.0898
 31.Borwein, J.M., Tam, M.K.: A cyclic Douglas-Rachford iteration scheme. J. Optim. Theory Appl. 160(1), 1–29 (2014)
 32.Borwein, J.M., Tam, M.K.: The cyclic Douglas-Rachford method for inconsistent feasibility problems. J. Nonlinear Convex Anal. 16(4), 537–584 (2015)
 33.Bauschke, H.H., Noll, D., Phan, H.M.: Linear and strong convergence of algorithms involving averaged nonexpansive operators. J. Math. Anal. Appl. 421(1), 1–20 (2015)
 34.Luke, D.R., Martins, A.L., Tam, M.K.: Relaxed cyclic Douglas-Rachford algorithms for nonconvex optimization. In: ICML Workshop: Modern Trends in Nonconvex Optimization for Machine Learning (2018). https://sites.google.com/view/icml2018nonconvex/papers
 35.Gabay, D.: Augmented Lagrangian Methods: Applications to the Solution of Boundary Value Problems. In: Applications of the Method of Multipliers to Variational Inequalities, pp. 299–331. North-Holland, Amsterdam (1983)
 36.Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. Revue Française d’Automatique, Informatique et Recherche Opérationnelle 9(R2), 41–76 (1975)
 37.Bolte, J., Sabach, S., Teboulle, M.: Nonconvex Lagrangian-based optimization: monitoring schemes and global convergence. Math. Oper. Res. (2018). https://doi.org/10.1287/moor.2017.0900
 38.Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)
 39.Themelis, A., Stella, L., Patrinos, P.: Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM J. Optim. 28, 2274–2303 (2018)
 40.Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1), 269–297 (2014)
 41.Liang, J., Stoica, P., Jing, Y., Li, J.: Phase retrieval via the alternating direction method of multipliers. IEEE Signal Process. Lett. 25(1), 5–9 (2018)
 42.Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542
 43.Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
 44.Pauwels, E.J.R., Beck, A., Eldar, Y.C., Sabach, S.: On Fienup methods for sparse phase retrieval. IEEE Trans. Signal Process. 66(4) (2018)
 45.Thao, N.H.: A convergent relaxation of the Douglas-Rachford algorithm. Comput. Optim. Appl. (2018). https://doi.org/10.1007/s10589-018-9989-y
 46.Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137, 91–129 (2013)
 47.Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014). https://doi.org/10.1007/s10107-013-0701-9
 48.Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence of alternating and averaged projections. Found. Comput. Math. 9(4), 485–513 (2009)
 49.Zarantonello, E.H.: Projections on convex sets in Hilbert space and spectral theory. In: Zarantonello, E.H. (ed.) Contributions to Nonlinear Functional Analysis, pp. 237–424. Academic Press, New York (1971)
 50.Luke, D.R., Martins, A.L.: Convergence Analysis of the Relaxed Douglas-Rachford Algorithm. SIAM J. Optim. https://arxiv.org/abs/1811.11590 (to appear)
 51.Poliquin, R.A., Rockafellar, R.T., Thibault, L.: Local differentiability of distance functions. Trans. Amer. Math. Soc. 352(11), 5231–5249 (2000)
 52.Fienup, J.R.: Reconstruction of an object from the modulus of its Fourier transform. Opt. Lett. 3(1), 27–29 (1978)
 53.Marchesini, S.: Phase retrieval and saddle-point optimization. J. Opt. Soc. Am. A 24(10) (2007)
 54.Hesse, R., Luke, D.R., Sabach, S., Tam, M.: The proximal heterogeneous block implicit-explicit method and application to blind ptychographic imaging. SIAM J. Imaging Sci. 8(1), 426–457 (2015)
 55.Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
 56.Chang, H., Marchesini, S., Lou, Y., Zeng, T.: Variational phase retrieval with globally convergent preconditioned proximal algorithm. SIAM J. Imaging Sci. 11(1), 56–93 (2018)
 57.Blumensath, T., Davies, M.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)
 58.Blumensath, T., Davies, M.: Normalised iterative hard thresholding; guaranteed stability and performance. IEEE J. Sel. Top. Signal Process. 4(2), 298–309 (2010)
 59.Hesse, R., Luke, D.R., Neumann, P.: Alternating projections and Douglas-Rachford for sparse affine feasibility. IEEE Trans. Signal. Process. 62(18), 4868–4881 (2014). https://doi.org/10.1109/TSP.2014.2339801
 60.Beck, A., Teboulle, M., Chikishev, Z.: Iterative minimization schemes for solving the single source localization problem. SIAM J. Optim. 19(3), 1397–1416 (2008)
 61.Lewis, A.S., Overton, M.L.: Nonsmooth optimization via quasi-Newton methods. Math. Program. 141, 135–163 (2013)
 62.Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63, 129–156 (1994)
 63.Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (1999)
 64.Burke, J.V., Wiegmann, A.: Low-dimensional quasi-Newton updating strategies for large-scale unconstrained optimization. Department of Mathematics, University of Washington (1996)
 65.Aspelmeier, T., Charitha, C., Luke, D.R.: Local linear convergence of the ADMM/Douglas-Rachford algorithms without strong convexity and application to statistical imaging. SIAM J. Imaging Sci. 9(2), 842–868 (2016)
 66.Hagemann, J., Robisch, A.L., Luke, D.R., Homann, C., Hohage, T., Cloetens, P., Suhonen, H., Salditt, T.: Reconstruction of wave front and object for inline holography from a set of detection planes. Opt. Express 22, 11552–11569 (2014)
 67.Luke, D.R.: Local linear convergence of approximate projections onto regularized sets. Nonlinear Anal. 75, 1531–1546 (2012). https://doi.org/10.1016/j.na.2011.08.027
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.