Skip to main content

A majorization–minimization algorithm for split feasibility problems

Abstract

The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, solution methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. We present an extension of the proximity function approach that generalizes the linear split feasibility problem to allow for non-linear mappings. Our algorithm is based on the principle of majorization–minimization, is amenable to quasi-Newton acceleration, and comes complete with convergence guarantees under mild assumptions. Furthermore, we show that the Euclidean norm appearing in the proximity function of the non-linear split feasibility problem can be replaced by arbitrary Bregman divergences. We explore several examples illustrating the merits of non-linear formulations over the linear case, with a focus on optimization for intensity-modulated radiation therapy.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

References

  1. Alber, M., Meedt, G., Nusslin, F., Reemtsen, R.: On the degeneracy of the IMRT optimization problem. Med. Phys. 29(11), 2584–2589 (2002)

    Article  Google Scholar 

  2. Antoniadis, A., Gijbels, I., Nikolova, M.: Penalized likelihood regression for generalized linear models with non-quadratic penalties. Ann. Inst. Stat. Math. 63(3), 585–615 (2011)

    MathSciNet  Article  Google Scholar 

  3. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and legendre functions in banach spaces. Commun. Contemp. Math. 3(04), 615–647 (2001)

    MathSciNet  Article  Google Scholar 

  4. Becker, M.P., Yang, I., Lange, K.: EM algorithms without missing data. Stat. Methods Med. Res. 6, 38–54 (1997)

    Article  Google Scholar 

  5. Biegler, L.T., Zavala, V.M.: Large-scale nonlinear programming using IPOPT: an integrating framework for enterprise-wide dynamic optimization. Comput. Chem. Eng. 33(3), 575–582 (2009)

    Article  Google Scholar 

  6. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

    MathSciNet  Article  Google Scholar 

  7. Byrne, C.L.: Iterative image reconstruction algorithms based on cross-entropy minimization. IEEE Trans. Image Process. 2(1), 96–103 (1993)

    Article  Google Scholar 

  8. Byrne, C.L.: Iterative projection onto convex sets using multiple Bregman distances. Inverse Probl. 15(5), 1295 (1999)

    MathSciNet  Article  Google Scholar 

  9. Byrne, C.L.: Iterative oblique projection onto convex sets and the split feasibility problem. Inverse Probl. 18(2), 441–453 (2002)

    MathSciNet  Article  Google Scholar 

  10. Byrne, C.L.: Sequential unconstrained minimization algorithms for constrained optimization. Inverse Probl. 24(1), 015013 (2008)

    MathSciNet  Article  Google Scholar 

  11. Byrne, C.L.: Alternating minimization as sequential unconstrained minimization: a survey. J. Optim. Theory Appl. 156(3), 554–566 (2013)

    MathSciNet  Article  Google Scholar 

  12. Byrne, C.L.: An elementary proof of convergence for the forward-backward splitting algorithm. J. Nonlinear Convex Anal. 15(4), 681–691 (2014)

    MathSciNet  MATH  Google Scholar 

  13. Cegielski, A.: General method for solving the split common fixed point problem. J. Optim. Theory Appl. 165(2), 385–404 (2015)

    MathSciNet  Article  Google Scholar 

  14. Censor, Y.: Row-action methods for huge and sparse systems and their applications. SIAM Rev. 23(4), 444–466 (1981)

    MathSciNet  Article  Google Scholar 

  15. Censor, Y.: Weak and strong superiorization: between feasibility-seeking and minimization. Analele Univ. Ovidius Constanta Ser. Mat. 23(3), 41–54 (2015)

    MathSciNet  Article  Google Scholar 

  16. Censor, Y., Bortfeld, T., Martin, B., Trofimov, A.: A unified approach for inversion problems in intensity-modulated radiation therapy. Phys. Med. Biol. 51(10), 2353 (2006)

    Article  Google Scholar 

  17. Censor, Y., Elfving, T.: A multiprojection algorithm using Bregman projections in a product space. Numer. Algorithms 8(2), 221–239 (1994)

    MathSciNet  Article  Google Scholar 

  18. Censor, Y., Elfving, T., Kopf, N., Bortfeld, T.: The multiple-sets split feasibility problem and its applications for inverse problems. Inverse Probl. 21(6), 2071–2084 (2005)

    MathSciNet  Article  Google Scholar 

  19. Censor, Y., Gibali, A., Reich, S.: Algorithms for the split variational inequality problem. Numer. Algorithms 59(2), 301–323 (2012)

    MathSciNet  Article  Google Scholar 

  20. Censor, Y., Motova, A., Segal, A.: Perturbed projections and subgradient projections for the multiple-sets split feasibility problem. J. Math. Anal. Appl. 327(2), 1244–1256 (2007)

    MathSciNet  Article  Google Scholar 

  21. Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press on Demand, Oxford (1997)

    MATH  Google Scholar 

  22. Chi, E.C., Lange, K.: A look at the generalized heron problem through the lens of majorization–minimization. Am. Math. Mon. 121(2), 95–108 (2014)

    MathSciNet  Article  Google Scholar 

  23. Chi, E.C., Zhou, H., Lange, K.: Distance majorization and its applications. Math. Program. Ser. A 146(1–2), 409–436 (2014)

    MathSciNet  Article  Google Scholar 

  24. Combettes, P.L.: Inconsistent signal feasibility problems: least-squares solutions in a product space. IEEE Trans. Signal Process. 42(11), 2955–2966 (1994)

    Article  Google Scholar 

  25. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53(5–6), 475–504 (2004)

    MathSciNet  Article  Google Scholar 

  26. Combettes, P.L., Bondon, P.: Hard-constrained inconsistent signal feasibility problems. IEEE Trans. Signal Process. 47(9), 2460–2468 (1999)

    Article  Google Scholar 

  27. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)

    MathSciNet  Article  Google Scholar 

  28. Craft, D., Bangert, M., Long, T., Papp, D., Unkelbach, J.: Shared data for intensity modulated radiation therapy (IMRT) optimization research: the CORT dataset. GigaSci. 3(1), 37 (2014)

    Article  Google Scholar 

  29. Davenport, M.A., Duarte, M.F., Eldar, Y.C., Kutyniok, G.: Chapter 1. Introduction to compressed sensing. Compressed Sensing: Theory and Applications. Cambridge University Press, Cambridge (2012)

    Google Scholar 

  30. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)

    MathSciNet  MATH  Google Scholar 

  31. Dingping, W., Qibin, D., Erli, W., Hang, Z.: The Split Feasibility Problem in Hilbert Space, pp. 1149–1154. Springer, Berlin (2013)

    Google Scholar 

  32. Ehrgott, M., Güler, c, Hamacher, H.W., Shao, L.: Mathematical optimization in intensity modulated radiation therapy. Ann. Oper. Res. 175(1), 309–365 (2010)

    MathSciNet  Article  Google Scholar 

  33. Fan, J., Lv, J.: A selective overview of variable selection in high dimensional feature space. Stat. Sin. 20(1), 101 (2010)

    MathSciNet  MATH  Google Scholar 

  34. Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)

    MathSciNet  Article  Google Scholar 

  35. Fiacco, A.V., McCormick, G.P.: Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Classics in Applied Mathematics. SIAM, PA (1990)

    Book  Google Scholar 

  36. Gibali, A., Küfer, K., Süss, P.: Successive linear programing approach for solving the nonlinear split feasibility problem. J. Nonlinear Convex Anal. 15(2), 345–353 (2014)

    MathSciNet  Google Scholar 

  37. Gibali, A., Küfer, K.-H., Reem, D., Süss, P.: A generalized projection-based scheme for solving convex constrained optimization problems. Comput. Optim. Appl. 70(3), 737–762 (2018)

    MathSciNet  Article  Google Scholar 

  38. Goldstein, T., Studer, C., Baraniuk, R.: A field guide to forward-backward splitting with a FASTA implementation. arXiv:1411.3406 [cs.NA] (2014)

  39. Gordon, R., Bender, R., Herman, G.T.: Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J. Theor. Biol. 29(3), 471–481 (1970)

    Article  Google Scholar 

  40. Hou, Q., Wang, J., Chen, Y., Galvin, J.M.: An optimization algorithm for intensity modulated radiotherapy-the simulated dynamics with dose-volume constraints. Med. Phys. 30(1), 61–68 (2003)

    Article  Google Scholar 

  41. Lange, K.: A gradient algorithm locally equivalent to the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 57(2), 425–437 (1995)

    MathSciNet  MATH  Google Scholar 

  42. Lange, K.: Numerical Analysis for Statisticians. Statistics and Computing, 2nd edn. Springer, New York (2010)

    Book  Google Scholar 

  43. Lange, K.: Optimization. Springer Texts in Statistics, 2nd edn. Springer, New York (2013)

    Google Scholar 

  44. Lange, K.: MM Optimization Algorithms. SIAM, PA (2016)

    Book  Google Scholar 

  45. Lange, K., Hunter, D.R., Yang, I.: Optimization transfer using surrogate objective functions (with discussion). J. Comput. Gr. Stat. 9, 1–20 (2000)

    Google Scholar 

  46. Li, Z., Han, D., Zhang, W.: A self-adaptive projection-type method for nonlinear multiple-sets split feasibility problem. Inverse Probl. Sci. Eng. 21(1), 155–170 (2013)

    MathSciNet  Article  Google Scholar 

  47. Llacer, J., Deasy, J.O., Bortfeld, T.R., Solberg, T.D., Promberger, C.: Absence of multiple local minima effects in intensity modulated optimization with dose-volume constraints. Phys. Med. Biol. 48(2), 183 (2003)

    Article  Google Scholar 

  48. Lorenz, D.A., Schöpfer, F., Wenger, S.: The linearized Bregman method via split feasibility problems: analysis and generalizations. SIAM J. Imaging Sci. 7(2), 1237–1262 (2014)

    MathSciNet  Article  Google Scholar 

  49. Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. International series in operations research & management science, volume 228, 4th edn. Springer, New York (2016)

    Book  Google Scholar 

  50. Masad, E., Reich, S.: A note on the multiple-set split convex feasibility problem in Hilbert space. J. Nonlinear Convex Anal. 8(3), 367–371 (2007)

    MathSciNet  MATH  Google Scholar 

  51. McCullagh, P., Nelder, J.A.: Generalized Linear Models, vol. 37. CRC press, Boca Raton (1989)

    Book  Google Scholar 

  52. Mordukhovich, B.S., Nam, N.M.: Applications of variational analysis to a generalized Fermat–Torricelli problem. J. Optim. Theory Appl. 148, 431–454 (2011)

    MathSciNet  Article  Google Scholar 

  53. Mordukhovich, B.S., Nam, N.M., Salinas, J.: Applications of variational analysis to a generalized Heron problem. Appl. Anal. 91(10), 1915–1942 (2011)

    MathSciNet  Article  Google Scholar 

  54. Mordukhovich, B.S., Nam, N.M., Salinas, J.: Solving a generalized Heron problem by means of convex analysis. Am. Math. Mon. 119(2), 87–99 (2012)

    MathSciNet  Article  Google Scholar 

  55. Moré, J.J.: The Levenberg–Marquardt algorithm: implementation and theory. In: Numerical analysis, pp. 105–116. Springer (1978)

  56. Moudafi, A.: A note on the split common fixed-point problem for quasi-nonexpansive operators. Nonlinear Anal. Theory Methods Appl. 74(12), 4083–4087 (2011)

    MathSciNet  Article  Google Scholar 

  57. Murty, K.G., Yu, F.-T.: Linear Complementarity, Linear and Nonlinear Programming, vol. 3. Citeseer, Heldermann, Berlin (1988)

    MATH  Google Scholar 

  58. Palta, J.R., Mackie, T.R. (eds.): Intensity-Modulated Radiation Therapy: The State of The Art. Medical Physics Publishing, Madison (2003)

    Google Scholar 

  59. Park, M.Y., Hastie, T.: L1-regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 69(4), 659–677 (2007)

    MathSciNet  Article  Google Scholar 

  60. Polson, N.G., Scott, J.G., Willard, B.T.: Proximal algorithms in statistics and machine learning. Stat. Sci. 30(4), 559–581 (2015)

    MathSciNet  Article  Google Scholar 

  61. R Core Team. R: a language and environment for statistical computing. R foundation for statistical computing, Vienna (2017)

  62. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, New York (1998)

    Book  Google Scholar 

  63. Shepard, D.M., Ferris, M.C., Olivera, G.H., Mackie, T.R.: Optimizing the delivery of radiation therapy to cancer patients. SIAM Rev. 41(4), 721–744 (1999)

    Article  Google Scholar 

  64. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)

    MathSciNet  MATH  Google Scholar 

  65. Wang, F., Xu, H.-K.: Cyclic algorithms for split feasibility problems in Hilbert spaces. Nonlinear Anal. Theory Methods Appl. 74(12), 4105–4111 (2011)

    MathSciNet  Article  Google Scholar 

  66. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)

    MathSciNet  Article  Google Scholar 

  67. Xu, H.-K.: A variable Krasnosel’skii Mann algorithm and the multiple-set split feasibility problem. Inverse Probl. 22(6), 2021–2034 (2006)

    Article  Google Scholar 

  68. Xu, H.-K.: Iterative methods for the split feasibility problem in infinite-dimensional Hilbert spaces. Inverse Probl. 26(10), 105018 (2010)

    MathSciNet  Article  Google Scholar 

  69. Xu, J., Chi, E., Lange, K.: Generalized linear model regression under distance-to-set penalties. In: Advances in Neural Information Processing Systems, pp. 1385–1395 (2017)

  70. Zhang, X., Liu, H., Wang, X., Dong, L., Wu, Q., Mohan, R.: Speed and convergence properties of gradient algorithms for optimization of IMRT. Med. Phy. 31(5), 1141–1152 (2004)

    Article  Google Scholar 

  71. Zhou, H., Alexander, D., Lange, K.: A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 21, 261–273 (2011)

    MathSciNet  Article  Google Scholar 

  72. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Methodol.) 67(2), 301–320 (2005)

    MathSciNet  Article  Google Scholar 

Download references

Acknowledgements

We thank Steve Wright and Dávid Papp for their help with the IMRT data examples, and thank Patrick Combettes for helpful comments. We also thank two anonymous referees for their constructive comments and thoughtful feedback.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jason Xu.

Additional information

Jason Xu was partially supported by NSF MSPRF Grant DMS-1606177. Kenneth Lange was partially supported by NIH Grant HG006139.

Appendix

Appendix

Proof of Proposition 3

Our proof of Proposition 3 relies on showing the algorithm map \(\psi \) that carries out the Armijo backtracking line search is closed at all non-stationary points of f. Recall that a point-to-set map A is closed at a point \(\varvec{x}\), if for any sequence \(\varvec{x}_j \rightarrow \varvec{x}\) such that \(\varvec{y}_j \in A(\varvec{x}_j)\) converges to \(\varvec{y}\), it follows that \(\varvec{y}\in A(\varvec{x})\). Define the following point-to-set map

$$\begin{aligned} S(\varvec{x}, \varvec{v}) = \left\{ \varvec{y}: \varvec{y}= \varvec{x}+ \sigma ^k \varvec{v}\text { for }k \in \mathbb {Z}_+\text { such that }\, f(\varvec{y}) \le f(\varvec{x}) + \alpha \sigma ^{k} df(\varvec{x}) \varvec{v}\right\} , \end{aligned}$$

and let G denote the map

$$\begin{aligned} G(\varvec{x}) = \begin{pmatrix} \varvec{x}\\ \nabla f(\varvec{x}) \end{pmatrix}. \end{aligned}$$

The map G is continuous since f is Lipschitz differentiable. Let \(A = S \circ G(\varvec{x})\). By Corollary 2 in Chapter 7.7 of [49], the composite mapping A is closed at \(\varvec{x}\), if the mapping S is closed at \(G(\varvec{x})\). We will state and prove a slightly more general result.

Proposition 4

The mapping S is closed at all points \((\varvec{x}, \varvec{v})\) provided that \(\varvec{v}\not = \varvec{0}\).

Proof

Consider sequences \(\{ \varvec{x}_j \}\), \(\{ \varvec{v}_j \}\) such that \(\varvec{x}_j \rightarrow \varvec{x}, \varvec{v}_j \rightarrow \varvec{v}\), and let \(\varvec{y}_k \in S(\varvec{x}_{k}, \varvec{v}_k)\) with \(\varvec{y}_k \rightarrow \varvec{y}\). For every j, there is some \(k_j\) such that \(\varvec{y}_j = \varvec{x}_j + \sigma ^{k_j} \varvec{v}_j\), where

$$\begin{aligned} \sigma ^{k_j} = \log \left( \frac{ ||\varvec{y}_j - \varvec{x}_j ||}{||\varvec{v}_j ||} \right) . \end{aligned}$$

Taking limits on both side above yields

$$\begin{aligned} \sigma ^{\bar{k}} = \frac{ ||\varvec{y}- \varvec{x}||}{||\varvec{v}||}, \text { where }\bar{k} = \lim _{j \rightarrow \infty } k_j. \end{aligned}$$

Because \(\varvec{y}_k \in S(\varvec{x}_{k}, \varvec{v}_k)\), it holds for each j that

$$\begin{aligned} f(\varvec{y}_j) \le f(\varvec{x}_j) + \alpha \sigma ^{k_j} df(\varvec{x}_j ) \varvec{v}_j . \end{aligned}$$
(22)

Since \(\{k_j\}\) is a sequence of integers, it assumes only finitely many distinct values before converging to the constant sequence \(\{ \bar{k} \}\); let \(k^\dagger \) denote the maximum of these values. Then replacing \(k_j\) by \(k^\dagger \) and letting \(j \rightarrow \infty \) in (22), together with continuity of f and \(df(\varvec{x})\), we have that

$$\begin{aligned} f(\varvec{y}) \le f(\varvec{x}) + \alpha \sigma ^{k^\dagger } df(\varvec{x}) \varvec{v}. \end{aligned}$$

That is, \(\varvec{y}\in S(\varvec{x}, \varvec{v})\), proving that S is closed at \((\varvec{x}, \varvec{v})\). \(\square \)

Proposition 4 informs us that the map A is closed at all non-stationary points of f. We are now ready to prove Proposition 3.

Proof

Fix an initial iterate \(\varvec{x}_0\). Note that the set \(\mathcal {L}_f(\varvec{x}_0) \equiv \{ \varvec{x}: f(\varvec{x}) \le f(\varvec{x}_0)\}\) is compact since f is coercive and the modified quasi-Newton method generates monotonically decreasing values. Since \(\mathcal {L}_f(\varvec{x}_0)\) is compact, the sequence \(\{\varvec{x}_{k}\}\) has a convergent subsequence whose limit is in \(\mathcal {L}_f(\varvec{x}_0)\); denote this as \(\varvec{x}_{k_l} \rightarrow \varvec{x}_\star \) as \(l \rightarrow \infty \). Our goal is to show that \(\varvec{x}_\star \) must be a stationary point of f. To the contrary, suppose that \(\varvec{x}_\star \) is not a stationary point of f.

Let \(\varvec{y}_{k_l} = \psi (\varvec{x}_{k_l}) \in A(\varvec{x}_{k_l})\). Note that \(\varvec{y}_{k_l} \in \mathcal {L}_f(\varvec{x}_0)\) and therefore the sequence \(\{\varvec{y}_{k_l}\}\) has a convergent subsequence \(\{\varvec{y}_{k_{l_j}}\}\). Denote this sequence’s limit \(\varvec{y}_\star \). Note that the map A is closed at \(\varvec{x}_\star \), since \(\varvec{x}_\star \) is not a stationary point of f. Therefore, by the definition of closed maps, we have that \(\varvec{y}_\star \in A(\varvec{x}_\star )\) and consequently

$$\begin{aligned} f(\varvec{y}_\star ) \le f(\varvec{x}_\star ) + \alpha \sigma ^k df(\varvec{x}_\star ) \varvec{H}(\varvec{x}_\star )^{-1}\nabla f(\varvec{x}_\star ) < f(\varvec{x}_\star ), \end{aligned}$$
(23)

for some positive integer k. On the other hand, since f is Lipschitz-differentiable, it is continuous; therefore \(\lim f(\varvec{x}_{k_l}) = f(\varvec{x}_\star )\). Moreover, for all \(k_l\) we have that

$$\begin{aligned} f(\varvec{x}_\star ) \le f(\varvec{x}_{k_{l+1}}) \le f(\varvec{x}_{k_l + 1}) = f(\varphi (\varvec{x}_{k_l})) \le f(\psi \circ \psi (\varvec{x}_{k_l})) \le f(\psi (\varvec{x}_{k_l})). \end{aligned}$$

By continuity of f, we have that \(f(\varvec{x}_\star ) \le f(\varvec{y}_\star )\), contradicting the inequality established in (23). We conclude that \(\varvec{x}_\star \) must be a stationary point of f. \(\square \)

\(\beta \)-divergence

The \(\beta \)-divergence is defined

$$\begin{aligned} D(\varvec{x},\varvec{y}) = {\left\{ \begin{array}{ll} \displaystyle \sum _{j}\dfrac{1}{\beta (\beta -1)} x_j^\beta + \sum _{j}\dfrac{1}{\beta } y_j^{\beta } - \dfrac{1}{\beta -1} x_j y_j^{\beta -1} &{} \beta \in \mathbb {R}\setminus \{0, 1 \} \\ \sum _j x_j \log \left( \dfrac{x_j}{y_j}\right) - x_j + y_j &{} \beta = 1 \\ \sum _j \dfrac{x_j}{y_j} - \log \left( \dfrac{x_j}{y_j}\right) - 1 &{} \beta = 0 \end{array}\right. } \end{aligned}$$

We see that the \(\beta \)-divergence corresponds to the Kullback–Leibler and Itakura–Saito divergences when \(\beta =1,0\) respectively. Below we discuss the projection onto a hyperplane for the case of \(\beta \in \mathbb {R}\setminus \{0, 1 \}\).

The function \(\phi (x) = \dfrac{1}{\beta (\beta -1)} \varvec{x}^\beta \) generates the \(\beta \)-divergence

$$\begin{aligned} {D(\varvec{x},\varvec{y}) = \sum _{j}\dfrac{1}{\beta (\beta -1)} x_j^\beta + \sum _{j}\dfrac{1}{\beta } y_j^{\beta } - \dfrac{1}{\beta -1} x_j y_j^{\beta -1} .} \end{aligned}$$

Its gradient is

$$\begin{aligned} {\nabla _\beta \phi (\varvec{x}) = \dfrac{1}{\beta -1}\varvec{x}^{\beta -1}, } \end{aligned}$$

and recall the Fenchel conjugate of \(\phi \) is given by

$$\begin{aligned} {\phi ^\star (\varvec{z}) = \sup _{\varvec{x}} \bigg ( \langle \varvec{z}, \varvec{x}\rangle - \phi (\varvec{x}) \bigg ) = \sup _{\varvec{x}} \bigg ( \langle \varvec{z}, \varvec{x}\rangle - \sum _j \dfrac{1}{\beta (\beta -1)} x_j^\beta \bigg ).} \end{aligned}$$

Defining \(h(\varvec{x}) = \langle \varvec{z}, \varvec{x}\rangle - \sum _j \dfrac{1}{\beta (\beta -1)} x_j^\beta \), and differentiating with respect to \(x_j\):

$$\begin{aligned} \nabla _{x_j} h&= z_j - \dfrac{1}{\beta -1}x_j^{\beta -1} = 0 \\ x_j&= (\beta -1)^{\frac{1}{\beta -1}} z_j^{\frac{1}{\beta -1}} . \end{aligned}$$

Plugging into \(\phi ^\star (\varvec{x})\),

$$\begin{aligned} \phi ^\star (\varvec{z})&= \sum _j z_j (\beta -1)^{\frac{1}{\beta -1}} z_j^{\frac{1}{\beta -1}} - \sum _j \dfrac{1}{\beta (\beta -1)} \left( (\beta -1)^{\frac{1}{\beta -1}} z_j^{\frac{1}{\beta -1}}\right) ^\beta \\&=(\beta -1)^{\frac{1}{\beta -1}} \sum _j z_j^{\frac{\beta }{\beta -1}} - \dfrac{ (\beta -1)^{\frac{1}{\beta -1}} }{\beta } \sum _j z_j^{\frac{\beta }{\beta -1}} \\&=(\beta -1)^{\frac{1}{\beta -1}} \bigg ( 1 - \dfrac{1 }{\beta } \bigg ) \sum _j z_j^{\frac{\beta }{\beta -1}} . \end{aligned}$$

Finally, differentiating the Fenchel conjugate yields

$$\begin{aligned} \nabla \phi ^\star (\varvec{z}) =(\beta -1)^{\frac{1}{\beta -1}} \bigg ( 1 - \dfrac{1 }{\beta } \bigg ) \frac{\beta }{\beta -1} \varvec{z}^{\frac{1}{\beta -1}} = (\beta -1)^{\frac{1}{\beta -1}} \varvec{z}^{\frac{1}{\beta -1} } . \end{aligned}$$

Thus, the projection of \(\varvec{x}\) onto a hyperplane is given by

$$\begin{aligned} {\mathcal {P}^\phi _{Q(a,c)} (\varvec{x}) =(\beta -1)^{\frac{1}{\beta -1}} \bigg ( \dfrac{1}{\beta -1} \varvec{x}^{\beta -1} - \widetilde{\gamma } a\bigg ) ^{\frac{1}{\beta -1}},} \end{aligned}$$

where \(\widetilde{\gamma } = \text {arg}\min _{\gamma } (\beta -1)^{\frac{1}{\beta -1}} \bigg ( 1 - \dfrac{1 }{\beta } \bigg ) \sum _j \bigg ( \dfrac{1}{\beta -1} x_j^{\beta -1} - \gamma a_j\bigg )^{\frac{\beta }{\beta -1}} + \gamma c .\)

Per-iteration complexity of IMRT study

We detail the calculations behind our per iteration complexity remarks. Note that in the IMRT dataset considered, \(p \ll n < m\).

Linear

How many flops are required to compute a single MM update given by Eq. (8) in the linear case?

$$\begin{aligned} \nabla f(\varvec{x}_{k})= & {} \sum _i \varvec{v}_i\big (\varvec{x}_{k} - \mathcal {P}_{C_i}(\varvec{x}_{k})\big ) + \sum _j w_j \varvec{A}^t\big (\varvec{A}\varvec{x}_{k} - \mathcal {P}_{Q_j}(\varvec{A}\varvec{x}_{k})\big ) \\ \varvec{x}_{k+1}= & {} \varvec{x}_{k} - \big [v\varvec{I}+ w\varvec{A}^t\varvec{A}\big ]^{-1}\nabla f(\varvec{x}_{k}). \end{aligned}$$

We first tally the flops required to compute the gradient \(\nabla f(\varvec{x}_{k})\). In the IMRT example, the first sum \(\sum _i v_i(\varvec{x}_{k} - \mathcal {P}_{C_i}(\varvec{x}_{k}))\) requires \(\mathcal {O}(n)\) flops. The matrix-vector product \(\varvec{z}_k = \varvec{A}\varvec{x}_{k}\) requires \(\mathcal {O}(mn)\) flops. The sum \(\varvec{y}_k = \sum _j w_j(\varvec{z}_k - \mathcal {P}_{Q_j}(\varvec{z}_k))\) requires \(\mathcal {O}(m)\) flops. The matrix-vector product \(\varvec{A}^t\varvec{y}_k\) requires \(\mathcal {O}(mn)\) flops. Adding the two sums requires \(\mathcal {O}(n)\) flops. In total, the gradient requires \(\mathcal {O}(mn)\) flops.

We next tally the flops required to compute the MM update. Forming the matrix \(v\varvec{I}+ w\varvec{A}^t\varvec{A}\) requires \(\mathcal {O}(mn^2)\) flops. Computing its Cholesky factorization requires \(\mathcal {O}(n^3)\) flops. We only need to compute the factorization once and can cache it. Subsequent iterations will require \(\mathcal {O}(n^2)\) flops. Thus, the amount of work required to compute a single MM update for the linear formulation is \(\mathcal {O}(mn)\).

Note that the exact update in the linear case given by

$$\begin{aligned} \varvec{x}_{k+1} = (v\varvec{I}+ w\varvec{A}^t\varvec{A})^{-1} \left[ \sum _i \varvec{v}_i \mathcal {P}_{C_i}(\varvec{x}_{k}) + \varvec{A}^t \sum _j w_j \varvec{A}\mathcal {P}_{Q_j}(\varvec{A}\varvec{x}_{k}) \right] \end{aligned}$$

has the same complexity as the update considered above.

Non-linear

We next consider the number of flops required for an MM update in the non-linear case:

$$\begin{aligned} \nabla f(\varvec{x}_{k})= & {} \sum _i v_i\big (\varvec{x}_{k} - \mathcal {P}_{C_i}(\varvec{x}_{k})\big ) + \nabla h(\varvec{x}_{k})\sum _j w_j \big (h(\varvec{x}_{k}) - \mathcal {P}_{Q_j}(h(\varvec{x}_{k}))\big ) \\ \varvec{x}_{k+1}= & {} \varvec{x}_{k} - \eta _{\varvec{x}_{k}}\big [v\varvec{I}+ w\nabla h(\varvec{x}_{k})dh(\varvec{x}_{k})\big ]^{-1}\nabla f(\varvec{x}_{k}) \\= & {} \varvec{x}_{k} - \eta _{\varvec{x}_{k}}\left[ \frac{1}{v}\varvec{I}- \frac{w}{v^2}\nabla h(\varvec{x}_{k}) \left( \varvec{I}+ \frac{w}{v}dh(\varvec{x}_{k})\nabla h(\varvec{x}_{k}) \right) ^{-1} dh(\varvec{x}_{k}) \right] \nabla f(\varvec{x}_{k}) \\= & {} \varvec{x}_{k} - \eta _{\varvec{x}_{k}}\left[ \frac{1}{v}\nabla f(\varvec{x}_{k}) - \frac{w}{v}\nabla h(\varvec{x}_{k}) \big (v\varvec{I}+ { {wdh}}(\varvec{x}_{k})\nabla h(\varvec{x}_{k}) \big )^{-1} dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k}) \right] . \end{aligned}$$

We first tally the flops required to compute the gradient \(\nabla f(\varvec{x}_{k})\). The first sum \(\sum _i v_i(\varvec{x}_{k} - \mathcal {P}_{C_i}(\varvec{x}_{k}))\) requires \(\mathcal {O}(n)\) flops. Computing \(\varvec{A}\varvec{x}_{k}\) requires \(\mathcal {O}(mn)\) flops. Computing the sum \(\sum _j w_j (h(\varvec{x}_{k}) - \mathcal {P}_{Q_j}(h(\varvec{x}_{k})))\) requires \(\mathcal {O}(m)\) flops. Computing \(\nabla h(\varvec{x}_{k})\) requires \(\mathcal {O}(mn)\) flops, and computing its product with the sum term \(\nabla h(\varvec{x}_{k})\sum _j w_j (h(\varvec{x}_{k}) - \mathcal {P}_{Q_j}(h(\varvec{x}_{k})))\) requires \(\mathcal {O}(mp)\) flops. In total, the gradient requires \(\mathcal {O}(mn)\) flops. As in the linear case, the dominant cost are matrix-vector products involving the matrix \(\varvec{A}\).

We next tally the flops required to compute the MM update. Forming the matrix \(v\varvec{I}+ w\nabla h(\varvec{x}_{k})dh(\varvec{x}_{k})\) requires \(\mathcal {O}(np^2)\) flops. Computing its Cholesky factorization requires \(\mathcal {O}(p^3)\) flops. Computing \(dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(np)\) flops. Computing \(\left( v\varvec{I}+ wdh(\varvec{x}_{k})\nabla h(\varvec{x}_{k}) \right) ^{-1} dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(p^2)\) flops. The product \(\nabla h(\varvec{x}_{k})\left( v\varvec{I}+ wdh(\varvec{x}_{k})\nabla h(\varvec{x}_{k}) \right) ^{-1} dh(\varvec{x}_{k})\nabla f(\varvec{x}_{k})\) requires \(\mathcal {O}(np)\) flops. Computing a candidate update requires \(\mathcal {O}(n)\) flops. An objective function evaluation requires \(\mathcal {O}(mn)\) flops. Thus, including the line search, the amount of work required to compute a single MM update for the linear formulation is \(\mathcal {O}(\max \{mn,np^2\})\). When \(p^2 < m\), we conclude that the computational work for a single MM update for the non-linear case is comparable to that of the linear case. In practice, reducing the problem size via a non-linear formulation may additionally reduce the number of MM updates.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Xu, J., Chi, E.C., Yang, M. et al. A majorization–minimization algorithm for split feasibility problems. Comput Optim Appl 71, 795–828 (2018). https://doi.org/10.1007/s10589-018-0025-z

Download citation

  • Received:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10589-018-0025-z

Keywords

  • Majorize–minimize
  • Nonlinear split feasibility
  • Intensity modulated radiation therapy
  • Proximity function minimization
  • Constrained regression