Abstract
The superiorization methodology is intended to work with input data of constrained minimization problems, i.e., a target function and a constraints set. However, it is based on a way of thinking antipodal to the one underlying constrained minimization methods. Instead of adapting unconstrained minimization algorithms to handle constraints, it adapts feasibility-seeking algorithms to reduce (not necessarily minimize) target function values. This is done while retaining the feasibility-seeking nature of the algorithm and without paying a high computational price. In spite of an ever-growing body of publications that supply evidence of the success of the superiorization method on various problems, a guarantee that the local target function reduction steps properly accumulate to a global reduction of the target function value is still missing. We propose an analysis based on the principle of concentration of measure that attempts to alleviate this guarantee question of the superiorization method.
Notes
As common, we use the terms algorithm or algorithmic structure for the iterative processes studied here although no termination criteria are present and only the asymptotic behavior of these processes is studied.
Some support for this reasoning may be borrowed from the American scientist and Nobel laureate Herbert Simon, who was in favor of “satisficing” rather than “maximizing”. Satisficing is a decision-making strategy that aims for a satisfactory or adequate result, rather than the optimal solution, because aiming for the optimal solution may necessitate needless expenditure of time, energy and resources. The term “satisfice” was coined by Herbert Simon in 1956 [35], see: https://en.wikipedia.org/wiki/Satisficing.
References
Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38, 367–426 (1996)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, New York (2017)
Behrends, E.: Introduction to Markov Chains. Springer, New York (2000)
Bell, J.: Trace class operators and Hilbert-Schmidt operators, Technical report, April 18, (2016), 26pp. Available on Semantic Scholar at https://www.semanticscholar.org/
Butnariu, D., Davidi, R., Herman, G.T., Kazantsev, I.G.: Stable convergence behavior under summable perturbations of a class of projection methods for convex feasibility and optimization problems. IEEE J. Sel. Top. Signal Process. 1, 540–547 (2007)
Butnariu, D., Reich, S., Zaslavski, A.J.: Convergence to fixed points of inexact orbits of Bregman-monotone and of nonexpansive operators in Banach spaces. In: Nathansky, H.F., de Buen, B.G., Goebel, K., Kirk, W.A., Sims, B. (eds.) Fixed Point Theory and its Applications (Conference Proceedings, Guanajuato, Mexico, 2005), pp. 11–32. Yokohama Publishers, Yokohama (2006). http://www.ybook.co.jp/pub/ISBN%20978-4-9465525-0.htm
Butnariu, D., Reich, S., Zaslavski, A.J.: Stable convergence theorems for infinite products and powers of nonexpansive mappings. Num. Funct. Anal. Optim. 29, 304–323 (2008)
Carlen, E., Madiman, M., Werner, E.M. (eds.): Convexity and Concentration, The IMA Volumes in Mathematics and its Applications, vol. 161. Springer, New York (2017)
Cegielski, A.: Iterative Methods for Fixed Point Problems in Hilbert Spaces. Springer, Berlin (2012)
Censor, Y.: Superiorization and perturbation resilience of algorithms: A bibliography compiled and continuously updated. (2019) arXiv:1506.04219. http://math.haifa.ac.il/yair/bib-superiorization-censor.html (last updated: September 26)
Censor, Y.: Weak and strong superiorization: Between feasibility-seeking and minimization. Anal. Stiint. Univ. Ovidius Constanta Ser. Mat. 23, 41–54 (2015)
Censor, Y.: Can linear superiorization be useful for linear optimization problems? Inverse Probl. 33, 044006 (2017)
Censor, Y., Davidi, R., Herman, G.T.: Perturbation resilience and superiorization of iterative algorithms. Inverse Probl. 26, 065008 (2010)
Censor, Y., Davidi, R., Herman, G.T., Schulte, R.W., Tetruashvili, L.: Projected subgradient minimization versus superiorization. J. Optim. Theory Appl. 160, 730–747 (2014)
Censor, Y., Herman, G.T., Jiang, M. (eds.): Superiorization: Theory and Applications. Inverse Probl. 33 (2017). Special issue
Censor, Y., Zaslavski, A.J.: Convergence and perturbation resilience of dynamic string-averaging projection methods. Comput. Optim. Appl. 54, 65–76 (2013)
Censor, Y., Zaslavski, A.J.: Strict Fejér monotonicity by superiorization of feasibility-seeking projection methods. J. Optim. Theory Appl. 165, 172–187 (2015)
Censor, Y., Zur, Y.: Linear superiorization for infeasible linear programming, in: Y. Kochetov, M. Khachay, V. Beresnev, E. Nurminski and P. Pardalos (Editors), Discrete Optimization and Operations Research, Lecture Notes in Computer Science (LNCS), Vol. 9869, Springer, pp. 15–24 (2016)
Davidi, R.: Algorithms for Superiorization and their Applications to Image Reconstruction, Ph.D. dissertation, Department of Computer Science, The City University of New York, NY, USA, (2010). http://gradworks.umi.com/34/26/3426727.html
Davidi, R., Garduño, E., Herman, G.T., Langthaler, O., Rowland, S.W., Sardana, S., Ye, Z.: SNARK14: A programming system for the reconstruction of 2D images from 1D projections. Available at: http://turing.iimas.unam.mx/SNARK14M/. The latest manual, of October 29, 2017, is at: http://turing.iimas.unam.mx/SNARK14M/SNARK14.pdf
Davidi, R., Herman, G.T., Censor, Y.: Perturbation-resilient block-iterative projection methods with application to image reconstruction from projections. Int. Trans. Oper. Res. 16, 505–524 (2009)
Dubhashi, D.P., Panconesi, A.: Concentration of Measure for the Analysis of Randomised Algorithms. Cambridge University Press, New York (2009)
Edelman, A., Rao, N.R.: Random matrix theory. Acta Numer. 14, 233–297 (2005)
Gromov, M.: Spaces and questions. In: Alon, N., Bourgain, J., Connes, A., Gromov, M., Milman, V. (eds.) Visions in Mathematics, pp. 118–161. Modern Birkhäuser Classics. Birkhäuser Basel, Basel (2010)
Herman, G.T.: Superiorization for image analysis, In: Combinatorial Image Analysis, Lecture Notes in Computer Science Vol. 8466, Springer pp. 1–7 (2014)
Herman, G.T., Garduño, E., Davidi, R., Censor, Y.: Superiorization: an optimization heuristic for medical physics. Med. Phys. 39, 5532–5546 (2012)
Lange, K.: Singular Value Decomposition. In: Numerical Analysis for Statisticians. Springer, New York, pp. 129–142 (2010)
Ledoux, M.: The Concentration of Measure Phenomenon, Mathematical surveys and monographs Vol. 89, The American Mathematical Society (AMS), (2001)
Lee, J.M.: Introduction to Riemannian Manifolds, 2nd Edition. Springer International Publishing, Graduate Texts in Mathematics, Vol. 176, Originally published with title “Riemannian Manifolds: An Introduction to Curvature” (2018)
Samson, P.-M.: Concentration of measure principle and entropy-inequalities. In: Carlen, E., Madiman, M., Werner, E.M. (eds.) Convexity and Concentration, The IMA Volumes in Mathematics and its Applications, vol. 161, pp. 55–105. Springer, New York (2017)
Seneta, E.: A tricentenary history of the law of large numbers. Bernoulli 19, 1088–1121 (2013)
Shapiro, A.: Differentiability properties of metric projections onto convex sets. J. Optim. Theory Appl. 169, 953–964 (2016)
Sidky, E.Y., Pan, X.: Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Phys. Med. Biol. 53, 4777–4807 (2008)
S̆ilhavý, M.: Differentiability of the metric projection onto a convex set with singular boundary points. J. Convex Anal. 22, 969–997 (2015)
Simon, H.A.: Rational choice and the structure of the environment. Psychol. Rev. 63, 129–138 (1956)
Song, D., Gupta, A.: \(L_{p}\)-norm uniform distribution. Proc. Am. Math. Soc. 125, 595–601 (1997)
Talagrand, M.: A new look at independence. Ann. Prob. 24, 1–34 (1996)
Zhang, X.: Prior-Knowledge-Based Optimization Approaches for CT Metal Artifact Reduction, Ph.D. dissertation, Dept. of Electrical Engineering, Stanford University, Stanford, CA, USA, (2013). http://purl.stanford.edu/ws303zb5770
Acknowledgements
We thank two anonymous reviewers for their constructive comments. This work was supported by research grant no. 2013003 of the United States-Israel Binational Science Foundation (BSF) and by the ISF-NSFC joint research program grant No. 2874/19.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Some Concentration of Measure Facts in a High-Dimensional \(E^{N}\)
1.1 The Probability \(L^{p}\) Norms of Vectors
For a vector \(x\in E^{N}\) and \(1\le p<\infty \), denote by \(\Vert \,\cdot \Vert _{p}^{(\pi )}\) (\(\pi \) stands for “probability space”) its \(L^{p}\) norm when the set of indices \(\{1,2,\ldots ,N\}\) is made into a uniform probability space, giving each index a weight \(1/N\), namely,
\[\Vert x\Vert _{p}^{(\pi )}:=\left( \frac{1}{N}\sum _{k=1}^{N}|x_{k}|^{p}\right) ^{1/p},\]
see, e.g., [36]. As with any probability measure, \(\Vert \,\cdot \Vert _{p}^{(\pi )}\) always increases with p.
For \(x_{1},x_{2},\ldots ,x_{N}\) i.i.d. \(\sim {\mathcal {N}}\), \(\left( \Vert x\Vert _{p}^{(\pi )}\right) ^{p}\) is an average: its expectation \({\mathbb {E}}\) will be the same as the expectation of \(|x|^{p}\) for x a scalar distributed \(\sim {\mathcal {N}}\):
\[{\mathbb {E}}\left[ \left( \Vert x\Vert _{p}^{(\pi )}\right) ^{p}\right] ={\mathbb {E}}\left[ |x|^{p}\right] ,\]
but its standard deviation will be \(1/\sqrt{N}\) times that of \(|x|^{p}\) for a scalar \(\sim {\mathcal {N}}\):
\[\sigma \left[ \left( \Vert x\Vert _{p}^{(\pi )}\right) ^{p}\right] =\frac{1}{\sqrt{N}}\,\sigma \left[ |x|^{p}\right] .\]
Thus, \(\Vert x\Vert _{p}^{(\pi )}\) is highly concentrated around the value \(\left( {\mathbb {E}}\left[ |x|^{p}\right] \right) ^{1/p}\), which does not depend on N, with degree of concentration \(O(1/\sqrt{N})\).
One may conclude, loosely speaking, that these \(\Vert \,\cdot \Vert _{p}^{(\pi )}\) norms, having means that do not depend on N, are expected to be O(1) for all N.
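These concentration claims are easy to probe numerically. The following sketch (our illustration, not part of the original; it assumes NumPy is available) estimates \(\Vert x\Vert _{2}^{(\pi )}\) for Gaussian vectors of growing dimension:

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_lp_norm(x, p):
    # (1/N * sum_k |x_k|^p)^(1/p): the L^p norm with respect to the
    # uniform probability measure on the index set {1, ..., N}
    return np.mean(np.abs(x) ** p) ** (1.0 / p)

# For i.i.d. standard normal entries and p = 2, E[x^2] = 1, so the norm
# should hover near 1 for every N, with spread shrinking like 1/sqrt(N).
for N in (100, 10_000, 1_000_000):
    samples = [prob_lp_norm(rng.standard_normal(N), 2) for _ in range(30)]
    print(N, float(np.mean(samples)), float(np.std(samples)))
```

The printed means stay near 1 for every N, while the sample spread shrinks with N, as the \(O(1/\sqrt{N})\) concentration predicts.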
1.2 The Norm of the Sum of Vectors with Given Norms
Suppose we are given M vectors \(y_{1},y_{2},\ldots ,y_{M}\) of known norms \(d_{1},d_{2},\ldots ,d_{M}\) in \(E^{N}\). What should we expect the norm of their sum to be?
This can be answered: take the direction of each of them distributed uniformly on \(S^{N-1}\), even conditioned on fixed values of the others. In other words, take them independent, each with direction distributed uniformly. This can be constructed by taking M random vectors in \(E^{N}\) (that is, a random \(M\times N\) matrix) with entries i.i.d. \(\sim {\mathcal {N}}\), dividing them by \(\sqrt{N}\), then by their norm (now highly concentrated near 1), and multiplying them by \(d_{1},d_{2},\ldots ,d_{M}\), respectively.
The sum \(\sum _{i=1}^{M}y_{i}\), if we ignore the division by the norm, is \(1/\sqrt{N}\) times the random matrix applied to the vector \((d_{1},d_{2},\ldots ,d_{M})\). But the distribution of the random matrix is invariant with respect to any transformation which is orthogonal with respect to the Hilbert-Schmidt norm, that is, the square root of the sum of squares of the entries (i.e., \(\Vert T\Vert _{HS}:=\sqrt{{\text {tr}}(T^{\prime }\cdot T)}=\sqrt{{\text {tr}}(T\cdot T^{\prime })}\), \(T^{\prime }\) denoting the transpose and \({\text {tr}}\) standing for the trace, see, e.g., [4]). In particular, the distribution of the sum is the same as that of \(1/\sqrt{N}\) times \(\sqrt{d_{1}^{2}+d_{2}^{2}+\cdots +d_{M}^{2}}\) times the random matrix applied to \((1,0,\ldots ,0)\), which is, of course, distributed with independent \(\sim {\mathcal {N}}\) entries, thus, with norm concentrated near \(\sqrt{N}\) (with relative deviation \(O(1/\sqrt{N})\)). This leads to the following conclusion.
Conclusion 10
For M vectors \(y_{1},y_{2},\ldots ,y_{M}\) of known norms \(d_{1},d_{2},\ldots ,d_{M}\) in \(E^{N}\), we have that \(\Vert \sum _{i=1}^{M}y_{i}\Vert \) is near \(\sqrt{d_{1}^{2}+d_{2}^{2}+\cdots +d_{M}^{2}}\) with almost full probability (with relative deviation \(O(1/\sqrt{N})\)).
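Conclusion 10 can be checked directly; the following sketch (our illustration, assuming NumPy) sums a few vectors with prescribed norms and independent uniform directions:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_direction(N):
    # a direction uniformly distributed on S^{N-1}
    v = rng.standard_normal(N)
    return v / np.linalg.norm(v)

N = 100_000
d = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # the prescribed norms d_1, ..., d_M
total = sum(di * random_direction(N) for di in d)

# ||sum y_i|| should be near sqrt(d_1^2 + ... + d_M^2)
print(np.linalg.norm(total), np.sqrt(np.sum(d ** 2)))
```

The two printed numbers agree up to a relative deviation of order \(1/\sqrt{N}\).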
1.3 The Accumulation of Given Distances on the Unit Sphere
As in the previous Appendix A.2, we seek to find what we should expect the norm of a sum of M vectors of given norms \(d_{1},d_{2},\ldots ,d_{M}\) to be. But here the vectors are the differences between consecutive elements in a sequence of points on the unit sphere \(S^{N-1}\subset E^{N}\). Denote by \(\omega _{N-1}\) the uniform measure on \(S^{N-1}\), normalized to be a probability measure (i.e., of total mass 1).
Remark 11
By symmetry, for \(x=(x_{1},x_{2},\ldots ,x_{N})\in S^{N-1}\), \(\int x_{k}^{2}\,d\omega _{N-1}\) is the same for all k. Of course, their sum is \(\int 1\,d\omega _{N-1}=1\). Therefore,
\[\int x_{k}^{2}\,d\omega _{N-1}=\frac{1}{N},\quad k=1,2,\ldots ,N.\qquad (25)\]
Hence, for a polynomial of degree \(\le 2\) on \(E^{N}\):
\[p(x)=\langle Qx,x\rangle +\langle a,x\rangle +\gamma ,\qquad (26)\]
where Q is a symmetric \(N\times N\) matrix, \(a\in E^{N}\) and \(\gamma \in E\), we will have
\[\int p\,d\omega _{N-1}=\frac{1}{N}\,{\text {tr}}\,Q+\gamma .\]
Note that, for some fixed \(0\le d\le 2\), the set of points in \(S^{N-1}\) at distance d from some fixed vector \(u\in S^{N-1}\) is the \((N-2)\)-sphere \(\Sigma (u,d)\subset S^{N-1}\) given by
\[\Sigma (u,d)=\left( 1-\frac{d^{2}}{2}\right) u+\sqrt{d^{2}-\frac{d^{4}}{4}}\;{S^{N-2}}_{u^{\bot }},\]
where \({S^{N-2}}_{u^{\bot }}\) stands for the unit sphere in the hyperplane perpendicular to u. In our scenario, one performs a Markov chain, see, e.g., [3]: starting from a point \(u_{0}\) on \(S^{N-1}\), one moves to a point \(u_{1}\in \Sigma (u_{0},d_{1})\) uniformly distributed there; then, from that \(u_{1}\), to a point \(u_{2}\in \Sigma (u_{1},d_{2})\) uniformly distributed there, and so on, until one ends with \(u_{M}\). We would like to find \({\mathbb {E}}\left[ \Vert u_{M}-u_{0}\Vert \right] \).
If we denote by \({\mathcal {L}}_{d}\) the operator mapping a function p on \(S^{N-1}\) to the function whose value at a vector \(u\in S^{N-1}\) is the average of p on \(\Sigma (u,d)\), then \({\mathcal {L}}_{d_{k}}(p)\) evaluated at u is the expectation of p at the point to which u moved in the k-th step above. Hence, in the above Markov chain, the expectation of \(p(u_{M})\) is
\[{\mathcal {L}}_{d_{1}}{\mathcal {L}}_{d_{2}}\cdots {\mathcal {L}}_{d_{M}}(p)(u_{0}).\]
Thus, what we are interested in is
\[{\mathcal {L}}_{d_{1}}{\mathcal {L}}_{d_{2}}\cdots {\mathcal {L}}_{d_{M}}(p)(u_{0})\quad \text {for }p(x)=\Vert x-u_{0}\Vert ^{2}.\]
So, let us calculate \({\mathcal {L}}_{d}(p)\) for polynomials of degree \(\le 2\) as in (26). In performing the calculation, assume \(u=(1,0,\ldots ,0)\). For \(x=(x_{1},x_{2},\ldots ,x_{N})\in E^{N}\) write \(y=(x_{2},x_{3},\ldots ,x_{N})\in E^{N-1}\). In (26) write \(a=(a_{1},b)\) where \(b=(a_{2},a_{3},\ldots ,a_{N})\in E^{N-1}\) and
\[\langle Qx,x\rangle =\eta x_{1}^{2}+2x_{1}\langle c,y\rangle +\langle Q^{\prime }y,y\rangle ,\qquad (28)\]
where \(Q^{\prime }\) is a symmetric \((N-1)\times (N-1)\) matrix, \(c\in E^{N-1}\) and \(\eta \in E\). Note that for our \(u=(1,0,\ldots ,0)\), \(a_{1}=\langle a,u\rangle \), \(\eta =\langle Qu,u\rangle \) and \({\text {tr}}\,Q^{\prime }={\text {tr}}\,Q-\eta ={\text {tr}}\,Q-\langle Qu,u\rangle \).
Then, for p as in (26),
\[{\mathcal {L}}_{d}(p)(u)=\left( 1-\frac{d^{2}}{2}\right) ^{2}\eta +\frac{d^{2}-d^{4}/4}{N-1}\,{\text {tr}}\,Q^{\prime }+\left( 1-\frac{d^{2}}{2}\right) a_{1}+\gamma .\]
Hence, taking account of (28) for \(u=(1,0,\ldots ,0)\), and using (25),
\[{\mathcal {L}}_{d}(p)(u)=\left( 1-\frac{d^{2}}{2}\right) ^{2}\langle Qu,u\rangle +\frac{d^{2}-d^{4}/4}{N-1}\left( {\text {tr}}\,Q-\langle Qu,u\rangle \right) +\left( 1-\frac{d^{2}}{2}\right) \langle a,u\rangle +\gamma ,\]
which, by symmetry, will hold for any \(u\in S^{N-1}\). In particular, we find, as should be expected, that
\[{\mathcal {L}}_{d}\left( \Vert x\Vert ^{2}\right) (u)=\left( 1-\frac{d^{2}}{2}\right) ^{2}+\left( d^{2}-\frac{d^{4}}{4}\right) =1.\]
We are interested, for our fixed \(u_{0}\in S^{N-1}\), in
\[p(x)=\Vert x-u_{0}\Vert ^{2}=2-2\langle u_{0},x\rangle \quad \text {for }x\in S^{N-1}.\]
Then there is no Q term, so one has
\[{\mathcal {L}}_{d}(p)(u)=2-2\left( 1-\frac{d^{2}}{2}\right) \langle u_{0},u\rangle .\]
Consequently,
\[{\mathbb {E}}\left[ \Vert u_{M}-u_{0}\Vert ^{2}\right] =2\left( 1-\prod _{k=1}^{M}\left( 1-\frac{d_{k}^{2}}{2}\right) \right) .\]
This is \(O\left( M\cdot \left( \Vert (d_{1},d_{2},\ldots ,d_{M})\Vert _{2}^{(\pi )}\right) ^{2}\right) \). We also assess the standard deviation; since \(\Vert u_{M}-u_{0}\Vert ^{2}=2-2\langle u_{0},u_{M}\rangle \), this requires the second moment \({\mathbb {E}}\left[ \langle u_{0},u_{M}\rangle ^{2}\right] \).
Here \(p(x)=\langle a,x\rangle ^{2}\), so there is only the Q term, with \(\langle Qx,x\rangle =\langle a,x\rangle ^{2}\), i.e., \(Q=aa^{\prime }\). Then \({\text {tr}}\,Q=\Vert a\Vert ^{2}\), and we find
\[{\mathcal {L}}_{d}(p)(u)=\left( 1-\frac{d^{2}}{2}\right) ^{2}\langle a,u\rangle ^{2}+\frac{d^{2}-d^{4}/4}{N-1}\left( \Vert a\Vert ^{2}-\langle a,u\rangle ^{2}\right) .\]
Consequently, for \(a=u_{0}\) (note that \(\Vert u_{0}\Vert ^{2}=1\)),
This is \(O\left( (M/N)\cdot \left( \Vert (d_{1},d_{2},\ldots ,d_{M})\Vert _{4}^{(\pi )}\right) ^{4}\right) \), since the constant terms and the terms with \(d_{k}^{2}\) cancel, and the terms which do not cancel carry coefficients of order \(O(1/N)\). Therefore, twice its square root, which is the standard deviation, will be \(O\left( (\sqrt{M}/\sqrt{N})\left( \Vert (d_{1},d_{2},\ldots ,d_{M})\Vert _{4}^{(\pi )}\right) ^{2}\right) \), making the relative deviation \(O(1/\sqrt{MN})\). This leads to the following conclusion.
Conclusion 12
The square of the norm of the sum of M vectors of given norms \(d_{1},d_{2},\ldots ,d_{M}\), which are differences between consecutive elements in a sequence of points on the unit sphere \(S^{N-1}\subset E^{N}\), modeled by the above Markov chain, is, with almost full probability, near
\[2\left( 1-\prod _{k=1}^{M}\left( 1-\frac{d_{k}^{2}}{2}\right) \right) .\]
(With relative deviation \(O(1/\sqrt{MN})\).)
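The Markov chain of Appendix A.3 is easy to simulate. The sketch below (our illustration, assuming NumPy) performs the walk with prescribed chordal step distances and compares \(\Vert u_{M}-u_{0}\Vert ^{2}\) with the mean \(2(1-\prod _{k}(1-d_{k}^{2}/2))\) that the \({\mathcal {L}}_{d}\) computation yields:

```python
import numpy as np

rng = np.random.default_rng(2)

def step_on_sphere(u, d):
    # move from u in S^{N-1} to a point uniformly distributed on
    # Sigma(u, d), the set of points at chordal distance d from u
    v = rng.standard_normal(u.shape[0])
    v -= np.dot(v, u) * u                  # project onto the hyperplane u-perp
    v /= np.linalg.norm(v)
    return (1 - d ** 2 / 2) * u + np.sqrt(d ** 2 - d ** 4 / 4) * v

N = 50_000
d = rng.uniform(0.1, 0.5, size=20)         # step distances d_1, ..., d_M
u0 = np.zeros(N)
u0[0] = 1.0

u = u0
for dk in d:
    u = step_on_sphere(u, dk)

observed = np.linalg.norm(u - u0) ** 2
predicted = 2 * (1 - np.prod(1 - d ** 2 / 2))
print(observed, predicted)
```

A single run already lands close to the predicted value, reflecting the \(O(1/\sqrt{MN})\) relative deviation of Conclusion 12.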
1.4 A Reminder: Polar Decomposition and Singular Values of a Matrix
As is well known, see, e.g., [27], every fixed \(N\times N\) matrix T can be uniquely written as \(T=UA\) with U orthogonal and A symmetric positive semidefinite (take \(A=\sqrt{T^{\prime }\cdot T}\); then for every vector x, \(\Vert Tx\Vert =\Vert Ax\Vert \), so the map \(Ax\mapsto Tx\) is norm-preserving, i.e., orthogonal), and also uniquely written as \(T=A_{1}U_{1}\) with \(U_{1}\) orthogonal and \(A_{1}\) symmetric positive semidefinite (take \(A_{1}=\sqrt{T\cdot T^{\prime }}\)).
The singular values of T are defined as the eigenvalues of its positive semidefinite part in the above decomposition. (It does not matter from which side: \(T\cdot T^{\prime }\) and \(T^{\prime }\cdot T\) have the same eigenvalues. Note that if T is invertible they are similar: \(T^{-1}(T\cdot T^{\prime })T=T^{\prime }\cdot T\).)
Since any positive semidefinite matrix with eigenvalues \(s_{1},s_{2},\ldots ,s_{N}\) is of the form
\[U^{\prime }\,{\text {diag}}(s_{1},s_{2},\ldots ,s_{N})\,U\]
with U orthogonal (\({\text {diag}}\) denotes a diagonal matrix), we find that the general form of a matrix with singular values \(s_{1},s_{2},\ldots ,s_{N}\) is
\[T=U_{1}\,{\text {diag}}(s_{1},s_{2},\ldots ,s_{N})\,U_{2},\qquad (43)\]
with \(U_{1},U_{2}\) orthogonal.
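As a concrete illustration (ours, not part of the original; it assumes NumPy), the polar factors can be read off the singular value decomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
T = rng.standard_normal((N, N))

# SVD: T = U_svd @ diag(s) @ Vt, with U_svd, Vt orthogonal and s >= 0
U_svd, s, Vt = np.linalg.svd(T)

A = Vt.T @ np.diag(s) @ Vt        # the symmetric psd factor, A = sqrt(T' T)
U = U_svd @ Vt                    # the orthogonal factor

assert np.allclose(U @ A, T)                    # T = U A
assert np.allclose(U @ U.T, np.eye(N))          # U is orthogonal
assert np.allclose(A, A.T)                      # A is symmetric
assert np.allclose(A.T @ A, T.T @ T)            # A^2 = T' T
# the eigenvalues of A are exactly the singular values of T
assert np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(s))
print("polar decomposition verified")
```

The same computation with `U_svd @ np.diag(s) @ U_svd.T` as the left factor yields the decomposition \(T=A_{1}U_{1}\) from the other side.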
1.5 Square Matrix with Entries Independently \(\sim {\mathcal {N}}\) and the Uniform Distribution on Orthogonals
Take a random \(N\times N\) matrix Y with entries \(Y_{i,j}\) i.i.d. \(\sim {\mathcal {N}}\). If we polarly decompose the random Y as per Appendix A.4, from either side, then the orthogonal part will be distributed uniformly (i.e., by Haar’s measure) on the orthogonal group. This follows from the fact that, by the symmetries of the above distribution of Y, it is invariant under multiplying the random matrix on the right or on the left by a fixed orthogonal matrix. So we have here a vehicle to obtain this uniform distribution. For an excellent general text on random matrices consult [23].
For the positive semidefinite part we have to check, say, \(Y^{\prime }\cdot Y\) for our random matrix Y. But if u is any vector then, by the symmetries of the distribution of the random Y, Yu is distributed like \(\Vert u\Vert \) times \(Y\cdot (1,0,\ldots ,0)\), which has i.i.d. \(\sim {\mathcal {N}}\) entries, thus, with norm concentrated near \(\Vert u\Vert \cdot \sqrt{N}\), with relative deviation \(O(1/\sqrt{N})\). But all the entries of \(Y^{\prime }\cdot Y\) are recoverable from \(\langle Y^{\prime }\cdot Yu,u\rangle \) if we take as u elements of the standard basis \(e_{i}=(0,\ldots ,0,1,0,\ldots ,0)\) and sums of two of these, so we obtain the following conclusion.
Conclusion 13
\((1/N)Y^{\prime }\cdot Y\) (and likewise \((1/N)Y\cdot Y^{\prime }\)) is concentrated near \({\mathbf {1}}\) (\({\mathbf {1}}\) denotes the identity matrix), with relative deviation \(O(1/N)\).
In other words, the random Y is, with almost full probability, very near \(\sqrt{N}\) times an orthogonal matrix. Indeed, to check how orthogonal \((1/\sqrt{N})Y\) is, note that the amount by which it distorts the inner product between unit vectors u and v is
\[\left\langle \tfrac{1}{\sqrt{N}}Yu,\tfrac{1}{\sqrt{N}}Yv\right\rangle -\langle u,v\rangle =\left\langle \left( \tfrac{1}{N}Y^{\prime }\cdot Y-{\mathbf {1}}\right) u,v\right\rangle .\]
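A quick numerical check of Conclusion 13 (our illustration, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 2_000
Y = rng.standard_normal((N, N))

G = (Y.T @ Y) / N                    # should be close to the identity matrix

diag_dev = np.abs(np.diag(G) - 1.0).max()     # diagonal entries vs. 1
off_diag = G - np.diag(np.diag(G))
off_dev = np.abs(off_diag).max()              # off-diagonal entries vs. 0
print(diag_dev, off_dev)                      # both shrink as N grows
```

Both printed deviations are small compared to the unit diagonal, and rerunning with larger N shrinks them further.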
1.6 The Action of a Linear Operator in a High-Dimensional Space
Consider an \(N\times N\) matrix T with given singular values \(s_{1},s_{2},\ldots ,s_{N}\) as in (43). Let T act on a unit vector u with direction uniformly distributed over \(S^{N-1}\). By (43) this is distributed, up to an orthogonal “rotation” of the space, the same as \(S={\text {diag}}(s_{1},s_{2},\ldots ,s_{N})\) acting on such a vector.
But by Sect. 4, that would be almost the same as S applied to \((1/\sqrt{N})x\), with x having coordinates i.i.d. \(\sim {\mathcal {N}}\); this is, of course, a vector with independent coordinates, the j-th coordinate distributed as \((1/\sqrt{N})s_{j}\) times \({\mathcal {N}}\).
Now, similarly to what we had in Sect. 4, the square of the norm of \(S\cdot (1/\sqrt{N})x\), which is \((1/N)\sum _{j=1}^{N}s_{j}^{2}x_{j}^{2}\), has mean
\[\frac{1}{N}\sum _{j=1}^{N}s_{j}^{2}=\left( \Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{2}^{(\pi )}\right) ^{2},\]
around which it is concentrated, its standard deviation being
\[\frac{\sigma }{\sqrt{N}}\left( \Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{4}^{(\pi )}\right) ^{2},\]
where \(\sigma \) is the standard deviation of \(x^{2}\) when \(x\sim {\mathcal {N}}\), namely,
\[\sigma =\sqrt{{\mathbb {E}}\left[ x^{4}\right] -\left( {\mathbb {E}}\left[ x^{2}\right] \right) ^{2}}=\sqrt{3-1}=\sqrt{2}.\]
By Appendix A.1, the relative deviation is, thus, expected, with almost full probability, to be \(O(1/\sqrt{N})\). Note that since \(T=U_{1}\cdot {\text {diag}}(s_{1},s_{2},\ldots ,s_{N})\cdot U_{2}\), the value around which the norm of T applied to a uniformly distributed unit vector is concentrated is
\[\frac{1}{\sqrt{N}}\Vert T\Vert _{HS}=\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{2}^{(\pi )}.\]
Dividing T by that value, we get a T with \((1/\sqrt{N})\Vert T\Vert _{HS}=1\) which, with almost full probability, will approximately preserve the norm. How “orthogonal” will it be? Let us see how S distorts the inner product between \((1/\sqrt{N})x\) and \((1/\sqrt{N})y\), all 2N coordinates of x and y i.i.d. \(\sim {\mathcal {N}}\). The mean of the square of the difference
\[\left\langle S\tfrac{1}{\sqrt{N}}x,S\tfrac{1}{\sqrt{N}}y\right\rangle -\left\langle \tfrac{1}{\sqrt{N}}x,\tfrac{1}{\sqrt{N}}y\right\rangle =\frac{1}{N}\sum _{j=1}^{N}\left( s_{j}^{2}-1\right) x_{j}y_{j}\]
is
\[\frac{1}{N^{2}}\sum _{j=1}^{N}\left( s_{j}^{2}-1\right) ^{2}.\]
Consequently, T is orthogonal, with almost full probability, up to \(O(1/\sqrt{N})\). This leads to the following conclusion.
Conclusion 14
An \(N\times N\) matrix T with given singular values \(s_{1},s_{2},\ldots ,s_{N}\), acting on a high-dimensional \(E^{N}\), would be expected to act, with almost full probability, as
\[\frac{1}{\sqrt{N}}\Vert T\Vert _{HS}=\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{2}^{(\pi )}\]
times an orthogonal matrix, up to a relative deviation \(O(1/\sqrt{N})\).
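Conclusion 14 can be probed numerically; in the model above, T acting on a uniform unit vector behaves like \({\text {diag}}(s)\) acting on one (a sketch of ours, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000
s = rng.uniform(0.0, 2.0, size=N)          # prescribed singular values

v = rng.standard_normal(N)
v /= np.linalg.norm(v)                     # a uniform unit vector on S^{N-1}

observed = np.linalg.norm(s * v)           # ||diag(s) v||
predicted = np.sqrt(np.mean(s ** 2))       # ||(s_1,...,s_N)||_2^(pi)
print(observed, predicted)
```

The observed norm matches \(\Vert (s_{1},\ldots ,s_{N})\Vert _{2}^{(\pi )}\) up to a relative deviation of order \(1/\sqrt{N}\).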
Remark 15
Now we address a seeming mystery raised by Conclusion 14. That conclusion seems to require that \((1/\sqrt{N})\) times the Hilbert-Schmidt norm of the product of two matrices with singular values \((s_{1},s_{2},\ldots ,s_{N})\) and \((s_{1}^{\prime },s_{2}^{\prime },\ldots ,s_{N}^{\prime })\), respectively, be equal to the product of the same for the factors, i.e., to \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{2}^{(\pi )}\cdot \Vert (s_{1}^{\prime },s_{2}^{\prime },\ldots ,s_{N}^{\prime })\Vert _{2}^{(\pi )}\), up to relative deviation \(O(1/\sqrt{N})\). Is that so?
Note that, by (43), the HS-norm of the product is that of \(SUS^{\prime }\) where \(S={\text {diag}}(s_{1},s_{2},\ldots ,s_{N})\), \(S^{\prime }={\text {diag}}(s_{1}^{\prime },s_{2}^{\prime },\ldots ,s_{N}^{\prime })\) and U is orthogonal. So, if, up to an \(O(1/\sqrt{N})\) relative deviation, we model U as \((1/\sqrt{N})Y\), \(Y=\left( Y_{i,j}\right) _{i,j}\) as in Appendix A.5, then \(SUS^{\prime }=\left( (1/\sqrt{N})s_{i}Y_{i,j}s_{j}^{\prime }\right) _{i,j}\). The square of \((1/\sqrt{N})\) times its HS-norm is \((1/N^{2})\sum _{i,j=1}^{N}s_{i}^{2}Y_{i,j}^{2}s_{j}^{\prime 2}\), with mean indeed equal to the square of \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{2}^{(\pi )}\cdot \Vert (s_{1}^{\prime },s_{2}^{\prime },\ldots ,s_{N}^{\prime })\Vert _{2}^{(\pi )}\), and with standard deviation \(\sigma \cdot (1/N)\) times the square of \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{4}^{(\pi )}\cdot \Vert (s_{1}^{\prime },s_{2}^{\prime },\ldots ,s_{N}^{\prime })\Vert _{4}^{(\pi )}\).
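A numerical sanity check of this accounting (our illustration, assuming NumPy; the Haar-distributed U is produced by a sign-corrected QR of a Gaussian matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 800
s = rng.uniform(0.5, 1.5, size=N)     # singular values of the first factor
sp = rng.uniform(0.5, 1.5, size=N)    # singular values of the second factor

# Haar-uniform orthogonal U: QR of a Gaussian matrix, signs fixed so the
# distribution is exactly Haar
Q, R = np.linalg.qr(rng.standard_normal((N, N)))
U = Q * np.sign(np.diag(R))

P = np.diag(s) @ U @ np.diag(sp)      # the model S U S' of the product
lhs = np.linalg.norm(P) / np.sqrt(N)  # (1/sqrt(N)) times the HS-norm
rhs = np.sqrt(np.mean(s ** 2)) * np.sqrt(np.mean(sp ** 2))
print(lhs, rhs)
```

NumPy's matrix `norm` default is the Frobenius (i.e., Hilbert-Schmidt) norm, so `lhs` and `rhs` agree up to the small relative deviation discussed above.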
1.7 The Rotation Effected by an Operator and by a Product of Operators in a High-Dimensional Space
Let T be an \(N\times N\) matrix, and consider the amount of rotation between v and Tv. The square of the distance between these vectors, both normalized to norm 1, will be
\[\left\Vert \frac{v}{\Vert v\Vert }-\frac{Tv}{\Vert Tv\Vert }\right\Vert ^{2}=2-2\,\frac{\langle T^{(sym)}v,v\rangle }{\Vert v\Vert \,\Vert Tv\Vert },\]
where \(T^{(sym)}:=\textstyle {\frac{1}{2}}(T+T^{\prime })\) is the symmetric part of T. Note that \({\text {tr}}\,T^{(sym)}={\text {tr}}\,T\). So, we are led to investigate the inner product \(\langle Ax,x\rangle \) for A symmetric. Let \((s_{1},s_{2},\ldots ,s_{N})\) be its eigenvalues; then \(A=U^{\prime }SU\) where \(S={\text {diag}}(s_{1},s_{2},\ldots ,s_{N})\) and U is orthogonal. As we did above, we take \(v=(1/\sqrt{N})x\), with x having coordinates i.i.d. \(\sim {\mathcal {N}}\). Then
\[\langle Av,v\rangle =\frac{1}{N}\langle SUx,Ux\rangle .\]
But, Ux being distributed like x, this will have the same distribution as
\[\frac{1}{N}\sum _{j=1}^{N}s_{j}x_{j}^{2},\]
which has mean \((1/N)\sum _{j=1}^{N}s_{j}=(1/N){\text {tr}}\,A\) and standard deviation \((1/\sqrt{N})\sigma \Vert (s_{1},\ldots ,s_{N})\Vert _{2}^{(\pi )}\). Of course, if A is positive semidefinite then the \(s_{j}\ge 0\) and the above mean is \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{1}^{(\pi )}\). This leads to the following conclusion.
Conclusion 16
For T with symmetric part having eigenvalues \((s_{1},s_{2},\ldots ,s_{N})\), the square of the distance between v and Tv, both normalized to norm 1, is, with almost full probability, near (with deviation \(O(1/\sqrt{N})\))
\[2-2\,\frac{(1/N)\sum _{j=1}^{N}s_{j}}{\Vert Tv\Vert /\Vert v\Vert },\]
which, if the symmetric part of T is positive-semidefinite, is equal to
\[2-2\,\frac{\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{1}^{(\pi )}}{\Vert Tv\Vert /\Vert v\Vert }.\]
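For a symmetric positive semidefinite T, \(\Vert Tv\Vert /\Vert v\Vert \) concentrates near \(\Vert (s_{1},\ldots ,s_{N})\Vert _{2}^{(\pi )}\) (Appendix A.6), so the concentration value becomes \(2-2\Vert s\Vert _{1}^{(\pi )}/\Vert s\Vert _{2}^{(\pi )}\). A numerical sketch of ours (assuming NumPy), working in the eigenbasis of T:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 20_000
s = rng.uniform(0.0, 2.0, size=N)          # eigenvalues of a symmetric psd T

# in its eigenbasis, T acts as diag(s); v is a uniform unit vector
v = rng.standard_normal(N)
v /= np.linalg.norm(v)
Tv = s * v

dist_sq = np.linalg.norm(v - Tv / np.linalg.norm(Tv)) ** 2
predicted = 2 - 2 * np.mean(s) / np.sqrt(np.mean(s ** 2))
print(dist_sq, predicted)
```

The observed squared distance lands near the predicted value, with the \(O(1/\sqrt{N})\) deviation of Conclusion 16.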
The next discussion will lead to a conclusion about a product \(A_{M}A_{M-1}\cdots A_{1}\) of a sequence of symmetric operators. Consider a symmetric \(A=U^{\prime }{\text {diag}}(s_{1},s_{2},\ldots ,s_{N})U\) with given \(s_{1},s_{2},\ldots ,s_{N}\). Take U uniformly distributed on the orthogonal group, which we model up to a relative deviation \(O(1/\sqrt{N})\) by \((1/\sqrt{N})Y\), \(Y=\left( Y_{i,j}\right) _{i,j}\) as in Appendix A.5. Then
\[{\mathbb {E}}\left[ Y^{\prime }\,{\text {diag}}(s_{1},s_{2},\ldots ,s_{N})\,Y\right] =\left( {\text {tr}}\,A\right) {\mathbf {1}}.\]
Consequently,
\[{\mathbb {E}}\left[ A\right] \approx \frac{1}{N}\left( {\text {tr}}\,A\right) {\mathbf {1}}.\]
But here we cannot say, as we did in previous cases, that, with high probability, A would be near that average; indeed, they cannot be “near” each other, since the eigenvalues of the average are all \((1/N){\text {tr}}\,A\) while those of A are always \(s_{1},s_{2},\ldots ,s_{N}\).
To apply the considerations of Appendix A.3, where one relies on a Markov chain employing uniform distributions on spheres, we inquire what is the distribution of \(Av_{0}\), and of the difference vector \(\left( \dfrac{Av_{0}}{\Vert Av_{0}\Vert }-\dfrac{v_{0}}{\Vert v_{0}\Vert }\right) \), for a fixed \(v_{0}\), with A random as in (57) above. To fix matters, assume \(v_{0}=(1,0,\ldots ,0)\). As above, we have \(Av_{0}=U^{\prime }SUv_{0}\), where \(S:={\text {diag}}(s_{1},s_{2},\ldots ,s_{N})\). Or, with U replaced by \((1/\sqrt{N})Y\), \(Av_{0}\approx (1/N)Y^{\prime }SYv_{0}\). Write Y as (w, Z) where w is the \(N\times 1\) matrix which is the first column of Y, and Z is the \(N\times (N-1)\) matrix of the other columns. Then, with \(v_{0}=(1,0,\ldots ,0)\), \(Yv_{0}=w\), and
\[Av_{0}\approx \frac{1}{N}Y^{\prime }Sw=\left( \frac{1}{N}w^{\prime }Sw,\;\frac{1}{N}Z^{\prime }Sw\right) .\]
Note that the random Z and w are independent. Z is an \(N\times (N-1)\) matrix with entries i.i.d. \(\sim {\mathcal {N}}\), and by the symmetries of this distribution (as in Appendices A.2 and A.5), \((1/N)Z^{\prime }Sw\) is distributed like \((1/N)\Vert Sw\Vert \) times an \((N-1)\)-vector with entries i.i.d. \(\sim {\mathcal {N}}\), i.e., near \((1/\sqrt{N})\Vert Sw\Vert \) times a vector uniformly distributed on \(S^{N-2}\). And, as in Appendix A.6, \((1/\sqrt{N})\Vert Sw\Vert \) is concentrated near \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{2}^{(\pi )}\). As for \((1/N)w^{\prime }Sw\) (it is just (54)), its value is concentrated near \((1/N){\text {tr}}\,A\), which, if A is positive-semidefinite, is equal to \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{1}^{(\pi )}\).
To conclude, the value our random A gives to \((1,0,\ldots ,0)\) is a vector whose first coordinate is near \((1/N){\text {tr}}\,A\) (which, if A is positive-semidefinite, is \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{1}^{(\pi )}\)), and whose other coordinates form a vector near the product of \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{2}^{(\pi )}\) with a vector uniformly distributed on \(S^{N-2}\). Its norm is \(\Vert (s_{1},s_{2},\ldots ,s_{N})\Vert _{2}^{(\pi )}\) up to a deviation \(O(1/N)\), and one obtains values agreeing with the above for \(\langle Ax,x\rangle \) and for the square of the distance between v and Av, both normalized.
In particular, for A symmetric, employing a uniform distribution on spheres in the Markov chain, as in Appendix A.3 and Conclusion 12, is vindicated. Therefore, for a product of a sequence of symmetric operators \(A_{M}A_{M-1}\cdots A_{1}\), we may apply Conclusion 12 to obtain the following conclusion.
Conclusion 17
For a product \(A_{M}A_{M-1}\cdots A_{1}\) of a sequence of symmetric operators \(A_{i}\) with given eigenvalues \((s_{1}^{(i)},s_{2}^{(i)},\ldots ,s_{N}^{(i)})\), the square of the distance between v and \(A_{M}A_{M-1}\cdots A_{1}v\), both normalized to norm 1, is, with almost full probability, near (with deviation \(O(\sqrt{M}/\sqrt{N})\))
\[2\left( 1-\prod _{i=1}^{M}\frac{(1/N)\sum _{j=1}^{N}s_{j}^{(i)}}{\Vert (s_{1}^{(i)},s_{2}^{(i)},\ldots ,s_{N}^{(i)})\Vert _{2}^{(\pi )}}\right) ,\]
which, if for all i, \(A_{i}\) is positive semidefinite, is equal to
\[2\left( 1-\prod _{i=1}^{M}\frac{\Vert (s_{1}^{(i)},s_{2}^{(i)},\ldots ,s_{N}^{(i)})\Vert _{1}^{(\pi )}}{\Vert (s_{1}^{(i)},s_{2}^{(i)},\ldots ,s_{N}^{(i)})\Vert _{2}^{(\pi )}}\right) .\qquad (61)\]
Remark 18
Note that if the \(A_{i}\) are positive semidefinite, the value (61), around which the square of the distance between the points on \(S^{N-1}\) is concentrated, is \(\le 2\); that is, the distance is \(\le \sqrt{2}\) and the angle between the vectors is \(\le 90^{\circ }\).
Censor, Y., Levy, E. An Analysis of the Superiorization Method via the Principle of Concentration of Measure. Appl Math Optim 83, 2273–2301 (2021). https://doi.org/10.1007/s00245-019-09628-4