1 Introduction

Let A be an \(m\times n\) matrix and consider the matrix norm

$$\begin{aligned} \Vert A\Vert _{\beta \rightarrow \alpha }=\max _{x \ne 0}\frac{\Vert Ax\Vert _\alpha }{\Vert x\Vert _\beta }, \end{aligned}$$

where \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) are vector norms.

Computing \(\Vert A\Vert _{\beta \rightarrow \alpha }\) is a classical problem in computational mathematics, as norms of this kind arise naturally in many situations, such as approximation theory, estimation of matrix condition numbers and approximation of relative residuals [26]. However, attention to the problem of computing \(\Vert A\Vert _{\beta \rightarrow \alpha }\) has been growing in recent years. For example, matrix norms of this type can be used in combinatorial optimization and sparse data recovery, to approximate generalized Grothendieck and restricted isometry constants [1, 6, 16, 30], in scientific computing, to estimate the largest entries of large matrices [27], in data mining and learning theory, to minimize empirical risks or obtain robust nonnegative graph embeddings [9, 41], or in quantum information theory and the study of Khot’s unique games conjecture, where the computational complexity of evaluating \(\Vert A \Vert _{\beta \rightarrow \alpha }\) plays an important role [2]. Moreover, it was observed by Lim [33] that the notions of tensor norm and tensor spectrum relate to \(\Vert A\Vert _{\beta \rightarrow \alpha }\) in a very natural way, and thus relevant advances on the problem of computing \(\Vert A\Vert _{\beta \rightarrow \alpha }\) when A is entrywise nonnegative and \(\Vert \cdot \Vert _\alpha \), \(\Vert \cdot \Vert _{\beta }\) are \(\ell ^p\) norms have recently been obtained as a consequence of a number of new nonlinear Perron–Frobenius-type theorems for higher-order maps [15, 19, 21, 22].

Closed-form solutions and efficient algorithms are known for some special \(\ell ^p\) norms, for instance the case where \(\Vert \cdot \Vert _\alpha =\Vert \cdot \Vert _\beta \) and they coincide with either the \(\ell ^1\), the \(\ell ^2\), or the \(\ell ^\infty \) norm, or the case where \(p\le 1\le q\) and \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) are \(\ell ^p\) and \(\ell ^q\) (semi) norms, respectively (cf. [10, 32, 36]). However, the computation of \(\Vert A\Vert _{\beta \rightarrow \alpha }\) is NP-hard in general [23, 38].

The best known method for the computation of \(\Vert A\Vert _{\beta \rightarrow \alpha }\) is the (nonlinear) power method, essentially introduced by Boyd [4] and then further analyzed and extended for instance in [3, 15, 25, 39]. When the considered vector norms are \(\ell ^p\) norms, the power method enjoys a fundamental global convergence result, which ensures convergence to the matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\) for a class of entry-wise nonnegative matrices A and for a range of \(\ell ^p\) norms. We discuss in detail the method and its convergence in Sect. 2.

The convergence of the method is a consequence of an elegant fixed point argument that involves a nonlinear operator \({\mathcal {S}}_A\) and its Lipschitz contraction constant. However, the convergence analysis of this method has two main gaps: on the one hand, all the work done so far addresses only the case of \(\ell ^p\) norms, whereas almost nothing is known about the global convergence behavior of the power iterates for more general norms. On the other hand, even for the case of \(\ell ^p\) norms, known upper bounds on the contraction constant of \({\mathcal {S}}_A\) are not sharp, especially for positive matrices. In this work we provide novel results that address and improve both these directions.

Consider for example the case where \(\Vert \cdot \Vert _\alpha \) is defined as

$$\begin{aligned} \Vert x\Vert _\alpha = \Vert (x_1, \dots , x_k)\Vert _{p_1} + \Vert (x_{k+1},\ldots , x_{n})\Vert _{p_2} \end{aligned}$$
(1)

where k is a positive integer not larger than the dimension of x and \(\Vert \cdot \Vert _{p_i}\) are \(\ell ^p\) norms. Of course, one can extend this idea by looking at any family of subsets of the entries of x and any set of \(\ell ^p\) norms, in order to generate a large variety of new norms. Norms of this form are natural modifications of \(\ell ^p\) norms and are used, for instance, to define the generalized Grothendieck constants as in [30], or in graph matching problems to build continuous relaxations of the set of permutation matrices [11, 34]. However, even for this case, extending the result of Boyd is not straightforward.
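To make the construction concrete, the following Python sketch (ours, purely illustrative; the function name and the example data are not taken from the cited references) evaluates a norm of the form (1) for an arbitrary partition of the entries of x, each block being measured in its own \(\ell ^{p_i}\) norm.

```python
import numpy as np

def block_lp_norm(x, blocks, exponents):
    """Norm of type (1): sum of the l^{p_i} norms of the sub-vectors of x
    selected by the index sets in `blocks`."""
    return sum(np.linalg.norm(x[idx], ord=p) for idx, p in zip(blocks, exponents))

# Example: ||x||_alpha = ||(x_1, x_2)||_2 + ||(x_3, x_4, x_5)||_3
x = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
blocks = [np.arange(0, 2), np.arange(2, 5)]
print(block_lp_norm(x, blocks, [2, 3]))
```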

In this work we consider general pairs of monotonic and differentiable vector norms and provide a thorough convergence analysis of the power method for the computation of the corresponding induced matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\). Our result is based on a novel nonlinear Perron–Frobenius theorem for this kind of norms and ensures global convergence of the power method provided that the Birkhoff contraction ratio of the power iterator is smaller than one.

When applied to the case \(\Vert A\Vert _{q\rightarrow p}\) of \(\ell ^p\) norms, our result not only implies the currently known convergence result, but actually significantly improves the range of values of p and q for which global convergence can be ensured. This is particularly interesting from a complexity viewpoint. For example, although the computation of \(\Vert A\Vert _{q\rightarrow p}\) is well known to be NP-hard for \(p>q\), we show that for a non-trivial class of nonnegative matrices the power method converges to \(\Vert A\Vert _{q\rightarrow p}\) in polynomial time even for p considerably larger than q. To our knowledge this is the first global optimality result for this problem that does not require the condition \(p\le q\).

In the general case \(\Vert A\Vert _{\beta \rightarrow \alpha }\), a main computational drawback of the power method is related to the computation of the dual norm \(\Vert \cdot \Vert _{\beta ^*}\). In fact, if \(\Vert \cdot \Vert _\beta \) is not an \(\ell ^p\) norm, the corresponding dual norm may be challenging to compute [14]. In practice, evaluating \(\Vert \cdot \Vert _{\alpha ^*}\) from \(\Vert \cdot \Vert _{\alpha }\) can be done via convex optimization, and Corollary 7 of [14] proves that \(\Vert \cdot \Vert _{\alpha ^*}\) can be evaluated in polynomial time (resp. is NP-hard) if and only if \(\Vert \cdot \Vert _{\alpha }\) can be evaluated in polynomial time (resp. is NP-hard). There are norms for which an explicit expression in terms of arithmetic operations for \(\Vert \cdot \Vert _{\alpha }\) is given by construction (resp. modeling), but such an expression is not available for the dual \(\Vert \cdot \Vert _{\alpha ^*}\). As we discuss in Sect. 5.1, examples of this type include for instance \(\Vert x \Vert _{\alpha }=(\Vert x \Vert ^2_{p}+\Vert x \Vert ^2_{q})^{1/2}\). A further main result of this work addresses this issue for the particular case of norms of the type (1). For this family of norms we provide an explicit convergence bound and an explicit formula for the power iterator for the computation of the corresponding matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\). To illustrate possible applications of the result, we list in Corollaries 3–8 relatively sophisticated and non-standard matrix norms together with an explicit condition for their computability.

We organize the discussion as follows: In Sect. 2 we review the nonlinear power method and its main convergence properties. In Sect. 3 we review relevant preliminary cone-theoretic results and notation. Then, in Sect. 4, we propose a novel and detailed global convergence analysis of the method based on a Perron–Frobenius type result for the map \(x\mapsto \Vert Ax\Vert _\alpha /\Vert x\Vert _\beta \), in the case of entry-wise nonnegative matrices and monotonic norms \(\Vert \cdot \Vert _\alpha , \Vert \cdot \Vert _\beta \). We derive new conditions for the global convergence to \(\Vert A\Vert _{\beta \rightarrow \alpha }\) that, in particular, help shed new light on the NP-hardness of the problem, and we propose a new explicit bound on the linear convergence rate of the power iterates. In Sect. 5 we focus on the particular case of norms of the same form as (1). We show how to practically implement the power method for this type of norm, we prove a specific convergence criterion that gives a priori global convergence guarantees and we discuss the complexity of the method. Finally, in Sect. 6 we illustrate the behaviour of the nonlinear power method on some example matrix norms.

2 Boyd’s Nonlinear Power Method

Let \(\Vert \cdot \Vert _p\), \(\Vert \cdot \Vert _q\) be the usual \(\ell ^p\) and \(\ell ^q\) vector norms and consider the induced matrix norm \(\Vert A\Vert _{q\rightarrow p} = \max _{x\ne 0}\Vert Ax\Vert _p / \Vert x\Vert _q\). Well-known explicit formulas hold for the \(\ell ^1\) and \(\ell ^\infty \) matrix norms \(\Vert A\Vert _{1\rightarrow 1}\), \(\Vert A\Vert _{\infty \rightarrow \infty }\). However, while the mixed norm \(\Vert A\Vert _{1\rightarrow \infty }\) equals \(\max _{ij}|a_{ij}|\), the computation of \(\Vert A\Vert _{\infty \rightarrow 1}\) is NP-hard [36]. More generally, computing the norm \(\Vert A\Vert _{p\rightarrow p}\) is NP-hard for a general matrix A when p is any rational number with \(p\ne 1,2\) [23], and the same holds for any norm \(\Vert A\Vert _{q\rightarrow p}\) with \(1\le p< q \le \infty \) [38]. The best known technique to compute \(\Vert A\Vert _{q\rightarrow p}\) is a form of nonlinear power method that we review in what follows.

Consider the nonnegative function \(f_{A}(x) = \Vert Ax\Vert _p/\Vert x\Vert _q\). The norm \(\Vert A\Vert _{q\rightarrow p}\) is the global maximum of \(f_A\). By analyzing the optimality conditions of \(f_A\), for differentiable \(\ell ^p\)-norms \(\Vert \cdot \Vert _p\) and \(\Vert \cdot \Vert _q\), we note that

$$\begin{aligned} \nabla f_A(x)=0 \Longleftrightarrow A^T J_p (Ax)=f_A(x)J_q(x), \end{aligned}$$

where, for \(1<p<\infty \), we denote by \(J_p(x)\) the gradient of the norm \(\nabla \Vert x\Vert _p = J_p(x)=\Vert x\Vert _p^{1-p}\, \Phi _p(x)\), with \(\Phi _p(x)\) entrywise defined as \(\Phi _p(x)_i = |x_i|^{p-2}x_i\). Let \(p^*\) be the dual exponent such that \(1/p+1/p^* =1\). As \(J_{p^*}(J_p(x))=x/\Vert x\Vert _p\) for all \(x\ne 0\) and \(J_{p}(\lambda \, x) = J_p(x)\) for any coefficient \(\lambda >0\), we have that \(\nabla f_A(x)=0\) if and only if \(J_{q^*}(A^TJ_p(Ax))= x/\Vert x\Vert _q\). Thus, x with \(\Vert x\Vert _{q}=1\) is a critical point of \(f_A(x)\) if and only if it is a fixed point of the map \(J_{q^*}(A^T J_{p}(Ax))\). The associated fixed point iteration

$$\begin{aligned} x_0 = x_0 /\Vert x_0 \Vert _{q}, \quad x_{k+1} = J_{q^*}(A^TJ_p(Ax_k))\quad \text {for}\quad k=0,1,2,3,\ldots \end{aligned}$$
(2)

defines what we call the (nonlinear) power method for \(\Vert A\Vert _{q\rightarrow p}\).
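For \(\ell ^p\) norms, the iteration (2) can be implemented directly from the formulas above. The following Python sketch is ours and only illustrative (the stopping rule, the iteration budget and the example matrix are our own assumptions, and the entrywise powers require \(p\ge 2\) or strictly positive iterates); it is not an implementation from the cited works.

```python
import numpy as np

def J(x, p):
    """Gradient of the l^p norm for 1<p<infty: J_p(x) = ||x||_p^{1-p} Phi_p(x),
    with Phi_p(x)_i = |x_i|^{p-2} x_i."""
    return np.linalg.norm(x, p) ** (1 - p) * np.abs(x) ** (p - 2) * x

def power_method_qp(A, p, q, x0, maxit=1000, tol=1e-12):
    """Nonlinear power method (2): x_{k+1} = J_{q*}(A^T J_p(A x_k))."""
    q_dual = q / (q - 1)                     # dual exponent q*
    x = x0 / np.linalg.norm(x0, q)
    for _ in range(maxit):
        x_new = J(A.T @ J(A @ x, p), q_dual)
        if np.linalg.norm(x_new - x, np.inf) < tol:
            x = x_new
            break
        x = x_new
    return np.linalg.norm(A @ x, p) / np.linalg.norm(x, q), x

# Example with a nonnegative matrix and a positive starting vector.
A = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
estimate, x = power_method_qp(A, p=2.0, q=3.0, x0=np.ones(2))
print(estimate)                              # approximation of ||A||_{3->2}
```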

Although, in practice, the method applied to \(\Vert A\Vert _{p\rightarrow p}\) for \(p=1,\infty \) often seems to converge to the global maximum (see e.g. [24]), no guarantees exist for the general case. For differentiable \(\ell ^p\) norms and nonnegative matrices, instead, conditions can be established in order to guarantee that the power iterates always converge to a global maximizer of \(f_A\). The idea is that when the power method is started in the positive orthant then, provided A has an appropriate non-zero pattern, each iterate of the method will stay in this orthant until convergence. Then, a nonlinear Perron–Frobenius type result is proved to guarantee that there exists only one critical point of \(f_A\) in this region and that this point is a global maximizer of \(f_A\). While this idea was already known to Perron himself in the Euclidean \(\ell ^2\) case, to our knowledge, the first version of this result for norms different from the Euclidean norm was proved by Boyd [4]. However, Boyd did not prove the uniqueness of positive critical points but only that they are global maximizers of \(f_A\), under the assumption that \(A^TA\) is irreducible and \(1<p\le q<\infty \). This work was then revisited by Bhaskara and Vijayaraghavan [3], who proved uniqueness for positive matrices A and \(1<p\le q<\infty \). Independently, Friedland, Gaubert and Han proved in [15] similar results for \(1<p\le 2 \le q<\infty \) and any nonnegative A such that the matrix \(\left[ \begin{array}{cc} 0 &{} A \\ A^T &{} 0 \end{array}\right] \) is irreducible. Their result was then extended to \(1<p\le q<\infty \) in [18] under the assumption that \(A^T A\) is irreducible. Finally, all these results have been improved in [22], leading to the following

Theorem 1

(Theorems 3.2 and 3.3, [22]) Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and suppose that \(A^TA\) has at least one positive entry per row. If \(1<p\le q <\infty \), then every positive critical point of \(f_A\) is a global maximizer. Moreover, if either \(p<q\) or \(A^TA\) is irreducible, then \(f_A\) has a unique positive critical point \(x^+\) and the power sequence (2) converges to \(x^+\) for every positive starting point.

In this work we consider the case of a matrix norm defined in terms of arbitrary vector norms \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) and we prove Theorem 4 below, which is a new version of Theorem 1, holding for general vector norms, provided that suitable and mild differentiability and monotonicity conditions are satisfied. We stress that Theorems 1 and 4 are not corollaries of each other in the sense that there are cases where exactly one, both or none apply. However, when both apply, then Theorem 4 is more informative. We discuss in detail these discrepancies in Sect. 4.1 and give there examples to illustrate them. In particular, a noticeable difference is that, for positive matrices A, the newly proposed Theorem 4 ensures uniqueness and maximality for choices of \(1<p,q<\infty \) that include the range \(p>q\). This is, to our knowledge, the first global optimality result for this problem that includes such range of values.

The key of our approach is the use of cone geometry techniques and the Birkhoff–Hopf theorem, which we recall below.

3 Cone–theoretic Background

We start by recalling concepts from conic geometry. Let \(\mathbb {R}^n_+\) be the nonnegative orthant in \(\mathbb {R}^n\), that is \(x\in \mathbb {R}^n_+\) if \(x_i\ge 0\) for every \(i=1,\ldots ,n\). The cone \(\mathbb {R}^n_+\) induces a partial ordering on \(\mathbb {R}^n\) as follows: for every \(x,y\in \mathbb {R}^n\) we write \(x\le y\) if \(y-x\in \mathbb {R}^n_+\), i.e. \(x_i\le y_i\) for every i. Furthermore, \(x,y\in \mathbb {R}^n_+\) are comparable, and we write \(x\sim y\), if there exist \(c,C>0\) such that \(cy \le x \le Cy\). Clearly, \(\sim \) is an equivalence relation and the equivalence classes in \(\mathbb {R}^n_+\) are called the parts of \(\mathbb {R}^n_+\). For example, if \(n=2\) and \(x = (1,0)\), then the equivalence class of x in \(\mathbb {R}^2_+\) is given by \(\{(y_1,0): y_1>0\}\).

For simplicity, from now on we will say that a vector is nonnegative (resp. positive) if its entries are nonnegative (resp. positive). The same nomenclature will be used for matrices.

We recall that a norm \(\Vert \cdot \Vert \) on \(\mathbb {R}^n\) is monotonic if for every \(x,y\in \mathbb {R}^n\) such that \(|x|\le |y|\), where the absolute value is taken componentwise, it holds \(\Vert x \Vert \le \Vert y \Vert \) and it is strongly monotonic if for every \(x,y\in \mathbb {R}^n\) with \(|x|\ne |y|\) and \(|x|\le |y|\) it holds \(\Vert x \Vert <\Vert y \Vert \).

One of the key tools for our main result is the Hilbert’s projective metric \(d_H:\mathbb {R}^n_+\times \mathbb {R}^n_+\rightarrow [0,\infty ]\), defined as follows:

$$\begin{aligned} d_H(x,y)={\left\{ \begin{array}{ll} \ln \big (M(x/y)M(y/x)\big ) &{} \text {if } x\sim y,\\ 0 &{} \text {if }x=y=0,\\ \infty , &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

where \(M(x/y) = \inf \{C>0 : x \le Cy\}\). We collect in the following lemma some useful properties of \(d_H\). Most of these results are known and can be found in [31]. Moreover, similarly to what is observed in Theorem 3 of [20], we prove a direct relation between the infinity norm and the Hilbert metric, which is useful for deriving explicitly computable convergence rates for the power method.

Lemma 1

For every \(x,y\in \mathbb {R}^n_+\), it holds \(d_H(x,y)=0\) if and only if \(x=\lambda y\) for some \(\lambda >0\) and \(d_H(cx,\widetilde{c}y)=d_H(x,y)\) for every \(c,\widetilde{c}>0\). Moreover, let \(\Vert \cdot \Vert \) be a monotonic norm on \(\mathbb {R}^n\), P a part of \(\mathbb {R}^n_+\) and define \(\mathbb {M}=P\cap \{x\in \mathbb {R}^n_+: \Vert x \Vert =1\}\). Then, \((\mathbb {M},d_H)\) is a complete metric space and

$$\begin{aligned} \Vert x-y \Vert _{\infty } \le r\, d_H(x,y) \quad \forall x,y\in \mathbb {M}, \end{aligned}$$
(3)

where \(r=\inf \{t>0: x_i\le t\ \forall x\in \mathbb {M}, i=1,\ldots ,n\}\).

Proof

Proposition 2.1.1 in [31] implies that \(d_H(x,y)=0\) if and only if \(x=\lambda y\) and that \((\mathbb {M},d_H)\) is a metric space. The property \(d_H(cx,\widetilde{c}y)=d_H(x,y)\) for every \(c,\widetilde{c}>0\) follows directly from the definition of \(d_H\). The completeness of \((\mathbb {M},d_H)\) is a consequence of Proposition 2.5.4 in [31]. We prove (3). If \(P=\{0\}\), the result is trivial so we assume \(P\ne \{0\}\) and let \(i_1,\ldots ,i_m\) be such that for any \(z\in \mathbb {R}^n_+\), \(z\in P\) if and only if \(z_{i_1},\ldots ,z_{i_m}>0\). Let \(x,y\in \mathbb {M}\), then \(x\le M(x/y)y\) and, by monotonicity of \(\Vert \cdot \Vert \), it follows \(1 = \Vert x \Vert \le M(x/y)\Vert y \Vert = M(x/y).\) Similarly \(M(y/x)\ge 1\), so that \(M(x/y)M(y/x) \ge \max \big \{M(x/y),M(y/x)\big \}.\) It follows that

$$\begin{aligned} d_H(x,y) \ge \ln \big (\max \big \{M(x/y),M(y/x)\big \}\big ) = \Vert \overline{x}-\overline{y} \Vert _{\infty }, \end{aligned}$$

where \(\overline{x}=\big (\ln (x_{i_1}),\ldots ,\ln (x_{i_m})\big )\) and \(\overline{y}=\big (\ln (y_{i_1}),\ldots ,\ln (y_{i_m})\big )\). By definition of \(r>0\), we have \(\ln (x_{i_j}),\ln (y_{i_j})\in (-\infty ,\ln (r)]\) for every \(j=1,\ldots ,m\). Furthermore, by the mean value theorem, we have

$$\begin{aligned} |e^s-e^t| \le |s-t| \max _{\xi \in (-\infty ,\ln (r)]}e^{\xi } = r|s-t| \quad \forall s,t\in (-\infty ,\ln (r)]. \end{aligned}$$

Finally, with \(\widetilde{x}=(x_{i_1},\ldots ,x_{i_m})\) and \(\widetilde{y}=(y_{i_1},\ldots ,y_{i_m})\), we obtain

$$\begin{aligned} d_H(x,y) \ge \Vert \overline{x}-\overline{y} \Vert _{\infty } \ge r^{-1} \Vert \widetilde{x}-\widetilde{y} \Vert _{\infty }=r^{-1}\Vert x-y \Vert _{\infty } \end{aligned}$$

which concludes the proof. \(\square \)

Observe that if r is defined as in Lemma 1 and \(\Vert \cdot \Vert \) is strongly monotonic, then

$$\begin{aligned} r\le {\widetilde{r}}=\max _{i=1,\ldots ,n}\frac{1}{\Vert e_i \Vert }. \end{aligned}$$
(4)

Indeed, if \(y\in \mathbb {M}\) is such that there exists \(j\in \{1,\ldots ,n\}\) with \(y_j>{\widetilde{r}}\), then \(1 = \Vert y \Vert >\Vert {\widetilde{r}} e_j \Vert = {\widetilde{r}} \Vert e_j \Vert \), which is not possible.
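For illustration, here is a short Python sketch (ours, not taken from [31]) of \(d_H\) on comparable positive vectors, together with a numerical check of inequality (3) on the \(\ell ^1\) unit sphere, where the bound (4) gives \(r\le 1\).

```python
import numpy as np

def hilbert_metric(x, y):
    """Hilbert's projective metric d_H(x,y) = ln( M(x/y) M(y/x) ),
    here for strictly positive (hence comparable) vectors."""
    ratio = x / y
    return np.log(ratio.max() * (1.0 / ratio).max())

rng = np.random.default_rng(0)
x, y = rng.random(4) + 0.1, rng.random(4) + 0.1
x, y = x / x.sum(), y / y.sum()        # normalize in the (monotonic) l^1 norm
# Inequality (3) with r <= max_i 1/||e_i||_1 = 1, cf. (4):
print(np.linalg.norm(x - y, np.inf), hilbert_metric(x, y))
```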

The proof of our main theorem is based on the Banach contraction principle. Thus, for a map \(F:\mathbb {R}_+^n\rightarrow \mathbb {R}_+^m\) we consider the Birkhoff contraction ratio \(\kappa _H(F)\in [0,\infty ]\) of F, defined as the smallest Lipschitz constant of F with respect to \(d_H\):

$$\begin{aligned} \kappa _H(F)=\inf \big \{C>0: d_H(F(x),F(y))\le Cd_H(x,y), \ \forall x,y\in \mathbb {R}^n_{+} \text { s.t. } x \sim y\big \}. \end{aligned}$$

Clearly, if there exist \(x,y\in \mathbb {R}^n_+\) such that \(x\sim y\) and \(F(x)\not \sim F(y)\), then \(\kappa _H(F)=\infty \). However, such a situation never happens when F is a linear map in which case \(\kappa _H(F)\le 1\) always holds. Indeed, if \(A\in \mathbb {R}^{m\times n}\) is a nonnegative matrix, \(x,y\in \mathbb {R}^n_+\) and \(x\sim y\), then \(x\le M(x/y)y\) implies \(Ax \le M(x/y)Ay\). Similarly, we have \(Ay \le M(y/x)Ax\) and thus \(Ax\sim Ay\). These inequalities also imply that \(\kappa _H(A)\le 1\). This upper bound is not tight in many cases. However, thanks to the Birkhoff–Hopf theorem, a better estimate of \(\kappa _H(A)\) can be obtained by computing the projective diameter \(\triangle (A)\in [0,\infty ]\) of A, defined as

$$\begin{aligned} \triangle (A)= \sup \big \{d_H(Ax,Ay) : x,y\in \mathbb {R}^n_+ \text { with } x\sim y\big \}. \end{aligned}$$
(5)

This is formalized in the following theorem whose proof can be found in Theorems 3.5 and 3.6 of [12].

Theorem 2

(Birkhoff–Hopf) Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries, then

$$\begin{aligned} \kappa _H(A)=\tanh \!\big (\triangle (A)/4\big ), \end{aligned}$$

where \(\tanh (t) = (e^{2t}-1)/(e^{2t}+1)\) and with the convention \(\tanh (\infty )=1\).

The above theorem is particularly useful when combined with the following Theorem 6.2 in [12] and Theorem 3.12 in [37]:

Theorem 3

Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and \(e_1,\ldots ,e_n\) the canonical basis of \(\mathbb {R}^n\). If there exists \({\mathcal {I}}\subset \{1,\ldots ,n\}\) such that \(Ae_i \sim Ae_j\) for all \(i,j\in {\mathcal {I}}\) and \(Ae_i=0\) for all \(i\notin {\mathcal {I}}\), then

$$\begin{aligned} \triangle (A) = \max _{i,j\in {\mathcal {I}}}d_H(Ae_i,Ae_j) <\infty . \end{aligned}$$

In particular, if all the entries of A are positive, then \(\triangle (A) = \ln \big (\max _{i,j,k,l}\frac{a_{ki}\, a_{lj}}{a_{kj}\, a_{li}}\big )\) and \(\triangle (A)=\triangle (A^T)\). Moreover, if A has at least one positive entry per row and per column but A is not positive, then \(\triangle (A)=\infty \).

Unfortunately, such simple formulas for the Birkhoff contraction ratio are, to our knowledge, not known for general nonlinear mappings. We refer however to Corollary 2.1 in [35] and Corollary 3.9 in [17] for general characterizations of this ratio.
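For matrices with positive entries, however, Theorems 2 and 3 give a directly computable value of \(\kappa _H(A)\). The following Python sketch (ours) evaluates \(\triangle (A)=\ln \max _{i,j,k,l} a_{ki}a_{lj}/(a_{kj}a_{li})\) and \(\kappa _H(A)=\tanh (\triangle (A)/4)\); it assumes that all entries of A are strictly positive.

```python
import numpy as np

def projective_diameter(A):
    """Delta(A) = ln max_{i,j,k,l} (a_ki a_lj)/(a_kj a_li), valid for positive A (Theorem 3)."""
    cross = np.einsum('ki,lj->klij', A, A) / np.einsum('kj,li->klij', A, A)
    return np.log(cross.max())

def birkhoff_contraction(A):
    """kappa_H(A) = tanh(Delta(A)/4), by the Birkhoff--Hopf theorem (Theorem 2)."""
    return np.tanh(projective_diameter(A) / 4.0)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(projective_diameter(A), birkhoff_contraction(A))
```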

4 Nonlinear Perron–Frobenius Theorem for \(\Vert A\Vert _{\beta \rightarrow \alpha }\)

Given \(A\in \mathbb {R}^{m\times n}\), consider the matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }=\max _{x \ne 0}{\Vert Ax\Vert _\alpha }/{\Vert x\Vert _\beta }\), where \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) are arbitrary vector norms on \(\mathbb {C}^m\) and \(\mathbb {C}^n\), respectively. Then, as for the case of \(\ell ^p\) norms, consider the function

$$\begin{aligned} f_{A}(x) = \frac{\Vert Ax\Vert _\alpha }{\Vert x\Vert _\beta } . \end{aligned}$$
(6)

For an arbitrary, possibly non-differentiable, vector norm \(\Vert \cdot \Vert \) it holds (see e.g. [13])

$$\begin{aligned} \partial \Vert x \Vert = \{y : \langle y,x\rangle = \Vert x\Vert , \Vert y\Vert _* = 1\} , \end{aligned}$$
(7)

where \(\partial \) denotes the subdifferential and \(\Vert \cdot \Vert _*\) is the dual norm of \(\Vert \cdot \Vert \), defined as \(\Vert y\Vert _* = \max _{x\ne 0}\langle x,y\rangle /\Vert x\Vert \). Again, for notational convenience, given the vector norm \(\Vert x\Vert _\alpha \), we introduce the set-valued operator \(J_\alpha \) such that

$$\begin{aligned} J_\alpha (x)=\partial \Vert x\Vert _\alpha ,\quad \forall x\ne 0 \quad \text {and}\quad J_{\alpha }(0)=0. \end{aligned}$$

The definition of dual norm implies the generalized Hölder inequality \(\langle x,y\rangle \le \Vert x\Vert \Vert y\Vert _*\). Thus, for a vector x and a norm \(\Vert \cdot \Vert _\alpha \), the set of vectors \(J_\alpha (x)\) coincides with the set of vectors in the unit sphere of the dual norm of \(\Vert \cdot \Vert _\alpha \), for which equality holds in the Hölder inequality. In fact, the subdifferential of a norm \(J_\alpha \) is strictly related with the duality mapping \({\mathcal {J}}_\alpha \) induced by that norm. Precisely, by Asplund’s theorem (see e.g. [7]), we have that

$$\begin{aligned} {\mathcal {J}}_\alpha (x) = \frac{1}{2} \partial \Vert x\Vert _\alpha ^2 = \Vert x\Vert _\alpha J_\alpha (x). \end{aligned}$$
(8)

It is well known that the subdifferential of a convex function f is single valued if and only if f is Fréchet differentiable. Therefore \(J_\alpha \) is single valued if and only if \(\Vert \cdot \Vert _\alpha \) is a Fréchet differentiable norm. The assumption that the duality maps involved are single valued will be crucial for our main result. For this reason, throughout we make the following assumptions on the norms we consider:

Assumption 1

The norms \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) we consider are such that

  1. \(\Vert \cdot \Vert _\alpha \) is Fréchet differentiable.

  2. The dual norm \(\Vert \cdot \Vert _{\beta ^*}\) is Fréchet differentiable.

  3. Both \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _{\beta ^*}\) are strongly monotonic.

Remark 1

Recall that every monotonic norm \(\Vert \cdot \Vert \) is also absolute (see e.g. [28, Thm. 1]), that is \(\Vert \,|x|\,\Vert =\Vert x \Vert \) for every x, where |x| denotes the entrywise absolute value. This implies, in particular, that a monotonic norm is Fréchet differentiable at every \(x\in \mathbb {R}^n{\setminus }\{0\}\) if and only if it is Fréchet differentiable at every \(x\in \mathbb {R}^n_+{\setminus }\{0\}\).

Points (1) and (2) of Assumption 1 ensure that the following nonlinear mapping

$$\begin{aligned} {\mathcal {S}}_A(x)=J_{\beta ^*}(A^T J_\alpha ( Ax)) \end{aligned}$$
(9)

is single valued. Point (3) ensures that for nonnegative matrices the maximum of \(f_A\) is attained at a nonnegative vector and that, if \(A^TA\) is irreducible, this maximizer has positive entries. Overall, these assumptions allow us to prove the fundamental preliminary Lemmas 2–6 below.

First, we discuss the critical points of \(f_A\). If \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) satisfy Assumption 1, then \(f_A\) may not be differentiable. Indeed, the differentiability of \(\Vert \cdot \Vert _{\beta ^*}\) does not imply that of \(\Vert \cdot \Vert _{\beta }\) (see for instance [7, Chapter II]). Hence, in the following, we use Clarke’s generalized gradient [8] to discuss the critical points of \(f_A\). In particular, let us recall that, by [8, Prop. 2.2.7], the generalized gradient of a convex function coincides with its subgradient. Moreover, it can be verified that \(f_A\) is locally Lipschitz near every \(x\in \mathbb {R}^n{\setminus }\{0\}\) so that its generalized gradient \(\partial f_A(x)\subset \mathbb {R}^n\) is well defined and x is a critical point of \(f_A\) if \(0\in \partial f_A(x)\). Moreover, if \(f_A\) attains a local minimum or maximum at \(x\ne 0\), then \(0\in \partial f_A(x)\) by [8, Prop. 2.3.2].

Lemma 2

Let \(\Vert \cdot \Vert _\alpha \), \(\Vert \cdot \Vert _\beta \) satisfy Assumption 1 and let \(x\in \mathbb {R}^n_+\) with \(\Vert x\Vert _{\beta }=1\) and \(f_A(x)\ne 0\). If x is a critical point of \(f_A\), then it is a fixed point of \({\mathcal {S}}_A\). Conversely, if x is a fixed point of \({\mathcal {S}}_A\) and \(\Vert \cdot \Vert _{\beta }\) is differentiable, then x is a critical point of \(f_A\).

Proof

First, assume that \(0\in \partial f_A(x)\). As \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) are Lipschitz functions and \(\Vert x \Vert _{\beta }=1\), Proposition 2.3.14 of [8] implies that

$$\begin{aligned} \partial f_A(x)\subset A^TJ_{\alpha }(Ax)-f_A(x)J_{\beta }(x). \end{aligned}$$
(10)

\(J_\alpha \) is single valued since \(\Vert \cdot \Vert _{\alpha }\) is differentiable. Hence, \(0\in \partial f_A(x)\) implies that \(f_A(x)^{-1}A^TJ_{\alpha }(Ax)\in J_{\beta }(x)\). Now, as \(\Vert \cdot \Vert _{\beta ^*}\) is differentiable, we have that, for the duality mapping \({\mathcal {J}}_\beta \), it holds \(y \in \mathcal J_\beta (x)\) if and only if \(x = {\mathcal {J}}_{\beta ^*}(y)\) (c.f. [7, Prop. 4.7]). It follows, with (8), that \( \lambda J_{\beta ^*}(A^TJ_\alpha (Ax))= x\) with \(\lambda >0\). Finally, as \(\Vert J_{\beta ^*}(A^TJ_\alpha (Ax)) \Vert _{\beta }=1=\Vert x \Vert _{\beta }\), we have \(\lambda = 1\) which implies that \({\mathcal {S}}_A(x)=x\).

Now, suppose that x is a fixed point of \({\mathcal {S}}_A\). Then, we have \(J_{\beta }({\mathcal {S}}_A(x))=J_{\beta }(x)\). Again, by [7, Prop. 4.7] and (8), we deduce the existence of \(\lambda >0\) such that \( \lambda \,A^TJ_{\alpha }(Ax)\in J_{\beta }(x)\). The definition of \(J_{\beta }\) implies that \(\left\langle x , \lambda \,A^TJ_{\alpha }(Ax) \right\rangle = \Vert x \Vert _{\beta }=1\) and thus \(\lambda ^{-1}=\left\langle Ax , J_{\alpha }(Ax) \right\rangle =\Vert Ax \Vert _{\alpha }=f_A(x)\). It follows that \(0\in A^TJ_{\alpha }(Ax)-f_A(x)J_{\beta }(x)\). If \(\Vert \cdot \Vert _{\beta }\) is differentiable, then \(f_A\) is differentiable at x and the sets in (10) are equal (and singletons). It follows that \(0\in \partial f_A(x)\). \(\square \)

Lemma 3

Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and let P be a part of \(\mathbb {R}^n_+\) such that \(A^TAx\in P\) for every \(x\in P\). Furthermore, let \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) satisfy Assumption 1. If \(\kappa _H({\mathcal {S}}_A)\le \tau <1\), then \({\mathcal {S}}_A\) has a unique fixed point z in P and for every positive integer k and every \(x\in P\), it holds

$$\begin{aligned} \Vert {\mathcal {S}}^k_{A}(x)-z \Vert _{\infty }\le \tau ^k \, (r/(1-\tau )) \, d_H(x,{\mathcal {S}}_{A}(x)) , \end{aligned}$$

where \(r=\inf \{t>0: x_i\le t\ \forall i=1,\ldots ,n, \ x\in P, \Vert x \Vert _{\beta }=1\}\).

Proof

By assumption \({\mathcal {S}}_A\) is a strict contraction on the metric space \((\mathbb {M},d_H)\) where \(\mathbb {M}=P\cap \{x \in \mathbb {R}^n_+: \Vert x \Vert _{\beta }=1\}\). As \((\mathbb {M},d_H)\) is complete by Lemma 1, it follows from the Banach fixed point theorem (see for instance Theorem 3.1 in [29]) that \({\mathcal {S}}_A\) has a unique fixed point z in \(\mathbb {M}\) and for every \(y\in \mathbb {M}\) it holds

$$\begin{aligned} d_H({\mathcal {S}}^k_A(y),z) \le \frac{\tau ^k}{1-\tau } d_H(y,{\mathcal {S}}_A(y)). \end{aligned}$$

As \({\mathcal {S}}_A(\lambda y)={\mathcal {S}}_A(y)\) and \(d_H(\lambda y, {\mathcal {S}}_A(y))=d_H(y, {\mathcal {S}}_A(y))\) for every \(\lambda >0\), the convergence rate is a direct consequence of the above inequality and Lemma 1. \(\square \)

We remark that this result does not guarantee that the unique fixed point z of \({\mathcal {S}}_A\) in P is a global maximizer of \(f_A\) and in fact this is not always true. Indeed, if A is a \(2\times 2\) diagonal matrix which is not a multiple of the identity and \(\Vert \cdot \Vert _\alpha =\Vert \cdot \Vert _2\), \(\Vert \cdot \Vert _{\beta }=\Vert \cdot \Vert _3\), then \(\kappa _H({\mathcal {S}}_A)\le 1/2\) and \({\mathcal {S}}_A\) leaves all the parts of \(\mathbb {R}^2_+\) invariant but some of them do not contain a global maximizer of \(f_A\). Moreover, as \(\mathbb {R}^n_+\) has \(2^n\) parts, testing each part of the cone is computationally too expensive for large n. Therefore, in the remaining part of the section, we derive conditions in order to ensure that the power iterates converge to a global maximizer of \(f_A\).

Lemma 4

Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and let \(\Vert \cdot \Vert _\alpha \), \(\Vert \cdot \Vert _\beta \) satisfy Assumption 1. Then it holds \(f_A(x)\le f_A(|x|)\) for any \(x\in \mathbb {C}^n{\setminus }\{0\}\) and the maximum of \(f_A\) is attained in \(\mathbb {R}^n_+\).

Proof

Let \(x\ne 0\), since A has nonnegative entries, it holds \(|Ax|\le A|x|\). Thus, as monotonic norms are also absolute, we have

$$\begin{aligned} f_A(x)=\frac{\Vert Ax \Vert _{\alpha }}{\Vert x \Vert _{\beta }}= \frac{\Vert |Ax| \Vert _{\alpha }}{\Vert |x| \Vert _{\beta }}\le \frac{\Vert A|x| \Vert _{\alpha }}{\Vert |x| \Vert _{\beta }}=f_A(|x|). \end{aligned}$$

Now, if y is a global maximizer of \(f_A\), then \(f_A(y)\le f_A(|y|)\le f_A(y)\) which concludes the proof. \(\square \)

In the forthcoming Lemma 6, we use the strong monotonicity required in Point (3) of Assumption 1 to prove that if \(A^TA\) is irreducible, then the nonnegative maximizer of Lemma 4 has positive entries. To this end, however, we need one additional preliminary result, which characterizes strongly monotonic norms in terms of the zero pattern of \(J_{\gamma }\) and which we prove in the following:

Lemma 5

Let \(\Vert \cdot \Vert _{\gamma }\) be a differentiable monotonic norm on \(\mathbb {R}^n\), then \(\Vert \cdot \Vert _{\gamma }\) is strongly monotonic if and only if \(x\sim J_{\gamma }(x)\) for every \(x\in \mathbb {R}^n_+\).

Proof

Suppose that \(\Vert \cdot \Vert _{\gamma }\) is strongly monotonic. Let \(x\in \mathbb {R}^n_{+}\). If \(x=0\), \(J_{\gamma }(0)=0\) by construction. Suppose that \(x\ne 0\). We use the strong monotonicity to prove the existence of \(c>0\) such that \(c\,x\le J_{\gamma }(x)\). Let i be such that \(x_i>0\) and define \(f(t)=\Vert x+(t-x_i)e_i \Vert _{\gamma }\) for all \(t>0\). Then, f is differentiable and \(f'(t)=J_{\gamma }(x+(t-x_i)e_i)_i\) for all \(t>0\). Furthermore, f is convex and, since \(\Vert \cdot \Vert _{\gamma }\) is strongly monotonic, strictly increasing on \((0,\infty )\), so that \(J_{\gamma }(x)_i = f'(x_i) >0\). As this is true for all i such that \(x_i>0\), we conclude that there exists \(c >0\) such that \(c\,x\le J_{\gamma }(x)\). The existence of \(C >0\) such that \(J_{\gamma }(x)\le C\,x\) follows from Proposition 5.2 of [7, Chapter 1]. Hence, we have \(J_{\gamma }(x)\sim x\).

For the reverse implication, suppose that \(J_{\gamma }(x)\sim x\) for all \(x\in \mathbb {R}^n_+\). Let \(x,y\in \mathbb {R}^n_+\) be such that \(x\le y\) and \(x\ne y\). If \(x=0\), then \(\Vert x \Vert _{\gamma }=0<\Vert y \Vert _{\gamma }\). Suppose that \(x\ne 0\). As \(x\le y\) and \(x\ne 0\), there exists i and \(t_0>0\) such that \(x+te_i \le y\) for all \(t\in (0,t_0)\). For \(t\in (0,t_0)\), we have

$$\begin{aligned} \Vert y \Vert _{\gamma }&\ge \Vert x+\tfrac{1}{2}(t_0+t)e_i \Vert _{\gamma }\\&\ge \Vert x+\tfrac{t_0}{2}e_i \Vert _{\gamma }+\left\langle J_{\gamma }\left( x+\tfrac{t_0}{2}e_i\right) , \tfrac{t}{2}e_i \right\rangle \ge \Vert x \Vert _{\gamma }+\tfrac{t}{2}J_{\gamma }\left( x+\tfrac{t_0}{2}e_i\right) _i, \end{aligned}$$

where the second inequality follows from the convexity of \(\Vert \cdot \Vert _{\gamma }\). By assumption, we have \(J_{\gamma }\left( x+\tfrac{t_0}{2}e_i\right) \sim x+\tfrac{t_0}{2}e_i\) and thus \(J_{\gamma }\left( x+\tfrac{t_0}{2}e_i\right) _i>0\). It follows that \(\Vert y \Vert _{\gamma } > \Vert x \Vert _{\gamma }\), i.e. \(\Vert \cdot \Vert _{\gamma }\) is strongly monotonic. \(\square \)

Lemma 6

Let \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) satisfy Assumption 1. Let A be a matrix with nonnegative entries and suppose that \(A^TA\) is irreducible. Then, \({\mathcal {S}}_A(x)\) is positive for every positive x and every nonnegative critical point of \(f_A\) is positive.

Proof

Lemma 5 implies that \({\mathcal {S}}_A(x) \sim A^TA x\). It follows that \({\mathcal {S}}_A\) maps positive vectors to positive vectors since the irreducibility of \(A^TA\) implies that \(A^TAx\) is positive for every positive x. Moreover, note that \(A^TA\) is symmetric positive semi-definite and therefore all its eigenvalues are nonnegative. It follows that \(A^TA\) is primitive (see e.g. Theorem 1 in [40]). By the same theorem, there exists a positive integer k such that \((A^TA)^k\) is a matrix with positive entries. Since \({\mathcal {S}}^{k}_A(x)\sim (A^TA)^{k}x\) for every \(x\in \mathbb {R}^n_+{\setminus }\{0\}\), we deduce that \({\mathcal {S}}^{k}_A(x)\) is strictly positive for every nonzero, nonnegative x. Finally, suppose that \(y\in \mathbb {R}^n_+\) is a critical point of \(f_A\), then y is a fixed point of \({\mathcal {S}}_A\) by Lemma 2 and thus \(y={\mathcal {S}}_A^k(y)\) is strictly positive. \(\square \)

We are now ready to state the main theorem of this section. This theorem provides conditions on A, \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) that ensure the existence of a unique positive maximizer \(x^+\) such that \(\Vert Ax^+\Vert _\alpha /\Vert x^+\Vert _\beta =\Vert A\Vert _{\beta \rightarrow \alpha }\) and that govern the convergence of the power sequence

$$\begin{aligned} x_0 = x_0 /\Vert x_0 \Vert _{\beta }, \quad x_{k+1} = J_{\beta ^*}(A^TJ_\alpha (Ax_k))\quad \text {for}\quad k=0,1,2,3,\ldots \end{aligned}$$
(11)

to such \(x^+\). As announced, this result is essentially a fixed point theorem for \({\mathcal {S}}_A\), and thus the Birkhoff contraction ratio \(\kappa _H({\mathcal {S}}_A)\) and any \(\tau \) that approximates \(\kappa _H({\mathcal {S}}_A)\) well from above play a central role.
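Before stating the theorem, we note that the iteration (11) only requires the two duality maps \(J_\alpha \) and \(J_{\beta ^*}\) and the norm \(\Vert \cdot \Vert _\beta \). A generic Python sketch (ours; the callables and the stopping rule are illustrative assumptions, not part of the theory) could look as follows.

```python
import numpy as np

def power_method(A, j_alpha, j_beta_star, norm_beta, x0, maxit=1000, tol=1e-12):
    """Power sequence (11): x_{k+1} = J_{beta*}(A^T J_alpha(A x_k)),
    for user-supplied duality maps of ||.||_alpha and ||.||_{beta*}."""
    x = x0 / norm_beta(x0)
    for _ in range(maxit):
        x_new = j_beta_star(A.T @ j_alpha(A @ x))
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# Example: l^p norms, i.e. j_alpha = J_p and j_beta_star = J_{q*} as in Sect. 2.
p, q = 2.0, 3.0
Jr = lambda z, r: np.linalg.norm(z, r) ** (1 - r) * np.abs(z) ** (r - 2) * z
A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = power_method(A, lambda z: Jr(z, p), lambda z: Jr(z, q / (q - 1)),
                 lambda z: np.linalg.norm(z, q), np.ones(2))
print(np.linalg.norm(A @ x, p) / np.linalg.norm(x, q))   # estimate of ||A||_{q->p}
```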

Theorem 4

Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and suppose that \(A^TA\) is irreducible. Let \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) satisfy Assumption 1.

If \(\kappa _H({\mathcal {S}}_A)\le \tau <1\), then:

  1. \(f_A\) has a unique critical point \(x^+\) in \(\mathbb {R}^n_+\). Moreover, \(f_A(x^+)=\Vert A \Vert _{\beta \rightarrow \alpha }\) and \(x^+\) is positive.

  2. If \(x_0\) is positive and \(x_{k+1}={\mathcal {S}}_A(x_k)\) is the power sequence, then

    $$\begin{aligned} \Vert x_k-x^+ \Vert _{\infty }\le \tau ^k\,C\, \quad \text {with}\quad C=\max _{i=1,\ldots ,n}\frac{d_H(x_0,x_1)}{(1-\tau )\Vert e_i \Vert _{\beta }} \end{aligned}$$

    where \(e_1,\ldots ,e_n\) is the canonical basis of \(\mathbb {R}^n\). Furthermore, it holds

    $$\begin{aligned} (1-\tau ^k\,{\widetilde{C}})\Vert A \Vert _{\beta \rightarrow \alpha } \le \Vert Ax_k \Vert _\alpha \le \Vert A \Vert _{\beta \rightarrow \alpha } \end{aligned}$$

    with \({\widetilde{C}}= C\,\max _{x\ne 0}\tfrac{\Vert x \Vert _{\alpha }}{\Vert x \Vert _{\infty }}\). In particular, \(x_k\rightarrow x^+\) as \(k\rightarrow \infty \).

Proof

Lemma 4 implies that \(f_A\) has a maximizer \(x^+\in \mathbb {R}^n_+\). Lemma 6 implies that \(x^+\) is positive and that the interior of \(\mathbb {R}^n_+\) is left invariant by \({\mathcal {S}}_A\). Hence, all statements except the bounds on \(\Vert Ax_k \Vert _{\alpha }\) follow by a direct application of Lemma 3 and Eq. (4). We conclude with a proof of the estimates for \(\Vert Ax_k \Vert _{\alpha }\). Clearly, \(\Vert Ax_k \Vert _{\alpha }\le \Vert A \Vert _{\beta \rightarrow \alpha }\) always holds. For the lower bound, let \(\gamma = \max _{x\ne 0}\tfrac{\Vert x \Vert _{\beta }}{\Vert x \Vert _{\infty }}\). The estimate on \(\Vert x_k-x^+ \Vert _{\infty }\) implies that

$$\begin{aligned} \Vert A \Vert _{\beta \rightarrow \alpha }-\Vert Ax_k \Vert _{\alpha }&= \Vert Ax^+ \Vert _{\alpha }-\Vert Ax_k \Vert _{\alpha }\le \Vert A(x^+-x_k) \Vert _{\alpha }\\&\le \Vert A \Vert _{\beta \rightarrow \alpha }\Vert x^+-x_k \Vert _{\beta } \le \gamma \,\Vert A \Vert _{\beta \rightarrow \alpha }\Vert x^+-x_k \Vert _{\infty }\le \tau ^k\,C\,\gamma \,\Vert A \Vert _{\beta \rightarrow \alpha } \end{aligned}$$

which concludes the proof. \(\square \)

Note that the condition that requires \(A^TA\) to be irreducible is in general weaker than requiring the initial matrix A to be irreducible itself, as \(A^TA\) may be irreducible even if A is reducible. This is also observed in the numerical examples in Sect. 6.

Theorem 4 holds for any upper bound \(\tau \) of \(\kappa _H({\mathcal {S}}_A)\), and a somewhat natural choice for such a \(\tau \) is the following

$$\begin{aligned} \tau ({\mathcal {S}}_A) = \kappa _H(A^T)\kappa _H(J_{\beta ^*})\kappa _H(A)\kappa _H(J_{\alpha }). \end{aligned}$$
(12)

This coefficient is particularly useful in practice as, thanks to the Birkhoff–Hopf theorem, in many circumstances one can provide explicit bounds for \(\tau ({\mathcal {S}}_A)\). Although in principle \(\tau ({\mathcal {S}}_A)\) can be larger than \(\kappa _H({\mathcal {S}}_A)\), in the forthcoming Sect. 4.2 we show that there are cases where the equality \(\tau ({\mathcal {S}}_A)=\kappa _H({\mathcal {S}}_A)\) holds. Moreover, we discuss the sharpness of the condition \(\kappa _H({\mathcal {S}}_A)<1\) required by our main result. In the following Sect. 4.1, instead, we discuss the particular case where \(\Vert \cdot \Vert _{\alpha },\Vert \cdot \Vert _\beta \) are \(\ell ^p\) norms and we give examples showing how Theorem 4 improves the existing theory for this problem.

4.1 Examples and Comparison with Previous Work

When \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) are \(\ell ^p\) norms, Theorem 4 implies the following:

Corollary 1

Let \(A\in \mathbb {R}^{m\times n}\) be a matrix with nonnegative entries and suppose that \(A^TA\) is irreducible. Let \(1<p,q<\infty \) and consider

$$\begin{aligned} \Vert A \Vert _{q\rightarrow p}=\max _{x\ne 0}\frac{\Vert Ax \Vert _p}{\Vert x \Vert _q}, \quad \text {and}\quad \tau = \kappa _H(A)\kappa _H(A^T)\frac{p-1}{q-1}. \end{aligned}$$

If \(\tau <1\), then \(\Vert A \Vert _{q\rightarrow p}\) can be approximated to an arbitrary precision with the fixed point iteration (2).
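In practice, the hypothesis of Corollary 1 can be checked a priori. A small Python sketch (ours; it relies on the positive-matrix formula of Theorem 3, so it assumes that A has strictly positive entries):

```python
import numpy as np

def tau_qp(A, p, q):
    """Upper bound tau = kappa_H(A) kappa_H(A^T) (p-1)/(q-1) from Corollary 1,
    computed via Theorem 3 (valid for matrices with positive entries)."""
    def kappa(M):
        cross = np.einsum('ki,lj->klij', M, M) / np.einsum('kj,li->klij', M, M)
        return np.tanh(np.log(cross.max()) / 4.0)
    return kappa(A) * kappa(A.T) * (p - 1) / (q - 1)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
p, q = 4.0, 2.0           # note p > q, outside the classical range p <= q of Theorem 1
print(tau_qp(A, p, q))    # if < 1, the iteration (2) converges globally by Corollary 1
```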

In the case of \(\ell ^p\) norms, both Theorem 1 and Corollary 1 apply. In order to compare them let us compute the Birkhoff contraction ratio for some simple but explanatory cases. Let \(\varepsilon \ge 0\) and \(A\in \mathbb {R}^{3\times 2}\), \(B\in \mathbb {R}^{2\times 2}\), \(C\in \mathbb {R}^{3\times 3}\) be defined as

$$\begin{aligned} A=\begin{bmatrix} 1 &{} 2 \\ 3 &{} 4 \\ 0 &{} 0 \end{bmatrix}, \quad B = \begin{bmatrix} \varepsilon &{} 1 \\ 1 &{} \varepsilon \end{bmatrix}, \quad C = \begin{bmatrix} 0 &{} 1 &{} 1 \\ 2 &{} 2 &{}2 \\ 3 &{} 3 &{} 0 \end{bmatrix}. \end{aligned}$$

Due to Theorem 3, it is easy to see that

$$\begin{aligned}&\kappa _H(A) =\tanh (3/8)\le 9/25, \\&\kappa _H(A^T)= \tanh (1/16)\le 1/16,\\&\kappa _H(B)=\kappa _H(B^T)=|1-\varepsilon |/(1+\varepsilon ),\\&\kappa _H(C)=\kappa _H(C^T)= 1\, . \end{aligned}$$

Note that \(A^TA\) and \(C^TC\) are positive matrices and \(B^TB\) is positive if and only if \(\varepsilon >0\). If \(\varepsilon =0\), then \(B^T B\) is the identity matrix. We first discuss the implications of Theorem 1 for the computation of \(\Vert X \Vert _{q\rightarrow p}\) where \(X\in \{A,B,C\}\).

If \(p\le q\) and \(\varepsilon >0\), then Theorem 1 implies that \(f_X\) has a unique positive maximizer \(x^+\), which is global, and the power sequence (11) will converge to \(x^+\). However, if \(\varepsilon =0\) then Theorem 1 ensures that every positive critical point of \(f_B\) is a global maximizer but uniqueness and convergence are only guaranteed under the assumption \(p<q\). Now, we look at the implications of Theorem 4. By noting that \(\kappa _H(J_p) = p-1\) and \(\kappa _H(J_{q^*}) = 1/(q-1)\), we have

$$\begin{aligned} \tau ({\mathcal {S}}_A)\le \frac{9}{400}\,\, \frac{p-1}{q-1}, \quad \tau ({\mathcal {S}}_B) =\left( \frac{1-\varepsilon }{1+\varepsilon }\right) ^2\frac{p-1}{q-1},\quad \tau ({\mathcal {S}}_C)=\frac{p-1}{q-1}. \end{aligned}$$

Hence, for instance, uniqueness and global maximality of the positive critical point of \(f_A\) are guaranteed by Theorem 4 under the assumption \(9(p-1)<400(q-1)\), which includes the known global convergence range \(p<q\) but is of course a much weaker assumption.

Now, note that for \(\varepsilon \ge 1\) we have \(\tau ({\mathcal {S}}_B)<1\) if and only if \((\varepsilon -1)^2(p-1)<(\varepsilon +1)^2(q-1)\). This assumption is less restrictive than \(p\le q\) for every \(\varepsilon \ge 1\), as \(p\le q\) corresponds to the asymptotic case \(\varepsilon \rightarrow \infty \). If \(\varepsilon =1\), Theorem 4 applies for every \(1<p,q<\infty \). The analysis for \(0<\varepsilon < 1\) is similar. However, we note that if \(\varepsilon =0\), then Theorem 4 does not provide any information about \(f_B\) for the case \(p=q\), in contrast with Theorem 1. When \(\varepsilon =0\) and \(p<q\), both theorems imply the same result. Finally, note that \(\tau ({\mathcal {S}}_C)<1\) if and only if \(p<q\) and so Theorem 1 is more useful as it also covers the case \(p=q\).

More generally, when the considered matrix A has finite projective diameter \(\triangle (A)\), Theorem 2 implies that \(\kappa _H(A)<1\) and thus Theorem 4 ensures that for any \(p>1\), the matrix norm \(\Vert A\Vert _{q\rightarrow p}\) can be approximated in polynomial time to an arbitrary precision for any choice of \(q>\kappa _H(A)^2(p-1)+1\), without the requirement \(q\ge p\).

Figure 1 shows that the value of \(\kappa _H(A)\) for matrices with positive entries is often substantially smaller than one, enhancing the relevance of Theorem 4.

Fig. 1: Each line shows the distribution of \(\kappa _H(A)\) over 1000 random matrices \(A\in \mathbb {R}_+^{10\times 10}\) with entries between k and 10. Different curves correspond to different values of \(k \in \{1,2,\ldots , 5\}\).
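A sketch (ours) of the experiment summarized by Fig. 1, using the positive-matrix formula of Theorem 3 for \(\kappa _H\); the random model follows the caption, with entries drawn uniformly between k and 10:

```python
import numpy as np

def kappa(M):
    """kappa_H(M) = tanh(Delta(M)/4) via Theorem 3 (positive matrices)."""
    cross = np.einsum('ki,lj->klij', M, M) / np.einsum('kj,li->klij', M, M)
    return np.tanh(np.log(cross.max()) / 4.0)

rng = np.random.default_rng(1)
for k in range(1, 6):
    samples = [kappa(rng.uniform(k, 10, size=(10, 10))) for _ in range(1000)]
    print(k, np.mean(samples), np.max(samples))
```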

4.2 On the Sharpness of the New Convergence Condition

As we observed earlier, the global convergence of the power iterates relies on the fact that, when \(\kappa _H({\mathcal {S}}_A)<1\), the mapping \({\mathcal {S}}_A\) has a unique positive fixed point \(x^+\). Due to Lemma 2, this is equivalent to observing that, in this case, \(x^+\) is the unique positive critical point of \(f_A\), up to scalar multiples. In what follows we show that this is no longer the case if \(\kappa _H({\mathcal {S}}_A)>1\). In particular, we limit our attention to the case of \(\ell ^p\) norms and we exhibit a one-parameter family of \(2\times 2\) positive and symmetric matrices \(A_\varepsilon \) for which a unique positive critical point of \(f_{A_\varepsilon }\) exists if and only if \(\kappa _H({\mathcal {S}}_{A_\varepsilon })\le 1\). Moreover, we show that for such a family of matrices it holds \(\tau ({\mathcal {S}}_A)=\kappa _H({\mathcal {S}}_A)\), where \(\tau ({\mathcal {S}}_A)\) is the estimate of \(\kappa _H({\mathcal {S}}_A)\) discussed in Eq. (12). As \(f_A\) is scale invariant, here and in the rest of this section, uniqueness of the critical point is meant up to scalar multiples.

For \(\varepsilon >0\) and \(p,q\in (1,\infty )\), let \(A_{\varepsilon }\in \mathbb {R}^{2\times 2}\) and \(f_{A_\varepsilon }:\mathbb {R}^{2}\rightarrow \mathbb {R}_+\) be defined as

$$\begin{aligned} A_{\varepsilon }=\begin{bmatrix} \varepsilon &{} 1 \\ 1 &{} \varepsilon \end{bmatrix} \quad \text {and}\quad f_{A_\varepsilon }(x)=\frac{\Vert A_{\varepsilon }x\Vert _p}{\Vert x\Vert _q}. \end{aligned}$$
(13)

The main result of this section is the following theorem, whose proof is postponed to the end of the section.

Theorem 5

It holds \(\kappa _H({\mathcal {S}}_{A_{\varepsilon }})=\tau ({\mathcal {S}}_{A_{\varepsilon }})\). Furthermore, \(f_{A_\varepsilon }\) has a unique critical point in \(\mathbb {R}_+^2\) if and only if \(\tau ({\mathcal {S}}_{A_\varepsilon })\le 1\).

This result shows that, unlike the previous Theorem 1, Theorem 4 is tight in the sense that when \(\kappa _H({\mathcal {S}}_A)>1\) there might be multiple distinct fixed points of \({\mathcal {S}}_A\) in \(\mathbb {R}^2_+\), and thus convergence of the power sequence to a prescribed fixed point cannot be ensured globally without restrictions on the starting point \(x_0\in \mathbb {R}^2_+\).

We subdivide the proof of Theorem 5 above into a number of preliminary results. Before proceeding, we recall that for \(p\in (1,\infty )\), \(\Phi _p:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is entrywise defined as \(\Phi _p(x)_i = |x_i|^{p-2}x_i\) for all i. We start by computing \(\tau ({\mathcal {S}}_{A_\varepsilon })\) and \(\kappa _H({\mathcal {S}}_{A_\varepsilon })\).

Lemma 7

For every \(\varepsilon >0\), we have \(\kappa _H({\mathcal {S}}_{A_{\varepsilon }})=\tau ({\mathcal {S}}_{A_{\varepsilon }})= \big (\frac{1-\varepsilon }{1+\varepsilon }\big )^2\frac{p-1}{q-1}.\)

Proof

As \(\kappa _H(A_{\varepsilon })=\big | \frac{\varepsilon -1}{1+\varepsilon }\big |\) by Theorem 3, we have \(\tau ({\mathcal {S}}_{A_\varepsilon })=\big (\frac{1-\varepsilon }{1+\varepsilon }\big )^2 \frac{p-1}{q-1}\). Now, we show that \(\kappa _H({\mathcal {S}}_{A_\varepsilon })=\tau ({\mathcal {S}}_{A_\varepsilon })\). Clearly, \(\kappa _H({\mathcal {S}}_{A_\varepsilon })\le \tau ({\mathcal {S}}_{A_\varepsilon })\); for the reverse inequality, consider \(x=(1,1)^T\) and \(y(t)=(1,t)^T\). Furthermore, define \(h:(1,\infty )\rightarrow \mathbb {R}\) as

$$\begin{aligned} h(t) = \frac{d_H\big ({\mathcal {S}}_{A_\varepsilon }(x),{\mathcal {S}}_{A_\varepsilon }(y(t))\big )}{d_H\big (x,y(t)\big )}. \end{aligned}$$

Then, we have \(h(t)\le \kappa _H({\mathcal {S}}_{A_\varepsilon })\) for every \(t>1\). To conclude the proof, we show that \(\lim _{t\rightarrow 1^+}h(t)=\tau ({\mathcal {S}}_{A_{\varepsilon }})\). A direct computation shows that \(d_H\big (x,y(t)\big )=\ln (t)\) and \(A_{\varepsilon }\Phi _p(A_{\varepsilon }x) =(1+\varepsilon )^p(1,1)^T\). Recalling that \({\mathcal {S}}_{A_{\varepsilon }}(z)\) is a positive multiple of \(\Phi _{q^*}(A_{\varepsilon }\Phi _p(A_{\varepsilon }z))\) and that \(d_H\) is invariant under positive scalings, we have

$$\begin{aligned} d_H({\mathcal {S}}_{A_\varepsilon }(x),{\mathcal {S}}_{A_\varepsilon }(y(t))) =(q^*-1)d_H(A_{\varepsilon }\Phi _p(A_{\varepsilon }x), A_{\varepsilon }\Phi _p(A_{\varepsilon }y(t))). \end{aligned}$$

So if we let \(f_1,f_2:(1,\infty )\rightarrow \mathbb {R}\) be such that \(A_{\varepsilon }\Phi _p(A_{\varepsilon }y(t))=\big (f_1(t),f_2(t)\big )^T\) for all \(t>1\), we get

$$\begin{aligned} \exp \Big ((q-1) d_H({\mathcal {S}}_{A_\varepsilon }(x),{\mathcal {S}}_{A_\varepsilon }(y(t)))\Big )=\max \left\{ \frac{f_1(t)}{f_2(t)},\frac{f_2(t)}{f_1(t)}\right\} . \end{aligned}$$

With

$$\begin{aligned} g(t)=\frac{f_1(t)}{f_2(t)}=\frac{\varepsilon (t+\varepsilon )^{p-1}+(t \varepsilon +1)^{p-1}}{(t+\varepsilon )^{p-1}+\varepsilon (t \varepsilon +1)^{p-1}}, \end{aligned}$$

the above computations imply

$$\begin{aligned} (q-1)\lim _{t\rightarrow 1^+}h(t)=\lim _{t\rightarrow 1^+} \frac{\max \{\ln (g(t)),-\ln (g(t))\}}{\ln (t)}=\Big |\lim _{t\rightarrow 1^+}\frac{\ln (g(t))}{\ln (t)}\Big |, \end{aligned}$$

where the last equality follows by continuity. As \(\ln (1)=\ln (g(1))=0\), L’Hôpital’s rule implies that

$$\begin{aligned} \lim _{t\rightarrow 1^+}\frac{\ln (g(t))}{\ln (t)}= \lim _{t\rightarrow 1^+}\frac{t \,g'(t)}{g(t)}=\lim _{t\rightarrow 1^+}-\frac{(p-1) t \left( \varepsilon ^2-1\right) ^2 (t+\varepsilon )^p (t \varepsilon +1)^p}{\zeta _1(t) \zeta _2(t)} \end{aligned}$$

where

$$\begin{aligned} \zeta _1(t)=\left( t \varepsilon ^2 (t+\varepsilon )^p+t (t \varepsilon +1)^p+\varepsilon \left( (t+\varepsilon )^p+(t \varepsilon +1)^p\right) \right) \end{aligned}$$

and

$$\begin{aligned} \zeta _2(t)=\left( \varepsilon ^2 (t \varepsilon +1)^p+(t+\varepsilon )^p+t \varepsilon \left( (t+\varepsilon )^p+(t \varepsilon +1)^p\right) \right) . \end{aligned}$$

As \(\zeta _1(1)\zeta _2(1)=(1+\varepsilon )^{2p}(1+\varepsilon )^4\), after rearrangement, we finally obtain

$$\begin{aligned} \lim _{t\rightarrow 1^+}h(t)=\left| \frac{(p-1) \left( \varepsilon ^2-1\right) ^2 (1+\varepsilon )^{2p}}{(q-1)\zeta _1(1) \zeta _2(1)}\right| =\tau ({\mathcal {S}}_{A_{\varepsilon }}), \end{aligned}$$

which implies \(\tau ({\mathcal {S}}_{A_{\varepsilon }})\le \kappa _H({\mathcal {S}}_{A_{\varepsilon }})\) and thus concludes the proof. \(\square \)

Now, we prove that the nonnegative critical points of \(f_{A_\varepsilon }\) are positive and we then characterize them in terms of a real parameter t. As critical points are defined up to multiples, we restrict our attention to the line \(\{x\in \mathbb {R}^2:x_1+x_2=1\}\).

Lemma 8

Let \(x\in \mathbb {R}^2_+\) with \(x_1+x_2=1\). Then x is a critical point of \(f_{A_\varepsilon }\) if and only if there exists \(t\in (0,1)\) such that \(x=(t,1-t)^T\) and \(\psi (t)=\psi (1-t)\) where \(\psi :[0,1]\rightarrow \mathbb {R}_{+}\) is defined as

$$\begin{aligned} \psi (t)=t^{q-1} \big [(t\varepsilon +1-t)^{p-1}+\varepsilon (\varepsilon +t-t\varepsilon )^{p-1}\big ]. \end{aligned}$$
(14)

Proof

As we already observed, \(f_{A_\varepsilon }\) attains a global maximum in \(\mathbb {R}^2_{+}\). Furthermore, the critical points of \(f_{A_\varepsilon }\) satisfy

$$\begin{aligned} A_\varepsilon \Phi _p( A_\varepsilon x)=\lambda \Phi _q(x)\quad x\in \mathbb {R}^2{\setminus } \{0\}. \end{aligned}$$
(15)

As \(A_{\varepsilon }\) is positive, (15) implies that every nonnegative critical point of \(f_{A_\varepsilon }\) is positive. It follows that, for positive vectors x, (15) is equivalent to

$$\begin{aligned} {\left\{ \begin{array}{ll} \big (A_\varepsilon \Phi _p( A_\varepsilon x)\big )_1\, x_2^{q-1}= \big (A_\varepsilon \Phi _p( A_\varepsilon x)\big )_2\, x_1^{q-1} &{} \\ \lambda =(A_\varepsilon \Phi _p( A_\varepsilon x))_1 / x_1^{q-1} &{} \end{array}\right. } \end{aligned}$$
(16)

Thus, \(x_1+x_2=1\) and \(x_1,x_2>0\) imply the existence of \(t\in (0,1)\) such that \(x_1=t\) and \(x_2=1-t\). Substituting \(x=(t,1-t)^T\) in (16) we finally obtain the claimed result. \(\square \)

A direct consequence of Lemma 8 is that \((1,1)^{T}/2\) is a critical point of \(f_{A_\varepsilon }\). Moreover, by symmetry, we see that \((t,1-t)^T\) is a critical point of \(f_{A_\varepsilon }\) if and only if \((1-t,t)^T\) is also a critical point. This observation implies the following

Lemma 9

If \(\tau ({\mathcal {S}}_{A_{\varepsilon }})>1\), then \(f_{A_\varepsilon }\) has at least three distinct positive critical points.

Proof

Note that if \(\tau ({\mathcal {S}}_{A_{\varepsilon }})>1\), then \( \big (\frac{1+\varepsilon }{1-\varepsilon }\big )^2<\frac{p-1}{q-1}\). Let \(h:[0,1]\rightarrow \mathbb {R}\) be defined as \(h(t)=\psi (1-t)-\psi (t)\), where \(\psi \) is defined as in (14). The critical points of \(f_{A_\varepsilon }\) correspond to zeros of h in (0, 1/2]. Indeed, by Lemma 8, we know that these points are in bijection with the zeros of h on (0, 1) and \(h(t)=-h(1-t)\) for every \(t\in (0,1)\). We have already observed that \(h(t_0)=0\) with \(t_0=1/2\). We now show that there exists \(t_1\in (0,t_0)\) such that \(h(t_1)=0\). The existence of such \(t_1\) implies that \((t_1,1-t_1)^T,(1-t_1,t_1)^T,(t_0,t_0)^{T}\) are three distinct positive critical points of \(f_{A_\varepsilon }\), since \(h(1-t_1)=h(t_1)=0\). To construct \(t_1\), we first prove that our assumption \(\tau ({\mathcal {S}}_{A_{\varepsilon }})>1\) is equivalent to the condition \(h'(t_0)>0\). We have

$$\begin{aligned} \psi '(t)&=(q-1)t^{q-2}\big [(t\varepsilon +1-t)^{p-1}+\varepsilon (\varepsilon +t-t\varepsilon )^{p-1}\big ]\\&\quad +(p-1)t^{q-1}(1-\varepsilon )\big [\varepsilon (\varepsilon +t-t\varepsilon )^{p-2}-(t\varepsilon +1-t)^{p-2}\big ]. \end{aligned}$$

With \((\varepsilon +t_0-t_0\varepsilon )=(t_0\varepsilon +1-t_0)=(\varepsilon +1)/2\) we get

$$\begin{aligned} \psi '(t_0)&=(q-1)2^{2-q}(1+\varepsilon )\Big (\frac{\varepsilon +1}{2}\Big )^{p-1} +(p-1)2^{1-q}(1-\varepsilon )(\varepsilon -1)\Big (\frac{\varepsilon +1}{2}\Big )^{p-2}\\&= 2^{3-q-p}(1+\varepsilon )^{p-2}\Big [(q-1)(1+\varepsilon )^2-(p-1) (1-\varepsilon )^2\Big ]. \end{aligned}$$

As \(h'(t_0)=-\psi '(t_0)-\psi '(1-t_0)=-2\psi '(t_0)\), we have \(h'(t_0)>0\) if and only if \((q-1)(1+\varepsilon )^2<(p-1)(1-\varepsilon )^2\) i.e. \(h'(t_0)>0\) if and only if \(\tau ({\mathcal {S}}_{A_{\varepsilon }})>1\).

Now, as \(h'(t_0)>0\), there exists a neighborhood U of \(t_0\) such that h is strictly increasing on U. Since \(h(t_0)=0\), this implies that there exists \(s\in (0,t_0)\cap U\) such that \(h(s)<0\). As \(\lim _{t\rightarrow 0}h(t)=\varepsilon ^{p-1}+\varepsilon >0\), the intermediate value theorem implies the existence of \(t_1\in (0,s)\) such that \(h(t_1)=0\). As observed above, this concludes the proof. \(\square \)
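Numerically, the three critical points of Lemma 9 can be located by finding the sign changes of \(h(t)=\psi (1-t)-\psi (t)\) on (0, 1/2]. A small Python sketch (ours; the parameter values are an arbitrary choice for which \(\tau ({\mathcal {S}}_{A_\varepsilon })>1\)):

```python
import numpy as np

eps, p, q = 0.1, 6.0, 1.5
tau = ((1 - eps) / (1 + eps)) ** 2 * (p - 1) / (q - 1)
print(tau)                      # > 1 for this choice of parameters

def psi(t):
    """psi(t) as in (14)."""
    return t ** (q - 1) * ((t * eps + 1 - t) ** (p - 1)
                           + eps * (eps + t - t * eps) ** (p - 1))

h = lambda t: psi(1 - t) - psi(t)

# h(0+) > 0 while h < 0 just below t = 1/2 (since h(1/2) = 0 and h'(1/2) > 0),
# so a zero t_1 in (0, 1/2) can be found by plain bisection.
a, b = 1e-6, 0.49
for _ in range(80):
    m = 0.5 * (a + b)
    a, b = (m, b) if h(m) > 0 else (a, m)
print(m)   # (m, 1-m), (1-m, m) and (1/2, 1/2) approximate three distinct critical points
```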

Finally, we address the case \(\tau ({\mathcal {S}}_{A_\varepsilon })=1\).

Lemma 10

If \(\tau ({\mathcal {S}}_{A_\varepsilon })= 1\), then \(f_{A_\varepsilon }\) has a unique nonnegative critical point.

Proof

Let \(F:\mathbb {R}^2_+\rightarrow \mathbb {R}_+^2\) be defined as \(F(x)=\Phi _{q^*}(A_{\varepsilon }\Phi _p(A_{\varepsilon }x))\), where \(q^*=q/(q-1)\) denotes the Hölder conjugate of q. Then, for \(\mathbf {1}=(1,1)^T\) and \(u=\mathbf {1}/2\), we have \(F(u)=\lambda u\) for some \(\lambda >0\). Hence, u is a fixed point of \({\mathcal {S}}_{A_{\varepsilon }}\) and, since \(\Vert \cdot \Vert _q\) is differentiable, it follows by Lemma 2 that u is a critical point of \(f_{A_\varepsilon }\). Moreover, it is a fixed point of \(G:D_+\rightarrow D_+\) defined by \(G(x)=\langle F(x),\mathbf {1}\rangle ^{-1}F(x)\), where \(D_+=\{(t,1-t): t\in [0,1]\}\). Note that the fixed points of G coincide, up to scaling, with those of \({\mathcal {S}}_{A_{\varepsilon }}\). To conclude, we prove that u is the unique fixed point of G.

As \(\tau ({\mathcal {S}}_{A_\varepsilon })=1\), we have \(d_H(G(x),G(y))=d_H(F(x),F(y))\le d_H(x,y)\) and so G is non-expansive with respect to \(d_H\). Now, Theorem 6.4.1 in [31] implies that u is the unique fixed point of G, if

$$\begin{aligned} z-G'(u)z\ne 0\quad \forall z\in \mathbb {R}^2{\setminus } \{0\}\quad \text {with}\quad z_1+z_2=0, \end{aligned}$$

where \(G'(u)\) denotes the Jacobian matrix of G evaluated at u. Moreover, as \(F(u)=\lambda u\), Lemma 6.4.2 in [31] implies that \(F'(u)u=\lambda u\) and

$$\begin{aligned} G'(u)z = \tfrac{1}{\lambda }(F'(u)z-\langle {F'(u)z,\mathbf {1}}\rangle u). \end{aligned}$$

Suppose by contradiction that there exists a \(z\in \mathbb {R}^2{\setminus } \{0\}\) with \(z_1+z_2=0\), such that \(z-G'(u)z=0\). A direct computation shows that \(\langle z,F'(u)^Tu\rangle =0\): indeed, since \(A_\varepsilon \) is symmetric and \(A_\varepsilon u\) is a constant vector, \(F'(u)\) is symmetric, so \(F'(u)^Tu=F'(u)u=\lambda u\) and \(\langle z,F'(u)^Tu\rangle =\lambda (z_1+z_2)/2=0\). Then,

$$\begin{aligned} 0&= z-G'(u)z = z-\tfrac{1}{\lambda }F'(u)z+ \tfrac{1}{\lambda }\langle {F'(u)z,\mathbf {1}}\rangle u\\&=z-\tfrac{1}{\lambda }F'(u)z+\tfrac{2}{\lambda } \langle {z,F'(u)^Tu}\rangle u=z-\tfrac{1}{\lambda }F'(u)z. \end{aligned}$$

It follows that \(F'(u)z=\lambda z\) and, as \(F'(u)\) is entry-wise positive and \(F'(u)u=\lambda u\) with \(u>0\), the classical Perron–Frobenius theorem implies that z is a nonzero scalar multiple of u. However, \(u_1+u_2>0\), which contradicts the assumption \(z_1+z_2=0\). So \(0 \ne z-G'(u)z\) for every \(z\ne 0\) such that \(z_1+z_2=0\). Hence, u is the unique fixed point of G, which concludes the proof. \(\square \)

Combining the last two lemmas allows us to conclude:

Proof of Theorem 5

Due to Lemmas 9 and 10 we only need to address the case \(\tau ({\mathcal {S}}_{A_{\varepsilon }})< 1\). However, this is a direct consequence of Lemma 3. In fact, as \(A_{\varepsilon }\) is entry-wise positive, the nonnegative fixed points of \({\mathcal {S}}_{A_\varepsilon }\) are positive and, if \(\tau ({\mathcal {S}}_{A_{\varepsilon }})<1\), then \({\mathcal {S}}_{A_{\varepsilon }}\) is a strict contraction with respect to \(d_H\) and thus has a unique fixed point, which is also the unique positive maximizer of \(f_{A_\varepsilon }\) on \(\mathbb {R}^2_+\). \(\square \)

5 Matrix Norms Induced by Sum of Weighted \(\ell ^p\) Norms

The Birkhoff contraction ratios \(\kappa _H(J_\alpha )\) and \(\kappa _H(J_{\alpha ^*})\) are easy to compute when \(\Vert \cdot \Vert _\alpha \) is a weighted \(\ell ^p\) norm. More precisely, we have the following

Proposition 1

Let \(\Vert x \Vert _{\alpha }=\Vert Dx \Vert _p\) for some \(p\in (1,\infty )\) and some diagonal matrix D with positive diagonal entries. Then \(\Vert x \Vert _{\alpha ^*}=\Vert D^{-1}x \Vert _{p^*}\), where \(p^*=p/(p-1)\). Furthermore, it holds that \(\kappa _H(J_{\alpha })=\kappa _H(J_{\alpha ^*})^{-1}=p-1\).

Proof

The equality \(\Vert x \Vert _{\alpha ^*}=\Vert D^{-1}x \Vert _{p*}\) follows from Theorem 6 below. To conclude, note that \(J_{\alpha }(x)=\Vert Dx \Vert _{p}^{1-p}D^p\Phi _p(x)\) and therefore \(\kappa _H(J_{\alpha })=\kappa _H(\Phi _p)=p-1\). The same argument shows that \(\kappa _H(J_{\alpha ^*})=\kappa _H(\Phi _{p^*})=p^*-1=(p-1)^{-1}\). \(\square \)
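To illustrate Proposition 1 numerically, the following minimal numpy sketch (arbitrary positive test data; the helper names are ours) checks that the map \(J_\alpha \) of a weighted \(\ell ^p\) norm scales the Hilbert distance between positive vectors by exactly \(p-1\), consistently with \(\kappa _H(J_\alpha )=p-1\).

```python
import numpy as np

def hilbert_d(x, y):
    # Hilbert projective metric d_H on the interior of the positive cone
    r = x / y
    return np.log(r.max() / r.min())

rng = np.random.default_rng(1)
p = 2.5
D = np.diag(rng.random(5) + 0.1)                 # arbitrary positive diagonal weights
x, y = rng.random(5) + 0.1, rng.random(5) + 0.1  # arbitrary positive test vectors

def J_alpha(z):
    # J_alpha(z) = ||Dz||_p^{1-p} D^p Phi_p(z) for ||z||_alpha = ||Dz||_p (z > 0 here)
    return np.linalg.norm(D @ z, ord=p) ** (1 - p) * np.diag(D) ** p * z ** (p - 1)

# for this map the Hilbert distance is scaled by exactly p - 1:
print(hilbert_d(J_alpha(x), J_alpha(y)) / hilbert_d(x, y), p - 1)
```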

While the above Proposition 1 makes the computation of the Birkhoff constant of weighted \(\ell ^p\)-norms particularly easy, computing \(\kappa _H(J_{\alpha })\) or \(\kappa _H(J_{\alpha ^*})\) for a general strongly monotonic norm \(\Vert \cdot \Vert _{\alpha }\) can be a difficult task. There are norms for which an explicit expression in terms of arithmetic operations for \(\Vert \cdot \Vert _{\alpha }\) is available by construction (resp. by modeling), but no such expression is available for the dual \(\Vert \cdot \Vert _{\alpha ^*}\). Examples include \(\Vert x \Vert _{\alpha }=(\Vert x \Vert ^3_{p}+\Vert x \Vert ^3_{q})^{1/3}\), as shown by Theorem 6 below. On the other hand, as discussed in the introduction, monotonic norms different from the standard \(\ell ^p\) norms arise quite naturally in several applications.

Motivated by the above observations, we devote the rest of the section to the study of a particular class of monotonic norms of the form \(\Vert x \Vert _{\alpha }=\Vert \big (\Vert x \Vert _{\alpha _1},\ldots , \Vert x \Vert _{\alpha _d}\big ) \Vert _{\gamma }\) where all the norms are monotonic and where we also allow \(\Vert x \Vert _{\alpha _i}\) to measure only a subset of the coordinates of x.

5.1 Composition of Monotonic Norms and its Dual

Let d be a positive integer. We consider norms of the following form

$$\begin{aligned} \Vert x \Vert _{\alpha }=\Vert \big (\Vert P_1x \Vert _{\alpha _1}, \ldots ,\Vert P_dx \Vert _{\alpha _d}\big ) \Vert _{\gamma } \end{aligned}$$
(17)

where \(\Vert \cdot \Vert _{\gamma }\) is a monotonic norm on \(\mathbb {R}^d\), \(\Vert \cdot \Vert _{\alpha _i}\) is a norm on \(\mathbb {R}^{n_i}\) and \(P_i\in \mathbb {R}^{n_i\times n}\) is a “weight matrix” for all \(i=1,\ldots ,d\). For \(\Vert \cdot \Vert _{\alpha }\) to be a norm, we assume that \(M= [P_1^T,\ldots ,P_d^{T}]^T\in \mathbb {R}^{ (n_1+\ldots +n_d)\times n}\) has rank n. Note that the monotonicity of \(\Vert \cdot \Vert _{\gamma }\) implies that \(\Vert \cdot \Vert _{\alpha }\) satisfies the triangle inequality.

Let us first discuss particular cases of (17). First, note that for two norms \(\Vert \cdot \Vert _{\alpha _1},\Vert \cdot \Vert _{\alpha _2}\) on \(\mathbb {R}^n\), the norm

$$\begin{aligned} \Vert x \Vert _{\alpha _+}=(\Vert x \Vert _{\alpha _1}^p+\Vert x \Vert _{\alpha _2}^p)^{1/p} \end{aligned}$$

can be obtained from (17) with \(d=2\), \(\Vert \cdot \Vert _{\gamma }=\Vert \cdot \Vert _p\), and \(P_1=P_2 = I\), with \(I\in \mathbb {R}^{n\times n}\) being the identity matrix. It is also possible to model norms acting on different coordinates of the vectors. For example, if \((x,y)\in \mathbb {R}^{2n}\), then

$$\begin{aligned} \Vert (x,y) \Vert _{\alpha _{\times }}=(\Vert x \Vert _{\alpha _1}^p +\Vert y \Vert _{\alpha _2}^p)^{1/p} \end{aligned}$$

can be obtained from (17) with \(d=2\), \(\Vert \cdot \Vert _{\gamma }=\Vert \cdot \Vert _p\), \(P_1=\mathrm {diag}(1,\ldots ,1,0,\ldots ,0)\in \mathbb {R}^{2n\times 2n}\) and \(P_2=\mathrm {diag}(0,\ldots ,0,1,\ldots ,1)\in \mathbb {R}^{2n\times 2n}\). The dual of \(\Vert \cdot \Vert _{\alpha _{\times }}\) is discussed in Lemma 11 below and has a particularly elegant description. More complicated weight matrices \(P_i\) can also be used. For example if \({\widetilde{n}}\) is an integer not smaller than n and \(P\in \mathbb {R}^{{\widetilde{n}}\times n}\) has rank n, then the norm

$$\begin{aligned} \Vert x \Vert _{\alpha _P} = \Vert Px \Vert _{p} \end{aligned}$$

can be obtained with \(d=1\), \(\Vert \cdot \Vert _{\gamma }=|\cdot |\), \(\Vert \cdot \Vert _{\alpha _1}=\Vert \cdot \Vert _p\) and \(P_1 = P\). Note that if \({\widetilde{n}} = n\), then P is square and invertible and this property can be used to simplify the evaluation of the dual norm of \(\Vert \cdot \Vert _{\alpha _P}\). Consequences of such additional structure are discussed in Corollary 2.
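For concreteness, the following short numpy sketch instantiates the construction (17) on the three examples above; the inner norms, the rectangular form of the selection matrices for \(\Vert \cdot \Vert _{\alpha _\times }\) (equivalent to the diagonal one used above), and all function names are illustrative choices of ours.

```python
import numpy as np

def composed_norm(x, Ps, inner_norms, gamma_norm):
    # ||x||_alpha = || ( ||P_1 x||_{alpha_1}, ..., ||P_d x||_{alpha_d} ) ||_gamma, cf. (17)
    return gamma_norm(np.array([ni(P @ x) for P, ni in zip(Ps, inner_norms)]))

n, p = 3, 2.0
lp = lambda q: (lambda z: np.linalg.norm(z, ord=q))
I = np.eye(n)

# ||x||_{alpha_+} with alpha_1 = l^1 and alpha_2 = l^3, both acting on the whole vector
alpha_plus = lambda x: composed_norm(x, [I, I], [lp(1), lp(3)], lp(p))

# ||(x,y)||_{alpha_x}: the two halves of a vector in R^{2n} are measured separately
P1 = np.hstack([I, np.zeros((n, n))])
P2 = np.hstack([np.zeros((n, n)), I])
alpha_times = lambda z: composed_norm(z, [P1, P2], [lp(1), lp(3)], lp(p))

# ||x||_{alpha_P} = ||P x||_p for a tall full-rank P (d = 1, gamma = absolute value)
P = np.vstack([I, np.ones((1, n))])
alpha_P = lambda x: composed_norm(x, [P], [lp(p)], lambda v: abs(float(v[0])))

x = np.arange(1.0, n + 1)
print(alpha_plus(x), alpha_times(np.concatenate([x, 2 * x])), alpha_P(x))
```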

In the next Theorem 6 we provide a characterization of the dual norm of \(\Vert \cdot \Vert _{\alpha }\) in its general form as defined in (17). We first need the following lemma that addresses the particular case where \(P_1,\ldots ,P_d\) are projections.

Lemma 11

Let \(n_1,\ldots ,n_d\) be positive integers and for \(i=1,\ldots ,d\) let \(\Vert \cdot \Vert _{\alpha _i}\) be a norm on \(\mathbb {R}^{n_i}\). Furthermore, let \(\Vert \cdot \Vert _{\gamma }\) be a monotonic norm on \(\mathbb {R}^d\). Let \(V= \mathbb {R}^{n_1}\times \ldots \times \mathbb {R}^{n_d}\) and for all \((u_1,\ldots ,u_d)\in V\) define

$$\begin{aligned} \Vert (u_1,\ldots ,u_d) \Vert _{V}=\Vert \big (\Vert u_1 \Vert _{\alpha _1}, \ldots ,\Vert u_d \Vert _{\alpha _d}\big ) \Vert _{\gamma }. \end{aligned}$$

Then \(\Vert \cdot \Vert _{ V}\) is a norm on V and the induced dual norm \(\Vert \cdot \Vert _{ V^*}\) satisfies

$$\begin{aligned} \Vert (u_1,\ldots ,u_d) \Vert _{ V^*}=\Vert \big (\Vert u_1 \Vert _{\alpha _1^*},\ldots , \Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}\quad \forall (u_1,\ldots ,u_d)\in V. \end{aligned}$$

Proof

The fact that \(\Vert \cdot \Vert _{ V}\) is a norm follows from a direct verification. Let \((u_1,\ldots ,u_d)\in V\). Then, for every \((y_1,\ldots ,y_d)\in V\), we have

$$\begin{aligned} \left\langle (u_1,\ldots ,u_d) , (y_1,\ldots ,y_d) \right\rangle&=\sum _{i=1}^d \left\langle u_i , y_i \right\rangle \le \sum _{i=1}^d \Vert u_i \Vert _{\alpha _i^*}\Vert y_i \Vert _{\alpha _i} \\&\le \Vert \big (\Vert u_1 \Vert _{\alpha _1^*},\ldots , \Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*} \Vert \big (\Vert y_1 \Vert _{\alpha _1},\ldots , \Vert y_d \Vert _{\alpha _d}\big ) \Vert _{\gamma }, \end{aligned}$$

which shows that

$$\begin{aligned} \Vert (u_1,\ldots ,u_d) \Vert _{ V^*}\le \Vert \big (\Vert u_1 \Vert _{\alpha _1^*}, \ldots ,\Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma *}. \end{aligned}$$
(18)

For the reverse inequality, let \(v = (\Vert u_1 \Vert _{\alpha _1^*},\ldots ,\Vert u_d \Vert _{\alpha _d^*})\). As \(\Vert \cdot \Vert _\gamma \) is monotonic, by Proposition 5.2 in [7, Chapter 1], there exists \(w\in \mathbb {R}^d_+\) such that \(\Vert w \Vert _{\gamma }\le 1\) and \(\left\langle v , w \right\rangle =\Vert v \Vert _{\gamma ^*}\). Let us denote by \(w_1,\ldots ,w_d\in \mathbb {R}_+\) and \(v_1,\ldots ,v_d\in \mathbb {R}\) respectively the components of w and v in the canonical basis of \(\mathbb {R}^d\). Now, let \( {\overline{y}}_1\in \mathbb {R}^{n_1},\ldots ,{\overline{y}}_d\in \mathbb {R}^{n_d}\) be such that \(\Vert {\overline{y}}_i \Vert _{\alpha _i}\le 1\) and \(\left\langle {\overline{y}}_i , u_i \right\rangle =\Vert u_i \Vert _{\alpha _i^*}\) for all \(i=1,\ldots ,d\). Then, as \(\Vert \cdot \Vert _{\gamma }\) is monotonic with respect to \(\mathbb {R}^d_+\) and \(\Vert {\overline{y}}_i\Vert _{\alpha _i}\le 1\) for all i, we have

$$\begin{aligned} \Vert \big (\Vert w_1\,{\overline{y}}_1 \Vert _{\alpha _1},\ldots , \Vert w_d\,{\overline{y}}_d \Vert _{\alpha _d}\big ) \Vert _{\gamma }= \Vert \big (w_1\Vert {\overline{y}}_1 \Vert _{\alpha _1},\ldots ,w_d \Vert {\overline{y}}_d \Vert _{\alpha _d}\big ) \Vert _{\gamma }\le \Vert w \Vert _{\gamma }\le 1. \end{aligned}$$

Note that

$$\begin{aligned} \left\langle (u_1,\ldots ,u_d) , (w_1\,{\overline{y}}_1,\ldots , w_d\,{\overline{y}}_d) \right\rangle&=\sum _{i=1}^d w_i\left\langle u_i , {\overline{y}}_i \right\rangle =\sum _{i=1}^d w_i\,\Vert u_i \Vert _{\alpha _i^*}=\left\langle v , w \right\rangle \\&=\Vert v \Vert _{\gamma ^*}=\Vert \big (\Vert u_1 \Vert _{\alpha _1^*}, \ldots ,\Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma *}. \end{aligned}$$

It follows that \(\Vert \big (\Vert u_1 \Vert _{\alpha _1^*},\ldots , \Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma *}\le \Vert (u_1,\ldots ,u_d) \Vert _{ V^*},\) which, together with (18), concludes the proof. \(\square \)

Theorem 6

Let d be a positive integer. For \(i=1,\ldots ,d\), let \(P_i\in \mathbb {R}^{n_i\times n}\) and let \(\Vert \cdot \Vert _{\alpha _i}\) be a norm on \(\mathbb {R}^{n_i}\). Suppose that \(M= [P_1^T,\ldots ,P_d^{T}]^T\in \mathbb {R}^{ (n_1+\ldots +n_d)\times n}\) has rank n. Furthermore, let \(\Vert \cdot \Vert _{\gamma }\) be a monotonic norm on \(\mathbb {R}^d\). For every \(x\in \mathbb {R}^n\), define

$$\begin{aligned} \Vert x \Vert _{\alpha }=\Vert \big (\Vert P_1x \Vert _{\alpha _1}, \ldots ,\Vert P_d x \Vert _{\alpha _d}\big ) \Vert _{\gamma }. \end{aligned}$$

Then, \(\Vert \cdot \Vert _{\alpha }\) is a norm on \(\mathbb {R}^n\) and the induced dual norm is given by

$$\begin{aligned} \Vert x \Vert _{\alpha ^*}= \inf _{\begin{array}{c} u_1\in \mathbb {R}^{n_1},\ldots ,u_d\in \mathbb {R}^{n_d}\\ P_1^Tu_1+\cdots +P_d^Tu_d = x \end{array}} \Vert \big (\Vert u_1 \Vert _{\alpha _1^*},\ldots , \Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}, \end{aligned}$$

where \(\Vert \cdot \Vert _{\alpha _i^*}\) is the dual norm induced by \(\Vert \cdot \Vert _{\alpha _i}\) and \(\Vert \cdot \Vert _{\gamma ^*}\) is the dual norm induced by \(\Vert \cdot \Vert _{\gamma }\).

Proof

Let \(u_1\in \mathbb {R}^{n_1},\ldots ,u_d\in \mathbb {R}^{n_d}\) be such that \(P_1^Tu_1+\cdots +P_d^Tu_d=x\). Such vectors always exist since M has full rank. Then, for every \(y\in \mathbb {R}^n\), it holds

$$\begin{aligned} \left\langle x , y \right\rangle&= \sum _{i=1}^d \left\langle P_i^Tu_i , y \right\rangle = \sum _{i=1}^d \left\langle u_i , P_iy \right\rangle \\&\le \sum _{i=1}^d \Vert u_i \Vert _{\alpha _i^*}\Vert P_iy \Vert _{\alpha _i}\le \Vert \big (\Vert u_1 \Vert _{\alpha _1^*},\ldots ,\Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}\Vert y \Vert _{\alpha }. \end{aligned}$$

It follows that

$$\begin{aligned} \Vert x \Vert _{\alpha ^*}\le \inf _{\begin{array}{c} u_1\in \mathbb {R}^{n_1},\ldots ,u_d\in \mathbb {R}^{n_d}\\ P_1^Tu_1+\cdots +P_d^Tu_d = x \end{array}} \Vert \big (\Vert u_1 \Vert _{\alpha _1^*},\ldots , \Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}. \end{aligned}$$

Now, we prove the reverse inequality. To this end, consider the vector space \( V=\mathbb {R}^{n_1}\times \ldots \times \mathbb {R}^{n_d}\) endowed with the norm \(\Vert \cdot \Vert _{ V}\) defined as

$$\begin{aligned} \Vert (u_1,\ldots ,u_d) \Vert _{ V}=\Vert \big (\Vert u_1 \Vert _{\alpha _1},\ldots , \Vert u_d \Vert _{\alpha _d}\big ) \Vert _{\gamma }\quad \forall (u_1,\ldots ,u_d)\in V. \end{aligned}$$

As V is a finite product of finite dimensional vector spaces, we can identify \( V^*\) with V and, by Lemma 11, the dual norm \(\Vert \cdot \Vert _{ V^*}\) induced by \(\Vert \cdot \Vert _{ V}\) satisfies

$$\begin{aligned} \Vert (u_1,\ldots ,u_d) \Vert _{ V^*}=\Vert \big (\Vert u_1 \Vert _{\alpha _1^*}, \ldots ,\Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}\quad \forall (u_1,\ldots ,u_d)\in V. \end{aligned}$$

Consider now the vector subspace \( W=\{(P_1y,\ldots ,P_dy)\mid y\in \mathbb {R}^n\}\subset V\). Note that, we can identify W with the image of M, i.e. \( W = \{My\mid y\in \mathbb {R}^n\}\). Let \(M^\dagger \in \mathbb {R}^{n\times (n_1+\ldots +n_d)}\) be the Moore–Penrose inverse of M. Then, as M is full rank, we have \(M^\dagger My= y\) for all \(y\in \mathbb {R}^n\). Let \(\phi :W\rightarrow \mathbb {R}\) be defined as

$$\begin{aligned} \phi (u_1,\ldots ,u_d)=\left\langle M^\dagger (u_1,\ldots ,u_d) , x \right\rangle \quad \forall (u_1,\ldots , u_d)\in W. \end{aligned}$$

For every \( (u_1,\ldots , u_d)\in W\), there exists \(y\in \mathbb {R}^n\) such that \((u_1,\ldots , u_d)=My\), i.e. \(u_i=P_iy\) for all \(i=1,\ldots ,d\), and thus

$$\begin{aligned} |\phi (u_1,\ldots ,u_d)|&=|\phi (My)|=|\left\langle M^\dagger My , x \right\rangle |=| \left\langle y , x \right\rangle |\\&\le \Vert y \Vert _{\alpha }\Vert x \Vert _{\alpha ^*}= \Vert \big (\Vert P_1y \Vert _{\alpha _1},\ldots , \Vert P_dy \Vert _{\alpha _d}\big ) \Vert _{\gamma }\Vert x \Vert _{\alpha ^*}= \Vert (u_1,\ldots ,u_d) \Vert _{ V}\Vert x \Vert _{\alpha ^*}. \end{aligned}$$

By the Hahn–Banach theorem (see e.g. Corollary 1.2 of [5]), there exists \((u_1',\ldots ,u_d')\in V\) such that

$$\begin{aligned} \phi (u_1,\ldots ,u_d)=\sum _{i=1}^d \left\langle u_i' , u_i \right\rangle \quad \forall (u_1,\ldots ,u_d)\in W, \end{aligned}$$
(19)

and

$$\begin{aligned} \Vert \big (\Vert u'_1 \Vert _{\alpha _1^*},\ldots ,\Vert u'_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}=\Vert (u'_1,\ldots ,u'_d) \Vert _{V^*}\le \Vert x \Vert _{\alpha ^*}. \end{aligned}$$

Next, let \(y\in \mathbb {R}^n\), then \(My=(P_1y,\ldots ,P_dy)\in W\) and with (19), we have

$$\begin{aligned} \left\langle y , x \right\rangle =\left\langle M^\dagger My , x \right\rangle =\sum _{i=1}^d \left\langle u_i' , P_iy \right\rangle =\sum _{i=1}^d \left\langle P_i^Tu_i' , y \right\rangle . \end{aligned}$$

As the above is true for all \(y\in \mathbb {R}^n\), it follows that \(P_1^Tu_1'+\ldots +P_d^Tu_d'=x\). Hence, we have

$$\begin{aligned} \inf _{\begin{array}{c} u_1\in \mathbb {R}^{n_1},\ldots ,u_d\in \mathbb {R}^{n_d}\\ P_1^Tu_1+\ldots +P_d^Tu_d = x \end{array}} \Vert \big (\Vert u_1 \Vert _{\alpha _1^*},\ldots ,\Vert u_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}\le \Vert \big (\Vert u'_1 \Vert _{\alpha _1^*},\ldots ,\Vert u'_d \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}\le \Vert x \Vert _{\alpha ^*}, \end{aligned}$$

which concludes the proof of the formula for \(\Vert \cdot \Vert _{\alpha ^*}\). \(\square \)

As a consequence of the above Theorem 6, the duals of the norms \(\Vert \cdot \Vert _{\alpha _+},\Vert \cdot \Vert _{\alpha _{\times }}, \Vert \cdot \Vert _{\alpha _P}\) considered at the beginning of this section are respectively given by

$$\begin{aligned}&\Vert x \Vert _{\alpha _+^*}=\inf _{\begin{array}{c} u_1+u_2=x\\ u_1,u_2\in \mathbb {R}^n \end{array}}(\Vert u_1 \Vert _{\alpha _1^*}^{p^*}+\Vert u_2 \Vert _{\alpha _2^*}^{p^*})^{1/p^*},\\&\Vert (x,y) \Vert _{\alpha _{\times }^*}=(\Vert x \Vert _{\alpha _1^*}^{p^*}+\Vert y \Vert _{\alpha _2^*}^{p^*})^{1/p^*},\qquad \Vert x \Vert _{\alpha _P^*}=\inf _{u \in \mathbb {R}^{{\widetilde{n}}}:P^Tu = x}\Vert u \Vert _{p^*}, \end{aligned}$$

with \(p^* = p/(p-1)\). Note that \(\Vert \cdot \Vert _{\alpha ^*_{\times }}\) does not involve an infimum. The infimum can also be removed in \(\Vert x \Vert _{\alpha _P^*}\) if P is square and invertible, in which case it holds that \(\Vert x \Vert _{\alpha _P^*}=\Vert P^{-T}x \Vert _{p^*}\).
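As a quick sanity check of the last identity, the following numpy sketch (random test data, hypothetical variable names) compares the closed form \(\Vert P^{-T}x \Vert _{p^*}\) with a sampled lower approximation of \(\Vert x \Vert _{\alpha _P^*}\) obtained directly from the definition of the dual norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3.0
p_star = p / (p - 1.0)
P = rng.random((n, n)) + n * np.eye(n)   # a generic square invertible P (assumption)
x = rng.random(n)

closed_form = np.linalg.norm(np.linalg.solve(P.T, x), ord=p_star)   # ||P^{-T} x||_{p*}

# sample the definition sup{ <x, y> : ||P y||_p <= 1 }; this only gives a lower bound
best = 0.0
for _ in range(50000):
    y = rng.standard_normal(n)
    y /= np.linalg.norm(P @ y, ord=p)
    best = max(best, abs(x @ y))
print(closed_form, best)   # the sampled value approaches the closed form from below
```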

We discuss more general examples in the next result.

Corollary 2

Under the same assumptions as Theorem 6, we have:

  1. If \(P_1,\ldots ,P_d\) are all square invertible matrices, then

    $$\begin{aligned} \Vert x \Vert _{\alpha ^*}=\min _{\begin{array}{c} x=u_1+\cdots + u_d\\ u_1,\ldots ,u_d\in \mathbb {R}^n \end{array}}\Vert (\Vert (P_1^T)^{-1} u_1 \Vert _{\alpha _1^*},\ldots ,\Vert (P_d^T)^{-1} u_d \Vert _{\alpha _d^*}) \Vert _{\gamma ^*} \end{aligned}$$
  2. If every \(x\in \mathbb {R}^n\) can be uniquely written as \(x=x_{P_1}+\ldots +x_{P_d}\) with \(x_{P_i}\in {\text {Im}}(P_i^T)\) for all \(i=1,\ldots ,d\) (i.e. \(\mathbb {R}^n\) is the direct sum of the ranges of \(P_1^T,\dots ,P_d^T\)), then

    $$\begin{aligned} \Vert x \Vert _{\alpha ^*}=\left\| \left( \inf _{\begin{array}{c} u_1\in \mathbb {R}^{n_1}\\ P_1^Tu_1= x_{P_1} \end{array}}\Vert u_1 \Vert _{\alpha _1^*},\ldots ,\inf _{\begin{array}{c} u_d\in \mathbb {R}^{n_d}\\ P_d^Tu_d= x_{P_d} \end{array}}\Vert u_{d} \Vert _{\alpha _d^*}\right) \right\| _{\gamma ^*}. \end{aligned}$$

    If, additionally, \(n_i = \dim ({\text {Im}}(P_i^T))\) for all \(i=1,\ldots ,d\), then

    $$\begin{aligned} \Vert x \Vert _{\alpha ^*}=\Vert \big (\Vert (P_1^T)^{\dagger }x \Vert _{\alpha _1^*}, \ldots ,\Vert (P_d^T)^{\dagger } x \Vert _{\alpha _d^*}\big ) \Vert _{\gamma ^*}, \end{aligned}$$

    where \((P_i^T)^{\dagger }\) is the Moore–Penrose inverse of \(P_i^T\).

5.2 The Power Method for Compositions of \(\ell ^p\)-Norms

We discuss here consequences of Theorems 4 and 6 when applied to a special family of norms defined in terms of subsets of entries of the initial vector, i.e. the case where \(P_i\) is a nonnegative diagonal matrix.

For some nonnegative weight vector \(\omega \in \mathbb {R}^m\) and exponent \(p\in (1,\infty )\), let \(\Vert \cdot \Vert _{\omega ,p}\) be the \(\omega \)-weighted \(\ell ^p\)-(semi)norm on \(\mathbb {R}^m\), defined as

$$\begin{aligned} \Vert x \Vert _{\omega ,p}=\Vert \mathrm {diag}(\omega )^{1/p}x \Vert _p= \left( \sum _{j=1}^m\omega _j|x_j|^p\right) ^{1/p}. \end{aligned}$$
(20)

To express the dual of \(\Vert \cdot \Vert _{\omega ,p}\) and of its compositions, let

$$\begin{aligned} p^*=\frac{p}{p-1}\quad \text {and}\quad \omega _j^*= {\left\{ \begin{array}{ll} \omega _j^{1-p^*}&{}\text {if }\omega _j>0,\\ 0&{}\text {if }\omega _j=0, \end{array}\right. } \quad \forall j=1,\ldots ,m. \end{aligned}$$
(21)

If \(\omega \) is positive, then \(\Vert \cdot \Vert _{\omega ,p}\) is a norm and its dual norm is \(\Vert \cdot \Vert _{\omega ^*,p^*}\), by Proposition 1.
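A short numerical check of this duality, written as a minimal numpy sketch with arbitrary positive weights (variable names are ours): the weighted \(p^*\)-norm with dual weights (21) coincides with \(\Vert D^{-1}x \Vert _{p^*}\) for \(D=\mathrm {diag}(\omega )^{1/p}\), as predicted by Proposition 1.

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 6, 1.7
p_star = p / (p - 1.0)
w = rng.random(m) + 0.1                      # arbitrary positive weights
x = rng.standard_normal(m)

w_star = w ** (1.0 - p_star)                 # dual weights as in (21) (all w_j > 0 here)
lhs = (w_star * np.abs(x) ** p_star).sum() ** (1.0 / p_star)    # ||x||_{w^*, p^*}
rhs = np.linalg.norm(np.diag(w ** (-1.0 / p)) @ x, ord=p_star)  # ||D^{-1} x||_{p^*}
print(lhs, rhs)                              # the two values coincide
```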

Let \(\omega _1,\ldots ,\omega _d\in \mathbb {R}^m\) be nonzero vectors of nonnegative weights such that \(\omega _1+\ldots +\omega _d\) is a positive vector. Further let \(s\in [1,\infty )\), \(p_1,\ldots ,p_d\in (1,\infty )\) and define

$$\begin{aligned} \Vert x \Vert _{\alpha } =\left( \sum _{k=1}^d \Vert x \Vert _{\omega _k,p_k}^s\right) ^{1/s}. \end{aligned}$$
(22)

The fact that \(\omega _1+\cdots +\omega _d\) is positive ensures that \(\Vert \cdot \Vert _{\alpha }\) is a norm. Note that \(\Vert \cdot \Vert _{\alpha }\) is strongly monotonic.

The differentiability of \(\Vert \cdot \Vert _{\alpha }\) is discussed in the following lemma.

Lemma 12

Let \(\Vert \cdot \Vert _{\alpha }\) be as in (22), then \(\Vert \cdot \Vert _{\alpha }\) is differentiable if either \(s>1\) or \(s=1\) and \(\omega _i\) has at least two positive entries for every \(i=1,\ldots ,d\).

Proof

As \(p_k>1\), \(\Vert \cdot \Vert _{\omega _k,p_k}\) is differentiable if \(\omega _{k}\) has at least two positive entries. If it has only one positive entry then \(\Vert \cdot \Vert _{\omega _k,p_k}\) is just a weighted absolute value. Hence, if \(s>1\), then the differentiability of \(\Vert \cdot \Vert _{\alpha }\) follows from that of the \(\ell ^{s}\)-norm. While if \(s=1\) and \(\omega _i\) has at least two positive entries for every \(i=1,\ldots ,d\), then \(\Vert \cdot \Vert _{\alpha }\) is just a sum of differentiable norms. \(\square \)

If \(\Vert \cdot \Vert _\alpha \) is differentiable, we have

$$\begin{aligned} J_\alpha (x) = \Vert x\Vert _{\alpha }^{1-s} \sum _{k=1}^d\Vert x \Vert _{\omega _k,p_k}^{s-p_k} \mathrm {diag}(\omega _k)\Phi _{p_k}(x) \end{aligned}$$
(23)

and the following lemma provides an upper bound for \(\kappa _H(J_{\alpha })\).

Lemma 13

Let \(\Vert \cdot \Vert _{\alpha }\) be as in (22). If \(\Vert \cdot \Vert _{\alpha }\) is differentiable then

$$\begin{aligned} \kappa _H(J_{\alpha })\le (s-1)+\sum _{k=1}^d\max \{0,p_k-s\}. \end{aligned}$$

Proof

Let \(\delta =\sum _{k=1}^d\max \{0,p_k-s\}\). We have \(J_{\alpha }(x)=\Vert x \Vert _{\alpha }^{1-s}(F(x)+G(x))\) where for all \(x\in \mathbb {R}^m_{+}{\setminus }\{0\}\) we let \(F(x) = \sum _{p_k\le s}\Vert x \Vert _{\omega _k,p_k}^{s-p_k}\mathrm {diag}(\omega _k)\Phi _{p_k}(x)\) and \( G(x)=\sum _{p_k>s}\Vert x \Vert _{\omega _k,p_k}^{s-p_k}\mathrm {diag}(\omega _k)\Phi _{p_k}(x).\) Note that if \(p_k > s\) for all k then \(F(x)=0\), whereas \(G(x)=0\) when \(p_k\le s\) for all k. Moreover, note that F is order-preserving and homogeneous of degree \(s-1\). Now let us set \(\tau (x)=1\) if \(p_j\le s\) for all j and \(\tau (x) =\prod _{p_j>s}\Vert x \Vert _{\omega _j,p_j}^{p_j-s}\) otherwise. Then \(\tau \) is order-preserving and homogeneous of degree \(\delta \) and \(x\mapsto \tau (x)F(x)\) is order-preserving and homogeneous of degree \(\delta +(s-1)\). Finally, note that

$$\begin{aligned} x\mapsto \tau (x) G(x)=\sum _{p_k>s}\prod _{\begin{array}{c} p_j>s\\ j \ne k \end{array}}\Vert x \Vert _{\omega _j,p_j}^{p_j-s}\mathrm {diag}(\omega _k)\Phi _{p_k}(x) \end{aligned}$$

is order-preserving as well and homogeneous of degree \(\delta +(s-1)\). This implies that \(\delta +(s-1)\) is a Lipschitz constant of \(H(x)=\tau (x)(F(x)+G(x))\) with respect to the Hilbert metric \(\mu \). Hence, for any \(x,y\in \mathbb {R}^m_+{\setminus }\{0\}\) with \(x\sim y\), we finally obtain

$$\begin{aligned} \mu (J_{\alpha }(x),J_{\alpha }(y)) =\mu \big (H(x),H(y)\big )\le (\delta +s-1)\mu (x,y), \end{aligned}$$

which concludes the proof. \(\square \)

If \(s>1\), by Theorem 6, we have

$$\begin{aligned} \Vert x \Vert _{\alpha ^*}&=\min _{\begin{array}{c} u_1,\ldots ,u_d\in \mathbb {R}^m\\ \mathrm {diag}(\omega _1)u_1+\cdots +\mathrm {diag}(\omega _d)u_d=x \end{array}}\left( \sum _{k=1}^d \Vert u_k \Vert _{p_k^*}^{s^*}\right) ^{1/s^*} \nonumber \\&=\min _{\begin{array}{c} u_1+\cdots +u_d=x\\ u_1,\ldots ,u_d\in \mathbb {R}^m \end{array}}\left( \sum _{k=1}^d \Vert u_k \Vert _{\omega _k^*,p_k^*}^{s^*}\right) ^{1/s^*}. \end{aligned}$$
(24)

It is not difficult to realize that the case \(s=1\) has a similar form, where the sum is replaced by a maximum. We henceforth omit that case, for the sake of brevity.

Now, consider a norm \(\Vert \cdot \Vert _\beta \) defined as the dual norm of a norm of the type (22)

$$\begin{aligned} \Vert x\Vert _\beta = \min _{\begin{array}{c} u_1+\cdots +u_h=x\\ u_1,\ldots ,u_h\in \mathbb {R}^n \end{array}}\left( \sum _{k=1}^h \Vert u_k \Vert _{\varpi _k,q_k}^{t}\right) ^{1/t} \end{aligned}$$
(25)

where h is some positive integer, \(\varpi _i\) are nonnegative weight vectors whose sum \(\varpi _1+\cdots +\varpi _h\) is positive and \(q_1,\dots ,q_h, t\in (1,\infty )\). As \(\min _x f(x)=(\max _x f(x)^{-1})^{-1}\) for continuous positive f, we deduce that for this choice of norm we have

$$\begin{aligned} \Vert A \Vert _{\beta \rightarrow \alpha } =\max _{x\ne 0}\frac{\Vert Ax \Vert _{\alpha }}{\Vert x \Vert _{\beta }}= \max _{\begin{array}{c} u_1+\cdots +u_h\ne 0\\ u_1\in \mathbb {R}^n,\ldots ,u_h\in \mathbb {R}^n \end{array}}\dfrac{\displaystyle \left( \sum _{k=1}^d \Vert \sum _{j=1}^h Au_j \Vert _{\omega _k,p_k}^s\right) ^{1/s}}{\displaystyle \left( \sum _{k=1}^h \Vert u_k \Vert _{\varpi _k,q_k}^{t}\right) ^{1/t}} \end{aligned}$$

for any matrix \(A\in \mathbb {R}^{m\times n}\).

We emphasize that, while the norm \(\Vert \cdot \Vert _\beta \) is defined implicitly in the general case, when the weight vectors \(\varpi _i\) have disjoint support Corollary 2 yields the following explicit formula

$$\begin{aligned} \Vert x\Vert _\beta = \left( \sum _{k=1}^h \Vert x \Vert _{\varpi _k,q_k}^{t}\right) ^{1/t} \end{aligned}$$

which also simplifies the definition of \(\Vert A\Vert _{\beta \rightarrow \alpha }\).

The advantage of choosing \(\Vert \cdot \Vert _\beta \) as in (25) lies in the fact that both \(\Vert x\Vert _{\beta ^*}\) and \(J_{\beta ^*}\) admit an explicit expression analogous to (22) and (23), precisely

$$\begin{aligned} \Vert x\Vert _{\beta ^*}&= \left( \sum _{k=1}^h \Vert x \Vert _{\varpi _k^*, q_k^*}^{t^*}\right) ^{1/t^*} \quad \text {and} \\ J_{\beta ^*}(x)&= \Vert x\Vert _{\beta ^*}^{1-t^*}\sum _{k=1}^h \Vert x \Vert _{\varpi _k^*,q_k^*}^{t^*-q_k^*} \mathrm {diag}(\varpi _k^*)\Phi _{q_k^*}(x), \end{aligned}$$

for all choices of the weights \(\varpi _i\) such that \(\varpi _1+\dots +\varpi _h>0\).

Algorithm 1 Evaluation of \(J_\alpha \) for a norm \(\Vert \cdot \Vert _\alpha \) of the form (22)

Thus, we obtain an explicit formula for the operator

$$\begin{aligned} {\mathcal {S}}_A(x)=J_{\beta ^*}(A^TJ_{\alpha }(Ax)) \end{aligned}$$

which allows us to easily implement the power method (11) for the matrix norm \(\Vert A\Vert _{\beta \rightarrow \alpha }\). An efficient implementation of the operator \(J_\alpha \) for a norm \(\Vert \cdot \Vert _\alpha \) of the form (22) is provided by Algorithm 1. If we let \(\mathrm {nnz}(X)\) denote the number of nonzero entries in X and we assume arithmetic operations have unit cost, evaluating \(J_{\alpha }\) and \(J_{\beta ^*}\) via Algorithm 1 costs \({\mathcal {O}}\left( \sum _{i=1}^d\mathrm {nnz}(\omega _i)\right) \) and \(\mathcal O\left( \sum _{i=1}^h\mathrm {nnz}(\varpi _i)\right) \) operations, respectively. So, the total cost of evaluating \({\mathcal {S}}_A\), i.e. of each iteration of the power method in (11), is \(\mathcal O\big (C({\mathcal {S}}_A)\big )\) where

$$\begin{aligned} C({\mathcal {S}}_A) = \sum _{i=1}^d\mathrm {nnz}(\omega _i) + \sum _{i=1}^h\mathrm {nnz}(\varpi _i) + \mathrm {nnz}(A) \end{aligned}$$

which boils down to \({\mathcal {O}}(dn+hn+n^2)\) when all the \(\omega _i\), \(\varpi _i\) and A are full.
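The following self-contained numpy sketch (not the paper's Algorithm 1; the function names, the normalization of the iterates and the fixed iteration count are our own simplified choices) implements \(J_\alpha \) as in (23) and the resulting power iterates \(x_{k+1}\propto {\mathcal {S}}_A(x_k)\), in the case where \(\Vert \cdot \Vert _\beta \) is a single weighted \(\ell ^q\) norm (\(h=1\)), so that \(\Vert x\Vert _\beta \) is explicit and can be used to rescale the iterates.

```python
import numpy as np

def phi(x, p):
    # entrywise Phi_p(x)_j = sign(x_j) |x_j|^{p-1}
    return np.sign(x) * np.abs(x) ** (p - 1.0)

def w_norm(x, w, p):
    # weighted l^p (semi)norm, cf. (20)
    return float((w * np.abs(x) ** p).sum()) ** (1.0 / p)

def norm_alpha(x, weights, exps, s):
    # ||x||_alpha = (sum_k ||x||_{w_k, p_k}^s)^{1/s}, cf. (22)
    return sum(w_norm(x, w, p) ** s for w, p in zip(weights, exps)) ** (1.0 / s)

def J(x, weights, exps, s):
    # J_alpha(x) as in (23)
    g = sum(w_norm(x, w, p) ** (s - p) * (w * phi(x, p))
            for w, p in zip(weights, exps))
    return norm_alpha(x, weights, exps, s) ** (1.0 - s) * g

def power_norm(A, weights_a, ps, s, w_beta, q, iters=200):
    # power iterates x_{k+1} proportional to S_A(x_k) = J_{beta*}(A^T J_alpha(A x_k));
    # here beta is a single weighted l^q norm, so ||x||_beta is explicit and each
    # iterate is simply rescaled to have ||x||_beta = 1 (a simplified variant of (11))
    q_star = q / (q - 1.0)
    wb_star = np.zeros_like(w_beta, dtype=float)
    wb_star[w_beta > 0] = w_beta[w_beta > 0] ** (1.0 - q_star)   # dual weights, cf. (21)
    x = np.ones(A.shape[1])
    x /= w_norm(x, w_beta, q)
    for _ in range(iters):
        x = J(A.T @ J(A @ x, weights_a, ps, s), [wb_star], [q_star], q_star)
        x /= w_norm(x, w_beta, q)
    return norm_alpha(A @ x, weights_a, ps, s)       # approximates ||A||_{beta -> alpha}

# Example: ||A||_{beta->alpha} = max ||Ax||_{w,p} / ||x||_{v,q} with random positive data
rng = np.random.default_rng(0)
A = rng.random((40, 40)) + 0.05
w, v = rng.random(40) + 0.1, rng.random(40) + 0.1
print(power_norm(A, [w], [1.4], 1.4, v, 3.0))        # p = 1.4 < q = 3.0, hence tau < 1
```

In the example at the end, the exponents are chosen so that \(\tau =\kappa _H(A)^2(p-1)/(q-1)<1\), which is the setting of Corollary 3 below.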

As a consequence, we have

Theorem 7

Let \(A\in \mathbb {R}^{m\times n}\) be a nonnegative matrix such that \(A^TA\) is irreducible. Let \(\Vert \cdot \Vert _{\alpha }\) and \(\Vert \cdot \Vert _{\beta }\) be as in (22) and (25), respectively. Let

$$\begin{aligned} \tau =\kappa _H(A)\kappa _H(A^T)\left( s-1+ \sum _{k=1}^d\max \{0,p_k-s\}\right) \left( t^*-1+ \sum _{j=1}^h\max \{0,q_j^*-t^*\}\right) , \end{aligned}$$

where \(q_j^*=q_j/(q_j-1)\) and \(t^*=t/(t-1)\).

If \(\tau <1\) and \(\Vert \cdot \Vert _{\alpha }\), \(\Vert \cdot \Vert _{\beta }\) are differentiable, then \(\Vert A \Vert _{\beta \rightarrow \alpha }\) can be approximated to \(\varepsilon \) precision in \(\mathcal {O}\big (C({\mathcal {S}}_A)\ln (1/\varepsilon )\big )\) arithmetic operations with the power sequence (11).

Proof

Besides the complexity bound, the result is a direct consequence of Theorem 4 and the upper bounds for \(\kappa _H(J_\alpha )\) and \(\kappa _H(J_{\beta ^*})\) obtained in Lemma 13. Let us now estimate the total number of operations required by the fixed point sequence (11). Let \({\widetilde{C}}\) be as in Theorem 4. We have \({\widetilde{C}}\tau ^k <\varepsilon \) if and only if \(k>(\ln (\varepsilon )-\ln (\widetilde{C}))/\ln (\tau )\). As \((\ln (\varepsilon )-\ln (\widetilde{C}))/\ln (\tau )\in \mathcal {O}(\ln (1/\varepsilon ))\) for \(\varepsilon \rightarrow 0\), we deduce that \(\Vert A \Vert _{\beta \rightarrow \alpha }-\varepsilon \le \Vert Ax_k \Vert _{\alpha }\) after \(\mathcal {O}(\ln (1/\varepsilon ))\) iterations of \({\mathcal {S}}_{A}\), leading to a total complexity of \(\mathcal {O}(C(\mathcal S_A)\ln (1/\varepsilon ))\). \(\square \)

We conclude the section by proving a number of corollaries of Theorem 7 that illustrate the richness of the class of problems that can be addressed via that theorem. For simplicity, in the statements we assume that the involved matrices are square and positive. However, more general statements involving irreducible and rectangular matrices can be easily derived by reproducing the proof of the corresponding corollary.

Corollary 3

Let \(A\in \mathbb {R}^{n\times n}\) be a positive matrix. Let \(\omega ,\varpi \in \mathbb {R}^n\) be positive weights and \(1<p,q<\infty \). Let

$$\begin{aligned} \Vert A \Vert _{\beta \rightarrow \alpha }=\max _{x\ne 0} \frac{\Vert Ax \Vert _{\omega ,p}}{\Vert x \Vert _{\varpi ,q}}\quad \text {and}\quad \tau = \kappa _H(A)^2\frac{p-1}{q-1}. \end{aligned}$$

If \(\tau <1\), then \(\Vert A \Vert _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (\mathrm {nnz}(A)\ln (1/\varepsilon )\big )\) operations.

Proof

As \(d=h=1\), \(C({\mathcal {S}}_A)=\mathrm {nnz}(A)\), \(\Vert y \Vert _{\alpha }=\Vert y \Vert _{\omega ,p}\) and \(\Vert x \Vert _{\beta ^*}=\Vert x \Vert _{\varpi ^*,q^*}\) in Theorem 7. \(\square \)

Corollary 4

Let \(A,B\in \mathbb {R}^{n\times n}\) be positive matrices. Further, let \(1<p,q,r<\infty \),

$$\begin{aligned} \left\| \left[ \begin{matrix}A\\ B\end{matrix}\right] \right\| _{\beta \rightarrow \alpha }\!\!\!\!=\max _{x\ne 0} \frac{2\,\Vert Ax \Vert _{p}+3\,\Vert Bx \Vert _{q}}{\Vert x \Vert _{r}} \quad \text {and}\quad \tau = \kappa _H\left( \left[ \begin{matrix}A\\ B\end{matrix}\right] \right) ^2\,\frac{p+q-2}{r-1}. \end{aligned}$$

If \(\tau <1\), then \(\left\| \bigg [\begin{matrix}A\\ B\end{matrix}\bigg ] \right\| _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (N\,\ln (1/\varepsilon )\big )\) operations with \(N=\mathrm {nnz}(A)+\mathrm {nnz}(B)\).

Proof

Let \(d=2\), \(h=1\), \(\omega ,\varpi \in \mathbb {R}^n\) with entries \(\omega _i=2^p\) and \(\varpi _i=3^q\) for \(i=1,\ldots ,n\), \(\Vert x \Vert _{\beta ^*}=\Vert x \Vert _{r^*}\) and \(\Vert (y,z) \Vert _{\alpha }=\Vert (\Vert y \Vert _{\omega ,p},\Vert z \Vert _{\varpi ,q}) \Vert _1\) in Theorem 7. Also note that \(\mathcal O(C({\mathcal {S}}_A))={\mathcal {O}}(N)\). \(\square \)

Corollary 5

Let \(A\in \mathbb {R}^{n\times n}\) be positive, \(1<p<\infty \), \(2\le q,r<\infty \),

$$\begin{aligned} \Vert A \Vert _{\beta \rightarrow \alpha }=\max _{x+y\ne 0} \frac{\Vert Ax+Ay \Vert _{p}}{\sqrt{\Vert x \Vert _q^2+\Vert y \Vert ^2_r}} \quad \text {and}\quad \tau = \kappa _H(A)^2(p-1). \end{aligned}$$

If \(\tau <1\), then \(\Vert A \Vert _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (\mathrm {nnz}(A)\ln (1/\varepsilon )\big )\) operations.

Proof

Let \(d=1\), \(h=2\), \(\Vert y \Vert _{\alpha }=\Vert y \Vert _p\) and \(\Vert (x,y) \Vert _{\beta ^*}=\Vert (\Vert x \Vert _{q^*},\Vert y \Vert _{r^*}) \Vert _2\) in Theorem 7, applied to the matrix \([A\ A]\). \(\square \)

Corollary 6

Let \(A,B\in \mathbb {R}^{n\times n}\) be positive matrices, \(1<s\le \theta \le p,q,r<\infty \),

$$\begin{aligned} \Vert [A\ B] \Vert _{\beta \rightarrow \alpha }^\theta =\max _{(x,y)\ne (0,0)}\frac{\Vert Ax \Vert ^\theta _p+\Vert By \Vert _{q}^\theta }{\Vert x \Vert _r^\theta +\Vert y \Vert ^\theta _s} \quad \text {and}\quad \tau = \kappa _H([A\ B])^2\,\frac{p+q-\theta -1}{s-1}. \end{aligned}$$

If \(\tau <1\), then \(\Vert [A\ B] \Vert _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (N\,\ln (1/\varepsilon )\big )\) operations with \(N=\mathrm {nnz}(A)+\mathrm {nnz}(B)\).

Proof

Let \(d=2\), \(h=2\), \(\Vert (y,z) \Vert _{\alpha }=\Vert (\Vert y \Vert _p,\Vert z \Vert _q) \Vert _\theta \) and \(\Vert (x,y) \Vert _{\beta ^*}=\Vert (\Vert x \Vert _{r^*},\Vert y \Vert _{s^*}) \Vert _{\theta ^*}\) in Theorem 7. \(\square \)

Corollary 7

Let \(A,B\in \mathbb {R}^{n\times n}\) be positive matrices and \(1<p,q,r<\infty \), let

$$\begin{aligned} \phi =\max _{(x,y,u+v)\ne (0,0,0)}\min \left\{ \frac{\Vert A(x+y)+B(u+v) \Vert _{p}}{\Vert (x,u) \Vert _q},\frac{\Vert A(x+y)+B(u+v) \Vert _{p}}{\Vert (y,v) \Vert _r}\right\} . \end{aligned}$$

If \(\tau = \frac{p-1}{q-1}+\frac{p-1}{r-1}<1\), then \(\phi \) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (N\,\ln (1/\varepsilon )\big )\) operations with \(N=\mathrm {nnz}(A)+\mathrm {nnz}(B)\).

Proof

Let \(M=\Big [\begin{matrix}A&{} A &{}0 \\ 0 &{} B &{} B \end{matrix}\Big ]\), \(d=2,h=2\), \(\Vert y \Vert _{\alpha }=\Vert y \Vert _{p}\) and \(\Vert (x,y,z) \Vert _{\beta ^*}=\Vert \Vert (x,z) \Vert _{r^*},\Vert (y,z) \Vert _{s^*} \Vert _{1}\) in Theorem 7. Note that \(\kappa _H(M)=1\) by Theorem 3. \(\square \)

Corollary 8

Let \(A,B\in \mathbb {R}^{n\times n}\) be positive matrices and \(1<p,q,r<\infty \). Let \(\sigma _p:\mathbb {R}^n\rightarrow \mathbb {R}^n_+\) be defined as \(\sigma _p(x)=(|x_1|^p,\ldots ,|x_n|^p)^T\) and let

$$\begin{aligned} \Vert B \Vert _{\beta \rightarrow \alpha }^{p}=\max _{\Vert x \Vert _r=1}\Vert A\sigma _p(Bx) \Vert _{q}\quad \text {and}\quad \tau = \frac{pq-1}{r-1}\,\kappa _H(B)^2. \end{aligned}$$

If \(\tau <1\) then \(\Vert B \Vert _{\beta \rightarrow \alpha }\) can be computed to \(\varepsilon \) precision in \(\mathcal {O}\big (N\,\ln (1/\varepsilon )\big )\) operations with \(N=\mathrm {nnz}(A)+\mathrm {nnz}(B)\).

Proof

As \(x\mapsto A\sigma _p(Bx)\) is positively homogeneous of degree p, we have

$$\begin{aligned} \max _{\Vert x \Vert _r=1}\Vert A\sigma _p(Bx) \Vert _q=\max _{x\ne 0}\frac{\Vert A\sigma _p(Bx) \Vert _q}{\Vert x \Vert _r^p}=\left( \max _{x\ne 0}\frac{\Vert A\sigma _p(Bx) \Vert _q^{1/p}}{\Vert x \Vert _r}\right) ^p. \end{aligned}$$

Let \(\Vert \cdot \Vert _{\beta ^*}=\Vert \cdot \Vert _{r^*}\) and \(\Vert x \Vert _{\alpha }=\Vert A\sigma _p(x) \Vert _q^{1/p}\). Then, \(\Vert Bx \Vert _{\alpha }=\Vert A\sigma _{p}(Bx) \Vert _q^{1/p}\) and with \(\omega _i = (A_{i,1},\ldots ,A_{i,n})\), it holds \(\Vert x \Vert _{\alpha }=\Vert (\Vert x \Vert _{\omega _1,p}, \ldots ,\Vert x \Vert _{\omega _n,p}) \Vert _{pq}\) for every x. The proof is now a direct consequence of Theorem 7 with \(d=n\) and \(h=1\). \(\square \)

6 Numerical Experiments

In this section we illustrate the numerical behaviour of the power sequence (11) on some example matrices and some choices of the norms \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \). In particular, we consider both a dense and a sparse matrix example.

6.1 Dense Matrix Example: Tightness of the Convergence Bound

We verify the convergence bound of Theorem 4 for the family of matrix norms analyzed in Sect. 4.2, that is, for the power sequence (11) applied to the computation of \(\Vert A_{\varepsilon } \Vert _{q\rightarrow p}\) for various \(1<p,q<\infty \), with \(A_{\varepsilon }\) defined as in (13).

By Lemma 7, we have \(\tau _{p,q,\varepsilon }=\kappa _H({\mathcal {S}}_{A_\varepsilon })= \kappa _H(A_{\varepsilon })^2\frac{p-1}{q-1}\). Moreover, with \(x^+ = 2^{-1/q}(1,1) ^T\) it holds \({\mathcal {S}}_{A_\varepsilon }(x^+)=x^+\), hence the power sequence converges to \(x^+\) when \(\tau _{p,q,\varepsilon }<1\). By Theorem 4, if \(\tau _{p,q,\varepsilon }<1\), then

$$\begin{aligned} \Vert x_k-x^+ \Vert _{\infty } \le (\tau _{p,q,\varepsilon })^{k}\, C \quad \text {with}\quad C=\frac{d_H(x_0,x_1)}{1-\tau _{p,q,\varepsilon }}. \end{aligned}$$
(26)
Fig. 2

Comparison between the true error \(\Vert x_k-x^+\Vert _\infty \) and the upper bound \((\tau _{p,q,\varepsilon })^kC\) in (26) against number of iterations for the power method (11) applied to \(\Vert A_{\varepsilon } \Vert _{q\rightarrow p}\) with \(\varepsilon = 1/3\), \(q=2\) and for five values of p ranging within the interval \([1/15, 1/\kappa _H(A_\varepsilon )]\cdot (q-1)+1\), i.e. chosen so that \(\tau _{p,q,\varepsilon }=\kappa _H({\mathcal {S}}_{A_\varepsilon })<1\)

We use \(\delta = 10^{-12}\) and \(x_0 = (\delta ,1-\delta )^T\) in our experiments. This choice of \(x_0\) is motivated by the fact that it is far from the limit point \(x^+\) in the Hilbert metric, so as to model a worst-case scenario. In Fig. 2, we plot the true error \(\Vert x_k-x^+ \Vert _{\infty }\) against the number of iterations k and compare it with the upper bound \((\tau _{p,q,\varepsilon })^kC\), for the choice \(\varepsilon = 1/3\), \(q=2\) and five increasing values of p chosen so that \(\tau _{p,q,\varepsilon }=\kappa _H({\mathcal {S}}_{A_\varepsilon })<1\). We observe that the method converges linearly, as expected, and that the upper bound captures the decay slope well. Moreover, even though larger values of p yield larger values of the contraction constant \(\tau _{p,q,\varepsilon }\), the upper bound still reproduces the behavior of the true error, up to a multiplicative constant, as p grows.
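A minimal numpy sketch of this comparison for a single value of p is reported below; it assumes \(A_\varepsilon =\Big [{\begin{matrix}1 &{} \varepsilon \\ \varepsilon &{} 1\end{matrix}}\Big ]\) and \(\kappa _H(A_\varepsilon )=(1-\varepsilon )/(1+\varepsilon )\), consistently with the computations of Sect. 4.2 (the definition (13) is not restated here), and its variable names are illustrative.

```python
import numpy as np

def J(z, r):
    # J_r(z) = ||z||_r^{1-r} Phi_r(z); homogeneous of degree zero
    return np.linalg.norm(z, ord=r) ** (1.0 - r) * np.sign(z) * np.abs(z) ** (r - 1.0)

def d_H(x, y):
    # Hilbert projective metric on the open positive orthant
    r = x / y
    return np.log(r.max() / r.min())

eps, q = 1.0 / 3.0, 2.0
A = np.array([[1.0, eps], [eps, 1.0]])        # A_eps (assumed form, cf. Sect. 4.2)
kappa = (1.0 - eps) / (1.0 + eps)             # Birkhoff contraction ratio of A_eps
p = 1.0 + 0.5 * (q - 1.0) / kappa ** 2        # one admissible p, giving tau = 1/2
tau = kappa ** 2 * (p - 1.0) / (q - 1.0)

S = lambda x: J(A.T @ J(A @ x, p), q / (q - 1.0))   # S_A(x) = J_{q*}(A^T J_p(A x))
delta = 1e-12
x = np.array([delta, 1.0 - delta])            # x_0 far from x^+ in the Hilbert metric
x_plus = np.full(2, 2.0 ** (-1.0 / q))
C = d_H(x, S(x)) / (1.0 - tau)
for k in range(1, 26):
    x = S(x)
    print(k, np.abs(x - x_plus).max(), tau ** k * C)   # true error vs. bound (26)
```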

6.2 Sparse Matrix Example

For this experiment, we consider two families of matrices with growing size which are not irreducible but satisfy the requirement of Theorem 4, that is \(A^TA\) is irreducible. More precisely, let \(A_1\in \mathbb {R}^{3\times 3}\) and \(B_1 \in \mathbb {R}^{4\times 2}\) be given by

$$\begin{aligned} A_1 = \begin{bmatrix} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 \\ 1 &{} 1 &{} 1 \end{bmatrix} \quad \text {and}\quad B_1=\begin{bmatrix} 1 &{} 0 \\ 0 &{} 1 \\ 0 &{} 0 \\ 1 &{} 1 \end{bmatrix}. \end{aligned}$$

Clearly, neither \(A_1\) nor \(B_1\) is irreducible, however \(A_1^TA_1\) and \(B_1^TB_1\) have strictly positive entries and are therefore irreducible. Then, for \(s\ge 2\), we consider the matrices \(A_s\in \mathbb {R}^{3^s\times 3^s}\), \(B_s\in \mathbb {R}^{4^s \times 2^s}\) obtained by taking s times the Kronecker product of \(A_1\), resp. \(B_1\), with itself, i.e. \(A_s = A_1\otimes A_{s-1}\) and \(B_s = B_1 \otimes B_{s-1}\). Note that for all \(s\ge 1\), \(A_s\) has at least one full row of only zero entries and therefore cannot be irreducible. On the other hand, it holds \(A_s^TA_s = (A_1^TA_1)\otimes \ldots \otimes (A_1^TA_1)\) and thus \(A_s^TA_s\) has positive entries since it is the Kronecker product of s positive matrices. The same observation holds for the sequence \(B_s\). Furthermore, Theorem 3 implies that \(\kappa _H(A_s)=\kappa _H(B_s) = 1\) for all \(s\ge 1\). Hence, we have \(\tau ({\mathcal {S}}_{A_s})=\tau ({\mathcal {S}}_{B_s})=\frac{p-1}{q-1}\) for all s.
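These two families are easy to reproduce; the short numpy sketch below (variable names are ours) builds \(A_s\) and \(B_s\) via repeated Kronecker products and verifies the properties just discussed.

```python
import numpy as np
from functools import reduce

A1 = np.array([[0, 1, 0],
               [0, 0, 0],
               [1, 1, 1]], dtype=float)
B1 = np.array([[1, 0],
               [0, 1],
               [0, 0],
               [1, 1]], dtype=float)

def kron_power(M, s):
    # s-fold Kronecker product M x M x ... x M
    return reduce(np.kron, [M] * s)

s = 3
A_s, B_s = kron_power(A1, s), kron_power(B1, s)
# A_s keeps a zero row (so it is not irreducible), yet A_s^T A_s and B_s^T B_s are
# entrywise positive, hence irreducible, as required by Theorem 4
print((A_s == 0).all(axis=1).any(), (A_s.T @ A_s > 0).all(), (B_s.T @ B_s > 0).all())
```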

In our experiments we analyze the number of iterations until convergence of the power sequences associated to the computation of \(\Vert A_s \Vert _{q\rightarrow p}\) and \(\Vert B_s \Vert _{q\rightarrow p}\), where p, q are fixed so that \(\tau = \tau ({\mathcal {S}}_{A_s})=\tau ({\mathcal {S}}_{B_s})=3/4\). For each fixed p, q and s, we try 5000 different starting points drawn uniformly from \((0,1)^n\) with \(n=3^s\) or \(n=2^s\). The boxplots in Fig. 3 show the number of iterations required until the stopping criterion

$$\begin{aligned} \tau ^k \, \frac{d_H(x_0,x_1)}{(1-\tau )}<\delta \end{aligned}$$
(27)

is met, for both \(A_s\) (the two panels in the top row) and \(B_s\) (the two panels in the row at the bottom), and for \(\delta = 10^{-10}\). Note that, due to Theorem 4, if (27) holds for k then we are guaranteed to be \(\delta \)-close to the true solution \(x^+\), i.e. the computed approximation \(x_k\) is such that \(\Vert x_k-x^+\Vert _\infty <\delta \). Moreover, since \(\Vert x\Vert _p\le n^{1/p}\Vert x\Vert _\infty \) for all \(x\ne 0\), we have

$$\begin{aligned} (1-\delta n^{1/p})\Vert M \Vert _{q\rightarrow p} \le \Vert M x_k\Vert _p\le \Vert M \Vert _{q\rightarrow p} \end{aligned}$$

for both \(M=A_s\) and \(M=B_s\). While Fig. 3 shows the steps required to guarantee approximation to the true solution, we emphasize that in practice the required number of steps to reach floating point precision on two consecutive iterates is typically much smaller.

Fig. 3

Number of iterations k until guaranteed convergence such that \(\Vert x_k-x^+\Vert _\infty < 10^{-10}\) for the computation of \(\Vert A_{s} \Vert _{q\rightarrow p}\) and \(\Vert B_{s} \Vert _{q\rightarrow p}\) for \(s=1,\ldots ,10\) and two different choices (p, q) such that \(\tau ({\mathcal {S}}_{A_s})=\tau ({\mathcal {S}}_{B_s})=3/4\). Note that \(A_{10}\in \mathbb {R}^{3^{10}\times 3^{10}}\) with \(3^{10}\approx 6\cdot 10^4\) and \(B_{10}\in \mathbb {R}^{4^{10}\times 2^{10}}\) with \(4^{10}\approx 10^6\)

7 Conclusions

On top of being a classical problem in numerical analysis, computing the norm of a matrix \(\Vert A\Vert _{\beta \rightarrow \alpha }\) is a problem that appears in many recent applications in data mining and optimization. However, except for a few choices of \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \), computing such a matrix norm to an arbitrary precision is generally infeasible for large matrices, as this is known to be an NP-hard problem. The situation is different when the matrix has nonnegative entries, in which case \(\Vert A\Vert _{q\rightarrow p}\) is known to be computable for \(\ell ^p\) norms such that \(p\le q\). In this paper we have both (a) refined this result, by showing that the condition \(p<q\) is not necessarily required and (b) extended this result to much more general vector norms \(\Vert \cdot \Vert _\alpha \) and \(\Vert \cdot \Vert _\beta \) than \(\ell ^p\) norms. In particular, we have shown how to compute matrix norms induced by monotonic norms of the form \(\Vert x \Vert _{\alpha }=\Vert \big (\Vert x \Vert _{\alpha _1},\ldots ,\Vert x \Vert _{\alpha _d}\big ) \Vert _{\gamma }\), where we also allow \(\Vert x \Vert _{\alpha _i}\) to measure only a subset of the coordinates of x. Using these kinds of norms we can globally solve in polynomial time quite sophisticated nonconvex optimization problems, as we discuss in the example corollaries at the end of Sect. 5.