Two methods for the maximization of homogeneous polynomials over the simplex

The paper deals with the numerical solution of the problem P of maximizing a homogeneous polynomial over the unit simplex. We discuss the convergence properties of the so-called replicator dynamics for solving P. We further examine an ascent method, which also makes use of the replicator transformation. Numerical experiments with polynomials of different degrees illustrate the theoretical convergence results.


Introduction
The present paper studies two special methods for computing local or global maximizers of homogeneous polynomials of degree d over the unit simplex in ℝ^n. The maximization problem is referred to as P (cf., (1)). We are especially interested in the so-called replicator iteration to solve P. The mathematical structure and theoretical properties of P have been investigated, e.g., in [5,6] for d = 2 and, e.g., in [1] for d > 2. The problem considered has many applications, such as portfolio decisions [15], evolutionary biology (see [3,21] for d = 2, and [20] for d > 2), as well as the maximum clique problem in (hyper-)graphs [9].
It is well-known that for d ≥ 2 the global maximization problem P is NP-hard (cf., [19]). Approximation methods for computing a global maximizer have been investigated in [2] for d = 2, and in [12,13] for d > 2. However, in the present article we are less interested in the solution of the global maximization problem; we are looking for an algorithm to compute local maximizers of P. For this purpose we focus our attention on the so-called replicator dynamics (cf., (20) below). The useful monotonicity properties of the replicator transformation are well-known (see [2,14]). The convergence of the replicator iteration to a (unique) critical point of P has been proven in [16] for d = 2. Also for d = 2, results on the rate of convergence of the replicator dynamics have been presented in [17].
In the present paper, we are especially interested in the convergence properties of the replicator method for d > 2 . It appears that (as for d = 2 ) a linear convergence to a critical point x of P can only occur if a strict complementarity condition is satisfied at x . We therefore also examine an appropriate ascent method to increase the convergence speed. Another main aim of the paper is to provide a selection of numerical experiments with many example problems for objective functions of different degrees d.
The paper is organized as follows. Section 2 introduces the maximization problem P, presents properties of the homogeneous objective function f(x) in terms of the corresponding n-dimensional symmetric tensor A of order d, and summarizes useful optimality conditions for P.
In Sect. 3, we describe two important applications of our maximization problem. Section 4 contains the main results of the paper. We study the properties of the replicator method to find local maximizers of P. More precisely, in Sect. 4.1 we discuss the behavior of the replicator transformation y(x) (cf., (16)). It appears that the fixed points of this transformation y(x) coincide with the critical points of P. We summarize known convergence results of the replicator iteration for the case d = 2 and present new convergence results for d ≥ 2 . Section 4.3 shows how the replicator transformation y(x) can be used to set up an ascent method, enjoying a faster convergence in comparison with the pure replicator dynamics. Section 5 presents many numerical experiments with the replicator iteration and the special ascent method for example objective functions f(x) of different degrees d. Finally, Sect. 6 is devoted to concluding remarks.

Preliminaries
In the present paper, we consider the problem to compute local or global maximizers of an (in general nonconvex) homogeneous polynomial of degree d over the unit simplex in ℝ^n:

P:  max f(x)  s.t.  x ∈ Δ_n := { x ∈ ℝ^n | x ≥ 0, eᵀx = 1 },  (1)

where f(x) = A ⋅ x^d is a homogeneous polynomial of degree d given by the d-th order n-dimensional tensor A = (a_{i_1,…,i_d})_{1≤i_1,…,i_d≤n}. As usual, e ∈ ℝ^n is the vector with all components equal to 1. We define the index set N := {1, … , n}.
In this paper, we always assume that the tensor A is symmetric, i.e.,

a_{i_{π(1)},…,i_{π(d)}} = a_{i_1,…,i_d} for all permutations π of the numbers {1, … , d}.  (3)

The space of all symmetric tensors of order d in n variables has dimension (n+d−1 choose d) and is denoted by S_{n,d}. For A, B ∈ S_{n,d} we define the inner product

A ⋅ B := Σ_{1≤i_1,…,i_d≤n} a_{i_1,…,i_d} b_{i_1,…,i_d}.

Then the polynomial f(x) of degree d can be written as f(x) = A ⋅ x^d, where x^d denotes the symmetric tensor with entries x_{i_1} ⋯ x_{i_d}. If we make use of the fact that f(x) is induced by a d-linear form we write f(x) = F(x, … , x) with F(x^1, … , x^d) := A ⋅ (x^1 ⊗ ⋯ ⊗ x^d). We further introduce the tensors of order d − 1,

A_i := (a_{i,i_2,…,i_d})_{1≤i_2,…,i_d≤n}, i ∈ N,

and analogously the tensors A_{ij} of order d − 2. With these abbreviations the following holds for the elements of the gradient ∇f(x) and the Hessian ∇²f(x):

∂f(x)/∂x_i = d A_i ⋅ x^(d−1)  and  ∂²f(x)/(∂x_i ∂x_j) = d(d−1) A_{ij} ⋅ x^(d−2).

Interested in (local) maximizers of P, we summarize some well-known facts needed below. For x ∈ Δ_n the KKT condition is said to hold if with (unique) Lagrangian multipliers λ ∈ ℝ, 0 ≤ μ ∈ ℝ^n we have

∇f(x) = λe − μ  and  μ_i x_i = 0 for all i ∈ N.  (5)

We call such a point x a KKT point of P. A feasible point x ∈ Δ_n satisfying (5) with μ ∈ ℝ^n (without the non-negativity condition on μ) is called a critical point of P. The KKT point x (critical point x) is said to satisfy strict complementarity (SC) if μ_i ≠ 0 for all i such that x_i = 0. For a KKT point x ∈ Δ_n with corresponding (unique) multiplier 0 ≤ μ ∈ ℝ^n the necessary second order condition (NSOC) and the sufficient second order condition (SSOC) are given by

hᵀ∇²f(x)h ≤ 0 (NSOC), resp., hᵀ∇²f(x)h < 0 (SSOC), for all h ∈ C_x \ {0},  (7)

where C_x is the cone of critical directions,

C_x := { h ∈ ℝ^n | eᵀh = 0, h_i ≥ 0 if x_i = 0, h_i = 0 if μ_i > 0 }.

The well-known optimality conditions for P then read as follows.

Lemma 1
(a) A local maximizer x of P is a KKT point of P and satisfies NSOC.
(b) A KKT point x of P satisfying SSOC is a strict local maximizer of P; more precisely, there are ε̄ > 0, c > 0 such that

f(x̃) ≤ f(x) − c ‖x̃ − x‖²  for all x̃ ∈ Δ_n with ‖x̃ − x‖ ≤ ε̄,  (8)

and x is the only critical point of P in a neighborhood of x.

In terms of the tensor A, a point x ∈ Δ_n is a critical point of P if and only if

A_i ⋅ x^(d−1) = A ⋅ x^d for all i such that x_i > 0,  (10)

and such a point x is a KKT point of P if in addition we have

A_i ⋅ x^(d−1) ≤ A ⋅ x^d for all i such that x_i = 0.  (11)

Moreover, a critical point x satisfies the SC condition (μ_i ≠ 0 ∀i such that x_i = 0) if and only if

A_i ⋅ x^(d−1) ≠ A ⋅ x^d for all i such that x_i = 0.  (12)

Remark 1 Note that for any critical point x ∈ Δ_n of P satisfying x > 0 (x in the relative interior of Δ_n), by the condition μ_i x_i = 0 for all i, the multiplier μ (cf., (5)) must be zero. So, any critical point x > 0 must be a KKT point.

Remark 2
The assumption that a homogeneous polynomial f(x) of degree d is given in the form f(x) = A ⋅ x^d with symmetric coefficients a_{i_1,…,i_d}, i.e., A = (a_{i_1,…,i_d})_{1≤i_1,…,i_d≤n} ∈ S_{n,d}, does not mean any restriction. Indeed, let a polynomial be given in the (not necessarily symmetric) form (13), with coefficients c_{j_1,…,j_d}. Take any of the subsets {j_1, … , j_d} ⊂ N such that j_1 ≤ j_2 ≤ … ≤ j_d. Let S = S(j_1, … , j_d) be the set of all permutations π = (π_1, … , π_d) of the numbers in {j_1, … , j_d} without repetition. Then by setting

a_{π_1,…,π_d} := c_{j_1,…,j_d} / |S| for every π ∈ S,

we have defined a symmetric tensor A such that f(x) = A ⋅ x^d.
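As an illustration (our own sketch, not code from the paper), the averaging construction of Remark 2 and the evaluation of f(x) = A ⋅ x^d can be written in a few lines of Python with numpy; `symmetrize` and `poly` are our own names:

```python
import itertools
import numpy as np

def symmetrize(A):
    """Average A over all axis permutations (the construction of Remark 2)."""
    perms = list(itertools.permutations(range(A.ndim)))
    return sum(np.transpose(A, p) for p in perms) / len(perms)

def poly(A, x):
    """Evaluate f(x) = A . x^d by contracting the last axis d times."""
    for _ in range(A.ndim):
        A = A @ x
    return A

rng = np.random.default_rng(0)
A = rng.random((3, 3, 3))          # a non-symmetric order-3 tensor (d = 3)
S = symmetrize(A)
x = np.array([0.2, 0.3, 0.5])      # a point of the simplex

# S is symmetric and defines the same polynomial as A:
assert np.allclose(S, np.transpose(S, (1, 0, 2)))
assert np.isclose(poly(A, x), poly(S, x))
```

Symmetrizing preserves the polynomial because each monomial x_{i_1}⋯x_{i_d} is invariant under permutations of its indices.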

Remark 3
In this paper, we study problem P for the case of homogeneous polynomials f(x). But this does not mean a real restriction. Indeed, suppose we are given a non-homogeneous polynomial f(x) of degree d. We then can simply transform f(x) into homogeneous form as follows: by multiplying each monomial of degree k < d by the factor (eᵀx)^(d−k), which equals 1 on Δ_n, we obtain a homogeneous polynomial of degree d that coincides with f(x) on Δ_n.

We finally give some notation used throughout the paper. As usual, e_i, i ∈ N, denote the standard basis vectors in ℝ^n, ‖x‖ is the Euclidean norm of x ∈ ℝ^n, and for a tensor A ∈ S_{n,d} we define the (Frobenius) norm ‖A‖ = √(A ⋅ A). This tensor norm satisfies |A ⋅ B| ≤ ‖A‖ ‖B‖, and for x ∈ Δ_n we find ‖x^d‖ ≤ 1.

Applications
Polynomial optimization over the simplex has numerous applications. In this section we describe two of them. For both we present numerical experiments in Sect. 5.

Evolutionarily stable strategies
The first application depends on the close relation between the maximization problem P and a model in evolutionary biology. For d = 2 the model goes back to Maynard Smith and Price [22] and for d > 2 we refer to [7,20].
Let us consider a population of individuals which differ in n distinct features (also called strategies):
• For x = (x_1, … , x_n) ∈ Δ_n, the component x_i gives the percentage of the population having feature i. So, x gives the strategy (state) of the whole population.
• We are given a symmetric fitness tensor A ∈ S_{n,d}. The value A ⋅ x^d represents the (mean) fitness of a population with strategy x.
In evolutionary biology special strategies called evolutionarily stable strategies (ESS) play an important role.

Definition 1 [ESS in biology]
A point x ∈ Δ_n is called an ESS for A ∈ S_{n,d} if for all y ∈ Δ_n, y ≠ x, the following conditions hold for the d-linear form F:
(i) F(y, x, … , x) ≤ F(x, x, … , x);
(ii) if F(y, x, … , x) = F(x, x, … , x), then F(y, y, x, … , x) < F(x, y, x, … , x).
If only the first relation F(y, x, … , x) ≤ F(x, x, … , x) ∀y ∈ Δ_n holds, the strategy x is called a symmetric Nash equilibrium (see, e.g., [1, Remark 3.1]).
The following relation between an ESS and the maximization problem P is well-known (see, e.g., [1]): every ESS x of A is a strict local maximizer of P.


Cliques in a d-graph
The optimization problem P is also connected with the maximum clique problem on graphs ( d = 2 ) and on hypergraphs ( d > 2).
A d-graph G = (N, E) consists of the node set N and a set E of edges, where each edge is a subset of N with d elements. A subset C ⊂ N is called a clique of G if every subset of C with d elements is an edge of G. A clique C is said to be maximal if it is not contained in any other clique, and it is said to be a maximum clique if C has maximum cardinality among all cliques. The complement G̅ of a d-graph G is defined as the d-graph on the same node set N whose edges are exactly the d-element subsets of N that are not edges of G. As we will see now, there is a close relation between the maximal/maximum clique problem in a d-graph and special problems P. For d = 2 (graphs) we refer the reader to [4,18] and for d > 2 (hypergraphs) to [9].
Let us introduce the so-called Lagrangean L_G̅ of the complement G̅ of a d-graph G,

L_G̅(x) := Σ_{{i_1,…,i_d} ∈ E(G̅)} x_{i_1} ⋯ x_{i_d},

and, for a parameter τ > 0, a corresponding homogeneous polynomial of degree d built from L_G̅. Then consider the optimization problem of maximizing this polynomial over Δ_n (problem (15)). Define for a subset S ⊂ N the characteristic vector x^S ∈ Δ_n by

x^S_i := 1/|S| if i ∈ S, and x^S_i := 0 otherwise.

Then the following holds (see [9, Theorem 5] for a proof).
Theorem 1 (cf. [9, Theorem 5]) Let τ > 0 be chosen sufficiently small. Then x ∈ Δ_n is a strict local (global) maximizer of (15) if and only if x = x^C is the characteristic vector of a maximal (maximum) clique C of G.
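For d = 2 the classical fact behind this correspondence (Motzkin–Straus) can be checked directly: if A is the adjacency matrix of a graph G and x^C the characteristic vector of a clique C of size c, then x^Cᵀ A x^C = (c − 1)/c. A small numerical sanity check (the graph and the names are our own illustration):

```python
import numpy as np

# Adjacency matrix of a 4-node graph whose maximum clique is {0, 1, 2}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

def char_vector(S, n):
    """Characteristic vector x^S: value 1/|S| on S, 0 elsewhere."""
    x = np.zeros(n)
    x[list(S)] = 1.0 / len(S)
    return x

x = char_vector({0, 1, 2}, 4)
val = x @ A @ x
print(val)  # (c - 1)/c = 2/3 for the clique of size c = 3
assert np.isclose(val, 2.0 / 3.0)
```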

Numerical methods based on the replicator transformation
In this section, we are going to discuss two methods for computing local maximizers of P which are based on the following so-called replicator transformation: for given A ∈ S_{n,d} and x ∈ Δ_n define the transformation y(x) with components

y_i(x) = x_i (A_i ⋅ x^(d−1)) / (A ⋅ x^d), i ∈ N.  (16)

We emphasize that to assure that the image y(x) remains in Δ_n we have to make sure that the values A ⋅ x^d and A_i ⋅ x^(d−1) remain positive. To do so, throughout the paper we assume

A > 0, i.e., all elements of A are positive.  (17)

In Sect. 4.4 later on, we will see that this assumption does not mean any real restriction.

Properties of the replicator transformation y(x)
As usual, a point x ∈ Δ_n is called a fixed point of the transformation y in (16) if y(x) = x holds, or equivalently if

x_i (A_i ⋅ x^(d−1) − A ⋅ x^d) = 0 for all i ∈ N.  (19)

A comparison with (10) reveals the following key connection between the transformation y and the maximization problem P: for x ∈ Δ_n it holds that x is a fixed point of y if and only if x is a critical point of P. It is well-known that the transformation y(x) enjoys a monotonicity property.
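A direct implementation of (16) (our own sketch, assuming numpy and a symmetric tensor with positive entries) also lets one check the simplex invariance and the monotonicity property just mentioned:

```python
import itertools
import numpy as np

def contract(A, x, k):
    """Contract the tensor A with the vector x k times (over the last axis)."""
    for _ in range(k):
        A = A @ x
    return A

def replicator(A, x):
    """Replicator transformation (16): y_i = x_i * (A_i . x^(d-1)) / (A . x^d)."""
    g = contract(A, x, A.ndim - 1)   # the vector with entries A_i . x^(d-1)
    return x * g / (x @ g)           # note: x @ g = A . x^d

# A random symmetric tensor with positive entries (n = 4, d = 3):
rng = np.random.default_rng(1)
n, d = 4, 3
B = rng.random((n,) * d) + 0.1
perms = list(itertools.permutations(range(d)))
A = sum(np.transpose(B, p) for p in perms) / len(perms)

x = np.full(n, 1.0 / n)              # barycenter of the simplex
y = replicator(A, x)
assert np.isclose(y.sum(), 1.0) and (y >= 0).all()       # y stays in the simplex
assert contract(A, y, d) >= contract(A, x, d) - 1e-12    # monotonicity
```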

Lemma 3 (Monotonicity)
For any x ∈ Δ_n and y(x) given by (16) it holds that

f(y(x)) ≥ f(x),

with equality if and only if y(x) = x (i.e., x is a fixed point of y).
Proof For a proof we refer to [2,14]. The proofs are valid for general (non-symmetric) tensors. ◻ In [17] a stronger, quantitative version of the monotonicity result has been proven for d = 2. It is not difficult to show that this formula remains true for d > 2.

The monotonicity behavior of y(x) gives rise to the following replicator iteration for solving the maximization problem P: start with some x^(0) ∈ Δ_n and iterate (cf., (16))

x^(k+1) = y(x^(k)), k = 0, 1, 2, …  (20)

In algorithmic form this iteration leads to Method 1.

Method 1 Replicator dynamics
Input: a symmetric tensor A ∈ S_{n,d} with A > 0 and a starting point x^(0) ∈ Δ_n.
Update: for k = 0, 1, 2, … set x^(k+1) = y(x^(k)) (cf., (16)).

By a simple compactness and continuity argument combined with Lemma 3, we obtain (see [16, Proposition 1]):

Lemma 4 Any sequence x^(k) generated by the replicator iteration (20) has at least one limit point, and each limit point x̄ of this iteration is a fixed point of y, i.e., a critical point of P (cf., (19)).
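In code, Method 1 is a plain fixed-point loop; a minimal Python sketch with our own naming and stopping rule:

```python
import numpy as np

def contract(A, x, k):
    """Contract the tensor A with the vector x k times."""
    for _ in range(k):
        A = A @ x
    return A

def replicator(A, x):
    """Replicator transformation (16)."""
    g = contract(A, x, A.ndim - 1)
    return x * g / (x @ g)

def method1(A, x0, tol=1e-10, max_iter=10000):
    """Replicator dynamics (Method 1): iterate x <- y(x) until stationary."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = replicator(A, x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# d = 2 example: f(x) = x^T A x with A = [[1, 2], [2, 1]];
# the unique interior maximizer on the simplex is (1/2, 1/2).
A = np.array([[1.0, 2.0], [2.0, 1.0]])
x = method1(A, [0.6, 0.4])
print(x)  # close to [0.5, 0.5]
```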
Unfortunately, limit points x̄ of (20) (i.e., critical points of P) need not satisfy the necessary KKT condition for a local maximizer of P (see Lemma 1(a)). If, however, the iterates x^(k) in (20) converge to a fixed point x̄ of y which is not a KKT point of P (cf., also Remark 1), then the following lemma can be used to jump away from x̄ to better points.
Lemma 5 Let x ∈ Δ_n be a fixed point of y but not a KKT point of P, so that by (11) there is some index j_0 with x_{j_0} = 0 and A_{j_0} ⋅ x^(d−1) > A ⋅ x^d. Then for all t ≥ 0 the points (x + t e_{j_0})/(1 + t) are in Δ_n, and f((x + t e_{j_0})/(1 + t)) > f(x) holds for all small t > 0.

Proof Since eᵀ(x + t e_{j_0})/(1 + t) = (eᵀx + t)/(1 + t) = 1 and x + t e_{j_0} ≥ 0, the points belong to Δ_n. To prove the second statement it suffices to show that the differentiable function g(t) = f((x + t e_{j_0})/(1 + t)) satisfies g′(0) > 0. By our assumptions we actually obtain

g′(0) = ∇f(x)ᵀ(e_{j_0} − x) = d (A_{j_0} ⋅ x^(d−1) − A ⋅ x^d) > 0. ◻
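The escape step of Lemma 5 can be illustrated for d = 2 with A = [[1, 2], [2, 1]] (our own toy example): the vertex x = e_1 is a fixed point of y but not a KKT point, since A_2 ⋅ x = 2 > 1 = A ⋅ x², and f increases along the feasible curve (x + t e_2)/(1 + t):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])
f = lambda x: x @ A @ x

x = np.array([1.0, 0.0])             # fixed point of y, but not a KKT point
j0 = 1                               # index with x_{j0} = 0 and A_{j0}.x > A.x^2
assert A[j0] @ x > f(x)

e_j0 = np.eye(2)[j0]
curve = lambda t: (x + t * e_j0) / (1.0 + t)   # stays in the simplex
vals = [f(curve(t)) for t in (0.0, 0.1, 0.5, 1.0)]
print(vals)                          # strictly increasing along the curve
assert all(a < b for a, b in zip(vals, vals[1:]))
```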

Convergence properties of the replicator iteration
Let A and thus f (x) = A ⋅ x d be given, and consider for some starting point x (0) the replicator iterates x (k) in (20). Let S = S(x (0) ) denote the set of all limit points of the sequence x (k) . Recall that by Lemma 4 the set S is nonempty and is contained in the set of all fixed points of the mapping y (cf., (16)).

Lemma 6 The limit point set S is closed (compact), connected, and all points of S yield the same value of f.
Proof A proof is, e.g., to be found in [16, Proposition 1], where the case d = 2 is considered. The proof is also valid for the general case d ≥ 2. ◻ It is more difficult to answer the following two questions.
• Does, for each starting point x^(0) ∈ Δ_n, the replicator iteration (20) converge to a unique limit point x̄?
• How fast is the convergence x^(k) → x̄?
For d = 2 the first question has been answered positively in [17, Theorem 3.2] and in [16, Convergence Theorem 2]. However, a generalization of the proofs to the case d > 2 seems to be difficult. The arguments in these proofs essentially make use of the fact that (for d = 2) the function f(x) is quadratic.
For d = 2 also the second question has been answered in [17] as follows. Recall that the convergence x^(k) → x̄ is said to be linear if, with some constants C > 0 and q, 0 ≤ q < 1, we have

‖x^(k) − x̄‖ ≤ C q^k for all k.

The constant q is called the linear convergence factor.

Proof For (a) we refer to [17, Theorem 3.3] and for (b) to [17, Theorem 3.4]. ◻ Again, we did not see a way to generalize the results of Theorem 2 to the case d > 2 (except for the only-if direction in Theorem 2(b), cf., Theorem 3 below). The reason is that the proof of the preceding theorem is based on Lemmas 2.3 and 2.4 in [17], which again essentially depend on the fact that for d = 2 the function f(x) is a quadratic polynomial.
Notice that if the SC condition (21) is not fulfilled the convergence must be sublinear and according to Theorem 2(a) the convergence might be extremely slow.
Recall that for d = 2 (cf., [17,Theorem 3.2]), any iteration (20) converges to a (unique) point x . Since we did not succeed in a generalization of this statement to d > 2 , we will at least give a partial convergence result for the general case d ≥ 2.
Theorem 3 For any d ≥ 2 the following holds.
(a) Let the iterates x^(k) given by (20) have a limit point x̄ such that x̄ is a KKT point with corresponding (unique) multiplier 0 ≤ μ ∈ ℝ^n, satisfying the second order condition SSOC (cf., (7)). Then the sequence x^(k) converges to x̄, i.e., the point x̄ is the only limit point.

(b) Suppose x̄ is a KKT point of P satisfying the second order condition SSOC, as well as the SC condition (12). Then there is some ε₁ > 0 such that the following is true: the point x̄ is the only fixed point x ∈ Δ_n of y(x) with ‖x − x̄‖ < ε₁, and for any starting point x^(0) ∈ Δ_n with ‖x^(0) − x̄‖ < ε₁ the replicator iterates (20) converge to x̄.
Proof (a) By the assumptions the limit point x̄ (see Lemma 1(b)) satisfies the relation (8). So let ε̄ > 0, c > 0 be given as in (8). The mapping y(x) is a continuous function of x (recall that by assumption we have A ⋅ x^d ≥ m > 0 for all x ∈ Δ_n). So, in view of y(x̄) − x̄ = 0, given ε̄ we can choose some ε₀ > 0 such that (22) holds. Clearly, we can choose ε₀ < ε̄/2. For any ε, 0 < ε < ε₀, there is some ε₁ such that (22) holds with ε₁ in place of ε₀. Again take ε₁ < ε. Since x̄ is a limit point of x^(k) we can find an iterate x^(k₀) satisfying ‖x^(k₀) − x̄‖ < ε₁. By (22) we then have ‖y(x^(k₀)) − x̄‖ < ε. In view of (8) and the monotonicity condition f(y(x^(k₀))) ≥ f(x^(k₀)) it follows that ‖x^(k₀+1) − x̄‖ < ε. We can repeat the same arguments and obtain ‖y(x^(k₀+1)) − x̄‖ < ε, ‖x^(k₀+2) − x̄‖ < ε, and thus for all k ≥ k₀ the iterates x^(k) stay in the neighborhood {x ∈ Δ_n | ‖x − x̄‖ < ε}. Since ε can be chosen arbitrarily small it follows that the sequence x^(k) must converge to x̄.

(b) By continuity, using SC, there exists ε₂ > 0 such that the strict inequalities (23) hold for all x ∈ Δ_n with ‖x − x̄‖ < ε₂ (cf., (11), (12)). It now follows that if x ∈ Δ_n, ‖x − x̄‖ < ε₂, is a fixed point of y, then it must be a KKT point of P. Indeed, by (23) we obtain x̄_i = 0 ⇒ x_i = 0, and thus by the first relation of (23) we find that (11) holds at x. Consequently, in view of (11), x is a KKT point. According to the last statement of Lemma 1(b), by making ε₂ smaller if necessary, x must coincide with x̄, so that x̄ is the only fixed point in an ε₂-neighborhood of x̄.
To prove the convergence statement, recall the arguments in the proof of part (a) and choose ε in (a) smaller than ε₂. Then if we take a starting point x^(0) ∈ Δ_n satisfying ‖x^(0) − x̄‖ < ε₁ (with ε₁ in (a)), by the arguments above the relation ‖x^(k) − x̄‖ < ε holds for all iterates x^(k). To show that x^(k) converges to x̄, suppose to the contrary that the sequence x^(k) has a limit point x̃ with x̃ ≠ x̄. By (24) we have ‖x̃ − x̄‖ ≤ ε < ε₂, in contradiction to the fact that x̄ is the only fixed point x satisfying ‖x − x̄‖ < ε₂. ◻

By [11, Proposition 2.26] we have the estimate (26). Now let x^(k) converge linearly to x̄, i.e., with some C > 0, 0 < q < 1, it holds componentwise that |x_i^(k) − x̄_i| ≤ C q^k. So for any index i such that x_i^(0) > 0 and x̄_i = 0 (the nontrivial case), this implies x_i^(k) ≤ C q^k for all k. Consequently, using (26), we obtain the statement of the theorem. ◻ We emphasize that the SC condition (25) in Theorem 4 is slightly stronger than the SC condition (21) in Theorem 2(b). Theorem 4 in particular implies that if we start with x^(0) > 0 then linear convergence can only occur if x^(k) converges to a KKT point x̄. Recall (cf., Remark 1) that any fixed point x̄ of y satisfying x̄ > 0 must be a KKT point.

Speeding up the convergence by line maximization
Notice that by Theorem 4, linear convergence of the replicator iteration (20) to a fixed point x of y can only happen if the SC condition (25) is met. Keep in mind that this SC condition trivially holds for a fixed point x > 0 (i.e., x is in the relative interior of Δ n ). In other words, for a fixed point x where (25) is not fulfilled, a convergence to this point must be sublinear. According to Theorem 2(a) the sublinear convergence might be extremely slow.
On the other hand, even when for larger d (say d > 4 ) linear convergence occurs, the linear convergence factor q < 1 might be very close to 1 (see numerical experiments below).
So we may be interested in speeding up the convergence by an appropriate ascent method using (some sort of) line maximization. Here, too, the transformation y(x) can be used to achieve this. Indeed, it appears that the vector s = y(x) − x is a direction of growth for f (cf., (16)).

Lemma 7 For all x ∈ Δ_n and y(x) given by (16) it holds that

∇f(x)ᵀ(y(x) − x) ≥ 0,

with equality if and only if x is a fixed point of y. The preceding lemma suggests the following ascent method for solving P.
Input: a symmetric tensor A ∈ S_{n,d} with A > 0 and a starting point x^(0) ∈ Δ_n.
Update: for k = 0, 1, 2, …:
(1) compute y(x^(k));
(2) set the ascent direction s^(k) = y(x^(k)) − x^(k);
(3) determine a step-size t_k ≥ 0 by maximizing f(x^(k) + t s^(k)) subject to x^(k) + t s^(k) ∈ Δ_n, and set x^(k+1) = x^(k) + t_k s^(k).

It can easily be seen that for the sequence x^(k) generated by Method 2 a result as in Lemma 4 remains true. Also Lemma 6 is still valid for the set S of limit points of such a sequence x^(k), except for the connectedness of S. The reason is that the "pure" line maximization of Method 2 allows the iterates x^(k) to jump (infinitely many times) between different (unconnected) fixed points in S. In our numerical experiments, however, we never observed such an event. A zigzagging between different fixed points could, moreover, be prevented by an appropriate step-length control in the line maximization.
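Since g(t) = f(x + ts) with s = y(x) − x is a polynomial of degree d in t, one exact way to implement the line maximization step is to recover g by interpolation through d + 1 sample points and compare f at the feasible critical points and interval endpoints. This is our own implementation choice, not the approximate procedure proposed below; numpy is assumed:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def contract(A, x, k):
    for _ in range(k):
        A = A @ x
    return A

def replicator(A, x):
    g = contract(A, x, A.ndim - 1)
    return x * g / (x @ g)

def method2_step(A, x):
    """One ascent step of Method 2: maximize f along s = y(x) - x in the simplex."""
    d = A.ndim
    f = lambda z: contract(A, z, d)
    s = replicator(A, x) - x
    neg = s < -1e-15
    if not neg.any():                        # x is (numerically) a fixed point
        return x
    t_max = float((x[neg] / -s[neg]).min())  # largest t with x + t*s >= 0
    # g(t) = f(x + t*s) has degree d: recover it exactly from d + 1 samples.
    ts = np.linspace(0.0, t_max, d + 1)
    coef = P.polyfit(ts, [f(x + t * s) for t in ts], d)
    crit = P.polyroots(P.polyder(coef))
    cand = [0.0, t_max] + [float(r.real) for r in crit
                           if abs(r.imag) < 1e-10 and 0.0 < r.real < t_max]
    t_best = max(cand, key=lambda t: f(x + t * s))
    return x + t_best * s

# One step on f(x) = x^T A x with A = [[1, 2], [2, 1]] already reaches the
# maximizer (1/2, 1/2) from the starting point (0.6, 0.4):
A = np.array([[1.0, 2.0], [2.0, 1.0]])
x1 = method2_step(A, np.array([0.6, 0.4]))
print(x1)  # close to [0.5, 0.5]
```

Note that eᵀ(x + ts) = 1 holds automatically since the components of s sum to zero, so only the non-negativity constraint limits the step.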
Efficient implementation of the line maximization step (3): We add some remarks on an efficient implementation of the line maximization step (3) of Method 2. Recall that by Lemma 3 we have f(y(x)) ≥ f(x) for all x ∈ Δ_n. To find the optimal step-size t we have to solve the line maximization problem (with s := y(x) − x)

max g(t) := f(x + ts) s.t. x + ts ∈ Δ_n, t ≥ 0.  (27)

The condition x + t(y(x) − x) ∈ Δ_n, t ≥ 0, leads to the restriction 0 ≤ t ≤ t*, where

t* := min { x_i / (x_i − y_i(x)) | i ∈ N with y_i(x) < x_i }.

So we have to solve the line maximization problem (27) with a polynomial g(t) of degree d, say g(t) = f(x) + γ_1 t + … + γ_d t^d. Notice that, in view of formula (28), for x close to a fixed point x̄ of y the coefficients γ_i, 1 ≤ i ≤ d, tend to zero (recall y(x̄) = x̄). Consequently, for x ≈ x̄ the coefficients γ_i, i ≥ 3, will only play a minor role in the line maximization of g(t), and we propose the following efficient procedure: maximize the quadratic approximation f(x) + γ_1 t + γ_2 t² over 0 ≤ t ≤ t*.
We emphasize that for solving this approximate line maximization problem we only have to compute the coefficients γ_1, γ_2 of g(t). This can be done efficiently, requiring (roughly) the computation time of one function evaluation of f.

Remark 4 The line maximization (27) is particularly easy for d = 2. Then (as a simple exercise) one can show that the optimal step-size t_x in (27) is given by t_x = min{−γ_1/(2γ_2), t*} if γ_2 < 0, and by t_x = t* otherwise.

Dependence of the replicator iteration on a transformation A → A + αE
We have to recall that the property y(x) ∈ Δ_n and the monotonicity f(y(x)) ≥ f(x) for x ∈ Δ_n depend on the assumption that the values A ⋅ x^d and A_i ⋅ x^(d−1) remain positive on Δ_n. We have assured these conditions by assuming A > 0, i.e., all elements of the tensor A are positive.
In case A has negative elements we can force positivity by applying a transformation A → A + αE, where E ∈ S_{n,d} is the tensor with all elements equal to 1, and α ∈ ℝ is chosen such that A + αE > 0 holds. So instead of f(x) = A ⋅ x^d we consider the function f_α(x) := (A + αE) ⋅ x^d. Using E ⋅ x^d = 1 and E_i ⋅ x^(d−1) = 1 for x ∈ Δ_n, we obtain the identities

f_α(x) = f(x) + α and (A + αE)_i ⋅ x^(d−1) = A_i ⋅ x^(d−1) + α, i ∈ N.

Fortunately, under this transformation the fixed points of y(x) remain the same.

Lemma 8
Under the transformation f(x) → f_α(x), α ∈ ℝ, the fixed points of y and the KKT points of P, and thus the maximizers of P, remain unchanged.
Proof Consider the transformation f(x) → f_α(x) and the corresponding transformation y(x) → y_α(x), where y_α(x) is the vector with components

y_{α,i}(x) = x_i (A_i ⋅ x^(d−1) + α) / (A ⋅ x^d + α), i ∈ N.

So the fixed point relation y_α(x) = x, x ∈ Δ_n, is equivalent with

x_i (A_i ⋅ x^(d−1) − A ⋅ x^d) = 0 for all i ∈ N,

which are the fixed point relations for y(x). The same is true for the additional KKT conditions (cf., (11)). ◻ Note that formally we have f = f_0 and y = y_0. Evidently, all results in the preceding subsections remain valid for any function f_α, y_α, as long as A + αE > 0 is satisfied. However, the iterates x^(k+1) = y_α(x^(k)) of the replicator dynamics (20) clearly depend on α. So, starting at the same point x^(0), the limit points of these iterates and the speed of convergence may strongly depend on the choice of α. We give an illustrative example.
It is not difficult to see that, depending on the choice of α, the linear convergence factors q^(k) converge to different limits, showing how in this example the linear convergence factor increases with increasing α.
Notice that the SC condition (25) holds at x̄ (and x̃). For i = 2 with x̄_2 = 0 we actually find A_2 ⋅ x̄^(d−1) ≠ A ⋅ x̄^d. We can also give a general formula for the dependence on α of the transformation y_α(x) compared with y(x) = y_0(x): a short computation from (16) yields, with σ := α/(A ⋅ x^d),

y_α(x) − x = (1/(1 + σ)) (y(x) − x).

Rewriting this relation in the form y_α(x) = x + (1/(1 + σ))(y(x) − x), it appears that compared with α = 0, for α > 0 the new point y_α(x) is closer to x (i.e., the step is shorter). This may be seen as an indication that with increasing α the speed of convergence is decreasing, as was shown explicitly in Example 1. All our numerical experiments suggest such a behavior. So, as a rule of thumb, to assure A + αE > 0 we should choose α appropriately small.
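The step-shortening relation y_α(x) − x = (y(x) − x)/(1 + σ), σ = α/(A ⋅ x^d), can be checked numerically; a sketch with our own naming (d = 2 for brevity, the identity holds for any d):

```python
import numpy as np

def contract(A, x, k):
    for _ in range(k):
        A = A @ x
    return A

def replicator(A, x):
    g = contract(A, x, A.ndim - 1)
    return x * g / (x @ g)

rng = np.random.default_rng(2)
n, d = 4, 2
A = rng.random((n, n)) + 0.1
A = (A + A.T) / 2                      # symmetric, positive entries
x = rng.dirichlet(np.ones(n))          # a random point of the simplex

alpha = 5.0
E = np.ones((n, n))
y0 = replicator(A, x)
ya = replicator(A + alpha * E, x)
sigma = alpha / contract(A, x, d)

# The shifted step is the original step scaled by 1/(1 + sigma):
assert np.allclose(ya - x, (y0 - x) / (1.0 + sigma))
# In particular, the step is shorter for alpha > 0:
assert np.linalg.norm(ya - x) <= np.linalg.norm(y0 - x)
```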

Numerical experiments
In this section, we report on many examples which numerically illustrate the theoretical convergence results of Sect. 4.
In all examples, we indicate the numbers n, d, the function f(x) and the choice of α to assure f_α(x) > 0 (cf., Sect. 4.4). We also give the starting point x^(0) and specify whether the replicator iteration (20) (Method 1) is applied or Method 2 is used. For the former we describe the convergence behavior of the sequence x^(k) → x̄. If linear convergence occurs, also information on the linear convergence factors q^(k) := ‖x^(k+1) − x̄‖ / ‖x^(k) − x̄‖ is provided.
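The factors q^(k) can be estimated in practice by running (20) and using the final iterate as a proxy for the limit x̄ (a sketch with our own naming):

```python
import numpy as np

def contract(A, x, k):
    for _ in range(k):
        A = A @ x
    return A

def replicator(A, x):
    g = contract(A, x, A.ndim - 1)
    return x * g / (x @ g)

def convergence_factors(A, x0, n_iter=200):
    """Run (20) and return the observed factors q_k = |x^(k+1)-xbar|/|x^(k)-xbar|."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_iter):
        xs.append(replicator(A, xs[-1]))
    xbar = xs[-1]                      # proxy for the limit point
    errs = [np.linalg.norm(z - xbar) for z in xs[:-1]]
    return [e1 / e0 for e0, e1 in zip(errs, errs[1:]) if e0 > 1e-14]

A = np.array([[1.0, 2.0], [2.0, 1.0]])   # d = 2, interior maximizer (1/2, 1/2)
q = convergence_factors(A, [0.6, 0.4])
print(q[:5])  # roughly constant, close to 2/3: linear convergence
```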

Compared with the replicator iteration, Method 2 shows a considerable speeding up of the convergence. Method 2: Note that for the same value α = 1, different starting points led to convergence to x̄ with different linear convergence factors 0.963…, resp., 0.667… .
Method 2: For this example problem we have added some more experiments with α = 1. For both methods, 1000 random initial points x^(0) have been generated. The stopping criterion for convergence is min{‖x^(k) − x̄‖, ‖x^(k) − x̃‖, ‖x^(k) − x̂‖} < 10^(−3), and we allowed a maximum number of 1000 iterations in each run. In both methods, for x^(0) near x̄ the iteration converges to x̄. For Method 1 only one of the 1000 initial points converged to x̃ within the 1000 iterations, and none converged to x̂ (cf., Figure 1, Table 1); the convergence to these points was too slow (sublinear). Method 2 performed better (see Figure 2 and Table 1): all maximizers x̄, x̃, x̂ have been detected. Table 1 presents the results of the experiments for both methods. It contains the number of cases out of the 1000 runs where convergence occurred, as well as the corresponding maximum, minimum and mean numbers (with ± standard deviation) of iterations over the random initial points x^(0).

Example 7 (cf. [8, Example 9.8]) In this example we consider the ESS problem with n = 2, d = 4. The function f(x) = A ⋅ x^d with A ∈ S_{2,4} has two (global) maximizers x̄ = (1/4, 3/4) and x̃ = (3/4, 1/4). We choose α = 13/96 + 1/100 and generate 1000 random instances for the initial vector x^(0). In this example the iterations met the convergence criterion for all initial vectors.

Example 9 We apply Method 2 to detect maximal cliques for some of the graphs in the well-known Second DIMACS Implementation Challenge. Note that in this case we have d = 2, so the line maximization step is simplified (see Remark 4). For each graph we apply the method for 1000 initial points x^(0). The results are shown in Table 4. Columns 4 and 5 give the number n of nodes and the number |E| of edges. In the last column the notation c : # means that a maximal clique of size c has been detected # times out of the 1000 runs with stopping criterion ‖x^(k+1) − x^(k)‖ ≤ 10^(−6).

Conclusion
In this paper we are interested in the problem of maximizing homogeneous polynomials of degree d over the simplex. Two methods for the computation of local (global) maximizers are investigated. Both are based on the replicator dynamics. New convergence results for the replicator iteration are presented for the case d > 2. Linear convergence to a maximizer x̄ can only occur if strict complementarity holds at x̄. Many numerical experiments illustrate the theoretical convergence results.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.