On local optimality of vertex type designs in generalized linear models

Locally optimal designs are derived for generalized linear models with first order linear predictors. We consider models including a single factor, two factors and multiple factors. Mainly, the experimental region is assumed to be a unit cube. In particular, models without intercept are considered on arbitrary experimental regions. Analytic solutions for optimal designs are developed under the D- and A-criteria, and more generally, for Kiefer's $\Phi_k$-criteria. The focus is on vertex type designs, that is, designs that are supported only by the vertices of the respective experimental regions. By the equivalence theorem, necessary and sufficient conditions are developed for the local optimality of these designs. The derived results are applied to gamma and Poisson models.


Introduction
The generalized linear model (GLM) was developed by Nelder and Wedderburn (1972). It is viewed as a generalization of ordinary linear regression which allows continuous or discrete observations from one-parameter exponential family distributions to be combined with explanatory variables (factors) via proper link functions. Generalized linear models include, among others, Poisson, gamma and logistic models. Therefore, GLMs have found wide application in areas such as social and educational sciences, clinical trials, insurance and industry (Walker and Duncan 1967; Myers and Montgomery 1997; Fox 2015; Goldburd et al. 2016). Although an optimal design is obtained by minimizing the variance-covariance matrix, there is no loss of generality in concentrating on maximizing the Fisher information matrix. For generalized linear models the Fisher information matrix depends on the model parameters. Therefore, the optimal design cannot be found without prior knowledge of the parameters. One approach is the so-called local optimality, which was proposed by Chernoff (1953). This approach aims at deriving an optimal design at a given parameter value (best guess).
As in many research works, the results on optimal designs, in particular on a continuous experimental region, are influenced by the type of model used. For example, Ford et al. (1992) used single-factor GLMs. Moreover, Gaffke et al. (2019) and Russell et al. (2009) used the gamma and the Poisson models, respectively, while the logistic model was employed by Yang et al. (2011) and Atkinson and Haines (1996).
In this paper we focus on the problem of finding locally optimal designs for a class of generalized linear models, which is motivated by the work of Yang and Stufken (2009) and Tong et al. (2014), who provided analytic results for a general setup of the GLM with binary factors. Here, we are also interested in deriving locally optimal designs for a general setup of generalized linear models on continuous and discrete experimental regions. Schmidt and Schwabe (2017) showed that the support points of the optimal designs for GLMs on an experimental region given by a polytope are located at the edges of the experimental region. In particular, in Gaffke et al. (2019) we proved that the optimal designs for gamma models are supported by the vertices of the experimental region (polytope). In this paper, we will restrict our attention to the vertices of the experimental region by which optimal designs can be supported for the corresponding generalized linear models. Throughout the sequel, we confine ourselves to the general equivalence theorem to establish a necessary and sufficient condition for a design to be locally optimal.
The remainder of the paper is organized as follows. In Sect. 2 we introduce the generalized linear model and optimality of designs. Approaches to determine the optimal weights for some particular designs under the D-, A- and Kiefer's $\Phi_k$-criteria are characterized in Sect. 3. Then optimal designs are derived under the single-factor and the two-factor models in Sects. 4 and 5, respectively. First order models of multiple factors are presented in Sect. 6, and optimal designs are derived for such models with and without intercept. Applications of the results are discussed under gamma and Poisson models in Sect. 7.

Preliminary
In this section, we introduce the generalized linear model and give a characterization of optimal designs. Let the univariate observation (response) $Y$ belong to a one-parameter exponential family distribution in the canonical form

$p(y; \theta) = \exp\big(y\theta - b(\theta) + c(y)\big)$, (2.1)

where $b(\cdot)$ and $c(\cdot)$ are known functions while $\theta$ is a canonical parameter. In the generalized linear model each response $Y$ of a statistical unit is observed at a certain value of a covariate $x = (x_1, \dots, x_\nu)^T$ that belongs to an experimental region $\mathcal{X} \subseteq \mathbb{R}^\nu$, $\nu \ge 1$. Here, $\theta := \theta(x, \beta)$ varies with the value of $x \in \mathcal{X}$ at a fixed value of the vector of model parameters $\beta \in \mathbb{R}^p$. The expected mean is given by $E(Y) = \mu(x, \beta) = b'(\theta)$ and the variance by $\mathrm{var}(Y) = b''(\theta)$ [see McCullagh and Nelder (1989, Sect. 2.2.2)]. Let $f(x) : \mathcal{X} \to \mathbb{R}^p$ be a vector of continuous regression functions $f_1(x), \dots, f_p(x)$ which are assumed to be linearly independent. Denote the linear predictor by $\eta = f^T(x)\beta$. In the generalized linear model it is assumed that $\eta = g\big(\mu(x, \beta)\big)$, where $g$ is a link function that is assumed to be one-to-one and differentiable. We can define the intensity function at a point $x \in \mathcal{X}$ as

$u(x, \beta) = \big(\mathrm{var}(Y)\big)^{-1} \big(d\mu(x, \beta)/d\eta\big)^2$, (2.2)

where $d\mu(x, \beta)/d\eta = 1/g'\big(\mu(x, \beta)\big)$. Obviously, $u(x, \beta)$ is positive for all $x \in \mathcal{X}$ and may be regarded as a weight for the corresponding unit at the point $x$ (Atkinson and Woods 2015). The Fisher information matrix at $x \in \mathcal{X}$ [see Fedorov and Leonov (2013, Sect. 1.3.2)] is given by

$M(x, \beta) = u(x, \beta)\, f(x) f^T(x)$. (2.3)

An information matrix of the form (2.3) is appropriate for other nonlinear models, e.g., models with survival time observations employing proportional hazards (Schmidt and Schwabe 2017). Moreover, under homoscedastic regression models the intensity function is constantly equal to 1, whereas under heteroscedastic regression models the intensity is equal to $1/\mathrm{var}(Y)$, which depends on $x$ only, and thus the information matrix has the form $M(x) = u(x) f(x) f^T(x)$, which does not depend on the model parameters. The latter case was discussed in Graßhoff et al. (2007) and in the book by Fedorov and Leonov (2013, p. 13).

Throughout the present work we will deal with the approximate (continuous) design theory. An approximate design $\xi$ can be defined as a probability measure with finite support on the experimental region $\mathcal{X}$,

$\xi = \left\{ \begin{array}{cccc} x_1 & x_2 & \dots & x_r \\ \omega_1 & \omega_2 & \dots & \omega_r \end{array} \right\}$, (2.4)

where $r \in \mathbb{N}$, $x_1, x_2, \dots, x_r \in \mathcal{X}$ are pairwise distinct points and $\omega_1, \omega_2, \dots, \omega_r > 0$ with $\sum_{i=1}^{r} \omega_i = 1$. The set $\mathrm{supp}(\xi) = \{x_1, x_2, \dots, x_r\}$ is called the support of $\xi$ and $\omega_1, \dots, \omega_r$ are called the weights of $\xi$ [see Silvey (1980, p. 15)]. The information matrix of a design $\xi$ from (2.4) at a parameter point $\beta$ is defined by

$M(\xi, \beta) = \sum_{i=1}^{r} \omega_i M(x_i, \beta)$. (2.5)

One might recognize $M(\xi, \beta)$ as a convex combination of the information matrices of all support points of $\xi$. Another representation of the information matrix (2.5) can be utilized based on the $r \times p$ design matrix $F = [f(x_1), \dots, f(x_r)]^T$ and the $r \times r$ weight matrix $V = \mathrm{diag}\big(\omega_i u(x_i, \beta)\big)_{i=1}^{r}$, and hence we can write $M(\xi, \beta) = F^T V F$.

Remark A particular type of design appears frequently when the support size equals the dimension of $f$, i.e., $r = p$. In such a case the design is minimally supported and it is often called a minimal-support or a saturated design.
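The weighted-sum form of the information matrix of a design can be illustrated with a short numerical sketch. The helper name `info_matrix`, the parameter value and the choice of a Poisson model with log link (for which the intensity equals the mean, $u(x, \beta) = \exp(f^T(x)\beta)$) are assumptions made here for illustration only.

```python
import math

def info_matrix(points, weights, f, u):
    """Information matrix sum_i w_i * u(x_i) * f(x_i) f(x_i)^T as nested lists."""
    p = len(f(points[0]))
    M = [[0.0] * p for _ in range(p)]
    for x, w in zip(points, weights):
        fx, ux = f(x), u(x)
        for r in range(p):
            for c in range(p):
                M[r][c] += w * ux * fx[r] * fx[c]
    return M

# Hypothetical parameter value (best guess) for a single-factor model.
beta = (1.0, -1.0)
f = lambda x: [1.0, x]                          # first order regression functions
u = lambda x: math.exp(beta[0] + beta[1] * x)   # assumed Poisson intensity, log link

# Two-point design on the vertices 0 and 1 of [0, 1] with equal weights.
M = info_matrix([0.0, 1.0], [0.5, 0.5], f, u)
```

Here the point 0 contributes $0.5\,e\,f(0)f^T(0)$ and the point 1 contributes $0.5\,f(1)f^T(1)$, so only the upper-left entry of `M` involves $e$.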
This paper focuses on optimal designs within the family of Kiefer's $\Phi_k$-criteria (Kiefer 1975). These criteria aim at minimizing the $k$-norm of the eigenvalues of the variance-covariance matrix and include the most common criteria for D-, A- and E-optimality. Denote by $\lambda_i(\xi, \beta)$ ($1 \le i \le p$) the eigenvalues of a nonsingular information matrix $M(\xi, \beta)$. Denote by "det" and "tr" the determinant and the trace of a matrix, respectively. Kiefer's $\Phi_k$-criteria are defined by

$\Phi_k\big(M(\xi, \beta)\big) = \Big(\tfrac{1}{p}\, \mathrm{tr}\big(M^{-k}(\xi, \beta)\big)\Big)^{1/k} = \Big(\tfrac{1}{p} \sum_{i=1}^{p} \lambda_i^{-k}(\xi, \beta)\Big)^{1/k}$, $0 < k < \infty$,

$\Phi_0\big(M(\xi, \beta)\big) = \lim_{k \to 0+} \Phi_k\big(M(\xi, \beta)\big) = \big(\det M^{-1}(\xi, \beta)\big)^{1/p}$,

$\Phi_\infty\big(M(\xi, \beta)\big) = \lim_{k \to \infty} \Phi_k\big(M(\xi, \beta)\big) = \max_{1 \le i \le p} \lambda_i^{-1}(\xi, \beta)$.

Note that $\Phi_0\big(M(\xi, \beta)\big)$, $\Phi_1\big(M(\xi, \beta)\big)$ and $\Phi_\infty\big(M(\xi, \beta)\big)$ are the D-, A- and E-criteria, respectively. Since $M(\xi, \beta)$ depends on the values of the parameters, a best guess of $\beta$ is adopted here and locally optimal designs are constructed (Chernoff 1953). A locally $\Phi_k$-optimal design $\xi^*$ (at $\beta$) minimizes the function $\Phi_k\big(M(\xi, \beta)\big)$ over all designs $\xi$ whose information matrix $M(\xi, \beta)$ is nonsingular. For $0 \le k < \infty$ the strict convexity of $\Phi_k\big(M(\xi, \beta)\big)$ implies that the information matrix of a locally $\Phi_k$-optimal design (at $\beta$) is unique. That is, if $\xi^*$ and $\xi^{**}$ are two locally $\Phi_k$-optimal designs (at $\beta$) then $M(\xi^*, \beta) = M(\xi^{**}, \beta)$ (Kiefer 1975).

In particular, D-optimal designs are constructed to minimize the determinant of the variance-covariance matrix of the estimates or, equivalently, to maximize the determinant of the information matrix. The D-criterion is typically defined by the convex function $\Phi_D\big(M(\xi, \beta)\big) = -\log \det M(\xi, \beta)$. A-optimal designs are constructed to minimize the trace of the variance-covariance matrix of the estimates, i.e., to minimize the average variance of the estimates. The A-criterion is typically defined by $\Phi_A\big(M(\xi, \beta)\big) = \mathrm{tr}\big(M^{-1}(\xi, \beta)\big)$. Moreover, E-optimal designs maximize the smallest eigenvalue of $M(\xi, \beta)$.
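As a small illustration of how the family of criteria interpolates between D-, A- and E-optimality, the following sketch evaluates the criterion from a list of eigenvalues. It assumes the usual normalization of Kiefer's criteria, $\big(\tfrac{1}{p}\sum_i \lambda_i^{-k}\big)^{1/k}$, with the limits $k \to 0$ and $k \to \infty$ treated separately; the function name `phi_k` is hypothetical.

```python
import math

def phi_k(eigenvalues, k):
    """Kiefer-type criterion of an information matrix, computed from its
    positive eigenvalues (assumed normalization: 1/p inside the k-th root)."""
    p = len(eigenvalues)
    if k == 0:
        # Limit k -> 0: (det M^{-1})^{1/p}, i.e. the D-criterion.
        return math.exp(-sum(math.log(lam) for lam in eigenvalues) / p)
    if k == math.inf:
        # Limit k -> infinity: largest eigenvalue of M^{-1}, i.e. the E-criterion.
        return 1.0 / min(eigenvalues)
    return (sum(lam ** (-k) for lam in eigenvalues) / p) ** (1.0 / k)

eigs = [1.0, 4.0]
d_value = phi_k(eigs, 0)         # (1 / (1 * 4)) ** (1 / 2)
a_value = phi_k(eigs, 1)         # (1 / 2) * (1 / 1 + 1 / 4)
e_value = phi_k(eigs, math.inf)  # 1 / min eigenvalue
```

For a fixed matrix the value is nondecreasing in $k$, which the three values above reflect.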
In order to verify the local optimality of a design the general equivalence theorem is commonly employed [see Atkinson et al. (2007, p. 137)]. It provides necessary and sufficient conditions for a design to be optimal, and thus the optimality of a suggested design can be easily verified or disproved. The design $\xi^*$ is locally $\Phi_k$-optimal (at $\beta$) if and only if

$u(x, \beta)\, f^T(x)\, M^{-k-1}(\xi^*, \beta)\, f(x) \le \mathrm{tr}\big(M^{-k}(\xi^*, \beta)\big)$ for all $x \in \mathcal{X}$. (2.6)

Furthermore, if the design $\xi^*$ is $\Phi_k$-optimal then inequality (2.6) becomes an equality at its support points.

Remark
The left hand side of condition (2.6) of the general equivalence theorem is called the sensitivity function.
Moreover, the choice of optimal weights of a saturated design under Kiefer's $\Phi_k$-criteria was given in Pukelsheim et al. (1991). It was stated in Schmidt (2019), Sect. 5, that the method of Pukelsheim et al. (1991) provides a system of equations that must be solved numerically. In the following, explicit optimal weights of the $\Phi_k$-optimal saturated design are derived for a GLM without intercept, specifically under the first order model $f(x) = (x_1, \dots, x_\nu)^T$ and a parameter vector $\beta = (\beta_1, \dots, \beta_\nu)^T$. The choice of locally $\Phi_k$-optimal weights which yields the minimum value of $\Phi_k\big(M(\xi, \beta)\big)$ over all saturated designs with the same support is given by the following lemma.

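Lemma 3.2 Consider a GLM without intercept with $f(x) = (x_1, \dots, x_\nu)^T$ on an experimental region $\mathcal{X}$. Let a vector $a = (a_1, \dots, a_\nu)^T$ be given with positive components. Let $x_i^* = a_i e_i$ ($1 \le i \le \nu$) be design points in $\mathcal{X}$, where $e_i$ denotes the $i$-th unit vector, and for a given parameter point $\beta$ let $u_i = u(x_i^*, \beta)$ ($1 \le i \le \nu$). Let $k$ with $0 \le k < \infty$ be given. Then the design $\xi_a^*$ which achieves the minimum value of $\Phi_k\big(M(\xi_a, \beta)\big)$ over all designs $\xi_a$ with $\mathrm{supp}(\xi_a) = \{x_1^*, \dots, x_\nu^*\}$ assigns the weights

$\omega_i^* = \dfrac{(a_i^2 u_i)^{-k/(k+1)}}{\sum_{j=1}^{\nu} (a_j^2 u_j)^{-k/(k+1)}}$ ($1 \le i \le \nu$)

to the corresponding design points.

It is straightforward to see that the equation for all $i$ ($1 \le i \le \nu$) which are the optimal weights given by the lemma.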

Single-factor model
In this section we deal with the simplest case under a model with a single factor, $f(x) = (1, x)^T$. Let the experimental region be the continuous unit interval $\mathcal{X} = [0, 1]$. In the following we introduce, for a fixed $\beta = (\beta_0, \beta_1)^T$, the function which will be utilized for the characterization of the optimal designs. Consider the following conditions: Recently, Lemma 1 in Konstantinou et al. (2014) showed that under the above conditions (i)-(iii) a locally D-optimal design on $[0, 1]$ is supported by only two points $a$ and $b$, where $0 \le a < b \le 1$. In what follows an analogous result is presented for locally optimal designs under various optimality criteria.
then the support of a locally optimal design $\xi^*$ consists of exactly two points $a$ and $b$, where $0 \le a < b \le 1$.
which is a polynomial in x of degree 2 where x ∈ X . Hence, by the general equivalence theorem ξ * is locally optimal (at β) if and only if The above inequality is similar to that obtained in the proof of Lemma 1 in Konstantinou et al. (2014) and thus the rest of our proof is analogous to that.
Accordingly, for D-optimality we have c = 2 and β). In general, under Kiefer's Generalized D-criterion and L-criterion can be applied (Atkinson and Woods 2015, Chapter 10).
As a consequence of Lemma 4.1, we next provide sufficient conditions for a design supported by the boundary points 0 and 1 to be locally D- or A-optimal on $\mathcal{X} = [0, 1]$ at a given $\beta$. Let $q(x) = 1/u(x, \beta)$ and denote $q_0 = q^{1/2}(0)$ and $q_1 = q^{1/2}(1)$.
Let a parameter point $\beta = (\beta_0, \beta_1)^T$ be given. Let $q(x)$ be positive and twice continuously differentiable. Then: (i) The unique locally D-optimal design $\xi^*$ (at $\beta$) is the two-point design supported by 0 and 1 with equal weights 1/2 if (ii) The unique locally A-optimal design $\xi^*$ (at $\beta$) is the two-point design supported by 0 and 1 with weights

Proof Part (i): Condition (2.6) of the general equivalence theorem for $k = 0$ implies that $\xi^*$ is locally D-optimal if and only if Since the support points are $\{0, 1\}$, the l.h.s. of the above inequality equals zero at the boundaries of $[0, 1]$. Then it is sufficient to show that the aforementioned l.h.s. is convex on the interior $(0, 1)$, and this convexity holds under condition (4.1) asserted in the theorem. Now, to show that $\xi^*$ is unique at $\beta$, assume that $\xi^{**}$ is locally D-optimal at $\beta$. Then $M(\xi^*, \beta) = M(\xi^{**}, \beta)$ and therefore the condition of the equivalence theorem under $\xi^{**}$ is equivalent to (4.3), which holds with equality only at the support of $\xi^*$, i.e., at 0 and 1.

Part (ii): This case can be shown in analogy to Part (i) by employing condition (2.6) of the general equivalence theorem for $k = 1$ with $\mathrm{tr}\big(M^{-1}(\xi^*, \beta)\big) = (\sqrt{2}\, q_0 + q_1)^2$. The optimal weights $\omega_0^*$ and $\omega_1^*$ are derived according to (3.1) in Sect. 3.
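The D-optimality check for the two-point design on $\{0, 1\}$ with equal weights can be sketched numerically. For $f(x) = (1, x)^T$ and $k = 0$, a direct calculation from the equivalence theorem gives the inequality $(1 - x)^2 q(0) + x^2 q(1) \le q(x)$ on $[0, 1]$, with $q = 1/u$; this form is derived here and is an assumption insofar as the corresponding displayed formula is not legible in the text. The Poisson intensity $u(x, \beta) = \exp(\beta_0 + \beta_1 x)$, the grid search and the function names are likewise illustrative choices.

```python
import math

def d_optimal_on_endpoints(q, grid=1000, tol=1e-12):
    """Grid check of (1 - x)^2 q(0) + x^2 q(1) <= q(x) on [0, 1], q = 1/u.

    A True result only indicates that no violation was found on the grid."""
    q0, q1 = q(0.0), q(1.0)
    for j in range(grid + 1):
        x = j / grid
        if (1.0 - x) ** 2 * q0 + x ** 2 * q1 - q(x) > tol:
            return False
    return True

def poisson_q(beta0, beta1):
    """Reciprocal intensity 1/u for an assumed Poisson model with log link."""
    return lambda x: math.exp(-(beta0 + beta1 * x))

moderate_slope = d_optimal_on_endpoints(poisson_q(0.0, -1.0))
steep_slope = d_optimal_on_endpoints(poisson_q(0.0, -3.0))
```

With the moderate slope the left-hand side minus the right-hand side is convex and vanishes at both endpoints, so the check passes; with the steep slope the inequality is violated just below $x = 1$.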

Two-factor model
In this section we consider a first order model of two factors, $f(x) = (1, x_1, x_2)^T$. (5.1)
Theorem 5.1 Consider a GLM with $f(x) = (1, x_1, x_2)^T$ and the experimental region (3) $\xi^* = \left\{ \begin{array}{ccc} x_1^* & x_3^* & x_4^* \\ 1/3 & 1/3 & 1/3 \end{array} \right\}$ if and only if

Proof The proof of cases (1)-(4) is demonstrated by making use of condition (2.6) for $k = 0$ of the general equivalence theorem. For case ($\ell$) ($1 \le \ell \le 4$) denote the design matrix $F = [f(x_i^*), f(x_j^*), f(x_k^*)]^T$ and the weight matrix $U = \mathrm{diag}(u_i, u_j, u_k)$ such that $1 \le i < j < k \le 4$ and $i, j, k \ne 4 - \ell + 1$. We will show that the condition in each case (1)-(4) is equivalent to To this end, for each case (1)-(4), we report the matrices $F$, $F^{-1}$ and $U$ It remains to show that the design $\xi^*$ is unique at $\beta$. Suppose that $\xi^*$ and $\xi^{**}$ are locally D-optimal at $\beta$. Then by the strict convexity of the D-criterion we have $M(\xi^*, \beta) = M(\xi^{**}, \beta)$.

In analogy to Theorem 5.1 we introduce locally A-optimal designs in the next theorem.

For each case (1)-(4), the constant c appearing in the weights equals the sum of the numerators of the three ratios. (5) Otherwise, ξ * is supported by the four design points
Proof We make use of condition (2.6) for $k = 1$ of the general equivalence theorem. In analogy to the proof of Theorem 5.1 for case ($\ell$) such that $1 \le i < j < k \le 4$ and $i, j, k \ne 4 - \ell + 1$. Then we obtain $C = (F^{-1})^T F^{-1}$. An elementary calculation shows that the weights given by (3.1) for an A-optimal design coincide with the $\omega_i^*$ ($1 \le i \le 3$) as stated in the theorem. Now we show that the design $\xi^*$ is locally A-optimal if and only if the corresponding condition holds. We have . So, together with condition (2.6) of the general equivalence theorem for $k = 1$, the design $\xi^*$ is locally A-optimal (at $\beta$) if and only if Straightforward calculation shows that condition (5.2) is equivalent to the respective condition in Case ($\ell$).
Remark Yang et al. (2011) developed a method to find locally optimal designs for logistic models of multiple factors. It was assumed that one factor is defined on the whole real line while the other factors belong to a compact region, which seems to conflict with the experimental region given in Theorem 5.1. Then a subclass of designs was established by the Loewner semi-ordering of nonnegative definite matrices, so that one could focus on this subclass to derive optimal designs. A similar strategy was used in Gaffke et al. (2019) for gamma models on the experimental region $[0, 1]^\nu$, $\nu \ge 1$. Nevertheless, it seems that this strategy may not work for a general setup of the generalized linear model. However, consider a logistic model of two factors with $f(x) = (1, x_1, x_2)^T$ and intensity function $u(x, \beta) = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)/\big(1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)\big)^2$. According to Yang et al. (2011) the experimental region is assumed to be $\mathcal{X} = [0, 1] \times \mathbb{R}$, i.e., $x_2 \in (-\infty, \infty)$. From Yang et al. (2011), Corollary 1, a locally D-optimal design is given by where $c^*$ is the maximizer of $c^2 \big(\exp(c)/(1 + \exp(c))^2\big)^3$. In general, $\xi^*$ is not covered by Theorem 5.1. In contrast to that, for a particular parameter point $\beta = (\beta_0, \beta_1, \beta_2)^T$ such that $\beta_1 = 0$, $\beta_2 = -2\beta_0$ and $\beta_0 = c^*$ the design $\xi^*$ is supported by the vertices of $[0, 1]^2$.
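The constant $c^*$ in the remark has no closed form, but it is easy to approximate. The sketch below maximizes $c^2 \big(\exp(c)/(1 + \exp(c))^2\big)^3$ by golden-section search; the search interval $[0, 5]$, the tolerance and the function names are assumptions chosen for illustration. Setting the derivative of the logarithm of the objective to zero gives the stationarity condition $c \tanh(c/2) = 2/3$, which can serve as a cross-check.

```python
import math

def objective(c):
    """c^2 * u(c)^3 with the logistic intensity u(c) = e^c / (1 + e^c)^2."""
    u = math.exp(c) / (1.0 + math.exp(c)) ** 2
    return c ** 2 * u ** 3

def golden_section_max(func, lo, hi, tol=1e-10):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c1 = b - inv_phi * (b - a)
        c2 = a + inv_phi * (b - a)
        if func(c1) < func(c2):
            a = c1   # the maximum lies to the right of c1
        else:
            b = c2   # the maximum lies to the left of c2
    return (a + b) / 2.0

c_star = golden_section_max(objective, 0.0, 5.0)
```

The search returns a value close to 1.223, which agrees with the stationarity condition above.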

Corollary 5.1
Consider a GLM with $f(x) = (1, x_1, x_2)^T$ and the experimental region $\tilde{\mathcal{X}} = \{0, 1\}^2$. Denote by $u_{(1)} \le u_{(2)} \le u_{(3)} \le u_{(4)}$ the intensity values $u_1, u_2, u_3, u_4$ rearranged in ascending order. Then: (i) The design $\xi^*$ is supported by the three design points whose intensity values are given by $u_{(2)}, u_{(3)}, u_{(4)}$, with equal weights 1/3 if and only if (ii) The design $\xi^*$ is supported by the four design points $x_1^*, x_2^*, x_3^*, x_4^*$ with weights $\omega_1^*, \omega_2^*, \omega_3^*, \omega_4^*$ which are uniquely determined by the condition

Proof The proof is demonstrated by Theorem 5.1. The condition on $\xi^*$ in part (i) follows from the corresponding inequality in cases (1)

. Then the locally D-optimal design (at β) is supported by the four design points
Proof Since assumption (ii) of Corollary 5.1 is fulfilled by a point β the design is supported by all points x * 1 , x * 2 , x * 3 , x * 4 . Then the optimal weights are obtained according to Lemma 3.1 where we have d 2 i = 1 (1 ≤ i ≤ 4) and u 2 = u 3 . Hence, the results follow.
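The three-point case on the vertex set $\{0, 1\}^2$ can be made concrete with a short sketch. For a saturated design with equal weights 1/3 on three vertices, evaluating the equivalence theorem with $k = 0$ at the omitted vertex yields the requirement that the reciprocal intensity of the omitted vertex is at least the sum of the reciprocal intensities of the three support vertices. This inequality is derived here and is an assumption insofar as the corresponding display of the corollary is not legible; the Poisson intensity and all names are illustrative.

```python
import math

def d_optimal_three_point_support(u):
    """u maps each vertex of {0, 1}^2 to its intensity.

    Returns the sorted three-point support if the equal-weight design omitting
    the vertex with the smallest intensity satisfies the derived inequality
    1/u_omitted >= sum of 1/u over the support, and None otherwise."""
    ordered = sorted(u, key=u.get)            # ascending intensity
    omitted, support = ordered[0], ordered[1:]
    if 1.0 / u[omitted] >= sum(1.0 / u[v] for v in support):
        return sorted(support)
    return None

def poisson_intensities(beta):
    """Assumed Poisson intensities exp(b0 + b1*x1 + b2*x2) on the vertices."""
    b0, b1, b2 = beta
    return {(x1, x2): math.exp(b0 + b1 * x1 + b2 * x2)
            for x1 in (0, 1) for x2 in (0, 1)}

strong = d_optimal_three_point_support(poisson_intensities((0.0, -2.0, -2.0)))
weak = d_optimal_three_point_support(poisson_intensities((0.0, -0.5, -0.5)))
```

Only the vertex with the smallest intensity can satisfy the inequality, which is why the sketch inspects that vertex alone. When it returns `None`, all four vertices are needed, in line with part (ii) of Corollary 5.1.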
Now we restrict to A-optimal designs on the set of vertices $\tilde{\mathcal{X}} = \{0, 1\}^2$. It can also be noted that the design points with the highest intensities serve as the support of a locally A-optimal design at a given parameter value. ≤ 4). Then the unique locally A-optimal design $\xi^*$ is as follows.

Corollary 5.2 Consider the assumptions and notations of Corollary
(

For each case (i) -(iv), the constant c appearing in the weights equals the sum of the numerators of the three ratios. (5) Otherwise, ξ * is supported by the four design points
As the optimal weights of the A-optimal designs depend on the model parameters, each condition provided in the theorem characterizes a subregion of the parameter space where the corresponding designs with the same support are A-optimal.

Multiple regression model

) is locally D-optimal (at β) if and only if
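We have

$F = \begin{pmatrix} 1 & 0_{1 \times \nu} \\ 1_{\nu \times 1} & I_\nu \end{pmatrix}$, $F^{-1} = \begin{pmatrix} 1 & 0_{1 \times \nu} \\ -1_{\nu \times 1} & I_\nu \end{pmatrix}$, (6.3)

where $0_{1 \times \nu}$, $1_{\nu \times 1}$, and $I_\nu$ denote the $\nu$-dimensional row vector of zeros, the $\nu$-dimensional column vector of ones, and the $\nu \times \nu$ unit matrix, respectively. So, by condition (2.6) of the general equivalence theorem for $k = 0$ the design is locally D-optimal if and only if The l.h.s. of (6.4) reads as and hence it is obvious that (6.4) is equivalent to (6.2).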

Remark
The D-optimal design under a two-factor model with support (0, 0) T , (1, 0) T , (0, 1) T from Theorem 5.1 , part (1) is covered by Theorem 6.1 for ν = 2. It is clear that condition (6.2) for ν = 2 is equivalent to the inequality In analogy to Theorem 6.1 we present locally A-optimal designs in the next theorem.

locally A-optimal (at β) if and only if for all
Proof As in the proof of Theorem 6.1 the design matrix $F$ and its inverse are given by (6.3) and we obtain This yields $\sqrt{c_{11}/u_1} = \sqrt{\nu + 1}\, q_1$ and $\sqrt{c_{ii}/u_i} = q_i$ for $i = 2, \dots, \nu + 1$ according to (3.1) in Sect. 3 with $p = \nu + 1$. An elementary calculation shows that the weights given by (3.1) for an A-optimal design coincide with the $\omega_i^*$ ($1 \le i \le p$) as stated in the theorem. Now we show that the design $\xi^*$ is locally A-optimal if and only if (6.5) holds. Let $U = \mathrm{diag}(u_1, \dots, u_p)$, $\Omega = \mathrm{diag}(\omega_1^*, \dots, \omega_p^*)$ and $V = \Omega U$. Then we have So, together with condition (2.6) of the general equivalence theorem for $k = 1$, the design $\xi^*$ is locally A-optimal (at $\beta$) if and only if Straightforward calculation shows that condition (6.6) is equivalent to condition (6.5).
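The weight calculation used in this proof can be sketched as follows. It assumes that (3.1) assigns weights proportional to $\sqrt{c_{ii}/u_i}$, which is how the proof applies it, and that the support consists of the origin and the unit vectors, so that $c_{11} = \nu + 1$ and $c_{ii} = 1$ otherwise. The function name and the returned pair are illustrative choices.

```python
import math

def a_optimal_vertex_weights(u):
    """u = [u_1, ..., u_{nu+1}]: intensities at the origin and at e_1, ..., e_nu.

    Returns the weights proportional to sqrt(c_ii / u_i) and the resulting
    value of tr(M^{-1}), which equals (sum_i sqrt(c_ii / u_i))^2."""
    nu = len(u) - 1
    c = [nu + 1.0] + [1.0] * nu   # diagonal of C = (F^{-1})^T F^{-1}
    raw = [math.sqrt(ci / ui) for ci, ui in zip(c, u)]
    total = sum(raw)
    return [r / total for r in raw], total ** 2

# Single-factor case (nu = 1) with equal intensities at 0 and 1.
weights, trace_inverse = a_optimal_vertex_weights([1.0, 1.0])
```

For $\nu = 1$ the returned trace is $(\sqrt{2}\, q_0 + q_1)^2$ with $q_i = u_i^{-1/2}$, matching the expression quoted in the proof for the single-factor model.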

Model without intercept
Consider a model of multiple factors and without intercept. We assume a first order model $f(x) = (x_1, \dots, x_\nu)^T$. (6.7) The experimental region $\mathcal{X}$ has an arbitrary form. Locally optimal designs will be derived under Kiefer's $\Phi_k$-criteria. The support points are located at the boundary of $\mathcal{X}$ and the optimal weights are obtained according to Lemma 3.2.
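Theorem 6.3 Consider model (6.7) on an experimental region $\mathcal{X}$. Let a vector $a = (a_1, \dots, a_\nu)^T$ be given such that $a_i > 0$ ($1 \le i \le \nu$). Denote the design points by $x_i^* = a_i e_i$ ($1 \le i \le \nu$), where $e_i$ is the $i$-th unit vector, that are assumed to belong to $\mathcal{X}$. For a given parameter point $\beta$ let $u_i = u(x_i^*, \beta)$ ($1 \le i \le \nu$). Let $k$ with $0 \le k < \infty$ be given. Let $\xi_a^*$ be the saturated design whose support consists of the points $x_i^*$ ($1 \le i \le \nu$) with the corresponding weights

$\omega_i^* = \dfrac{(a_i^2 u_i)^{-k/(k+1)}}{\sum_{j=1}^{\nu} (a_j^2 u_j)^{-k/(k+1)}}$ ($1 \le i \le \nu$).

Then $\xi_a^*$ is locally $\Phi_k$-optimal (at $\beta$) if and only if

$u(x, \beta) \sum_{i=1}^{\nu} u_i^{-1} a_i^{-2} x_i^2 \le 1$ for all $x \in \mathcal{X}$. (6.8)

Proof The information matrix of $\xi_a^*$ is diagonal, $M(\xi_a^*, \beta) = \mathrm{diag}(\omega_i^* a_i^2 u_i)_{i=1}^{\nu}$, so that $M^{-k-1}(\xi_a^*, \beta) = \mathrm{diag}\big((\omega_i^* a_i^2 u_i)^{-k-1}\big)_{i=1}^{\nu}$ and $\mathrm{tr}\big(M^{-k}(\xi_a^*, \beta)\big) = \sum_{i=1}^{\nu} (\omega_i^* a_i^2 u_i)^{-k}$. Adopting these formulas simplifies the l.h.s. of condition (2.6) of the general equivalence theorem, and it follows that $\xi_a^*$ is locally $\Phi_k$-optimal if and only if condition (6.8) holds true.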
The optimality condition (6.8) does not depend on the value of k. However, from Theorem 6.3 the locally D-optimal design (k = 0) has weights ω * i = 1/ν (1 ≤ i ≤ ν) and the locally A-optimal design (k = 1) has weights
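The dependence of the weights on $k$ can be illustrated numerically. For support points $a_i e_i$ the information matrix is diagonal with entries $\omega_i a_i^2 u_i$, and minimizing $\sum_i (\omega_i a_i^2 u_i)^{-k}$ subject to $\sum_i \omega_i = 1$ gives weights proportional to $(a_i^2 u_i)^{-k/(k+1)}$. The sketch below uses this derived form as a working assumption; the numbers and function names are hypothetical.

```python
def phi_k_weights(a, u, k):
    """Weights proportional to (a_i^2 u_i)^(-k/(k+1)) for support points a_i e_i."""
    expo = -k / (k + 1.0)
    raw = [(ai * ai * ui) ** expo for ai, ui in zip(a, u)]
    total = sum(raw)
    return [r / total for r in raw]

def trace_m_minus_k(weights, a, u, k):
    """tr(M^{-k}) for the diagonal matrix M = diag(w_i a_i^2 u_i), k > 0."""
    return sum((w * ai * ai * ui) ** (-k) for w, ai, ui in zip(weights, a, u))

a = [1.0, 2.0]   # hypothetical scaling of the support points
u = [3.0, 0.5]   # hypothetical intensities at the support points

w_d = phi_k_weights(a, u, 0)   # k = 0: equal weights 1/nu
w_a = phi_k_weights(a, u, 1)   # k = 1: proportional to (a_i^2 u_i)^(-1/2)
```

For $k = 0$ the exponent vanishes and the weights are uniform, in line with the D-optimal weights $1/\nu$ stated above; for $k = 1$ any small shift of weight between the two points increases `trace_m_minus_k`.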

Applications
In this section, we discuss the application of the previous results to generalized linear models. Here, emphasis will be laid on gamma and Poisson models. However, the linear regression model is itself a GLM. Therefore, to begin with, we briefly focus on $\Phi_k$-optimality under a non-intercept linear model with $f(x) = (x_1, \dots, x_\nu)^T$ on the experimental region $\mathcal{X} = [0, 1]^\nu$. Here, $u(x, \beta) = 1$ for all $x \in \mathcal{X}$, so the information matrices in a linear model are independent of $\beta$. Note that Theorem 6.3 does not cover a non-intercept linear model on $\mathcal{X}$ since condition (6.8) does not hold true for $\nu \ge 2$. However, the l.h.s. of condition (2.6) of the general equivalence theorem under linear models, i.e., when $u(x, \beta) = 1$, is strictly convex and it attains its maximum at some vertices of $\mathcal{X}$. Thus the support of any $\Phi_k$- (or D-, A-)optimal design is a subset of $\{0, 1\}^\nu$. As a result, in particular for D- and A-optimality, one might apply the results of Theorem 3.1 in Huda and Mukerjee (1988), which were obtained under linear models on $\{0, 1\}^\nu$.

Gamma model
A gamma model is given by Here, $\kappa$ is the shape parameter of the gamma distribution, which is assumed to be fixed and positive. The expected mean $\mu(x, \beta)$ for the gamma distribution is positive for all $x \in \mathcal{X}$. The parameter space of all admissible parameter vectors $\beta$ is determined by the assumption $f^T(x)\beta > 0$ for all $x \in \mathcal{X}$.

Corollary 7.3 Consider a non-intercept gamma model with
For a given vector $a = (a_1, \dots, a_\nu)^T$ where $a_i > 0$ ($1 \le i \le \nu$) let $x_i^* = a_i e_i$ for all $i = 1, \dots, \nu$. Let $k$ with $0 \le k < \infty$ be given. For a given parameter point $\beta \in (0, \infty)^\nu$ let $\xi_a^*$ be the saturated design whose support consists of the points $x_i^*$ ($1 \le i \le \nu$) with the corresponding weights Then $\xi_a^*$ is locally $\Phi_k$-optimal (at $\beta$).
The optimal weights given in Corollary 7.3 are the same irrespective of the values a 1 , . . . , a ν . The reason is that the information matrix under a gamma model without intercept is invariant with respect to simultaneous scaling of the factors. Therefore, we get M(a i e i , β) = M(e i , β), i = 1, . . . , ν. Note also that Corollary 7.3 covers Theorem 3.1 in Idais and Schwabe (2020) who provided locally D-and A-optimal designs for non-intercept gamma models.
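The invariance of the weights under scaling of the support points can be checked with a short sketch. It assumes the gamma intensity $u(x, \beta) = (f^T(x)\beta)^{-2}$ up to the constant shape factor, which cancels after normalization, and weights proportional to $(a_i^2 u_i)^{-k/(k+1)}$ as derived for diagonal information matrices; both are working assumptions here, and the function name is hypothetical.

```python
def gamma_phi_k_weights(beta, a, k):
    """Weights for support points a_i e_i under a non-intercept gamma model."""
    # Assumed intensity at x = a_i e_i, up to the constant shape factor.
    u = [(ai * bi) ** (-2.0) for ai, bi in zip(a, beta)]
    expo = -k / (k + 1.0)
    raw = [(ai * ai * ui) ** expo for ai, ui in zip(a, u)]
    total = sum(raw)
    return [r / total for r in raw]

beta = [1.0, 2.0]                                   # hypothetical parameter point
w_unit = gamma_phi_k_weights(beta, [1.0, 1.0], 1)   # support e_1, e_2
w_scaled = gamma_phi_k_weights(beta, [0.3, 5.0], 1) # support 0.3 e_1, 5 e_2
```

Since $a_i^2 u_i = \beta_i^{-2}$ under this intensity, the scaling factors drop out and both calls give weights proportional to $\beta_i^{2k/(k+1)}$, here $1/3$ and $2/3$ for $k = 1$.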

Poisson model
A Poisson model is given by Here, the expected mean μ(x, β) for the Poisson distribution is positive for all x ∈ X . The parameter vector β ∈ R p is a real-valued vector.