1 Introduction

Generalized linear models are a powerful tool to analyze data for which the standard linear model approach is not adequate. The idea of generalized linear models goes back to Nelder and Wedderburn [28], and their concept is comprehensively presented in the monograph by McCullagh and Nelder [27]. The statistical analysis is well developed in generalized linear models, and there is also a considerable amount of literature on optimal design in this situation (see, e.g., Atkinson and Woods [3] and the literature cited therein).

In generalized linear models, the performance of a design depends not only on the experimental settings but, in contrast with linear models, also on the values of the underlying parameters. Even more crucially, the optimal designs themselves also depend on the parameters. As these are commonly unknown at the design stage, nominal values of the parameters have to be specified prior to the experiment, which leads to the concept of locally optimal designs (see Chernoff [5]). This approach has been frequently employed (see [1, 13, 14, 36, 41,42,43] among others), and it provides at least a benchmark for the quality of a design.

To overcome the parameter dependence, robust criteria have been proposed which either impose a prior weight on the parameters (Bayesian design, see, e.g., Atkinson et al. [2], ch. 18) or choose a minimax approach over a parameter region of interest (maximin efficiency, see, e.g., [8, 10, 17, 22]).

The construction of optimal designs for generalized linear models is difficult, and often numerical algorithms are employed to find a solution. To reduce the complexity of the search for a good design, one can make use of symmetries (“invariance”) in the design problem which can be described by transformations of the experimental settings and the location parameters in the linear component. The concept of invariance or, more specifically, equivariance with respect to transformations has long been used in statistical analysis and dates back to Pitman [30] (see, e.g., Lehmann [24, ch. 6], for a comprehensive description). The underlying idea of transformations which are conformable with the model (“reparameterization”) has been successfully adapted to optimal design theory in linear models. In contrast with equivariance in statistical analysis, the parameter values do not play a role in optimal designs for linear models. Therefore, only transformations of the experimental settings have to be considered there. With these transformations, optimal designs may first be determined on a standardized experimental region and then transferred to more general regions, as long as the transformation is order preserving with respect to the design criterion (see, e.g., Heiligers and Schneider [19]). This covers, for example, the situation of D-optimal designs for polynomial regression on an arbitrary (multivariate) interval.

The stronger concept of invariance requires a whole group of equivariant transformations. In linear models, invariance has been widely used to characterize optimal designs which reflect the symmetries resulting from the group actions (see Pukelsheim [33, ch. 5], or Schwabe [38, ch. 3]). These groups of transformations may cover reflections and rotations for quantitative variables as well as permutations of levels and factors for categorical variables and combinations thereof. In the context of generalized linear models, however, the concept of invariance is not well established. This seems to be mainly due to the fact that local optimality criteria lack symmetries, in general, because they depend on the parameter values. Therefore, we have to account for this dependence by also transforming the parameters similar to the situation in statistical analysis.

For the underlying concept of equivariance, we thus need a pair of transformations which acts simultaneously on the experimental settings and on the parameter values. The most famous representative of this concept is the canonical transformation defined in Ford et al. [13]. However, the motivation there differs from that of our approach, exhibited in Radloff and Schwabe [34]. The canonical transformation starts with a standardization of the nominal value of the parameters. This standardization is compensated by an associated transformation of the experimental settings which leaves the value of the linear component unchanged. In contrast, we start with a transformation of the experimental settings as in linear models. This transformation is conformable (“linearly equivariant,” see Schwabe [38, ch. 3]) with the regression functions in the sense that it results in a linear reparameterization of the linear component. This linear reparameterization might be the associated action on the parameter values as in the canonical transformation, but we allow for more general, even nonlinear, transformations of the parameters. Moreover, in its standard formulation, the canonical transformation deals with one quantitative explanatory variable with a straight line relationship for the linear component. A generalization of the canonical transformation to multiple explanatory variables is given in Sitter and Torsney [40]. In our approach, there is no restriction on the number of explanatory variables, on their impact on the linear component described by the regression functions, or on whether they are quantitative or categorical.

For the concept of invariance in generalized linear models, symmetries are also required in the parameters which concur with the symmetries in the experimental settings. This requirement is hardly met in the case of local optimality, but Bayesian or maximin efficiency criteria can incorporate symmetries in their prior or in their parameter region of interest (see Radloff and Schwabe [34]).

Based on this approach, we develop step-by-step the concepts of equivariance and invariance in generalized linear models and their application to optimal design, and we illustrate each step by a running example of gamma models with canonical link functions. This kind of gamma model is chosen because it exhibits an additional scaling property which provides a more complex, nonlinear symmetry structure.

The paper is organized as follows. In Sect. 2, we introduce the model assumptions and the design criteria. In Sect. 3, we discuss the concept of equivariance under standard linear transformations of the parameters and show how optimal designs can be transferred from one experimental region to another. In Sect. 4, the concept of equivariance is extended to nonlinear transformations of the parameters. In Sect. 5, the general concept of invariance is introduced and optimal designs are obtained for various situations. Finally, Sect. 6 concludes the paper with a short discussion and an outlook.

2 Basics: Model Specification, Information, and Design

We consider a response variable Y for which the dependence on a (potentially multi-dimensional) covariate \({\mathbf {x}}\) can be described by a generalized linear model. This means that the distribution of Y comes from a given exponential family and the mean \(\mu = \mathrm {E}(Y)\) is related to the linear component \({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta } = \sum _{j = 0}^{p - 1} \beta _j f_j({\mathbf {x}})\) by a one-to-one link function. In the linear component, \({\mathbf {f}}({\mathbf {x}}) = (f_0({\mathbf {x}}), \ldots , f_{p - 1}({\mathbf {x}}))^\mathsf{T}\) is a p-dimensional vector of given regression functions \(f_0({\mathbf {x}}), \ldots , f_{p - 1}({\mathbf {x}})\) and \(\varvec{\beta } = (\beta _0, \ldots , \beta _{p - 1})^\mathsf{T}\) is a p-dimensional vector of parameters \(\beta _0, \ldots , \beta _{p - 1}\) to be estimated. Traditionally the link function maps the mean to the linear component (see McCullagh and Nelder [27, ch. 2]). For analytical purposes, however, it is more convenient to describe the dependence of the mean on the linear component,

$$\begin{aligned} \mu = \mu ({\mathbf {x}}; \varvec{\beta }) = \eta ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta }), \end{aligned}$$
(1)

where \(\eta \) is the inverse of the link function. For example, for the \(\log \) link \(\eta \) is the exponential function, and for the inverse link \(\eta \) is the reciprocal function.

As a particular case and for illustrative purposes, we consider gamma models. Such models are frequently used in engineering applications. For example, in Dette et al. [9] a gamma model is considered in a thermal spraying process. Further applications in the fields of ecology, medicine, and psychology can be found in Gea-Izquierdo and Cañellas [15], Grover et al. [18], and Ng and Cribbie [29]. In a gamma model, the response Y is gamma distributed. One possibility to parameterize its density is given by \(f_Y(y) = y^{\kappa - 1} \exp ( - y / \theta ) / (\theta ^\kappa \Gamma (\kappa ))\), where \(\kappa > 0\) and \(\theta > 0\) denote the shape and scale parameters, respectively. In this case, the expectation of Y is given by \(\mu = \kappa \theta \). In order to end up with a one-parametric exponential family, we suppose that the shape parameter \(\kappa \) is a fixed nuisance parameter (see Atkinson and Woods [3]). For example, \(\kappa = 1\) gives the family of exponential distributions, or for fixed integer \(\kappa \) one obtains a family of certain Erlang distributions.

For the link function, we assume the inverse link \(\kappa / \mu = {\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta }\). Alternatively, the \(\log \) link is frequently used in gamma models (see, e.g., Ford et al. [13]). However, the inverse link appears to be more suitable for illustrative purposes. Moreover, the inverse link is equal to the canonical link \( - \kappa / \mu = {\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta }\) up to the minus sign (see McCullagh and Nelder [27, ch. 3]). This means that all subsequent results will also be valid for the canonical link, and the minus sign is suppressed for notational convenience.

The inverse \(\eta \) of the inverse link is given by

$$\begin{aligned} \eta (z) = \kappa / z , \end{aligned}$$
(2)

which itself is equal to the inverse \( - \kappa / z\) of the canonical link up to the minus sign. Then, the responses \(Y_i\) of a sample \(Y_1, \ldots , Y_n\) with covariates \({\mathbf {x}}_1, \ldots , {\mathbf {x}}_n\) are gamma distributed with means \(\mu _i = \eta ({\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}\varvec{\beta })\) and common shape parameter \(\kappa \).

In an experimental design setup, the covariates \({\mathbf {x}}_i\) may be chosen by the experimenter from an experimental region \({\mathcal {X}}\) over which the model under consideration is assumed to be valid. For gamma distributed responses, as an additional side condition, the means \(\mu _i\) have to be positive (\(\mu ({\mathbf {x}}_i; \varvec{\beta }) > 0\)). This implies the natural restriction on the parameter region \({\mathcal {B}}\) of potential values for the parameter vector \(\varvec{\beta }\) that for every \(\varvec{\beta } \in {\mathcal {B}}\) the linear component has to be positive (\({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta } > 0\)) for all \({\mathbf {x}} \in {\mathcal {X}}\). Further note that for reasons of parameter identifiability the regression functions \(f_0, \ldots , f_{p - 1}\) are assumed to be linearly independent on the experimental region \({\mathcal {X}}\).

The aim of experimental design is to optimize the performance of the statistical analysis. The contribution of an observation \(Y_i\) to the performance is measured in terms of its information. In the present generalized linear models framework, for a single observation at an experimental setting \({\mathbf {x}}\) the elemental information matrix is given by

$$\begin{aligned} {\mathbf {M}}({\mathbf {x}}; \varvec{\beta }) = \lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })\,{\mathbf {f}}({\mathbf {x}})\,{\mathbf {f}}({\mathbf {x}})^\mathsf{T} \end{aligned}$$
(3)

(see Fedorov and Leonov [12] or Atkinson and Woods [3]), where \(\lambda \) is a positive-valued function called the intensity function. Note that through the intensity function the elemental information depends on the parameter vector \(\varvec{\beta }\).

In generalized linear models, the intensity is given by

$$\begin{aligned} \lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta }) = \eta '({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })^{2} / \mathrm {Var}(Y) . \end{aligned}$$
(4)

In the case of a canonical link, we have \(\mathrm {Var}(Y) = \eta '({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })\) and the intensity reduces to the variance. In particular, in the gamma model with inverse link the intensity function is

$$\begin{aligned} \lambda (z) = \kappa / z^{2}, \end{aligned}$$
(5)

because the minus sign in the inverse of the link function does not affect the intensity (cf. Gaffke et al. [14]). The (per experiment) Fisher information of n independent observations \(Y_i\) at experimental settings \({\mathbf {x}}_i\) is then given by

$$\begin{aligned} {\mathbf {M}}({\mathbf {x}}_1, \ldots , {\mathbf {x}}_n; \varvec{\beta }) = \sum _{i=1}^n {\mathbf {M}}({\mathbf {x}}_i; \varvec{\beta }) = \sum _{i=1}^n \lambda ({\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}\varvec{\beta })\, {\mathbf {f}}({\mathbf {x}}_i)\,{\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}. \end{aligned}$$
(6)

The aim of finding an exact optimal design \({\mathbf {x}}_1^*, \ldots , {\mathbf {x}}_n^*\) is to optimize the Fisher information in a certain sense because its inverse is proportional to the asymptotic covariance matrix of the maximum likelihood estimator for \(\varvec{\beta }\) (see Fahrmeir and Kaufmann [11]).

As this discrete optimization problem is, in general, too difficult, we will deal with approximate (continuous) designs \(\xi \) in the spirit of Kiefer [23] (see also Silvey [39, p. 15]) throughout the remainder of the present paper. An approximate design \(\xi \) is defined on the experimental region \({\mathcal {X}}\) by mutually distinct support points \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_m\) and corresponding weights \(w_1, \ldots , w_m > 0\) such that \(\sum _{i=1}^{m} w_i = 1\). In terms of an exact design, the support points \({\mathbf {x}}_i\) may be interpreted as the distinct experimental settings and the weights \(w_i\) as their corresponding relative frequencies in the sample. The relaxation in an approximate design is that the weights \(w_i\) may vary continuously and need not be multiples of 1/n. The standardized (per observation) information matrix of a design \(\xi \) is defined by

$$\begin{aligned} {\mathbf {M}}(\xi ; \varvec{\beta }) = \sum _{i=1}^m w_i {\mathbf {M}}({\mathbf {x}}_i; \varvec{\beta }) = \sum _{i=1}^m w_i \lambda ({\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}\varvec{\beta })\, {\mathbf {f}}({\mathbf {x}}_i)\,{\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}. \end{aligned}$$
(7)
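As a computational companion to (7), the following Python sketch (our own illustration, not part of the formal development; all function and variable names are ours) assembles the standardized information matrix for the gamma model with inverse link, whose intensity (5) is \(\lambda (z) = \kappa / z^{2}\).

```python
# Minimal sketch (names are ours): the standardized information
# matrix (7) for a gamma model with inverse link, intensity (5).
import numpy as np

def info_matrix(X, w, beta, f, kappa=1.0):
    """M(xi; beta) = sum_i w_i lambda(f(x_i)^T beta) f(x_i) f(x_i)^T."""
    p = len(beta)
    M = np.zeros((p, p))
    for x, wi in zip(X, w):
        fx = f(x)
        z = fx @ beta                  # linear component, must be positive
        M += wi * (kappa / z**2) * np.outer(fx, fx)
    return M

# simple linear regression f(x) = (1, x)^T on X = [0, 1]
f = lambda x: np.array([1.0, x])
M = info_matrix([0.0, 1.0], [0.5, 0.5], np.array([1.0, 2.0]), f)
```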

Design optimization is now concerned with finding an approximate design \(\xi ^*\) which minimizes a convex real-valued criterion function \(\Phi \) of the information matrix \({\mathbf {M}}(\xi ; \varvec{\beta })\) of the design \(\xi \). A design \(\xi ^*\) will then be called \(\Phi \)-optimal when it minimizes \(\Phi (\xi )\), i.e., \(\Phi (\xi ^*) = \min \Phi (\xi )\). As the information matrix depends on the parameter vector \(\varvec{\beta }\), the obtained design \(\xi ^*\) is locally \(\Phi \)-optimal at the given parameter value \(\varvec{\beta }\) [5] and may change with \(\varvec{\beta }\). To avoid the parameter dependence, so-called robust versions of the criteria can be considered, such as “Bayesian” criteria, which involve a weighting measure (“prior”) on the parameters (see Atkinson et al. [2, ch. 18]), or “minimax” criteria, which aim at minimizing the worst case over the parameter settings (see the “standardized minimax” criteria in [8]). In the following, we will focus on the local D- and IMSE-criteria and the corresponding maximin efficiency (“standardized maximin”) criteria.

The D-criterion is the most commonly used design criterion. It is related to the estimation of the model parameters \(\varvec{\beta }\) and aims at minimizing the determinant of the asymptotic covariance matrix, \(\Phi ({\mathbf {M}})=\det ({\mathbf {M}}^{-1})\) for positive definite information matrix \({\mathbf {M}}\), and \(\Phi ({\mathbf {M}})=\infty \) for singular \({\mathbf {M}}\). A design \(\xi ^*\) is then called locally D-optimal at \(\varvec{\beta }\) when \(\det ({\mathbf {M}}(\xi ^*; \varvec{\beta })^{-1})=\min \det ({\mathbf {M}}(\xi ; \varvec{\beta })^{-1})\). The D-criterion can be motivated by the fact that it measures the (squared) volume of the asymptotic confidence ellipsoid of the maximum likelihood estimator for \(\varvec{\beta }\). However, its popularity predominantly stems from its nice analytic properties.

Note that in the present situation the property of \({\mathbf {M}}(\xi ; \varvec{\beta })\) being nonsingular does not depend on the value of the parameter vector \(\varvec{\beta }\) because the intensity \(\lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })\) is greater than zero for all \({\mathbf {x}}\in {\mathcal {X}}\) and all \(\varvec{\beta } \in {\mathcal {B}}\).

The definition of the IMSE-criterion (alternatively, also called I-, V- or Q-optimality in the literature) is based on the estimation (prediction) of the mean response \(\mu ({\mathbf {x}}; \varvec{\beta })\). It aims at minimizing the average asymptotic variance of the predicted mean response \({\hat{\mu }}({\mathbf {x}}) = \mu ({\mathbf {x}}; \hat{\varvec{\beta }})\), where averaging is taken with respect to a standardized measure \(\nu \) on \({\mathcal {X}}\) (see Li and Deng [25, 26]). For a generalized linear model, the asymptotic variance is given by

$$\begin{aligned} \mathrm {asVar}({\hat{\mu }}({\mathbf {x}}); \xi , \varvec{\beta }) = \eta '({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })^{2} {\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\mathbf {M}}(\xi ; \varvec{\beta })^{-1} {\mathbf {f}}({\mathbf {x}}), \end{aligned}$$
(8)

for all \({\mathbf {x}} \in {\mathcal {X}}\). For a canonical link, we have \(\lambda = \eta '\) and hence

$$\begin{aligned} \mathrm {asVar}({\hat{\mu }}({\mathbf {x}}); \xi , \varvec{\beta }) =\lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })^{2} {\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\mathbf {M}}(\xi ; \varvec{\beta })^{-1} {\mathbf {f}}({\mathbf {x}}). \end{aligned}$$
(9)

The integrated mean-squared error (IMSE) is then defined as the average prediction variance

$$\begin{aligned} \mathrm {IMSE}(\xi ; \varvec{\beta },\nu )= & {} \int \mathrm {asVar}({\hat{\mu }}({\mathbf {x}}); \xi , \varvec{\beta })\,\nu (\mathrm {d} {\mathbf {x}}) \nonumber \\= & {} \int \lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })^{2} {\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\mathbf {M}}(\xi ; \varvec{\beta })^{-1} {\mathbf {f}}({\mathbf {x}})\, \nu (\mathrm {d} {\mathbf {x}}) \end{aligned}$$
(10)

with respect to a given standardized measure \(\nu \) on the experimental region \({\mathcal {X}}\) (\(\nu ({\mathcal {X}}) = 1\)).

Following a standard way of expressing the IMSE-criterion (see, e.g., Li and Deng [26]), the asymptotic variance can be rewritten as

$$\begin{aligned} \mathrm {asVar}({\hat{\mu }}({\mathbf {x}}); \xi , \varvec{\beta }) = \mathrm {trace}(\lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })^{2} {\mathbf {f}}({\mathbf {x}}) {\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\mathbf {M}}(\xi ; \varvec{\beta })^{-1}) . \end{aligned}$$
(11)

Hence, the IMSE is given by

$$\begin{aligned} \mathrm {IMSE}(\xi ; \varvec{\beta }, \nu ) = \mathrm {trace}({\mathbf {V}}(\varvec{\beta }; \nu )\,{\mathbf {M}}(\xi ; \varvec{\beta })^{-1}) , \end{aligned}$$
(12)

where

$$\begin{aligned} {\mathbf {V}}(\varvec{\beta }; \nu ) = \int \lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })^{2} {\mathbf {f}}({\mathbf {x}}) {\mathbf {f}}({\mathbf {x}})^\mathsf{T}\,\,\nu (\mathrm {d} {\mathbf {x}}) \end{aligned}$$
(13)

denotes a weighted “moment” matrix with respect to the measure \(\nu \). Note that the leading term under the integral in \({\mathbf {V}}(\varvec{\beta };\nu )\) differs from that in the virtual information matrix \({\mathbf {M}}(\nu ; \varvec{\beta })\) by replacing the intensity \(\lambda \) by \(\lambda ^2\). Moreover, in contrast with the D-criterion, the IMSE-criterion does not solely depend on the information matrix \({\mathbf {M}}(\xi ; \varvec{\beta })\), but also depends through the weighting matrix \({\mathbf {V}}(\varvec{\beta }; \nu )\) explicitly on the parameter vector \(\varvec{\beta }\) and additionally on the measure \(\nu \) as a supplementary argument. The IMSE-criterion is thus defined by \(\Phi ({\mathbf {M}}; \varvec{\beta }, \nu ) = \mathrm {trace}({\mathbf {V}}(\varvec{\beta }; \nu )\,{\mathbf {M}}^{-1})\). A design \(\xi ^*\) is then called locally IMSE-optimal with respect to \(\nu \) at \(\varvec{\beta }\) when \(\mathrm {trace}({\mathbf {V}}(\varvec{\beta }; \nu ) {\mathbf {M}}(\xi ^*; \varvec{\beta })^{-1}) = \min \mathrm {trace}({\mathbf {V}}(\varvec{\beta }; \nu ) {\mathbf {M}}(\xi ; \varvec{\beta })^{-1})\).
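The trace representation (12)–(13) is straightforward to evaluate for a discrete weighting measure \(\nu \). The following sketch (ours; names are chosen for illustration) computes the local IMSE-criterion for the gamma model with inverse link; note the squared intensity entering \({\mathbf {V}}\), in contrast with \({\mathbf {M}}\).

```python
# Sketch (our notation) of the local IMSE-criterion (12)-(13) for a
# discrete weighting measure nu with atoms x_nu and weights w_nu.
import numpy as np

def imse(X, w, x_nu, w_nu, beta, f, kappa=1.0):
    p = len(beta)
    M = np.zeros((p, p))
    for x, wi in zip(X, w):
        fx = f(x)
        lam = kappa / (fx @ beta)**2       # intensity (5)
        M += wi * lam * np.outer(fx, fx)
    V = np.zeros((p, p))
    for x, wi in zip(x_nu, w_nu):
        fx = f(x)
        lam = kappa / (fx @ beta)**2
        V += wi * lam**2 * np.outer(fx, fx)  # squared intensity, eq. (13)
    return np.trace(V @ np.linalg.inv(M))    # eq. (12)

f = lambda x: np.array([1.0, x])
val = imse([0.0, 1.0], [0.5, 0.5], [0.0, 1.0], [0.5, 0.5],
           np.array([1.0, 2.0]), f)
```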

To avoid the parameter dependence of an optimal design under local criteria, we will also consider as “robust” alternatives maximin efficiency criteria, which are also called standardized optimality criteria (see Dette et al. [10]). For this, we first have to introduce the concept of efficiency. Let the local criterion \(\Phi _{\varvec{\beta }}\) at \(\varvec{\beta }\) depend homogeneously on the information matrix, i.e., \(\Phi _{\varvec{\beta }}(\xi ) = \phi ({\mathbf {M}}(\xi ; \varvec{\beta }))\) for some function \(\phi \) on the set of positive definite matrices satisfying \(\phi (c {\mathbf {M}}) = c^{-1} \phi ({\mathbf {M}})\) for \(c>0\) (cf. Pukelsheim [33, ch. 5], for the related concept of information functions). Then, the efficiency of a design \(\xi \) (locally at \(\varvec{\beta }\)) is defined by

$$\begin{aligned} \mathrm {eff}(\xi ; \varvec{\beta }) = \frac{\Phi _{\varvec{\beta }}(\xi _{\varvec{\beta }}^*)}{\Phi _{\varvec{\beta }}(\xi )}, \end{aligned}$$

where \(\xi _{\varvec{\beta }}^*\) is the \(\Phi _{\varvec{\beta }}\)-optimal design (locally at \(\varvec{\beta }\)). Maximin efficiency then aims at maximizing the worst efficiency \(\inf _{\varvec{\beta } \in {\mathcal {B}}^{\prime }} \mathrm {eff}(\xi ; \varvec{\beta })\) over a given subset \({\mathcal {B}}^{\prime }\) of interest of the parameter region \({\mathcal {B}}\). In order to arrive at a minimization problem, we define the maximin efficiency criterion by the inverse relation

$$\begin{aligned} \Phi (\xi ) = \sup _{\varvec{\beta } \in {\mathcal {B}}^{\prime }} \frac{\Phi _{\varvec{\beta }}(\xi )}{\Phi _{\varvec{\beta }}(\xi _{\varvec{\beta }}^*)} . \end{aligned}$$
(14)

Note that \(\Phi \) is convex if, for all \(\varvec{\beta }\), the local criteria \(\Phi _{\varvec{\beta }}\) are convex.

For maximin D-efficiency, we have to choose the homogeneous version \(\Phi _{\varvec{\beta }}(\xi )=(\det ({\mathbf {M}}(\xi ;\varvec{\beta })))^{-1/p}\) of the local D-criterion (see [33, ch. 6]) to get the maximin D-efficiency criterion

$$\begin{aligned} \Phi _{\textit{D-ME}}(\xi ) = \sup _{\varvec{\beta } \in {\mathcal {B}}^{\prime }} \left( \frac{\det ({\mathbf {M}}(\xi ; \varvec{\beta }))}{\det ({\mathbf {M}}(\xi _{\varvec{\beta }}^*; \varvec{\beta }))} \right) ^{ - 1 / p} , \end{aligned}$$

where \(\xi _{\varvec{\beta }}^*\) denotes the locally D-optimal design at \(\varvec{\beta }\). The D-efficiency can then be interpreted as the proportion of observations required under the D-optimal design \(\xi _{\varvec{\beta }}^*\) to obtain the same value of the determinant as for design \(\xi \). For example, an efficiency of 0.5 means that with a D-optimal design \(\xi _{\varvec{\beta }}^*\) only half as many observations as for \(\xi \) are necessary to get the same precision. A design \(\xi ^*\) is then called maximin D-efficient on \({\mathcal {B}}^{\prime }\) when \(\Phi _{\textit{D-ME}}(\xi ^*) = \min \Phi _{\textit{D-ME}}(\xi )\).
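In practice, the worst-case D-efficiency can be approximated on a finite grid standing in for \({\mathcal {B}}^{\prime }\). The following sketch is ours: it presupposes a user-supplied map `xi_star_of` returning the locally D-optimal design at each grid point, which for the gamma model with simple linear regression on [0, 1] is the equal-weight two-point design quoted in Sect. 3.

```python
# Hedged sketch: worst-case D-efficiency over a finite parameter grid.
import numpy as np

f = lambda x: np.array([1.0, x])

def info(X, w, beta):
    return sum(wi / (f(x) @ beta)**2 * np.outer(f(x), f(x))
               for x, wi in zip(X, w))

def worst_d_eff(X, w, betas, xi_star_of):
    effs = []
    for beta in betas:
        Xs, ws = xi_star_of(beta)          # locally D-optimal design
        ratio = (np.linalg.det(info(X, w, beta))
                 / np.linalg.det(info(Xs, ws, beta)))
        effs.append(ratio**(1.0 / len(beta)))
    return min(effs)

# for the gamma model on [0, 1]: equal weights at the endpoints
xi_star_of = lambda beta: ([0.0, 1.0], [0.5, 0.5])
grid = [np.array([1.0, b1]) for b1 in (-0.5, 0.0, 1.0, 5.0)]
eff = worst_d_eff([0.0, 1.0], [0.3, 0.7], grid, xi_star_of)
```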

The local IMSE-criterion is already homogeneous because it is a linear criterion. Thus, the maximin IMSE-efficiency criterion can be defined directly as

$$\begin{aligned} \Phi _{\textit{IMSE-ME}}(\xi ;\nu ) = \sup _{\varvec{\beta } \in {\mathcal {B}}^{\prime }} \frac{\mathrm {IMSE}(\xi ; \varvec{\beta }, \nu )}{\mathrm {IMSE}(\xi ^*_{\varvec{\beta }}; \varvec{\beta }, \nu )}, \end{aligned}$$

where \(\xi _{\varvec{\beta }}^*\) denotes the locally IMSE-optimal design at \(\varvec{\beta }\). A design \(\xi ^*\) is then called maximin IMSE-efficient with respect to \(\nu \) on \({\mathcal {B}}^{\prime }\) when \(\Phi _{\textit{IMSE-ME}}(\xi ^*; \nu ) = \min \Phi _{\textit{IMSE-ME}}(\xi ; \nu )\).

In particular, for the gamma model with inverse link we have \(\lambda (z)=\kappa /z^2\) (see (5)) which implies that

$$\begin{aligned} {\mathbf {M}}(\xi ;\varvec{\beta }) = \sum _{i=1}^m w_i \kappa ({\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}\varvec{\beta })^{ - 2}\, {\mathbf {f}}({\mathbf {x}}_i)\,{\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T} \end{aligned}$$
(15)

and

$$\begin{aligned} {\mathbf {V}}(\varvec{\beta };\nu ) = \int \kappa ^2({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })^{ - 4} {\mathbf {f}}({\mathbf {x}}){\mathbf {f}}({\mathbf {x}})^\mathsf{T}\,\,\nu (\mathrm {d} {\mathbf {x}}) . \end{aligned}$$
(16)

Hence, in both the D- and the IMSE-criterion the shape parameter \(\kappa \) occurs only as a factor which does not affect the optimization problem. Without loss of generality, we may thus assume \(\kappa = 1\) in the remainder of the text.

3 Equivariance

Invariance and equivariance play an important role for optimal design in linear models. However, these concepts can also be applied in the context of generalized linear models as established in Radloff and Schwabe [34].

The essential idea of equivariance in the design setup is to transfer an already known optimal design on a given (standardized) experimental region to another experimental region of interest by a suitable transformation while keeping the model structure unchanged. The most prominent approach of this kind is the method of canonical transformation advocated by Ford et al. [13].

Throughout we accompany each conceptual step by a simple running example (Example 1). We start with a one-to-one transformation \(g:\,{\mathcal {X}} \rightarrow {\mathcal {Z}}\) which maps the experimental region \({\mathcal {X}}\) onto a potentially different region \({\mathcal {Z}}\).

Example 1

Let \({\mathcal {X}} = [0, 1]\) be the one-dimensional standard unit interval and \({\mathcal {Z}} = [a, b]\) another non-degenerate interval, \(b > a\). Then, the shift and scale transformation \(g(x) = a + c x\), where \(c = b - a\), maps \({\mathcal {X}}\) onto \({\mathcal {Z}}\). \(\square \)

The next ingredient connects the transformation g with the vector of regression functions: \({\mathbf {f}}\) is said to be linearly equivariant with respect to g if there exists a (nonsingular) matrix \({\mathbf {Q}}_g\) such that \({\mathbf {f}}(g({\mathbf {x}})) = {\mathbf {Q}}_g {\mathbf {f}}({\mathbf {x}})\) for all \({\mathbf {x}}\in {\mathcal {X}}\), which will be assumed to hold throughout the remainder of this text.

Example

(Example 1 continued) Let \({\mathbf {f}}(x) = (1, x)^\mathsf{T}\) be the vector of regression functions for a simple one-dimensional linear regression, \(p = 2\), such that the linear component is \({\mathbf {f}}(x)^\mathsf{T}\varvec{\beta } = \beta _0 + \beta _1 x\). Then, for \(g(x) = a + c x\) the transformation matrix \({\mathbf {Q}}_g\) is given by

$$\begin{aligned} {\mathbf {Q}}_g = \left( \begin{array}{cc} 1 &{} 0 \\ a &{} c \end{array} \right) . \end{aligned}$$

\(\square \)

In contrast with the situation in linear models, additionally a transformation \({\tilde{g}}:\,{\mathcal {B}} \rightarrow \tilde{{\mathcal {B}}}\) of the parameter vector \(\varvec{\beta }\) is required in the present setup of generalized linear models. This approach of equivariance with respect to a pair \((g, {\tilde{g}})\) of transformations of the settings \({\mathbf {x}}\) and the parameters \(\varvec{\beta }\), respectively, is in accordance with the general concept of equivariance in statistical analysis (see, e.g., Lehmann [24, ch. 6]).

A natural choice for the transformation \({\tilde{g}}\) is a reparameterization which leaves the value of the linear component unchanged, \({\mathbf {f}}(g({\mathbf {x}}))^\mathsf{T}{\tilde{g}}(\varvec{\beta }) = {\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta }\) for all \({\mathbf {x}} \in {\mathcal {X}}\). This is accomplished by setting \({\tilde{g}}(\varvec{\beta }) = {\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta }\), where “\(\cdot ^{-\mathsf{T}}\)” denotes the inverse of the transpose of a matrix, and \(\tilde{{\mathcal {B}}} = {\tilde{g}}({\mathcal {B}})\). For convenience, we denote \({\tilde{g}}(\varvec{\beta })\) by \(\tilde{\varvec{\beta }} = ({\tilde{\beta }}_0, \ldots , {\tilde{\beta }}_{p - 1})^\mathsf{T}\).

Example

(Example 1 continued) For \(g(x) = a + c x\) and simple linear regression \({\mathbf {f}}(x) = (1, x)^\mathsf{T}\), the transformation matrix for the parameter vector is

$$\begin{aligned} {\mathbf {Q}}_g^{-\mathsf{T}} = \left( \begin{array}{cc} 1 &{} - a / c \\ 0 &{} 1 / c \end{array} \right) , \end{aligned}$$

and the transformation \({\tilde{g}}(\varvec{\beta })={\mathbf {Q}}_g^{-\mathsf{T}}\varvec{\beta }\) results in \(\tilde{\beta _0} = \beta _0 - a \beta _1 / c\) and \(\tilde{\beta _1} = \beta _1 / c\). If, for a given value of \(\varvec{\beta } = (\beta _0, \beta _1)^\mathsf{T}\), the pair \((g, {\tilde{g}})\) is chosen in such a way that \(\tilde{\varvec{\beta }} = (0, 1)^\mathsf{T}\), i.e., \(c = \beta _1\) and \(a = \beta _0\), then g essentially represents the canonical transformation used in Ford et al. [13].

For the gamma model, the parameter region \({\mathcal {B}}\) is restricted by the constraint that the linear component \({\mathbf {f}}(x)^\mathsf{T}\varvec{\beta } = \beta _0 + \beta _1 x\) is positive for all \(x\in {\mathcal {X}} = [0, 1]\). Hence, the maximal parameter region is \({\mathcal {B}} = \{\varvec{\beta };\,\beta _0> 0, \beta _1 > - \beta _0\}\) which is displayed in Fig. 2 in Sect. 4. The transformed parameter region is then \(\tilde{{\mathcal {B}}} = {\tilde{g}}({\mathcal {B}}) = \{\tilde{\varvec{\beta }};\, \tilde{\beta _0} + \tilde{\beta _1} a> 0, \tilde{\beta _0} + \tilde{\beta _1} b > 0\}\). In particular, to obtain the symmetric unit interval \({\mathcal {Z}} = [ - 1, 1]\) as secondary experimental region, the transformation \(g(x) = 2 x - 1\) is to be chosen with \(a = - 1\) and \(c = 2\), and the transformed parameter region becomes \(\tilde{{\mathcal {B}}} = \{\tilde{\varvec{\beta }};\, |\tilde{\beta _1}| < \tilde{\beta _0}\}\). \(\square \)
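The following small numerical check (our illustration) confirms, for the transformation onto \({\mathcal {Z}} = [-1, 1]\), both the linear equivariance \({\mathbf {f}}(g(x)) = {\mathbf {Q}}_g {\mathbf {f}}(x)\) and the invariance of the linear component under \(\tilde{\varvec{\beta }} = {\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta }\).

```python
# Numerical sketch of Example 1: f(g(x)) = Q_g f(x), and the
# reparameterization tilde(beta) = Q_g^{-T} beta leaves the
# linear component unchanged; g maps [0, 1] onto [-1, 1].
import numpy as np

f = lambda x: np.array([1.0, x])
a, c = -1.0, 2.0                         # g(x) = a + c x, c = b - a
Q_g = np.array([[1.0, 0.0],
                [a,   c ]])
beta = np.array([1.5, -0.5])             # in B: beta0 > 0, beta1 > -beta0
beta_t = np.linalg.inv(Q_g).T @ beta     # tilde(beta)
for x in np.linspace(0.0, 1.0, 5):
    assert np.allclose(f(a + c * x), Q_g @ f(x))
    assert np.isclose(f(a + c * x) @ beta_t, f(x) @ beta)
# beta_t satisfies |tilde(beta)_1| < tilde(beta)_0, as required on [-1, 1]
```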

Note that for each pair \((g, {\tilde{g}})\) of transformations the mean response and the intensity remain unchanged, \(\mu (g({\mathbf {x}}); {\tilde{g}}(\varvec{\beta })) = \mu ({\mathbf {x}}; \varvec{\beta })\) and \(\lambda ({\mathbf {f}}(g({\mathbf {x}}))^\mathsf{T}{\tilde{g}}(\varvec{\beta })) = \lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })\). With this in mind, we study how these transformations act on a design and its information matrix: For a design \(\xi \) with support points \({\mathbf {x}}_i\) and corresponding weights \(w_i\), \(i = 1, \ldots , m\), we denote by \(\xi ^g\) its image under the transformation g, i.e., \(\xi ^g\) has support points \({\mathbf {z}}_i = g({\mathbf {x}}_i)\) with weights \(w_i\), \(i = 1, \ldots , m\), respectively, and is hence a design on \({\mathcal {Z}}\). Then, for the associated information matrices we obtain

$$\begin{aligned} {\mathbf {M}}(\xi ^{g}; {\tilde{g}}(\varvec{\beta }))= & {} \sum _{i=1}^m w_i \lambda ({\mathbf {f}}(g({\mathbf {x}}_i))^\mathsf{T}{\tilde{g}}(\varvec{\beta })) {\mathbf {f}}(g({\mathbf {x}}_i)) {\mathbf {f}}(g({\mathbf {x}}_i))^\mathsf{T}\nonumber \\= & {} \sum _{i=1}^m w_i \lambda ({\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}\varvec{\beta }) {\mathbf {Q}}_{g} {\mathbf {f}}({\mathbf {x}}_i) {\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}{\mathbf {Q}}^\mathsf{T}_{g} \nonumber \\= & {} {\mathbf {Q}}_{g} \left( \sum _{i=1}^m w_i \lambda ({\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}\varvec{\beta }) {\mathbf {f}}({\mathbf {x}}_i) {\mathbf {f}}({\mathbf {x}}_i)^\mathsf{T}\right) {\mathbf {Q}}^\mathsf{T}_{g} = {\mathbf {Q}}_{g} {\mathbf {M}}(\xi ; \varvec{\beta }) {\mathbf {Q}}^\mathsf{T}_{g} \nonumber \\ \end{aligned}$$
(17)

(see Radloff and Schwabe [34]). In short, the pair \((g, {\tilde{g}})\) of simultaneous transformations induces the transformation \({\mathbf {M}}(\xi ; \varvec{\beta }) \rightarrow {\mathbf {Q}}_g {\mathbf {M}}(\xi ; \varvec{\beta }) {\mathbf {Q}}_g^\mathsf{T}\) of the information matrix.

Example

(Example 1 continued) Let \(\xi \) be supported on the endpoints \(x_1 = 0\) and \(x_2=1\) of the experimental region \({\mathcal {X}} = [0, 1]\) with corresponding weights \(w_1 = 1 - w\) and \(w_2 = w\), respectively. For the gamma model with simple linear regression, \({\mathbf {f}}(x) = (1, x)^\mathsf{T}\), denote by \(\lambda _0 = \lambda (\beta _0)\) and \(\lambda _1 = \lambda (\beta _0 + \beta _1)\) the intensities at the support points 0 and 1. The information matrix of \(\xi \) is given by

$$\begin{aligned} {\mathbf {M}}(\xi ; \varvec{\beta }) = \left( \begin{array}{cc} (1 - w) \lambda _0 + w \lambda _1 &{} w \lambda _1 \\ w \lambda _1 &{} w \lambda _1 \end{array} \right) . \end{aligned}$$

For \(g(x) = a + c x\), the induced design \(\xi ^g\) is supported on the endpoints \(z_1 = a\) and \(z_2 = b\) of the induced experimental region \({\mathcal {Z}} = [a, b]\) with weights \(1 - w\) at a and w at b. Under \(\tilde{\varvec{\beta }} = {\tilde{g}}(\varvec{\beta })\), the intensities at a and b are \(\lambda _0\) and \(\lambda _1\), respectively, and the information matrix of \(\xi ^g\) is

$$\begin{aligned} {\mathbf {M}}(\xi ^g; {\tilde{g}}(\varvec{\beta })) = \left( \begin{array}{cc} (1 - w) \lambda _0 + w \lambda _1 &{} (1 - w) \lambda _0 a + w \lambda _1 b \\ (1 - w) \lambda _0 a + w \lambda _1 b &{} (1 - w) \lambda _0 a^2 + w \lambda _1 b^2 \end{array}\right) = {\mathbf {Q}}_g {\mathbf {M}}(\xi ; \varvec{\beta }) {\mathbf {Q}}_g^\mathsf{T}. \end{aligned}$$

\(\square \)
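The transformation rule (17) can likewise be verified numerically; the following sketch (ours, with an arbitrarily chosen interval and parameter value) does so for a two-point design in the gamma model with \(\kappa = 1\).

```python
# Numerical check (our sketch) of (17) for the two-point design on
# {0, 1} with weights 1 - w and w, transferred to [a, b] = [2, 5].
import numpy as np

f = lambda x: np.array([1.0, x])
lam = lambda z: 1.0 / z**2                 # intensity (5), kappa = 1

def info(X, w, beta):
    return sum(wi * lam(f(x) @ beta) * np.outer(f(x), f(x))
               for x, wi in zip(X, w))

a, c = 2.0, 3.0                            # g(x) = a + c x
Q_g = np.array([[1.0, 0.0], [a, c]])
beta = np.array([1.0, 0.5])                # linear component > 0 on [0, 1]
beta_t = np.linalg.inv(Q_g).T @ beta       # tilde(g)(beta)
X, w = [0.0, 1.0], [0.4, 0.6]
Xg = [a + c * x for x in X]                # support of xi^g
assert np.allclose(info(Xg, w, beta_t), Q_g @ info(X, w, beta) @ Q_g.T)
```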

The final step is the equivariance of the criterion \(\Phi \). In analogy to the terminology in Heiligers and Schneider [19] for linear models, we will call a convex optimality criterion \(\Phi \) equivariant with respect to a transformation g, if \(\Phi \) preserves the ordering under the transformation g, i.e., for any two designs \(\xi _1\) and \(\xi _2\) the relation \(\Phi (\xi _1) \le \Phi (\xi _2)\) implies \(\Phi (\xi _1^g) \le \Phi (\xi _2^g)\).

In the present situation of generalized linear models, more care has to be taken, since in addition the parameter vector \(\varvec{\beta }\) and possibly some supplementary arguments have to be changed in the criterion during the transformation. We therefore introduce a second criterion function \(\Phi ^{\prime } = \Phi _{g, {\tilde{g}}}\) for the designs on \({\mathcal {Z}}\) which may depend on the transformations g and \({\tilde{g}}\). Then, we will call a pair of criteria \(\Phi \) and \(\Phi ^{\prime }\) equivariant with respect to the pair \((g, {\tilde{g}})\) of transformations when the ordering is preserved, i.e., when the relation \(\Phi (\xi _1) \le \Phi (\xi _2)\) implies \(\Phi ^{\prime }(\xi _1^g) \le \Phi ^{\prime }(\xi _2^g)\).

With these definitions, we obtain the following result: in the case of equivariance, the optimality of designs is preserved under transformation.

Theorem 1

Let the pair of criteria \(\Phi \) and \(\Phi ^{\prime }\) be equivariant with respect to the pair \((g, {\tilde{g}})\) of transformations. If \(\xi ^*\) is \(\Phi \)-optimal, then its image \((\xi ^*)^g\) is \(\Phi ^{\prime }\)-optimal.

We will now establish that the D- and IMSE-criteria are equivariant, if simultaneously the parameter vector \(\varvec{\beta }\) and potential supplementary arguments are transformed. By (17), we obtain for the D-criterion

$$\begin{aligned} \det ({\mathbf {M}}(\xi ^g; {\tilde{g}}(\varvec{\beta }))^{-1}) = \det ({\mathbf {Q}}_g)^{-2} \det ({\mathbf {M}}(\xi ; \varvec{\beta })^{-1}). \end{aligned}$$
(18)

Let \(\Phi \) be the local D-criterion at \(\varvec{\beta }\) and \(\Phi ^{\prime }\) the local D-criterion at \({\tilde{g}}(\varvec{\beta })\). Then the D-criterion is equivariant under simultaneous transformation of \(\varvec{\beta }\), and by Theorem 1 the locally D-optimal design can be transferred.

Corollary 1

If \(\xi ^*\) is locally D-optimal on \({\mathcal {X}}\) at \(\varvec{\beta }\), then \((\xi ^*)^g\) is locally D-optimal on \({\mathcal {Z}}\) at \(\tilde{\varvec{\beta }} = {\tilde{g}}(\varvec{\beta })\).

Example

(Example 1 continued) For the gamma model with simple linear regression, \({\mathbf {f}}(x) = (1, x)^\mathsf{T}\), the locally D-optimal design \(\xi ^{*}\) on the unit interval \({\mathcal {X}} = [0, 1]\) is supported by the endpoints \(x_1 = 0\) and \(x_2 = 1\) and assigns equal weights \(w^* = 1/2\) to these endpoints for any value of the parameter vector \(\varvec{\beta }\in {\mathcal {B}}\) (see Gaffke et al. [14]). Then, for any other interval \({\mathcal {Z}} = [a, b]\) as the experimental region we may consider the transformation \(g(x) = a + c x\), \(c = b - a\), together with \({\tilde{g}}(\varvec{\beta }) = {\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta }\). By Corollary 1, the design \((\xi ^*)^g\) which assigns equal weights \(w^* = 1/2\) to the endpoints \(z_1 = a\) and \(z_2 = b\) of the experimental region \({\mathcal {Z}}\) is locally D-optimal for any value of the parameter vector \(\tilde{\varvec{\beta }} = {\tilde{g}}(\varvec{\beta }) \in \tilde{{\mathcal {B}}} = {\tilde{g}}({\mathcal {B}})\). \(\square \)

In the situation of Example 1, the locally D-optimal design does not depend on the parameter \(\varvec{\beta }\). This will typically not hold true, if the underlying model for the linear component becomes more complex.

Example 2

We consider the gamma model with the linear component \({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta } = \beta _0 + \beta _1 x_1 + \beta _2 x_2\), that is, multiple linear regression in two covariates, \({\mathbf {x}} = (x_1,x_2)^\mathsf{T}\), where \({\mathbf {f}}({\mathbf {x}}) = (1, x_1, x_2)^\mathsf{T}\), \(p = 3\), with the unit square \({\mathcal {X}} = [0, 1]^2\) as the experimental region. Denote by \({\mathbf {x}}_1 = (0, 0)^\mathsf{T}\), \({\mathbf {x}}_2 = (1, 0)^\mathsf{T}\), \({\mathbf {x}}_3 = (0, 1)^\mathsf{T}\) and \({\mathbf {x}}_4 = (1, 1)^\mathsf{T}\) the vertices of \({\mathcal {X}}\). The parameter region \({\mathcal {B}}\) is the set of all parameter vectors \(\varvec{\beta } = (\beta _0, \beta _1, \beta _2)^\mathsf{T}\) such that the linear component at \({\mathbf {x}}_1, \ldots , {\mathbf {x}}_4\) is positive, i.e., \(\beta _0 > 0\), \(\beta _0 + \beta _1 > 0\), \(\beta _0 + \beta _2 > 0\), and \(\beta _0 + \beta _1 + \beta _2 > 0\). This region, depicted in the left panel of Fig. 1, constitutes a cone in the three-dimensional Euclidean space.

According to Burridge and Sebastiani [4], the minimally supported design \(\xi ^*\) which assigns equal weights \(w_i^* = 1/3\) to the support points \({\mathbf {x}}_i\), \(i = 1, 2, 3\), is locally D-optimal at \(\varvec{\beta }\), when \(\varvec{\beta }\) satisfies \(\beta _0^2 - \beta _1 \beta _2 \le 0\). The subset \({\mathcal {B}}_1\) of these \(\varvec{\beta }\) in \({\mathcal {B}}\) is shown in the right panel of Fig. 1.

Now equivariance can be used to find D-optimal designs for other parameter values different from those in \({\mathcal {B}}_1\). For this, we use transformations which map the experimental region onto itself, \({\mathcal {Z}} = {\mathcal {X}}\):

$$\begin{aligned} g_2({\mathbf {x}}) = (1 - x_1, 1 - x_2)^\mathsf{T}, \quad g_3({\mathbf {x}}) = (1 - x_1, x_2)^\mathsf{T}\quad \mathrm { and } \quad g_4({\mathbf {x}}) = (x_1, 1 - x_2)^\mathsf{T}. \end{aligned}$$
(19)

Here \(g_3\) and \(g_4\) represent the reflections with respect to the first and second covariate \(x_1\) and \(x_2\), respectively, and \(g_2\) is the simultaneous reflection with respect to both covariates. Alternatively, \(g_2\) can also be described as a rotation by 180 degrees. We also introduce \(g_1 = \mathrm {id}\) as the identity mapping.

The regression function \({\mathbf {f}}({\mathbf {x}})=(1,x_1,x_2)^\mathsf{T}\) is linearly equivariant with respect to these transformations with corresponding matrices

$$\begin{aligned} {\mathbf {Q}}_{g_2} = \left( \begin{array}{rrr} 1 &{} 0 &{} 0 \\ 1 &{} - 1 &{} 0 \\ 1 &{} 0 &{} - 1 \end{array}\right) , \quad {\mathbf {Q}}_{g_3} = \left( \begin{array}{rrr} 1 &{} 0 &{} 0 \\ 1 &{} - 1 &{} 0 \\ 0 &{} 0 &{} 1 \end{array}\right) ,\quad {\mathbf {Q}}_{g_4} = \left( \begin{array}{rrr} 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} - 1 \end{array}\right) . \end{aligned}$$

For each \(g_k\), \(k = 2, 3, 4\), the matrix \({\mathbf {Q}}_{g_k}\) is self-inverse, so the corresponding parameter transformation is given by \({\tilde{g}}_k(\varvec{\beta }) = {\mathbf {Q}}_{g_k}^{-\mathsf{T}} \varvec{\beta } = {\mathbf {Q}}_{g_k}^{\mathsf{T}} \varvec{\beta }\). Because \(g_k\) maps the experimental region \({\mathcal {X}}\) onto itself, the related transformation \({\tilde{g}}_k\) also maps the parameter region onto itself, \(\tilde{{\mathcal {B}}} = {\mathcal {B}}\).

Starting from the parameter subregion \({\mathcal {B}}_1\), where the design \(\xi ^*\) is locally D-optimal, we can define parameter subregions \({\mathcal {B}}_k = {\tilde{g}}_k({\mathcal {B}}_1)\) induced by the transformations \(g_k\), \(k = 2, 3, 4\). These subregions are characterized explicitly in Table 1 by the inequalities in the last column, and they are also shown in the right panel of Fig. 1. All these subregions constitute cones. Now by equivariance we can conclude that the designs \(\xi ^*_k = (\xi ^*)^{g_k}\) are locally D-optimal at \(\varvec{\beta }\) for \(\varvec{\beta } \in {\mathcal {B}}_k\). The results are explicitly stated in Table 1.

Note that the same optimal designs have been obtained before in Idais [20] by a straightforward application of the celebrated Kiefer–Wolfowitz equivalence theorem (see, e.g., Silvey [39]). Further note that the interior region shown in the right panel of Fig. 1 contains those values for the parameter vector \(\varvec{\beta }\) for which locally D-optimal designs are supported on all four vertices and the corresponding weights depend on the values of \(\varvec{\beta }\) (see Idais [20]). \(\square \)
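The transfer of \(\xi ^*\) by the reflections (19) can be reproduced in a few lines; the sketch below (our illustration, with a parameter value chosen inside \({\mathcal {B}}_1\)) computes the transformed parameters \({\tilde{g}}_k(\varvec{\beta })\) and the supports of the transferred designs, and checks the linear equivariance of \({\mathbf {f}}\).

```python
# Sketch of Example 2: transferring the minimally supported design
# xi* (equal weights 1/3 at (0,0), (1,0), (0,1)) by the reflections
# (19); beta below satisfies the condition beta0^2 - beta1*beta2 <= 0,
# so xi* is locally D-optimal at beta and (xi*)^{g_k} at Q_{g_k}^T beta.
import numpy as np

f = lambda x: np.array([1.0, x[0], x[1]])
g = {2: lambda x: (1 - x[0], 1 - x[1]),
     3: lambda x: (1 - x[0], x[1]),
     4: lambda x: (x[0], 1 - x[1])}
Q = {2: np.array([[1., 0., 0.], [1., -1., 0.], [1., 0., -1.]]),
     3: np.array([[1., 0., 0.], [1., -1., 0.], [0., 0., 1.]]),
     4: np.array([[1., 0., 0.], [0., 1., 0.], [1., 0., -1.]])}

beta = np.array([1.0, 2.0, 3.0])         # in B_1: 1 - 6 <= 0
xi_star = [(0, 0), (1, 0), (0, 1)]       # support of xi*, weights 1/3 each

for k in (2, 3, 4):
    beta_k = Q[k].T @ beta               # tilde(g)_k(beta), lies in B_k
    support_k = [g[k](x) for x in xi_star]
    for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        assert np.allclose(f(g[k](x)), Q[k] @ f(x))
```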

Fig. 1 Parameter region \({\mathcal {B}}\) in the two-factor gamma model on \([0,1]^2\) (left panel); subregions \({\mathcal {B}}_1, \dots , {\mathcal {B}}_4\) of parameters for minimally supported locally D-optimal designs \(\xi _1^*\), \(\dots \), \(\xi _4^*\) (right panel)

Table 1 Minimally supported locally D-optimal designs and optimality regions for the two-factor gamma model on \([0, 1]^2\)

Next we investigate equivariance for the IMSE-criterion. There, additionally the supplementary argument, the weighting measure \(\nu \), has to be transformed. Similar to the information matrix in (17), the weighting matrix \({\mathbf {V}}\) is equivariant under the transformations g and \({\tilde{g}}\),

$$\begin{aligned} {\mathbf {V}}({\tilde{g}}(\varvec{\beta }); \nu ^g)&= \int \lambda ({\mathbf {f}}({\mathbf {z}})^\mathsf{T}{\tilde{g}}({\varvec{\beta }}))^{2} {\mathbf {f}}({\mathbf {z}}) {\mathbf {f}}({\mathbf {z}})^\mathsf{T}\, \nu ^g(\mathrm {d} {\mathbf {z}}) \nonumber \\&= \int \lambda ({\mathbf {f}}(g({\mathbf {x}}))^\mathsf{T}{\tilde{g}}({\varvec{\beta }}))^{2} {\mathbf {f}}(g({\mathbf {x}})) {\mathbf {f}}(g({\mathbf {x}}))^\mathsf{T}\, \nu (\mathrm {d} {\mathbf {x}}) \nonumber \\&= \int \lambda ({\mathbf {f}}(g({\mathbf {x}}))^\mathsf{T}{\tilde{g}}({\varvec{\beta }}))^{2} {\mathbf {Q}}_g {\mathbf {f}}({\mathbf {x}}) {\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\mathbf {Q}}_g^\mathsf{T}\, \nu (\mathrm {d} {\mathbf {x}}) \nonumber \\&= {\mathbf {Q}}_g \left( \int \lambda ({\mathbf {f}}(g({\mathbf {x}}))^\mathsf{T}{\tilde{g}}({\varvec{\beta }}))^{2} {\mathbf {f}}({\mathbf {x}}) {\mathbf {f}}({\mathbf {x}})^\mathsf{T}\, \nu (\mathrm {d}{\mathbf {x}})\right) {\mathbf {Q}}_g^\mathsf{T}= {\mathbf {Q}}_g {\mathbf {V}}(\varvec{\beta }; \nu ) {\mathbf {Q}}_g^\mathsf{T}, \end{aligned}$$
(20)

in the case of a generalized linear model with canonical link. This implies

$$\begin{aligned} \mathrm {IMSE}(\xi ^g; {\tilde{g}}(\varvec{\beta }), \nu ^g)= & {} \mathrm {trace}({\mathbf {Q}}_g {\mathbf {V}}(\varvec{\beta }; \nu ) {\mathbf {Q}}_g^\mathsf{T}({\mathbf {Q}}_g{\mathbf {M}}(\xi ; \varvec{\beta }) {\mathbf {Q}}_g^{\mathsf{T}})^{-1}) \nonumber \\= & {} \mathrm {trace}({\mathbf {V}}(\varvec{\beta }; \nu ) {\mathbf {M}}(\xi ; \varvec{\beta })^{-1}) =\mathrm {IMSE}(\xi ; \varvec{\beta }, \nu ) . \end{aligned}$$
(21)

Let \(\Phi \) be the local IMSE-criterion at \(\varvec{\beta }\) with respect to \(\nu \) and \(\Phi ^{\prime }\) the local IMSE-criterion at \({\tilde{g}}(\varvec{\beta })\) with respect to \(\nu ^g\). Then the IMSE-criterion is equivariant under simultaneous transformation of \(\varvec{\beta }\) and the supplementary argument \(\nu \), and by Theorem 1 the locally IMSE-optimal design can be transferred.

Corollary 2

If \(\xi ^*\) is locally IMSE-optimal on \({\mathcal {X}}\) at \(\varvec{\beta }\) with respect to \(\nu \), then \((\xi ^*)^g\) is locally IMSE-optimal on \({\mathcal {Z}}\) at \(\tilde{\varvec{\beta }} = {\tilde{g}}(\varvec{\beta })\) with respect to \(\nu ^g\).

Note that the results of Corollaries 1 and 2 hold not only for any generalized linear model, but also, more generally, for all models, where the elemental information matrix is of the form (3) (see, e.g., Schmidt and Schwabe [37], for further examples).

Example

(Example 1 continued) In order to apply the equivariance result of Corollary 2 to the gamma model with simple linear regression, \({\mathbf {f}}(x) = (1, x)^\mathsf{T}\), the locally IMSE-optimal design \(\xi ^{*}\) on the unit interval \({\mathcal {X}} = [0, 1]\) has to be determined first.

Proposition 1

For the one-factor gamma model with simple linear regression \({\mathbf {f}}(x)^\mathsf{T}\varvec{\beta } = \beta _0 + \beta _1 x\) on the experimental region \({\mathcal {X}} = [0, 1]\), locally IMSE-optimal designs can be found which are supported on the endpoints 0 and 1 of the experimental region.

Locally optimal weights \(1 - w^*\) at 0 and \(w^*\) at 1, respectively, are given by

(a) \(1 - w^* = w^* = 1/2\) for \(\nu \) the uniform (Lebesgue) measure on the interval [0, 1],

(b) \(1 - w^* = (\beta _0 + \beta _1) / (2 \beta _0 + \beta _1)\) and \(w^* = \beta _0 / (2 \beta _0 + \beta _1)\) for \(\nu \) the (discrete) uniform measure on the endpoints \(\{0, 1\}\), and

(c) \(1 - w^* = \beta _0 / (2 \beta _0 + \beta _1)\) and \(w^* = (\beta _0 + \beta _1) / (2 \beta _0 + \beta _1)\) for \(\nu \) the one-point measure on the midpoint 1/2 of the design region.

The proof of Proposition 1 is given in the “Appendix.” Note that in Proposition 1 the locally optimal weights may depend on the weighting measure \(\nu \) used. In particular, for the two measures in Proposition 1 (b) and (c), which are concentrated on the endpoints and the midpoint, respectively, the locally optimal weights at 0 and 1 are interchanged. For the continuous uniform measure (Proposition 1 (a)), equal weights, \(w^*=1/2\), are assigned to both endpoints, and the (locally) IMSE-optimal design does not depend on the value of the parameter vector \(\varvec{\beta }\).
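Proposition 1(b) is easy to corroborate numerically; the following sketch (ours, for an arbitrarily chosen parameter value) recovers the optimal weight by a grid search over the IMSE (12) for the discrete uniform measure on \(\{0, 1\}\).

```python
# Numerical sketch supporting Proposition 1(b): a grid search over the
# weight w at x = 1 recovers w* = beta0 / (2 beta0 + beta1).
import numpy as np

f = lambda x: np.array([1.0, x])
lam = lambda z: 1.0 / z**2               # kappa = 1, see Sect. 2

def imse(w, beta):
    M = sum(wi * lam(f(x) @ beta) * np.outer(f(x), f(x))
            for x, wi in zip([0.0, 1.0], [1.0 - w, w]))
    V = sum(0.5 * lam(f(x) @ beta)**2 * np.outer(f(x), f(x))
            for x in [0.0, 1.0])         # nu uniform on {0, 1}
    return np.trace(V @ np.linalg.inv(M))

beta = np.array([1.0, 2.0])
ws = np.linspace(0.001, 0.999, 9999)
w_hat = ws[np.argmin([imse(w, beta) for w in ws])]
assert abs(w_hat - beta[0] / (2 * beta[0] + beta[1])) < 1e-3   # w* = 0.25
```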

Now equivariance can be employed to obtain locally IMSE-optimal designs for any other interval \({\mathcal {Z}} = [a, b]\) as the experimental region. We again use the transformation \(g(x) = a + c x\), \(c = b - a\), together with \({\tilde{g}}(\varvec{\beta }) = {\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta }\). Let \(\xi ^*\) be the locally IMSE-optimal design of Proposition 1 at \(\varvec{\beta }\) with respect to one of the given weighting measures \(\nu \). Then, by Corollary 2, the design \((\xi ^*)^g\) is the locally IMSE-optimal design at \(\tilde{\varvec{\beta }} = {\tilde{g}}(\varvec{\beta })\) with respect to \(\nu ^g\). \(\square \)

In order to obtain locally optimal designs at a given value of \(\tilde{\varvec{\beta }}\) on the transformed design region \({\mathcal {Z}}\), the inverse transformations \(g^{-1}\) of g and \({\tilde{g}}^{-1}(\tilde{\varvec{\beta }}) = {\mathbf {Q}}_g^{\mathsf{T}} \tilde{\varvec{\beta }}\) of \({\tilde{g}}\), respectively, have to be used. We give this general result only for the case of the D- and the IMSE-criterion.

Corollary 3

Let the equivariance conditions be fulfilled.

(a) The design \((\xi ^*)^g\) is locally D-optimal on \({\mathcal {Z}}\) at \(\tilde{\varvec{\beta }}\) if \(\xi ^*\) is locally D-optimal on \({\mathcal {X}}\) at \(\varvec{\beta }={\tilde{g}}^{-1}(\tilde{\varvec{\beta }})\).

(b) The design \((\xi ^*)^g\) is locally IMSE-optimal on \({\mathcal {Z}}\) at \(\tilde{\varvec{\beta }}\) with respect to \(\nu \) if \(\xi ^*\) is locally IMSE-optimal on \({\mathcal {X}}\) at \(\varvec{\beta } = {\tilde{g}}^{-1}(\tilde{\varvec{\beta }})\) with respect to \(\nu ^{g^{-1}}\).

Example

(Example 1 continued) By Corollary 3, we can obtain locally IMSE-optimal designs for the one-factor gamma model with simple linear regression \({\mathbf {f}}(x)^\mathsf{T}\tilde{\varvec{\beta }} = {\tilde{\beta }}_0 + {\tilde{\beta }}_1 x\) on a given interval \({\mathcal {Z}} = [a, b]\) with respect to suitably specified weighting measures \(\nu _{{\mathcal {Z}}}\). The inversely transformed parameter vector \(\varvec{\beta } = {\tilde{g}}^{-1}(\tilde{\varvec{\beta }})\) is given by \(\varvec{\beta } = ({\tilde{\beta }}_0 + a {\tilde{\beta }}_1, (b - a) {\tilde{\beta }}_1)^\mathsf{T}\). By Corollary 3 and Proposition 1, the optimal designs are supported on the endpoints a and b of the interval and the optimal weights \(1-w^*\) at a and \(w^*\) at b, respectively, can be obtained as

(a) \(1 - w^* = w^* = 1/2\) for \(\nu _{{\mathcal {Z}}}\) the uniform (Lebesgue) measure on the interval [ab],

(b) \(1 - w^* = ({\tilde{\beta }}_0 + b {\tilde{\beta }}_1) / (2 {\tilde{\beta }}_0 + (a + b) {\tilde{\beta }}_1) = {\mathbf {f}}(b)^\mathsf{T}\tilde{\varvec{\beta }} / ({\mathbf {f}}(a)^\mathsf{T}\tilde{\varvec{\beta }} + {\mathbf {f}}(b)^\mathsf{T}\tilde{\varvec{\beta }})\) and \(w^* = ({\tilde{\beta }}_0 + a {\tilde{\beta }}_1) / (2 {\tilde{\beta }}_0 + (a + b) {\tilde{\beta }}_1) = {\mathbf {f}}(a)^\mathsf{T}\tilde{\varvec{\beta }} / ({\mathbf {f}}(a)^\mathsf{T}\tilde{\varvec{\beta }} + {\mathbf {f}}(b)^\mathsf{T}\tilde{\varvec{\beta }})\) for \(\nu _{{\mathcal {Z}}}\) the (discrete) uniform measure on the endpoints \(\{a, b\}\), and

(c) \(1 - w^* = ({\tilde{\beta }}_0 + a {\tilde{\beta }}_1) / (2 {\tilde{\beta }}_0 + (a + b) {\tilde{\beta }}_1) = {\mathbf {f}}(a)^\mathsf{T}\tilde{\varvec{\beta }} / ({\mathbf {f}}(a)^\mathsf{T}\tilde{\varvec{\beta }} + {\mathbf {f}}(b)^\mathsf{T}\tilde{\varvec{\beta }})\) and \(w^* = ({\tilde{\beta }}_0 + b {\tilde{\beta }}_1) / (2 {\tilde{\beta }}_0 + (a + b) {\tilde{\beta }}_1) = {\mathbf {f}}(b)^\mathsf{T}\tilde{\varvec{\beta }} / ({\mathbf {f}}(a)^\mathsf{T}\tilde{\varvec{\beta }} + {\mathbf {f}}(b)^\mathsf{T}\tilde{\varvec{\beta }})\) for \(\nu _{{\mathcal {Z}}}\) the one-point measure on the midpoint \((a + b) / 2\) of the experimental region.

The continuous uniform measure in (a) is the common choice for the IMSE-criterion. The discrete uniform measure in (b) places equal interest on the extreme values of the experimental region and may also be applied for the restricted experimental region \({\mathcal {X}}=\{a,b\}\), which can be used to describe two groups “a” and “b.” In that case, the IMSE-optimal weights are proportional to the standard deviations \(\sqrt{\lambda _x} = 1/{\mathbf {f}}(x)^\mathsf{T}\tilde{\varvec{\beta }}\), \(x = a, b\), in the groups, in accordance with known results on A-optimality for group means. The one-point measure in (c) coincides with the c-criterion for estimating the mean response at the midpoint of the interval. \(\square \)
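The weights in (b) can, for instance, be reproduced as follows (our sketch, with arbitrarily chosen interval and parameter values): pull \(\tilde{\varvec{\beta }}\) back by \({\tilde{g}}^{-1}\) and apply Proposition 1(b).

```python
# Sketch of Corollary 3(b) in use: locally IMSE-optimal weights on
# Z = [a, b] for nu_Z the uniform measure on the endpoints {a, b}.
import numpy as np

f = lambda x: np.array([1.0, x])
a, b = 2.0, 5.0
beta_t = np.array([1.0, 0.5])                # tilde(beta), valid on Z
beta = np.array([beta_t[0] + a * beta_t[1],  # beta = tilde(g)^{-1}(beta_t)
                 (b - a) * beta_t[1]])
w_star = beta[0] / (2 * beta[0] + beta[1])   # weight at b, Prop. 1(b)
# equivalently expressed in terms of tilde(beta), as in item (b) above:
assert np.isclose(w_star,
                  (f(a) @ beta_t) / ((f(a) @ beta_t) + (f(b) @ beta_t)))
```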

Note that the D- and IMSE-criteria are equivariant with respect to any transformation g of \({\mathbf {x}}\) for which the regression function \({\mathbf {f}}\) is linearly equivariant, \({\mathbf {f}}(g({\mathbf {x}})) = {\mathbf {Q}}_g {\mathbf {f}}({\mathbf {x}})\), and the corresponding transformation \({\tilde{g}}(\varvec{\beta }) = {\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta }\) of \(\varvec{\beta }\). For other criteria, additional requirements may have to be fulfilled by the transformations to obtain equivariance results. For example, in the case of Kiefer’s class of \(\Phi _q\)-criteria (including the A-criterion) the transformation matrix \({\mathbf {Q}}_g\) should be orthogonal or, at least, satisfy that \({\mathbf {Q}}_g^{\mathsf{T}} {\mathbf {Q}}_g\) is a multiple of the \(p \times p\) identity matrix.

For the equivariance of maximin efficiency criteria, we require additionally that the underlying local criteria are multiplicatively equivariant with respect to \((g, {\tilde{g}})\), which means that for every \(\varvec{\beta } \in {\mathcal {B}}^{\prime }\) there is a constant \(c > 0\) such that \(\Phi _{{\tilde{g}}(\varvec{\beta })}(\xi ^g) = c \Phi _{\varvec{\beta }}(\xi )\) uniformly in \(\xi \). Then, for the corresponding maximin efficiency criterion we get

$$\begin{aligned} \Phi (\xi ^g)= & {} \sup _{\tilde{\varvec{\beta }} \in {\tilde{g}}({\mathcal {B}}^{\prime })} \frac{\Phi _{\tilde{\varvec{\beta }}}(\xi ^g)}{\Phi _{\tilde{\varvec{\beta }}}(\xi _{\tilde{\varvec{\beta }}}^*)} \nonumber \\= & {} \sup _{\varvec{\beta } \in {\mathcal {B}}^{\prime }} \frac{\Phi _{{\tilde{g}}(\varvec{\beta })}(\xi ^g)}{\Phi _{{\tilde{g}}(\varvec{\beta })}((\xi ^*_{\varvec{\beta }})^g)} \nonumber \\= & {} \sup _{\varvec{\beta } \in {\mathcal {B}}^{\prime }} \frac{c \Phi _{\varvec{\beta }}(\xi )}{c \Phi _{\varvec{\beta }}(\xi ^*_{\varvec{\beta }})} = \Phi (\xi ) , \end{aligned}$$
(22)

where in the second equality it is used that by Theorem 1 the image of the locally optimal design at \(\varvec{\beta }\) under g is locally optimal at \({\tilde{g}}(\varvec{\beta })\). Hence, the resulting maximin efficiency criterion \(\Phi \) is equivariant.

By (18), the homogeneous version \(\Phi _{\varvec{\beta }}(\xi ) = (\det ({\mathbf {M}}(\xi ; \varvec{\beta })))^{ - 1 / p}\) of the local D-criterion is multiplicatively equivariant with \(c = \det ({\mathbf {Q}}_g)^{ - 2 / p} > 0\). Accordingly, the local IMSE-criterion is multiplicatively equivariant with \(c=1\) by (21). Hence, both the maximin D-efficiency criterion and the maximin IMSE-efficiency criterion retain their value under the transformation and are thus equivariant.
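The multiplicative constants can be checked directly; the sketch below (ours) verifies that the homogeneous D-criterion picks up the factor \(\det ({\mathbf {Q}}_g)^{-2/p}\) under the transformation, cf. (18), so that efficiency ratios, and hence maximin efficiency values, are preserved.

```python
# Sketch: the homogeneous D-criterion det(M)^{-1/p} is multiplicatively
# equivariant with c = det(Q_g)^{-2/p}, cf. (18).
import numpy as np

f = lambda x: np.array([1.0, x])
lam = lambda z: 1.0 / z**2

def info(X, w, beta):
    return sum(wi * lam(f(x) @ beta) * np.outer(f(x), f(x))
               for x, wi in zip(X, w))

a, c = 2.0, 3.0
Q = np.array([[1.0, 0.0], [a, c]])
beta = np.array([1.0, 0.5])
beta_t = np.linalg.inv(Q).T @ beta
X, w = [0.0, 1.0], [0.4, 0.6]
Xg = [a + c * x for x in X]
p = 2
phi = np.linalg.det(info(X, w, beta))**(-1.0 / p)
phi_g = np.linalg.det(info(Xg, w, beta_t))**(-1.0 / p)
assert np.isclose(phi_g, np.linalg.det(Q)**(-2.0 / p) * phi)
```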

Corollary 4

(a) If \(\xi ^*\) is maximin D-efficient on \({\mathcal {B}}^{\prime }\), then \((\xi ^*)^g\) is maximin D-efficient on \(\tilde{{\mathcal {B}}}^{\prime } = {\tilde{g}}({\mathcal {B}}^{\prime })\).

(b) If \(\xi ^*\) is maximin IMSE-efficient with respect to \(\nu \) on \({\mathcal {B}}^{\prime }\), then \((\xi ^*)^g\) is maximin IMSE-efficient with respect to \(\nu ^g\) on \(\tilde{{\mathcal {B}}}^{\prime } = {\tilde{g}}({\mathcal {B}}^{\prime })\).

Example

(Example 1 continued) In the gamma model with simple linear regression on [0, 1], the design \(\xi ^*\) which assigns equal weights 1/2 to both endpoints 0 and 1 is both locally D-optimal and, by Proposition 1, locally IMSE-optimal with respect to the uniform measure \(\nu \) on [0, 1] for any \(\varvec{\beta } \in {\mathcal {B}}\). Hence, \(\xi ^*\) is obviously both maximin D-efficient and maximin IMSE-efficient with respect to \(\nu \) on \({\mathcal {B}}\). Then, with \(g(x) = a + c x\), \(c = b - a\), by Corollary 4 the design \((\xi ^*)^g\) on [a, b] which assigns equal weights 1/2 to a and b is maximin D-efficient and maximin IMSE-efficient with respect to the uniform measure \(\nu ^g\) on \(\tilde{{\mathcal {B}}}={\tilde{g}}({\mathcal {B}})\). \(\square \)

Further maximin D- and IMSE-efficient designs are derived in Sect. 5.

4 Extended Equivariance

The concept of equivariance can be extended when the structure of the intensity function is compatible with some transformation of the parameters. More specifically, we will consider situations where the intensity function \(\lambda \) is multiplicatively equivariant with respect to a transformation \({\tilde{g}}_0\) of \(\varvec{\beta }\), i.e., there exists a constant \(c_0 > 0\) such that \(\lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\tilde{g}}_0(\varvec{\beta })) = c_0 \lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })\) for all \({\mathbf {x}} \in {\mathcal {X}}\).

For example, in the gamma model with inverse link we have \(\lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\tilde{c}}\varvec{\beta }) = {\tilde{c}}^{\, - 2} \lambda ({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })\) for any scaling factor \({\tilde{c}} > 0\) (see Idais and Schwabe [21] for some specific models). Hence, the intensity function is multiplicatively equivariant with respect to any transformation \({\tilde{g}}_0(\varvec{\beta }) = {\tilde{c}} \varvec{\beta }\) which scales all components of the parameter vector \(\varvec{\beta }\) simultaneously by the same factor \({\tilde{c}} > 0\), and the multiplicative factor is \(c_0 = {\tilde{c}}^{\, - 2} > 0\). Note that the scaling \({\tilde{g}}_0\) retains the positivity of the linear component, \({\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\tilde{g}}_0(\varvec{\beta }) = {\tilde{c}}\, {\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta } > 0\). Thus, the maximal region \({\mathcal {B}}\) of parameter values \(\varvec{\beta }\) for which the linear component \({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta }\) is positive constitutes a cone in the p-dimensional Euclidean space, i.e., for each vector \(\varvec{\beta } \in {\mathcal {B}}\) and every scale factor \({\tilde{c}} > 0\) the scaled vector \({\tilde{c}} \varvec{\beta }\) also lies in \({\mathcal {B}}\).

Another, more basic example arises in Poisson regression with canonical \(\log \) link when the value \(\beta _0\) of the intercept parameter is changed to \({\tilde{\beta }}_0\). The corresponding transformation of \(\varvec{\beta }\) can be described by the (affine) linear mapping \({\tilde{g}}_0(\varvec{\beta }) = \varvec{\beta } + ({\tilde{\beta }}_0 - \beta _0) {\mathbf {e}}_1\), where \({\mathbf {e}}_1\) denotes the first unit vector of appropriate length p. Then, the intensity function \(\lambda (z) = \exp (z)\) is multiplicatively equivariant with respect to \({\tilde{g}}_0\) with multiplicative factor \(c_0 = \exp ({\tilde{\beta }}_0 - \beta _0) > 0\). This has been used implicitly in the literature when concluding that optimal designs do not depend on the value \(\beta _0\) of the intercept parameter (see, e.g., Russell et al. [36]).
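
Both scaling properties are elementary to verify; the following R lines sketch the two checks (the gamma intensity \(\lambda (z) = z^{-2}\) is an assumed form consistent with the scaling stated above, and the variable names are ours).

```r
## Sketch: multiplicative equivariance of the intensity.
z <- seq(0.5, 3, by = 0.5)     # values of the linear component f(x)^T beta

## gamma model with inverse link, scaling g~0(beta) = ct * beta,
## assuming lambda(z) = z^(-2):
ct <- 2.5
all.equal((ct * z)^(-2), ct^(-2) * z^(-2))                     # c0 = ct^(-2)

## Poisson model with log link, lambda(z) = exp(z), intercept shift:
b0 <- 1; b0.new <- -0.5
all.equal(exp(z + (b0.new - b0)), exp(b0.new - b0) * exp(z))   # c0 = exp(shift)
```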

To embed these transformations \({\tilde{g}}_0\) into the concept of equivariance of Sect. 3, we combine them with the identity mapping \(g = \mathrm {id}\) on the experimental region \({\mathcal {X}}\). Then, the multiplicative equivariance obviously carries over from the intensity to the information matrix.

Lemma 1

If the intensity function \(\lambda \) is multiplicatively equivariant with respect to \({\tilde{g}}_0\) with multiplicative factor \(c_0 > 0\), then the information matrix is (multiplicatively) equivariant with respect to \(g = \mathrm {id}\) and \({\tilde{g}}_0\), \( {\mathbf {M}}(\xi ; {\tilde{g}}_0(\varvec{\beta })) = c_0 {\mathbf {M}}(\xi ; \varvec{\beta }) \).

To transfer optimal designs by Theorem 1, it remains to show that the criteria under consideration are order preserving with respect to transformations which act multiplicatively on the intensity. By Lemma 1, we directly get \(\det ({\mathbf {M}}(\xi ; {\tilde{g}}_0(\varvec{\beta }))) = c_0^{p} \det ({\mathbf {M}}(\xi ; \varvec{\beta }))\) and, hence, the equivariance of the D-criterion.

For the IMSE-criterion, we additionally require multiplicative equivariance of the function \(\eta '\), i.e., \(\eta '({\mathbf {f}}({\mathbf {x}})^\mathsf{T}{\tilde{g}}_0(\varvec{\beta })) = c_\eta \eta '({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })\) uniformly in \({\mathbf {x}}\) for some constant \(c_\eta > 0\). This condition is fulfilled for generalized linear models with canonical link because of \(\eta ' = \lambda \). Then, the equivariance property \({\mathbf {V}}({\tilde{g}}_0(\varvec{\beta }); \nu ) = c_\eta ^{2} {\mathbf {V}}(\varvec{\beta }; \nu )\) of the weighting matrix yields equivariance of the criterion, \(\mathrm {IMSE}(\xi ; {\tilde{g}}_0(\varvec{\beta }), \nu ) = c_0^{-1} c_\eta ^2\, \mathrm {IMSE}(\xi ; \varvec{\beta }, \nu )\).
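
As a sanity check, this scaling can be verified numerically. The sketch below assumes the representation \(\mathrm {IMSE}(\xi ; \varvec{\beta }, \nu ) = \mathrm {tr}({\mathbf {V}}(\varvec{\beta }; \nu )\, {\mathbf {M}}(\xi ; \varvec{\beta })^{-1})\) with \({\mathbf {V}}(\varvec{\beta }; \nu ) = \int \eta '({\mathbf {f}}({\mathbf {x}})^\mathsf{T}\varvec{\beta })^2 {\mathbf {f}}({\mathbf {x}}) {\mathbf {f}}({\mathbf {x}})^\mathsf{T} \nu (\mathrm {d}{\mathbf {x}})\) and \(\eta '(z) = - z^{-2}\) for the gamma model with inverse link, so that \(c_\eta = c_0 = {\tilde{c}}^{\, - 2}\); these assumed forms and the helper names are ours.

```r
## Sketch: IMSE(xi; ct * beta, nu) = ct^(-2) * IMSE(xi; beta, nu) in the
## gamma model (assumed forms: lambda(z) = z^(-2), eta'(z) = -z^(-2)).
f <- function(x) c(1, x)
M <- function(x, w, beta)                    # information matrix
  Reduce(`+`, Map(function(xi, wi) {
    fx <- f(xi); wi * (sum(fx * beta))^(-2) * tcrossprod(fx)
  }, x, w))
V <- function(x, beta)                       # weighting matrix, discrete uniform nu
  Reduce(`+`, lapply(x, function(xi) {
    fx <- f(xi); (sum(fx * beta))^(-4) * tcrossprod(fx)
  })) / length(x)
imse <- function(beta)                       # design xi: weights 1/2 at 0 and 1
  sum(diag(V(c(0, 1), beta) %*% solve(M(c(0, 1), c(0.5, 0.5), beta))))

beta <- c(1, 0.5); ct <- 3
all.equal(imse(ct * beta), ct^(-2) * imse(beta))   # c0^{-1} c_eta^2 = ct^(-2)
```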

Corollary 5

If the intensity function \(\lambda \) is multiplicatively equivariant with respect to \({\tilde{g}}_0\), then

  (a)

    A locally D-optimal design \(\xi ^*\) at \(\varvec{\beta }\) is also locally D-optimal at \(\tilde{\varvec{\beta }} = {\tilde{g}}_0(\varvec{\beta })\),

  (b)

    If additionally \(\eta '\) is multiplicatively equivariant, a locally IMSE-optimal design \(\xi ^*\) at \(\varvec{\beta }\) with respect to \(\nu \) is also locally IMSE-optimal at \(\tilde{\varvec{\beta }} = {\tilde{g}}_0(\varvec{\beta })\) with respect to \(\nu \).

When a whole family of such transformations \({\tilde{g}}_0\) is available, as for scaling by \({\tilde{c}} > 0\) in the gamma model with inverse link or for shifting the intercept in Poisson regression, the result of Corollary 5 can be used to reduce the number of parameters in the optimization problem: we first solve the optimization problem for a standardized parameter setting and then transfer the obtained optimal design to a general parameter vector by a suitable choice of the transformation \({\tilde{g}}_0\). For example, in the gamma model one component of the parameter vector can be set equal to 1 for standardization. A general parameter vector is then recovered by choosing \({\tilde{c}}\) equal to the nominal value of the component used for standardization. Similarly, in Poisson regression, we can first set the intercept parameter equal to 0 and then transfer the optimal design to the parameter vector with given nominal value \(\beta _0\).

Example

(Example 1 continued) For the one-factor gamma model with simple linear regression \({\mathbf {f}}(x)^\mathsf{T}\varvec{\beta } = \beta _0 + \beta _1 x\) on \({\mathcal {X}} = [0, 1]\), the locally IMSE-optimal design at \((1, \gamma )^\mathsf{T}\) with respect to the discrete uniform weighting measure \(\nu \) on the endpoints \(\{0, 1\}\) assigns weights \(1 - w^* = 1 / (2 + \gamma )\) and \(w^* = (1 + \gamma ) / (2 + \gamma )\) to 0 and 1, respectively. Under scaling with \({\tilde{c}} = \beta _0 > 0\), these weights remain locally optimal for any parameter vector \(\varvec{\beta } = (\beta _0, \beta _1)^\mathsf{T}\) with \(\beta _1 / \beta _0 = \gamma \) by Corollary 5. The corresponding reduced parameter region for \(\gamma \) is given by \({\mathcal {C}} = \{\gamma ;\, \gamma > - 1\}\), which is displayed in Fig. 2 as the vertical dashed line at \(\beta _0 = 1\). There the diagonal dotted line represents one ray \(\{\varvec{\beta };\, \beta _1 = \gamma \beta _0\}\) of values of \(\varvec{\beta }\) which are reduced to one specific value of \(\gamma \), indicated by the intersection of the ray with the vertical line at \(\beta _0 = 1\). \(\square \)
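
A small R sketch of this reduction (the weight formula is taken from the statement above; the helper names are our own):

```r
## Sketch: standardize beta to (1, gamma)^T and transfer the locally
## IMSE-optimal weights, using w* = (1 + gamma)/(2 + gamma) from the text.
w.star <- function(gamma) (1 + gamma) / (2 + gamma)
design.at <- function(beta) {
  gamma <- beta[2] / beta[1]            # reduction: gamma = beta1 / beta0
  c("x=0" = 1 - w.star(gamma), "x=1" = w.star(gamma))
}
design.at(c(1, 3))                      # standardized parameter (1, 3)^T
design.at(c(2, 6))                      # same ray gamma = 3: identical design
```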

Fig. 2 Parameter region \({\mathcal {B}}\) for the one-factor gamma model on [0, 1]; reduced parameter region \({\mathcal {C}}\) (vertical dashed line); values of \(\varvec{\beta }\) reduced to \(\gamma =1\) (diagonal dotted line)

Example

(Example 2 continued) Similarly, in the two-factor gamma model on \([0, 1]^2\), the three-dimensional parameter vector \(\varvec{\beta } = (\beta _0, \beta _1, \beta _2)^\mathsf{T}\) can be reduced to \(\tilde{\varvec{\beta }} = (1, \gamma _1, \gamma _2)^\mathsf{T}\), where \(\gamma _1 = \beta _1 / \beta _0\) and \(\gamma _2 = \beta _2 / \beta _0\), by setting the value of the intercept parameter \(\beta _0\) equal to 1. As a consequence, the three-dimensional parameter region \({\mathcal {B}}\) in Fig. 1 reduces to the two-dimensional region \({\mathcal {C}}\) for \(\varvec{\gamma } = (\gamma _1, \gamma _2)^\mathsf{T}\) in Fig. 3, which is characterized by the linear constraints \(\gamma _1 > - 1\), \(\gamma _2 > - 1\), and \(\gamma _1 + \gamma _2 > - 1\). The optimality regions \({\mathcal {B}}_k\), \(k = 1, \ldots , 4\), of Table 1 can now be described in terms of the ratios \(\gamma _j = \beta _j / \beta _0\), \(j = 1, 2\), by the inequalities \(1 - \gamma _1 \gamma _2 \le 0\) for \({\mathcal {B}}_1\), \((1 + \gamma _1 + \gamma _2)^2 - \gamma _1 \gamma _2 \le 0\) for \({\mathcal {B}}_2\), \((1 + \gamma _1)^2 + \gamma _1 \gamma _2 \le 0\) for \({\mathcal {B}}_3\), and \((1 + \gamma _2)^2 + \gamma _1 \gamma _2 \le 0\) for \({\mathcal {B}}_4\), as exhibited in Fig. 3. The scaling property explains why the subregions in Fig. 1 constitute cones in the three-dimensional Euclidean space. \(\square \)
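
The constraints are easy to evaluate; the following R sketch (function names are ours) classifies a reduced parameter vector by the region descriptions above:

```r
## Sketch: membership in the reduced region C and in the descriptions of
## B_1, ..., B_4 in terms of gamma = (gamma1, gamma2).
in.C <- function(g) g[1] > -1 && g[2] > -1 && sum(g) > -1
region <- function(g) {
  g1 <- g[1]; g2 <- g[2]
  which(c(B1 = 1 - g1 * g2 <= 0,
          B2 = (1 + g1 + g2)^2 - g1 * g2 <= 0,
          B3 = (1 + g1)^2 + g1 * g2 <= 0,
          B4 = (1 + g2)^2 + g1 * g2 <= 0))
}
in.C(c(2, 2))      # TRUE
region(c(2, 2))    # satisfies the description of B_1
```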

Fig. 3 Scaled parameter region of the two-factor gamma model on \([0, 1]^2\); the diagonal dashed line represents \(\gamma _1 = \gamma _2\)

By combining the transformation \({\tilde{g}}_0\) with the linear transformations of the preceding Sect. 3, we obtain, via Theorem 1, an extension of Corollaries 1 and 2.

Corollary 6

If the intensity function \(\lambda \) is multiplicatively equivariant with respect to \({\tilde{g}}_0\), then:

  (a)

    If \(\xi ^*\) is locally D-optimal on \({\mathcal {X}}\) at \(\varvec{\beta }\), then \((\xi ^*)^g\) is locally D-optimal on \({\mathcal {Z}}\) at \(\tilde{\varvec{\beta }} = {\tilde{g}}_0({\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta })\),

  (b)

    If \(\xi ^*\) is locally IMSE-optimal on \({\mathcal {X}}\) at \(\varvec{\beta }\) with respect to \(\nu \) and if, additionally, \(\eta ^{\prime }\) is multiplicatively equivariant, then \((\xi ^*)^g\) is locally IMSE-optimal on \({\mathcal {Z}}\) at \(\tilde{\varvec{\beta }} = {\tilde{g}}_0({\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta })\) with respect to \(\nu ^g\).

This result indicates that, for a given transformation g of \({\mathbf {x}}\), the associated transformation \({\tilde{g}}(\varvec{\beta }) = {\tilde{g}}_0({\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta })\) of \(\varvec{\beta }\) need not be unique. Moreover, we may let the transformation \({\tilde{g}}_0 = {\tilde{g}}_{0, \varvec{\beta }}\) depend on the parameter vector \(\varvec{\beta }\), provided the intensity function \(\lambda \) is multiplicatively equivariant with respect to \({\tilde{g}}_{0, \varvec{\beta }}\) for every \(\varvec{\beta }\). Then, the multiplicative factor \(c_0 = c_{0, \varvec{\beta }}\) will also depend on \(\varvec{\beta }\), so that \(\lambda ({\mathbf {f}}({\mathbf {x}})^{\mathsf{T}} {\tilde{g}}_{0, \varvec{\beta }}(\varvec{\beta })) = c_{0, \varvec{\beta }} \lambda ({\mathbf {f}}({\mathbf {x}})^{\mathsf{T}} \varvec{\beta })\). In combination with the linear transformation of Sect. 3, this leads to a nonlinear transformation \({\tilde{g}}(\varvec{\beta }) = {\tilde{g}}_{0, \varvec{\beta }}({\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta })\) of the parameter vector \(\varvec{\beta }\) such that the information matrix is equivariant with respect to the pair \((g, {\tilde{g}})\) of transformations. For the gamma model with inverse link, this can be accomplished by choosing the scaling factor \({\tilde{c}} ={\tilde{c}}_{\varvec{\beta }}\) in dependence on \(\varvec{\beta }\).

Example

(Example 1 continued) For the one-factor gamma model with simple linear regression on the unit interval [0, 1], consider the reflection \(g(x) = 1 - x\) that maps [0, 1] onto itself. The corresponding linear transformation of the parameter vector \(\varvec{\beta } = (\beta _0, \beta _1)^\mathsf{T}\) is given by \({\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta } = (\beta _0 + \beta _1,\, - \beta _1)^\mathsf{T}\). In particular, for a scaled reduced parameter vector \(\varvec{\beta } = (1, \gamma )^\mathsf{T}\), \(\gamma = \beta _1 / \beta _0\), we have \({\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta } = (1 + \gamma ,\, - \gamma )^\mathsf{T}\). In order to obtain a transformed parameter vector \(\tilde{\varvec{\beta }} = {\tilde{g}}(\varvec{\beta })\) in reduced form, i.e., with \({\tilde{\beta }}_0 = 1\), the linear transformation has to be rescaled by \({\tilde{c}}_{\varvec{\beta }} = 1 / (1 + \gamma )\). This results in the nonlinear transformation \({\tilde{g}}((1, \gamma )^\mathsf{T}) = (1, \, - \gamma / (1 + \gamma ))^\mathsf{T}\). Note that this transformation is a one-to-one mapping of the maximal region \({\mathcal {C}} = ( - 1, \infty )\) for the reduced parameter \(\gamma \) onto itself.

For general values of the first entry \(\beta _0\) of the parameter vector \(\varvec{\beta }\), the value of \(\beta _0\) can be preserved by the nonlinear transformation \({\tilde{g}}(\varvec{\beta }) = \beta _0 / (\beta _0 + \beta _1)\, {\mathbf {Q}}_g^{-\mathsf{T}}\varvec{\beta }\), where the scaling factor \({\tilde{c}}_{\varvec{\beta }} = \beta _0 / (\beta _0 + \beta _1) = 1 / (1 + \gamma )\) only depends on \(\gamma = \beta _1 / \beta _0\). \(\square \)

The result of Corollary 6 carries over directly to the nonlinear transformation when \({\tilde{g}}_0\) is replaced by \({\tilde{g}}_{0, \varvec{\beta }}\).

Example

(Example 1 continued) For the one-factor gamma model with simple linear regression on [0, 1] and the reflection \(g(x) = 1 - x\), the weighting measures \(\nu \) specified in Proposition 1 are all invariant with respect to g, i.e., \(\nu ^g = \nu \). The corresponding locally IMSE-optimal designs \(\xi ^*\) on [0, 1] with respect to \(\nu \) are supported by the endpoints with optimal weights \(1 - w^*\) and \(w^*\) at 0 and 1, respectively. Then, the designs \((\xi ^*)^g\) which assign the interchanged weights \(w^*\) to 0 and \(1 - w^*\) to 1 are locally IMSE-optimal on [0, 1] with respect to \(\nu \) at \(\tilde{\varvec{\beta }} = {\tilde{c}}_{\varvec{\beta }} {\mathbf {Q}}_{g}^{-\mathsf{T}} \varvec{\beta } = (\beta _0,\, - \beta _1 / (1 + \gamma ))^\mathsf{T}\). \(\square \)

The standardization with respect to the intercept can be extended to more complex models.

Example

(Example 2 continued) In the two-factor gamma model on \([0, 1]^2\), we consider the transformation \(g_2({\mathbf {x}}) = (1 - x_1, 1 - x_2)^\mathsf{T}\) of simultaneous reflection of both explanatory variables and the corresponding rescaled transformation \({\tilde{g}}_2(\varvec{\beta }) = {\tilde{c}}_{\varvec{\beta }} {\mathbf {Q}}_{g_2}^{-\mathsf{T}} \varvec{\beta }\) of \(\varvec{\beta }\) which leaves the intercept \(\beta _0\) unchanged, i.e., \({\tilde{c}}_{\varvec{\beta }} = \beta _0 / (\beta _0 + \beta _1 + \beta _2)\) and, hence, \({\tilde{g}}_2(\varvec{\beta }) = (\beta _0,\, - \beta _0 \beta _1 / (\beta _0 + \beta _1 + \beta _2),\, - \beta _0 \beta _2 / (\beta _0 + \beta _1 + \beta _2))^{\mathsf{T}}\). The scaling factor \({\tilde{c}}_{\varvec{\beta }} = 1 / (1 + \gamma _1 + \gamma _2)\) only depends on the reduced parameters \(\gamma _j = \beta _j / \beta _0\), \(j = 1, 2\), so that the one-to-one transformation \(\varvec{\gamma } \rightarrow - (1 / (1 + \gamma _1 + \gamma _2)) \varvec{\gamma }\) of the reduced parameter region \({\mathcal {C}}\) onto itself is induced. Hence, if a design \(\xi ^*\) which assigns weights \(w_i^*\) to \({\mathbf {x}}_i\), \(i = 1, \ldots , 4\), is locally D-optimal at \(\varvec{\gamma }\), then the design \((\xi ^*)^{g_2}\) which assigns the weights \(w_4^*\), \(w_3^*\), \(w_2^*\) and \(w_1^*\) to \({\mathbf {x}}_1, \ldots , {\mathbf {x}}_4\), respectively, is locally D-optimal at \( - (1 / (1 + \gamma _1 + \gamma _2)) \varvec{\gamma }\), as illustrated in the sketch after this example.

Similar results hold for IMSE-optimality. \(\square \)
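
A compact R sketch of this transfer (it presumes the vertex ordering \({\mathbf {x}}_1, \ldots , {\mathbf {x}}_4\) in which \(g_2\) maps \({\mathbf {x}}_1 \leftrightarrow {\mathbf {x}}_4\) and \({\mathbf {x}}_2 \leftrightarrow {\mathbf {x}}_3\); the function name is ours):

```r
## Sketch: transferring a vertex-supported design under the simultaneous
## reflection g2; weights are reversed and gamma is mapped nonlinearly.
transfer.g2 <- function(w, gamma)
  list(w = rev(w),                              # (w4, w3, w2, w1)
       gamma = -gamma / (1 + sum(gamma)))       # induced map on C
transfer.g2(w = c(0.4, 0.2, 0.2, 0.2), gamma = c(1, 1))
```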

For maximin efficiency criteria, we additionally allow here that the multiplicative factor in the equivariance of the underlying local criteria may depend on the parameter \(\varvec{\beta }\), \(c = c_{\varvec{\beta }}\). This does not affect the arguments in (22), and hence, the resulting maximin efficiency criteria remain equivariant. The homogeneous version of the local D-criterion is multiplicatively equivariant with \(c_{\varvec{\beta }} = c_{0, \varvec{\beta }}^{-1} \det ({\mathbf {Q}}_g)^{ - 2 / p} > 0\), and the local IMSE-criterion with \(c_{\varvec{\beta }} = c_{0, \varvec{\beta }} > 0\). Hence, the values of both the maximin D-efficiency and the maximin IMSE-efficiency criterion are not changed under the transformation. These criteria are thus equivariant, and the result of Corollary 4 remains valid, so that maximin efficient designs can also be transferred under nonlinear transformations \({\tilde{g}}(\varvec{\beta }) = {\tilde{g}}_{0, \varvec{\beta }}({\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta })\) when the intensity function is multiplicatively equivariant with respect to \({\tilde{g}}_{0, \varvec{\beta }}\) for all \(\varvec{\beta }\).

5 Invariance

While equivariance can be used to transfer optimal designs, the concept of invariance allows a reduction in the complexity of finding optimal designs by exploiting symmetries (see, e.g., Schwabe [38, ch. 3], in the case of linear models). As in linear models, we need a (finite) group G of transformations g which map the experimental region \({\mathcal {X}}\) onto itself. For each of these transformations g, the regression functions \({\mathbf {f}}\) are assumed to be linearly equivariant, \({\mathbf {f}}(g({\mathbf {x}})) = {\mathbf {Q}}_g {\mathbf {f}}({\mathbf {x}})\). For generalized linear models, we additionally require that the corresponding transformations \({\tilde{g}}\) of \(\varvec{\beta }\) also constitute a group \({\tilde{G}}\) such that the set of pairs \((g, {\tilde{g}})\) of transformations inherits the group structure. This requirement is automatically fulfilled for the linear transformations \({\tilde{g}}(\varvec{\beta }) = {\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta }\), because the transformation matrices \({\mathbf {Q}}_g\), \(g \in G\), constitute a group with respect to matrix multiplication. For extended equivariance (Sect. 4), the factors \(c_{0, \varvec{\beta }}\) also have to be compatible with the group structure. This holds in the gamma model for rescaling by \({\tilde{c}}_{\varvec{\beta }}\), which leaves the value of the standardized component of the parameter vector unchanged. Similarly, for Poisson regression, standardization of the intercept to 0 preserves the group structure.

Example

(Example 1 continued) For the one-factor gamma model with simple linear regression on [0, 1], the reflection \(g(x) = 1 - x\) maps [0, 1] onto itself and is self-inverse, i.e., \(g^{-1} = g\). Hence, g together with the identity \(\mathrm {id}\) constitutes a group \(G = \{\mathrm {id}, g\}\) of transformations. For g, the associated transformation of the parameter vector \(\varvec{\beta }\) is \({\tilde{g}}(\varvec{\beta }) = {\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta } = (\beta _0 + \beta _1,\, - \beta _1)^\mathsf{T}\) in the linear case and \({\tilde{g}}(\varvec{\beta }) = {\tilde{c}}_{\varvec{\beta }}{\mathbf {Q}}_g^{-\mathsf{T}} \varvec{\beta } = (\beta _0,\, - \beta _0 \beta _1 / (\beta _0 + \beta _1))^\mathsf{T}\) in the extended case. As always, the identity on \({\mathcal {B}}\) is associated with the identity \(\mathrm {id}\) on \({\mathcal {X}}\). Because \({\tilde{c}}_{{\tilde{g}}(\varvec{\beta })} = 1 + \gamma = 1 / {\tilde{c}}_{\varvec{\beta }}\), the transformation \({\tilde{g}}\) is also self-inverse, and the group structure is retained. \(\square \)
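
Both group properties are easy to confirm numerically; a brief R sketch for the induced map on the reduced parameter (the helper name is ours):

```r
## Sketch: the induced map gamma -> -gamma / (1 + gamma) on C = (-1, Inf)
## is self-inverse, so {id, g~} indeed forms a group.
gt <- function(gamma) -gamma / (1 + gamma)
gamma <- c(-0.9, -0.5, 0, 1, 10)
all.equal(gt(gt(gamma)), gamma)    # TRUE: g~ is self-inverse
all(gt(gamma) > -1)                # TRUE: C is mapped into itself
```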

The final ingredient for invariance is that the optimality criterion \(\Phi \) is invariant with respect to the group G of transformations, i.e., \(\Phi (\xi ^g) = \Phi (\xi )\) for all \(g \in G\) and any design \(\xi \). Then, we can make use of convexity arguments to improve designs by symmetrization. For this, define by \({\bar{\xi }} = (1 / |G|)\sum _{g \in G} \xi ^g\) the symmetrized version of a design \(\xi \) with respect to the group G, where |G| denotes the number of elements in the (finite) group G. Note that \({\bar{\xi }}\) is itself a design and that \({\bar{\xi }}\) is invariant with respect to G, i.e., \({\bar{\xi }}^g = {\bar{\xi }}\) for all \(g \in G\). If \(\Phi \) is invariant and convex, we obtain

$$\begin{aligned} \Phi ({\bar{\xi }}) \le \frac{1}{|G|} \sum _{g \in G} \Phi (\xi ^g) = \Phi (\xi ) , \end{aligned}$$
(23)

where the inequality follows from convexity and the equation from invariance. From this majorization property, we can conclude that the designs which are invariant with respect to G constitute an essentially complete class with regard to \(\Phi \). This means that we can confine the search for a \(\Phi \)-optimal design to the class of invariant designs.
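
For illustration, symmetrization is easy to carry out for discrete designs; the following R sketch (our own helper, shown for the reflection group of Example 1) averages a design over G:

```r
## Sketch: the symmetrized design xi-bar = (1/|G|) sum_g xi^g for a finite
## group G acting on a design with support x and weights w.
symmetrize <- function(x, w, G) {
  pts <- unlist(lapply(G, function(g) g(x)))     # images of the support points
  wts <- rep(w, times = length(G)) / length(G)   # each copy gets weight w_i / |G|
  agg <- tapply(wts, round(pts, 12), sum)        # merge coinciding points
  list(x = as.numeric(names(agg)), w = as.numeric(agg))
}
G <- list(identity, function(x) 1 - x)           # reflection group on [0, 1]
symmetrize(x = c(0, 0.3), w = c(0.7, 0.3), G)
## invariant design: support 0, 0.3, 0.7, 1 with weights 0.35, 0.15, 0.15, 0.35
```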

Theorem 2

If \(\Phi \) is invariant (with respect to G) and convex, then there exists an invariant design \(\xi ^*\) (with respect to G) which is \(\Phi \)-optimal over all designs.

The class of invariant designs is often much smaller than the class of all designs, and optimization can be simplified. Invariant designs are uniform on orbits \({\mathcal {O}}_{{\mathbf {x}}} = \{g({\mathbf {x}});\, g \in G\} \subset {\mathcal {X}}\), i.e., all \({\mathbf {x}}\) in the same orbit have the same weight. In particular, for an invariant design, either all \({\mathbf {x}}\) in an orbit \({\mathcal {O}}\) are included with common weight \(w_{{\mathcal {O}}}\) or the whole orbit is not in the support of the design. For optimization within the class of invariant designs, it remains to find the optimal orbits and the corresponding optimal weights, which is often a much easier task than optimizing over all possible designs.

Example

(Example 1 continued) For the reflection group \(G = \{\mathrm {id}, g\}\), \(g(x) = 1 - x\), on [0, 1], the orbits are of the form \(\{x, 1 - x\}\) for \(x < 1/2\) and \(\{1/2\}\) for \(x = 1/2\). In the one-factor gamma model with simple linear regression, it is known that the optimal designs are supported at the endpoints 0 and 1 (see Gaffke et al. [14]). Thus, the only remaining orbit for an optimal design is \(\{0, 1\}\), and hence, there is only one invariant design, which assigns equal weights 1/2 to each endpoint. This design is optimal with respect to every convex invariant criterion. \(\square \)

In the case of local optimality criteria, the requirement of invariance is rather restrictive. In particular, the local parameter \(\varvec{\beta }\) has to be invariant under all transformations, i.e., \({\tilde{g}}(\varvec{\beta }) = \varvec{\beta }\) for all \(g \in G\). This condition typically holds only for a few values of \(\varvec{\beta }\).

Example

(Example 1 continued) For the one-factor gamma model with simple linear regression under the reflection \(g(x)=1-x\), the parameter \(\varvec{\beta }\) is invariant only if \(\beta _1 = 0\), i.e., if there is no effect of the covariate x. The invariant design which assigns equal weights 1/2 to the endpoints is locally optimal at such \(\varvec{\beta }\). \(\square \)

Example

(Example 2 continued) For the two-factor gamma model with multiple linear regression on \([0, 1]^2\), the reflections \(g_2\), \(g_3\), and \(g_4\) are all self-inverse, and the composition of any two of these reflections yields the third one. Together with the identity \(g_1 = \mathrm {id}\), the reflections constitute a group \(G = \{g_1, g_2, g_3, g_4\}\) of transformations. Locally optimal designs are supported at the vertices of \([0, 1]^2\) (see Gaffke et al. [14]). The vertices all lie on a single orbit, and hence, the unique invariant design on the vertices assigns equal weights 1/4 to each of the vertices. Under the group G, the parameter vector \(\varvec{\beta }\) is invariant only if \(\beta _1 = \beta _2 = 0\), i.e., if neither covariate \(x_1\) nor \(x_2\) has an effect. Thus, the invariant design which assigns equal weights 1/4 to the vertices is locally optimal at \(\varvec{\beta }\) only in the case \(\beta _1 = \beta _2 = 0\). \(\square \)

Note that in both examples above locally optimal designs are obtained for the situation of constant intensity \(\lambda \). In that case, the information matrix is proportional to that in the corresponding linear model with the same linear component. Hence, the locally optimal design coincides with the optimal design in the linear model (see Cox [6, Section 4]).

In more complex situations, however, invariance may be helpful for local optimality at certain parameter values which are invariant with respect to \({\tilde{g}}\) for all \(g \in G\). To this end, first note that in the case of a finite group G of transformations g the corresponding transformation matrices \({\mathbf {Q}}_g\) are unimodular, i.e., \(|\det ({\mathbf {Q}}_g)| = 1\) (see Schwabe [38, ch. 3]). For the IMSE-criterion, we additionally require that the weighting measure \(\nu \) is invariant with respect to G, i.e., \(\nu ^g = \nu \) for all \(g \in G\).

Corollary 7

If \({\tilde{g}}(\varvec{\beta }) = \varvec{\beta }\) for all \(g \in G\), then there exists a locally D-optimal design \(\xi ^*\) at \(\varvec{\beta }\) which is invariant with respect to G.

If additionally \(\nu \) is invariant with respect to G, then there exists a locally IMSE-optimal design \(\xi ^*\) at \(\varvec{\beta }\) with respect to \(\nu \) which is invariant with respect to G.

Example

(Example 2 continued) In the two-factor gamma model on \([0, 1]^2\), we consider nominal parameter values \(\varvec{\beta }\) with \(\beta _1 = 0\), i.e., where the first covariate \(x_1\) has no effect. Such parameter vectors are invariant with respect to the linear transformation \({\tilde{g}}_3(\varvec{\beta }) = (\beta _0 + \beta _1, - \beta _1, \beta _2)^\mathsf{T}\) associated with the reflection \(g_3({\mathbf {x}}) = (1 - x_1, x_2)^\mathsf{T}\) of the first covariate \(x_1\). As the transformation \(g_3\) is self-inverse, together with the identity \(\mathrm {id}\) it constitutes a group \(G_3 = \{\mathrm {id}, g_3\}\). Then, the local D-criterion at such \(\varvec{\beta }\) with \(\beta _1 = 0\) is invariant with respect to \(G_3\). By Corollary 7 a locally D-optimal design can be found in the class of designs which are invariant with respect to \(G_3\). Moreover, here we can also restrict attention to designs supported by the vertices. With respect to \(G_3\), the relevant orbits are then \(\{{\mathbf {x}}_1, {\mathbf {x}}_2\}\) and \(\{{\mathbf {x}}_3, {\mathbf {x}}_4\}\), and invariant designs on the vertices have equal weights w at \({\mathbf {x}}_1\) and \({\mathbf {x}}_2\) and equal weights \(1/2 - w\) at \({\mathbf {x}}_3\) and \({\mathbf {x}}_4\), respectively. We will denote such designs by \({\bar{\xi }}_w\). The optimization problem for a locally D-optimal design thus reduces to finding the optimal weight \(w^*\). Note that for \(\beta _1 = 0\) the intensities are constant on the orbits, i.e., \(\lambda _1 = \lambda _2\) and \(\lambda _3 = \lambda _4\), where again \(\lambda _i\) denotes the intensity at \({\mathbf {x}}_i\). For the designs \({\bar{\xi }}_w\), the determinant of the information matrix becomes

$$\begin{aligned} \det ({\mathbf {M}}(\bar{\xi }_w; \varvec{\beta })) = 2 (\lambda _1^2 \lambda _3 w^2 (1/2 - w) + \lambda _1\lambda _3^2 w(1/2 - w)^2) \end{aligned}$$

in the case \(\beta _1 = 0\). The optimal weight \(w^*\) can be determined by straightforward computations as

$$\begin{aligned} w^* = \frac{3 \gamma _2 - 1 + \sqrt{12 \gamma _2^2 + 1}}{6 \gamma _2 (\gamma _2 + 2)} , \end{aligned}$$
(24)

for \(\beta _2 \ne 0\), and \(w^* = 1/4\) for \(\beta _2 = 0\) or \(\beta _2 = 2\beta _0\). The dependence of the optimal weight \(w^*\) on \(\gamma _2\) is shown in Fig. 4. The resulting invariant design \(\bar{\xi }_{w^*}\) is locally D-optimal at \(\varvec{\beta }\) with \(\beta _1 = 0\).

An analogous result holds for \(\beta _2 = 0\), when the reflection \(g_4\) of the second covariate \(x_2\) is used instead of \(g_3\). \(\square \)

Fig. 4 Optimal weights \(w^*\) in the two-factor gamma model when \(\beta _1 = 0\); the vertical dashed lines indicate \(\gamma _2=0\) and \(\gamma _2=2\), and the horizontal dashed line indicates \(w^*=0.25\)

Example 3

In the two-factor gamma model on \([0, 1]^2\), there are further symmetries which can be employed. In particular, we may consider parameter vectors \(\varvec{\beta }\) with equal slopes, i.e., \(\beta _1 = \beta _2 = \beta \) for some \(\beta \) when both covariates \(x_1\) and \(x_2\) have an effect of the same size. These values for the parameter vector \(\varvec{\beta }\) are invariant with respect to the linear transformation \({\tilde{g}}_5(\varvec{\beta }) = (\beta _0, \beta _2, \beta _1)^\mathsf{T}\) associated with the permutation \(g_5({\mathbf {x}}) = (x_2, x_1)^\mathsf{T}\) of the covariates. The transformation \(g_5\) is self-inverse and constitutes together with the identity \(\mathrm {id}\) a group \(G_5 = \{\mathrm {id}, g_5\}\). Because locally optimal designs are supported by the vertices of \([0,1]^2\), there are only three relevant orbits \(\{{\mathbf {x}}_1\}\), \(\{{\mathbf {x}}_2, {\mathbf {x}}_3\}\), and \(\{{\mathbf {x}}_4\}\). Optimal invariant designs can thus be characterized by two weights \(w_1^*\) assigned to \({\mathbf {x}}_1\) and \(w_4^*\) assigned to \({\mathbf {x}}_4\), while the remaining equal weights \(w_2^* = w_3^* = (1 - w_1^* - w_4^*) / 2\) are assigned to each of \({\mathbf {x}}_2\) and \({\mathbf {x}}_3\).

For the local D-criterion, optimal weights have been obtained by Gaffke et al. [14, Theorem 4.3]. There it was shown that minimally supported designs are locally D-optimal at \(\varvec{\beta }\) with \(\beta _1 = \beta _2 = \beta \) for \(\beta > \beta _0\) or \( - \beta _0 / 2 < \beta \le - \beta _0 / 3\) with weights \(w_2^* = w_3^* = 1/3\) and \(w_1^* = 1/3\) or \(w_4^* = 1/3\), respectively. In the intermediate case \( - \beta _0 / 3 < \beta \le \beta _0\), the locally D-optimal designs are supported on all four vertices with weights

$$\begin{aligned} w_1^* = \frac{3 \gamma + 1}{4 (2 \gamma + 1)},\ w_2^* = w_3^* = \frac{(\gamma + 1)^2}{4 (2 \gamma + 1)} \quad \mathrm { and }\quad w_4^* = \frac{1 - \gamma }{4} , \end{aligned}$$

where \(\gamma = \beta / \beta _0\). In particular, uniform weights \(w_i^* = 1/4\) are again seen to be optimal in the case \(\beta = 0\) of constant intensity, as indicated in Fig. 4 by the vertical and horizontal dashed lines at \(\gamma _2=\beta _2=0\) and \(w^*=0.25\), respectively.

Also for IMSE-optimality, the optimal weights depend only on \(\gamma = \beta / \beta _0\) by the scaling property of Sect. 4, but they can only be determined numerically. For selected values of \(\gamma \), we present numerical solutions in Table 2 for the case of the uniform weighting measure \(\nu \) on \([0, 1]^2\), which is invariant with respect to \(g_5\). These results were obtained by the method of augmented Lagrange multipliers implemented in the \(\textsf {R}\) package Rsolnp [35]; a sketch of such a computation is given after Table 2. Similarly to the D-criterion, the locally IMSE-optimal designs are seen to be minimally supported on \({\mathbf {x}}_1\), \({\mathbf {x}}_2\), and \({\mathbf {x}}_3\) when the standardized effect \(\gamma = \beta / \beta _0\) is sufficiently large, but, in contrast with the local D-criterion, the optimal weights vary with \(\gamma \). All four vertices are required for smaller values of \(\gamma \ge 0\); note that this parameter region is considerably larger here than for the D-criterion. In the case \(\gamma = 0\), the optimal weights are again uniform on all vertices. Moreover, by the additional reflection \(g_2({\mathbf {x}}) = (1 - x_1, 1 - x_2)^\mathsf{T}\), the optimal weights can be transferred from \(\gamma > 0\) to \(\gamma < 0\) by the nonlinear transformation \({\tilde{g}}_2\) described in Sect. 4. For example, in the last column of Table 2, the locally IMSE-optimal design at \(\tilde{\varvec{\beta }} = (1, - 3/7, - 3/7)^\mathsf{T}\) is obtained from the locally IMSE-optimal design at \(\varvec{\beta } = (1, 3, 3)^\mathsf{T}\) by \(g_2\) and the corresponding (nonlinear) transformation \({\tilde{g}}_2(\varvec{\beta })\), which results in \({\tilde{\gamma }} = - \gamma / (1 + 2 \gamma )\).

Table 2 Locally IMSE-optimal weights in the two-factor gamma model on \([0,1]^2\)
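
A sketch of such a computation is given below. It assumes the representation \(\mathrm {IMSE}(\xi ; \varvec{\beta }, \nu ) = \mathrm {tr}({\mathbf {V}}(\varvec{\beta }; \nu )\, {\mathbf {M}}(\xi ; \varvec{\beta })^{-1})\) with \(\eta '(z) = - z^{-2}\) and \(\lambda (z) = z^{-2}\) and approximates \({\mathbf {V}}\) by a grid average over \([0, 1]^2\); it is meant as an illustration of the approach, not a reproduction of Table 2.

```r
## Sketch: locally IMSE-optimal weights on the vertices of [0, 1]^2 via the
## augmented Lagrangian solver in Rsolnp (assumed forms: lambda(z) = z^(-2),
## eta'(z) = -z^(-2); V is approximated on a grid).
library(Rsolnp)
f <- function(x) c(1, x[1], x[2])
vertices <- list(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
beta <- c(1, 1, 1)                                  # equal slopes, gamma = 1

gr <- as.matrix(expand.grid(seq(0.005, 0.995, 0.01), seq(0.005, 0.995, 0.01)))
V <- Reduce(`+`, lapply(seq_len(nrow(gr)), function(i) {
  fx <- f(gr[i, ]); (sum(fx * beta))^(-4) * tcrossprod(fx)
})) / nrow(gr)                                      # grid approximation of V

M <- function(w) Reduce(`+`, Map(function(x, wi) {
  fx <- f(x); wi * (sum(fx * beta))^(-2) * tcrossprod(fx)
}, vertices, w))
imse <- function(w) sum(diag(V %*% solve(M(w))))

fit <- solnp(rep(1/4, 4), imse, eqfun = sum, eqB = 1,
             LB = rep(1e-6, 4), UB = rep(1, 4))     # weights sum to 1
round(fit$pars, 3)   # by invariance, the weights at (1,0) and (0,1) coincide
```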

We now turn to maximin efficiency, where invariance can become a powerful tool. For this, we additionally require that the subregion \({\mathcal {B}}^{\prime }\) of interest is invariant with respect to the pairs \((g, {\tilde{g}})\) of transformations, i.e., \({\tilde{g}}({\mathcal {B}}^{\prime }) = {\mathcal {B}}^{\prime }\) for all \(g \in G\). As mentioned in Sect. 3, the homogeneous versions of the D- and IMSE-criteria are multiplicatively equivariant. Hence, by (22) both maximin D- and IMSE-efficiency are invariant with respect to any group G of transformations satisfying the conditions of this section.

Corollary 8

If \({\tilde{g}}({\mathcal {B}}^{\prime }) = {\mathcal {B}}^{\prime }\) for all \(g \in G\), then there exists a maximin D-efficient design \(\xi ^*\) on \({\mathcal {B}}^{\prime }\) which is invariant with respect to G.

If additionally \(\nu \) is invariant with respect to G, then there exists a maximin IMSE-efficient design \(\xi ^*\) on \({\mathcal {B}}^{\prime }\) with respect to \(\nu \) which is invariant with respect to G.

Example

(Example 1 continued) In the one-factor gamma model with simple linear regression on [0, 1], the invariant design \({\bar{\xi }}\) which assigns equal weights 1/2 to the endpoints is both maximin D-efficient and maximin IMSE-efficient on \({\mathcal {B}}\) with respect to the uniform measure \(\nu \) on [0, 1], as has already been pointed out at the end of Sect. 3.

However, in contrast with the local criteria, there is no general majorization argument available for maximin efficiency criteria which allows restriction of the support of an optimal design to the extremal points of the experimental region. Therefore, to keep the argumentation simple and to concentrate on the concept of invariance, we deliberately confine the support of the designs under consideration to the endpoints. Then, with respect to the reflection \(g(x) = 1 - x\), the only invariant design, which assigns equal weights 1/2 to the endpoints, is maximin efficient for any invariant criterion. In particular, this design is maximin IMSE-efficient on \({\mathcal {B}}\) with respect to any invariant weighting measure \(\nu \) as specified in Proposition 1. \(\square \)

Although, in general, the determination of the efficiencies requires knowledge of all locally optimal designs, the maximin efficient design may be constructed without this information, as the above example shows. This result can be extended to more complex models.

Example

(Example 2 continued) In the two-factor gamma model on \([0, 1]^2\) with multiple regression, we first consider maximin efficiency on the region \({\mathcal {B}}\) of all possible values for the parameter vector. This region is invariant under the transformations associated with the group \(G = \{g_1, \ldots , g_4\}\) of reflections of the covariates. As in the case of the one-factor gamma model, we deliberately confine the support of the designs to the vertices \({\mathbf {x}}_1, \ldots , {\mathbf {x}}_4\) of the experimental region to keep argumentation simple. Then, there is only one orbit which contains all vertices, and the only invariant design with respect to G is the uniform design \({\bar{\xi }}\) on the vertices which assigns equal weights 1/4 to each vertex. Hence, the design \({\bar{\xi }}\) is maximin efficient on \({\mathcal {B}}\) for any invariant criterion with respect to G.

This result carries over to any parameter subregion \({\mathcal {B}}^{\prime }\) which is invariant with respect to G. For example, if the intercept \(\beta _0\) is restricted to a subset, \(\beta _0 \in {\mathcal {B}}_0\), of its marginal region \((0, \infty )\) or, more specifically, set to a fixed value (\({\mathcal {B}}_0 = \{\beta _0\}\)), while the slopes may vary across their corresponding (conditional) regions, then the resulting subregion \({\mathcal {B}}^{\prime } = \{\varvec{\beta } \in {\mathcal {B}};\, \beta _0 \in {\mathcal {B}}_0\}\) is invariant with respect to the rescaled transformations \({\tilde{g}}\) associated with the transformations \(g \in G\). Hence, the uniform design \({\bar{\xi }}\) is also maximin efficient on \({\mathcal {B}}^{\prime }\) for any invariant criterion with respect to G. In particular, this holds for the reduced parameter region \({\mathcal {C}}\) displayed in Fig. 3 when \(\beta _0 = 1\) is fixed. \(\square \)

Invariance can also be employed in cases where there are fewer symmetries and, thus, more than one orbit, so that the weights on the orbits still have to be optimized.

Example

(Example 3 continued) In the two-factor gamma model on \([0, 1]^2\), we are now only interested in parameter vectors \(\varvec{\beta }\) with equal slopes, i.e., \(\beta _1 = \beta _2 = \beta \). We thus consider the parameter subregion \({\mathcal {B}}^{\prime } = \{(\beta _0, \beta , \beta )^\mathsf{T};\, \beta> - \beta _0 / 2, \beta _0 > 0\}\). In terms of the reduced parameter \(\varvec{\gamma } = (\gamma _1, \gamma _2)^\mathsf{T}\), \(\gamma _j = \beta _j / \beta _0\), the subset \({\mathcal {B}}^{\prime }\) reduces to \({\mathcal {C}}^{\prime } = \{(\gamma , \gamma )^\mathsf{T};\, \gamma > - 1/2\}\), which is exhibited in Fig. 3 by the diagonal dashed line.

For the transformation \(g_2({\mathbf {x}}) = (1 - x_1, 1 - x_2)^\mathsf{T}\) of simultaneous reflection of both explanatory variables, we use the rescaled transformation \({\tilde{g}}_2(\varvec{\beta }) = {\tilde{c}}_{\varvec{\beta }} {\mathbf {Q}}_{g_2}^{-\mathsf{T}} \varvec{\beta }\) of the parameter vector \(\varvec{\beta }\) which leaves the intercept \(\beta _0\) unchanged, i.e., \({\tilde{c}}_{\varvec{\beta }} = \beta _0 / (\beta _0 + \beta _1 + \beta _2) = 1 / (1 + \gamma _1 + \gamma _2)\), which specializes to \({\tilde{c}}_{\varvec{\beta }} = \beta _0 / (\beta _0 + 2 \beta ) = 1 / (1 + 2 \gamma )\) on the subsets \({\mathcal {B}}^{\prime }\) and \({\mathcal {C}}^{\prime }\), respectively. In particular, the relevant reduced slope parameter \(\gamma > - 1/2\) is mapped to \( - \gamma / (1 + 2\gamma ) > - 1/2\), as mentioned before. Obviously, both \({\mathcal {B}}^{\prime }\) and \({\mathcal {C}}^{\prime }\) are invariant with respect to \({\tilde{g}}_2\).

To make use of the symmetries with respect to the transformations \(g_2\) and \(g_5\) jointly, we consider the group \(G^{\prime } = \{\mathrm {id}, g_2, g_5, g_6\}\) generated by them, where the composition \(g_6\) of \(g_2\) and \(g_5\) is the reflection at the secondary diagonal of the unit square \({\mathcal {X}}\), \(g_6({\mathbf {x}}) = (1 - x_2, 1 - x_1)^\mathsf{T}\). We again restrict attention to the vertices of the experimental region. Then there are just two orbits \(\{{\mathbf {x}}_1, {\mathbf {x}}_4\}\) and \(\{{\mathbf {x}}_2, {\mathbf {x}}_3\}\) of the group \(G^{\prime }\). The invariant designs \({\bar{\xi }}_w\) can thus be characterized by the weight w assigned to each of the settings \({\mathbf {x}}_1\) and \({\mathbf {x}}_4\) in the first orbit, \(0< w < 1/2\), while weight \(1/2 - w\) is assigned to each of the settings \({\mathbf {x}}_2\) and \({\mathbf {x}}_3\) in the second orbit. Design optimization is then reduced to determining the optimal weight w.

The determinant of the information matrix of the invariant design \(\bar{\xi }_w\) is given by

$$\begin{aligned} \det ({\mathbf {M}}(\bar{\xi }_w; \varvec{\beta })) = \frac{w (1 - 2 w) \left( (1 + \gamma )^2 + \gamma ^2 (1 - 2 w)\right) }{2 \beta _0^6 (1 + \gamma )^4 (1 + 2\gamma )^2} \end{aligned}$$

locally at \(\varvec{\beta } = (\beta _0, \beta , \beta )^\mathsf{T}\), where \(\gamma = \beta / \beta _0\). To determine the maximin D-efficient design over \({\mathcal {B}}^{\prime }\), we can confine the analysis to the reduced parameter region \({\mathcal {C}}^{\prime }\), i.e., \(\gamma > - 1/2\). For \(\gamma \ge 1\) the minimally supported design \(\xi ^*_{\varvec{\beta }}\) with equal weights 1/3 on \({\mathbf {x}}_1\), \({\mathbf {x}}_2\), and \({\mathbf {x}}_3\) is locally D-optimal and has \(\det ({\mathbf {M}}(\xi _{\varvec{\beta }}^*;\varvec{\beta })) = 1 / (27\, \beta _0^6 (1 + \gamma )^4)\). Hence, for the D-efficiency of \(\bar{\xi }_w\) at \(\gamma \ge 1\), we get

$$\begin{aligned} \mathrm {eff}_{\textit{D}}(\bar{\xi }_w; \varvec{\beta })^3 = 27 \frac{w (1 - 2 w) \left( (1 + \gamma )^2 + \gamma ^2 (1 - 2 w)\right) }{2 (1 + 2 \gamma )^2} . \end{aligned}$$
(25)

The efficiency is decreasing in \(\gamma \ge 1\). Therefore, its infimum \(\inf _{\gamma \ge 1} \mathrm {eff}_{\textit{D}}(\bar{\xi }_w; \varvec{\beta })^3 = 27 w (1 - w) (1 - 2 w) / 4\) is attained as \(\gamma \) tends to \(\infty \). This expression is maximized by \(w^* = (3 - \sqrt{3}) / 6 \approx 0.2113\), and the design \(\bar{\xi }_{w^*}\) is maximin D-efficient over \(\gamma \ge 1\) within the class of invariant designs \(\bar{\xi }_w\). This also holds for the parameter region \( - 1/2 < \gamma \le - 1/3\) by symmetry considerations with respect to the transformation \(g_2\) (or \(g_6\)). For the intermediate region \( - 1/3< \gamma < 1\), the efficiency has to be computed numerically. The D-efficiency of \({\bar{\xi }}_{w^*}\) is displayed in Fig. 5. By inspection of the plot, it can be concluded that \({\bar{\xi }}_{w^*}\) is maximin D-efficient on \({\mathcal {C}}^{\prime }\), and its minimal D-efficiency \(3 (w^* (1 - w^*) (1 - 2 w^*) / 4)^{1/3} \approx 0.8660\) is attained at the boundary of the parameter region. This result carries over to the whole parameter region \({\mathcal {B}}^{\prime }\) of equal slopes as well as to subregions \(\{\varvec{\beta } \in {\mathcal {B}}^{\prime };\, \beta _0 \in {\mathcal {B}}_0\}\) with constraints on the intercept \(\beta _0\).

For comparison, the D-efficiency of the uniform design \(\bar{\xi }_{1/4}\), which is locally optimal at \(\gamma = 0\), is also plotted in Fig. 5. The minimal D-efficiency of \(\bar{\xi }_{1/4}\) is also attained at the boundary and can be computed by (25) to be approximately 0.8585. Hence, the maximin performance of \(\bar{\xi }_{1/4}\) is slightly worse than that of the maximin D-efficient design \({\bar{\xi }}_{w^*}\). \(\square \)
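
The numerical values above are easily reproduced; a short R sketch using (25) and its limiting expression (the helper names are ours):

```r
## Sketch: maximin D-efficiency within the invariant class via (25).
effD3 <- function(w, g)                     # cubed D-efficiency of xi-bar_w
  27 * w * (1 - 2 * w) * ((1 + g)^2 + g^2 * (1 - 2 * w)) / (2 * (1 + 2 * g)^2)
lim3 <- function(w) 27 * w * (1 - w) * (1 - 2 * w) / 4   # limit gamma -> Inf

w.star <- optimize(lim3, c(0, 0.5), maximum = TRUE)$maximum
c(w.star, (3 - sqrt(3)) / 6)                # both approx 0.2113
lim3(w.star)^(1/3)                          # minimal D-efficiency approx 0.8660
lim3(1/4)^(1/3)                             # uniform design xi-bar_{1/4}: 0.8585
effD3(w.star, c(1, 5, 50, 1e6))^(1/3)       # decreases to the limit for gamma >= 1
```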

Fig. 5 D-efficiency of the maximin D-efficient design \({\bar{\xi }}_{w^*}\) (solid line) and of the uniform design \({\bar{\xi }}_{1/4}\) (dashed line); the vertical dashed lines indicate the lower bound (\(\gamma = - 1/2\)) and the thresholds \(\gamma = - 1/3\) and \(\gamma = 1\) between the subregions

6 Discussion

In this article, we have outlined the concepts of equivariance and invariance in the design of experiments for generalized linear models. In contrast with the well-known results in linear models, where only the experimental settings are transformed, in generalized linear models we have to consider pairs of transformations which act simultaneously on the experimental settings and on the location parameters in the linear component. We focus on local optimality and maximin efficiency for the common D- and IMSE-criteria, which allow a wide range of transformations of the experimental settings such as scalings, permutations, or reflections. As in linear models, the transformation of the experimental settings has to act in a linear way on the regression functions of the linear component. The parameters can then be transformed linearly in such a way that the value of the linear component and, hence, of the intensity is not changed (see Radloff and Schwabe [34]). Besides this natural choice, nonlinear transformations of the parameters may also be employed if additional properties of the intensity function can be exploited. We illustrate this feature by the gamma model with inverse link, for which the intensity is merely scaled by a multiplicative factor depending on the parameter. This scaling does not affect standardized design criteria like maximin efficiency, and invariance can also be used here. In Table 3, we exhibit which concepts of equivariance and invariance can be used under the model conditions of linear and generalized linear models and, in particular, for the gamma model with canonical (inverse) link.

Table 3 Transformations for equivariance (Equiv.) and invariance (Inv.); “\(+\)” indicates that the property is required or can be used

The general results on equivariance and invariance in generalized linear models can be extended in a straightforward manner to other model specifications in which the intensity depends only on the linear component, as in the case of censoring (see Schmidt and Schwabe [37] for examples). How far the results on nonlinear transformations can be extended, however, depends on the structure of the intensity function.

For other optimality criteria, such as the A-, E- or, more generally, Kiefer’s \(\Phi _q\)-criteria, which are based on the eigenvalues of the information matrix, the use of equivariance and invariance is limited, because additional structure of the transformations would be required, such as orthogonality of the transformation matrices \({\mathbf {Q}}_g\).

For the case of maximin efficiency in the gamma model, it would also be desirable to obtain majorization results like those we found for local optimality. These would allow us to restrict the search for the optimal experimental settings to the extremal points of the experimental region. However, the findings in Gaffke et al. [14] do not carry over, because the arguments used there are of a local nature and do not work uniformly on the parameter region. Alternatively, equivalence theorems could be employed for establishing maximin efficiency (see Pronzato and Pázman [31, ch. 8]), but in their formulation these theorems require that the minimal efficiency is attained inside the parameter region, a condition which is violated in our examples. It thus remains an open problem whether the restriction to the extremal points of the experimental region can be justified.

The concepts of equivariance and invariance can further be extended to models with random effects (see Graßhoff et al. [16] and Debusho and Haines [7] for the estimation of population parameters, and Prus and Schwabe [32] for individual prediction in linear mixed models).