1 Introduction

Count data plays an important role in medical and pharmaceutical development, marketing, and psychological research. For example, Vives et al. (2006) reviewed articles published in psychological journals in the period from 2002 to 2006 and found that a substantial part of these articles dealt with count data for which the mean was quite low [for details we refer to the discussion in Graßhoff et al. (2020)]. In these situations, standard linear models are not applicable because they cannot account for the inherent heteroscedasticity. Instead, Poisson regression models are often more appropriate to describe such data. As an early source in psychological research, we may refer to the Rasch Poisson counts model introduced by Rasch (1960) to predict person ability in an item response setup.

The Poisson regression model can be considered as a particular generalized linear model [see McCullagh and Nelder (1989)]. There is a variety of literature on the analysis of count data in the Poisson regression model [see e.g. Cameron and Trivedi (2013)], and the statistical analysis is implemented in the main standard statistical software packages (cf. “glm” in R, “GENLIN” in SPSS, “proc genmod” in SAS), but only little work has been done on the design of such experiments. Ford, Torsney and Wu derived optimal designs for the one-dimensional Poisson regression model in their pioneering paper on canonical transformations (Ford et al. 1992). Wang et al. (2006) obtained numerical solutions for optimal designs in two-dimensional Poisson regression models, both for the main effects only (additive) model and for the model with interaction term. For the main effects only model, the optimality of their design was proven analytically by Russell et al. (2009), even for larger dimensions. Rodríguez-Torreblanca and Rodríguez-Díaz (2007) extended the result by Ford et al. for one-dimensional Poisson regression to overdispersed data specified by a negative binomial regression model, and Schmidt and Schwabe (2017) generalized the result by Russell et al. for higher-dimensional Poisson regression to a much broader class of additive regression models. Graßhoff et al. (2020) gave a complete characterization of optimal designs in an ANOVA-type setting for Poisson regression with binary predictors, and Kahle et al. (2016) indicate how interactions could be incorporated in this particular situation.

In the present paper, we find D-optimal designs for the two-dimensional Poisson regression model with synergetic interaction as considered before numerically by Wang et al. (2006). We show the D-optimality by reparameterizing the design space via hyperbolic coordinates, such that the inequalities in the Kiefer–Wolfowitz equivalence theorem only need to be checked on the boundary and the diagonal of the design region. This allows us to find an analytical proof for the D-optimality of the proposed design. Furthermore, we extend this result in various ways to higher-dimensional Poisson regression. First, we find D-optimal designs for first-order and second-order interactions, given that the prespecified interaction parameters are zero. Second, we present a D-optimal design for Poisson regression with first-order synergetic interaction where the design space is restricted to the union of the two-dimensional faces of the positive orthant.

The paper is organized as follows. In the next section, we introduce the basic notations for Poisson regression models and specify the corresponding concepts of information and design in Sect. 3. Results for two-dimensional Poisson regression with interaction are established in Sect. 4. In Sect. 5, we present some extensions to higher-dimensional Poisson regression models. Efficiencies of the found designs and further extensions are discussed in Sect. 6. Proofs have been deferred to an “Appendix”. We note that most of the inequalities there have first been detected by using the computer algebra system Mathematica (Wolfram Research, Inc 2020), but analytical proofs are provided in the “Appendix” for the readers’ convenience.

2 Model specification

We consider the Poisson regression model where observations Y are Poisson distributed with intensity \(E(Y)=\lambda ({\mathbf {x}})\) which depends on one or more explanatory variables \({\mathbf {x}}=(x_1,\ldots ,x_k)\) in terms of a generalized linear model. In particular, we assume a log-link which relates the mean \(\lambda ({\mathbf {x}})\) to a linear component \({\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta }\) by \(\lambda ({\mathbf {x}}) = \exp ({\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta })\), where \({\mathbf {f}}({\mathbf {x}})=(f_1({\mathbf {x}}),\ldots ,f_p({\mathbf {x}}))^\top \) is a vector of p known regression functions and \(\varvec{\beta }\) is a p-dimensional vector of unknown parameters. For example, if \({\mathbf {x}}=x\) is one-dimensional (\(k=1\)), then simple Poisson regression is given by \({\mathbf {f}}(x)=(1,x)^\top \) with \(p=2\), \(\varvec{\beta }=(\beta _0,\beta _1)^\top \) and intensity \(\lambda (x)=\exp (\beta _0+\beta _1 x)\). For two explanatory variables \({\mathbf {x}}=(x_1,x_2)\) (\(k=2\)), multiple Poisson regression without interaction is given by \({\mathbf {f}}({\mathbf {x}})=(1,x_1,x_2)^\top \) with \(p=3\), \(\varvec{\beta }=(\beta _0,\beta _1,\beta _2)^\top \) and intensity \(\lambda ({\mathbf {x}})=\exp (\beta _0+\beta _1 x_1+\beta _2 x_2)\).

In what follows, we will focus on the two-dimensional multiple regression (\({\mathbf {x}}=(x_1,x_2)\), \(k=2\)) with interaction term, where \(p=4\), \({\mathbf {f}}({\mathbf {x}})=(1,x_1,x_2,x_1 x_2)^\top \), \(\varvec{\beta }=(\beta _0,\beta _1,\beta _2,\beta _{12})^\top \) and intensity

$$\begin{aligned} \lambda ({\mathbf {x}})=\exp (\beta _0+\beta _1 x_1+\beta _2 x_2+\beta _{12} x_1 x_2). \end{aligned}$$
(2.1)

Here, \(\beta _0\) is an intercept term such that the mean is \(\exp (\beta _0)\) when the explanatory variables are equal to 0. The quantities \(\beta _1\) and \(\beta _2\) denote the direct effects of each single explanatory variable, and \(\beta _{12}\) describes the amount of the interaction effect when both explanatory variables are active (nonzero).

Typically, the explanatory variables describe nonnegative quantities (\(x_1, x_2 \ge 0\)) like doses of some chemical or pharmaceutical agents, or difficulties of tasks in item response experiments in psychology. In particular, in the latter case the expected number of counts (correct answers) decreases with increasing difficulty. It is then reasonable to assume that the direct effects are negative (\(\beta _1,\beta _2 < 0\)) and that the interaction effect, if present, points in the same direction (\(\beta _{12} \le 0\)). The case \(\beta _{12} < 0\) will be called a synergy effect because it describes a strengthening of the effect when both components are used simultaneously.
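For illustration, the following Python sketch evaluates the intensity of model (2.1) and simulates counts at a single setting; the parameter values and the setting are purely hypothetical and only serve to exemplify a synergetic situation.

```python
import numpy as np

def intensity(x1, x2, beta):
    """Intensity lambda(x) = exp(b0 + b1*x1 + b2*x2 + b12*x1*x2) of model (2.1)."""
    b0, b1, b2, b12 = beta
    return np.exp(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)

rng = np.random.default_rng(1)
beta = (1.0, -1.0, -1.0, -0.5)   # hypothetical values: negative direct effects, synergy
x1, x2 = 0.5, 1.0                # a single experimental setting
lam = intensity(x1, x2, beta)
y = rng.poisson(lam, size=10)    # ten replicate counts observed at this setting
print(lam, y.mean())
```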

3 Information and design

In experimental situations the setting \({\mathbf {x}}\) of the explanatory variables may be chosen by the experimenter from some experimental region \({\mathcal {X}}\). As the explanatory variables describe nonnegative quantities, and if there are no further restrictions on these quantities, it is natural to assume that the design region \({\mathcal {X}}\) is the nonnegative half-axis \([0,\infty )\) or the closure of quadrant I in the Cartesian plane, \([0,\infty )^2\), in one- or two-dimensional Poisson regression, respectively.

To measure the contribution of an observation Y at setting \({\mathbf {x}}\), the corresponding information can be used. With the log-link, the Poisson regression model constitutes a generalized linear model with canonical link (McCullagh and Nelder 1989). Furthermore, for Poisson distributed observations Y, the variance and the mean coincide, \(\text {Var}(Y)={\mathbb {E}}(Y)=\lambda ({\mathbf {x}})\). Hence, according to Atkinson et al. (2014), the elemental (Fisher) information for an observation Y at a setting \({\mathbf {x}}\) is the \(p \times p\) matrix given by

$$\begin{aligned} {\mathbf {M}}_{{\varvec{\beta }}}({\mathbf {x}}) = \lambda ({\mathbf {x}}){\mathbf {f}}({\mathbf {x}}) {\mathbf {f}}({\mathbf {x}})^{\top }. \end{aligned}$$

Note that on the right-hand side, the intensity \(\lambda ({\mathbf {x}})=\exp ({\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta })\) depends on the linear component \({\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta }\) and, hence, on the parameter vector \(\varvec{\beta }\). Consequently, also the information depends on \(\varvec{\beta }\) as indicated by the notation \({\mathbf {M}}_{{\varvec{\beta }}}\).

For N independent observations \(Y_1,\ldots ,Y_N\) at settings \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_N\), the joint Fisher information matrix is obtained as the sum of the elemental information matrices,

$$\begin{aligned} {\mathbf {M}}_{{\varvec{\beta }}}({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_N) = \sum _{i=1}^N \lambda ({\mathbf {x}}_i){\mathbf {f}}({\mathbf {x}}_i) {\mathbf {f}}({\mathbf {x}}_i)^{\top }. \end{aligned}$$
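As a minimal numerical sketch (with purely illustrative parameter values), the joint information matrix for the two-dimensional model (2.1) can be assembled directly from this sum.

```python
import numpy as np

def f(x):
    """Regression vector f(x) = (1, x1, x2, x1*x2) of model (2.1)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])

def fisher_information(settings, beta):
    """Sum of the elemental informations lambda(x_i) f(x_i) f(x_i)^T."""
    M = np.zeros((4, 4))
    for x in settings:
        fx = f(x)
        M += np.exp(fx @ beta) * np.outer(fx, fx)
    return M

beta = np.array([0.0, -1.0, -1.0, -0.5])       # illustrative parameter vector
settings = [(0, 0), (2, 0), (0, 2), (1, 1)]    # settings of an exact design
print(fisher_information(settings, beta))
```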

The collection \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_N\) of settings is called an exact design, and the aim of design optimization is to choose these settings such that the statistical analysis is improved. The quality of a design can be measured in terms of the information matrix because its inverse is proportional to the asymptotic covariance matrix of the maximum-likelihood estimator of \(\varvec{\beta }\), see Fahrmeir and Kaufmann (1985). Hence, larger information means higher precision. However, matrices are not comparable in general. Therefore, one has to confine oneself to some real valued criterion function applied to the information matrix. In accordance with the literature, we will use the most popular D-criterion which aims at maximizing the determinant of the information matrix. This criterion has nice analytical properties and can be interpreted in terms of minimization of the volume of the asymptotic confidence ellipsoid for \(\varvec{\beta }\) based on the maximum-likelihood estimator.

The optimal design will depend on the parameter vector \(\varvec{\beta }\) and is, hence, locally optimal in the spirit of Chernoff (1953). This means that the resulting design has an optimal performance when the true parameter is equal to the prespecified value used in design optimization. These locally optimal designs serve well when strong initial knowledge is available for the parameter, e. g. in the form of a null hypothesis. They also provide benchmarks for the best possible value of the criterion, for example in conjunction with the calculation of efficiencies as involved in standardized criteria (see Dette 1997). Moreover, locally optimal designs can be used as building blocks in adaptive procedures, where the design is adjusted during the experiment by locally optimal designs based on the current parameter estimates.

Finding an optimal exact design is a discrete optimization problem which is often too hard for analytical solutions. Therefore, we adopt the concept of approximate designs in the spirit of Kiefer (1974). An approximate design \(\xi \) is defined as a collection \({\mathbf {x}}_0,\ldots ,{\mathbf {x}}_{n-1}\) of n mutually distinct settings in the design region \({\mathcal {X}}\) with corresponding weights \( w_0,\ldots ,w_{n-1} \ge 0\) satisfying \(\sum _{i=0}^{n-1} w_i = 1\). Then, an exact design can be written as an approximate design, where \({\mathbf {x}}_0,\ldots ,{\mathbf {x}}_{n-1}\) are the mutually distinct settings in the exact design with corresponding numbers \(N_0,\ldots ,N_{n-1}\) of replications, \(\sum _{i=0}^{n-1} N_i = N\), and frequencies \(w_i = N_i / N\), \(i=0,\ldots ,{n-1}\). However, in an approximate design the weights are relaxed from multiples of 1/N to nonnegative real numbers which allow for continuous optimization.

For an approximate design \(\xi \) the information matrix is defined as

$$\begin{aligned} {\mathbf {M}}_{{\varvec{\beta }}}(\xi ) = \sum _{i=0}^{n-1} w_i \lambda ({\mathbf {x}}_i) {\mathbf {f}}({\mathbf {x}}_i) {\mathbf {f}}({\mathbf {x}}_i)^\top , \end{aligned}$$

which therefore coincides with the standardized (per observation) information matrix \(\frac{1}{N}{\mathbf {M}}_{{\varvec{\beta }}}({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_N)\). An approximate design \(\xi ^*\) will be called locally D-optimal at \(\varvec{\beta }\) if it maximizes the determinant of the information matrix \({\mathbf {M}}_{{\varvec{\beta }}}(\xi )\).
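The following sketch, again with illustrative values, evaluates the D-criterion for two approximate designs on the same four-point support; in line with the remark on minimally supported designs in Sect. 4, the equal-weight version yields the larger determinant.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])

def info_matrix(points, weights, beta):
    """Standardized information M_beta(xi) = sum_i w_i lambda(x_i) f(x_i) f(x_i)^T."""
    M = np.zeros((4, 4))
    for x, w in zip(points, weights):
        fx = f(x)
        M += w * np.exp(fx @ beta) * np.outer(fx, fx)
    return M

beta = np.array([0.0, -1.0, -1.0, -0.5])       # illustrative parameter vector
support = [(0, 0), (2, 0), (0, 2), (1, 1)]     # illustrative four-point support
for w in ([0.25, 0.25, 0.25, 0.25], [0.4, 0.2, 0.2, 0.2]):
    print(w, np.linalg.det(info_matrix(support, w, beta)))  # equal weights win here
```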

4 Optimal designs

We start with quoting results from the literature for one-dimensional and two-dimensional regression without interaction: In the case of one-dimensional Poisson regression, the design \(\xi _{\beta _1}^*\) which assigns equal weights \(w_0^*=w_1^*=1/2\) to the two settings \(x_0^*=0\) and \(x_1^*=2/|\beta _1|\) is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}=[0,\infty )\) for \(\beta _1<0\), see Rodríguez-Torreblanca and Rodríguez-Díaz (2007).

In the case of two-dimensional Poisson regression without interaction the design \(\xi _{\beta _1,\beta _2}^*\) which assigns equal weights \(w_0^*=w_1^*=w_2^*=1/3\) to the three settings \({\mathbf {x}}_0^*=(0,0)\), \({\mathbf {x}}_1^*=(2/|\beta _1|,0)\), and \({\mathbf {x}}_2^*=(0,2/|\beta _2|)\) is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}=[0,\infty )^2\) for \(\beta _1,\beta _2<0\), see Russell et al. (2009). Note that the optimal coordinates on the axes coincide with the optimal values in the one-dimensional case, see Schmidt and Schwabe (2017).

In both cases the optimal design is minimally supported, i.e., the number n of support points of the design is equal to the number p of parameters. It is well-known that for D-optimal minimally supported designs the optimal weights are all equal, \(w_i^*=1/p\), see Silvey (1980). Such optimal designs are attractive as they can be realized as exact designs when the sample size N is a multiple of the number of parameters p.

Further note that these optimal designs always include the setting \(x_0=0\) or \({\mathbf {x}}_0=(0,0)\), respectively, where the intensity \(\lambda \) attains its largest value.

The above findings coincide with the numerical results obtained by Wang et al. (2006) who also numerically found minimally supported D-optimal designs for the case of two-dimensional Poisson regression with interaction. In what follows, we will give explicit formulae for these designs and establish rigorous analytical proofs of their optimality.

We start with the special situation of vanishing interaction (\(\beta _{12}=0\)). In this case standard methods of factorization can be applied to establish the optimal design, see Schwabe (1996, section 4).

Theorem 4.1

If \(\beta _1,\beta _2<0\) and \(\beta _{12}=0\), then the design \(\xi _{\beta _1}^*\otimes \xi _{\beta _2}^*\) which assigns equal weights \(w_0^*=w_1^*=w_2^*=w_3^*=1/4\) to the four settings \({\mathbf {x}}_0^*=(0,0)\), \({\mathbf {x}}_1^*=(2/|\beta _1|,0)\), \({\mathbf {x}}_2^*=(0,2/|\beta _2|)\), and \({\mathbf {x}}_3^*=(2/|\beta _1|,2/|\beta _2|)\) is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}=[0,\infty )^2\).

In contrast to the result of Theorem 4.1 the intensity fails to factorize in the case of a non-vanishing interaction (\(\beta _{12} \ne 0\)). Thus, a different approach has to be chosen. As a prerequisite, we mention that in the above cases the optimal designs can be derived from those for standard parameter values \(\beta _0=0\) and \(\beta _1=-1\) in one dimension or \(\beta _1=\beta _2=-1\) in two dimensions by canonical transformations, see Ford et al. (1992), or, more generally, by equivariance considerations, see Radloff and Schwabe (2016). We will adopt this approach also to the two-dimensional Poisson regression model with interaction and consider the case \(\beta _0=0\) and \(\beta _1=\beta _2=-1\) first. There the interaction effect remains a free parameter, and we denote the strength of the synergy effect by \(\rho =-\beta _{12} \ge 0\).

4.1 Standardized case

Throughout this subsection, we assume the standardized situation with \(\varvec{\beta }=(0,-1,-1,-\rho )^\top \) for some \(\rho \ge 0\). Motivated by Theorem 4.1 and the numerical results in Wang et al. (2006) we consider a class \(\Xi _0\) of minimally supported designs as potential candidates for being optimal. In the class \(\Xi _0\), the designs have one setting at the origin \({\mathbf {x}}_0=(0,0)\), where the intensity is highest, one setting \({\mathbf {x}}_1=(x_1,0)\) and \({\mathbf {x}}_2=(0,x_2)\) on each of the bounding axes of the design region as for the optimal design in the model without interaction, and an additional setting \({\mathbf {x}}_3=(t,t)\) on the diagonal of the design region, where the effects of the two components are equal. The following result is due to Könner (2011).

Lemma 4.2

Let \(t=(\sqrt{1+8\rho }-1)/(2\rho )\) for \(\rho > 0\) and \(t=2\) for \(\rho = 0\). Then, the design \(\xi _t\) which assigns equal weights 1/4 to \({\mathbf {x}}_0=(0,0)\), \({\mathbf {x}}_1=(2,0)\), \({\mathbf {x}}_2=(0,2)\), and \({\mathbf {x}}_3=(t,t)\) is locally D-optimal within the class \(\Xi _0\).

Note that \(t=2\) for \(\rho = 0\) is in accordance with the optimal product-type design in Theorem 4.1, t is continuously decreasing in \(\rho \), and t tends to 0 when the strength of synergy \(\rho \) gets arbitrarily large. Figure 1 shows the value of t in dependence on \( \rho \).
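For reference, a short sketch tabulates the optimal coordinate t of Lemma 4.2 as a function of \( \rho \).

```python
import math

def t_opt(rho):
    """Optimal diagonal coordinate of Lemma 4.2 (t = 2 for rho = 0)."""
    if rho == 0:
        return 2.0
    return (math.sqrt(1.0 + 8.0 * rho) - 1.0) / (2.0 * rho)

for rho in [0.0, 0.25, 1.0, 4.0, 100.0]:
    print(rho, t_opt(rho))   # t decreases from 2 towards 0 as the synergy grows
```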

Fig. 1 Value of the optimal t in Lemma 4.2 for \( -1/8\le \rho \le 3 \). Negative values of \( \rho \) refer to Lemma 6.2

To establish that \(\xi _t\) is locally D-optimal within the class of all designs on \({\mathcal {X}}\), we will make use of the Kiefer–Wolfowitz equivalence theorem (Kiefer and Wolfowitz 1960) in its extended version incorporating intensities, see Fedorov (1972). For this, we introduce the sensitivity function \( \psi ({\mathbf {x}};\xi ) = \lambda ({\mathbf {x}}) {\mathbf {f}}({\mathbf {x}})^{\top } {\mathbf {M}}(\xi )^{-1} {\mathbf {f}}({\mathbf {x}}) , \) where we suppress the dependence on \(\varvec{\beta }\) in the notation. Then by the equivalence theorem, a design \(\xi ^*\) is (locally) D-optimal if (and only if) the sensitivity function \(\psi ({\mathbf {x}};\xi ^*)\) does not exceed the number p of parameters uniformly on the design region \({\mathcal {X}}\). Equivalently, we may consider the deduced sensitivity function

$$\begin{aligned} d({\mathbf {x}};\xi )&= {\mathbf {f}}({\mathbf {x}})^{\top } {\mathbf {M}}(\xi )^{-1} {\mathbf {f}}({\mathbf {x}})/p - 1/\lambda ({\mathbf {x}}) \end{aligned}$$

as \( \lambda ({\mathbf {x}})>0 \). Then \(\xi _t\) is D-optimal if \(d({\mathbf {x}};\xi _t) \le 0\) for all \({\mathbf {x}}\in {\mathcal {X}}\). To establish this condition we need some preparatory results on the shape of the (deduced) sensitivity function. Figure 2 shows \( d({\mathbf {x}};\xi _t) \) for \( t=2 \) for \( \rho =0 \), i.e. for the standardized setting in Theorem 4.1.
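Before turning to the formal argument, the optimality condition can also be checked numerically. The sketch below evaluates \( d({\mathbf {x}};\xi _t) \) on a grid for an illustrative value of \( \rho \); the maximum is zero up to numerical error, attained at the support points.

```python
import numpy as np

def f(x1, x2):
    return np.array([1.0, x1, x2, x1 * x2])

def d(x1, x2, Minv, beta, p=4):
    """Deduced sensitivity d(x; xi) = f(x)^T M^{-1} f(x) / p - 1 / lambda(x)."""
    fx = f(x1, x2)
    return fx @ Minv @ fx / p - 1.0 / np.exp(fx @ beta)

rho = 0.5                                        # illustrative synergy strength
beta = np.array([0.0, -1.0, -1.0, -rho])
t = (np.sqrt(1 + 8 * rho) - 1) / (2 * rho)
support = [(0, 0), (2, 0), (0, 2), (t, t)]       # support of xi_t, weights 1/4

M = np.zeros((4, 4))
for x1, x2 in support:
    fx = f(x1, x2)
    M += 0.25 * np.exp(fx @ beta) * np.outer(fx, fx)
Minv = np.linalg.inv(M)

grid = np.linspace(0.0, 20.0, 201)
vals = np.array([[d(a, b, Minv, beta) for b in grid] for a in grid])
print(vals.max())   # maximum is numerically zero (attained at support points)
```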

Fig. 2 Deduced sensitivity function \( d({\mathbf {x}};\xi _t) \) for \( t=2 \) (\( \rho =0 \))

Lemma 4.3

If \(\xi \) is invariant under permutation of \(x_1\) and \(x_2\), then \(d({\mathbf {x}};\xi )\) attains its maximum on the boundary or on the diagonal of \({\mathcal {X}}\).

Lemma 4.4

\(d((x,0);\xi _{t}) = d((0,x);\xi _{t}) \le 0\) for all \(x \ge 0\).

Lemma 4.5

\(d((x,x);\xi _{t}) \le 0\) for all \(x \ge 0\).

Note that \(\xi _t\) is invariant with respect to the permutation of \(x_1\) and \(x_2\). Then, combining Lemmas 4.3 to 4.5, we obtain \(d({\mathbf {x}};\xi _{t}) \le 0\) for all \({\mathbf {x}} \in {\mathcal {X}}\) which establishes the D-optimality of \(\xi _t\) in view of the equivalence theorem.

Theorem 4.6

In the two-dimensional Poisson regression model with interaction, the design \(\xi _t\) which assigns equal weights 1/4 to the 4 settings \({\mathbf {x}}_0=(0,0)\), \({\mathbf {x}}_1=(2,0)\), \({\mathbf {x}}_2=(0,2)\), and \({\mathbf {x}}_3=(t,t)\), where \(t=(\sqrt{1+8\rho }-1)/(2\rho )\) for \(\rho > 0\) and \(t=2\) for \(\rho = 0\), is locally D-optimal at \(\varvec{\beta }=(0,-1,-1,-\rho )^\top \) on \({\mathcal {X}}=[0,\infty )^2\).

4.2 General case

For the general situation of decreasing intensities (\(\beta _1,\beta _2 < 0\)) and a synergy effect (\(\beta _{12} < 0\)), the optimal design can be obtained by simultaneous scaling of the settings \({\mathbf {x}} = (x_1,x_2) \rightarrow \tilde{{\mathbf {x}}} = (x_1/|\beta _1|,x_2/|\beta _2|)\) and of the parameters \(\varvec{\beta } = (0,-1,-1,-\rho )^\top \rightarrow \tilde{\varvec{\beta }} = (0,\beta _1,\beta _2,-\rho \beta _1\beta _2)^\top \) by equivariance, see Radloff and Schwabe (2016). This simultaneous scaling leaves the linear component and, hence, the intensity unchanged, \({\mathbf {f}}(\tilde{{\mathbf {x}}})^\top \tilde{\varvec{\beta }} ={\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta }\). If the scaling of \({\mathbf {x}}\) is applied to the settings in \(\xi _t\) of Theorem 4.6, then the resulting rescaled design will be locally D-optimal at \(\tilde{\varvec{\beta }}\) on \({\mathcal {X}}\) as the design region is invariant with respect to scaling. Furthermore, the design optimization is not affected by the value \(\beta _0\) of the intercept term because this term contributes to the intensity and, hence, to the information matrix only by a multiplicative factor, \(\lambda ({\mathbf {x}}) = \exp (\beta _0)\exp (\beta _1 x_1 + \beta _2 x_2 + \beta _{12} x_1 x_2)\). We thus obtain the following result from Theorem 4.6.

Corollary 4.7

Assume the two-dimensional Poisson regression model with interaction and \(\varvec{\beta }=(\beta _0,\beta _1,\beta _2,\beta _{12})^\top \) with \(\beta _1,\beta _2 < 0\) and \(\beta _{12} \le 0\). Let \(\rho = - \beta _{12}/(\beta _1\beta _2)\), \(t=(\sqrt{1+8\rho }-1)/(2\rho )\) for \(\beta _{12} < 0\) and \(t=2\) for \( \beta _{12} = 0\). Then, the design which assigns equal weights 1/4 to the 4 settings \({\mathbf {x}}_0=(0,0)\), \({\mathbf {x}}_1=(2/|\beta _1|,0)\), \({\mathbf {x}}_2=(0,2/|\beta _2|)\), and \({\mathbf {x}}_3=(t/|\beta _1|,t/|\beta _2|)\) is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}=[0,\infty )^2\).

Note that the settings \({\mathbf {x}}_0\), \({\mathbf {x}}_1\), and \({\mathbf {x}}_2\) of the locally D-optimal design \(\xi _t\) in the model with interaction coincide with those of the optimal design for the model without interaction. Only a fourth setting \({\mathbf {x}}_3=(t/|\beta _1|,t/|\beta _2|)\) has been added in the interior of the design region.
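A small helper, sketched under the assumptions of Corollary 4.7 (the function name and the numerical inputs are illustrative), returns the four support points for arbitrary admissible parameter values.

```python
import math

def d_optimal_design(beta1, beta2, beta12):
    """Support points (equal weights 1/4) of the design in Corollary 4.7;
    requires beta1, beta2 < 0 and beta12 <= 0."""
    rho = -beta12 / (beta1 * beta2)
    t = 2.0 if rho == 0 else (math.sqrt(1 + 8 * rho) - 1) / (2 * rho)
    a1, a2 = abs(beta1), abs(beta2)
    return [(0.0, 0.0), (2 / a1, 0.0), (0.0, 2 / a2), (t / a1, t / a2)]

print(d_optimal_design(-0.5, -2.0, -0.25))   # illustrative parameter values
```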

5 Higher-dimensional models

In the present section on k-dimensional Poisson regression with k explanatory variables (\({\mathbf {x}}=(x_1,x_2,\ldots ,x_k)\), \(k \ge 3\)), we restrict ourselves to the standardized case with zero intercept (\(\beta _0=0\)) and all main effects \(\beta _1=\cdots =\beta _k\) equal to \(-1\) for simplicity of notation. Extensions to the case of general \(\beta _0\) and \(\beta _1,\ldots ,\beta _k<0\) can be obtained by the scaling method used for Corollary 4.7.

We first note that, for the k-dimensional Poisson regression without interactions,

$$\begin{aligned} {\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta } = \beta _0 + \sum _{j=1}^k \beta _j x_j , \end{aligned}$$

Russell et al. (2009) showed that the minimally supported design which assigns equal weights \(1/(k+1)\) to the origin \({\mathbf {x}}_0=(0,\ldots ,0)\) and the k axial settings \({\mathbf {x}}_1=(2,0,\ldots ,0)\), \({\mathbf {x}}_2=(0,2,\ldots ,0)\), \(\ldots \), \({\mathbf {x}}_k=(0,\ldots ,0,2)\) is locally D-optimal at \(\varvec{\beta }=(0,-1,\ldots ,-1)^\top \). Schmidt and Schwabe (2017) more generally proved that in models without interactions the locally D-optimal design points coincide with their counterparts in the marginal one-dimensional models. This approach will be extended in Theorems 5.2 and 5.4 to two- and three-dimensional marginals with interactions.

In what follows, we mainly consider the particular situation that all interactions occurring in the models have values equal to 0 and that the design region is the full orthant \({\mathcal {X}}=[0,\infty )^k\). Setting the interactions to zero does not mean that we presume to know that there are no interactions in the model. Instead, we are going to determine locally optimal designs in models with interactions which are locally optimal at such \(\varvec{\beta }\) for which all interaction terms attain the value 0.

We start with a generalization of Theorem 4.1 to a k-dimensional Poisson regression model with complete interactions

$$\begin{aligned} {\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta } = \beta _0 + \sum _{j=1}^k \beta _j x_j + \sum _{i<j} \beta _{ij} x_i x_j + \cdots + \beta _{12\ldots k} x_1 x_2 \ldots x_k, \end{aligned}$$

where the number of parameters is \(p = 2^k\).

Theorem 5.1

In the k-dimensional Poisson regression model with complete interactions the minimally supported design \(\xi _{-1}^* \otimes \cdots \otimes \xi _{-1}^*\) which assigns equal weights 1/p to the \(p=2^k\) settings of the full factorial on \(\{0,2\}^k\) is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}=[0,\infty )^k\), when \(\beta _1=\cdots =\beta _k=-1\) and all interactions \(\beta _{ij}, \ldots , \beta _{12\ldots k}\) are equal to 0.

The proof of Theorem 5.1 follows the lines of the proof of Theorem 4.1 as all of the design region \({\mathcal {X}}\), the vector of regression functions \({\mathbf {f}}\), and the intensity function \(\lambda \) factorize to their one-dimensional counterparts. Hence, details will be omitted.

Now, we come back to the Poisson regression model with first-order interactions

$$\begin{aligned} {\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta } = \beta _0 + \sum _{j=1}^k \beta _j x_j + \sum _{i<j} \beta _{ij} x_i x_j , \end{aligned}$$

where the number of parameters is \(p = 1+k+k(k-1)/2\).

Theorem 5.2

In the k-dimensional Poisson regression model with first-order interactions, the minimally supported design which assigns equal weights 1/p to the \(p=1+k+k(k-1)/2\) settings \({\mathbf {x}}_0=(0,0,\ldots ,0)\), \({\mathbf {x}}_1=(2,0,\ldots ,0)\), \({\mathbf {x}}_2=(0,2,\ldots ,0)\), \(\ldots \), \({\mathbf {x}}_k=(0,\ldots ,0,2)\), and \({\mathbf {x}}_{ij}={\mathbf {x}}_i+{\mathbf {x}}_j\), \(1 \le i < j \le k\), is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}=[0,\infty )^k\), when \(\beta _1=\cdots =\beta _k=-1\) and \(\beta _{ij} = 0\), \(1 \le i < j \le k\).

For illustrative purposes, we specify this result for \(k=3\) components.

Corollary 5.3

In the three-dimensional Poisson regression model with first-order interactions

$$\begin{aligned} {\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta } = \beta _0 + \beta _1 x_1 + \beta _2 x_2 + \beta _3 x_3 + \beta _{12} x_1 x_2 + \beta _{13} x_1 x_3 + \beta _{23} x_2 x_3 \end{aligned}$$

the minimally supported design which assigns equal weights 1/7 to the 7 settings \({\mathbf {x}}_0=(0,0,0)\), \({\mathbf {x}}_1=(2,0,0)\), \({\mathbf {x}}_2=(0,2,0)\), \({\mathbf {x}}_3=(0,0,2)\), \({\mathbf {x}}_4=(2,2,0)\), \({\mathbf {x}}_5=(2,0,2)\), and \({\mathbf {x}}_6=(0,2,2)\) is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}=[0,\infty )^3\), when \(\beta _1=\beta _2=\beta _3=-1\) and \(\beta _{12} = \beta _{13} = \beta _{23} = 0\).

Fig. 3 Design points in Corollary 5.3

The optimal design points of Corollary 5.3 are visualized in Fig. 3. Note that in the Poisson regression model with first-order interactions the locally D-optimal design has only support points on the axes and on the diagonals of the faces, but none in the interior of the design region, and that the support points on each face coincide with the optimal settings for the corresponding two-dimensional marginal model. Thus, only those settings are included from the full factorial \(\{0,2\}^k\) of the complete interaction case (Theorem 5.1) which have, at most, two nonzero components, and the locally D-optimal design concentrates on settings with higher intensity. This is in accordance with the findings for the Poisson regression model without interactions, where only those settings will be used which have, at most, one nonzero component, and carries over to higher-order interactions. In particular, for the Poisson regression model with second-order interactions

$$\begin{aligned} {\mathbf {f}}({\mathbf {x}})^\top \varvec{\beta } = \beta _0 + \sum _{j=1}^k \beta _j x_j + \sum _{i<j} \beta _{ij} x_i x_j + \sum _{i<j<\ell } \beta _{ij\ell } x_i x_j x_\ell , \end{aligned}$$

where the number of parameters is \(p = 1+k+k(k-1)/2+k(k-1)(k-2)/6\), we obtain a similar result.

Theorem 5.4

In the k-dimensional Poisson regression model with second-order interactions the minimally supported design which assigns equal weights 1/p to the \(p=1+k+k(k-1)/2+k(k-1)(k-2)/6\) settings \({\mathbf {x}}_0=(0,0,\ldots ,0)\), \({\mathbf {x}}_1=(2,0,\ldots ,0)\), \({\mathbf {x}}_2=(0,2,\ldots ,0)\), \(\ldots \), \({\mathbf {x}}_k=(0,\ldots ,0,2)\), \({\mathbf {x}}_{ij}={\mathbf {x}}_i+{\mathbf {x}}_j\), \(1 \le i < j \le k\), and \({\mathbf {x}}_{ij\ell }={\mathbf {x}}_i+{\mathbf {x}}_j+{\mathbf {x}}_\ell \), \(1 \le i< j < \ell \le k\), is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}=[0,\infty )^k\), when \(\beta _1=\cdots =\beta _k=-1\), \(\beta _{ij} = 0\), \(1 \le i < j \le k\), and \(\beta _{ij\ell } = 0\), \(1 \le i< j < \ell \le k\).
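In the standardized case, the support sets of Theorems 5.2 and 5.4 consist of all vectors with entries 0 or 2 and at most two or three nonzero components, respectively. A short sketch generates them for illustration.

```python
from itertools import combinations

def support_points(k, max_active):
    """All 0/2-vectors with at most max_active coordinates equal to 2:
    max_active=2 gives Theorem 5.2, max_active=3 gives Theorem 5.4."""
    points = []
    for m in range(max_active + 1):
        for idx in combinations(range(k), m):
            x = [0.0] * k
            for i in idx:
                x[i] = 2.0
            points.append(tuple(x))
    return points

pts = support_points(k=4, max_active=2)   # first-order interactions, k = 4
print(len(pts))                           # p = 1 + k + k(k-1)/2 = 11 equally weighted points
print(pts)
```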

The proofs of Theorems 5.2 and 5.4 are based on symmetry properties which get lost if one or more of the interaction terms are nonzero. However, if only few components of \({\mathbf {x}}\) may be active (nonzero), then locally D-optimal designs may be obtained in the spirit of the proof of Lemma 4.4 for synergetic interaction effects. We demonstrate this in the setting of first-order interactions \(\rho _{ij}=-\beta _{ij}\ge 0\), when the design region \({\mathcal {X}}\) consists of the union of the two-dimensional faces of the orthant, i. e. when, at most, two components of \({\mathbf {x}}\) can be active.

Theorem 5.5

Consider the k-dimensional Poisson regression model with first-order interactions on \({\mathcal {X}}=\bigcup _{i<j}{\mathcal {X}}_{ij}\), where \({\mathcal {X}}_{ij}=\{(x_1,\ldots ,x_k);\ x_i,x_j \ge 0, x_\ell =0 ~~\text {for}~~ \ell \ne i,j\}\) is the two-dimensional face related to the ith and jth component. Let \(\beta _1=\cdots =\beta _k=-1\), \(\rho _{ij} = -\beta _{ij} \ge 0\), \(t_{ij}=(\sqrt{1+8\rho _{ij}}-1)/(2\rho _{ij})\) for \(\rho _{ij} > 0\), \(t_{ij}=2\) for \(\rho _{ij} = 0\), and \({\mathbf {x}}_{ij}\in {\mathcal {X}}_{ij}\) with \(x_i=x_j=t_{ij}\), \(1 \le i < j \le k\). Then, the minimally supported design which assigns equal weights \(1/(1+k+k(k-1)/2)\) to the \(1+k+k(k-1)/2\) settings \({\mathbf {x}}_0=(0,0,\ldots ,0)\), \({\mathbf {x}}_1=(2,0,\ldots ,0)\), \({\mathbf {x}}_2=(0,2,\ldots ,0)\), \(\ldots \), \({\mathbf {x}}_k=(0,\ldots ,0,2)\), and \({\mathbf {x}}_{ij}\), \(1 \le i < j \le k\), is locally D-optimal at \(\varvec{\beta }\) on \({\mathcal {X}}\).

This result follows as in the proof of Lemma 4.4. We believe that the D-optimality of the design in Theorem 5.5 could also hold on the whole positive orthant if the prespecified interaction parameters are identical and non-positive. A proof of this statement should follow in the spirit of Farrell et al. (1967), similar to the constructions in Lemmas 4.3 and 4.5 and the proof of Theorem 5.2.

However, in the situation of general synergy effects, an analogue of Lemma 4.3 cannot be established because of the lack of symmetry. Hence, it remains open whether the design of Theorem 5.5 retains its optimality in the general setting, as conjectured by Wang et al. (2006).
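To make the construction of Theorem 5.5 concrete, the following sketch assembles its support points for illustrative pairwise synergy strengths \( \rho _{ij} \); pairs not listed are taken to have \( \rho _{ij}=0 \).

```python
import math
from itertools import combinations

def t_pair(rho):
    """Optimal diagonal coordinate for a single pair of factors (cf. Lemma 4.2)."""
    return 2.0 if rho == 0 else (math.sqrt(1 + 8 * rho) - 1) / (2 * rho)

def design_on_faces(k, rho):
    """Support of the design in Theorem 5.5; rho[(i, j)] = -beta_ij >= 0 for i < j."""
    pts = [tuple(0.0 for _ in range(k))]                                        # origin
    pts += [tuple(2.0 if j == i else 0.0 for j in range(k)) for i in range(k)]  # axial points
    for i, j in combinations(range(k), 2):                                      # one point per face
        t = t_pair(rho.get((i, j), 0.0))
        pts.append(tuple(t if m in (i, j) else 0.0 for m in range(k)))
    return pts

rho = {(0, 1): 0.5, (0, 2): 1.0}          # illustrative synergies, remaining pairs zero
print(design_on_faces(3, rho))            # 1 + 3 + 3 = 7 equally weighted points
```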

6 Efficiency and extensions

In this section, we compute the efficiency of the locally optimal designs in case the interaction parameter \( \rho \) is misspecified and compare their performance with a competitor design inspired by applications which is supported on a wider range of settings. We further indicate an extension of the present results to bounded design regions and to the situation of antagonistic interaction effects (\( \rho <0 \)) and its limitations. Although the locally D-optimal designs only differ in the location of the support point on the diagonal, if the main effects are kept fixed, their performance is quite sensitive to the strength \(\rho \) of the synergy effect. The quality of their performance can be measured in terms of the local D-efficiency which is defined as \(\mathrm {eff}_D(\xi ,\varvec{\beta }) = \left( \det ({\mathbf {M}}_{{\varvec{\beta }}}(\xi ))/\det ({\mathbf {M}}_{{\varvec{\beta }}}(\xi _{{\varvec{\beta }}}^*))\right) ^{1/p}\) for a design \(\xi \), where \(\xi _{{\varvec{\beta }}}^*\) denotes the locally D-optimal design at \(\varvec{\beta }\). This efficiency can be interpreted as the asymptotic proportion of observations required for the locally D-optimal design \(\xi _{{\varvec{\beta }}}^*\) to obtain the same precision as the competing design \(\xi \) of interest. For example, in the standardized case of Sect. 4.1 the design \(\xi _x\) would be locally D-optimal if the strength of synergy were \((2-x)/x^2\). Its local D-efficiency can be calculated as \(\mathrm {eff}_D(\xi _x,\varvec{\beta }) = (x/t) \exp ((2 t + \rho t^2 - 2 x - \rho x^2)/4)\) when \(\rho \) is the true strength of synergy and t is the corresponding optimal coordinate on the diagonal (\(t=(\sqrt{1+8\rho }-1)/(2\rho )\) for \(\rho > 0\) and \(t=2\) for \(\rho = 0\)). For selected values of x, the local D-efficiencies are depicted in Fig. 4 over the range \( 0 \le \rho \le 10 \) for the true interaction effect \( \rho \).
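The efficiency curves of \( \xi _x \) in Fig. 4 can be reproduced directly from this closed-form expression; a short sketch for selected values of x and \( \rho \):

```python
import math

def t_opt(rho):
    return 2.0 if rho == 0 else (math.sqrt(1 + 8 * rho) - 1) / (2 * rho)

def eff_xi_x(x, rho):
    """Local D-efficiency of xi_x (diagonal point (x, x)) when the true synergy is rho."""
    t = t_opt(rho)
    return (x / t) * math.exp((2 * t + rho * t ** 2 - 2 * x - rho * x ** 2) / 4)

for rho in [0.0, 1.0, 5.0, 10.0]:
    print(rho, [round(eff_xi_x(x, rho), 3) for x in (2.0, 1.0, 0.5)])
```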

Fig. 4 Efficiency of \( \xi _x \) for \( x=2 \) (solid line), \( x=1 \) (dashed), \( x=1/2 \) (dotted) and efficiency of the derived design in Example 6.1 (dot-dashed)

The appealing product-type design \(\xi _2\) of Theorem 4.1 rapidly loses efficiency as the strength \(\rho \) of synergy increases. The triangular design \(\xi _1\) appears to be rather robust over a wide range of strength parameters, while for smaller x the design \(\xi _x\) loses efficiency when there is no synergy effect (\(\rho =0\)).

Example 6.1

To compare the optimal designs with a design from applications, we compute the efficiency of a design inspired by an example in Tallarida (2000, p. 63), where the synergism between two pharmaceutical agents, Morphine and Clonidine, is investigated. As in the design \( \xi _1 \) above, the dose levels for the combination drug should have the same effect as the dose levels of the single drugs when there is no interaction effect present. For the single drugs, three dose levels are chosen equidistantly with the middle level equal to the optimal setting \( x=2 \). The derived standardized design assigns a quarter of the observations to a control point \( (x_1,x_2)=(0,0) \) and 1/12 each to the points (0, 1), (0, 2), (0, 3), (1, 0), (2, 0), (3, 0), (1/2, 1/2), (1, 1) and (3/2, 3/2). For \( \rho =0 \), the efficiency of this design is about 0.784, while the maximum efficiency of 0.853 is achieved for \( \rho \approx 0.514 \). Figure 4 illustrates the robustness of this design: its efficiency stays above 0.7 over a wide interval of parameter values.
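The efficiency values of Example 6.1 can be recomputed from the two information matrices. A numerical sketch in the standardized parametrization (the helper names are ours):

```python
import numpy as np

def f(x1, x2):
    return np.array([1.0, x1, x2, x1 * x2])

def info(points, weights, rho):
    beta = np.array([0.0, -1.0, -1.0, -rho])
    M = np.zeros((4, 4))
    for (x1, x2), w in zip(points, weights):
        fx = f(x1, x2)
        M += w * np.exp(fx @ beta) * np.outer(fx, fx)
    return M

def efficiency(rho):
    """D-efficiency of the Tallarida-inspired design relative to the optimal xi_t."""
    t = 2.0 if rho == 0 else (np.sqrt(1 + 8 * rho) - 1) / (2 * rho)
    opt = info([(0, 0), (2, 0), (0, 2), (t, t)], [0.25] * 4, rho)
    pts = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (2, 0), (3, 0),
           (0.5, 0.5), (1, 1), (1.5, 1.5)]
    wts = [0.25] + [1 / 12] * 9
    return (np.linalg.det(info(pts, wts, rho)) / np.linalg.det(opt)) ** 0.25

print(efficiency(0.0), efficiency(0.514))   # compare with the values reported above
```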

In order to obtain designs which are less sensitive to misspecification of the interaction parameter, robust design criteria may be employed, like maximin D-efficient or weighted (“Bayesian”) optimal designs [see e.g., Atkinson et al. (2007)], but this would go beyond the scope of the present paper.

If, in contrast to the situation of Theorem 4.6 and Corollary 4.7, there is an antagonistic interaction effect, which means that \(\beta _{12}\) is positive (\(\rho <0\)), no optimal design exists on quadrant I because the determinant of the information matrix becomes unbounded. However, if we restrict the design region to a rectangle, one may be tempted to extend the above results. For example, in the standardized case (\(\beta _1=\beta _2=-1\)) on a square design region, Lemma 4.2 may be extended as follows.

Lemma 6.2

Let \(b\ge 2\), \(\rho < 0\), and \(t=(\sqrt{1+8\rho }-1)/(2\rho )\) for \(\rho > -1/8\).

(a) If \(\rho > -1/8\), \(t \le b\), and \(t^4\exp (-2 t - \rho t^2) \ge b^4\exp (-2 b - \rho b^2)\), then the design \(\xi _t\) is locally D-optimal within the class \(\Xi _0\) on \({\mathcal {X}}=[0,b]^2\).

(b) If \(\rho \le -1/8\), or \(b < t\), or \(t^4\exp (-2 t - \rho t^2) < b^4\exp (-2 b - \rho b^2)\), then the design \(\xi _b\) is locally D-optimal within the class \(\Xi _0\) on \({\mathcal {X}}=[0,b]^2\).
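The case distinction of Lemma 6.2 is easy to evaluate mechanically; a sketch with illustrative values of b and \( \rho \) that exhibits both cases:

```python
import math

def best_diagonal_coordinate(b, rho):
    """Optimal diagonal coordinate within the class Xi_0 on [0, b]^2 according to
    the case distinction of Lemma 6.2 (assumes b >= 2 and rho < 0)."""
    def g(x):  # x^4 * exp(-2x - rho*x^2), proportional to det M(xi_x)
        return x ** 4 * math.exp(-2 * x - rho * x ** 2)
    if rho > -1 / 8:
        t = (math.sqrt(1 + 8 * rho) - 1) / (2 * rho)
        if t <= b and g(t) >= g(b):
            return t          # case (a): xi_t
    return b                  # case (b): xi_b

print(best_diagonal_coordinate(3.0, -0.05), best_diagonal_coordinate(3.0, -0.12))
```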

Moreover, Lemma 4.4 does not depend on \(\rho \) and, if, additionally, \(b \le 1/|\rho |\), then the argumentation in the proof of Lemma 4.3 can be adapted, where now the hyperbolic coordinate system is centered at \((1/|\rho |,1/|\rho |)\) and v is negative (cf. the proof below). However, the inequalities of Lemma 4.5 are no longer valid in general. In particular, for \(\rho \) less than, but close to, \(-1/8\), the (deduced) sensitivity function of the design \(\xi _t\) shows a local minimum at t rather than a maximum, which disproves the optimality of \(\xi _t\) within the class of all designs on \({\mathcal {X}}=[0,b]^2\). In that case an additional fifth support point is required on the diagonal, and also the weights have to be optimized. Hence, in the case of an antagonistic interaction effect no general analytic solution can be expected, and the numerically obtained optimal designs may be difficult to realize as exact designs.

For even smaller design regions (\(b<2\)), design points on the adverse boundaries (\(x_1=b\) or \(x_2=b\)) may occur in the optimal designs, but not in the interior apart from the diagonal, both in the synergetic and in the antagonistic case.

In the multi-factor case (\( k>2 \)) of Sect. 5, the locally optimal design \( \xi \) of Theorem 5.2 (first-order interactions) has efficiency \( \mathrm {eff}_D(\xi ,\varvec{\beta }) = (\prod _{i<j} \mathrm {eff}_D(\xi _2,\rho _{ij})^4)^{1/p} \), where \( \mathrm {eff}_D(\xi _2,\rho ) \) is the efficiency of the product-type design \( \xi _2 \) in the two-factor case as exhibited in Fig. 4 when \( \rho \) is the true value of the interaction parameter and \( p=1+k+k(k-1)/2 \). This amounts to \(\mathrm {eff}_D(\xi ,\varvec{\beta }) = \mathrm {eff}_D(\xi _2,\rho )^{2k(k-1)/p}\) if all interactions \( \rho _{ij} \) are equal to \( \rho \), and to \(\mathrm {eff}_D(\xi ,\varvec{\beta }) = \mathrm {eff}_D(\xi _2,\rho )^{4/p}\) if only one interaction, \( \rho _{12} \) say, is equal to \( \rho \) and all other interactions are equal to zero. This means that, in the first case, the efficiency decreases to \( \mathrm {eff}_D(\xi _2,\rho )^4 \) as k gets larger, while in the second case the efficiency tends to 1.
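A short sketch composes the multi-factor efficiency from the two-factor curve; the values of k and \( \rho \) are illustrative.

```python
import math

def eff_two_factor(rho):
    """Efficiency of the product-type design xi_2 in the two-factor case (cf. Fig. 4)."""
    t = 2.0 if rho == 0 else (math.sqrt(1 + 8 * rho) - 1) / (2 * rho)
    return (2.0 / t) * math.exp((2 * t + rho * t ** 2 - 4 - 4 * rho) / 4)

def eff_multi_factor(rhos, k):
    """Efficiency of the design of Theorem 5.2 for true pairwise interactions rhos."""
    p = 1 + k + k * (k - 1) // 2
    prod = 1.0
    for r in rhos:
        prod *= eff_two_factor(r) ** 4
    return prod ** (1.0 / p)

k, rho = 5, 1.0
all_equal = [rho] * (k * (k - 1) // 2)                 # every pairwise interaction equal to rho
only_one = [rho] + [0.0] * (k * (k - 1) // 2 - 1)      # a single nonzero interaction
print(eff_multi_factor(all_equal, k), eff_multi_factor(only_one, k))
```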

7 Discussion

The main purpose of the present paper is to characterize locally D-optimal designs explicitly for the two-dimensional Poisson regression model with interaction on the unbounded design region of quadrant I when both main effects as well as the interaction effect are negative, and to present a rigorous proof of their optimality. Obviously, the designs specified in Corollary 4.7 remain optimal on design regions which are subsets of quadrant I and contain the support points of the respective design. For example, if the design region is a rectangle, \({\mathcal {X}} = [0,b_1] \times [0,b_2]\), then the design of Corollary 4.7 is optimal as long as \(b_1 \ge 2/|\beta _1|\) and \(b_2 \ge 2/|\beta _2|\) for the two components. Furthermore, if the design region is shifted, \({\mathcal {X}} = [a_1,\infty ) \times [a_2,\infty )\) or a sufficiently large subregion of that, then also the locally D-optimal design is shifted accordingly and assigns equal weights 1/4 to \({\mathbf {x}}_0=(a_1,a_2)\), \({\mathbf {x}}_1=(a_1+2/|\beta _1|,a_2)\), \({\mathbf {x}}_2=(a_1,a_2+2/|\beta _2|)\), and \({\mathbf {x}}_3=(a_1+t/|\beta _1|,a_2+t/|\beta _2|)\), where t is defined as in Corollary 4.7.

Various extensions of our work have been discussed in the previous section. Furthermore, it seems promising to extend the present results to negative binomial (Poisson-Gamma) regression, a popular generalization of Poisson regression that can cope with overdispersion, as in Rodríguez-Torreblanca and Rodríguez-Díaz (2007) for one-dimensional regression or in Schmidt and Schwabe (2017) for multidimensional regression without interaction. This will be the object of further investigation.