Abstract
In this paper we extend the results of Radloff and Schwabe (arXiv:1806.00275, 2018), which can be applied, for example, to Poisson regression, negative binomial regression and proportional hazard models with censoring, to a wider class of nonlinear multiple regression models. This includes, among others, binary response models with logit and probit link. For this class of models we derive (locally) D-optimal designs when the design region is a k-dimensional ball. For the corresponding construction we make use of the concept of invariance and equivariance in the context of optimal designs as in our previous paper. In contrast to the former results, the designs will not necessarily be exact designs in all cases; instead, approximate designs can appear. These results can be generalized to arbitrary ellipsoidal design regions.
Introduction
To see that models on spherical design spaces are important and worth investigating, we may point to the early publications by Kiefer (1961) and Farrell et al. (1967), which discuss polynomial regression on the ball. Generalized linear models are also extensively investigated in the literature, for example in Ford et al. (1992). So these models are well examined, but there seems to be no reference available for the extension of generalized linear models to circular design regions. The first approach to bring these two things together is our publication Radloff and Schwabe (2018). This is extended here to further models, for which qualitatively slightly different optimal designs occur.
For practical applications one may imagine problems in engineering or physics where the validity of a model may be assumed on a spherical region around a target value, for example in the framework of response surface methodology.
In Radloff and Schwabe (2018) we found optimal designs for a special class of linear and nonlinear models with respect to the D-criterion on a k-dimensional ball. The main result was for (nonlinear) multiple regression models, which means the linear predictor is
$$\begin{aligned} \beta _0+\beta _1 x_1+\cdots +\beta _k x_k \end{aligned}$$
and the one-support-point (or elemental) information matrix should be representable in the form
$$\begin{aligned} {\varvec{M}}({\varvec{x}},\varvec{\beta })=\lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta })\,{\varvec{f}}({\varvec{x}})\,{\varvec{f}}({\varvec{x}})^\top \end{aligned}$$
with an intensity (or efficiency) function \(\lambda \) which only depends on the value of the linear predictor. By using results on equivariance and invariance of Radloff and Schwabe (2016), we rotate the design space, the k-dimensional unit ball \(\mathbb {B}_k\), and the parameter space \({\mathbb {R}}^{k+1}\) simultaneously in such a way that the linear predictor of the multiple regression problem collapses to
$$\begin{aligned} \beta _0+\beta _1 x_1 . \end{aligned}$$
So it is possible to reduce the multidimensional problem to a one-dimensional marginal problem. Similar one-dimensional problems have already been investigated, for example in Konstantinou et al. (2014).
In Radloff and Schwabe (2018) the following four conditions, which can be satisfied by the intensity function \(\lambda \), were imposed (see also Konstantinou et al. 2014 or Schmidt and Schwabe 2017):

(A1)
\(\lambda \) is positive on \({\mathbb {R}}\) and twice continuously differentiable.

(A2)
The first derivative \(\lambda ^\prime \) is positive on \({\mathbb {R}}\).

(A3)
The second derivative \(u^{\prime \prime }\) of \(u=\frac{1}{\lambda }\) is injective on \({\mathbb {R}}\).

(A4)
The function \(\frac{\lambda ^\prime }{\lambda }\) is non-increasing.
Poisson regression, negative binomial regression and special proportional hazard models with censoring (see Schmidt and Schwabe 2017) fulfill these four conditions.
For a concise notation we will from now on use the abbreviation
$$\begin{aligned} q(x_1)=\lambda (\beta _0+\beta _1 x_1). \end{aligned}$$
For \(\beta _1>0\) the properties (A1)–(A4) transfer from \(\lambda \) to q and vice versa.
In Radloff and Schwabe (2018) we established the following main result that is reproduced for the readers’ convenience.
Theorem 1
There is a (locally) D-optimal design for the simplified problem (1) with \(\beta _1>0\) and intensity function satisfying \(\mathrm {(A1)}\)–\(\mathrm {(A3)}\) which has one support point at \((1,0,\ldots ,0)^\top \), while the other k support points are the vertices of an arbitrarily rotated \((k-1)\)-dimensional simplex which is maximally inscribed in the intersection of the k-dimensional unit ball and the hyperplane \(x_1=x_{12}^*\).
For \(k\ge 2\ :\ x_{12}^*\in (1,1)\) is solution of
and for \(k=1\) : It is \(x_{12}^*=1\) or \(x_{12}^*\in [1,1)\) is solution of
In any case, if additionally \(\mathrm {(A4)}\) is satisfied, the solution \(x_{12}^*\) is unique. The design is equally weighted with \(\frac{1}{k+1}\).
We have to remark that for this proof we do not need (A1)–(A4) on the entire real line \({\mathbb {R}}\). It is enough to demand them for \(x_1\in [-1,1]\), which means \(x\in [\beta _0-\beta _1,\beta _0+\beta _1]\).
If \(\beta _1=0\), then the design consisting of the equally weighted vertices of a regular simplex inscribed in the unit sphere, the boundary of the design space, is (locally) D-optimal. The orientation is arbitrary.
In the present paper we want to transfer these results to other models, for example to binary response models with logit or probit link. Here the intensity functions do not satisfy the conditions (A2) and (A3).
The corresponding problem for logit and probit models in one dimension has already been investigated by Ford et al. (1992) and Biedermann et al. (2006). We give here a natural extension to higher dimensions.
The mentioned publications by Kiefer (1961) and Farrell et al. (1967), which first discuss polynomial regression on spherical design spaces, are followed up by Lau (1988), where polynomials are fitted on the k-dimensional unit ball using canonical moments, and by Dette et al. (2005, 2007) and Hirao et al. (2015). The three last-named papers use harmonic polynomials and Zernike polynomials to be fitted on the unit disc (the 2-dimensional unit ball) and on the 3- and k-dimensional unit ball. But they only focus on linear problems; nonlinearity or generalized linear models are not treated. Nevertheless we can benefit from these results: they give constructions for the discretization of the continuous uniform distribution/design on the k-dimensional ball besides the simplex, cross-polytope and cube. We can transfer this to our problem of finding a discrete version of the uniform distribution on the \((k-1)\)-dimensional ball as the intersection of the k-dimensional unit ball and a hyperplane.
General model description, design, and invariance
In the following sections, as mentioned in the introduction, we want to focus on a class of (nonlinear) multiple regression models. Here every observation Y depends on a specific setting of the control variables, the design point \({\varvec{x}}\), which lies in the design region \({\mathscr {X}}={\mathbb {B}}_k=\{{\varvec{x}}\in {\mathbb {R}}^k\ :\ x_1^2+\cdots +x_k^2\le 1\}\), the k-dimensional unit ball with \(k\in {\mathbb {N}}\). The regression function \({\varvec{f}}:{\mathscr {X}}\rightarrow {\mathbb {R}}^{k+1}\) is considered to be \({\varvec{x}}\mapsto (1,x_1,\ldots ,x_k)^\top \), and the parameter vector \(\varvec{\beta }=(\beta _0,\beta _1,\ldots ,\beta _k)^\top \) is unknown and lies in the parameter space \({\mathscr {B}}\). We will take \({\mathscr {B}}={\mathbb {R}}^{k+1}\). So the linear predictor is
$$\begin{aligned} {\varvec{f}}({\varvec{x}})^\top \varvec{\beta }=\beta _0+\beta _1 x_1+\cdots +\beta _k x_k . \end{aligned}$$
A second requirement is that the one-support-point (or elemental, see Atkinson et al. 2014) information matrix \({\varvec{M}}({\varvec{x}},\varvec{\beta })\) can be written as
$$\begin{aligned} {\varvec{M}}({\varvec{x}},\varvec{\beta })=\lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta })\,{\varvec{f}}({\varvec{x}})\,{\varvec{f}}({\varvec{x}})^\top \end{aligned}$$
with an intensity (or efficiency) function \(\lambda \) (see Fedorov 1972, Sect. 1.5) which only depends on the value of the linear predictor.
We want to find optimal designs on the k-dimensional unit ball for these problems. This will be done in the sense of D-optimality, a very popular criterion which minimizes the volume of the (asymptotic) confidence ellipsoid.
To this end we need the concept of information matrices. In our case the information matrix of a (generalized) design \(\xi \) with independent observations is
$$\begin{aligned} {\varvec{M}}(\xi ,\varvec{\beta })=\int {\varvec{M}}({\varvec{x}},\varvec{\beta })\,\xi (\mathrm {d}{\varvec{x}})=\int \lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta })\,{\varvec{f}}({\varvec{x}})\,{\varvec{f}}({\varvec{x}})^\top \,\xi (\mathrm {d}{\varvec{x}}) . \end{aligned}$$
Here a generalized design does not only mean a design on a discrete set of design points; it means an arbitrary probability measure on the design region. In contrast, a discrete design is a probability measure with discrete or finite support; see, for example, Silvey (1980).
So we define: a design \(\xi ^*\) with regular information matrix \({\varvec{M}}(\xi ^*,\varvec{\beta })\) is called (locally) D-optimal (at \(\varvec{\beta }\)) if \(\det ({\varvec{M}}(\xi ^*,\varvec{\beta }))\ge \det ({\varvec{M}}(\xi ,\varvec{\beta }))\) holds for all possible probability measures \(\xi \) on \({\mathscr {X}}\).
Notation 1
The symbol \({\mathbb {S}}_{d-1}\), \(d\in \{2,3,4,\ldots \}\), describes the unit sphere, which is the surface of the d-dimensional unit ball \({\mathbb {B}}_d\). Introducing further notation, we also mention \({\mathbb {O}}_d\), the d-dimensional zero vector, \({\mathbb {O}}_{d_1\times d_2}\), the \((d_1\times d_2)\)-dimensional zero matrix, \(\mathbb {1}_d\), the d-dimensional one-vector, \({\mathbb {I}}_d\), the \((d\times d)\)-dimensional identity matrix, and \({\text {id}}\,\), the identity function.
In the remainder of this section we reproduce some results and lemmas from Radloff and Schwabe (2018) which will also be valid and helpful for our current endeavour.
Lemma 1
Any (locally) D-optimal design for (2) is concentrated on the surface of \({\mathscr {X}}={\mathbb {B}}_k\) and is equivariant with respect to rotations.
Equivariance in this context means: If the design or design region is rotated, the parameter space must be rotated in a corresponding way. For detailed information see Radloff and Schwabe (2016, 2018).
For an initial guess \((\beta _1,\ldots ,\beta _k)^\top \ne {\mathbb {O}}_k\)—the case \(={\mathbb {O}}_k\) is discussed later—there is a rotation \(\varvec{\tilde{g}}\) such that \(\varvec{\tilde{g}}(\beta _0,\beta _1,\ldots ,\beta _k)^\top =(\beta _0,{\tilde{\beta }}_1,0,\ldots ,0)^\top \) with \({\tilde{\beta }}_1=\Vert (\beta _1,\ldots ,\beta _k)^\top \Vert >0\), where \(\Vert \cdot \Vert \) is the (k-dimensional) Euclidean norm. In view of the equivariance and without loss of generality only the case \(\varvec{\beta }\in {\mathbb {R}}^{k+1}\) with
$$\begin{aligned} \varvec{\beta }=(\beta _0,\beta _1,0,\ldots ,0)^\top , \quad \beta _1\ge 0, \end{aligned}$$
has to be considered for optimization. This simplifies our problem of finding a (locally) D-optimal design, with an initial guess of the parameter vector in the whole parameter space, to a problem involving only the length of this vector.
Lemma 2
For \(\varvec{\beta }\) satisfying (3) the D-criterion is invariant with respect to rotations of \(x_2,\ldots ,x_k\).
So we can find an optimal design within the class of invariant designs on the surface of the ball.
If the initial guess \((\beta _1,\ldots ,\beta _k)^\top \) is \({\mathbb {O}}_k\), then no rotation \(\varvec{\tilde{g}}\) is needed at the beginning, and an optimal design is invariant with respect to rotations of all components \(x_1,x_2,\ldots ,x_k\), because the intensity function \(\lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta })\) is constant in this case. As in the linear model case the (continuously) uniform design on \({\mathbb {S}}_{k-1}\) is (locally) D-optimal. A k-dimensional regular simplex, whose \(k+1\) vertices lie on the surface \({\mathbb {S}}_{k-1}\) of the design region, has the same information matrix, namely the diagonal matrix \(\mathrm {diag}(1,\tfrac{1}{k},\ldots ,\tfrac{1}{k})\); see Pukelsheim (1993, Sect. 15.12) or Radloff and Schwabe (2018). It can easily be calculated that the vertices of a regular k-dimensional cross-polytope (\(2\,k\) vertices) as well as the vertices of a k-dimensional cube (\(2^k\) vertices) inscribed in the ball \({\mathbb {B}}_k\) have the same information matrix if equal weights are assigned. As remarked in the Introduction, other discretizations of the uniform design can be found by using the methods in Dette et al. (2005, 2007) and Hirao et al. (2015).
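The claimed equality of the moment matrices of the three vertex sets (up to the constant intensity factor) can be checked numerically. The following sketch is our own illustration, not code from the paper, for \(k=3\):

```python
# Sketch (our own check, beta_1 = 0 so the intensity is constant): the vertex
# sets of a regular simplex, a cross-polytope and a cube inscribed in B_3 all
# yield the moment matrix diag(1, 1/k, ..., 1/k) under equal weights.
import numpy as np
from itertools import product

k = 3
simplex = np.array([(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]) / np.sqrt(3)
cross   = np.vstack([np.eye(k), -np.eye(k)])                 # 2k vertices
cube    = np.array(list(product([-1, 1], repeat=k))) / np.sqrt(3)  # 2^k vertices

for pts in (simplex, cross, cube):
    F = np.hstack([np.ones((len(pts), 1)), pts])   # rows are f(x)^T = (1, x^T)
    M = F.T @ F / len(pts)                         # equally weighted moment matrix
    assert np.allclose(M, np.diag([1, 1/k, 1/k, 1/k]))
```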
Note that every design or probability measure on the surface of the unit ball can be split into a marginal probability measure \(\xi _1\) on \([-1,1]\) for \(x_1\) and a probability kernel given \(x_1\). In the case of (3) with \(\beta _1>0\), Lemma 3 provides a special representation of the optimal invariant designs, the information matrix and the sensitivity function
which is used in the Kiefer–Wolfowitz Equivalence Theorem for Doptimality.
Lemma 3
For \(\varvec{\beta }\) satisfying (3), the invariant designs (on the surface) with respect to rotations of \(x_2,\ldots ,x_k\) are given by \(\xi _1\otimes {\overline{\eta }}\), where \(\xi _1\) is a marginal design on \([-1,1]\) and \({\overline{\eta }}\) is a probability kernel (conditional design). For fixed \(x_1\) the kernel \({\overline{\eta }}(x_1,\cdot )\) is the uniform distribution on the surface of a \((k-1)\)-dimensional ball with radius \(\sqrt{1-x_1^2}\).
The related information matrix is—remembering \(q(x_1)=\lambda (\beta _0+\beta _1 x_1)\)—
The sensitivity function \(\psi \) is invariant (constant on orbits) and has for \({\varvec{x}}\in {\mathbb {S}}_{k-1}\) the form
where \(p_1\) is a polynomial of degree 2 in \(x_1\).
If \(x_1\in \{-1,1\}\), the \((k-1)\)-dimensional ball with the uniform distribution reduces to a single point, so the kernel is a one-point measure.
The wider class—logit and probit model
The intensity function for the logit model is
$$\begin{aligned} \lambda _{\mathrm {logit}}(x)=\frac{\exp (x)}{(1+\exp (x))^2} \end{aligned}$$
and for the probit model
$$\begin{aligned} \lambda _{\mathrm {probit}}(x)=\frac{\phi (x)^2}{{\varPhi }(x)\,(1-{\varPhi }(x))} \end{aligned}$$
with the density function \(\phi \) and cumulative distribution function \({\varPhi }\) of the standard normal distribution.
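These intensities are easy to evaluate directly. The following sketch (our own illustration under the standard logistic and normal forms, not the authors' code) checks numerically that both are unimodal with mode 0:

```python
# Sketch (not from the paper): the logit and probit intensity functions.
import numpy as np
from scipy.stats import norm

def lam_logit(x):
    # lambda_logit(x) = e^x / (1 + e^x)^2, the logistic density
    return np.exp(x) / (1.0 + np.exp(x))**2

def lam_probit(x):
    # lambda_probit(x) = phi(x)^2 / (Phi(x) * (1 - Phi(x)))
    return norm.pdf(x)**2 / (norm.cdf(x) * (1.0 - norm.cdf(x)))

# Both are unimodal with mode c_lambda = 0, as stated in the text:
grid = np.linspace(-5, 5, 2001)
for lam in (lam_logit, lam_probit):
    assert abs(grid[np.argmax(lam(grid))]) < 1e-6
```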
As mentioned before, the intensity functions of the binary response models with logit or probit link satisfy (A1) and (A4) but fail to satisfy (A2) and (A3) in general. However, they satisfy the conditions
 (A2\(^\prime \)):

\(\lambda \) is unimodal with mode \(c_\lambda ^\mathrm {(A2^\prime )}\in {\mathbb {R}}\), which means that there exists a \(c_\lambda ^\mathrm {(A2^\prime )}\in {\mathbb {R}}\) so that \(\lambda ^\prime \) is positive on \((-\infty ,c_\lambda ^\mathrm {(A2^\prime )})\) and negative on \((c_\lambda ^\mathrm {(A2^\prime )},\infty )\).
 (A3\(^\prime \)):

There exists a \(c_\lambda ^\mathrm {(A3^\prime )}\in {\mathbb {R}}\) so that the second derivative \(u^{\prime \prime }\) of \(u=\frac{1}{\lambda }\) is both injective on \((-\infty ,c_\lambda ^\mathrm {(A3^\prime )}]\) and injective on \([c_\lambda ^\mathrm {(A3^\prime )},\infty )\).
Admittedly, (A2) does not imply (A2\(^\prime \)) and (A3) does not imply (A3\(^\prime \)). Because of the unit sphere we only focus on the interval \(x_1\in [-1,1]\), which corresponds to x between \(\beta _0-\beta _1\) and \(\beta _0+\beta _1\). So in our special case (A2) and (A3) can be transferred to (A2\(^\prime \)) and (A3\(^\prime \)) by using an arbitrary \(c_\lambda >\beta _0+\beta _1\), which means that \(c_q\) lies outside the interval \([-1,1]\) and only one branch of the function is considered. Conversely, we do not need the properties (A2) and (A3) on the whole of \({\mathbb {R}}\) as requested in Radloff and Schwabe (2018).
As the properties (A1)–(A4) transfer from the intensity function \(\lambda \) to the abbreviated form q for \(\beta _1>0\) and vice versa, the same applies to (A2\(^\prime \)) and (A3\(^\prime \)), analogously with \(c_q^\mathrm {(\cdot )}=\frac{c_\lambda ^\mathrm {(\cdot )}-\beta _0}{\beta _1}\), where \(\mathrm {(\cdot )}\) stands for (A2\(^\prime \)), (A3\(^\prime \)) or is empty.
We have
in the logit model. We omit the corresponding terms for the probit model. However, in both models we have \(c_\lambda =0\) for \(\lambda \) and analogously \(c_q=-\frac{\beta _0}{\beta _1}\) for q.
We introduce a fifth property.

(A5)
\(u=\frac{1}{\lambda }\) dominates \(x^2\) asymptotically for \(x\rightarrow \pm \infty \), which means
$$\begin{aligned} \lim \limits _{x\rightarrow \pm \infty }\left| \frac{u(x)}{x^2}\right| =\infty . \end{aligned}$$
In other words, \(u(x)=\frac{1}{\lambda (x)}\) goes to infinity faster than \(x^2\) for \(x\rightarrow \pm \infty \). The logit and probit models also satisfy (A5).
These two models are not the only members of this wider class. For example, the complementary log-log model, see Ford et al. (1992), with intensity function \(\lambda _{\mathrm {comp\,log\,log}}(x)=\frac{\exp (2x)}{\exp (\exp (x))-1}\) also satisfies (A1), (A2\(^\prime \)) with \(c_\lambda ^\mathrm {(A2^\prime )}\approx 0.466011\), (A3\(^\prime \)) with \(c_\lambda ^\mathrm {(A3^\prime )}\approx 0.049084\), (A4) and (A5).
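The mode of the complementary log-log intensity can be verified with a standard root finder. A sketch (our own check, not from the paper):

```python
# Hedged check that lambda(x) = exp(2x) / (exp(exp(x)) - 1) is unimodal with
# mode near 0.466, as claimed for condition (A2').
import numpy as np
from scipy.optimize import brentq

def dloglam(x):
    # d/dx log lambda(x) = 2 - e^x * e^{e^x} / (e^{e^x} - 1)
    t = np.exp(x)
    return 2.0 - t * np.exp(t) / (np.exp(t) - 1.0)

# The derivative of log lambda changes sign from + to - at the mode:
c = brentq(dloglam, -2.0, 2.0)
assert abs(c - 0.466011) < 1e-4
```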
Lemma 4
In (3): if q satisfies \(\mathrm {(A1)}\), \(\mathrm {(A2^\prime )}\) and \(\mathrm {(A3^\prime )}\), then the (locally) D-optimal marginal design \(\xi _1^*\) is concentrated on exactly 2 points \(x_{11}^*, x_{12}^*\in [-1,1]\) or on exactly 3 points \(x_{11}^*=1\), \(x_{12}^*\in (-1,1)\) and \(x_{13}^*=-1\).
If q additionally satisfies \(\mathrm {(A5)}\), then only the 2-point structure is possible.
Proof
This proof is based on the proof of Lemma 1 in Konstantinou et al. (2014). By the Kiefer–Wolfowitz Equivalence Theorem for D-optimality we have to check
This is equivalent to
with equality in the support points of the optimal design.
Assume that \(\xi _1\) has only 1 support point. Then the determinant of the first block of the information matrix \({\varvec{M}}(\xi _1\otimes {\overline{\eta }})\) in Lemma 3 would be 0, and the inverse of the information matrix and thus the polynomial \(p_1\) would not exist, which is a contradiction. Hence, \(\xi _1\) has at least 2 support points.
Let us call the left-hand side of (6) \(v(x_1)\). The second derivative of v is \(v^{\prime \prime }(x_1)={\tilde{c}}-\left( \frac{1}{q(x_1)}\right) ^{\prime \prime }\), where \({\tilde{c}}\) is the constant remaining from twice differentiating the polynomial \(\frac{p_1(x_1)}{k+1}\) of degree 2 (see Lemma 3). Condition (A3\(^\prime \)) implies that \(v^{\prime \prime }\) can have at most 2 roots. Because of differentiability and continuity the first derivative of v has at most 3 roots, which means that v has at most 3 potential inner local extreme points with alternating minima and maxima. If the pattern is minimum–maximum–minimum, then \(x_{11}^*=1\), \(x_{12}^*\in (-1,1)\) and \(x_{13}^*=-1\) can be the three points of local maxima of v, since 1 and \(-1\) are boundary points. If additionally (A5) is satisfied, \(\lim _{x_1\rightarrow \pm \infty }v(x_1)=-\infty \), so that 1 cannot be a boundary maximum if the other three local extreme points are less than 1. In the case of (A5) the only situation with exactly three inner extreme points is maximum–minimum–maximum. In all other cases there are at most two maxima (inner or boundary) and so at most two support points. \(\square \)
The next theorem characterizes the support points when the optimal marginal design \(\xi _1^*\) has exactly 2 of them. While cases (a) and (b) go along with Theorem 1 and the results from Radloff and Schwabe (2018), the main focus of case (c) is the situation where \(c_q^\mathrm {(A2^\prime )}=\frac{c_\lambda ^\mathrm {(A2^\prime )}-\beta _0}{\beta _1}\) or \(c_q^\mathrm {(A3^\prime )}=\frac{c_\lambda ^\mathrm {(A3^\prime )}-\beta _0}{\beta _1}\) lies in \([-1,1]\), that is, in the ball. Here the behaviour is less simple: it is the switchover area from (a) to (b), so numerical computation is often needed. We start with \(k\ge 2\).
Theorem 2
For \(k\ge 2\), in the setting of Lemma 4 and with q satisfying \(\mathrm {(A5)}\), the (locally) D-optimal marginal design \(\xi _1^*\) has exactly 2 support points \(x_{11}^*\) and \(x_{12}^*\) with \(x_{11}^*>x_{12}^*\) and weights \(w_1:=\xi _1^*(x_{11}^*)\) and \(w_2:=\xi _1^*(x_{12}^*)\).
There are 3 cases:

(a)
If \(\frac{c_\lambda ^\mathrm {(A2^\prime )}-\beta _0}{\beta _1} > 1\) and \(\frac{c_\lambda ^\mathrm {(A3^\prime )}-\beta _0}{\beta _1} > 1\), then \(x_{11}^*=1\), \(w_1=\frac{1}{k+1}\), \(w_2=\frac{k}{k+1}\) and \(x_{12}^*\in (-1,1)\) is solution of
$$\begin{aligned} \frac{q^\prime (x_{12}^*)}{q(x_{12}^*)}=\frac{2\,(1+kx_{12}^*)}{k\,(1-x_{12}^{*\ 2})}. \end{aligned}$$If additionally (A4) is satisfied, the solution \(x_{12}^*\) is unique.

(b)
If \(\frac{c_\lambda ^\mathrm {(A2^\prime )}-\beta _0}{\beta _1} < -1\) and \(\frac{c_\lambda ^\mathrm {(A3^\prime )}-\beta _0}{\beta _1} < -1\), then \(x_{12}^*=-1\), \(w_1=\frac{k}{k+1}\), \(w_2=\frac{1}{k+1}\) and \(x_{11}^*\in (-1,1)\) is solution of
$$\begin{aligned} \frac{q^\prime (x_{11}^*)}{q(x_{11}^*)}=\frac{2\,(-1+kx_{11}^*)}{k\,(1-x_{11}^{*\ 2})}. \end{aligned}$$If additionally (A4) is satisfied, the solution \(x_{11}^*\) is unique.

(c)
Otherwise, if \(x,y\in (-1,1)\) with \(x>y\) and \(\alpha \in (0,1)\) is a solution of the equation system
$$\begin{aligned} \frac{q^\prime (x)}{q(x)}+\frac{2}{x-y}+(k-1)\,\frac{q^\prime (x)\,(1-x^2)\,\alpha + q(x)\,(-2\,x)\,\alpha }{q(x)\,(1-x^2)\,\alpha +q(y)\,(1-y^2)\,(1-\alpha )}&=0\\ \frac{q^\prime (y)}{q(y)}-\frac{2}{x-y}+(k-1)\,\frac{q^\prime (y)\,(1-y^2)\,(1-\alpha ) + q(y)\,(-2\,y)\,(1-\alpha )}{q(x)\,(1-x^2)\,\alpha +q(y)\,(1-y^2)\,(1-\alpha )}&=0\\ \frac{1}{\alpha }-\frac{1}{1-\alpha }+(k-1)\,\frac{q(x)\,(1-x^2) - q(y)\,(1-y^2)}{q(x)\,(1-x^2)\,\alpha +q(y)\,(1-y^2)\,(1-\alpha )}&=0 \end{aligned}$$then the 2 support points are \(x_{11}^*=x\), \(x_{12}^*=y\) with weights \(w_1=\alpha \) and \(w_2=1-\alpha \). Otherwise the solution is of the form of the first two cases.
Proof
In (a), (A1), (A2) and (A3) are satisfied for all \(x_1\in [-1,1]\). This is exactly the situation of Theorem 1 and the corresponding remark.
In (b), (A1) and (A3) are satisfied for all \(x_1\in [-1,1]\), but \(\lambda \) and q, respectively, are strictly decreasing. Using the reflection \(x_1\mapsto -x_1\), (A2) holds as well. Equivariance yields that the optimal design of Theorem 1 has to be reflected, too.
In (c) we know according to Radloff and Schwabe (2018) the logarithmized determinant of the information matrix \({\varvec{M}}(\xi _1\otimes {\overline{\eta }})\) with a 2-point marginal design to be
which has to be maximized under (c). If there is no solution with \(x_{11}^*, x_{12}^*\in (-1,1)\) and \(\alpha \in (0,1)\), then there must be a boundary maximum. If one point is equal to 1 or \(-1\), we obtain the same situation as in (a) or (b), respectively. \(\square \)
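The conditions of cases (a) and (c) can be solved with standard root finders. The following sketch is our own implementation for the logit model with \(k=3\) and \(\beta _1=1\); the values \(\beta _0=-0.5\) and \(\beta _0=0\) and the asserted support points are our own numerical computations, not quoted from the paper:

```python
# Numerical sketch (not the authors' code) of cases (a) and (c) of Theorem 2
# for the logit model, k = 3, beta_1 = 1.
import numpy as np
from scipy.optimize import brentq, fsolve

k = 3

def q(x, b0):       # q(x1) = lambda_logit(beta0 + x1)
    return np.exp(b0 + x) / (1 + np.exp(b0 + x))**2

def dlogq(x, b0):   # q'(x)/q(x) = -tanh((beta0 + x)/2) for the logit intensity
    return -np.tanh((b0 + x) / 2)

# Case (a), beta0 = -0.5: boundary point at 1, inner point solves
# q'/q = 2(1 + k*x) / (k*(1 - x^2)).
x12 = brentq(lambda x: dlogq(x, -0.5) - 2*(1 + k*x)/(k*(1 - x**2)), -0.99, 0.99)

# Case (c), beta0 = 0: two inner points x > y with weights alpha, 1 - alpha.
def system(v, b0):
    x, y, a = v
    D = a*q(x, b0)*(1 - x**2) + (1 - a)*q(y, b0)*(1 - y**2)
    return [dlogq(x, b0) + 2/(x - y)
            + (k - 1)*a*(dlogq(x, b0)*q(x, b0)*(1 - x**2) - 2*x*q(x, b0))/D,
            dlogq(y, b0) - 2/(x - y)
            + (k - 1)*(1 - a)*(dlogq(y, b0)*q(y, b0)*(1 - y**2) - 2*y*q(y, b0))/D,
            1/a - 1/(1 - a) + (k - 1)*(q(x, b0)*(1 - x**2) - q(y, b0)*(1 - y**2))/D]

x, y, a = fsolve(system, [0.5, -0.5, 0.5], args=(0.0,))
assert abs(x12 - (-0.18)) < 0.01                       # beta0 = -0.5
assert abs(x - 0.52) < 0.01 and abs(a - 0.5) < 1e-4    # beta0 = 0: symmetric
```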
For \(k=1\) we state the following; see also Biedermann et al. (2006):
Remark 1
In the same setting as in Theorem 2 but with \(k=1\) we have \(w_1=w_2=\frac{1}{2}\) and the same 3 cases:

(a)
If x is a solution of
$$\begin{aligned} \frac{q^\prime (x)}{q(x)}=\frac{2}{1-x} \end{aligned}$$and \(x\in [-1,1)\), then \(x_{12}^*=x\). Otherwise \(x_{12}^*=-1\). If additionally (A4) is satisfied, the solution \(x_{12}^*\) is unique.

(b)
If x is a solution of
$$\begin{aligned} \frac{q^\prime (x)}{q(x)}=-\frac{2}{1+x} \end{aligned}$$and \(x\in (-1,1]\), then \(x_{11}^*=x\). Otherwise \(x_{11}^*=1\). If additionally (A4) is satisfied, the solution \(x_{11}^*\) is unique.

(c)
If \(x,y\in (-1,1)\) with \(x>y\) is a solution of the equation system
$$\begin{aligned} \frac{q^\prime (x)}{q(x)}+\frac{2}{x-y}&=0\\ \frac{q^\prime (y)}{q(y)}-\frac{2}{x-y}&=0 \end{aligned}$$then the 2 support points are \(x_{11}^*=x\), \(x_{12}^*=y\). Otherwise the solution is of the form of the first two cases.
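For a symmetric intensity the system above reduces to a single scalar root. A sketch (our own check, assuming the logit model with \(\beta _0=0\) and \(\beta _1=2\), chosen so that the classical unconstrained optimum of Ford et al. (1992) lies inside \((-1,1)\)):

```python
# Sketch for Remark 1(c), k = 1, logit model, beta0 = 0, beta1 = 2.
# With y = -x, the first equation q'(x)/q(x) + 2/(x - y) = 0 becomes
# beta1 * tanh(beta1 * x / 2) = 1/x, since q'/q = -beta1*tanh(beta1*x/2).
# The classical logit design points sit at eta = beta1 * x = +/- 1.5434.
import numpy as np
from scipy.optimize import brentq

beta1 = 2.0
x = brentq(lambda x: beta1 * np.tanh(beta1 * x / 2) - 1/x, 0.1, 0.99)
assert abs(beta1 * x - 1.5434) < 2e-3
```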
Results for the logit and probit model
As mentioned before, both the logit and the probit model have \(c_\lambda =c_\lambda ^\mathrm {(A2^\prime )}=c_\lambda ^\mathrm {(A3^\prime )}=0\), which is the peak of the intensity function \(\lambda \), and analogously \(c_q=c_q^\mathrm {(A2^\prime )}=c_q^\mathrm {(A3^\prime )}=-\frac{\beta _0}{\beta _1}\) for q. Because \(\beta _1\ge 0\), \(c_q\) depends on \(\beta _0\) with opposite sign. We often fix \(\beta _1=1\) so that \(c_q=-\beta _0\). Therefore we refer here only to \(\beta _0\) instead of \(-\beta _0\).
Using Theorem 2 we can evaluate the two support points of the marginal design \(\xi _1\) for the logit model. In Fig. 1 we did this numerically for \(\beta _0\in [-1.2,1.2]\), fixed \(\beta _1=1\) and the dimensions \(k=3\) and \(k=6\). The situation (c), where we have two real inner points, occurs only for \(\beta _0\in (-0.403,0.403)\) (approximated) for \(k=3\) and \(\beta _0\in (-0.480,0.480)\) for \(k=6\). In the probit model the plots in Fig. 2 have nearly the same structure. Here the \(\beta _0\)-intervals with two inner points are \((-0.436,0.436)\) for \(k=3\) and \((-0.507,0.507)\) for \(k=6\).
However, there is a big difference which cannot be seen in the two figures, namely the behaviour of the inner point for \(\beta _0\rightarrow \infty \) or \(\beta _0\rightarrow -\infty \) and arbitrary \(\beta _1\ge 0\). In the probit model the inner point converges from below to 1 or from above to \(-1\), respectively. In the logit model the inner point converges to
For \(\beta _1=1\) we have \(-1+\sqrt{\frac{4}{3}}\approx 0.1547\) (\(k=3\)) and \(-1+\sqrt{\frac{5}{3}}\approx 0.2910\) (\(k=6\)).
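This limiting behaviour can be reproduced numerically: for the logit model, \(q^\prime /q\rightarrow \beta _1\) on \([-1,1]\) as \(\beta _0\rightarrow -\infty \), so the inner point of case (a) tends to the root of \(\beta _1=\frac{2\,(1+kx)}{k\,(1-x^2)}\). A sketch (our own check, not from the paper) for \(\beta _1=1\), \(k=3\):

```python
# Hedged numerical check of the logit limit -1 + sqrt(4/3) for k = 3,
# beta1 = 1, by solving the case (a) equation for increasingly large |beta0|.
import numpy as np
from scipy.optimize import brentq

k = 3
limit = -1 + np.sqrt(4/3)          # claimed limit for beta1 = 1, k = 3
for b0 in (-10.0, -20.0, -40.0):
    f = lambda x, b0=b0: -np.tanh((b0 + x)/2) - 2*(1 + k*x)/(k*(1 - x**2))
    root = brentq(f, -0.99, 0.99)
    assert abs(root - limit) < 1e-3
```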
As in Theorem 1, the orbit belonging to the inner point \(x_{12}^*\) or \(x_{11}^*\) in situation (a) or (b) can be discretized by the vertices of a \((k-1)\)-dimensional regular simplex. So we have exact (locally) D-optimal designs with equal weights \(\frac{1}{k+1}\).
The discretization in (c) is more difficult. If the weights \(w_1\) and \(w_2\) are appropriate (as in the following example with \(\beta _0=0.1\)), it can be done by using \((k-1)\)-dimensional regular simplices, cross-polytopes, cubes or, as mentioned above, other discretizations of the (continuous) uniform distribution.
In three examples we want to focus on the logit model with \(\beta _1=1\) and \(k=3\); see Fig. 3. The values of \(\beta _0\) are chosen so as to illustrate exact and nearly exact (rounded) designs.
For \(\beta _0=-0.5\) we have \(x_{11}^*=1\), \(x_{12}^*\approx -0.18\) and \(w_1=\xi _1^*(x_{11}^*)=\frac{1}{4}\). For \(\beta _0=0\) we have, apart from rotation invariance with respect to \(x_2,\ldots ,x_k\), an extra invariance, namely reflection in the \(x_1\)-direction. In addition the intensity function of the logit model is symmetric. Therefore the two support points of the marginal design must be symmetric around 0, that is \(x_{11}^*=-x_{12}^*\), and the weights must be equal, \(\xi _1^*(x_{11}^*)=\xi _1^*(x_{12}^*)=0.5\). By calculation we have \(x_{11}^*=-x_{12}^*\approx 0.52\). So both designs have equal weights on their support points. While the optimal design for \(\beta _0=-0.5\) has the minimum number of support points, the optimal design for \(\beta _0=0\) consists of two 2-dimensional simplices. So it may be possible that there is another optimal design with less than 6 support points.
In the case of \(\beta _0=0.1\) we have the optimal (but generalized) design \(\xi ^*\) with \(x_{11}^*\approx 0.42\), \(x_{12}^*\approx -0.62\) and \(\xi _1^*(x_{11}^*)\approx 0.4297\). It is \(0.4297\approx \frac{3}{7}\). We decided to take the rounded design \(\xi ^\approx \) with the same support points \(x_{11}^*\) and \(x_{12}^*\) but with the marginal design \(\xi _1^\approx (x_{11}^*)=\frac{3}{7}\) and \(\xi _1^\approx (x_{12}^*)=\frac{4}{7}\). So we can substitute one orbit by the vertices of a 2-dimensional simplex (3 points) and one by the vertices of a 2-dimensional cube or cross-polytope, which is, in two dimensions, always a square (4 points). Despite the slight rounding the design is close to the optimum. To verify this we can calculate the D-efficiency, which compares the rounded design \(\xi ^\approx \) with the (non-rounded) optimal design \(\xi ^*\) (here \(k=3\)).
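The efficiency computation can be sketched as follows. The determinant factorization used here is our own consequence of the block structure in Lemma 3 (not a formula quoted from the paper), and the numbers are the rounded values above:

```python
# Hedged sketch: D-efficiency of the rounded marginal weight 3/7 against the
# optimal weight 0.4297, for the logit model with k = 3, beta0 = 0.1.
# Assumed factorization (from the block-diagonal information matrix):
# det M = a(1-a) q(x1) q(x2) (x1-x2)^2 * [ (a q(x1)(1-x1^2)
#         + (1-a) q(x2)(1-x2^2)) / (k-1) ]^(k-1)
import numpy as np

k, b0 = 3, 0.1
q = lambda x: np.exp(b0 + x) / (1 + np.exp(b0 + x))**2
x1, x2 = 0.42, -0.62

def detM(a):
    D = a*q(x1)*(1 - x1**2) + (1 - a)*q(x2)*(1 - x2**2)
    return a*(1 - a)*q(x1)*q(x2)*(x1 - x2)**2 * (D/(k - 1))**(k - 1)

# D-efficiency = (det ratio)^(1/(k+1)); the rounding costs almost nothing.
eff = (detM(3/7) / detM(0.4297))**(1/(k + 1))
assert eff > 0.999
```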
Summary and discussion
In the present paper we developed (locally) D-optimal designs for a class of nonlinear multiple regression problems which includes, in particular, binary response models with logit or probit link. This extension of the results established in Radloff and Schwabe (2018) provides exact designs in certain cases. In all other cases rotation-invariant approximate designs are obtained which consist of two parallel (non-degenerate) orbits on the surface of the spherical design region of a k-dimensional ball.
By using linear transformations like scaling and rotating, the class of shapes of the design region can be extended from the unit ball to k-dimensional balls with arbitrary radius or to arbitrary k-dimensional ellipsoids, again by means of the equivariance results established in Radloff and Schwabe (2016).
Here we focused on linear regressors of the multiple linear regression type. Accounting for interactions or quadratic terms will presumably induce additional support points in the interior of the design region and/or more complicated design structures.
There is one property observed in the numerical calculations for both the logit and the probit model (see Figs. 1 and 2) which deserves further investigation: if the intensity function \(\lambda \) is symmetric, which means \(\lambda (c_\lambda +x)=\lambda (c_\lambda -x)\), then the two support points are also symmetric around \(c_\lambda \) as long as they lie in the interior of the marginal design region. For the one-dimensional case this has been proved in Ford et al. (1992, Sects. 6.5 and 6.6), but the proof cannot be extended to higher dimensions directly because of the additional asymmetric term \((1-x_1^2)\).
As in Radloff and Schwabe (2018) we only considered the criterion of (local) D-optimality, which depends on the actual value of the parameter vector. In general, other optimality criteria and especially (more) robust criteria, like maximin efficiency or weighted criteria, should also be the object of future research in the present context.
References
Atkinson AC, Fedorov VV, Herzberg AM, Zhang R (2014) Elemental information matrices and optimal experimental design for generalized regression models. J Stat Plan Inference 144:81–91
Biedermann S, Dette H, Zhu W (2006) Optimal designs for dose-response models with restricted design spaces. J Am Stat Assoc 101(474):747–759
Dette H, Melas VB, Pepelyshev A (2005) Optimal designs for three-dimensional shape analysis with spherical harmonic descriptors. Ann Stat 33:2758–2788
Dette H, Melas VB, Pepelyshev A (2007) Optimal designs for statistical analysis with Zernike polynomials. Statistics 41:453–470
Farrell RH, Kiefer J, Walbran A (1967) Optimum multivariate designs. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1: statistics. University of California Press, Berkeley, pp 113–138
Fedorov VV (1972) Theory of optimal experiments. Academic Press, New York
Ford I, Torsney B, Wu C (1992) The use of a canonical form in the construction of locally optimal designs for nonlinear problems. J R Stat Soc Ser B (Stat Methodol) 54(2):569–583
Hirao M, Sawa M, Jimbo M (2015) Constructions of \(\phi _p\)-optimal rotatable designs on the ball. Sankhya A 77(1):211–236
Kiefer JC (1961) Optimum experimental designs V, with applications to systematic and rotatable designs. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, Berkeley, pp 381–405
Konstantinou M, Biedermann S, Kimber A (2014) Optimal designs for two-parameter nonlinear models with application to survival models. Stat Sin 24(1):415–428
Lau TS (1988) \(D\)-optimal designs on the unit \(q\)-ball. J Stat Plan Inference 19(3):299–315
Pukelsheim F (1993) Optimal design of experiments. Wiley series in probability and statistics. Wiley, New York
Radloff M, Schwabe R (2016) Invariance and equivariance in experimental design for nonlinear models. In: Kunert J, Müller CH, Atkinson AC (eds) mODa 11—advances in model-oriented design and analysis. Springer, Basel, pp 217–224
Radloff M, Schwabe R (2018) Locally \(D\)-optimal designs for nonlinear models on the \(k\)-dimensional ball. arXiv:1806.00275
Schmidt D, Schwabe R (2017) Optimal design for multiple regression with information driven by the linear predictor. Stat Sin 27(3):1371–1384
Silvey SD (1980) Optimal design: an introduction to the theory for parameter estimation. Chapman and Hall, Boca Raton
Radloff, M., Schwabe, R. Locally D-optimal designs for a wider class of nonlinear models on the k-dimensional ball. Stat Papers 60, 515–527 (2019). https://doi.org/10.1007/s00362-018-01078-4
Keywords
 Binary response models
 D-optimality
 k-dimensional ball
 Logit and probit model
 Multiple regression models
Mathematics Subject Classification
 62K05
 62J12