Locally D-optimal designs for a wider class of non-linear models on the k-dimensional ball with applications to logit and probit models

Abstract

In this paper we extend the results of Radloff and Schwabe (arXiv:1806.00275, 2018), which can be applied, for example, to Poisson regression, negative binomial regression and proportional hazard models with censoring, to a wider class of non-linear multiple regression models. Among others, this class includes binary response models with logit and probit link. For this class of models we derive (locally) D-optimal designs when the design region is a k-dimensional ball. For the corresponding construction we make use of the concepts of invariance and equivariance in the context of optimal designs, as in our previous paper. In contrast to the former results, the designs will not necessarily be exact designs in all cases; instead, approximate designs can appear. These results can be generalized to arbitrary ellipsoidal design regions.

Introduction

To see that models on spherical design spaces are important and worth investigating, we can point to early publications by Kiefer (1961) and Farrell et al. (1967), which discuss polynomial regression on the ball. Generalized linear models are also extensively investigated in the literature, for example by Ford et al. (1992). So these models are well examined, but there seems to be no reference available for the extension of generalized linear models to circular design regions. The first approach to bring these two strands together is our publication Radloff and Schwabe (2018). Here this approach is extended to further models, for which qualitatively slightly different optimal designs occur.

For practical applications one may imagine problems in engineering or physics where the validity of a model may be assumed on a spherical region around a target value, for example in the framework of response surface methodology.

In Radloff and Schwabe (2018) we found optimal designs for a special class of linear and non-linear models with respect to the D-criterion on a k-dimensional ball. The main result was for (non-linear) multiple regression models, which means the linear predictor is

$$\begin{aligned} {\varvec{f}}({\varvec{x}})^\top \varvec{\beta } = \beta _0 + \beta _1 x_1 + \cdots + \beta _k x_k\ \end{aligned}$$

and the one-support-point (or elemental) information matrix should be representable in the form

$$\begin{aligned} {\varvec{M}}({\varvec{x}},\varvec{\beta })=\lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta }){\varvec{f}}({\varvec{x}}){\varvec{f}}({\varvec{x}})^\top \end{aligned}$$

with an intensity (or efficiency) function \(\lambda \) which only depends on the value of the linear predictor. Using results on equivariance and invariance from Radloff and Schwabe (2016), we rotate the design space, the k-dimensional unit ball \(\mathbb {B}_k\), and the parameter space \({\mathbb {R}}^{k+1}\) simultaneously in such a way that the linear predictor of the multiple regression problem collapses to

$$\begin{aligned} {\varvec{f}}({\varvec{x}})^\top \varvec{\beta } = \beta _0 + \beta _1 x_1 \quad \text { and } \quad \beta _1\ge 0. \end{aligned}$$
(1)

So it is possible to reduce that multidimensional problem to a one-dimensional marginal problem. Similar one-dimensional problems have already been investigated, for example in Konstantinou et al. (2014).

In Radloff and Schwabe (2018) the following four conditions, which can be satisfied by the intensity function \(\lambda \), were imposed (see also Konstantinou et al. 2014 or Schmidt and Schwabe 2017):

(A1):

\(\lambda \) is positive on \({\mathbb {R}}\) and twice continuously differentiable.

(A2):

The first derivative \(\lambda ^\prime \) is positive on \({\mathbb {R}}\).

(A3):

The second derivative \(u^{\prime \prime }\) of \(u=\frac{1}{\lambda }\) is injective on \({\mathbb {R}}\).

(A4):

The function \(\frac{\lambda ^\prime }{\lambda }\) is non-increasing.

Poisson regression, negative binomial regression and special proportional hazard models with censoring (see Schmidt and Schwabe 2017) fulfill these four conditions.

For a concise notation we will use from now on the abbreviation

$$\begin{aligned}q(x_1):=\lambda (\beta _0+\beta _1 x_1)\ .\end{aligned}$$

For \(\beta _1>0\) the properties (A1)–(A4) transfer from \(\lambda \) to q and vice versa.

In Radloff and Schwabe (2018) we established the following main result that is reproduced for the readers’ convenience.

Theorem 1

There is a (locally) D-optimal design for the simplified problem (1) with \(\beta _1>0\) and intensity function satisfying \(\mathrm {(A1)}\)–\(\mathrm {(A3)}\) which has one support point at \((1,0,\ldots ,0)^\top \), while the other k support points are the vertices of an arbitrarily rotated \((k-1)\)-dimensional simplex which is maximally inscribed in the intersection of the k-dimensional unit ball and a hyperplane with \(x_1=x_{12}^*\).

For \(k\ge 2\): \(x_{12}^*\in (-1,1)\) is a solution of

$$\begin{aligned} \frac{q^\prime (x_{12}^*)}{q(x_{12}^*)}=\frac{2\,(1+kx_{12}^*)}{k\,(1-x_{12}^{*\ 2})} \end{aligned}$$

and for \(k=1\): either \(x_{12}^*=-1\), or \(x_{12}^*\in [-1,1)\) is a solution of

$$\begin{aligned} \frac{q^\prime (x_{12}^*)}{q(x_{12}^*)}=\frac{2}{1-x_{12}^{*}}\ . \end{aligned}$$

In any case, if additionally \(\mathrm {(A4)}\) is satisfied, the solution \(x_{12}^*\) is unique. The design is equally weighted with \(\frac{1}{k+1}\).

We remark that this proof does not need (A1)–(A4) on the entire real line \({\mathbb {R}}\). It is enough to demand them for \(x_1\in [-1,1]\), which means \(x\in [\beta _0-\beta _1,\beta _0+\beta _1]\).

If \(\beta _1=0\) then the design consisting of the equally weighted vertices of a regular simplex inscribed in the unit sphere, the boundary of the design space, is (locally) D-optimal. The orientation is arbitrary.
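Theorem 1's defining equation for \(x_{12}^*\) can be solved numerically. As a sketch (our own illustration, not part of the original results), take the Poisson-type intensity \(\lambda (z)=\exp (z)\), which satisfies (A1)–(A4); then \(q^\prime /q\equiv \beta _1\), and for \(k\ge 2\) the equation becomes a scalar root-finding problem:

```python
import numpy as np
from scipy.optimize import brentq

def x12_star(beta1, k):
    """Solve q'/q = 2(1 + k*x) / (k*(1 - x^2)) for lambda(z) = exp(z),
    where q'/q is constant equal to beta1 (case k >= 2 of Theorem 1)."""
    f = lambda x: beta1 - 2.0 * (1.0 + k * x) / (k * (1.0 - x ** 2))
    return brentq(f, -0.999, 0.999)

# For beta1 = 1 and k = 2 the equation reduces to 1 + 2x = 1 - x^2,
# so the inner support point is x12* = 0.
print(x12_star(1.0, 2))
```

The remaining support point \((1,0,\ldots ,0)^\top \) and the simplex orbit at \(x_1=x_{12}^*\) then follow as in the theorem.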

In the present paper we want to transfer these results to other models, for example to binary response models with logit or probit link. Here the intensity functions do not satisfy the conditions (A2) and (A3).

The corresponding problem of logit and probit models in one dimension has already been investigated by Ford et al. (1992) and Biedermann et al. (2006). We will give here a natural extension to higher dimensions.

The mentioned publications by Kiefer (1961) and Farrell et al. (1967), which first discuss polynomial regression on spherical design spaces, are followed up by Lau (1988), where polynomials are fitted on the k-dimensional unit ball by using canonical moments, and by Dette et al. (2005, 2007) and Hirao et al. (2015). The three last-named papers use harmonic polynomials and Zernike polynomials fitted on the unit disc (the 2-dimensional unit ball) and on the 3- and k-dimensional unit ball, respectively. But they only focus on linear problems; non-linearity and generalized linear models are not treated. Nevertheless we can benefit from these results: they give constructions for the discretization (besides simplex, cross-polytope and cube) of the continuous uniform distribution/design on the k-dimensional ball. We can transfer this to our problem of finding a discrete version of the uniform distribution on the \((k-1)\)-dimensional ball arising as the intersection of the k-dimensional unit ball and a hyperplane.

General model description, design, and invariance

In the following sections, as mentioned in the introduction, we want to focus on a class of (non-linear) multiple regression models. Here every observation Y depends on a particular setting of the control variables, the design point \({\varvec{x}}\), which lies in the design region \({\mathscr {X}}={\mathbb {B}}_k=\{{\varvec{x}}\in {\mathbb {R}}^k\ :\ x_1^2+\cdots +x_k^2\le 1\}\), the k-dimensional unit ball with \(k\in {\mathbb {N}}\). The regression function \({\varvec{f}}:{\mathscr {X}}\rightarrow {\mathbb {R}}^{k+1}\) is considered to be \({\varvec{x}}\mapsto (1,x_1,\ldots ,x_k)^\top \), and the parameter vector \(\varvec{\beta }=(\beta _0,\beta _1,\ldots ,\beta _k)^\top \) is unknown and lies in the parameter space \({\mathscr {B}}\). We will take \({\mathscr {B}}={\mathbb {R}}^{k+1}\). So the linear predictor is

$$\begin{aligned} {\varvec{f}}({\varvec{x}})^\top \varvec{\beta } = \beta _0 + \beta _1 x_1 + \cdots + \beta _k x_k. \end{aligned}$$
(2)

A second requirement is that the one-support-point (or elemental, see Atkinson et al. (2014)) information matrix \({\varvec{M}}({\varvec{x}},\varvec{\beta })\) can be written as

$$\begin{aligned} {\varvec{M}}({\varvec{x}},\varvec{\beta })=\lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta }){\varvec{f}}({\varvec{x}}){\varvec{f}}({\varvec{x}})^\top \end{aligned}$$

with an intensity (or efficiency) function \(\lambda \) (see Fedorov 1972, Sect. 1.5) which only depends on the value of the linear predictor.

We want to find optimal designs on the k-dimensional unit ball for those problems. This will be done in the sense of D-optimality, a very popular criterion, which minimizes the volume of the (asymptotic) confidence ellipsoid.

On account of this we need the concept of information matrices. In our case the information matrix of a (generalized) design \(\xi \) with independent observations is

$$\begin{aligned} {\varvec{M}}(\xi ,\varvec{\beta })=\int _{\mathscr {X}}{\varvec{M}}({\varvec{x}},\varvec{\beta })\ \xi (\mathrm {d}{\varvec{x}})=\int _{\mathscr {X}}\lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta }){\varvec{f}}({\varvec{x}}){\varvec{f}}({\varvec{x}})^\top \xi (\mathrm {d}{\varvec{x}}). \end{aligned}$$

Here a generalized design does not only mean a design on a discrete set of design points; it means an arbitrary probability measure on the design region. In contrast, a discrete design is given by a probability measure with discrete or finite support; see, for example, Silvey (1980).

So we define: A design \(\xi ^*\) with regular information matrix \({\varvec{M}}(\xi ^*,\varvec{\beta })\) is called (locally) D-optimal (at \(\varvec{\beta }\)) if \(\det ({\varvec{M}}(\xi ^*,\varvec{\beta }))\ge \det ({\varvec{M}}(\xi ,\varvec{\beta }))\) holds for all possible probability measures \(\xi \) on \({\mathscr {X}}\).

Notation 1

The symbol \({\mathbb {S}}_{d-1}\), \(d\in \{2,3,4,\ldots \}\), describes the unit sphere, which is the surface of a d-dimensional unit ball \({\mathbb {B}}_d\). Introducing further notation we also mention \({\mathbb {O}}_d\) the d-dimensional zero-vector, \({\mathbb {O}}_{d_1\times d_2}\) the \((d_1\times d_2)\)-dimensional zero-matrix, \(\mathbb {1}_d\) the d-dimensional one-vector, \({\mathbb {I}}_d\) the \((d\times d)\)-dimensional identity matrix and \({\text {id}}\,\) the identity function.

In the remainder of this section we reproduce some results and lemmas from Radloff and Schwabe (2018) which will also be valid and helpful for our current endeavour.

Lemma 1

Any (locally) D-optimal design for (2) is concentrated on the surface of \({\mathscr {X}}={\mathbb {B}}_k\) and is equivariant with respect to rotations.

Equivariance in this context means: If the design or design region is rotated, the parameter space must be rotated in a corresponding way. For detailed information see Radloff and Schwabe (2016, 2018).

For an initial guess \((\beta _1,\ldots ,\beta _k)^\top \ne {\mathbb {O}}_k\) (the case \(={\mathbb {O}}_k\) is discussed later) there is a rotation \(\varvec{\tilde{g}}\) such that \(\varvec{\tilde{g}}(\beta _0,\beta _1,\ldots ,\beta _k)^\top =(\beta _0,{\tilde{\beta }}_1,0,\ldots ,0)^\top \) with \({\tilde{\beta }}_1=||(\beta _1,\ldots ,\beta _k)^\top ||>0\), where \(||\cdot ||\) is the (k-dimensional) Euclidean norm. In view of the equivariance and without loss of generality only the case \(\varvec{\beta }\in {\mathbb {R}}^{k+1}\) with

$$\begin{aligned} \beta _1\ge 0, \beta _2=\cdots =\beta _k=0 \end{aligned}$$
(3)

has to be considered for optimization. This reduces the problem of finding a (locally) D-optimal design from an initial guess of the parameter vector in the whole parameter space to only the length of this vector.

Lemma 2

For \(\varvec{\beta }\) satisfying (3) the D-criterion is invariant with respect to rotations of \(x_2,\ldots ,x_k\).

So we can find an optimal design within the class of invariant designs on the surface of the ball.

If the initial guess \((\beta _1,\ldots ,\beta _k)^\top \) is \({\mathbb {O}}_k\) then no rotation \(\varvec{\tilde{g}}\) is needed at the beginning and an optimal design is invariant with respect to rotations of all components \(x_1,x_2,\ldots ,x_k\) because the intensity function \(\lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta })\) is constant in this case. As in the linear model case the (continuously) uniform design on \({\mathbb {S}}_{k-1}\) is (locally) D-optimal. A k-dimensional regular simplex, whose \(k+1\) vertices lie on the surface \({\mathbb {S}}_{k-1}\) of the design region, has the same information matrix, namely the diagonal matrix \(\mathrm {diag}(1,\tfrac{1}{k},\ldots ,\tfrac{1}{k})\); see Pukelsheim (1993, Sect. 15.12) or Radloff and Schwabe (2018). It can easily be calculated that the vertices of a regular k-dimensional cross-polytope (\(2\,k\) vertices) as well as the vertices of a k-dimensional cube (\(2^k\) vertices) inscribed in the ball \({\mathbb {B}}_k\) have the same information matrix if equal weights are assigned. As remarked in the Introduction, other discretizations of the uniform design can be found by using the methods in Dette et al. (2005, 2007) and Hirao et al. (2015).
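The equality of these information matrices is easy to check numerically. The following sketch (our own illustration, with the intensity normalised to \(\lambda \equiv 1\) and \(k=3\)) computes the equally weighted information matrix for all three discretizations:

```python
import numpy as np

def info_matrix(points):
    """Equally weighted information matrix for f(x) = (1, x1, ..., xk)^T
    with constant intensity lambda = 1."""
    F = np.hstack([np.ones((len(points), 1)), np.asarray(points, float)])
    return F.T @ F / len(points)

k = 3
# regular simplex (4 vertices), cross-polytope (6) and cube (8), all on the unit sphere
simplex = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
cross   = np.vstack([np.eye(k), -np.eye(k)])
cube    = np.array([[sx, sy, sz] for sx in (-1, 1)
                    for sy in (-1, 1) for sz in (-1, 1)]) / np.sqrt(3)

target = np.diag([1.0] + [1.0 / k] * k)
for P in (simplex, cross, cube):
    assert np.allclose(info_matrix(P), target)
print("all three discretizations give diag(1, 1/3, 1/3, 1/3)")
```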

Note that every design or probability measure on the surface of a unit ball can be split into a marginal probability measure \(\xi _1\) on \([-1,1]\) for \(x_1\) and a probability kernel given \(x_1\). In the case of (3) with \(\beta _1>0\), Lemma 3 makes this precise: it gives the representation of optimal invariant designs, their information matrix, and the sensitivity function

$$\begin{aligned} \psi ({\varvec{x}},\xi _1\otimes {\overline{\eta }}) = \lambda ({\varvec{f}}({\varvec{x}})^\top \varvec{\beta }){\varvec{f}}({\varvec{x}})^\top {\varvec{M}}^{-1}(\xi _1\otimes {\overline{\eta }}){\varvec{f}}({\varvec{x}}) \end{aligned}$$

which is used in the Kiefer–Wolfowitz Equivalence Theorem for D-optimality.

Lemma 3

For \(\varvec{\beta }\) satisfying (3), the invariant designs (on the surface) with respect to rotations of \(x_2,\ldots ,x_k\) are given by \(\xi _1\otimes {\overline{\eta }}\), where \(\xi _1\) is a marginal design on \([-1,1]\) and \({\overline{\eta }}\) is a probability kernel (conditional design). For fixed \(x_1\) the kernel \({\overline{\eta }}(x_1,\cdot )\) is the uniform distribution on the surface of a \((k-1)\)-dimensional ball with radius \(\sqrt{1-x_1^2}\).

The related information matrix is (recalling \(q(x_1)=\lambda (\beta _0+\beta _1 x_1)\))

$$\begin{aligned} {\varvec{M}}(\xi _1\otimes {\overline{\eta }})= \left( \begin{array}{c|c} \begin{array}{cc} \int q\,\mathrm {d}\xi _1 &{} \int q {\text {id}}\,\mathrm {d}\xi _1\\ \int q {\text {id}}\,\mathrm {d}\xi _1 &{} \int q {\text {id}}\,^2 \mathrm {d}\xi _1 \end{array} &{} {\mathbb {O}}_{2\times (k-1)}\\ \hline {\mathbb {O}}_{(k-1)\times 2} &{} \frac{1}{k-1} \int q\,(1-{\text {id}}\,^2)\,\mathrm {d}\xi _1\ {\mathbb {I}}_{k-1} \end{array} \right) . \end{aligned}$$
(4)

The sensitivity function \(\psi \) is invariant (constant on orbits) and has for \({\varvec{x}}\in {\mathbb {S}}_{k-1}\) the form

$$\begin{aligned} \psi ({\varvec{x}},\xi _1\otimes {\overline{\eta }})=q(x_1)\cdot p_1(x_1) \quad \text {with}\quad {\varvec{x}}=(x_1,\ldots ,x_k)^\top \end{aligned}$$
(5)

where \(p_1\) is a polynomial of degree 2 in \(x_1\).

If \(x_1\in \{-1,1\}\), the \((k-1)\)-dimensional ball with the uniform distribution reduces to a single point, so the kernel is a one-point measure there.

The wider class—logit and probit model

The intensity function for the logit model is

$$\begin{aligned} \lambda _{\mathrm {logit}}(x)=\frac{\exp (x)}{(1+\exp (x))^2} \end{aligned}$$

and for the probit model

$$\begin{aligned} \lambda _{\mathrm {probit}}(x)=\frac{\phi ^2(x)}{{\varPhi }(x)(1-{\varPhi }(x))} \end{aligned}$$

with the density function \(\phi \) and cumulative distribution function \({\varPhi }\) of the standard normal distribution.
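Both intensity functions are symmetric around 0 and unimodal with mode 0, properties used repeatedly below. A quick numerical check (our own sketch, using scipy.stats.norm for \(\phi \) and \({\varPhi }\)):

```python
import numpy as np
from scipy.stats import norm

def lam_logit(x):
    # exp(x) / (1 + exp(x))^2
    return np.exp(x) / (1.0 + np.exp(x)) ** 2

def lam_probit(x):
    # phi(x)^2 / (Phi(x) * (1 - Phi(x))); norm.sf(x) = 1 - Phi(x)
    return norm.pdf(x) ** 2 / (norm.cdf(x) * norm.sf(x))

xs = np.linspace(-5, 5, 1001)
for lam in (lam_logit, lam_probit):
    assert np.allclose(lam(xs), lam(-xs))      # symmetric around 0
    assert np.argmax(lam(xs)) == len(xs) // 2  # mode at c_lambda = 0
```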

As mentioned before, the intensity functions of the binary response models with logit or probit link satisfy (A1) and (A4) but fail to satisfy (A2) and (A3) in general. However, they satisfy the conditions

(A2\(^\prime \)):

\(\lambda \) is unimodal with mode \(c_\lambda ^\mathrm {(A2^\prime )}\in {\mathbb {R}}\), which means that there exists a \(c_\lambda ^\mathrm {(A2^\prime )}\in {\mathbb {R}}\) so that \(\lambda ^\prime \) is positive on \((-\infty ,c_\lambda ^\mathrm {(A2^\prime )})\) and negative on \((c_\lambda ^\mathrm {(A2^\prime )},\infty )\).

(A3\(^\prime \)):

There exists a \(c_\lambda ^\mathrm {(A3^\prime )}\in {\mathbb {R}}\) so that the second derivative \(u^{\prime \prime }\) of \(u=\frac{1}{\lambda }\) is both injective on \((-\infty ,c_\lambda ^\mathrm {(A3^\prime )}]\) and injective on \([c_\lambda ^\mathrm {(A3^\prime )},\infty )\).

Some examples which fulfill (A2\(^\prime \)) and (A3\(^\prime \)), like logit and probit, share a common value \(c_\lambda ^\mathrm {(A2^\prime )}=c_\lambda ^\mathrm {(A3^\prime )}\), say \(c_\lambda \). Note that \(c_\lambda =0\) for logit and probit.

Admittedly, (A2) does not imply (A2\(^\prime \)) and (A3) does not imply (A3\(^\prime \)). But since the design region is the unit ball, we only consider \(x_1\in [-1,1]\), which corresponds to an argument x between \(\beta _0-\beta _1\) and \(\beta _0+\beta _1\). So in our special case (A2) and (A3) can be transferred to (A2\(^\prime \)) and (A3\(^\prime \)) by using an arbitrary \(c_\lambda >\beta _0+\beta _1\), which means that the corresponding \(c_q\) lies outside the interval \([-1,1]\) and only one branch of the function is considered. Conversely, we do not need the properties (A2) and (A3) on the whole of \({\mathbb {R}}\) as required in Radloff and Schwabe (2018).

As the properties (A1)–(A4) transfer from the intensity function \(\lambda \) to the abbreviated form q for \(\beta _1>0\) and vice versa, the same applies to (A2\(^\prime \)) and (A3\(^\prime \)), with analogously \(c_q^\mathrm {(\cdot )}=\frac{c_\lambda ^\mathrm {(\cdot )}-\beta _0}{\beta _1}\), where \(\mathrm {(\cdot )}\) stands for (A2\(^\prime \)), (A3\(^\prime \)) or is empty.

We have

$$\begin{aligned} q_{\mathrm {logit}}(x_1)&=\frac{\exp (\beta _0+\beta _1 x_1)}{(1+\exp (\beta _0+\beta _1 x_1))^2}\\ q_{\mathrm {logit}}^\prime (x_1)&=\beta _1\,\frac{\exp (\beta _0+\beta _1 x_1)\,(1-\exp (\beta _0+\beta _1 x_1))}{(1+\exp (\beta _0+\beta _1 x_1))^3}\\ u_{\mathrm {logit}}(x_1)&=2+\exp (\beta _0+\beta _1 x_1)+\exp (-(\beta _0+\beta _1 x_1))\\ u_{\mathrm {logit}}^{\prime \prime }(x_1)&=\beta _1^2\,(\exp (\beta _0+\beta _1 x_1)+\exp (-(\beta _0+\beta _1 x_1))) \end{aligned}$$

in the logit model. We omit the corresponding terms for the probit model. However, in both models we have \(c_\lambda =0\) for \(\lambda \) and the analogue \(c_q=-\frac{\beta _0}{\beta _1}\) for q.
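These expressions can be verified against finite differences; the following sketch (our own check, with the arbitrary illustrative values \(\beta _0=-0.5\), \(\beta _1=2\)) confirms \(u=1/q\) and the displayed \(u^{\prime \prime }\):

```python
import numpy as np

b0, b1 = -0.5, 2.0   # arbitrary illustrative values

def q(x):            # q(x) = lambda_logit(b0 + b1*x)
    z = b0 + b1 * x
    return np.exp(z) / (1.0 + np.exp(z)) ** 2

def u(x):            # u = 1/q = 2 + e^z + e^(-z) with z = b0 + b1*x
    z = b0 + b1 * x
    return 2.0 + np.exp(z) + np.exp(-z)

def u2(x):           # analytic u'' from the display above
    z = b0 + b1 * x
    return b1 ** 2 * (np.exp(z) + np.exp(-z))

h = 1e-4
for x in (-0.7, 0.0, 0.9):
    assert abs(u(x) - 1.0 / q(x)) < 1e-10
    num = (u(x + h) - 2 * u(x) + u(x - h)) / h ** 2   # central 2nd difference
    assert abs(num - u2(x)) < 1e-4
```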

We introduce a fifth property.

(A5):

\(u=\frac{1}{\lambda }\) dominates \(x^2\) asymptotically for \(x\rightarrow \infty \), which means

    $$\begin{aligned} \lim \limits _{x\rightarrow \infty }\left| \frac{u(x)}{x^2}\right| =\infty . \end{aligned}$$

In other words, \(u(x)=\frac{1}{\lambda (x)}\) tends to infinity faster than \(x^2\) for \(x\rightarrow \infty \). The logit and probit models also satisfy (A5).

Not only these two models belong to this wider class. For example, the complementary log-log model (see Ford et al. 1992) with intensity function \(\lambda _{\mathrm {comp\,log\,log}}(x)=\frac{\exp (2x)}{\exp (\exp (x))-1}\) also satisfies (A1), (A2\(^\prime \)) with \(c_\lambda ^\mathrm {(A2^\prime )}\approx 0.466011\), (A3\(^\prime \)) with \(c_\lambda ^\mathrm {(A3^\prime )}\approx 0.049084\), (A4) and (A5).
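The value of \(c_\lambda ^\mathrm {(A2^\prime )}\) for the complementary log-log intensity can be reproduced by root-finding on \((\log \lambda )^\prime \); a small sketch (our own):

```python
import numpy as np
from scipy.optimize import brentq

# lambda(x) = exp(2x) / (exp(exp(x)) - 1); the mode solves (log lambda)'(x) = 0,
# i.e. 2 - e^x * e^(e^x) / (e^(e^x) - 1) = 0.
def dloglam(x):
    t = np.exp(x)
    return 2.0 - t * np.exp(t) / (np.exp(t) - 1.0)

c = brentq(dloglam, 0.0, 1.0)   # dloglam changes sign on (0, 1)
print(round(c, 6))              # close to the 0.466011 reported in the text
```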

Lemma 4

Under (3): if q satisfies \(\mathrm {(A1)}\), \(\mathrm {(A2^\prime )}\) and \(\mathrm {(A3^\prime )}\), then the (locally) D-optimal marginal design \(\xi _1^*\) is concentrated on exactly 2 points \(x_{11}^*, x_{12}^*\in [-1,1]\) or on exactly 3 points \(x_{11}^*=1\), \(x_{12}^*\in (-1,1)\) and \(x_{13}^*=-1\).

If q satisfies additionally \(\mathrm {(A5)}\) then only the 2-point structure is possible.

Proof

This proof is based on the proof of Lemma 1 in Konstantinou et al. (2014). By the Kiefer–Wolfowitz Equivalence Theorem for D-optimality we have to check

$$\begin{aligned} k+1\ge \psi ({\varvec{x}},\xi _1\otimes {\overline{\eta }})=q(x_1)\cdot p_1(x_1)\quad \text {for all}\quad {\varvec{x}}=(x_1,\ldots ,x_k)^\top . \end{aligned}$$

This is equivalent to

$$\begin{aligned} \frac{p_1(x_1)}{k+1} - \frac{1}{q(x_1)} \le 0. \end{aligned}$$
(6)

with equality at the support points of the optimal design.

Assume that \(\xi _1\) has only 1 support point. Then the determinant of the first block of the information matrix \({\varvec{M}}(\xi _1\otimes {\overline{\eta }})\) in Lemma 3 would be 0, the inverse of the information matrix would not exist, and thus neither would the polynomial \(p_1\), a contradiction. Hence, \(\xi _1\) has at least 2 support points.

Let us denote the left-hand side of (6) by \(v(x_1)\). The second derivative of v is \(v^{\prime \prime }(x_1)={\tilde{c}}-\left( \frac{1}{q(x_1)}\right) ^{\prime \prime }\), where \({\tilde{c}}\) is the constant second derivative of the degree-2 polynomial \(\frac{p_1(x_1)}{k+1}\) (see Lemma 3). Condition (A3\(^\prime \)) implies that \(v^{\prime \prime }\) can have at most 2 roots. By differentiability and continuity the first derivative of v has at most 3 roots, which means that v has at most 3 potential inner local extreme points with alternating minima and maxima. If the pattern is minimum–maximum–minimum, then \(x_{11}^*=1\), \(x_{12}^*\in (-1,1)\) and \(x_{13}^*=-1\) can be the three points of local maxima of v, since 1 and \(-1\) are boundary points. If additionally (A5) is satisfied, then \(\lim _{x_1\rightarrow \infty }v(x_1)=-\infty \), so that 1 cannot be a boundary maximum if the other three local extreme points are less than 1. Under (A5) the only situation with exactly three inner extreme points is maximum–minimum–maximum. In all other cases there are at most two maxima (inner or boundary) and hence at most two support points. \(\square \)

The next theorem characterizes the support points when the optimal marginal design \(\xi _1^*\) has exactly 2 of them. While cases (a) and (b) go along with Theorem 1 and the results from Radloff and Schwabe (2018), the main focus of case (c) is the situation where \(c_q^\mathrm {(A2^\prime )}=\frac{c_\lambda ^\mathrm {(A2^\prime )}-\beta _0}{\beta _1}\) or \(c_q^\mathrm {(A3^\prime )}=\frac{c_\lambda ^\mathrm {(A3^\prime )}-\beta _0}{\beta _1}\) lies in \([-1,1]\), that is, inside the ball. Here the behaviour is not as simple: it is the switchover region between (a) and (b), so numerical computation is often needed. We start with \(k\ge 2\).

Theorem 2

For \(k\ge 2\), in the setting of Lemma 4 and with q additionally satisfying \(\mathrm {(A5)}\), the (locally) D-optimal marginal design \(\xi _1^*\) has exactly 2 support points \(x_{11}^*\) and \(x_{12}^*\) with \(x_{11}^*>x_{12}^*\) and weights \(w_1:=\xi _1^*(x_{11}^*)\) and \(w_2:=\xi _1^*(x_{12}^*)\).

There are 3 cases:

  1. (a)

    If \(\frac{c_\lambda ^\mathrm {(A2^\prime )}-\beta _0}{\beta _1} > 1\) and \(\frac{c_\lambda ^\mathrm {(A3^\prime )}-\beta _0}{\beta _1} > 1\) then \(x_{11}^*=1\), \(w_1=\frac{1}{k+1}\), \(w_2=\frac{k}{k+1}\) and \(x_{12}^*\in (-1,1)\) is a solution of

    $$\begin{aligned} \frac{q^\prime (x_{12}^*)}{q(x_{12}^*)}=\frac{2\,(1+kx_{12}^*)}{k\,(1-x_{12}^{*\ 2})}. \end{aligned}$$

    If additionally (A4) is satisfied, the solution \(x_{12}^*\) is unique.

  2. (b)

    If \(\frac{c_\lambda ^\mathrm {(A2^\prime )}-\beta _0}{\beta _1} < -1\) and \(\frac{c_\lambda ^\mathrm {(A3^\prime )}-\beta _0}{\beta _1} < -1\) then \(x_{12}^*=-1\), \(w_1=\frac{k}{k+1}\), \(w_2=\frac{1}{k+1}\) and \(x_{11}^*\in (-1,1)\) is a solution of

    $$\begin{aligned} \frac{q^\prime (x_{11}^*)}{q(x_{11}^*)}=\frac{2\,(-1+kx_{11}^*)}{k\,(1-x_{11}^{*\ 2})}. \end{aligned}$$

    If additionally (A4) is satisfied, the solution \(x_{11}^*\) is unique.

  3. (c)

    Otherwise: if \(x,y\in (-1,1)\) with \(x>y\) and \(\alpha \in (0,1)\) are a solution of the equation system

    $$\begin{aligned} \frac{q^\prime (x)}{q(x)}+\frac{2}{x-y}+(k-1)\,\frac{q^\prime (x)\,(1-x^2)\,\alpha + q(x)\,(-2\,x)\,\alpha }{q(x)\,(1-x^2)\,\alpha +q(y)\,(1-y^2)\,(1-\alpha )}&=0\\ \frac{q^\prime (y)}{q(y)}-\frac{2}{x-y}+(k-1)\,\frac{q^\prime (y)\,(1-y^2)\,(1-\alpha ) + q(y)\,(-2\,y)\,(1-\alpha )}{q(x)\,(1-x^2)\,\alpha +q(y)\,(1-y^2)\,(1-\alpha )}&=0\\ \frac{1}{\alpha }-\frac{1}{1-\alpha }+(k-1)\,\frac{q(x)\,(1-x^2) - q(y)\,(1-y^2)}{q(x)\,(1-x^2)\,\alpha +q(y)\,(1-y^2)\,(1-\alpha )}&=0 \end{aligned}$$

    then the 2 support points are \(x_{11}^*=x\), \(x_{12}^*=y\) with weights \(w_1=\alpha \) and \(w_2=1-\alpha \). Otherwise the solution has the form of one of the first two cases.

Proof

In (a), (A1), (A2) and (A3) are satisfied for all \(x_1\in [-1,1]\). This is the situation of Theorem 1 and the corresponding remark.

In (b), (A1) and (A3) are satisfied for all \(x_1\in [-1,1]\), but \(\lambda \) and hence q are strictly decreasing. Using the reflection \(x_1\mapsto -x_1\), (A2) holds as well. Equivariance yields that the optimal design of Theorem 1 has to be reflected, too.

In (c) we know from Radloff and Schwabe (2018) that the logarithm of the determinant of the information matrix \({\varvec{M}}(\xi _1\otimes {\overline{\eta }})\) for a 2-point marginal design is

$$\begin{aligned}&\log q(x_{11}^*) + \log q(x_{12}^*) + \log (x_{11}^*-x_{12}^*)^2 + \log \alpha + \log (1-\alpha )\\&\quad + (k-1)\left[ -\log (k-1)+\log \left( q(x_{11}^*)\,(1-x_{11}^{*\ 2})\,\alpha +q(x_{12}^*)\,(1-x_{12}^{*\ 2})\,(1-\alpha )\right) \right] \end{aligned}$$

which has to be maximized in case (c). If there is no solution with \(x_{11}^*, x_{12}^*\in (-1,1)\) and \(\alpha \in (0,1)\), then the maximum is attained on the boundary. If one point is equal to 1 or \(-1\) we obtain the same situation as in (a) or (b), respectively. \(\square \)

For \(k=1\) we state the following; see also Biedermann et al. (2006):

Remark 1

In the same setting as Theorem 2, but with \(k=1\), we have \(w_1=w_2=\frac{1}{2}\) and the same 3 cases:

  1. (a)

    If x is a solution of

    $$\begin{aligned} \frac{q^\prime (x)}{q(x)}=\frac{2}{1-x} \end{aligned}$$

    and \(x\in [-1,1)\) then \(x_{12}^*=x\). Otherwise \(x_{12}^*=-1\). If additionally (A4) is satisfied, the solution \(x_{12}^*\) is unique.

  2. (b)

    If x is a solution of

    $$\begin{aligned} \frac{q^\prime (x)}{q(x)}=\frac{-2}{1+x} \end{aligned}$$

    and \(x\in (-1,1]\) then \(x_{11}^*=x\). Otherwise \(x_{11}^*=1\). If additionally (A4) is satisfied, the solution \(x_{11}^*\) is unique.

  3. (c)

    If \(x,y\in (-1,1)\) with \(x>y\) are a solution of the equation system

    $$\begin{aligned} \frac{q^\prime (x)}{q(x)}+\frac{2}{x-y}&=0\\ \frac{q^\prime (y)}{q(y)}-\frac{2}{x-y}&=0 \end{aligned}$$

    then the 2 support points are \(x_{11}^*=x\), \(x_{12}^*=y\). Otherwise the solution has the form of one of the first two cases.

Results for the logit and probit model

As mentioned before, both the logit and the probit model have \(c_\lambda =c_\lambda ^\mathrm {(A2^\prime )}=c_\lambda ^\mathrm {(A3^\prime )}=0\), the location of the peak of the intensity function \(\lambda \), and analogously \(c_q=c_q^\mathrm {(A2^\prime )}=c_q^\mathrm {(A3^\prime )}=-\frac{\beta _0}{\beta _1}\) for q. Because \(\beta _1\ge 0\), the sign of \(c_q\) is governed by \(-\beta _0\). We often fix \(\beta _1=1\), so that \(c_q=-\beta _0\). Therefore we refer here only to \(-\beta _0\) instead of \(\beta _0\).

Fig. 1 Logit model: dependence of \(x_{11}^*\) and \(x_{12}^*\) (solid lines) and the corresponding weights \(w_1\) and \(w_2=1-w_1\) (dashed lines) on \(-\beta _0\in [-1.2,1.2]\). The plots are for fixed dimension \(k=3\) (left panel) and \(k=6\) (right panel), respectively, and \(\beta _1=1\). Hence, \(-\beta _0=-\frac{\beta _0}{\beta _1}=c_q\)

Using Theorem 2 we can evaluate the two support points of the marginal design \(\xi _1\) for the logit model. In Fig. 1 we did this numerically for \(-\beta _0\in [-1.2,1.2]\), fixed \(\beta _1=1\) and the dimensions \(k=3\) and \(k=6\). Situation (c), where both support points are genuine inner points, occurs only for \(-\beta _0\in (-0.403,0.403)\) (approximated) for \(k=3\) and \(-\beta _0\in (-0.480,0.480)\) for \(k=6\). In the probit model the plots in Fig. 2 have nearly the same structure. Here the \(-\beta _0\)-intervals with two inner points are \((-0.436,0.436)\) for \(k=3\) and \((-0.507,0.507)\) for \(k=6\).

Fig. 2 Probit model: dependence of \(x_{11}^*\) and \(x_{12}^*\) (solid lines) and the corresponding weights \(w_1\) and \(w_2=1-w_1\) (dashed lines) on \(-\beta _0\in [-1.2,1.2]\). The plots are for fixed dimension \(k=3\) (left panel) and \(k=6\) (right panel), respectively, and \(\beta _1=1\). Hence, \(-\beta _0=-\frac{\beta _0}{\beta _1}=c_q\)

However, there is a substantial difference which cannot be seen in the two figures, namely the behaviour of the inner point for \(-\beta _0\rightarrow \infty \) or \(-\beta _0\rightarrow -\infty \) and arbitrary \(\beta _1\ge 0\). In the probit model the inner point converges from below to 1 or from above to \(-1\), respectively. In the logit model the inner point converges to

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{-1+\sqrt{1-\frac{2}{k}\beta _1+\beta _1^2}}{\beta _1} &{} \text {for }\beta _1>0\\ -\frac{1}{k}&{} \text {for }\beta _1=0. \end{array}\right. } \end{aligned}$$

For \(\beta _1=1\) we have \(-1+\sqrt{\frac{4}{3}}\approx 0.1547\) (\(k=3\)) and \(-1+\sqrt{\frac{5}{3}}\approx 0.2910\) (\(k=6\)).
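This limit can be checked by solving the case (a) equation of Theorem 2 for a strongly negative \(\beta _0\); a sketch (our own, with \(\beta _0=-30\), \(\beta _1=1\), \(k=3\)):

```python
import numpy as np
from scipy.optimize import brentq

k, b0, b1 = 3, -30.0, 1.0    # -beta0 = 30, far inside the case (a) regime

def dq_over_q(x):
    """(log q)' for the logit intensity: b1 * (1 - e^z) / (1 + e^z)."""
    z = b0 + b1 * x
    return b1 * (1.0 - np.exp(z)) / (1.0 + np.exp(z))

f = lambda x: dq_over_q(x) - 2 * (1 + k * x) / (k * (1 - x ** 2))
x12 = brentq(f, -0.99, 0.99)

limit = (-1 + np.sqrt(1 - 2 * b1 / k + b1 ** 2)) / b1
print(round(x12, 4), round(limit, 4))   # both near 0.1547
```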

As in Theorem 1 the orbit belonging to the inner point \(x_{12}^*\) or \(x_{11}^*\) in situation (a) or (b) can be discretized by the vertices of a \((k-1)\)-dimensional regular simplex. So we have exact (locally) D-optimal designs with equal weights \(\frac{1}{k+1}\).

The discretization in (c) is more difficult. If the weights \(w_1\) and \(w_2\) are suitable (as in the following example with \(-\beta _0=-0.1\)), it can be done by using \((k-1)\)-dimensional regular simplices, cross-polytopes, cubes or, as mentioned above, other discretizations of the (continuous) uniform distribution.

Fig. 3 Logit model: discretized (locally) D-optimal designs for \(\beta _1=1\) and \(k=3\)

In three examples we want to focus on the logit model with \(\beta _1=1\) and \(k=3\); see Fig. 3. The values of \(-\beta _0\) are chosen so as to illustrate exact and nearly exact (rounded) designs.

For \(-\beta _0=0.5\) we have \(x_{11}^*=1\), \(x_{12}^*\approx -0.18\) and \(w_1=\xi _1^*(x_{11}^*)=\frac{1}{4}\). For \(-\beta _0=0\) we have, apart from the rotation invariance with respect to \(x_2,\ldots ,x_k\), an extra invariance: the reflection in \(x_1\)-direction. In addition the intensity function of the logit model is symmetric. Therefore the two support points of the marginal design must be symmetric around 0, that is \(x_{11}^*=-x_{12}^*\), and the weights must be equal, \(\xi _1^*(x_{11}^*)=\xi _1^*(x_{12}^*)=0.5\). By calculation we obtain \(x_{11}^*=-x_{12}^*\approx 0.52\). So both designs have equal weights on their support points. While the optimal design for \(-\beta _0=0.5\) has the minimum number of support points, the optimal design for \(-\beta _0=0\) consists of two 2-dimensional simplices. So it may be possible that there is another optimal design with fewer than 6 support points.

In the case of \(-\beta _0=-0.1\) we obtain the optimal (but generalized) design \(\xi ^*\) with \(x_{11}^*\approx 0.42\), \(x_{12}^*\approx -0.62\) and \(\xi _1^*(x_{11}^*)\approx 0.4297\). Since \(0.4297\approx \frac{3}{7}\), we take the rounded design \(\xi ^\approx \) with the same support points \(x_{11}^*\) and \(x_{12}^*\) but with the marginal design \(\xi _1^\approx (x_{11}^*)=\frac{3}{7}\) and \(\xi _1^\approx (x_{12}^*)=\frac{4}{7}\). So we can substitute one orbit by the vertices of a 2-dimensional simplex (3 points) and the other by the vertices of a 2-dimensional cube or cross-polytope, which is, in two dimensions, always a square (4 points). Despite the slight rounding the design is close to the optimum. To verify this we can calculate the D-efficiency, which compares the rounded design \(\xi ^\approx \) with the (non-rounded) optimal design \(\xi ^*\) (here \(k=3\)):

$$\begin{aligned} \mathrm {eff}_D(\xi ^\approx ):=\left( \frac{\det {\varvec{M}}(\xi ^\approx )}{\det {\varvec{M}}(\xi ^*)}\right) ^\frac{1}{k+1}\approx 0.999676\ . \end{aligned}$$
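This efficiency can be checked numerically. Since the vertices of the simplex and of the square share the first and second moments of their orbits, the information matrix of \(\xi ^\approx \) differs from that of \(\xi ^*\) only through the rounded weights. A minimal sketch (function names are ours; \(\lambda (z)=e^z/(1+e^z)^2\) is the logit intensity, and the support heights and weights are the rounded values reported above):

```python
import numpy as np

def intensity(z):
    # logit intensity: lambda(z) = exp(z) / (1 + exp(z))^2
    return np.exp(z) / (1.0 + np.exp(z)) ** 2

def info_matrix(heights, weights, beta0, beta1, k=3):
    # information matrix of a rotation-invariant design on the unit sphere
    # in R^k: an orbit at height x1 = c has per-coordinate second moment
    # (1 - c^2) / (k - 1) in the remaining directions
    M = np.zeros((k + 1, k + 1))
    for c, w in zip(heights, weights):
        l = w * intensity(beta0 + beta1 * c)
        M[0, 0] += l
        M[0, 1] += l * c
        M[1, 1] += l * c ** 2
        for j in range(2, k + 1):
            M[j, j] += l * (1.0 - c ** 2) / (k - 1)
    M[1, 0] = M[0, 1]
    return M

beta0, beta1 = 0.1, 1.0   # i.e. -beta0 = -0.1
heights = [0.42, -0.62]   # approximate optimal orbit heights
M_opt = info_matrix(heights, [0.4297, 1 - 0.4297], beta0, beta1)
M_rnd = info_matrix(heights, [3 / 7, 4 / 7], beta0, beta1)
# D-efficiency with per-parameter scaling, p = k + 1 = 4
eff = (np.linalg.det(M_rnd) / np.linalg.det(M_opt)) ** (1 / 4)
print(eff)  # close to the reported 0.999676 (inputs are rounded)
```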

Summary and discussion

In the present paper we developed (locally) D-optimal designs for a class of non-linear multiple regression models which includes, in particular, binary response models with logit or probit link. This extension of the results established in Radloff and Schwabe (2018) provides exact designs in certain cases. In all other cases rotation-invariant approximate designs are obtained which consist of two parallel (non-degenerate) orbits on the surface of the spherical design region, a k-dimensional ball.

By linear transformations, such as scaling and rotation, the class of design regions can be extended from the unit ball to k-dimensional balls with arbitrary radius or to arbitrary k-dimensional ellipsoids, using the equivariance results established in Radloff and Schwabe (2016).
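As a minimal illustration of this equivariance, the support points of a design on the unit ball are mapped affinely onto the ellipsoid while the weights stay unchanged (the matrix \(A\) and centre \(c\) below are made-up examples; the parameter vector has to be transformed accordingly, as detailed in Radloff and Schwabe (2016)):

```python
import numpy as np

# Hypothetical example: map a design on the unit ball in R^3 to the
# ellipsoid {A z + c : ||z|| <= 1} via the affine map z -> A z + c.
A = np.diag([2.0, 1.0, 0.5])   # semi-axis lengths of the ellipsoid
c = np.array([1.0, 0.0, 0.0])  # centre of the ellipsoid

# one orbit at height x1 = 0.42 on the unit sphere, discretized as a
# triangle (2-dimensional simplex) in the remaining coordinates
x1 = 0.42
r = np.sqrt(1.0 - x1 ** 2)
angles = np.deg2rad([90.0, 210.0, 330.0])
support = np.column_stack([np.full(3, x1),
                           r * np.cos(angles),
                           r * np.sin(angles)])

mapped = support @ A.T + c  # transformed support points, same weights
print(mapped)
```

The transformed points lie on the surface of the ellipsoid, as can be verified by applying the inverse map and checking that the preimages have unit norm.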

Here we focused on linear regressors of the multiple linear regression type. Accounting for interactions or quadratic terms will presumably induce additional support points in the interior of the design region and/or more complicated design structures.

There is one property observed in the numerical calculations for both the logit and the probit model (see Figs. 1 and 2) which deserves further investigation: if the intensity function \(\lambda \) is symmetric, which means \(\lambda (c_\lambda +x)=\lambda (c_\lambda -x)\), then the two support points are also symmetric around \(c_\lambda \), as long as these support points lie in the interior of the marginal design region. For the one-dimensional case this has been proved in Ford et al. (1992, Sects. 6.5 and 6.6), but that proof cannot be extended directly to higher dimensions because of the additional asymmetric term \((1-x_1^2)\).

As in Radloff and Schwabe (2018) we only considered the criterion of (local) D-optimality, which depends on the actual value of the parameter vector. Other optimality criteria and, in particular, more robust criteria, such as maximin efficiency or weighted criteria, should also be the object of future research in the present context.


References

  1. Atkinson AC, Fedorov VV, Herzberg AM, Zhang R (2014) Elemental information matrices and optimal experimental design for generalized regression models. J Stat Plan Inference 144:81–91

  2. Biedermann S, Dette H, Zhu W (2006) Optimal designs for dose-response models with restricted design spaces. J Am Stat Assoc 101(474):747–759

  3. Dette H, Melas VB, Pepelyshev A (2005) Optimal designs for three-dimensional shape analysis with spherical harmonic descriptors. Ann Stat 33:2758–2788

  4. Dette H, Melas VB, Pepelyshev A (2007) Optimal designs for statistical analysis with Zernike polynomials. Statistics 41:453–470

  5. Farrell RH, Kiefer J, Walbran A (1967) Optimum multivariate designs. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1: statistics. University of California Press, Berkeley, pp 113–138

  6. Fedorov VV (1972) Theory of optimal experiments. Academic Press, New York

  7. Ford I, Torsney B, Wu C (1992) The use of a canonical form in the construction of locally optimal designs for non-linear problems. J R Stat Soc Ser B (Stat Methodol) 54(2):569–583

  8. Hirao M, Sawa M, Jimbo M (2015) Constructions of \(\phi _p\)-optimal rotatable designs on the ball. Sankhya A 77(1):211–236

  9. Kiefer JC (1961) Optimum experimental designs V, with applications to systematic and rotatable designs. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, Berkeley, pp 381–405

  10. Konstantinou M, Biedermann S, Kimber A (2014) Optimal designs for two-parameter nonlinear models with application to survival models. Stat Sin 24(1):415–428

  11. Lau TS (1988) \(D\)-optimal designs on the unit \(q\)-ball. J Stat Plan Inference 19(3):299–315

  12. Pukelsheim F (1993) Optimal design of experiments. Wiley series in probability and statistics. Wiley, New York

  13. Radloff M, Schwabe R (2016) Invariance and equivariance in experimental design for nonlinear models. In: Kunert J, Müller CH, Atkinson AC (eds) mODa 11—advances in model-oriented design and analysis. Springer, Basel, pp 217–224

  14. Radloff M, Schwabe R (2018) Locally \(D\)-optimal designs for non-linear models on the \(k\)-dimensional ball. arXiv:1806.00275

  15. Schmidt D, Schwabe R (2017) Optimal design for multiple regression with information driven by the linear predictor. Stat Sin 27(3):1371–1384

  16. Silvey SD (1980) Optimal design: an introduction to the theory for parameter estimation. Chapman and Hall, Boca Raton

Author information

Correspondence to Martin Radloff.

Cite this article

Radloff, M., Schwabe, R. Locally D-optimal designs for a wider class of non-linear models on the k-dimensional ball. Stat Papers 60, 515–527 (2019). https://doi.org/10.1007/s00362-018-01078-4

Keywords

  • Binary response models
  • D-optimality
  • k-dimensional ball
  • Logit and probit model
  • Multiple regression models

Mathematics Subject Classification

  • 62K05
  • 62J12