D-optimal and nearly D-optimal exact designs for binary response on the ball

In this paper the results of Radloff and Schwabe (Stat Papers 60:165–177, 2019) will be extended for a special class of symmetrical intensity functions. This includes binary response models with logit and probit link. To evaluate the position and the weights of the two non-degenerated orbits on the k-dimensional ball usually a system of three equations has to be solved. The symmetry allows to reduce this system to a single equation. As a further result, the number of support points can be reduced to the minimal number. These minimally supported designs are highly efficient. The results can be generalized to arbitrary ellipsoidal design regions.


Introduction
Spherical design spaces can occur in engineering or physics problems where the validity of a model may be assumed on a spherical region around a target value.So (linear) models on spherical design spaces were investigated early in publications like Kiefer (1961) and Farrell et al (1967) which discuss polynomial regression on the ball.These ideas were followed up by papers in which also only linear problems were focused.So Lau (1988) fitted polynomials on the k-dimensional unit ball by using canonical moments.In Dette et al (2005Dette et al ( , 2007) ) and Hirao et al (2015) harmonic polynomials and Zernike polynomials were used to be fit on the unit disc (2-dimensional unit ball), the 3-and k-dimensional unit ball.On the other hand generalized linear models are also well-examined and used in practical application.Logit and probit models, for example, in one dimension on an interval have already been investigated by Ford et al (1992) and Biedermann et al (2006).But there seems to be no available literature which combines both topics.In our publication Radloff and Schwabe (2019b) we took the first step to bring nonlinearity or generalized linear models, respectively, and spherical design regions together.These results were extended to a wider class of non-linear models in our follow-up paper Radloff and Schwabe (2019a).
For better comprehensibility, we will start with the model description and give a brief overview of the findings so far.Then we will consider a special class of intensity functions which allows to reduce the the complexity of finding (locally) D-optimal designs.Afterwards we will tackle the problem, that the optimal designs are not exact designs in general, by establishing highly efficient designs on the ball.

General Model Description
As in Radloff and Schwabe (2019b) and Radloff and Schwabe (2019a), where we described (locally) D-optimal designs for two special classes of linear and non-linear models on a k-dimensional unit ball B k = {x ∈ R k : x 2 1 + . . .+ x 2 k ≤ 1} with k ∈ N, we solely focus (non-linear) multiple regression models, which means the linear predictor is with regression function f : B k → R k+1 , x → (1, x 1 , . . ., x k ) , and parameter vector β = (β 0 , β 1 , . . ., β k ) ∈ R k+1 .The one-support-point (or elemental) information matrix should be representable in the form with an intensity (or efficiency) function λ which only depends on the value of the linear predictor f (x) β.These one-support-point (or elemental) information matrices are the base for the information matrix of a (generalized) design ξ with independent observations Here generalized design means an arbitrary probability measure on the design region B k .These information matrices allow to define the (local) D-optimality, which is one of the most popular criteria in experimental design theory.A design ξ * β 0 with regular information matrix M (ξ * β 0 , β 0 ) is called (locally) D-optimal (at β 0 ) if det(M (ξ * β 0 , β 0 )) ≥ det(M (ξ, β 0 )) holds for all suitable probability measures ξ on the design space -here B k .This optimality criterion can be interpreted as the minimization of the volume of the (asymptotic) confidence ellipsoid.

Prior Results
In Radloff and Schwabe (2016) we stated results on equivariance and invariance.By rotating the design space B k -the k-dimensional unit ball -and the parameter space R k+1 in an analogous way the linear predictor of the multiple regression problem reduces to f (x) Using the rotation invariance with fixed x 1 , this means the invariance to all orthogonal transformations in O(k) which let the x 1 -component unchanged, the (locally) D-optimal (generalized) design ξ * can be decomposed (ξ * = ξ * 1 ⊗ η) in a marginal probability measure ξ * 1 on [−1, 1] for x 1 and a probability kernel η given x 1 .For fixed x 1 the kernel η(x 1 , •) is the uniform distribution on the surface of a (k − 1)-dimensional ball with radius 1 − x 2

1
-the orbit at position x 1 .As a consequence the multidimensional problem collapses to a one-dimensional marginal problem.Only the positions of the orbits and their weights have to be determined.To get an exact design the uniform orbits have to be discretized, for example, by using regular simplices.
In our first paper -Radloff and Schwabe (2019b) -we started with models where the intensity function belongs to the class of monotonous functions.Such models have already been investigated in one dimension, for example, by Konstantinou et al (2014) and on multidimensional cuboids or orthants by Schmidt and Schwabe (2017).These authors gave the following four conditions on the intensity function λ: (A1) λ is positive on R and twice continuously differentiable.
(A2) The first derivative λ is positive on R.

(A3)
The second derivative u of u = 1 λ is injective on R.
(A4) The function λ λ is non-increasing.Condition (A2) is the motivation for the name class of monotonous intensity functions.The intensity functions of this class have to satisfy always (A1) to (A3).( A4) is an extra condition to guarantee uniqueness.For a concise notation is used and the properties (A1), (A2), (A3) and (A4) transfer to q for β 1 > 0, respectively, and vice versa.Poisson regression with intensity function q P (x 1 ) = exp(β 0 + β 1 x 1 ) and negative binomial regression as well as special proportional hazard models with censoring, see Schmidt and Schwabe (2017), satisfy all four conditions.If β 1 = 0 then the intensity function q is always a constant.This yields to a (locally) D-optimal design as it can be found in linear models.In Pukelsheim (1993, section 15.12) such a design consists of the equally weighted vertices of a regular simplex inscribed in the unit sphere, the boundary of the design space.The orientation of the simplex is arbitrary.The main result for β 1 > 0 in Radloff and Schwabe (2019b) is recited for the readers' convenience.
Theorem 1.There is a (locally) D-optimal design for the multiple regression problem (3.1) with β 1 > 0 and intensity function satisfying (A1)-(A3) which has one support point equal to (1, 0, . . ., 0) and the other k support points are the vertices of an arbitrarily rotated (k − 1)-dimensional regular simplex which is maximally inscribed in the intersection of the k-dimensional unit ball and a hyperplane with .
If additionally (A4) is satisfied, the solution x * 12 is unique.
12 is unique.The design is equally weighted with 1 k+1 .It should be noted, that for fixed β this theorem does not need (A1) to (A4) on the entire real line R.It is enough to have it in the ball and so on x 1 ∈ [−1, 1] for q and on [β 0 − β 1 , β 0 + β 1 ] for λ, respectively.But the model has to satisfy the conditions always on the whole real line.In our second paper -Radloff and Schwabe (2019a) -the conditions (A2) and (A3) were replaced by (A2 ) and (A3 ) and a fifth property (A5) was added.
In this context condition (A2 ) means that there exists a c ) and negative on (c (A2 ) λ , ∞).Hence, there is only one local maximum which is simultaneously the global maximum.So the class of intensity functions, which satisfy (A1), (A2 ) and (A3 ), is called class of unimodal intensity functions.Indeed (A2) or (A3) do not imply (A2 ) or (A3 ), respectively.As mentioned before, we only focus on the unit ball and the interval So in our special case (A2) and (A3) can be transferred to (A2 ) and (A3 ) by using an arbitrary c λ > β 0 + β 1 , which means that c q lies outside the interval [−1, 1] and only one branch of the function is considered.Property (A5) means This means that u(z) = 1 λ(z) goes faster to (±) infinity than z 2 for z → ∞.As (A1) to (A4) the conditions (A2 ), ( A3) and (A5) transfer from the intensity function λ to the abbreviated form q for β 1 > 0 and vice versa -analogously c The logit model has the intensity function and probit model has with the density function φ and cumulative distribution function Φ of the standard normal distribution.Both models satisfy all five conditions (A1), (A2 ), (A3 ), (A4), (A5) and share a common c β 1 for q.Beside these two models other models like the complementary log-log model, see Ford et al (1992), with intensity function λ comp log log (z) = exp(2z)  exp(exp(z))−1 satisfy all five conditions with c We showed that if the (concise) intensity function q satisfies (A1), (A2 ), ( A3) and (A5) the (locally) D-optimal design ξ * = ξ * 1 ⊗η is concentrated on exactly two orbits, which are the support points of the marginal design ξ * 1 .The idea of the proof is based on Biedermann et al ( 2006) and Konstantinou et al (2014).The next theorem is the main result of our second paper -Radloff and Schwabe (2019a) -and is reproduced for the readers' convenience.It characterizes the positions of the two support points of the optimal marginal design ξ * 1 .
2 be solution of the equation system: Figure 1: Logit model for k = 3 and β 1 = 1: Dependence of x * 11 and x * 12 (solid lines) and the corresponding weights w 1 and w 2 = 1 − w 1 (dashed lines) on ) is a solution of the equation system, the orbit positions are x * 11 = x, x * 12 = y with weights w 1 = 1 2 − α and To illustrate this complex issue we revisit the logit model in dimension k = 3 with β 1 = 1.We (numerically) plot the orbit positions x * 11 and x * 12 and corresponding weights w 1 and The cases (a) and (b) go along with Theorem 1 and the results from Radloff and Schwabe (2019b).The cases (c1) and (c2) yield marginal extremum solutions which are identical to (a) and (b).So for these four cases there is always an exact minimally supported (locally) D-optimal design.As described in Theorem 1, it consists of a pole point in x 1 = −1 or else x 1 = 1 and the k vertices of a (regular) simplex which is maximally inscribed in the non-degenerated orbit.But the problematic case is (c0) because the (locally) D-optimal (generalized) design consists of two non-degenerated orbits and additionally the weights are rarely appropriate for a discretization.In Radloff and Schwabe (2019a) we showed two examples for the logit model (k = 3, β 1 = 1) from which we derived (nearly) exact designs.For −β 0 = 0 the two orbit positions are symmetrical around 0, that is x * 11 = −x * 12 ≈ 0.52, and the weights are ξ * 1 (x * 11 ) = ξ * 1 (x * 12 ) = 1 2 .These two orbits were discretized by two 2-dimensional simplices -overall 6 equally weighted support points; see Figure 2 (left image).
For −β 0 = −0.1 it is x * 11 ≈ 0.42, x * 12 ≈ −0.62 and ξ * 1 (x * 11 ) ≈ 0.4297, while 0.4297 ≈ 3 7 .We took the rounded design ξ ≈ with the same support points x * 11 and x * 12 but with the marginal design ξ ≈ 1 (x * 11 ) = 3 7 and ξ ≈ 1 (x * 12 ) = 4 7 .So it was possible to substitute one orbit by the vertices of a 2-dimensional simplex (3 points -an equilateral triangle) and one by the vertices of a 2-dimensional cube or cross polytope (4 points -a square).Because of rounding the design ξ ≈ is not optimal but exact and has a high D-efficiency, which compares the rounded design ξ ≈ and the optimal design ξ * β 0 with respect to β 0 -here p = k + 1 = 4 and β 0 = (0.1, 1, 0, 0) : These designs are not very satisfactory.On the one hand the number of support points is not minimal.On the other hand only special cases have appropriate rational weights which allow a discretization or otherwise the optimality is lost by rounding.Therefore we want to establish minimal supported exact designs for the case (c0) in this paper.Mostly these designs wont be optimal but (highly) efficient.But we start with the reduction of the system of three equations in Theorem 2 to only one single equation for special unimodal intensity functions -symmetrical unimodal intensity functions -which can be found, for example, in binary response models with logit and probit link.

Optimal Design for Symmetrical Unimodal Intensity Functions
An interesting observation was made in the discussion section in Radloff and Schwabe (2019a).For models with unimodal intensity function in which the mode and threshold coincide (c and which are symmetrical, also the two orbit positions are symmetrical in a certain way, which we want to investigate here.For one dimension this has been considered and shown in Ford et al (1992, Section 6.5 and 6.6), but this proof cannot be extended to higher dimensions directly.Definition 1.An unimodal intensity function in which the mode and threshold coincide (c The intensity functions of the logit and probit models are symmetrical with c λ = 0.But the unimodal intensity function of the complementary log-log model has c and cannot be symmetrical for this reason. Lemma 1.Let the intensity function λ be symmetrical to c λ in the situation of Theorem 2 (c0).
• For given solve the equation system (3.7).
Lemma 1, whose proof sketch can be found in Appendix B, and Remark 2 in combination with Theorem 2 give (locally) D-optimal designs for models with symmetrical unimodal intensity functions.As a result we reduced the system of equations (3.4)-(3.6) to only one single equation (4.8).But now there is the question if condition (A4) can guarantee a unique solution as in Theorem 1 or in Theorem 2 (a) and (b) because Theorem 2 (c), especially (c0), tells nothing about uniqueness.But we want to add a remark about the values of r before.
Remark 3. Since the system of equations (3.4)-(3.6) in Theorem 2 (c0) should have a solution with two inner support points for the marginal design, x, y
Remark 4. For k = 1, see Remark 2, and for an intensity function satisfying (A4) there is only one solution of (4.13).
The proof sketch of Lemma 2 can be found in Appendix B. Lemma 2 guarantees a unique solution in r ∈ (0, |c λ − β 0 | + β 1 ).But Remark 3 points out that for Theorem 2 (c0) we need r ∈ (0, −|c λ − β 0 | + β 1 ).This means that the unique solution can result in the two-orbit case or in the one-orbit one-pole case of Theorem 2 (c).

Minimally Supported Designs
In the situation of Theorem 1 and Theorem 2 (a), (b), ( c1) and (c2) the designs have always the minimal number of support points to estimate the parameter vector β.These are k + 1 support points.In Radloff and Schwabe (2019a) revisited here in the introductory section we indicated exemplarily a (locally) D-optimal design for the logit model on the 3-dimensional ball with −β 0 = 0 and β 1 = 1.This design consists of six support points which are the vertices of two regular 2-dimensional simplices -equilateral triangles; see Figure 2 (left image).But this is not the minimum of support points to estimate the four parameters.So the question arises whether it is possible to reduce the number of support points as it can be found in the concept of fractional factorial designs, see, for example, Pukelsheim (1993, section 15.11).Instead of using all vertices of the hypercube [−1, 1] k as in the full factorial design the fractional factorial design picks only a special percentage of these points.For k = 3 represent a 2 3−1 -fractional factorial design.In our issue we do not want to pick four of the six points, but we want to use the orthogonality of the spaces spanned by the points (without the x 1 -component) in the two orbits (x 1 = −1 and x 1 = 1) of the given 2 3−1 -fractional factorial design.Here span{(−1, 1) , (1, −1) } ⊥ span{(−1, −1) , (1, 1) }.The idea for our problem is illustrated in Figure 2 (right image).The spanned spaces by points (without the x 1component) in the orbits are orthogonal to each other.And all points span a simplex.As stated above a (generalized) design ξ which is rotation invariant with fixed x 1invariant with respect to all orthogonal transformations in O(k) which do not change the x 1 -component -and which has all mass on the unit sphere can be decomposed into a marginal design ξ 1 on [−1, 1] and a probability kernel η (conditional design), i. e. ξ = ξ 1 ⊗ η.For fixed x 1 the kernel η(x 1 , •) is the uniform distribution on the surface of a (k − 1)-dimensional ball with radius 1 − x 2 1 -the orbit at position x 1 .If x 1 ∈ {−1, 1}, the (k − 1)-dimensional ball with the uniform distribution reduces to a single point and represents only a one-point-measure.Remembering q(x 1 ) = λ(β 0 + β 1 x 1 ) the related information matrix, see Radloff and Schwabe (2019b), is (5.14) with β 0 = (β 0 , β 1 , 0, . . ., 0) .The information matrix for a design on the k-dimensional unit sphere S k−1 , which is based on exactly two orbits, can be determined analogously to this result.Additionally the uniform distribution does not cover the the full orbits but only sub-spheres.
Lemma 3. Let ξ 1 be the two-point-measure in x 11 and x 12 with ξ 1 (x 11 ) = 1 2 − α and 12 .Then the information matrix is (5.15) ).Now the optimality case in Theorem 2 (c0) on two orbits should be used to investigate when both information matrices (5.14) und (5.15) are identical.With that both related (generalized) designs would be (locally) D-optimal.m+1) be a matrix, where the columns represent the m + 1 vertices of an m-dimensional regular simplex (in R m ).Then the columns of the matrix with arbitrary orthogonal transformations R 1 ∈ O(m − 1) and R 2 ∈ O(k − m) represent the support points of such a minimal supported design.
is an example for S m .In this notation I m stands for the standard simplex which needs to be scaled and shifted appropriately so that it is in combination with the last vertex − 1 √ m 1 m (last column) a regular simplex on the unit sphere S m−1 .Finally, we want to look at the D-efficiency, here with β 0 = (β 0 , β 1 , 0, . . ., 0) , for designs ξ with exactly p = k + 1 equally weighted support points in the region where two non-degenerated orbits occur.
As an example, the logit model with β 1 = 1 is used to determine the D-efficiency in dimensions k = 3 and k = 6.In Figure 3 and Figure 4 only the regions for −β 0 with two non-degenerated orbits in the optimal design (case (c0) in Theorem 2), i. e. −β 0 ∈ (−0.403, 0.403) (rounded) for k = 3 and −β 0 ∈ (−0.480, 0.480) (rounded) for k = 6, are plotted.For this purpose, three different types of exact designs are compared with the (locally) D-optimal design ξ * β 0 .The optimal design is a generalized design with real weights.Therefore it cannot be discretized as an exact design in general.First, the two optimal exact designs with one pole and one orbit, which are discretized as a regular (k−1)-dimensional simplex, are used for comparison.The orbit position remains unchanged and is determined at the boundary values −β 0 ≈ ±0.403 or −β 0 ≈ ±0.480.See the solid lines in both figures.Second, the designs with the same orbit position as the associated design which is (locally) optimal for −β 0 are the next alternative.Only the weights were rounded/shifted to integral multiples of 1 k+1 .See the dotted lines.Third, the designs with fixed design weights which are integral multiples of 1 k+1 represent the last model category.So only the positions of the orbits have to be optimized with these fixed design weights.This can be done by solving only the equations (3.4) and (3.5) with the selected weights in Theorem 2 (c).Equation (3.6) is omitted.See the dashed lines in both plots.The Figure 3 reveals for dimension k = 3 that there are only three positions in the range −β 0 ∈ [−0.403, 0.403] (rounded) where (locally) D-optimal designs with the minimal number of support points -four points -exists.For −β 0 ≈ −0.403 this is the design consisting of the pole x * 12 = −1 and one orbit at x * 11 with three points as vertices of an equilateral triangle.Then for −β 0 = 0 there are two orbits with two points each.And, at −β 0 ≈ 0.403 the design consists of one orbit at x * 12 with three equally weighted support points and the pole x * 11 = 1.In the span between these optimality positions the considered discretizations provide a fairly high efficiency.Using the transition directly from pole and orbit to orbit and pole, the efficiency is always greater than 0.988 (intersection of the solid lines).If the two orbits are also discretized in between, the efficiency is greater than 0.993 (intersection of dotted line and solid lines) or even greater than 0.997 (intersection of dashed line and solid lines).For dimension k = 6, see figure 4, an efficiency of more than 0.986 is possible by stepping directly from pole and orbit with six support points to orbit with six design points and pole.If the intermediate steps -two orbits with 2 and 5 points, 3 and 4 points, 4 and 3 points as well as 5 and 2 points -are used, then by simple rounding of the weights to integral multiples of 1 k+1 an efficiency greater than 0.995 (dotted lines) and with additional optimization of the orbit positions even greater than 0.999 (dashed lines) can be achieved.

Conclusion
In summary it can be postulated that very efficient designs are generated based on only k + 1 design points which is the minimal number of support points to estimate the parameter vector.It seems that higher dimensions enable designs with higher D-efficiency, in particular using the third option of discretization.Here we only considered designs with exactly two orbits.Thus it cannot be excluded that there are designs with a better efficiency or even (locally) optimal designs which are supported by exactly k + 1 points.Maybe these designs have support points which lie not on the orbit but are jittered a little bit.This as well as a potential lower efficiency bound needs further investigations.On the other side the reduction of the equation system to one single equation for determining (locally) D-optimal design for symmetrical unimodal intensity functions is a nice feature and can help to decrease computing costs.Also the question of optimal designs on the ball with respect to other optimality criteria should be considered in future.Finally, we want to emphasize that the established designs do not only work for the unit ball.By using the concept of equivariance for linear transformations, say scaling, reflecting and rotating, the class of design spaces can be extended to k-dimensional balls with arbitrary radius or any k-dimensional ellipsoid.
The two denominators are zero if and only if α = 1 2 − 1 k+1 and α = 1 2 − k k+1 , respectively.But this cannot happen to non-degenerated orbits because 1 2 − k k+1 < α < 1 2 − 1 k+1 .Putting both equations into the diagonal entry of the information matrix (5.14) yield They are identical to the diagonal entries of the information matrix (5.15) in Lemma 3 if and only if which are both equivalent to α = 1 2 − m k+1 .