1 Introduction

The contingent valuation method (CVM) using discrete response valuation questions is a widely used experimental method for measuring the monetary value of nonmarket environmental goods. In the experiment, an agent is asked whether she would buy a certain good at price x. She accepts the offered price if her willingness-to-pay (WTP) \(\omega \) for the good is higher than x. Let \(y = 0\) if x is accepted and \(y = 1\) if it is rejected, that is,

$$\begin{aligned} y=\mathbb {I}\{ \omega \le x \}. \end{aligned}$$
(1)

Let \(\mu \) be the distribution of \(\omega \). The objective of the experiment is to estimate the value of \(\theta (\mu )\), where \(\theta \) is a given functional of \(\mu \). For example, if the mean WTP is to be determined, \(\theta (\mu )=\int \omega \, d\mu (\omega )\) should be estimated. By observing independent copies of (x, y) obtained from (1), the value of \(\theta (\mu )\) can be consistently estimated by standard statistical techniques such as probit, logit, or nonparametric maximum likelihood [18].
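
As a concrete illustration of this data-generating process, the following minimal simulation sketch (Python) draws WTP values from an assumed discrete distribution \(\mu \), offers bids drawn from an assumed design \(\nu \), records the responses of (1), and forms a crude plug-in estimate of the mean WTP; the support points, probabilities, and sample size are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) discrete WTP support and true distribution mu
support = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
mu = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Bids drawn from an assumed design nu over the lower support points
bids = support[:-1]
nu = np.full(len(bids), 1.0 / len(bids))

S = 5000                                    # sample size
omega = rng.choice(support, size=S, p=mu)   # latent WTP, never observed directly
x = rng.choice(bids, size=S, p=nu)          # offered prices
y = (omega <= x).astype(int)                # response of Eq. (1): y = 1 means "reject"

# Empirical estimate of the WTP cdf mu[0, x_j] at each bid
cdf_hat = np.array([y[x == b].mean() for b in bids])

# Crude plug-in estimate of the mean WTP (assumes the estimated cdf is monotone)
p_hat = np.diff(np.concatenate(([0.0], cdf_hat, [1.0])))
print("estimated cdf at bids:", cdf_hat)
print("estimated mean WTP   :", float(np.dot(support, p_hat)))
```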

Survey design has been a major issue since the CVM was introduced by Bishop and Heberlein [6] and Hanemann [13]. For a survey question, the statistician must choose the bidding price distribution \(\nu \), from which x is randomly sampled. WTP estimates derived using the CVM are sensitive to the choice of \(\nu \) [9, 11, 17]. Optimal designs for \(\nu \) ease this sensitivity problem by minimizing the variance of the estimates. Cooper [10] proposed an optimal design using a logit formulation of \(\mu \), and Alberini [1] studied the design for the probit model. Duffield and Patterson [11] considered the optimal design for nonparametric \(\mu \). Kanninen [16] generalized the results to the multinomial logit model, in which y takes multiple discrete values. For comprehensive surveys of the literature, see [7, 8, 15].

This study investigates the optimal design problem from the perspective of information geometry. We generalize the nonparametric approach of Duffield and Patterson [11] by considering a general response \(y=\rho (\omega , x)\) and an unspecified target \(\theta (\mu )\). Under these general settings, the optimal design problem is formulated as the minimization of the Cramér–Rao lower bound of \(\theta (\mu )\) over a set of bidding price distributions \(\nu \). Because this is the optimization of a function over a finite-dimensional space, the problem could be solved by general-purpose optimization techniques; however, the computation would be messy and the solution less intuitive. Instead, we formulate the problem using information geometry. Because the Cramér–Rao lower bound is equal to the squared Fisher norm of a tangent vector field on the statistical manifold, the necessary and sufficient condition for the optimal design is concisely stated through dual connections [2,3,4].

The remainder of this paper is organized as follows: Sect. 2 introduces the geometry of finite measures. Section 3 presents the results of this study, including a necessary and sufficient condition for the optimal design. According to this condition, a design is optimal if and only if the gradient vector field it generates is orthogonal to its own e-covariant derivatives. In Sect. 4, the results are applied to the binary response experiment presented in (1). Section 5 concludes the paper.

2 Geometry of finite measures

In this section, the geometry of finite measures is introduced. The terms and definitions are based on Chapter 2 of Ay et al. [4]. Let \(\mathcal {I}=\{1,\ldots , n\}\) be an arbitrary finite set. The linear space of functions \(f: \mathcal {I}\rightarrow \mathbb {R}\) is denoted by \(\mathcal {F}(\mathcal {I})\). The space has the canonical basis \(\{e^i\in \mathcal {F}(\mathcal {I}):i\in \mathcal {I}\}\), where

$$\begin{aligned} e^i(j)= \left\{ \begin{array}{cc} 1 &{}\quad (j=i) \\ 0 &{}\quad (j\ne i). \end{array}\right. \end{aligned}$$

Each \(f\in \mathcal {F}(\mathcal {I})\) is expressed as follows: \(f=\sum _{i=1}^n f_i e^i\).

The dual space \(\mathcal {S}(\mathcal {I}):=\mathcal {F}^*(\mathcal {I})\) is the set of signed measures \(\mu :\mathcal {F}(\mathcal {I})\rightarrow \mathbb {R}\). The dual basis \(\{\delta _1,\ldots ,\delta _n \}\) is defined as follows:

$$\begin{aligned} \delta _i(e^j)= \left\{ \begin{array}{cc} 1 &{} \quad (i=j) \\ 0 &{}\quad (i\ne j). \end{array}\right. \end{aligned}$$

Each \(\mu \in \mathcal {S}(\mathcal {I})\) is expressed as \(\mu =\sum _{i=1}^n \mu ^i \delta _i\) with coefficients \(\{\mu ^1,\ldots ,\mu ^n\}\), such that

$$\begin{aligned} \mu (f):=\int _{\mathcal {I}} f\, d\mu :=\sum _{i=1}^n\mu ^if_i\in \mathbb {R} \end{aligned}$$

and

$$\begin{aligned} f\cdot \mu :=\sum _{i=1}^n f_i\mu ^i \delta _i\in \mathcal {S}(\mathcal {I}). \end{aligned}$$

On \(\mathcal {S}(\mathcal {I})\), we introduce a coordinate system by \(\mu \mapsto (\mu ^1,\ldots ,\mu ^n)\). Given a point \(\mu \in \mathcal {S}(\mathcal {I})\), the tangent space of \(\mathcal {S}(\mathcal {I})\) is \(T_\mu \mathcal {S}(\mathcal {I})=\text {Span}\left\{ \frac{\partial }{\partial \mu ^1},\ldots ,\frac{\partial }{\partial \mu ^n} \right\} \). The m-representation of a tangent vector \(a\in T_\mu \mathcal {S}(\mathcal {I})\) is

$$\begin{aligned} a(\mu )=\sum _{i=1}^n a^i \delta _i \in \mathcal {S}(\mathcal {I}), \end{aligned}$$

which allows us to identify \(T_\mu \mathcal {S}(\mathcal {I})\) with \(\mathcal {S}(\mathcal {I})\). In what follows, tangent vectors and tangent spaces are always given in terms of their m-representations.

Let \(\mathcal {M}_+(\mathcal {I})=\left\{ \mu \in \mathcal {S}(\mathcal {I}) \,:\, \mu ^i>0,\ i\in \mathcal {I} \right\} \). Because \(\mathcal {M}_+(\mathcal {I})\) is an open submanifold of \(\mathcal {S}(\mathcal {I})\), its tangent space is identified with \(\mathcal {S}(\mathcal {I})\). Given two tangent vectors a and b in \(T_\mu \mathcal {M}_+(\mathcal {I})\), their Radon–Nikodym derivatives with respect to \(\mu \) are denoted by

$$\begin{aligned} \frac{\textrm{d}a}{\textrm{d}\mu }=\sum _{i=1}^n\frac{a^i}{\mu ^i}e^i\quad \text {and}\quad \frac{\textrm{d}b}{\textrm{d}\mu }=\sum _{i=1}^n\frac{b^i}{\mu ^i}e^i. \end{aligned}$$

The Fisher metric on \(T_\mu \mathcal {M}_+(\mathcal {I})\) is now introduced by

$$\begin{aligned} \mathfrak {g}_\mu (a,b):=\mu \left( \frac{\textrm{d}a}{\textrm{d}\mu }\cdot \frac{\textrm{d}b}{\textrm{d}\mu } \right) =\sum _{i=1}^n \frac{a^i b^i}{\mu ^i}, \end{aligned}$$
(2)

and the Fisher norm is \(\Vert a\Vert _\mu :=\sqrt{ \mathfrak {g}_\mu (a,a) }\).
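
In coordinates, (2) is simply a weighted inner product. A minimal numerical sketch (Python, with arbitrary illustrative numbers) of the Fisher metric and norm is as follows.

```python
import numpy as np

def fisher_metric(a, b, mu):
    """Fisher metric of Eq. (2): g_mu(a, b) = sum_i a^i b^i / mu^i,
    with tangent vectors given by their m-representations."""
    return float(np.sum(a * b / mu))

def fisher_norm(a, mu):
    return np.sqrt(fisher_metric(a, a, mu))

# Arbitrary illustrative point of M_+({1,...,4}) and two tangent vectors in S_0
mu = np.array([0.1, 0.2, 0.3, 0.4])
a = np.array([0.05, -0.05, 0.10, -0.10])
b = np.array([0.02, 0.03, -0.01, -0.04])
print(fisher_metric(a, b, mu), fisher_norm(a, mu))
```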

Let \(\mathcal {P}_+(\mathcal {I}):=\left\{ \mu \in \mathcal {M}_+(\mathcal {I}) \,:\, \sum _{i=1}^n\mu ^i=1 \right\} \), which is the set of positive probability measures on \(\mathcal {I}\). The tangent space \(T_\mu \mathcal {P}_+(\mathcal {I})\) is identified with

$$\begin{aligned} \mathcal {S}_0(\mathcal {I}):=\left\{ \mu \in \mathcal {S}(\mathcal {I}) \,:\, \sum _{i=1}^n\mu ^i=0 \right\} . \end{aligned}$$

Let \(\theta :\mathcal {P}_+(\mathcal {I})\rightarrow \mathbb {R}\) be a smooth functional. The differential of \(\theta \) in \(\mu \) is a linear form \((d \theta )_\mu :T_\mu \mathcal {P}_+(\mathcal {I})\rightarrow \mathbb {R}\) that is obtained by

$$\begin{aligned} (d \theta )_\mu a:=\frac{\partial \theta }{\partial a}(\mu ):=\lim _{t\rightarrow 0}\frac{\theta (\mu +ta)-\theta (\mu )}{t}. \end{aligned}$$

The Fisher metric allows the differential to be identified with the gradient \((\partial \theta )_\mu \):

$$\begin{aligned} (d \theta )_\mu a\equiv \mathfrak {g}_\mu (a,(\partial \theta )_\mu ),\quad a\in T_\mu \mathcal {P}_+(\mathcal {I}). \end{aligned}$$
(3)

The gradient vector field of \(\theta \) is as follows:

$$\begin{aligned} \partial \theta : \mathcal {P}_+(\mathcal {I})\rightarrow T\mathcal {P}_+(\mathcal {I}), \ \mu \mapsto (\partial \theta )_\mu . \end{aligned}$$

Given two points \(\mu \) and \(\mu '\) in \(\mathcal {P}_+(\mathcal {I})\), the m-parallel transport is determined by the following expression:

$$\begin{aligned} \varPi ^{(m)}_{\mu ,\mu '}:T_\mu \mathcal {P}_+(\mathcal {I})= \mathcal {S}_0(\mathcal {I})\rightarrow T_{\mu '}\mathcal {P}_+(\mathcal {I})= \mathcal {S}_0(\mathcal {I}),\quad a\mapsto a. \end{aligned}$$

The e-parallel transport \(\varPi _{\mu ,\mu '}^{(e)}:T_\mu \mathcal {P}_+(\mathcal {I})\rightarrow T_{\mu '}\mathcal {P}_+(\mathcal {I})\) is the conjugate of the m-transport and satisfies

$$\begin{aligned} \mathfrak {g}_{\mu '}\left( \varPi ^{(e)}_{\mu ,\mu '}a, \varPi ^{(m)}_{\mu ,\mu '}b \right) \equiv \mathfrak {g}_{\mu }\left( a, b \right) . \end{aligned}$$

For two smooth vector fields \(A:\mu \mapsto a_\mu \) and \(B:\mu \mapsto b_\mu \) on \(\mathcal {P}_+(\mathcal {I})\), the m-connection \(\nabla ^{(m)}\) and e-connection \(\nabla ^{(e)}\) are defined by the following expression:

$$\begin{aligned} \left. \nabla ^{(m)}_A B \right| _\mu := \lim _{t\rightarrow 0}\frac{ \varPi ^{(m)}_{\mu +ta_\mu ,\mu }b_{\mu +ta_\mu }-b_\mu }{t}= \frac{\partial b}{\partial a_\mu }(\mu ) \end{aligned}$$
(4)

and

$$\begin{aligned} \left. \nabla ^{(e)}_A B \right| _\mu:= & {} \lim _{t\rightarrow 0}\frac{ \varPi ^{(e)}_{\mu +ta_\mu ,\mu }b_{\mu +ta_\mu }-b_\mu }{t}\nonumber \\= & {} \frac{\partial b}{\partial a_\mu }(\mu ) -\left( \frac{\textrm{d}a_\mu }{\textrm{d}\mu }\cdot \frac{\textrm{d}b_\mu }{\textrm{d}\mu }-\mathfrak {g}_\mu (a_\mu ,b_\mu ) \right) \mu . \end{aligned}$$
(5)

See Appendix A.1 for the proof. According to the definitions,

$$\begin{aligned} \frac{\partial }{\partial c_\mu } (\mathfrak {g}(A,B))_\mu =\mathfrak {g}_\mu \Bigl (\left. \nabla ^{(m)}_C A \right| _\mu ,B_\mu \Bigr ) +\mathfrak {g}_\mu \Bigl ( A_\mu , \left. \nabla ^{(e)}_C B \right| _\mu \Bigr ) \end{aligned}$$
(6)

holds for three arbitrary vector fields A, B, and C, where \(\mathfrak {g}(A,B)\) denotes a function \(\mu \mapsto \mathfrak {g}_\mu (a_\mu ,b_\mu )\).
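
The duality relation (6) can also be checked numerically. The sketch below (Python) assumes vector fields A and B whose m-representations are constant in \(\mu \), so that the m-covariant derivative (4) vanishes and the e-covariant derivative (5) reduces to its correction term; the left-hand side of (6) is approximated by a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(a, b, mu):                      # Fisher metric (2)
    return float(np.sum(a * b / mu))

# A point of P_+({1,...,5}) and three tangent vectors in S_0 (random, illustrative)
mu = rng.dirichlet(np.ones(5))
a, b, c = (v - v.mean() for v in rng.normal(size=(3, 5)))

# For vector fields with constant m-representation, the m-covariant derivative (4)
# vanishes and the e-covariant derivative (5) reduces to the correction term below.
nabla_e_B = -((c / mu) * (b / mu) - g(c, b, mu)) * mu

# Left-hand side of (6): directional derivative of mu -> g_mu(a, b) along c
t = 1e-6
lhs = (g(a, b, mu + t * c) - g(a, b, mu - t * c)) / (2 * t)

# Right-hand side of (6): g(nabla^(m)_C A, B) + g(A, nabla^(e)_C B); the first term is 0
rhs = g(a, nabla_e_B, mu)
print(lhs, rhs)                       # agree up to finite-difference error
```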

3 Main results

3.1 Model

Suppose that a mapping \(\rho :\mathcal {W}\times \mathcal {X}\rightarrow \mathcal {Y}\) is provided, where \(\mathcal {W}=\{\omega _1,\ldots ,\omega _n \}\), \(\mathcal {X}=\{x_1,\ldots ,x_m \}\), and \(\mathcal {Y}=\{y_1,\ldots ,y_\ell \}\) are arbitrary finite sets. Let \(\sigma \in \mathcal {S}(\mathcal {W})\), \(\tau \in \mathcal {S}(\mathcal {X}\times \mathcal {Y})\), and \(f\in \mathcal {F}(\mathcal {W}\times \mathcal {X}\times \mathcal {Y})\). In the following, \(\sigma (f)\) and \(\tau (f)\) denote the functions respectively determined by

$$\begin{aligned} \sigma (f): \mathcal {X}\times \mathcal {Y}\rightarrow \mathbb {R},\ (x,y)\mapsto \int f(\omega ,x,y)\,\textrm{d}\sigma (\omega )=\sum _{i=1}^n f(\omega _i,x,y)\sigma ^i \end{aligned}$$

and

$$\begin{aligned} \tau (f): \mathcal {W}\rightarrow \mathbb {R},\ \omega \mapsto \int f(\omega ,x,y)\,\textrm{d}\tau (x,y)=\sum _{j=1}^m\sum _{k=1}^\ell f(\omega ,x_j,y_k)\tau ^{j,k}. \end{aligned}$$

Specifically, for a function \(\mathbb {I}_\rho :\mathcal {W}\times \mathcal {X}\times \mathcal {Y}\rightarrow \{0,1\}\) such that \(\mathbb {I}_\rho (\omega ,x,y)=\mathbb {I}\{ \rho (\omega ,x)=y\}\), \(\mu (\mathbb {I}_\rho )(x,y)=\sum _{i=1}^n\mathbb {I}\{\rho (\omega _i,x )=y \}\mu ^i\) provides the conditional distribution of \(y=\rho (\omega ,x)\) conditioned on x when \(\omega \) is distributed according to \(\mu \in \mathcal {P}_+(\mathcal {W})\). This is because

$$\begin{aligned} \textbf{P}\{ y=y_k|x=x_j\}= & {} E_\mu \left[ \mathbb {I}\{ y=y_k\} \mid x=x_j\right] \\= & {} E_\mu \left[ \mathbb {I}\{ \rho (\omega ,x_j)=y_k\} \right] \\= & {} \sum _{i=1}^n \mathbb {I}\{ \rho (\omega _i,x_j)=y_k\} \mu ^i. \end{aligned}$$

When \(\omega \) is distributed according to \(\mu \in \mathcal {P}_+(\mathcal {W})\) and x is independently sampled from \(\nu \in \mathcal {P}_+(\mathcal {X})\), we denote the joint distribution of x and y as

$$\begin{aligned} \rho (\mu ,\nu ):=\mu (\mathbb {I}_\rho )\cdot \nu , \end{aligned}$$
(7)

so that \(\rho (\mu ,\nu )(x_j,y_k)=\sum _{i=1}^n\mathbb {I}\{ \rho (\omega _i,x_j)=y_k \}\mu ^i\nu ^j\), in which \(\rho \) is considered as a mapping \(\mathcal {P}_+(\mathcal {W})\times \mathcal {P}_+(\mathcal {X})\rightarrow \mathcal {P}_+(\mathcal {X}\times \mathcal {Y})\). For simplicity of subsequent description, let

$$\begin{aligned} \rho _\nu (\cdot )=\rho (\cdot ,\nu ) \end{aligned}$$

and

$$\begin{aligned} \rho _\mu (\cdot )=\rho (\mu ,\cdot ). \end{aligned}$$

Given \(f\in \mathcal {F}(\mathcal {X}\times \mathcal {Y})\), the expectations and conditional expectations of f(x, y) are computed as follows:

$$\begin{aligned} E_{\mu , \nu }[f(x,y)]= & {} \rho (\mu ,\nu )( f)\\= & {} \sum _{j=1}^m\sum _{k=1}^\ell f(x_j,y_k) \sum _{i=1}^n\mathbb {I}\{\rho (\omega _i,x_j)=y_k\}\mu ^i\nu ^j\\= & {} \sum _{i=1}^n\sum _{j=1}^m f(x_j,\rho (\omega _i,x_j)) \mu ^i\nu ^j, \\ E_\mu [f(x,y)|x]= & {} \sum _{i=1}^n f(x,\rho (\omega _i,x)) \mu ^i=\mu ( f(x,\rho (\cdot ,x)) ), \end{aligned}$$

and

$$\begin{aligned} E_\nu [f(x,y)|\omega ]=\sum _{j=1}^m f(x_j,\rho (\omega ,x_j)) \nu ^j=\nu ( f(\cdot , \rho (\omega ,\cdot )) ). \end{aligned}$$

For \(g\in \mathcal {F}(\mathcal {W})\), the conditional expectation of \(g(\omega )\) is expressed as follows:

$$\begin{aligned} E_\mu [g(\omega )|x,y]=\frac{ \sum _{i=1}^n g(\omega _i) \mathbb {I}\{\rho (\omega _i,x)=y\}\mu ^i }{ \sum _{i=1}^n \mathbb {I}\{\rho (\omega _i,x)=y\}\mu ^i } =\frac{\mu (g\cdot \mathbb {I}_\rho )(x,y)}{\mu (\mathbb {I}_\rho )(x,y)}. \end{aligned}$$
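
The constructions above are straightforward to compute. The following sketch (Python) builds the indicator \(\mathbb {I}_\rho \), the joint distribution \(\rho (\mu ,\nu )\) of (7), and the conditional expectation \(E_\mu [g(\omega )|x,y]\) for the binary response of (1) on an illustrative support; all numerical values are assumptions for illustration.

```python
import numpy as np

# Illustrative finite sets; rho is the binary response of Eq. (1): y = 1{omega <= x}
W = np.array([1.0, 2.0, 3.0, 4.0])          # support of omega
X = np.array([1.0, 2.0, 3.0])               # bidding prices
Y = np.array([0, 1])
I_rho = np.array([[[1.0 if int(w <= x) == y else 0.0 for y in Y]
                   for x in X] for w in W])  # I_rho[i, j, k] = 1{rho(omega_i, x_j) = y_k}

mu = np.array([0.1, 0.4, 0.3, 0.2])          # assumed distribution of omega
nu = np.array([0.3, 0.4, 0.3])               # assumed design

mu_I = np.einsum('i,ijk->jk', mu, I_rho)     # conditional distribution mu(I_rho)(x, y)
joint = mu_I * nu[:, None]                   # joint distribution rho(mu, nu) of Eq. (7)
print(joint.sum())                           # sums to 1

# Conditional expectation E_mu[g(omega) | x, y] for g(omega) = omega^2
gfun = W ** 2
cond = np.einsum('i,i,ijk->jk', gfun, mu, I_rho) / mu_I
print(cond)
```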

Let \(\mathcal {E}\) be an experiment introduced by \(\rho \) on \(\mathcal {X}\times \mathcal {Y}\):

$$\begin{aligned} \mathcal {E}:=R(\rho ):=\left\{ \rho (\mu ,\nu )\,:\, \mu \in \mathcal {P}_+(\mathcal {W}),\nu \in \mathcal {P}_+(\mathcal {X})\right\} , \end{aligned}$$
(8)

where \(R(\cdot )\) denotes the range of the given mapping. Two subsets, \(\mathcal {E}_\nu \) and \(\mathcal {E}_\mu \), are also provided by \(\mathcal {E}_\nu :=R(\rho _\nu )\) and \(\mathcal {E}_\mu :=R(\rho _\mu )\). In the following, we assume that

  1. (A1)

    \(\rho (\mu ,\nu )>0\) for every \(\mu \in \mathcal {P}_+(\mathcal {W})\) and \(\nu \in \mathcal {P}_+(\mathcal {X})\), and that

  2. (A2)

    \(\rho (\delta _1,\nu ), \ldots , \rho (\delta _n,\nu )\) are linearly independent, where \(\{\delta _1,\ldots ,\delta _n\}\) is the basis of \(\mathcal {S}(\mathcal {W})\).

Under (A1), \(\mathcal {E}\) becomes a submanifold of \(\mathcal {P}_+(\mathcal {X}\times \mathcal {Y})\), and \(\mathcal {E}_\nu \) and \(\mathcal {E}_\mu \) are submanifolds of \(\mathcal {E}\). Moreover, the following result is obtained:

Proposition 1

Assume (A1) and (A2). Let \(\theta :\mathcal {P}_+(\mathcal {W}) \rightarrow \mathbb {R}\) be an arbitrary mapping. Given \(\nu \), there exists a mapping \(\kappa _\nu : \mathcal {E}_\nu \rightarrow \mathbb {R}\) such that

$$\begin{aligned} \kappa _\nu ( \rho _\nu (\mu ) ) \equiv \theta (\mu ). \end{aligned}$$
(9)

Proof

For arbitrary \(\mu _1\) and \(\mu _2\),

$$\begin{aligned} \rho _\nu (\mu _1)-\rho _\nu (\mu _2)=\mu _1(\mathbb {I}_\rho )\cdot \nu -\mu _2(\mathbb {I}_\rho )\cdot \nu =\sum _{i=1}^{n-1}(\mu _1^i-\mu _2^i)(\delta _i(\mathbb {I}_\rho )-\delta _n(\mathbb {I}_\rho ))\cdot \nu , \end{aligned}$$

which is 0 if and only if \(\mu _1=\mu _2\). Therefore, under (A1) and (A2), \(\rho _\nu \) is one-to-one. Setting \(\kappa _\nu := \theta \circ (\rho _\nu )^{-1}\) proves the proposition. \(\square \)

The proposition reveals that (A1) and (A2) are sufficient conditions for the statistical identification of \(\theta (\mu )\). In the experiment, independent realizations of (x, y) are observed, from which \(\rho _\nu (\mu )\) is estimated. The one-to-one correspondence between \(\rho _\nu (\mu )\) and \(\theta (\mu )\) implies that the value of \(\theta (\mu )\) can be statistically estimated from the observations.

Because \(\rho _\nu \) and \(\rho _\mu \) are linear mappings, their differentials are obtained by \((d\rho _\nu )_\mu :\sigma \mapsto \rho _\nu (\sigma )=\rho (\sigma ,\nu )\) and \((d\rho _\mu )_\nu : \eta \mapsto \rho _\mu (\eta )=\rho (\mu ,\eta )\). Furthermore, as \(\rho (\mu ,\nu )\) is bilinear in \((\mu ,\nu )\), its differential at \((\mu ,\nu )\) is obtained by the following equation:

$$\begin{aligned} (d\rho )_{\mu ,\nu }=(d\rho _\nu )_\mu +(d\rho _\mu )_\nu ,\quad (\sigma ,\eta )\mapsto \rho (\sigma ,\nu )+\rho (\mu ,\eta ). \end{aligned}$$
(10)

The tangent spaces of \(\mathcal {E}_\nu \) and \(\mathcal {E}_\mu \) at \(\rho (\mu ,\nu )\) are \(T_{\rho (\mu ,\nu )}\mathcal {E}_\nu = R((d\rho _\nu )_\mu )\) and \(T_{\rho (\mu ,\nu )}\mathcal {E}_\mu = R((d\rho _\mu )_\nu )\), which are orthogonal to one another because of the following:

$$\begin{aligned}{} & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (d\rho _\nu )_\mu \sigma ,\, (d\rho _\mu )_\nu \eta \right) \\{} & {} \quad =\sum _{j=1}^m\sum _{k=1}^\ell \frac{\left[ \sum _{i=1}^n\mathbb {I}\{ \rho (\omega _i,x_j)=y_k \}\sigma ^i\nu ^j \right] \cdot \left[ \sum _{i=1}^n\mathbb {I}\{ \rho (\omega _i,x_j)=y_k \}\mu ^i\eta ^j \right] }{\sum _{i=1}^n\mathbb {I}\{ \rho (\omega _i,x_j)=y_k \}\mu ^i\nu ^j}\\{} & {} \quad =\sum _{i=1}^n\sum _{j=1}^m\sum _{k=1}^\ell \mathbb {I}\{ \rho (\omega _i,x_j)=y_k \} \sigma ^i \eta ^j\\{} & {} \quad = \sum _{i=1}^n\sigma ^i\sum _{j=1}^m \eta ^j=0 \end{aligned}$$

for every \(\sigma \in T_\mu \mathcal {P}_+(\mathcal {W})= \mathcal {S}_0(\mathcal {W})\) and \(\eta \in T_\nu \mathcal {P}_+(\mathcal {X})= \mathcal {S}_0(\mathcal {X})\). The tangent space of \(\mathcal {E}\) at \(\rho (\mu ,\nu )\) is \(T_{\rho (\mu ,\nu )}\mathcal {E}=T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \oplus T_{\rho (\mu ,\nu )}\mathcal {E}_\mu \).

The adjoint operator \((d\rho _\nu )^*_\mu \) is determined by the following expression:

$$\begin{aligned} (d\rho _\nu )^*_\mu : T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \rightarrow T_\mu \mathcal {P}_+(\mathcal {W}), \quad \tau \mapsto \tau \left( \frac{ \mathbb {I}_\rho }{ \mu (\mathbb {I}_\rho ) }\right) \cdot \mu , \end{aligned}$$
(11)

where \(\tau =\sum _{j=1}^m\sum _{k=1}^\ell \tau ^{j,k}\delta _{j,k}\), \(\{\delta _{j,k}\}\) is the basis of \(\mathcal {S}(\mathcal {X}\times \mathcal {Y})\), and

$$\begin{aligned} \left[ \tau \left( \frac{ \mathbb {I}_\rho }{ \mu (\mathbb {I}_\rho ) }\right) \cdot \mu \right] (\omega _i)= \sum _{j=1}^m\sum _{k=1}^\ell \tau ^{j,k}\frac{ \mathbb {I}\{\rho (\omega _i,x_j)=y_k\} }{ \sum _{h=1}^n\mathbb {I}\{\rho (\omega _h,x_j)=y_k\}\mu ^h }\cdot \mu ^i. \end{aligned}$$
(12)

The operator is the adjoint of \((d\rho _\nu )_\mu \) because

$$\begin{aligned} \mathfrak {g}_{\mu }( \sigma , (d\rho _\nu )_\mu ^*\tau )= & {} \sum _{i=1}^n\frac{ \sigma ^i \cdot \tau \left( \frac{ \mathbb {I}_\rho }{ \mu (\mathbb {I}_\rho ) }\right) (\omega _i) \mu ^i }{\mu ^i}\\= & {} \sum _{i=1}^n \sigma ^i \sum _{j=1}^m\sum _{k=1}^\ell \tau ^{j,k}\frac{ \mathbb {I}\{\rho (\omega _i,x_j)=y_k\} }{ \sum _{h=1}^n\mathbb {I}\{\rho (\omega _h,x_j)=y_k\}\mu ^h }\\= & {} \sum _{j=1}^m\sum _{k=1}^\ell \tau ^{j,k}\frac{ \sum _{i=1}^n \mathbb {I}\{\rho (\omega _i,x_j)=y_k\}\sigma ^i\nu ^j }{ \sum _{i=1}^n\mathbb {I}\{\rho (\omega _i,x_j)=y_k\}\mu ^i\nu ^j }\\= & {} \sum _{j=1}^m\sum _{k=1}^\ell \frac{((d\rho _\nu )\sigma )(x_j,y_k)\cdot \tau (x_j,y_k)}{ \rho (\mu ,\nu )(x_j,y_k) }\\= & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (d\rho _\nu )_\mu \sigma , \tau \right) . \end{aligned}$$

Note that the definition (12) of \((d\rho _\nu )_\mu ^*\) is independent of \(\nu \).
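
The adjoint identity can also be verified numerically. The sketch below (Python, same illustrative binary-response setup as above) compares \(\mathfrak {g}_{\mu }( \sigma , (d\rho _\nu )_\mu ^*\tau )\) with \(\mathfrak {g}_{\rho (\mu ,\nu )}( (d\rho _\nu )_\mu \sigma , \tau )\) for randomly drawn \(\mu \), \(\nu \), \(\sigma \), and \(\tau \).

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative binary-response setup: y = 1{omega <= x}
W = np.array([1.0, 2.0, 3.0, 4.0]); X = np.array([1.0, 2.0, 3.0]); Y = np.array([0, 1])
I_rho = np.array([[[1.0 if int(w <= x) == y else 0.0 for y in Y]
                   for x in X] for w in W])

mu = rng.dirichlet(np.ones(len(W)))
nu = rng.dirichlet(np.ones(len(X)))
mu_I = np.einsum('i,ijk->jk', mu, I_rho)                  # mu(I_rho)(x, y)
joint = mu_I * nu[:, None]                                # rho(mu, nu)

sigma = rng.normal(size=len(W)); sigma -= sigma.mean()    # tangent vector in S_0(W)
tau = rng.normal(size=joint.shape)                        # arbitrary signed measure on X x Y

d_rho_sigma = np.einsum('i,ijk->jk', sigma, I_rho) * nu[:, None]   # (d rho_nu)_mu sigma
adj_tau = np.einsum('jk,ijk->i', tau / mu_I, I_rho) * mu           # adjoint of Eqs. (11)-(12)

lhs = float(np.sum(sigma * adj_tau / mu))                 # g_mu(sigma, (d rho_nu)^*_mu tau)
rhs = float(np.sum(d_rho_sigma * tau / joint))            # g_{rho(mu,nu)}((d rho_nu)_mu sigma, tau)
print(lhs, rhs)                                           # the two inner products coincide
```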

3.2 Optimal design

Suppose that the goal of the experiment is to estimate the value of \(\theta :\mathcal {P}_+(\mathcal {W})\rightarrow \mathbb {R}\) at a certain point \(\mu \). In the following, we assume that

  1. (A3)

    \((\partial \theta )_\mu \in R((d \rho _\nu )_\mu ^*)\)

for each \((\mu ,\nu )\in \mathcal {P}_+(\mathcal {W})\times \mathcal {P}_+(\mathcal {X})\). This is the differentiability condition in Ref. [22], and regular estimation of \(\theta (\mu )\) is possible only if this condition holds.

Proposition 2

Assume (A1)–(A3). The gradient \((\partial \kappa _\nu )_{\rho _\nu (\mu )}\) of \(\kappa _\nu =\theta \circ (\rho _\nu )^{-1}:\mathcal {E}_\nu \rightarrow \mathbb {R}\) exists and satisfies

$$\begin{aligned} (\partial \theta )_{\mu }=(d\rho _\nu )^*_{\mu } (\partial \kappa _\nu )_{ \rho _\nu (\mu ) }, \quad (\partial \kappa _\nu )_{ \rho _\nu (\mu ) } \in T_{ \rho _\nu (\mu ) }\mathcal {E}_\nu . \end{aligned}$$
(13)

Proof

Because \(\kappa _\nu ( \rho _\nu (\mu ) ) \equiv \theta (\mu )\), the differential of \(\kappa _\nu \) is a linear mapping \(d \kappa _\nu \) such that

$$\begin{aligned} (d \kappa _\nu )_{ \rho _\nu (\mu ) } (d \rho _\nu ) \sigma = (d \theta )_\mu \sigma =\mathfrak {g}_{\mu }\left( \sigma , (\partial \theta )_\mu \right) \end{aligned}$$

for every \(\sigma \in T_\mu \mathcal {P}_+(\mathcal {W})\).

Under (A3), there exists \(\hat{\tau } \in T_{\rho _\nu (\mu )}\mathcal {E}_\nu \) such that \((\partial \theta )_\mu =(d \rho _\nu )_\mu ^*\hat{\tau }\). Therefore,

$$\begin{aligned} (d \kappa _\nu )_{\rho _\nu (\mu )} (d \rho _\nu ) \sigma = \mathfrak {g}_\mu ( \sigma , (d\rho _\nu )_\mu ^* \hat{\tau }) = \mathfrak {g}_{\rho _\nu (\mu )}((d\rho _\nu )_\mu \sigma , \hat{\tau }), \end{aligned}$$

which implies \((d \kappa _\nu )_{\rho _\nu (\mu )}\tau =\mathfrak {g}_{\rho _\nu (\mu )}(\tau , \hat{\tau })\) holds for every \(\tau \in T_{\rho _\nu (\mu )} \mathcal {E}_\nu \). Because such \(\hat{\tau }\) is uniquely determined, \((\partial \kappa _\nu )_{\rho _\nu (\mu )}=\hat{\tau }\) satisfies the requirements of the proposition. \(\square \)

Equation (13) is typically referred to as the score equation. Its solution \(\partial \kappa _\nu \) defines a vector field \(\partial \kappa \) on \(\mathcal {E}\) by

$$\begin{aligned} \partial \kappa : \rho (\mu ,\nu ) \mapsto (\partial \kappa _\nu )_{\rho _\nu (\mu )}. \end{aligned}$$
(14)

The optimal design is defined as a minimizer of the Cramér–Rao lower bound \(\lambda (\theta |\nu )\) for the estimation of \(\theta =\theta (\mu )\). The lower bound can be found by computing the inverse of the Fisher information matrix, which is the variance matrix of the score

$$\begin{aligned} \left( \frac{\partial }{\partial \mu ^1} \log \rho _\nu (\mu )(x,y), \ldots , \frac{\partial }{\partial \mu ^{n-1}} \log \rho _\nu (\mu )(x,y)\right) , \end{aligned}$$

where \(\mu \) is parametrized by \(\mu = \sum _{i=1}^{n-1}\mu ^i\delta _i+\left( 1-\sum _{i=1}^{n-1}\mu ^i\right) \delta _n\). This direct computation involves messy matrix calculations, and the resulting value must then be minimized over \(\nu \) to determine the optimal design.

An expression for the lower bound can be obtained by characterizing it as the supremum of the Cramér–Rao lower bounds of one-dimensional submodels. Let \(\epsilon >0\) be sufficiently small. Consider a smooth path \(t\in (-\epsilon ,\epsilon )\mapsto \mu _t \in \mathcal {P}_+(\mathcal {W})\), which passes through \(\mu \) at \(t=0\) with velocity

$$\begin{aligned} \sigma =\left( \frac{d}{dt}\right) _{t=0} \mu _t \in \mathcal {S}_0(\mathcal {W}). \end{aligned}$$

Notably,

$$\begin{aligned} \left( \frac{d}{dt}\right) _{t=0} \rho _\nu (\mu _t) = \sum _{i=1}^n \mathbb {I}\{\rho (\omega _i,x)=y \}\left[ \left( \frac{d}{dt}\right) _{t=0} \mu ^i_t\right] \nu (x) =(d\rho _\nu )_\mu \sigma . \end{aligned}$$

The Cramér–Rao lower bound of ‘true’ \(t=0\) is the inverse of the Fisher information of the submodel at \(t=0\). Because

$$\begin{aligned} \left( \frac{d}{dt} \right) _{t=0} \log \rho _\nu (\mu _t)=\frac{\textrm{d}( (d\rho _\nu )_\mu \sigma )}{\textrm{d}\rho _\nu (\mu )}, \end{aligned}$$

the Fisher information of the submodel is

$$\begin{aligned} E_{\mu ,\nu }\left( \left( \frac{d}{dt} \right) _{t=0} \log \rho _\nu (\mu _t) \right) ^2 = \mathfrak {g}_{\rho (\mu ,\nu )}\left( (d\rho _\nu )_\mu \sigma ,(d\rho _\nu )_\mu \sigma \right) = \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )}^2. \end{aligned}$$

The lower bound for \(\theta =\theta (\mu )\) along the one-parameter submodel \(t\mapsto \rho _\nu (\mu _t)\) is given by the following expression:

$$\begin{aligned} \lambda (\sigma ):=\left( \frac{\partial \theta }{\partial \sigma }(\mu ) \right) ^2 \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )}^{-2}. \end{aligned}$$

Let \(\hat{t}_S\) be the efficient estimator of \(t=0\) attaining the lower bound, where S denotes the sample size: that is,

$$\begin{aligned} \sqrt{S}\cdot \hat{t}_S\ {\mathop {\rightarrow }\limits ^{d}}\ N(0, \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )}^{-2}). \end{aligned}$$

Given the submodel \(t\mapsto \rho _\nu (\mu _t)\), the efficient estimator of \(\theta (\mu )\) is given by \(\hat{\theta }_S=\theta (\mu _{\hat{t}_S})\). By the delta method,

$$\begin{aligned} \sqrt{S}( \hat{\theta }_S-\theta )\ {\mathop {\rightarrow }\limits ^{d}}\ \frac{\partial \theta }{\partial \sigma }(\mu )\cdot N(0, \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )}^{-2})=N\left( 0, \lambda (\sigma )\right) \end{aligned}$$

holds (see e.g. Theorem 1.12 of Shao [21]). Because \(\frac{\partial \theta }{\partial \sigma }(\mu )=\mathfrak {g}_{\mu }\left( (\partial \theta )_{\mu },\sigma \right) \),

$$\begin{aligned} \lambda (\sigma )= & {} \mathfrak {g}_{\mu }\left( (\partial \theta )_{\mu }, \frac{\sigma }{ \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )} } \right) ^2\\= & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (\partial \kappa _\nu )_{\rho (\mu ,\nu )},\frac{(d\rho _\nu )_{\mu }\sigma }{\Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho ({\mu },\nu )}} \right) ^2 \end{aligned}$$

by the score equation (13). The Cramér–Rao lower bound for the full model is equal to the supremum of \(\lambda (\sigma )\) over the submodels \(t\mapsto \rho _\nu (\mu _t)\) [5, 22]. Since \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}\in R((d\rho _\nu )_{\mu })\),

$$\begin{aligned} \lambda (\theta |\nu )= & {} \sup _{\sigma \in T_{\mu }\mathcal {P}_+(\mathcal {W})}\lambda (\sigma )\\= & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (\partial \kappa _\nu )_{\rho (\mu ,\nu )},\frac{ (\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{\Vert (\partial \kappa _\nu )_{\rho (\mu ,\nu )} \Vert _{\rho ({\mu },\nu )}} \right) ^2\\= & {} \Vert (\partial \kappa _\nu )_{\rho _\nu (\mu )} \Vert _{\rho (\mu ,\nu )}^2. \end{aligned}$$

The supremum of \(\lambda (\sigma )\) is attained by \(\sigma \) such that \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}=(d\rho _\nu )_\mu \sigma \). A submodel with this tangent vector \(\sigma \) at \(\mu \) yields the largest asymptotic variance for estimating \(\theta (\mu )\) among all submodels; such a submodel is called the least favorable, or hardest, submodel [23].

Proposition 3

The Cramér–Rao lower bound of \(\theta =\theta (\mu )\) under \(\nu \) is as follows:

$$\begin{aligned} \lambda (\theta |\nu )=\Vert (\partial \kappa _\nu )_{\rho _\nu (\mu )} \Vert ^2_{ \rho (\mu ,\nu ) }, \end{aligned}$$
(15)

where \((\partial \kappa _\nu )_{\rho _\nu (\mu )}\) is a solution to the score equation (13).

Proposition 4

\(\lambda (\theta |\nu )\) is convex in \(\nu \).

Proof

Let \(G(\nu ):=[ g_{i,h}(\nu )]\) be an \(n\times n\) matrix with the (ih) element

$$\begin{aligned} g_{i,h}(\nu ):=\mathfrak {g}_{\rho (\mu ,\nu )}\left( \rho (\delta _i,\nu ), \rho (\delta _h,\nu ) \right) \end{aligned}$$

for \(1\le i \le n\) and \(1\le h\le n\). The matrix is linear in \(\nu \) and nonsingular according to (A2). Because \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}\) is in \(T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \), there exists \(\hat{\sigma }_{\nu } \in T_{\mu }\mathcal {P}_+(\mathcal {W})\) such that \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}=\rho (\hat{\sigma }_{\nu },\nu )\). Moreover, for \(1\le i \le n\), we have the following expression:

$$\begin{aligned} \frac{\textrm{d}((\partial \theta )_{\mu } )}{\textrm{d}\mu }(\omega _i)= & {} (\partial \kappa _\nu )_{\rho (\mu ,\nu )} \left( \frac{\mathbb {I}_\rho }{\mu (\mathbb {I}_\rho )} \right) (\omega _i)\nonumber \\= & {} \int \frac{\mathbb {I}_\rho (\omega _i,x,y)}{\mu (\mathbb {I}_\rho )(x,y)}\,\textrm{d}\rho (\hat{\sigma }_{\nu }, \nu )(x,y)\nonumber \\= & {} \int \frac{ \textrm{d}\rho (\delta _i,\nu )}{\textrm{d}\rho (\mu ,\nu )} \cdot \frac{ \textrm{d}\rho (\hat{\sigma }_{\nu },\nu )}{\textrm{d}\rho (\mu ,\nu )}\,\textrm{d}\rho (\mu , \nu )\nonumber \\= & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( \rho (\delta _i,\nu ), \rho (\hat{\sigma }_{\nu },\nu )\right) \nonumber \\= & {} \sum _{h=1}^n g_{i,h}(\nu )\hat{\sigma }_{\nu }^h. \end{aligned}$$
(16)

Let \({\varvec{\gamma }}=(\gamma _1,\ldots ,\gamma _n)^\top \) be a vector of coefficients of \({\textrm{d}((\partial \theta )_{\mu })}/{\textrm{d}\mu }\), and let \(\hat{{\varvec{\sigma }}}_{\nu }=(\hat{\sigma }_{\nu }^1,\ldots ,\hat{\sigma }_{\nu }^n)^\top \). Then, (16) implies that \(\hat{{\varvec{\sigma }}}_{\nu }=G(\nu )^{-1}{\varvec{\gamma }}\). From Proposition 3,

$$\begin{aligned} \lambda (\theta |\nu )= \hat{{\varvec{\sigma }}}_{\nu }^\top G(\nu )\hat{{\varvec{\sigma }}}_{\nu } ={\varvec{\gamma }}^\top G(\nu )^{-1}{\varvec{\gamma }}. \end{aligned}$$

Therefore, we have the following expression:

$$\begin{aligned} \lambda (\theta |\,t\nu _1+(1-t)\nu _2\,)= & {} {\varvec{\gamma }}^\top G(t\nu _1+(1-t)\nu _2)^{-1}{\varvec{\gamma }}\\= & {} {\varvec{\gamma }}^\top \Bigl [ tG(\nu _1)+(1-t)G(\nu _2)\Bigr ]^{-1}{\varvec{\gamma }} \\\le & {} t\lambda (\theta |\nu _1)+(1-t)\lambda (\theta |\nu _2) \end{aligned}$$

for arbitrary \(\nu _1\) and \(\nu _2\) in \(\mathcal {P}_+(\mathcal {X})\) and for any \(t\in (0,1)\) because of the convexity of the matrix inversion: for any positive definite matrices A and B,

$$\begin{aligned} ( t A+(1-t)B )^{-1} \le t A^{-1}+(1-t)B^{-1} \end{aligned}$$
(17)

holds, where the inequality is in the sense of positive definite matrices [19, 20]. \(\square \)
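
As a numerical illustration of the proof, the sketch below (Python) evaluates \(\lambda (\theta |\nu )={\varvec{\gamma }}^\top G(\nu )^{-1}{\varvec{\gamma }}\) for the binary response of (1) with the mean functional \(\theta (\mu )=\int \omega \,d\mu \), for which the gradient coefficients reduce to \(\gamma _i=\omega _i-E_\mu [\omega ]\) (cf. (25) in Sect. 4), and checks the convexity inequality along a segment of designs; the support points are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Binary response of Eq. (1) on an illustrative support; bids exclude the top point
W = np.array([1.0, 2.0, 3.0, 4.0]); X = W[:-1]; Y = np.array([0, 1])
I_rho = np.array([[[1.0 if int(w <= x) == y else 0.0 for y in Y]
                   for x in X] for w in W])
mu = rng.dirichlet(np.ones(len(W)))

def G(nu):
    """Gram matrix g_{i,h}(nu) = g_{rho(mu,nu)}(rho(delta_i,nu), rho(delta_h,nu))."""
    mu_I = np.einsum('i,ijk->jk', mu, I_rho)
    return np.einsum('ijk,hjk,jk->ih', I_rho, I_rho, nu[:, None] / mu_I)

def cr_bound(gamma, nu):
    return float(gamma @ np.linalg.solve(G(nu), gamma))   # gamma' G(nu)^{-1} gamma

# Mean functional theta(mu) = E_mu[omega]: gradient coefficients gamma_i = omega_i - E_mu[omega]
gamma = W - np.dot(W, mu)

nu1, nu2 = rng.dirichlet(np.ones(len(X)), size=2)
for t in (0.25, 0.5, 0.75):
    mix = cr_bound(gamma, t * nu1 + (1 - t) * nu2)
    print(mix <= t * cr_bound(gamma, nu1) + (1 - t) * cr_bound(gamma, nu2))   # True
```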

Definition 1

The optimal design for \(\theta =\theta (\mu )\) is \({\nu }\in \mathcal {P}_+(\mathcal {X})\) such that

$$\begin{aligned} \lambda (\theta |{\nu }) =\inf _{ \nu ' \in \mathcal {P}_+(\mathcal {X}) } \lambda (\theta |\nu '). \end{aligned}$$
(18)

Theorem 1

\({\nu }\) is optimal for \(\theta =\theta (\mu )\) if and only if

$$\begin{aligned} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (\partial \kappa _\nu )_{\rho (\mu ,\nu )}, \nabla _H^{(e)}(\partial \kappa _\nu )_{\rho (\mu ,\nu )} \right) =0 \end{aligned}$$
(19)

holds for any \(H\in T_{\rho (\mu ,\nu )}\mathcal {E}_{\mu }\) (Fig. 1).

Fig. 1: The first-order condition (19) of the optimal design

Proof

Because \(\nu \mapsto \lambda (\theta |\nu )\) is convex, the lower bound is minimized at \(\nu \) if and only if the first order condition

$$\begin{aligned} \left( \frac{\partial }{\partial \eta }\right) \lambda (\theta |\nu )=0 \end{aligned}$$
(20)

holds for any \(\eta \in T_\nu \mathcal {P}_+(\mathcal {X})\). Because \(\lambda (\theta |\nu )=\Vert (\partial \kappa _\nu )_{\rho (\mu ,\nu )}\Vert _{\rho (\mu ,\nu )}^2\), (20) is equivalent to

$$\begin{aligned} \mathfrak {g}\left. \left( \nabla _H^{(m)}\partial \kappa ,\partial \kappa \right) \right| _{\rho (\mu ,\nu )} +\mathfrak {g}\left. \left( \partial \kappa , \nabla _H^{(e)}\partial \kappa \right) \right| _{\rho (\mu ,\nu )}=0 \end{aligned}$$

for \(H=(d\rho _{\mu })_{\nu }\eta \in T_{\rho (\mu ,\nu )}\mathcal {E}_{\mu }\). From the definition of \(\nabla ^{(m)}\),

$$\begin{aligned} \left. \nabla _H^{(m)}\partial \kappa \right| _{\rho (\mu ,\nu )}=\lim _{t\rightarrow 0}\frac{ (\partial \kappa _{\nu +t \eta })_{\rho (\mu ,\nu +t\eta )}-(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ t }. \end{aligned}$$

Because \((d\rho _{\nu +t\eta })_\mu ^*=(d\rho _\nu )_\mu ^*\),

$$\begin{aligned} (d\rho _\nu )_{\mu }^* \left( \left. \nabla _H^{(m)}\partial \kappa \right| _{\rho (\mu ,\nu )}\right)= & {} \lim _{t\rightarrow 0}\frac{ (d\rho _{\nu +t\eta })_{\mu }^*(\partial \kappa _{\nu +t \eta })_{\rho (\mu ,\nu +t\eta )}-(d\rho _\nu )_{\mu }^*(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ t }\\= & {} \lim _{t\rightarrow 0}\frac{ (\partial \theta )_{\mu }-(\partial \theta )_{\mu } }{ t }=0. \end{aligned}$$

Therefore, we have the following expression:

$$\begin{aligned} \left. \mathfrak {g}\left( \nabla _H^{(m)}\partial \kappa , \partial \kappa \right) \right| _{\rho (\mu ,\nu )}= & {} \mathfrak {g}_{\mu }\left( (d\rho _\nu )_{\mu }^*\left( \left. \nabla _H^{(m)}\partial \kappa \right| _{\rho (\mu ,\nu )}\right) ,\, \hat{\sigma }_{\nu } \right) \nonumber \\ {}= & {} 0, \end{aligned}$$
(21)

where \((\partial \kappa )_{\rho (\mu ,\nu )}=(\partial \kappa _\nu )_{\rho (\mu ,\nu )}=(d\rho _\nu )_\mu \hat{\sigma }_\nu \). Hence, (20) is equivalent to (19). \(\square \)

Corollary 1

\({\nu }\) is the optimal design for \(\theta (\mu )\) if and only if

$$\begin{aligned} E_{\mu }\left[ \left. \left( \frac{\textrm{d} (\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{\textrm{d} \rho (\mu ,{\nu })} (x,y)\right) ^2 \right| x \right] = E_{\mu ,\nu }\left[ \left( \frac{\textrm{d} (\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{\textrm{d} \rho (\mu ,{\nu })} (x,y)\right) ^2 \right] \end{aligned}$$
(22)

for all \(x\in \mathcal {X}\).

Proof

From the definition of \(\nabla ^{(e)}\), we have the following equation:

$$\begin{aligned} \left. \nabla _H^{(e)}\partial \kappa \right| _{\rho (\mu ,\nu )} = \left. \nabla _H^{(m)}\partial \kappa \right| _{\rho (\mu ,\nu )} - \left( \frac{\textrm{d}\eta }{ \textrm{d}\nu }\cdot \frac{\textrm{d}(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ \textrm{d}\rho (\mu ,\nu ) } \right) \rho (\mu ,\nu ). \end{aligned}$$

Note that \(\mathfrak {g}_{\rho (\mu ,\nu )}(\rho (\mu ,\eta ), (\partial \kappa _\nu )_{\rho (\mu ,\nu )})=0\) because \(\rho (\mu ,\eta ) \in T_{\rho (\mu ,\nu )}\mathcal {E}_{\mu }\) and \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}\in T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \). Therefore, (19) is equivalent to

$$\begin{aligned} E_{\mu ,\nu }\left[ \frac{\textrm{d}\eta }{ \textrm{d}\nu }(x)\cdot E_{\mu }\left[ \left. \left( \frac{\textrm{d}(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ \textrm{d}\rho (\mu ,\nu ) }(x,y)\right) ^2 \right| x\right] \right] =0, \end{aligned}$$

which holds for an arbitrary \(\eta \in \mathcal {S}_0(\mathcal {X})\) if and only if (22) is satisfied. \(\square \)

The intuition behind this condition can be obtained from the following expression:

$$\begin{aligned} \lambda (\theta \mid {\nu })= \int E_{\mu }\left[ \left. \left( \frac{ \textrm{d}(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ \textrm{d}\rho (\mu ,\nu ) }(x,y)\right) ^2\,\right| \, x\, \right] \, \textrm{d} {\nu }(x). \end{aligned}$$
(23)

If the lower bound is minimized at \( {\nu }\), any small perturbation added to \( {\nu }\) does not change the value of \(\lambda (\theta \mid {\nu })\) to first order. This is the case if and only if the integrand on the right-hand side is independent of x.

Example 1

To see how the theorem works, let us consider a trivial response function \(\rho (\omega ,x)=\omega +x\), where \(\mathcal {W}\) and \(\mathcal {X}\) are subsets of \(\mathbb {R}\). In this case, the joint density of (x, y) is given by \(\rho (\mu ,\nu )(x,y)=\sum _{i=1}^n\mathbb {I}\{\omega _i+x=y\}\mu ^i\nu (x)=\mu (y-x)\nu (x)\). The differential of \(\rho _\nu \) and its adjoint are \((d\rho _\nu )_\mu \sigma (x,y)=\sigma (y-x)\nu (x)\) and \((d\rho _\nu )_\mu ^*\tau (\omega )=\sum _{j=1}^m \tau (x_j,x_j+\omega )\) because

$$\begin{aligned} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (d\rho _\nu )_\mu \sigma ,\tau \right)= & {} \sum _{j=1}^m \sum _{k=1}^\ell \frac{ \sigma (y_k-x_j)\nu (x_j)\cdot \tau (x_j,y_k) }{ \mu (y_k-x_j)\nu (x_j) }\\= & {} \sum _{i=1}^n \frac{ \sigma (\omega _i)\sum _{j=1}^m\tau (x_j,\omega _i+x_j) }{ \mu (\omega _i) }\\= & {} \mathfrak {g}_{\mu }\Bigl ( \sigma ,\ \sum _{j=1}^m\tau (x_j,\cdot +x_j) \Bigr ). \end{aligned}$$

The score equation \((\partial \theta )_\mu (\omega )=\sum _{j=1}^m\partial \kappa _\nu (x_j,\omega +x_j)\) is solved by \(\partial \kappa _\nu (x,y)=(\partial \theta )_\mu (y-x)\nu (x)\). For every \(\eta \in T_\nu \mathcal {P}_+(\mathcal {X})\) and \(H=\rho (\mu ,\eta )\),

$$\begin{aligned} \nabla _H^{(m)}\partial \kappa _\nu (x,y)=(\partial \theta )_\mu (y-x)\eta (x) \end{aligned}$$

and

$$\begin{aligned} \nabla _H^{(e)}\partial \kappa _\nu (x,y)= & {} \nabla _H^{(m)}\partial \kappa _\nu (x,y)- \frac{\textrm{d}\eta }{ \textrm{d}\nu }(x) \cdot \frac{\textrm{d}\partial \kappa _\nu }{ \textrm{d}\rho (\mu ,\nu )}(x,y) \cdot \rho (\mu ,\nu )(x,y) \\= & {} (\partial \theta )_\mu (y-x)\eta (x)-\frac{\textrm{d}\eta }{ \textrm{d}\nu }(x)\cdot (\partial \theta )_\mu (y-x)\nu (x)\\= & {} 0. \end{aligned}$$

Therefore, condition (19) is trivially satisfied at an arbitrary \(\nu \). In this example, \(\omega \) is always observable because \(\rho \) is invertible as \(\omega =y-x\), so the distribution of x does not affect estimation efficiency. Thus, the choice of \(\nu \) becomes significant only when a model with information loss is estimated.

The optimality can be checked by applying the corollary, too. In this example,

$$\begin{aligned} \frac{ \textrm{d}\partial \kappa _\nu }{ \textrm{d}\rho (\mu ,\nu ) }(x,y)=\frac{ (\partial \theta )_\mu (y-x)\nu (x) }{ \mu (y-x)\nu (x) }=\frac{ (\partial \theta )_\mu (y-x)}{ \mu (y-x)}, \end{aligned}$$

which is independent of \(\nu \). Because

$$\begin{aligned} E_\mu \left[ \left. \left( \frac{ \textrm{d}\partial \kappa _\nu }{ \textrm{d}\rho (\mu ,\nu ) }(x,y)\right) ^2 \right| \,x\, \right] = \sum _{i=1}^n\left( \frac{ (\partial \theta )_\mu (\omega _i)}{ \mu (\omega _i)}\right) ^2\mu (\omega _i) \end{aligned}$$

is independent of x, an arbitrary \(\nu \) is optimal when \(\rho (\omega ,x)=\omega +x\).

4 Binary response experiment

The optimal design to estimate the mean WTP \(E\omega \) with the binary response (1) and nonparametric \(\mu \) was proposed by Duffield and Patterson [11]. They directly minimized the asymptotic variance of the maximum likelihood estimator of \(E\omega \) to determine the optimal design. In this section, we apply Theorem 1 to replicate their result.

Let \(\mathcal {W}=\{\xi _1,\ldots ,\xi _n\}\), \(\mathcal {X}=\{\xi _1,\ldots ,\xi _{n-1}\}\), and \(\mathcal {Y}=\{0,1\}\), where \(\xi _1,\ldots ,\xi _n\) are n real numbers such that \(0\le \xi _1<\cdots <\xi _n\). Because \(\omega \le \xi _n\) holds with probability one, the response at the highest price \(\xi _n\) is uninformative, and \(\xi _n\) is therefore not contained in the support of x.

In the experiment, the joint distribution of (x, y) is obtained by the following expression:

$$\begin{aligned} \rho (\mu ,\nu )(\xi _j,y)=\left( y\mu [0,\xi _j]+(1-y)\mu (\xi _j,\infty )\right) \nu ^j \end{aligned}$$
(24)

for \(j=1,\ldots ,n-1\) and \(y=0,1\), where \(\mu [0,\xi _j]:=\sum _{i\le j}\mu ^i\) and \(\mu (\xi _j,\infty ):=\sum _{i>j}\mu ^i\). The model satisfies the condition (A1) because \(\mu (\mathbb {I}_\rho )(\xi _j,y)\ge y \mu ^1+(1-y)\mu ^n>0\) holds for any \((\xi _j,y)\in \mathcal {X}\times \mathcal {Y}\). Assume that, for \(c_1,\ldots ,c_n\in \mathbb {R}\), \(\sum _{i=1}^n c_i\delta _i(\mathbb {I}_\rho )(\xi _j,y)=0\) holds for any \((\xi _j,y)\in \mathcal {X}\times \mathcal {Y}\). Because

$$\begin{aligned} \delta _i(\mathbb {I}_\rho )(\xi _j,y)=y\mathbb {I}\{ i\le j \}+(1-y)\mathbb {I}\{ i> j \}, \end{aligned}$$

\(y\sum _{i\le j}c_i+(1-y)\sum _{i>j}c_i=0\) holds for \(j=1,\ldots ,n-1\), which implies that \(c_1=\cdots =c_n=0\). Hence, condition (A2) is also satisfied.

The differential of \(\rho _\nu \) is obtained by the following:

$$\begin{aligned} ((d\rho _\nu )_\mu \sigma )(\xi _j,y)=\bigl ( y\sigma [0,\xi _j]+(1-y)\sigma (\xi _j,\infty )\bigr )\nu ^j \end{aligned}$$

for every \(\sigma \in T_\mu \mathcal {P}_+(\mathcal {W})=\mathcal {S}_0(\{\xi _1,\ldots ,\xi _n\})\). The adjoint operator is determined by

$$\begin{aligned} ((d\rho _\nu )_\mu ^*\tau )(\xi _i)=\left( \sum _{j=i}^{n-1}\nu ^j \frac{\sigma [0,\xi _j]}{\mu [0,\xi _j]}+\sum _{j=1}^{i-1}\nu ^j\frac{\sigma (\xi _j,\infty )}{\mu (\xi _j,\infty )}\right) \mu ^i \end{aligned}$$

for each \(\tau =\rho (\sigma ,\nu ) \in T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \).

Let \(\gamma :=\textrm{d}(\partial \theta )_\mu /\textrm{d}\mu =\sum _{i=1}^n\gamma _i e^i\) with

$$\begin{aligned} \gamma _i=\left\{ \begin{array}{lc} \displaystyle \frac{\partial \theta }{\partial \mu ^i}(\mu ) -\sum _{h=1}^{n-1} \mu ^h \frac{\partial \theta }{\partial \mu ^h}(\mu ) &{}\quad (1\le i\le n-1)\\ \displaystyle -\sum _{h=1}^{n-1} \mu ^h \frac{\partial \theta }{\partial \mu ^h}(\mu ) &{}\quad (i=n) \end{array} \right. \end{aligned}$$
(25)

The derivation of (25) is explained in Appendix A.2. Let \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}=\rho (\hat{\sigma }_{\nu },\nu )\), where \(\hat{\sigma }_{\nu }\) satisfies

$$\begin{aligned} \gamma _i=\sum _{j=i}^{n-1}\nu ^j \frac{\hat{\sigma }_{\nu }[0,\xi _j]}{\mu [0,\xi _j]}+\sum _{j=1}^{i-1}\nu ^j\frac{\hat{\sigma }_{\nu }(\xi _j,\infty )}{\mu (\xi _j,\infty )} \end{aligned}$$
(26)

for \(1\le i\le n\). Note that condition \(\hat{\sigma }_\nu \in \mathcal {S}_0(\mathcal {W})\) implies \(\hat{\sigma }_\nu [0,\infty )=\hat{\sigma }_\nu [0,\xi _i]+\hat{\sigma }_\nu (\xi _i,\infty )=0\) for \(1\le i\le n-1\). By the score equation (26),

$$\begin{aligned} \gamma _{i+1}-\gamma _i =-\nu ^i\frac{ \hat{\sigma }_\nu [0,\xi _i] }{ \mu [0,\xi _i] } +\nu ^i\frac{ \hat{\sigma }_\nu (\xi _i,\infty ) }{ \mu (\xi _i,\infty ) } =-\nu ^i\frac{ \hat{\sigma }_\nu [0,\xi _i] }{ \mu [0,\xi _i] \mu (\xi _i,\infty ) }, \end{aligned}$$

which implies

$$\begin{aligned} \hat{\sigma }_{\nu }[0,\xi _j]= -\frac{ \gamma _{j+1}-\gamma _j }{\nu ^j}\mu [0,\xi _j]\mu (\xi _j,\infty ) \end{aligned}$$

for \(1\le j\le n-1\) and \(\hat{\sigma }_{\nu }[0,\xi _n]=0\). Thus, we obtain the following expression:

$$\begin{aligned} \frac{\textrm{d}(\partial \kappa _\nu )_{\rho (\mu ,\nu )}}{\textrm{d}\rho (\mu ,\nu )}(\xi _j,y) = -\frac{ \gamma _{j+1}-\gamma _j }{\nu ^j}\Bigl ( y\mu (\xi _j,\infty )-(1-y)\mu [0,\xi _j]\Bigr ) \end{aligned}$$

and

$$\begin{aligned} \lambda (\theta (\mu )\mid \nu )= \sum _{j=1}^{n-1}\frac{\left( \gamma _{j+1}-\gamma _j \right) ^2}{\nu ^j}\mu [0,\xi _j]\mu (\xi _j,\infty ). \end{aligned}$$
(27)

The conditional expectation

$$\begin{aligned} E_\mu \left[ \left. \left( \frac{\textrm{d} (\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{\textrm{d}\rho (\mu ,\nu )}(x,y)\right) ^2\right| \ x=\xi _j\ \right] =\left( \frac{ \gamma _{j+1}-\gamma _j }{\nu ^j}\right) ^2\mu [0,\xi _j]\mu (\xi _j,\infty ) \end{aligned}$$

becomes independent of \(\xi _j\) if and only if

$$\begin{aligned} {\nu }^j = \frac{\left| \gamma _{j+1}-\gamma _j \right| \sqrt{ \mu [0,\xi _j]\mu (\xi _j,\infty ) }}{\sum _{h=1}^{n-1}\left| \gamma _{h+1}-\gamma _h \right| \sqrt{ \mu [0,\xi _h]\mu (\xi _h,\infty ) }} \end{aligned}$$
(28)

for \(j=1,\ldots ,n-1\). In particular, when \(\theta (\mu )=\int f\textrm{d}\mu \) with \(f\in \mathcal {F(\mathcal {W})}\), the optimal design for \(\theta (\mu )\) is obtained by the following expression:

$$\begin{aligned} {\nu }^j = \frac{\left| f(\xi _{j+1})-f(\xi _j) \right| \sqrt{ \mu [0,\xi _j]\mu (\xi _j,\infty ) }}{\sum _{h=1}^{n-1}\left| f(\xi _{h+1})-f(\xi _h) \right| \sqrt{ \mu [0,\xi _h]\mu (\xi _h,\infty ) }} \end{aligned}$$
(29)

because

$$\begin{aligned} \frac{\partial \theta }{\partial \mu ^j}(\mu )= \frac{\partial }{\partial \mu ^j}\left[ \sum _{i=1}^{n-1}\mu ^i f(\xi _i)+\left( 1-\sum _{i=1}^{n-1}\mu ^i \right) f(\xi _n)\right] =f(\xi _j)-f(\xi _n). \end{aligned}$$

Equation (29) is equivalent to equation (8) of Duffield and Patterson [11].
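
A minimal computational sketch (Python) of the optimal design (29) and the corresponding bound (27) for the mean functional is given below; the support and the assumed \(\mu \) are illustrative, and the bound under the optimal design is compared with that under the uniform design.

```python
import numpy as np

# Illustrative WTP support xi, assumed distribution mu, and target theta(mu) = E_mu[omega]
xi = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
mu = np.array([0.10, 0.25, 0.30, 0.25, 0.10])
f = xi                                    # theta(mu) = int f d mu with f(omega) = omega

cdf = np.cumsum(mu)[:-1]                  # mu[0, xi_j] at the n-1 bid points
surv = 1.0 - cdf                          # mu(xi_j, infty)
diff = np.abs(np.diff(f))                 # |f(xi_{j+1}) - f(xi_j)| = |gamma_{j+1} - gamma_j|

# Optimal design of Eq. (29)
w = diff * np.sqrt(cdf * surv)
nu_opt = w / w.sum()

# Cramer-Rao lower bound of Eq. (27) under a given design nu
def cr_bound(nu):
    return float(np.sum(diff ** 2 * cdf * surv / nu))

nu_unif = np.full(len(xi) - 1, 1.0 / (len(xi) - 1))
print("optimal design:", nu_opt)
print("bound, optimal vs uniform:", cr_bound(nu_opt), cr_bound(nu_unif))
```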

However, this design is not feasible because it depends on the unknown \(\mu \). A feasible alternative is the min–max design, defined as follows:

$$\begin{aligned} \nu _{\text {min--max}}:=\text {Arg}\min _{\nu \in \mathcal {P}_+(\mathcal {X})} \left[ \sup _{\mu \in \mathcal {P}_+(\mathcal {W})}\lambda (\theta (\mu )|\nu ) \right] . \end{aligned}$$

In the binary experiment, the maximal risk to estimate \(\theta (\mu )=\int f\textrm{d}\mu \) is equal to the following:

$$\begin{aligned} \sup _{\mu \in \mathcal {P}_+(\mathcal {W})}\lambda (\theta (\mu )\mid \nu )=\sum _{j=1}^{n-1}\frac{\left( f(\xi _{j+1})-f(\xi _j) \right) ^2}{4\nu ^j}, \end{aligned}$$

where the supremum is obtained by \(\mu =\delta _1/2+\delta _n/2\). The risk is minimized by the following expression:

$$\begin{aligned} \nu _{\text {min--max}}^j=\frac{ \bigl | f(\xi _{j+1})-f(\xi _j) \bigr |}{\sum _{h=1}^{n-1} \bigl | f(\xi _{h+1})-f(\xi _h) \bigr |}. \end{aligned}$$
(30)

In particular, when \(\mathcal {W}\) is equally spaced so that \(\xi _2-\xi _1=\cdots =\xi _n-\xi _{n-1}\), the min–max design for estimating \(E_\mu \omega \) is the uniform distribution on \(\mathcal {X}\). This provides a theoretical justification for the uniform design in the binary response experiment when the target is the mean.
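
A short sketch (Python) of the min–max design (30), with an illustrative grid of bids, is as follows.

```python
import numpy as np

# Min-max design of Eq. (30) for theta(mu) = int f d mu on an illustrative grid of bids
xi = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

def minmax_design(f_values):
    diff = np.abs(np.diff(f_values))
    return diff / diff.sum()

print(minmax_design(xi))        # f(xi) = xi on an equally spaced grid: uniform design
print(minmax_design(xi ** 2))   # a nonlinear target puts more mass on steeper intervals
```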

5 Conclusions

In this study, the optimal design problem of the CVM experiment was examined from the perspective of information geometry. The problem is formulated as the minimization, over a statistical manifold of finite probability measures, of the Cramér–Rao lower bound, which is equal to the squared Fisher norm of the gradient vector of the parameter functional to be estimated. The problem is solved by using the duality of the e- and m-connections on the manifold. The necessary and sufficient condition for the minimization is stated as the orthogonality between the gradient and its e-covariant derivatives. The result is applied to a classical binary experiment to confirm that it replicates the results obtained in Ref. [11].

In this study, finite probability measures were considered to avoid the technical difficulties of infinite-dimensional spaces. To enhance the applicability of the results of this paper, generalizing the model to an infinite-dimensional manifold is critical. Finding further application examples is also important. In the “double-bounded” CVM, for example, each respondent is posed a second question depending on the response to the first: if the first offer is accepted, the second bid is set higher than the first; if the first offer is rejected, the second bid is set lower. Therefore, the response function is provided by the following expression:

$$\begin{aligned} (y,y')=\left( \mathbb {I}\{ \omega \le x \}, \mathbb {I}\{ \omega \le x' \} \right) \in \{0,1\}\times \{0,1\}, \end{aligned}$$

where x is the first bid and \(x'\) is the second bid. The statistical efficiency of the double-bounded CVM is considerably higher than that of the conventional single-bounded CVM [14]. Asymptotic properties of the nonparametric estimation of the model were extensively studied by Groeneboom and Jongbloed [12]. In the future, the optimal distribution of the sequential bidding prices \((x,x')\) could be determined by applying the results of this study.