1 Introduction

The contingent valuation method (CVM) using discrete response valuation questions is a widely used experimental method for measuring the monetary value of nonmarket environmental goods. In the experiment, an agent is asked whether she would buy a certain good at price x. She accepts the offered price if her willingness-to-pay (WTP) \(\omega \) for the good is higher than x. Let \(y = 0\) if x is accepted and \(y = 1\) if it is rejected, that is,

$$\begin{aligned} y=\mathbb {I}\{ \omega \le x \}. \end{aligned}$$
(1)

Let \(\mu \) be the distribution of \(\omega \). The objective of the experiment is to estimate the value of \(\theta (\mu )\), where \(\theta \) is a given functional of \(\mu \). For example, if the mean WTP is to be determined, \(\theta (\mu )=\int \omega \, d\mu (\omega )\) should be estimated. By observing independent copies of (x, y) obtained from (1), the value of \(\theta (\mu )\) can be consistently estimated by standard statistical techniques such as probit, logit, or nonparametric maximum likelihood [18].
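
As a concrete illustration of this data-generating process, the following minimal simulation sketch (Python) draws WTP values from an assumed discrete distribution \(\mu \), offers bids drawn from an assumed design \(\nu \), records the responses of (1), and forms a crude plug-in estimate of the mean WTP; the support points, probabilities, and sample size are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) discrete WTP support and true distribution mu
support = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
mu = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Bids drawn from an assumed design nu over the lower support points
bids = support[:-1]
nu = np.full(len(bids), 1.0 / len(bids))

S = 5000                                    # sample size
omega = rng.choice(support, size=S, p=mu)   # latent WTP, never observed directly
x = rng.choice(bids, size=S, p=nu)          # offered prices
y = (omega <= x).astype(int)                # response of Eq. (1): y = 1 means "reject"

# Empirical estimate of the WTP cdf mu[0, x_j] at each bid
cdf_hat = np.array([y[x == b].mean() for b in bids])

# Crude plug-in estimate of the mean WTP (assumes the estimated cdf is monotone)
p_hat = np.diff(np.concatenate(([0.0], cdf_hat, [1.0])))
print("estimated cdf at bids:", cdf_hat)
print("estimated mean WTP   :", float(np.dot(support, p_hat)))
```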

Survey design has been a major issue since the CVM was introduced by Bishop and Heberlein [6] and Hanemann [13]. For a survey question, the statistician must choose the bidding price distribution \(\nu \), from which x is randomly sampled. WTP estimates derived using the CVM are sensitive to the choice of \(\nu \) [9, 11, 17]. Optimal designs for \(\nu \) ease this sensitivity problem by minimizing the variance of the estimates. Cooper [10] proposed an optimal design using a logit formulation of \(\mu \), and Alberini [1] studied the design for the probit model. Duffield and Patterson [11] considered the optimal design for nonparametric \(\mu \). Kanninen [16] generalized the results to the multinomial logit model, in which y takes multiple discrete values. For comprehensive surveys of the literature, see [7, 8, 15].

This study investigates the optimal design problem from the perspective of information geometry. We generalize the nonparametric approach of Duffield and Patterson [11] by considering a general response \(y=\rho (\omega , x)\) and an unspecified target \(\theta (\mu )\). Under these general settings, the optimal design problem is formulated as the minimization of the Cramér–Rao lower bound of \(\theta (\mu )\) over a set of bidding price distributions \(\nu \). Because this is the optimization of a function over a finite-dimensional space, the problem could be solved by general-purpose optimization techniques; however, the computation would be messy and the solution less intuitive. Instead, we formulate the problem using information geometry. Because the Cramér–Rao lower bound is equal to the squared Fisher norm of a tangent vector field on the statistical manifold, the necessary and sufficient condition for the optimal design is concisely stated through dual connections [2,3,4].

The remainder of this paper is organized as follows: Sect. 2 introduces the geometry of finite measures. Section 3 presents the results of this study, including a necessary and sufficient condition for the optimal design. According to this condition, a design is optimal if and only if the gradient vector field it generates is orthogonal to its own e-covariant derivatives. In Sect. 4, the results are applied to the binary response experiment presented in (1). Section 5 concludes the paper.

2 Geometry of finite measures

In this section, the geometry of finite measures is introduced. The terms and definitions are based on Chapter 2 of Ay et al. [4]. Let \(\mathcal {I}=\{1,\ldots , n\}\) be an arbitrary finite set. The linear space of functions \(f: \mathcal {I}\rightarrow \mathbb {R}\) is denoted by \(\mathcal {F}(\mathcal {I})\). The space has the canonical basis \(\{e^i\in \mathcal {F}(\mathcal {I}):i\in \mathcal {I}\}\), where

$$\begin{aligned} e^i(j)= \left\{ \begin{array}{cc} 1 &{}\quad (j=i) \\ 0 &{}\quad (j\ne i). \end{array}\right. \end{aligned}$$

Each \(f\in \mathcal {F}(\mathcal {I})\) is expressed as follows: \(f=\sum _{i=1}^n f_i e^i\).

The dual space \(\mathcal {S}(\mathcal {I}):=\mathcal {F}^*(\mathcal {I})\) is the set of signed measures \(\mu :\mathcal {F}(\mathcal {I})\rightarrow \mathbb {R}\). The dual basis \(\{\delta _1,\ldots ,\delta _n \}\) is defined as follows:

$$\begin{aligned} \delta _i(e^j)= \left\{ \begin{array}{cc} 1 &{} \quad (i=j) \\ 0 &{}\quad (i\ne j). \end{array}\right. \end{aligned}$$

Each \(\mu \in \mathcal {S}(\mathcal {I})\) is expressed as \(\mu =\sum _{i=1}^n \mu ^i \delta _i\) with coefficients \(\{\mu ^1,\ldots ,\mu ^n\}\), such that

$$\begin{aligned} \mu (f):=\int _{\mathcal {I}} f\, d\mu :=\sum _{i=1}^n\mu ^if_i\in \mathbb {R} \end{aligned}$$

and

$$\begin{aligned} f\cdot \mu :=\sum _{i=1}^n f_i\mu ^i \delta _i\in \mathcal {S}(\mathcal {I}). \end{aligned}$$

On \(\mathcal {S}(\mathcal {I})\), we introduce a coordinate system by \(\mu \mapsto (\mu ^1,\ldots ,\mu ^n)\). Given a point \(\mu \in \mathcal {S}(\mathcal {I})\), the tangent space of \(\mathcal {S}(\mathcal {I})\) is \(T_\mu \mathcal {S}(\mathcal {I})=\text {Span}\left\{ \frac{\partial }{\partial \mu ^1},\ldots ,\frac{\partial }{\partial \mu ^n} \right\} \). The m-representation of a tangent vector \(a\in T_\mu \mathcal {S}(\mathcal {I})\) is

$$\begin{aligned} a(\mu )=\sum _{i=1}^n a^i \delta _i \in \mathcal {S}(\mathcal {I}), \end{aligned}$$

which allows us to identify \(T_\mu \mathcal {S}(\mathcal {I})\) with \(\mathcal {S}(\mathcal {I})\). In what follows, tangent vectors and tangent spaces are always given in terms of their m-representations.

Let \(\mathcal {M}_+(\mathcal {I})=\left\{ \mu \in \mathcal {S}(\mathcal {I}) \,:\, \mu ^i>0,\ i\in \mathcal {I} \right\} \). Because \(\mathcal {M}_+(\mathcal {I})\) is an open submanifold of \(\mathcal {S}(\mathcal {I})\), its tangent space is identified with \(\mathcal {S}(\mathcal {I})\). Given two tangent vectors a and b in \(T_\mu \mathcal {M}_+(\mathcal {I})\), their Radon–Nikodym derivatives with respect to \(\mu \) are denoted by

$$\begin{aligned} \frac{\textrm{d}a}{\textrm{d}\mu }=\sum _{i=1}^n\frac{a^i}{\mu ^i}e^i\quad \text {and}\quad \frac{\textrm{d}b}{\textrm{d}\mu }=\sum _{i=1}^n\frac{b^i}{\mu ^i}e^i. \end{aligned}$$

The Fisher metric on \(T_\mu \mathcal {M}_+(\mathcal {I})\) is now introduced by

$$\begin{aligned} \mathfrak {g}_\mu (a,b):=\mu \left( \frac{\textrm{d}a}{\textrm{d}\mu }\cdot \frac{\textrm{d}b}{\textrm{d}\mu } \right) =\sum _{i=1}^n \frac{a^i b^i}{\mu ^i}, \end{aligned}$$
(2)

and the Fisher norm is \(\Vert a\Vert _\mu :=\sqrt{ \mathfrak {g}_\mu (a,a) }\).
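
In coordinates, (2) is simply a weighted inner product. A minimal numerical sketch (Python, with arbitrary illustrative numbers) of the Fisher metric and norm is as follows.

```python
import numpy as np

def fisher_metric(a, b, mu):
    """Fisher metric of Eq. (2): g_mu(a, b) = sum_i a^i b^i / mu^i,
    with tangent vectors given by their m-representations."""
    return float(np.sum(a * b / mu))

def fisher_norm(a, mu):
    return np.sqrt(fisher_metric(a, a, mu))

# Arbitrary illustrative point of M_+({1,...,4}) and two tangent vectors in S_0
mu = np.array([0.1, 0.2, 0.3, 0.4])
a = np.array([0.05, -0.05, 0.10, -0.10])
b = np.array([0.02, 0.03, -0.01, -0.04])
print(fisher_metric(a, b, mu), fisher_norm(a, mu))
```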

Let \(\mathcal {P}_+(\mathcal {I}):=\left\{ \mu \in \mathcal {M}_+(\mathcal {I}) \,:\, \sum _{i=1}^n\mu ^i=1 \right\} \), which is the set of positive probability measures on \(\mathcal {I}\). The tangent space \(T_\mu \mathcal {P}_+(\mathcal {I})\) is identified with

$$\begin{aligned} \mathcal {S}_0(\mathcal {I}):=\left\{ \mu \in \mathcal {S}(\mathcal {I}) \,:\, \sum _{i=1}^n\mu ^i=0 \right\} . \end{aligned}$$

Let \(\theta :\mathcal {P}_+(\mathcal {I})\rightarrow \mathbb {R}\) be a smooth functional. The differential of \(\theta \) in \(\mu \) is a linear form \((d \theta )_\mu :T_\mu \mathcal {P}_+(\mathcal {I})\rightarrow \mathbb {R}\) that is obtained by

$$\begin{aligned} (d \theta )_\mu a:=\frac{\partial \theta }{\partial a}(\mu ):=\lim _{t\rightarrow 0}\frac{\theta (\mu +ta)-\theta (\mu )}{t}. \end{aligned}$$

The Fisher metric allows the differential to be identified with the gradient \((\partial \theta )_\mu \):

$$\begin{aligned} (d \theta )_\mu a\equiv \mathfrak {g}_\mu (a,(\partial \theta )_\mu ),\quad a\in T_\mu \mathcal {P}_+(\mathcal {I}). \end{aligned}$$
(3)

The gradient vector field of \(\theta \) is as follows:

$$\begin{aligned} \partial \theta : \mathcal {P}_+(\mathcal {I})\rightarrow T\mathcal {P}_+(\mathcal {I}), \ \mu \mapsto (\partial \theta )_\mu . \end{aligned}$$

Given two points \(\mu \) and \(\mu '\) in \(\mathcal {P}_+(\mathcal {I})\), the m-parallel transport is determined by the following expression:

$$\begin{aligned} \varPi ^{(m)}_{\mu ,\mu '}:T_\mu \mathcal {P}_+(\mathcal {I})= \mathcal {S}_0(\mathcal {I})\rightarrow T_{\mu '}\mathcal {P}_+(\mathcal {I})= \mathcal {S}_0(\mathcal {I}),\quad a\mapsto a. \end{aligned}$$

The e-parallel transport \(\varPi _{\mu ,\mu '}^{(e)}:T_\mu \mathcal {P}_+(\mathcal {I})\rightarrow T_{\mu '}\mathcal {P}_+(\mathcal {I})\) is the conjugate of the m-transport and satisfies

$$\begin{aligned} \mathfrak {g}_{\mu '}\left( \varPi ^{(e)}_{\mu ,\mu '}a, \varPi ^{(m)}_{\mu ,\mu '}b \right) \equiv \mathfrak {g}_{\mu }\left( a, b \right) . \end{aligned}$$

For two smooth vector fields \(A:\mu \mapsto a_\mu \) and \(B:\mu \mapsto b_\mu \) on \(\mathcal {P}_+(\mathcal {I})\), the m-connection \(\nabla ^{(m)}\) and e-connection \(\nabla ^{(e)}\) are defined by the following expression:

$$\begin{aligned} \left. \nabla ^{(m)}_A B \right| _\mu := \lim _{t\rightarrow 0}\frac{ \varPi ^{(m)}_{\mu +ta_\mu ,\mu }b_{\mu +ta_\mu }-b_\mu }{t}= \frac{\partial b}{\partial a_\mu }(\mu ) \end{aligned}$$
(4)

and

$$\begin{aligned} \left. \nabla ^{(e)}_A B \right| _\mu:= & {} \lim _{t\rightarrow 0}\frac{ \varPi ^{(e)}_{\mu +ta_\mu ,\mu }b_{\mu +ta_\mu }-b_\mu }{t}\nonumber \\= & {} \frac{\partial b}{\partial a_\mu }(\mu ) -\left( \frac{\textrm{d}a_\mu }{\textrm{d}\mu }\cdot \frac{\textrm{d}b_\mu }{\textrm{d}\mu }-\mathfrak {g}_\mu (a_\mu ,b_\mu ) \right) \mu . \end{aligned}$$
(5)

See Appendix A.1 for the proof. According to the definitions,

$$\begin{aligned} \frac{\partial }{\partial c_\mu } (\mathfrak {g}(A,B))_\mu =\mathfrak {g}_\mu \Bigl (\left. \nabla ^{(m)}_C A \right| _\mu ,B_\mu \Bigr ) +\mathfrak {g}_\mu \Bigl ( A_\mu , \left. \nabla ^{(e)}_C B \right| _\mu \Bigr ) \end{aligned}$$
(6)

holds for three arbitrary vector fields A, B, and C, where \(\mathfrak {g}(A,B)\) denotes a function \(\mu \mapsto \mathfrak {g}_\mu (a_\mu ,b_\mu )\).
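
The duality relation (6) can also be checked numerically. The sketch below (Python) assumes vector fields A and B whose m-representations are constant in \(\mu \), so that the m-covariant derivative (4) vanishes and the e-covariant derivative (5) reduces to its correction term; the left-hand side of (6) is approximated by a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(a, b, mu):                      # Fisher metric (2)
    return float(np.sum(a * b / mu))

# A point of P_+({1,...,5}) and three tangent vectors in S_0 (random, illustrative)
mu = rng.dirichlet(np.ones(5))
a, b, c = (v - v.mean() for v in rng.normal(size=(3, 5)))

# For vector fields with constant m-representation, the m-covariant derivative (4)
# vanishes and the e-covariant derivative (5) reduces to the correction term below.
nabla_e_B = -((c / mu) * (b / mu) - g(c, b, mu)) * mu

# Left-hand side of (6): directional derivative of mu -> g_mu(a, b) along c
t = 1e-6
lhs = (g(a, b, mu + t * c) - g(a, b, mu - t * c)) / (2 * t)

# Right-hand side of (6): g(nabla^(m)_C A, B) + g(A, nabla^(e)_C B); the first term is 0
rhs = g(a, nabla_e_B, mu)
print(lhs, rhs)                       # agree up to finite-difference error
```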

3 Main results

3.1 Model

Suppose that a mapping \(\rho :\mathcal {W}\times \mathcal {X}\rightarrow \mathcal {Y}\) is provided, where \(\mathcal {W}=\{\omega _1,\ldots ,\omega _n \}\), \(\mathcal {X}=\{x_1,\ldots ,x_m \}\), and \(\mathcal {Y}=\{y_1,\ldots ,y_\ell \}\) are arbitrary finite sets. Let \(\sigma \in \mathcal {S}(\mathcal {W})\), \(\tau \in \mathcal {S}(\mathcal {X}\times \mathcal {Y})\), and \(f\in \mathcal {F}(\mathcal {W}\times \mathcal {X}\times \mathcal {Y})\). In the following, \(\sigma (f)\) and \(\tau (f)\) denote the functions respectively determined by

$$\begin{aligned} \sigma (f): \mathcal {X}\times \mathcal {Y}\rightarrow \mathbb {R},\ (x,y)\mapsto \int f(\omega ,x,y)\,\textrm{d}\sigma (\omega )=\sum _{i=1}^n f(\omega _i,x,y)\sigma ^i \end{aligned}$$

and

$$\begin{aligned} \tau (f): \mathcal {W}\rightarrow \mathbb {R},\ \omega \mapsto \int f(\omega ,x,y)\,\textrm{d}\tau (x,y)=\sum _{j=1}^m\sum _{k=1}^\ell f(\omega ,x_j,y_k)\tau ^{j,k}. \end{aligned}$$

Specifically, for a function \(\mathbb {I}_\rho :\mathcal {W}\times \mathcal {X}\times \mathcal {Y}\rightarrow \{0,1\}\) such that \(\mathbb {I}_\rho (\omega ,x,y)=\mathbb {I}\{ \rho (\omega ,x)=y\}\), \(\mu (\mathbb {I}_\rho )(x,y)=\sum _{i=1}^n\mathbb {I}\{\rho (\omega _i,x )=y \}\mu ^i\) provides the conditional distribution of \(y=\rho (\omega ,x)\) conditioned on x when \(\omega \) is distributed according to \(\mu \in \mathcal {P}_+(\mathcal {W})\). This is because

$$\begin{aligned} \textbf{P}\{ y=y_k|x=x_j\}= & {} E_\mu \left[ \mathbb {I}\{ y=y_k\} \mid x=x_j\right] \\= & {} E_\mu \left[ \mathbb {I}\{ \rho (\omega ,x_j)=y_k\} \right] \\= & {} \sum _{i=1}^n \mathbb {I}\{ \rho (\omega _i,x_j)=y_k\} \mu ^i. \end{aligned}$$

When \(\omega \) is distributed according to \(\mu \in \mathcal {P}_+(\mathcal {W})\) and x is independently sampled from \(\nu \in \mathcal {P}_+(\mathcal {X})\), we denote the joint distribution of x and y as

$$\begin{aligned} \rho (\mu ,\nu ):=\mu (\mathbb {I}_\rho )\cdot \nu , \end{aligned}$$
(7)

so that \(\rho (\mu ,\nu )(x_j,y_k)=\sum _{i=1}^n\mathbb {I}\{ \rho (\omega _i,x_j)=y_k \}\mu ^i\nu ^j\), in which \(\rho \) is considered as a mapping \(\mathcal {P}_+(\mathcal {W})\times \mathcal {P}_+(\mathcal {X})\rightarrow \mathcal {P}_+(\mathcal {X}\times \mathcal {Y})\). For simplicity of subsequent description, let

$$\begin{aligned} \rho _\nu (\cdot )=\rho (\cdot ,\nu ) \end{aligned}$$

and

$$\begin{aligned} \rho _\mu (\cdot )=\rho (\mu ,\cdot ). \end{aligned}$$

Given \(f\in \mathcal {F}(\mathcal {X}\times \mathcal {Y})\), the expectations and conditional expectations of f(x, y) are computed as follows:

$$\begin{aligned} E_{\mu , \nu }[f(x,y)]= & {} \rho (\mu ,\nu )( f)\\= & {} \sum _{j=1}^m\sum _{k=1}^\ell f(x_j,y_k) \sum _{i=1}^n\mathbb {I}\{\rho (\omega _i,x_j)=y_k\}\mu ^i\nu ^j\\= & {} \sum _{i=1}^n\sum _{j=1}^m f(x_j,\rho (\omega _i,x_j)) \mu ^i\nu ^j, \\ E_\mu [f(x,y)|x]= & {} \sum _{i=1}^n f(x,\rho (\omega _i,x)) \mu ^i=\mu ( f(x,\rho (\cdot ,x)) ), \end{aligned}$$

and

$$\begin{aligned} E_\nu [f(x,y)|\omega ]=\sum _{j=1}^m f(x_j,\rho (\omega ,x_j)) \nu ^j=\nu ( f(\cdot , \rho (\omega ,\cdot )) ). \end{aligned}$$

For \(g\in \mathcal {F}(\mathcal {W})\), the conditional expectation of \(g(\omega )\) is expressed as follows:

$$\begin{aligned} E_\mu [g(\omega )|x,y]=\frac{ \sum _{i=1}^n g(\omega _i) \mathbb {I}\{\rho (\omega _i,x)=y\}\mu ^i }{ \sum _{i=1}^n \mathbb {I}\{\rho (\omega _i,x)=y\}\mu ^i } =\frac{\mu (g\cdot \mathbb {I}_\rho )(x,y)}{\mu (\mathbb {I}_\rho )(x,y)}. \end{aligned}$$
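
The constructions above are straightforward to compute. The following sketch (Python) builds the indicator \(\mathbb {I}_\rho \), the joint distribution \(\rho (\mu ,\nu )\) of (7), and the conditional expectation \(E_\mu [g(\omega )|x,y]\) for the binary response of (1) on an illustrative support; all numerical values are assumptions for illustration.

```python
import numpy as np

# Illustrative finite sets; rho is the binary response of Eq. (1): y = 1{omega <= x}
W = np.array([1.0, 2.0, 3.0, 4.0])          # support of omega
X = np.array([1.0, 2.0, 3.0])               # bidding prices
Y = np.array([0, 1])
I_rho = np.array([[[1.0 if int(w <= x) == y else 0.0 for y in Y]
                   for x in X] for w in W])  # I_rho[i, j, k] = 1{rho(omega_i, x_j) = y_k}

mu = np.array([0.1, 0.4, 0.3, 0.2])          # assumed distribution of omega
nu = np.array([0.3, 0.4, 0.3])               # assumed design

mu_I = np.einsum('i,ijk->jk', mu, I_rho)     # conditional distribution mu(I_rho)(x, y)
joint = mu_I * nu[:, None]                   # joint distribution rho(mu, nu) of Eq. (7)
print(joint.sum())                           # sums to 1

# Conditional expectation E_mu[g(omega) | x, y] for g(omega) = omega^2
gfun = W ** 2
cond = np.einsum('i,i,ijk->jk', gfun, mu, I_rho) / mu_I
print(cond)
```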

Let \(\mathcal {E}\) be an experiment introduced by \(\rho \) on \(\mathcal {X}\times \mathcal {Y}\):

$$\begin{aligned} \mathcal {E}:=R(\rho ):=\left\{ \rho (\mu ,\nu )\,:\, \mu \in \mathcal {P}_+(\mathcal {W}),\nu \in \mathcal {P}_+(\mathcal {X})\right\} , \end{aligned}$$
(8)

where \(R(\cdot )\) denotes the range of the given mapping. Two subsets, \(\mathcal {E}_\nu \) and \(\mathcal {E}_\mu \), are also provided by \(\mathcal {E}_\nu :=R(\rho _\nu )\) and \(\mathcal {E}_\mu :=R(\rho _\mu )\). In the following, we assume that

  1. (A1)

    \(\rho (\mu ,\nu )>0\) for every \(\mu \in \mathcal {P}_+(\mathcal {W})\) and \(\nu \in \mathcal {P}_+(\mathcal {X})\), and that

  2. (A2)

    \(\rho (\delta _1,\nu ), \ldots , \rho (\delta _n,\nu )\) are linearly independent, where \(\{\delta _1,\ldots ,\delta _n\}\) is the basis of \(\mathcal {S}(\mathcal {W})\).

Under (A1), \(\mathcal {E}\) becomes a submanifold of \(\mathcal {P}_+(\mathcal {X}\times \mathcal {Y})\), and \(\mathcal {E}_\nu \) and \(\mathcal {E}_\mu \) are submanifolds of \(\mathcal {E}\). Moreover, the following result is obtained:

Proposition 1

Assume (A1) and (A2). Let \(\theta :\mathcal {P}_+(\mathcal {W}) \rightarrow \mathbb {R}\) be an arbitrary mapping. Given \(\nu \), there exists a mapping \(\kappa _\nu : \mathcal {E}_\nu \rightarrow \mathbb {R}\) such that

$$\begin{aligned} \kappa _\nu ( \rho _\nu (\mu ) ) \equiv \theta (\mu ). \end{aligned}$$
(9)

Proof

For arbitrary \(\mu _1\) and \(\mu _2\),

$$\begin{aligned} \rho _\nu (\mu _1)-\rho _\nu (\mu _2)=\mu _1(\mathbb {I}_\rho )\cdot \nu -\mu _2(\mathbb {I}_\rho )\cdot \nu =\sum _{i=1}^{n-1}(\mu _1^i-\mu _2^i)(\delta _i(\mathbb {I}_\rho )-\delta _n(\mathbb {I}_\rho ))\cdot \nu , \end{aligned}$$

which is 0 if and only if \(\mu _1=\mu _2\). Therefore, under (A1) and (A2), \(\rho _\nu \) is one-to-one. Setting \(\kappa _\nu := \theta \circ (\rho _\nu )^{-1}\) proves the proposition. \(\square \)

The proposition reveals that (A1) and (A2) are sufficient conditions for the statistical identification of \(\theta (\mu )\). In the experiment, independent realizations of (x, y) are observed, from which \(\rho _\nu (\mu )\) is estimated. The one-to-one correspondence between \(\rho _\nu (\mu )\) and \(\theta (\mu )\) implies that the value of \(\theta (\mu )\) can be statistically estimated from the observations.

Because \(\rho _\nu \) and \(\rho _\mu \) are linear mappings, their differentials are obtained by \((d\rho _\nu )_\mu :\sigma \mapsto \rho _\nu (\sigma )=\rho (\sigma ,\nu )\) and \((d\rho _\mu )_\nu : \eta \mapsto \rho _\mu (\eta )=\rho (\mu ,\eta )\). Furthermore, as \(\rho (\mu ,\nu )\) is bilinear in \((\mu ,\nu )\), its differential at \((\mu ,\nu )\) is obtained by the following equation:

$$\begin{aligned} (d\rho )_{\mu ,\nu }=(d\rho _\nu )_\mu +(d\rho _\mu )_\nu ,\quad (\sigma ,\eta )\mapsto \rho (\sigma ,\nu )+\rho (\mu ,\eta ). \end{aligned}$$
(10)

The tangent spaces of \(\mathcal {E}_\nu \) and \(\mathcal {E}_\mu \) at \(\rho (\mu ,\nu )\) are \(T_{\rho (\mu ,\nu )}\mathcal {E}_\nu = R((d\rho _\nu )_\mu )\) and \(T_{\rho (\mu ,\nu )}\mathcal {E}_\mu = R((d\rho _\mu )_\nu )\), which are orthogonal to one another because of the following:

$$\begin{aligned}{} & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (d\rho _\nu )_\mu \sigma ,\, (d\rho _\mu )_\nu \eta \right) \\{} & {} \quad =\sum _{j=1}^m\sum _{k=1}^\ell \frac{\left[ \sum _{i=1}^n\mathbb {I}\{ \rho (\omega _i,x_j)=y_k \}\sigma ^i\nu ^j \right] \cdot \left[ \sum _{i=1}^n\mathbb {I}\{ \rho (\omega _i,x_j)=y_k \}\mu ^i\eta ^j \right] }{\sum _{i=1}^n\mathbb {I}\{ \rho (\omega _i,x_j)=y_k \}\mu ^i\nu ^j}\\{} & {} \quad =\sum _{i=1}^n\sum _{j=1}^m\sum _{k=1}^\ell \mathbb {I}\{ \rho (\omega _i,x_j)=y_k \} \sigma ^i \eta ^j\\{} & {} \quad = \sum _{i=1}^n\sigma ^i\sum _{j=1}^m \eta ^j=0 \end{aligned}$$

for every \(\sigma \in T_\mu \mathcal {P}_+(\mathcal {W})= \mathcal {S}_0(\mathcal {W})\) and \(\eta \in T_\nu \mathcal {P}_+(\mathcal {X})= \mathcal {S}_0(\mathcal {X})\). The tangent space of \(\mathcal {E}\) at \(\rho (\mu ,\nu )\) is \(T_{\rho (\mu ,\nu )}\mathcal {E}=T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \oplus T_{\rho (\mu ,\nu )}\mathcal {E}_\mu \).

The adjoint operator \((d\rho _\nu )^*_\mu \) is determined by the following expression:

$$\begin{aligned} (d\rho _\nu )^*_\mu : T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \rightarrow T_\mu \mathcal {P}_+(\mathcal {W}), \quad \tau \mapsto \tau \left( \frac{ \mathbb {I}_\rho }{ \mu (\mathbb {I}_\rho ) }\right) \cdot \mu , \end{aligned}$$
(11)

where \(\tau =\sum _{j=1}^m\sum _{k=1}^\ell \tau ^{j,k}\delta _{j,k}\), \(\{\delta _{j,k}\}\) is the basis of \(\mathcal {S}(\mathcal {X}\times \mathcal {Y})\), and

$$\begin{aligned} \left[ \tau \left( \frac{ \mathbb {I}_\rho }{ \mu (\mathbb {I}_\rho ) }\right) \cdot \mu \right] (\omega _i)= \sum _{j=1}^m\sum _{k=1}^\ell \tau ^{j,k}\frac{ \mathbb {I}\{\rho (\omega _i,x_j)=y_k\} }{ \sum _{h=1}^n\mathbb {I}\{\rho (\omega _h,x_j)=y_k\}\mu ^h }\cdot \mu ^i. \end{aligned}$$
(12)

The operator is the adjoint of \((d\rho _\nu )_\mu \) because

$$\begin{aligned} \mathfrak {g}_{\mu }( \sigma , (d\rho _\nu )_\mu ^*\tau )= & {} \sum _{i=1}^n\frac{ \sigma ^i \cdot \tau \left( \frac{ \mathbb {I}_\rho }{ \mu (\mathbb {I}_\rho ) }\right) (\omega _i) \mu ^i }{\mu ^i}\\= & {} \sum _{i=1}^n \sigma ^i \sum _{j=1}^m\sum _{k=1}^\ell \tau ^{j,k}\frac{ \mathbb {I}\{\rho (\omega _i,x_j)=y_k\} }{ \sum _{h=1}^n\mathbb {I}\{\rho (\omega _h,x_j)=y_k\}\mu ^h }\\= & {} \sum _{j=1}^m\sum _{k=1}^\ell \tau ^{j,k}\frac{ \sum _{i=1}^n \mathbb {I}\{\rho (\omega _i,x_j)=y_k\}\sigma ^i\nu ^j }{ \sum _{i=1}^n\mathbb {I}\{\rho (\omega _i,x_j)=y_k\}\mu ^i\nu ^j }\\= & {} \sum _{j=1}^m\sum _{k=1}^\ell \frac{((d\rho _\nu )\sigma )(x_j,y_k)\cdot \tau (x_j,y_k)}{ \rho (\mu ,\nu )(x_j,y_k) }\\= & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (d\rho _\nu )_\mu \sigma , \tau \right) . \end{aligned}$$

Note that the definition (12) of \((d\rho _\nu )_\mu ^*\) is independent of \(\nu \).
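
The adjoint identity can also be verified numerically. The sketch below (Python, same illustrative binary-response setup as above) compares \(\mathfrak {g}_{\mu }( \sigma , (d\rho _\nu )_\mu ^*\tau )\) with \(\mathfrak {g}_{\rho (\mu ,\nu )}( (d\rho _\nu )_\mu \sigma , \tau )\) for randomly drawn \(\mu \), \(\nu \), \(\sigma \), and \(\tau \).

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative binary-response setup: y = 1{omega <= x}
W = np.array([1.0, 2.0, 3.0, 4.0]); X = np.array([1.0, 2.0, 3.0]); Y = np.array([0, 1])
I_rho = np.array([[[1.0 if int(w <= x) == y else 0.0 for y in Y]
                   for x in X] for w in W])

mu = rng.dirichlet(np.ones(len(W)))
nu = rng.dirichlet(np.ones(len(X)))
mu_I = np.einsum('i,ijk->jk', mu, I_rho)                  # mu(I_rho)(x, y)
joint = mu_I * nu[:, None]                                # rho(mu, nu)

sigma = rng.normal(size=len(W)); sigma -= sigma.mean()    # tangent vector in S_0(W)
tau = rng.normal(size=joint.shape)                        # arbitrary signed measure on X x Y

d_rho_sigma = np.einsum('i,ijk->jk', sigma, I_rho) * nu[:, None]   # (d rho_nu)_mu sigma
adj_tau = np.einsum('jk,ijk->i', tau / mu_I, I_rho) * mu           # adjoint of Eqs. (11)-(12)

lhs = float(np.sum(sigma * adj_tau / mu))                 # g_mu(sigma, (d rho_nu)^*_mu tau)
rhs = float(np.sum(d_rho_sigma * tau / joint))            # g_{rho(mu,nu)}((d rho_nu)_mu sigma, tau)
print(lhs, rhs)                                           # the two inner products coincide
```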

3.2 Optimal design

Suppose that the goal of the experiment is to estimate the value of \(\theta :\mathcal {P}_+(\mathcal {W})\rightarrow \mathbb {R}\) at a certain point \(\mu \). In the following, we assume that

  1. (A3)

    \((\partial \theta )_\mu \in R((d \rho _\nu )_\mu ^*)\)

for each \((\mu ,\nu )\in \mathcal {P}_+(\mathcal {W})\times \mathcal {P}_+(\mathcal {X})\). This is the differentiability condition in Ref. [22], and regular estimation of \(\theta (\mu )\) is possible only if this condition holds.

Proposition 2

Assume (A1)–(A3). The gradient \((\partial \kappa _\nu )_{\rho _\nu (\mu )}\) of \(\kappa _\nu =\theta \circ (\rho _\nu )^{-1}:\mathcal {E}_\nu \rightarrow \mathbb {R}\) exists and satisfies

$$\begin{aligned} (\partial \theta )_{\mu }=(d\rho _\nu )^*_{\mu } (\partial \kappa _\nu )_{ \rho _\nu (\mu ) }, \quad (\partial \kappa _\nu )_{ \rho _\nu (\mu ) } \in T_{ \rho _\nu (\mu ) }\mathcal {E}_\nu . \end{aligned}$$
(13)

Proof

Because \(\kappa _\nu ( \rho _\nu (\mu ) ) \equiv \theta (\mu )\), the differential of \(\kappa _\nu \) is a linear mapping \(d \kappa _\nu \) such that

$$\begin{aligned} (d \kappa _\nu )_{ \rho _\nu (\mu ) } (d \rho _\nu ) \sigma = (d \theta )_\mu \sigma =\mathfrak {g}_{\mu }\left( \sigma , (\partial \theta )_\mu \right) \end{aligned}$$

for every \(\sigma \in T_\mu \mathcal {P}_+(\mathcal {W})\).

Under (A3), there exists \(\hat{\tau } \in T_{\rho _\nu (\mu )}\mathcal {E}_\nu \) such that \((\partial \theta )_\mu =(d \rho _\nu )_\mu ^*\hat{\tau }\). Therefore,

$$\begin{aligned} (d \kappa _\nu )_{\rho _\nu (\mu )} (d \rho _\nu ) \sigma = \mathfrak {g}_\mu ( \sigma , (d\rho _\nu )_\mu ^* \hat{\tau }) = \mathfrak {g}_{\rho _\nu (\mu )}((d\rho _\nu )_\mu \sigma , \hat{\tau }), \end{aligned}$$

which implies \((d \kappa _\nu )_{\rho _\nu (\mu )}\tau =\mathfrak {g}_{\rho _\nu (\mu )}(\tau , \hat{\tau })\) holds for every \(\tau \in T_{\rho _\nu (\mu )} \mathcal {E}_\nu \). Because such \(\hat{\tau }\) is uniquely determined, \((\partial \kappa _\nu )_{\rho _\nu (\mu )}=\hat{\tau }\) satisfies the requirements of the proposition. \(\square \)

Equation (13) is typically referred to as the score equation. Its solution \(\partial \kappa _\nu \) defines a vector field \(\partial \kappa \) on \(\mathcal {E}\) by

$$\begin{aligned} \partial \kappa : \rho (\mu ,\nu ) \mapsto (\partial \kappa _\nu )_{\rho _\nu (\mu )}. \end{aligned}$$
(14)

The optimal design is defined as a minimizer of the Cramér–Rao lower bound \(\lambda (\theta |\nu )\) for the estimation of \(\theta =\theta (\mu )\). The lower bound can be found by computing the inverse of the Fisher information matrix, which is the variance matrix of the score

$$\begin{aligned} \left( \frac{\partial }{\partial \mu ^1} \log \rho _\nu (\mu )(x,y), \ldots , \frac{\partial }{\partial \mu ^{n-1}} \log \rho _\nu (\mu )(x,y)\right) , \end{aligned}$$

where \(\mu \) is parametrized by \(\mu = \sum _{i=1}^{n-1}\mu ^i\delta _i+\left( 1-\sum _{i=1}^{n-1}\mu ^i\right) \delta _n\). This direct computation involves messy matrix calculations, and the resulting value must then be minimized over \(\nu \) to determine the optimal design.

An expression for the lower bound can be obtained by characterizing it as the supremum of the Cramér–Rao lower bounds of one-dimensional submodels. Let \(\epsilon >0\) be sufficiently small. Consider a smooth path \(t\in (-\epsilon ,\epsilon )\mapsto \mu _t \in \mathcal {P}_+(\mathcal {W})\), which passes through \(\mu \) at \(t=0\) with velocity

$$\begin{aligned} \sigma =\left( \frac{d}{dt}\right) _{t=0} \mu _t \in \mathcal {S}_0(\mathcal {W}). \end{aligned}$$

Notably,

$$\begin{aligned} \left( \frac{d}{dt}\right) _{t=0} \rho _\nu (\mu _t) = \sum _{i=1}^n \mathbb {I}\{\rho (\omega _i,x)=y \}\left[ \left( \frac{d}{dt}\right) _{t=0} \mu ^i_t\right] \nu (x) =(d\rho _\nu )_\mu \sigma . \end{aligned}$$

The Cramér–Rao lower bound of ‘true’ \(t=0\) is the inverse of the Fisher information of the submodel at \(t=0\). Because

$$\begin{aligned} \left( \frac{d}{dt} \right) _{t=0} \log \rho _\nu (\mu _t)=\frac{\textrm{d}( (d\rho _\nu )_\mu \sigma )}{\textrm{d}\rho _\nu (\mu )}, \end{aligned}$$

the Fisher information of the submodel is

$$\begin{aligned} E_{\mu ,\nu }\left( \left( \frac{d}{dt} \right) _{t=0} \log \rho _\nu (\mu _t) \right) ^2 = \mathfrak {g}_{\rho (\mu ,\nu )}\left( (d\rho _\nu )_\mu \sigma ,(d\rho _\nu )_\mu \sigma \right) = \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )}^2. \end{aligned}$$

The lower bound for \(\theta =\theta (\mu )\) along the one-parameter submodel \(t\mapsto \rho _\nu (\mu _t)\) is given by the following expression:

$$\begin{aligned} \lambda (\sigma ):=\left( \frac{\partial \theta }{\partial \sigma }(\mu ) \right) ^2 \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )}^{-2}. \end{aligned}$$

Let \(\hat{t}_S\) be the efficient estimator of \(t=0\) attaining the lower bound, where S denotes the sample size: that is,

$$\begin{aligned} \sqrt{S}\cdot \hat{t}_S\ {\mathop {\rightarrow }\limits ^{d}}\ N(0, \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )}^{-2}). \end{aligned}$$

Given the submodel \(t\mapsto \rho _\nu (\mu _t)\), the efficient estimator of \(\theta (\mu )\) is given by \(\hat{\theta }_S=\theta (\mu _{\hat{t}_S})\). By the delta method,

$$\begin{aligned} \sqrt{S}( \hat{\theta }_S-\theta )\ {\mathop {\rightarrow }\limits ^{d}}\ \frac{\partial \theta }{\partial \sigma }(\mu )\cdot N(0, \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )}^{-2})=N\left( 0, \lambda (\sigma )\right) \end{aligned}$$

holds (see e.g. Theorem 1.12 of Shao [21]). Because \(\frac{\partial \theta }{\partial \sigma }(\mu )=\mathfrak {g}_{\mu }\left( (\partial \theta )_{\mu },\sigma \right) \),

$$\begin{aligned} \lambda (\sigma )= & {} \mathfrak {g}_{\mu }\left( (\partial \theta )_{\mu }, \frac{\sigma }{ \Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho (\mu ,\nu )} } \right) ^2\\= & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (\partial \kappa _\nu )_{\rho (\mu ,\nu )},\frac{(d\rho _\nu )_{\mu }\sigma }{\Vert (d\rho _\nu )_{\mu } \sigma \Vert _{\rho ({\mu },\nu )}} \right) ^2 \end{aligned}$$

by the score equation (13). The Cramér–Rao lower bound for the full model is equal to the supremum of \(\lambda (\sigma )\) over the submodels \(t\mapsto \rho _\nu (\mu _t)\) [5, 22]. Since \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}\in R((d\rho _\nu )_{\mu })\),

$$\begin{aligned} \lambda (\theta |\nu )= & {} \sup _{\sigma \in T_{\mu }\mathcal {P}_+(\mathcal {W})}\lambda (\sigma )\\= & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (\partial \kappa _\nu )_{\rho (\mu ,\nu )},\frac{ (\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{\Vert (\partial \kappa _\nu )_{\rho (\mu ,\nu )} \Vert _{\rho ({\mu },\nu )}} \right) ^2\\= & {} \Vert (\partial \kappa _\nu )_{\rho _\nu (\mu )} \Vert _{\rho (\mu ,\nu )}^2. \end{aligned}$$

The supremum of \(\lambda (\sigma )\) is attained by \(\sigma \) such that \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}=(d\rho _\nu )_\mu \sigma \). A submodel with this tangent vector \(\sigma \) at \(\mu \) yields the largest asymptotic variance for estimating \(\theta (\mu )\) among all submodels; such a submodel is called the least favorable, or hardest, submodel [23].

Proposition 3

The Cramér–Rao lower bound of \(\theta =\theta (\mu )\) under \(\nu \) is as follows:

$$\begin{aligned} \lambda (\theta |\nu )=\Vert (\partial \kappa _\nu )_{\rho _\nu (\mu )} \Vert ^2_{ \rho (\mu ,\nu ) }, \end{aligned}$$
(15)

where \((\partial \kappa _\nu )_{\rho _\nu (\mu )}\) is a solution to the score equation (13).

Proposition 4

\(\lambda (\theta |\nu )\) is convex in \(\nu \).

Proof

Let \(G(\nu ):=[ g_{i,h}(\nu )]\) be an \(n\times n\) matrix with the (ih) element

$$\begin{aligned} g_{i,h}(\nu ):=\mathfrak {g}_{\rho (\mu ,\nu )}\left( \rho (\delta _i,\nu ), \rho (\delta _h,\nu ) \right) \end{aligned}$$

for \(1\le i \le n\) and \(1\le h\le n\). The matrix is linear in \(\nu \) and nonsingular according to (A2). Because \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}\) is in \(T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \), there exists \(\hat{\sigma }_{\nu } \in T_{\mu }\mathcal {P}_+(\mathcal {W})\) such that \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}=\rho (\hat{\sigma }_{\nu },\nu )\). Moreover, for \(1\le i \le n\), we have the following expression:

$$\begin{aligned} \frac{\textrm{d}((\partial \theta )_{\mu } )}{\textrm{d}\mu }(\omega _i)= & {} (\partial \kappa _\nu )_{\rho (\mu ,\nu )} \left( \frac{\mathbb {I}_\rho }{\mu (\mathbb {I}_\rho )} \right) (\omega _i)\nonumber \\= & {} \int \frac{\mathbb {I}_\rho (\omega _i,x,y)}{\mu (\mathbb {I}_\rho )(x,y)}\,\textrm{d}\rho (\hat{\sigma }_{\nu }, \nu )(x,y)\nonumber \\= & {} \int \frac{ \textrm{d}\rho (\delta _i,\nu )}{\textrm{d}\rho (\mu ,\nu )} \cdot \frac{ \textrm{d}\rho (\hat{\sigma }_{\nu },\nu )}{\textrm{d}\rho (\mu ,\nu )}\,\textrm{d}\rho (\mu , \nu )\nonumber \\= & {} \mathfrak {g}_{\rho (\mu ,\nu )}\left( \rho (\delta _i,\nu ), \rho (\hat{\sigma }_{\nu },\nu )\right) \nonumber \\= & {} \sum _{h=1}^n g_{i,h}(\nu )\hat{\sigma }_{\nu }^h. \end{aligned}$$
(16)

Let \({\varvec{\gamma }}=(\gamma _1,\ldots ,\gamma _n)^\top \) be a vector of coefficients of \({\textrm{d}((\partial \theta )_{\mu })}/{\textrm{d}\mu }\), and let \(\hat{{\varvec{\sigma }}}_{\nu }=(\hat{\sigma }_{\nu }^1,\ldots ,\hat{\sigma }_{\nu }^n)^\top \). Then, (16) implies that \(\hat{{\varvec{\sigma }}}_{\nu }=G(\nu )^{-1}{\varvec{\gamma }}\). From Proposition 3,

$$\begin{aligned} \lambda (\theta |\nu )= \hat{{\varvec{\sigma }}}_{\nu }^\top G(\nu )\hat{{\varvec{\sigma }}}_{\nu } ={\varvec{\gamma }}^\top G(\nu )^{-1}{\varvec{\gamma }}. \end{aligned}$$

Therefore, we have the following expression:

$$\begin{aligned} \lambda (\theta |\,t\nu _1+(1-t)\nu _2\,)= & {} {\varvec{\gamma }}^\top G(t\nu _1+(1-t)\nu _2)^{-1}{\varvec{\gamma }}\\= & {} {\varvec{\gamma }}^\top \Bigl [ tG(\nu _1)+(1-t)G(\nu _2)\Bigr ]^{-1}{\varvec{\gamma }} \\\le & {} t\lambda (\theta |\nu _1)+(1-t)\lambda (\theta |\nu _2) \end{aligned}$$

for arbitrary \(\nu _1\) and \(\nu _2\) in \(\mathcal {P}_+(\mathcal {X})\) and for any \(t\in (0,1)\) because of the convexity of the matrix inversion: for any positive definite matrices A and B,

$$\begin{aligned} ( t A+(1-t)B )^{-1} \le t A^{-1}+(1-t)B^{-1} \end{aligned}$$
(17)

holds, where the inequality is in the sense of positive definite matrices [19, 20]. \(\square \)
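
As a numerical illustration of the proof, the sketch below (Python) evaluates \(\lambda (\theta |\nu )={\varvec{\gamma }}^\top G(\nu )^{-1}{\varvec{\gamma }}\) for the binary response of (1) with the mean functional \(\theta (\mu )=\int \omega \,d\mu \), for which the gradient coefficients reduce to \(\gamma _i=\omega _i-E_\mu [\omega ]\) (cf. (25) in Sect. 4), and checks the convexity inequality along a segment of designs; the support points are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Binary response of Eq. (1) on an illustrative support; bids exclude the top point
W = np.array([1.0, 2.0, 3.0, 4.0]); X = W[:-1]; Y = np.array([0, 1])
I_rho = np.array([[[1.0 if int(w <= x) == y else 0.0 for y in Y]
                   for x in X] for w in W])
mu = rng.dirichlet(np.ones(len(W)))

def G(nu):
    """Gram matrix g_{i,h}(nu) = g_{rho(mu,nu)}(rho(delta_i,nu), rho(delta_h,nu))."""
    mu_I = np.einsum('i,ijk->jk', mu, I_rho)
    return np.einsum('ijk,hjk,jk->ih', I_rho, I_rho, nu[:, None] / mu_I)

def cr_bound(gamma, nu):
    return float(gamma @ np.linalg.solve(G(nu), gamma))   # gamma' G(nu)^{-1} gamma

# Mean functional theta(mu) = E_mu[omega]: gradient coefficients gamma_i = omega_i - E_mu[omega]
gamma = W - np.dot(W, mu)

nu1, nu2 = rng.dirichlet(np.ones(len(X)), size=2)
for t in (0.25, 0.5, 0.75):
    mix = cr_bound(gamma, t * nu1 + (1 - t) * nu2)
    print(mix <= t * cr_bound(gamma, nu1) + (1 - t) * cr_bound(gamma, nu2))   # True
```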

Definition 1

The optimal design for \(\theta =\theta (\mu )\) is \({\nu }\in \mathcal {P}_+(\mathcal {X})\) such that

$$\begin{aligned} \lambda (\theta |{\nu }) =\inf _{ \nu ' \in \mathcal {P}_+(\mathcal {X}) } \lambda (\theta |\nu '). \end{aligned}$$
(18)

Theorem 1

\({\nu }\) is optimal for \(\theta =\theta (\mu )\) if and only if

$$\begin{aligned} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (\partial \kappa _\nu )_{\rho (\mu ,\nu )}, \nabla _H^{(e)}(\partial \kappa _\nu )_{\rho (\mu ,\nu )} \right) =0 \end{aligned}$$
(19)

holds for any \(H\in T_{\rho (\mu ,\nu )}\mathcal {E}_{\mu }\) (Fig. 1).

Fig. 1: The first-order condition (19) of the optimal design

Proof

Because \(\nu \mapsto \lambda (\theta |\nu )\) is convex, the lower bound is minimized at \(\nu \) if and only if the first order condition

$$\begin{aligned} \left( \frac{\partial }{\partial \eta }\right) \lambda (\theta |\nu )=0 \end{aligned}$$
(20)

holds for any \(\eta \in T_\nu \mathcal {P}_+(\mathcal {X})\). Because \(\lambda (\theta |\nu )=\Vert (\partial \kappa _\nu )_{\rho (\mu ,\nu )}\Vert _{\rho (\mu ,\nu )}^2\), (20) is equivalent to

$$\begin{aligned} \mathfrak {g}\left. \left( \nabla _H^{(m)}\partial \kappa ,\partial \kappa \right) \right| _{\rho (\mu ,\nu )} +\mathfrak {g}\left. \left( \partial \kappa , \nabla _H^{(e)}\partial \kappa \right) \right| _{\rho (\mu ,\nu )}=0 \end{aligned}$$

for \(H=(d\rho _{\mu })_{\nu }\eta \in T_{\rho (\mu ,\nu )}\mathcal {E}_{\mu }\). From the definition of \(\nabla ^{(m)}\),

$$\begin{aligned} \left. \nabla _H^{(m)}\partial \kappa \right| _{\rho (\mu ,\nu )}=\lim _{t\rightarrow 0}\frac{ (\partial \kappa _{\nu +t \eta })_{\rho (\mu ,\nu +t\eta )}-(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ t }. \end{aligned}$$

Because \((d\rho _{\nu +t\eta })_\mu ^*=(d\rho _\nu )_\mu ^*\),

$$\begin{aligned} (d\rho _\nu )_{\mu }^* \left( \left. \nabla _H^{(m)}\partial \kappa \right| _{\rho (\mu ,\nu )}\right)= & {} \lim _{t\rightarrow 0}\frac{ (d\rho _{\nu +t\eta })_{\mu }^*(\partial \kappa _{\nu +t \eta })_{\rho (\mu ,\nu +t\eta )}-(d\rho _\nu )_{\mu }^*(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ t }\\= & {} \lim _{t\rightarrow 0}\frac{ (\partial \theta )_{\mu }-(\partial \theta )_{\mu } }{ t }=0. \end{aligned}$$

Therefore, we have the following expression:

$$\begin{aligned} \left. \mathfrak {g}\left( \nabla _H^{(m)}\partial \kappa , \partial \kappa \right) \right| _{\rho (\mu ,\nu )}= & {} \mathfrak {g}_{\mu }\left( (d\rho _\nu )_{\mu }^*\left( \left. \nabla _H^{(m)}\partial \kappa \right| _{\rho (\mu ,\nu )}\right) ,\, \hat{\sigma }_{\nu } \right) \nonumber \\ {}= & {} 0, \end{aligned}$$
(21)

where \((\partial \kappa )_{\rho (\mu ,\nu )}=(\partial \kappa _\nu )_{\rho (\mu ,\nu )}=(d\rho _\nu )_\mu \hat{\sigma }_\nu \). Hence, (20) is equivalent to (19). \(\square \)

Corollary 1

\({\nu }\) is the optimal design for \(\theta (\mu )\) if and only if

$$\begin{aligned} E_{\mu }\left[ \left. \left( \frac{\textrm{d} (\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{\textrm{d} \rho (\mu ,{\nu })} (x,y)\right) ^2 \right| x \right] = E_{\mu ,\nu }\left[ \left( \frac{\textrm{d} (\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{\textrm{d} \rho (\mu ,{\nu })} (x,y)\right) ^2 \right] \end{aligned}$$
(22)

for all \(x\in \mathcal {X}\).

Proof

From the definition of \(\nabla ^{(e)}\), we have the following equation:

$$\begin{aligned} \left. \nabla _H^{(e)}\partial \kappa \right| _{\rho (\mu ,\nu )} = \left. \nabla _H^{(m)}\partial \kappa \right| _{\rho (\mu ,\nu )} - \left( \frac{\textrm{d}\eta }{ \textrm{d}\nu }\cdot \frac{\textrm{d}(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ \textrm{d}\rho (\mu ,\nu ) } \right) \rho (\mu ,\nu ). \end{aligned}$$

Note that \(\mathfrak {g}_{\rho (\mu ,\nu )}(\rho (\mu ,\eta ), (\partial \kappa _\nu )_{\rho (\mu ,\nu )})=0\) because \(\rho (\mu ,\eta ) \in T_{\rho (\mu ,\nu )}\mathcal {E}_{\mu }\) and \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}\in T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \). Therefore, (19) is equivalent to

$$\begin{aligned} E_{\mu ,\nu }\left[ \frac{\textrm{d}\eta }{ \textrm{d}\nu }(x)\cdot E_{\mu }\left[ \left. \left( \frac{\textrm{d}(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ \textrm{d}\rho (\mu ,\nu ) }(x,y)\right) ^2 \right| x\right] \right] =0, \end{aligned}$$

which holds for an arbitrary \(\eta \in \mathcal {S}_0(\mathcal {X})\) if and only if (22) is satisfied. \(\square \)

The intuition behind this condition can be obtained from the following expression:

$$\begin{aligned} \lambda (\theta \mid {\nu })= \int E_{\mu }\left[ \left. \left( \frac{ \textrm{d}(\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{ \textrm{d}\rho (\mu ,\nu ) }(x,y)\right) ^2\,\right| \, x\, \right] \, \textrm{d} {\nu }(x). \end{aligned}$$
(23)

If the lower bound is minimized at \( {\nu }\), any small perturbation added to \( {\nu }\) does not change the value of \(\lambda (\theta \mid {\nu })\) to first order. This is the case if and only if the integrand on the right-hand side is independent of x.

Example 1

To see how the theorem works, let us consider a trivial response function \(\rho (\omega ,x)=\omega +x\), where \(\mathcal {W}\) and \(\mathcal {X}\) are subsets of \(\mathbb {R}\). In this case, the joint density of (x, y) is given by \(\rho (\mu ,\nu )(x,y)=\sum _{i=1}^n\mathbb {I}\{\omega _i+x=y\}\mu ^i\nu (x)=\mu (y-x)\nu (x)\). The differential of \(\rho _\nu \) and its adjoint are \((d\rho _\nu )_\mu \sigma (x,y)=\sigma (y-x)\nu (x)\) and \((d\rho _\nu )_\mu ^*\tau (\omega )=\sum _{j=1}^m \tau (x_j,x_j+\omega )\) because

$$\begin{aligned} \mathfrak {g}_{\rho (\mu ,\nu )}\left( (d\rho _\nu )_\mu \sigma ,\tau \right)= & {} \sum _{j=1}^m \sum _{k=1}^\ell \frac{ \sigma (y_k-x_j)\nu (x_j)\cdot \tau (x_j,y_k) }{ \mu (y_k-x_j)\nu (x_j) }\\= & {} \sum _{i=1}^n \frac{ \sigma (\omega _i)\sum _{j=1}^m\tau (x_j,\omega _i+x_j) }{ \mu (\omega _i) }\\= & {} \mathfrak {g}_{\mu }\Bigl ( \sigma ,\ \sum _{j=1}^m\tau (x_j,\cdot +x_j) \Bigr ). \end{aligned}$$

The score equation \((\partial \theta )_\mu (\omega )=\sum _{j=1}^m\partial \kappa _\nu (x_j,\omega +x_j)\) is solved by \(\partial \kappa _\nu (x,y)=(\partial \theta )_\mu (y-x)\nu (x)\). For every \(\eta \in T_\nu \mathcal {P}_+(\mathcal {X})\) and \(H=\rho (\mu ,\eta )\),

$$\begin{aligned} \nabla _H^{(m)}\partial \kappa _\nu (x,y)=(\partial \theta )_\mu (y-x)\eta (x) \end{aligned}$$

and

$$\begin{aligned} \nabla _H^{(e)}\partial \kappa _\nu (x,y)= & {} \nabla _H^{(m)}\partial \kappa _\nu (x,y)- \frac{\textrm{d}\eta }{ \textrm{d}\nu }(x) \cdot \frac{\textrm{d}\partial \kappa _\nu }{ \textrm{d}\rho (\mu ,\nu )}(x,y) \cdot \rho (\mu ,\nu )(x,y) \\= & {} (\partial \theta )_\mu (y-x)\eta (x)-\frac{\textrm{d}\eta }{ \textrm{d}\nu }(x)\cdot (\partial \theta )_\mu (y-x)\nu (x)\\= & {} 0. \end{aligned}$$

Therefore, condition (19) is trivially satisfied at an arbitrary \(\nu \). In this example, \(\omega \) is always observable because \(\rho \) is invertible as \(\omega =y-x\), so the distribution of x does not affect estimation efficiency. Thus, the choice of \(\nu \) becomes significant only when a model with information loss is estimated.

The optimality can be checked by applying the corollary, too. In this example,

$$\begin{aligned} \frac{ \textrm{d}\partial \kappa _\nu }{ \textrm{d}\rho (\mu ,\nu ) }(x,y)=\frac{ (\partial \theta )_\mu (y-x)\nu (x) }{ \mu (y-x)\nu (x) }=\frac{ (\partial \theta )_\mu (y-x)}{ \mu (y-x)}, \end{aligned}$$

which is independent of \(\nu \). Because

$$\begin{aligned} E_\mu \left[ \left. \left( \frac{ \textrm{d}\partial \kappa _\nu }{ \textrm{d}\rho (\mu ,\nu ) }(x,y)\right) ^2 \right| \,x\, \right] = \sum _{i=1}^n\left( \frac{ (\partial \theta )_\mu (\omega _i)}{ \mu (\omega _i)}\right) ^2\mu (\omega _i) \end{aligned}$$

is independent of x, an arbitrary \(\nu \) is optimal when \(\rho (\omega ,x)=\omega +x\).

4 Binary response experiment

The optimal design to estimate the mean WTP \(E\omega \) with the binary response (1) and nonparametric \(\mu \) was proposed by Duffield and Patterson [11]. They directly minimized the asymptotic variance of the maximum likelihood estimator of \(E\omega \) to determine the optimal design. In this section, we apply Theorem 1 to replicate their result.

Let \(\mathcal {W}=\{\xi _1,\ldots ,\xi _n\}\), \(\mathcal {X}=\{\xi _1,\ldots ,\xi _{n-1}\}\), and \(\mathcal {Y}=\{0,1\}\), where \(\xi _1,\ldots ,\xi _n\) are n real numbers such that \(0\le \xi _1<\cdots <\xi _n\). Because \(\omega \le \xi _n\) holds with probability one, the response at the highest price \(\xi _n\) is uninformative, and \(\xi _n\) is therefore not contained in the support of x.

In the experiment, the joint distribution of (x, y) is obtained by the following expression:

$$\begin{aligned} \rho (\mu ,\nu )(\xi _j,y)=\left( y\mu [0,\xi _j]+(1-y)\mu (\xi _j,\infty )\right) \nu ^j \end{aligned}$$
(24)

for \(j=1,\ldots ,n-1\) and \(y=0,1\), where \(\mu [0,\xi _j]:=\sum _{i\le j}\mu ^i\) and \(\mu (\xi _j,\infty ):=\sum _{i>j}\mu ^i\). The model satisfies the condition (A1) because \(\mu (\mathbb {I}_\rho )(\xi _j,y)\ge y \mu ^1+(1-y)\mu ^n>0\) holds for any \((\xi _j,y)\in \mathcal {X}\times \mathcal {Y}\). Assume that, for \(c_1,\ldots ,c_n\in \mathbb {R}\), \(\sum _{i=1}^n c_i\delta _i(\mathbb {I}_\rho )(\xi _j,y)=0\) holds for any \((\xi _j,y)\in \mathcal {X}\times \mathcal {Y}\). Because

$$\begin{aligned} \delta _i(\mathbb {I}_\rho )(\xi _j,y)=y\mathbb {I}\{ i\le j \}+(1-y)\mathbb {I}\{ i> j \}, \end{aligned}$$

\(y\sum _{i\le j}c_i+(1-y)\sum _{i>j}c_i=0\) holds for \(j=1,\ldots ,n-1\), which implies that \(c_1=\cdots =c_n=0\). Hence, condition (A2) is also satisfied.

The differential of \(\rho _\nu \) is obtained by the following:

$$\begin{aligned} ((d\rho _\nu )_\mu \sigma )(\xi _j,y)=\bigl ( y\sigma [0,\xi _j]+(1-y)\sigma (\xi _j,\infty )\bigr )\nu ^j \end{aligned}$$

for every \(\sigma \in T_\mu \mathcal {P}_+(\mathcal {W})=\mathcal {S}_0(\{\xi _1,\ldots ,\xi _n\})\). The adjoint operator is determined by

$$\begin{aligned} ((d\rho _\nu )_\mu ^*\tau )(\xi _i)=\left( \sum _{j=i}^{n-1}\nu ^j \frac{\sigma [0,\xi _j]}{\mu [0,\xi _j]}+\sum _{j=1}^{i-1}\nu ^j\frac{\sigma (\xi _j,\infty )}{\mu (\xi _j,\infty )}\right) \mu ^i \end{aligned}$$

for each \(\tau =\rho (\sigma ,\nu ) \in T_{\rho (\mu ,\nu )}\mathcal {E}_\nu \).

Let \(\gamma :=\textrm{d}(\partial \theta )_\mu /\textrm{d}\mu =\sum _{i=1}^n\gamma _i e^i\) with

$$\begin{aligned} \gamma _i=\left\{ \begin{array}{lc} \displaystyle \frac{\partial \theta }{\partial \mu ^i}(\mu ) -\sum _{h=1}^{n-1} \mu ^h \frac{\partial \theta }{\partial \mu ^h}(\mu ) &{}\quad (1\le i\le n-1)\\ \displaystyle -\sum _{h=1}^{n-1} \mu ^h \frac{\partial \theta }{\partial \mu ^h}(\mu ) &{}\quad (i=n) \end{array} \right. \end{aligned}$$
(25)

The derivation of (25) is explained in Appendix A.2. Let \((\partial \kappa _\nu )_{\rho (\mu ,\nu )}=\rho (\hat{\sigma }_{\nu },\nu )\), where \(\hat{\sigma }_{\nu }\) satisfies

$$\begin{aligned} \gamma _i=\sum _{j=i}^{n-1}\nu ^j \frac{\hat{\sigma }_{\nu }[0,\xi _j]}{\mu [0,\xi _j]}+\sum _{j=1}^{i-1}\nu ^j\frac{\hat{\sigma }_{\nu }(\xi _j,\infty )}{\mu (\xi _j,\infty )} \end{aligned}$$
(26)

for \(1\le i\le n\). Note that condition \(\hat{\sigma }_\nu \in \mathcal {S}_0(\mathcal {W})\) implies \(\hat{\sigma }_\nu [0,\infty )=\hat{\sigma }_\nu [0,\xi _i]+\hat{\sigma }_\nu (\xi _i,\infty )=0\) for \(1\le i\le n-1\). By the score equation (26),

$$\begin{aligned} \gamma _{i+1}-\gamma _i =-\nu ^i\frac{ \hat{\sigma }_\nu [0,\xi _i] }{ \mu [0,\xi _i] } +\nu ^i\frac{ \hat{\sigma }_\nu (\xi _i,\infty ) }{ \mu (\xi _i,\infty ) } =-\nu ^i\frac{ \hat{\sigma }_\nu [0,\xi _i] }{ \mu [0,\xi _i] \mu (\xi _i,\infty ) }, \end{aligned}$$

which implies

$$\begin{aligned} \hat{\sigma }_{\nu }[0,\xi _j]= -\frac{ \gamma _{j+1}-\gamma _j }{\nu ^j}\mu [0,\xi _j]\mu (\xi _j,\infty ) \end{aligned}$$

for \(1\le j\le n-1\) and \(\hat{\sigma }_{\nu }[0,\xi _n]=0\). Thus, we obtain the following expression:

$$\begin{aligned} \frac{\textrm{d}(\partial \kappa _\nu )_{\rho (\mu ,\nu )}}{\textrm{d}\rho (\mu ,\nu )}(\xi _j,y) = -\frac{ \gamma _{j+1}-\gamma _j }{\nu ^j}\Bigl ( y\mu (\xi _j,\infty )-(1-y)\mu [0,\xi _j]\Bigr ) \end{aligned}$$

and

$$\begin{aligned} \lambda (\theta (\mu )\mid \nu )= \sum _{j=1}^{n-1}\frac{\left( \gamma _{j+1}-\gamma _j \right) ^2}{\nu ^j}\mu [0,\xi _j]\mu (\xi _j,\infty ). \end{aligned}$$
(27)

The conditional expectation

$$\begin{aligned} E_\mu \left[ \left. \left( \frac{\textrm{d} (\partial \kappa _\nu )_{\rho (\mu ,\nu )} }{\textrm{d}\rho (\mu ,\nu )}(x,y)\right) ^2\right| \ x=\xi _j\ \right] =\left( \frac{ \gamma _{j+1}-\gamma _j }{\nu ^j}\right) ^2\mu [0,\xi _j]\mu (\xi _j,\infty ) \end{aligned}$$

becomes independent of \(\xi _j\) if and only if

$$\begin{aligned} {\nu }^j = \frac{\left| \gamma _{j+1}-\gamma _j \right| \sqrt{ \mu [0,\xi _j]\mu (\xi _j,\infty ) }}{\sum _{h=1}^{n-1}\left| \gamma _{h+1}-\gamma _h \right| \sqrt{ \mu [0,\xi _h]\mu (\xi _h,\infty ) }} \end{aligned}$$
(28)

for \(j=1,\ldots ,n-1\). In particular, when \(\theta (\mu )=\int f\textrm{d}\mu \) with \(f\in \mathcal {F(\mathcal {W})}\), the optimal design for \(\theta (\mu )\) is obtained by the following expression:

$$\begin{aligned} {\nu }^j = \frac{\left| f(\xi _{j+1})-f(\xi _j) \right| \sqrt{ \mu [0,\xi _j]\mu (\xi _j,\infty ) }}{\sum _{h=1}^{n-1}\left| f(\xi _{h+1})-f(\xi _h) \right| \sqrt{ \mu [0,\xi _h]\mu (\xi _h,\infty ) }} \end{aligned}$$
(29)

because

$$\begin{aligned} \frac{\partial \theta }{\partial \mu ^j}(\mu )= \frac{\partial }{\partial \mu ^j}\left[ \sum _{i=1}^{n-1}\mu ^i f(\xi _i)+\left( 1-\sum _{i=1}^{n-1}\mu ^i \right) f(\xi _n)\right] =f(\xi _j)-f(\xi _n). \end{aligned}$$

Equation (29) is equivalent to equation (8) of Duffield and Patterson [11].
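
A minimal computational sketch (Python) of the optimal design (29) and the corresponding bound (27) for the mean functional is given below; the support and the assumed \(\mu \) are illustrative, and the bound under the optimal design is compared with that under the uniform design.

```python
import numpy as np

# Illustrative WTP support xi, assumed distribution mu, and target theta(mu) = E_mu[omega]
xi = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
mu = np.array([0.10, 0.25, 0.30, 0.25, 0.10])
f = xi                                    # theta(mu) = int f d mu with f(omega) = omega

cdf = np.cumsum(mu)[:-1]                  # mu[0, xi_j] at the n-1 bid points
surv = 1.0 - cdf                          # mu(xi_j, infty)
diff = np.abs(np.diff(f))                 # |f(xi_{j+1}) - f(xi_j)| = |gamma_{j+1} - gamma_j|

# Optimal design of Eq. (29)
w = diff * np.sqrt(cdf * surv)
nu_opt = w / w.sum()

# Cramer-Rao lower bound of Eq. (27) under a given design nu
def cr_bound(nu):
    return float(np.sum(diff ** 2 * cdf * surv / nu))

nu_unif = np.full(len(xi) - 1, 1.0 / (len(xi) - 1))
print("optimal design:", nu_opt)
print("bound, optimal vs uniform:", cr_bound(nu_opt), cr_bound(nu_unif))
```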

However, this design is not feasible because it depends on the unknown \(\mu \). A feasible alternative is the min–max design, defined as follows:

$$\begin{aligned} \nu _{\text {min--max}}:=\text {Arg}\min _{\nu \in \mathcal {P}_+(\mathcal {X})} \left[ \sup _{\mu \in \mathcal {P}_+(\mathcal {W})}\lambda (\theta (\mu )|\nu ) \right] . \end{aligned}$$

In the binary experiment, the maximal risk to estimate \(\theta (\mu )=\int f\textrm{d}\mu \) is equal to the following:

$$\begin{aligned} \sup _{\mu \in \mathcal {P}_+(\mathcal {W})}\lambda (\theta (\mu )\mid \nu )=\sum _{j=1}^{n-1}\frac{\left( f(\xi _{j+1})-f(\xi _j) \right) ^2}{4\nu ^j}, \end{aligned}$$

where the supremum is obtained by \(\mu =\delta _1/2+\delta _n/2\). The risk is minimized by the following expression:

$$\begin{aligned} \nu _{\text {min--max}}^j=\frac{ \bigl | f(\xi _{j+1})-f(\xi _j) \bigr |}{\sum _{h=1}^{n-1} \bigl | f(\xi _{h+1})-f(\xi _h) \bigr |}. \end{aligned}$$
(30)

In particular, when \(\mathcal {W}\) is equally spaced so that \(\xi _2-\xi _1=\cdots =\xi _n-\xi _{n-1}\), the min–max design for estimating \(E_\mu \omega \) is the uniform distribution on \(\mathcal {X}\). This provides a theoretical justification for the uniform design in the binary response experiment when the target is the mean.
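
A short sketch (Python) of the min–max design (30), with an illustrative grid of bids, is as follows.

```python
import numpy as np

# Min-max design of Eq. (30) for theta(mu) = int f d mu on an illustrative grid of bids
xi = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

def minmax_design(f_values):
    diff = np.abs(np.diff(f_values))
    return diff / diff.sum()

print(minmax_design(xi))        # f(xi) = xi on an equally spaced grid: uniform design
print(minmax_design(xi ** 2))   # a nonlinear target puts more mass on steeper intervals
```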

5 Conclusions

In this study, the optimal design problem of the CVM experiment was examined from the perspective of information geometry. The problem is formulated as the minimization, over a statistical manifold of finite probability measures, of the Cramér–Rao lower bound, which is equal to the squared Fisher norm of the gradient vector of the parameter functional to be estimated. The problem is solved by using the duality of the e- and m-connections on the manifold. The necessary and sufficient condition for the minimization is stated as the orthogonality between the gradient and its e-covariant derivatives. The result is applied to a classical binary experiment to confirm that it replicates the results obtained in Ref. [11].

In this study, finite probability measures were considered to avoid the technical difficulties of infinite-dimensional spaces. To enhance the applicability of the results of this paper, generalizing the model to an infinite-dimensional manifold is critical. Finding further application examples is also important. In the “double-bounded” CVM, for example, each respondent is posed a second question depending on the response to the first: if the first offer is accepted, the second bid is set higher than the first; if the first offer is rejected, the second bid is set lower. Therefore, the response function is provided by the following expression:

$$\begin{aligned} (y,y')=\left( \mathbb {I}\{ \omega \le x \}, \mathbb {I}\{ \omega \le x' \} \right) \in \{0,1\}\times \{0,1\}, \end{aligned}$$

where x is the first bid and \(x'\) is the second bid. The statistical efficiency of the double-bounded CVM is considerably higher than that of the conventional single-bounded CVM [14]. Asymptotic properties of the nonparametric estimation of the model were extensively studied by Groeneboom and Jongbloed [12]. In the future, the optimal distribution of the sequential bidding prices \((x,x')\) could be determined by applying the results of this study.