Differential geometry for the optimal design of the contingent valuation method

The contingent valuation method (CVM) is a widely used experimental method to measure the monetary value of goods. However, CVM estimates are sensitive to the experiment design. In this study, we formulated the optimal design problem as the minimization of the squared Fisher norm of a gradient vector field generated by the statistical model of the CVM. Furthermore, a necessary and sufficient condition for the optimal design was proven.


Introduction
The contingent valuation method (CVM) using discrete response valuation questions is a widely used experimental method to measure the monetary value of nonmarket environmental goods. In the experiment, an agent is asked whether she will buy a certain good at the price x. She accepts the offered price if her willingness-to-pay (WTP) ω for the good is higher than x. Let y = 0 if x is accepted and y = 1 if it is rejected; that is,

y = 1{ω < x}. (1)

Let μ be the distribution of ω. The objective of the experiment is to estimate the value of θ(μ), where θ is a given function of μ. For example, if the mean WTP is to be determined, θ(μ) = ∫ ω dμ(ω) should be estimated. By observing independent copies of (x, y) obtained from (1), the value of θ(μ) can be consistently estimated by standard statistical techniques such as probit, logit, or nonparametric maximum likelihood [18]. The survey design problem has been a major issue since the CVM was introduced by Bishop and Heberlein [6] and Hanemann [13]. For a survey question, a statistician should choose the bidding price distribution ν, from which x is randomly sampled. WTP estimates derived using the CVM are sensitive to the choice of ν [9,11,17]. Optimal designs for ν ease this sensitivity problem by minimizing the variance of the estimates. Cooper [10] proposed the optimal design using the logit formulation of μ, and Alberini [1] studied the design for the probit. Duffield and Patterson [11] considered the optimal design for nonparametric μ. Kanninen [16] generalized the results to the multinomial logit model, in which y takes multiple discrete values. For comprehensive surveys of the literature, see [7,8,15].
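As a quick illustration of the sampling scheme, the following sketch simulates binary responses under a hypothetical lognormal WTP law and a uniform bidding design; the distribution, bid values, and sample size are all illustrative assumptions, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lognormal WTP distribution (an illustrative assumption).
def draw_wtp(n):
    return rng.lognormal(mean=3.0, sigma=0.5, size=n)

# Bidding prices x are sampled from a design distribution nu on a bid set.
bids = np.array([10.0, 20.0, 30.0, 40.0])
nu = np.full(4, 0.25)                 # uniform design

n = 100_000
x = rng.choice(bids, size=n, p=nu)
omega = draw_wtp(n)
y = (omega < x).astype(int)           # y = 1 if the offer is rejected, cf. (1)

# Empirical rejection probability F(x) = P(omega < x) at each bid;
# a nonparametric estimate of theta(mu) is built from these values.
rates = {}
for b in bids:
    mask = x == b
    rates[b] = y[mask].mean()
    print(f"bid {b:5.1f}: rejection rate {rates[b]:.3f}")
```

The printed rejection rates increase with the bid, tracing out the distribution function of ω at the bid points.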
This study investigates the optimal design problem from the perspective of information geometry. We generalize the nonparametric approach of Duffield and Patterson [11] by considering a general response y = ρ(ω, x) and an unspecified target θ(μ). Under these general settings, we formulate the optimal design problem as the minimization of the Cramér-Rao lower bound of θ(μ) over a set of bidding price distributions ν. The problem could be solved by general optimization techniques, because it is the optimization of a function over a finite-dimensional space. In such approaches, however, the computation can be messy and the solution less intuitive. Instead, we formulate the problem with the methods of information geometry. Because the Cramér-Rao lower bound is equal to the squared Fisher norm of a tangent vector field on the statistical manifold, the necessary and sufficient condition for the optimal design is concisely stated through dual connections [2-4].
The remainder of this paper is organized as follows: Sect. 2 introduces the geometry of finite measures. Section 3 presents the main results, which include a necessary and sufficient condition for the optimal design. According to this condition, a design is optimal if and only if it generates a vector field that is orthogonal to its own e-connection. In Sect. 4, the results are applied to the binary response experiment presented in (1). Section 5 concludes the paper.

Geometry of finite measures
In this section, the geometry of finite measures is introduced. The terms and definitions are based on Chapter 2 of Ay et al. [4]. Let I = {1, . . . , n} be an arbitrary finite set. The linear space of functions f : I → R is denoted by F(I). The space has the canonical basis {e_i ∈ F(I) : i ∈ I}, where e_i(j) = 1 if i = j and e_i(j) = 0 otherwise. Each f ∈ F(I) is expressed as f = Σ_{i=1}^n f_i e_i. The dual space S(I) := F*(I) is the set of signed measures μ : F(I) → R, with dual basis {δ_1, . . . , δ_n} defined by δ_i(e_j) = 1 if i = j and δ_i(e_j) = 0 otherwise. On S(I), we introduce a coordinate system by μ → (μ_1, . . . , μ_n), where μ_i = μ(e_i). Given a point μ ∈ S(I), the tangent space T_μ S(I) is identified with S(I) itself.

The Fisher metric on T_μ M_+(I) is now introduced by

g_μ(a, b) = Σ_{i∈I} a_i b_i / μ_i,

and the Fisher norm is ‖a‖_μ := √g_μ(a, a). Let θ : P_+(I) → R be a smooth functional. The differential of θ at μ is the linear form (dθ)_μ : T_μ P_+(I) → R obtained by

(dθ)_μ a = (d/dt) θ(μ + ta) |_{t=0}.

The Fisher metric allows the differential to be identified with the gradient (∂θ)_μ:

(dθ)_μ a ≡ g_μ(a, (∂θ)_μ), a ∈ T_μ P_+(I). (3)

The gradient vector field of θ is ∂θ : μ → (∂θ)_μ. Given two points μ and μ′ in P_+(I), the m-parallel transport Π^{(m)}_{μ,μ′} : T_μ P_+(I) → T_{μ′} P_+(I) is the identity in the m-representation, Π^{(m)}_{μ,μ′} a = a. The e-parallel transport Π^{(e)}_{μ,μ′} : T_μ P_+(I) → T_{μ′} P_+(I) is the conjugate of the m-transport and satisfies

g_{μ′}(Π^{(e)}_{μ,μ′} a, b) = g_μ(a, Π^{(m)}_{μ′,μ} b), b ∈ T_{μ′} P_+(I).

For two smooth vector fields A : μ → a_μ and B : μ → b_μ on P_+(I), the m-connection ∇^{(m)} and e-connection ∇^{(e)} are defined by differentiating one vector field along the other with respect to the m- and e-parallel transports, respectively. According to the definitions, the two connections are dual to one another with respect to the Fisher metric; see Appendix A.1 for the proof.


Main results
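For a concrete feel for these objects, the sketch below implements the Fisher metric, the gradient of a functional, and the e-parallel transport on P_+(I) in the m-representation, and can be used to check the conjugacy of the e- and m-transports numerically; the implementation details are our own illustrative choices.

```python
import numpy as np

# Numerical sketch of the objects of this section on P_+(I) for finite I.
# Tangent vectors are kept in their m-representation, i.e. as coordinate
# vectors in S_0(I) (components summing to zero).

def fisher_metric(mu, a, b):
    """g_mu(a, b) = sum_i a_i b_i / mu_i."""
    return float(np.sum(a * b / mu))

def gradient(theta, mu, eps=1e-6):
    """Fisher gradient of a functional theta at mu: the tangent vector
    satisfying (d theta)_mu a = g_mu(a, grad) for all a in S_0(I)."""
    n = len(mu)
    d = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        d[i] = (theta(mu + e) - theta(mu - e)) / (2 * eps)
    g = mu * d                       # raise the index with the Fisher metric
    return g - mu * g.sum()          # project onto S_0(I)

def e_transport(mu, nu, a):
    """e-parallel transport of a from T_mu P_+ to T_nu P_+ (the conjugate
    of the m-transport, which is the identity in the m-representation)."""
    r = a / mu
    return r * nu - np.sum(r * nu) * nu
```

For instance, `fisher_metric(nu, e_transport(mu, nu, a), b)` agrees with `fisher_metric(mu, a, b)` for any a, b ∈ S_0(I), which is precisely the conjugacy relation above with the m-transport acting as the identity.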
The proposition reveals that (A1) and (A2) are sufficient conditions for the statistical identification of θ(μ). In the experiment, independent realizations of (x, y) are observed, with which ρ_ν(μ) is estimated. The existence of a one-to-one correspondence between ρ_ν(μ) and θ(μ) implies that the value of θ(μ) can be statistically estimated from the observations.
Because ρ_ν and ρ_μ are linear mappings, their differentials are (dρ_ν)_μ = ρ_ν and (dρ_μ)_ν = ρ_μ. Since ρ(μ, ν) is bilinear in (μ, ν), its differential at (μ, ν) is given by dρ_{(μ,ν)}(σ, η) = ρ(σ, ν) + ρ(μ, η). The tangent spaces of E_ν and E_μ are orthogonal to one another. The adjoint operator (dρ_ν)*_μ is determined by expression (12), where τ = Σ_{j=1}^m Σ_k τ_{j,k} δ_{j,k} and {δ_{j,k}} is the basis of S(X × Y). The operator is the adjoint of (dρ_ν)_μ. Note that the definition (12) of (dρ_ν)*_μ is independent of ν.

Optimal design
Suppose that the goal of the experiment is to estimate the value of θ : P_+(W) → R at a certain point μ. In the following, we assume that the differentiability condition of Ref. [22] holds for each (μ, ν) ∈ P_+(W) × P_+(X); a regular estimation of θ(μ) is possible only if this condition holds.
Equation (13) is typically referred to as the score equation. The solution ∂κ_ν to the equation introduces a vector field ∂κ on E. The optimal design is defined as a minimizer of the Cramér-Rao lower bound λ(θ | ν) for the estimation of θ = θ(μ). The lower bound can be found by computing the inverse of the Fisher information matrix, that is, the variance matrix of the score; however, this computation involves complex matrix calculations, and the bound must then be minimized over ν to determine the optimal design.
An expression for the lower bound can be obtained by characterizing it as the supremum of the Cramér-Rao lower bounds of one-dimensional submodels. Let ε > 0 be sufficiently small, and consider a smooth path t ∈ (−ε, ε) → μ_t ∈ P_+(W) that passes through μ at t = 0 with velocity σ := (d/dt) μ_t |_{t=0} ∈ S_0(W). The Cramér-Rao lower bound for the 'true' value t = 0 is the inverse of the Fisher information of the submodel at t = 0. Because the Fisher information of the submodel is ‖(dρ_ν)_μ σ‖²_{ρ(μ,ν)}, the lower bound for θ = θ(μ) along the one-parameter submodel t → ρ_ν(μ_t) is given by

λ(σ) = g_μ((∂θ)_μ, σ)² / ‖(dρ_ν)_μ σ‖²_{ρ(μ,ν)}.

Let t̂_S be the efficient estimator of t = 0 attaining the lower bound, where S denotes the sample size. Given the submodel t → ρ_ν(μ_t), the efficient estimator of θ(μ) is θ̂_S = θ(μ_{t̂_S}), and the delta method yields its asymptotic variance (see, e.g., Theorem 1.12 of Shao [21]). Because (d/dt) θ(μ_t) |_{t=0} = g_μ((∂θ)_μ, σ), this asymptotic variance equals λ(σ) by the score equation (13). The Cramér-Rao lower bound for the full model is equal to the supremum of λ(σ) over the submodels t → ρ_ν(μ_t) [5,22]. The supremum of λ(σ) is attained by σ such that (∂κ_ν)_{ρ(μ,ν)} = (dρ_ν)_μ σ. A submodel having this tangent vector σ at μ produces the largest variance for estimating θ(μ) among all submodels; such submodels are called the least favorable, or hardest, submodels [23].
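The supremum characterization can be checked numerically in a small binary-response model. The sketch below uses an illustrative support W = {1, 2, 3}, bids X = {1.5, 2.5}, and distributions μ and ν (all assumptions made for the example, not taken from the text), writes the submodel Fisher information as a quadratic form, and maximizes λ(σ) in closed form; the maximizer is the least favorable direction.

```python
import numpy as np

# Least-favorable submodel for the binary response y = 1{omega < x},
# with theta(mu) = E[omega]. All concrete numbers are illustrative.
mu = np.array([0.3, 0.4, 0.3])    # distribution of omega on W = {1, 2, 3}
nu = np.array([0.5, 0.5])         # bid design on X = {1.5, 2.5}
w = np.array([1.0, 2.0, 3.0])     # support points of W

F = np.array([mu[0], mu[0] + mu[1]])   # rejection probability at each bid
v = F * (1.0 - F)                      # Bernoulli variance at each bid

def info(s):
    """Fisher information of t -> rho_nu(mu + t*sigma),
    sigma = (s1, s2, -s1-s2) in S_0(W)."""
    dF = np.array([s[0], s[0] + s[1]]) # perturbation of F at each bid
    return float(np.sum(nu * dF**2 / v))

def lam(s):
    """Cramer-Rao bound along the submodel: (d theta sigma)^2 / info."""
    sigma = np.array([s[0], s[1], -s[0] - s[1]])
    return float(w @ sigma) ** 2 / info(s)

# Quadratic-form representation: info(s) = s' A s, d theta sigma = b' s.
A = (nu[0] / v[0]) * np.outer([1, 0], [1, 0]) \
  + (nu[1] / v[1]) * np.outer([1, 1], [1, 1])
b = np.array([-2.0, -1.0])

s_star = np.linalg.solve(A, b)            # least favorable direction
bound = float(b @ np.linalg.inv(A) @ b)   # sup over all submodels

print("full-model Cramer-Rao bound:", bound)
print("attained along s*:", lam(s_star))
```

Every randomly chosen direction s yields λ(s) ≤ bound, with equality exactly along the least favorable direction s*, illustrating the supremum characterization.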
Corollary 1 ν is the optimal design for θ(μ) if and only if the following condition (22) holds for all x ∈ X.
Proof From the definition of ∇^{(e)}, we have the following equation, which holds for an arbitrary η ∈ S_0(X) if and only if (22) is satisfied.
The intuition behind this condition can be obtained from the following expression: if the lower bound is minimized at ν, then any small perturbation added to ν does not change the value of λ(θ | ν) to first order. This is possible if and only if the integrand on the right-hand side is independent of x.
Example 1 To see how the theorem works, let us consider the trivial response function ρ(ω, x) = ω + x, where W and X are subsets of R. In this case, the joint density of (x, y) is ρ(μ, ν)(x, y) = ν(x) μ(y − x), and the differential of ρ_ν and its adjoint follow accordingly. The score equation (∂θ)_μ(ω) = Σ_{j=1}^m ∂κ_ν(x_j, ω + x_j) is solved by ∂κ_ν(x, y) = (∂θ)_μ(y − x) ν(x). For every η ∈ T_ν P_+(X) and H = ρ(μ, η), condition (19) is trivially satisfied at an arbitrary ν. In this example, ω is always observable because ρ is invertible as ω = y − x, so the distribution of x does not affect the estimation efficiency. Thus, the choice of ν becomes significant only when a model with information loss is estimated. The optimality can also be checked by applying the corollary: in this example, the integrand of (22) is independent of x, so an arbitrary ν is optimal when ρ(ω, x) = ω + x.
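A Monte Carlo check of this example: because ω = y − x is recovered exactly, the sampling variance of the mean estimate is the same under any design ν. The normal WTP law and the bid values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_estimate_var(nu, bids, n_rep=2000, n=200):
    """Replicated variance of the mean-WTP estimate under design nu."""
    ests = np.empty(n_rep)
    for r in range(n_rep):
        x = rng.choice(bids, size=n, p=nu)
        omega = rng.normal(5.0, 1.0, size=n)  # hypothetical WTP law
        y = omega + x                          # observed response
        ests[r] = np.mean(y - x)               # invert rho to recover omega
    return ests.var()

bids = np.array([0.0, 1.0, 2.0])
v_uniform = mean_estimate_var(np.array([1/3, 1/3, 1/3]), bids)
v_skewed = mean_estimate_var(np.array([0.8, 0.1, 0.1]), bids)
print(v_uniform, v_skewed)   # both close to Var(omega)/n = 1/200 = 0.005
```

Both designs give essentially the same variance, confirming that ν is irrelevant for this invertible response.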

Binary response experiment
The optimal design for estimating the mean WTP Eω under the binary response (1) and nonparametric μ was proposed by Duffield and Patterson [11], who directly minimized the asymptotic variance of the maximum likelihood estimator of Eω. In this section, we apply Theorem 1 to replicate their result.
However, this design is not feasible because it contains the unknown μ. A feasible alternative is the min-max design, defined as the minimizer of the maximal risk over μ. In the binary experiment, the maximal risk for estimating θ(μ) = ∫ f dμ is equal to the following, where the supremum is attained at μ = δ_1/2 + δ_n/2: The risk is minimized by the following expression: In particular, when W is equally spaced so that the spacings between successive points ξ_i are constant, the min-max design for estimating E_μ ω becomes the uniform distribution on X. Therefore, the uniform design is theoretically justified for the binary response experiment to estimate the mean.
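The min-max computation can be sketched numerically. The code below assumes the separable max-risk form Σ_j Δ_j²/(4ν_j) obtained at μ = δ_1/2 + δ_n/2 (the separable form itself is an assumption consistent with the text, and the spacings Δ_j are illustrative) and checks that the proportional-to-spacing design, which is uniform under equal spacing, minimizes it.

```python
import numpy as np

# Assumed separable max-risk: sup over mu of the estimation risk is
# attained at F_j = 1/2 (mu = delta_1/2 + delta_n/2), giving
#   R(nu) = sum_j Delta_j^2 / (4 nu_j).
def max_risk(nu, deltas):
    return float(np.sum(deltas**2 / (4.0 * nu)))

deltas = np.ones(4)                  # equally spaced W (illustrative)
nu_star = deltas / deltas.sum()      # proportional-to-spacing -> uniform

# By Cauchy-Schwarz, sum_j Delta_j^2 / nu_j >= (sum_j Delta_j)^2, with
# equality iff nu_j is proportional to Delta_j; compare against random
# competing designs drawn from a Dirichlet distribution.
rng = np.random.default_rng(2)
for _ in range(200):
    p = rng.dirichlet(np.ones(4))
    assert max_risk(nu_star, deltas) <= max_risk(p, deltas) + 1e-9

print("min-max risk:", max_risk(nu_star, deltas))  # (sum Delta)^2 / 4
```

No random design beats the uniform one, matching the conclusion that the uniform design is min-max optimal for equally spaced W.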

Conclusions
In this study, the optimal design problem of the CVM experiment was examined from the perspective of information geometry. The problem is formulated as the minimization, over a statistical manifold of finite probability measures, of the Cramér-Rao lower bound, which is equal to the squared Fisher norm of the gradient vector of the parameter functional to be estimated. The problem is solved using the duality of the (e, m)-connections on the manifold. The necessary and sufficient condition for the minimization is stated as the orthogonality between the gradient and its e-connection. The result is applied to a classical binary experiment to confirm that it replicates the results of Ref. [11].
In this study, finite probability measures were considered to avoid the technical difficulties of infinite-dimensional spaces. To enhance the applicability of the results, generalizing the model to an infinite-dimensional manifold is critical, as is finding further applications. In the "double-bounded" CVM, for example, each respondent is posed a second question depending on the response to the first: if the first offer is accepted, the second bid is set higher than the first; if the first offer is rejected, the second bid is set lower. The response function is therefore given by the following expression:

ρ(ω, (x, x′)) = (1{ω < x}, 1{ω < x′}),
where x is the first bid and x′ the second. The statistical efficiency of the double-bounded CVM is considerably higher than that of the conventional single-bounded CVM [14]. Asymptotic properties of the nonparametric estimation of the model were studied extensively by Groeneboom and Jongbloed [12]. In future work, the optimal distribution of the sequential bidding prices (x, x′) can be determined by applying the results of this study.
which allows us to identify T_μ S(I) with S(I). In the following, tangent vectors and spaces are always given in the form of their m-representations. Let M_+(I) = {μ ∈ S(I) : μ_i > 0, i ∈ I}. As an open submanifold of S(I), the tangent space of M_+(I) is identified with S(I). Given two tangent vectors a and b in T_μ M_+(I), the Radon-Nikodym derivatives with respect to μ are denoted by a/μ and b/μ. Let P_+(I) = {μ ∈ M_+(I) : Σ_{i=1}^n μ_i = 1}, which is the set of positive probability measures on I. The tangent space T_μ P_+(I) is identified with S_0(I) := {μ ∈ S(I) : Σ_{i=1}^n μ_i = 0}.
C g(A, B) = g(∇^{(m)}_C A, B) + g(A, ∇^{(e)}_C B)

for three arbitrary vector fields A, B, and C, where g(A, B) denotes the function μ → g_μ(a_μ, b_μ).