Treatment choice, mean square regret and partial identification

We consider a decision maker who faces a binary treatment choice when their welfare is only partially identified from data. We contribute to the literature by anchoring our finite-sample analysis on mean square regret, a decision criterion advocated by Kitagawa, Lee, and Qiu (2022, "Treatment Choice with Nonlinear Regret"). We find that optimal rules are always fractional, irrespective of the width of the identified set and the precision of its estimate. The optimal treatment fraction is a simple logistic transformation of the commonly used t-statistic multiplied by a factor calculated via a simple constrained optimization. This treatment fraction moves closer to 0.5 as the identified set becomes wider, implying that the decision maker becomes more cautious against the adversarial Nature.


Introduction
Evidence-based policy making has been a keyword among researchers in the social sciences and practitioners of public policy. A central question in evidence-based policy making is: how should a policy maker choose an optimal policy given information gathered from finite data? The seminal work of Manski (2004) advocates approaching this question via the framework of statistical treatment choice, where the planner's policy choice is formulated based on the statistical decision theory of Wald (1950).
Ultimately, the selection of an optimal policy depends on the criterion of the decision maker. In the literature on statistical treatment choice, a widely used notion is regret (Savage, 1951), essentially the sub-optimality welfare gap between a policy under investigation and the oracle first-best policy. Furthermore, a common practice is to select optimal rules via minimax regret, which ranks decision rules via their worst-case expected regret over the underlying state of nature governing the sampling distribution and the causal effects of the policy.
In a setting with point-identified welfare, optimal decision rules based on minimax regret are often singleton rules (e.g., Stoye 2009a and Tetenov 2012b); i.e., they dictate either treating everyone or treating no one in the whole population given the realized values of sample data. In a setting with partially identified welfare, minimax regret optimal rules can be either singleton or non-singleton rules; see, for example, Manski (2009); Tetenov (2012a); Stoye (2012); Yata (2021). Recently, in a point-identified case, Kitagawa, Lee, and Qiu (2022) found that singleton rules can be sensitive to sampling uncertainty and may incur a high chance of large welfare loss. As a result, Kitagawa et al. (2022) advocate the use of nonlinear regret to rank decision rules. In particular, they recommend mean square regret as a default, which penalizes rules with a large variance of regret. This approach aligns with the choice of a decision maker who displays regret aversion, as axiomatized by Hayashi (2008). In a binary treatment setup, Kitagawa et al. (2022) show that minimax optimal decision rules under mean square regret are always fractional and follow a simple form: a logistic transformation of the commonly used t-statistic for the welfare contrast.
The particular minimax optimal rules derived in Kitagawa et al. (2022) focus on the case with point-identified welfare. That is, as sample data become large, the decision maker is able to learn the true welfare of each treatment and thus also the true optimal treatment policy. While this assumption can be satisfied in many scenarios involving experimental data, there are still plenty of situations in which it might reasonably be questioned. For example, even in randomized control trials (RCTs), concerns about external validity may leave the treatment effect of the target population only partially identified. Our main result is a minimax optimal rule of the form (1.1): a logistic transformation of 2a* t, where t is the t-statistic for the (point-identified) average treatment effect of the experimental population in the RCT, and a* ∈ (0, 1.23) is the solution of a simple constrained optimization problem that depends on the ratio of two key parameters: the width of the identified set k, and the standard deviation σ of the estimate of the identified set. In the absence of partial identification, k = 0 and a* = 1.23, and (1.1) becomes the rule derived by Kitagawa et al. (2022).
The form of rule (1.1) is consistent with the findings of Kitagawa et al. (2022): minimax optimal rules under mean square regret are always fractional, irrespective of the magnitudes of k and σ. Moreover, a* is the center of the identified set under the least favorable prior, and (1.1) is the posterior probability, under that least favorable prior, that the treatment effect of the target population is positive. Due to partial identification, the location of a* needs to be calibrated in a case-by-case manner. We show that a* < 1.23, so that the treatment fraction given t > 0 is strictly smaller than that in a point-identified case. Therefore, a direct impact of partial identification on treatment choice is that it further disciplines the planner to be more cautious against the adversarial Nature. That is, optimal decision rules allocate a larger fraction of the population to the opposite treatment, compared to the point-identified case.
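The structure of rule (1.1) can be sketched numerically. The helper below is a hypothetical illustration, not the paper's code: it takes the factor a* as given (its computation is described in Section 3) and maps the t-statistic to a treated fraction via the logistic transformation.

```python
import math

def treatment_fraction(theta_hat, sigma, a_star):
    """Treated fraction in the form of (1.1): a logistic transformation of
    the t-statistic theta_hat / sigma, scaled by 2 * a_star."""
    t = theta_hat / sigma
    return 1.0 / (1.0 + math.exp(-2.0 * a_star * t))

# At t = 0 the rule treats exactly half the population; a smaller a*
# (wider identified set) pulls every fraction toward the cautious value 0.5.
print(treatment_fraction(0.0, 1.0, 1.23))  # 0.5
print(treatment_fraction(1.0, 1.0, 1.23))  # roughly 0.92
print(treatment_fraction(1.0, 1.0, 0.3))   # closer to 0.5
```

Note the rule never reaches 0 or 1 for finite t, which is the "always fractional" property emphasized throughout the paper.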
Our results draw a sharp contrast with the existing results of Stoye (2012) and Yata (2021), who derive minimax optimal rules under the same framework but with mean regret. Firstly, their results show that optimal decision rules are fractional only when k is large enough relative to σ. If k is sufficiently small, minimax regret optimal rules are still singleton rules. With our mean square regret criterion, minimax optimal rules are always fractional. Secondly, if mean regret is the risk function, whenever a fractional rule is optimal, the corresponding least favorable prior pins down the center of the identified set at a value of 0; i.e., under the least favorable prior, data are uninformative regarding the sign of the treatment effect of the target population. In contrast, under mean square regret, the least favorable prior for the center of the identified set supports two points symmetric around 0, so that the decision maker can update that prior with the data.
Due to the set-identified nature of the welfare and the nonlinear nature of the mean square regret, the derivation of our results is more delicate than those in the existing literature. Indeed, the form of the optimal decision rule depends explicitly on the location of the least favorable prior, which changes with the ratio of k and σ. Following Donoho (1994) and Yata (2021), we find our minimax optimal rule by searching for the hardest one-dimensional subproblem and verifying that the minimax optimal rule for the hardest one-dimensional subproblem is indeed minimax optimal for the whole problem. This approach is different from, but closely related to, the guess-and-verify approach (as exploited in Stoye, 2009a, 2012; Kitagawa et al., 2022; Azevedo et al., 2023, among others). As we demonstrate from Section 3 below, the approach of searching for the one-dimensional subproblem still has a "guessing" component as well as a "verifying" component. In fact, one may view finding the hardest one-dimensional subproblem as one specific way of figuring out the least favorable prior. Technically, in our problem, one can still try to figure out the structure of the least favorable prior based on prior work (e.g., Stoye 2012) without using the techniques employed in this paper. Hence, it is not entirely clear which approach has a clear advantage in solving these minimax problems. It is beyond the scope of this paper to investigate optimal rules with mean square regret under the multivariate-signal setting considered by Yata (2021), but we conjecture that the analysis in this paper may extend to that setting.
The rest of the paper is organised as follows. Section 2 introduces our setup. Section 3 presents steps to derive our new minimax mean square regret optimal rules via finding the hardest one-dimensional subproblem. Section 4 concludes.

Setup
Our analysis begins with the basic framework of optimal treatment choice with partially identified welfare and with finite-sample data (see also Manski 2000; Brock 2006; Manski 2007b, 2009; Tetenov 2012a; Stoye 2012 for earlier investigations). A decision maker contemplates assigning a binary treatment D ∈ {0, 1} to an infinitely large population, which we call the target population. Let Y_t(1) be the potential outcome of the target population when D = 1 (treatment), and Y_t(0) be the potential outcome of the target population when D = 0 (control). Denote by P_t ∈ P the joint distribution of {Y_t(1), Y_t(0)}. We assume that the planner aims to maximize the mean outcome of the target population. Define the average treatment effect of the target population as θ_t := E_t[Y_t(1) − Y_t(0)], where E_t[·] denotes the expectation with respect to P_t. Then, it is easy to see that the infeasible optimal treatment policy for the target population is 1{θ_t ≥ 0}. To learn about the unknown parameter θ_t ∈ R, the decision maker has access to finite data collected from some RCTs. However, we assume that the RCTs are implemented on a population, which we call the experimental population, that is potentially different from the target population. That is, the decision maker is concerned about the external validity of the RCT: the data only have limited validity, and the RCTs only partially identify the true parameter of interest θ_t. To derive finite-sample optimality results, we assume that the RCTs have internal validity, so that the decision maker is able to derive a normally distributed estimator θ̂_e ∈ R for the average treatment effect of the experimental population. That is,

θ̂_e ∼ N(θ_e, σ²),

where θ_e ∈ R is the unknown average treatment effect of the experimental population, and σ² > 0 is known. Note θ_e is the point-identified reduced-form parameter, and θ_e is potentially different from θ_t, the parameter of interest that the decision maker really cares about. Without any assumptions on the relationship between θ_e and θ_t, the problem becomes trivial, as θ_e and θ_t can be arbitrarily different, so that nothing can be learnt from the RCTs about θ_t. In that sense, data are completely useless. The potential usefulness of data in revealing the true unknown θ_t lies in the following key assumption: for each θ_e ∈ R, the decision maker knows a priori that the difference between θ_t and θ_e can be at most k, a known constant. That is, the identified set for θ_t is:

I(θ_e) := [θ_e − k, θ_e + k], (2.1)

with k ≥ 0 known. Note the case of k = 0 corresponds to the point-identified case in which θ_t and θ_e coincide. The case of k = ∞ corresponds to the case when the RCT data are completely uninformative about the true θ_t.
Remark 2.1. The shape of the identified set I(θ_e) in (2.1) is a symmetric interval around θ_e. Moreover, the upper and lower bounds of I(θ_e) are both affine in θ_e with the same gradient. Such a nice structure facilitates finite-sample analysis and arises in many problems, including missing data (Manski, 1989), extrapolation under a Lipschitz assumption (Stoye, 2012; Ishihara and Kitagawa, 2021; Yata, 2021), and welfare bounds with an externally invalid experimental population (Adjaho and Christensen, 2022; Kido, 2022). However, there are also many situations in which I(θ_e) does not have the nice form in (2.1). Deriving finite-sample results is then more challenging and beyond the scope of this paper; we leave it for future research.
The decision maker needs to choose a statistical treatment rule that maps the empirical evidence summarized by θ̂_e ∈ R to the unit interval, δ̂ : R → [0, 1], where δ̂(x) is the fraction of the target population to be treated after the policy maker observes θ̂_e = x. Note we assume that the primitive action space for the planner is [0, 1]. That is, fractional treatment allocation according to some randomization device is allowed after data have been observed.
We deviate from the existing literature on treatment choice by evaluating the performance of δ̂ via mean square regret, a decision criterion advocated by Kitagawa et al. (2022) as a special case of nonlinear regret. In a setting with point-identified welfare and with finite-sample data, Kitagawa et al. (2022) observe that optimal rules are usually singleton rules and are sensitive to sampling uncertainty. To alleviate concerns regarding the robustness of optimal decision rules with respect to sampling uncertainty, Kitagawa et al. (2022) advocate the criterion of nonlinear regret, which incorporates other useful information from the regret distribution (e.g., the second or higher moments), while the standard regret criterion only focuses on the mean of the regret distribution. In particular, the mean square regret criterion penalizes rules with a large variance of regret and yields optimal treatment fractions with a simple formula. From the perspective of decision theory, mean square regret also characterizes the choice behaviour of a decision maker who displays regret aversion, a notion axiomatized by Hayashi (2008). A natural open question, which we address in this paper, is how the optimal rules change under the mean square regret criterion if the welfare is partially identified. To proceed, note that applying δ̂ to the target population yields a welfare of

W(δ̂, P_t) := E_t[Y_t(0)] + θ_t δ̂(θ̂_e)

and a regret of

Reg(δ̂, P_t) := max{θ_t, 0} − θ_t δ̂(θ̂_e) = θ_t(1{θ_t ≥ 0} − δ̂(θ̂_e))

to the planner. The mean square regret of δ̂ is defined as

R_sq(δ̂, θ_e, P_t) := E_θe[Reg(δ̂, P_t)²],

where E_θe[·] is with respect to the RCT data θ̂_e ∼ N(θ_e, σ²). As Reg(δ̂, P_t) depends on P_t only through θ_t, we can simplify R_sq(δ̂, θ_e, P_t) as

R_sq(δ̂, θ) := E_θe[θ_t²(1{θ_t ≥ 0} − δ̂(θ̂_e))²],

where θ := (θ_e, θ_t)′ ∈ Θ ⊆ R² collects the unknown parameters in the problem, and Θ := {(θ_e, θ_t)′ : θ_e ∈ R, θ_t ∈ I(θ_e)} is the associated parameter space.
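The mean square regret criterion is straightforward to approximate by simulation. The sketch below is illustrative only (function names are mine): it treats regret as θ_t(1{θ_t ≥ 0} − δ̂(θ̂_e)), my reading of the truncated display, and averages its square over draws of θ̂_e ∼ N(θ_e, σ²).

```python
import numpy as np

def mean_square_regret(delta, theta_e, theta_t, sigma, n_sims=100_000, seed=0):
    """Simulated mean square regret of a rule delta: R -> [0, 1].
    Averages the squared regret theta_t * (1{theta_t >= 0} - delta(theta_hat))
    over simulated draws theta_hat ~ N(theta_e, sigma^2)."""
    rng = np.random.default_rng(seed)
    theta_hat = rng.normal(theta_e, sigma, n_sims)
    regret = theta_t * (float(theta_t >= 0) - delta(theta_hat))
    return float(np.mean(regret ** 2))

# A constant rule delta = 1/2 has deterministic regret theta_t / 2, so its
# mean square regret is theta_t^2 / 4 regardless of theta_e.
half_rule = lambda x: 0.5 * np.ones_like(x)
print(mean_square_regret(half_rule, theta_e=0.0, theta_t=1.0, sigma=1.0))  # 0.25
```

This also previews the uninformative-data case of Remark 3.3, where the constant 1/2 rule turns out to be minimax optimal.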

Minimax optimal rules
We aim to find a minimax optimal rule in terms of mean square regret. Viewing R_sq(δ̂, θ) as the risk function in statistical decision theory, we introduce the following standard definition of minimax optimality.
Definition 3.1. Let D be a set of statistical decision rules that are functions of θ̂_e. A rule δ̂* is mean square regret minimax optimal if

sup_{θ∈Θ} R_sq(δ̂*, θ) = inf_{δ̂∈D} sup_{θ∈Θ} R_sq(δ̂, θ).

Since θ ∈ Θ is a two-dimensional parameter, finding a minimax optimal rule is more challenging than in a point-identified case, which can be viewed as the special case when θ_e = θ_t and the unknown parameter is one-dimensional. That said, note the standard guess-and-verify approach (Kitagawa et al., 2022, Proposition 4.2) is still valid. In theory, we can still try to figure out a least favorable prior in R² and show that the Bayes optimal rule with respect to that hypothetical least favorable prior, say δ̂_π, satisfies

sup_{θ∈Θ} R_sq(δ̂_π, θ) = r(δ̂_π),

where r(δ̂_π) is the Bayes mean square regret of δ̂_π under the hypothetical least favorable prior.
Here, we take a different, but related approach that was adopted by Yata (2021), who follows Donoho (1994) to find a minimax optimal rule by searching for a hardest one-dimensional subproblem. We discuss the connections between these two approaches in Section 3.2 and Remark 3.4.
Below, we present the core results of this paper. We first review and extend some existing results on the one-dimensional problem, which will be useful for the derivation of the minimax optimal rule in the one-dimensional subproblem and also for our two-dimensional problem.

Review of the existing results in the one-dimensional problem
Example 3.1 (Stylized one-dimensional problem). Let Ȳ₁ ∼ N(τ, 1) be normally distributed with an unknown mean τ ∈ [−c, c] for some 0 < c < ∞ and a known variance normalized to one, with likelihood function f(ȳ | τ) = ϕ(ȳ − τ), where ϕ(x) is the pdf of a standard normal distribution. The mean square regret of a rule δ̂ : R → [0, 1] is

R_sq(δ̂, τ) = E[τ²(1{τ ≥ 0} − δ̂(Ȳ₁))²],

where the expectation E[·] is with respect to Ȳ₁ ∼ N(τ, 1).

Lemma 3.1 (Mean square regret minimax rule in a stylized one-dimensional problem). In terms of mean square regret, a minimax optimal rule in Example 3.1 is

δ̂*(Ȳ₁) = exp(2τ_c Ȳ₁)/(1 + exp(2τ_c Ȳ₁)), where τ_c := min{c, τ*}.

Moreover, the worst-case mean square regret of δ̂* is τ_c² ρ(τ_c).

Proof. See Appendix A.
Remark 3.1. The result of Lemma 3.1 implies that when c ≥ τ*, the minimax optimal decision rule is the same as the one found in Kitagawa et al. (2022, Theorem 4.2), while the optimal rule differs when c < τ*. This result is very intuitive. We know that the global least favorable prior (when c is allowed to be as large as we want) puts equal probabilities on τ* and −τ*. If c ≥ τ*, the global least favorable prior is always feasible, so the minimax optimal rule must remain the same. If c < τ*, the global least favorable prior is no longer feasible. Instead, Lemma 3.1 shows that the constrained least favorable prior when c < τ* puts equal probabilities on the boundary points c and −c, and the minimax optimal rule is the Bayes optimal rule with respect to that constrained least favorable prior.
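The Bayes rule under this two-point prior has a closed form: the posterior probability that τ equals the positive support point reduces to a logistic transformation, because the likelihood ratio ϕ(y − c)/ϕ(y + c) equals exp(2cy). A quick numerical check of this identity (helper names are mine):

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def posterior_prob(y, c):
    """Posterior P(tau = c | Y1 = y) under the equal-weight prior on {-c, c}."""
    return phi(y - c) / (phi(y - c) + phi(y + c))

def logistic_rule(y, c):
    """The same posterior in logistic form, since phi(y-c)/phi(y+c) = exp(2cy)."""
    return 1.0 / (1.0 + math.exp(-2.0 * c * y))

for y in (-1.5, 0.0, 0.7, 2.0):
    assert abs(posterior_prob(y, 0.8) - logistic_rule(y, 0.8)) < 1e-12
print("identity verified")
```

So the constrained least favorable prior on ±c yields exactly the logistic rule with slope 2c, matching the c < τ* branch of Lemma 3.1.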

One-dimensional subproblem
In this and the next subsection, we explain in detail how to derive a minimax optimal rule under mean square regret using the approach taken by Donoho (1994) and Yata (2021).
The key idea is to find a one-dimensional subproblem (which we know how to solve from the results in Section 3.1) that is as difficult as the original two-dimensional problem. In this particular example, as the parameter space Θ ⊆ R² is symmetric, it is natural to consider a one-dimensional subproblem in which the parameter space is simply the line connecting two symmetric points around (0, 0)′ in Θ (to be formally introduced below). For such a one-dimensional subproblem, we can use Lemma 3.1 to find its minimax optimal rule and the associated worst-case mean square regret. Then, we search among all such one-dimensional subproblems. The one with the largest worst-case mean square regret is our hardest one-dimensional subproblem, and its associated minimax rule is our "guess" of the minimax optimal rule for the original two-dimensional problem. A final crucial step is to verify that this candidate minimax rule derived from the hardest one-dimensional subproblem is indeed a minimax rule of the original problem; this corresponds to the "verifying" step. Therefore, the approach taken by Donoho (1994) and Yata (2021) still has a "guessing" component and a "verifying" component, and is very much related to the guess-and-verify approach that focuses on finding a least favorable prior (exploited in, e.g., Stoye 2009a, 2012; Kitagawa et al. 2022; Azevedo et al. 2023). We further discuss the connections between the two approaches in Remark 3.4.

To be more concrete, a one-dimensional subproblem embedded in the two-dimensional problem can be constructed as follows. Let a_e ≥ 0 and a_t ∈ I(a_e) be two known constants, and let the one-dimensional parameter space be Θ_{a_e,a_t} := {s · (a_e, a_t)′ : s ∈ [−1, 1]} ⊆ Θ. For a_e > 0, the rescaled statistic ŝ := (a_t/a_e) θ̂_e is normally distributed with an unknown mean s a_t (since s is unknown) and with a known variance (a_t/a_e)² σ². Note that s a_t is the average treatment effect of the target population. We may then apply Lemma 3.1 to characterize a minimax optimal rule for the one-dimensional subproblem. The case when a_e = 0, in contrast, requires a separate consideration, as this corresponds to the case when the data θ̂_e ∼ N(0, σ²) reveal no information regarding s. See Remark 3.3 for further discussion. Considering both cases a_e > 0 and a_e = 0, we have the following lemma.

Lemma 3.2 (Mean square regret minimax rule of a one-dimensional subproblem). Write ã_e := a_e/σ and Λ(x) := e^x/(1 + e^x). A minimax optimal rule for the one-dimensional subproblem is

δ̂*_{a_e,a_t}(θ̂_e) = Λ(2 min{ã_e, τ*} (a_t/|a_t|) θ̂_e/σ) for a_e > 0, and δ̂*_{0,a_t} = 1/2.

Moreover, the worst-case mean square regret of δ̂*_{a_e,a_t} is

a_t² (min{ã_e, τ*}/ã_e)² ρ(min{ã_e, τ*}) for a_e > 0, and a_t²/4 for a_e = 0.

Proof. See Appendix A.
Remark 3.2. The interpretation of the minimax optimal rule in the one-dimensional subproblem is as follows. Intuitively, note that as long as a_e ≠ 0, t := (a_t/|a_t|) θ̂_e/σ is a standard t-statistic. Consistent with the conclusion from Kitagawa et al. (2022), a minimax optimal rule in this parametric problem is a logistic transformation of t. If a_e/σ ≥ τ*, then the minimax optimal rule is a logistic transformation of 2τ* t. If, in contrast, 0 < a_e/σ < τ*, then the minimax optimal rule is a logistic transformation of 2(a_e/σ) t. As we can see, if t > 0, the treatment fraction when 0 < a_e/σ < τ* is smaller than in the case when a_e/σ ≥ τ*. Such a structure has intuitive implications for the minimax optimal rule derived later. See Remark 3.5 for a further discussion.
Remark 3.3. The situation when a_e = 0 is particularly interesting and demonstrates a further difference between the criterion of mean square regret and that of mean regret. If a_e = 0, then θ̂_e ∼ N(0, σ²). That is, the data are completely uninformative and reveal no information regarding the unknown s. This subproblem coincides with what was analyzed by Manski (2007a). If the mean of the regret is the criterion, Manski (2007a) shows that any rule δ̂ such that E[δ̂(θ̂_e)] = 1/2 is minimax optimal, where the expectation is with respect to θ̂_e ∼ N(0, σ²). That is, there are many minimax optimal rules for this particular subproblem. Using the uninformative data can still be minimax optimal under the mean regret criterion, as the random data may be utilized purely as a randomization device without affecting the mean of regret. This draws a sharp contrast with mean square regret, under which the minimax optimal rule is δ̂*_{0,a_t} = 1/2. That is, the minimax optimal rule under mean square regret is to not use the data at all and allocate a fraction of 1/2 of the whole population to treatment. Such a fractional rule may be implemented via a randomization device that does not depend on the data. This is intuitively easy to understand: any rule that (1) is optimal in terms of the mean of regret and (2) uses the random data, thereby generating a positive variance of regret, is not optimal in terms of mean square regret, as it introduces additional variance without decreasing the mean of regret.
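This contrast can be made concrete with the mean-variance decomposition used in the proof of Lemma 3.2: with uninformative data, the worst-case mean square regret of a rule depends only on its mean μ_δ and variance V_δ. The snippet below is an illustration of that decomposition as I read it from the truncated display (the function name is mine):

```python
def worst_case_msr(mu, var, a_t=1.0):
    """Worst-case mean square regret, over the unknown sign of the treatment
    effect, of a rule with mean mu and variance var when data are
    uninformative (a_e = 0): a_t^2 * max{(1 - mu)^2 + var, mu^2 + var}."""
    effect_positive = (1.0 - mu) ** 2 + var  # regret = a_t * (1 - delta)
    effect_negative = mu ** 2 + var          # regret = a_t * delta
    return a_t ** 2 * max(effect_positive, effect_negative)

# The constant rule delta = 1/2 (mu = 1/2, var = 0) attains a_t^2 / 4 ...
print(worst_case_msr(0.5, 0.0))   # 0.25
# ... while a data-driven coin flip with the same mean (var = 1/4) is strictly
# worse under mean square regret, though equivalent under mean regret.
print(worst_case_msr(0.5, 0.25))  # 0.5
```

Any variance added by randomizing with the data enters both branches of the max, so it can only hurt; this is exactly why the constant 1/2 rule is the unique sensible choice here.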

Hardest one-dimensional subproblem
From Lemma 3.2, we see that for each one-dimensional subproblem with θ ∈ Θ_{a_e,a_t}, the worst-case mean square regret of the minimax optimal rule depends on the values of a_e and a_t, both of which are assumed to be known. Let a*_e ≥ 0 and a*_t ∈ I(a*_e) be two constants. We call the problem of finding a minimax optimal rule when θ ∈ Θ_{a*_e,a*_t} the hardest one-dimensional subproblem if (a*_e, a*_t) maximizes the worst-case mean square regret of the associated minimax rule over all a_e ≥ 0 and a_t ∈ I(a_e). That is, Θ_{a*_e,a*_t} is the one-dimensional parameter space that yields the largest possible worst-case mean square regret of its associated minimax rule. If we view the minimax problem as a game between the adversarial Nature and the econometrician, then the hardest one-dimensional subproblem is the problem that Nature will pick, provided that Nature is restricted to choosing only among the one-dimensional subproblems.
Lemma 3.3. Let a* be a solution of sup_{0≤ã_e≤1.23} (ã_e + k/σ)² ρ(ã_e). (i) The hardest one-dimensional subproblem corresponds to a*_e = σa* and a*_t = σa* + k, with minimax optimal rule δ̂*_H given by Lemma 3.2. (ii) Any such a* satisfies a* > 0. (iii) a* is strictly decreasing in k and strictly increasing in σ.
Proof.See Appendix A.
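Appendix A reduces the search for the hardest subproblem to maximizing g(ã_e) = (ã_e + k/σ)² ρ(ã_e) over ã_e ∈ [0, 1.23]. The sketch below is an assumption-laden illustration, not the paper's computation: it takes ρ(a) = E[(1 − Λ(2aȲ₁))²] with Ȳ₁ ∼ N(a, 1), which is my reading of the truncated definition of ρ, and locates the maximizer a* by grid search.

```python
import numpy as np

def rho(a, y=np.linspace(-8.0, 8.0, 8001)):
    """rho(a) = E[(1 - Lambda(2 a Y))^2] with Y ~ N(a, 1), by quadrature.
    (Assumed form of rho, reconstructed from the surrounding formulas.)"""
    dy = y[1] - y[0]
    dens = np.exp(-0.5 * (y - a) ** 2) / np.sqrt(2.0 * np.pi)
    lam = 1.0 / (1.0 + np.exp(-2.0 * a * y))
    return float(np.sum((1.0 - lam) ** 2 * dens) * dy)

def a_star(k_over_sigma, tau_star=1.23, n_grid=400):
    """Grid-search argmax of g(a) = (a + k/sigma)^2 * rho(a) over [0, tau_star]."""
    grid = np.linspace(0.0, tau_star, n_grid)
    vals = [(a + k_over_sigma) ** 2 * rho(a) for a in grid]
    return float(grid[int(np.argmax(vals))])

# Consistent with Lemma 3.3(iii): a* shrinks toward 0 as k / sigma grows.
print(a_star(0.0), a_star(2.0), a_star(10.0))
```

Only the ratio k/σ enters the optimization, which is why the optimal treatment fraction depends on the identified set's width solely relative to sampling precision.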
It turns out that δ̂*_H is not only a minimax optimal rule of the hardest one-dimensional subproblem, but also a minimax optimal rule of the original two-dimensional problem. That is, choosing the hardest one-dimensional subproblem is still the adversarial Nature's best move, even if Nature is allowed to choose any parameter in the two-dimensional parameter space.
Proof.See Appendix A.
Remark 3.4. By now, we can see a clear connection between the approach taken by Donoho (1994) and Yata (2021) in finding minimax optimal decisions and the guess-and-verify approach (Kitagawa et al., 2022, Proposition 4.2). Intuitively, we can view finding the hardest one-dimensional subproblem as one way of finding the least favorable prior. Indeed, in the original two-dimensional problem, the least favorable prior can be verified to be supported on (a*σ, a*σ + k)′ and (−a*σ, −a*σ − k)′ with equal probabilities. Technically, once an econometrician figures out the structure of the least favorable prior (which is possible given prior work in the literature, e.g., Stoye 2012), they can proceed without using the techniques employed in this paper, by directly invoking Kitagawa et al. (2022, Proposition 4.2). Therefore, it is not entirely clear which approach has a relative advantage in solving these minimax problems.
Remark 3.5 (Comparison with Kitagawa et al. 2022). If the treatment effect of the target population is point-identified (k = 0), the theory of Kitagawa et al. (2022) applies and the minimax optimal rule is the logistic transformation of 2τ* θ̂_e/σ, which agrees with the conclusion from Theorem 3.1 by mechanically setting k = 0. Theorem 3.1 clearly demonstrates the effect of partial identification (k > 0) on the optimal decision rules. Partial identification moves the worst-case location of the point-identified parameter θ_e further toward zero and away from τ*: the minimax optimal rule becomes the logistic transformation of 2a* θ̂_e/σ with a* < τ*. Therefore, partial identification further encourages the decision maker to be more cautious against the adversarial Nature: the optimal treatment fraction under partial identification will be closer to 1/2 compared to a point-identified situation. From Lemma 3.3(iii), we know the value of a* decreases as k becomes larger: more partial identification results in more ambiguity, leading to more prudent or cautious treatment allocation. If k = ∞, then a* = 0 and the optimal treatment rule becomes δ̂*_H = 1/2.

Remark 3.6 (Comparison with Stoye 2012 and Yata 2021). The conclusion of Theorem 3.1 is quantitatively and qualitatively different from those of Stoye (2012) and Yata (2021), who both use the mean of regret as the risk criterion and derive optimal fractional rules when k is large enough. As shown by Stoye (2012) (and generalized by Yata (2021) to setups with multivariate signals), if the mean of the regret is the risk criterion, whether or not a minimax optimal rule is fractional depends on the magnitude of k. If k ≤ √(π/2)σ, the naive empirical success rule 1{θ̂_e ≥ 0} is minimax optimal. When k > √(π/2)σ, a minimax optimal rule is found to be fractional and admits the form δ̂* = Φ(θ̂_e/√(2k²/π − σ²)), under which the worst-case location for θ_e is at 0, i.e., when data are uninformative. Theorem 3.1 draws a very different picture compared to the existing literature: first of all, optimal rules are always fractional, irrespective of the magnitude of k. Second, the worst-case location for θ_e is at ±a*σ ≠ 0, which implies that the data are still informative regarding the true unidentified treatment effect of the target population. See Figure 3.1 for an illustration of the minimax optimal rules in terms of mean regret and mean square regret with respect to different values of k.
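For illustration, the two families of rules in this comparison can be evaluated side by side. This is a hedged sketch (names are mine, and a* is supplied as an input rather than computed): the mean-regret rule is the Stoye (2012) form quoted above, and the mean-square-regret rule is the logistic form of Theorem 3.1.

```python
import math

def mean_regret_rule(z, k, sigma):
    """Minimax rule under mean regret (Stoye, 2012): the singleton empirical
    success rule when k <= sqrt(pi/2) * sigma, otherwise the fractional rule
    Phi(z / sqrt(2 k^2 / pi - sigma^2))."""
    if k <= math.sqrt(math.pi / 2.0) * sigma:
        return 1.0 if z >= 0 else 0.0
    scale = math.sqrt(2.0 * k ** 2 / math.pi - sigma ** 2)
    return 0.5 * (1.0 + math.erf(z / (scale * math.sqrt(2.0))))

def mean_square_regret_rule(z, sigma, a_star):
    """Minimax rule under mean square regret: always fractional."""
    return 1.0 / (1.0 + math.exp(-2.0 * a_star * z / sigma))

# Under mean regret the rule jumps from 0 to 1 when k is small; under mean
# square regret it stays strictly inside (0, 1) for every realization z.
print(mean_regret_rule(0.4, k=0.5, sigma=1.0))           # 1.0 (singleton)
print(mean_square_regret_rule(0.4, sigma=1.0, a_star=0.8))
```

The discontinuity of the first rule at z = 0 for small k is exactly the sensitivity to sampling uncertainty that motivates the mean square regret criterion.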

Conclusion
In this paper, we study optimal binary treatment choice with mean square regret and with partially identified welfare, extending the analyses of Kitagawa et al. (2022). Our results lead to a simple and intuitive rule that is sharply different from the existing literature on treatment choice under partial identification with the mean regret criterion. In particular, minimax optimal rules are always fractional, irrespective of the width of the identified set. The optimal treatment fraction is a logistic transformation of the commonly used t-statistic multiplied by a factor that is calculated by a simple constrained optimization. Our results are useful for policy makers who wish to make fractional treatment assignments but are concerned that the true optimal policy cannot be identified from data. For future research, it would be interesting to consider optimal treatment choice with a general and arbitrary identified set, or with an estimated identified set. It would also be interesting to consider optimal individualised treatment choice with mean square regret.

Figure 3.1: Minimax optimal rules in the Gaussian experiment with a unit variance and an unknown mean. In each of the graphs, k represents the width of the identified set. The dashed line is the minimax optimal rule with respect to mean regret as a function of z, where z represents each possible realization of the Gaussian experiment. The solid line is the minimax optimal rule with respect to mean square regret as a function of z. Note that in the limiting case k = ∞, the two rules coincide.

A Proofs of main results
Proof of Lemma 3.1. By Remark 3.1, we focus on the case when c < τ*. Let π_c be the prior on τ that puts probability 1/2 on each of c and −c. It can be verified that the Bayes optimal rule with respect to π_c is

δ̂_{π_c}(Ȳ₁) = exp(2cȲ₁)/(1 + exp(2cȲ₁)).

By applying integration by change-of-variable, we may find that the Bayes mean square regret of δ̂_{π_c} equals its worst-case mean square regret c²ρ(c), implying δ̂_{π_c} is indeed a minimax optimal rule by Kitagawa et al. (2022, Proposition 4.2).

Proof of Lemma 3.2
We prove the lemma by considering two cases.
Case 1: a_e = 0. In this case, for each θ ∈ Θ_{0,a_t}, the data θ̂_e ∼ N(0, σ²) reveal no information regarding the unknown s. If, in addition to a_e = 0, it holds that a_t = 0, then any rule is minimax optimal. Focus on the case when a_t ≠ 0. Writing μ_δ := E[δ̂(θ̂_e)] and V_δ := Var(δ̂(θ̂_e)), we have the decomposition

R_sq(δ̂, θ) = s²a_t²[(1{s a_t ≥ 0} − μ_δ)² + V_δ].

That is, the mean square regret of each rule depends on δ̂ only via μ_δ and V_δ, both of which are independent of s. Thus, for each δ̂,

sup_{θ∈Θ_{0,a_t}} R_sq(δ̂, θ) = a_t² max{(1 − μ_δ)² + V_δ, μ_δ² + V_δ}.

As a_t ≠ 0, it is easy to see that a minimax optimal rule sets V_δ = 0 and μ_δ = 1/2. That is, δ̂*_{0,a_t} = 1/2, which means that the minimax optimal rule does not use the data θ̂_e at all. Moreover, sup_{θ∈Θ_{0,a_t}} R_sq(δ̂*_{0,a_t}, θ) = a_t²/4.

Case 2: a_e > 0. In this case, note that for each θ ∈ Θ_{a_e,a_t}, θ̂_e ∼ N(a_e s, σ²). If a_t = 0, then any rule is minimax optimal. Focus on a_t ≠ 0.
where the first equality follows from the definition, and the second equality follows from integration by change-of-variable, letting z = x/(στ_s) and δ̂₁(z) := δ̂(στ_s z).
which coincides with δ̂*_{a_e,a_t}. Furthermore, by applying Lemma 3.1 and (A.1), we derive the worst-case mean square regret of δ̂*_{a_e,a_t} as stated.

Proof of Lemma 3.3

Proof of statement (i). When a_e/σ ≥ τ*, the worst case over a_t ∈ I(a_e) is attained at a_t = a_e + k (A.3), where the first equality follows from θ_t ∈ [θ_e − k, θ_e + k], and the second equality holds because ((a_e + k)/a_e)² is decreasing in a_e. Similarly, when 0 ≤ a_e/σ < τ*, sup_{0≤a_e/σ<τ*, a_t∈I(a_e)} sup_{θ∈Θ_{a_e,a_t}} R_sq(δ̂*_{a_e,a_t}, θ) is computed in (A.4). Considering both (A.3) and (A.4), we see that finding the worst-case one-dimensional subproblem is reduced to finding sup_{0≤ã_e≤1.23} g(ã_e), with g(ã_e) := (ã_e + k/σ)² ρ(ã_e). Since ã_e = a_e/σ, the hardest one-dimensional subproblem corresponds to a*_e = σa*, a*_t = σa* + k. Applying Lemma 3.2 yields the formula for δ̂*_H and the expression for sup_{θ∈Θ_H} R_sq(δ̂*_H, θ) as stated in (i) of the current lemma.

Proof of statement (ii)
Write g(ã_e) := (ã_e + k/σ)² ρ(ã_e), which is a continuous and differentiable function. By the product rule,

g′(ã_e) = 2(ã_e + k/σ)ρ(ã_e) + (ã_e + k/σ)²ρ′(ã_e),

and direct computation shows g′(0) > 0. This implies that moving away from ã_e = 0 to a small positive number always increases g(ã_e). Thus, 0 is never a solution of sup_{0≤ã_e≤1.23} g(ã_e).

Proof of statement (iii)
By statement (ii), a* is an interior solution and must satisfy the first-order condition g′(a*) = 0. As (a* + k/σ) > 0, a* must also satisfy

2ρ(a*) + (a* + k/σ)ρ′(a*) = 0. (A.8)

Moreover, as a* is a local maximum of a continuously differentiable function, it must also satisfy the second-order condition g′′(a*) ≤ 0. Viewing the left-hand side of (A.8) as a function of a* and k, say F(a*, k), we may apply the implicit function theorem to sign the dependence of a* on k and σ.
Proof of Theorem 3.1. Firstly, note that the chain of relations (A.10) holds, where the first inequality follows from the definition of δ̂*, the second relation follows from Θ_H ⊆ Θ, and the third relation follows from the fact that δ̂*_H is a minimax optimal rule of the hardest one-dimensional subproblem. Secondly, Theorem B.1 establishes (A.11). Combining (A.10) and (A.11) yields the desired conclusion.
Lemma B.4. g(θ̂_e) has a unique sign change from + to − at a*.

Proof. Note by Lemma B.2, g(θ̂_e) = 2∫ w(y)ϕ(y − θ̂_e) dy, where w(y) is defined in (B.3). As w*(y)² δ̂*_H(y) > 0, the sign of w(y₁) is determined by its leading factor, and it is straightforward to verify the sign behaviour of w. Algebra shows

w_{a*}(y) = ϕ²(y − a*)ϕ²(y + a*)/(ϕ(y − a*) + ϕ(y + a*))³ > 0,

which satisfies w_{a*}(−y) = w_{a*}(y) for all y ∈ R, and w̄(y) is strictly decreasing from +∞ to −∞. Let t* be the unique point such that w̄(t*) = 0. All three terms in the resulting decomposition can be signed to be negative; a similar decomposition also reveals that g⁽¹⁾(a*) < 0 holds true when t* < 0. Thus, we conclude the claimed sign change of g. Below we show that g*_c(·) is increasing on [0, c], and the conclusion will follow. We take two steps.
Kitagawa et al. (2022, Example 4.1) focus on the general result when c = ∞. The following lemma extends the result of Kitagawa et al. (2022) by allowing c to be bounded and sufficiently small. Let ρ(a) := E[(1 − Λ(2aȲ₁))²], where Λ(x) := e^x/(1 + e^x) and the expectation E[·] is with respect to Ȳ₁ ∼ N(a, 1).