1 Introduction

Evidence-based policy making has become a watchword among researchers in the social sciences and practitioners of public policy. A central question in evidence-based policy making is: how should a policy maker choose an optimal policy given information gathered from finite data? The seminal work of Manski (2004) advocates approaching the question via the framework of statistical treatment choice, where the planner’s policy choice is formulated based on the statistical decision theory of Wald (1950).

Ultimately, the selection of an optimal policy depends on the criterion of the decision maker. In the literature on statistical treatment choice, a widely used notion is regret (Savage, 1951), essentially the welfare gap between a policy under investigation and the oracle first-best policy. A common practice is to select optimal rules via minimax regret, which ranks decision rules by their worst-case expected regret over the underlying state of nature governing the sampling distribution and the causal effects of the policy.

In a setting with point-identified welfare, optimal decision rules based on minimax regret are often singleton rules (e.g., Stoye, 2009a and Tetenov, 2012b), i.e., given the realized sample data, they dictate treating either everyone or no one in the population. In a setting with partially identified welfare, minimax regret optimal rules can be either singleton or non-singleton rules. See, for example, Manski (2009), Tetenov (2012a), Stoye (2012), and Yata (2021). Recently, in a point-identified case, Kitagawa et al. (2022) found that singleton rules can be sensitive to sampling uncertainty and may incur a high chance of large welfare loss. As a result, Kitagawa et al. (2022) advocate the use of nonlinear regret to rank decision rules, recommending mean square regret as a default, which penalizes rules with a large variance of regret. This approach aligns with the choice of a decision maker who displays regret aversion, as axiomatized by Hayashi (2008). In a binary treatment setup, Kitagawa et al. (2022) show that minimax optimal decision rules with mean square regret are always fractional and take the simple form of a logistic transformation of the commonly used t-statistic for the welfare contrast.

The minimax optimal rules derived in Kitagawa et al. (2022) focus on the case with point-identified welfare. That is, as the sample size grows, the decision maker is able to learn the true welfare of each treatment and thus also the true optimal treatment policy. While this assumption can be satisfied in many scenarios involving experimental data, there are plenty of situations in which it might reasonably be questioned. For example, even in randomized control trials (RCTs), outcome data under treatment or control might be missing due to noncompliance by sample units or attrition in the data-collecting process. Even without noncompliance or attrition, and when the RCTs are internally valid, researchers may be concerned about external validity, in the sense that the population to which the treatment policy is applied may differ from the population on which the RCTs are conducted.

Table 1 Treatment choice with partial identification: existing results and aim of this paper

What is the optimal treatment policy when a decision maker cares about mean square regret but faces partially identified welfare? Does the finding of Kitagawa et al. (2022) that optimal rules are fractional continue to hold under partial identification? This paper addresses these questions in a finite-sample framework, extending the analyses of Kitagawa et al. (2022). See Table 1 for an illustration of the motivation of the paper in relation to existing results in the literature. Following earlier studies by Brock (2006), Manski (2000), Manski (2007b), Tetenov (2012a), and Stoye (2012), among others, we adopt a simple but well-motivated regret-based framework in which a policy maker, who wishes to maximize the expected outcome of the population, needs to choose a binary treatment when (1) the average treatment effect of the target population is partially identified, but (2) the identified set for the average treatment effect of the target population is a symmetric interval of fixed and known length around a point-identified reduced-form parameter, for which a Gaussian sufficient statistic is available. Scenarios sharing both or either of these features have been studied by, e.g., Adjaho and Christensen (2022), Ben-Michael et al. (2022), Christensen et al. (2023), D’Adamo (2021), Ishihara and Kitagawa (2021), Kido (2022), Stoye (2012), Tetenov (2012a), and Yata (2021).

This paper contributes to the literature by developing new finite-sample optimal decision rules with mean square regret under partial identification, which, to the best of our knowledge, have not been considered elsewhere in the literature. We show that the fundamental form of the minimax optimal rules derived by Kitagawa et al. (2022) is preserved in the partial identification case. With partially identified welfare, minimax optimal rules have the following simple logistic form:

$$\begin{aligned} \frac{\exp \left( 2\cdot a^{*}\cdot {\hat{t}} \, \right) }{\exp \left( 2\cdot a^{*}\cdot {\hat{t}} \, \right) +1}, \end{aligned}$$
(1.1)

where \({\hat{t}}\) is the t-statistic for the reduced-form parameter (say, the average treatment effect of the experimental population in the RCT), and \(a^{*}\in (0,1.23)\) is the solution of a simple constrained optimization problem that depends on the ratio of two key parameters: the half-width k of the identified set and the standard deviation \(\sigma\) of the estimator of the reduced-form parameter. In the absence of partial identification, \(k=0\) and \(a^{*}=1.23\), and (1.1) reduces to the rule derived by Kitagawa et al. (2022).

The form of rule (1.1) is consistent with the findings of Kitagawa et al. (2022): minimax optimal rules with mean square regret are always fractional, irrespective of the magnitudes of k and \(\sigma\). Moreover, \(a^{*}\sigma\) is the center of the identified set under the least favorable prior, and (1.1) is the posterior probability, under that least favorable prior, that the treatment effect of the target population is positive. Due to partial identification, the location of \(a^*\) needs to be calibrated on a case-by-case basis. We show that \(a^{*}<1.23\), so that the treatment fraction given \({\hat{t}}>0\) is strictly smaller than in the point-identified case. A direct impact of partial identification on treatment choice is therefore that it disciplines the planner to be more cautious against adversarial Nature: optimal decision rules allocate a larger fraction of the population to the opposite treatment, compared to the point-identified case.
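As a simple numerical illustration of rule (1.1) (our own sketch; the smaller value of \(a^{*}\) below is hypothetical, while 1.23 is the point-identified benchmark):

```python
import math

def treatment_fraction(t_hat, a_star):
    """Rule (1.1): a logistic transformation of the t-statistic t_hat."""
    return math.exp(2 * a_star * t_hat) / (math.exp(2 * a_star * t_hat) + 1)

# At t_hat = 0 the rule treats exactly half the population, whatever a* is.
print(treatment_fraction(0.0, 1.23))   # 0.5
# For t_hat > 0, a smaller a* (as arises under partial identification)
# prescribes a strictly smaller treatment fraction than a* = 1.23.
print(treatment_fraction(1.5, 0.8) < treatment_fraction(1.5, 1.23))   # True
```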

Our results draw a sharp contrast with the existing results of Stoye (2012) and Yata (2021), who derive minimax optimal rules under the same framework but with mean regret. First, their results show that optimal decision rules are fractional only when k is large enough relative to \(\sigma\); if k is sufficiently small, minimax regret optimal rules are still singleton rules. Under our mean square regret criterion, minimax optimal rules are always fractional. Second, when mean regret is the risk function and a fractional rule is optimal, the corresponding least favorable prior pins down the center of the identified set at 0, i.e., under the least favorable prior, data is uninformative regarding the sign of the treatment effect of the target population. In contrast, under mean square regret, the least favorable prior for the center of the identified set supports two points symmetric around 0, so the decision maker can update that prior with the data.

Due to the set-identified nature of the welfare and the nonlinear nature of the mean square regret, the derivation of our results is more delicate than in the existing literature. Indeed, the form of the optimal decision rule depends explicitly on the location of the least favorable prior, which changes with the ratio of k and \(\sigma\). Following Donoho (1994) and Yata (2021), we find our minimax optimal rule by searching for the hardest one-dimensional subproblem and verifying that the minimax optimal rule for that subproblem is indeed minimax optimal for the whole problem. This approach is different from, but closely related to, the guess-and-verify approach (as exploited in Azevedo et al., 2023; Kitagawa et al., 2022; Stoye, 2009a, 2012, among others). As we demonstrate in Sect. 3 below, searching for the hardest one-dimensional subproblem still has a “guessing” component as well as a “verifying” component. In fact, one may view finding the hardest one-dimensional subproblem as one specific way of figuring out the least favorable prior. Technically, in our problem, one could still try to deduce the structure of the least favorable prior from prior work (e.g., Stoye, 2012) without using the techniques employed in this paper. Hence, it is not entirely clear which approach has a clear advantage in solving these minimax problems. It is beyond the scope of this paper to investigate optimal rules with mean square regret under the multivariate-signal setting considered by Yata (2021), but we conjecture that the analysis in this paper can be extended to that setting.

Our research is related to a rapidly growing literature on treatment choice with partially identified welfare. It is known that minimax regret optimal rules may be fractional with or without true knowledge of the identified set (Brock, 2006; Cassidy & Manski, 2019; Manski, 2000, 2002, 2005, 2007a, b, 2013, 2021; Stoye, 2009b, 2012; Tetenov, 2012a; Yata, 2021). Fractional rules also arise in a setting with point-identified but nonlinear welfare (Manski, 2009; Manski & Tetenov, 2007). Our results focus on a scenario in which the policy maker cannot differentiate between individuals in the population. There is also a large literature on individualised policy learning with concerns about partially identified welfare, including issues such as distributional robustness, external validity, or asymmetric welfare, by, e.g., Adjaho and Christensen (2022), Ben-Michael et al. (2021, 2022), Christensen et al. (2023), D’Adamo (2021), Ishihara and Kitagawa (2021), Kallus and Zhou (2018), Kido (2022), and Lei et al. (2023). When welfare is point-identified, finite-sample optimal rules are derived in Hirano and Porter (2009, 2020), Schlag (2006), Stoye (2009a), and Tetenov (2012b). Individualised treatment choice with point-identified welfare is considered in Athey and Wager (2021), Bhattacharya and Dupas (2012), Kitagawa and Tetenov (2018, 2021), Manski (2004), and Mbakop and Tabord-Meehan (2021), among others.

The rest of the paper is organised as follows. Section 2 introduces our setup. Section 3 presents steps to derive our new minimax mean square regret optimal rules via finding the hardest one-dimensional subproblem. Section 4 concludes.

2 Setup

Our analysis begins with the basic framework of optimal treatment choice with partially identified welfare and with finite-sample data (see also Brock, 2006; Manski, 2000; Manski, 2007b, 2009; Stoye, 2012; Tetenov, 2012a for earlier investigations). A decision maker contemplates assigning a binary treatment \(D\in \{0,1\}\) to an infinitely large population, which we call the target population. Let \(Y_{t}(1)\) be the potential outcome of the target population when \(D=1\) (treatment), and \(Y_{t}(0)\) be the potential outcome of the target population when \(D=0\) (control). Denote by \(P_{t}\in \mathcal {P}\) the joint distribution of \(\left\{ Y_{t}(1),Y_{t}(0)\right\}\). We assume that the planner aims to maximize the mean outcome of the target population. Define the average treatment effect of the target population as \(\theta _{t}:={\mathbb {E}}_{t}\left[ Y_{t}(1)-Y_{t}(0)\right]\), where \({\mathbb {E}}_{t}[\cdot ]\) denotes the expectation with respect to \(P_{t}\). Then the infeasible optimal treatment policy for the target population is

$$\begin{aligned} {\textbf{1}}\left\{ \theta _{t}\ge 0\right\} . \end{aligned}$$

To learn about the unknown parameter \(\theta _{t}\in {\mathbb {R}}\), the decision maker has access to finite data collected from some RCTs. However, we assume that the RCTs are implemented on a population, which we call the experimental population, that is potentially different from the target population. That is, the decision maker is concerned about the external validity of the RCTs: the data only has limited validity, and the RCTs only partially identify the true parameter of interest \(\theta _{t}\). To derive finite-sample optimality results, we assume that the RCTs have internal validity, so that the decision maker is able to derive a normally distributed estimator \({\hat{\theta }}_{e}\in {\mathbb {R}}\) for the average treatment effect of the experimental population. That is,

$$\begin{aligned} {\hat{\theta }}_{e}\sim N(\theta _{e},\sigma ^{2}), \end{aligned}$$

where \(\theta _{e}\in {\mathbb {R}}\) is the unknown average treatment effect of the experimental population, and \(\sigma ^{2}>0\) is known. Note that \(\theta _{e}\) is the point-identified reduced-form parameter and is potentially different from \(\theta _{t}\), the parameter of interest that the decision maker really cares about. Without any assumptions on the relationship between \(\theta _{e}\) and \(\theta _{t}\), the problem becomes trivial: \(\theta _{e}\) and \(\theta _{t}\) can be arbitrarily different, so nothing can be learnt from the RCTs about \(\theta _{t}\). In that sense, the data is completely useless. The potential usefulness of the data in revealing the true unknown \(\theta _{t}\) lies in the following key assumption: for each \(\theta _{e}\in {\mathbb {R}}\), the decision maker knows a priori that the absolute difference between \(\theta _{t}\) and \(\theta _{e}\) can be at most \(k\), a known constant. That is, the identified set for \(\theta _{t}\) is:

$$\begin{aligned} I(\theta _{e}):=[\theta _{e}-k,\theta _{e}+k], \forall \theta _{e}\in {\mathbb {R}}, \end{aligned}$$
(2.1)

with \(k>0\) known. Note the case of \(k=0\) corresponds to the point-identified case in which \(\theta _{t}\) and \(\theta _{e}\) coincide. The case of \(k=\infty\) corresponds to the case when RCT data is completely uninformative about the true \(\theta _{t}\).

Remark 2.1

The shape of the identified set \(I(\theta _{e})\) in (2.1) is a symmetric interval around \(\theta _{e}\). Moreover, the upper and lower bounds of \(I(\theta _{e})\) are both affine in \(\theta _{e}\) with the same gradient. Such a structure facilitates finite-sample analysis and arises in many problems, including missing data (Manski, 1989), extrapolation under a Lipschitz assumption (Ishihara and Kitagawa, 2021; Stoye, 2012; Yata, 2021), and welfare bounds with an externally invalid experimental population (Adjaho and Christensen, 2022; Kido, 2022). However, there are also many situations in which \(I(\theta _{e})\) does not take the form in (2.1). Deriving finite-sample results in such cases is more challenging and beyond the scope of this paper; we leave this for future research.

The decision maker needs to choose a statistical treatment rule that maps the empirical evidence summarized by \({\hat{\theta }}_{e} \in {\mathbb {R}}\) to the unit interval:

$$\begin{aligned} {\hat{\delta }}:{\mathbb {R}}\rightarrow [0,1], \end{aligned}$$

where \({\hat{\delta }}(x)\) is the fraction of the target population to be treated after the policy maker observes \({\hat{\theta }}_{e}=x\). Note that we assume the primitive action space for the planner is [0, 1]. That is, fractional treatment allocation according to some randomization device is allowed after the data have been observed.

We deviate from the existing literature on treatment choice by evaluating the performance of \({\hat{\delta }}\) via mean square regret, a decision criterion advocated by Kitagawa et al. (2022) as a special case of nonlinear regret. In a setting with point-identified welfare and finite-sample data, Kitagawa et al. (2022) observe that optimal rules under mean regret are usually singleton rules and are sensitive to sampling uncertainty. To alleviate concerns regarding the robustness of optimal decision rules with respect to sampling uncertainty, Kitagawa et al. (2022) advocate the criterion of nonlinear regret, which incorporates other useful information from the regret distribution (e.g., the second or higher moments), whereas the standard regret criterion only focuses on the mean of the regret distribution. In particular, the mean square regret criterion penalizes rules with a large variance of regret and yields optimal treatment fractions with a simple formula. From the perspective of decision theory, mean square regret also characterizes the choice behaviour of a decision maker who displays regret aversion, a notion axiomatized by Hayashi (2008). A natural open question, which we address in this paper, is how the optimal rules change under the mean square regret criterion when welfare is partially identified. To proceed, note that applying \({\hat{\delta }}\) to the target population yields a welfare of

$$\begin{aligned} W({\hat{\delta }},P_{t})&:={\hat{\delta }}{\mathbb {E}}_{t}\left[ Y_{t}(1)\right] +(1-{\hat{\delta }}){\mathbb {E}}_{t}\left[ Y_{t}(0)\right] \end{aligned}$$

and a regret of

$$\begin{aligned} Reg({\hat{\delta }},P_{t}):=W({\textbf{1}}\left\{ \theta _{t}\ge 0\right\} ,P_{t})-W({\hat{\delta }},P_{t})=\theta _{t}\left\{ {\textbf{1}}\{\theta _{t}\ge 0\}-{\hat{\delta }}\right\} \end{aligned}$$

to the planner. The mean square regret of \({\hat{\delta }}\) is defined as

$$\begin{aligned} R_{sq}({\hat{\delta }},\theta _{e},P_{t}):={\mathbb {E}}_{\theta _{e}}\left[ Reg^{2}({\hat{\delta }},P_{t})\right] , \end{aligned}$$

where \({\mathbb {E}}_{\theta _{e}}[\cdot ]\) is with respect to RCT data \({\hat{\theta }}_{e}\sim N(\theta _{e},\sigma ^{2})\). As \(Reg({\hat{\delta }},P_{t})\) depends on \(P_{t}\) only through \(\theta _{t}\), we can simplify \(R_{sq}({\hat{\delta }},\theta _{e},P_{t})\) as

$$\begin{aligned} R_{sq}({\hat{\delta }},\theta ):=\theta _{t}^{2}{\mathbb {E}}_{\theta _{e}} \left[ \left( {\textbf{1}}\{\theta _{t}\ge 0\}-{\hat{\delta }}\right) ^{2}\right] , \end{aligned}$$

where \(\theta :=\left( \begin{array}{c} \theta _{e}\\ \theta _{t} \end{array}\right) \in \Theta \subseteq {\mathbb {R}}^2\) are the unknown parameters in the problem, and

$$\begin{aligned} \Theta :=\left\{ (\theta _{e},\theta _{t})^{\prime }\in {\mathbb {R}}^{2}|\theta _{e}\in {\mathbb {R}},\theta _{t}\in I(\theta _{e})\right\} \end{aligned}$$

is the associated parameter space.
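For concreteness, \(R_{sq}\) is straightforward to evaluate numerically for any candidate rule. The sketch below is our own (the function name is not from the paper) and approximates the expectation over \({\hat{\theta }}_{e}\sim N(\theta _{e},\sigma ^{2})\) by grid integration:

```python
import numpy as np

def mean_square_regret(rule, theta_e, theta_t, sigma):
    """R_sq = theta_t^2 * E[(1{theta_t >= 0} - rule(theta_hat_e))^2],
    with theta_hat_e ~ N(theta_e, sigma^2), approximated on a fine grid."""
    x = np.linspace(theta_e - 8 * sigma, theta_e + 8 * sigma, 4001)
    density = np.exp(-0.5 * ((x - theta_e) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    loss = (float(theta_t >= 0) - rule(x)) ** 2
    return theta_t ** 2 * float(np.sum(loss * density) * (x[1] - x[0]))

# The constant rule 1/2 incurs regret theta_t/2 with certainty,
# so its mean square regret is theta_t^2 / 4 for any theta_e.
half = lambda x: 0.5 * np.ones_like(x)
print(round(mean_square_regret(half, theta_e=0.0, theta_t=1.0, sigma=1.0), 4))  # 0.25
```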

3 Minimax optimal rules

We aim to find a minimax optimal rule in terms of mean square regret. Viewing \(R_{sq}({\hat{\delta }},\theta )\) as the risk function in statistical decision theory, we introduce the following standard definition of minimax optimality.

Definition 3.1

Let \(\mathcal {D}\) be a set of statistical decision rules that are functions of \({\hat{\theta }}_{e}\). A rule \({\hat{\delta }}^{*}\) is mean square regret minimax optimal if it is such that

$$\begin{aligned} \sup _{\theta \in \Theta }R_{sq}({\hat{\delta }}^{*},\theta )=\min _{{\hat{\delta }}\in \mathcal {D}}\sup _{\theta \in \Theta }R_{sq}({\hat{\delta }},\theta ). \end{aligned}$$

Since \(\theta \in \Theta\) is a two-dimensional parameter, finding a minimax optimal rule is more challenging than in a point-identified case, which can be viewed as the special case when \(\theta _{e}=\theta _{t}\) and the unknown parameter is one-dimensional. That said, the standard guess-and-verify approach (Proposition 4.2, Kitagawa et al., 2022) is still valid. In theory, we could still try to figure out a least favorable prior in \({\mathbb {R}}^2\) and show that the Bayes optimal rule with respect to that hypothetical least favorable prior, say \({\hat{\delta }}_{\pi }\), satisfies

$$\begin{aligned} r({\hat{\delta }}_{\pi })=\sup _{\theta \in \Theta } R_{sq}({\hat{\delta }}_{\pi },\theta ), \end{aligned}$$

where \(r({\hat{\delta }}_{\pi })\) is the Bayes mean square regret of \({\hat{\delta }}_{\pi }\) under the hypothetical least favorable prior. Here, we take a different, but related approach that was adopted by Yata (2021), who follows Donoho (1994) to find a minimax optimal rule by searching for a hardest one-dimensional subproblem. We discuss the connections between these two approaches in Sect. 3.2 and Remark 3.4.

Below, we present the core results of this paper. We first review and extend some existing results for the one-dimensional problem, which will be useful both for the derivation of the minimax optimal rule in the one-dimensional subproblem and for our two-dimensional problem.

3.1 Review of existing results in the one-dimensional problem

Example 3.1

[Stylized one-dimensional problem] Let \({\bar{Y}}_{1}\sim N(\tau ,1)\) be normally distributed with an unknown mean \(\tau \in [-c,c]\) for some \(0<c<\infty\), and a known variance normalized to one, with the likelihood function

$$\begin{aligned} f({\bar{y}}_{1}|\tau )=\phi ({\bar{y}}_{1}-\tau ), \forall {\bar{y}}_{1}\in {\mathbb {R}}, \end{aligned}$$
(3.1)

where \(\phi (x)\) is the pdf of a standard normal distribution. The mean square regret of a rule \({\hat{\delta }}:{\mathbb {R}}\rightarrow [0,1]\) based on data \({\bar{Y}}_{1}\) is

$$\begin{aligned} R_{sq}({\hat{\delta }},\tau )=\tau ^{2}{\mathbb {E}}\left[ \left( {\textbf{1}} \{\tau \ge 0\}-{\hat{\delta }}({\bar{Y}}_{1})\right) ^{2}\right] , \end{aligned}$$

where the expectation \({\mathbb {E}}[\cdot ]\) is with respect to \({\bar{Y}}_{1}\sim N(\tau ,1)\).

Kitagawa et al. (Example 4.1, 2022) focus on the case when \(c=\infty\). The following lemma extends their result by allowing c to be finite and possibly small. Let \(\rho (a):={\mathbb {E}}\left[ \left( \frac{1}{\exp \left( 2a{\bar{Y}}_{1}\right) +1}\right) ^{2}\right]\), where the expectation \({\mathbb {E}}[\cdot ]\) is with respect to \({\bar{Y}}_{1}\sim N(a,1)\).

Lemma 3.1

(Mean square regret minimax rule in a stylized one-dimensional problem) In terms of mean square regret, a minimax optimal rule in Example 3.1 is

$$\begin{aligned} {\hat{\delta }}^{*}={\left\{ \begin{array}{ll} \frac{\exp \left( 2\cdot \tau ^{*}\cdot {\bar{Y}}_{1}\right) }{\exp \left( 2\cdot \tau ^{*}\cdot {\bar{Y}}_{1}\right) +1}, &{} \text {if }c\ge \tau ^{*},\\ \frac{\exp \left( 2\cdot c\cdot {\bar{Y}}_{1}\right) }{\exp \left( 2\cdot c\cdot {\bar{Y}}_{1}\right) +1}, &{} \text {if }c<\tau ^{*}, \end{array}\right. } \end{aligned}$$

where \(\tau ^{*}\approx 1.23\) attains \(\sup \limits _{\tau \in [0,\infty )}\tau ^{2}\rho (\tau )\). Moreover, the worst-case mean square regret of \({\hat{\delta }}^{*}\) is

$$\begin{aligned} R_{sq}^{*}:=\sup _{\tau \in [-c,c]}R_{sq}({\hat{\delta }}^{*},\tau )={\left\{ \begin{array}{ll} \left( \tau ^{*}\right) ^{2}\rho (\tau ^{*})\approx 0.12, &{} \text {if }c\ge \tau ^{*},\\ c^{2}\rho (c), &{} \text {if }c<\tau ^{*}. \end{array}\right. } \end{aligned}$$

Proof

See Appendix 1. \(\square\)

Remark 3.1

The result of Lemma 3.1 implies that when \(c\ge \tau ^{*}\), the minimax optimal decision rule is the same as the one found in Kitagawa et al. (Theorem 4.2, 2022), while the optimal rule differs when \(c< \tau ^{*}\). This result is intuitive. We know that the global least favorable prior (when c is allowed to be arbitrarily large) puts equal probabilities on \(\tau ^{*}\) and \(-\tau ^{*}\). If \(c\ge \tau ^{*}\), the global least favorable prior is always feasible, so the minimax optimal rule remains the same. If \(c< \tau ^{*}\), the global least favorable prior is no longer feasible. Instead, Lemma 3.1 shows that the constrained least favorable prior when \(c< \tau ^{*}\) puts equal probabilities on the boundary points c and \(-c\), and the minimax optimal rule is the Bayes optimal rule with respect to that constrained least favorable prior.
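The constants \(\tau ^{*}\approx 1.23\) and \(\left( \tau ^{*}\right) ^{2}\rho (\tau ^{*})\approx 0.12\) in Lemma 3.1 are easy to reproduce numerically. A sketch (our own implementation, computing \(\rho\) by grid integration):

```python
import numpy as np

def rho(a):
    """rho(a) = E[(1 / (exp(2 a Y) + 1))^2] with Y ~ N(a, 1)."""
    y = np.linspace(a - 10.0, a + 10.0, 8001)
    density = np.exp(-0.5 * (y - a) ** 2) / np.sqrt(2 * np.pi)
    values = (1.0 / (np.exp(2.0 * a * y) + 1.0)) ** 2
    return float(np.sum(values * density) * (y[1] - y[0]))

# Grid search for the maximizer of tau^2 * rho(tau) over tau >= 0:
# the maximizer is tau* ~ 1.23 and the maximum is ~ 0.12.
taus = np.linspace(0.0, 3.0, 601)
objective = np.array([t ** 2 * rho(t) for t in taus])
tau_star = float(taus[np.argmax(objective)])
print(round(tau_star, 2), round(float(objective.max()), 2))
```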

3.2 One-dimensional subproblem

In this and the next subsection, we explain in detail how to derive a minimax optimal rule under mean square regret using the approach of Donoho (1994) and Yata (2021). The key idea is to find a one-dimensional subproblem (which we know how to solve from the results in Sect. 3.1) that is as difficult as the original two-dimensional problem. In our setting, as the parameter space \(\Theta \subseteq {\mathbb {R}}^2\) is symmetric, it is natural to consider a one-dimensional subproblem in which the parameter space is simply the line connecting two points in \(\Theta\) that are symmetric around \((0,0)^\prime\) (to be formally introduced below). For each such one-dimensional subproblem, we can use Lemma 3.1 to find its minimax optimal rule and the associated worst-case mean square regret. Then, we search among all such one-dimensional subproblems. The one with the largest worst-case mean square regret is our hardest one-dimensional subproblem, and its associated minimax rule is our “guess” of the minimax optimal rule for the original two-dimensional problem. A final crucial step is to verify that this candidate minimax rule derived from the hardest one-dimensional subproblem is indeed a minimax rule of the original problem; this corresponds to the “verifying” step. Therefore, the approach of Donoho (1994) and Yata (2021) still has a “guessing” component and a “verifying” component, and is closely related to the guess-and-verify approach that focuses on finding a least favorable prior (exploited in, e.g., Azevedo et al., 2023; Kitagawa et al., 2022; Stoye, 2009a, 2012). We further discuss the connections between the two approaches in Remark 3.4.

To be more concrete, a one-dimensional subproblem embedded in the two-dimensional problem can be constructed as follows. Let \(a_{e}\ge 0\) and \(a_{t}\in I(a_{e})\) be two known constants. It then follows that \(\left( \begin{array}{c} a_{e}\\ a_{t} \end{array}\right) \in \Theta\) and \(\left( \begin{array}{c} -a_{e}\\ -a_{t} \end{array}\right) \in \Theta\). Let

$$\begin{aligned} \Theta _{a_{e},a_{t}}:=\left\{ \theta \in {\mathbb {R}}^{2}|\theta =s\left( \begin{array}{c} a_{e}\\ a_{t} \end{array}\right) ,s\in [-1,1]\right\} \subseteq \Theta \end{aligned}$$

be the line connecting \(\left( \begin{array}{c} a_{e}\\ a_{t} \end{array}\right)\) and \(\left( \begin{array}{c} -a_{e}\\ -a_{t} \end{array}\right)\). The parameter space \(\Theta _{a_{e},a_{t}}\) is one-dimensional, as it contains only one unknown parameter \(s\in [-1,1]\). We call the problem of finding a minimax optimal rule when \(\theta \in \Theta _{a_{e},a_{t}}\) a one-dimensional subproblem. For intuition, suppose \(a_{e}>0\) and let \({\hat{s}}:=\frac{{\hat{\theta }}_{e}}{a_{e}}\). Simple algebra shows that

$$\begin{aligned} {\hat{s}}\sim N\left( s,\frac{\sigma ^{2}}{a_{e}^{2}}\right) , \end{aligned}$$

which further implies that

$$\begin{aligned} a_{t}{\hat{s}}=\frac{a_{t}}{a_{e}}{\hat{\theta }}_{e}\sim N\left( sa_{t},\left( \frac{a_{t}}{a_{e}}\right) ^{2}\sigma ^{2}\right) . \end{aligned}$$

That is, \(a_{t}{\hat{s}}\) is normally distributed with an unknown mean \(sa_{t}\) (since s is unknown) and a known variance \(\left( \frac{a_{t}}{a_{e}}\right) ^{2}\sigma ^{2}\). Note that \(sa_{t}\) is the average treatment effect of the target population. We may then apply Lemma 3.1 to characterize a minimax optimal rule for the one-dimensional subproblem. The case when \(a_{e}=0\), in contrast, requires separate consideration, as it corresponds to the case when the data \({\hat{\theta }}_{e}\sim N(0,\sigma ^2)\) reveals no information regarding s. See Remark 3.3 for further discussion. Considering both cases, \(a_{e}>0\) and \(a_{e}=0\), we have the following lemma.
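The reduction above can be checked by simulation. The constants below are hypothetical values chosen so that \(a_{t}\in I(a_{e})\) holds (e.g., with \(k\ge 0.5\)):

```python
import numpy as np

rng = np.random.default_rng(0)
a_e, a_t, s, sigma = 2.0, 2.5, 0.7, 1.0   # hypothetical subproblem constants
theta_e = s * a_e                          # true reduced-form parameter on the line
theta_hat_e = rng.normal(theta_e, sigma, size=200_000)

# s_hat = theta_hat_e / a_e should be N(s, sigma^2 / a_e^2), so here its
# mean should be close to 0.7 and its variance close to 1 / 4 = 0.25;
# a_t * s_hat then estimates the target effect s * a_t.
s_hat = theta_hat_e / a_e
print(round(float(s_hat.mean()), 2), round(float(s_hat.var()), 2))
```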

Lemma 3.2

(Mean square regret minimax rule of a one-dimensional subproblem) A minimax optimal rule for the one-dimensional subproblem is

$$\begin{aligned} {\hat{\delta }}_{a_{e},a_{t}}^{*}={\left\{ \begin{array}{ll} \frac{\exp \left( 2\cdot \tau ^{*}\cdot \frac{a_{t}}{\left| a_{t}\right| \sigma } {\hat{\theta }}_{e}\right) }{\exp \left( 2\cdot \tau ^{*}\cdot \frac{a_{t}}{\left| a_{t}\right| \sigma }{\hat{\theta }}_{e}\right) +1}, &{} \frac{a_{e}}{\sigma }\ge \tau ^{*},\\ \\ \frac{\exp \left( 2\cdot \frac{a_{e}}{\sigma }\frac{a_{t}}{\left| a_{t} \right| \sigma }{\hat{\theta }}_{e}\right) }{\exp \left( 2\cdot \frac{a_{e}}{\sigma } \frac{a_{t}}{\left| a_{t}\right| \sigma }{\hat{\theta }}_{e}\right) +1}, &{} 0\le \frac{a_{e}}{\sigma }<\tau ^{*}. \end{array}\right. } \end{aligned}$$

That is,

$$\begin{aligned} \sup _{\theta \in \Theta _{a_{e},a_{t}}}R_{sq}({\hat{\delta }}_{a_{e},a_{t}}^{*},\theta ) =\min _{{\hat{\delta }}\in \mathcal {D}}\sup _{\theta \in \Theta _{a_{e},a_{t}}}R_{sq}({\hat{\delta }},\theta ). \end{aligned}$$

Moreover, the worst-case mean square regret of \({\hat{\delta }}_{a_{e},a_{t}}^{*}\) is

$$\begin{aligned} \sup _{\theta \in \Theta _{a_{e},a_{t}}}R_{sq}({\hat{\delta }}_{a_{e},a_{t}}^{*},\theta )={\left\{ \begin{array}{ll} \frac{a_{t}^{2}\sigma ^{2}}{a_{e}^{2}} \left( \tau ^{*}\right) ^{2}\rho (\tau ^{*}), &{} \frac{a_{e}}{\sigma }\ge \tau ^{*},\\ a_{t}^{2}\rho \left( \frac{a_{e}}{\sigma }\right) , &{} 0\le \frac{a_{e}}{\sigma }<\tau ^{*}. \end{array}\right. } \end{aligned}$$

Proof

See Appendix 1. \(\square\)

Remark 3.2

The interpretation of the minimax optimal rule in the one-dimensional subproblem is as follows. Note that as long as \(a_{e}\ne 0\), \({\hat{t}}:=\frac{a_{t}}{\left| a_{t}\right| \sigma }{\hat{\theta }}_{e}\) is a standard t-statistic. Consistent with the conclusion of Kitagawa et al. (2022), a minimax optimal rule in this parametric problem is a logistic transformation of \({\hat{t}}\). If \(\frac{a_{e}}{\sigma }\ge \tau ^*\), the minimax optimal rule is a logistic transformation of \(2\tau ^*{\hat{t}}\). If, in contrast, \(0<\frac{a_{e}}{\sigma }<\tau ^*\), the minimax optimal rule is a logistic transformation of \(2\frac{a_{e}}{\sigma }{\hat{t}}\). Hence, when \({\hat{t}}>0\), the treatment fraction in the case \(0<\frac{a_{e}}{\sigma }<\tau ^*\) is smaller than in the case \(\frac{a_{e}}{\sigma }\ge \tau ^*\). Such a structure has intuitive implications for the minimax optimal rule derived later. See Remark 3.5 for further discussion.

Remark 3.3

The situation when \(a_{e}=0\) is particularly interesting and demonstrates a further difference between the mean square regret criterion and the mean regret criterion. If \(a_{e}=0\), then \({\hat{\theta }}_{e}\sim N(0,\sigma ^{2})\). That is, the data is completely uninformative and reveals no information regarding the unknown s. In this situation, \(\theta _{t}\in [-|a_{t}|,|a_{t}|]\). This subproblem coincides with the one analyzed by Manski (2007a). If mean regret is the criterion, Manski (2007a) shows that any rule \({\hat{\delta }}\) such that \({\mathbb {E}}[{\hat{\delta }}({\hat{\theta }}_{e})]=\frac{1}{2}\) is minimax optimal, where the expectation is with respect to \({\hat{\theta }}_{e}\sim N(0,\sigma ^{2})\). That is, there are many minimax optimal rules for this particular subproblem. Using the uninformative data can still be minimax optimal under the mean regret criterion, as the random data may be used purely as a randomization device without affecting the mean of regret. This draws a sharp contrast with mean square regret, under which the minimax optimal rule is \({\hat{\delta }}^*_{0,a_{t}}=\frac{1}{2}\). That is, the minimax optimal rule under mean square regret is to not use the data at all and allocate a fraction \(\frac{1}{2}\) of the whole population to treatment. Such a fractional rule may be implemented via a randomization device that does not depend on the data. The intuition is simple: any rule that (1) is optimal in terms of mean regret and (2) uses the random data, thereby generating a positive variance of regret, is not optimal in terms of mean square regret, as it introduces additional variance with respect to the data without decreasing the mean of regret.
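The contrast in Remark 3.3 can be made concrete with a short calculation (our own illustration): with \(a_{e}=0\), compare the constant rule \({\hat{\delta }}=\frac{1}{2}\) with the data-driven rule \({\textbf{1}}\{{\hat{\theta }}_{e}\ge 0\}\), which also satisfies \({\mathbb {E}}[{\hat{\delta }}({\hat{\theta }}_{e})]=\frac{1}{2}\).

```python
theta_t = 1.0   # any positive value in [-|a_t|, |a_t|]

# Constant rule delta = 1/2: regret is theta_t / 2 with certainty.
mean_regret_const = theta_t * (1.0 - 0.5)
msr_const = theta_t ** 2 * (1.0 - 0.5) ** 2   # = theta_t^2 / 4

# Rule delta = 1{theta_hat_e >= 0} with theta_hat_e ~ N(0, sigma^2):
# E[delta] = 1/2, so the mean regret is identical ...
mean_regret_data = theta_t * (1.0 - 0.5)
# ... but (1 - delta)^2 is Bernoulli(1/2), so the mean square regret doubles.
msr_data = theta_t ** 2 * 0.5

print(mean_regret_const == mean_regret_data)   # True
print(msr_const < msr_data)                    # True
```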

3.3 Hardest one-dimensional subproblem

From Lemma 3.2, we see that for each one-dimensional subproblem where \(\theta \in \Theta _{a_{e},a_{t}}\), the worst-case mean square regret of the minimax optimal rule depends on the values of \(a_{e}\) and \(a_{t}\), both of which are assumed to be known. Let \(a_{e}^{*}\ge 0\) and \(a_{t}^{*}\in I(a_{e}^{*})\) be two constants. We call the problem of finding a minimax optimal rule when \(\theta \in \Theta _{a_{e}^{*},a_{t}^{*}}\) the hardest one-dimensional subproblem if

$$\begin{aligned} \sup _{\theta \in \Theta _{a_{e}^{*},a_{t}^{*}}}R_{sq} ({\hat{\delta }}_{a_{e}^{*},a_{t}^{*}}^{*},\theta )=\sup _{a_{e}\ge 0,a_{t}\in I(a_{e})}\sup _{\theta \in \Theta _{a_{e},a_{t}}}R_{sq} ({\hat{\delta }}_{a_{e},a_{t}}^{*},\theta ). \end{aligned}$$

That is, \(\Theta _{a_{e}^{*},a_{t}^{*}}\) is the one-dimensional parameter space that yields the largest possible worst-case mean square regret of its associated minimax rule. If we view the minimax problem as a game between adversarial Nature and the econometrician, then the hardest one-dimensional subproblem is the problem that Nature will pick, provided that Nature is restricted to choose only among the one-dimensional subproblems. To characterize the hardest one-dimensional subproblem, let

$$\begin{aligned} a^{*}\in \arg \sup _{0\le {\tilde{a}}_{e}\le \tau ^*} \left( {\tilde{a}}_{e}+\frac{k}{\sigma } \right) ^{2} \rho \left( {\tilde{a}}_{e}\right) \end{aligned}$$
(3.2)
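Since (3.2) is a one-dimensional maximization over the compact interval \([0,\tau ^*]\), it can be approximated numerically by a simple grid search once \(\rho (\cdot )\) is available. In the sketch below, \(\rho\) is a user-supplied callable standing in for the function defined earlier in the paper (its exact form is not reproduced here):

```python
def hardest_a_star(rho, k, sigma, tau_star, grid_size=10_001):
    """Grid-search approximation to the maximizer in (3.2):
    a* in argmax over [0, tau*] of (a + k/sigma)^2 * rho(a).
    `rho` is the worst-case-risk function from the paper, supplied
    here as a callable placeholder.
    """
    best_a, best_val = 0.0, float("-inf")
    for i in range(grid_size):
        a = tau_star * i / (grid_size - 1)
        val = (a + k / sigma) ** 2 * rho(a)
        if val > best_val:
            best_a, best_val = a, val
    return best_a, best_val
```

With a toy decreasing \(\rho\) such as \(\rho (a)=(1+a)^{-4}\) (purely illustrative, not the paper's \(\rho\)), the maximizer lands strictly inside the interval, mirroring Lemma 3.3(ii) below.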

Lemma 3.3

(Mean square regret minimax rule of the hardest one-dimensional subproblem)

  1. (i)

    The hardest one-dimensional subproblem corresponds to \(a_{e}^{*}=a^{*}\sigma\), and \(a_{t}^{*}=a^{*}\sigma +k\). Let \(\Theta _{\textrm{H}}:=\Theta _{a^{*}\sigma ,a^{*}\sigma +k}\) be the hardest one-dimensional parameter space. The minimax optimal rule with respect to this hardest one-dimensional subproblem is

    $$\begin{aligned} {\hat{\delta }}_{\text {H}}^{*}:={\hat{\delta }}_{a^{*}\sigma ,a^{*}\sigma +k}^{*}=\frac{\exp \left( 2\cdot a^{*}\cdot \frac{{\hat{\theta }}_{e}}{\sigma }\right) }{\exp \left( 2\cdot a^{*}\cdot \frac{{\hat{\theta }}_{e}}{\sigma }\right) +1}, \end{aligned}$$

    and

    $$\begin{aligned} \sup _{\theta \in \Theta _{\textrm{H}}}R_{sq}({\hat{\delta }}_{\text {H}}^{*},\theta ) =\sigma ^{2}\left( a^{*}+\frac{k}{\sigma }\right) ^{2}\rho \left( a^{*}\right) . \end{aligned}$$
  2. (ii)

    \(0<a^{*}<\tau ^{*}\).

  3. (iii)

    \(a^*\) is strictly decreasing in k and strictly increasing in \(\sigma\).

Proof

See Appendix 1. \(\square\)

It turns out that \({\hat{\delta }}_{\text {H}}^{*}\) is not only a minimax optimal rule for the hardest one-dimensional subproblem, but also a minimax optimal rule for the original two-dimensional problem. That is, choosing the hardest one-dimensional subproblem remains adversarial Nature's best move, even if it is allowed to choose any parameter in the two-dimensional parameter space.

Theorem 3.1

\(\sup _{\theta \in \Theta }R_{sq}({\hat{\delta }}_{\textrm{H}}^{*},\theta )=\min _{{\hat{\delta }} \in \mathcal {D}}\sup _{\theta \in \Theta }R_{sq}({\hat{\delta }},\theta ).\) That is, \({\hat{\delta }}_{\textrm{H}}^{*}\) is a minimax optimal rule in terms of mean square regret for the original two-dimensional problem analyzed in Sect. 2.

Proof

See Appendix 1. \(\square\)

Remark 3.4

By now, we can see a clear connection between the approach taken by Donoho (1994) and Yata (2021) in finding minimax optimal decisions and the guess-and-verify approach (Kitagawa et al., 2022, Proposition 4.2). Intuitively, we can view finding the hardest one-dimensional subproblem as one way of finding the least favorable prior. Indeed, in the original two-dimensional problem, the least favorable prior can be verified to be supported on \(\left( \begin{array}{c} a^{*}\sigma \\ a^{*}\sigma +k \end{array}\right)\) and \(\left( \begin{array}{c} -a^{*}\sigma \\ -a^{*}\sigma -k \end{array}\right)\) with equal probabilities. Technically, once an econometrician figures out the structure of the least favorable prior (which is possible given prior work in the literature, e.g., Stoye, 2012), they can proceed without using the techniques employed in this paper, by directly invoking Proposition 4.2 of Kitagawa et al. (2022). Therefore, it is not entirely clear which approach has a relative advantage in solving these minimax problems.

Remark 3.5

(Comparison with Kitagawa et al., 2022) If the treatment effect of the target population is point-identified (\(k=0\)), the theory of Kitagawa et al. (2022) applies and the minimax optimal rule is \({\hat{\delta }}^{*}=\frac{\exp \left( 2\cdot \tau ^{*}\cdot \frac{{\hat{\theta }}_{e}}{\sigma }\right) }{\exp \left( 2\cdot \tau ^{*}\cdot \frac{{\hat{\theta }}_{e}}{\sigma }\right) +1}\), which agrees with the conclusion of Theorem 3.1 upon mechanically setting \(k=0\). Theorem 3.1 clearly demonstrates the effect of partial identification (\(k>0\)) on the optimal decision rules. Partial identification moves the worst-case location of the point-identified parameter \(\theta _{e}\) further toward zero and away from \(\tau ^*\): the minimax optimal rule becomes \({\hat{\delta }}^{*}_{\text {H}}=\frac{\exp \left( 2\cdot a^{*}\cdot \frac{{\hat{\theta }}_{e}}{\sigma }\right) }{\exp \left( 2\cdot a^{*}\cdot \frac{{\hat{\theta }}_{e}}{\sigma }\right) +1}\) with \(a^*<\tau ^*\). Therefore, partial identification encourages the decision maker to be more cautious against adversarial Nature: the optimal treatment fraction under partial identification is pulled toward \(\frac{1}{2}\) compared to the point-identified situation. From Lemma 3.3(iii), we know that the value of \(a^*\) decreases as k becomes larger: more partial identification results in more ambiguity, leading to a more prudent or cautious treatment allocation. If \(k=\infty\), then \(a^*=0\) and the optimal treatment rule becomes \({\hat{\delta }}^{*}_{\text {H}}=\frac{1}{2}\).
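This shrinkage toward \(\frac{1}{2}\) can be seen numerically. The sketch below evaluates \({\hat{\delta }}^{*}_{\text {H}}\) at a fixed realization for several illustrative values of \(a^*\) (the values are ours; recall from Lemma 3.3(iii) that \(a^*\) falls as k grows):

```python
import math

def msr_minimax_rule(theta_e_hat, a_star, sigma):
    # delta_H* = logistic(2 * a* * theta_e_hat / sigma)
    x = 2.0 * a_star * theta_e_hat / sigma
    return 1.0 / (1.0 + math.exp(-x))

# As a* shrinks (i.e., as k grows), the treatment fraction at any fixed
# realization is pulled toward the limiting value 1/2.
for a_star in (0.8, 0.4, 0.1, 0.0):
    print(a_star, msr_minimax_rule(1.5, a_star, 1.0))
```

The rule is also symmetric in the realization: \({\hat{\delta }}^{*}_{\text {H}}(-z)=1-{\hat{\delta }}^{*}_{\text {H}}(z)\), so the pull toward \(\frac{1}{2}\) operates on both sides of zero.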

Remark 3.6

(Comparison with Stoye, 2012 and Yata, 2021) The conclusion of Theorem 3.1 is quantitatively and qualitatively different from the conclusions of Stoye (2012) and Yata (2021), who both use the mean of regret as the risk criterion and derive optimal fractional rules when k is large enough. As shown by Stoye (2012) and generalized by Yata (2021) to setups with multivariate signals, if the mean of the regret is the risk criterion, whether or not a minimax optimal rule is fractional depends on the magnitude of k. If \(k\le \sqrt{\frac{\pi }{2}}\sigma\), the naive empirical success rule \({\textbf{1}}\{{\hat{\theta }}_{e}\ge 0\}\) is minimax optimal. When \(k>\sqrt{\frac{\pi }{2}}\sigma\), a minimax optimal rule is found to be fractional and admits \({\hat{\delta }}^*=\Phi \bigl ({\hat{\theta }}_{e}/\sqrt{2k^{2}/\pi -\sigma ^{2}}\bigr )\), under which the worst-case location for \(\theta _{e}\) is at 0, i.e., when data are uninformative. Theorem 3.1 draws a very different picture compared to the existing literature. First, optimal rules are always fractional, irrespective of the magnitude of k. Second, the worst-case location for \(\theta _{e}\) is at \(\pm a^*\sigma \ne 0\), which implies that the data are still informative regarding the true unidentified treatment effect of the target population. See Fig. 1 for an illustration of the minimax optimal rules in terms of mean regret and mean square regret with respect to different values of k.
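For comparison, the mean-regret minimax rule described above can be coded directly from the displayed formulas (a sketch; the threshold \(\sqrt{\pi /2}\,\sigma\) and the scale \(\sqrt{2k^{2}/\pi -\sigma ^{2}}\) are exactly as stated in the text, while the function name is ours):

```python
import math
from statistics import NormalDist

def mean_regret_minimax_rule(theta_e_hat, k, sigma):
    """Minimax rule under the MEAN regret criterion (Stoye, 2012;
    Yata, 2021), as described above."""
    if k <= math.sqrt(math.pi / 2.0) * sigma:
        # empirical success rule: treat everyone iff the estimate is >= 0
        return 1.0 if theta_e_hat >= 0 else 0.0
    # fractional rule for large k: Phi(theta_e_hat / sqrt(2k^2/pi - sigma^2))
    scale = math.sqrt(2.0 * k * k / math.pi - sigma * sigma)
    return NormalDist().cdf(theta_e_hat / scale)
```

Unlike the mean square regret rule, this rule is a step function for small k and only becomes fractional once k exceeds \(\sqrt{\pi /2}\,\sigma\).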

Fig. 1

Minimax optimal rules in the Gaussian experiment with a unit variance and an unknown mean. In each of the graphs, k represents the width of the identified set. The dashed line is the minimax optimal rule with respect to mean regret as a function of z, where z represents each possible realization of the Gaussian experiment. The solid line is the minimax optimal rule with respect to mean square regret as a function of z. Note that in the limiting case \(k=\infty\), the two rules coincide

4 Conclusion

In this paper, we study optimal binary treatment choice with mean square regret and with partially identified welfare, extending the analyses of Kitagawa et al. (2022). Our results lead to a simple and intuitive rule that is sharply different from the existing literature on treatment choice under partial identification with the mean regret criterion. In particular, minimax optimal rules are always fractional, irrespective of the width of the identified set. The optimal treatment fraction is a logistic transformation of the commonly used t-statistic multiplied by a factor that is calculated via a simple constrained optimization. Our results are useful for policy makers who wish to make fractional treatment assignments but are concerned that the true optimal policy cannot be identified from data. For future research, it would be interesting to consider optimal treatment choice with a general and arbitrary identified set, or with an estimated identified set. It would also be interesting to consider optimal individualized treatment choice with mean square regret.