Bandwidth selection for treatment choice with binary outcomes

This study considers the treatment choice problem when the outcome variable is binary. We focus on statistical treatment rules that plug in fitted values from a nonparametric kernel regression, and show that the maximum regret can be calculated by maximizing over two parameters. Using this result, we propose a novel bandwidth selection method based on the minimax regret criterion. Finally, we perform a numerical exercise to compare the optimal bandwidth choices for binary and normally distributed outcomes.


Introduction
This study examines the problem of determining whether to treat individuals based on observed covariates. A standard approach is to employ a plug-in rule that selects the treatment according to the sign of an estimate of the conditional average treatment effect (CATE). Kernel regression is a prevalent technique for estimating the CATE, and a crucial aspect of implementing it is the choice of bandwidth. Many studies have proposed bandwidth selection approaches for the estimation problem (see Li and Racine (2007)). However, to the best of our knowledge, few studies investigate bandwidth selection for the treatment choice problem. We propose a novel method for bandwidth selection in the treatment choice problem when the outcome variable is binary.
In this study, we consider a planner who wants to determine whether to treat individuals with a particular covariate value based on experimental data. Following Manski (2004, 2007), Stoye (2009, 2012), Tetenov (2012), Ishihara and Kitagawa (2021), and Yata (2021), we focus on the minimax regret criterion to solve the decision problem. When the outcome variables are binary, the conditional distributions of the outcomes are characterized by their conditional mean functions. We assume that the conditional mean functions are Lipschitz and show that the maximum regret can be calculated by optimizing over two parameters. Based on this result, we propose a computationally tractable algorithm for obtaining the optimal bandwidth.
Ishihara and Kitagawa (2021) and Yata (2021) derive the minimax regret rule in a similar setting when outcome variables are normally distributed. Using the argument of Ishihara and Kitagawa (2021), the maximum regret of a nonrandomized statistical treatment rule that plugs in fitted values from nonparametric kernel regression can be calculated easily. However, this approach relies on normality and therefore cannot be applied to binary outcomes.
Stoye (2012) considers statistical decision problems with binary outcomes. The setting of Stoye (2012) is similar to that of this study, but our framework differs in two respects. First, we focus on the treatment choice at a particular covariate value, whereas Stoye (2012) considers treatment assignment functions that map the covariate support into a binary treatment status. Second, our restriction on the conditional mean functions differs from that of Stoye (2012). Stoye (2012) assumes that the variations of the conditional mean functions are bounded. By contrast, our study assumes that the conditional mean functions are Lipschitz.
The remainder of this paper is organized as follows. Section 2 explains the study setting. Section 3 defines the statistical decision problem and provides a computationally tractable algorithm to obtain the optimal bandwidth. Section 4 presents a numerical analysis to compare bandwidth selections for binary and normally distributed outcomes. Section 5 concludes the study.

Settings
Suppose that we have experimental data $\{Y_i, D_i, X_i\}_{i=1}^{n}$, where $X_i \in \mathbb{R}^{d_x}$ is a vector of observable pre-treatment covariates, $D_i \in \{0, 1\}$ is a binary indicator of the treatment, and $Y_i$ is a binary outcome. Then $Y_i$ satisfies (1). Under the unconfoundedness assumption, we can regard $\mu(1, x) - \mu(0, x)$ as the CATE.
Throughout this paper, we consider a planner seeking to determine whether to treat individuals with $X_i = x_0$ based on the data $D \equiv (Y_1, Y_0)$, given that the parameters $p_1 \equiv (p_{1,0}, \ldots, p_{1,n_1})'$ and $p_0 \equiv (p_{0,0}, \ldots, p_{0,n_0})'$ are unknown, where $x_0$ is a specific value of the covariate vector. The value $x_0$ does not have to be included in the support of the covariate distribution in the data. Without loss of generality, we assume that $x_0 = 0$.
Main Results

Welfare and regret
Given a treatment choice action $\delta \in [0, 1]$, let $W(\delta)$ denote the welfare attained by $\delta$, and let $\delta^*$ denote an optimal treatment choice action given knowledge of $p$. Then $W(\delta^*)$ is the optimal welfare that would be achievable if we knew $p$.
Let $\delta(D) \in \{0, 1\}$ be a statistical treatment rule that maps data $D$ to the binary treatment choice decision. The welfare regret of $\delta(D)$, denoted by $R(p, \delta)$, is the expected welfare loss of $\delta(D)$ relative to $W(\delta^*)$, where the expectation $E_p(\cdot)$ is taken with respect to the sampling distribution of $D$ given the parameters $p$.
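For orientation, one standard formulation of these objects is sketched below. It is an assumption rather than a transcription of the paper's displays: it takes $p_{1,0}$ and $p_{0,0}$ to be the mean outcomes at $x_0$ under treatment and control, which matches how these symbols are used in the later sections, and the exact form in the paper may differ.

```latex
\begin{align*}
W(\delta) &= p_{1,0}\,\delta + p_{0,0}\,(1 - \delta), \\
\delta^* &= 1\{p_{1,0} \ge p_{0,0}\}, \\
R(p, \delta) &= E_p\bigl[W(\delta^*) - W(\delta(D))\bigr]
             = (p_{1,0} - p_{0,0})\bigl(\delta^* - E_p[\delta(D)]\bigr).
\end{align*}
```

Under this formulation the regret is zero whenever $p_{1,0} = p_{0,0}$, and otherwise equals the welfare gap $|p_{1,0} - p_{0,0}|$ multiplied by the probability that the rule picks the inferior action.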
This study focuses on the following statistical treatment rule.
Assumption 1. We consider the class of statistical treatment rules $\mathcal{D}$ defined in (5), where $K : \mathbb{R}^{d_x} \to \mathbb{R}_+$ denotes the kernel function.
Assumption 1 implies that we focus on non-randomized statistical treatment rules that plug in the fitted values from nonparametric kernel regression. In addition, we assume that the kernel function takes non-negative values. Our results depend on this condition. In (5), $\theta$ is the bandwidth of the kernel regression estimator.
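A minimal sketch of such a plug-in rule is given below, under stated assumptions: the covariate is scalar, $x_0 = 0$, the kernel is Gaussian, and ties are broken in favor of treatment. The function names (`gaussian_kernel`, `kernel_weights`, `plug_in_rule`) and the tie-breaking convention are illustrative choices, not taken from the paper.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian density; non-negative, as required of K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(xs, theta, kernel=gaussian_kernel):
    """Normalized weights K(x_i / theta) / sum_j K(x_j / theta) at x0 = 0."""
    raw = [kernel(x / theta) for x in xs]
    total = sum(raw)  # can underflow to 0 for very small theta
    return [r / total for r in raw]


def plug_in_rule(y_treated, x_treated, y_control, x_control, theta):
    """Return 1 (treat) if the fitted CATE at x0 = 0 is non-negative, else 0.

    Treating on a tie (difference exactly 0) is an assumption of this sketch.
    """
    w1 = kernel_weights(x_treated, theta)
    w0 = kernel_weights(x_control, theta)
    fit1 = sum(w * y for w, y in zip(w1, y_treated))  # fitted mean, treated arm
    fit0 = sum(w * y for w, y in zip(w0, y_control))  # fitted mean, control arm
    return 1 if fit1 - fit0 >= 0 else 0


# Example: treated outcomes are mostly 1 near x0 = 0, control outcomes mostly 0.
decision = plug_in_rule([1, 1, 0], [-0.5, 0.0, 0.5],
                        [0, 0, 1], [-0.5, 0.0, 0.5], theta=0.5)
```

Because the rule depends on the data only through two weighted averages, the bandwidth $\theta$ is the single tuning parameter that indexes the class.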
The minimax regret criterion selects a statistical treatment rule that minimizes the maximum regret. Since a statistical treatment rule $\delta_\theta \in \mathcal{D}$ is characterized by its bandwidth $\theta$, the optimal bandwidth $\theta^*$ is the value of $\theta$ that minimizes $\max_{p \in \mathcal{P}} R(p, \delta_\theta)$, where $\delta_\theta$ is as defined in (5). In the next subsection, we describe the calculation of the optimal bandwidth $\theta^*$.

Minimax regret rule
The following theorem implies that we can calculate $\max_{p \in \mathcal{P}} R(p, \delta_\theta)$ by optimizing over two parameters.
Theorem 1. Under Assumption 1, for any $\delta_\theta \in \mathcal{D}$, we obtain (7).
The proof of Theorem 1 provides the parameters that maximize the regret of $\delta_\theta \in \mathcal{D}$ in each of the two cases $p_{1,0} > p_{0,0}$ and $p_{1,0} < p_{0,0}$. Using these results, we can calculate the maximum regret when $p_{1,0}$ and $p_{0,0}$ are fixed. Hence, we can compute $\max_{p \in \mathcal{P}} R(p, \delta_\theta)$ by optimizing over $p_{1,0}$ and $p_{0,0}$.
In view of Theorem 1, we can compute the optimal bandwidth θ * using the following algorithm.
This algorithm is advantageous from a computational perspective. Computing the exact minimax regret rule is often challenging in statistical treatment choice problems. When the sample size is large, calculating the maximum regret requires solving a high-dimensional optimization problem. Theorem 1 reduces this to an optimization over two parameters. In the next section, we compute $\theta^*$ by using this algorithm.
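The two-parameter search can be sketched as follows. This is an illustration under several assumptions that are not confirmed by the text: scalar covariates with $x_0 = 0$, a Gaussian kernel, a parameter space of Lipschitz functions with constant $C$ taking values in $[0, 1]$, and least-favorable means obtained by moving each arm's mean away from its value at $x_0$ at rate $C$ (clipped to $[0, 1]$). The regret is taken to be the welfare gap times the probability of the wrong decision, and grid searches replace whatever optimizer the paper uses. All function names are hypothetical.

```python
import itertools
import math


def gaussian_weights(xs, theta):
    """Normalized Gaussian-kernel weights at x0 = 0 (shifted to avoid underflow)."""
    expo = [-0.5 * (x / theta) ** 2 for x in xs]
    top = max(expo)
    raw = [math.exp(e - top) for e in expo]
    total = sum(raw)
    return [r / total for r in raw]


def sum_distribution(weights, probs):
    """Exact law of sum_i w_i * Y_i with independent Y_i ~ Ber(p_i).

    Enumerates all 2^n outcome vectors, so it is feasible only for small n.
    """
    dist = []
    for ys in itertools.product((0, 1), repeat=len(weights)):
        prob = 1.0
        for y, p in zip(ys, probs):
            prob *= p if y == 1 else 1.0 - p
        dist.append((sum(w * y for w, y in zip(weights, ys)), prob))
    return dist


def _clip(v):
    return min(1.0, max(0.0, v))


def regret(p10, p00, x1, x0, theta, lipschitz_c):
    """Regret at the ASSUMED least-favorable Lipschitz configuration."""
    if p10 == p00:
        return 0.0
    s = 1.0 if p10 > p00 else -1.0  # push means against the better arm
    p1 = [_clip(p10 - s * lipschitz_c * abs(x)) for x in x1]
    p0 = [_clip(p00 + s * lipschitz_c * abs(x)) for x in x0]
    d1 = sum_distribution(gaussian_weights(x1, theta), p1)
    d0 = sum_distribution(gaussian_weights(x0, theta), p0)
    # Probability that the rule treats (fitted difference >= 0, float tolerance).
    p_treat = sum(q1 * q0 for v1, q1 in d1 for v0, q0 in d0
                  if v1 - v0 >= -1e-12)
    p_wrong = 1.0 - p_treat if p10 > p00 else p_treat
    return abs(p10 - p00) * p_wrong


def max_regret(x1, x0, theta, lipschitz_c, grid_size=21):
    """Maximize the regret over a grid of (p10, p00) in [0, 1]^2."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    return max(regret(a, b, x1, x0, theta, lipschitz_c)
               for a in grid for b in grid)


def select_bandwidth(x1, x0, thetas, lipschitz_c, grid_size=21):
    """Pick the candidate bandwidth with the smallest maximum regret."""
    return min(thetas,
               key=lambda th: max_regret(x1, x0, th, lipschitz_c, grid_size))
```

For example, `select_bandwidth([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [0.1, 0.5, 2.0], 0.5)` returns one of the three candidates. The exact enumeration grows exponentially in the arm sizes, so for larger samples the wrong-decision probability would need a cheaper evaluation, such as Monte Carlo simulation.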
Remark 1. Similar to Ishihara and Kitagawa (2021), when the Lipschitz constant $C$ is unknown, we do not know how to select $C$ in a theoretically justifiable data-driven manner. However, Ishihara and Kitagawa (2021) and Yata (2021) propose some practical choices for $C$. In their empirical application, Ishihara and Kitagawa (2021) perform leave-one-out cross-validation to select $C$. Yata (2021) estimates a lower bound on $C$ by using the derivative of the conditional mean function. Both methods can also be applied in our setting.

Numerical Examples
In this section, we perform a numerical analysis to compare the optimal bandwidth described in the previous section with that for normally distributed outcomes. Throughout this section, we set the pre-treatment covariates as equidistant grid points on $[-1, 1]$, with $n_1 = n_0 = n/2$. We consider the kernel regression class $\mathcal{D}$ defined in (5), where we use the Gaussian kernel as the kernel function. We calculate the optimal bandwidth $\theta^*$ by using the proposed method.
We compare our method with the bandwidth selection that minimizes the maximum regret when the outcome variables are normally distributed. Suppose that $Y_{1,i} \sim N(p_{1,i}, 0.5^2)$ and $Y_{0,i} \sim N(p_{0,i}, 0.5^2)$. Then, we set the parameter space as $\mathcal{P}^N \equiv \mathcal{P}^N_1 \times \mathcal{P}^N_0$. Define $w_{1,i,\theta} \equiv K(X_{1,i}/\theta) / \sum_{j=1}^{n_1} K(X_{1,j}/\theta)$ and $w_{0,i,\theta} \equiv K(X_{0,i}/\theta) / \sum_{j=1}^{n_0} K(X_{0,j}/\theta)$. Using the argument of Ishihara and Kitagawa (2021), the maximum regret can be expressed as in (8), where $\Phi(\cdot)$ is the distribution function of $N(0, 1)$, $\eta(a) \equiv \max_{t>0} \{t \cdot \Phi(-t + a)\}$, and $s(\theta) \equiv 0.5 \ldots$ Appendix 2 provides the details of the derivation. Using this result, we calculate the bandwidth that minimizes the maximum regret for the normally distributed outcomes.
Table 1 lists the optimal bandwidth choices for the binary and normal cases. In the binary case, we calculate the optimal bandwidth by using the algorithm in Section 3.2. In the normal case, we calculate the bandwidth that minimizes (8), that is, the bandwidth choice proposed by Ishihara and Kitagawa (2021).
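The function $\eta(a) \equiv \max_{t>0}\{t \cdot \Phi(-t + a)\}$ has no closed form, so it has to be evaluated numerically. The sketch below is one way to do so using only the Python standard library. It assumes that the objective is unimodal in $t$ and that the maximizer lies in the bracket $[0, |a| + 2]$; both are assumptions of this illustration rather than claims from the paper, and the function names are hypothetical. The full expression (8) is not reproduced here.

```python
import math


def std_normal_cdf(z):
    """Distribution function Phi of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def eta(a, tol=1e-10):
    """Approximate eta(a) = max_{t > 0} t * Phi(-t + a) by golden-section search.

    Assumes the objective is unimodal on the bracket [0, |a| + 2].
    """
    def objective(t):
        return t * std_normal_cdf(-t + a)

    lo, hi = 0.0, abs(a) + 2.0
    ratio = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    while hi - lo > tol:
        m1 = hi - ratio * (hi - lo)
        m2 = lo + ratio * (hi - lo)
        if objective(m1) < objective(m2):
            lo = m1  # the maximizer lies to the right of m1
        else:
            hi = m2  # the maximizer lies to the left of m2
    return objective(0.5 * (lo + hi))
```

Since $\Phi(-t + a)$ is increasing in $a$ for every fixed $t$, $\eta$ is increasing in $a$, which gives a quick sanity check on any implementation.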
In many cases, the optimal bandwidth decreases as $C$ or $n$ increases. When $n$ is large, the optimal bandwidth for binary outcomes is close to that for normally distributed outcomes. This arises from the asymptotic normality of the kernel regression estimator. In the binary case, the maximum regret (7) is a step function with respect to $\theta$ and approaches a continuous function as $n$ increases. Hence, Table 1 shows the optimal bandwidth range when $n = 10$. When $n$ is small, the optimal bandwidth for binary outcomes can differ significantly from that for normally distributed outcomes.

Conclusion
This study investigated whether to treat individuals based on observed covariates. The standard approach to this problem is to use a plug-in rule that determines the treatment based on the sign of an estimate of the CATE. We focused on statistical treatment rules based on nonparametric kernel regression. When the outcome variables are normally distributed, Ishihara and Kitagawa (2021) showed that the maximum regret can be calculated easily. This study demonstrated that the maximum regret can be calculated by optimizing over two parameters even when the outcome variables are binary. Using these results, we proposed an optimal bandwidth selection method for binary outcomes. In addition, we performed a numerical analysis to compare the optimal bandwidth choices for binary and normally distributed outcomes.

For any $p \in [0, 1]$, $p^-_1(p)$ and $p^+_0(p)$ are contained in $\mathcal{P}_1$ and $\mathcal{P}_0$, respectively. Additionally, we have $p_{1,i} \ge p^-_{1,i}(p_{1,0})$ and $p_{0,i} \le p^+_{0,i}(p_{0,0})$ for all $i$. Because $\mathrm{Ber}(p)$ first-order stochastically dominates $\mathrm{Ber}(p')$ for $p \ge p'$, Assumption 1 and Lemma 1 yield the required bound on the regret. As $p_{1,0} = p_{0,0}$ implies $R(p, \delta_\theta) = 0$, we obtain (7).
Hence, we obtain