Statistical decision theory respecting stochastic dominance

The statistical decision theory pioneered by Wald (Statistical Decision Functions, Wiley, 1950) has used state-dependent mean loss (risk) to measure the performance of statistical decision functions across potential samples. We think it evident that evaluation of performance should respect stochastic dominance, but we do not see a compelling reason to focus exclusively on mean loss. We think it instructive to also measure performance by other functionals that respect stochastic dominance, such as quantiles of the distribution of loss. This paper develops general principles and illustrative applications for statistical decision theory respecting stochastic dominance. We modify the Wald definition of admissibility to an analogous concept of stochastic dominance (SD) admissibility, which uses stochastic dominance rather than mean sampling performance to compare alternative decision rules. We study SD admissibility in two relatively simple classes of decision problems that arise in treatment choice. We reevaluate the relationship between the MLE, James–Stein, and James–Stein positive part estimators from the perspective of SD admissibility. We consider alternative criteria for choice among SD-admissible rules. We juxtapose traditional criteria based on risk, regret, or Bayes risk with analogous ones based on quantiles of state-dependent sampling distributions or the Bayes distribution of loss.


Introduction
Wald (1950) considered the broad problem of using sample data to make decisions under uncertainty. He posed the task as choice of a statistical decision function (a rule, for short), which maps potentially available data into a choice among the feasible actions. He recommended ex ante evaluation of statistical decision functions as procedures, chosen prior to realization of the data, specifying how a decision maker would use whatever data may be realized. Expressing the objective as minimization of loss, he proposed that the decision maker evaluate a rule by its mean performance across potential samples, which he termed risk.
In the presence of uncertainty about the loss function and the sampling process yielding the data, Wald prescribed a three-step decision process. The first stage specifies the state space (parameter space), which indexes the loss functions and sampling distributions that the decision maker deems possible. The second stage eliminates inadmissible rules. A rule is inadmissible (weakly dominated) if there exists another one that yields at least as good mean sampling performance in every possible state of nature and strictly better mean performance in some state. The third stage uses some criterion to choose an admissible rule. Wald studied the minimax criterion when the decision maker places no subjective probability distribution on the state space and minimization of Bayes risk (the subjective mean of risk across states) when such a distribution is present.
In many respects, the Wald framework has breathtaking generality. It enables comparison of all statistical decision functions whose risk is well-defined in each possible state. It applies whatever the sampling process and sample size may be. It applies whatever information the decision maker may have about the loss function and the sampling process. The state space may be finite dimensional (parametric) or larger (nonparametric). The true state of nature may be point or partially identified.
A striking exception to the generality of the Wald framework is its use of mean loss to measure the probabilistic performance of alternative rules. Risk is state-dependent mean loss across potential samples and Bayes risk is overall mean loss across samples and states when a subjective distribution is placed on the state space. The literature on statistical decision theory has followed Wald in measuring sampling and overall performance by risk and Bayes risk. See, for example, the texts of Ferguson (1967) and Berger (1985).
We cannot be sure why statistical decision theory has exclusively used mean loss to measure the performance of statistical decision functions, but we can conjecture. One reason may have been the predisposition of statisticians in the mid-twentieth century to use the mean to express the central tendency of probability distributions rather than the median or other location parameters; see Huber (1981) for an interesting discussion. Another reason may have been the influence of the von Neumann and Morgenstern (1944) and Savage (1954) axiomatic derivations of maximization of expected utility, which have often been interpreted as providing rationales to favor this decision criterion over others. Yet subsequent developments in axiomatic decision theory have called into question whether the axioms that yield expected utility maximization are as compelling as they once seemed. See, for example, Binmore (2009).
Considering the matter afresh, we think it evident that evaluation of the probabilistic performance of statistical decision functions should respect stochastic dominance. However, we do not see a compelling reason to focus exclusively on mean loss. We think it instructive to measure probabilistic performance by various functionals that respect stochastic dominance. These include the means of increasing functions of loss and quantiles of the distribution of loss. This paper develops general principles and illustrative applications for statistical decision theory respecting stochastic dominance. The general principles are introduced in Section 2. We modify the Wald definition of admissibility to an analogous concept of stochastic dominance (SD) admissibility, which uses stochastic dominance rather than mean sampling performance to compare alternative statistical decision functions. We cite representation theorems that characterize stochastic dominance in terms of inequalities ordering the means of increasing functions and the quantiles of two probability distributions. These theorems yield alternative characterizations of SD-admissibility.
Sections 3 and 4 apply the general principles to particular classes of decision problems. Section 3 considers the special case of state-dependent binary loss, where the loss function takes only two values in each state. We show that, when the loss function has this form, state-dependent error probabilities are a sufficient statistic for sampling performance and SD admissibility is equivalent to mean admissibility.
An important application occurs in decision problems where a planner uses sample data to inform choice of one of two treatments to assign to a population of persons. It has been common in medical and other settings to use experimental or observational data on treatment response to test the superiority of one treatment relative to the other and to use the test result to make a treatment choice. In this setting, every rule assigning all members of the population to a single treatment is characterizable as performance of a hypothesis test. We show that the use of error probabilities to determine the admissibility of test rules differs from its traditional use in the theory of hypothesis testing.
Section 4 studies a class of decision problems in which SD and mean admissibility do not coincide.
These are problems in which the set of feasible actions is ordered and the sampling process, which generates real-valued data, satisfies the monotone likelihood ratio property. Analysis of mean admissibility in this setting dates back to Karlin and Rubin (1956). Here we study SD admissibility. Possible applications occur when choosing the dose of a real-valued treatment for a population given real-valued sample data that are informative about dose response.
Section 5 revisits the Stein phenomenon of mean inadmissibility of the MLE estimator of a multivariate normal mean of dimension greater than or equal to three when the loss function is the componentwise sum of squared losses. We reevaluate the relationship between the MLE, James–Stein, and James–Stein positive part estimators from the perspective of SD admissibility. Section 6 considers alternative criteria for choice among SD-admissible actions. We juxtapose traditional criteria based on risk, regret, or Bayes risk with analogous ones based on quantiles of state-dependent sampling distributions or the Bayes distribution of loss. We show how mean and quantile criteria differ when applied to choice of a test rule.

General Principles
Section 2.1 reviews the concepts of Wald's statistical decision theory. Section 2.2 generalizes these concepts to make stochastic dominance rather than risk the basic quantity used to evaluate the performance of statistical decision functions. Section 2.3 uses two representation theorems for stochastic dominance to characterize SD-admissibility by classes of inequalities that order the means of increasing functions and the quantiles of loss.

Concepts of the Wald Theory
Wald's statistical decision theory begins with specification of a state space S, a set of feasible decisions (or actions) D, and a loss function L(•, •): S × D → [0, ∞) specifying the loss incurred by each feasible action in each possible state. The ideal objective is to minimize loss in the true state. Given that the true state is unknown, the ideal objective is sure to be achievable only if there exists an action that uniformly minimizes loss in all states of S. Wald's practical objective is to prescribe reasonable decision rules when no such action exists.
The adjective "statistical" describes statistical decision theory because Wald assumes that a state-dependent sampling distribution Qs generates data whose value, say ψ, lies in a known sample space Ψ. He supposes that the decision maker observes ψ and knows the vector (Qs, s ∈ S) of state-dependent sampling distributions. In this setting, a statistical decision function δ(·): Ψ → D is any measurable function that maps the data into an action. Let Δ denote the space of feasible rules.
Research in statistical decision theory often finds it useful to consider randomized rules that map the data into a specified probability distribution on D rather than into a specific action. Consideration of randomized rules does not require alteration of the definition of δ. One may define the sample space and the state-dependent sampling distributions to include a white-noise component used to randomly choose an action.
To measure the performance of a candidate rule δ, Wald focuses on the state-dependent mean loss (risk) that it generates across potential samples; that is,

(1) R(s, δ) ≡ ∫Ψ L[s, δ(ψ)]dQs(ψ).

Risk is computable in principle, although computation may be difficult in practice. Supposing that computation of risk is tractable, Wald recommends use of the vector [R(s, δ), s ∈ S] of state-dependent risks to measure the performance of δ across potential samples and to compare δ with other rules.
To begin, rule δ is deemed better than rule δ′ if R(s, δ) ≤ R(s, δ′) for all s ∈ S and R(s, δ) < R(s, δ′) for some s. If there exists a δ that is better than δ′ in this sense, then δ′ is said to be inadmissible and should be eliminated from further consideration. A rule that is not inadmissible is called admissible.
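The "better than" relation on risk vectors can be sketched in a few lines of code. This is only an illustration with hypothetical risk numbers over a finite state space, not part of the paper's formal development.

```python
# Sketch: compare two decision rules by their state-dependent risk vectors.
# The states and risk values below are hypothetical illustrative numbers.

def is_better(risk_delta, risk_delta_prime):
    """Rule delta is better than delta' if its risk is weakly lower in
    every state and strictly lower in at least one state."""
    weakly_lower = all(r <= rp for r, rp in zip(risk_delta, risk_delta_prime))
    strictly_somewhere = any(r < rp for r, rp in zip(risk_delta, risk_delta_prime))
    return weakly_lower and strictly_somewhere

# Risk vectors over three states for two hypothetical rules.
R_delta = [0.2, 0.5, 0.3]
R_delta_prime = [0.2, 0.6, 0.4]

print(is_better(R_delta, R_delta_prime))   # delta dominates delta', so True
print(is_better(R_delta_prime, R_delta))   # False
```

A rule whose risk vector admits such a dominating competitor is inadmissible in Wald's sense.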
Going a bit further, a decision maker can eliminate an admissible rule when there exists a risk-equivalent rule that is retained for consideration. Rules δ and δ′ are risk-equivalent if R(s, δ) = R(s, δ′) for all s ∈ S. When multiple admissible rules are risk-equivalent, a decision maker who uses risk to evaluate sampling performance can eliminate all but one of them without consequence.
Having eliminated all inadmissible rules and perhaps some admissible rules within risk-equivalent groups of rules, the decision maker's problem is to choose among the subset of rules that remain, say Δa.
It is possible in principle that Δa may be empty, but applications of the Wald theory typically have enough regularity to ensure not only that Δa is non-empty but that every inadmissible decision function is dominated by an admissible one.
Whereas elimination of inadmissible and risk-equivalent admissible rules is uncontroversial, there is no consensus on choice within Δa, which requires comparison of rules whose risk vectors are unordered.
Wald studied minimization of Bayes risk when the decision maker places a subjective probability distribution, say π, on the state space. This criterion solves the problem

(2) min δ ∈ Δ ∫R(s, δ)dπ(s).

When no subjective distribution is placed on the state space, two prominent criteria are minimax, which solves

(3) min δ ∈ Δ max s ∈ S R(s, δ),

and minimax regret, which solves

(4) min δ ∈ Δ max s ∈ S [R(s, δ) − min δ′ ∈ Δ R(s, δ′)].

It often is difficult to determine the set of admissible rules. Given this, researchers applying the Wald theory commonly skip the step of determining admissibility and use a decision criterion to choose among all feasible options, not just those that are admissible. When any of these criteria yields a unique choice, it necessarily is admissible. When a criterion yields a set of equally good choices, the set may include inadmissible options that are strictly dominated only in states that do not affect the value of the optimum. Bayes risk is unaffected by values of risk that occur off the π-support of S. Maximum risk and regret are unaffected by dominance in states that do not determine the maximum.

Respect for Stochastic Dominance
The new work of this paper begins with the observation that the basic probabilistic quantity underlying statistical decision theory is not risk but rather the state-dependent distribution of loss that a decision function generates across potential samples; that is, Qs{L[s, δ(ψ)]}. The expectation (risk) is but one of many potentially relevant features of this distribution.
State-dependent distributions of loss are computable in principle. Supposing that computation is tractable, we think it natural to generalize the Wald theory by recommending use of the vector (Qs{L[s, δ(ψ)]}, s ∈ S) to measure the performance of δ across potential samples. It is also natural to recommend that evaluation of the performance of alternative statistical decision functions should respect stochastic dominance. This recommendation has many precedents in studies of decision making that are not explicitly concerned with use of sample data. See, for example, Quirk and Saposnik (1962), Hadar and Russell (1969), Hanoch and Levy (1969), and Manski (1988).

When a decision maker places a subjective distribution on the state space, respect for stochastic dominance means that one should Bayes-SD prefer δ to δ′ if the distribution of loss across samples and states generated by δ′ stochastically dominates that generated by δ. The distribution of loss under δ is

(5) Φπ{L[s, δ(ψ)]} ≡ ∫ Qs{L[s, δ(ψ)]}dπ(s).

Adapting Wald's definition of Bayes risk, we say that Φπ is the Bayes loss distribution. The fact that Φπ is the mean over S of the state-dependent loss distributions implies a connection between SD-preference and Bayes-SD preference. The following lemma follows immediately from (5):

Lemma 1: If rule δ is SD-preferred to rule δ′, then Φπ{L[s, δ′(ψ)]} ≥sd Φπ{L[s, δ(ψ)]}.
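The mixture structure behind Lemma 1 can be illustrated numerically. The sketch below (Python, with hypothetical state-dependent loss distributions and an equal-weight prior) forms the Bayes loss distribution as a prior-weighted mixture and verifies that statewise dominance is inherited by the mixture.

```python
import numpy as np

# Sketch with hypothetical numbers: the Bayes loss distribution is the
# prior-weighted mixture of the state-dependent loss distributions.
# If delta's loss is stochastically smaller than delta's in every state,
# the mixture inherits the ordering.

losses = np.array([0.0, 1.0, 2.0])                   # common support of loss
# state-dependent pmfs of loss under two rules; rows index states
P_delta = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])   # rule delta
P_deltap = np.array([[0.6, 0.2, 0.2], [0.4, 0.3, 0.3]])  # rule delta'
prior = np.array([0.5, 0.5])

bayes_delta = prior @ P_delta      # Bayes loss pmf under delta
bayes_deltap = prior @ P_deltap    # Bayes loss pmf under delta'

# delta's losses are stochastically smaller: its cdf is pointwise >=
cdf = lambda p: np.cumsum(p)
print(np.all(cdf(bayes_delta) >= cdf(bayes_deltap)))
```

Here δ's loss distribution lies below δ′'s in each state, and the same ordering holds for the two mixtures, as the lemma asserts.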

Representation Theorems Relating SD-Admissibility to Mean and Quantile Loss
Respect for stochastic dominance does not require the decision maker to use any particular real functional of loss distributions to measure the performance of a decision function. Nevertheless, there exist useful representation theorems that characterize stochastic dominance in terms of two alternative classes of functionals, these being means of increasing functions of loss and quantiles of the distribution of loss.

Means of Increasing Functions of Loss
Let P and P′ denote two probability distributions on the real line. It has long been known that P = P′ if and only if ∫f(y)dP(y) = ∫f(y)dP′(y) for every integrable increasing function f(·). Several articles studying expected utility maximization when utility is an increasing function of income show that P stochastically dominates P′ if and only if ∫f(y)dP(y) ≥ ∫f(y)dP′(y) for every integrable increasing function f(·) and ∫f(y)dP(y) > ∫f(y)dP′(y) for some increasing f(·). See Quirk and Saposnik (1962), Hadar and Russell (1969), and Hanoch and Levy (1969). This representation theorem immediately yields a characterization of SD-inadmissibility:

Lemma 2: Rule δ′ is SD-inadmissible if and only if there exists a rule δ such that Es{f(L[s, δ(ψ)])} ≤ Es{f(L[s, δ′(ψ)])} for every increasing f(·) and all s ∈ S, with strict inequality for some increasing f(·) and some s ∈ S.
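The two equivalent characterizations of stochastic dominance, by quantiles and by means of increasing functions, can be checked side by side on simulated data. The sketch below (Python; the normal distributions and the particular increasing functions are hypothetical choices for illustration) compares a dominating and a dominated distribution both ways.

```python
import numpy as np

# Sketch: first-order stochastic dominance of P over P' can be verified
# either by ordering of quantiles or by ordering of means of increasing
# functions. Hypothetical example: P = N(1, 1) dominates P' = N(0, 1).

rng = np.random.default_rng(0)
y_p = rng.normal(1.0, 1.0, 100_000)    # draws from P
y_pp = rng.normal(0.0, 1.0, 100_000)   # draws from P'

# (i) quantile ordering: every quantile of P is at least that of P'
qs = np.linspace(0.01, 0.99, 99)
print(np.all(np.quantile(y_p, qs) >= np.quantile(y_pp, qs)))

# (ii) mean-of-increasing-function ordering, for a few increasing f
for f in (np.tanh, np.exp, lambda y: y):
    print(f(y_p).mean() >= f(y_pp).mean())
```

In a decision problem the roles are reversed for loss: a rule whose loss distribution is dominated in this sense in every state is the better one.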
State-Dependent Binary Loss

Section 3.1 develops the basic finding. Section 3.2 applies it to choice between two treatments.

Using Error Probabilities to Characterize SD and Mean Admissibility
Suppose that the loss function takes only two values in each state: in state s, loss is either Lsa or Lsb; call this form (8). A succinct way to express SD-preference is to define the state-dependent probability ρs(δ) that δ yields an error, choosing the action with larger loss rather than the one with smaller loss. An error is logically impossible when Lsa = Lsb, so we set ρs(δ) = 0 in these states. In states with Lsa ≠ Lsb, ρs(δ) is the probability, across potential samples, that δ selects the action with the larger loss. With this definition of error probabilities, we obtain a simple characterization of SD-inadmissibility.
Lemma 4: Let the loss function have form (8). Then rule δ′ is SD-inadmissible if and only if there exists another rule δ such that ρs(δ) ≤ ρs(δ′) for all s ∈ S and ρs(δ) < ρs(δ′) for some s.

Error probabilities also characterize mean admissibility. Given a loss function of form (8), mean loss in state s (risk) is R(s, δ) = min(Lsa, Lsb) + ρs(δ)·|Lsa − Lsb|. This yields a parallel characterization of mean inadmissibility.
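The reduction of risk to an error probability under binary loss is a one-line computation. The following sketch (Python, hypothetical loss values) makes the sufficiency of ρs explicit: once the two loss values in a state are fixed, risk is a strictly increasing function of the error probability alone.

```python
# Sketch with hypothetical numbers: under state-dependent binary loss
# taking values L_sa and L_sb, mean loss in a state reduces to
#   R(s, delta) = min(L_sa, L_sb) + rho_s(delta) * |L_sa - L_sb|,
# so the error probability rho_s is a sufficient statistic for performance.

def risk(L_sa, L_sb, rho):
    """Mean loss in a state with binary loss values and error prob rho."""
    return min(L_sa, L_sb) + rho * abs(L_sa - L_sb)

# Two rules in one state with L_sa = 1.0, L_sb = 3.0.
print(risk(1.0, 3.0, 0.10))   # 1.2
print(risk(1.0, 3.0, 0.25))   # 1.5
```

Since risk is increasing in ρs state by state, ranking rules by error-probability vectors and ranking them by risk vectors coincide, which is why SD and mean admissibility agree in this class of problems.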

Choice between Two Treatments
An important class of applications of statistical decision theory concerns use of sample data on treatment response to inform a planner who must choose treatments for a population. Past work by Manski (2004, 2005, 2021), Manski and Tetenov (2007), Hirano and Porter (2009), Stoye (2009), Tetenov (2012), Manski and Tetenov (2016), and Kitagawa and Tetenov (2018) has used the Wald framework to study this decision problem. A statistical decision function uses the data to choose a treatment allocation, so such a function has been called a statistical treatment rule (STR). The planner's objective has been expressed as maximization of a social welfare function that sums treatment outcomes across the population.
The mean sampling performance of an STR has been called expected welfare. Maximization of social welfare is equivalent to minimization of loss. Expected welfare is negative risk.
We consider here the relatively simple case in which the planner must assign one of two treatments to each member of a treatment population, denoted J. The feasible treatments are T = {a, b}. Each j ∈ J has a response function uj(·): T → R mapping treatments t ∈ T into real-valued individual welfare outcomes uj(t). Treatment is individualistic; that is, a person's outcome may depend on the treatment he is assigned but not on the treatments assigned to others. The population is a probability space (J, Ω, P), and the probability distribution P[u(·)] of the random function u(·): T → R describes treatment response across the population. The population is large in the sense that J is uncountable and P(j) = 0 for all j ∈ J.
While treatment response may be heterogeneous, we suppose here that the members of the population are observationally identical to the planner. That is, the planner does not observe person-specific covariates that would enable systematic differentiation of treatment of different persons. In principle, the planner can randomly allocate persons to the two treatments with specified allocation probabilities. The notation introduced below allows for this possibility. However, when applying the findings of Section 3.1, we will consider only test rules, which assign all members of the population to one treatment or the other.

The Mean Sampling Performance of STRs
A statistical treatment rule maps sample data into a treatment allocation. Let Δ denote the space of functions that map T × Ψ into the unit interval and that satisfy the adding-up condition: δ ∈ Δ ⇒ δ(a, ψ) + δ(b, ψ) = 1 for all ψ ∈ Ψ. Each function δ ∈ Δ defines a statistical treatment rule, δ(a, ψ) and δ(b, ψ) being the fractions of the population assigned to treatments a and b when the data are ψ. Observe that this definition of an STR does not specify which persons receive each treatment, only the assignment shares.
Designation of the particular persons receiving each treatment is immaterial because assignment is random, the population is large, and the planner has an additive welfare function. As δ(a, ψ) + δ(b, ψ) = 1, we use the shorthand δ(ψ) to denote the fraction assigned to treatment b. The fraction assigned to treatment a is 1 − δ(ψ).
The planner wants to maximize population welfare, which adds welfare outcomes across persons. With an allocation that assigns fraction δ(ψ) to treatment b, population welfare is U(δ, P, ψ) = α[1 − δ(ψ)] + βδ(ψ), where α ≡ E[u(a)] and β ≡ E[u(b)] are the population mean outcomes under treatments a and b.

The problem of interest is treatment choice when knowledge of P and Q does not suffice to determine the ordering of α and β. Hence, the planner does not know the optimal treatment. Let {(Ps, Qs), s ∈ S} be the set of (P, Q) pairs that the planner deems possible. The planner does not know the optimal treatment if S contains at least one state such that αs > βs and another such that αs < βs. We assume this throughout.

Considered as a function of ψ, U(δ, Ps, ψ) is a random variable with state-dependent sampling distribution. Its mean is αs + (βs − αs)Es[δ(ψ)], where Es[δ(ψ)] ≡ ∫Ψ δ(ψ)dQs(ψ) is the mean (across potential samples) fraction of persons who are assigned to treatment b.
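The mean-welfare formula can be checked by simulation. The sketch below (Python) uses a hypothetical state with αs = 0.4, βs = 0.7 and a hypothetical sampling process in which ψ is a noisy signal of βs − αs; the test rule assigning everyone to b when ψ > 0 is only an example.

```python
import numpy as np

# Sketch with hypothetical numbers: mean welfare of an STR in state s is
#   alpha_s + (beta_s - alpha_s) * E_s[delta(psi)].

rng = np.random.default_rng(1)
alpha_s, beta_s = 0.4, 0.7                         # mean outcomes of a and b
# hypothetical data: psi is a noisy signal of beta_s - alpha_s
psi = rng.normal(beta_s - alpha_s, 1.0, 200_000)

delta = lambda p: (p > 0).astype(float)            # test rule: choose b if psi > 0
expected_welfare = alpha_s + (beta_s - alpha_s) * delta(psi).mean()
print(round(expected_welfare, 3))
```

Since Es[δ(ψ)] here is just the probability that the signal is positive, expected welfare lies between αs and βs, closer to βs the more informative the data.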

SD and Mean Admissibility of Test Rules
An important class of STRs are the uniformly singleton rules. Given a treatment set of any size, a rule is uniformly singleton if, for every possible data realization, it assigns the entire population to one treatment. The treatment to which the entire population is assigned may vary with the data realization.
Uniformly singleton rules are particularly simple when there are two treatments. In this case, a rule is uniformly singleton if, for each ψ ∈ Ψ, δ(ψ) = 1 or δ(ψ) = 0. The class of uniformly singleton STRs is the same as the class of rules that use the outcome of a hypothesis test to choose between the treatments.
Construction of a test rule begins by partitioning the state space into disjoint subsets Sa and Sb, where Sa contains all states in which treatment a is uniquely optimal and Sb contains all states in which b is uniquely optimal. Thus, αs > βs for all s ∈ Sa, αs < βs for all s ∈ Sb, and the states with αs = βs are somehow split between the two sets. Let s* denote the unknown true state. The two hypotheses are [s* ∈ Sa] and [s* ∈ Sb].

A test rule δ partitions the sample space Ψ into disjoint acceptance regions Ψδa and Ψδb. When the data ψ lie in Ψδa, the rule accepts hypothesis [s* ∈ Sa] by setting δ(ψ) = 0. When ψ lies in Ψδb, the rule accepts [s* ∈ Sb] by setting δ(ψ) = 1. We use the word "accepts" rather than the traditional term "does not reject" because treatment choice is an affirmative action.
The above shows that test rules are uniformly singleton. The converse holds as well. If δ is uniformly singleton, one can collect all of the data values for which the rule assigns everyone to treatment a, call this subset of the sample space the acceptance region Ψδa, and do likewise for Ψδb. In what follows, we use the term test rule rather than uniformly singleton rule.
Lemmas 4 and 5 show that a test rule δ is both SD- and mean-inadmissible if there exists another test rule δ′ such that ρs(δ′) ≤ ρs(δ) for all s ∈ S and ρs(δ′) < ρs(δ) for some s.
A special but important class of hypothesis tests juxtaposes two simple hypotheses. Then the Neyman-Pearson Lemma shows that, among all tests with a specified probability of a Type I error, the likelihood-ratio test minimizes the probability of a Type II error, and vice versa. In the context of treatment choice, having two simple hypotheses means that S contains two states, with treatment a better in one state and b better in the other. Then the Neyman-Pearson Lemma implies that a planner considering use of a test rule need not look beyond the class of likelihood-ratio tests. Applying Lemmas 4 and 5 to likelihood-ratio tests yields a result that makes explicit the form of the error probabilities for such tests.
A fundamental feature of the above analysis is that all error probabilities symmetrically determine the result. In contrast, the classical theory of hypothesis testing differentiates between null and alternative hypotheses, and correspondingly between Type I and Type II errors. It restricts attention to tests that yield a predetermined probability of a Type I error and seeks a test of this type that yields an adequately small probability of a Type II error. Such asymmetric treatment of the two hypotheses is illogical from the perspective of statistical decision theory.
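The symmetric role of the two error probabilities can be seen concretely in the two-simple-states case. The sketch below (Python, with hypothetical states ψ ~ N(0, 1) and ψ ~ N(1, 1)) computes both error probabilities of a threshold likelihood-ratio test; moving the threshold trades one error off against the other, with neither error privileged.

```python
import math

# Sketch: two simple states, psi ~ N(mu_a, 1) when a is better and
# psi ~ N(mu_b, 1) when b is better (hypothetical values). A likelihood-ratio
# test chooses b when psi > t. Its two error probabilities are normal tail
# probabilities; lowering one raises the other.

mu_a, mu_b = 0.0, 1.0
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))   # standard normal cdf

def error_probs(t):
    rho_a = 1 - Phi(t - mu_a)   # choose b although a is better
    rho_b = Phi(t - mu_b)       # choose a although b is better
    return rho_a, rho_b

for t in (0.2, 0.5, 0.8):
    print(tuple(round(r, 3) for r in error_probs(t)))
```

From the decision-theoretic perspective, no threshold is singled out by a conventional size such as 0.05; each threshold is simply one admissible point on the error-probability frontier.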

Ordered Actions and Continuous Real Data Satisfying the Monotone Likelihood Ratio Property
We now study a class of decision problems in which SD and mean admissibility do not coincide.
These are problems in which the set of feasible actions is ordered and the sampling process generating the data satisfies the monotone likelihood ratio property. Analysis of mean admissibility in this setting dates back to Karlin and Rubin (1956), with continuation by Manski and Tetenov (2007). Here we study SD admissibility. Section 4.1 develops the basic finding. Section 4.2 applies it to treatment choice.

Basic Finding
Proposition 7 shows that the fractional monotone treatment rules form an essentially complete class with respect to stochastic dominance when the data satisfy the maintained assumptions. A fractional monotone rule is one in which δ(ψ) is weakly increasing in ψ. Essential completeness means that any randomized decision rule δ(ψ, υ) can be replaced by a fractional monotone rule δ′(ψ) that weakly stochastically dominates δ(ψ, υ) in each state s. The planner then need not consider any other types of STRs. Manski and Tetenov (2007, Proposition 1) show that fractional monotone rules form an essentially complete class when the planner wants to maximize the expectation Es[f(U(δ, Ps, ψ, υ))] of a concave-monotone function f(·) of population welfare and ψ is binomial. Here we establish the more general result that a planner with any decision criterion that respects stochastic dominance can restrict attention to fractional monotone rules. Given that ψ has a continuous distribution, the random variable F0(ψ) has a Uniform(0, 1) distribution in state s0. Hence, the random variable δ′(ψ) ≡ G⁻¹δ,s0(F0(ψ)) has c.d.f. Gδ,s0 in state s0. Given that both G⁻¹δ,s0(·) and F0(·) are non-decreasing, δ′(ψ) is also non-decreasing in ψ. Given that F0 is continuous and G⁻¹δ,s0 is left-continuous, δ′(ψ) is also left-continuous.
In states where u(a, s) is constant in a, the distributions of payoffs are identical for all strategies. Hence, weak stochastic dominance holds. Now suppose that state s satisfies (14a), so u(a, s) is non-increasing in a. Given that state s satisfies (14a), the test with rejection region Ω′ = {ψ: ψ ≤ ψt} is a likelihood-ratio test. The tests with rejection regions Ω and Ω′ have the same size. It follows from the Neyman-Pearson lemma that test Ω′ must be at least as powerful as Ω in state s. That is, Gδ′,s(t) ≥ Gδ,s(t).
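The rearrangement step in this argument, replacing an arbitrary rule by its monotone counterpart δ′(ψ) = G⁻¹(F0(ψ)), can be illustrated empirically. The sketch below (Python) uses a hypothetical reference-state distribution for ψ and a deliberately non-monotone rule; empirical cdfs and quantile functions stand in for F0 and G⁻¹.

```python
import numpy as np

# Sketch of the rearrangement construction: replace a rule delta by the
# monotone rule delta'(psi) = G^{-1}(F0(psi)), where F0 is the cdf of psi in
# a reference state and G is the cdf of delta(psi) in that state. The new
# rule is non-decreasing in psi and has the same distribution there.
# All distributions below are hypothetical empirical stand-ins.

rng = np.random.default_rng(2)
psi = rng.normal(0.0, 1.0, 100_000)              # reference-state data
delta = lambda p: (np.sin(3 * p) + 1) / 2        # a non-monotone rule in [0, 1]

# empirical F0(psi) via ranks, then the empirical quantile function of delta(psi)
ranks = psi.argsort().argsort()
F0 = (ranks + 1) / (len(psi) + 1)
delta_prime = np.quantile(delta(psi), F0)        # G^{-1}(F0(psi))

# delta' is monotone in psi and matches delta's distribution in this state
order = np.argsort(psi)
print(np.all(np.diff(delta_prime[order]) >= 0))
print(np.allclose(np.quantile(delta(psi), [0.25, 0.5, 0.75]),
                  np.quantile(delta_prime, [0.25, 0.5, 0.75]), atol=0.01))
```

Monotonicity comes for free because a quantile function composed with a cdf is non-decreasing; the distribution in the reference state is preserved because F0(ψ) is approximately uniform.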

Choice between Two Treatments
Proposition 7 applies to the treatment-choice problem of Section 3.2, with action a ∈ [0, 1] denoting the fraction of the population assigned to treatment b. Payoff function (12) is decreasing in a when βs − αs < 0, increasing in a when βs − αs > 0, and constant when βs − αs = 0. Hence, the payoff function satisfies the assumptions of the proposition. Suppose that Qs(ψ) is continuous and possesses the monotone likelihood ratio property in (β − α). Then the proposition shows that the class of fractional monotone STRs is essentially complete under any decision criterion that respects stochastic dominance.

Choice of Treatment Dose
Let action a be a dose level for a real-valued treatment; for example, it may be the dose of a medical drug treatment. Suppose that administering a higher dose is beneficial but costly, the net benefit of a dose in state s depending on a state-dependent quantity b(s). Suppose that one obtains real-valued data ψ drawn from a continuous distribution and that ψ provides an informative but imperfect signal about b(s); for example, ψ may be the result of an informative but imperfect diagnostic test. It is relatively easy to imagine signal-generation processes in which ψ has the MLR property. For example, it may be that ψ equals b(s) plus a white-noise error. Then Proposition 7 implies that the treatment rule should be monotone in the realization of ψ and should be non-randomized.
The Stein Phenomenon

In the problem of estimating a multivariate normal mean under componentwise squared-error loss, the MLE estimator is δ(ψ) ≡ ψ. Stein (1956) has shown that the MLE estimator is mean-inadmissible when the dimension is at least three. It is mean-dominated, for example, by the James–Stein (1961) estimator.
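Mean-domination of the MLE by shrinkage can be seen in a short Monte Carlo. The sketch below (Python) uses a hypothetical state with θ = 0 in dimension k = 10, where shrinkage helps most; it compares average squared-error loss for the MLE, the James–Stein estimator, and its positive-part version.

```python
import numpy as np

# Sketch: Monte Carlo comparison of summed squared-error loss for the MLE
# delta(psi) = psi, the James-Stein estimator, and its positive-part version,
# with psi ~ N(theta, I_k). State and sample sizes are illustrative.

rng = np.random.default_rng(3)
k, n_rep = 10, 20_000
theta = np.zeros(k)                            # a state where shrinkage helps most
psi = rng.normal(theta, 1.0, (n_rep, k))       # n_rep independent samples

norm2 = (psi ** 2).sum(axis=1, keepdims=True)
shrink = 1 - (k - 2) / norm2                   # James-Stein shrinkage factor
js = shrink * psi                              # James-Stein estimator
jsp = np.clip(shrink, 0, None) * psi           # positive-part James-Stein

loss = lambda est: ((est - theta) ** 2).sum(axis=1).mean()
print(loss(psi), loss(js), loss(jsp))          # MLE risk near k; shrinkage lowers it
```

At this state the MLE's risk is near k while both shrinkage estimators do far better, with the positive-part version weakly improving on James–Stein sample by sample whenever the shrinkage factor goes negative.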

Mean and Quantile Decision Criteria
We now turn attention from SD-admissibility to choice of a decision function. Section 2.1 posed three leading criteria that use mean performance to evaluate alternative rules: minimax, minimax-regret, and minimization of Bayes risk. Here we juxtapose these criteria with analogous ones that use quantile performance. Section 6.1 presents the quantile criteria in abstraction. Section 6.2 applies them to selection of a test rule for choice between two treatments.
Decision making using a quantile-utility criterion was proposed in Manski (1988) in a setting without sample data. It was observed there that maximization of expected and quantile utility differ in important respects. Whereas the ranking of actions by expected utility is invariant only to cardinal transformations of the objective function, the ranking by quantile utility is invariant to ordinal transformations. Whereas expected utility conveys risk preferences through the shape of the utility function, quantile utility does so through the specified quantile, with higher values conveying more risk preference. Whereas expected utility is not well-defined when the distribution of utility has unbounded support with fat tails, quantile utility is always well-defined.
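The ordinal-invariance contrast can be demonstrated numerically. The sketch below (Python, with hypothetical outcome distributions for two actions) applies an increasing transformation; the median ranking of the actions is unchanged, while the mean ranking reverses.

```python
import numpy as np

# Sketch: quantile rankings are invariant to increasing (ordinal)
# transformations of outcomes, while mean rankings are not.
# Hypothetical outcome distributions for two actions.

rng = np.random.default_rng(4)
u1 = rng.normal(0.6, 3.0, 100_000)   # action 1: higher mean, high spread
u2 = rng.normal(0.5, 0.1, 100_000)   # action 2: lower mean, low spread

g = np.tanh                          # an increasing (ordinal) transformation

# Median ranking is preserved under g ...
print(np.median(u1) > np.median(u2), np.median(g(u1)) > np.median(g(u2)))
# ... but the mean ranking reverses, because g is bounded and u1 has fat spread
print(u1.mean() > u2.mean(), g(u1).mean() > g(u2).mean())
```

The reversal occurs because the bounded transformation saturates on action 1's widely dispersed outcomes, exactly the sensitivity to cardinalization that quantile criteria avoid.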
There is reason to think that quantiles of welfare distributions matter to decision makers. For example, recent writings on finance have shown explicit concern with low quantiles of earnings distributions, using the term value-at-risk. See, for example, Jorion (2006).

Quantile Criteria
Let λ  (0, 1) be a specified quantile. The λ-quantile minimax and minimax-regret criteria are analogous to criteria (3) and (4)  There are at least two ways that one might define a quantile analog to minimization of Bayes risk.
Replacement of risk with λ-quantile loss in criterion (2)  Although the mean-based criteria (2) and (2′) are equivalent, the quantile-based criteria (16) and (17) generally differ from one another.
It is well-known that minimization of Bayes risk is also equivalent to solution of the collection of conditional Bayes decision problems

(2′′) min d ∈ D ∫L(s, d)dΦ(s|ψ), ψ ∈ Ψ.

See, for example, Berger (1985, pp. 159-160). That is, minimization of Bayes risk is equivalent to minimization of the posterior expected value of Bayes loss at every point in the sample space. This result, which follows from Fubini's Theorem, does not hold for quantile-based criteria. The quantile analog of (2′′) would be to minimize the posterior λ-quantile of Bayes loss at every point in the sample space. This posterior quantile criterion generally differs from both (18) and (19). We do not know whether it has an interpretation from the perspective of ex ante statistical decision theory. Hence, we do not consider it further.
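The non-equivalence of the two quantile analogs of Bayes risk minimization can be exhibited with a tiny discrete example. The sketch below (Python; a hypothetical two-state problem with equal prior weights and four loss draws per state) computes the prior mean of statewise λ-quantiles and the λ-quantile of the mixture, and they disagree.

```python
import numpy as np

# Sketch with a hypothetical two-state problem: the prior mean of statewise
# lambda-quantile losses generally differs from the lambda-quantile of the
# mixed (Bayes) loss distribution.

lam = 0.5
losses_by_state = {                      # loss draws of one rule, by state
    "s1": np.array([0.0, 0.0, 1.0, 1.0]),
    "s2": np.array([0.0, 1.0, 1.0, 1.0]),
}
prior = {"s1": 0.5, "s2": 0.5}

# (i) prior mean of the state-dependent lambda-quantiles
mean_of_quantiles = sum(prior[s] * np.quantile(v, lam)
                        for s, v in losses_by_state.items())

# (ii) lambda-quantile of the Bayes (mixture) loss distribution; with equal
# prior weights and equal sample sizes, concatenation gives the mixture
mixture = np.concatenate(list(losses_by_state.values()))
quantile_of_mixture = np.quantile(mixture, lam)

print(mean_of_quantiles, quantile_of_mixture)
```

Averaging quantiles and taking the quantile of the average distribution are different operations because the quantile functional is not linear in the distribution, unlike the mean.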

Criteria for Selection of a Test Rule
The mean and quantile-based decision criteria of Sections 2.1 and 6.1 offer a menu of procedures for choice of a statistical decision function. To illustrate their application, we continue the analysis of Section 3.2 and consider choice between two treatments, focusing on test rules. Following common practice, we skip the step of determining admissibility and use the criteria to choose among all feasible test rules, not just those that are admissible. Observe that mean and quantile sampling performance are both monotonically decreasing in the error probability, falling from max(αs, βs) to min(αs, βs) as ρs(δ) increases from 0 to 1. However, they differ in the pattern of decrease. Whereas mean performance varies linearly with the error probability, quantile performance is a step function. This difference in the pattern of decrease implies differences between decision criteria based on mean and quantile performance, described below.

The maximin problem (20) is typically solved by a data-invariant rule. Savage (1951) mentions this in passing. Manski (2004) proves it in the special case in which one treatment is a status quo option whose mean treatment response is known.
Now consider a data-varying rule, which chooses treatment a for some data realizations and b for others. Suppose, as is typically the case in practice, that ρs(δ) > 0 in every state where αs ≠ βs. Minimum expected welfare for this rule is less than αL because there exist states in which βs < αL and there is positive sampling probability that the rule chooses treatment b. Hence, the data-invariant rule that always chooses treatment a uniquely solves the maximin problem.
Maximin choice based on quantile performance does not yield such an extreme result and, hence, may be more palatable. The minimum λ-quantile welfares of the two data-invariant rules are αL and βL, respectively. The minimum λ-quantile welfare of a data-varying rule is less than αL if ρs(δ) ≥ λ in some state where βs < αL.
However, the minimum λ-quantile welfare of such a rule is greater than αL if ρs(δ) < λ in all states where βs < αL and in some state where βs > αL. Thus, a data-varying rule solves the maximin problem if its error probabilities are positive but not too large.
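This possibility can be illustrated numerically. The sketch below uses a hypothetical three-state example (all numbers invented): the data-varying rule's error probabilities are positive but below λ in every state where βs < αL, and treatment b is superior in one state, so the rule's minimum λ-quantile welfare exceeds αL:

```python
# Hypothetical states: (alpha_s, beta_s, rho_s), where rho_s is the
# probability that the data-varying rule chooses the inferior treatment.
states = {
    's1': (0.6, 0.2, 0.10),   # a superior, small error probability
    's2': (0.5, 0.9, 0.05),   # b superior
    's3': (0.8, 0.4, 0.15),   # a superior
}
lam = 0.25

# Worst-case welfare of the data-invariant rule "always choose a"
alpha_L = min(a for a, b, r in states.values())

def q_welfare(a, b, rho, lam):
    # Lower lam-quantile: inferior welfare w.p. rho, superior w.p. 1 - rho
    return min(a, b) if rho >= lam else max(a, b)

min_q = min(q_welfare(a, b, r, lam) for a, b, r in states.values())
print(alpha_L, min_q)   # the data-varying rule beats "always a" in maximin terms
```

Because ρs < λ in every state, the λ-quantile welfare equals the superior treatment's welfare in each state, and its minimum across states (0.6) exceeds αL (0.5).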
The difference is consequential. The minimax value of mean regret is generically positive. It is zero only in degenerate settings where there exists a rule with ρs(δ) = 0 in all states of nature. On the other hand, minimax λ-quantile regret is zero in some settings with positive error probabilities. Maximum λ-quantile regret is zero if ρs(δ) < λ in all states.
First observe that a rule with ρs(δ) < λ in all states trivially exists when λ > ½. While one ordinarily thinks of ψ as data that are informative about treatment response, statistical decision theory also encompasses study of STRs that make treatment choice vary with uninformative data. That is, δ may make the treatment allocation depend on data generated by a randomizing device. Suppose in particular that Ψ = {0, 1}, Qs(ψ = 0) = Qs(ψ = 1) = ½ for all s ∊ S, and δ is the rule that lets Ψδa = {0} and Ψδb = {1}. The error probabilities for this test rule are ρs(δ) = ½ for all s ∊ S. Hence, the λ-quantile maximum regret of rule δ is zero for all λ > ½.
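The coin-flip construction can be sketched numerically. Following the statement above, the λ-quantile regret in a state is taken to be zero when ρs(δ) < λ and equal to the welfare gap |αs − βs| otherwise; the gap values below are hypothetical:

```python
# Coin-flip rule: Psi = {0, 1} with Qs(psi = 0) = Qs(psi = 1) = 1/2,
# so the rule errs with probability rho = 1/2 in every state.
def quantile_regret(gap, rho, lam):
    # Per the text: lam-quantile regret is zero when rho < lam,
    # and equals the welfare gap |alpha_s - beta_s| otherwise
    return 0.0 if rho < lam else gap

gaps = [0.1, 0.4, 0.9]   # hypothetical welfare gaps across states
rho = 0.5
for lam in [0.3, 0.5, 0.75]:
    max_regret = max(quantile_regret(g, rho, lam) for g in gaps)
    print(lam, max_regret)   # positive for lam <= 1/2, zero for lam > 1/2
```

For λ > ½ the condition ρs(δ) < λ holds in every state, so the maximum λ-quantile regret of the coin-flip rule is zero despite its large error probabilities.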
To the best of our knowledge, there exists no similarly obvious way to form a rule with zero λ-quantile maximum regret when λ ≤ ½. In this domain, achievement of zero maximum regret becomes a more stringent condition as λ decreases. It appears infeasible to perform an elementary general analysis, but we can make progress by examining particular contexts.
Proposition 8 demonstrates that test rules with zero maximum regret exist if S is a metric space with positive distance between the sets Sa and Sb (for example, if S is finite), and the data enable sufficiently precise estimation of the true state. In contrast, Proposition 9 shows that for λ < ½, no such test rule exists if the set S is connected and other regularity conditions hold. In combination, the two propositions show that zero λ-quantile maximum regret is neither an empty concept nor ubiquitous. It is attainable by a test rule in some settings but not in others.
Proposition 8: Let S be a subset of a metric space (Θ, d) with distance d(·, ·). Let …

Proposition 9: Let S be a connected subset of a metric space (Θ, d) with distance d(·, ·). Let Sa> ≡ {s ∊ S: αs > βs} and Sb> ≡ {s ∊ S: αs < βs}. Assume that the closure of the set Sa> ∪ Sb> is S; that is, for any s ∊ S and any r > 0, there exists s' ∊ Sa> ∪ Sb> such that d(s, s') < r. Let the probability Qs(Ψ0) be continuous in s for every measurable subset Ψ0 ⊆ Ψ of the sample space. Then no test rule with zero λ-quantile maximum regret exists for λ < ½.

Proof: Let λ < ½ and suppose that test rule δ has zero λ-quantile maximum regret. Let sa ∊ Sa> and sb ∊ Sb>. Then Qsa(Ψδb) < λ and Qsb(Ψδb) > 1 − λ.
The sampling distribution has the required continuity if, for example, Qs is Normal(s, k) for some fixed k > 0 or if Qs is Binomial(n, s) for some integer n.