Abstract
When we are faced with a choice among acts, but are uncertain about the true state of the world, we may be uncertain about the acts’ “choiceworthiness”. Decision theories guide our choice by making normative claims about how we should respond to this uncertainty. If we are unsure which decision theory is correct, however, we may remain unsure of what we ought to do. Given this decision-theoretic uncertainty, metatheories attempt to resolve the conflicts between our decision theories...but we may be unsure which metatheory is correct as well. This reasoning can launch a regress of ever-higher-order uncertainty, which may leave one forever uncertain about what one ought to do. There is, fortunately, a class of circumstances under which this regress is not a problem. If one holds a cardinal understanding of subjective choiceworthiness, and accepts certain other criteria (which are too weak to specify any particular decision theory), one’s hierarchy of metanormative uncertainty ultimately converges to precise definitions of “subjective choiceworthiness” for any finite set of acts. If one allows the metanormative regress to extend to the transfinite ordinals, the convergence criteria can be weakened further. Finally, the structure of these results applies straightforwardly not just to decision-theoretic uncertainty, but also to other varieties of normative uncertainty, such as moral uncertainty.
1 Introduction
People sometimes make claims about how we ought to act in the face of empirical uncertainty. A “decision theory” is a collection of such claims. Because they make demands on our behavior, decision theories are “norms”. (Descriptive claims about how actual agents act under uncertainty are sometimes called “positive decision theories”. For the purposes of this paper, the term “decision theory” will refer exclusively to normative decision theories, as defined above.) Moral theories are also norms, for example, because they too are collections of claims about how we “ought” (albeit in another sense) to act.
Though I hope that the formal results of this paper are of interest to readers of many metaethical persuasions, the language used throughout will assume a realist position. I will say, then, that we suffer decision-theoretic uncertainty when we assign positive probability to the truth-values of conflicting decision theories, and, more generally, some form of normative uncertainty whenever we assign positive probability to the truth-values of conflicting norms.
The most widely accepted normative decision theory, by far, is expected utility theory. As a normative theory, expected utility theory can be interpreted as the claim that we have a cardinal utility function whose value depends on what act we choose and on the state of the world, and that we ought to act so as to maximize the expected value of that function, given our uncertainty across states. Alternatively, expected utility theory can be interpreted as the claim that we have an ordinal utility function whose value depends on what act we choose and on the state of the world, and that we ought to act in such a way as satisfies various assumptions (the von Neumann–Morgenstern axioms, perhaps), which together happen to entail that we will be acting as if we were maximizing the expected value of a cardinal utility function. Either way, expected utility theory, as a normative ideal, is a set of claims about what we ought to do.
Expected utility theory is not without critics. Soon after its foundations were laid in the 1940s and 1950s by John von Neumann, Oskar Morgenstern, and Leonard Savage, others, such as Maurice Allais, raised objections to the claim that expected utility maximization is a model of ideal behavior. It is true that, in the apparent absence of plausible alternatives, expected utility theory came to serve as the unchallenged basis for almost all of economic theory. But the debate over the normative claims that underlie it continued among some economists and philosophers, and has recently received new attention with Lara Buchak’s 2013 publication of Risk and Rationality, which argues that it can be rational to violate the von Neumann–Morgenstern “independence” axiom so as to act on explicit risk preferences. In sum, the normative claims made by expected utility theory—even if sometimes taken for granted—are claims about whose truthvalue an agent can be uncertain.
What “should” one do in the face of decision-theoretic uncertainty? Is there even a coherent way to interpret that question? The position that there is—that we “ought” to act on the basis of our normative uncertainty in general—has been called “uncertaintism” (or, less frequently, “metanormativism”). The uncertaintist, presumably, must then offer an account of what one should do when one assigns, say, a seventy percent chance to the truth of expected utility theory, and a thirty percent chance to the truth of Buchak’s alternative.
As we can see, questions about how to deal with empirical uncertainty give rise to questions about how to deal with decision-theoretic uncertainty. But our regress does not stop there. Just as decision theories are theories about how to act in the face of empirical uncertainty, let us use the term “metanormative theories” for collections of claims about how we ought to act in the face of normative uncertainty. It seems that, just as we can suffer normative uncertainty, we can suffer metanormative uncertainty as well: we can assign positive probability to conflicting metanormative theories. Metametanormative theories, then, are collections of claims about how we ought to act in the face of metanormative uncertainty. And so on. In the end, it seems that the very existence of normative claims—the very notion that there are, in some sense or another, ways “one ought to behave”—organically gives rise to an infinite hierarchy of metanormative uncertainty, with which an agent may have to contend in the course of making a decision.
Postulating such a hierarchy may seem like a strange and unnecessarily complex solution to a rather small and obscure problem. By analogy, therefore, consider three other structures explored in detail by philosophers and economic theorists: belief hierarchies, preference hierarchies, and hierarchies of deliberation procedures. We have beliefs about the world; and when one reflects on the fact that we can also have beliefs about people’s beliefs, one can hardly help but document the emergence of a “belief hierarchy”, constituted of beliefs about the world, beliefs about beliefs (2nd-order beliefs), beliefs about beliefs about beliefs (3rd-order), and so on [For a rigorous exploration of the structure, see Mertens and Zamir (1985)]. Likewise, we have preferences over features of the world, and the fact that we can also have preferences over the contents of people’s preferences (2nd-order preferences), and so on, gives rise to a “preference hierarchy” [see for example Bergemann et al. (2017)]. Finally, when we are not sure which act would maximize our utility, we may find it useful to ponder our options, even if doing so would come at a cost. Since pondering is in some sense just another option, we must then ponder whether to take one of our first-order options or to take the option of pondering among them, and so on. Lin (2014) concludes that this deliberation must stop after finitely many steps, and will do so justifiably so long as the reasoner reaches the point that further metadeliberation no longer obviously dominates taking one of the acts available. Lipman (1991) also explores this problem, in terms more similar to those we will use below: by constructing a (transfinite) hierarchy of beliefs over decision-making procedures, which justifies an act whenever it is judged to be (judged to be...) at least as good as the other available options, including the options of further deliberation at any order.
Similarly, normative decision theories tell us how to act in the face of uncertainty about the true state of the world. The fact that we may have to act in the face of uncertainty about the true decision theory thus seems plausibly to give rise to a “metanormative hierarchy”, similar in many respects to the hierarchies above.
The infinite hierarchy of metanormative uncertainty, furthermore, is more than hypothetical. I myself, for instance, do not currently know the correct decision theory. Expected utility theory seems highly plausible to me, but I cannot fully rule out Buchak’s arguments for some sort of risk-weighted expected utility theory, or arguments for theories that evade “Pascal’s muggings” by giving special treatment to low-probability but high-expected-value events, to name just two examples. Likewise, I believe that there is a “true metatheory”—a correct way to act in the face of my decision-theoretic uncertainty—but I am not certain of what it is, nor of the true theory at any order of the above hierarchy.
Despite all this uncertainty, however, my efforts to be moral or rational do ultimately result in conclusions about how to act. Somehow, from my infinite hierarchy of metanormative uncertainty, it seems I can be guided by normative concerns.
The process by which this happens seems potentially interesting. Questions of decision-theoretic uncertainty have received little attention so far, however—research on normative uncertainty has focused primarily on moral uncertainty—and even the most recent inquiries into normative uncertainty (again, usually presented in the context of moral uncertainty) have generally left such questions unanswered.
Weatherson (2014) avoids the problem by positing that the only norms are “first-order”; that there simply is no right or wrong way to deal with normative uncertainty.
Sepielli (2014) allows for “orders of rationality” and the corresponding “iterated normative uncertainty” (526). He constructs a metanormative hierarchy much like the one we will define below, to serve as a bridge between the mind-independent norms of which we are uncertain and the action-guiding, “all-things-considered” decision rule (given by what he calls “global systemic rationality”) to which we are more plausibly all accountable. Then, given an agent who assigns some positive credence to theories of rationality, at some order, which conflict as to which act is most rational for her (at that order), he expresses the suspicion that appealing to a higher-order rationality concept will always allow the agent to eliminate at least one action available. Posit “actions A...Z”, he writes; “[p]erhaps my uncertainty regarding objective normativity will intentionally explain my doing any of A...R.... My hope is to show that, as a general matter, potential actions will be hived off with each stepping-back” (and permanently so). Given a finite act-set, this procedure would ultimately result in a coherent definition of what the agent subjectively ought to do. Unfortunately, as for whether the procedure succeeds, he concludes, “I’ll have to make good on that suspicion elsewhere” (539).
Tarsney (2017) reiterates Sepielli’s hope for what he calls “fixed point solutions” to the regress problem. He acknowledges, however, that such fixed points will not necessarily exist (240–241), and concludes that a general solution to the regress problem must thus be found altogether elsewhere. In particular—motivated also by the thought that finite, boundedly rational agents cannot be normatively required to work out the fixed point of an infinite hierarchy, even if it exists (238)^{Footnote 1}—he argues for what he calls the “Enkratic Principle” (246). This is the broad, mind-independent principle that, just as long as one is in some sense “trying one’s best” to do the first-order right thing in the face of normative uncertainty, one’s decision is justified. On Tarsney’s account, the regress is thus cut off at the second order.
Finally, many accept that uncertaintism does indeed require some sort of fixed-point solution to the regress problem—but conclude from this that there simply must be something deeply wrong with uncertaintism. The following sentiment, described in Volume 7 of Oxford Studies in Normative Ethics (2017), is typical:
But a regress problem looms. Let us suppose that I am uncertain among some ordinary moral theories, [and] I ask what to do given the probability distribution over T1...Tn. But I am uncertain as to the answer, assigning some probability to each of U1...Un. This prompts me to ask what I ought to do given this probability distribution.... We can imagine this process iterating indefinitely.... The possibility of normative uncertainty all the way up makes the uncertaintist project look pointless.
Does this possibility in fact render the uncertaintist project pointless? Or can one accept the possibility of normative uncertainty all the way up, and still be norm-guided in some important classes of circumstances? I here argue the latter. To begin to answer the above questions more precisely, Sect. 2 presents a formal framework that aims to capture our intuitions about the concept of decision-theoretic uncertainty. Within this framework, Sect. 3 specifies various conditions under which two similar convergence results follow, as given in Sect. 4. Section 5 considers the earlier sections’ implications for normative uncertainty more generally.
2 Framework
2.1 Choiceworthiness
A finite set \(A = \{a_{1}, \ldots , a_{|A|}\}\) of “feasible acts” presents itself. There is a finite set of “possible states” \(S = \{s_{1}, \ldots , s_{|S|}\}\) to which I assign positive probability.^{Footnote 2} I assign utilities to performing each act in each state, as represented by the utility function u(A, s), where the value assigned to each act is not necessarily independent of the alternatives in A.^{Footnote 3} I also find myself in an overall finite epistemic position e, specifying the probabilities I assign to all relevant claims.^{Footnote 4} Let us call \(\pi = \langle A, u, e \rangle \) my “choice problem”.
Definition 2.1
A choice problem \(\pi \) is a triple of (i) a set of acts A, (ii) an epistemic position e, and (iii) a state-contingent utility function u over A.
We will say that my utility function specifies the “objective choiceworthiness” (or simply “choiceworthiness”) of each act, conditional on each state. That is, given s, the choiceworthiness of \(a_{i}\) is \(u(A, s)_{i}\)—the ith element of the \(|A|\)-vector u(A, s). From the probabilities I assign to the states in S, therefore, I also assign probabilities to potential values of the objective choiceworthiness of each act.
Definition 2.2
A (finite) choiceworthiness distribution is a (finite) probability distribution over choiceworthiness values for some (finite) set of acts.
Let \({\mathbb {D}}^{n}\) denote the set of all finite probability distributions in \({\mathbb {R}}^{n}\), and let some \(d(\pi ) \in {\mathbb {D}}^{|A|}\) represent the choiceworthiness distribution entailed by \(\pi \).
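As an informal illustration of how a choice problem entails a choiceworthiness distribution, consider the following minimal Python sketch. The dictionary encoding, the function name, and the numbers are assumptions made only for this example; they are not part of the formal framework.

```python
from collections import defaultdict

def choiceworthiness_distribution(state_probs, utilities):
    """Return {choiceworthiness vector: probability} for a finite choice problem.

    state_probs: dict mapping each state to its probability (hypothetical format).
    utilities:   dict mapping each state s to the tuple u(A, s), one entry per act.
    """
    dist = defaultdict(float)
    for state, p in state_probs.items():
        if p > 0:
            # States that induce the same vector u(A, s) pool their probability.
            dist[tuple(utilities[state])] += p
    return dict(dist)

# Two acts (go to the doctor, stay home) and three illustrative states.
state_probs = {"cured": 0.75, "not_cured": 0.125, "not_cured_rain": 0.125}
utilities = {
    "cured": (10.0, 0.0),
    "not_cured": (-1.0, 0.0),
    "not_cured_rain": (-1.0, 0.0),
}
d = choiceworthiness_distribution(state_probs, utilities)
```

Here the two "not cured" states yield the same choiceworthiness vector, so the resulting distribution has two points of support rather than three.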
Many other finite probability distributions over \({\mathbb {R}}^{|A|}\) might do just as well as the chosen d at representing my finite choiceworthiness distribution. Exactly which others depends on how much structure is contained in our understanding of “utility”. If we understand utility to be a merely “ordinal” quantity, for instance, then any transformation of d that is monotonic in choiceworthiness (and constant in probability) represents the same choiceworthiness distribution. We are here assuming nothing about utility except that it at least partially orders act-state pairs from a given {feasible set \(\times \) possible set}, and that \({\mathbb {R}}\) is “rich enough” to capture any potential difference between the choiceworthiness values of particular act-state pairs—that choiceworthiness cannot be lexicographic, for instance. As discussed at the end of Sect. 2.5, these assumptions about choiceworthiness, here made implicitly, will follow from similar assumptions made explicitly about subjective choiceworthiness.
2.2 Subjective choiceworthiness
I am uncertain about acts’ choiceworthinesses. Even so, I may know that one act is the most appropriate for me to choose, given my epistemic position. As I write this, for instance, I assign high probability to the event that, if I go to the doctor, I will swiftly be cured of my back injury (an outcome I would prefer immensely to the status quo), and low probability to the roughly complementary event that, if I go to the doctor, I will waste some time and remain injured (an outcome to which I would slightly prefer the status quo). Despite this uncertainty, and all my other uncertainty, I am in fact certain that going to the doctor is the “better choice” for me right now (by far!). There is thus some scale on which the act of going to the doctor scores higher for me than the act of not going—and would score higher for anyone with the same utility function, in the same overall epistemic position, facing the same set of feasible acts. Let us call this scale “subjective choiceworthiness”.
It is my intuition that subjective choiceworthiness c, when well-defined, is fundamentally a cardinal scale. That is, I would maintain that a representation of acts’ subjective choiceworthinesses (for an agent in a given situation) in \({\mathbb {R}}^{|A|}\) would be unique at least up to affine transformation. If my feasible act set \(A = \{a_{1}, a_{2}, a_{3}\}\) consists of going to the doctor (\(a_{1}\)), going to a very slightly less competent doctor (\(a_{2}\)), or not going at all (\(a_{3}\)), then there is some important and foundational sense in which, given my epistemic position and my preferences, the distance between \(c(\pi )_{1}\) and \(c(\pi )_{2}\) is less than the distance between \(c(\pi )_{2}\) and \(c(\pi )_{3}\). It might be objected that I will always do whatever winds up being most subjectively choiceworthy; that therefore, in the absence of a specified theory of decision-making under uncertainty, no information is conveyed by postulated differences between the acts not chosen; and that c is therefore better understood as merely an ordering, or perhaps even as a choice relation. To this it might be replied that, under certain circumstances, differences in subjective choiceworthiness could bear some relationship to the subjective probability with which a subjectively suboptimal act would become optimal upon further reflection. Or that cardinal subjective choiceworthiness takes on a clearer meaning in other situations of normative uncertainty (i.e. one might not choose the most subjectively morally choiceworthy act, and might in some sense be more blameworthy the less subjectively morally choiceworthy one’s act was)—and that it would be strange for subjective choiceworthiness to be fundamentally cardinal in one of these situations but not the other.
Or that our models of the world are generally simpler when we extend our intuitions regarding quantities’ cardinality beyond the domains in which they happen to be testable—such as our intuition that temperature is generally cardinal, even on some cold, distant star that we will only discover if its temperature rises above some threshold.
Furthermore, cardinal subjective choiceworthiness allows for the convergence results described below, and less structured interpretations of subjective choiceworthiness would not. If we are otherwise persuaded that the regress problem must have some solution or other, it is not circular to allow this observation itself to lend credibility to the concept of cardinal subjective choiceworthiness.
In any event, for the purposes of this analysis, we will understand subjective choiceworthiness (again, when well-defined) to be cardinal. We will represent it by a “subjective choiceworthiness function” \(c(\pi )\), where c assigns a real number to the subjective choiceworthiness of each of the acts in a feasible set A, for an agent with a utility function u, in epistemic position e.
Note that by having c map into the real numbers, we are assuming that all information about differences in subjective choiceworthiness (and therefore utility) can be captured by ratios of differences in real numbers. We are here explicitly assuming for subjective choiceworthiness what we provisionally assumed above for objective choiceworthiness—that, for instance, it cannot be lexicographic. Like probability theories that let us condition on probability 0 events, utility theories that let us distinguish between acts that differ infinitesimally in choiceworthiness may also be interesting to consider in light of the regress problem. However, we will not touch them here.
2.3 Metachoiceworthiness
In general, if I am to translate a choiceworthiness distribution d into a determination of how to act, I must invoke a “decision theory”: a collection of claims concerning how to evaluate acts in light of one’s choiceworthiness distribution. For example, using this terminology, one decision theory is “Expected Choiceworthiness Theory” (EC). EC is characterized by the fact that, if I am certain that it is the correct decision theory, then each act’s subjective choiceworthiness for me is its expected choiceworthiness under d.^{Footnote 5} Another decision theory would be “minimum choiceworthiness”—a theory characterized by the fact that, if I am certain that it is the correct decision theory, then each act’s subjective choiceworthiness for me is its minimum possible choiceworthiness under d.
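The two decision theories just described can be sketched in Python as functions from a choiceworthiness distribution to a vector of values, one per act. This is only an illustration: the list-of-pairs encoding of d, the function names, and the numbers (loosely modeled on a choice between seeing a doctor and staying home) are assumptions of the example.

```python
def expected_choiceworthiness(d):
    """EC: each act's probability-weighted mean choiceworthiness under d.

    d is a list of (probability, choiceworthiness_vector) pairs.
    """
    n = len(d[0][1])
    return [sum(p * values[i] for p, values in d) for i in range(n)]

def minimum_choiceworthiness(d):
    """Each act's lowest choiceworthiness among outcomes with positive probability."""
    n = len(d[0][1])
    return [min(values[i] for p, values in d if p > 0) for i in range(n)]

# Act 0: see the doctor (cured with probability 0.75). Act 1: stay home.
d = [(0.75, (10.0, 0.0)), (0.25, (-1.0, 0.0))]
ec = expected_choiceworthiness(d)   # [7.25, 0.0]
mc = minimum_choiceworthiness(d)    # [-1.0, 0.0]
```

On these illustrative numbers the two theories disagree about which act scores higher, which is the kind of conflict that an agent uncertain between them would need to resolve.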
Just as I am uncertain about the true state of the world, I may also be uncertain about the correct decision theory. To come to a determination of how to act, therefore, I may have to invoke a sort of “meta decision theory” (or, “2-metatheory”): a collection of claims concerning how to respond to one’s uncertainty over decision theories.
Note that, since this is so, the decision theories (we will awkwardly call these “1-metatheories”, for ease of indexing) cannot themselves be claims about subjective choiceworthiness. This is perhaps a surprising claim, so it bears repeating: expected utility theory (for example) is not, in this language, a theory about what subjective choiceworthiness is, or even about what it ought to be “all things considered”. It is, rather, a theory about what subjective choiceworthiness “1-ought” to be, for someone with a given objective choiceworthiness distribution over his feasible set—or, a theory about what subjective choiceworthiness is for someone with a given objective choiceworthiness distribution over his feasible set, if he knows the true 1-metatheory.
Suppose, for instance, that I am faced with three feasible acts, that I assign probability \(\frac{1}{2}\) to each of two 1-metatheories, \(t_{1}\) and \(t_{2}\), and that I am certain of “2-metatheory” m. The theories are such that if I were certain of \(t_{1}\), the subjective choiceworthinesses of the acts would be ordered \(a_{1} \succ a_{2} \succ a_{3}\); if I were certain of \(t_{2}\), the subjective choiceworthinesses of the acts would be ordered \(a_{3} \succ a_{2} \succ a_{1}\); and, given the probabilities I assign to \(t_{1}\) and \(t_{2}\), but my certainty about m, the subjective choiceworthinesses of the acts are in fact ordered \(a_{2} \succ a_{3} \succ a_{1}\). Although I assign probability \(\frac{1}{2}\) to \(t_{1}\), I assign no positive probability to the event that \(a_{1}\) is more subjectively choiceworthy than \(a_{2}\) from my epistemic position. The 1-metatheories’ claims, therefore, are not claims about the acts’ subjective choiceworthinesses given my empirical uncertainty, but about how the acts score on an altogether different scale. Let us call this scale “metachoiceworthiness”, or “1-metachoiceworthiness”. Of course, metachoiceworthiness must be constructed such that, if I know that an act’s 1-metachoiceworthiness is x, then the act’s subjective choiceworthiness for me is also x. We might therefore informally think of 1-metachoiceworthiness as “whatever subjective choiceworthiness is, for someone who knows the correct 1-metatheory”. But since, again, decision theories are not actually claims about subjective choiceworthiness, let us begin by thinking about 1-metachoiceworthiness on its own terms, and only afterward consider its relationship to subjective choiceworthiness.
In any event, the elusiveness of subjective choiceworthiness is not restricted to “order 1”. Just as I may be uncertain as to the correct 1-metatheory, I may be uncertain as to the correct 2-metatheory; I may therefore have to appeal to a “3-metatheory”; and the 2-metatheories are therefore making claims not about acts’ subjective choiceworthiness given beliefs about their 1-metachoiceworthiness, but about acts’, say, “2-metachoiceworthiness” given beliefs about their 1-metachoiceworthiness. So our regress begins.
2.4 k-metachoiceworthiness
Let us call choiceworthiness “0-metachoiceworthiness”, choiceworthiness distributions “0-metachoiceworthiness distributions”, and decision theories “1-metatheories”. The concepts of k-metachoiceworthiness, k-metachoiceworthiness distributions, and k-metatheories can then together be defined recursively.
Definition 2.3
The k-metachoiceworthiness \(c_{k}\) of an act \(a_{i}\), for an agent facing finite choice problem \(\pi \), is \(a_{i}\)’s subjective choiceworthiness for an agent with the same (\(k - 1\))-metachoiceworthiness distribution as that entailed by \(\pi \), but who knows the correct k-metatheory.
Let us denote acts’ relative k-metachoiceworthiness by the two-place relation \(\succ _{\pi ,k}\).
Definition 2.4
A (finite) k-metachoiceworthiness distribution \(d_{k} \in {\mathbb {D}}^{|A|}\) is a probability distribution over k-metachoiceworthiness values for some (finite) set of acts A.
Definition 2.5
A k-metatheory, applied to a finite set of acts A, is a function \(t_{k} : {\mathbb {D}}^{|A|} \rightarrow {\mathbb {R}}^{|A|}\), representing claims about the k-metachoiceworthiness of the acts in A given (\(k - 1\))-metachoiceworthiness distribution \(d_{k-1} \in {\mathbb {D}}^{|A|}\).
Note that, strictly speaking, if we want our k-metatheories to make k-metachoiceworthiness claims over finite act-sets of arbitrary size, we would have to say that a k-metatheory is a family of functions \(\{t_{k}^{n}\}\) from \({\mathbb {D}}^{n}\) to \({\mathbb {R}}^{n}\), with one for each \(n \in {\mathbb {N}}\). For simplicity, however, we will take \(n = |A|\) as given and interpret our project only as an attempt to find criteria under which the subjective choiceworthinesses of any n acts will be well-defined—with the understanding that identical reasoning would apply to any other n.
We can now define a few additional terms.
Definition 2.6
A k-metatheory distribution \(d_{t_{k}}\) is a probability distribution over k-metatheories.
Definition 2.7
A metatheoretic hierarchy (or simply “hierarchy”) T is a collection of k-metatheories \(t_{k}\) with one for each \(k > 0\).
Definition 2.8
A hierarchy distribution \(d_{T}\) is a probability distribution over hierarchies.
Let \(|d_{t_k}|\) and \(|d_T|\) denote the number of k-metatheories and hierarchies, respectively, to which I assign positive probability.
Let \(\vec {c_k} \in {\mathbb {R}}^{|d_{t_k}| \times |A|}\) represent the claims made by my \(|d_{t_{k}}|\) k-metatheories about the k-metachoiceworthinesses of the \(|A|\) acts in A. Let \(\vec {p_{k}} \in \Delta ^{|d_{t_{k}}| - 1}\) represent the probabilities I assign to these k-metatheories. We can now represent my k-metachoiceworthiness distribution by \(d_{k} = \langle \vec {c_{k}},\vec {p_{k}}\rangle \).^{Footnote 6}
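To make the pair of claims and probabilities concrete, the following Python sketch builds a 1-metachoiceworthiness distribution from a 0-metachoiceworthiness distribution and two candidate 1-metatheories. All names, the encoding of distributions as lists of (probability, vector) pairs, the choice of two theories, and the equal credences are assumptions of this illustration only.

```python
def metachoiceworthiness_distribution(theories, probs, d_prev):
    """Build d_k as (claims, probs) from d_{k-1}.

    theories: candidate k-metatheories, each a function from d_{k-1}
              to a list with one real number per act.
    probs:    the (strictly positive) credence assigned to each theory.
    """
    if len(theories) != len(probs):
        raise ValueError("need exactly one probability per theory")
    if any(p <= 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be positive and sum to 1")
    # One row per theory, one column per act.
    claims = [list(t(d_prev)) for t in theories]
    return claims, list(probs)

def claims_for_act(claims, i):
    """All claims made about act i: column i of the claims matrix."""
    return [row[i] for row in claims]

def expected(d):
    n = len(d[0][1])
    return [sum(p * v[i] for p, v in d) for i in range(n)]

def minimum(d):
    n = len(d[0][1])
    return [min(v[i] for p, v in d if p > 0) for i in range(n)]

d0 = [(0.75, (10.0, 0.0)), (0.25, (-1.0, 0.0))]
c1, p1 = metachoiceworthiness_distribution([expected, minimum], [0.5, 0.5], d0)
```

The helper `claims_for_act` returns the set of values that different theories assign to a single act, which is the object over which dominance-style reasoning about that act would range.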
2.5 The relationship of k-metachoiceworthiness to subjective choiceworthiness
Upon introducing the cardinal subjective choiceworthiness function \(c(\pi )\) above, we placed no restrictions on what it could be. Now that we have documented the emergence of an elaborate web of concepts concerning \(\pi \), however, we can consider how it relates to c.
Recall that k-metachoiceworthiness claims are defined so that, if I know that an act’s k-metachoiceworthiness for me is x, the act’s subjective choiceworthiness for me is x. Let us now introduce a compatible, minimally restrictive principle with which one’s subjective choiceworthiness function might comply in the face of uncertainty about an act’s k-metachoiceworthiness.
Definition 2.9
The Dominance Principle is the principle that

If \(b \ge x\) \(\forall b \in [\vec {c_k}]_i\), and \(b^{*} > x\) for some \(b^{*} \in [\vec {c_k}]_i\), then \(c(a_i) > x\).

If \(b \le x\) \(\forall b \in [\vec {c_k}]_i\), and \(b^{*} < x\) for some \(b^{*} \in [\vec {c_k}]_i\), then \(c(a_i) < x\).
Note that if I accept the Dominance Principle, it follows immediately that my subjective choiceworthiness for an act \(a_i\) is well-defined whenever \(|\cap _{k \in {\mathbb {N}}}[\min ([\vec {c_k}]_i), \max ([\vec {c_k}]_i)]| = 1\). That is, whenever exactly one number lies in the ranges of “admissible” (not dominated) k-metachoiceworthiness values, across all k, for an act, that number must be the act’s subjective choiceworthiness.
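The intersection of admissible ranges can be sketched in Python for a single act. This is a rough illustration only: it truncates the hierarchy at finitely many orders, whereas the condition above ranges over all k, and the function names and numbers are hypothetical.

```python
def admissible_interval(claims_by_order):
    """Intersect the ranges [min, max] of the claims made at each order.

    claims_by_order[k] lists the claims about one act made by the theories
    at one order to which positive probability is assigned.
    Returns (lo, hi), or None if the ranges have no common point.
    """
    lo = max(min(claims) for claims in claims_by_order)
    hi = min(max(claims) for claims in claims_by_order)
    return (lo, hi) if lo <= hi else None

def pinned_down_value(claims_by_order):
    """Return the single admissible value if the intersection is one point."""
    interval = admissible_interval(claims_by_order)
    if interval is not None and interval[0] == interval[1]:
        return interval[0]
    return None

shrinking = [[0.0, 8.0], [2.0, 6.0], [4.0, 4.0]]
still_open = [[0.0, 8.0], [2.0, 6.0]]
value = pinned_down_value(shrinking)      # 4.0
no_value = pinned_down_value(still_open)  # None
```

In the first example the ranges narrow to a single point, so that point is the only candidate for the act's subjective choiceworthiness; in the second, the orders considered so far leave an interval of admissible values.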
Note also that any claim about subjective choiceworthiness itself, such as the Dominance Principle, in some sense takes on both a positive and a normative interpretation. One could interpret the Principle normatively as asserting that one’s subjective choiceworthiness always ought to obey the above pattern. In this case, if one accepts the Principle, one’s subjective choiceworthiness also does obey it, since to hold that an act should be ranked highly for someone in your epistemic position is simply another way to say that it is highly subjectively choiceworthy. Alternatively, one could interpret the Principle positively as asserting that, as a matter of fact, subjective choiceworthiness always obeys the above pattern. If one accepts this claim (and that “ought implies can”), one must also accept that subjective choiceworthiness always ought to obey the above pattern. Either way, if one accepts the Principle, one cannot assign positive probability to k-metatheories that claim that the k-metachoiceworthiness of an act lies outside the admissible range imposed by one’s \(k^{\prime }\)-metachoiceworthiness distribution for the act for lower orders \(k^{\prime } < k\).
Finally, note that the framework outlined here differs from other approaches to subjective choiceworthiness in the following respect. Some other approaches [e.g. that of MacAskill (2016a)] begin with the normative theories in all their diversity; work through problems of intertheoretic comparability; and then try to define subjective choiceworthiness with no more structure than necessary. On some accounts, this minimal structure allows only for a binary classification of acts into the “permissible” and the “impermissible” [as recommended, for instance, by Barry and Tomlin (2016)]. The above approach, by contrast, begins by assuming that subjective choiceworthiness is a single cardinal scale, and it characterizes k-metachoiceworthiness claims, and the k-metatheories that make them, in terms of the subjective choiceworthinesses that they would induce if they were known. This approach has the cost of assuming cardinal subjective choiceworthiness, but it has the benefit of immediately giving all my k-metachoiceworthiness claims both unit and level comparability, without requiring any further assumptions.
Thus, from a cardinal definition of subjective choiceworthiness, we also get a cardinal definition of utility, without having to assume it explicitly. By similar reasoning, we also get cardinal definitions of k-metachoiceworthiness for all k. Note that we are not taking the von Neumann–Morgenstern approach of defining my utility function so that it represents the choices I would make if I were maximizing expected utility; indeed, our project is to explore how far I can stray from certainty about expected utility theory while still knowing how I subjectively ought to act.
3 Conditions
3.1 Completeness
A “partial k-metatheory” would be one that makes claims about the k-metachoiceworthinesses of some acts under some (\(k - 1\))-metachoiceworthiness distributions, but not of all acts under all (\(k - 1\))-metachoiceworthiness distributions. A partial decision theory of “strict dominance”, for instance, claims that \(a_{i} \succ _{\pi ,1} a_{j} \iff u(A, s)_{i} > u(A, s)_{j}\) \(\forall s \in S\), and makes no other claims at all. That is, it claims that an act \(a_{i}\) is more 1-metachoiceworthy than an act \(a_{j}\) if and only if \(a_{i}\) is more objectively choiceworthy than \(a_{j}\) in all the states to which I assign positive probability, and it is silent about acts’ relative 1-metachoiceworthinesses in all other cases.
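The partiality of strict dominance can be seen in a short Python sketch that returns only the comparisons the theory actually makes. The function name, the encoding of u(A, s) as one tuple per positive-probability state, and the utilities are assumptions chosen for illustration.

```python
def strict_dominance_claims(utilities_by_state):
    """Return the pairs (i, j) such that act i beats act j in every state.

    utilities_by_state: one tuple u(A, s) per positive-probability state,
    with one entry per act. Pairs absent from the result are those about
    which the theory is silent.
    """
    n = len(utilities_by_state[0])
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and all(u[i] > u[j] for u in utilities_by_state)
    }

# Three acts, two states.
utilities_by_state = [(3.0, 1.0, 2.0), (5.0, 0.0, 7.0)]
claims = strict_dominance_claims(utilities_by_state)
```

With these numbers, acts 0 and 2 each strictly dominate act 1, but neither dominates the other, so the theory makes no claim about how acts 0 and 2 compare.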
Conversely,
Definition 3.1
A k-metatheory, applied to a finite set of acts A, is complete if it is defined throughout \({\mathbb {D}}^{A}\).
One condition for the results below is that I assign positive probability only to complete decision theories. Believing that the true decision theory is complete is, I think, reasonably well motivated by the sense that, just as I know acts’ 0-metachoiceworthiness (i.e. objective choiceworthiness) if I know the true state, I should be able to know acts’ 1-metachoiceworthiness if I know the true decision theory (and so on up the hierarchy). In any event, we will remove partial decision theories from consideration so as to separate the regress problem from the problems of incomparability that can plague normative uncertainty in their own right.
Note that the framework laid out in Sect. 2 does not allow us to assign positive probability to the “nihilistic decision theory” (one that makes no claims about acts’ 1-metachoiceworthinesses under any choiceworthiness distribution). Since a decision theory is a collection of claims determining what acts’ subjective choiceworthinesses would be for me if I knew how to respond to my empirical uncertainty, and since my subjective choiceworthiness is already defined in the degenerate case of empirical certainty, all my decision theories at least claim that an act’s subjective choiceworthiness is its objective choiceworthiness, when my objective choiceworthiness distribution is degenerate.
3.2 Continuity
We will say that
Definition 3.2
A k-metatheory \(t_{k}\) is continuous if slight changes to the individual (\(k-1\))-metachoiceworthiness claims to which one assigns positive probability produce only slight changes to the k-metachoiceworthiness claims made by \(t_{k}\).
To be called continuous, \(t_{k}\) need not respond continuously to the probability one assigns to some (\(k-1\))-metachoiceworthiness claim. A more formal definition of continuity, as we are using the term here, is given in the “Appendix”.
A second condition, necessary for only the first of the results below, is that I assign positive probability only to continuous decision theories.
3.3 The Analog Principle
MacAskill (2014) argues that, when we are facing both empirical and normative uncertainty over a set of acts, there is a sense in which we should treat our empirical and normative uncertainty “analogously”. If I am uncertain which act is objectively best, it may seem unlikely that the appropriate response to my uncertainty would depend on the reason (i.e., empirical or normative) for my uncertainty—especially upon considering that I might have uncertainty about how to behave without even knowing the reason for my uncertainty.
In the context of the regress problem, one might likewise argue that we should treat our empirical and k-metatheoretic uncertainty analogously. More formally:
Definition 3.3
Let \(t_{k}^{*} : {\mathbb {D}}^{A} \rightarrow {\mathbb {R}}^{A}\) denote the true k-metatheory. The Analog Principle is the claim that \(t_{k}^{*} = t_{1}^{*}\) \(\forall k \ge 1\).
One might wonder if it matters whether my beliefs about the k-metatheories are correlated across different orders \(k^{\prime }\) (as of course they are—very strongly!—if I accept the Analog Principle), or whether they are correlated with my beliefs about the state of the world. In fact, it does not. A k-metatheory is simply a function of my (\(k-1\))-metachoiceworthiness distribution; a k-metatheory’s output therefore does not depend on the probability that it is the true k-metatheory, nor on its probability conditional on some state or \(k^{\prime }\)-metatheory.
A final condition, necessary only for the first of the results below, is that I accept the Analog Principle.
4 Convergence
4.1 Intuition
In the context of the framework above, the commonness of well-defined subjective choiceworthiness is not surprising. If I assign positive probability to a finite number of theories, and they disagree about how subjectively choiceworthy some act should be for me, there will be a minimum and a maximum to that range of values. In the face of that uncertainty, my subjective choiceworthiness should lie somewhere in the interior of the range. Where, exactly? I will assign positive probability to different answers as to where it should lie, producing a smaller range. And so on. Given a few other assumptions (either the continuity of my theories and my acceptance of the Analog Principle, or the possibility of transfinite hierarchies), this process will not “get stuck” by shrinking the range of potential subjective choiceworthiness values for each act merely from a larger range to a smaller range. Instead, the process is guaranteed ultimately to shrink said range to a single point.
In other words, nothing very counterintuitive falls out of the mathematical setup of the problem. The point of this exercise is simply to formally illustrate a coherent framework whereby our intuitions about normative uncertainty—including about the infinite regress that it threatens—can be reconciled with the understanding that, at the end of the day, we make norm-guided decisions.
With that said, the convergence results can be stated as follows.
4.2 Natural hierarchies
Theorem 1
If one assigns positive probability only to a finite set of decision theories all of which are complete and continuous, and if one accepts the Dominance Principle and the Analog Principle, then one’s subjective choiceworthiness is well-defined over any finite set A of acts.
The proof can be found in the “Appendix”.
Proposition 4.1
If two acts are equally subjectively choiceworthy, this fact will not necessarily be revealed by the iterated application of one’s distribution over k-metatheories.
Proof
Suppose for example that I assign probability \(\frac{1}{2}\) to Expected Choiceworthiness Theory, and to the analogous hierarchy (\(T_{1}\)) according to which the (\(k+1\))-metachoiceworthiness of an act is its expected k-metachoiceworthiness. I assign probability \(\frac{1}{2}\) to a risk-averse hierarchy of theories (\(T_{2}\)) according to which the (\(k+1\))-metachoiceworthiness of an act is the average of its expected k-metachoiceworthiness and its minimum possible k-metachoiceworthiness.
I am deciding between two acts, \(a_{1}\) and \(a_{2}\). I assign probability \(\frac{1}{2}\) to a state in which \(a_{1}\) has objective choiceworthiness 0 and probability \(\frac{1}{2}\) to a state in which \(a_{1}\) has objective choiceworthiness 1. Act \(a_{2}\) has objective choiceworthiness \(\frac{1}{3}\) in both states. The subjective choiceworthiness of \(a_{2}\) is, of course, \(\frac{1}{3}\). The k-metachoiceworthiness claim about \(a_1\) made by \(T_1\) is \([\vec {c_k}]_{1,1} = 1 - \sum _{n=1}^{k}\frac{1}{2^{2n-1}}\); that made by \(T_2\) is \([\vec {c_k}]_{1,2} = \sum _{n=1}^{k}\frac{1}{4^n}\). This can be seen by verifying algebraically that these summations satisfy the initial conditions \([\vec {c_1}]_{1,1} = \frac{1}{2}\) and \([\vec {c_1}]_{1,2} = \frac{1}{4}\), and the recursive conditions \([\vec {c_k}]_{1,1} = \frac{1}{2} [\vec {c_{k-1}}]_{1,1} + \frac{1}{2} [\vec {c_{k-1}}]_{1,2}\) and \([\vec {c_k}]_{1,2} = \frac{1}{4} [\vec {c_{k-1}}]_{1,1} + \frac{3}{4} [\vec {c_{k-1}}]_{1,2}\) \((k > 1)\).
As k increases, \(\{[\vec {c_k}]_{1,1}\}\) converges to \(\frac{1}{3}\) from above, while always strictly greater, and \(\{[\vec {c_k}]_{1,2}\}\) converges to \(\frac{1}{3}\) from below, while always strictly less. Thus the acts’ subjective choiceworthinesses are precisely equal, even though any formal comparison of the acts is undefined if we halt our deliberation after finite k. \(\square \)
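The recursion in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names `step` and `claims_about_a1` are hypothetical, and the code relies on the fact that in this example both the two states and the two hierarchies carry probability \(\frac{1}{2}\), so a plain average serves as the expectation at every order. Exact rational arithmetic is used so that the strict inequalities are not blurred by rounding.

```python
from fractions import Fraction


def step(values):
    """Move one order up the hierarchy.

    `values` holds the two equiprobable lower-order claims about a_1.
    Returns the claims of T1 (expected value) and T2 (average of the
    expected value and the minimum), in that order.
    """
    expected = sum(values) / len(values)  # both probabilities are 1/2
    t1_claim = expected
    t2_claim = (expected + min(values)) / 2
    return (t1_claim, t2_claim)


def claims_about_a1(max_order):
    """Return [(T1 claim, T2 claim)] for orders 1..max_order."""
    values = (Fraction(0), Fraction(1))  # objective choiceworthiness of a_1
    history = []
    for _ in range(max_order):
        values = step(values)
        history.append(values)
    return history


if __name__ == "__main__":
    for k, (t1, t2) in enumerate(claims_about_a1(6), start=1):
        # T1 stays strictly above 1/3 and T2 strictly below it.
        print(k, t1, t2, t1 - t2)
```

Under these assumptions the gap between the two claims at order k is \(\frac{1}{4^{k}}\): it is positive at every finite order and vanishes only in the limit, which is the point of the proposition.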
This example does not imply the potentially concerning conclusion that one might not be able to infer acts’ relative subjective choiceworthinesses, from one’s choiceworthiness distribution and one’s hierarchy distribution, in finite time. Indeed, since the subjective choiceworthiness of each act is well-defined under the above conditions, and is a logical consequence of one’s finite choice problem, we know from the completeness of first-order logic that the value of \(c(\pi )_{i}\) can be determined for all i by some finite proof—such as the one given above.
The example does, however, demonstrate that there may be facts about subjective choiceworthiness that are not discovered by the “iterate a few times and hope my uncertainty is more or less resolved” algorithm. In particular, therefore, I believe it demonstrates that fixed-point solutions to the regress problem need not take the form either of convergence after finite k or of monotonic decreases in the set of maximally k-choiceworthy acts—as hoped for in Sepielli (2014), and as claimed by Tarsney (2017, p. 239).
4.3 Transfinite hierarchies
Suppose that the k-metatheories to which I assign positive probability are all complete and compatible with the Dominance Principle, but that they are not continuous, or that I reject the Analog Principle. It is then possible that my subjective choiceworthiness for some act \(a_{i}\) does not converge, even after infinite steps. That is, though \(\lim _{k \rightarrow \infty } \min ([\vec {c_{k}}]_{i})\) and \(\lim _{k \rightarrow \infty } \max ([\vec {c_{k}}]_{i})\) do both exist (by the fact that the respective sequences in k are monotonic and bounded), \(\lim _{k \rightarrow \infty } \min ([\vec {c_{k}}]_{i}) < \lim _{k \rightarrow \infty } \max ([\vec {c_{k}}]_{i})\). In such a situation, it seems, someone could still claim that there is a “right way” for me to act. Someone could claim that my subjective choiceworthiness for \(a_{i}\) should be the average of \(\lim _{k \rightarrow \infty } \min ([\vec {c_{k}}]_{i})\) and \(\lim _{k \rightarrow \infty } \max ([\vec {c_{k}}]_{i})\), for example.
If I assign positive probability to competing theories of how to act in situations like the above, I must appeal to a theory about how to act in the face of this uncertainty. So our regress extends beyond the natural numbers, into the transfinite ordinals.
The definitions given in Sects. 2 and 3 can generally be reinterpreted so that k (or, \(\kappa \)) is any ordinal, not just any natural number. Two, however, will require slight tweaks:
Definition 4.1
(Definition 2.3, revised) The \(\underline{\kappa }\)-metachoiceworthiness \(c_{\kappa }\) of an act \(a_{i}\), for an agent facing finite choice problem \(\pi \), is \(a_{i}\)’s subjective choiceworthiness for an agent with the same \(\kappa ^{\prime }\)-metachoiceworthiness distribution as that entailed by \(\pi \) for all \(\kappa ^{\prime } < \kappa \), but who knows the correct \(\kappa \)-metatheory.
Definition 4.2
(Definition 2.5, revised) A \(\underline{\kappa }\)-metatheory, applied to a finite set of acts A, is a function \(t_{\kappa } : {\mathbb {D}}^{\kappa A} \rightarrow {\mathbb {R}}^{A}\), representing claims about the \(\kappa \)-metachoiceworthiness of the acts in A given \(\kappa ^{\prime }\)-metachoiceworthiness distributions \(d_{\kappa ^{\prime }} \in {\mathbb {D}}^{A}\) for all \(\kappa ^{\prime } < \kappa \).
Note that this definition of a higher-order metatheory is strictly more general than the original, even with respect to finite k, because it allows k-metatheories to be functions of one’s beliefs not only about (\(k-1\))-metachoiceworthiness but about \(k^{\prime }\)-metachoiceworthiness at all lower orders \(k^{\prime } < k\). The following result thus holds, as Theorem 1 does not, regardless of whether we want to allow for this possibility.
We can now state the following:
Theorem 2
If one assigns positive probability only to a finite set of complete \(\kappa \)-metatheories for each ordinal \(\kappa \), and one accepts the Dominance Principle, then one’s subjective choiceworthiness is well-defined over any finite set A of acts.
The proof can be found in the “Appendix”.
4.4 The infectiousness of stubbornness
Definition 4.3
The Weak Dominance Principle is the principle that

If \(b \ge x\) \(\forall b \in [\vec {c_\kappa }]_i\), for some \(\kappa \), then \(c(a_i) \ge x\).

If \(b \le x\) \(\forall b \in [\vec {c_\kappa }]_i\), for some \(\kappa \), then \(c(a_i) \le x\).
Let us call a \(\kappa \)-metatheory “compromising” if it is compatible with the (strong) Dominance Principle, and “stubborn” if it is not. Expected Choiceworthiness Theory and risk-weighted variants of it are examples of “compromising” theories. Minimax Theory, according to which an act’s metachoiceworthiness is its minimum possible choiceworthiness, is an example of a “stubborn” theory. However, it is compatible with the Weak Dominance Principle.
Both the theorems above demonstrate that, when all the decision theories (or \(\kappa \)-metatheories) to which I assign positive probability are compromising (along with some other conditions), the range of potential subjective choiceworthiness values for each act shrinks to a point. In both cases, this is demonstrated roughly by the fact that, when the range of potential subjective choiceworthiness values for some act is a nondegenerate interval at some order k (or \(\kappa \)), the application of even higher-order metatheories shrinks this interval by increasing its minimum.
One might notice that, strictly speaking, neither of the proofs requires that all my decision theories (or \(\kappa \)-metatheories) be compromising. Suppose, for example, that I reject the Dominance Principle, but accept the Weak Dominance Principle. Suppose further that just one decision theory (or one \(\kappa \)-metatheory for each \(\kappa \)) to which I assign positive probability is “stubborn”—or, that all the stubborn theories to which I assign positive probability are one-sidedly pessimistic (like Minimax) or optimistic (like Maximax) at each order. Then our proofs can go through with only slight modifications. We just need to shrink our interval exclusively from the top or from the bottom, at a given order, to avoid asking concessions of our stubborn theories.
The plausibility of stubborn theories, however, poses two challenges for this project in general.
First, stubborn theories are “infectious”: they can determine our behavior regardless of how little positive credence we give them. Suppose I am deciding between two acts, \(a_{1}\) and \(a_{2}\). I assign probability 0.99 to a state in which \(a_{1}\) has objective choiceworthiness 1, and probability 0.01 to a state in which \(a_{1}\) has objective choiceworthiness 0. Act \(a_{2}\) has objective choiceworthiness 0.0001 in both states. Furthermore, I assign probability 0.99 to Expected Choiceworthiness, and probability 0.01 to Minimax, at every order of the hierarchy. It would be deeply counterintuitive to conclude that, from such a position, \(a_{2}\) is more subjectively choiceworthy than \(a_{1}\). It would be perhaps even more counterintuitive to conclude that the acts’ subjective choiceworthinesses were equal, if \(a_{2}\)’s objective choiceworthiness were known to be 0. But of course we do reach both conclusions, as repeated applications of my distribution over decision theories bring \(a_{1}\)’s higher-order metachoiceworthiness arbitrarily close to 0—according even to the sequence of “Expected Choiceworthiness”-analogous theories.
Second, stubborn theories can clash with each other. If we are going to give some weight to Maximin, at every order, it seems only fair to give some weight to Maximax at every order as well. But if we do, then each act’s range of potential subjective choiceworthiness values never shrinks at all; \(\min ([\vec {c_{\kappa }}]_{i}) = \min ([\vec {c_{0}}]_{i})\), and \(\max ([\vec {c_{\kappa }}]_{i}) = \max ([\vec {c_{0}}]_{i})\), for all \(\kappa \).
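Both challenges can be reproduced with a small simulation. The sketch below is illustrative only: the function names (`expected`, `pessimist`, `optimist`, `climb`) are hypothetical, the same credences are assumed at every order as in the examples above, and the credences used in the clash case (\(\frac{1}{2}\), \(\frac{1}{4}\), \(\frac{1}{4}\)) are an arbitrary choice not taken from the text. Here `pessimist` plays the role of the minimum-based theory and `optimist` that of Maximax.

```python
from fractions import Fraction


def expected(dist):
    """Expected Choiceworthiness: probability-weighted mean of the claims."""
    return sum(value * prob for value, prob in dist)


def pessimist(dist):
    """Minimum-based theory: the lowest claim with positive probability."""
    return min(value for value, prob in dist if prob > 0)


def optimist(dist):
    """Maximax: the highest claim with positive probability."""
    return max(value for value, prob in dist if prob > 0)


def climb(state_dist, theories, orders):
    """Apply the same credences over theories at each order.

    `state_dist` is a list of (objective choiceworthiness, probability).
    `theories` is a list of (theory function, credence).
    Returns, for each order, the list of claims in theory order.
    """
    dist = list(state_dist)
    history = []
    for _ in range(orders):
        claims = [theory(dist) for theory, _ in theories]
        history.append(claims)
        # The claims, weighted by credence, form the next order's distribution.
        dist = [(claim, cred) for claim, (_, cred) in zip(claims, theories)]
    return history


if __name__ == "__main__":
    p = Fraction(99, 100)
    infection = climb([(Fraction(1), p), (Fraction(0), 1 - p)],
                      [(expected, p), (pessimist, 1 - p)], 1000)
    a2 = Fraction(1, 10000)
    first = next(k for k, c in enumerate(infection, start=1) if c[0] < a2)
    print("EC claim about a_1 drops below a_2 at order", first)

    half, quarter = Fraction(1, 2), Fraction(1, 4)
    clash = climb([(Fraction(0), half), (Fraction(1), half)],
                  [(expected, half), (pessimist, quarter), (optimist, quarter)], 50)
    print("range at order 50:", min(clash[-1]), max(clash[-1]))
```

In the first case the Expected Choiceworthiness claim about \(a_{1}\) at order k is \(0.99^{k}\), so it falls below 0.0001 after roughly nine hundred orders. In the second case the minimum and maximum claims stay at 0 and 1 at every order, so the range never shrinks.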
It does not feel as though I can fully rule out stubborn theories. Thus, despite all our progress, I am still left with the original motivating question: how do I so regularly wind up with well-defined subjective choiceworthiness? One encouraging thought is the observation that, though stubbornness is infectious in one sense, there is a sense in which compromise is infectious as well. For example, suppose I assign positive probability to stubborn \(\kappa \)-metatheories (or even, only to stubborn \(\kappa \)-metatheories) at almost all \(\kappa \), but assign positive probability only to compromising theories at a relatively sparse class of orders—at the limit ordinals, perhaps. Then, even though there is a sense in which I believe in stubborn theories “almost everywhere” up the hierarchy, the scattered all-compromising orders will still force my subjective choiceworthiness range for each act down to a point. (A minimally modified version of the proof of Theorem 2 presented in the “Appendix” will hold so long as our class \(\Gamma \) of “all-compromising” orders is such that, for every set M of ordinal numbers, there is an ordinal number \(\gamma \in \Gamma : \gamma > \mu \,\forall \,\mu \in M\).) Similar reasoning applies to the case of merely natural hierarchies. In short, my hierarchy distribution can handle a lot of stubbornness; as long as an all-compromising order comes along every now and then to shrink my subjective choiceworthiness range for each act, there are reasonable conditions under which subjective choiceworthiness will generally be well-defined.
4.5 Rescaling
MacAskill (2014) offers the following example of undefined subjective choiceworthiness.
Suppose an agent faces a choice problem:
She must choose among acts \(A = \{a_1, a_2\}\). She assigns probability \(\frac{18}{23}\) to a state \(s_1\) in which \(a_1\) has objective choiceworthiness 0 and \(a_2\) has objective choiceworthiness 1, and probability \(\frac{5}{23}\) to a state \(s_2\) in which \(c_0(a_1) = 4\) and \(c_0(a_2) = 0\).
In evaluating the acts at order 1, she assigns probability \(\frac{18}{23}\) to Expected Choiceworthiness Theory (\(t_1\)), and probability \(\frac{5}{23}\) to “Square Root Theory” (\(t_2\)), according to which an act’s 1-metachoiceworthiness is the expectation of the square root of the difference between its objective choiceworthiness and the objective choiceworthiness of the least objectively choiceworthy act in A. Thus
On MacAskill’s reading of the problem, “transformations of each individual choiceworthiness function by an absolute value are permissible, and transformations of all choiceworthiness functions by a multiplying factor are permissible” (222). Thus he produces
As we can see, after rescaling, the 1-metachoiceworthiness distribution of \(a_1\) is precisely what the 0-metachoiceworthiness distribution of \(a_2\) had been, and the 1-metachoiceworthiness distribution of \(a_2\) is precisely what the 0-metachoiceworthiness distribution of \(a_1\) had been. Therefore, if we obey the Analog Principle—that is, if our distribution over k-metatheories is the same at every order—and if we rescale after every step, our k-metachoiceworthiness distribution for the acts will flip forever between that of “Order 0” and that of “Order 1, rescaled”, without converging.
To my mind, however, k-metachoiceworthiness claims are intuitively characterized by the property that, if one believes them, they define one’s subjective choiceworthiness. If an agent faces empirical uncertainty over two acts’ objective choiceworthiness as represented above, we want to say that, in the event that she learns the truth of \(s_2\), act \(a_1\)’s subjective choiceworthiness for her is 4. In precisely the same language, I think, we want to say that in the event that she learns the truth of Expected Choiceworthiness Theory (but does not learn the true state), \(a_1\)’s subjective choiceworthiness for her is \(\frac{20}{23}\)—and likewise all up the hierarchy. To keep these claims “in line”, the framework of Sect. 2 permits real-valued representations of the subjective choiceworthiness values and k-metachoiceworthiness distributions associated with a given choice problem to be rescaled only in conjunction, not independently.
Furthermore, if this is the right way to think about k-metachoiceworthiness, then Square Root Theory (SR) is, as stated, incoherent. SR does not specify which real-valued representations of objective choiceworthiness to use as inputs, so its claims should be independent of rescaling. But they are not. Using our agent’s 0-metachoiceworthiness distribution as represented above, EC claims that the 1-metachoiceworthiness of \(a_1\) is \(\frac{20}{23}\), and SR claims that the 1-metachoiceworthiness of \(a_1\) is lower (just \(\frac{10}{23}\)). But if we had represented her 0-metachoiceworthiness distribution differently,
we would conclude that EC claims that the 1-metachoiceworthiness of \(a_1\) is \(\frac{5}{23}\) for her, and that SR claims the same.
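The scale dependence of Square Root Theory can be checked directly from the choice problem as described in the text. The sketch below is illustrative: the function names are hypothetical, and the alternative representation (every objective choiceworthiness value divided by 4) is an assumption, chosen because it reproduces the \(\frac{5}{23}\) figures; other representations would make the same point.

```python
import math
from fractions import Fraction

# Probabilities of states s_1 and s_2, as given in the example.
PROBS = (Fraction(18, 23), Fraction(5, 23))


def expected_choiceworthiness(values_by_act, act):
    """EC: probability-weighted mean of the act's objective choiceworthiness."""
    return sum(p * v for p, v in zip(PROBS, values_by_act[act]))


def square_root_theory(values_by_act, act):
    """SR: expectation of sqrt(act's value minus the worst act's value)."""
    total = 0.0
    for state, p in enumerate(PROBS):
        worst = min(values[state] for values in values_by_act.values())
        total += float(p) * math.sqrt(values_by_act[act][state] - worst)
    return total


# Objective choiceworthiness in (s_1, s_2), as stated in the example.
original = {"a1": (0, 4), "a2": (1, 0)}
# Assumed alternative representation: the same values divided by 4.
rescaled = {act: tuple(Fraction(v, 4) for v in values)
            for act, values in original.items()}

if __name__ == "__main__":
    for label, table in (("original", original), ("rescaled", rescaled)):
        ec = expected_choiceworthiness(table, "a1")
        sr = square_root_theory(table, "a1")
        print(label, "EC:", ec, "SR:", round(sr * 23, 6), "/ 23")
```

Dividing the inputs by 4 divides the EC claim by 4 but the SR claim only by 2, so the two theories' claims about \(a_1\) differ under one representation and coincide under the other. A theory whose claims track affine rescalings of its inputs would not behave this way.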
To ensure that an act’s true k-metachoiceworthiness for an agent be independent of the scale she arbitrarily uses to represent her (\(k-1\))-metachoiceworthiness distribution, all our k-metatheories have to be “affine” (unique up to affine transformation). Though this condition closes the door to “Square Root Theory”, it permits a wide array of other risk-averse theories, including Buchak’s REU Theory and the risk-averse theory presented in Proposition 4.1.
5 Applications to moral uncertainty
If I assign positive probability only to a finite set of complete, cardinal, comparable moral theories—or, if I at least know the right way to represent all my moral theories’ choiceworthiness claims on the same cardinal scale—then the results above can be applied almost directly to my moral choice problems under empirical certainty.
“Almost”, because, to avoid wading into a sea of hopeless complexity, we must assume that my moral theories make no claims about how to respond to uncertainty per se. That is, we must say that, for example, among varieties of utilitarianism, I assign positive probability only to those that claim that an act’s objective choiceworthiness is (something along the lines of) its objective impact on total utility. I must assign no positive probability to a variety of utilitarianism that claims that an act’s objective choiceworthiness is, say, my expectation of its impact on total utility. Utilitarianism is so often described as the idea that we ought to maximize the world’s expected utility—see Parfit (1984, pp. 25, 26), for instance—that one might easily come to believe that Expected Choiceworthiness is the only way utilitarians are allowed to deal with uncertainty. In this context, however, we should be careful to separate the unique moral claim of utilitarianism (that value is identified with utility) from the independent decision-theoretic claim (that one ought to maximize expected value). Moral theories that explicitly incorporate such decision-theoretic claims may also be interesting to consider in light of the regress problem, but we will not discuss them here.
With that restriction, suppose I am certain about the state of the world. I then simply have to swap out our language about objective choiceworthiness being “my utility function, contingent on the true state of the world” for language about objective choiceworthiness being “moral value, contingent on the true moral theory”, and Sects. 2–4 apply to cases of moral uncertainty, under empirical certainty, in full.
If I face both empirical uncertainty and moral uncertainty, however, my situation is more complex. One approach would be for me to take “objective choiceworthiness” to be a function of both the true state and the true moral theory, to consider my probability distribution over {states} \(\times \) {moral theories}, and then to apply my hierarchy distribution. Another approach, however, would be for me first to work out the subjective choiceworthiness of each act, conditional on each state, in light of my distribution over moral theories, and then to apply my hierarchy distribution a second time, to work out the subjective choiceworthiness of each act in light of my distribution over states. A third approach, symmetrical to the second, would be for me first to work out the subjective choiceworthiness of each act, conditional on each moral theory, in light of my distribution over states, and then to apply my hierarchy distribution a second time, to work out the subjective choiceworthiness of each act in light of my distribution over moral theories. (These approaches have the disadvantage that they would not be able to account for any dependence between my distribution over states and my distribution over moral theories. They have the advantage, however, that they would be able to account for the possibility that my hierarchy distribution over ways of dealing with moral uncertainty differs from my hierarchy distribution over ways of dealing with empirical uncertainty.) And other conceivable approaches abound.
Unfortunately, these approaches will not necessarily all yield the same subjective choiceworthiness values, or even the same act recommendations—not even under decision-theoretic certainty, and not even when I believe that the same theory should be used in the face of empirical uncertainty as in the face of moral uncertainty. Consider the following situation. I assign positive probability to a set of moral theories \(M = \{m_{1}, m_{2}, m_{3}\}\) and to a set of states \(S = \{s_{1}, s_{2}, s_{3}\}\). I have two feasible acts, \(a_{1}\) and \(a_{2}\). Their objective choiceworthinesses, conditional on each state and moral theory, are as follows:
Furthermore, I am certain that an act’s k-metachoiceworthiness is its second-lowest-possible (\(k-1\))-metachoiceworthiness. If I apply this decision theory to my uncertainty over {states} \(\times \) {moral theories}, I get \(c(a_{1}) = 2\) and \(c(a_{2}) = 3\), so \(a_{2} \succ a_{1}\). However, if I apply this decision theory first over states (conditional on each moral theory) and then over moral theories—or, first over moral theories (conditional on each state) and then over states—then I get \(c(a_{1}) = 4\) and \(c(a_{2}) = 3\), so \(a_{1} \succ a_{2}\). These complications only worsen when \(|M| \ne |S|\), in which case even the “same” decision theory can aggregate across moral theories and across states arbitrarily differently.
There is another way in which decision-theoretic uncertainty can interact with moral uncertainty. It is often argued that morality requires us to make decisions as if from behind a “veil of ignorance” about our own identity among those affected by our actions. If so, the moral choiceworthiness of an act depends directly on its decision-theoretic 1-metachoiceworthiness. Suppose that I ought to act toward a group as if my identity is, in probability, distributed uniformly over the group. Then, if Expected Choiceworthiness is the correct decision theory, the Veil of Ignorance argument points toward classical utilitarianism as the correct moral theory; if Minimax is the correct decision theory, toward Rawls’s “maximin criterion”; if some risk-weighted theory is the correct decision theory, toward the corresponding version of prioritarianism; and so on.^{Footnote 7} But we will not explore this interaction further here.
Finally, the above thoughts about how to integrate the results of Sects. 2–4 into situations of moral uncertainty can apply straightforwardly to normative uncertainty in other domains, so long as one assigns positive probability only to a finite set of theories which are in some analogous sense complete, cardinal, and intertheoretically comparable. But we will not explore such applications further here.
6 Conclusion
We are often uncertain about the moral and decision-theoretic norms which we believe should guide our behavior. Even when these norms conflict, however, we often have a subjective understanding of whether some act would be rationally or morally permissible for us, from our position of normative uncertainty. “Uncertaintism” might be understood as the project of unraveling how this uncertainty translates into the subjective choiceworthiness on which we ultimately feel justified in acting.
When the uncertaintist tries to specify any particular mechanism for translating the uncertainty over choiceworthiness into an appropriate characterization of subjective choiceworthiness, however, we find that, just as we are not certain of our acts’ objective choiceworthinesses, we are not certain of his proposed mechanism either. Nor are we certain about how to deal with our uncertainty about such a mechanism. Indeed, our certainty about subjective choiceworthiness seems to stand strangely on its own. In general, when we try to ground our certainty about subjective choiceworthiness in metanormative certainty at some order, we find that the hoped-for ground of certainty does not exist. For some, as cited above, this “possibility of normative uncertainty all the way up makes the uncertaintist project look pointless”.
The results presented here demonstrate that, as stated, the quoted worry is not justified. We can reliably have well-defined subjective choiceworthiness without being certain about the correct first-order normative theory or about any higher-order metatheory. We only have to commit to a weaker family of assumptions, such as the Dominance Principle. This observation should lend the “uncertaintist project” at least some hope.
But commitment to these weaker assumptions may still be a strong requirement. Certainty about them may never actually obtain, or may obtain only rarely. Ultimately, therefore, it is up to the reader to judge whether this theorizing sheds any light on more realistic cases of normative uncertainty.
In any event, this preliminary investigation has uncovered one class of “fixed-point” solutions to the regress problem. Even if doubts can be cast on the constraints here imposed in the process, I hope these results have encouraged the reader that solutions along similar lines might more generally be found.
Notes
In the process, he distinguishes between the “ideal regress problem”, which an ideal agent with perfect reasoning ability might face, and the “non-ideal regress problem”. In order to separate the issue of normative uncertainty from the issue of bounded rationality, here we will consider only what he calls the “ideal regress problem”. Note that we are implicitly assuming that normative facts are not logical facts; if they are, then it is impossible for an agent with perfect reasoning ability to suffer normative uncertainty.
Here and elsewhere, we will assume that all credences satisfy the Kolmogorov probability axioms. Note that this implies that all the sets over which I have probability distributions are nonempty.
We will assume that the probability of each state is independent of the chosen act. We will thus bypass the question of how to act in the face of such dependency (i.e. causal decision theory vs. evidential decision theory and other alternatives), and focus entirely on the question of how to act in the face of uncertainty over states (i.e. expected utility theory vs. its alternatives). For an analysis of how to approach uncertainty between causal and evidential decision theory, see MacAskill (2016b).
More precisely, let e specify my probability distribution over the set of [{states of the world} \(\times \) {decision theories (or, 1-metatheories)} \(\times \) {2-metatheories} \(\times \) {3-metatheories} \(\times \cdots \)]. The concept of a “k-metatheory” is defined in Sect. 2.4.
Let us distinguish EC from “Maximize Expected Choiceworthiness” (MEC). MEC is the weaker theory characterized only by the fact that, if I am certain that it is correct, then the acts with the highest subjective choiceworthiness for me are the acts with the highest expected objective choiceworthiness under d.
This is not to say that a given distribution can only be represented by one particular vector pair. Multiple k-metatheories may make the same k-metachoiceworthiness claims in some situation, for instance.
The implications of risk-weighted expected utility theory for decisions made on behalf of groups, rather than individuals, are further discussed by Buchak (2013, pp. 167, 168).
References
Allais, M. (1953). Le comportement de l’homme rationnel devant le risque: critique des postulats et axiomes de l’ecole Americaine. Econometrica, 21(4), 503–546.
Barry, C., & Tomlin, P. (2016). Moral uncertainty and permissibility: Evaluating option sets. Canadian Journal of Philosophy, 46(6), 898–923.
Bergemann, D., Morris, S., & Takahashi, S. (2017). Interdependent preferences and strategic distinguishability. Journal of Economic Theory, 168, 329–371.
Bostrom, N. (2009). Pascal’s mugging. Analysis, 69(3), 443–445.
Buchak, L. (2013). Risk and rationality. Oxford: Oxford University Press.
Lin, H. (2014). On the regress problem of deciding how to decide. Synthese, 191(4), 661–670.
Lipman, B. (1991). How to decide how to decide how to...: Modeling limited rationality. Econometrica, 59(4), 1105–1125.
Lockhart, T. (2000). Moral uncertainty and its consequences. Oxford: Oxford University Press.
MacAskill, W. (2013). The infectiousness of Nihilism. Ethics, 123(3), 508–520.
MacAskill, W. (2014). “Normative Uncertainty”. D.Phil. thesis, University of Oxford.
MacAskill, W. (2016a). Normative uncertainty as a voting problem. Mind, 125(500), 967–1004.
MacAskill, W. (2016b). Smokers, psychos, and decision-theoretic uncertainty. The Journal of Philosophy, 113(9), 425–445.
Mertens, J., & Zamir, S. (1985). Formulation of Bayesian analysis for games with incomplete information. International Journal of Game Theory, 14(1), 1–29.
Parfit, D. (1984). Reasons and persons. Oxford: Oxford University Press.
Pittard, J., & Worsnip, A. (2017). Metanormative contextualism and normative uncertainty. Mind, 126(1), 155–193.
Rawls, J. (1971). A theory of justice. Cambridge, MA: Harvard University Press.
Savage, L. (1954). The foundations of statistics. New York: Wiley.
Sepielli, A. (2009). What to do when you don’t know what to do. In R. Shafer-Landau (Ed.), Oxford studies in metaethics (Vol. X). Oxford: Oxford University Press.
Sepielli, A. (2014). What to do when you don’t know what to do when you don’t know what to do. Noûs, 48(3), 521–544.
Sepielli, A. (2017). How moral uncertaintism can be both true & interesting. In M. Timmons (Ed.), Oxford studies in normative ethics (Vol. VII). Oxford: Oxford University Press.
Smith, H. (2010). Subjective rightness. Social Philosophy and Policy, 27(2), 64–110.
Tarsney, C. (2017). Rationality and moral risk: A moderate defense of hedging. Ph.D. thesis, University of Maryland, College Park.
von Neumann, J., & Morgenstern, O. (1953). Theory of games and economic behavior. Princeton: Princeton University Press.
Weatherson, B. (2014). Running risks morally. Philosophical Studies, 167(1), 141–163.
My thanks to Anubav Vasudevan, Christian Tarsney, and two anonymous referees for helpful comments and suggestions. All errors are my own.
Appendix
Definition A.1
(Definition 2.3, formalized). A k-metatheory \(t_{k}\) is continuous if \(\forall \delta > 0\) \(\exists \varepsilon > 0 : \Vert \vec {x}\Vert < \varepsilon \Longrightarrow \Vert t_{k}(\vec {c_{k-1}}+\vec {x},\vec {p_{k-1}}) - t_{k}(\vec {c_{k-1}},\vec {p_{k-1}})\Vert < \delta \) \((\delta \in {\mathbb {R}}, \varepsilon \in {\mathbb {R}}, \vec {x} \in {\mathbb {R}}^{d_{t_{k-1}}|A|})\).
Proof of Theorem 1
Let \(d_{t}\) represent my probability distribution over decision theories. By the Analog Principle, \(d_{t}\) also represents my probability distribution over k-metatheories, for any k. My probability distribution over the available acts’ k-metachoiceworthinesses can then be represented by the pair \(\langle \vec {c_{k}}, \vec {p_{0}} \rangle \), \(\vec {c_{k}} \in {\mathbb {R}}^{d_{t}|A|}\), \(\vec {p_{0}} \in \Delta ^{d_{t}-1}\), for all \(k \ge 1\). Note that \(\vec {p_{0}}\) does not depend on k. We can thus let \(f : {\mathbb {R}}^{d_{t}|A|} \rightarrow {\mathbb {R}}^{d_{t}|A|}\) represent the function, fully specified by my probability distribution over decision theories, from the ordered set of k-metachoiceworthiness claims about A made by my \(d_{t}\) k-metatheories to the ordered set of (k+1)-metachoiceworthiness claims about A made by my \(d_{t}\) (k+1)-metatheories.
Let us think of the output of f as an \({\mathbb {R}}^{|A|}\)-valued vector of length \(d_{t}\), with one point in \({\mathbb {R}}^{|A|}\) given by each decision theory to which I assign positive probability. Since all the decision theories to which I assign positive probability are continuous in \({\mathbb {R}}^{d_{t}|A|}\), and since vector-valued functions are continuous if their components are continuous, f is continuous.
Let us now return to thinking of the output of f after k iterations as a vector \(\vec {c_{k}}\) (an \({\mathbb {R}}^{d_{t}}\)-valued vector of length \(|A|\), representing the k-metachoiceworthiness assigned to each act by each theory). By the Dominance Principle, for each act \(a_{i}\) there either exists an order \(k : \min ([\vec {c_{k}}]_{i}) = \max ([\vec {c_{k}}]_{i})\), or else \(\min ([\vec {c_{k+1}}]_{i}) > \min ([\vec {c_{k}}]_{i})\) and \(\max ([\vec {c_{k+1}}]_{i}) < \max ([\vec {c_{k}}]_{i})\) for all k. In the former case, the subjective choiceworthiness of \(a_{i}\) for me is of course well-defined.
In the latter case, the sequence \(\{\min ([\vec {c_{k}}]_{i})\}\) is monotonically increasing, and the sequence \(\{\max ([\vec {c_{k}}]_{i})\}\) is monotonically decreasing, in k. Since \(\{\min ([\vec {c_{k}}]_{i})\}\) is bounded above (for example, by \(\max ([\vec {c_{0}}]_{i})\)) and, symmetrically, \(\{\max ([\vec {c_{k}}]_{i})\}\) is bounded below (for example, by \(\min ([\vec {c_{0}}]_{i})\)), each sequence has a limit, by the Monotone Convergence Theorem. It now follows from the continuity of f that \(\lim _{k \rightarrow \infty } \min ([\vec {c_{k}}]_{i}) = \lim _{k \rightarrow \infty } \max ([\vec {c_{k}}]_{i})\).
To see this, suppose for contradiction that the two limits differ. Let \(\{\vec {c_{j}}\}\) be a convergent subsequence of \(\{\vec {c_{k}}\}\) (as must exist, by the boundedness of \(\{\vec {c_{k}}\}\)), and let \(\vec {c} = \lim _{j \rightarrow \infty } \vec {c_{j}}\), with \(\min ([\vec {c}]_{i}) < \max ([\vec {c}]_{i})\). Setting \(2\delta = \min (f(\vec {c})_{i}) - \min ([\vec {c}]_{i})\), which is positive by the Dominance Principle, we know that \(\Vert f(\vec {c}) - \vec {c}\Vert \ge 2\delta \). (Let t be one of the theories assigning the minimum value to \(a_{i}\) under \(\vec {c}\). Since t must assign at least the minimum value to \(a_{i}\) under \(f(\vec {c})\), \(2\delta \) can, by the Triangle Inequality, serve as a lower bound for the difference between \(f(\vec {c})\) and \(\vec {c}\).) And since \(\{\vec {c_{j}}\} \rightarrow \vec {c}\), we can for any \(\varepsilon \) choose j large enough that \(\Vert \vec {c_{j}} - \vec {c}\Vert < \varepsilon \). We now have a point \(\vec {c}\) and a distance \(\delta \) such that, for any sufficiently small \(\varepsilon \) (namely \(\varepsilon \le \delta \)), there is a \(j^{*}\) with \(\Vert \vec {c_{j}} - \vec {c}\Vert < \varepsilon \), but \(\Vert f(\vec {c_{j}}) - f(\vec {c})\Vert \ge \delta \), for all \(j \ge j^{*}\). (This follows from the Reverse Triangle Inequality: \(\Vert f(\vec {c_{j}}) - f(\vec {c})\Vert \ge \Vert f(\vec {c}) - \vec {c}\Vert - \Vert f(\vec {c_{j}}) - \vec {c}\Vert = \Vert f(\vec {c}) - \vec {c}\Vert - \Vert \vec {c_{j+1}} - \vec {c}\Vert \ge 2\delta - \varepsilon \ge \delta \).) Since there is no \(\varepsilon \) small enough to ensure that \(\Vert \vec {x} - \vec {c}\Vert < \varepsilon \Longrightarrow \Vert f(\vec {x}) - f(\vec {c})\Vert < \delta \) \((\vec {x} \in {\mathbb {R}}^{d_{t}|A|})\), f is not continuous in \({\mathbb {R}}^{d_{t}|A|}\), contradicting the continuity of f established above.
We have seen that for each act \(a_{i}\), \(\lim _{k \rightarrow \infty } \min ([\vec {c_{k}}]_{i}) = \lim _{k \rightarrow \infty } \max ([\vec {c_{k}}]_{i})\). It follows that \(|\cap _{k \in {\mathbb {N}}}[\min (d_{k}(\pi )_{i}),\max (d_{k}(\pi )_{i})]| = 1\) for each \(a_{i}\). In other words, the subjective choiceworthiness of each act in A is well-defined. \(\square \)
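The convergence argument above can be illustrated numerically. The following is only a minimal sketch under assumptions that are not part of the paper: the three toy metatheories (`expected`, `cautious`, `optimistic`), the starting values, and the credences are all hypothetical, and each theory is applied act by act. Each toy theory is continuous and, when every credence is positive, returns a value strictly between the lowest and highest values at the previous order, so the sketch is one instance of the Dominance and continuity conditions rather than a general implementation of f.

```python
from typing import Callable, List, Sequence

# A toy metatheory maps the lower-order values of one act (one value per
# theory) and my credences over theories to a single value for that act.
Theory = Callable[[Sequence[float], Sequence[float]], float]


def expected(values: Sequence[float], probs: Sequence[float]) -> float:
    """Credence-weighted average of the lower-order values."""
    return sum(v * p for v, p in zip(values, probs))


def cautious(values: Sequence[float], probs: Sequence[float]) -> float:
    """Hypothetical risk-averse theory: halfway between the minimum and the mean."""
    return 0.5 * min(values) + 0.5 * expected(values, probs)


def optimistic(values: Sequence[float], probs: Sequence[float]) -> float:
    """Hypothetical risk-seeking theory: halfway between the maximum and the mean."""
    return 0.5 * max(values) + 0.5 * expected(values, probs)


# Assumption: the same three theories recur at every order (Analog Principle).
THEORIES: List[Theory] = [expected, cautious, optimistic]


def step(c: List[List[float]], probs: Sequence[float]) -> List[List[float]]:
    """One application of f: c[t][i] is the value theory t assigns to act i."""
    n_acts = len(c[0])
    columns = [[row[i] for row in c] for i in range(n_acts)]
    return [[theory(col, probs) for col in columns] for theory in THEORIES]


def spreads(c: List[List[float]]) -> List[float]:
    """For each act, the gap between the highest and lowest assigned value."""
    n_acts = len(c[0])
    return [max(row[i] for row in c) - min(row[i] for row in c)
            for i in range(n_acts)]


def iterate(c: List[List[float]], probs: Sequence[float], orders: int) -> List[List[float]]:
    """Apply f repeatedly, climbing the given number of orders."""
    for _ in range(orders):
        c = step(c, probs)
    return c


if __name__ == "__main__":
    c0 = [[0.0, 4.0], [2.0, 1.0], [10.0, 1.0]]  # 3 theories x 2 acts
    probs = [0.5, 0.3, 0.2]                     # positive credences
    for k in (0, 1, 5, 10):
        print(k, spreads(iterate(c0, probs, k)))
```

With these particular toy theories the gap between the minimum and maximum for each act halves at every order, so the nested intervals shrink to a single point, as in the proof. Other continuous theories satisfying the Dominance Principle may converge at different rates.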
Proof of Theorem 2
Choose an act \(a_{i}\). By the Dominance Principle, \(\{\min ([\vec {c_{\kappa }}]_{i})\}\) and \(\{\max ([\vec {c_{\kappa }}]_{i})\}\) must be monotonically increasing and monotonically decreasing transfinite sequences, respectively, indexed by \(\kappa \). By the Monotone Convergence Theorem, these sequences have limits; let \(\{\min ([\vec {c_{\kappa }}]_{i})\} \rightarrow x\) and \(\{\max ([\vec {c_{\kappa }}]_{i})\} \rightarrow y\). Consider the set \(I_{i} = \cap _{\kappa } [\min ([\vec {c_{\kappa }}]_{i}), \max ([\vec {c_{\kappa }}]_{i})]\). By (the transfinite case of) the Nested Interval Theorem, \(I_{i}\) cannot be empty. \(I_{i}\) can only be a point (if \(x = y\)), in which case the subjective choiceworthiness of \(a_{i}\) is well-defined, or a positive-length interval (if \(x < y\)), in which case the subjective choiceworthiness of \(a_{i}\) is not well-defined. Suppose, therefore, for contradiction that \(x < y\).
Choose \(\varepsilon > 0\). Define the interval \(G = [x - \varepsilon , x)\), and divide it into the countable partition given by \(G_{j} = [x-\frac{\varepsilon }{2^{j}},x-\frac{\varepsilon }{2^{j+1}}), j \ge 0\). For each \(G_{j}\), choose an ordinal \(\gamma : \min ([\vec {c_{\gamma }}]_{i}) \in G_{j}\), if such \(\gamma \) exists; skip \(G_{j}\) if no such \(\gamma \) exists. (Such \(\gamma \) must exist for infinitely many \(G_{j}\); if \(\gamma : \min ([\vec {c_{\gamma }}]_{i}) \in G_{j}\) existed for only finitely many \(G_{j}\), \(\{\min ([\vec {c_{\kappa }}]_{i})\}\) could not converge to x.) We have thus constructed a countable sequence \(\Gamma = \{\gamma _{j}\}\) of ordinals such that \(\{\min ([\vec {c_{\gamma _{j}}}]_{i})\} \rightarrow x\).
Choose \(\gamma ^{*} : \gamma ^{*} > \gamma \) \(\forall \gamma \in \Gamma \). (This is possible because, for every set M of ordinal numbers, there is an ordinal number \(\sigma : \sigma > \mu \) \(\forall \mu \in M\).) Since \(\sup _{\gamma ^{\prime }< \gamma ^{*}} \min ([\vec {c_{\gamma ^{\prime }}}]_{i}) = x< y \le \inf _{\gamma ^{\prime } < \gamma ^{*}} \max ([\vec {c_{\gamma ^{\prime }}}]_{i})\), \(t_{\gamma ^{*}}(\pi )_{i} > x\) for all the \(\gamma ^{*}\)-metatheories \(t_{\gamma ^{*}}\) to which I assign positive probability. So \(\min ([\vec {c_{\gamma ^{*}}}]_{i}) > x\), a contradiction. \(\square \)
Rights and permissions
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Trammell, P. Fixed-point solutions to the regress problem in normative uncertainty. Synthese 198, 1177–1199 (2021). https://doi.org/10.1007/s11229-019-02098-9
DOI: https://doi.org/10.1007/s11229-019-02098-9
Keywords
 Decision theory
 Moral uncertainty
 Normative uncertainty
 Uncertaintism