Decision theory and cognitive choice
DOI: 10.1007/s13194-010-0005-3
- Cite this article as:
- Welch, J.R. Euro Jnl Phil Sci (2011) 1: 147. doi:10.1007/s13194-010-0005-3
Abstract
The focus of this study is cognitive choice: the selection of one cognitive option (a hypothesis, a theory, or an axiom, for instance) rather than another. The study proposes that cognitive choice should be based on the plausibilities of states posited by rival cognitive options and the utilities of these options' information outcomes. The proposal introduces a form of decision theory that is novel because comparative; it permits many choices among cognitive options to be based on merely comparative plausibilities and utilities. This form of decision theory intersects with recommendations by advocates of decision theory for cognitive choice, on the one hand, and defenders of comparative evaluation of scientific hypotheses and theories, on the other. But it differs from prior decision-theoretic proposals because it requires no more than minimal precision in specifying plausibilities and utilities. And it differs from comparative proposals because none has shown how comparative evaluations can be carried out within a decision-theoretic framework.
Keywords
Decision theory, Cognitive choice, Probability, Plausibility, Utility, Information
1 Introduction
Decision theory tends to work best for trivial decisions, such as when you are in a casino, trying to decide whether to play craps or roulette. For life’s biggest decisions, such as whether to get married or have children, it is pretty much useless. (Irvine 2006, 112)
Considerations like these have spurred repeated bids for a more realistic theory of decision. Some have tried to retrench by dropping back from point-valued to interval-valued functions (Kyburg 1979; Kaplan 1996).^{2} Others recommend limiting a set of epistemically possible probability distributions to a realistic subset whose members are epistemically reliable (Gärdenfors and Sahlin 1982). Still others attempt to rescale decision-theoretic recommendations for ideal agents down to our actual, nonideal case (Weirich 2004; Pollock 2006).
This article also angles for a more realistic decision theory. It does so as a cousin to the interval-based approach of the previous paragraph, but it carries that approach’s retrenchment strategy to its outer limit. Just as point values require more precision than interval values, interval values require more precision than comparative values. What might we achieve by applying decision theory with a bare minimum of comparative values for probabilities and utilities?
The answer, I suggest, is ‘more than you might expect’. To flesh out the details of this answer is the purpose of this study. It introduces a comparative version of decision theory that is really just a refinement of the reasoning that scientists and non-scientists alike manage in the course of ordinary decision making. To avoid raising unrealistic expectations, I ask the reader to keep in mind that the comparative decision theory outlined below is not completely general; like standard forms of decision theory, it does not always render a verdict. But it does offer reasonable verdicts in an overwhelming majority of cases. To establish this thesis in its full generality is unfortunately beyond the scope of this article. Here the thesis will be deployed with reference to just one kind of choice: cognitive decisions. A follow-up article will focus on a specific form of cognitive decision: scientific theory choice. A planned further development is the application of comparative decision theory to choices that are partly or wholly noncognitive.
The approach to cognitive choice to be proposed below intersects with recommendations by two groups of thinkers. One is composed of advocates of decision theory for cognitive choice, including Hempel (1960), Levi (1967, 1984), Hintikka (1970a), Kaplan (1981, 1996), Maher (1993), Festa (1999), and others. The second group consists of those who have emphasized comparative evaluation of scientific hypotheses and theories, such as Kuhn (1970, 77, 147), Laudan (1984, 29), Salmon (1990, 329, 331), and Sober (1999, 58).^{3} But the comparative decision theory outlined below diverges fundamentally from both lines of thought. It differs from prior decision-theoretic proposals because it requires no more than minimal precision in specifying probabilities and utilities; as a result, it can be applied where standard, more demanding forms of decision theory cannot. And it differs from the comparative proposals because none, so far as I know, has shown how comparative evaluations can be carried out within a decision-theoretic framework. These differences and others will emerge below.
The article’s proposal is deployed in three stages. The basic decision-theoretic concepts required for decisions under risk are recapped and then adjusted for the special needs of cognitive choice in Section 2. The concept of relative disutility is introduced and generalized in Section 3, which establishes an isomorphism between utility and information. Finally, the results of the two preceding sections are used to develop comparative decision theory in Section 4.
2 Decisions under risk
People face decision problems. These problems are customarily divided into two classes: decisions under ignorance, which do not treat probabilities, and decisions under risk, which do. Since cognitive choices can be looked at as decisions under risk, decisions under ignorance will be ignored in this study. The basic concepts required for decisions under risk are reviewed in Section 2.1 and adapted to the context of cognitive choice in Section 2.2.
2.1 The basics
Decisions under risk can be conceptualized by drawing on seven basic concepts: acts, states, outcomes, probability, utility, order, and decision rule. Though the decision-theoretic uses of these concepts will be all too familiar to most readers, I comment briefly on each in order to motivate adjustments to be proposed in the following section.
Acts, states, and outcomes can be thought of as propositions (Jeffrey 1983, 82–85). To choose an act is to make a proposition true. A state can be viewed as a proposition about features that can be attributed to the world. An outcome can be taken as a news item that results from pairing acts and states. The number of acts, states, and outcomes under consideration by an agent at a given moment is here assumed to be finite (cf. Gärdenfors and Sahlin 1982, 364; Weirich 2004, 24, 28, 142).
- Pr1. If q is contradictory, μ(q) = 0.
- Pr2. If q is tautologous, μ(q) = 1.
- Pr3. If q and r are mutually exclusive, \( \mu(q \vee r) = \mu(q) + \mu(r) \).
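As a minimal sketch, the three requirements can be checked on a finite set of possible worlds, with propositions modeled as the sets of worlds in which they hold. The fair-die model, the `WEIGHTS` table, and the function name `mu` are illustrative assumptions and not part of the formal apparatus.

```python
from fractions import Fraction

# Hypothetical finite model: six equally weighted worlds (faces of a fair die).
WORLDS = frozenset(range(1, 7))
WEIGHTS = {w: Fraction(1, 6) for w in WORLDS}

def mu(proposition):
    """Probability of a proposition, modeled as the set of worlds where it holds."""
    return sum((WEIGHTS[w] for w in proposition), Fraction(0))

contradiction = frozenset()   # true in no world
tautology = WORLDS            # true in every world
q = frozenset({1, 2})         # "the die shows 1 or 2"
r = frozenset({5})            # "the die shows 5"

pr1 = mu(contradiction) == 0            # Pr1
pr2 = mu(tautology) == 1                # Pr2
pr3 = mu(q | r) == mu(q) + mu(r)        # Pr3: q and r share no world
```

Exact fractions are used so that the additivity check is not disturbed by floating-point rounding.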
Judgments about probabilities are expressed with varying degrees of sharpness. A probability may be characterized numerically as a point like 1/6 or as an interval such as 1/4–1/3. Yet Keynes claimed that “not all probabilities are numerical” ([1921] 2004, 65). This claim was criticized in its day by Jeffreys (1931, 223), who wanted to keep the link between probability and numbers as tight as possible. But even if all probabilities are ultimately numerical, epistemic limitations sometimes force us to talk about them with less than numerical precision. Probabilities can be described nonnumerically, either in qualitative terms like ‘beyond a reasonable doubt’ or in comparative terms such as ‘greater than’, ‘equal to’, or ‘less than’. The probabilistic uses of both qualitative and comparative terms are amply documented in discussions of law, scientific theory, questions of conscience, documental authenticity, insurance rates, and much else (Franklin 2001).
A utility function represents the agent’s preferences for possible outcomes due to choice. Typically, a utility function υ maps propositions about possible outcomes onto the real numbers. Utilities have been variously axiomatized, but a particularly transparent version due to Jensen requires completeness (full comparability and transitivity), continuity, and independence (1967, 171–173).
Classic formulations of decision theory assume that beliefs, desires, and preferences for acts have representations that are totally ordered. Because a probability measure represents an agent’s beliefs by mapping them onto the unit interval, any belief in the probability measure’s domain must be less probable, as probable, or more probable than any other. Analogously, whenever a utility function maps the agent’s desires onto the reals, any desire in the function’s domain must be less than, equal to, or greater than any other (e.g., von Neumann and Morgenstern [1944] 1953, 26). In much the same way, an agent’s preference for a given act is assumed to be less than, equal to, or greater than that for any other act (e.g., Savage 1972, 18). The net result of these assumptions is to exclude the possibility of incomparable beliefs, desires, and preferences for acts.
2.2 Adjusting the basics for cognitive choice
- a) Acts: The acts at the heart of this study are cognitive acts such as choosing a hypothesis, theory, axiom, or world view. In order to refer generically to hypotheses, theories, axioms, world views, and the like, I will use the term ‘cognitive options’ or, where context permits, ‘options’. To choose a cognitive option, in the sense employed here, is to use an option for a cognitive purpose such as explaining, predicting, or simply understanding some range of phenomena.
- b) States: The states of the world that are relevant to cognitive choice are those posited by the options under consideration, whatever they happen to be. As a practical matter, however, not all states posited by these options actually form part of the decision-theoretic matrix. Rather, the crucial states are chosen with an eye to contrast, for the states that matter in cognitive choice are those where the options under consideration diverge. The propositions associated with these states may be contradictories or contraries.
- c) Outcomes: The outcomes of decisions can be quite varied, of course. Some outcomes are cognitive, such as information gained or information lost. But economic, legal, political, military, social, aesthetic, and other sorts of outcomes are noncognitive. Where the relevant outcomes are purely cognitive, the decision can be classified as cognitive; where the situation is mixed, with relevant outcomes that are both cognitive and noncognitive, the decision is partly cognitive; and where the relevant outcomes are purely noncognitive, the decision is noncognitive. As noted, my concern in these pages is cognitive decisions.
If information lost and information gained are outcomes of cognitive decisions, the question ‘What is information?’ is decision-theoretically relevant. Some explications are statistical (Shannon 1948); others are semantic (Hintikka and Pietarinen 1966); still others are pragmatic (Levi 1984). For the limited purposes of this article, I will operate with a notion of information as reduction of uncertainty. This notion is sufficiently generic to be admitted by statistical (Floridi 2004, 198), semantic (Hintikka 1970b, 264), and pragmatic (Levi 1984, 65) points of view. Uncertainty, I will assume, is uncertainty about attributable states of the world. Take, for example, the proposition that impact from an asteroid or comet had nothing to do with the extinction of the dinosaurs—the most common view among paleontologists as late as 1988 (Bryson 2003, 195, 199). Luis and Walter Alvarez’s hypothesis that just such an impact was responsible for the dinosaurs’ extinction, once the impact of Comet Shoemaker-Levy 9 on Jupiter was observed and the Chicxulub impact crater identified, eliminated the no-impact possibilities for most paleontologists. The Alvarez hypothesis is therefore informative.
- d) Plausibility: In the context of cognitive choice, reliance on the concept of probability is highly problematic due to the numeric exigencies of decision theory, already noted in Section 1. Leonard J. Savage frankly admitted the difficulty:
The postulates of personal probability [used in Savage’s decision theory] imply that I can determine, to any degree of accuracy whatsoever, the probability (for me) that the next president will be a Democrat. Now, it is manifest that I cannot really determine that number with great accuracy, but only roughly…. [A]s is widely recognized, all the interesting and useful theories of modern science, for example, geometry, relativity, quantum mechanics, Mendelism, and the theory of perfect competition, are inexact…. (1972, 59)
As a way of dealing with the paucity of reliable numeric probabilities, I propose that we replace probability with plausibility in the context of cognitive choice. Whereas a probability measure may map propositions about attributable states to numbers in the unit interval, a plausibility measure can map propositions about attributable states to members of any partially ordered set (Friedman and Halpern 1995). In this highly general conception, a plausibility function returns values that are bounded by nonnumeric limits ⊤ and ⊥, where ⊤ represents the maximum plausibility and ⊥ the minimum plausibility (Friedman and Halpern 1995). For any plausibility value x, therefore, ⊥ ≤ x ≤ ⊤. Though plausibility values are often limited to the special case of the unit interval (e.g., Klir 2006, 166), they need not be numeric at all. Propositions about states can be mapped to qualitative plausibilities like high, low, intermediate, unlikely, nearly certain, and the like as long as they are partially ordered.
For our purposes, then, a plausibility measure π can be taken to map attributable states to plausibility values. π satisfies the following requirements for propositions q and r (Chu and Halpern 2004, 209–210):
- Pl1. If q is contradictory, π(q) = ⊥.
- Pl2. If q is tautologous, π(q) = ⊤.
- Pl3. If q implies r, π(q) ≤ π(r).^{4}
Comparing Pl1–3 to the earlier Pr1–3 for probability shows that the plausibility measure π generalizes the probability measure μ; probability is therefore a special case of plausibility. In fact, plausibility so defined appears to be the most general of the current approaches to representing uncertainty. Other standard representations, including Dempster-Shafer belief functions, possibility measures, and ranking functions, all turn out to be special cases of plausibility (Halpern 2003, ch. 2).
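A small sketch may help to show how a plausibility measure can be nonnumeric and only partially ordered. Here plausibility values are sets of worlds the agent still treats as open, ordered by inclusion; this particular model, the set `LIVE`, and the names `pi`, `leq`, and `implies` are assumptions introduced for illustration only.

```python
# Hypothetical model: four possible worlds, one of which the agent has ruled out.
WORLDS = frozenset({"w1", "w2", "w3", "w4"})
LIVE = frozenset({"w1", "w2", "w3"})   # worlds still treated as open

BOTTOM = frozenset()   # minimum plausibility (⊥)
TOP = LIVE             # maximum plausibility (⊤)

def pi(proposition):
    """Plausibility value of a proposition (a set of worlds): its live worlds."""
    return proposition & LIVE

def leq(x, y):
    """Partial order on plausibility values: set inclusion."""
    return x <= y

def implies(q, r):
    """q implies r when every world where q holds is a world where r holds."""
    return q <= r

q = frozenset({"w1"})
r = frozenset({"w1", "w2"})
s = frozenset({"w3", "w4"})

pl1 = pi(frozenset()) == BOTTOM              # Pl1
pl2 = pi(WORLDS) == TOP                      # Pl2
pl3 = implies(q, r) and leq(pi(q), pi(r))    # Pl3 for this pair
# Unlike point probabilities, some values are ordered in neither direction.
incomparable = not leq(pi(r), pi(s)) and not leq(pi(s), pi(r))
```

In this toy model the values of `r` and `s` are incomparable, which no measure into the unit interval could represent.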
- e) Utility: The foregoing points about the paucity of reliable numeric probabilities also apply to utilities. Regardless of whether utilities are ultimately numeric or not, epistemic limitations sometimes force us to express them nonnumerically. As a matter of empirical fact, utilities are often expressed with varying degrees of quantitative sharpness. The utility of an outcome may be expressed numerically, either as a number or as an interval, but it can also be expressed nonnumerically, in qualitative terms like ‘good’, ‘bad’, or ‘outstanding’ (Halpern 2003, 165) or comparative terms such as ‘greater than’, ‘equal to’, or ‘less than’.
In the general case, decision-theoretic utility represents the evaluation of an outcome due to choice. In the restricted context of this study, utility represents the evaluation of an outcome due to cognitive choice (cf. Hempel [1960] 1965, 75–76). As noted above, the outcomes of cognitive choice are information outcomes. Any option that is more informative than another should receive, to that extent, a higher utility. In the context of cognitive choice, utility should be proportional to information.
On one appealing view, information is a consequence of choosing cognitive options that are true, while misinformation results from choosing cognitive options that are false. This view, I suggest, needs refinement, for it would not discriminate between an option that is almost but not quite right and an option that is all wrong. We would be better off, therefore, to take outcomes of cognitive choice to be information in proportion to the chosen option’s degree of truthlikeness. Some options are maximally truthlike—those that state the full truth in their domains—others are not truthlike at all—those that capture no truth in their domains—while others fall between these two extremes. Given less-than-maximal degrees of truthlikeness, options that are false need not be equally false. A hypothesis that falsely gives the age of the universe as 10 billion years is more truthlike than a hypothesis that falsely gives it as 6,000 years. Choice of the first hypothesis would yield more information than choice of the second.
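The age-of-the-universe comparison can be made concrete with a deliberately crude sketch. The relative-error proxy for truthlikeness and the reference figure of roughly 13.8 billion years are assumptions added here for illustration; no particular measure of truthlikeness is being attributed to the text, and only the resulting ordering matters.

```python
TRUE_AGE_YEARS = 13.8e9  # approximate accepted age of the universe (added figure)

def truthlikeness(claimed_age_years):
    """Crude proxy (an assumption): 1 minus relative error, floored at 0."""
    relative_error = abs(claimed_age_years - TRUE_AGE_YEARS) / TRUE_AGE_YEARS
    return max(0.0, 1.0 - relative_error)

first = truthlikeness(10e9)     # hypothesis: 10 billion years
second = truthlikeness(6_000)   # hypothesis: 6,000 years

# Both hypotheses are false, yet they are not equally false.
more_informative = "first" if first > second else "second"
```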
- f) Order: The idea that preferences are totally ordered is doubtful in the extreme. Even von Neumann and Morgenstern were uneasy about it: “It is very dubious, whether the idealization of reality which treats this postulate as a valid one, is appropriate or even convenient” ([1944] 1953, 630). Others have gone further, declaring it descriptively and normatively mistaken:
Of all the axioms of utility theory, the completeness axiom [which posits a total order] is perhaps the most questionable. Like others of the axioms, it is inaccurate as a description of real life; but unlike them, we find it hard to accept even from the normative viewpoint. Does ‘rationality’ demand that an individual make definite preference comparisons between all possible lotteries (even on a limited set of basic alternatives)? For example, certain decisions that our individual is asked to make might involve highly hypothetical situations, which he will never face in real life; he might feel that he cannot reach an ‘honest’ decision in such cases. Other decision problems might be extremely complex, too complex for intuitive ‘insight’, and our individual might prefer to make no decision at all in these problems…. Is it ‘rational’ to force decisions in such cases? (Aumann 1962, 446; cf. Ok 2002; Ok et al. 2004)
I share these concerns; in fact, I would like to amplify them.
Suppose we are concerned with the probabilities of two attributable states: s_{1} and s_{2}. Although we can sometimes affirm that s_{1} is more probable, equally probable, or less probable than s_{2}, sometimes we cannot. Even if we assume that, in the final analysis, any state is probabilistically comparable to any other (cf. Jeffreys 1961, 16), prior to the final analysis we may not be able to perform the comparison. The probability that Mexico City’s population will exceed 22,000,000 by 2022 and the probability that the first card drawn from this old and probably incomplete deck will be a heart surely defy any attempt to order them in a reasonable way.^{5} For all practical purposes, then, probabilities are partially ordered (cf. Keynes [1921] 2004, 29–30, 34).
Yet the cognitive situation may be worse than mere absence of numeric probabilities. An agent may be unable to say whether the plausibility of Mexico City’s population reaching a certain mark is greater than, equal to, or less than the plausibility of drawing a heart from a possibly incomplete deck, for example. At the crucial point of decision, some states turn out to be plausibilistically incomparable. At such points, therefore, plausibilities are partially ordered.
Like the plausibility of a state, the utility of an outcome may be viewed as greater than, equal to, or less than another. Regrettably, such comparisons cannot always be made. In a well-known example, Dewey and Tufts describe the value conflict of a citizen who wants to be loyal to his country but opposes his country’s war (1932, 174–175).^{6} Even if these alternatives are comparable in some deep sense, epistemic limitations may prevent the citizen from carrying out the comparison. For all practical purposes, utilities are partially ordered.
To express the appropriate order relations, I will take the nonstrict comparative term ‘≼’ as primitive. The ≼ relation establishes a partial order; that is, it is reflexive, antisymmetric, and transitive.
When flanked by plausibility values, ‘≼’ can be read as ‘is less plausible than or equally plausible to’ or ‘is not more plausible than’. The following plausibility relations are immediately definable in terms of it, conjunction (‘∧’), and negation (‘–’):
Infraplausibility \( [\pi(s_1, e) < \pi(s_2, e)] =_{\rm{df}} [\pi(s_1, e) \preccurlyeq \pi(s_2, e)] \wedge -[\pi(s_2, e) \preccurlyeq \pi(s_1, e)] \)
Supraplausibility \( [\pi(s_1, e) > \pi(s_2, e)] =_{\rm{df}} -[\pi(s_1, e) \preccurlyeq \pi(s_2, e)] \wedge [\pi(s_2, e) \preccurlyeq \pi(s_1, e)] \)
Equiplausibility \( [\pi(s_1, e) = \pi(s_2, e)] =_{\rm{df}} [\pi(s_1, e) \preccurlyeq \pi(s_2, e)] \wedge [\pi(s_2, e) \preccurlyeq \pi(s_1, e)] \)
Incomparability \( [\pi(s_1, e) \mid \pi(s_2, e)] =_{\rm{df}} -[\pi(s_1, e) \preccurlyeq \pi(s_2, e)] \wedge -[\pi(s_2, e) \preccurlyeq \pi(s_1, e)] \).
This approach affords the philosophical advantage of neatly distinguishing equiplausibility from incomparability, which may not be possible when a strict relation like infraplausibility or supraplausibility is taken as primitive.
Like Savage, who used his primitive ‘≤’ to mean ‘is not preferred to’ for acts and ‘is not more probable than’ for events, I will use the primitive ‘≼’ in different settings and allow its associated values to determine its sense. In addition to the plausibilistic usage just described, ‘υ(o_{1}) ≼ υ(o_{2})’ can be read as ‘the utility of outcome o_{1} is no greater than the utility of outcome o_{2}’ and ‘PE(a_{1}) ≼ PE(a_{2})’ as ‘the plausibilistic expectation of act a_{1} is no greater than the plausibilistic expectation of act a_{2}’ (plausibilistic expectation is defined in the following subsection on decision rules). Taking these nonstrict relations as primitive, we can define relations of utility and plausibilistic expectation that are structurally parallel to infraplausibility, supraplausibility, equiplausibility, and incomparability.
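The way the four relations fall out of the single primitive can be sketched in a few lines. The sketch assumes only that some nonstrict order `leq` is supplied; the `Relation` names, the `classify` function, and the set-inclusion example are illustrative and do not appear in the text. The same function serves for plausibilities, utilities, or plausibilistic expectations, since only the supplied order changes.

```python
from enum import Enum

class Relation(Enum):
    INFRA = "less than"
    SUPRA = "greater than"
    EQUI = "equal to"
    INCOMPARABLE = "incomparable"

def classify(x, y, leq):
    """Derive the relation between x and y from the primitive nonstrict order."""
    forward = leq(x, y)    # x ≼ y
    backward = leq(y, x)   # y ≼ x
    if forward and not backward:
        return Relation.INFRA
    if backward and not forward:
        return Relation.SUPRA
    if forward and backward:
        return Relation.EQUI
    return Relation.INCOMPARABLE

# Illustrative partial order (an assumption): sets ordered by inclusion.
def subset_leq(x, y):
    return x <= y

a, b, c = frozenset({1}), frozenset({1, 2}), frozenset({3})
```

With these values, `classify(a, b, subset_leq)` yields `Relation.INFRA`, `classify(a, a, subset_leq)` yields `Relation.EQUI`, and `classify(a, c, subset_leq)` yields `Relation.INCOMPARABLE`, so equality and incomparability come apart exactly as the definitions require.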
- g) Decision rules: The preceding paragraphs in this section amount to a retrofitting of decision problems under risk for cognitive choice. But decision problems plead for decision rules. What decision rules would be appropriate for cognitive choice?
To suggest an answer to this question, we begin by noting a historical process succinctly summarized by Savage. In discussing Daniel Bernoulli’s advocacy of maximizing expected utility, he remarks:
Between the time of Ramsey and that of von Neumann and Morgenstern there was interest in breaking away from the idea of maximizing expected utility…. This trend was supported by those who said that Bernoulli gives no reason for supposing that preferences correspond to the expected value of some function, and that therefore much more general possibilities must be considered. Why should not the range, the variance, and the skewness, not to mention countless other features, of the distribution of some function join with the expected value in determining preference? The question was answered by the construction of Ramsey and again by that of von Neumann and Morgenstern…; it is simply a mathematical fact that, almost any theory of probability having been adopted and the sure-thing principle [Savage’s second postulate] having been suitably extended, the existence of a function whose expected value controls choices can be deduced (1972, 96–97).
The decision rule that corresponds to GEU is to maximize generalized expected utility. Chu and Halpern show that this rule is universal in the sense that it determines the same ordinal rankings as any decision rule that satisfies a trivial condition. The condition is that the rule weakly respect utility—roughly, that act preferences track outcome utilities for all constant acts (2004, 216, 219, 226–227). Constant acts have outcomes that are independent of states of the world (Savage 1972, 25).
- (a) positive values: up < uP < UP; up < Up < UP; uP | Up^{8}
- (b) negative values: –UP < –uP < –up; –UP < –Up < –up; –uP | –Up
- (c) mixed values: for all x, y ∈ T, (x < 0 ∧ y > 0) → (x < y).
- (a) \( (x < 0 \wedge y < 0) \to ((x \oplus y) = -). \)
- (b) \( (x < 0 \wedge y > 0 \wedge |x| < |y|) \to ((x \oplus y) = +). \)
- (c) \( (x < 0 \wedge y > 0 \wedge |x| = |y|) \to ((x \oplus y) = 0). \)
- (d) \( (x < 0 \wedge y > 0 \wedge |x| > |y|) \to ((x \oplus y) = -). \)
- (e) \( (x > 0 \wedge y > 0) \to ((x \oplus y) = +). \)
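The two lists above can be read as a small lookup procedure, sketched here under stated assumptions. The labels up, uP, Up, and UP are treated as opaque magnitudes, values are encoded as hypothetical `(sign, magnitude)` pairs, and the function names are mine for illustration. Where the listed rules are silent (for example, when the two magnitudes are incomparable), the sketch returns `None` rather than guessing.

```python
# Strict order on magnitudes, copied from the first list (with transitivity);
# uP and Up are incomparable, so neither ordered pair appears.
MAG_LESS = {("up", "uP"), ("up", "Up"), ("up", "UP"), ("uP", "UP"), ("Up", "UP")}

def mag_lt(m, n):
    return (m, n) in MAG_LESS

def lt(x, y):
    """Strict order on signed values, following the first list, items (a)-(c)."""
    (sx, mx), (sy, my) = x, y
    if sx == "+" and sy == "+":
        return mag_lt(mx, my)        # positive values
    if sx == "-" and sy == "-":
        return mag_lt(my, mx)        # negative values reverse the magnitudes
    return sx == "-" and sy == "+"   # any negative value is below any positive one

def oplus_sign(x, y):
    """Sign of x ⊕ y per the second list; None where the listed rules are silent."""
    (sx, mx), (sy, my) = x, y
    if sx == "-" and sy == "-":
        return "-"                   # rule (a)
    if sx == "+" and sy == "+":
        return "+"                   # rule (e)
    if sx == "-" and sy == "+":
        if mag_lt(mx, my):
            return "+"               # rule (b)
        if mx == my:
            return "0"               # rule (c)
        if mag_lt(my, mx):
            return "-"               # rule (d)
    return None                      # case not covered by the listed rules
```

For instance, `oplus_sign(("-", "up"), ("+", "UP"))` gives `"+"`, whereas `oplus_sign(("-", "uP"), ("+", "Up"))` gives `None` because uP and Up are incomparable.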
The reader will have noticed that the values in the expectation domain D are both coarse and sparse. They have been chosen to reflect the merely comparative discriminations that condition most real-life decision making. Judgments that the plausibility of one state is greater than that of another, for example, or that the utility of one outcome is equal to that of another, are not very precise. But they are precise enough to ground the comparative decision theory detailed in Section 4.
The decision rule that corresponds to PE would be to maximize plausibilistic expectation. This decision rule generalizes the rule based on E, the standard definition of expected utility, in two directions at once: from probability to plausibility and from total to partial order.
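For contrast, the standard rule based on E can be sketched with point-valued inputs: each act's expectation is a sum, over states, of probability times utility. The two-state problem, its numbers, and the function names below are hypothetical; the sketch shows the precision that E demands and that the comparative approach is meant to relax.

```python
def expected_utility(probabilities, utilities):
    """E for one act: sum over states of probability times outcome utility."""
    return sum(probabilities[s] * utilities[s] for s in probabilities)

def maximize_expected_utility(probabilities, acts):
    """Return the acts whose expected utility is highest (ties are all returned)."""
    scores = {a: expected_utility(probabilities, u) for a, u in acts.items()}
    best = max(scores.values())
    return [a for a, score in scores.items() if score == best]

# Hypothetical two-state, two-act problem with point-valued inputs.
probabilities = {"s1": 0.75, "s2": 0.25}
acts = {
    "a1": {"s1": 4, "s2": 0},   # utility of a1's outcome in each state
    "a2": {"s1": 2, "s2": 4},
}

chosen = maximize_expected_utility(probabilities, acts)
```

Because the scores are real numbers, every pair of acts is comparable here; that total order is exactly what the move to partial orders gives up.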
Given that we must operate sub specie temporis, what would be a rational approach to these incomparabilities? Since they cannot be wished away, I am afraid the options are stark: to abandon decision theory whenever incomparability rears its head, or to adapt the decision rule to the situation. I would not object to abandoning decision theory provided we have a viable alternative. Unfortunately, I do not know what that would be. I conclude, then, that our best hope is to adapt the decision rule to the situation.
The intuition underlying my proposed adaptation can be introduced as follows. Suppose that an opening in Buddhist philosophy is to be filled by one of two specialists: one has published exclusively on philosophy of logic in Hindi; the other, exclusively on ethics in Mandarin. The departmental chair, who must make the decision, is utterly unable to read either language and lacks contacts with the relevant expertise. From the chairperson’s point of view, the candidates’ publications are not comparable, but their teaching abilities are comparable and unequal. If publications and teaching are the only relevant criteria, the chair should hire the better teacher.
If we generalize this intuition, we end up with something like the following norm: Where just two criteria are relevant to a choice but one is somehow inapplicable, we must fall back on the other criterion. To apply this norm to cognitive choice, we would have to consider two cases. The first is where utilities are comparable while plausibilities are not; the other, where plausibilities are comparable but utilities are not. I will refer to the first case as ‘utility-comparable’ and to the second as ‘plausibility-comparable’. In both cases, the strategy is to ignore the incomparable values, since nothing useful can be extracted from them, and rely on the comparable values instead.
My proposal for decision rules thus amounts to this: for fully comparable situations, maximize plausibilistic expectation (PE); for utility-comparable situations, maximize utility-comparable expectation (UCE); and for plausibility-comparable situations, maximize plausibility-comparable expectation (PCE). Note that all three senses of expectation are special cases of Chu and Halpern’s GEU.
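The division of labor among the three rules can be sketched as a simple dispatch on what is comparable. The function name `select_rule` and its boolean inputs are illustrative assumptions; returning `None` when neither plausibilities nor utilities are comparable is also an assumption, reflecting the earlier remark that the theory does not always render a verdict.

```python
def select_rule(plausibilities_comparable, utilities_comparable):
    """Pick the decision rule that matches the comparable data at hand."""
    if plausibilities_comparable and utilities_comparable:
        return "PE"    # fully comparable: maximize plausibilistic expectation
    if utilities_comparable:
        return "UCE"   # utility-comparable: ignore the incomparable plausibilities
    if plausibilities_comparable:
        return "PCE"   # plausibility-comparable: ignore the incomparable utilities
    return None        # assumption: nothing comparable, so no rule applies

# The hiring case by analogy: one criterion is inapplicable, the other decides.
rule_when_only_utilities_compare = select_rule(False, True)
```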
Let me try to sum up the rationale for these rules as concisely as possible. The rationale is pragmatic, and it can be articulated around one belief, one desire, and two hard facts. The belief is that classical decision theory, for all its elegance and power, does not afford a realistic approach to cognitive choice. The desire is to move decision theory in a direction that will remedy this situation. The hard facts are numeric poverty and incomparability. Numeric poverty is the lack of reliable numbers for probabilities and utilities that characterizes most real-life decision making. Incomparability is the occasional but persistent inability to reasonably determine whether one probability, plausibility, or utility is greater than, equal to, or less than another. Together, these facts drive the shift from E to PE. PE admits nonnumeric representations of beliefs and desires and, because it countenances partial orders, recognizes the reality of incomparability.
But, as we have seen, incomparability incapacitates PE; if two utilities, say, are incomparable, the summations of products it mandates cannot be performed. Hence the two options mentioned above: abandon decision theory in cases of incomparability, or adapt PE to the situation. The option of abandoning decision theory is doubly prohibitive: it runs directly counter to the project of developing a realistic decision-theoretic treatment of cognitive choice, and it leaves us high and dry without an alternative. Consequently, I think we should adapt PE to the situation. Granted, the special-case rules based on UCE and PCE are far from ideal. But their distance from the ideal does not mean they are faulty; rather, it reflects a defective situation. The rationale for using them is basically no different than that for using the rule based on PE. The rationale for all three can be summed up in three words: ‘Use comparable data!’
I want to conclude this discussion of decision rules by mentioning two objections that might be raised against my proposal. The first builds on the claim that decision-theoretic choice requires comparability of both plausibilities and utilities. When these conditions are not met, therefore, we can only suspend judgment. This would amount to relying exclusively on the decision rule associated with PE and rejecting the special-case decision rules based on UCE and PCE.
In response, I would recall William James’ distinction between forced and avoidable options ([1897] 1979, 14–15). A forced option, in James’ sense, is a “complete logical disjunction” such as “Either accept this truth or go without it”. Here logic forces a choice of exactly one alternative. But other options are characterized by pragmatic, not logical, force: a choice must be made in order to achieve some goal. Even though it is not logically necessary to choose one of a set of screwdrivers, for instance, it might be pragmatically necessary in order to set a screw. In much the same way, cognitive choice can be pragmatically forced. There are times when we want to explain, to predict, or to evaluate an experiment, and without choosing a cognitive option we would not be able to proceed. When faced with these pragmatically forced options, the sensible response is to rely on the comparable data at hand, whether plausibilities and utilities, just utilities, or just plausibilities. This is the strategy underlying the decision rules based on PE, UCE, and PCE.
Incidentally, suspension of judgment is often motivated by concerns that have no intrinsic connection with information outcomes—fear of personal or professional embarrassment, for example. These concerns can be seamlessly integrated in a decision-theoretic matrix as an additional type of outcome, and suspension of judgment will sometimes be a rational response. Yet this maneuver makes the problem partly cognitive, not cognitive, and therefore falls outside the purview of this study.^{10}
A variant of the objection in favor of suspending judgment is that all we really care about is utility; plausibility registers in decision-theoretic choice only to the extent that it maximizes utility.^{11} Hence the parallel modifications of PE that generate UCE and PCE are misguided. The decision rule based on UCE is acceptable because it plays the utility game, but the rule associated with PCE should be rejected and replaced by a policy of suspending judgment.
Three observations can be offered in response. If we take our cue on decision making from E, PE, and GEU, there is no mathematical justification for favoring utility over probability or plausibility. Even though probability in E and plausibility in PE and GEU play markedly different roles than utility, the products in the summations that yield mathematical expectations are made up of one part probability or plausibility and one part utility. That is, probability or plausibility has the same mathematical import as utility.
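The point about equal mathematical import can be seen in the standard expected-utility sum. The notation here is an assumption made for illustration (p for probability, u for utility, s_i for the states, o_i for the outcome of act a under s_i); the symbols used for E and PE in Section 2 may differ.

```latex
E(a) \;=\; \sum_{i=1}^{n} p(s_i)\, u(o_i)
```

Each summand is the product of one probability and one utility, so doubling either factor doubles the term. On the description given above, PE has the same product form with plausibility in place of probability.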
In addition, the mathematics reflects what I take to be the right response to the following scenario. Suppose that attaining a cognitive goal requires you to choose between two cognitive options whose information outcomes under relevant states of the world are known. Try as you might, however, you simply cannot rank one outcome as more desirable, equally desirable, or less desirable than the other. Nevertheless, the states of the world posited by one option appear to be more plausible than the states posited by its rival. You need to choose a cognitive option; how should you proceed? I can only suggest that ignoring what is known about the options—the relative plausibilities of the states they posit—would be epistemically imprudent. This is the gist of the decision rule based on PCE.
Finally, there are historical considerations that buttress the PCE-based decision rule. PCE is intimately related to what is perhaps the most ancient cognitive practice of all: probabilism. Though the term ‘probabilism’ can be traced at least as far back as the New Academy to Carneades’ doctrine of the probable, the practice of basing action on the highest available probabilities is much older than Carneades; its origins are lost in prehistory. Since those who act as probabilists are typically focused on plausibility, not probability in the strict mathematical sense, ‘plausibilism’ would be a more accurate description than ‘probabilism’ (cf. Pigozzi 2009, 4). So understood, plausibilism would dictate PCE in plausibility-comparable situations. Though these historical considerations are not conclusive in themselves, the fact that PCE can be grafted onto this age-old cognitive tradition hardly strikes me as trivial. In our suite of three decision rules, then, it is PCE that can claim bragging rights for pedigree.
The decision rules based on PE, UCE, and PCE are employed in Section 4 below. But one further building block is needed before we can proceed: the concept of relative disutility.
3 Relative disutility
The comparative decision theory outlined in this article relies on the concept of relative disutility, present in all but name in a proposal by Hintikka and Pietarinen (1966). Section 3.1 introduces and adapts this notion, and Section 3.2 generalizes it for cognitive choice. As the reader will observe, relative disutility is not meant to be applied indiscriminately; rather, it should be used only when reliable numeric utilities are not available.
3.1 The Hintikka-Pietarinen proposal
Hintikka and Pietarinen characterize the utilities as follows:
If h is true, the utility of his [the theorist’s] decision is the valid information he has gained…. If h is false, it is natural to say that his disutility or loss is measured by the information he lost because of his wrong choice between h and –h, i.e., by the information he would have gained if he had accepted –h instead of h. (1966, 107–108; cf. Hintikka 1970a, 16).
This proposal for the utilities u of h and –h can be summed up in the following decision table, where s_{h} and s_{-h} are states of the world posited by h and –h respectively:
Act | s_{h} | s_{-h} |
---|---|---|
h | u(h) | –u(–h) |
–h | –u(h) | u(–h) |
Hintikka-Pietarinen utility assignments are highly suggestive. They suggest the possibility of generalization to include all cognitive options, not just hypotheses. They are consistent with the view that cognitive choice is a two-person zero-sum game played by a truth-seeking self and nature (Hintikka 1983, 3). And they are strongly analogous to regret values for the minimax regret rule in decisions under ignorance (Peterson 2009, 49–50). Regret, in fact, is a form of disutility.
To characterize these utility assignments more precisely, we need to distinguish between intrinsic and relative utility.^{12} The intrinsic utility of an outcome is its utility considered in itself, without reference to other utilities. But the relative utility of an outcome is its utility compared to another utility. The Hintikka-Pietarinen disutilities of –u(–h) from choosing h when s_{-h} holds and –u(h) from choosing –h when s_{h} holds are relative disutilities.
Which type of utility is more appropriate: intrinsic or relative? In the context of cognitive choice, I submit that intrinsic utilities are inappropriate. Take the case of a false theory—Newtonian dynamics, for example. A plausible intrinsic utility for such a theory is zero.^{13} But the relative utility of Newtonian dynamics would vary with the context. If it is being compared to Buridan’s impetus theory, its relative utility would normally be positive; but if it is being compared to relativistic dynamics, its relative utility would typically be negative. To invariably assign a utility of zero to cognitive options that are false would misdescribe the mechanics of cognitive choice, I think. For if the utilities of all false options are zero, one false option could not be reasonably preferred to another; yet one false option can be reasonably preferred to another; hence the utilities of all false options are not zero. The utility of a cognitive option, like so much else, depends on what is on the menu.
Though the comparative decision theory presented here draws on the Hintikka-Pietarinen proposal for epistemic utility, it differs from Hintikka and Pietarinen’s approach in a number of nontrivial ways. Here I will mention only one. Hintikka and Pietarinen were concerned with the binary case of contradictory hypotheses, but many cognitive options are not so neatly related. The relevant options are often contraries instead of contradictories. Hence the comparative decision theory below generalizes Hintikka and Pietarinen’s approach to cover more typical cases of cognitive choice where the options in play may be contraries as well as contradictories. This generalization is carried out in Section 3.2.
3.2 Generalizing the Hintikka-Pietarinen proposal
First, a terminological matter: distinctions among total outcome, shared outcome, and unique outcome. An act’s total outcome is its full set of consequences. An act’s shared outcome is any part of its total outcome that is also obtainable by performing another act under consideration. An act’s unique outcome is any part of its total outcome that is not obtainable by performing other acts under consideration. A simple example: if one act pays off with a lottery ticket and a theater ticket while another act results in the same lottery ticket and a concert ticket, the lottery ticket is the shared outcome of both acts; the theater ticket is the unique outcome of the first act; and the concert ticket is the unique outcome of the second.
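The three notions can be illustrated with sets. The sketch below is only an illustration of the definitions just given, applied to the ticket example; the function name `split_outcomes` and the dictionary layout are assumptions introduced here, not part of the original proposal.

```python
def split_outcomes(total_outcomes):
    """Split each act's total outcome into its shared and unique parts.

    total_outcomes maps an act's name to the set of its consequences.
    """
    result = {}
    for act, total in total_outcomes.items():
        # Everything obtainable by performing some other act under consideration.
        others = set().union(
            *(outcome for name, outcome in total_outcomes.items() if name != act)
        )
        result[act] = {
            "shared": total & others,   # also obtainable by another act
            "unique": total - others,   # obtainable by this act only
        }
    return result


acts = {
    "first act": {"lottery ticket", "theater ticket"},
    "second act": {"lottery ticket", "concert ticket"},
}
parts = split_outcomes(acts)
print(parts["first act"])
print(parts["second act"])
```

Here the lottery ticket falls in the shared part of both acts, while the theater ticket and the concert ticket fall in the unique parts of the first and second act respectively.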
To extend the concept of relative disutility to cognitive options that may be contraries as well as contradictories, we note that the information outcomes of cognitive options c_{1} and c_{2} may overlap or not. If they do not, the total outcome of an act is identical to its unique outcome. In such cases, the situation is only slightly more complicated than that envisioned by Hintikka and Pietarinen above. The choice of c_{1} will result in the gain of any utility u_{1} provided by that option or the loss of any utility u_{2} provided by the rival c_{2}. Conversely, the choice of c_{2} will lead to the utility gain u_{2} or the utility loss –u_{1}. The further complication is that, unlike the Hintikka-Pietarinen scenario, c_{1} and c_{2} may be contraries and therefore not jointly exhaustive. Hence there is a possibility of a third cognitive option c_{3}. But we can deal with this third option provided we can deal with the first two. That is, suppose that we bring the comparative decision theory outlined in these pages to bear on the choice between c_{1} and c_{2} and that c_{2} turns out to be the winner. Then we can repeat the procedure for c_{2} and c_{3}. As before, the outcomes of choosing the options may or may not overlap. If they do not, we proceed as in this paragraph; if they do, we proceed as in the next one.
If the information outcomes do overlap, total outcome cannot be identical to unique outcome. Nor can total outcome be identical to shared outcome when the cognitive options involved are contraries; contrary options assure unique information outcomes. Let c_{1} and c_{2} be contrary cognitive options with shared information outcomes. Since the shared outcome would be obtained regardless of whether we choose c_{1} or c_{2}, it could not provide a reason for choosing one option over the other. The independence principle thus applies: “if two acts have the same consequences in some states, then the person’s preferences regarding those acts should be independent of what that common consequence is” (Maher 1993, 10; cf. Behn and Vaupel 1982, 315).^{14} In such cases, the independence principle authorizes ignoring shared outcomes and choosing on the basis of unique outcomes alone. Now suppose that a unique outcome offered by c_{1} has utility u_{1} and a unique outcome promised by c_{2} has utility u_{2}. Hence if we choose c_{1}, we miss out on any utility provided by c_{2} but not by c_{1}; we could have obtained this utility by choosing c_{2} instead. The disutility of this choice would be –u_{2}. Conversely, if we choose c_{2}, the disutility of this choice would be the loss of any utility uniquely provided by c_{1}. This disutility is –u_{1}.
In summary, the information outcomes of choosing c_{1} and choosing c_{2} are either entirely disjoint, in case they do not overlap, or have partial outcomes that are disjoint, in case they do overlap. In the latter case, the independence principle licenses choice based on disjoint partial outcomes alone. Together, the two cases permit us to generalize the Hintikka-Pietarinen proposal to include cognitive options that are contraries as well as contradictories. For even if two options are contraries, choice of one option foregoes whatever unique outcome would result from choice of the other. Hence any utility attaching to an unrealized unique outcome would be lost as well. Consequently, if the unique outcome of choosing one option has utility u, the relative disutility of the outcome of choosing the rival option is –u.
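The symmetry of the generalized proposal can be sketched as a small payoff table. This is a hedged illustration only: the function name, the state labels `s1` and `s2`, and the numbers are assumptions introduced here. The numbers are placeholders that make the signs visible; relative disutility is intended precisely for cases where reliable numeric utilities are unavailable. The sketch also leaves out any further state in which neither contrary option holds.

```python
def relative_disutility_table(u1, u2):
    """Payoffs for choosing c1 or c2, given the utilities u1 and u2
    of their unique information outcomes.

    s1 and s2 stand for the states posited by c1 and c2 respectively.
    """
    return {
        "c1": {"s1": u1, "s2": -u2},   # s1 holds: gain u1; s2 holds: forgo u2
        "c2": {"s1": -u1, "s2": u2},   # s1 holds: forgo u1; s2 holds: gain u2
    }


# Placeholder magnitudes, chosen only to display the pattern of signs.
table = relative_disutility_table(3, 5)
for option, row in table.items():
    print(option, row)
```

In each column the two entries are negatives of one another, which is the sense in which the utility u of one option's unique outcome reappears as the relative disutility –u of choosing its rival.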
4 Comparative decision theory
Relying on the conceptual framework outlined in Sections 2 and 3, we can now indicate how comparative decision theory could be applied in choosing a cognitive option.^{15} In its simplest and most decisive form, cognitive choice is the selection of one of two cognitive options. Section 4.1 treats this binary case, and Section 4.2 extends the binary strategy to the finite general case. I assume that the numeric data that would make the application of standard forms of decision theory feasible are unavailable.
4.1 The binary case
Binary cases for comparative decision theory
Case | Plausibility | Utility |
---|---|---|
1 | < | < |
2 | < | > |
3 | < | = |
4 | < | \| |
5 | > | < |
6 | > | > |
7 | > | = |
8 | > | \| |
9 | = | < |
10 | = | > |
11 | = | = |
12 | = | \| |
13 | \| | < |
14 | \| | > |
15 | \| | = |
16 | \| | \| |
Only case 16 remains. Like cases 2 and 5, it results in no decision, but it does so for a different reason. In cases 2 and 5, the decision-theoretic machinery breaks down. In case 16, however, the machinery cannot even start up; since both plausibility and utility are incomparable, decision theory has no grist for its mill. Here it can say nothing at all.
One advantage of the comparative approach of the preceding paragraphs is that the difference between indifference and indecision is entirely transparent. Where the result is indifference (cases 11, 12, and 15), we have a good decision-theoretic reason to choose either option. But where the outcome is indecision (cases 2, 5, and 16), we have no decision-theoretic reason to choose at all.
While the indifference judgments in cases 11, 12, and 15 do not assert that either c_{1} or c_{2} is superior to the other, they do make an affirmation: c_{1} is as choice-worthy as c_{2}. This affirmation is a disjunctive judgment, structurally comparable to the disjunctive solutions proposed in the literature on moral dilemmas (Greenspan 1983, 117–118; Gowans 1987, 19; Zimmerman 1996, 209, 220–221). To appreciate the work that disjunctive judgments do, take the four basic possibilities for binary choice of any kind: option 1, option 2, both option 1 and option 2, neither option 1 nor option 2. A decision theorist who forms the judgment ‘c_{1} or c_{2}’, as in cases 11, 12, and 15, has already rejected the neither option. And, since the options are rivals and cannot be true (or justified) at all the same points, the theorist might feel compelled by circumstances to choose one of them even though she has no reason to choose it over its rival. The theorist would then have made a disjunctive judgment that excludes two of the four basic options: neither and both.
Binary cases with resolutions
Case | Plausibility | Utility | Resolution |
---|---|---|---|
1 | < | < | c_{2} |
2 | < | > | no decision |
3 | < | = | c_{2} |
4 | < | \| | c_{2} |
5 | > | < | no decision |
6 | > | > | c_{1} |
7 | > | = | c_{1} |
8 | > | \| | c_{1} |
9 | = | < | c_{2} |
10 | = | > | c_{1} |
11 | = | = | c_{1} or c_{2} |
12 | = | \| | c_{1} or c_{2} |
13 | \| | < | c_{2} |
14 | \| | > | c_{1} |
15 | \| | = | c_{1} or c_{2} |
16 | \| | \| | no decision |
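The pattern in the table of resolutions can be captured in a few lines. The sketch below is one reading of the table's sixteen rows, not a restatement of the derivations from the PE-, UCE-, and PCE-based rules; the function name `resolve` and the string encodings are assumptions introduced here. Each argument compares c_{1} with c_{2} and is one of "<", ">", "=", or "|" (incomparable).

```python
INCOMPARABLE, EQUAL, GREATER = "|", "=", ">"


def resolve(plausibility, utility):
    """Return the table's resolution for one binary case."""
    comparable = [c for c in (plausibility, utility) if c != INCOMPARABLE]
    if not comparable:
        return "no decision"        # case 16: nothing to compare
    strict = {c for c in comparable if c != EQUAL}
    if len(strict) == 2:
        return "no decision"        # cases 2 and 5: the comparisons conflict
    if not strict:
        return "c1 or c2"           # cases 11, 12, 15: indifference
    return "c1" if GREATER in strict else "c2"


# Enumerate the cases in the same order as the table (cases 1 to 16).
symbols = ["<", ">", "=", "|"]
results = [resolve(p, u) for p in symbols for u in symbols]
determined = sum(r != "no decision" for r in results)
for case, r in enumerate(results, start=1):
    print(case, r)
```

Run over all sixteen combinations, the function leaves only cases 2, 5, and 16 undecided, so thirteen cases receive a verdict of c_{1}, c_{2}, or indifference between them.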
4.2 The finite general case
We have assumed from the beginning that the number of acts open to the agent at a given moment is finite. Consequently, the number of cognitive options under consideration by the agent at a given moment is finite. This implication may appear false in light of the frequent observation that, at any given moment, there are an infinite number of cognitive options from which to choose. I grant that there may be an infinite number of epistemically possible options at any given moment, but there are not an infinite number of epistemically promising ones, that is, options regarded as serious candidates by experts in the field.^{16} Famously, there were two serious candidates in the field of gravitational physics at the time of the solar eclipse of 1919: Newton’s and Einstein’s. That this was no exception is borne out by the history of science. The number of serious candidates at any given moment appears to be always, or almost always, finite and small. A theory with a parameter having a large—perhaps infinite—number of possible values is not viewed as a large number of theories; it is regarded as a single theory with a large number of parametric versions. Think of the general theory of relativity and its parameter for spacetime curvature. Hence if we are always, or almost always, faced with a small number of serious cognitive options, the binary case is critical. For if it is possible to choose between c_{1} and c_{2} such that c_{2}, say, is the winner, then it is also possible in principle to hold a run-off between c_{2} and any option c_{3}—and so on successively.
This assumes that preference among cognitive options is a transitive relation. The transitivity of preference is routinely affirmed by decision theorists (Savage 1972, 18; Jeffrey 1983, 144–145; Maher 1993, 60), yet this affirmation has been repeatedly challenged (e.g., Hughes 1980; Black 1985; Baumann 2005). This is not the occasion for a full-blown discussion of transitivity, but I would like to state two claims. First, even if cognitive preference should turn out to be intransitive, the comparative approach to binary cognitive choice outlined in Section 4.1 would still go through, for transitivity is not an issue where only two options are concerned. Hence the comparative route is always open for whatever two options we care to evaluate. Second, the assumption that cognitive preference is transitive, if properly understood and suitably employed, does in fact hold. The main consideration is to restrict transitive inference to the same sense of expectation. That is, we have been speaking of expectation in three related senses: plausibilistic expectation (PE), utility-comparable expectation (UCE), and plausibility-comparable expectation (PCE) (Section 2.2g). Transitivity holds provided that the expectations in play are all plausibilistic, all utility-comparable, or all plausibility-comparable. Mixing these senses generates fallacies of equivocation.^{17}
These all-too-brief considerations cannot pretend to establish the transitivity of cognitive preference, of course; the issue is a large one indeed (Maher 1993, ch. 2). But the transitivity assumption seems to be as widely accepted as any normative principle of rational choice; it is common to both the Anglo–American and Franco–European schools of decision theory, for instance (Fishburn 1991, 115). To conclude a balanced discussion of transitivity, Patrick Maher appeals to the wisdom of Chairman Mao: let “a hundred flowers blossom, and a hundred schools of thought contend”. That is, “Since foundational arguments have been found inadequate to settle the [transitivity] issue either way, advocates of different positions should get to work developing theories based on their preferred principles. We can then use our judgments of the resulting theories to help decide between the principles” (Maher 1993, 62). This article has attempted to follow this advice.
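The successive run-off described in this section can be sketched as a loop over pairwise choices. Everything named here is hypothetical: `run_off`, the `choose` callback, and the table of pairwise verdicts are assumptions for illustration. In particular, the text does not say what to do when a pairwise comparison ends in no decision; the sketch simply stops in that case.

```python
def run_off(options, choose):
    """Pit the current winner against each remaining option in turn.

    choose(a, b) returns a or b, or None when no decision is reached.
    """
    winner = options[0]
    for challenger in options[1:]:
        result = choose(winner, challenger)
        if result is None:
            # Assumption: an undecided pairing halts the run-off.
            return None
        winner = result
    return winner


# Hypothetical pairwise verdicts, standing in for binary comparisons
# of the kind treated in Section 4.1.
verdicts = {("c1", "c2"): "c2", ("c2", "c3"): "c2"}


def choose(a, b):
    return verdicts.get((a, b))


winner = run_off(["c1", "c2", "c3"], choose)
print(winner)
```

With these verdicts c2 wins both pairings. The procedure never compares c1 with c3 directly, which is where the transitivity assumption does its work: without it, the final winner could depend on the order in which the options are paired.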
5 Conclusion
How would the results of Section 4.1 compare with those from standard numeric forms of decision theory? Recall that the decision rule for plausibilistic expectation based on PE cannot be applied to cases with either incomparable utilities (4, 8, 12) or incomparable plausibilities (13, 14, 15); consequently, we derived decision rules associated with UCE and PCE for these special cases. But the standard decision rule for expected utility associated with E fails to be applicable to these very same cases. Still, if we employ the same tactics for E as we did for PE, we could adopt special-case decision rules analogous to UCE and PCE, and these rules would yield comparable verdicts. As a result, standard numeric forms of decision theory would determine fifteen of the sixteen cases in Section 4.1. Comparative decision theory is marginally less effective in this sense, for it would determine cognitive choice in thirteen of the sixteen cases. In another sense, however, comparative decision theory is much more effective, for it can frequently be applied where numeric forms of decision theory cannot. A bare minimum of comparative inputs can return verdicts where more finely-tuned forms of decision theory return nothing at all.
I conclude with the observation that comparative decision theory is not restricted to the context of cognitive choice. In fact, it is not restricted by context at all. It can be applied anywhere provided the utility scale has the kind of symmetry illustrated here for cognitive choice. The result, as we have seen, is surprisingly good odds of reaching a decision in the usual human predicament: the need to decide without enough numbers.
Footnotes
^{1} The term ‘real agents’ includes software agents that may not be appropriately programmable with point-valued probabilities and utilities (Chu and Halpern 2008, 4–5, 25).
^{2} Early work on interval probability functions includes Kyburg (1961), Good (1962), and Levi (1974, with further references on p. 407).
There are also intriguing connections to a contrastive account of knowledge (Morton and Karjalainen 2003; Schaffer 2004).
Paul Samuelson’s observation on Savage’s theory (along with Ramsey’s and de Finetti’s) is still worth noting: “it is important to realize that this is a purely ordinal theory and the same facts can be completely described without using privileged numerical indicators of utility or probability” (1952, 670 n1).
‘|’, which was defined in Section 2.2f for incomparable plausibilities, utilities, and plausibilistic expectations, is used analogously here. A product such as uP can be thought of as the plausibilistic expectation of an act relative to a single state.
Cf. “In nonquantitative cases the principle to maximize utility may not apply because options’ utilities may not exist. The absence of crucial probabilities and utilities may prevent computing them according to principles of expected utility analysis” (Weirich 2004, 59).
^{10} Distinctions among cognitive, partly cognitive, and noncognitive decisions are introduced in Section 2.2c.
^{11} I am grateful to an anonymous reviewer for European Journal for Philosophy of Science for raising this point.
^{12} An analogous distinction can be drawn for probability. “Whether or not a given sentence is accepted depends not so much on its total probability taken in isolation, but on that probability as compared to the probabilities of the alternative hypotheses being considered” (Levi 1967, 98).
^{14} Maher takes independence to be a requirement of rationality “when the preferences are relevant to a sufficiently important decision problem, and where there are no rewards attached to violating … independence” (1993, 12, 63–83).
^{15} An alternative approach is explored by Ted Lockhart, who relies on ordinal probability rankings, second-order probabilities, integration to calculate average values, and the principle of indifference to address moral questions (2000, 62–66, 71–72).
^{16} “Although an infinite number of options may arise under the idealization of unlimited cognitive power, in real cases the number of options is finite” (Weirich 2004, 142). Cf. Gärdenfors and Sahlin (1982, 366), Laudan (1984, 28), and Giere (1985, 87).
^{17} Transitivity is discussed in greater depth in the writer’s “Real-Life Decisions and Decision Theory” (forthcoming).
Acknowledgements
Prasanta Bandyopadhyay, James Franklin, Theo Kuipers, and Ana Portilla contributed insightful comments on an earlier version of this paper. Two anonymous referees and the editors of European Journal for Philosophy of Science offered highly constructive criticism of the present version. Audiences at the University of Groningen in the Netherlands, Complutense University and the University of Alcalá de Henares in Spain, and Visva Bharati University in India also provided valuable feedback. I am grateful to them all.