1 Introduction

Naturalism is the vague but suggestive doctrine that philosophy ought to be continuous with science. It is best identified with a loose cluster of typical impulses (Rysiew 2017), including hostility to accounts of philosophical methodology that posit faculties or methods that are unanswerable to (or take priority over) the best scientific theories and methods. Naturalists are thus typically suspicious of the rationalistic claim that philosophical knowledge emanates from an infallible kind of a priori insight, intuition, or reflection that philosophers are specially attuned to.Footnote 1

Two naturalistic lines of research on the method of cases (MoC) are currently prominent. Theorists on the F-line focus on the form and content of (the reasoning induced by) philosophical thought experiments. For example, they aim to provide a rational reconstruction of the ‘Gettier-reasoning’ that supports the standard ‘Gettier judgment’ (knowledge is not justified true belief) in response to Gettier cases (Gettier 1963). Williamson (2007) and Geddes (2017) develop a broadly naturalistic version of the F-line, eschewing appeal to ‘rational insight’ based on, for instance, a sui generis faculty of intuition (Bealer 1998; BonJour 1998; Bealer 2002) or mere conceptual competence (Boghossian 1996).Footnote 2 Instead, their accounts are non-exceptionalist: they only appeal to ordinary cognitive capacities whose nature and reliability are amenable to scientific (e.g. evolutionary) explanation.Footnote 3 Meanwhile, theorists on the X-line (e.g. Weinberg et al. 2001; Swain et al. 2008; Wright 2010; Starmans and Friedman 2012; Nagel et al. 2013; Turri 2013; Machery 2017) focus on the question of robustness: which philosophical thought experiments, if any, elicit judgments that are stable and uniform across population and presentation? Standard scientific tools (i.e. rigorous experimental design and analysis) are deployed to clarify and assess MoC’s trustworthiness, again on the naturalistic assumption that MoC utilizes ordinary judgment (lest folk surveys be rendered irrelevant).

F-liners and X-liners proceed from opposing inclinations. F-liners typically assume that prominent instances of MoC induce good reasoning, yielding knowledge in paradigm cases. Recovering this possibility is taken as a mark of an adequate reconstruction. X-liners assume a sceptical stance: it is a matter of (empirical) scrutiny whether MoC deserves its cherished status in the philosopher’s tool kit.

Our broad aim is to clarify the interaction between the F-line and X-line, and gesture at the common path forward for naturalists. Our narrow aim is to explore, in particular, how F-liner Williamson (2007) and X-liner Machery (2017) complement and contrast with each other: we identify crucial shared commitments; rule on a disagreement about the force of the Gettier thought experiment (henceforth: Gettier); and thereby examine how far Williamsonians should accept radical Macherian conclusions. Williamson and Machery are fruitful stalking horses. Their common ground offers an attractive foundation for a moderate naturalism about MoC, the implications of which deserve close scrutiny.Footnote 4 Further, Williamson’s account has been especially influential. Despite criticism of its details (Jenkins 2008; Ichikawa and Jarvis 2009; Malmgren 2011; Roca-Royes 2011; Vetter 2017), it remains a basic model for refinement for naturalists. Meanwhile, Machery’s Philosophy Within Its Proper Bounds (henceforth PwPB) is a milestone for experimental philosophy, basing radical methodological conclusions on nuanced argumentation and a comprehensive overview of existing experimental results, including large-scale studies reported by Machery et al. (2017, 2018a, b). In particular, Machery (2017, pp. 6–8) advocates radical restrictionism: in light of its empirically confirmed unreliability, traditional MoC should effectively be shelved, and judgment about standard philosophical cases suspended, Gettier cases included.Footnote 5 In contrast, some prominent X-liners only endorse moderate restrictionism (Weinberg 2007; Alexander and Weinberg 2014): existing empirical results don’t establish the widespread unreliability of philosophical thought experiments, but show that identifying trustworthy instances is a non-trivial empirical task.

Section 2 isolates common ground between the Williamsonian F-line and the Macherian X-line. Section 3 uses it to explicate and criticize a Macherian case for pessimism about Gettier. Section 3.1 argues that Macherian pessimism hinges on the claim that Gettier cases have intrinsic features that disturb ordinarily reliable judgment. Section 3.2 argues that key Gettier cases are not disturbing. Section 4 considers implications for central arguments in PwPB. Section 4.1 argues that Machery’s argument for radical restrictionism is undermined if Gettier can paradigmatically be taken as reliable. However, Sect. 4.2 presents a cautious variant of Machery’s argument, in support of a potent modal ignorance that limits philosophy’s theoretical ambitions, despite some preservation of traditional MoC. On a Williamsonian model, the resulting moderate modal skepticism subtly contrasts with a more familiar form advocated by van Inwagen (1998) and Hawke (2011, 2017).

2 Common ground

On a broadly Williamsonian approach, a successful account of MoC has three features.Footnote 6 First, it is non-exceptionalist. Second, it paints MoC as delivering (what Machery calls) material-mode conclusions: MoC, it is held, is not used “to discover the meaning of words or the semantic content of concepts of philosophical interest, but to understand their referents” (Machery 2017, p. 16). Relatedly, applications of MoC are taken to establish metaphysical possibilities, the “sort of possibility most relevant to the nature of the phenomena under investigation” (Williamson 2007, p. 206). Third, the account must explain the paradigmatic success of Gettier, on the hypothesis that “if any thought experiment can succeed in philosophy, then [Gettier’s] do” (Williamson 2007, p. 178).

Points of affinity with Machery (2017) are immediate. Machery agrees that MoC is best described as non-exceptionalist: the induced judgments “are warranted, if they are, for the very reason that everyday judgments are warranted, whatever that is” (Machery 2017, p. 21). He agrees that MoC is best characterized as in the material-mode (2017, p. 16). Though cautious in his conclusions, he agrees that Gettier stands out as particularly robust: the judgments elicited by Gettier cases have only negligible demographic variation (Machery et al. 2018a) and only small to moderate ordering and framing effects (see Table 2.9 on pp. 86–87 of Machery 2017). The folk apparently judge in accord with philosophical orthodoxy at a similar rate to their judgment of ignorance in response to a trivial ‘false belief’ case. This contrasts with early experimental studies that reported significant demographic variation in judgment (Weinberg et al. 2001), but used small sample sizes and failed to be replicated (Nagel et al. 2013; Turri 2013; Kim and Yuan 2015; Seyedsayamdost 2015). Indeed, Machery et al. (2018a) hypothesize that the Gettier judgment reflects universal features of folk epistemology (Machery et al. 2017 are more cautious).

To elaborate, consider a key Gettier case:

Hospital. Paul Jones was worried because it was 10 pm and his wife Mary was not home from work yet. Usually she is home by 6 pm. He tried her cell phone but just kept getting her voicemail. Starting to worry that something might have happened to her, he decided to call some local hospitals to ask whether any patient by the name of “Mary Jones” had been admitted that evening. At the University Hospital, the person who answered his call confirmed that someone by that name had been admitted with major but not life-threatening injuries following a car crash. Paul grabbed his coat and rushed out to drive to University Hospital. As it turned out, the patient at University Hospital was not Paul’s wife, but another woman with the same name. In fact, Paul’s wife had a heart attack as she was leaving work, and was actually receiving treatment in Metropolitan Hospital, a few miles away.

Philosophical orthodoxy takes Hospital to induce the judgment that Paul has a justified true belief (his wife is in hospital) that isn’t knowledge. Call this a singular Gettier judgment, supporting the universal Gettier judgment: knowledge is not justified true belief. As it is tricky to explain precisely why Paul lacks knowledge (Shope 1983), suggestive but non-committal terminology will be useful: Paul’s belief is not knowledge since its grounds are not suitably sensitive to what makes it true—its truth is somehow lucky.

Credible studies indicate that Hospital induces widespread convergence on the singular Gettier judgment, bolstering philosophical orthodoxy.Footnote 7 Surveying over 2000 participants, Machery et al. (2017) find both men and women made the singular Gettier judgment at a rate of about 80%. Participants across 23 countries and 16 languages made the singular Gettier judgment at rates between 70% and 90%.Footnote 8 Machery et al. (2018a) report similar cross-cultural invariance: 86% of US respondents issued the singular Gettier judgment; 95% of Brazilians; 88% of Indians; 91% of Japanese.

Hospital represents an important class of Gettier case. In the terminology of Turri (2019), it exhibits the structure: no detect with replacement.Footnote 9 Though the agent is reasonable to believe the proposition in question, they fail to genuinely detect its truth. The presumed truthmaker for the proposition has not in fact been realized; it is true in virtue of a ‘replacement’ truthmaker. Paul justifiably believes his wife is hospitalized, on the basis of a reasonable presumption that she was admitted to University. His presumption is incorrect: she was admitted to Metropolitan. This class is doubly notable. First, it plausibly includes the original counter-examples of Gettier (1963). Hence, the philosophical work achieved by Gettier’s paper is equally achieved by the robust inducement of a singular Gettier judgment by Hospital. Second, there is evidence that cases in this class tend to induce the singular Gettier judgment with striking frequency: see Starmans and Friedman (2012), Turri (2013), Turri et al. (2015) for a selection.Footnote 10 This contrasts, Turri et al. (2015) show, with Gettier cases with so-called detection with failed threat structure (e.g. the fake-barn cases of Goldman (1976)) or detection with replacement structure (e.g. the ‘authentic evidence’ cases of Starmans and Friedman (2012)). Turri (2019) rightly cautions: that a certain type of Gettier case induces (or fails to induce) largely uniform judgment doesn’t support conclusions about the abstract class of Gettier cases as a whole—in particular, those with very different epistemic structure. We nowhere assume that conclusions about Hospital translate into clear morals for, say, fake-barn cases (or vice versa).

Strikingly uniform folk judgment about Hospital doesn’t indicate accurate judgment if folk epistemic judgment is systematically inaccurate. However, Williamson and Machery accept (what Alexander and Weinberg (2014) call) the general reliability thesis: blind-spots granted, folk epistemic judgment is generally accurate when evaluating suitably mundane cases.Footnote 11 Crucially, non-exceptionalism and the general reliability thesis yield:

Epistemic non-exceptionalism. Absent specific defeat, a MoC judgment about a mundane case is rightly treated as expert judgment.

Epistemic non-exceptionalism would be questionable if promising accounts of MoC that entail it were elusive. Fortunately, Williamson (2007, Ch. 6) offers such an account. The reasoning induced by Hospital is explicated roughly as:

W1:

Hospital is (metaphysically) possible.

W2:

If Hospital were the case, then someone would justifiably believe a true proposition without knowing it.

C1:

Thus: it is (metaphysically) possible for someone to justifiably believe a true proposition without knowing it.

C2:

Thus: it is not (metaphysically) necessary that one knows p just in case p is true and one justifiably believes p.

Generally, Gettier-reasoning proceeds as follows: the subject judges both that the described case is possible (\(\mathbf{W1}\)) and that if it were to occur, then someone would have a justified belief in true proposition p without knowledge of p (\(\mathbf{W2}\)). The subject thereby draws a singular Gettier judgment (\(\mathbf{C1}\)). The universal Gettier judgment follows (\(\mathbf{C2}\)).
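The logical form of this reasoning can be sketched schematically. The notation is our own illustrative shorthand rather than a formalization drawn from the cited texts: \(H\) abbreviates the claim that Hospital obtains, \(\mathit{JTB}(x,p)\) says that \(x\) justifiably believes the true proposition \(p\), \(K(x,p)\) says that \(x\) knows \(p\), and the boxed arrow stands for the counterfactual conditional. The symbols assume the amsmath and amssymb packages.

```latex
\begin{align*}
\mathbf{W1}\colon\quad & \Diamond H \\
\mathbf{W2}\colon\quad & H \mathrel{\Box\!\!\rightarrow} \exists x\,\exists p\,\bigl(\mathit{JTB}(x,p) \wedge \neg K(x,p)\bigr) \\
\mathbf{C1}\colon\quad & \Diamond\,\exists x\,\exists p\,\bigl(\mathit{JTB}(x,p) \wedge \neg K(x,p)\bigr) \\
\mathbf{C2}\colon\quad & \neg\Box\,\forall x\,\forall p\,\bigl(K(x,p) \leftrightarrow \mathit{JTB}(x,p)\bigr)
\end{align*}
```

On this rendering, C1 follows from W1 and W2 because a possible antecedent together with a true counterfactual guarantees a possible world where the consequent holds (assuming a standard possible-worlds semantics for the counterfactual). C2 follows from C1 because a possible instance of justified true belief without knowledge rules out the necessity of the biconditional.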

W1 is justified by whatever justifies ordinary objective possibility claims (perhaps: reality-oriented imagination or ampliative reasoning). Williamson proposes that W2 is justified via an exercise of reality-oriented imagination, guiding a simulated rational belief update: “one supposes the antecedent and develops the supposition, adding further judgments within the supposition by reasoning, offline predictive mechanisms, and other offline judgments” (2007, pp. 152-153). What grounds the accuracy of such simulations? For Gettier, we can partly appeal to our ordinary capacity for mindreading (Nagel 2012). Indeed, given an actual Gettier case, the modal and counterfactual aspects of the reasoning are trivialized, with W2’s justification plausibly collapsing into mere mindreading.

Williamson’s account has met resistance. We needn’t be distracted. First, it is ‘proof-of-concept’ for the Williamsonian approach, whatever refinements await. Second, the objections chiefly target the appeal to counterfactual reasoning, but such worries can be postponed by focusing on actualized Gettier cases. Third, the chief criticisms may not necessitate radical refinement. To illustrate, the account has been criticized for erroneously predicting that deviant realizations can defeat Gettier-reasoning (Ichikawa and Jarvis 2009; Malmgren 2011). A deviant realization of Hospital satisfies its bare description but includes details that necessitate that the agent does not have a justified true belief without knowledge (e.g., Paul knows by an unmentioned source that his wife is in hospital). Now suppose that (only) deviant realizations are actual. Thus W2 is false, and the Williamsonian must conclude that the Gettier-reasoning fails. This is counter-intuitive: if deviant realizations are actualized, this seems irrelevant to Gettier’s force. Here are three strategies for amending Williamson’s analysis. The first targets the appeal to a counterfactual conditional, perhaps deploying a more subtle conditional (cf. Geddes 2017). The second amends the content of the counterfactual: perhaps the consequent is better explicated as the stronger ‘someone would justifiably believe a true proposition on grounds that are not sufficient for knowledge’ (cf. Sosa 2017). The third questions whether Hospital is rightly taken as the input for the Gettier-reasoning: perhaps there is a gap between it and the intended extension thereof that the philosopher successfully communicates (cf. Ichikawa and Jarvis 2013, Ch.8). Clearing this gap might seem a job for a general theory of communication.

Anyway, the account has advantages that refinements should arguably preserve. Fit with pre-theory: the account echoes a pre-theoretic description of participating in a Gettier thought experiment: the given text is a springboard for imagining a scenario that one judges to have certain epistemic features. Non-exceptionalism: understanding reality-oriented imagination as a form of simulation that bears on the epistemology of counterfactuals aligns with developments in cognitive science and psychology.Footnote 12 Similar remarks apply to mindreading.Footnote 13 Moreover, counterfactual, possibility and epistemic judgments are ordinary phenomena with a plausible evolutionary purpose.Footnote 14 Possibility of success: the argument from W1 and W2 to C1 and C2 is valid (on standard semantics). Further, general skepticism about such premises balloons into an implausible skepticism about everyday modal and counterfactual claims (cf. Williamson 2016a, b). In particular, typical Gettier cases seemingly evoke mundane possibilities and everyday epistemic notions. Possibility of defeat (i.e. fallibilism): Since ordinary modal, counterfactual and epistemic judgments are fallible, Gettier-reasoning is predicted to be fallible. No appeal is made to infallible ‘modal vision’, ‘rationalistic intuition’, or ‘raw conceptual competence’ (cf. Bealer 1998; BonJour 1998; Bealer 2002; Sosa 2007). This accommodates skepticism about applications of MoC where far-fetched possibilities are evoked or subjects lack requisite conceptual competence or background knowledge. (Compare Hospital to thought experiments that suspend the laws of nature or mention zombies.) Thus, moderate modal skepticism is accommodated, à la van Inwagen (1998). As Williamson puts it, “we are more reliable in evaluating some kinds [of counterfactuals] than others. [...] We may be correspondingly more reliable in evaluating possibility of everyday scenarios than of ‘far-out’ ones, and extra caution may be called for in the latter case” (2007, p. 164).

Further alignment with Macherian commitments is now evident. Assuming that the experimental results collected in Machery (2017) indicate that epistemic peers are genuinely disagreeing when confronted with philosophical cases, a non-exceptionalist account of MoC must apparently accommodate blameless error, i.e., fallibilism. Further, Machery endorses moderate modal skepticism. Explicitly, Machery (2017, 6.1.1) advocates skepticism towards (what he calls) modally immodest philosophical theories: theories committed to ambitious metaphysical necessities of peculiar philosophical interest. In support, Machery (2017, 6.2) argues that stress-testing such theories requires an ability we lack: to reliably survey unusual, atypical, and remote possibilities. Thus, his advocacy of modal modesty is grounded in a moderate modal skepticism, which he in turn grounds in MoC’s purported unreliability.

We draw two main morals. First, the basic commitments of the Williamsonian F-line and Macherian X-line are largely complementary. (Sect. 3 exploits epistemic non-exceptionalism and the general reliability thesis; Sect. 4 revisits moderate modal skepticism.) Second, assuming these commitments, the demographic data reported by Machery et al. (2017, 2018a, b) and the account of MoC in Williamson (2007) render it eminently plausible that Hospital-like Gettier cases induce reliable judgment.

3 Macherian pessimism

At this point, it might be puzzling how a Macherian could be pessimistic about the reliability of Gettier. Two arguments for pessimism about MoC can be extracted from Machery (2017). In this section, we explicitly apply such arguments to Gettier, and respond.

  • Worrying data. Judgment in response to Gettier is significantly influenced by mere presentation: in particular, framing (Machery 2017, Ch. 2). Furthermore, particular presentations cannot be singled out as promoting accurate judgment. Thus, the Gettier judgment should be rejected as unreliable across the board.

  • Philosophy is disturbing. Relative to traditional philosophical aims, philosophically interesting cases generally have disturbing characteristics that promote unreliable judgment (Machery 2017, Ch. 3). Furthermore, Gettier is no exception: Gettier cases invariably have (at least) one of these characteristics. Thus, the Gettier judgment should be rejected as unreliable across the board.

In response to the first, we conditionally deny the second premise: it is reasonable (given epistemic non-exceptionalism) to take certain Gettier cases as evincing accurate judgment, if there aren’t independent reasons to think Gettier cases are intrinsically disturbing. The second argument, we suggest, is thus the more basic of the two. In response to it, we again deny the second premise: Gettier cases don’t characteristically exhibit any of the disturbing characteristics identified by Machery. We elaborate below.

3.1 Gettier and framing

Does the worrying data cast doubt on the reliability of Gettier-reasoning? To focus the discussion, we concentrate on the data issued by Study 2 of Machery et al. (2018b).Footnote 15 Here, 85% of respondents judge that Paul in Hospital has the impression that he knows, but doesn’t know; while only 63% of respondents judge similarly for the agent in Clock, a second Gettier case. Clock is a variant on the classic case due to Bertrand Russell. (Basically: a stopped clock happens to read 4 o’clock on its face. At 4 o’clock, a hapless agent observes the clock face and thereby forms a belief about the time.) What to conclude?

We doubt the right conclusion is that Gettier cases evoke significantly non-uniform or unreliable judgment, for this requires an unmotivated inductive step. The class of Gettier cases is large, varying over possible epistemic structures and narrative details. Absent an argument that our sample (Hospital and Clock) is representative, nothing rules out, for instance, that the vast majority of Gettier cases induce the singular Gettier judgment at a rate akin to Hospital, with Clock an outlier.

The conclusion is in doubt even if one grants the sample is representative, for it isn’t clear that the data exhibits a framing effect in the first place. A framing effect is exhibited by two cases when (i) there is a statistically significant difference in how subjects respond and (ii) the cases differ only in superficial narrative details: with respect to philosophically relevant structure, they are equivalent. Let’s grant that Hospital and Clock both deserve the title ‘Gettier case’. However, Starmans and Friedman (2012) and Turri et al. (2015) caution that Gettier cases vary significantly in underlying epistemic structure. Hospital and Clock exemplify this. In Hospital, the agent believes a proposition (‘My wife is in hospital’) on the basis of a presumed truthmaker (she was admitted to University) that differs substantially from the actual truthmaker (she was admitted to Metropolitan). Clock doesn’t share this feature. Further, the nature of the defect in the agent’s information source differs. In Hospital, the agent consults a device (a call to the hospital) that is (known to be) generally reliable with respect to the salient domain (admittance facts), but is, as a matter of (bad) luck, misleading in this one instance. In Clock, the agent consults a device (the stuck clock) that is (surprisingly) highly unreliable with respect to the salient domain (time facts), but is, as a matter of (good) luck, accurate in this one instance.

The conclusion is doubtful even if one grants the sample is representative and issues a framing effect. Machery (2017, p. 104) offers the following criterion for judging unreliability: “the judgments elicited by a given case are unreliable provided that they are influenced by at least a demographic variable or a presentation variable and provided that this influence is large [enough]”. Note, however, that Machery (2017, sect. 3.3.1, p. 108) doesn’t think it suffices that the influence count as ‘large’ in terms of standard benchmarks from psychology. To see why, first note with Machery (2017, p. 46) that we are concerned with cases where “the dependent variable is a percentage (e.g., the percentage of people agreeing that the character does not know the relevant proposition in the situation described by a Gettier case)”. Machery (2017, pp. 45–47) deems the independent variable’s effect size as ‘large’, relative to standard benchmarks, when the absolute difference between the percentages under two conditions exceeds 30%. Let’s say, in this case, that the variable’s influence is significant; assuring one that the observed effect doesn’t merely reflect noisy data. (To illustrate: for Hospital and Clock, the difference in percentage is 22%, indicating only ‘moderate’ significance.)Footnote 16 However, ‘significance’ is then neither necessary nor sufficient for concluding that the population’s judgment is unreliable. Consider sub-populations A and B, each making up 50% of the total population. If 100% of A-respondents and 70% of B-respondents answer ‘yes’ to polar question Q, then the influence of sub-population membership is significant, but, overall, 85% of the population answer ‘yes’. If the correct answer is unknown, we can merely conclude that the population is either largely reliable on Q or largely unreliable. Further, if 52% of A-respondents and 48% of B-respondents answer ‘yes’, the difference in response is not significant, but the average response matches chance. The population is, on average, unreliable.
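The contrast between the two measures in the A/B examples can be checked with a few lines of arithmetic. The following is a minimal sketch, not a reconstruction of any analysis in Machery (2017): the function names `absolute_difference` and `pooled_rate` are hypothetical labels of our own, and equal sub-population sizes are assumed, as in the examples.

```python
def absolute_difference(rate_a, rate_b):
    """Gap, in percentage points, between two sub-populations' 'yes' rates."""
    return abs(rate_a - rate_b)


def pooled_rate(rates, weights):
    """'Yes' rate for the whole population, weighting each sub-population by its size."""
    total = sum(weights)
    return sum(rate * weight for rate, weight in zip(rates, weights)) / total


# The two illustrative cases from the text; A and B are equally sized (assumption
# taken from the examples), so each gets weight 1.
examples = {
    "large gap, high agreement": (100.0, 70.0),
    "small gap, chance-level agreement": (52.0, 48.0),
}

for label, (rate_a, rate_b) in examples.items():
    gap = absolute_difference(rate_a, rate_b)
    pooled = pooled_rate([rate_a, rate_b], [1, 1])
    # Distance from 50% indicates how far aggregate response is from chance.
    print(f"{label}: gap = {gap} points, pooled 'yes' = {pooled}%, "
          f"distance from chance = {abs(pooled - 50.0)} points")
```

The sketch makes the dissociation explicit: the size of the between-group gap and the distance of the pooled rate from chance vary independently, which is why the former alone cannot settle the question of reliability.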

Thus, Machery (2017, sect. 3.3.1) proposes we attend to average response:Footnote 17 a variable has a large enough effect for determining unreliability when, in the aggregate (across different values of the variable), the distribution of responses is substantially mixed, i.e., the probability of any given response is sufficiently close to chance. That is, when the influence of the variable is accounted for, disagreement is stark.

To illustrate: suppose that half the population are political conservatives and half are political liberals. Suppose that 100% of conservatives answer ‘no’ to ‘Is global warming real?’, while 100% of liberals answer ‘yes’. Thus, the distribution of ‘yes/no’ answers is 50/50. One concludes: the effect size of the (pernicious) variable of political affiliation is large enough to conclude unreliability, since it produces widespread disagreement in the aggregate. (Further, if we don’t know which of ‘yes’ or ‘no’ is right, and we cannot assume that one sub-population has special competence on the issue, then we cannot identify which sub-population has accurate judgment, so cannot ignore the overall unreliability of the population’s judgment.) Second example: suppose that 80% of conservatives answer ‘yes’ to ‘Is global warming real?’, while 100% of liberals answer ‘yes’. Then the probability that a random member of the population will answer ‘yes’ is 90%: significant agreement is exhibited. Hence, we shouldn’t take the effect size as large enough (despite a 20% difference between groups) and shouldn’t conclude that the population’s aggregate judgment is unreliable.

Now compare Hospital and Clock. Here, the aggregate probability of a certain response is presumably calculated as the probability that a random member of the population gives that answer after being assigned Hospital or Clock with a coin flip.Footnote 18 If the experimental data is representative, the probability that ‘mere impression of knowledge’ is chosen over ‘knowledge’ is thus 74%. This represents notable agreement. (Machery presumably agrees: compare the ‘room color’ example discussed by Machery (2017, p. 104).) So why conclude significant unreliability, rather than lightly tempering one’s credence that ‘mere impression of knowledge’ is the right answer?
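For concreteness, the 74% figure can be recovered as follows, assuming a fair coin (so each case is assigned with probability 1/2) and taking the Study 2 response rates of 85% for Hospital and 63% for Clock at face value. The label for the response is our own shorthand, and the `\text` command assumes the amsmath package.

```latex
\[
P(\text{mere impression})
  = \tfrac{1}{2}(0.85) + \tfrac{1}{2}(0.63)
  = 0.425 + 0.315
  = 0.74
\]
```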

Turn to our main argument, which is maximally concessive to Machery. Let’s grant that the data indicates that Gettier-reasoning is significantly unreliable in the aggregate. Nevertheless, a question remains as to the exact conclusion this warrants.

  • Option 1. Judgment in response to Gettier cases is not terribly reliable in the aggregate.

  • Option 2. While judgment in response to Gettier cases is not terribly reliable in the aggregate, judgment relative to certain Gettier cases (or presentations thereof) is reliable.

Option 2 is a stronger hypothesis, and better explains the overall data. As noted previously, there is independent evidence that judgment induced by certain (presentations of) Gettier cases yields significant agreement across diverse demographics (Machery et al. 2018a, b). This uniformity is explained by Option 2 and left mysterious by Option 1. Certainly, if Gettier-reasoning were invariably unsystematic, then robust agreement on any particular Gettier case would be extremely surprising. So Option 2 should be accepted over Option 1, on abductive grounds.Footnote 19

A fortiori, one shouldn’t suspend judgment on the question of reliability (as a moderate restrictionist might advocate).Footnote 20 There is a good reason to take judgment induced by certain cases as reliable: this best explains a striking regularity.

But what of the possibility that significant agreement on a particular Gettier case indicates that our judgment is systematically inaccurate on that case? If this were a serious possibility, then Option 3 could be deployed to explain the data, on a par with Option 2.

  • Option 3. Judgment in response to Gettier cases is not terribly reliable in the aggregate, and judgment relative to certain Gettier cases (or presentations thereof) is systematically inaccurate, generating an epistemic illusion.

However, an epistemic non-exceptionalist should not take Option 3 seriously without specific support for it over Option 2. Absent specific evidence that a certain (presentation of a) Gettier case corrupts judgment, she observes a basic confidence in ordinary judgment. If the case generates widespread agreement (relative to a large and diverse population of individuals), the presumption should be that ordinary judgment has here largely yielded accurate (‘expert’) judgment, as is typical for ordinary cases. Compare a toy example: suppose that half of the population of climate scientists are liberals, half are conservatives. It turns out that 98% of the former answer ‘yes’ to ‘Is climate change real?’, compared to only 60% of the latter. The uniformity among liberals is striking. Should we posit that their judgment is systematically inaccurate (wholly corrupted by political brainwashing)? This is excessively skeptical, in the absence of specific evidence. The normal presumptions stand until defeated: a scientist’s judgment is normally expert, and expert judgment generally converges. Thus, the best explanation for the uniform liberal judgment is that it is accurate: the liberal experts judge exactly as we would expect experts to judge (striking consensus); while the conservative experts judge as we would expect experts to judge under the influence of disturbing factors (a mixed response).Footnote 21

Is there independent reason to think that Gettier-reasoning typically exhibits peculiarities that jeopardize ordinary judgment? Were the answer ‘yes’, Option 3 would be live. We’ll argue ‘no’ with respect to the ‘disturbing characteristics’ proposed by Machery (2017).

3.2 Is Gettier disturbing?

Machery (2017, Ch. 3.5) argues that philosophically interesting cases typically have one of three disturbing characteristics that promote unreliable judgment:

  • Entanglement. Judgment of the case is influenced by its superficial content. That is, arbitrary narrative details (that merely render the case concrete and vivid) influence our judgment, though they have no real bearing on the issue the case is intended to investigate.

  • Unusualness. The case describes an unusual situation, relative to the demands of ordinary life. Ordinary life doesn’t offer opportunities to exercise judgment in such situations (not even unrealized opportunities), so we cannot assume ordinary judgment is primed for them.Footnote 22

  • Atypicality. The case pulls apart properties that generally co-occur in ordinary life, sabotaging the heuristics of ordinary judgment and encouraging ad hoc responses.

It is explicable that philosophically interesting cases tend to have these features. Philosophy investigates phenomena that, while familiar and fundamental, puzzle us on close inspection. We engage in philosophical reflection precisely because we struggle to delineate core features. It is therefore difficult to guard against (or correct) entanglement. Further, philosophical theories often target necessary truths, with rival theories often agreeing on everyday cases. Such theories can only be stress-tested with unusual or atypical cases.

We discuss each disturbing characteristic in relation to Gettier, in turn.

3.3 Entanglement

We grant that philosophical cases face a threat of Entanglement: it is hard to rule out that any particular judgment is subject to entanglement. Further, we tentatively grant that there is specific evidence of entanglement in the case of Gettier: as noted, Machery et al. (2018b) report that responses to certain Gettier cases are influenced by merely presentational factors.Footnote 23

Given the general reliability thesis, one must deny that the mere threat of entanglement casts doubt on the reliability of Gettier-reasoning. If it did, there would be similar grounds for doubting the reliability of countless ordinary epistemic judgments: the latter seem no less susceptible to entanglement. You see Sam reading the headline of today’s New York Times. The headline states that Clinton lost the election. Sam is, in your experience, an affable and reasonable person. Further, you are aware of the Times’ reputation for journalistic excellence and find it an enjoyable read. You judge (rightly) that Sam thereby knows that Clinton lost the election. But the threat of entanglement is present. Absent general confidence in ordinary epistemic judgment, nothing rules out the possibility that one’s judgment has here been influenced by epistemically irrelevant features of the situation (say, one’s warm feelings for Sam or the New York Times). As usual, it is difficult to exactly delineate the features of the situation that make the knowledge ascription reasonable, so a more cautious assessment of Sam’s epistemic state is elusive.

What of the specific evidence that presentation influences Gettier-reasoning? We reiterate our conclusion from Sect. 3.1: given epistemic non-exceptionalism, the best explanation of the overall data is that only certain Gettier cases (or presentations thereof) are likely entangled. This suggests that adverse presentation effects can be ameliorated by a judicious selection of presentational features (and that experimental philosophy provides useful tools for identifying them). Call those Gettier cases that elicit markedly stable judgment sober. Going forward, we focus on such and assume Hospital is among them.

3.4 Unusualness

That Gettier cases are unusual has initial support, as Weinberg (2017, sect. 3) notes. Anecdotally, philosophy students find them surprising on first encounter. Some need help to grasp their structure: rushing their introduction seems a pedagogical error. Experimentally, Turri (2013) reports that judgments about Gettier cases converge much more readily if their structure is presented with extra perspicuity. There is evidence, then, that Gettier cases don’t regularly emerge for evaluation in ordinary life, and ordinary faculties aren’t always primed to notice and properly assess them.

It doesn’t follow that (Hospital-like) Gettier cases are intrinsically disturbing. To show this, we decompose Hospital.

  • Component 1: Justified belief without ‘sensitivity’.Footnote 24

    Starting to worry that something might have happened to his wife, Paul Jones decided to call some local hospitals to ask whether any patient by the name of “Mary Jones” had been admitted that evening. At the University Hospital, the person who answered his call confirmed that someone by that name had been admitted with major but not life-threatening injuries following a car crash. Paul grabbed his coat and rushed out to drive to University Hospital. As it turned out, the patient at University Hospital was not Paul’s wife, but another woman with the same name.

  • Judgment. Paul didn’t come to know anything about his wife via the call, but it led him to justifiably/reasonably/blamelessly believe she was hospitalized.

  • Component 2: True belief.

    Paul’s wife had a heart attack as she was leaving work, and was actually receiving treatment in Metropolitan Hospital, a few miles away.

  • Judgment. Paul had a true belief if he believed his wife was hospitalized.

Component 1 yields, by itself, a key judgment: Paul’s ignorance. Strikingly, the truth value of ‘Mary Jones is in hospital’ needn’t be specified for this judgment to be apt. A tempting conclusion: the truth value is irrelevant. A ready explanation: misleading appearances aside, University’s admission roster holds no information about Paul’s wife, and sources that are uninformative about X don’t induce knowledge about X.

The general phenomenon is familiar and mundane. Suppose Ann asks Bob, a trustworthy person: “Does Carol eat meat?” Bob sincerely replies: “No, Carol is vegetarian. She told me so”. However, Ann and Bob are speaking at cross purposes: Ann is talking about Carol Jones; Bob about Carol Smith. Indeed, he doesn’t know anything about (doesn’t hold information concerning) the dietary preferences of Carol Jones. Ann might thereby reasonably believe Carol Jones is vegetarian, but this isn’t knowledge; Bob didn’t communicate any knowledge about Carol Jones. Whether or not Carol Jones is in fact vegetarian seems irrelevant to this mundane assessment. Another instance: Ann asks Bob: “Do all the conference speakers eat meat?” Bob sincerely replies: “No, one of them told me she is vegetarian”. However, Bob is talking about Carol Smith: he mistakenly believes she is a conference speaker. Indeed, he doesn’t know the dietary preferences of any conference speaker. Ann forms a reasonable belief that not every conference speaker eats meat. This isn’t knowledge; Bob didn’t have any to communicate. Whether any speaker is in fact vegetarian is irrelevant.

Further, Components 1 and 2 are, on their face, simple and mundane. Assuming the general reliability thesis, ordinary judgment is primed for such circumstances: absent defeating considerations, our assessment is trustworthy.

Of course, situations akin to Component 1 and 2 might occur infrequently. If so, they are unusual, in a straightforward sense. Does this defeat default confidence in our immediate judgments? No—it rather illustrates that low probability events can be mundane and, therefore, apt for reliable judgment. As Williamson (2016b, sect. 2.3) observes, to assume that low probability events invariably disturb ordinary judgment is markedly skeptical: just about any situation is of low probability under the right description. Indeed, it is evident that ordinary judgment doesn’t collapse in the face of rare/unexpected events: if it did, we would be severely impeded in ordinary life.

If there is anything notably intriguing and unusual, it is the combination of Component 1 and 2. Mere combination can introduce two complications: lowered probability and heightened complexity. But, again, ordinary judgment isn’t so brittle as to collapse in the face of lightly improbable combinations of ordinary situations. Sam reads in the New York Times that Clinton lost the election. Conclusion: she knows Clinton lost. Blake reads in the New York Times that Clinton lost the election. Conclusion: she knows Clinton lost. Coincidentally, they read exactly the same copy of the NYT (at a certain doctor’s waiting room; they both fell sick that day). We wouldn’t and shouldn’t retract our initial judgments of knowledge simply because of this coincidence. Rare combinations of mundane elements are sometimes mundane. Similar remarks apply to complexity introduction. We face complex situations in ordinary life (e.g. a busy city street). Navigating them requires skills in complexity management: selective attention and careful bookkeeping. An agent that lacks these is again severely impeded, certainly in high stakes situations. So, if complexity invariably disturbed ordinary judgment, the general reliability thesis would be undermined. Of course, Turri (2013) provides prima facie evidence that the complexity of some Gettier cases disturbs ordinary judgment. Unsurprisingly, this is ameliorated with a careful presentation (explaining why introducing Gettier cases to students requires care). Anyway, the experimental results indicate Hospital doesn’t fall prey to such disturbance.

At any rate, even if the combination of Component 1 and 2 could lead to confusion, a simple strategy safeguards accuracy: be careful to judge the components individually and then conjoin the judgments. Could the combination of Component 1 and 2 somehow defeat the considerations that render the corresponding judgments individually apt? This strains credulity: the respective considerations seem decisive. Again, consider Component 1: it seems obvious that knowledge about X cannot accrue from a source that carries no information about X—no matter the circumstances of X.

So, is Hospital disturbingly unusual? This conclusion isn’t licensed simply because it involves rare events or relative complexity. It seems a harmless combination of simple mundane elements: ordinary judgment is presumably here expert, a matter of merging individual judgments about Component 1 and 2. No experimental result defeats this presumption.

Weinberg (2017, sect. 3) proposes a more subtle reason to take (the simple elements of) Gettier cases as disturbingly unusual: they hinge on information about the ‘specific inferential pathways’ taken by the Gettierized agent. (He continues: “And it seems to me we only in the rarest of circumstances are in a situation to [furthermore] know that [the agent’s] belief might be true, while also being aware of a range of possible truthmakers for that belief” (idem., p. 265).) Weinberg suggests there is a profound lack of such information in ordinary life. In Gettier (1963), an agent uses disjunction-introduction to infer ‘Jones owns a Ford or Brown is in Barcelona’ from ‘Jones owns a Ford’, where ‘Brown is in Barcelona’ was randomly selected. It is hard to think of mundane situations where someone transparently reasons like this.

However, to claim mundane situations never yield information about ‘specific inferential pathways’, broadly understood, is to exaggerate. Ordinary speakers often report their reasoning for evaluation. Ann: “Someone in the office is vegetarian”. Dave: “How do you know?” Ann: “Bob is Carol’s good friend and told me she is vegetarian”. One judges: Ann believes that someone in the office is a vegetarian, on the basis (of her belief) that Carol is. One judges: she knows the former if she knows the latter, which hinges on whether Bob knew it. This is exceedingly mundane. (As is observing that Ann’s belief that someone is vegetarian might be true, and could be made true by multiple possible situations.)

Grant that disjunction introduction yields strange reasoning. Not all Gettier cases involve such strangeness. Another classic case from Gettier (1963) hinges, less artificially, on existential-introduction. Hospital induces a perfectly ordinary judgment about an agent’s reasoning: Paul believes his wife was admitted to University, on the basis that her name is on the admission roster.

In short, Hospital might be unusual, but, assuming general reliability, we shouldn’t take it as disturbingly unusual: unusual in any sense that undermines ordinary judgment. To generalize: absent specific defeat, cases in this structural family shouldn’t be counted by an epistemic non-exceptionalist as disturbingly unusual if constructed from simple mundane elements, presented with perspicuity, and assessed with care.

3.5 Atypicality

Turn to Atypicality. Machery worries about situations where there is a package of features, e.g., abc, that typically indicates \(\Phi\), and ordinary judgment exploits this as a mere heuristic. Thus one shouldn’t conclude from our ordinary practice that any of a, b or c is necessary for the truth of \(\Phi\), nor that ordinary judgment is reliable when the package is pulled apart. Hence, philosophically interesting cases that fracture the package yield dubious judgments. In the case of Gettier, the typical package ‘truth + justification + sensitive belief’ indicates knowledge, and serves as a heuristic for ordinary judgment. (We grant these claims.) But Gettier pulls sensitivity (whatever it is) apart from truth and justification. Hence, the worry goes, Gettier induces unreliable judgment.

In response, two points: (i) splitting a typical package doesn’t necessarily lead to unreliable judgment; (ii) Gettier plausibly investigates exactly this sort of split (i.e., where reliability is not undermined).Footnote 25 To see (i), consider: it is easy to think of ordinary situations where justification is present without truth. Here, a typical package is pulled apart. But we shouldn’t conclude that judgment in these situations is unreliable, since lack of truth is an ordinary, decisive marker of ignorance (as Machery 2017, sect. 3.6.3 notes). In support of (ii), we suggest that lack of sensitivity (whatever exactly it is) is analogous to lack of truth: an ordinary, decisive marker of ignorance. Again consider Component 1: a mundane situation where we judge an agent as ignorant, given a lack of sensitive belief. Despite being hard to make precise, the rationale for this judgment is again easily gestured at. Though Paul is unaware of it, the admission roster issues misleading evidence concerning his wife. Indeed, misleading appearances aside, it clearly carries no information about his wife. Agents that form beliefs on the basis of a (relevantly) bereft information source don’t thereby acquire knowledge. Compare: an agent that forms beliefs about a celebrity’s lifestyle on the basis of The National Enquirer doesn’t thereby accrue knowledge. The badness of the source is decisive: it doesn’t (seem to) matter if the belief happens to be true or if the agent has somehow been convinced to consider the National Enquirer trustworthy.Footnote 26

In short, Gettier cases like Hospital might be atypical, but, assuming general reliability, one shouldn’t conclude a disturbing atypicality.

In sum: we see no compelling reason for an epistemic non-exceptionalist to take disturbing characteristics as intrinsic to (or typical of) Gettier cases in Hospital’s structural family.

4 Upshot for Machery’s master arguments

4.1 Radical restrictionism

PwPB defends a severe conclusion: philosophers should abandon the traditional method of cases. Machery reasons inductively, using an inductive step:

If the judgments elicited by most of the philosophical cases that have been examined by experimental philosophers are unreliable, then the judgments elicited by most philosophical cases are plausibly unreliable. (2017, p. 102)

He offers three lines of support for this claim:

  1. The tested cases are typical examples of philosophical cases: they “possess many of the properties many philosophical cases possess” (2017, p. 109).

  2. “[The tested cases] are canonical. They are famous, and, consciously or unconsciously, they function as templates or paradigms when philosophers write novel cases” (2017, pp. 109-110).

  3. Philosophically interesting cases typically possess the disturbing characteristics discussed above, so they are generally relevantly similar to the cases that have been tested (2017, sect. 3.5).

On this basis, the tested cases are claimed to be representative of the class of philosophically interesting cases.Footnote 27

Should we accept the inductive step? We proceed on the assumption that our previous arguments have been successful: Gettier cases needn’t be taken to generally exhibit disturbing characteristics; Gettier-reasoning (applied to sober cases) induces reliable judgment; and naturalists needn’t find this mysterious, as a Williamsonian analysis illustrates. In particular, we assume this for Gettier cases with the underlying epistemic structure of Hospital, including those of Gettier (1963). This puts pressure on the inductive step.Footnote 28 Gettier cases are clearly philosophically interesting. They aren’t intrinsically disturbing. They are typical. They are (especially) canonical: few thought experiments (even limiting ourselves to Hospital’s class) have been as influential or elicited as much consensus among philosophers. Certainly, it is rash to assume that cases that are controversial among philosophers (precisely because they plausibly disturb ordinary judgment) better represent the broad class of philosophically interesting cases.Footnote 29 In short, even if 1 and 2 are true, 3 and the inductive step shouldn’t be casually accepted: what rules out that philosophically interesting cases are frequently akin to sober Gettier cases like Hospital?

Even if one grants Machery’s inductive step (and that most tested cases induce unreliable judgments), one can resist his severe conclusion. For he requires another conditional: if most philosophically interesting cases induce unreliable judgment, then MoC should be abandoned. But Gettier, it seems, showcases a class of cases for which MoC proves effective, with significant philosophical benefits in tow (as its influence attests). This success should be preserved and emulated. The experimental results are a signal for caution and reform. MoC shouldn’t be abandoned, but recognized as fallible and utilized with discipline (and experimental checks). Gettier (Hospital-like cases in particular) represents a paradigm towards which MoC can and should aspire.

Machery (2017, sect. 5.6) is skeptical about the prospects for reforming MoC. Further, he anticipates objections to his inductive argument. He writes:

Nor is it an objection that some philosophical cases may not possess any disturbing property. The claim is not that every philosophical case elicits a cognitive artifact or diverse responses, but that the kind of case philosophers use for dialectical purpose tends, non-accidentally, to elicit cognitive artifacts or a diversity of responses. (2017, p. 183)

Our own objections don’t rest merely on the possible existence of philosophical cases that aren’t disturbing: we are not fallaciously proposing that a single counter-example undermines a statistical or generic claim. Our key claim is that certain typical and canonical philosophical cases don’t possess disturbing properties. In this connection, we emphasize that Machery doesn’t deploy vanilla statistical-inductive reasoning: he doesn’t base his conclusion that most philosophical cases elicit unreliable judgments on a (demonstrably) random and suitably large sample of tested philosophical cases (or, indeed, of tested typical and canonical cases). Nor does he establish the relative degree of typicality or canonicity for various philosophical cases, as would be essential for evaluating the plausible hypothesis that Hospital-like Gettier cases typify an especially large bulk of philosophical cases. Thus, he hasn’t established that his sample of tested cases warrants generalization to most or all philosophical cases; nor that Gettier isn’t by itself a significant success story for MoC.

Machery continues:

Nor is it compelling to respond that the advice to suspend judgment remains inapplicable until there is clear-cut evidence about what cases exactly are impugned by experimental-philosophy studies. First, we have provided reasons to believe that disturbing cases prime unreliability and disagreement. Second, even if we were unsure about how broadly to suspend judgment, we should still suspend judgment in response to all the cases in contemporary philosophy (except those known to be immune to demographic and presentation effects) because the cases examined by philosophers are typical and canonical. Similarly, if we find that some eggs are contaminated with Salmonella, we would stop eating eggs sold by the brand selling them, even if it is unclear whether all eggs are contaminated. (2017, p. 183)

In issuing a blanket ban on new applications of MoC (though he grants the possibility of cases that are immune to serious demographic and presentation effects), Machery underestimates our ability to (reasonably, defeasibly) discriminate between philosophical cases that are likely or unlikely to induce reliable judgment. Compare the debate induced by the proposal in Weinberg (2007) that epistemic judgment about philosophical cases isn’t sufficiently hopeful: we lack robust error-detection mechanisms for regulating it. Ironically, Machery (2017, Ch. 3) convincingly defuses generic worries about hopefulness. Further, studies reported by Wright (2010, 2013) suggest that ordinary respondents reliably register the presence of instability/unreliability in their epistemic judgments.Footnote 30 Machery has himself identified a rough but promising list of features that problematic cases typically exhibit: namely, the disturbing characteristics (entanglement, unusualness, atypicality). If such characteristics are lacking (as far as one can tell), an epistemic non-exceptionalist assumes that ordinary judgment is primed to rule accurately on what appears to be an ordinary case. This assumption can, of course, be defeated by experimental investigation. Granted, some disturbing characteristics may be hard to discern: entanglement, for instance. However, our reservation in taking the mere threat of entanglement too seriously (Sect. 3.3) is again pertinent. Other characteristics seem easier to spot: modally exotic cases involving philosophical zombies or evil demons seem easily distinguished from relatively mundane cases like Hospital.

So the Salmonella analogy is inapt. Contrast a second case of egg contamination. In the summer of 2017, the Netherlands experienced a large-scale contamination of eggs with fipronil, a poisonous insecticide (NOS 2017a, b). The level of fipronil was so high in certain clusters of eggs that those eggs were inedible. But the Dutch National Health Organization merely advised people to ‘proceed with caution’ when consuming eggs, rather than halt consumption altogether. This was sensible: it was reasonably clear which eggs were contaminated. Indeed, a serial number is printed on every egg, and the organization was able to release a list of numbers for eggs that were reasonably suspected to be contaminated.

The same advice applies to MoC: a naturalist should proceed with caution, but to discontinue MoC entirely is an overreaction to the data.

4.2 Modal modesty

Macherian pessimism about Gettier should be unconvincing to both Williamsonians and Macherians, in virtue of common ground: epistemic non-exceptionalism. The Williamsonian F-line gets a better handle on Gettier: suitably mundane cases like Hospital deploy ordinary possibility, counterfactual and epistemic judgment in the production of substantive philosophical knowledge. So much for the claim that (traditional) MoC has not yielded or cannot yield substantive philosophical conclusions, and should be shelved.

However, Machery (2017, Introduction) describes his critique of MoC as a detour on the way to his main conclusion that “resolving many traditional and contemporary philosophical issues is beyond our epistemic reach” (p. 1); in particular, “modally immodest issues cannot be resolved, and modally immodest philosophical views [cannot be] supported” (p. 3). Philosophers, he worries, often pursue theories of knowledge, mind, personal identity, right action and free will that target ostentatious claims of metaphysical necessity. Machery (2017, sect. 6.1.1) offers this argument:

M1. Many central philosophical issues are about metaphysical necessities, and resolving these issues requires establishing these necessities.

M2. Philosophers must appeal to unusual and atypical philosophical cases to establish these metaphysical necessities.

M3. We should suspend judgment about the situations described by current philosophical cases and, more generally, by unusual and atypical philosophical cases.

M4. There is no other way of learning about the pertinent metaphysical necessities and possibilities.

MC. Hence, there are many philosophical issues that we cannot resolve.

We reject M3: Hospital counts as a ‘current philosophical case’ that is, broadly speaking, unusual (Sect. 3.4) and atypical (Sect. 3.5), yet apt for judgment. Since Hospital represents a canonical and typical class of cases, M3 shouldn’t even be accepted generically.

However, a nearby argument is harder to dismiss. Gettier-reasoning is typically mundane and well-supported by empirical studies. This cannot be said for a large swathe of tested philosophical cases: Truetemp, Switch, Transplant, Society of music lovers, etc. Unlike Hospital, these don’t strike us pre-theoretically as (unlucky but) mundane: they are unusual or atypical in a plausibly disturbing sense. Indeed, empirical investigation reveals serious demographic and presentation effects (Machery 2017, Ch. 2). Suppose these cases are indeed canonical and typical examples of a larger class of exotic philosophical cases (in contrast to mundane philosophical cases). Indeed, they largely belong to a salient sub-class: modally remote cases, instantiated only in suitably ‘distant’ possible worlds. One may then deploy an inductive argument (analogous to but more modest than that in Sect. 4.1): MoC applied to exotic philosophical cases is unreliable. This supports:

M3*. We should suspend judgment about the situations described by exotic (e.g. remote) philosophical cases.

Here is a variant of M2:

M2*. Philosophers must appeal to exotic (e.g. remote) philosophical cases to establish these metaphysical necessities.

Replacing M2 with M2* and M3 with M3* yields a Macherian argument for MC that is untouched by our foregoing critique. The tentative neo-Macherian moral: philosophers ought not abandon (substantive uses of) MoC, but limit it to (putatively) mundane cases that cannot support especially ambitious, modally immodest metaphysical theses.

This argument deserves careful scrutiny. It aims to support a moderate modal skepticism that is subtly different from a more familiar form (cf. van Inwagen 1998).Footnote 31 The Williamsonian account of MoC (Sect. 2) helps to draw the distinction. The familiar form worries about the gap between merely considering/imagining a concrete remote case and having established its possibility: premise W1 in the reconstructed MoC reasoning is questioned. The Macherian form worries about our accuracy when applying ordinary judgment to remote cases: W2 is questioned. Thus, suspicion is raised about judgments about knowledge, right action or free will in response to clearly possible but remote cases. Counterpossible conditionals (i.e., counterfactual conditionals with impossible antecedentsFootnote 32) also highlight the contrast, if impossible antecedents are both remote and support only some consequents. Van Inwagen-style modesty worries about whether and when we can rightly identify a counterpossible conditional. Macherian modesty worries about whether and when we can rightly assess its truth.

So, a version of Machery’s core argument for rejecting modal immodesty survives our critique. Detailed evaluation is for future work. Our tentative conclusion: a promising and prominent naturalistic programme spanning the F-line and X-line is plausibly committed to both the reliability of typical Gettier-reasoning and modally modest philosophy.