1 Introduction

Many philosophers hold not only that there is a causal condition on perception, but also that this condition represents a conceptual truth about perception: that it is a conceptual truth that when someone perceives an object, that object causes their experience of it. The claim that the causal condition is a conceptual truth about perception has implications for debates about both the nature and the objects of perceptual experience. On the one hand, it has been used to motivate philosophical theories of perception according to which perceptual experiences are distinct from, but (when veridical) causally responsive to, mind-independent objects in our environment. These include the sense-datum and adverbialist theories and forms of the representationalist theory of perception, although the claim may also be consistent with relationalist and disjunctivist theories, assuming that there can be causal relations between logically related entities.Footnote 1 On the other hand, whether the causal condition is a conceptual truth has featured prominently in debates about the metaphysics of color, where the claim that the causal condition is a conceptual truth has been used to motivate the claim that (whatever else they may be) necessarily, colors cause color experiences; similar issues also arise in relation to other sensible qualities, including sounds and smells.Footnote 2

An influential line of argument for the claim that the causal condition is a conceptual truth is based on intuitive responses to (or judgments about) thought experiments famously associated with Grice (1961), of which there are two types.Footnote 3 The first involves physical objects that block the view of an object in the subject’s environment while a phenomenally indistinguishable experience is caused by another qualitatively similar object; for instance, a mirror blocks the view of a pillar in front of the subject, redirecting the image of another qualitatively similar pillar that is somewhere else (e.g. behind the subject). Following Roberts et al. (2016), we call these “Blocker cases.” The second type of thought experiment involves situations in which there is an object of the relevant kind that is in front of the subject with no physical object in the way, yet the subject’s experience is not brought about by that object but by some internal means, for instance, by something directly stimulating the subject’s brain. Following Roberts et al. (2016), we call cases where the causal link is broken but no physical blocker is present, “Non-Blocker cases.”

Many philosophers hold that it is intuitively plausible that the subject does not see the environmental object in either type of case, and that the explanation of this intuitive judgment is that in both cases the object is not the cause of the subject’s experience. Indeed, even those who reject the causal theory of perception—because they think that perceiving involves some other, non-causal, type of relation—typically accept the intuitions about Gricean cases to the effect that the subject fails to see the object (e.g. Hyman 1992: 278–280). Are the supposedly intuitive verdicts that no seeing occurs in Gricean cases really intuitive? Are they the verdicts that competent speakers who possess the concept of seeing arrive at or only the verdicts that, perhaps indoctrinated, philosophers arrive at? This is a question that experimental philosophy can help to answer.

Previous work has suggested that Gricean intuitions may not be universally shared. Roberts et al. (2016) presented participants with Gricean-style Blocker and Non-Blocker cases. They found that the vast majority of participants agreed with the Gricean intuition that seeing does not occur in Blocker cases, but they also found that a substantial minority diverged from Grice in agreeing that seeing occurs in Non-Blocker cases; further, they found that responses to the Blocker and Non-Blocker cases differed significantly. They thus hypothesized that folk intuitions better align with a “no blocker condition”: it is not necessary for an object to cause a subject’s experience for the subject to see it; it is only necessary that the object not be blocked by another. This is a weaker condition than the causal condition and suggests that at least some of the folk might have a more liberal conception of perception than philosophers.

This finding is complemented by a recent investigation into Johnston’s core beliefs about color. In an influential discussion, Johnston (1992) claims that the thesis which he calls “Explanation” (an application of the causal theory of perception to color perception, such that the colors of objects sometimes causally explain our visual experiences of color) represents one of five “core” beliefs about color (beliefs that are central to the concept of color). Roberts and Schmidtke (2019) experimentally examined Johnston’s five core beliefs, and found that, though participants agreed with all five, they agreed with Explanation the least. They found (in their more rigorous second study) that only 59% agreed with Explanation.

The present paper extends current empirical enquiry into the causal theory of perception by presenting the results of two new studies. The first study was designed to test the generalizability of Roberts et al.’s findings to additional sense modalities: audition and olfaction. If the causal theory of perception is genuinely a theory of perception and not just a theory of vision, then it should also hold of non-visual sensory modalities. The first study was also designed to address two potential objections to Roberts et al.’s study. The first is that participants who agreed that seeing occurs in Non-Blocker cases may have responded using the phenomenal sense of “see”—the sense used in the phrase “Macbeth sees a dagger” to express that Macbeth is hallucinating. If this was the case, participants did not understand Roberts et al.’s target item in the non-phenomenal perceptual sense intended (Fischer 2017; see also Fischer and Engelhardt 2017, 2019). We address this concern by including a phenomenal control item to check for how participants are understanding our target items.

The second potential objection is that Roberts et al. examined the wrong kind of judgments. According to one way of developing this objection, it might be suggested that some participants did not reflect sufficiently on the Non-Blocker cases in Roberts et al.’s study, and so their responses ought not to be taken as evidence against the claim that the causal theory of perception is a conceptual truth. Machery (2017: 156) calls this “the reflection defense” of the method of cases. The reflection defense is closely related to the better known “expertise defense” (e.g. Williamson 2007; Ludwig 2007), which claims that the right kinds of judgements are those made by philosophical experts; unlike the expertise defense, however, the reflection defense does not restrict the right kind of judgments just to those made by people with philosophical training but more widely to those judgments that are the result of appropriate kinds of cognitive processes. Although exactly what it takes for a claim to be a conceptual truth is controversial, those who claim that there are conceptual truths typically allow for the possibility of error and ignorance about them—at least in certain concept users and/or in certain circumstances. So, it is compatible with the causal theory being a conceptual truth that participants who do not reflect sufficiently, perhaps because they lack the relevant reflective dispositions and abilities, arrive at the wrong answer.

Gricean cases present an interesting and illuminating case study for assessing the prospects for the reflection defense, since it is reasonable to suppose that the Non-Blocker cases, in particular, may require fairly high levels of reflection. In real-world scenarios, it is (almost) invariably true that if it looks like there is an X in front of you and there is an X in front of you, not obstructed by a physical object, then you see X. Hence, it seems reasonable to think that, based on this experience, your gut reaction might be that you see X, even in Non-Blocker cases. The difference that Roberts et al. found between Blocker and Non-Blocker cases may, therefore, just be due to their participants being insufficiently reflective. To address this concern, and thereby investigate the feasibility of the reflection defense more generally, we included the original three Cognitive Reflection Test (CRT) items (Frederick 2005) to differentiate more reflective participants from less reflective ones, with the thought that the more reflective might be more prone to disagree with the Non-Blocker cases, thereby eliminating the difference between the Blocker and Non-Blocker cases found by Roberts et al.Footnote 4

The second study sought to further investigate the robustness of the claim that the folk conception better aligns with a no blocker condition by considering a new kind of Blocker case. Roberts et al.’s study (and, as will be seen, our study 1) supports this hypothesis by showing that a substantial minority agree that seeing occurs in Non-Blocker cases, whereas in Blocker cases significantly fewer agree that seeing occurs. However, the Gricean-style Blocker cases used differ from the Non-Blocker cases in more ways than simply including a blocker: they also involve redirection, i.e. a causal relationship to a different object not in front of the subject, for instance, a qualitatively identical pillar that is behind the perceiver and reflected in a mirror. The causal relationship to a different object may make the lack of a causal relationship to the target object more salient or be taken to exclude the possibility of seeing the target object. This alone might be sufficient to explain the difference Roberts et al. found between Blocker and Non-Blocker cases. To support the claim that the folk conception better aligns with a no blocker condition, it is necessary to create two types of cases that differ only in that one includes a blocker. We therefore created new Blocker cases for study 2 by simply adding a blocker to the Non-Blocker cases.

2 Study 1

2.1 Methods

Materials

The survey was created and administered using Qualtrics, Version 2018. Participants were recruited via Amazon Mechanical Turk (M-Turk) from the 11th to 14th of April 2019. The data was analyzed with IBM SPSS, Version 25.

Procedure

The survey was made available to M-Turk users who had a HIT approval rate of 90% or higher, had graduated from a high school in the United States, and were currently living in the United States. Consenting participants were asked to complete an online survey comprised of five parts. For part 1, participants were randomly allocated to one of three property groups: vision, audition, or olfaction. In each group, participants were presented with a Blocker and Non-Blocker case involving that group’s property in a random order. After reading each case, participants indicated their agreement with a target statement on a 10-point Likert scale, from “disagree absolutely” to “agree absolutely.” Each point on the scale was associated with a semantic label such that responses 1 to 5 indicated decreasing levels of disagreement, and responses 6 to 10 indicated increasing levels of agreement. Each case and its associated target question are given in Table 1.

Table 1 Blocker and Non-Blocker Cases

In part 2, participants were asked to respond to three further cases in a random order using the same 10-point scale: the Agree and Disagree controls, and the Phenomenal control. The Agree control was designed to evoke agreement, and the Disagree control was designed to evoke disagreement. These controls were designed to be attention checks. The answers are basically given by the wording of the cases: for the Agree control, the phrase “in plain view” is used, and for the Disagree control the phrase “outside of view” is used. The Phenomenal control was designed to pick out participants who might be interpreting the target statements of our main cases in a phenomenal sense as per Fischer’s concern (2017), discussed in the introduction. These cases and their target questions are given in Table 2.

Table 2 Control Cases

In part 3, participants were asked the original three Cognitive Reflection Test (CRT) items. Each item was presented as a multiple-choice question with four response options (Sirota and Juanchich 2018). The CRT is different from many validated psychometric tests in that it is free to use, widely publicized in the academic and popular press, and has easy-to-verify correct answers. The original paper by Frederick (2005) has 1177 citations on Web of Science as of the 13th of June 2019, and 3274 citations on Google Scholar. As the CRT is often used in online forums, where users can quickly search for the correct answers, researchers have expressed concerns about its enduring ability to distinguish high and low scorers (Toplak et al. 2014; Haigh 2016; Stieger and Reips 2016). Newer versions of the CRT have thus been proposed, often with greater numbers of items (Toplak et al. 2014; Primi et al. 2016). We chose not to use these newer versions for two reasons. First, CRT items take time for participants to complete, and we wanted to make sure the survey lasted less than five minutes on average given the amount of money we could pay participants (0.50 USD). Second, the increase in participant scores (on average) after exposure to the CRT does not appear to hinder its capacity to predict relevant characteristics, e.g. need for cognition and base-rate neglect (Bialek and Pennycook 2018). We tend to agree with Bialek and Pennycook’s (2018) speculation that people who are naturally more reflective may be more likely to question their initial judgements and look up the answers.

In part 4, participants were asked to answer questions about their demographics: age, gender, educational attainment, and the number of philosophy and science college/university-level courses they had taken dealing with human perception. Finally, part 5 asked participants to share any questions, comments, or concerns. Those who completed the survey saw a unique code to enter into the M-Turk interface to receive 0.50 USD. Participants completed the survey in about four minutes (Mdn = 232 s, Interquartile range = 169 to 340).

Participants

In total, 800 participants completed the survey. As the current paper focuses on “folk” intuitions, the 39 participants who indicated taking four or more courses (roughly, a university minor in the United States) in the philosophy and/or science of human perception were removed from the analyses. Further, we wanted to focus on those who passed the Agree control (response of 6 or higher indicating some degree of agreement) and the Disagree control (5 or lower indicating some degree of disagreement), as participants who did not pass displayed insufficient effort responding. So, of the remaining 761 participants, the 57 who failed the Agree control, the 89 who failed the Disagree control, and the 10 who failed both (total N = 156) were also removed from the analyses. We could have been stricter but felt these cut-points were the best justified, and we did not want to remove too many from our sample given the analyses we planned to run. Of the remaining 605, similar numbers were allocated to each property group: 215 for Vision, 203 for Audition, and 187 for Olfaction.
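The exclusion rules described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' analysis script (the analyses were run in SPSS): the record fields `courses`, `agree_control`, and `disagree_control` are hypothetical names, and the sample records are made up.

```python
from dataclasses import dataclass

AGREE_MIN = 6  # responses 6-10 indicate some degree of agreement


@dataclass
class Participant:
    # Hypothetical field names, for illustration only.
    courses: int           # perception-related philosophy/science courses taken
    agree_control: int     # 1-10 response to the Agree control
    disagree_control: int  # 1-10 response to the Disagree control


def agrees(response: int) -> bool:
    """Dichotomize a 10-point response: 6 or higher counts as agreement."""
    return response >= AGREE_MIN


def is_retained(p: Participant) -> bool:
    """Apply the three exclusion rules described in the text."""
    is_folk = p.courses < 4                           # fewer than four courses
    passed_agree = agrees(p.agree_control)            # agreed with the Agree control
    passed_disagree = not agrees(p.disagree_control)  # disagreed with the Disagree control
    return is_folk and passed_agree and passed_disagree


# Made-up records, not study data.
sample = [
    Participant(courses=0, agree_control=9, disagree_control=2),  # retained
    Participant(courses=5, agree_control=9, disagree_control=2),  # too many courses
    Participant(courses=1, agree_control=4, disagree_control=2),  # failed Agree
    Participant(courses=1, agree_control=8, disagree_control=7),  # failed Disagree
]
retained = [p for p in sample if is_retained(p)]

# Sample accounting reported in the text: 800 - 39 - (57 + 89 + 10).
remaining = 800 - 39 - (57 + 89 + 10)
print(len(retained), remaining)
```

The final line of arithmetic simply re-derives the reported sample of 605 from the counts given in the paragraph.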

Regarding the CRT items, nearly half of participants (N = 288) answered all the items correctly and were placed in the All Correct group for the inferential analyses; this was similar across the property groups: 104 for Vision, 99 for Audition, and 85 for Olfaction. As nearly half answered all the items correctly, which is much higher than found in Frederick (2005), some of our participants probably had prior familiarity with the items or looked up the answers. Regarding the Phenomenal control, approximately 90% (N = 552) indicated some level of disagreement and so were placed in the Non-Phenomenal-Interpretation group for the inferential analyses (response of 5 or lower indicating some degree of disagreement); this was similar across the Property factor: 197 for Vision, 181 for Audition, and 174 for Olfaction.

The educational attainment of the remaining participants was as follows: 49 had a high school diploma, 138 had attended some college (but no degree), 78 had an associate’s degree, 225 had a bachelor’s degree, 83 had a master’s degree, 17 had a further specialist degree (e.g. medical doctor), 12 had a doctorate degree, and the remaining 3 did not say. Regarding gender, 347 participants identified as female, 256 as male, 1 as non-binary, and 1 preferred not to say. The median age was 39 years (Interquartile range = 32 to 53).

2.2 Results

Descriptive Statistics

To understand the 605 participants’ responses, their mean responses along with the percentage who agreed with each case were examined. Overall the mean agreement was lower for the Blocker cases (M = 2.57, SD = 2.25, 12.40% agree) than for the Non-Blocker cases (M = 3.77, SD = 2.95, 27.77%). This pattern was stable across Property. Specifically, the Blocker cases were lower than the Non-Blocker cases for Vision (respectively, M = 3.07, SD = 2.46, 17.21% vs. M = 4.19, SD = 3.18, 32.56%), Audition (respectively, M = 2.31, SD = 2.05, 9.85% vs. M = 3.79, SD = 2.87, 29.56%), and Olfaction (respectively, M = 2.29, SD = 2.10, 9.63% vs. M = 3.26, SD = 2.68, 20.32%).

Figure 1 was created to be similar to Fig. 1 in Roberts et al.’s (2016) paper to allow for comparison across studies. Figure 1 displays the percentage of participants who gave each response (1–10) across each case and property. The white bar represents a response of 6, i.e. the point at which participants change from disagreeing to agreeing.

Fig. 1 Percentage of participants who gave each response across each case and property

Inferential Analyses

To assess whether differences are statistically significant and the factors that may moderate them, a four-way mixed ANOVA was conducted with Case-type (Blocker, Non-Blocker) as a within-subjects factor, and Property (Vision, Audition, Olfaction), CRT (All correct, Not All Correct), and Phenomenal (Non-Phenomenal-Interpretation, 5 or lower; Phenomenal-Interpretation, 6 or higher) as between-subjects factors. We chose to use an ANOVA, because, as mentioned in the introduction, we were interested in the two-way interactions between Case-type and each remaining factor.

A significant effect was found for most factors: Case-type (F(1, 593) = 21.82, p < 0.001, ηp2 = 0.04), CRT (F(1, 593) = 7.00, p = 0.01, ηp2 = 0.01), and Phenomenal (F(1, 593) = 14.16, p < 0.001, ηp2 = 0.02). The effect of Property, though, only approached significance (F(2, 593) = 2.38, p = 0.09, ηp2 = 0.01). No interactions were significant (all F’s(1, 593) < 2.13, p’s > 0.12, ηp2’s < 0.01).
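As a reading aid, the reported effect sizes can be recovered from the F statistics and their degrees of freedom using the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below is illustrative only and is not the authors' SPSS output; the function name `partial_eta_squared` is our own.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics reported above for the single-df main effects in study 1.
reported = {
    "Case-type": (21.82, 1, 593),
    "CRT": (7.00, 1, 593),
    "Phenomenal": (14.16, 1, 593),
}

for name, (f_value, df_effect, df_error) in reported.items():
    print(f"{name}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

Rounded to two decimal places, these values agree with the effect sizes reported for the three factors.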

Each factor was further examined using the estimated marginal means and standard errors. Bonferroni’s corrections are applied to interpret whether the results are significant, but we report unadjusted p-values to allow the reader to more easily make their own judgements.

Regarding Case-type, the Blocker cases (M = 3.12, SE = 0.17) were rated lower than the Non-Blocker cases (M = 4.16, SE = 0.22). The mean difference between them was significant (Mdiff = 1.04, SE = 0.22, p < 0.001, 95% Confidence Interval [0.61, 1.48]).

Regarding Property, the mean of Vision (M = 4.13, SE = 0.27) was notably higher descriptively than Audition (M = 3.42, SE = 0.27) and Olfaction (M = 3.37, SE = 0.31). None of the differences between the properties were significant, though, including the mean difference between Vision and Audition (Mdiff = 0.71, SE = 0.38, p = 0.06, 95% CI [−0.03, 1.45]), Vision and Olfaction (Mdiff = 0.75, SE = 0.41, p = 0.07, 95% CI [−0.05, 1.56]), and Audition and Olfaction (Mdiff = 0.05, SE = 0.41, p = 0.90, 95% CI [−0.76, 0.85]).
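The confidence intervals reported for the mean differences can be approximated from the mean difference and its standard error. The sketch below assumes a normal approximation with a critical value of 1.96; SPSS uses the t distribution, so the reported bounds may differ slightly in the second decimal place. The function name `approx_ci95` is our own.

```python
def approx_ci95(mean_diff: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval: mean difference +/- z * standard error."""
    half_width = z * se
    return mean_diff - half_width, mean_diff + half_width


# Vision vs. Audition in study 1: Mdiff = 0.71, SE = 0.38.
lower, upper = approx_ci95(0.71, 0.38)
print(f"[{lower:.2f}, {upper:.2f}]")

# An interval that contains zero corresponds to a non-significant difference.
contains_zero = lower < 0 < upper
```

Because this interval spans zero, it is consistent with the non-significant Vision vs. Audition difference reported above.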

Regarding CRT, participants who answered all the CRT items correctly tended to respond lower (M = 3.21, SE = 0.26) than those who answered at least one CRT item incorrectly (M = 4.07, SE = 0.20). The mean difference between them was significant (Mdiff = 0.86, SE = 0.33, p = 0.01, 95% CI [0.22, 1.49]). Regarding Phenomenal, participants in the Non-Phenomenal-Interpretation group tended to respond lower (M = 3.03, SE = 0.09) than those in the Phenomenal-Interpretation group (M = 4.25, SE = 0.31). The mean difference between Non-Phenomenal-Interpretation and Phenomenal-Interpretation was significant (Mdiff = 1.22, SE = 0.33, p < 0.001, 95% CI [0.59, 1.86]). (Note that the sample sizes for Phenomenal-Interpretation are quite small, ranging from 13 to 22 participants in each property group.) Generally, we take these results to show that Phenomenal worked as intended: we would expect those in the Non-Phenomenal-Interpretation group to respond lower and those in the Phenomenal-Interpretation group to respond higher.

In summary, participants disagreed more with the Blocker cases than with the Non-Blocker cases. While there were descriptive differences between properties, they were not significant. Additionally, participants who answered all the CRT items correctly, and those who disagreed with Phenomenal, disagreed more than those who did not.

Ideal Participants

In order to test the “reflection defense” of philosophers’ intuitions, it is necessary to focus on the responses of “ideal participants,” i.e. those participants who (a) passed the attention checks, (b) answered all the CRT items correctly, and (c) disagreed with Phenomenal. Only 270 participants (132 female, 137 male, and 1 who preferred not to say) out of the 605 used for the previous analyses are left in this ideal participant sample, exemplifying the fact that we are looking at a very restricted subset of the population. Of these 270 participants, similar numbers were allocated to each property group: 97 for Vision, 93 for Audition, and 80 for Olfaction. Figure 2 displays the percentage of ideal participants who gave each response (1–10) across each case and property.

Fig. 2 Percentage of ideal participants who gave each response across each case and property

A two-way mixed ANOVA was conducted with Case-type (Blocker, Non-Blocker) as a within-subjects factor, and Property (Vision, Audition, Olfaction) as a between-subjects factor. A significant effect was found for both factors: Case-type (F(1, 267) = 51.24, p < 0.001, ηp2 = 0.16) and Property (F(2, 267) = 4.31, p = 0.01, ηp2 = 0.03). The interaction was not significant (F(2, 267) = 0.61, p = 0.54, ηp2 = 0.01). The effects of each factor were further examined using the estimated marginal means and standard errors. Again, we interpret p-values in light of Bonferroni’s correction, but unadjusted p-values are reported to allow the reader to more easily make their own judgements.

Regarding Case-type, the Blocker cases (M = 2.14, SE = 0.11, 6.67% agree) were rated lower than the Non-Blocker cases (M = 3.35, SE = 0.17, 22.59% agree). The mean difference between them was significant (Mdiff = 1.21, SE = 0.17, p < 0.001, 95% CI [0.88, 1.55]). Hence, even for ideal participants, we find a difference between the Blocker and Non-Blocker cases, with a substantial percentage agreeing with the latter.

Regarding Property, the highest mean was for Vision (M = 3.18, SE = 0.19) followed by Audition (M = 2.73, SE = 0.20) and Olfaction (M = 2.34, SE = 0.21). The mean difference between Vision and Olfaction was significant (Mdiff = 0.83, SE = 0.29, p = 0.004, 95% CI [0.27, 1.40]), but the differences between Vision and Audition (Mdiff = 0.45, SE = 0.28, p = 0.10, 95% CI [−0.09, 0.99]) and between Audition and Olfaction (Mdiff = 0.39, SE = 0.29, p = 0.18, 95% CI [−0.18, 0.96]) were not significant.
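To make the role of the Bonferroni correction concrete, the sketch below compares the unadjusted p-values reported above against an adjusted threshold. It assumes a family-wise alpha of 0.05 and a family of three pairwise comparisons; neither value is stated explicitly in the text, so both are assumptions, as is the function name.

```python
ALPHA = 0.05  # assumed family-wise error rate (not stated in the text)


def bonferroni_significant(p_values: dict[str, float], alpha: float = ALPHA) -> dict[str, bool]:
    """Flag each comparison as significant if p < alpha / number of comparisons."""
    threshold = alpha / len(p_values)
    return {label: p < threshold for label, p in p_values.items()}


# Unadjusted p-values reported for the ideal-participant Property comparisons.
ideal_property_p = {
    "Vision vs. Olfaction": 0.004,
    "Vision vs. Audition": 0.10,
    "Audition vs. Olfaction": 0.18,
}

result = bonferroni_significant(ideal_property_p)
for label, significant in result.items():
    print(f"{label}: {'significant' if significant else 'not significant'}")
```

Under these assumptions the adjusted threshold is 0.05 / 3 ≈ 0.017, so only the Vision vs. Olfaction comparison survives correction, in line with the verdicts reported above.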

2.2.1 Further Analyses

Further analyses were conducted to address questions arising from the above analyses having to do with the Phenomenal control and the order of Blocker and Non-Blocker cases.

Phenomenal Control

To examine why some participants agreed with the Phenomenal control, we went back to all 800 participants and examined the relationship between participants’ responses to Phenomenal (N = 101 Phenomenal-Interpretation), Disagree (N = 108 agree), and Agree (N = 74 disagree) using chi-squared tests. The chi-squared test was significant for Phenomenal and Disagree (χ²(1, N = 800) = 29.26, p < 0.001): participants who agreed with Phenomenal were more likely to agree with Disagree (N = 31 out of 101, 30.69%) than participants who disagreed with Phenomenal (N = 77 out of 699, 11.02%). The chi-squared test was also significant for Phenomenal and Agree (χ²(1, N = 800) = 10.12, p = 0.001): participants who agreed with Phenomenal were more likely to disagree with Agree (N = 18 out of 101, 17.82%) than participants who disagreed with Phenomenal (N = 56 out of 699, 8.01%). These results are probably best explained by the Phenomenal item overlapping in function with the attention checks. In other words, many participants (though not all) who agreed with Phenomenal were probably insufficient effort responders. Moreover, as discussed, ideal participants displayed a similar response pattern to the overall sample, even though this group excluded (by definition) those who agreed with Phenomenal. We therefore find no evidence that participants were employing a phenomenal reading of the target questions in the Non-Blocker cases.
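The two chi-squared statistics can be re-derived from the counts given in this paragraph. The sketch below computes the Pearson statistic for a 2 × 2 table without a continuity correction; that choice is an assumption on our part, made because it reproduces the reported values, and the function name is our own.

```python
def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: agreed with Phenomenal (101) vs. disagreed with Phenomenal (699).
# Columns: agreed with the Disagree control vs. did not.
chi2_disagree = pearson_chi2_2x2(31, 101 - 31, 77, 699 - 77)

# Columns: disagreed with the Agree control vs. did not.
chi2_agree = pearson_chi2_2x2(18, 101 - 18, 56, 699 - 56)

print(f"Phenomenal x Disagree: {chi2_disagree:.2f}")
print(f"Phenomenal x Agree: {chi2_agree:.2f}")
```

Both values match the reported statistics to two decimal places, which suggests that the tests were run on the full sample of 800 as described.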

Order Effects

Using all 605 participant responses, potential order effects were examined. The main purpose of this analysis was to determine if participants were influenced by the order in which they experienced the cases, particularly with regard to ideal vs. non-ideal participants. For each property, about half of the participants experienced the Blocker cases first (Vision = 108, 50.2%; Audition = 104, 51.2%; Olfaction = 100, 53.5%). A four-way mixed ANOVA with Case-type (Blocker, Non-Blocker) as a within-subjects factor and Property (Vision, Audition, Olfaction), Ideal (Ideal, Non-Ideal), and Order (Blocker-First, Non-Blocker-First) as between-subjects factors was conducted. Regarding the main purpose of this analysis, we emphasize that there was no interaction between Order and Ideal (F(1, 593) = 0.56, p = 0.45, ηp2 = 0.001). The effect of Order itself was also not significant (F(1, 593) = 0.004, p = 0.95, ηp2 < 0.001). The interaction between Order and Property was significant (F(2, 593) = 4.69, p = 0.01, ηp2 = 0.02). This is explained by the pattern of responses for Vision and Audition being the reverse of each other. For Vision, participants who saw the Blocker cases first gave lower responses (M = 3.24, SE = 0.21) than those who saw the Non-Blocker cases first (M = 3.93, SE = 0.21). For Audition, participants who saw the Blocker cases first gave higher responses (M = 3.31, SE = 0.21) than those who saw the Non-Blocker cases first (M = 2.73, SE = 0.22). No other interactions with Order were significant (F’s < 0.72, p’s > 0.46).Footnote 5

3 Study 2

3.1 Methods

Materials

Like Survey 1, Survey 2 was created and administered using Qualtrics, Version 2018. Participants were recruited via M-Turk on the 30th of May 2019 and the data was analyzed with IBM SPSS, Version 25.

Procedure

Survey 2 was made available to M-Turk users who met the same conditions as those in Survey 1. As in Survey 1, consenting participants were asked to complete an online survey comprised of five parts. To potentially increase the proportion of sufficient effort responders, before starting part one, participants indicated their agreement to read the items carefully by ticking a box. For part 1, participants were randomly allocated to one of three property groups (vision, audition, or olfaction) and experienced the original Non-Blocker and the new Blocker cases involving that group’s property in a random order. After reading each case, participants indicated their agreement with a target statement on the same 10-point Likert scale. Each case and its associated target question are in Table 3. Parts 2 through 5 were the same as in Survey 1. Participants who completed the survey saw a unique code to enter into the M-Turk interface to receive 0.50 USD. Participants completed the survey in about four minutes (Mdn = 235 s, Interquartile range = 172.25 to 339.25).

Table 3 Non-Blocker Cases and New Blocker Cases

Participants

In total, 800 participants completed the survey. The 37 participants who indicated taking four or more courses in the philosophy and/or science of human perception were removed from the analyses. Of the remaining 763 participants, 75 who failed the Agree control, 87 who failed the Disagree control, and 17 who failed both (total N = 179) were removed from the analyses. Of the remaining 584, similar numbers were allocated to each property group: 197 for Vision, 199 for Audition, and 188 for Olfaction.

Regarding the CRT, nearly half of participants (N = 283) answered all the items correctly and were placed in the All Correct group; this was similar across the property groups: 101 for Vision, 97 for Audition, and 85 for Olfaction. Regarding the Phenomenal control, approximately 90% (N = 535) indicated some level of disagreement and so were placed in the Non-Phenomenal-Interpretation group for the inferential analyses; this was similar across the Property factor: 175 for Vision, 184 for Audition, and 176 for Olfaction.

The educational attainment of the remaining participants was as follows: 60 had a high school diploma, 123 had attended some college (but no degree), 83 had an associate’s degree, 228 had a bachelor’s degree, 65 had a master’s degree, 13 had a further specialist degree (e.g. medical doctor), 10 had a doctorate degree, and the remaining 2 did not say. Regarding gender, 334 participants identified as female, 246 as male, and 4 as non-binary. The median age of the participants was 39 years (Interquartile range = 32 to 50).

3.2 Results

Descriptive Statistics

To understand the 584 participants’ responses, their mean responses along with the percentage who agreed with each case were examined. Overall the mean agreement was lower for the new Blocker cases (M = 2.74, SD = 2.47, 14.21% agree) than for the original Non-Blocker cases (M = 3.98, SD = 2.94, 29.79%). This pattern was stable across the properties. Specifically, the new Blocker cases were lower than the Non-Blocker cases for Vision (respectively, M = 2.81, SD = 2.49, 13.71% vs. M = 3.99, SD = 2.99, 30.96%), Audition (respectively, M = 2.94, SD = 2.60, 17.09% vs. M = 4.37, SD = 3.08, 35.68%), and Olfaction (respectively, M = 2.47, SD = 2.27, 11.70% vs. M = 3.56, SD = 2.70, 22.34%). Figure 3 displays the percentage of participants who gave each response (1–10) across each case and property.

Fig. 3 Percentage of participants in study 2 who gave each response across each case and property

Inferential Analyses

To assess whether differences are statistically significant, and the factors that may moderate them, a four-way mixed ANOVA was conducted with Case-type (Blocker, Non-Blocker) as a within-subjects factor, and Property (Vision, Audition, Olfaction), CRT (All correct, Not All Correct), and Phenomenal (Non-Phenomenal-Interpretation, Phenomenal-Interpretation) as between-subjects factors.

A significant effect was found for Case-type (F(1, 572) = 24.63, p < 0.001, ηp2 = 0.04) and Phenomenal (F(1, 572) = 18.77, p < 0.001, ηp2 = 0.03). The three-way interaction between Case-type, Property, and CRT was significant (F(2, 572) = 4.52, p = 0.01, ηp2 = 0.02), as was the three-way interaction between Property, CRT, and Phenomenal (F(2, 572) = 3.41, p = 0.03, ηp2 = 0.01). The four-way interaction was also significant (F(2, 572) = 5.12, p = 0.01, ηp2 = 0.02). No other effects or interactions were significant (all F’s(1, 572) < 2.52, p’s > 0.11, ηp2’s < 0.005). Each factor will be examined below. Again, we interpret p-values in light of Bonferroni’s correction, but unadjusted p-values are reported.

Regarding Case-type, similar differences appeared in studies 1 and 2. The new Blocker cases (M = 3.48, SE = 0.20) were rated lower than the original Non-Blocker cases (M = 4.65, SE = 0.25). The mean difference between them was significant (Mdiff = 1.17, SE = 0.24, p < 0.001, 95% Confidence Interval [0.71, 1.63]).
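The interval reported for the Case-type difference can be roughly reconstructed from the mean difference and its standard error. The following is a minimal sketch, assuming a normal-approximation critical value (about 1.96); the original analysis may have used a t critical value and unrounded inputs, so the last digit can differ from the reported [0.71, 1.63]. The function name `confidence_interval` is hypothetical.

```python
from statistics import NormalDist


def confidence_interval(mean_diff, se, level=0.95):
    """Normal-approximation confidence interval for a mean difference."""
    # Two-sided critical value; about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean_diff - z * se, mean_diff + z * se


# Values reported above for the Case-type comparison (Mdiff = 1.17, SE = 0.24).
low, high = confidence_interval(1.17, 0.24)
print(f"[{low:.2f}, {high:.2f}]")
```

Because the interval is symmetric around the mean difference, an interval for a non-significant difference should straddle zero.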

Regarding Property, similar mean trends in studies 1 and 2 appeared without significant differences. The mean of Vision (M = 4.46, SE = 0.28) was higher descriptively than Audition (M = 4.17, SE = 0.34) and Olfaction (M = 3.57, SE = 0.38). Again none of the differences between properties were significant, including the mean difference between Vision and Audition (Mdiff = 0.29, SE = 0.44, p = 0.51, 95% CI [−0.58, 1.16]), between Vision and Olfaction (Mdiff = 0.88, SE = 0.48, p = 0.07, 95% CI [−0.06, 1.82]), and between Audition and Olfaction (Mdiff = 0.59, SE = 0.51, p = 0.25, 95% CI [−0.41, 1.60]).
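As an illustration of how unadjusted p-values can be read against a Bonferroni threshold, the sketch below compares each pairwise Property p-value with alpha divided by the number of comparisons. It assumes alpha = 0.05 and treats the three pairwise comparisons as the family being corrected; the size of the family actually used in the study is not stated here, so both are assumptions. The function name `bonferroni_significant` is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each unadjusted p-value against the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)  # m = number of comparisons in the family
    return {name: p < threshold for name, p in p_values.items()}


# Unadjusted p-values for the three pairwise Property comparisons reported above.
property_p = {
    "Vision vs. Audition": 0.51,
    "Vision vs. Olfaction": 0.07,
    "Audition vs. Olfaction": 0.25,
}

results = bonferroni_significant(property_p)
for comparison, significant in results.items():
    print(comparison, "significant" if significant else "not significant")
```

Under these assumptions none of the three comparisons falls below the adjusted threshold, in line with the description above.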

Regarding CRT, similar mean trends in studies 1 and 2 appeared. Participants who answered all the CRT items correctly tended to respond lower (M = 3.86, SE = 0.33) than those who answered at least one CRT item incorrectly (M = 4.27, SE = 0.21). However, in contrast to study 1 where the means were significantly different, in study 2 the means were not significantly different (Mdiff = 0.41, SE = 0.39, p = 0.30, 95% CI [−0.36, 1.17]).

Regarding Phenomenal, there were also similar mean trends in studies 1 and 2. Participants who indicated some degree of disagreement with Phenomenal, i.e. those in the Non-Phenomenal-Interpretation group, tended to respond lower (M = 3.22, SE = 0.10) than those in the Phenomenal-Interpretation group (M = 4.91, SE = 0.38). As in study 1, the mean difference between Non-Phenomenal-Interpretation and Phenomenal-Interpretation in study 2 was significant (Mdiff = 1.69, SE = 0.39, p < 0.001, 95% CI [0.92, 2.46]). However, as in study 1, note that the sample sizes for Phenomenal-Interpretation are quite small: for study 2, the sample sizes ranged from 12 to 22 participants in each property group.

In summary, similar findings were obtained in studies 1 and 2. In study 2, participants disagreed more with the new Blocker cases than with the original Non-Blocker cases. While there were descriptive differences between the property groups and CRT groups, these differences were not significant. Also, as we expected, participants who disagreed with Phenomenal disagreed more with our target questions than those who did not.

Ideal Participants

Similar to study 1, we now examine the responses of ideal participants, i.e. those participants who (a) passed the attention checks, (b) answered all the CRT items correctly, and (c) disagreed with Phenomenal. 270 participants (this happened by chance to be the exact same number as in study 1) out of the 584 used in the prior analyses are left in this ideal participant sample. Similar numbers were allocated to each property group: 95 participants for Vision, 93 for Audition, and 82 for Olfaction. Figure 4 displays the percentage of participants who gave each response (1–10) across each case and property.
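The three inclusion criteria can be expressed as a simple filter. This is a minimal sketch only: the `Participant` record, its field names, and the sample rows are hypothetical illustrations, not the study's data or analysis code. It assumes three CRT items, as in the original CRT.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    passed_attention_checks: bool
    crt_correct: int                  # number of CRT items answered correctly
    disagreed_with_phenomenal: bool   # True for a non-phenomenal interpretation


def is_ideal(p, crt_items=3):
    """Apply criteria (a)-(c): attention, all CRT items correct, non-phenomenal."""
    return (
        p.passed_attention_checks
        and p.crt_correct == crt_items
        and p.disagreed_with_phenomenal
    )


# Made-up rows for illustration only.
sample = [
    Participant(True, 3, True),    # meets all three criteria
    Participant(True, 2, True),    # missed one CRT item
    Participant(True, 3, False),   # phenomenal interpretation
    Participant(False, 3, True),   # failed an attention check
]

ideal = [p for p in sample if is_ideal(p)]
print(len(ideal), "of", len(sample), "participants are ideal")
```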

Fig. 4. Percentage of ideal participants for study 2 who gave each response across each case and property

A two-way mixed measures ANOVA was conducted with Case-type (original Non-Blocker, new Blocker) as a within-subjects factor, and Property (Vision, Audition, Olfaction) as a between-subjects factor. A significant effect was found for both factors: Case-type (F(1, 267) = 58.73, p < 0.001, ηp2 = 0.18) and Property (F(1, 267) = 3.65, p = 0.03, ηp2 = 0.03). The interaction was not significant (F(1, 267) = 0.29, p = 0.75, ηp2 = 0.002). Each factor will be examined below. We interpret p-values in light of Bonferroni’s correction, but unadjusted p-values are reported.
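For readers who want to relate the reported F statistics to the effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the Case-type effect; the function name `partial_eta_squared` is hypothetical, and the calculation is a check on the reported values rather than the authors' own procedure.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Case-type effect for ideal participants: F(1, 267) = 58.73.
case_type = partial_eta_squared(58.73, 1, 267)
print(round(case_type, 2))  # 0.18, matching the reported value
```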

Regarding Case-type, the new Blocker cases (M = 2.18, SE = 0.11, 7.04% agree) were rated lower than the original Non-Blocker cases (M = 3.38, SE = 0.16, 22.22% agree). The mean difference between them was significant (Mdiff = 1.20, SE = 0.16, p < 0.001, 95% CI [0.89, 1.51]). As in study 1, even for ideal participants we find a difference between the Blocker and Non-Blocker cases, with a substantial percentage agreeing with the latter.

Regarding Property, the highest mean was for Audition (M = 3.21, SE = 0.20), followed by Vision (M = 2.64, SE = 0.20) and Olfaction (M = 2.48, SE = 0.21). The mean difference between Vision and Audition (Mdiff = 0.57, SE = 0.28, p = 0.04, 95% CI [0.02, 1.12]) and the mean difference between Audition and Olfaction (Mdiff = 0.73, SE = 0.29, p = 0.01, 95% CI [0.17, 1.30]) were significant, but the mean difference between Vision and Olfaction was not significant (Mdiff = 0.17, SE = 0.29, p = 0.56, 95% CI [−0.40, 0.73]). Interestingly, this pattern is the opposite of what was found in study 1.

Order Effects

Using all 584 participant responses, potential order effects were examined. The main purpose of this analysis was to determine if non-ideal participants were more influenced by the order in which they experienced the cases than ideal participants. For each property, roughly half of the participants experienced the Blocker cases first (Vision = 104, 52.8%; Audition = 108, 54.3%; and Olfaction = 83, 44.1%). A four-way ANOVA with Case-type (New-Blocker, Original-Non-Blocker) as a within-subjects factor and Property (Vision, Audition, Olfaction), Ideal (Ideal, Non-Ideal), and Order (New-Blocker-First, Original-Non-Blocker-First) as between-subjects factors was conducted. Regarding the main purpose of this analysis, we emphasize that there was no interaction between Order and Ideal (F(1, 572) = 0.98, p = 0.32, ηp2 = 0.02). Further, the effect of Order was not significant (F(1, 572) = 0.04, p = 0.97, ηp2 < 0.001). The three-way interaction between Order, Property, and Ideal was significant (F(1, 572) = 3.48, p = 0.03, ηp2 = 0.01). This interaction is explained by the reversed pattern of responses for ideal and non-ideal participants in the Audition group. Ideal participants who saw the Blocker cases first gave higher responses (M = 3.76, SE = 0.28) than those who saw the Non-Blocker cases first (M = 2.65, SE = 0.28), while non-ideal participants who saw the Blocker cases first gave lower responses to the cases (M = 2.76, SE = 0.33) than those who saw the Non-Blocker cases first (M = 4.43, SE = 0.38). No other interactions with Order were significant (F’s < 1.17, p’s > 0.28).Footnote 6

4 Discussion

This discussion is broken down into three main sections. The first section looks at some background to the present project. The second section briefly discusses what we found. The third considers the philosophical implications of our results.Footnote 7

4.1 Background

Roberts et al. (2016) hypothesized that the folk concept of perception better aligns with a no blocker condition than with a causal condition. If the folk judged there to be a causal condition on perception, they should treat both Blocker cases and Non-Blocker cases the same, because in both types of cases the causal link between the object and the subject involved in normal perception is broken. Roberts et al. did not find this to be the case. They found that a substantial minority agreed that seeing occurs in the Non-Blocker cases, and that significantly fewer agreed that seeing occurs in the Blocker cases.

What is a no blocker condition? It is simply the condition on perception that there be no physical, external object (e.g. a mirror) that disrupts a subject’s perception of a target object. The no blocker condition is weaker than the causal condition. In any situation in which the no blocker condition fails, the causal condition also fails, because the causal link is broken by the blocker; whereas there are situations in which the causal condition fails but the no blocker condition is realized: namely, those situations described by the Non-Blocker cases in studies 1 and 2 (and by the Clock and Snake cases used by Roberts et al.). In the Non-Blocker cases there is no physical object blocking the subject’s perception. So, if the no blocker condition is correct, a substantial minority of non-philosophers can be said to have a more liberal interpretation of when someone counts as perceiving than many philosophers.Footnote 8

4.2 What we Did and Found

Our study 1 extended the sense modalities tested from only vision in Roberts et al.’s study to audition and olfaction as well. We also made some improvements to the study design: namely, we included the Phenomenal control item to check how participants were understanding our target items (regarding whether they see the actual object in front of them), and we included the original three CRT items (Frederick 2005) to see how being reflective might influence participants’ responses. Our phenomenal control suggests that most participants were interpreting our target questions as we intended, i.e. non-phenomenally. The CRT analyses suggest that even participants who demonstrate the highest degree of reflectivity respond to Blocker and Non-Blocker cases differently, with a substantial minority of participants agreeing that seeing occurs in the Non-Blocker cases. This holds true when looking at our ideal participants, which only include those participants who both had a high CRT score and who interpreted our target items non-phenomenally. Further, it is worth noting that the effect size for this difference is considered to be large and, in fact, larger than the effect size found when looking at the non-ideal participants.

Our study 2 was designed to address a remaining and important limitation with our study 1 (and with Roberts et al.’s study): the original, Gricean-style Blockers differed from the Non-Blockers in more ways than simply including a blocker; they also included redirection (e.g. in our study, due to a mirror, speakers, or fans). It may be protested that the difference between the Blocker cases and the Non-Blocker cases is due to the redirection element, and so we cannot say for sure that the results support the hypothesis that folk intuitions better align with a no blocker condition. In study 2, we tested whether the redirection element could explain our results by creating new Blocker cases that only differ from the Non-Blocker cases in that they include a blocker. As the difference between the Blocker and Non-Blocker cases remained with, again, a large effect size for the ideal participants (and larger than with the non-ideal), the redirection element cannot explain our results. It would seem that whether a blocker is present is the factor mediating the effect.

4.3 Philosophical Implications

Assuming that metaphysical necessity can be separated from conceptual necessity, our experiments are compatible with the causal theory of perception being metaphysically true; however, the results of our experiments present a robust challenge to the claim that the causal theory of perception is a conceptual truth.Footnote 9 There are different ways of understanding what exactly a conceptual truth is, although the basic idea is that conceptual truths are true in virtue of the meaning of the concepts that they involve. This claim about meaning is typically combined, in turn, with a claim about understanding, such that someone who understands the concept should thereby assent to the conceptual truth, at least when it is presented to them (see e.g. Snowdon 1980: 176; for discussion, see Williamson 2007). Intuitions about cases, like the Blocker and Non-Blocker cases considered in the present paper, are meant to draw out these implicit, conceptual commitments.

It is consistent with the claim that the causal theory of perception is a conceptual truth about perception that not everyone is disposed to accept the theory when it is first presented, and not everyone is disposed to agree about the conditions in which the relevant concept should be applied. Even so, some explanation needs to be provided for why these gaps between meaning and understanding emerge where and when they do. The present research presents a problem for four possible explanations on behalf of proponents of the claim that the causal theory of perception is a conceptual truth about perception.

First, it is not reasonable in the present case to suppose that the divergence is due to participants simply being inattentive; this is controlled for by the attention checks, and, amongst the ideal group, by their responses to the CRT: reflection requires attention. Second, it is unreasonable to suppose that the divergence is due to participants interpreting our target items phenomenally, for this is controlled for by the phenomenal control in the ideal group.

The responses of the ideal group also call into question a third possible explanation of the divergence: that some conceptual truths are more difficult to grasp than others, and that knowing how to apply concepts is more difficult in some cases than in others. It is plausible that the Blocker cases are conceptually easier to grasp than the Non-Blocker cases; this might be because, for example, mirrors are common, whereas hallucinations due to drugs are not. However, even among those who demonstrated good reasoning abilities, as evidenced by their responses to the CRT, a substantial proportion are still prepared to say that perceiving occurs in Non-Blocker cases. Further, the possibility that the Blocker cases may use easier to understand components is less of an issue for study 2 than for study 1, as study 2 uses Blocker cases that just involve a blocker added to the Non-Blocker cases. In other words, in study 2, the Blocker and Non-Blocker cases are very similar to each other.

The current study also raises concerns for a fourth line of response to the claim that the causal theory of perception is a conceptual truth. This line of response is an instance of the “reflection defense”: that in general it is only the intuitions or case-judgments of reflective individuals that we should rely on when assessing responses to thought experiments (see Machery 2017). The reflection defense is closely related to the better known “expertise defense” (e.g. Williamson 2007; Ludwig 2007), as reflective ability is a skill that is associated with philosophical expertise: Livengood et al. (2010; see also Easton 2018) found that participants with some graduate training in philosophy had a mean CRT score triple that of participants with no training. The reflection defense is distinct from the expertise defense, however, because the CRT is designed to identify “reflective” individuals who are able to override initial “gut” responses in favor of responses that demonstrate more careful thought, whatever their academic background. Even taking participants’ CRT scores into account, we still find that a substantial minority do not seem to subscribe to the causal theory of perception.Footnote 10

One might respond that it is not just reflective ability that is required but also philosophical training. For this response to succeed, there must be empirically testable factors that capture the training philosophers receive and which have the desired impact on empirical results. Regardless, if philosophical training has an effect, it might be tempting to think that the causal theory of perception is not a conceptual truth but either an a posteriori philosophical discovery or a philosophical precisification of our ordinary way of thinking that may or may not involve making a normative recommendation for how perception should be understood (which is not necessarily how it is understood). Further discussion would raise wider questions than can be considered here, although we note that both of these options put pressure on the common way of understanding the role of thought experiments in philosophical argument as providing theory-neutral evidence for philosophical claims.Footnote 11