Philosophers’ linguistic expertise: A psycholinguistic approach to the expertise objection against experimental philosophy

Philosophers are often credited with particularly well-developed conceptual skills. The ‘expertise objection’ to experimental philosophy builds on this assumption to challenge inferences from findings about laypeople to conclusions about philosophers. We draw on psycholinguistics to develop and assess this objection. We examine whether philosophers are less or differently susceptible than laypersons to cognitive biases that affect how people understand verbal case descriptions and judge the cases described. We examine two possible sources of difference: Philosophers could be better at deploying concepts, and this could make them less susceptible to comprehension biases (‘linguistic expertise objection’). Alternatively, exposure to different patterns of linguistic usage could render philosophers vulnerable to a fundamental comprehension bias, the linguistic salience bias, at different points (‘linguistic usage objection’). Together, these objections mount a novel ‘master argument’ against experimental philosophy. To develop and empirically assess this argument, we employ corpus analysis and distributional semantic analysis and elicit plausibility ratings from academic philosophers and psychology undergraduates. Our findings suggest philosophers are better at deploying concepts than laypeople but are susceptible to the linguistic salience bias to a similar extent and at similar points. We identify methodological consequences for experimental philosophy and for philosophical thought experiments.


Introduction
This paper will redevelop and assess the 'expertise objection' to experimental philosophy, by drawing on methods and findings from psycholinguistics. Experimental philosophy focuses on the empirical investigation of philosophically relevant intuitions. According to the expertise objection, experimental philosophers go wrong already at the first step of their empirical studies: they recruit the wrong participants. 1 Experimental philosophers typically recruit convenience samples without philosophical training: M-Turkers, psychology undergraduates, etc. But philosophical training and expertise improve thinkers' conceptual competencies and, thereby, their intuitive case judgments. Findings about the intuitions of 'laypeople' are therefore irrelevant for philosophical research.
This objection was initially directed at the 'negative', 'restrictionist', and 'evidential' strands of experimental philosophy. These strands seek to assess the evidentiary value of philosophically relevant intuitions and examine intuitions elicited by verbal case descriptions in philosophical thought experiments (reviews: Machery, 2017; Mallon, 2016). Empirical findings about laypeople's intuitions about X (specifically, that they are sensitive to irrelevant factors or cognitive biases) are meant to support the conclusion that professional philosophers should not treat (all or some of) their own intuitions about X as evidence for philosophical theories. These methodological arguments rely on the inductive 'lay-expert inference' from experimental findings about laypeople to the conclusion that professional philosophers' intuitions, too, will be influenced by the irrelevant factors and biases found to affect lay participants. The expertise objection challenges this inference: The objection assumes that professional philosophers have a methodological or conceptual expertise that laypeople possess to a lesser extent; it suggests that this expertise makes philosophers less vulnerable to the irrelevant factors and biases that affect laypeople's case judgments; and it infers that philosophers' intuitions are more stable and accurate (reviews: Machery, 2017, pp.158-169; Nado, 2014). This also has consequences for straightforwardly 'positive' experimental philosophy (e.g., for 'conceptual analysis 2.0'; Machery, 2017, pp.208-244): If necessary at all, experimental implementations of the method of cases should recruit philosophers as participants.
1 The complementary 'reflection objection' charges that most experimental philosophy studies the wrong judgments: spontaneous, rather than reflective (review: Machery, 2017, pp.155-158). For its empirical assessment, see, e.g., de Bruin, 2020; Kneer et al., forthcoming; Schwitzgebel & Cushman, 2015.
The empirical assessment of this objection simultaneously promises to contribute to elucidating the nature of philosophical expertise. To assess the objection, the 'direct strategy' conducts experiments with laypeople and philosophers that examine whether irrelevant factors or biases affect the two groups' intuitions about philosophically relevant cases differently. Only a few studies to date have clearly executed this compelling strategy, with a strong focus on moral intuitions (see Sect. 2). Our paper will range further and dig deeper: We turn from intuitions about specific kinds of cases to comprehension inferences which determine how case descriptions are interpreted and thereby shape judgments about the cases described, in any area of philosophy. This move will allow us to redevelop the expertise objection by drawing on psycholinguistics. An experiment will employ the direct strategy to examine whether academic philosophers are better than psychology undergraduates at deploying conceptual information and whether philosophers are less susceptible to cognitive biases affecting the interpretation of case descriptions.
Section 2 distinguishes different versions of the expertise objection and reviews extant evidence to identify the most promising version or objection. Section 3 draws on findings from psycholinguistics and experimental philosophy to develop this linguistic expertise objection (LEO), complement it with the new linguistic usage objection (LUO), and outline how these two objections jointly provide a 'master argument' against experimental philosophy's lay-expert inference. Sections 4-5 empirically examine these two objections. Section 6 discusses the findings' (productive) consequences for both experimental philosophy and the methodology of philosophical thought experiments.

Expertise objections
The expertise objection is commonly motivated by an analogy: Like members of other academic disciplines, philosophers have specific professional expertise. Analytic philosophers arguably 'are experts in the analysis of folk concepts' (Horvath, 2010, p.465). Such analysis involves thought experiments that elicit intuitions about the applicability of concepts in hypothetical cases. While philosophers' professional expertise will extend considerably further, it should therefore encompass an 'intuitive expertise': Like, e.g., the mathematical intuitions of mathematicians, philosophers' intuitions about the applicability of concepts to hypothetical cases will be more reliable than those of non-experts (e.g., Hales, 2006, p.171; Williamson, 2011, p.220). This undermines experimental philosophy's lay-expert inference (Devitt, 2011; Hales, 2006; Horvath, 2010; Kauppinen, 2007; Ludwig, 2007; Williamson, 2007). The 'intuitive expertise' is taken to arise from philosophers' superior ability to 'apply general concepts to specific examples with careful attention to the relevant subtleties' (Williamson, 2007, p.191; cf. Ludwig, 2007, p.138; Horvath, 2010, pp.466-467). This superior conceptual competence can be due to different kinds of professional expertise that philosophers could credibly claim as a result of training or selection effects. Plausibly, philosophers are better versed in the methods of philosophical thought experimentation. Weinberg and colleagues (2010, p.336) distilled from the debate the further suggestions that philosophers could benefit from better conceptual schemata or domain theories, or from better cognitive skills than laypeople. That is, philosophers could possess better relevant conceptual or world knowledge, or could be better at deploying their knowledge in making their judgments.
We thus obtain three distinct versions of the expertise objection that have been advanced, often in tandem:
• According to the 'methodological expertise objection', philosophers have more experience with the method of cases. This makes them better at interpreting the task and taking into account precisely the task-relevant information in vignettes (Ludwig, 2007, p.153; Williamson, 2011, p.216).
• According to the 'epistemological expertise objection', philosophical training and research lead philosophers to develop more extensive or better structured representations of conceptual and other knowledge about the domain of their philosophical theorizing. This makes their case judgments better informed and more sensitive to relevant information (Devitt, 2011, p.426; cf. Ludwig, 2007, p.153; Weinberg et al., 2010, pp.335-336).
• According to the 'linguistic expertise objection', philosophers are better at deploying semantic or conceptual knowledge: In judgment and reasoning about verbally described cases, they are generally better at contextualizing conceptual information (Williamson, 2011, p.216); i.e., they are better at taking into account also contextual information and background knowledge, e.g., in disambiguating ambiguous expressions and enriching sketchy case descriptions (Horvath, 2010, p.467).
All objections claim that philosophers possess a certain expertise or skill to a higher extent than laypeople, assume that this expertise or skill renders intuitive case judgments more reliable, and conclude that philosophers' case judgments are more stable, i.e., less susceptible to irrelevant factors and cognitive biases, and more accurate than laypersons' intuitions. 2 Philosophers' intuitions can only be more stable and accurate than laypeople's if they are different. The 'direct strategy' (Schulz et al., 2011, p.1724) therefore assesses empirically (1) whether philosophers' intuitive case judgments about a domain differ from lay judgments.
It further assesses (2) whether the philosophers' judgments are more stable. The assessment (3) of their relative accuracy is difficult since there are no uncontroversial ways of telling which philosophically relevant intuitions are accurate. Experimentalists have examined, instead, whether philosophers' intuitions are more internally coherent (Löhr, 2019) or closer to a textbook consensus (Horvath & Wiegmann, 2016; Schindler & Saint-Germier, ms).
Eight studies to date clearly execute the first two steps. All examine ethically relevant intuitions. All show that philosophers' intuitions are influenced by irrelevant factors or biases. Four studies on ethically relevant intuitions (about hedonism, free will/moral responsibility, and moral dilemmas) did not ensure that philosophical participants had high levels of relevant topical expertise (Löhr, 2019; Schulz et al., 2011; Tobia et al., 2013a; 2013b). Even so, these participants will have been proficient with the method of cases that is used across different areas of philosophy. These studies can therefore be regarded as addressing (only) the methodological expertise objection, and as finding against it. 3 Four further studies simultaneously addressed the epistemological expertise objection as well, by recruiting expert ethicists for an investigation of moral intuitions: They compared the moral permissibility judgments laypeople and expert ethicists make about trolley or related cases. Both groups' judgments were subject to order effects of the same size, reduced neither by reflection prompts nor self-reported expertise on the specific issues in question (Schwitzgebel & Cushman, 2012); experts' intuitions were no less sensitive to order effects and an irrelevant factor (inclusion of irrelevant response options) (Wiegmann et al., 2020) and were susceptible to almost as many psychologically distinct framing effects as laypersons' (Horvath & Wiegmann, 2021). These studies speak against the epistemological expertise objection: either philosophical ethicists do not have more extensive or better structured moral knowledge than laypeople, or such 'philosophically improved' moral knowledge does not render people's moral case judgments notably less susceptible to irrelevant factors and biases.
Two studies examining accuracy rather than stability suggest that the difficulties documented for the epistemological and methodological expertise objections are not restricted to the domain of moral philosophy. Horvath and Wiegmann (2016) found the intuitive knowledge attributions of expert epistemologists were only partially consistent with the textbook consensus. A recent study speaks to the methodological expertise objection: Schindler and Saint-Germier (ms) compared philosophers' and laypersons' judgments about six cases pertaining to thought experiments from across theoretical philosophy and found philosophers' judgments were significantly closer to the textbook consensus for only half the cases. 4 While these first two expertise objections require further investigation, extant findings motivate turning to the remaining linguistic expertise objection. Our study is the first to develop and assess this objection, and the first to execute all three steps of the direct strategy. We now set out this empirically neglected objection, explain why it matters, and how we propose to render it empirically tractable.
According to the linguistic expertise objection (LEO), philosophers are better than laypeople at deploying conceptual information (even when they possess the same conceptual information as laypeople); this deployment competence makes their judgments about verbally described cases more stable and accurate. This objection considers the process that leads, in philosophical thought experiments, from verbal case descriptions to intuitive judgments about the cases described. Properly understood, LEO addresses the first stage of the process: the interpretation of the verbal case description. Psycholinguistic research (to be reviewed in Sect. 3.1) reveals that the interpretation readers place on texts is built up from 'conceptual' information that is automatically activated by words, by default, as we read them. The interpretation process involves integrating information that gets sequentially activated, as we read through the text: we need to integrate information activated by words we read now with information activated by words we read previously; we need to complement information activated by individual words with information activated only by larger chunks of text (e.g., combinations of words) or wider discourse context, and with background knowledge; and we need to suppress initially activated information that subsequently turns out to be irrelevant in the given context. Being better at deploying conceptual information thus amounts to being better at contextualizing conceptual information in these ways.
As developed in the light of these empirical findings, LEO assumes that (1) Philosophers are better than laypeople at contextualizing conceptual information, that is, at complementing and suppressing default information, as appropriate. LEO further assumes that (2) Better contextualization (complementation and suppression) ability renders philosophers' interpretations of vignettes less susceptible to comprehension biases and, thereby, less sensitive to irrelevant factors (like verbal differences between equivalent formulations or order of presentation). Philosophical vignettes are crafted to include the information to be taken into account in making the judgments of interest to the thought experimentalist. This motivates the third assumption: (3) Improved ability to take sentence and discourse context into account through complementation and suppression of default information will better align readers' interpretations with the intended interpretation.
Improved contextualization ability thus renders philosophers' interpretations of vignettes more stable and accurate. Since these interpretations shape the intuitive judgments people make about the cases described, LEO infers that philosophers' intuitive case judgments, too, are more stable and accurate than those of laypeople. LEO challenges experimental philosophy's lay-expert inference for many important philosophical thought experiments: The default information activated by words includes mainly information about typical properties of objects, people, and events (see Sect. 3.1). However, to address their research questions, philosophical thought experiments frequently need to consider unusual cases that pull apart features that typically go together (Machery, 2017, pp.111-118). To accurately interpret descriptions of such cases, people need either to complement the default information with further contextual information or to suppress some of the default information that is stipulated not to apply to the case. For example, to correctly interpret Gettier cases, people need to complement the information that the protagonist has a justified true belief with the further information that they are right by chance (which is atypical for cases of justified true belief), and need to take both into account in their case judgments (Turri, 2013). Similarly, to correctly interpret zombie scenarios, people need to disambiguate the polysemous term 'zombie' and suppress the default information that zombies have rotting bodies and attack and eat humans, to take into account that the 'philosophical zombies' at issue are physico-behaviorally indistinguishable from us. It is therefore prima facie plausible to suggest that pronounced differences in the ability to complement and suppress default information can translate into different judgments in many important philosophical thought experiments.
We propose to go beyond extant studies not only in examining this empirically neglected expertise objection, but also in drilling down deeper. To contribute to the gradual elucidation of how different cognitive skills are involved in philosophical expertise, intuitive or other, we drill down to the level of specific cognitive skills, as captured by empirically valid psychological constructs. Above, we distinguished three relevant kinds of expertise and detailed how extant studies found against expertise objections based on two of them. In turning to the remaining expertise of interest, we employ a 'specific skills approach': We consider specific cognitive skills that underwrite the expertise, and ask whether philosophers possess a particular skill to a higher extent than laypeople (as per assumption 1 above), and whether this renders philosophers' judgments more stable (as per 2) and more accurate (as per 3). 5 With this approach, we examine suppression or 'inhibition' (a focus motivated in Sect. 3.1 below), investigate susceptibility to the comprehension bias from which higher inhibition is most likely to shield participants (Sect. 3.2), and study its influence on interpretation accuracy (see Sect. 6.1). The novel approach also motivates the use, in the main study (Sect. 5), of simple (one-sentence) items, whose interpretation does not stand to benefit from familiarity with philosophical thought experimentation or expert background knowledge. This allows for targeted examination of the linguistic expertise objection, without confounds pertaining to the methodological or epistemological expertise objections.

Two complementary objections
According to the linguistic expertise objection (LEO), philosophers are better than laypeople at deploying conceptual information and this makes their judgments about verbally described cases more stable. We now draw on research from psycholinguistics in order to translate this objection into empirically testable hypotheses. To do so, we spell out what 'deploying conceptual information' amounts to (Sect. 3.1) and identify a philosophically relevant bias that better 'deployment competence' should shield philosophers from (Sect. 3.2). These two steps will translate the objection's first two assumptions, (1) and (2) above, respectively, into testable hypotheses. Appreciation of the bias will simultaneously motivate a new 'linguistic usage objection' (Sect. 3.3).

Conceptual information and its deployment
What is psychologically real 'conceptual information'? Cognitive science draws the distinction between conceptual and other information in processing terms and typically conceives of 'concepts' as bodies of information stored in long-term memory and retrieved by default, in the exercise of higher cognitive competencies including language comprehension, perceptual categorization, and inductive learning (review: Machery, 2009). Conceptual information thus is information that is retrieved by default, i.e., rapidly retrieved (e.g., in response to a verbal stimulus), either in every context (such as any textual context) (Machery, 2017) or outside all context (as in single word priming experiments) (Fischer, 2020), by an automatic process (Bargh et al., 2012).
The information that qualifies as 'conceptual' in virtue of default retrieval mostly is information about the world that philosophers consider 'empirical': Information is retrieved automatically through activation of representations including stereotypes (a.k.a. 'prototypes' or 'schemas'). Stereotypes are built up through observation of co-occurrences in the physical environment and through extraction of co-occurrence information from linguistic discourse (McRae & Jones, 2013). They encode statistical information about typical and diagnostic properties of category members (Hampton, 2006). More complex stereotypes (situation schemas) encode information about typical features of events or actions, agents, 'patients' acted on, and typical relations between them (Ferretti et al., 2001;Hare et al., 2009;McRae et al., 1997). Dependency networks in complex schemas encode causal, functional, and nomological information (Sloman et al., 1998). Much of this 'world knowledge' qualifies as conceptual information, due to default activation: Many stereotypes are associated with nouns and verbs which rapidly activate them in single-word priming experiments (Lucas, 2000).
Activated stereotypes support defeasible default inferences about what (else) is (also) true of the situation talked about (e.g., unless indicated otherwise, the 'tomato' is red; Levinson, 2000). 6 'Conceptual' information in cognitive science's sense, namely, statistical world knowledge encoded by stereotypes, thus provides an initial basis for utterance interpretation (Elman, 2009). For present purposes, the most relevant utterances are the case descriptions that philosophers consider in thought experiments and typically encounter through reading, like participants in experimental-philosophy studies. In reading comprehension, relevant conceptual knowledge and further world knowledge need to be integrated into the situation model: the mental representation of the situation described by the text, which provides the basis for further judgements and reasoning about that situation (Kintsch, 1988; Zwaan, 2016). To facilitate accurate judgment and reasoning about specific situations, we need to contextualize our default inferences. In this setting, the competence of 'deploying conceptual information' consists in a twofold ability to manage the information that individual words activate by default, as we read them: the ability to suppress from the situation model the conclusions of default inferences that are contextually irrelevant (Faust & Gernsbacher, 1996), and to complement relevant default information with further world knowledge that is contextually relevant but is activated only by combinations of words rather than any single word (Bicknell et al., 2010; Matsuki et al., 2011), in the sentence or wider discourse context (Metusalem et al., 2012).
Competence at these tasks is modulated by two different forms of intelligence (Cattell, 1987): 'fluid intelligence' only minimally depends upon prior learning; 'crystallized intelligence' reflects cultural learning and includes both world or domain knowledge and lexical knowledge. Better domain knowledge helps readers to complement conceptual knowledge, to arrive at utterance interpretations that are positive, stereotypical, and specific (Levinson, 2000, pp.114-115; Garrett & Harnish, 2007). Better domain knowledge also cancels stereotypical inferences that less knowledgeable readers regard as relevant. Similarly, richer lexical knowledge supports both complementation and suppression, namely, by facilitating pragmatic inferences from oppositions between authors' chosen words and informationally stronger and weaker expressions (Levinson, 2000, pp.75-104) and from authors' preferences of marked expressions over shorter, more frequent, or neutral words (pp.136-137). These pragmatic inferences can complement or defeat stereotypical inferences (pp.157-158). At the level of fluid intelligence, low-level cognitive abilities conceptualized as 'executive functions' (Miyake et al., 2000) modulate the exercise of several cognitive competencies, including reading comprehension (review: Butterfuss & Kendeou, 2018). For our purposes, the key function is inhibition (Miyake & Friedman, 2012; cf. Dempster, 1990): the ability to manage the activation of irrelevant information and to actively inhibit or suppress prepotent responses to stimuli, such as default inferences from verbal stimuli, where they are contextually irrelevant.
Better domain knowledge is claimed for philosophers by the epistemological expertise objection which did not stand up well to empirical scrutiny (Sect. 2). By contrast, it is a priori plausible that, due to training and selection effects, academic philosophers should benefit (i) from better lexical knowledge, which correlates with years in formal education (Engelhardt et al., 2008) and extent of reading (Stanovich, 1993), and (ii) from higher inhibition, which correlates with verbal intelligence in adolescents and adults (Friedman et al., 2006). On balance, these two factors favor suppression of irrelevant default information more than complementation with relevant further information. In developing the linguistic expertise objection (LEO), we therefore focus on suppression ability: to test LEO's first assumption, that philosophers are better than laypeople at contextualizing conceptual information, we will examine
H1 Academic philosophers are better than laypeople (e.g., psychology undergraduates) at suppressing default inferences that are contextually irrelevant.

A philosophically relevant cognitive bias
According to LEO's second assumption, higher levels of conceptual competence shield philosophers from comprehension biases. The stronger competence claimed by H1 should shield them at least against biases that promote contextually irrelevant stereotypical inferences. One such bias is the linguistic salience bias that affects polysemy processing. Many words (over 40% in English) are polysemous, i.e., have several distinct, but related senses (Byrd et al., 1987). Subordinate senses can sometimes be generated by rules (as in metonymy) and sometimes not (as in metaphor) and are processed accordingly (reviews: Eddington & Tokowicz, 2015;Vicente, 2018). Different senses of 'irregular' polysemes do not activate distinct semantic representations (Klepousniotou et al., 2012;McGregor et al., 2015), but a 'unitary representation' that consists of overlapping feature clusters (stereotypes) (Brocher et al., 2016). The interpretation of specific uses involves suppressing component features that are not shared by different senses and irrelevant in the given utterance context (cf. Giora, 2003;Giora et al., 2007). E.g., the verb 'to see' activates a schema with agent features including S looks at X, S knows X is there, and S knows what X is, and patient features including X is in front of S and X is near S. To interpret a purely epistemic use ('Mary saw the possibilities'), the hearer needs to suppress all features except the epistemic agent features, to obtain the intended interpretation (Mary knew there were possibilities and knew what they were).
Such suppression becomes difficult where one sense exceeds all others in linguistic salience. The linguistic salience of a sense is a function of exposure frequency (of how often the hearer encounters the word in one sense, rather than another), modulated by prototypicality (how good examples of the relevant category, say, seeing, the word is deemed to stand for in this sense) (Giora, 2003). The feature clusters associated with more frequently encountered senses are activated more strongly (Brocher et al., 2018), and clusters constitutive of more prototypical sub-categories are activated more strongly (Hampton, 2006). Accordingly, features associated with the most salient sense are activated most strongly. Frequently co-occurring component features of an activated stereotype exchange lateral cross-activation (Hare et al., 2009; McRae et al., 2005). Where such cross-activation complements strong initial activation due to high linguistic salience, feature suppression becomes difficult. Irrelevant component features of the dominant stereotype then remain partially activated and support inappropriate inferences (from 'Mary saw the possibilities' to the possibilities were in front of Mary), as per the linguistic salience bias hypothesis (SBH) (Fischer & Engelhardt, 2019):
When (i) one sense of an irregular polyseme is much more salient than all others, (ii) interpretation of utterances using a subordinate sense requires suppression of features associated with that dominant sense, and (iii) some, but not all, of the features strongly associated with the dominant sense are contextually relevant, then (1) contextually irrelevant stereotypical inferences supported by the dominant sense will be triggered by the subordinate use as well, and (2) these automatic inferences will influence further judgment and reasoning.
This bias matters for philosophy: Philosophers often employ familiar words in new, but related senses, so that conditions (i) and (ii) are met. Philosophical thought experiments often pull apart features that typically go together (Machery, 2017, pp.116-18), so that (iii) is met. In such thought experiments and related arguments, case descriptions will trigger contextually inappropriate inferences whose conclusions will enter the situation model on which judgments and reasoning about the described case are based. For example, Fischer and Engelhardt (2020) suggested that the 'argument from hallucination' relies on contextually inappropriate default inferences from phenomenal uses of perception verbs ('Macbeth saw a dagger') to factive and spatial conclusions (There was a dagger in front of Macbeth) that are cancelled by the context but, even so, presupposed in further reasoning (from 'There was no physical dagger before Macbeth' to 'There was a non-physical dagger').
As proponents of the expertise objection have plausibly assumed (e.g., Horvath, 2010, p.471), analytic philosophers are well trained in distinguishing, explaining, and reasoning with, different senses of words. Polysemy processing therefore is an arena in which it is particularly plausible to expect analytic philosophers to be better at deploying conceptual knowledge than laypeople. The linguistic salience bias that affects polysemy processing is a cognitive bias from which better suppression ability (as per H1) seems most apt to shield philosophers. This bias is therefore ideally suited to put LEO to the test: we will examine the objection's second assumption, that better contextualization ability renders philosophers less susceptible to comprehension biases, by investigating
H2 Professional analytic philosophers are less susceptible to linguistic salience bias than laypeople.
We thus obtain this empirically developed LEO: Philosophers are better than laypeople at suppressing contextually irrelevant default inferences (as per H1). They are therefore less susceptible to cognitive biases including the linguistic salience bias (as per H2). As a result, their interpretations of verbally described cases will be more stable and based on more coherent situation models and, therefore, more accurate (as per LEO's remaining third assumption). This will render philosophers' case judgments more stable and accurate.

LUO: The linguistic usage objection
The linguistic salience bias hypothesis simultaneously motivates a new, alternative objection to the lay-expert inference: If (pace H2) philosophers are equally susceptible to this bias, they will be susceptible to it at different points. Specialists may use a word much more frequently in a technical sense than laypeople do. In this case, an ordinary sense that stands out in salience for laypeople will not stand out so much for specialists. Even if linguistic salience bias leads laypeople to make contextually inappropriate stereotypical inferences that are supported by that word's ordinarily dominant sense, the bias will not lead the specialists to make these inferences. Conversely, specialist discourse may make a sense that is not dominant in ordinary discourse clearly stand out in linguistic salience for specialists. This salience imbalance may lead specialists to make inappropriate stereotypical inferences that laypeople avoid. Either way, different inferences will feed into the situation models that ground laypeople's and philosophers' judgments about verbally described cases, and their responses will differ, as will the soundness of these responses. This linguistic usage objection (LUO) translates into two hypotheses:
HF [Frequency Hypothesis] Different senses or uses of some familiar words have notably different relative exposure frequencies for laypeople and expert philosophers.
H3 These differences make expert philosophers and laypeople vulnerable to linguistic salience bias when encountering different words.
Together, LEO and LUO promise to add up to a 'master argument' against experimental philosophy's lay-expert inference: If academic philosophers are better at suppressing irrelevant default inferences (as per H1) and therefore less susceptible to linguistic salience bias than laypeople (as per H2), the philosophers will make fewer mistakes. If they are equally susceptible to the bias (and their linguistic diet differs from laypersons', as per HF), philosophers will make different mistakes (as per H3). Either way, experimental findings about lay responses to verbally described cases will not simply carry over to expert philosophers. This argument challenges the lay-expert inference where philosophers use irregular polysemes from ordinary discourse in special senses, to talk about unusual cases that pull apart what typically goes together. To empirically assess this argument, we conducted corpus analyses and an experiment.

Corpus analyses
Exposure frequencies are commonly inferred from occurrence frequencies in corpora. To examine the frequency hypothesis HF and derive empirically testable predictions from the competing hypotheses H2 and H3, we conducted three manual corpus studies (Sect. 4.1) and distributional semantic analysis (Sect. 4.2). To support HF and test H3, we need to identify polysemous words of philosophical interest that display pronounced salience imbalances in ordinary discourse which are absent or reversed in specialist philosophical discourse (so that similar susceptibility to linguistic salience bias will lead laypeople and philosophers to make different inappropriate inferences). To test H2, we need words where salience imbalances are preserved in philosophical discourse (so that different propensities to make inappropriate inferences from them will be indicative of different susceptibility to the bias). The best experimental evidence for linguistic salience bias of philosophical interest (see Sect. 3.2.) comes from two perception-verbs (Fischer & Engelhardt, 2017;2019;. We examined these verbs, to ascertain whether they provide what we need.

Corpus analyses
We examined the use of the verbs 'see' and 'be aware of' in samples of at least 1000 sentences randomly drawn from three corpora roughly representative of ordinary discourse, academic philosophy, and a specific sub-area, respectively: (1) the British National Corpus (BNC), (2) a topically generic philosophy corpus compiled from two philosophy encyclopedias (Stanford Encyclopedia of Philosophy and International Encyclopedia of Philosophy) (SEP/IEP), and (3) a philosophy of perception corpus (PHILO-P) comprised of ten monographs that shaped philosophical debates about sense-data, (challenges to) naïve and direct realism, and the resulting 'problem of perception'. We classified the occurrences of 'see' and 'aware of' as perceptual or non-perceptual and assigned uses of 'see' to one of twelve dictionary-attested senses. Methods and results are detailed in Appendix A.
Headline findings (Table 1) provide evidence of pronounced salience imbalances in ordinary discourse that, in specialist philosophical discourse, are roughly preserved for 'see' and reversed for 'aware of'. In ordinary discourse (BNC), perceptual uses (where the agent perceives by sense the object of sight or awareness) are clearly dominant for 'see' and clearly subordinate for 'aware of'. Slight changes in usage patterns across corpora for 'see' are driven mainly by an increase of purely epistemic uses of 'see' ('know/understand something' or 'find out' without using one's eyes), from 12% of classifiable occurrences in the BNC sample to 23% in the SEP-IEP sample and 36% in PHILO-P (see Appendix A). For 'aware of', we observe a dominance reversal between ordinary discourse, where the purely epistemic use ('know about a fact or situation') dominates, and specialist discourse (PHILO-P), where the perceptual use is dominant. The two verbs seem to provide what we need.
Table 1. Perceptual uses as percentage of classifiable uses in random samples from corpora.

Distributional semantic analysis
To extend our analysis, we built a computational model of the uses of 'see' and 'aware of' across our three corpora. Methods and results are detailed in Appendix B. We constructed distributional semantics representations of each occurrence of either verb in our annotated samples. We used those representations to train a classifier which classifies a given occurrence as perceptual or non-perceptual. We had already annotated manually all uses of 'see' and 'aware of' in the smaller PHILO-P corpus but deployed the classifier to classify all their uses in the larger corpora, and considered separately their use in the academic section of the BNC (ACPROSE) and the remainder of this corpus (Table 2). In the whole BNC we observed a still dominant, but lower proportion of perceptual uses of 'see' than in our random sample. This demonstrates the usefulness of automatic classification to correct potential sampling biases. For the BNC, we now observe an almost identical proportion as for SEP-IEP and PHILO-P. The markedly lower proportion in ACPROSE (42%) suggests that academic philosophers may be professionally exposed to perceptual uses of 'see' less frequently than the philosophy corpora suggest. Even so, differences in exposure frequencies between academic philosophers and laypersons seem bound to remain minor for 'see'. Distributional semantic analysis thus confirms that perceptual uses of 'see' will be roughly equally salient for academic philosophers and laypeople, so that any differences in judgment and reasoning will be due to different susceptibility to linguistic salience bias (as per H2). Findings for 'aware of' confirm the dominance reversal in the philosophy of perception, as reflected in PHILO-P: In all other corpora, the verb's perceptual use is subordinate. This dominance reversal (as per HF) suggests comparisons of philosophers of perception with other philosophers and laypeople will allow us to assess H3.
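For illustration, the kind of occurrence-level classification described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the model detailed in Appendix B: it assumes a plain bag-of-context-words representation, a five-token window, and a nearest-centroid rule with cosine similarity. The function names and the toy annotations are hypothetical and are not drawn from our corpora.

```python
import math
from collections import Counter

def context_vector(sentence, target_index, window=5):
    """Bag of lower-cased words within `window` tokens of the target word."""
    tokens = sentence.lower().split()
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    return Counter(tok for i, tok in enumerate(tokens[lo:hi], start=lo)
                   if i != target_index)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def train(annotated):
    """Sum the context vectors of each label into one centroid per label."""
    centroids = {}
    for sentence, target_index, label in annotated:
        vec = context_vector(sentence, target_index)
        centroids.setdefault(label, Counter()).update(vec)
    return centroids

def classify(sentence, target_index, centroids):
    """Assign the label whose centroid is most similar to the context."""
    vec = context_vector(sentence, target_index)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Invented toy annotations: (sentence, index of the verb, label).
TOY_ANNOTATIONS = [
    ("she could see the red car through the window", 2, "perceptual"),
    ("he turned and could see the mountain in the distance", 4, "perceptual"),
    ("I see the point of your argument", 1, "non-perceptual"),
    ("we see the problem with this theory", 1, "non-perceptual"),
]

model = train(TOY_ANNOTATIONS)
print(classify("they see the flaw in this argument", 1, model))
print(classify("could see the car in the distance", 1, model))
```

The sketch uses only word co-occurrence counts, with no syntactic parsing, which mirrors the kind of information the classifier is described as relying on.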
Further relevant findings emerge from prior validation of our classifier. Classifiers are validated by assessing their verdicts against human annotations and showing that they perform better than a simple chance heuristic (which classifies all occurrences of a word as instances of its dominant use in the corpus). We observed major improvements on this baseline, and accuracy over 90% (Table 3). This indicates that context words (without even syntactic parsing) provide enough information to identify non-perceptual uses of 'see' and 'aware of', whose interpretation requires suppression of initially activated schema components (see Sect. 5.1). We infer that no specialist knowledge is required to identify need for suppression; laypeople should perform as well at the task as philosophers. To follow up this suggestion and determine whether differences in linguistic diet might make it difficult for laypeople to identify perceptual vs non-perceptual uses in unfamiliar discourse settings like philosophical vignettes, we performed cross-domain classification: We trained our classifier on one domain's annotation (e.g., BNC) and tested its accuracy on an annotated sample from another domain (e.g., SEP-IEP). Results (Table 4) still show considerable improvements over baseline. Moderate drops in performance are observed in specific directions, especially when training on PHILO-P, but accuracy remains over 80% in nearly all cases, even though classifications are based only on information about word co-occurrences. Humans, who can take into account also syntactic information, wider context, and world knowledge, should have little trouble identifying non-perceptual uses (or need for suppression) in unfamiliar discourse settings.
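The validation logic just described (comparison with a dominant-use baseline, and training on one domain while testing on another) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the `fit` and `predict` callables stand in for whatever classifier is used, and we assume here that the baseline's dominant use is taken from the training domain, which the text above does not specify.

```python
from collections import Counter

def accuracy(predicted, gold):
    """Proportion of predictions that match the human annotation."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the dominant use seen in training.

    Assumption: dominance is read off the training domain's annotations.
    """
    dominant = Counter(train_labels).most_common(1)[0][0]
    return accuracy([dominant] * len(test_labels), test_labels)

def cross_domain_report(domains, fit, predict):
    """Train on each domain and test on every other domain.

    domains: dict mapping a domain name to (features, labels).
    fit(features, labels) returns a model; predict(model, features)
    returns a list of labels. Both are supplied by the caller.
    """
    report = {}
    for train_name, (train_x, train_y) in domains.items():
        model = fit(train_x, train_y)
        for test_name, (test_x, test_y) in domains.items():
            if test_name == train_name:
                continue
            report[(train_name, test_name)] = {
                "accuracy": accuracy(predict(model, test_x), test_y),
                "baseline": majority_baseline_accuracy(train_y, test_y),
            }
    return report

# Invented toy domains and a trivial cue-word classifier, for illustration.
toy_domains = {
    "A": (["eyes", "eyes", "idea"], ["perceptual", "perceptual", "epistemic"]),
    "B": (["idea", "idea", "eyes"], ["epistemic", "epistemic", "perceptual"]),
}
toy_fit = lambda features, labels: "eyes"  # the 'model' is just a cue word
toy_predict = lambda cue, features: [
    "perceptual" if f == cue else "epistemic" for f in features
]
toy_report = cross_domain_report(toy_domains, toy_fit, toy_predict)
```

Reporting accuracy next to the baseline for each train/test pair makes it easy to see in which directions performance drops.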
Validation and cross-domain classification findings jointly suggest that laypeople are no less able than philosophers to identify subordinate uses of our target words (and the need to suppress components of initially activated schemas) when reading experimental vignettes.
Responses to texts using the two verbs therefore allow us to study to what extent laypeople and philosophers differ in their ability to act on this insight, both where exposure frequencies are similar ('see') (to test H2 and LEO) and where they differ ('aware of') (to test H3 and LUO).

Experiment
To examine the competing objections (LEO and LUO), we employed a plausibility rating task, and compared responses from psychology undergraduates, philosophers of perception ('PoPs'), and 'Other Philosophers'.

Predictions
We used the psycholinguistic cancellation paradigm to examine spatial inferences from visual and purely epistemic uses of 'S sees X' and 'S is aware of X' to X is in front of S. 7 Participants read sentences with concrete and abstract objects, intended to invite visual and purely epistemic readings of the verb, respectively. Half the items were inconsistent with the 'see'-stereotype ('s-inconsistent') and placed the object behind the agent:
(3a/b) Chuck sees / is aware of the spot on the wall behind him. (s-inconsistent visual)
(4a/b) Jack sees / is aware of the problems that lie behind him. (s-inconsistent epistemic)
Arguably due to embodiment effects (Glenberg & Kaschak, 2002), both 'see' and 'aware of' initially activate a schema that sees the agent looking at an object of sight/awareness before them (Fischer & Engelhardt, 2019, pp.71-72, 81; 2020, p.428). Both verbs thus trigger spatial default inferences, which clash with s-inconsistent sequels. 8 In response to such conflicts, stereotypical inferences can be completely suppressed within one second and fail to influence subsequent unspeeded plausibility judgments (Fischer & Engelhardt, 2017).
The linguistic salience bias hypothesis (SBH) maintains that participants are unable to completely suppress such contextually inappropriate inferences where the stereotype supporting them is associated with the dominant use of the verb. For 'see', the perceptual use that supports spatial inferences is dominant in ordinary and philosophical discourse (Sect. 4). The SBH hence predicts that spatial inferences will influence plausibility judgments of laypeople and philosophers even where 'see' is ostensibly used in a purely epistemic sense (including s-inconsistent items like 4a). The dictionary-attested purely epistemic sense of 'see' ('know/understand something') and familiar spatial time metaphors (whereby ahead = in the future; behind = in the past) facilitate purely metaphorical interpretations of these items (Joe knows what problems he will have in the future and Jack knows what problems he had in the past). To obtain these intended interpretations, participants need to completely suppress initial spatial inferences. But what if participants cannot suppress spatial inferences from 'see'? The space-time metaphors in our items give rise to embodied cognition effects (Boroditsky & Ramscar, 2002; Bottini et al., 2015) and support spatial reasoning about temporal relations (Casasanto & Boroditsky, 2008; Gentner et al., 2002). Persistent spatial inferences from 'see' will prevent purely metaphorical interpretation also of the space-time metaphors, engage spatial reasoning, and create the impression of a conflict, in s-inconsistent 'see'-items. Prevention of purely metaphorical interpretation can result in persistent 'visual' interpretation that identifies, e.g., the problems seen with visible objects (Mountaineer Jack sees the difficult-to-cross crevice that lies behind him). Even where the object (say, problem) is not identified as visual, the impression of a conflict between spatial implications from 'see' and the sequel will make s-inconsistent 'see'-sentences feel 'weird' and lower their plausibility.
7 In this paradigm, participants read or hear sentences where the expression of interest is followed by a sequel that is inconsistent with (or 'cancels') a hypothesised inference from that expression. If the automatic inference is triggered, its clash with the sequel will engender comprehension difficulties requiring cognitive effort. If the inference is not suppressed, the perceived clash will persist and lower the sentence's plausibility. Effort is picked up by eye-tracking measures including pupil dilations and longer 'late' reading times, and by signature electrophysiological responses ('N400s'); plausibility is assessed with rating tasks (see Fischer & Engelhardt, 2019, for a review).
8 The studies cited used the cancellation paradigm to document these inferences and provide evidence from pupillometry (Fischer & Engelhardt, 2020) and reading times (Fischer & Engelhardt, 2019). They also exclude various confounds (e.g., appropriate factive, rather than inappropriate spatial inferences from epistemic uses).
By contrast, the perceptual use of 'aware of' is clearly subordinate at least for laypeople and Other Philosophers (Sect. 4). Linguistic salience bias will therefore impede their suppression of initially triggered spatial inferences only from 'see', but not from 'aware of', and will not prevent purely metaphorical interpretation of s-inconsistent epistemic items with 'aware'. Since s-inconsistent epistemic items are more plausible on the purely metaphorical interpretation that (according to the SBH) is unobtainable for 'see'-sentences, laypeople and Other Philosophers will rate s-inconsistent epistemic 'see'-items less plausible than their 'aware'-counterparts, even though, on the contextually appropriate metaphorical interpretation, both mean the same (Jack knows what problems he had in the past). This sameness of meaning makes the effect size of this comparison a potential measure of the strength of linguistic salience bias.
The competing hypotheses making up LEO and LUO, respectively, make different predictions about cross-group comparisons. LEO's first component, H1, claims philosophers are better than laypeople at suppressing contextually cancelled default inferences. We can assess this claim without complications from linguistic salience bias by considering spatial inferences from 'aware of'. These are cancelled by s-inconsistent sequels. The purely epistemic use of 'aware of' that is dominant in ordinary discourse facilitates 'non-visual' interpretations, which do not require current visual contact: Chuck is aware of the spot on the wall behind him because he has seen it earlier or been told about it. Jack is aware of the problems that lie behind him because he keeps being reminded of them. Etc. To the extent to which initial spatial inferences are suppressed, readers can adopt these 'non-visual' interpretations and feel no conflict. H1 thus predicts that philosophers will deem s-inconsistent 'aware'-items more plausible than undergraduates. The dominance of the perceptual use in specialist discourse in philosophy of perception could make suppression of spatial inferences and 'non-visual' interpretation more difficult for PoPs. This motivates restricting this prediction to Other Philosophers.
LEO's H2 claims that professional philosophers will be less susceptible to linguistic salience bias than undergraduates. Hence philosophers will be better able than undergraduates to suppress spatial inferences from epistemic uses of 'see'. As a result, H2 predicts, philosophers will deem s-inconsistent epistemic 'see'-sentences more plausible than undergraduates, and the plausibility differential between these sentences and corresponding 'aware'-sentences (as reflected by the effect size for this comparison) will be smaller for philosophers than undergraduates.
LUO's H3 claims that exposure to different usage patterns renders specialists susceptible to linguistic salience bias at different points and leads to non-suppression of inappropriate inferences from different words. For high-frequency words, differences between specialists and others will arise only from outright dominance reversals in specialist discourse. We observe a clear reversal for 'aware of', whose perceptual use is clearly dominant in the philosophy of perception corpus. By H3, this renders PoPs less able to suppress initial spatial inferences from epistemic uses of the verb and leads PoPs to find s-inconsistent items with these uses less plausible. This reduction in plausibility should show up in comparisons between groups benefiting from similar levels of suppression ability, that is, PoPs and Other Philosophers. H3 thus predicts PoPs will judge s-inconsistent 'aware'-items with abstract objects less plausible than Other Philosophers. Predictions are summed up in Table 5.

Methods
Participants: All participants self-identified as native speakers of English (that is, all other participants were excluded from analyses). 92 undergraduate psychology students (first and second year) from the University of East Anglia participated for course credit. Their mean age was 19.6 (SD=2.89). 11 were male, 81 female. 9 Academic philosophers were recruited through an electronic mailing list, a blog announcement, and personal emails to members of 14 UK philosophy departments and to individually targeted experts who had made sustained contributions, including recent contributions, to the pertinent debates in the philosophy of perception (namely, to the debates captured by our PHILO-P corpus) (see Appendix C for details).
72 academic philosophers with a PhD in philosophy were assigned to the group of other philosophers because they reported no research or teaching in philosophy of perception. Mean reported age was 43.6 (SD=9.94). 49 were male, 23 female.
22 academic philosophers holding a PhD in philosophy were assigned to the philosophy of perception (PoP) group because they reported philosophy of perception as 'main' or 'primary research area' and at least 'some' teaching in the area. The small size of this sample (otherwise typical of studies in clinical psychology that examine rare mental health conditions) reflects the small size of the highly specialized population targeted. Mean reported age was 46.8 (SD=11.33). 18 were male, 4 female.
Materials: We used 48 critical items: six for each of the eight conditions (illustrated by examples 1a-4b, Section 5.1). S-inconsistent epistemic items (like 4a/b) employed the cancellation phrases 'that lie(s) behind him/her' and '[that] s/he has turned from', in equal number. There were 24 filler items. 'See' and 'aware' versions of items were rotated across two lists of materials, with approximately half of the participants completing each list. Participant instructions and critical items are provided by Appendix C.
Participants read items online via Qualtrics and rated their plausibility on a 5-point Likert scale anchored at 1 with 'very implausible', at 3 with 'neutral (neither plausible nor implausible)', and at 5 with 'very plausible'. Items were presented to each participant in random order. The main task was followed by demographic questions. In our analyses, we applied the Bonferroni-Holm correction to control for multiple comparisons associated with the several simple effects t-tests conducted (Armstrong, 2014;Cabin & Mitchell, 2000;Holm, 1979). The corrected significance thresholds are reported in square brackets after the relevant p-values.
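The Bonferroni-Holm step-down procedure mentioned above can be sketched as follows. This is a generic illustration of the procedure (Holm, 1979), not our analysis script; the function name and the example p-values are hypothetical. Each p-value is compared with its own threshold, alpha divided by the number of hypotheses not yet tested, and testing stops at the first p-value that exceeds its threshold.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (threshold, rejected) for each p-value, in the original order.

    The p-values are tested from smallest to largest. The k-th smallest
    (k = 0, 1, ...) is compared with alpha / (m - k). Once one test fails,
    all remaining (larger) p-values are also treated as non-significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    results = [None] * m
    still_rejecting = True
    for rank, i in enumerate(order):
        threshold = alpha / (m - rank)
        if still_rejecting and p_values[i] > threshold:
            still_rejecting = False
        results[i] = (threshold, still_rejecting)
    return results

# Hypothetical p-values from three simple-effects tests.
example = holm_bonferroni([0.01, 0.04, 0.03])
for p, (threshold, rejected) in zip([0.01, 0.04, 0.03], example):
    print(f"p = {p:.3f} [{threshold:.4f}] significant: {rejected}")
```

The threshold printed in square brackets corresponds to the corrected significance threshold that accompanies each p-value in the report.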

Results
To preview findings, results bore out predictions from the linguistic salience bias hypothesis SBH and from H1, but not from H2 and H3. The most striking finding is that philosophers are better at deploying conceptual information than undergraduates (as per H1), but this does not render them less susceptible even to the cognitive bias from which this ability seems most apt to shield them (pace H2). The findings speak against the linguistic expertise objection and fail to support the linguistic usage objection.
SBH predicts differences between conditions, within groups. Our key hypotheses H1-H3 predict differences between groups. We report first global analyses that provide the statistical justification for comparisons between conditions and between groups, as well as first evidence pertaining to our hypotheses. We then report comparisons between conditions, within groups, and finally comparisons between groups that directly assess our key hypotheses.

Comparisons between conditions
To decompose the interactions, we considered responses to items with visual and epistemic objects separately. Table 7 presents the results of these analyses, and the subsequent paired comparisons. Across all three groups, we observe the same pattern of significant differences across the board, including in the epistemic conditions where linguistic salience bias may assert itself: even though all three groups rated s-consistent epistemic items with 'see' and 'aware' equally plausible, all three groups deemed s-inconsistent epistemic items with 'see' less plausible than such items with 'aware' (for more detailed analyses, see Appendix D). This is evidence of linguistic salience bias (as per SBH) across all three groups. For philosophers (PoPs and others), we observed a medium effect of the verb manipulation ('see' vs 'aware') in the s-inconsistent epistemic condition. This effect was larger (rather than smaller) than for undergraduates. These two findings provide first evidence against H2. For philosophers (PoPs and others), we further observed a large effect of the verb manipulation in the s-inconsistent visual condition. This effect was larger than the (medium) effect for undergraduates, due to higher ratings for 'aware'-items (Figure 1). These findings are consistent with H1.

Comparisons between groups
We finally made comparisons between groups. There was little variability across groups in the s-consistent conditions. To assess the key predictions from H1-H3 (summed up in Table 5), we examined the s-inconsistent conditions. Figure 2 displays the means for ease of comparisons.
S-inconsistent epistemic condition. A 2×3 mixed model (verb × group) ANOVA showed a significant interaction, F(2,183)=3.63, p=.029, η²=.038. Independent samples t-tests examined whether our three groups gave different ratings to 'see'- and 'aware'-items, respectively. Pace
To sum up: For all three groups, we found response patterns predicted by the linguistic salience bias hypothesis SBH. For all three groups, we thus replicated findings from previous studies with undergraduate participants (Fischer & Engelhardt, 2017;2019;. Previous studies combined eye tracking with plausibility ratings, in a laboratory setting; replication with a new online delivery format strengthens support for the SBH. Distributional semantic analysis (Sect. 4.2) provided further evidence, suggesting that all participants should be able to identify the need for suppression in epistemic contexts, so that present findings evidence the inability to suppress contextually inappropriate default inferences that the SBH predicts. Appendix D provides further analyses assessing this hypothesis. We now discuss H1-H3 in connection with the competing objections they motivate.
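As a consistency check on the interaction statistic reported in this section, the effect size can be recovered from the F-value and its degrees of freedom. The sketch below assumes that the reported η² is a partial eta squared, which is what the standard conversion from F yields; the function name is hypothetical. The error degrees of freedom also match the sample: 92 + 72 + 22 = 186 participants in three groups give 186 - 3 = 183.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Values reported for the verb x group interaction: F(2,183) = 3.63.
n_participants = 92 + 72 + 22   # undergraduates, Other Philosophers, PoPs
n_groups = 3
df_error = n_participants - n_groups
eta = partial_eta_squared(3.63, 2, df_error)
print(df_error, round(eta, 3))
```

Under this assumption the recomputed value agrees with the reported one to three decimal places.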

Assessing the linguistic expertise objection (LEO)
The linguistic expertise objection makes three assumptions (Sect. 2):
(1) Philosophers are better than laypeople at contextualizing conceptual information, that is, at complementing and suppressing default information, as appropriate.
(2) Better contextualization ability renders philosophers' interpretations of case descriptions less susceptible to comprehension biases and, thereby, less sensitive to irrelevant factors (e.g., framing and order effects).
(3) This also makes philosophers' interpretation of case descriptions more accurate.
LEO infers from these assumptions that philosophers' intuitive judgments about verbally described cases are more stable (i.e., less susceptible to biases and irrelevant factors) and more accurate.
We put assumption (1) to the test by examining the hypothesis H1 that philosophers are better at suppressing default inferences, where these are contextually irrelevant. Our findings were largely consistent with H1: Academic philosophers' item ratings were more sensitive than psychology undergraduates' to differences in verb ('see' vs 'aware') and object (visual vs epistemic) (Table 6), suggesting better ability to integrate default information activated by verb and object-noun. The best test for H1 is provided by cross-group comparisons of ratings for s-inconsistent 'aware'-items, which require suppression of default inferences but involve no complications from linguistic salience bias (Sect. 5.1). Academic philosophers gave higher ratings than psychology undergraduates to s-inconsistent 'aware'-items with visual objects. This suggests they were better at winning through to a 'purely epistemic' interpretation of such items. However, the predicted difference between Other Philosophers and undergraduates in ratings for s-inconsistent 'aware'-items with epistemic objects, while numerically notable, remained shy of even marginal significance upon correction for multiple comparisons. These findings offer qualified support for the hypothesis H1 that philosophers are better than laypeople at suppressing contextually irrelevant default inferences.
We put LEO's assumption (2) to the test by examining the hypothesis H2 that academic philosophers differ from psychology undergraduates in being less susceptible to the linguistic salience bias. This comprehension bias is most apt to be mitigated by better suppression ability and affects polysemy interpretation -at which analytic philosophers can be plausibly thought to excel (Sect. 3.2). A potential measure of susceptibility to the bias is the effect size of the comparison between ratings for 'aware'-and 'see'-items in the s-inconsistent epistemic condition: These sentences mean the same, on the intended purely metaphorical interpretation. Even so, we observed a medium-sized effect for philosophers, which was almost twice as large as for undergraduates (Table 7). However, the larger effect size for philosophers is primarily due to philosophers giving higher 'aware' ratings, rather than lower 'see' ratings. Higher ratings for s-inconsistent 'aware' items are promoted by pragmatic inferences (Manner inferences with the M-heuristic, see Levinson, 2000, pp.136-137): Preference of the marked expression 'aware of' over the simpler alternative 'see', rendered salient by our materials, suggests that the situation talked about deviates from the seeing-stereotype associated with the simpler alternative. This inference supports suppression of contextually inappropriate default inferences from 'aware' (see Sect. 3.1). We interpret philosophers' larger effect size as indicative of better pragmatic inferencing skills, rather than worse inhibition. On this interpretation, our findings do not show that philosophers are more susceptible to linguistic salience bias than undergraduates. However, our groups' equally low 'see' ratings do show that philosophers are no less susceptible to the bias.
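To make the proposed measure concrete, the sketch below computes a standardized effect size for a within-participant comparison of 'aware' and 'see' ratings. It is an illustration under stated assumptions: it uses Cohen's d for paired samples (the mean of the per-participant differences divided by their standard deviation), which is one common choice and not necessarily the statistic reported in Table 7, and the ratings are invented.

```python
from statistics import mean, stdev

def cohens_dz(ratings_a, ratings_b):
    """Cohen's d for paired samples: mean difference / SD of differences.

    ratings_a and ratings_b hold one mean rating per participant, in the
    same participant order, for the two conditions being compared.
    """
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    return mean(diffs) / stdev(diffs)

# Invented per-participant mean ratings in the s-inconsistent epistemic
# condition (1 = very implausible, 5 = very plausible).
aware_ratings = [4, 4, 3, 5, 4]
see_ratings = [3, 3, 3, 4, 2]
print(round(cohens_dz(aware_ratings, see_ratings), 2))
```

Because the effect size depends on both conditions, a larger value can reflect higher 'aware' ratings as much as lower 'see' ratings, which is why the two sets of ratings are also inspected separately.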
To test LEO's third assumption, that better suppression ability will render philosophers' interpretations of case descriptions more accurate, we consider ratings for items with epistemic objects, whose interpretation requires suppression of default inferences from either of our two verbs. The intended interpretation of these items is made explicit by knowledge attributions like, e.g., 'Jack knows what problems he had in the past'. An earlier study (Fischer & Engelhardt, 2020, Appendix A) elicited plausibility ratings for these knowledge attributions from psychology undergraduates. The attributions were rated distinctly plausible (mean rating 4.03, SD=.37). We can use this mean rating as a norm of accuracy, to assess present ratings in the relevant (s-inconsistent epistemic) conditions (Figure 2). Ratings for 'see'-sentences did not differ between groups, all means were neutral (not significantly above mid-point 3; see Appendix D), and thus equally inaccurate. Mean ratings for 'aware'-sentences did not significantly differ between groups, either. They were significantly above mid-point '3' for undergraduates and Other Philosophers (while the small sample size prevented ratings from philosophers of perception from taking this hurdle, if by a whisker, upon correction for multiple comparisons; see Appendix D). Philosophers' mean ratings were merely numerically closer to 4. All groups made the same judgments about these items (deemed them plausible); philosophers just did so slightly more emphatically, getting closer to our accuracy norm.
This largely refutes LEO: Philosophers do seem better at suppressing inappropriate default inferences (as per H1), at any rate where these inferences are not supported by linguistic salience bias. As a result, philosophers do make some slightly more accurate plausibility assessments. However (pace H2), philosophers are no less susceptible to the linguistic salience bias than undergraduates, and the judgments affected by this bias are not more accurate when coming from philosophers rather than undergraduates. While H1 would benefit from further support, present findings suggest a striking conclusion: Philosophers' likely better ability to deploy conceptual information does not render them less susceptible even to the cognitive bias from which this ability seems most apt to shield them.

Assessing the linguistic usage objection (LUO)
The key finding, that (pace H2) philosophers are no less susceptible to the linguistic salience bias than laypeople, secures the starting point of the new linguistic usage objection (Sect. 3.3). The finding entails that pronounced salience imbalances, arising where ordinarily subordinate uses of words become dominant in specialist discourse, can lead specialist philosophers to go along with inappropriate default inferences from those words, which laypeople (and philosophers with other specializations) avoid (as per H3).
Our study examined this possibility by considering inferences from 'aware of', which is predominantly given a perceptual use in key debates in the philosophy of perception, while a non-perceptual, purely epistemic use is dominant in ordinary discourse. However, we did not find any evidence of persistent spatial inferences from non-perceptual uses, in the judgments of philosophers of perception who engage with the relevant debates extensively enough for their linguistic exposure patterns to be affected by them. The small size of this highly specialist population was reflected in the small size of our sample. This places a caveat on our findings. We did observe the pattern of numeric results predicted by H3: Philosophers of perception gave s-inconsistent epistemic items with 'aware' mean ratings that were numerically lower than mean ratings from other philosophers with arguably equal conceptual competence but different linguistic diet; but the difference remained so far shy of significance that even a sample comprising the entire specialist population of interest is highly unlikely to produce a significant difference (Sect. 5.3).
The relevant PHILO-P corpus contained only 375 occurrences of 'aware of' among its 1 million words, less than a fifth as many occurrences as 'see'. This suggests that despite its prominence in the targeted debates, 'aware of' may still be used too infrequently in the philosophy of perception for its use in this specialist discourse to influence overall relative exposure frequencies of the high-frequency verb (Foraker & Murphy, 2012). Even philosophers of perception contributing to and teaching the relevant debates will be exposed to the word more often in ordinary or generic academic discourse, where the verb's purely epistemic use dominates. Despite the dominance reversal in specialist debates, they will overall encounter the word more in its purely epistemic use, like other philosophers and laypeople. Hence they are no worse at suppressing inappropriate perception-related (spatial) inferences from the verb.
Present findings suggest that common words must be used very frequently in specialist discourse, for even an outright dominance reversal to create new vulnerabilities to inappropriate default inferences. This considerably narrows the scope of the linguistic usage objection: For new vulnerabilities to be created, it is not enough that an irregular polyseme has a different dominant use in specialist discourse; specialists must also use the word very frequently, in such discourse. This restricts LUO to rather few plausible candidates, like the verb 'to know'. 10 By largely refuting LEO and mitigating LUO, our findings defang the 'master argument' against experimental philosophy's lay-expert inference (Sect. 3.3). They reduce principled objections to local difficulties.

Main findings and methodological consequences
In summary, we found that professional philosophers are better at deploying conceptual information than laypeople (psychology undergraduates): they are better at suppressing contextually irrelevant default inferences from words. Even so, philosophers are no less susceptible to the cognitive bias this competence seems most apt to shield them from, viz., the linguistic salience bias. This comprehension bias allows contextually inappropriate default inferences to influence utterance interpretation and further cognition. It does so under conditions which frequently recur in philosophy (Sect. 3.2): where unbalanced polysemous words are used in a subordinate sense, to talk about cases that pull apart features that go together, in the associated stereotype. Neither the observed difference in conceptual competence nor marked differences in linguistic usage between expert and ordinary discourse lead this bias to result in notable differences between lay and expert judgments. Since this comprehension bias is the bias most likely affected by the examined difference in conceptual competence, it seems unlikely that the observed difference in competence will render philosophers less susceptible than laypeople to any cognitive bias and result in markedly different case judgments. In a nutshell, philosophers' better conceptual competence does not make their judgments more stable or greatly more accurate than those of laypeople.
Present findings have productive methodological consequences for experimental philosophy. First, they support experimental philosophy's lay-expert inference in the face of linguistic expertise and usage objections, arguably the most promising versions of the expertise objection (see Sections 2-3). Present findings refute these objections in a perhaps unexpected way. Expertise objections assume there is a difference in expertise or competence between philosophers and laypeople, and that this difference makes philosophers' case judgments less susceptible to cognitive biases and irrelevant factors (Sect. 2). Present findings provide some evidence of potentially relevant differences, but reveal these differences need not make a difference: We found some evidence of differences in conceptual competence between philosophers and laypeople, and documented a difference in linguistic diet; but these differences did not translate into different susceptibility to even the most pertinent cognitive bias, or render philosophers' judgments appreciably more accurate. This suggests that contributions to 'negative', 'restrictionist', or 'evidential' experimental philosophy can work with lay participants to assess claims about the stability and accuracy of philosophers' judgments.
Second, present findings open up new avenues for these related research programs. Contributions to 'negative' and 'restrictionist' experimental philosophy have elicited sensitivity to order and framing effects (reviews: Machery, 2017; Mallon, 2016). Linguistic salience bias explains contextually inappropriate inferences that lead to framing effects (like 'see' vs 'aware of'). For example, when laypeople are asked to imagine philosophical zombies that have bodies like ours and behave like us, but where 'all is dark inside', twice as many people accept that the imagined beings lack conscious experience when these beings are described as 'zombies', rather than 'duplicates', and this framing effect is explained by linguistic salience bias. Indeed, given that the bias asserts itself under conditions that frequently recur in philosophy, it is arguably a major source of philosophically relevant framing effects. The advance from eliciting to explaining (some) framing effects facilitates a move from purely negative to more specific and constructive findings: The mere elicitation of such effects allows us to infer only that intuitive judgments about the topic at issue are unreliable (cf. Machery, 2017, pp.77-85). By contrast, explanations of case judgments that invoke the linguistic salience bias allow us to adjudicate between judgments elicited by different frames, and identify biasing and non-biasing frames. The finding that the linguistic salience bias affects philosophers and laypeople equally means that psycholinguistic findings about this comprehension bias can be deployed for the restrictionist purpose of identifying conditions under which philosophers may (not) trust their intuitions (e.g., Weinberg, 2015).
Moreover, the finding allows evidential experimental philosophy to expand its philosophical remit, and assess not only case judgments in thought experiments but also verbal reasoning in philosophical argument. Psychological findings about how cognitive biases affect verbal reasoning help expose previously undetected fallacies. A number of studies with lay participants followed up the suggestion that linguistic salience bias leads to previously undetected fallacies of equivocation, for example, in philosophical arguments about perception: arguments 'from illusion' and 'from hallucination' rely on default inferences from special ('phenomenal') uses of appearance- and perception-verbs that are licensed only by their dominant sense and cancelled by the sentence or discourse context (Fischer & Engelhardt, 2016; 2017; Fischer, Engelhardt, & Sytsma, 2021). These and other philosophical arguments have been advanced mainly by professional philosophers. The finding that professional philosophers are no less susceptible to linguistic salience bias than laypeople provides the necessary empirical foundation for this extension of evidential experimental philosophy.
Present findings also have ultimately productive methodological consequences for philosophical thought experimentation. The debate about whether expertise renders philosophers' case judgments immune to factors that vitiate lay judgments has developed into a more wide-ranging debate about the soundness of the method of cases, specifically about whether non-accidental features of this method systematically undermine the reliability of both lay and expert judgments. A focus of debate has been the 'esotericity' of the cases considered (e.g., Cappelen, 2012; Machery, 2017; Weinberg, 2015; Williamson, 2016).
To test modal implications of philosophical theories, thought experiments must consider cases that are unusual (which we hardly, if ever, observe or read/hear about); to adjudicate between competing theories that agree about typical cases, they must consider cases that pull apart features that typically go together (Machery, 2017, pp.113-120). Critics of the method suggest that these features promote unreliability in both lay and expert judgments (ibid.). In the only study to date to specifically address this suggestion, Schindler and Saint-Germier (2020) examined thought experiments from physics involving cases with these two 'disturbing' features. They found a clear majority of expert physicists and laypeople made correct judgments about five of six cases presented. 11 These first findings suggest that, to be viable, the criticism of the method of cases needs to be developed through causal hypotheses that propose specific links between the two disturbing features and unreliability.
Present findings motivate such hypotheses. To describe cases which pull apart features that typically go together, philosophers frequently fall back on familiar words associated with a stereotype that combines those typically co-occurring features (e.g., 'see' for cases of hallucination). Where they cannot fall back on an established subordinate sense of the word, philosophers will create a new special use. Either way, the interpretation of the word ('see') in the description of the case (hallucination) requires suppression of the automatically activated typical feature (object of sight is in front of the viewer) that has been 'pulled away' and cancelled by contextual information (e.g., the information that the protagonist hallucinates). Where this happens, linguistic salience bias is liable to arise (Sect. 3.2). Readers of the case description are then prone to only partially suppress the irrelevant feature and integrate it to some extent into the situation model that informs further judgement and reasoning about the case (cf. Sect. 3.1). The case judgment of interest will be unduly influenced by the cancelled feature that judges are meant to set aside. Linguistic salience bias can thus affect judgments about cases that pull apart typically co-occurring features, and render the judgments unreliable. Our key finding reveals this problem arises to the same extent for laypeople and philosophers. The problem may be exacerbated where cases are also unusual (like hallucination): People (including philosophers) tend to know little about unusual cases. Suppression of contextually irrelevant default information is aided by integration with background knowledge (Fischer & Engelhardt, 2017) activated by discourse context (Metusalem et al., 2012). Where cases are unusual, paucity of background knowledge makes it more likely that irrelevant default information remains unsuppressed and unduly influences judgments of laypersons and experts alike.
Insights into specific sources of the problem are productive, as they allow us to work around the problem. Where judgments are rendered unreliable by linguistic salience bias, we can rephrase case descriptions so that they do not trigger inappropriate default inferences we cannot suppress: In describing cases that pull apart typically co-occurring features, thought experimentalists need to avoid words whose dominant sense is associated with an 'unhelpful' stereotype that comprises the typically co-occurring features the thought experiment pulls apart (e.g., the zombie stereotype comprises both lack of conscious experience and attacking and eating humans). Rather, they need to find descriptions that do not trigger contextually irrelevant inferences (e.g., 'physical duplicate that lacks conscious experience'). Since the linguistic salience bias arises only where a polyseme has a clearly dominant sense (Sect. 3.2), it may occasionally also be viable to recruit a balanced polyseme whose main sense is associated with an unhelpful stereotype but is not clearly dominant.
As we have seen above, extensions of restrictionist experimental philosophy can lead to insights into the sources of judgment unreliability. Such insights allow thought experimentalists to avoid the pinpointed pitfalls by developing suitable case descriptions. Further empirical study of the sources of unreliability will reveal to what extent the method of cases remains viable. In any case, thought experiments will need to become more similar to psychological experiments: The development of suitable case descriptions requires preliminary work of the sort standardly involved in developing materials for psychology experiments. To guard against linguistic salience bias, for example, thought experimentalists need to explore word-related stereotypes (e.g., through listing and sentence completion tasks, cf. McRae et al., 1997) or examine relative occurrence frequencies of different senses (Fischer & Engelhardt, 2020, pp.434-5). Present findings show that philosophers need to take these precautions also when developing case descriptions for their own benefit. Just like psychological experiments, philosophical thought experiments require some empirical preparation, even when conducted by expert philosophers.

Supplementary Materials

Appendix A - Corpus studies
We conducted three corpus studies to assess the relative occurrence frequencies of different senses or uses of 'see' and 'aware of' in ordinary discourse, academic philosophical discourse, and specialist discourse in a sub-field of philosophy, respectively.

Materials:
We used three corpora which we regarded as representative of the three types of discourse of interest:
- The British National Corpus contains a written component (texts from books, periodicals, letters, reports, etc.) and a spoken component (from transcribed recordings), making it better suited than other corpora to model ordinary discourse. Its focus on British English is appropriate in the light of the participant samples of our main study. The corpus contains 100 million words, including 172,643 instances of the verb 'see' and 6,920 instances of 'is aware of', across all tenses.
- The SEP/IEP corpus is a topically generic philosophy corpus built from a snapshot of two philosophical encyclopedias (Stanford Encyclopedia of Philosophy and Internet Encyclopedia of Philosophy), taken in February 2018 (Sytsma et al. 2019, p.227). It contains 29 million words, including 46,384 instances of 'see' and 2,389 instances of 'aware of'.
- The PHILO-P corpus is a philosophy of perception corpus built for this study, made up of ten monographs that shaped philosophical debates about sense-data, (challenges to) naïve and direct realism about perception, and the 'problem of perception' (Austin 1962; Ayer 1940; Ayer 1956; Brewer 2011; Broad 1923; Jackson 1977; Price 1932; Robinson 1994; Russell 1912; Smith 2002). It contains 1 million words, including 2,048 instances of 'see' and 375 instances of 'aware of'.
We manually annotated occurrences of 'see' and 'aware of' in their context sentences. We did so for randomly selected subsets of 1000 occurrences of each verb from the BNC, and 1500 occurrences from SEP-IEP. We annotated all occurrences of these verbs in PHILO-P.
Procedure:
To annotate instances of 'see', three judges (the first author and two PhD students ignorant of the research questions) were given a list of 12 senses of 'see', recognized by the Macmillan English Dictionary for Advanced Learners (MEDAL) or Oxford English Dictionary (OED), with dictionary explanations and examples (from MEDAL, except for the phenomenal sense, Sense 12 below, which is only given in the OED). 12 We regarded it as an open question whether any of these dictionary explanations would individually capture a perceptual sense, because different dictionary senses can be devised to characterize uses referring to situations of different degrees of specificity and complexity (so that, e.g., WordNet 3.1 recognizes 24 senses of 'see', to MEDAL's 11). More complex schemas or scripts (e.g., for doctor-visits) will then often include or adapt the basic schema associated with the visual sense (e.g., when you 'see the doctor', you typically look at and notice them, with your eyes). Judges were therefore asked to assess independently (1) which of the given dictionary-senses the sentence used, and (2) whether this use was 'perceptual' or 'non-perceptual'. This second classification was intended to capture uses which activate the basic schema without subsequent suppression (though possibly with complementation by a richer schema). For this classification, judges were instructed to ask themselves: Does the subject most probably see with their eyes whatever it is s/he is said to 'see', in the situation talked about? Alternatively, where 'see' can be replaced by 'notice', ask: Does the subject use her eyes in order to notice whatever it is she is said to 'see'? Since 'be aware of' has only one dictionary-attested sense ('knowing about a situation or fact', MEDAL), annotators merely classified its instances as 'perceptual' or 'non-perceptual'. 
They did so by asking themselves: Is the protagonist, in the situation talked about, seeing, hearing, tasting, smelling, or feeling whatever s/he is said to be aware of? ('Feeling' in a wide sense, including proprioception, introspection, sensing of features in the environment not straightforwardly reducible to any one of the five senses, etc.) In each sentence, only the first occurrence of the verb of interest was considered. Final classifications were jointly determined after discussion of previous independent annotations. Instances that remained unclassifiable due to partial transcription, inherent ambiguity, persistent disagreement between judges, or because they employed phrasal verbs (like 'see through'), were discarded.

Results
Occurrence frequencies for each sense of 'see' in our three corpora are shown in Table A-1. We noted that, in the BNC sample, virtually all occurrences of senses 1, 2, and 3 were classified as perceptual, by all judges. By contrast, occurrences of other senses were deemed 'perceptual' only 19% of the time for sense (7), 20% for sense (12), and 22% for sense (8). We therefore regarded senses 1, 2, and 3 as providing a good approximation to a 'visual sense'. For further analysis, we grouped together these visual senses as well as the epistemic senses 4 and 7 (used to state that S possesses or acquires particular knowledge or understanding) and the doxastic senses 5 and 6 (as in 'Fanon sees paranoia as symptomatic of racism' and 'She saw Messingham's hand in this somewhere'). Occurrence frequencies for these sense-clusters and the philosophically important phenomenal sense 12 ('Hallucinating, Macbeth saw a dagger') varied across our three corpora (Figure A-1). We found yet more marked differences in proportions of perceptual vs non-perceptual uses, across the samples from our three corpora (see Table A-2).

Appendix B - Distributional Semantics study
We constructed distributional semantics (DS) representations of each occurrence of 'see' and 'aware of' in our annotated samples, and used them to train a classifier to label all occurrences of these verbs in our three corpora as either perceptual or non-perceptual. 13 To obtain such representations, we constructed neural embeddings that 'contextualize' representations of words as a function of the particular utterance they occur in, corresponding to the speaker meaning of the term (Peters et al., 2018; Devlin et al., 2019).

Data processing
Using the annotated sentences from Appendix A, we computed the contextualized vectors of 'see' and 'aware of' in each sentence, as obtained through Bidirectional Encoder Representations from Transformers (BERT; Devlin et al., 2019), in its pretrained, 'Base' version. BERT Base representations each take the form of a 768-dimensional vector which encapsulates the lexical meaning of the word in the given context (e.g., sentence) in which it is encountered. The model is made of 110 million parameters trained on 3.3 billion words from an English Wikipedia snapshot and the BooksCorpus (Zhu et al., 2015). For reference, we provide below a visual representation of the BERT vector space for 'see' and 'aware' in each one of our annotated corpora, flattened from 768 to 2 dimensions using Principal Component Analysis. Each point in a space corresponds to one particular use of 'see'/'aware' in a given sentence. We expect perceptual and non-perceptual usages to form separate clusters. The plots for the BNC, and for the SEP and PHILO-P 'see'-sentences, show strongly separated clusters of perceptual instances (on the right of the BNC figures and the left of SEP and PHILO-P). This indicates that at least some usages have very specific and recognizable patterns of use in our corpora (e.g., sense 3 above). The plot for PHILO-P 'aware' instances also illustrates the strong class imbalance in that corpus, with the figure consisting of mostly red (perceptual) points. Overlapping clusters indicate more similar sets of usages. However, clusters that show overlap in two dimensions may be separable in three or more dimensions. Our classifiers' accuracies, all above 90% (see Tables 3A and 3B of the paper), demonstrate that this is indeed the case in 768 dimensions.

Classifier
To validate the representations, we trained a perceptual vs non-perceptual classifier, using as input the vectors extracted from the data, and as output the consensus label given to an instance by our annotators. The classifier was a simple Multilayer Perceptron consisting of two hidden layers with ReLU activation, and an output layer with softmax activation. We performed hyperparameter tuning using Bayesian optimization on a reserved portion of the data totaling 200 data points. 14 We then performed 5-fold cross-validation on the rest of the sentences, averaging accuracies over the five folds. Results are shown in Tables 3A and 3B of the paper.
14 We used the BayesianOptimization package: https://github.com/fmfn/BayesianOptimization. Last accessed January 28, 2020.
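The cross-validation step can be illustrated with a minimal sketch. It is not the code used for the study: a nearest-centroid classifier stands in for the Multilayer Perceptron, 2-dimensional toy vectors stand in for the 768-dimensional BERT vectors, and all function names and data are hypothetical.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and deal them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train_nearest_centroid(vectors, labels):
    """Stand-in classifier (assumption): one centroid per label."""
    by_label = {}
    for vec, lab in zip(vectors, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: centroid(vecs) for lab, vecs in by_label.items()}

def predict(model, vec):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(model, key=lambda lab: dist(model[lab]))

def cross_validate(vectors, labels, k=5, seed=0):
    """Average held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(vectors), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(vectors)) if i not in held]
        model = train_nearest_centroid(
            [vectors[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, vectors[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Hypothetical, well-separated toy data: 20 'perceptual', 20 'non-perceptual'.
toy_vectors = ([[1.0 + 0.01 * i, 1.0] for i in range(20)]
               + [[-1.0 - 0.01 * i, -1.0] for i in range(20)])
toy_labels = ["perceptual"] * 20 + ["non-perceptual"] * 20
mean_accuracy = cross_validate(toy_vectors, toy_labels, k=5)
```

Each item is held out exactly once, so the averaged figure reflects performance on sentences the classifier did not see during training.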

Automatic deployment
We used our trained classifiers to automatically annotate all instances of 'see' and 'aware' in our two large corpora (BNC and SEP-IEP). However, non-curated data may contain phenomena unseen by our classifier, creating problems for its automatic deployment. Manual annotation of the BNC illustrated that it differs from our philosophical corpora in containing instances of 'see' that relate to phrasal verbs (e.g., 'to see something through') and other unclassifiable usages (cf. Appendix A). When trained over decidable instances only, the classifier is then forced to decide between the perceptual and non-perceptual classes and may wrongly assign a label to an unclassifiable instance.
To address this issue, we retrained our BNC classifier, including a third class corresponding to undecidable instances. As training sample, we used all instances of 'see' that obtained full agreement amongst the three annotators, including those where the label was marked as undecidable. This resulted in 887 training instances, including 624 perceptual, 246 non-perceptual, and 16 undecidable. When training a classifier over such a skewed distribution, the minority class is usually ignored at training time. We therefore applied memory replay to the minority class. Memory replay is used in machine learning to prevent forgetting of rare but potentially useful experiences (Schaul et al., 2016). In our implementation, we repeated each instance of the undecidable class 5 times in the training data. We retrained using the set of hyperparameters used for the two-class training. The classifier obtained an average of 94% accuracy over 5 folds. The addition of a third class thus increased the classifier's overall accuracy.
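The replay step described above can be sketched as simple oversampling. This is an illustration under stated assumptions rather than the implementation used: the function name and placeholder vectors are hypothetical, "repeated 5 times" is read as five copies in total, and only the class counts are taken from the text.

```python
from collections import Counter

def replay_minority(instances, minority_label, times=5):
    """Repeat every instance of the minority class so it occurs `times` times.

    `instances` is a list of (vector, label) pairs; other classes are untouched.
    """
    augmented = []
    for vector, label in instances:
        copies = times if label == minority_label else 1
        augmented.extend([(vector, label)] * copies)
    return augmented

# Hypothetical placeholder vectors; only the class counts follow the text.
training = ([([0.0], "perceptual")] * 624
            + [([1.0], "non-perceptual")] * 246
            + [([2.0], "undecidable")] * 16)
augmented = replay_minority(training, "undecidable", times=5)
counts = Counter(label for _, label in augmented)
```

On this reading, the undecidable class grows from 16 to 80 training instances while the two majority classes keep their original size.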
The BNC contains 150,305 instances of 'see' in its non-academic section, and 20,914 in the academic portion (ACPROSE). 15 We ran the three-class classifier separately on those two subsets, obtaining the results reported in Table 2 of the paper. For the non-academic portion of the corpus, we obtained 96,240 perceptual instances, 51,388 non-perceptual and 2,677 undecidable. The proportion of perceptual instances is thus 96240 / (96240 + 51388) = 0.65. For ACPROSE, we obtained 8,814 perceptual instances, 12,079 non-perceptual and 21 undecidable. The proportion is thus 8814 / (8814 + 12079) = 0.42.
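The two proportions can be recomputed from the reported counts. The helper name below is hypothetical; the calculation excludes undecidable instances from the denominator, as in the text.

```python
def perceptual_proportion(perceptual, non_perceptual):
    """Share of perceptual uses among decidable (two-class) instances."""
    return perceptual / (perceptual + non_perceptual)

# Counts as reported in the text; undecidable instances are left out.
non_academic = perceptual_proportion(96240, 51388)
acprose = perceptual_proportion(8814, 12079)
print(round(non_academic, 2), round(acprose, 2))  # prints: 0.65 0.42
```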

Cross-domain annotation
We report results from a cross-domain classification in Tables 4A and 4B of the paper. This exercise examined whether the latent features used to capture the perceptual / non-perceptual distinction in one data source are the same as the ones modelling the distinction in another data source. If so, we can infer that the distinction is fundamentally the same in both corpora.
Results from cross-domain classification indicate that for 'see', the BNC and SEP model each other very well: a classifier trained on the BNC achieves 96% accuracy on SEP (just two points below the in-domain classifier); a classifier trained on the SEP achieves 87% accuracy on the BNC (3 points below in-domain classification). Similarly, SEP and PHILO-P model each other well, with minimal losses compared to in-domain classification. Although results are less convincing when comparing the BNC and PHILO-P, accuracies remain well above baseline. For 'aware', both the BNC and SEP model PHILO-P accurately, and the SEP classifier also performs well on the BNC. There is a larger loss in performance when applying the BNC classifier to SEP-IEP, possibly because the BNC encounters only relatively few perceptual uses of 'aware' (21%) in the course of training. However, accuracy remains well above baseline. The PHILO-P classifier struggles in modelling the BNC and SEP data, but this is expected given the extreme class imbalance of the data it was trained on, and the small size of this data set. Barring mathematical issues due to class imbalance, we conclude that our trained classifiers are very robust in cross-domain settings.
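The logic of cross-domain evaluation can be sketched as follows: a classifier is trained on one data source and scored on another. This is a hedged illustration only: the nearest-centroid classifier is a stand-in for the Multilayer Perceptron, and the domain names, vectors, and function names are hypothetical.

```python
def train(vectors, labels):
    """Stand-in classifier (assumption): mean vector per label."""
    groups = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(vec)
    return {lab: [sum(c) / len(vs) for c in zip(*vs)]
            for lab, vs in groups.items()}

def accuracy(model, vectors, labels):
    """Fraction of items whose nearest centroid carries the gold label."""
    def nearest(vec):
        return min(model, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(vec, model[lab])))
    return sum(nearest(v) == l for v, l in zip(vectors, labels)) / len(labels)

def cross_domain_matrix(domains):
    """Train on each domain and test on every *other* domain.

    `domains` maps a name to (vectors, labels). In-domain scores are
    omitted because they need held-out data (e.g. cross-validation).
    """
    models = {name: train(*data) for name, data in domains.items()}
    return {(src, tgt): accuracy(models[src], *domains[tgt])
            for src in domains for tgt in domains if src != tgt}

# Hypothetical toy domains sharing the same underlying distinction.
domains = {
    "A": ([[1.0, 0.9], [1.2, 1.1], [-1.0, -1.1], [-0.8, -0.9]],
          ["perceptual", "perceptual", "non-perceptual", "non-perceptual"]),
    "B": ([[0.7, 1.3], [0.9, 0.8], [-1.2, -0.7], [-0.9, -1.3]],
          ["perceptual", "perceptual", "non-perceptual", "non-perceptual"]),
}
matrix = cross_domain_matrix(domains)
```

High off-diagonal accuracies, as in this toy case, are what one would expect if the same latent features separate the two classes in both sources.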

Appendix C - Experiment: Methods
This appendix contains further information about participant recruitment, instructions, and experimental materials for our main study.

Participant recruitment
To recruit academic philosophers as participants for the main study, invitations with links to the questionnaire were circulated through the Philos-L mailing list and the Daily Nous blog, and emailed to
15 A few hundred of these 172,643 instances were lost, as BERT cannot process very long sentences.
These items included no outliers (attracting mean ratings 3 SDs from the mean in any condition).

Appendix D - Experiment: Further analyses
Further analyses examined (1) demographic factors and (2) the linguistic salience bias hypothesis.

Demographic factors
Our sample of psychology undergraduates was predominantly female and young. Our two samples of academic philosophers were predominantly male, and (inevitably) older. These imbalances within and across participant samples motivated correlational analyses of gender and age.

Gender
We first included gender as a covariate in the 2x2x2x3 mixed-model ANOVA. This did not produce a main effect of gender, and gender did not interact with any of the other variables. We then computed bivariate correlations between gender and ratings in the eight within-subject conditions for the entire sample of participants (N=186). We found only one significant correlation, in the inconsistent-aware-visual condition, r(186)=-.27, p<.001, where males gave higher plausibility ratings (3.83 vs 3.40). For all other conditions, correlations were not significant (all p's >.27).
These findings suggest that gender did not influence responses. The lone correlation observed in the inconsistent-aware-visual condition may reflect the fact that most female participants (75%) were psychology undergraduates, whereas most male participants (85%) were academic philosophers: it may be an artefact of other, cognitive, differences between students and philosophers. This suggestion is supported by the fact that the inconsistent-aware-visual condition is the condition with the most pronounced difference in mean ratings between students and philosophers. To examine the suggestion, we considered correlations between gender and ratings in this condition separately, for undergraduates, PoPs, and Other Philosophers. The correlation observed in the whole sample disappeared (all p's >.16).
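The reasoning in the preceding paragraph can be illustrated with a minimal sketch: when group membership is confounded with gender, a correlation can appear in the pooled sample even though it is absent within each group. All data below are hypothetical and chosen only to make the pattern visible; the coding of gender and the function name are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: gender coded 0 = female, 1 = male; ratings on a 1-5 scale.
# Within each group, women and men give the same mean rating.
students_gender = [0, 0, 0, 0, 0, 0, 1, 1]
students_rating = [3, 4, 3, 4, 3, 4, 3, 4]
philosophers_gender = [0, 0, 1, 1, 1, 1, 1, 1]
philosophers_rating = [4, 5, 4, 5, 4, 5, 4, 5]

r_students = pearson(students_gender, students_rating)
r_philosophers = pearson(philosophers_gender, philosophers_rating)
r_pooled = pearson(students_gender + philosophers_gender,
                   students_rating + philosophers_rating)
```

In this toy case the within-group correlations are zero, while the pooled correlation is clearly positive, because the group with higher ratings is also the group with more male participants.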

Age
While all participants reported their gender, 4 (of 92) undergraduates, 6 (of 22) philosophers of perception, and 22 (of 72) other philosophers did not report their age, placing a caveat on our analyses.
We included age as a covariate in the 2x2x2x3 mixed-model ANOVA. We found age variously interacted with the within-subjects variables but did not produce a main effect (p=.20). We then computed bivariate correlations between age and ratings in the eight within-subject conditions for the entire sample (N=186). As was the case for gender, we found only one significant correlation, again in the inconsistent-aware-visual condition, r(154)=.30, p<.001, where older participants gave higher plausibility ratings. All remaining correlations were not significant (all p's >.05). Following up on the interactions, we finally ran the correlations for the three participant groups separately. There were two significant correlations, both in the Other Philosophers group: in the inconsistent-aware-visual condition, r(50)=-.30, p=.038, and in the inconsistent-aware-epistemic condition, r(50)=-.33, p=.020. The older these philosophers are, the less plausible they find items in these conditions. Arguably, the paucity of relevant data points from philosophers of perception (only 16 reported their age) and the restricted age range of the undergraduate sample prevented significant correlations in these groups.
Age thus correlates differently with ratings in the key inconsistent-aware-visual condition, in the whole sample (with a positive correlation) and the Other Philosophers group (negative correlation). This suggests that the positive correlation observed in the whole sample reflects another difference than age, between undergraduates and philosophers. It arguably reflects the same difference between these groups as the correlation between gender and ratings we previously found in only that condition. We suggested a relevant difference: Due to training or selection effects, philosophers will be better than undergraduates at suppressing contextually irrelevant default inferences (Sect. 3.1). This suppression ability influences ratings without interference from linguistic salience bias precisely in the two inconsistent aware conditions (Sect. 5.1), where Other Philosophers' ratings are negatively correlated with their age. Suppression ability is influenced by lexical and world knowledge, and by inhibition (Sect. 3.2). Our items described familiar situations with high-frequency words, so that any differences in such knowledge will be less relevant than differences in inhibition. Unlike such knowledge, fluid intelligence including inhibition declines in advanced age (Wang & Kaufmann, 1993; DeLuca et al., 2003). We submit this explains the observed correlations: Philosophers have higher suppression ability than undergraduates, but the presently most relevant component of this ability declines with age, while even older philosophers' suppression ability continues to exceed that of psychology undergraduates. In the condition in which this cognitive difference leads to the biggest difference in mean ratings between undergraduates, who are mostly young and female, and philosophers, who are mostly older and male, this cognitive difference due to selection and training leads to correlations of ratings with age and gender.

Linguistic salience bias
To assess whether the plausibility differences predicted by the linguistic salience bias hypothesis (SBH) translate into categorical differences between conditions, we conducted one-sample t-tests with a test value of '3' for mean ratings of each condition. Mean ratings significantly above this mid-value indicate that items were deemed distinctly plausible, mean ratings significantly below '3' indicate items were deemed distinctly implausible, and ratings not significantly different from '3' indicate items were deemed neither plausible nor implausible, but neutral. Findings displayed the same pattern across our three groups (psychology undergraduates, expert philosophers of perception, and other philosophers) (Table D-1): S-inconsistent 'see'-sentences with visual objects were deemed distinctly implausible. S-inconsistent 'see'-sentences with epistemic objects were deemed neutral. Items in all other conditions struck participants as distinctly plausible (with the one caveat below). For all groups, the key plausibility difference predicted by SBH was categorical: Whereas s-inconsistent 'see'-sentences with epistemic objects were deemed neutral, s-inconsistent 'aware'-sentences with epistemic objects were deemed distinctly plausible (with one caveat: philosophers of perception gave these 'aware'-sentences numerically higher mean ratings than psychology undergraduates, but the difference to mid-point remained just shy of significance, upon correction for multiple comparisons, due to the small size of the sample). Next, we examined whether the specific formulation of the cancellation phrase used for sentences with epistemic objects affected plausibility ratings. We used the phrases 'that lie(s) behind him/her' and '[that] s/he has turned from', in equal number. 
The latter supports an epistemic interpretation: The phrase implies the agent previously 'looked at' the patient (problem, etc.), that is (unpacking the familiar vision-cognition metaphor), previously thought about the problem, etc. Thus interpreted, 'turned from' implies that the agent knows there is, e.g., a problem. This implication will promote an epistemic interpretation of relevant items. It will also increase their plausibility over otherwise similar items with 'behind' (which imply nothing of the sort), provided the verb is given an epistemic reading. We therefore considered separately the mean ratings for items using these different cancellation phrases, per group. Indeed, all groups deemed both 'see'- and 'aware'-items with 'turned from' more plausible than items with 'behind' ( , pp.86-88; 2020, pp.429-30). It remains unclear why this difference failed to replicate in the present study. The only substantial difference between the studies is the delivery format: Previous studies were laboratory-based and simultaneously involved eye tracking. The present online questionnaire format may have led undergraduate participants to expend less effort and go along with default perceptual readings of 'aware of' (see Sect. 5.1), in the absence of further contextual support (beyond the abstract patient-noun) for an epistemic reading (offered by 'turned from', but not 'behind'). By contrast, Other Philosophers' ratings preserved the difference between 'see'- and 'aware'-items with both 'turned from' and 'behind' and are fully consistent with the key prediction from the SBH.
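The midpoint comparison reported at the start of this section can be illustrated with a minimal sketch. It is not our analysis script: the function names, the toy ratings and the critical value passed in as `t_crit` are assumptions made for illustration, and the sketch omits the correction for multiple comparisons. In practice one would obtain p-values from a statistics package (e.g., `scipy.stats.ttest_1samp`) rather than compare against a tabled critical value.

```python
import math
import statistics


def one_sample_t(ratings, test_value=3.0):
    """Return (t, df) for a one-sample t-test of the mean rating against test_value."""
    n = len(ratings)
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # sample SD (n - 1 denominator); must be non-zero
    t = (mean - test_value) / (sd / math.sqrt(n))
    return t, n - 1


def classify(ratings, t_crit, test_value=3.0):
    """Label a condition 'plausible', 'implausible' or 'neutral'.

    t_crit is the two-tailed critical t value for the relevant df and alpha,
    supplied by the caller (an assumption of this sketch).
    """
    t, _ = one_sample_t(ratings, test_value)
    if t > t_crit:
        return "plausible"
    if t < -t_crit:
        return "implausible"
    return "neutral"


# Toy ratings on a 5-point scale (hypothetical, not data from the study).
# 2.365 is the tabled two-tailed critical value for df = 7, alpha = .05.
high = [4, 5, 4, 5, 4, 4, 5, 4]
low = [1, 2, 1, 2, 1, 1, 2, 1]
mid = [2, 3, 4, 3, 2, 4, 3, 3]
labels = [classify(r, t_crit=2.365) for r in (high, low, mid)]
```

With these toy inputs, the three conditions fall on either side of the midpoint or on it, which mirrors the three-way categorisation (distinctly plausible, distinctly implausible, neutral) used above.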
Finally, we examined whether the suppression difficulties we observed can indeed be attributed to the effect of linguistic salience bias on the processing of subordinate uses of polysemous words. The alternative is that suppression difficulties of the same magnitude can also be observed where the verb is applied in its dominant sense to stereotype-incongruent situations, as in our s-inconsistent visual 'see' condition, where spatial inferences from visual uses of 'see' are cancelled by s-inconsistent sequels ('Chuck sees the spot on the wall behind him'). We therefore examined whether participants would make a higher number of incorrect judgments, potentially resulting from suppression difficulties, in the epistemic than in the visual inconsistent 'see' condition.

In both conditions, participants will attempt to come up with interpretations that render items true (e.g., Chuck sees the spot in a mirror) and assess how likely the resulting scenario is to obtain under circumstances inferable from the sentence context. In the visual condition, these scenarios are not very implausible but certainly less than plausible, so that, on a 'truth-making' interpretation, these items should attract ratings of '2' or '3'. But these interpretations require suppression of the spatial inference. Hence lower ratings, of '1', provide evidence of suppression difficulties that prevent a consistent and 'truth-making' interpretation of these s-inconsistent items.
For 'see'-items with epistemic objects, complete suppression of the spatial inference should yield a purely epistemic interpretation (e.g., Jack knows what problems he had in the past). Fischer and Engelhardt (2020, Appendix A) elicited ratings for paraphrases of epistemic items that make explicit this interpretation. Such paraphrases of s-inconsistent epistemic items attracted a mean rating of 4.03 (SD=.37, on a 5-point Likert scale). We can use this mean rating as a norm of correctness: Ratings below '4' (i.e., '1'-'3') provide evidence of suppression difficulties that prevented the intended interpretation.
To assess whether participants face suppression difficulties of the same magnitude in both cases, we therefore compared the proportions of 'too low' responses to s-inconsistent 'see'-items in the visual and the epistemic conditions. To control for the different number of relevant response options (1 vs 3), we multiplied the number of '1' responses to visual items by 3. Since our hypotheses predict no differences in the ratings of philosophers of perception and other philosophers for 'see'-items, we combined these two groups for the present analysis. Controlling for the difference in response options yielded, for our 92 undergraduates, a proportion of .35 of 'too low' ('1') responses to visual inconsistent 'see'-items vs .54 of 'too low' ('1'-'3') responses to epistemic counterparts. Considering proportions permits t-tests. A paired samples t-test showed this difference was significant, t(91) = 2.77, p = .007 [.025]. For our 94 philosophers, controlling for the difference in response options yielded a proportion of .59 of 'too low' responses to visual inconsistent 'see'-items vs .63 to epistemic counterparts, a non-significant difference, t(93) = .44, p = .66 [.05].
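The comparison of 'too low' proportions can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not our analysis script: we assume that proportions are computed per participant before the paired test, and the function names and toy values are hypothetical. Note that tripling the count of '1' responses can push an individual participant's adjusted proportion above 1.

```python
import math
import statistics


def too_low_proportions(visual, epistemic):
    """Per-participant proportions of 'too low' ratings (5-point scale).

    visual:    ratings of s-inconsistent visual 'see'-items; only '1' counts
               as too low, and the count is tripled to offset the 1-vs-3
               difference in relevant response options.
    epistemic: ratings of s-inconsistent epistemic 'see'-items; '1'-'3'
               count as too low.
    """
    visual_prop = 3 * sum(r == 1 for r in visual) / len(visual)
    epistemic_prop = sum(r <= 3 for r in epistemic) / len(epistemic)
    return visual_prop, epistemic_prop


def paired_t(xs, ys):
    """Return (t, df) for a paired-samples t-test on the differences ys - xs."""
    diffs = [y - x for x, y in zip(xs, ys)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# One hypothetical participant: two '1' ratings among six visual items,
# three ratings of '3' or lower among six epistemic items.
example = too_low_proportions([1, 2, 3, 1, 2, 3], [1, 2, 3, 4, 5, 4])

# Hypothetical per-participant proportions for four participants.
visual_props = [0.0, 0.5, 0.5, 1.0]
epistemic_props = [0.5, 0.5, 1.0, 1.0]
t_value, df = paired_t(visual_props, epistemic_props)
```

The degrees of freedom returned by `paired_t` equal the number of participants minus one, which matches the form of the reported statistics (t(91) for 92 undergraduates, t(93) for 94 philosophers).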
The asymmetry found in undergraduate judgments provides evidence that the processing of epistemic uses of 'see' involves extra suppression difficulties in addition to those involved in processing utterances applying the dominant sense to stereotype-deviant situations. The absence of this asymmetry in philosophers is mainly due to a larger proportion of '1' ratings for inconsistent 'see'-items with visual objects in philosophers' judgments. Since higher ratings for inconsistent visual 'aware'-items (as per H1) suggest that philosophers benefit from higher suppression ability than undergraduates, the larger proportion of '1' ratings for 'see'-items is arguably not due to suppression difficulties but may be a task artefact: The critical items make the contrast between 'see' and 'aware of' contextually salient, so that participants with higher verbal IQ may have concluded that, for the purposes of the experiment, 'aware'-items provide a 'correct' description of stereotype-deviant cases of seeing, and that 'see'-items are to be deemed 'incorrect' descriptions of such cases. We therefore infer from our undergraduate participants' ratings that the suppression difficulties we observed can indeed be attributed to the effect of linguistic salience bias on the processing of subordinate uses of polysemous words.