Imagine that your dog didn’t want to eat his food and he was lethargic. He wasn’t interested in going for a walk and didn’t even want his favorite toy. He is normally very hungry and loves to play, so you know something is wrong with your dog, but you don’t know exactly what. Who would you go to in order to find out what is wrong with your dog? Why would you go to see that person?

It seems likely you would go see your veterinarian, and for good reasons. The veterinarian has trained for years to know what to look for, has developed the skills to make good decisions, often rather quickly, and has the practice needed to apply that knowledge. Of course, veterinarians are not perfect—they sometimes make mistakes. But on average, they are able to display consistent, repeatable, and superior performance in taking care of your dog compared to the average non-veterinarian. In other words, your veterinarian has expertise and can display expert performance concerning your dog’s health (Ericsson et al., 2006, 2007, 2018; Cokely et al., 2018). This expertise likely makes the veterinarian’s medical judgment about your dog better than yours or your friends’ judgments. It also means that your veterinarian’s judgment is less likely to be influenced by factors that are extraneous to correct diagnosis and treatment (e.g., the veterinarian’s personality or mood). In this way, through training and deliberative practice, your veterinarian’s judgments about your dog’s health are likely to be consistently better than a non-veterinarian’s judgments.

One might think that something similar happens when one becomes a philosophical expert. On this line of thought, philosophical experts, through their training and deliberative practice, have better philosophically relevant intuitions than those without philosophical expertise. Philosophers may have a better grasp of the key concepts and arguments, or they may simply have more apt cognitive styles and strategies that make them more attuned to the key elements of scenarios. In turn, this knowledge or these cognitive styles may make their intuitions better, truer, and less susceptible to mistakes. This reasoning holds that philosophers, just like veterinarians, are therefore less likely to be influenced by extraneous factors like personality when making judgments in their area of expertise.

The reasoning expressed in the previous paragraph is consistent with what has come to be known as the Expertise Defense. The Expertise Defense holds that, through philosophers’ special training and abilities, extraneous features are less likely to influence their expert judgments. In this chapter, we review the Expertise Defense. We then go on to criticize the Expertise Defense in two ways. First, we provide arguments that the kind of expertise philosophers are likely to have is not likely to make some of their philosophical judgments better or less prone to problematic biases (for more on what exactly is problematic about these biases, see Chap. 6). Second, we provide direct evidence that at least some extraneous factors influence philosophers’ judgments in some paradigmatic examples. These two criticisms suggest that the Expertise Defense fails, at least as it pertains to philosophers’ reliance on intuitions for some central philosophical projects in some prominent philosophical areas.

The Expertise Defense

Perhaps the most common response to the empirical data about potentially problematic philosophical implications of variation in philosophically relevant intuitions is the Expertise Defense. The Expertise Defense has been articulated in various forms by several theorists (Sosa, 2007a; Kauppinen, 2007; Ludwig, 2007; Williamson, 2007, 2011). However, a common theme that unites all the various forms of the Expertise Defense is the basic notion that philosophers are different from non-philosophers in one very important way. Unlike non-philosophers, philosophers are experts about the area in which they work. Like most fields, philosophy has theories and terms that are nuanced and difficult to understand. Philosophers, compared to the folk, are likely to understand these nuances and theories better. That richer and more sophisticated understanding of those terms and theories would make it less likely that those who are experts would be prone to the same problematic biases or judgment tendencies as the folk. Of course, these expert philosophers can incorporate folk intuitions into their theorizing, including potentially problematic biases, but the philosophers themselves are not likely to display those same problematic biases. If the Expertise Defense is correct, then the kinds of associations between philosophically relevant intuitions and personality we’ve documented in Chaps. 2–4 may be interesting but are not that problematic for the practice of philosophy.

We think that we have made the case that for many philosophical projects, intuitions play important (perhaps irreplaceable) roles. As such, these general approaches constitute an important element in the tradition of philosophy. We take tradition seriously. While there are often good reasons to alter or reject tradition, we take the position that there needs to be good reason to do so. In other words, we accept that the burden of proof rests on us (see Horvath, 2010; Sosa, 2009; T. Williamson, 2011). That means we need to provide reasons why the philosophical tradition of giving intuitions a central role needs to be altered or abandoned.

At this point, a few words about what a burden of proof is and what we view as a reasonable assessment of when that burden has been satisfied are warranted. First, we take it as a truism that the burden of proof can be satisfied. Some theorists appear to set an exceptionally high bar. For example, Kauppinen observes that “the actual studies conducted so far have failed to rule out competence failures, performance failures, and the potential influence of pragmatic factors” (2007, p. 105). We agree. But empirical science simply cannot ever satisfy that burden—it is always possible that some other factor is responsible for observed effects. Empirical science just is not in the business of ruling out everything. Consequently, the burden has to be something less than that.

Some have attempted to satisfy this burden in one way or another (see, for some examples, below). Concerning the Expertise Defense, we adopt the position that the burden of proof has been satisfied when we demonstrate that expert philosophical intuitions vary similarly (or similarly problematically) to folk intuitions (J. M. Weinberg, Gonnerman, Buckner, & Alexander, 2010, p. 333). There is no need to satisfy the stronger claim that experts have exactly the same intuitions as the folk and are biased in exactly the same ways. After all, and consistent with our view presented below, expertise does matter in lots of domains and in many ways. And there is no need to claim that personality alone is responsible for variation in some philosophically relevant intuitions (a very implausible view). Rather, we think that all we need is evidence that personality is among the factors responsible for variation in philosophically relevant intuitions in expert philosophers. As such, one of the goals in this chapter is to provide evidence that expert philosophical intuitions are related to personality. If we achieve that goal, then we think we have largely discharged the burden of proof set out by defenders of intuitions in many philosophical projects.

Before we evaluate the Expertise Defense, we’ll start by offering a somewhat lengthy discussion concerning the scientific evidence about (a) what expertise is, (b) how people can acquire expertise, and (c) how we measure expertise. These three elements are essential to understanding what the Expertise Defense amounts to and whether we can tell if philosophers have the relevant expertise to deflect worries about their intuitions. Those not interested in an in-depth discussion of expertise and how it is developed can safely skip ahead to the summary of key points immediately preceding the section “Expertise in Philosophy.”

Decision Making

One useful framework in psychology holds that human judgment and decision making performance is often characterized by the interplay of “fast and slow” thinking processes, also often referred to as differences in Dual Systems (Kahneman, 2011). The idea is that “humans have, in effect, two separate minds” (Evans & Frankish, 2009). System 1 is said to be evolutionarily older and rapidly gives rise to intuitions and emotion. The other system, System 2, is evolutionarily newer and thought to be primarily involved in deliberative and coherent rational thought. More specifically, the evolutionarily older System 1 (i.e., “fast”) processes may typically involve high-capacity, fast, associative, parallel, unconscious, and automatic processes that give rise to intuitions and impressions. In contrast, System 2 (“slow”) processes are thought to be more distinctively human and generally tend to be slower, effortful, serial (i.e., unfolding one step at a time), and conscious, involving rule-based processes that are demanding of working memory and executive functions (e.g., activity in the dorsolateral prefrontal cortex of the brain). For current purposes, and in accord with most available data, the Systems are often characterized as having a default-interventionist architecture: System 1 generates intuitions based on past experience, associations, and emotions, while System 2 then monitors and potentially corrects or modifies those intuitions with logic or deliberation, assuming sufficient attentional resources and motivation (e.g., when one is not stressed, checked out, or thinking about too many things; see Kahneman, 2003, 2011).

The Dual Systems approach has been widely adopted, connecting research in most subfields of psychology as well as neuroscience, economics, and philosophy (Kahneman, 2003; Stanovich & West, 2000). The evidence that human cognition can be efficiently characterized by differences in automatic (e.g., intuitive) and deliberative processes is well-established and has been for about four decades (e.g., automatic v. controlled processes; Shiffrin & Schneider, 1977). Of course, there are some serious concerns about the specific instantiations of the dual systems theory and its predictive validity (Cokely, 2009; Gigerenzer & Regier, 1996; Moshman, 2000; Newstead, 2000; Newell, 1973; Osman, 2004). Nevertheless, the framework has proven very popular and useful in the decision sciences because of its broad explanatory power: Even though the theory does not allow for many specific predictions, it does help organize and interpret a wide range of results.

Dual Systems theory has been put to extensive use to explain the link between domain-general cognitive abilities and normatively superior (i.e., high-quality) decision making. To be clear, domain-general cognitive abilities refer to abilities, like one’s attentional control (e.g., working-memory capacity; Cokely, Kelley, & Gilchrist, 2006), that tend to be at least a little beneficial on lots of tasks (e.g., a person who is better able to regulate their attention can do so on tasks at work and on tasks at home). In contrast, domain-specific abilities like expertise tend to be profoundly beneficial for very specific tasks only (e.g., a chess master is excellent at chess but will be no better than other amateurs when presented with a different strategy game like poker). Research shows that domain-general cognitive abilities, including intelligence, statistical numeracy, working memory, attentional control, and others, tend to predict more normative judgment and decision making in classical heuristics and biases tasks (i.e., abstract laboratory tasks like choosing between risky gambles; see Cokely & Kelley, 2009; Cokely, Feltz, Ghazal, Allan, Petrova, & Garcia-Retamero, 2018).Footnote 1

In theory, the relationship between general abilities and superior decision making reflects differences in the interplay of System 1 and System 2 processes. More intelligent people are more likely to use System 2 to monitor and correct the output of System 1, or else they may disregard biased intuitions altogether and use normative rule-based processes to calculate answers. For example, when faced with a risky prospect (e.g., the choice between two gambles), individuals with higher levels of attentional control tend to make more correct choices. That is, participants tend to act as-if they weight and integrate the available information in accord with an expected value model (i.e., multiply value by probability and select the option that will on average offer the highest expected payoff). Interestingly, however, research shows that even in highly simplified, paradigmatic tasks “smarter” people don’t tend to use more normative processes to make better judgments and decisions. Instead, System 2 processes appear to reflect qualitative and quantitative differences in simple deliberative heuristic search and problem understanding (i.e., elaborative encoding of stimuli) (Cokely & Kelley, 2009; see also Barton et al., 2009; Woller-Carter et al., 2012; and for examples in expertise, see Moxley, Ericsson, Charness, & Krampe, 2012). Theoretically, the links between domain-general abilities and rational decisions reflect a host of early selection metacognitive processes (i.e., thinking about thinking; Cokely & Kelley, 2009; Flavell, 1979).
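To make the expected value benchmark concrete, here is a minimal sketch of the as-if calculation described above; the two gambles and their payoffs are invented for illustration and are not drawn from any study discussed here.

```python
# Minimal sketch of an as-if expected value comparison between two gambles.
# The gambles and payoffs below are illustrative, not taken from any study cited above.

def expected_value(gamble):
    """Sum of probability-weighted outcomes: EV = sum of p * x over all outcomes."""
    return sum(p * x for p, x in gamble)

# Each gamble is a list of (probability, payoff) pairs.
gamble_a = [(0.80, 40), (0.20, 0)]   # 80% chance of $40, otherwise nothing -> EV = $32
gamble_b = [(0.25, 100), (0.75, 0)]  # 25% chance of $100, otherwise nothing -> EV = $25

choice = "A" if expected_value(gamble_a) > expected_value(gamble_b) else "B"
print(f"EV(A) = {expected_value(gamble_a):.2f}, EV(B) = {expected_value(gamble_b):.2f}; choose {choice}")
```

On this benchmark, acting as-if one computes and compares these sums counts as the normative choice process, whatever mental process actually produces the choice.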

What Is a Good Decision?

The emergence of the modern scientific debate on human rationality, or how people make decisions and what qualifies as a good decision, can be traced in large part to the Ages of Reason and Enlightenment (i.e. seventeenth and eighteenth centuries, respectively). During these times logic and careful, justifiable reasoning became highly prized by philosophers, empiricists, and political actors alike. As an example, consider the astronomer and physicist Pierre-Simon Laplace. Laplace’s legacy includes seminal contributions to probability theory; however, more important for our purposes, he also provided a description of a fictional omniscient being that captured the Zeitgeist of the times. This being, known as Laplace’s superintelligence, was envisioned as one who would know all the details of past and present and with this knowledge could readily make good choices and predict the future with perfect certainty (Gigerenzer, 2006).

For many people, Laplace’s vision of a decision maker who is omniscient and computationally unbounded may seem like an elaborate fantasy. Yet this fantasy, or some version of it, is fundamental to much of the research and theory in the modern economic, cognitive, and decision sciences. Some readers will find this surprising, or ironically unreasonable, but models of “rational man” and homo economicus are among the most central and influential models used in the allied decision and risk sciences. According to neo-classical economic theory, people behave as-if they are unboundedly rational and make optimal (but not necessarily perfect) choices—choosing as-if they have solved a complicated decision calculus (Hastie, 2001; Shafir & Tversky, 1995). These decisions can be described by optimization processes that reflect people’s maximization of their own subjective expected utilities (i.e., personal values) via multi-attribute integration calculations wherein one optimally weights and integrates all available information in the light of one’s values and other risks or uncertainties. As a simplification for illustration, one could list every possible pro and con for a certain decision, weight each pro or con according to one’s values (subjective utility), multiply those values by the probability of occurrence, and then integrate the information optimally (e.g., linear integration as in linear regression). Such theories are at the core of dozens of models of decision making, including modern theories on diverse topics in motivation, attitudes, and moral judgments (Gigerenzer & Selten, 2001; Gigerenzer et al., 1999; Weirich, 2004). Although this approach has provided interesting and useful theory, these models often conflict with empirical evidence: psychological science has clearly demonstrated that even though people act as-if they perform a complicated decision calculus, this is not how real people with limited resources (e.g., time, attention, memory) use information to make decisions (Gigerenzer, Todd, and the ABC Research Group, 1999; Kahneman, Slovic, & Tversky, 1982; Payne, Bettman, & Johnson, 1992, 1993; Shafir & Tversky, 1995). Indeed, many decisions are so computationally complex or underspecified that optimization would not be possible for any person or known machine (Gigerenzer et al., 1999).
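For illustration, the pro/con calculus just described can be sketched in a few lines; the decision, attributes, utilities, and probabilities below are hypothetical and are chosen only to show the kind of weighted-additive integration the as-if models assume.

```python
# Minimal sketch of the as-if "list every pro and con, weight, and integrate" calculus.
# The decision, attributes, utilities, and probabilities are hypothetical.

def weighted_additive_value(attributes):
    """Linear integration: sum over attributes of subjective utility * probability of occurrence."""
    return sum(utility * probability for utility, probability in attributes.values())

# Hypothetical pros and cons of accepting a job offer: (subjective utility, probability).
job_offer = {
    "higher salary":       (+8, 0.95),
    "longer commute":      (-5, 1.00),
    "chance of promotion": (+6, 0.40),
    "risk of layoffs":     (-9, 0.10),
}

total = weighted_additive_value(job_offer)
print(f"Integrated value of accepting: {total:.2f}")  # positive favors accepting, negative favors declining
```

The point of the sketch is only to show what the optimization story assumes people do as-if; as the rest of this section explains, it is not a claim about the processes real decision makers actually use.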

In the mid-twentieth century, Herbert Simon (1955, 1990) introduced his notion of bounded rationality. Simon argued that people have only limited time, knowledge, and cognitive resources and thus human decision makers cannot carry out the types of optimization computations that were (and still are) often assumed to be essential to rational decision making. Instead, Simon argued that effective decision making must often involve heuristics, which can be less formally described as simple rules of thumb (i.e., non-optimizing decision processes with non-exhaustive search processes; Simon, 1990; for a computationally precise modern extension of this program, see Gigerenzer et al., 1999). In the 1970s, Daniel Kahneman and Amos Tversky carried related ideas forward with the acclaimed heuristics and biases research program (Kahneman et al., 1982; Kahneman & Tversky, 2000; Tversky & Kahneman, 1974). This research program provided a huge body of evidence showing that people often relied on a handful of heuristics that led to biases. Note that a bias is technically defined as a tendency—e.g., most people have a right-hand bias for most activities like writing—it is not necessarily synonymous with error but can be associated with errors under specific conditions. Interestingly, however, the heuristics and biases program focused extensively on biases that led to non-normative errors. While this provided vivid and illustrative examples, it also led to some confusion because identifying normative errors required normative assumptions and justifications about the appropriate standards for an accurate or good judgment, something that is not without controversy (Anderson, 1991; Gigerenzer & Goldstein, 1996).

In the case of the heuristics and biases approach, it was assumed that human cognition should be compared to a very specific set of context-free normative standards such as the outcomes of “rational” optimization processes and logic. Thus, non-normative errors are said to be evidenced when people’s judgments deviate from “an established fact…[or] an accepted rule of arithmetic, logic, or statistics” (Kahneman & Tversky, 1982, p. 493). For example, when asked “which city is farther north: New York or Rome?” many people confidently respond New York, even though it is incorrect. Similarly, when a doctor says “95% of patients who are treated survive,” people tend to feel much more optimistic about surgical outcomes than when a doctor says “5% of patients who are treated don’t survive.” Theoretically, the same information is provided, yet differences in framing produce dramatic differences in intuitions and biases—a non-normative, non-logical difference. To further illustrate with one of the most influential findings to emerge from the heuristics and biases research program, consider the model of how people value risky prospects—i.e., prospect theory. A technical description is beyond our current scope, but a key component is reflected in the fact that people tend to prefer receiving $100 for certain when compared to a 75% chance of winning $200, yet paradoxically they prefer a 75% chance of losing $200 to a certain $100 loss. Theoretically, when faced with multiple lotteries such as these, the normative decision is one that simply calculates the expected value of the two prospects by multiplying the probability by the potential outcome and comparing the choices. Thus, we are comparing two prospects, one worth $150 on average (i.e., 75% of $200 = $150) and another worth $100 on average. Accordingly, it is rational, on average, to prefer the risky option (75% chance of $200) for gains but not for losses, even though most people prefer the exact opposite. To simplify, people act as-if losses loom larger than equivalent gains: The subjective joy one receives from gaining $100 pales in comparison to the subjective pain of losing an equivalent amount, hence the pattern of risk aversion for gains and risk preference for losses (i.e., losses hurt almost three times more than the joy one experiences from an equivalent gain).
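A minimal sketch of the expected value arithmetic behind this example follows; the prospects are the ones described in the text, and the preference pattern noted in the comments is the typical empirical pattern, not an output of the calculation.

```python
# Minimal sketch of the expected value arithmetic for the gain and loss prospects in the text.
# The "most people prefer" notes describe the typical empirical pattern, not the EV results.

def expected_value(probability, outcome):
    """EV of a single-outcome prospect (the alternative outcome is $0)."""
    return probability * outcome

# Gains: a sure $100 versus a 75% chance of winning $200.
print(expected_value(1.00, 100), expected_value(0.75, 200))    # 100.0 vs. 150.0 -> the gamble is worth
                                                               # more on average, yet most people take
                                                               # the sure $100 (risk aversion for gains).

# Losses: a sure $100 loss versus a 75% chance of losing $200.
print(expected_value(1.00, -100), expected_value(0.75, -200))  # -100.0 vs. -150.0 -> the gamble is worse
                                                               # on average, yet most people take the
                                                               # gamble (risk seeking for losses).
```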

The impact of the research on heuristics and biases is hard to overestimate, having influenced our fundamental understanding of human psychology and behavior across many domains including medicine, finance, business, economics, and law. Nevertheless, despite its many successes, the heuristics and biases program has some notable limitations. One of the most serious concerns is that the program has emphasized ways in which heuristics are associated with errors, which has led some to argue that heuristic use is a problem that needs to be corrected. In this light, heuristics are seen as inferior or second-best choice processes designed to be used by computationally disadvantaged individuals. In contrast, research demonstrates that heuristics are often powerful tools that can lead to superior decision making in humans, animals, and machines, particularly under conditions of high complexity or uncertainty, as are present in many everyday decisions (Gigerenzer, 2008; Simon, 1990). Other concerns focus on the fact that when more representative materials are provided, many biases go away (e.g., Gigerenzer, 2001). Still other work has emphasized important differences in the criteria used to evaluate judgment and decision making, including coherence (e.g., logic and calculation) versus correspondence (e.g., predictive validity in natural environments). That is, some violations of neo-classical notions of rationality also appear to result from strategies that are very well adapted to real-world task requirements (McKenzie, 2003; Hammond, 2000). Setting this issue of the appropriate standard to the side, what is more central to our current review is the nature and interplay of the intuitive and deliberative cognitive processes that are thought to give rise to more and less rational judgments and decisions.

To further illustrate, consider an example. In manufacturing, one can improve the quality of goods sent to market by (a) improving inputs (e.g., a more skilled workforce), (b) improving outputs (e.g., careful inspection and repair), or (c) doing both. In the metacognition literature, these quality control efforts are referred to in terms of (a) early selection versus (b) late correction processing (Jacoby, Kelley, & McElree, 1999; Jacoby, Shimizu, Daniels, & Rhodes, 2005). Late correction processes (e.g., System 2) attempt to detect and repair the output of faulty automatic processes (e.g., System 1), such as biased intuitions. In contrast, early selection can use controlled processing (e.g., System 2) to generate goals, strategies, and mental contexts that qualitatively alter the output of automatic processes (e.g., System 1) before biased intuitions are generated. Research suggests that early selection processes may be key factors that influence a wide range of behaviors, including performance on intelligence tests themselves. Individuals who score higher on domain-general cognitive ability measures tend to spend more time preparing for tasks and also more elaborately and strategically encode information, deliberatively building cognitive representations that provide better support during subsequent task performance (Baron, 1978, 1985; Cokely & Kelley, 2009; Ericsson & Kintsch, 1995). To the extent that early selection cognitive control processes are recruited, they involve deliberate memory encoding. This elaborative encoding causes information in working memory to be moved to long-term memory, freeing up attentional resources and creating more enduring and detailed mnemonic representations (Cokely, Kelley, & Gilchrist, 2006). In laboratory tasks, this tends to produce better task performance because better representations give rise to better intuitions and to a better ability to monitor performance. However, these same types of metacognitive and deliberative efforts are also the processes that give rise, over time, to domain-specific expertise. Indeed, the Knowledge is Power account of Skilled Decision Theory suggests that the primary reason most people make better decisions is not that they override their intuitions but that they educate them (i.e., using System 2 to educate and refine System 1 so that intuitions are naturally more informed and less biased; Cokely et al., 2018; see also Cho et al., 2024).

Acquiring Expertise

Sometimes people find it surprising, but today there is wide agreement among the scientists who have studied expertise that experts are always made, never born: Without exception, no matter how talented someone may be, they will need to practice deliberately for many years before they can become a verifiable expert performer. Research also shows that standardized general ability tests and genetic markers consistently fail to predict individual differences in expert performance, such that there is no correlation between intelligence and skilled performance in fields such as chess, music, sports, and medicine. Among typical and healthy individuals, two of the only innate differences that have been found to reliably predict success are height and weight, which matter to relatively few professions. So what does predict success? To put it simply, deliberate practice and access to valuable resources.

A considerable body of research has been devoted to studying the acquisition of expert performance, including the mechanisms that give rise to expert performance more generally (for a practically comprehensive treatment of relevant findings, we refer the interested reader to the Cambridge Handbook of Expertise and Expert Performance; Ericsson et al., 2006). Among the core findings of this research is that expertise doesn’t just require practice; rather, it requires a particular kind of practice known as deliberate practice, which primarily involves specific and sustained efforts at doing something one couldn’t do before. Research also shows expertise requires a great amount of deliberate practice. All expert performers, including the most gifted or talented, need a minimum of about ten years (or 10,000 hours) of intense training before they succeed at the highest levels, such as winning international competitions, which is an important criterion for high levels of expert performance. Of course, in some fields the apprenticeship is even longer. The most elite musicians often require an average of 20–30 years of steady practice in order to succeed at the international level. The development of verifiable expert performance also requires specific kinds of environments. For example, Bloom’s (1985) landmark study suggests that elite performers often study with devoted teachers and tend to be supported enthusiastically by their family and relatives throughout their developing years. More than this, however, experts need to be in a learning environment that is not systematically biased, and they need accurate and timely feedback on their performance. Without feedback one cannot learn. And if one does not learn, one never improves the quality of one’s intuitions.

Several recent landmark studies have transformed our understanding of the causes and consequences of general decision making skill. Beyond informing contemporary theory and policy, these findings speak to long-standing debates about the association between general intelligence and decision making. For more than a century major assumptions about the nature of this link have shaped debates about the causes and consequences of class structure, which appear to have affected opportunity in many ways. A theoretical question at the heart of these debates is whether decision making ability is primarily determined by the basic, innate cognitive abilities of normal healthy people. At one extreme we know the answer: Expert performers who engage in extensive deliberate practice are consistently able to circumvent limitations imposed by basic, innate cognitive abilities thanks to long-term working-memory resources that support superior decision making within their domain of expertise (Ericsson & Kintsch, 1995; Ericsson, Prietula, & Cokely, 2007). But still, a persistent refrain is that innate, general cognitive abilities (e.g., intelligence) are important predictors of better decisions and in some instances set upper bounds of what a person can achieve cognitively.

To further illustrate this perspective, consider data from a related seminal contribution entitled Cognitive reflection and decision making (Frederick, 2005). In this research, Frederick showed that participants with higher general cognitive ability scores tended to act as-if they avoided fundamental biases similar to the framing effects previously discussed, weighting and integrating the available information in accord with an expected value model (i.e., multiplying the value by the probability). All participants answered tricky yet rudimentary math-type questions that often bias people toward incorrect but intuitively appealing answers (e.g., “if a bat and a ball cost $1.10, and the bat costs $1.00 more than the ball, how much does the ball cost?”—hint: it’s not 10 cents). Those who answered these kinds of questions incorrectly tended to show marked asymmetries in their evaluation and selection of risky prospects (e.g., risk seeking for losses yet risk aversion for gains). Although Frederick was cautious with his theoretical interpretations, converging evidence indicates that his assessment of cognitive impulsivity also predicted steeper rates of delay discounting on intertemporal choices (e.g., would you prefer $300 now or $400 next month?), which was compelling for many reasons.
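The bat-and-ball item can be checked with a few lines of arithmetic; the sketch below simply solves the two stated constraints and shows why the intuitive answer fails.

```python
# Worked check of the bat-and-ball item: the intuitive "10 cents" answer violates the
# stated constraints, while 5 cents satisfies them.

total_cost = 1.10        # bat + ball, in dollars
price_difference = 1.00  # the bat costs $1.00 more than the ball

# Solve: ball + bat = 1.10 and bat = ball + 1.00
ball = (total_cost - price_difference) / 2  # = 0.05
bat = ball + price_difference               # = 1.05

print(f"ball = ${ball:.2f}, bat = ${bat:.2f}, total = ${ball + bat:.2f}")
# Intuitive answer: a 10-cent ball implies a $1.10 bat, for a total of $1.20, not $1.10.
```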

Likewise, the groundbreaking work of Stanovich and West (1998, 2000, 2008; see also Toplak, West, & Stanovich, 2011), Frederick (2005; Kahneman & Frederick, 2007), and others tells the story of the state of the science at that time. The emerging leading perspective from the relatively new field of decision psychology was that intelligent people generally tended to make more intelligent decisions because they possessed the special capacities needed to override non-rational, emotional, or intuitive impressions in support of complex logical and formal decision analyses. In some sense this was an efficient hypothesis to start with because, among other virtues, it was a simple explanation in accordance with economic assumptions. Nevertheless, it largely appears to be wrong.

Deliberation Is for Understanding

Building on the work of Peters, Baron, Reyna, and many others, Cokely and Kelley (2009) conducted the first study to directly map the relations between decision strategies, basic cognitive abilities, and superior decision making under risk. Using choice outcome modeling, decision latencies (e.g., reaction time), and retrospective verbal protocol analysis (Ericsson & Simon, 1984; Fox, Ericsson, & Best, 2011), they assessed and modeled how individuals with higher cognitive ability scores (i.e., working memory, numeracy, and cognitive reflection) typically made superior decisions when evaluating paradigmatic risky prospects (i.e., lotteries). Despite the paradoxical findings from Peters et al. (2006) indicating that sometimes more skilled decision makers were more biased, dual systems theory suggested that more cognitively able individuals might generally make better decisions under risk by inhibiting affective responses and generating abstract and logical decision analyses (e.g., Evans & Frankish, 2009; Kahneman & Tversky, 2000; Kahneman, 2003). However, retrospective protocol analyses, wherein participants reconstructed their decision strategies after their decisions had been made, indicated that less than 5% of the sample attempted to calculate expected values during decision making. Instead, the vast majority of people made superior risky decisions because they tended to deliberate more, such that the ability-to-performance relationship was fully mediated by large differences in affective and elaborative evaluation and understanding (i.e., representing relations between feelings, thoughts, and consequences in personally relevant narratives).

The results of Cokely and Kelley (2009) showed that even when evaluating very simple risky prospects, superior decision making under risk generally followed from differences in how and how much participants thought about and understood the decision problem (see also Pachur & Galesic, 2013). For example, better decision makers spent more time imagining how changes in wealth would affect their life and how those changes might feel via informal narratives (e.g., “even though that’s probably never going to happen it really is more money than I pay in tuition so I can’t take the risk”). Generally, better decisions also appeared to reflect metacognitive heuristics that offer simple strategies for understanding and exploring thoughts and feelings, such as disconfirming (e.g., identifying multiple reasons for and against a decision), reframing (e.g., considering potential outcomes framed in terms of costs as well as potential benefits), forecasting (e.g., more elaborately exploring how potential consequences would feel for various stakeholders and why), prioritizing (e.g., reflecting on their own assumptions about what their goal was, why it was their goal, and what their top priority goals should be), and re-checking (e.g., transforming probabilities, re-reading, and organizing facts and assumptions).Footnote 2 Moreover, deliberation as measured either by number of considerations or by decision latency predicted superior decision making much better than any (and every) other combination of cognitive ability test scores. Among these relatively typical public college students, decision making quality was much better explained by deliberative heuristic strategies and representative understanding than by cognitive ability profiles (e.g., cognitive impulsivity, attentional control) or logical formal decision analyses (e.g., expected utility). Indeed, some of the least “able” individuals were nevertheless among the best decision makers, reflecting their extensive and personally meaningful heuristic deliberation.

Theoretically, the relations between decision making skill and deliberative heuristic search reflect a host of metacognitive processes that are essential for understanding and contextualizing the decision problem (Cokely & Kelley, 2009; Cokely et al., 2012; Ghazal et al., 2014; Garcia-Retamero & Cokely, 2013a, 2013b, 2013c; see also Peters et al., 2006; Peters, 2012; Reyna et al., 2009; Reyna, 2004). This is useful in part because a more representative understanding of risks and trade-offs means that decision heuristics are better informed (e.g., accurate assessment of cue validities and the magnitude of stakes). In the same way that extensive knowledge and practice allow expert performers to quickly make superior decisions in routine situations by considering only a small number of cues (Shanteau, 1988, 1992), heuristic deliberation helps people identify the most essential information and trade-offs that take priority for heuristic decision making. Essentially, elaborative deliberation serves as a means of contextualizing risks and consequences in personally meaningful terms, which helps people intuitively feel the weight of various options and stakes without expressly creating or solving a formal econometric analysis.

Theoretically, the processes that support general decision making skill (and risk literacy) are the same as those that give rise to complex situation model development during reading comprehension and those that maintain the high-fidelity situation awareness that often characterizes expert performance. The common thread is that skilled decision making isn’t usually limited by basic cognitive abilities because the development of an integrated understanding engages long-term working-memory capacities (Ericsson & Kintsch, 1995). In turn, long-term working memory functionally expands one’s reasoning capacity far beyond what could be supported by basic cognitive capacities alone (e.g., attentional control). In effect, because decision makers have a vast expert-like knowledge of themselves (e.g., experiences, values, and preferences), personally meaningful heuristic deliberation enables fast and durable long-term memory encoding and representation of complex constellations of relevant risks, rewards, and trade-offs. That said, even if nearly anyone can functionally circumvent limitations imposed by basic cognitive abilities like intelligence by utilizing their ultra-high-capacity long-term working-memory resources, accurately evaluating risk still requires that people have specialized risk literacy skills, which is a topic we’ll save for another time (e.g., for recent reviews, see Cokely et al., 2012, 2013, 2014, 2018; see also Allan et al., 2017a, 2017b; Barton et al., 2009; Cho et al., 2024; Ellis et al., 2014; Garcia-Retamero & Cokely, 2011, 2012, 2013a, 2013b, 2013c, 2014a, 2014b, 2015a, 2015b, 2017; Garcia-Retamero et al., 2012, 2014, 2015, 2016a, 2016b, 2019a, 2019b; Garrido et al., 2021; Ghazal et al., 2014; Keller et al., 2010; Merritt et al., 2010; Okan et al., 2012a, 2012b, 2015, 2018; Petrova et al., 2015, 2017, 2018, 2023; Petushek et al., 2014, 2015a, 2015b; Ramasubramanian et al., 2019; Raza et al., 2019, 2023; Salehi et al., 2018; Wong et al., 2010; Woller-Carter et al., 2012; Ybarra et al., 2017).

Based on the literature reviewed in the past three sections, there are three points that are most relevant to assessing whether philosophers have expertise and, if so, what kind of expertise it is likely to be.

  1. Experts are always made and not born. The principle applies to philosophical expertise as well. Nobody is born a philosophical expert. Acquisition of philosophical expertise requires prolonged, deliberate practice.

  2. Innate cognitive abilities (e.g., intelligence) can’t explain and are not necessary for expert performance. Again, this principle applies to philosophers. If there is philosophical expertise, being a philosophical expert does not require or entail that one possesses a rare level of general cognitive abilities (e.g., intelligence) that non-experts are unlikely to possess.

  3. Expert judgment and decision making primarily result from differences in knowledge and skills, which enable long-term working memory to support fast, durable, and complex mental representations and processes. Again, this principle applies to philosophers. If philosophical expertise exists, it will primarily reflect the acquisition of specialized skills and knowledge that will allow philosophers to use long-term working memory (instead of relying on limited short-term memory resources) to conceptualize, reason, and think in highly sophisticated ways.

Given these three principles, we can help answer the following questions: Is there philosophical expertise, and if there is, what is it and does it provide support for the Expertise Defense? In the next two sections, we review some evidence relevant to assessing philosophical expertise characterized by points 1–3.

Expertise in Philosophy

Indirect Strategies

Broadly speaking, there are two general strategies that one could use to determine the strength of the Expertise Defense—indirect and direct strategies. The first we will discuss are indirect strategies. Indirect strategies generally attempt to identify some of the key markers of expertise or of the conditions under which expertise develops. Those who adopt an indirect strategy then argue that philosophy lacks some of those key markers or developmental conditions. Given the lack of some of these key elements of expertise, we would not expect philosophers to have the relevant kind of expertise to deflect the worries raised by some results from experimental philosophy. In those cases where the relevant expertise is lacking, we would not expect philosophers to have qualitatively better intuitions (e.g., better early selection of intuitions or better late correction of intuitions).

J. M. Weinberg et al. (2010) have adopted indirect strategies to argue against the Expertise Defense in ways that largely mirror how we have characterized the identification and development of expertise above. To illustrate one way expertise development in philosophy differs from other areas, we’ll look at one of their examples—the kind of feedback that is provided. As reviewed above, feedback is one required element for developing expertise. For example, when piloting an airplane, it is clear when one makes a mistake. One goes off course, crashes the airplane, violates flight plans, etc. These mistakes are often not only evident but also not temporally remote from the action, both of which are important for being able to recognize and learn from the feedback that is given. But in many philosophical domains and debates, that kind of unequivocal evidence and immediate feedback is lacking. Except for some cases in logic or issues concerning factual knowledge (e.g., historical dates, specific examples, specific vocabulary or cases), our perspective is that it is rare for a philosophical view to be seen as simply mistaken by all. For example, is the Justified True Belief account of knowledge wrong? Do arguments with “mistakes” that involve the Justified True Belief account of knowledge have the same kinds of feedback mechanisms as those available to the airplane pilot? It appears that, for the most part, the answer to both questions is “no.” If that is right, then the proponent of the indirect strategy argues that philosophers are simply not likely to get the right kinds of feedback to make their intuitions qualitatively better and immune to the effects documented in experimental philosophy (including personality’s relation to some philosophically relevant intuitions).Footnote 3 Hence, philosophy doesn’t provide the right kind of feedback to develop the relevant expertise. Since the relevant feedback is often lacking, expertise is not likely to be developed. So, if there is philosophical expertise, it is not of the relevant kind to help support the Expertise Defense.

While indirect strategies suggest that philosophy might not provide the right kind of feedback to develop the relevant philosophical expertise, it would be desirable to have some evidence that philosophers lack the relevant expertise to support the Expertise Defense. In other words, it would be desirable to have actual, empirical evidence that philosophers display the same (or similarly problematic) biases that non-experts do. To provide this evidence, one needs to adopt a direct strategy. Evidence provided by direct strategies is the focus of the next section.

Direct Strategies

Direct strategies are similar to indirect strategies in that they attempt to provide evidence that the kind of expertise philosophers are likely to have is not the right kind of expertise to eliminate effects such as personality’s relation to intuitions. But there is an important difference between direct and indirect strategies. Whereas indirect strategies try to draw connections between philosophy and other expertise domains, direct strategies try to show that philosophers have the same (or similarly problematic) biases, intuitions, or judgment tendencies as the philosophically naïve.Footnote 4 To do so, many researchers have started to document these biases, intuitions, or judgment tendencies in philosophers. There is gathering evidence that experts sometimes behave in much the same way as the folk. In this section, we review several attempts to directly assess the Expertise Defense. These studies provide evidence that philosophical training is likely to matter, at least in the sense that philosophical training appears to be related to different, more reflective cognitive styles, to argument and evidence evaluation, and to a tendency to gravitate toward some philosophical positions. However, expertise does not generally remove or reduce the effects of at least some problematic biases in philosophers (perhaps similar to high cognitive abilities not being related to better decisions in chess). Consequently, to the extent that similar problematic biases are found in philosophical experts, the Expertise Defense fails.

To help contextualize direct strategies, it is important to have a sense of what kind of expertise philosophers are likely to have and whether that is the relevant kind of expertise to support the Expertise Defense. To illustrate, take arguments about a specific kind of expertise—moral expertise. There have been a number of arguments about whether moral expertise can even exist, and if it does, what it is (pace our discussion of philosophical expertise). Some extreme views suggest that there is no moral expertise of any kind (Ayer, 1954; Broad, 1952; Ryle, 1957). Subsequent criticisms and defenses of moral expertise have been more nuanced, largely by specifying the kinds of expertise that moral experts could have and the ways in which those moral skills could be actualized.

Peter Singer (1972) has detailed several ways in which one could become a moral expert (and these sentiments have been echoed by others, e.g., Archard (2011); Crosthwaite (1995); Hare (1989)). According to Singer, moral experts have special resources with respect to the following (similar to points 1–3 in the previous section):

  1. Logical reasoning

  2. Understanding of ethical theory and meaning of moral terms

  3. Informed of relevant factual information

  4. Time to think and reflect

Given these resources, it is often speculated that they will lead to more correct normative judgments (e.g., Singer writes “it would be surprising if moral philosophers were not, in general, better suited to arrive at the right, or soundly based, moral conclusions than non-philosophers” (1972, p. 117)). We see no reason to dispute that philosophers have access to any of the special resources indicated in 1–4. Indeed, most critics of moral expertise grant that philosophers have access to those resources.

What is critical for our understanding of the Expertise Defense is whether moral expertise goes beyond “descriptive” expertise to substantive expertise, rather than moral experts merely being moral cartographers (Archard, 2011; Crosthwaite, 1995; Hare, 1989). “Cartographers” can explain the details of theories and the meanings of terms, but they may not be able to give better answers to substantive questions. To continue the analogy, cartographers might be able to tell one the best path and waypoints once a destination is chosen (e.g., Rome), but a cartographer cannot tell a person where they should go (e.g., Rome v. Venice). What is required for an adequate defense of the Expertise Defense is that moral experts typically come to true and correct conclusions—something Driver has called an Expert Judger. Expert judgers come to correct normative conclusions rather than simply drawing attention to morally relevant features and facts (e.g., they determine whether moral objectivism is true rather than merely being able to articulate the theory and key terms). When there are disagreements among expert judgers, on Driver’s view, one consults a meta-field expert who can reliably (even if not perfectly) determine which of two contradictory moral judgments is actually true. The critical question, then, for the Expertise Defense is whether there are any relevant meta-field experts to help alleviate the worries about personality’s relation to philosophical judgments. We think there is good empirical reason to think that there are not, at least for many areas of philosophy. We turn now to empirical evidence to help support our claim that there are no meta-field experts in many areas of philosophy.

Philosophical Training and Cognitive Skills and Style

Recall one of the main points about expertise in general: Perhaps expert philosophers have some cognitive skills that non-expert philosophers do not. One way to show that philosophical training and expertise could insulate philosophical experts from the effects of extraneous factors is by showing that philosophical training increases some kinds of cognitive abilities likely to be related to philosophical thinking (hence, philosophical experts are always made and not born—the first important point about expertise in general). For example, perhaps philosophers have learned some metacognitive skills that make them less likely to engage in System 1 type errors. That is, philosophers may have some initial reactions to scenarios or thought examples, but that initial reaction may be corrected or attenuated because of their greater cognitive reflectivity. The correction may reduce biases associated with the folk because philosophical experts may have better logical reasoning skills, including metacognitive heuristics. If philosophical training gives philosophers some new abilities, then there might be good reason to think that the Expertise Defense may be successful. (We will save discussion of the third general point about expertise concerning experts having more nuanced knowledge structures than novices for later in this chapter.)

Some suggestive research indicates that philosophers have the ability to evaluate arguments and evidence in ways that people with other kinds of training may not. In one study, Kuhn (1991) performed a series of case studies on graduate students in philosophy. She found that those with graduate training in philosophy were generally better at evaluating arguments and evidence compared to experts in other domains (e.g., parole officers and teachers). While the sample of philosophy graduate students was small (N = 5), these results suggest that philosophers may have some skills that make their intuitions more robust against unwanted biases. These trained philosophers may be able to evaluate arguments and evidence in ways that allow problematic biases to be reduced or eliminated.

A different line of research suggests that philosophers may have a different, more reflective cognitive style than those without philosophical training. This reflective cognitive style may reduce the influence of extraneous factors. Cognitive reflectivity is a general way of thinking typified by careful, deliberate reasoning that can overcome intuitively compelling, yet wrong, responses (e.g., on the Cognitive Reflection Test items discussed above). Livengood, Sytsma, Feltz, Scheines, and Machery (2010) (see also Cokely and Feltz (2009a)) presented some evidence that those who have philosophical training, across several levels of education, had a more reflective cognitive style (see Fig. 5.1). These results suggest that philosophers have greater cognitive reflectivity than others, in particular non-philosophers or those who have had less exposure to philosophy. Hence, the differences in CRT scores for philosophers support the idea that philosophers may have abilities or skills that could reduce the influence of extraneous factors on their philosophically relevant intuitions.

Fig. 5.1
Mean CRT scores across different levels of education for philosophers and non-philosophers (Livengood et al., 2010). At every level of education, those with some philosophy training scored higher than those with no philosophy training: some college (0.74 vs. 0.43), bachelor’s degree (1.16 vs. 0.65), and professional or graduate degree (1.21 vs. 0.82).

The data concerning CRT scores and philosophical training are correlational, so determining the causal direction is difficult. First, it could be that philosophical training causes increased cognitive reflectivity. Second, those who are higher in cognitive reflectivity may gravitate toward philosophy. Or, cognitive reflectivity and philosophical training may both be caused by some third variable. While Livengood et al. (2010) attempted to model whether the effect of philosophical training was causal, statistical tests were equivocal about the direction of causation, so given the currently available evidence there is no empirical way to prefer one direction of causation over the others. However, establishing the direction of causation isn’t required to support the Expertise Defense. After all, the Expertise Defense simply states that philosophers, through their special training, skills, or cognitive abilities, are not as prone to be influenced by extraneous factors. In this case, philosophers may have some special skills, perhaps antecedent to philosophical training, that make them less likely to be influenced by extraneous factors. The way that expert philosophers come to have the relevant expertise does not necessarily matter to the Expertise Defense. Rather, all that is required is that expert philosophers in fact have the relevant expertise. To illustrate, consider an analogy. It does not matter whether professional soccer players are naturally gifted or whether they obtain their abilities through extensive training (which, of course, they do). All that matters for being a world-class soccer player is that one in fact has the relevant abilities, regardless of how those abilities were obtained. So, philosophers’ higher cognitive reflectivity may allow them to be reflective enough about the nuances of philosophical issues, thought examples, and intuitions, and that may be enough to insulate their intuitions from the influence of extraneous factors.

Free Will, Extraversion, and Expertise

So far, the direct evidence for or against the Expertise Defense has only been suggestive. However, some of the evidence from direct strategies reviewed so far suggests that philosophers have cognitive skills that are different from those of non-philosophers. Some of these skills or abilities may appear to support the Expertise Defense because philosophers appear to be more cognitively reflective and more skilled at evaluating arguments than non-philosophers. At this point, the Expertise Defense is looking increasingly likely to succeed.

The direct strategies reviewed so far, while trying to show whether philosophers have the cognitive abilities or skills relevant to support the Expertise Defense, are still not direct enough to settle disputes surrounding the Expertise Defense. Critics may grant that philosophers are better at evaluating arguments and have a more reflective cognitive style, yet hold that those factors alone are not sufficient to shield their intuitions from the problematic aspects of irrelevant factors like personality. Recall the discussion of moral expertise. No one disputes that philosophers are good at constructing and defending arguments, and nobody disputes that philosophers often reflect very long and deeply about issues that are core to their research program. Rather, the critic thinks that the key to the dispute is whether philosophers have systematically different contents of intuitions that serve as evidence for premises of arguments and whether these systematic differences are influenced by irrelevant factors. For example, extraverts may tend to have more compatibilist-friendly intuitions, and they may be more likely than introverts to use the content of those intuitions as evidence in their arguments. Using their cognitive reflectivity and argument evaluation abilities, philosophers could create subtle and technically correct arguments based on the content of their intuitions. But, given the different content of their intuitions, extraverts and introverts could end up with quite different conclusions concerning the relation between determinism, free will, and moral responsibility. Of course, we suspect that these concerns aren’t very compelling to defenders of the Expertise Defense. Apologists may simply reply that these skills or abilities do in fact give us reason to think that the Expertise Defense is correct. Is there a way that we can more efficiently settle the debate?

The answer is “yes.” However, different methods are required. The currently reviewed methods only hint at the possibility that the Expertise Defense is correct. But, one could simply measure whether intuitions of philosophers are influenced in the same or similar ways to those of non-professionals. If the intuitions of philosophers are influenced by extraneous factors, then the Expertise Defense fails. If the intuitions of philosophers are not influenced by extraneous factors, then there is reason to think that the Expertise Defense may be correct.Footnote 5

To start, there is some evidence that philosophers have some of the same biases that non-professionals have.Footnote 6 Schwitzgebel and Cushman (2012) conducted a series of experiments that attempted to show that an order effect in judgments that is common among non-professional philosophers is also present in professional philosophers.Footnote 7 Theoretically, it should not matter to the truth of the content of an intuition whether scenarios are presented in one order rather than another. Order is a feature extraneous to the truth of the content of an intuition. Consequently, it is widely agreed that order effects constitute an irrelevant factor that could influence the content of intuitions.

In this case, Schwitzgebel and Cushman (2012) tested judgments about three different kinds of moral principles and three scenarios illustrating those principles. These scenarios and principles were drawn from the literature and concerned the Doctrine of Double Effect (e.g., diverting a trolley that kills one person to save five versus pushing a large person to his death to save five), differences between actions and omissions (e.g., allowing to die versus killing), and cases of moral luck (e.g., a driver passing out and hitting a tree versus passing out and hitting and killing a person). They then gave these scenarios and principles to non-academics, non-philosopher academics, philosophers, and ethicists. Broadly, there were order effects for each of the three principles and for each of the three scenarios. Importantly, the magnitude of the order effect was roughly similar across all groups regardless of philosophical training.Footnote 8
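To make the logic of this comparison concrete, the sketch below shows one standard way such a pattern could be checked: test whether presentation order predicts judgments and whether the size of that order effect differs by group. This is only an illustrative analysis under our own assumptions, not Schwitzgebel and Cushman’s actual code; the file name and column names are hypothetical placeholders.

```python
# Illustrative sketch: does the order effect differ by expertise group?
# (Hypothetical data and column names; not the authors' original analysis.)
import pandas as pd
import statsmodels.formula.api as smf

# Long-format data: one row per participant per scenario pair, with
#   'rating' = judged moral equivalence of the paired cases,
#   'order'  = which case was presented first,
#   'group'  = non-academic, non-philosopher academic, philosopher, ethicist.
df = pd.read_csv("order_effect_data.csv")  # hypothetical file

model = smf.ols("rating ~ C(order) * C(group)", data=df).fit()
print(model.summary())

# A reliable main effect of order together with a negligible order-by-group
# interaction corresponds to the reported pattern: order effects of roughly
# similar magnitude regardless of philosophical training.
```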

The results from Schwitzgebel and Cushman’s studies are bad news for the Expertise Defense. Recall that one of the central empirical planks of the Expertise Defense is that philosophers, through their specialized training, skills, or cognitive styles, are less likely to display some of the effects of extraneous factors that non-philosophers display. However, Schwitzgebel and Cushman (2012) present data showing that this simply is not the case: philosophers, even those who specialize in ethics, tend to display the same kinds of order effects to a similar degree as non-professional ethicists. Hence, expertise does not appear to reduce the effect of an extraneous factor on intuitions in the way the Expertise Defense predicts.

While all of the studies that directly test the Expertise Defense are important and illuminating, they all suffer from a common shortcoming: they do not guarantee that the participants in the experiments are the relevant experts.Footnote 9 Often, expertise is identified simply by looking at a person’s credentials or university degrees. However, those ways of identifying true expertise (as discussed above) are not always reliable (Ericsson & Lehmann, 1996; Ericsson et al., 2007). For example, a trained professional stockbroker may not be able to pick winning stocks at a greater frequency than non-stockbrokers. Many of the studies reviewed above used a credential-based approach (e.g., working in a philosophy department; self-identification as a philosopher). Rather than relying on these kinds of credential-based approaches, a stronger test of the Expertise Defense would be to test those who can demonstrate some elements of expertise. One of these necessary elements that we explored was the possession of superior objective knowledge about a field, which we assessed by asking people a set of questions designed to measure that knowledge.Footnote 10

Using an objective measure of philosophical expertise would allow us to identify those who have expertise in a specific philosophically relevant area versus those who do not. As reviewed above, research suggests that in some domains credentials are not the best way to identify expertise. Philosophical expertise seems to be one of those domains. One reason is that simply being identified as “a philosopher” does not give a sense of what that philosopher’s area of expertise is. Philosophy is a highly diverse field with many different topic areas (e.g., philosophy of physics, ethics, logic). Because of this diversity, one may be an expert in one area (e.g., mereology) but not in another (e.g., moral objectivism). This specialization is not unique to philosophy and is found in other disciplines as well. For example, you wouldn’t necessarily go to a brain surgeon for heart surgery since they have different areas of expertise. Along these lines, somebody may be a professional philosopher yet not be an expert about free will (e.g., they may be an expert about mereology). We think defenders of the Expertise Defense can leverage these observations and argue that just being a “philosopher” does not necessarily mean that one is an expert about the relevant question. For example, even if extraversion predicted compatibilist intuitions among “philosophers,” that does not mean that extraversion would predict compatibilist intuitions among expert philosophers of free will. There may be something about the specific training and expertise of free will experts that makes that relation go away.

All of this means that we are making it more difficult to adequately address the Expertise Defense. It is more difficult because the set of relevant philosophical experts won’t be identified by credentials or self-reports, and that set will be much smaller than the set of “philosophers” (see, e.g., Stich, 1998). Given this specification and refinement of the Expertise Defense, we accept that we need evidence for the truth of a version of the following principle to claim that the Expertise Defense fails:

(EQ) Philosophers’ intuitions about hypothetical cases vary equally with irrelevant factors as those of non-philosophers. (Horvath, 2010, p. 464)Footnote 11

The review of expertise provided above offers some good reasons for thinking that philosophical expertise might make the kind of difference that would render EQ false. In some domains, experts just have higher quality intuitions (e.g., airplane pilots; chess players) and are not prone to the same kinds of judgment biases as non-experts (Ericsson & Lehmann, 1996; Ericsson et al., 2007). For example, grandmaster chess players have qualitatively different intuitions about game positions and risks than chess novices. Professional soccer players better understand and can better predict what to expect during play than soccer novices. The idea is that through training and practice, these people just understand the problem space better, meaning their intuitions are better informed and calibrated. This could be no less true in philosophy.

To our knowledge, only one direct test of the Expertise Defense attempts to estimate philosophical expertise. This study involves extraversion’s relation to compatibilist judgments in verified philosophical experts. We will treat this as our paradigmatic example.

It is not news that free will experts disagree about the correct answer to the compatibility question. There is even some debate about whether compatibilism or incompatibilism is the dominant view among free will experts. For example, some experts hold that “in contemporary discussions of free will, incompatibilists self-identify as the underdog” (Nichols, 2007, p. 261). However, other experts such as Robert Kane think “Compatibilism has surely become the dominant view among philosophers today” (1996, p. 12), and Derk Pereboom notes, “the demographic profile of the free will debate reveals a majority of soft determinists, who claim that we possess the freedom for moral responsibility, that determinism is true, and these views are compatible” (1995, p. 21). Still others think that there is evidence that most philosophers are incompatibilists.Footnote 12

Whatever the professional landscape in free will is, we might ask whether extraversion predicts variation in expert intuitions about the compatibility question. There is reason to think that it does. Because philosophers are also humans with personalities, personality could be related to or influence philosophers’ intuitions about free will and moral responsibility. If education for philosophers does not provide the right kinds of environments to create the relevant expertise, then we should see extraversion predicting intuitions about the compatibility question even in verified experts.

Schulz, Cokely, and Feltz (2011) explored whether extraversion predicted expert intuitions about the compatibility question. To do so, they asked a group of German participants to complete the extraversion sub-scale from the NEO-PI-R (Costa & McCrae, 1992, German version: Ostendorf & Angleitner, 2004). After completing the NEO-PI-R, participants read a standard determinism scenario. Participants were asked to rate their level of agreement with the Free Will questions on a scale from 1 (absolutely disagree) to 7 (absolutely agree). Finally, participants were presented with the following Free Will Skill Test.

  1. “Well-known counterexamples for the PAP are called the Frankfurt cases. (true)
  2. Arthur Schopenhauer said that there is definitely free will. (false)
  3. Two important opinions in the debate about free will and determinism are called compatibilism and incompatibilism. (true)
  4. PAP stands for the principle of alternate personalities. (false)
  5. One frequently used argument for the freedom of choice is the experiment from Benjamin Libet. (false)
  6. One well-known believer in free will was Jean Paul Sartre. (true)
  7. The classical Trolley Problem is about two trains on a collision course. (false)
  8. William James suggested that there could be soft determinism. (true)
  9. One argument in the field of moral philosophy is Moore’s open statement argument. (false)
  10. The Stockholm’s interpretation sees quantum physics as an argument against determinism. (false)”

After each question, participants had to indicate whether they thought the statement was true, false, or that they did not know. A correct answer counted as one point, a wrong answer as minus one point, and “I don’t know” as zero points, providing a correction for guessing. The rationale was that if somebody randomly guessed the answers, their total score would on average be 0.
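The scoring rule is simple enough to state as a short sketch. The answer key follows the items listed above; the function and variable names are ours, not part of the original study materials.

```python
# Minimal sketch of the guessing-corrected scoring rule described above:
# +1 for a correct answer, -1 for an incorrect answer, 0 for "don't know".
ANSWER_KEY = [True, False, True, False, False, True, False, True, False, False]

def score_skill_test(responses):
    """responses: list of True, False, or None ("I don't know"), one per item."""
    total = 0
    for response, correct in zip(responses, ANSWER_KEY):
        if response is None:      # "I don't know" earns zero points
            continue
        total += 1 if response == correct else -1
    return total

# A participant guessing True/False at random is right about half the time,
# so the expected total is 10 * (0.5 * 1 + 0.5 * -1) = 0.
```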

There is good reason to think the Free Will Skill Test meets or exceeds the standards of classical test theory used for the development of many psychological and educational assessments (e.g., diagnosis, personnel selection). A validation study was conducted with 44 philosophy graduate students (age: mean = 23, SD = 2.76; 16 females). Among this moderately skilled sample (range: 0–9, mean = 2.9, SD = 2.2), no distributional skew was observed. Further analysis indicated that the instrument had a very high test-retest correlation, r = .99, over short time intervals (10 to 30 minutes). The test had a Cronbach’s alpha of .75, which is above the conventional adequacy threshold for psychological instruments, providing further evidence of reliability and internal consistency. There was also evidence of convergent validity, as the scores showed substantial correlations with free will relevant self-rated knowledge (r = .5), estimated number of papers read (rho = .49), lectures attended (rho = .33), and years one had been interested in the debate (rho = .39).
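For readers unfamiliar with the internal consistency statistic reported here, the following is a small sketch of how Cronbach’s alpha is computed from an item-score matrix (rows = participants, columns = the ten items). It shows the standard formula only; it is not the original validation code, and any data fed to it would be hypothetical.

```python
# Standard Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / total variance)
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: 2D array of shape (n_participants, n_items)."""
    n_items = item_scores.shape[1]
    sum_item_var = item_scores.var(axis=0, ddof=1).sum()   # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)        # variance of total scores
    return (n_items / (n_items - 1)) * (1 - sum_item_var / total_var)
```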

As predicted, extraversion, or more specifically warmth (an essential facet of extraversion), was systematically related to compatibilist intuitions. Importantly, with respect to the effect of personality on judgments of free will and moral responsibility, there was no reliable difference between folk and expert intuitions. The Free Will Skill Test predicted the compatibilist composite score (M = 1.5, range 0–10, SD = 1.3), explaining 9% of the variance (see model 1 in Table 5.1). Greater philosophical knowledge was associated with stronger incompatibilist intuitions. This suggests that as one is educated and learns more about the free will debate, one is likely to become more incompatibilist. However, warmth explained additional variance controlling for the Free Will Skill Test score in a stepwise regression (Table 5.1). The full model explained 14% of the variance in participants’ judgments (see model 4 in Table 5.1). Critically, when controlling for expert knowledge, warmth continued to predict a moderate amount of unique judgment variance (about 5%; see Table 5.1), an estimated bias that is roughly equivalent in size to that observed among the folk.Footnote 13 The meta-analytic r2 estimate from Chap. 2 was about .04, suggesting that we can have some confidence that experts’ personalities had relations to free will judgments similar to those of novices, even given the small sample size.
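The logic of the incremental variance claim can be illustrated with a hedged sketch of a hierarchical regression of the kind reported above: enter the Free Will Skill Test first, then add warmth, and compare the explained variance. The data file and column names below are hypothetical placeholders, and the sketch omits the additional predictors in the full reported model.

```python
# Illustrative hierarchical regression (hypothetical data and column names).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("expert_free_will_data.csv")  # hypothetical file

step1 = smf.ols("compatibilist_score ~ skill_test", data=df).fit()
step2 = smf.ols("compatibilist_score ~ skill_test + warmth", data=df).fit()

print(f"R2, knowledge only:         {step1.rsquared:.2f}")                    # reported ~.09
print(f"R2, knowledge plus warmth:  {step2.rsquared:.2f}")                    # reported ~.14
print(f"Unique variance for warmth: {step2.rsquared - step1.rsquared:.2f}")   # reported ~.05
```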

Table 5.1 Explained variance of the different predictors. Note that for the models that are not stepwise (1, 2, and 4), Δr2 and ΔF indicate absolute r2 and F

These findings support some recent hypotheses suggesting that many people who are knowledgeable about the free will debate are incompatibilists. Results further indicated that both extraversion and expert knowledge were reliable, non-redundant, and non-interacting predictors of judgment bias in a paradigmatic free will case. That is, extraversion biased expert and non-expert intuitions about the Compatibility Question to the same degree, suggesting that expertise does not eliminate or reduce the compatibilism bias associated with extraversion. Training does change free will intuitions, but it does not appear to eliminate the effect of personality even for a group likely to be more cognitively reflective and knowledgeable. In short, free will training changes intuitions, but not always in the relevant ways. Consequently, at least for this paradigmatic example, the Expertise Defense fails.

The Difficult Death of the Expertise Defense

At this point, there is substantial reason to suspect that the Expertise Defense fails. Apologists no doubt have an arsenal of replies suggesting that the Expertise Defense could still work. For example, Rini (2015) argues that we have some independent reason to think that the influence of extraneous factors should be reduced or eliminated for philosophers because they are experts. Rini gives several possible explanations for problematic findings like the ones we have reviewed, all of which she argues are not threats to philosophical expertise. First, the philosophers who are polled are not really experts. Second, the philosophers who are polled don’t really pay attention. Third, the philosophers who are polled haven’t formulated responses to these kinds of scenarios because they are experts. That is, philosophers may have intentionally not formed responses to these scenarios because of various theoretical commitments. Finally, there could be some diachronic instability (e.g., with framing effects), but philosophers could become aware of those effects and then use that knowledge to discount the justificatory role those intuitions play.

We think we’ve already made the case that given the evidence it is more plausible that philosophers do not have the right kind of expertise rather than any of the explanations offered by Rini. First, we can dismiss the “not experts” explanation. There is direct evidence that verifiable experts tend to display similar biases associated with personality in some paradigmatic instances (e.g., in free will). Second, experts likely do pay attention to the scenarios in surveys because they are about the very things that they have devoted a large amount of their lives to studying. In other words, they are likely intrinsically motivated to respond well. Similarly, if one is an expert in a domain, one most certainly has views about basic notions in the debate like determinism’s relation to freedom and moral responsibility. Regardless, they seem to be sufficiently motivated to pass a basic knowledge test. Finally, at least with respect to personality, the problem isn’t diachronic instability. The intuitions are diverse yet stable among different groups of individuals. As such, the diachronic explanation completely leaves the personality (and similar) findings untouched.

None of this calls into question the use of intuitions in all projects, and perhaps expertise has a valuable role to play in other domains for the reasons already discussed. For example, perhaps one is interested in using intuitions in conceptual analysis (see Chap. 6 for a fuller discussion). Given one’s access to intuitions, along with greater cognitive reflectivity (or some other cognitive skill) and one’s knowledge of the domain (or some other relevant knowledge), one may be able to skillfully construct a conceptual analysis that uses those intuitions as evidence. After all, we are not arguing that one’s own intuitions are not indicative of one’s own concept (on average), nor do we need to argue that philosophers do not have some genuine skills (e.g., skills of argument). Consequently, our arguments and data do not call into question (although they do not necessarily support) those kinds of uses of intuitions or the Expertise Defense’s role in supporting those kinds of philosophical practices. There may be other costs associated with taking this strategy for other uses of intuitions (see Chap. 6). Nevertheless, those costs and benefits should be evaluated separately from the general evaluation of the Expertise Defense.

We think this puts critics of the Expertise Defense in a rhetorically strong position. If either indirect or direct strategies show that philosophers’ intuitions vary as a function of extraneous factors in a similar way to folk intuitions, then the Expertise Defense is in trouble. If an indirect strategy is correct, then philosophers don’t possess the relevant expertise to deflect the worrisome implications of the extraneous factors we have identified. If a direct strategy is successful, then philosophers’ intuitions display biases similar to those expressed by the folk, or at least show some systematic variation with personality. Given the accumulating evidence that either an indirect or a direct strategy has merit, in connection with the relative lack of evidence for the Expertise Defense, we conclude that the Expertise Defense fails.