Introduction

One of the most significant methodological contributions of neuropsychology to the broader field of cognitive (neuro)science is the notion of double dissociation: a situation in which a brain-injured patient A is significantly more impaired on task X than patient B, while patient B is significantly more impaired than patient A on task Y (Shallice, 1988; Teuber, 1955). When such double dissociations are observed, it is highly likely that some of the processes underlying performance in the two tasks are independent, although it must be noted that other interpretations are possible (Davies, 2010). Double dissociations have been the backbone of cognitive neuropsychology, which has sought to uncover “pure” cases with selective deficits that could speak to the specificity of domains and/or cognitive processes. This has often led to intense debates regarding the purity of disorders, for example whether pure alexia reflects a visual recognition difficulty confined to words (Gaillard et al., 2006), or whether other object categories are also affected (Starrfelt et al., 2010). This quest for specificity also characterized cognitive neuroscience, where the first couple of decades of functional imaging had a strong focus on brain regions that evinced specificity, for example for words, i.e., the visual word form area (Dehaene & Cohen, 2011; Price & Devlin, 2011). This is probably no coincidence, as functional neuroimaging initially relied heavily on the subtraction method (Sternberg, 1969), assuming that the neural substrate of functionally distinct cognitive processes could be identified by comparing two tasks which differed only in the process of interest (Friston et al., 1996). This is basically the same kind of logic that underlies many interpretations of (double) dissociations (Van Orden et al., 2001). Functional neuroimaging, however, also examined associations such as the overlap among tasks in their neural activations (Price & Friston, 1997). Indeed, Haxby et al. (2001) laid the groundwork for a major shift in the conceptualization of functional brain organization when they introduced the use of multivariate methods to examine the relationship between activated areas (voxels), rather than how activation in a particular voxel/area differed across tasks. It is notable that a similar interest in associations between deficits in task performance is not characteristic of cognitive neuropsychology, where the emphasis on dissociations still dominates in studies examining deficits following acquired brain injury and, increasingly, developmental disorders (Starrfelt & Robotham, 2018).

There are reasons, however, why observations of dissociations between deficits are considered more valuable than observations of associations among deficits in cognitive neuropsychology (Davies, 2010; Ellis & Young, 1988). One has to do with specificity. The co-occurrence of two deficits may have little theoretical value if it is caused by factors that are likely to affect several domains. As an example, an attentional deficit may affect performance on several tasks that may have very little in common besides requiring attentional resources. Another, and more profound, reason has to do with the fact that associations may not necessarily reflect the (mis)working of a common cognitive process that underlies impaired performance on tasks A and B. It may simply be that tasks A and B are performed by means of independent processes which are mediated by functionally distinct areas (say, areas X and Y), but that these independent processes are often compromised together because areas X and Y are frequently affected simultaneously due to physical proximity, a phenomenon we will refer to here as “collateral damage”. The problem concerning collateral damage has been recognized in cognitive neuropsychology for a long time (Caramazza, 1986; Shallice, 1988). It is also a central problem because it introduces an imbalance when inferences are made regarding the cognitive underpinnings of neuropsychological disorders, and thus the underlying cognitive architecture; theoretical explanations based on observations of dissociations are simply considered more credible than explanations based on associations, and this makes for a bias that favors theories based on assumptions of domain- or process-specificity. We will term such theories “specific-mechanism” accounts. This bias is unfortunate because it is quite likely that most abilities rely on shared processes to some extent; just think of the g factor in intelligence. In the following we will term theories that are based on assumptions of domain-general processes “shared-mechanism” accounts.

Not only are shared-mechanism accounts viewed as evidentially inferior to specific-mechanism accounts (because the former are based on associations and the latter on dissociations); it is also our impression that many questions in cognitive neuropsychology are tackled as though explanations based on shared- and specific-mechanism accounts are incommensurable; cf. the discussion of whether pure alexia is really “pure” (Starrfelt & Shallice, 2014). A similar tendency can be seen in the broader field of cognitive neuroscience, where a tension between shared- and specific-mechanism accounts is also observed (Yovel & Duchaine, 2006), for example with regard to whether face recognition is supported by dedicated areas or by areas that also support recognition of other object classes (Behrmann & Plaut, 2013; Kanwisher, 2010; Yovel & Kanwisher, 2004).

The purpose of the present work is twofold: (i) we wish to discuss and clarify, with special emphasis on the neuropsychological aspects, the general theoretical implications regarding dissociations and associations as they relate to shared- and specific-mechanism accounts, and (ii) we want to sketch a procedure that might help to discriminate between shared- and specific-mechanism accounts based on behavioral data. This discussion, and the suggested procedure, will be exemplified in the context of the current debate regarding developmental prosopagnosia (DP). However, as indicated above, the implications apply to other areas of neuropsychology and cognitive neuroscience where inferences based on associations and dissociations are used to support conceptual models.

DP is a syndrome characterized by severely impaired face recognition in individuals who have experienced face recognition problems their whole life and where there is no (known) brain damage present (Susilo & Duchaine, 2013). A recent review by Geskin and Behrmann (2018) on DP has led to considerable debate regarding how the empirical data on DP should be interpreted (Barton, 2018; Behrmann & Geskin, 2018; Campbell & Tanaka, 2018; de Gelder & Van den Stock, 2018; Eimer, 2018; Garrido et al., 2018; Gerlach et al., 2018; Gray & Cook, 2018; Nestor, 2018; Ramon, 2018; Rosenthal & Avidan, 2018; Rossion, 2018; Starrfelt & Robotham, 2018; Towler & Tree, 2018). The main finding of the review was that, of the 238 identified cases that the authors judged to have adequate data, approximately 80% showed evidence of both face and object recognition impairments, whereas 20% showed only impaired face processing. According to Geskin and Behrmann this finding of frequent associations in impairments across domains favored “…an interpretation of a single mechanism that might support the recognition of more than one [faces and objects] but not all visual classes [words]” (p. 23). Such a conclusion downplays the significance of the 20% of cases that showed a single dissociation and implies that most cases with DP have (presumably milder) object recognition deficits (see Fig. 1a).

Fig. 1

Three different models of the relationship between face and object recognition in DP. a A strong version of a shared-mechanism account where all cases with DP are assumed to have object recognition deficits because the mechanism impaired in DP (yellow area) is shared by faces and objects even though it is more critical for faces. b A strong version of a specific-mechanism account where all processes except for those related to low-level vision are specific to faces and objects. In this version, DP will always reflect an impaired face-specific mechanism (red area), and accompanying object recognition problems will reflect collateral damage to object-specific mechanisms (blue area). c A hybrid model where DP may be caused by impairment to a mechanism (at an intermediate level) common for both face and object processing but more important for face than for object recognition (yellow area), or by impairment to face-specific processes (in high-level vision) (red area).

Opposite to Geskin and Behrmann is the position taken by Gray and Cook (2018). According to their independent disorders hypothesis, DP is caused by impairment in a selective face-processing mechanism, and accompanying deficits reflect damage to other mechanisms that tend to co-occur with the impairment causing DP (see Fig. 1b). They speculated that this co-occurrence could be due to aberrations in occipitotemporal development that tend to affect not one but several brain areas that each support independent perceptual functions. The independent disorders hypothesis can be considered to be at one end of a theoretical continuum, in that it implies that object recognition deficits in DP do not arise from the same cognitive dysfunction that gives rise to face recognition problems (i.e., DP). Rather, object recognition deficits are seen as a result of collateral maldevelopment, i.e., a concurrent developmental object agnosia (DOA) that affects objects but, on its own, not faces. From this point of view there are no theoretical implications to be drawn regarding cognitive architecture from the putative 80% of cases that show an association (Geskin & Behrmann, 2018).

The available evidence suggests that DP occurs much more frequently with than without DOA (Geskin & Behrmann, 2018). If DP and DOA are independent disorders this is an empirical observation that requires an explanation. Similarly, it is also surprising that there are very few subjects reported with “pure DOA,” i.e., with intact face recognition, and to our knowledge only one case has been reported (Germine et al., 2011). Importantly, this imbalance may only pose a challenge if the available evidence is unbiased, which it may not be. As argued by Gray and Cook (2018), cases with pure DOA may be less likely to approach researchers if object recognition difficulties are less socially debilitating than face recognition deficits. DOA might also be underreported simply because more researchers are aware of DP than of DOA. Consequently, the present literature may underestimate the occurrence of DOA.

If two such divergent points of view can be derived from the same set of observations, is there any way we can pit them against each other or even falsify them? Before we offer our take on this, we need to clarify what independence entails.

What does it mean for two disorders to be independent?

Two disorders, in this case DP and DOA, are (stochastically) independent if the frequency of their co-occurrence [DP ∩ DOA] equals the product of the overall (marginal) frequencies of the two disorders [p(DP ∩ DOA) = p(DP)p(DOA)]. Let us assume that we have data where such frequencies can be derived in an unbiased manner, and that it is found that the disorders are non-independent [p(DP ∩ DOA) > p(DP)p(DOA)]. Would that mean that the disorders reflect the same cognitive mechanism(s)? As argued above the answer is “no” because co-occurrence may be caused by pathology that affects two different cognitive mechanisms that each support a different function (i.e., object and face recognition). In this case, we have “separable disorders” rather than independent disorders. But what if the results do show stochastic independence? Would that imply that independent neural or cognitive mechanisms underlie the disorders? The answer is “yes.” If disorders are stochastically independent, they share nothing, either neurally or cognitively.
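The independence criterion can be made concrete with a minimal sketch. It assumes a hypothetical, unbiased 2 × 2 table of case counts; the counts and the function name `independence_gap` are illustrative only and are not taken from any study. The function returns the observed joint frequency alongside the frequency expected under independence.

```python
def independence_gap(n_both, n_dp_only, n_doa_only, n_neither):
    """Return (p_joint, p_product) for a 2 x 2 table of case counts.

    p_joint   = p(DP ∩ DOA), the observed co-occurrence frequency
    p_product = p(DP) * p(DOA), the frequency expected under independence
    """
    total = n_both + n_dp_only + n_doa_only + n_neither
    p_dp = (n_both + n_dp_only) / total    # marginal frequency of DP
    p_doa = (n_both + n_doa_only) / total  # marginal frequency of DOA
    return n_both / total, p_dp * p_doa


# Hypothetical counts, chosen only for illustration.
p_joint, p_product = independence_gap(
    n_both=8, n_dp_only=12, n_doa_only=12, n_neither=968
)
print(f"p(DP ∩ DOA) = {p_joint:.4f}; p(DP)p(DOA) = {p_product:.4f}")
```

With these made-up counts the co-occurrence exceeds the independence expectation, which, on the argument above, indicates non-independence but does not by itself establish a shared mechanism.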

Now, the functions compromised in two disorders may rely on some processes that are shared by the functions and some that are specific to a given function. To stick with the present example, at early levels of processing (low-level vision), object and face recognition do depend on common forms of processing. The functions of striate cortex are required for both face and object recognition, but we would consider subjects with loss of striate function as being blind rather than having a DP+DOA overlap syndrome. While this is an obvious example, object and face recognition also involve many low-level and intermediate cognitive operations (e.g., figure-ground segregation) beyond striate cortex, and these too are almost certain to be shared. Hence the question of independence really should only apply to higher levels of processing, where there is the hypothetical chance of divergence into selective processing streams (see Fig. 1c). If cases with deficits at lower levels are not excluded, this will lead to an overestimation of cases with co-occurring disorders such as DP and DOA. Indeed, one can ask whether this may be partly responsible for the observation that 80% of individuals with DP seem to show object recognition deficits (Geskin & Behrmann, 2018). This then could be a second reason why associations between face and object recognition deficits are frequent, without necessarily invoking a common vulnerability to pathology of selective face and object recognition processes.

Given that it may be difficult to exclude all cases with deficits at lower levels that could impair both face and object recognition, it seems unlikely that the data will support stochastic independence of DP and DOA. Rather, the better question may be whether we can find evidence that DP and DOA are separable disorders. That is, instead of trying to show that p(DP ∩ DOA) = p(DP)p(DOA), we might show that p(DP+nonDOA) > 0 and p(nonDP+DOA) > 0, that is, that the disorders can doubly dissociate. It is worth pointing out that the independent disorders hypothesis assumes separability rather than independence between disorders. Indeed, this account expects the incidence of DOA to be higher in the population with DP than in the wider population, and this suggests some degree of association between DP and DOA (Gray & Cook, 2018). Hence, despite its name, the independent disorders hypothesis does not assume independence at the level of disorders but at the level of cognitive functioning. This difference is important to bear in mind and is clearly relevant for disorders other than DP and DOA.

Does DP reflect a face-specific mechanism, or a mechanism shared by face and object recognition?

If, as in the independent disorders hypothesis, DP can reflect impairment of a face-specific mechanism, it should be possible to find evidence of a (double) dissociation between DP and DOA. On this question we believe that there is agreement in principle, even if there is disagreement as to whether convincing dissociations have yet been reported (Gerlach et al., 2018). However, it is less clear what sort of evidence could falsify specific-mechanism accounts. In a related manner, it is also not clear what empirical evidence would support shared-mechanism accounts, given that one can always claim that a concurrent object recognition impairment reflects an associated (collateral) defect in an independent mechanism and not the workings of a shared mechanism (Garrido et al., 2018; Gray & Cook, 2018). These are important conceptual points, because the inability to support their position and falsify a competing hypothesis places shared-mechanism accounts at a disadvantage, cf. the discussion in the introduction.

Finally, there are questions of what counts as a dissociation, and what kind of dissociations are found. A classical dissociation implies that one performs normally on one task but abnormally on another, and as a further requirement that the difference in performance between the tasks differs significantly from the difference seen in healthy populations (Crawford et al., 2003). Of these requirements the latter is probably the most important (Gerlach & Starrfelt, 2021; McIntosh, 2018). However, there can also be strong dissociations, in which the subject is impaired on both tasks, but significantly worse on one. There are likely to be many possible reasons why maldevelopment of mechanisms that are common to two functions may lead to greater difficulties with one of the functions, thus giving rise to strong dissociations. As an example, some of the present authors have suggested that DP in some cases may reflect delayed processing of global shape information (Gerlach et al., 2017; Gerlach & Starrfelt, 2018a). On this account, fast derivation of global shape information is important for efficient recognition of most visual stimuli, including faces and objects, but critical when stimuli are highly visually similar and must be classified at a subordinate level, as with faces. Consequently, maldevelopment that affects fast global shape processing will lead to a disproportional recognition impairment for faces relative to objects, i.e., DP characterized by strong rather than classical dissociations (Gerlach & Starrfelt, 2021).

If associations are frequently observed between two disorders, is it possible to make a prediction from a shared-mechanism account (such as the delayed global shape processing hypothesis) that cannot readily be explained by specific-mechanism accounts by means of a collateral-disturbance argument? We will argue that it is, at least in principle.

A procedure for contrasting shared- and specific-mechanism accounts

In the following we will outline a procedure that can be used to discriminate between shared- and specific-mechanism accounts. We will use DP to illustrate the procedure, but the procedure is applicable to other disorders where both shared- and specific-mechanism accounts have been invoked. The procedure is basically a mixture of single-subject and group-based analyses. In the first step, we identify all individuals in a large test sample who show signs of disorders X or Y. This is done by examining whether any individual in the test sample performs abnormally on tasks measuring functions x and y thought to be compromised in disorders X and Y. Abnormality is assessed relative to an independent reference sample. Hence, the test sample is unbiased in the sense that it is not restricted to people who are likely to have either disorder X or Y to begin with. In the second step we remove all cases from the group with disorder X who show signs of comorbidity, that is, individuals who are classified as having both disorder X and Y. In the third step we examine whether the group comprising individuals with disorder X, and disorder X only, shows evidence of impairment of function y relative to the rest of the test sample who were not classified as having disorder X (or Y). If there is evidence of impairment of function y in the group with disorder X only, this will suggest that disorder X is associated with impairments of function y in a manner that cannot be accounted for by comorbidity (collateral damage) alone. This residual impairment of function y in disorder X, we argue, can be accounted for by assuming that functions x and y involve a common process that contributes to both functions, but which is more critical for function x than y, thus giving rise to a type X disorder if impaired (a shared-mechanism account). Note that it does not follow from this that disorder X will always reflect impairment of a process common to functions x and y. It may be that there are cases of disorder X that arise due to impairment of processes that are specific to function x, just as there may be cases of disorder Y that follow impairments to processes that are specific to function y (specific-mechanism accounts).
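The three steps can be sketched in a few lines of code. This is an illustration under stated assumptions, not the analysis pipeline used in the study: the function name `residual_impairment`, the data, and the cutoffs are hypothetical; a score at or below a cutoff is treated as abnormal; and the comparison group excludes individuals flagged for Y as well, which is one reading of “not classified as having disorder X (or Y)”.

```python
from statistics import mean


def residual_impairment(scores, cutoff_x, cutoff_y):
    """Apply the three steps to (score_x, score_y) pairs from a test sample.

    The cutoffs are assumed to come from an independent reference sample.
    Returns (mean y of the 'X only' group, mean y of the non-X, non-Y rest).
    Both groups must be non-empty, otherwise statistics.mean raises an error.
    """
    # Step 1: flag signs of disorder X and/or Y for every individual.
    flagged = [(y, x <= cutoff_x, y <= cutoff_y) for x, y in scores]
    # Step 2: keep cases with disorder X only (drop comorbid X + Y cases).
    x_only = [y for y, has_x, has_y in flagged if has_x and not has_y]
    # Step 3: compare function y against the rest (neither X nor Y).
    rest = [y for y, has_x, has_y in flagged if not has_x and not has_y]
    return mean(x_only), mean(rest)


# Hypothetical (score_x, score_y) pairs, for illustration only.
sample = [(40, 50), (42, 30), (44, 48), (60, 55), (62, 57), (58, 53), (61, 34)]
x_only_mean, rest_mean = residual_impairment(sample, cutoff_x=46, cutoff_y=36)
print(x_only_mean, rest_mean)
```

A lower mean on function y in the “X only” group than in the rest is the pattern a shared-mechanism account predicts; whether such a difference is reliable is a separate statistical question.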

Now to the example. If DP can reflect maldevelopment of mechanisms shared by both face and object recognition, but which are more important for face than for object recognition, many individuals with DP should have subtle object recognition deficits. Hence, a shared-mechanism account predicts that DPs on average have worse object recognition than neurotypical controls. A specific-mechanism account does not make this prediction but can accommodate such an outcome by assuming that the DP group includes a subset of comorbid cases (DP+DOA) who lower the group mean for object recognition. However, if such comorbid cases are removed (DP+DOA), the specific-mechanism account predicts no difference in object-processing performance between the “pure” DP group and controls. In contrast, a shared-mechanism account still predicts worse object processing on average in the remaining DPs than in controls even when cases showing significant deficits in object processing (DOA cases) are removed, although the difference would likely be diminished because cases with very poor object-processing performance have been removed. Accordingly, the following two hypotheses regarding face and object recognition in DP can be derived, and the outcome of the second of these hypotheses has the potential to differentiate between shared- and specific-mechanism accounts:

  • Hypothesis A (common to both shared- and specific-mechanism accounts): Object recognition performance will be reduced in DP groups compared with control groups.

  • Hypothesis B (shared-mechanism accounts only): Reduced object recognition performance will be present in DP groups even when cases of combined DP and DOA are removed.

In the following we will test these hypotheses in a “proof-of-concept” fashion in a large sample of psychology students who have performed two tests that are commonly used in studies of DP: The Cambridge Face Memory Test (CFMT) (Duchaine & Nakayama, 2006) and the Cambridge Car Memory Test (CCMT) (Dennett et al., 2012). If a given student scores significantly below a control sample on the CFMT we will consider that person to be a DP suspect (DPSuspect). If a given student scores significantly below the control sample on the CCMT we will consider that person to be a DOA suspect (DOASuspect).

We acknowledge from the outset that we do not have enough information to conclude whether the DPSuspect and DOASuspect individuals do in fact have DP or DOA. This would ideally require the absence of lesions on brain imaging, exclusion of psychiatric issues and autism spectrum condition, a history of lifelong face and/or object recognition difficulties, and problems on additional objective tests of face or object recognition (Barton & Corrow, 2016). Having said that, we note that (i) this uncertainty is similar for DPSuspect and DOASuspect individuals, and (ii) no agreement exists so far on diagnostic criteria for DP and certainly not DOA. Hence, the dataset is intended to serve as a “proof of concept or procedure,” and we cannot know whether the data are indicative of what would be found in a representative sample of the population using more stringent criteria.

Method

Tasks

The original versions of the CFMT (Duchaine & Nakayama, 2006) and CCMT (Dennett et al., 2012) were used, with instructions and feedback translated to Danish. In addition to the CFMT and the CCMT we also had data from all participants on the Cambridge Face Perception Test (CFPT) (Duchaine et al., 2007). While not of main interest here, we include results from the CFPT because it provides a measure that can inform interpretations regarding a potential association between performance on the CFMT and CCMT. As mentioned in the introduction, observation of an association between two deficits may have little theoretical interest if the association is caused by general factors such as poor attention or poor compliance in the test situation. Hence, if individuals are impaired on the CFMT and the CCMT for such reasons, we would expect them to also perform poorly on the CFPT. If they do not, the co-occurrence of impairments on the CFMT and the CCMT becomes more theoretically interesting.

In the CFMT and the CCMT the participant is introduced to six target stimuli, and then tested with forced choice items consisting of three stimuli, one of which is the target. The tests comprise a total of 72 trials each distributed over three phases: (a) an intro phase with 18 trials where the study stimulus and the target stimulus are identical, (b) a novel phase with 30 trials where the target differs from the study stimulus in pose and/or lighting, and (c) a novel+noise phase with 24 trials where the target differs from the study stimulus in pose and/or lighting and where Gaussian noise is added to the target. The dependent measure is number of correct trials. The maximum score is thus 72; chance-level is 24. Performance on the CCMT was not adjusted for car expertise (see the section on “Limitations” for discussion of this aspect).

In the CFPT the participant must arrange six facial images according to their similarity to a target face. The images were created by morphing six different individuals with the target face. The proportion of the morph coming from the target face is varied in each image (88%, 76%, 64%, 52%, 40%, and 28%). The test comprises 16 trials, half with upright and half with inverted faces. Scores for each item are computed by summing the deviations from the correct position for each face. Scores for the 8 trials are then added to determine the total number of respectively upright and inverted errors. Hence, the dependent measure is a deviation-score; the higher the score the poorer the performance, with chance-level at each orientation being a deviation-score of 93.3. We only use data from the upright trials here.
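The CFPT scoring rule and its chance level can be illustrated with a short sketch. It assumes that the “deviation” of an image is the absolute difference between the position it was given and its correct position; the function name `trial_deviation` is hypothetical. Under that assumption, averaging over all possible orderings of six images reproduces the chance-level score stated above.

```python
from itertools import permutations


def trial_deviation(arrangement):
    """Sum of absolute deviations from the correct position for one trial.

    `arrangement[i]` is the correct rank (0-5) of the image the participant
    placed at position i, so a perfect ordering is (0, 1, 2, 3, 4, 5).
    """
    return sum(abs(position - rank) for position, rank in enumerate(arrangement))


# Expected deviation for random sorting: average over all 720 orderings.
orderings = list(permutations(range(6)))
chance_per_trial = sum(trial_deviation(o) for o in orderings) / len(orderings)
chance_per_orientation = 8 * chance_per_trial  # 8 trials per orientation
print(round(chance_per_orientation, 1))  # 93.3
```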

Participants

Test sample

A total of 343 first-year psychology students (255 female; mean age = 23.2 years, SD = 5.2 years) who were naïve to our hypotheses took part in the study as part of their course in cognitive psychology. The course is approved by the study board at the Department of Psychology, University of Southern Denmark, and the experiments conducted do not require formal ethical approval/registration according to Danish Law and the institutional requirements. Prior to participation, the students were informed that data collected in the experiments might be used in an anonymous form in future publications. Participants were free to opt out if they wished, and participation in the experiments was taken as consent. The sample size was thus determined by the number of students who took the course in the years 2014–2018 and provided data for all three tasks (CFMT, CCMT and CFPT). No participants were excluded from the analyses reported below. Task order was counterbalanced for the CFMT and the CCMT. Except for one year, the CFPT was always performed a week prior to the CFMT and the CCMT, which were performed on the same day. The individual data for each of the 343 participants are provided in the Supplementary Data (https://osf.io/rfm6b/).

Even though the test sample is clearly not representative of the normal population, it is unbiased in the sense that the participants were not recruited because they were likely to have either face or object recognition difficulties. Such bias is prevalent in many studies where object recognition is examined in participants who are selected because of their face recognition problems, and this bias is likely to inflate the difference observed between face and object recognition performance (Klargaard et al., 2018).

Reference sample

Any individual in the test sample who scored significantly below the mean of an independent reference sample on the CFMT or CCMT was classified as DPSuspect or DOASuspect respectively. The use of an independent reference sample is important because using the mean and standard deviation (SD) of the test sample would yield biased estimates: by definition, any normally distributed sample will have roughly 2.3% of its subjects scoring more than 2 SD below the mean, a proportion that is conspicuously similar to the 2–3% prevalence suggested for DP (Bowles et al., 2009; Kennerknecht et al., 2006).
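The circularity can be checked numerically: when a cutoff is defined from a sample's own mean and SD, a fixed share of a normally distributed sample falls below it whether or not anyone has a deficit. The sketch below uses only Python's standard library. The exact share below −2 SD is about 2.3%, and a share of exactly 2.5% corresponds to a cutoff of about 1.96 SD.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Share of a normal distribution lying more than 2 SD below the mean.
share_below_2sd = standard_normal.cdf(-2.0)
# Cutoff (in SD units) below which exactly 2.5% of the distribution lies.
cutoff_for_2_5_percent = standard_normal.inv_cdf(0.025)

print(f"{share_below_2sd:.4f}")         # 0.0228
print(f"{cutoff_for_2_5_percent:.2f}")  # -1.96
```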

The reference sample consisted of 61 individuals recruited as controls in previous studies of DP for whom we have data on several tests, including the CFMT, the CCMT and the CFPT (40 female; mean age = 36.7 years, SD = 9.8 years). The CFMT was always performed before the CCMT. Participants in this group fulfilled the following inclusion criteria: performance on the CFPT and the CFMT was within the normal range according to the gender- and age-corrected norms of Bowles et al. (2009), no indication of autism spectrum condition as assessed by the Autism-Spectrum Quotient questionnaire (Baron-Cohen et al., 2001), and no history of serious psychiatric illness, neurological or developmental disorders. While this procedure was necessary, as controls were recruited in the context of a DP study, it does introduce a potential bias in the present context because no inclusion criterion was applied for CCMT performance. The reason is that there are no suitable normative data, since those reported by Dennett et al. (2012) only cover the age range of 18–32 years.

Even though the reference sample is on average older than the test sample we found no age effect in the reference sample on the CFMT (r = -.05, p = .68) or the CCMT (r = -.09, p = .48), which is similar to what was reported by Bowles et al. (2009) for the CFMT on age groups below 50 years. However, on the CFPT we found a positive correlation with age (r = .34, p = .008), suggesting that performance decreased with age (low scores on the CFPT signify good performance). We found no significant effects of gender for any of the tests (all rpbs below .21 and all ps > .12) that could have justified the use of gender-corrected cutoffs. The correlation between the CFMT and the CCMT in the reference sample was r = .07 (p = .57). The mean score for the reference sample was 59.02 (SD = 7.17) on the CFMT, 52.05 (SD = 9.16) on the CCMT, and 39.08 (SD = 13.11) on the CFPT. For the CFMT ZSkewness and ZKurtosis were −.86 and −1.45, respectively. For the CCMT ZSkewness and ZKurtosis were −.41 and −1.0, respectively. Only for the CFPT did we see some departure from normality with ZSkewness and ZKurtosis lying at 3.9 and 5.91, respectively. This was mainly driven by an outlier (Control40) with a score of 92. If this case were removed (which we have no reason to do), the data would appear rather normally distributed, with ZSkewness and ZKurtosis amounting to 1.1 and .59, respectively. It is worth noting that the means and SDs observed in our reference sample for the Cambridge recognition tests are very similar to those reported by Gray, Biotti, and Cook (2019) for their reference sample (N = 61, mean age = 37, SD = 9.8): CFMT 60.16 (SD = 6.89) and CCMT 52.93 (SD = 9.05). The individual data for each of the 61 control participants are given in the Supplementary Data (https://osf.io/rfm6b/).

Alternative reference sample

To evaluate the generalizability of the results obtained, we had originally planned to use an additional reference sample comprising a large group (N = 261) of university students from New Zealand with approximately the same age (M = 19, SD = 3) and gender composition (71% female) as the test sample. We decided to abandon this otherwise important check, however, because the New Zealand sample performed significantly below the Danish test sample on the CFMT (Mdifference = −3.5, 95% CI [−4.8, −2.2], t602 = 5.18, p < .0001), a problem not present with the Danish reference sample, which did not differ systematically from the test sample (Mdifference = −1.1, 95% CI [−3.1, .9], t402 = −1.1, p = .28). Hence, had the New Zealand sample been used instead of the Danish reference sample, the proportion of DPSuspect cases would have dropped from the 8.7% reported below to 1.2%, and the proportion of DOASuspect cases from 5.2% to 1.5% (the New Zealand sample also scored below the Danish test sample on the CCMT, albeit not significantly so; Mdifference = .93, 95% CI [−.4, 2.3], t602 = 1.34, p = .18). Use of the New Zealand sample would thus also have made the group-based analyses, to be presented below, impossible due to too few observations. Hence, we decided to use the Danish reference sample even though it yields a higher cutoff than what is reported for other studies (Bowles et al., 2009). The individual data for each of the 261 participants in the alternative reference sample are provided in the Supplementary Data (https://osf.io/rfm6b/), so that researchers using our or other approaches can use them in the future.

Even though we consider it correct not to use the New Zealand sample to evaluate the generalizability of the present results, it clearly highlights that the outcome of the present procedure depends heavily on the reference sample used, as cutoff scores can differ from sample to sample (Albonico et al., 2017; Bowles et al., 2009). While this is true of all approaches that use reference samples to classify abnormality, it is nevertheless an aspect that requires careful attention if the procedure we suggest here is to be applied to other datasets.

Statistical procedures

We used Crawford and colleagues’ procedure for classifying abnormal performance for each individual on the CFMT and CCMT (Bayesian one-sided point estimate: less than 5% of the population with a score more deviant than the one observed) (Crawford et al., 2010). With the present reference sample this corresponded to a score at or below 46 on the CFMT or at or below 36 on the CCMT. For the CFPT we used the procedure developed by Crawford, Garthwaite, and Ryan (2011), which allows controlling for covariates; in this case age, which correlated with task performance in the reference sample. A one-sided estimate for detecting deviance on the CFMT, the CCMT and the CFPT was used because the hypotheses were directional, targeting the lower end of the distribution of performance. While this choice may be considered too liberal, others argue in favor of it (Crawford & Garthwaite, 2012), and it yields greater power for the group-based analyses to be performed. The most important aspect in the present context is, however, that the same criterion is applied for each test (CFMT/CCMT).
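As a rough illustration of how such cutoffs arise from the reference sample statistics, the following sketch may be useful. It rests on several assumptions: it uses the frequentist Crawford-Howell modified t-test rather than the Bayesian procedure applied here (a substitution made for brevity; the two approaches have been reported to give closely matching results for this kind of single-case deficit test), the function names are hypothetical, the one-tailed 5% critical value of t for 60 degrees of freedom is hard-coded from standard tables, and 72 is taken as the maximum score on both tests.

```python
import math

# One-tailed 5% critical value of t for df = 60 (n = 61 controls).
# Hard-coded from standard t tables; an assumption of this sketch.
T_CRIT_DF60 = 1.6706


def crawford_howell_t(score, ref_mean, ref_sd, n):
    """Modified t statistic that treats the reference mean and SD
    as sample statistics rather than population parameters."""
    return (score - ref_mean) / (ref_sd * math.sqrt((n + 1) / n))


def is_abnormally_low(score, ref_mean, ref_sd, n, t_crit=T_CRIT_DF60):
    """True if the score falls significantly below the reference mean."""
    return crawford_howell_t(score, ref_mean, ref_sd, n) < -t_crit


def cutoff(ref_mean, ref_sd, n, max_score, t_crit=T_CRIT_DF60):
    """Highest integer score that is still classified as abnormally low."""
    low_scores = [s for s in range(max_score + 1)
                  if is_abnormally_low(s, ref_mean, ref_sd, n, t_crit)]
    return max(low_scores) if low_scores else None


# Reference sample statistics as reported in the text (N = 61).
print(cutoff(59.02, 7.17, 61, 72))  # CFMT -> 46
print(cutoff(52.05, 9.16, 61, 72))  # CCMT -> 36
```

Under these assumptions the sketch reproduces the cutoffs of 46 (CFMT) and 36 (CCMT) reported above. It does not cover the covariate-adjusted procedure used for the CFPT.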

Having identified the individuals who scored significantly below the mean of the reference sample on the CFMT (DPSuspects) and/or the CCMT (DOASuspects), we made the following two group-based comparisons. First, we compared the mean CCMT score of the DPSuspect individuals with the mean score of the remaining subjects who had scored in the normal range on the CFMT (nonDPSuspects). If the mean CCMT score of the DPSuspect group is significantly lower than that of the nonDPSuspect group, this will yield support for an association between face and object recognition performance in the DPSuspect group (hypothesis A). Second, we removed those DPSuspect subjects who had also scored poorly on the CCMT (DP + DOASuspects, i.e., cases with possible comorbidity), and then compared the mean CCMT score of the remaining members of the DPSuspect group with that of the nonDPSuspect group. If an association between face and object recognition performance is still found in the DPSuspect group following removal of cases that are both DPSuspects and DOASuspects, this will suggest that the association between face and object recognition performance in the DPSuspect group is not driven by comorbidity alone (hypothesis B). These analyses were performed by independent-samples t-tests with bias-corrected and accelerated bootstrapping (1000 samples) as implemented in the SPSS software package (version 26). We computed the 90% CI for the mean differences and considered a difference reliable if the lower bound of the CI was above 0 (corresponding to a one-tailed test).
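The logic of judging a group difference by the lower bound of a bootstrapped 90% CI can be sketched as follows. This is a simplified illustration under stated assumptions, not the SPSS implementation: it uses a plain percentile bootstrap instead of the bias-corrected and accelerated variant, the function names are hypothetical, and the scores in the usage example are invented for demonstration only.

```python
import random
import statistics


def bootstrap_lower_bound(group_a, group_b, n_boot=1000, alpha=0.10, seed=0):
    """Lower bound of the two-sided (1 - alpha) percentile bootstrap CI
    for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    return diffs[int((alpha / 2) * n_boot)]


def reliably_higher(group_a, group_b, **kwargs):
    """One-tailed decision: group_a scores reliably higher than group_b
    if the lower bound of the 90% CI lies above 0."""
    return bootstrap_lower_bound(group_a, group_b, **kwargs) > 0


# Invented scores, for illustration only.
non_suspects = [50, 52, 55, 57, 60, 61, 63, 65]
suspects = [38, 40, 41, 43, 45, 46]
print(reliably_higher(non_suspects, suspects))
```

Using the lower bound of a two-sided 90% interval corresponds to a one-tailed test at the 5% level, which is why a 90% rather than a 95% interval is computed.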

In the last group-based analyses, we compared (i) the mean CFPT score of the DPSuspect group with the mean CFPT score of the nonDPSuspect group, and (ii) the mean CFPT score of the DOASuspect group with the mean CFPT score of the nonDOASuspect group. This was also performed by means of bias-corrected and accelerated bootstrapping estimating the upper-level 90% CI (corresponding to a one-tailed test). The use of one-tailed tests is justified because we have no reason to expect the DPSuspect individuals to perform better on the CCMT and the CFPT than the nonDPSuspect individuals. As argued above, these comparisons can provide evidence that can inform the interpretation of what potential associations between deficits across the CFMT and the CCMT might reflect.

In addition to the group-based analyses, we performed dissociation analyses at the individual level to see whether any of the individuals exhibited a dissociation in performance on the CFMT and the CCMT. We did this by applying the criteria suggested by Crawford et al. (2003) and implemented in the software program DissocsBayes_ES.exe (Crawford et al., 2010). This procedure estimates (i) whether a case’s scores on the CFMT or/and the CCMT differ significantly from that of the normal population by using the reference sample’s mean and SD as sample statistics (rather than as population parameters), and (ii) whether the case’s standardized difference between the CFMT and the CCMT differs significantly from the standardized differences in controls by taking into account the correlation between the CFMT and the CCMT in the reference sample. To count as a dissociation, two criteria must be fulfilled: (i) the person’s performance on the CFMT or/and CCMT must differ significantly from that of the normal population (estimated based on our reference sample), and (ii) the difference in performance of that person on the CFMT and the CCMT must differ significantly from the difference scores of the normal population on the same tasks. If a dissociation is detected, it may be either a strong or a classical dissociation (Shallice, 1988). A strong dissociation refers to a pattern in which an individual’s scores deviate significantly from the reference sample on both tasks X and Y, and where the individual’s difference between tasks X and Y deviates from what can be expected in the normal population. In comparison, a (putative) classical dissociation refers to a performance pattern where an individual’s scores differ significantly from the reference sample on one of the tasks but is within the normal range on the other, and where the individual’s performance difference between tasks X and Y deviates from what can be expected in the normal population. 
The reason that the classical dissociation is termed “putative” is that failure to reject the null-hypothesis for one of the tasks does not really constitute evidence of normal performance on that task (Crawford et al., 2003). Following the guidelines by Crawford and Garthwaite (2005), abnormality of individual scores on the CFMT and CCMT was based on one-sided estimates, whereas the test for a difference between the case’s standardized score on the CFMT and the CCMT relative to the reference sample was based on two-tailed tests.
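The decision rule described above can be summarized in a few lines. The sketch below only encodes the classification logic, given the outcomes of the three significance tests as inputs; it does not implement the tests themselves (which were run in DissocsBayes_ES.exe), and the function name and return labels are hypothetical.

```python
def classify_dissociation(deficit_x, deficit_y, difference_significant):
    """Classify a single case from three test outcomes.

    deficit_x, deficit_y: whether the case scores significantly below
        the reference sample on task X and on task Y.
    difference_significant: whether the case's standardized X-Y difference
        deviates significantly from the differences in the reference sample.
    """
    # Both criteria must hold: a deficit on at least one task AND
    # an abnormal difference between the two tasks.
    if not difference_significant or not (deficit_x or deficit_y):
        return "no dissociation"
    if deficit_x and deficit_y:
        return "strong dissociation"
    return "putative classical dissociation"


# A case impaired on the CFMT only, with an abnormal CFMT-CCMT difference.
print(classify_dissociation(True, False, True))
```

Note that a case impaired on one task alone does not count as a dissociation under this rule unless the difference criterion is also met.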

Results

The data for each individual in the test sample on the CFMT, the CCMT, and the CFPT are provided in the supplementary data (https://osf.io/rfm6b/). Thirty (8.7%) individuals (DPSuspects) scored at or below the CFMT cutoff of 46 (5% if the cutoff is based on a two-sided test), and 18 (5.2%) individuals (DOASuspects) did so on the CCMT (cutoff ≤ 36) (1.5% if the cutoff is based on a two-sided test). Four individuals (1.2%) performed abnormally on both the CFMT and the CCMT (DPSuspect + DOASuspect). If DPSuspects and DOASuspects reflected stochastically independent disorders, the frequency of those performing abnormally on both tests should equal the product of the frequency of those performing abnormally on either test, which would be 1.57 subjects (0.46%). Hence there are 2.7 times more subjects with DPSuspect + DOASuspect than predicted by stochastic independence.
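The expected joint frequency under stochastic independence follows directly from the two marginal counts, as the short sketch below illustrates. The function name is hypothetical; the inputs are the counts reported in the text (30 DPSuspects and 18 DOASuspects in a test sample of 343).

```python
def expected_joint_count(n_total, n_a, n_b):
    """Expected number of cases abnormal on both tests if the two
    classifications were stochastically independent."""
    return n_total * (n_a / n_total) * (n_b / n_total)


expected = expected_joint_count(343, 30, 18)
print(round(expected, 2))               # expected count: 1.57
print(round(100 * expected / 343, 2))   # expected percentage: 0.46
```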

The mean difference between the CCMT score of the nonDPSuspect group (n = 313) and the DPSuspect group (n = 30) was 4.5 (t341 = 2.93, p = .001, lower 90% CI = 2.2), d = .56, with the nonDPSuspect group obtaining the highest mean score. In comparison, the mean difference between the CCMT score of the nonDPSuspect group (n = 313) and the DPSuspect and nonDOASuspect group (n = 26) was 2.8 (t337 = 1.74, p = .02, lower 90% CI = 0.5), d = .36, again with the nonDPSuspect group obtaining the highest mean score. We note that these analyses are conservative because the nonDPSuspect group (n = 313) included the individuals who scored abnormally on the CCMT alone (DOASuspects; n = 14). Indeed, if these DOASuspect individuals are removed, which one could argue is the fairest contrast, the difference in CCMT performance between the DPSuspect only group and the nonDPSuspect + nonDOASuspect group (n = 299) increases to 3.5 (t323 = 2.33, p = .005, lower 90% CI = 1.2), d = .48.

Dissociation analyses revealed a total of six putatively classical dissociations (1.7%). Four individuals exhibited impaired face recognition but normal car recognition, while two individuals showed impaired car recognition but normal face recognition. Together these observations form a double dissociation between face and object (car) performance.

The mean difference between the CFPT score of the nonDPSuspect group (n = 313) and the DPSuspect group (n = 30) was −10.6 (t341 = −3.96, p = .001, upper 90% CI = −5.6), d = .76, with the nonDPSuspect group obtaining the lowest score (a low score on this test is associated with better performance). In comparison, the mean difference between the CFPT score of the nonDOASuspect group (n = 325) and the DOASuspect group (n = 18) was −5.8 (t341 = −1.49, p = .07, upper 90% CI = .38), d = .43, which is not significant. At a single-subject level, 7 of the 30 DPSuspect individuals performed abnormally on the CFPT (23%). This included two subjects who also performed poorly on the CCMT (i.e., DPSuspect + DOASuspect). In contrast, none of the individuals who performed abnormally on the CCMT alone (DOASuspects; n = 14) performed abnormally on the CFPT.

Discussion

One of the aims of this paper is to explore a procedure that can be used to discriminate between shared-mechanism (domain-general) and specific-mechanism (domain-specific) accounts of associations between neuropsychological deficits. Basically, the procedure involves testing for the presence of both associations and dissociations in cognitive functioning using a combination of group-level and single-subject-level analyses. To illustrate the procedure, we examined performance in a large unbiased sample on two widely used tests in research on developmental prosopagnosia (DP): the Cambridge Face Memory Test (CFMT) and the Cambridge Car Memory Test (CCMT). We designated individuals from this sample as DPSuspects or DOASuspects (developmental object agnosia) if their performance on the CFMT and CCMT respectively deviated abnormally from an independent reference sample. With this setup we argue that it is in principle possible to discriminate between two major types of accounts of DP: the shared-mechanism accounts, which assume a common functional locus underlying associated deficits in face and object recognition in DP, and the specific-mechanism accounts, which assume that associations between object and face recognition deficits in DP reflect co-occurring impairments to independent cognitive mechanisms (comorbidity).

If impairments in face and object recognition are associated in the DPSuspect group, we would expect the DPSuspect group to perform worse than the nonDPSuspect group on the object recognition task (CCMT). This is hypothesis A. If such an association is found, and if it reflects impairment to a function that is involved in both face and object recognition and not just comorbidity, we would further expect the association to persist even if cases of comorbidity are removed from the DPSuspect group. This is hypothesis B.

In support of hypothesis A, we find that object recognition (indexed by CCMT performance) is impaired in DPSuspect individuals at the group level. Importantly, object recognition performance is still impaired in the DPSuspect group when DPSuspect individuals who also perform abnormally on the CCMT (DOASuspects) are removed from the group analysis. Consequently, the poorer object recognition performance of the DPSuspect group cannot be explained by comorbidity alone (DOASuspects). This suggests that object recognition performance is affected in the subset of individuals who are classified as DPSuspect and who do not meet the criterion for being classified as DOASuspects (n = 26). This finding confirms hypothesis B and is therefore consistent with a shared-mechanism account, but it does not follow readily from a specific-mechanism account.

On the other hand, the single-subject analysis does provide some support for specific-mechanism accounts, in that, using the same tasks, the same reference sample and strict classification criteria, we find evidence of a double dissociation between performance on the CCMT and the CFMT for some individuals.

Considered together, these findings suggest that both shared and specific mechanisms are involved in face and object recognition, both of which could be impaired in DPSuspects. To the degree that the DPSuspect individuals really have DP, a point we return to below, this would imply that two types of DP exist: (i) one reflecting maldevelopment of a mechanism(s) specific to faces (Duchaine et al., 2006), and (ii) one reflecting maldevelopment of a mechanism(s) common to face and object recognition but more critical for faces (Gerlach et al., 2016).

With respect to the frequency of deficits at a single-subject level, 30 out of 343 (8.7%) exhibited abnormal face recognition performance. Of these 30 DPSuspect subjects, only four (i.e., 13% of the 30) could be considered as showing good evidence (i.e., a putative classical dissociation) for a selective face recognition deficit. On the other hand, 4 of the 30 (13%) individuals performed abnormally with both faces and objects (DPSuspect + DOASuspect), leaving 22 DPSuspect subjects (73%) with indeterminate evidence for either involvement or sparing of object recognition. Conversely, 2 of 18 DOASuspect subjects (11%) showed a putative classical dissociation with normal face recognition performance, versus 4 of 18 (22%) with impaired performance in both domains (DPSuspect + DOASuspect), and 12 subjects (67%) with indeterminate evidence. Although the numbers are small, this would suggest that the frequency of associated deficits (DPSuspect + DOASuspect) is equal to or exceeds that of either selective deficit (DPSuspect/nonDOASuspect or nonDPSuspect/DOASuspect). This would not be compatible with DP and DOA being stochastically independent disorders. This is reflected in the fact that there are 2.7 times more DPSuspect + DOASuspect subjects than predicted by stochastic independence of DP and DOA.

If face-selectivity in DPSuspect individuals is defined by a putative classical dissociation, then 13% of the DPSuspect cases exhibited a face-selective impairment, a figure similar to the 20% estimate made by Geskin and Behrmann (2018) in their review of individuals diagnosed with DP. A similar outcome was reported by who found that 15% of their DPs (6/40) showed a classical dissociation in matching of faces and hands/houses, when applying the same conservative criteria for defining dissociations as we do here. The application of conservative criteria cannot be the whole explanation, however, because Gray et al. (2019) found a putative classical dissociation in performance in 52% (24/46) of their DP sample using the same criteria and comparing performance on the CFMT and the CCMT. The same is true of a study by Barton, Albonico, Susilo, Duchaine, and Corrow (2019), comparing performance on the CFMT and the Cambridge Bicycle Memory Test in 12 DPs. They found 50% putative classical dissociations. One thing that differs between these studies is that studies reporting high levels of dissociations have been biased in that they both tested a group selected because of deviant performance on one of the test dimensions alone (face recognition) and employed the same task for classifying DPs as for comparing face and object recognition performance (Barton et al., 2019; Gray et al., 2019). This is likely to inflate the rate of observed dissociations (Klargaard et al., 2018; Kriegeskorte et al. 2010). Considered together these studies thus suggest that dissociations in face and object recognition performance are only seen in a minority of the DP population; a conclusion also reached in the study by Barton et al. (2019) when examining performance across more than one object recognition task.

Even though the procedure used here is applied to data regarding face and object recognition deficits, it is applicable to other types of deficits where (the degree of) selectivity has been questioned. An example is category-specific semantic disorders for natural objects and artefacts. Do such disorders reflect damage to processes that are category-specific, as suggested by Warrington and McCarthy (1987), or might they arise in a domain-general network, as suggested by Chen et al. (2017), or are both frameworks needed to account for the full pattern of associations and dissociations? And what about letters and digits; are these processed by the same or different cognitive systems, and to what degree (Starrfelt & Behrmann, 2011)? The procedure could also be used to investigate omissions and substitution errors in neglect dyslexia: Are such errors a consequence of a single functional impairment in a single attentional mechanism, which can be disrupted along a continuum of severity (Behrmann et al., 1991; Mozer & Behrmann, 1990), or do they reflect two independent mechanisms (Martelli et al., 2011)?

Even if the suggested procedure seems applicable to a range of questions debated in cognitive neuropsychology, it is also associated with limitations. We will highlight these below. Again, we will use the present data on face and object recognition deficits to illustrate them, but many of the limitations will apply in other contexts also.

Limitations

We will begin with a limitation that is specific to the present dataset. As mentioned in the “Method” section, there may be a potential bias in the reference sample we used which may have influenced the proportion of individuals classified as DPSuspect and DOASuspect. As noted, the reference sample was unlikely to include individuals with DP but may have included individuals with DOA. If so, this could increase the average CFMT score relative to the average CCMT score in the reference sample, making it more likely for individuals in the test sample to be classified as DPSuspects than DOASuspects. On the other hand, all individuals in the reference sample performed the CFMT before the CCMT, whereas individuals in the test sample were randomized so that half performed the CFMT first and half the CCMT. If experience with the format of the Cambridge tests makes the task easier, this should have favored CCMT performance for the reference sample. This in turn should make it more likely for an individual in the test sample to obtain a DOASuspect score. Hence, there may be two potential but opposing biases that may have affected our estimate of the frequency of DPSuspects and DOASuspects. To this we can add the general problem—discussed above—that the results obtained will always depend, at least to some extent, on the particular reference sample used for classifying abnormality (Gerlach et al., 2016). This limitation applies to all studies using a control population, both group studies and single case studies.

Even if our estimates of the frequencies of face and object recognition difficulties in the normal population were accurate, we do not know whether the DPSuspect individuals actually have DP, just as we do not know whether the DOASuspect individuals really suffer from DOA. We also do not know whether the impairment observed in the DOASuspect group would manifest on other object categories than cars, which it should if the problem is of a general nature. Only to the degree that car recognition serves as a good index of object recognition in general—and there are some data that suggest that cars and/or vehicles may differ from other objects (Cepulic et al., 2018; Gerlach & Starrfelt, 2018b; Richler et al., 2017; Van Gulick et al., 2016)—can the present data be used to draw firm inferences regarding the relationship between face and object recognition.

Another potential limitation of the present study is that it did not attempt to calibrate CCMT performance for car expertise. The ability to identify cars correlates strongly with the non-visual semantic knowledge about car models (Barton et al., 2009; Barton et al., 2019). The degree to which expertise contributes to performance on the CCMT specifically is not known. Expertise effects on CCMT performance were described in a study (Dennett et al., 2012) that indexed expertise in two ways. First, they used self-ratings, which have been shown to be a poorer predictor of car recognition than objective tests of semantic car knowledge (Barton et al., 2009). Second, they tested visual recognition of car models, which makes for a somewhat tautological result: essentially, long-term visual car recognition correlates with short-term visual car recognition. Nevertheless, it remains plausible that at least some of the variance in CCMT performance in our subjects reflected variations in car expertise.

A final and general reservation regarding the procedure concerns the interpretation of the associations observed. As we argued above, the question of independence should only apply to higher levels of processing, where there is the hypothetical chance of divergence into selective processing streams: in the present case for faces and objects. In a similar vein, associations are of most theoretical interest if they do not reflect low-level factors (e.g., poor sight) or nuisance factors (e.g., poor attention or compliance). Ideally this calls for a (more low-level) control task that can be used as reference point for performance with the two target categories; here faces and objects. Only to the extent that performance on such a control task is within normal range will concurrent impairments (in face and object recognition performance) be indicative of a non-trivial association. In the present case one can argue that the CFPT can serve this purpose. If impaired performance with faces or cars reflects low-level impairments or poor compliance, one would suspect performance to be impaired on the CFPT as well. Two findings speak against this interpretation: (i) of the 30 DPSuspects only 23% scored abnormally on the CFPT, and (ii) of the 18 DOASuspects, only 11% scored abnormally on the CFPT. Likewise, of the 25 individuals who performed abnormally on the CFPT, 72% performed within the normal range on the CFMT and the CCMT. Indeed, of the 73 individuals who performed abnormally on one or more of the three tasks only 2 (3%) did so across all tasks suggesting little influence of potential low-level factors.

Advantages

Despite its limitations, the approach suggested here represents an advance in some methodological respects. First, the results are based on an unbiased sample given that the test sample had not been selected on the grounds of having a specific type of impairment. This circumvents a problem in most of the current literature on DP, and in cognitive neuropsychology in general, namely that individuals are selected for study because of a particular deficit—in our DP example, their poor face recognition. As discussed above, this selection bias increases the likelihood of finding dissociations, in this case between face and object recognition (Klargaard et al., 2018). Second, in most neuropsychological studies—and in all studies of DP reviewed by Geskin and Behrmann (2018)—dissociations have been based on criteria that did not require the difference between tasks X and Y to differ significantly from that seen in the normal population (Gerlach et al., 2018). Our estimates of selectivity are based on the stricter criteria for a putative classical dissociation.

Finally, we find a double dissociation between face and object (car) recognition performance using the same tasks and the same reference sample. To our knowledge, there has been no prior demonstration of such a conservatively derived double dissociation. At the very least this provides evidence at the single-subject level, that performance on the CCMT and CFMT could be supported by partially different cognitive processes or processes that are differentially affected by systematic differences between faces and cars.

Conclusion

It is not uncommon to find claims that two disorders are independent if they (doubly) dissociate. As we have argued here, we should be careful with our choice of words, as independence at the level of disorders is rather different from independence at the level of cognitive processes. It is far more likely that different disorders are separable than that they are independent. Cognitive processes, on the other hand, are likely to be independent, and they can be so even if involved in disorders that are not independent; for example, when disorders co-occur because of collateral damage or maldevelopment. Indeed, this is why associations between deficits are typically deemed less evidential than dissociations between deficits when it comes to theorizing about cognitive architecture.

Even if it is true that associations between deficits may reflect collateral damage it is also the case that associations may reflect that two deficits are genuinely caused by malfunction of the same cognitive process. Here we demonstrate that it is possible to discriminate between these alternatives by means of a procedure involving large samples that are unbiased in the sense that the participants are not selected because of a specific deficit. We exemplify the procedure in the context of developmental prosopagnosia, but the procedure is in principle applicable to all neuropsychological deficits/disorders.

By simulating the procedure on an actual dataset, we identify several aspects of the procedure that should be addressed if the procedure is applied prospectively. However, even as implemented here, the procedure yields estimates of dissociations/associations that are well in line with existing studies. The main advantage of the procedure is that it allows for examination of both associations and dissociations in the same sample by combining group-level and single-subject-level analyses. This, we believe, can help even the balance in the use of associations and dissociations as grounds for neuropsychological theorizing, and it can also help us to take more seriously the possibility that seemingly selective deficits may reflect impairments to both domain-specific and domain-general mechanisms. This is possible because the suggested approach may yield evidence in support of an association between deficits X and Y, and hence domain-generality, even when associations caused by likely collateral damage to domain-specific mechanisms have been accounted for. Hence, as Caramazza said in 1986:

Ultimately, what we should be cautioned about is not the evils of using single dissociations or overinterpreting association of symptoms, but the evils of not developing a sufficiently detailed model of the cognitive systems of interest to guide the search for richly articulated patterns of performance in brain-damaged patients. To the extent that our models of cognitive functioning are well developed, we will be able to make efficient use of single and double dissociations and association of symptoms. (p. 66)