A critical reexamination of doing arithmetic nonconsciously
Abstract
A recent study claimed to have obtained evidence that participants can solve invisible multistep arithmetic equations (Sklar et al., 2012). The authors used a priming paradigm in which targets congruent with the equation's solution were responded to faster than incongruent ones. We critically reanalyzed the data set of Sklar et al. and show that the claims made in the article are not fully supported by the alternative analyses that we applied. A Bayesian reanalysis of the data accounting for the random variability of the target stimuli in addition to the subjects shows that the evidence for priming effects is less strong than initially claimed. That is, although Bayes factors revealed evidence for the presence of a priming effect, it was generally weak. Second, the claim that unconscious arithmetic occurs for subtraction but not for addition is not supported when the critical interaction is tested. Third, the data do not show well-established features of numerosity priming, such as V-shaped response time curves as a function of prime–target distance. Fourth, we show that it is impossible to classify reaction times as resulting from congruent or incongruent prime–target relationships, which should be possible if participants genuinely solved the equations on each trial. We conclude that the claims made in the original article are not fully supported by the analyses that we apply. Together with a recent failure to replicate the original results and a critique of the analysis based on regression to the mean, we argue that the current evidence for unconscious arithmetic is inconclusive. We argue that strong claims require strong evidence and stress that cumulative research strategies are needed to provide such evidence.
Keywords
Interocular suppression, Unconscious processing, Reproducibility

Introduction
In their article, Sklar et al. (2012) claimed to have shown that participants can solve complex arithmetic equations nonconsciously, i.e., in the absence of consciously perceiving the equations. Specifically, they examined whether the presentation of multistep additions and subtractions with three single-digit operands (e.g., “9 − 3 − 4 =”, “3 + 1 + 4 =”) and without the result (2 and 8, respectively) would bias the verbal enumeration of a subsequently presented, visible number. Thus, the experiments by Sklar et al. were designed to test for “priming” effects, in which the exposure to a stimulus (or prime) influences the response to a second stimulus (or target).^{1} Following the seminal work by Meyer and Schvaneveldt (1971), priming has been widely used in the fields of cognitive psychology and neuroscience to infer the structure of semantic representations, including the representation of numerical values (Dehaene, Molko, Cohen, & Wilson, 2004; Knops, 2016). In the case of Sklar et al., target numbers were either congruent or incongruent with the result of the prime equation. For example, the target number “2” is congruent with the result of the equation “9 − 3 − 4 =”, while numbers “3” or “5” are incongruent. Repeated measures analysis of variance (rmANOVA) revealed significantly shorter response times (RTs) for congruent compared with incongruent priming trials. Rather surprisingly, this congruency priming effect was significant for subtractions, but not for additions.^{2} The authors concluded from these data “that uniquely human cultural products, such as […] solving arithmetic equations, do not require consciousness” (p. 19617).
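For illustration, the congruency coding can be sketched in a few lines of Python (our own sketch, not code from the original study; the function names are ours):

```python
# Sketch of the prime-target congruency coding (our own illustration).
# Assumption: primes are strings such as "9 - 3 - 4 =" with single-digit
# operands; for pure addition/subtraction, left-to-right evaluation
# coincides with standard arithmetic.

def solve_prime(equation: str) -> int:
    """Return the solution of a prime equation, e.g. '9 - 3 - 4 =' -> 2."""
    expr = equation.replace("=", "").replace("−", "-").strip()
    return eval(expr)  # acceptable here: input is restricted to digits and +/-

def is_congruent(equation: str, target: int) -> bool:
    """A target number is congruent iff it equals the prime's solution."""
    return solve_prime(equation) == target

print(is_congruent("9 - 3 - 4 =", 2))  # True  (solution is 2)
print(is_congruent("9 - 3 - 4 =", 5))  # False (incongruent target)
```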
For the following reasons, we believe that it is crucial to reexamine whether the claims made by Sklar et al. (2012) are fully supported by the available data. From a theoretical standpoint, the claim of “doing arithmetic nonconsciously” is a strong claim and, hence, demands strong evidence. Most cognitive scientists would agree that the complex nature of the underlying cognitive processes renders it implausible, rather than plausible, that effortful arithmetic operations may be performed without consciousness. Specifically, multistep additions and subtractions as used by Sklar et al. cannot be solved by declarative fact retrieval from long-term memory alone. Successful performance would necessitate that arithmetic rules can be initiated and followed unconsciously, and that the unconscious intermediary results are stored in working memory. So far, only a single study on unconscious addition has made the former claim (Ric & Muller, 2012). Furthermore, considering the technical setup adds to the a priori implausibility of the effect reported by Sklar et al. They used continuous flash suppression (CFS) to render the prime equations invisible for up to 2 seconds. Following the introduction of this interocular suppression method (Tsuchiya & Koch, 2005), a very heterogeneous picture has emerged about the extent to which high-level unconscious visual processing is possible under CFS (Ludwig & Hesselmann, 2015; Sterzer, Stein, Ludwig, Rothkirch, & Hesselmann, 2014; Yang, Brascamp, Kang, & Blake, 2014). To the best of our knowledge, there is no evidence for Sklar and colleagues’ premise that CFS allows for more unconscious processing, because “it gives unconscious processes ample time to engage with and operate on subliminal stimuli” (p. 19614).
On the contrary, it may rather be that long suppression durations (Experiment 6: 1,700 ms and 2,000 ms; Experiment 7: 1,000 ms and 1,300 ms) are associated with a particularly deep suppression of visual processing under CFS (Tsuchiya, Koch, Gilroy, & Blake, 2006), and extended periods of invisible stimulation have been shown to lead to negative priming influences (Barbot & Kouider, 2011).
The goal of the current article is to provide a critical reexamination of the claims made by Sklar et al. (2012) based on the data they collected. We do so by approaching the original data set from different angles, and we assess whether the conclusions based on the original results still hold when taking the results of these new analyses into account. In the next sections, we provide five reanalyses of the data obtained by Sklar et al. First, we verified the repeatability of the reported analyses. This was a crucial first step that guaranteed we were analyzing the same data set as the original study. Second, we analyzed the data using Bayesian linear mixed-effects models with crossed random effects for participants and stimuli, relying on Bayes factors to quantify how strongly the data support the predictions made by one statistical model compared with another. Indeed, throughout our reanalyses we are explicitly interested in quantifying the degree to which the data provide evidence for the claims that were made in the original study. It has been argued that classical significance testing approaches are not explicitly connected to statistical evidence, whereas the Bayes factor provides the possibility to quantify evidence in a coherent framework (Morey, Romeijn, & Rouder, 2016). This motivated us to rely on the Bayes factor throughout our reanalyses (except for the last one, see further). Third, Sklar et al. (2012) claimed that the congruency effect was observed for subtraction equations but not for addition equations. However, the interaction between congruency and operation was never tested, although this test is critical to establish that the congruency effect differs between subtraction and addition. Fourth, if the congruency effect observed for the subtraction equations was genuinely due to number processing, one would predict a distance effect to be present in the data. That is, as prime–target distance increases, response times should increase as well.
Therefore, we assessed whether the data showed such distance-dependent priming effects. Fifth, Sklar et al. (2012) interpreted the congruency effect as indicating that participants unconsciously solved the equations. It has recently been argued that such a claim is only warranted if reaction times are predictive of prime–target congruency. That is, accurate classification of the prime–target relationship should be possible from the reaction time distributions.
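The distance prediction amounts to binning trials by the absolute difference between the prime's solution and the target, and checking that mean RTs grow with that distance. A minimal Python sketch with hypothetical trial data (the tuples and RT values are made up for illustration):

```python
# Sketch of the predicted distance effect (our own illustration with
# hypothetical trial data): if the primes were genuinely solved, mean RT
# should be fastest at distance 0 and increase with prime-target distance.
from collections import defaultdict
from statistics import mean

def rt_by_distance(trials):
    """trials: iterable of (prime_solution, target, rt) tuples.
    Returns {absolute prime-target distance: mean RT}, sorted by distance."""
    bins = defaultdict(list)
    for solution, target, rt in trials:
        bins[abs(target - solution)].append(rt)
    return {d: mean(rts) for d, rts in sorted(bins.items())}

demo = [(2, 2, 540), (2, 3, 560), (2, 5, 575), (8, 8, 530), (8, 6, 565)]
print(rt_by_distance(demo))
```

In the demo, mean RT rises monotonically with distance, which is the pattern one would expect from genuine numerical priming.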
For the sake of brevity, we report all reanalyses for Experiment 6 only in the main text of this article. We refer to the Supplementary Materials for the results of the reanalysis of Experiment 7, which were qualitatively the same. An extensive overview of all calculations is also available in the Supplementary Materials, including the code used to process the data and conduct the analyses.
Methods
Data preparation
We obtained the data from Sklar et al. (2012). All data were processed and analyzed in R 3.3.2, a statistical programming language, and RStudio 1.0.44 (R Core Team, 2014; RStudio Team, 2015). A complete overview of this analysis can be found in the R markdown file in the Supplementary Materials (https://doi.org/10.6084/m9.figshare.4888391.v2). All data were visualized with the yarrr package version 0.1.2 (Phillips, 2016).
Reanalysis #1: Repeatability of the reported analyses
We followed all data processing steps reported in Sklar et al. to compute mean response times for each participant–condition combination. We used the afex package version 0.16-1 to recalculate the repeated measures ANOVA with the aov_car() function (Singmann, Bolker, Westfall, & Aust, 2016). Type III sums of squares were used, as this is the default in many commercially available statistical packages, one of which (SPSS) was used by Sklar et al. to analyze the data.
Reanalysis #2, #3, and #4: Bayesian linear mixed-effects models with crossed random effects
In itself, the Bayes factor can be interpreted as a relative measure of evidence for one statistical model compared with another. That is, the value of the Bayes factor has no absolute meaning and should always be interpreted relative to the statistical models under consideration. As Etz and Vandekerckhove (2016, p. 4) put it: “The Bayes factor is most conveniently interpreted as the degree to which the data sway our belief from one to the other hypothesis.” Although the Bayes factor is inherently continuous, its values are sometimes partitioned into categories indicating different grades of evidence. For example, a Bayes factor of 3 or more is often associated with moderate evidence for one model, whereas Bayes factors larger than 10 are deemed strong evidence for that model. Bayes factors between 1/3 and 3 are often interpreted as providing equal support for both models, or anecdotal evidence for either model. We took these categories as guidelines, but we do not wish to fall prey to traditional accept/reject classifications, such as those that are standard in classical null hypothesis significance testing.
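These conventional cut-offs can be captured in a small helper (a Python sketch of the guideline only; the 3 and 10 boundaries are the conventions mentioned above, not part of any statistical package):

```python
def bf_category(bf: float) -> str:
    """Map a Bayes factor for model A over model B onto conventional
    verbal labels (anecdotal < 3, moderate < 10, strong >= 10). Values
    below 1 favor model B, so the reciprocal is graded symmetrically."""
    if bf < 1:
        return bf_category(1 / bf).replace("for A", "for B")
    if bf < 3:
        return "anecdotal evidence for A"
    if bf < 10:
        return "moderate evidence for A"
    return "strong evidence for A"

print(bf_category(5.2))   # moderate evidence for A
print(bf_category(0.05))  # strong evidence for B
```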
We used the generalTestBF() function to calculate the Bayes factors associated with the full model (i.e., including all fixed and random effects of interest) and reduced versions of the full model (the whichModels argument was set to “withmain,” such that interaction effects were only included if the respective main effects were also included in the model) (Rouder, Engelhardt, McCabe, & Morey, 2016). With respect to the random effects, random intercepts were included for both participants and target stimuli. Initial analyses were also performed that included random slopes for the congruency effect across participants and target stimuli. However, models including random slopes were never favored over models including random intercepts only. Therefore, we decided to drop random slopes altogether in the analyses reported. All default prior settings were used (i.e., a “medium” prior scale for the fixed effects (r = 0.5) and a “nuisance” prior scale for the random effects (r = 1)). Our general strategy of reporting the Bayes factors is as follows. We extracted the model with the highest Bayes factor compared with an empty model (i.e., an intercept-only model) and considered this to be the model that predicted the data best (in the following referred to as “best model”). We then recalculated all Bayes factors such that they are compared with this best model. In all tables, this yields an overview of the best model (Bayes factor = 1) and of how strongly the data support the predictions made by this model compared with all other models. Because prior settings influence the Bayes factor, we also report sensitivity analyses in the Supplementary Materials, in which we varied the value of the prior scale of the fixed effects (which are of most interest here). We always included two models in the sensitivity analysis (i.e., yielding a single Bayes factor for each value of the prior scale). One of those was the best model; in the other, the most important variable for the current reanalysis was included or excluded (depending on its inclusion in the best model). For example, if the best model contained only a main effect of prime–target congruency in addition to the random effects, the sensitivity analysis would be conducted for this model and the model including random effects only.
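Re-expressing Bayes factors relative to the best model is a simple division, because Bayes factors computed against a common baseline are transitive: BF(best vs. model) = BF(best vs. baseline) / BF(model vs. baseline). A Python sketch, with made-up values standing in for the output of generalTestBF():

```python
def relative_to_best(bfs_vs_baseline):
    """Re-express Bayes factors, each computed against a common baseline
    (here, the intercept-only model), relative to the best model:
    BF(best vs. model) = BF(best vs. baseline) / BF(model vs. baseline).
    The best model therefore gets 1, and larger values indicate stronger
    support for the best model over that competitor."""
    best = max(bfs_vs_baseline.values())
    return {model: best / bf for model, bf in bfs_vs_baseline.items()}

# Hypothetical BFs versus the intercept-only model (made-up numbers):
bfs = {
    "Congruency + Subject + Target": 40.0,
    "Subject + Target": 18.6,
    "Presentation duration + Subject + Target": 15.1,
}
for model, bf in relative_to_best(bfs).items():
    print(f"{model}: {bf:.2f}")
```

This is the convention used in the tables below: the best model has BF = 1, and larger values mean the data favor the best model more strongly over that competitor.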
Reanalysis #5: A significant difference does not imply accurate classification
For this analysis, we used the R code from Franz and von Luxburg (2015), which is publicly available (https://osf.io/7825t/). The classification analysis can be summarized as follows. The goal is to determine a threshold RT that can be used to classify RTs as either congruent or incongruent. In the case of the median classifier, the median RT is used as the threshold. In the case of the trained classifier, the data set is split into two halves: a training and a test set. For the training set, the threshold value leading to the fewest misclassifications is determined, and this threshold is then applied to the test set. This procedure was repeated 10 times, and the average classification accuracy was taken as the classification performance of the trained classifier.
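The two classifiers can be reimplemented in a few lines (a Python sketch of the idea under our own simplifications, not the published code of Franz and von Luxburg):

```python
import random

def median_classifier(rts, labels):
    """Classify an RT as 'congruent' if it falls below the median RT;
    return the proportion of correct classifications."""
    srt = sorted(rts)
    n = len(srt)
    threshold = srt[n // 2] if n % 2 else (srt[n // 2 - 1] + srt[n // 2]) / 2
    preds = ["congruent" if rt < threshold else "incongruent" for rt in rts]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def trained_classifier(rts, labels, reps=10, seed=0):
    """Split the data in half, pick the threshold with the fewest
    misclassifications on the training half, apply it to the test half,
    and average the test accuracy over `reps` random splits."""
    rng = random.Random(seed)
    accs = []
    for _ in range(reps):
        idx = list(range(len(rts)))
        rng.shuffle(idx)
        train, test = idx[:len(idx) // 2], idx[len(idx) // 2:]
        best_t, best_acc = None, -1.0
        for t in sorted({rts[i] for i in train}):
            acc = sum((rts[i] < t) == (labels[i] == "congruent")
                      for i in train) / len(train)
            if acc > best_acc:
                best_t, best_acc = t, acc
        accs.append(sum((rts[i] < best_t) == (labels[i] == "congruent")
                        for i in test) / len(test))
    return sum(accs) / len(accs)

# Perfectly separable demo data: congruent trials are uniformly faster.
rts = [400, 410, 420, 430, 600, 600, 600, 600]
labels = ["congruent"] * 4 + ["incongruent"] * 4
print(median_classifier(rts, labels))   # 1.0
print(trained_classifier(rts, labels))  # 1.0
```

Note that with heavily overlapping RT distributions, as in real data, both classifiers perform near chance.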
Results
Reanalysis #1: Repeatability of the reported analyses
Reanalysis #2: (Bayesian) linear mixed-effects modeling with crossed random effects
Bayes factor analysis of the subtraction data (Experiment 6)
Model                                                      BF     Error
1. Congruency + Subject + Target                           1      0
2. Congruency + Presentation duration + Subject + Target   1.16   0.07
3. Subject + Target                                        2.15   0.03
4. Presentation duration + Subject + Target                2.65   0.03
5. Congruency * Presentation duration + Subject + Target   10.87  0.05
All other models                                           >100   NA
Reanalysis #3: Analysis of the claimed “congruency x operation” interaction
Sklar et al. (2012) interpret the results of Experiments 6 and 7 as follows: “The results so far show that subtraction equations are solved nonconsciously and hence are sufficient to confirm our hypothesis that complex arithmetic can be performed unconsciously. However, why did not we find evidence for nonconscious solution of the easier-to-solve addition equations?” (p. 19616).
Bayes Factor analysis for the full data set (Experiment 6)
Model                                                      BF    Error
1. Target + Subject                                        1     0
2. Presentation duration + Target + Subject                1.30  0.02
3. Operation + Target + Subject                            4.57  0.01
4. Congruency + Target + Subject                           5.28  0.02
5. Presentation duration + Operation + Target + Subject    5.83  0.07
6. Congruency + Presentation duration + Target + Subject   7.21  0.02
All other models                                           >100  NA
Reanalysis #4: The effect of numerical primetarget distance on response times
Bayes factor analysis of the primetarget distance (Experiment 6)
Model                                                      BF     Error
1. Subject + Target                                        1      0
2. Presentation duration + Subject + Target                1.34   0.01
3. Operation + Subject + Target                            4.40   0.04
4. Presentation duration + Operation + Subject + Target    6.03   0.02
5. Presentation duration * Operation + Subject + Target    9.12   0.06
6. Distance + Subject + Target                             14.67  0.01
All other models                                           >100   NA
Reanalysis #5: A significant difference does not imply accurate classification
Classification performance for the subtraction data (Experiment 6)
Classifier           Mean   SD    SEM   Median
Median classifier    53.10  4.43  1.07  53.52
Trained classifier   50.88  5.76  1.40  51.89
Discussion
In this article, we critically reanalyzed the data reported in Sklar et al. (2012). We first established that all analyses were repeatable without any discrepancies (Reanalysis #1). The authors should be applauded for making their data available to us and their research transparent. Indeed, recent empirical evaluations have shown that the published biomedical literature generally lacks transparency, including public access to raw data and code (Goodman, Fanelli, & Ioannidis, 2016; Iqbal, Wallach, Khoury, Schully, & Ioannidis, 2016; Leek & Jager, 2016). Furthermore, a recent series of studies has indicated that half of the published psychology papers contain at least one statistical inconsistency, and one in eight even contains a gross inconsistency (Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2015). Importantly, the full repeatability ensured that our subsequent reanalyses were based on exactly the same data set as used for the original publication.

When applying a statistical model that provides better control of the Type I error rate for the experimental design at hand, we showed that the evidence in favor of the presence of a congruency effect was not as strong as suggested by the analyses reported in the original article, even though the best model did include an effect of prime–target congruency (Reanalysis #2). Thus, on purely statistical grounds, this result shows that merely accounting for item variability substantially attenuates the strength of the evidence for the reported priming effect. In essence, this does not contradict the result reported by Sklar et al. Nevertheless, we argue that the strength of the evidence provides an important nuance to the interpretation of these results.

The data do not strongly support the claim that unconscious arithmetic can happen for subtraction equations, but not for addition equations. That is, none of the models reported in Reanalysis #3 included an interaction between prime–target congruency and operation. Moreover, all Bayes factor analyses indicated that the data were more consistent with statistical models not including an effect of prime–target congruency. Thus, an analysis based on the full data set rather than on different subsets of the data revealed no strong evidence for main effects of, or interactions with, prime–target congruency.

No characteristic patterns of number processing, which have repeatedly and robustly been reported in the literature, are present in the current data set (Reanalysis #4). This indicates that one should be very cautious to invoke mechanisms related to number processing to explain these results.

Even if the priming effect is taken at face value after the results of the three previous reanalyses, the data set does not provide evidence that participants unconsciously solved the equations that were presented subliminally. That is, the classification of the prime–target congruency based on the reaction times is nearly at chance, calling into question the assertion that people can solve equations nonconsciously (Reanalysis #5). Although the median classifier performed slightly above chance, its performance was still considerably lower than the performance that was taken to be the cutoff for establishing invisibility of the prime equations (60%).
Taken together, we conclude that the converging nature of all four reanalyses indicates that the data used for invoking the existence of unconscious arithmetic contain little evidential value for those claims (i.e., evidential value in terms of the Bayes factors obtained in the reanalyses). Within the conceptual framework proposed by Goodman et al. (2016), our reanalyses therefore suggest low inferential reproducibility of the study by Sklar et al.
A critical reviewer suggested that, based on the results of our reanalyses, one would expect that the original findings would not replicate easily. In this context, a direct replication of the study by Sklar et al., using the same experimental setup and exactly the same stimulus material, would be very informative. This was the goal of the recent study by Karpinski, Yale, and Briggs (2016). The authors used exactly the same materials as in Sklar et al. and aimed at replicating the original effect in a larger sample (n = 94). Interestingly, they obtained evidence for unconscious addition, but not subtraction (i.e., opposite findings compared with Sklar et al.). As this data set would be very informative for our reanalysis, we contacted the authors of this replication study. Upon reanalyzing the data set together with the authors, it became apparent, however, that a coding error led to an incorrect calculation of the mean RTs. A corrected analysis of the data did not reveal any priming effects for unconscious additions or subtractions (Karpinski & Briggs, personal communication). That is, the critical paired comparisons for assessing priming effects for addition and subtraction both no longer passed the threshold for statistical significance (addition: t(93) = 0.11, p = 0.92; subtraction: t(93) = 0.23, p = 0.82). The paper has now been retracted. Thus, the single published replication study of the unconscious addition and subtraction effects reported in Sklar et al. actually failed to replicate the original pattern of results. Together with the results of our reanalyses, we argue that the results of this nonreplication call for caution when interpreting the original results.
Exploring the scope and limits of nonconscious processing is essential for the formulation of theories of consciousness (Dehaene, Charles, King, & Marti, 2014). Since the results reported in Sklar et al. (2012) might have important implications for theories of (un)conscious processing (Dehaene et al., 2014; Koch, Massimini, Boly, & Tononi, 2016; Soto & Silvanto, 2014), we were motivated to conduct this critical reanalysis. If the reported effect is true, it can indeed be considered as an extraordinary case of subliminal perception and, as Sklar et al. argue, it might even “call for a significant update of our view of conscious and unconscious processes” (p. 19614). In line with this notion, the senior author of this study recently suggested that “unconscious processes can carry out every fundamental high-level function that conscious processes can perform” (p. 195) (Hassin, 2013). Nonconscious arithmetic would be the most recent culmination point in a decades-long debate among cognitive scientists about the existence and potency of subliminal perception (Doyen, Klein, Simons, & Cleeremans, 2014). This debate has been characterized by a repeating cycle of provocative claims followed by methodological criticism, primarily aimed at the psychophysical and statistical methods used to establish the absence of conscious perception (Hesselmann & Moors, 2015). Of note, for the purpose of this reexamination, we solely relied on the data that were used to claim the existence of nonconscious arithmetic. For example, we simply took at face value that the post hoc selected sample of participants, whose data were submitted to statistical analysis, did not see the arithmetic equations; the crucial aspect of post hoc data selection and its implications has been treated elsewhere (Shanks, 2016).
The scientific study of consciousness has traditionally sought to assemble an exhaustive inventory of the psychological processes that can proceed unconsciously to isolate those that are exclusively restricted to conscious cognition (Naccache, 2009). During the course of the last decades, a large body of empirical evidence has been accumulated by applying this strategy, in particular within the domain of visual perception. Vision research provides a wide range of paradigms designed to transiently suppress visual stimuli from conscious perception, i.e., render a physically present target stimulus invisible for neurologically intact observers (Bachmann, Breitmeyer, & Ogmen, 2007). These paradigms differ with respect to what types of visual stimuli can be suppressed from awareness, and how effective the suppression is in terms of duration and controllability of onset and offset (Kim & Blake, 2005). Along another dimension, the available paradigms may be placed within a functional hierarchy of unconscious processing, according to the extent to which features of visual stimuli are processed on an unconscious level and still induce effects on behavior, e.g., in priming experiments (Breitmeyer, 2015). The results of our reanalysis fit within an emerging series of results indicating that unconscious processing associated with CFS is not as high-level as previously thought (Hedger, Adams, & Garner, 2015; Hesselmann, Darcy, Sterzer, & Knops, 2015; Hesselmann & Knops, 2014; Moors, Boelens, et al., 2016) and that neural activity related to stimuli suppressed by CFS is considerably reduced already in early visual areas (Fogelson, Kohler, Miller, Granger, & Tse, 2014; Yuval-Greenberg & Heeger, 2013). Importantly, building such a functional hierarchy should eventually allow one to formulate predictions about the level of unconscious processing that can be expected in a specific experimental setup.
In the absence of prior assumptions on the depth of visual suppression associated with a specific paradigm, every new report of highlevel unconscious processing seems equally plausible, and the boundaries of nonconscious processing are ultimately pushed further and further.
In sum, as extraordinary claims require extraordinary evidence, we were motivated to reanalyze the data obtained in Sklar et al. based on statistical, methodological, and theoretical considerations, and within a framework that allowed us to quantify the evidence for statistical models that reflected theoretical claims (i.e., unconscious arithmetic revealed through priming effects). Together with the recent nonreplication of the original results (Karpinski et al., 2016) and a recent critique of the post hoc selection of unaware participants that was used in the original study (Shanks, 2016), we argue that our results indicate that the evidence for the existence of unconscious arithmetic is inconclusive. This current state of affairs can only be overcome by cumulative research strategies, explicitly aimed at assessing the robustness of the findings and quantifying the strength of the evidence for the theoretical claims.
Supporting Information
All Supplementary Materials can be accessed as a HTML or R Markdown file at: https://doi.org/10.6084/m9.figshare.4888391.v2
Footnotes
 1.
Please note that the study by Sklar et al. also included experiments on “nonconscious reading” that will not be addressed here. Instead of priming, Experiments 1–5 used a variant of interocular suppression (breaking CFS) in which the time to target detection is the dependent variable. The extent to which this paradigm can provide evidence for unconscious processing has been called into question, however (Stein & Sterzer, 2014).
 2.
Priming effects for additions were only observed when Sklar et al. modified the experimental design and used a different dependent measure. In Experiment 9, equations with two single-digit operands (e.g., “8+7 =”) were unconsciously presented, and participants had to report whether a subsequently presented visible addition equation with two single-digit operands and result (e.g., “9+6=15”) was correct or not. The results showed that participants made significantly fewer mistakes in compatible trials (3.2%) than in incompatible trials (4.4%). We do not address this weaker finding in our article.
Notes
Acknowledgements
P.M. was supported by the Research Fund Flanders (FWO Vlaanderen) through a doctoral fellowship. G.H. is supported by the German Research Foundation (grant HE 6244/1-2).
Author contributions
P.M. and G.H. designed research; P.M. analyzed data; P.M. and G.H. wrote the paper.
References
 Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixedeffects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. doi: 10.1016/j.jml.2007.12.005 CrossRefGoogle Scholar
 Bachmann, T., Breitmeyer, B., & Ogmen, H. (2007). The experimental phenomena of consciousness: A brief dictionary. Oxford: Oxford University Press.Google Scholar
 Barbot, A., & Kouider, S. (2011). Longer is not better: Nonconscious overstimulation reverses priming influences under interocular suppression. Attention, Perception & Psychophysics, 74, 174–184. doi: 10.3758/s1341401102263 CrossRefGoogle Scholar
 Breitmeyer, B. G. (2015). Psychophysical “blinding” methods reveal a functional hierarchy of unconscious visual processing. Consciousness and Cognition, 35, 234–250. doi: 10.1016/j.concog.2015.01.012 CrossRefPubMedGoogle Scholar
 Clark, H. H. (1973). The languageasfixedeffect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12(4), 335–359. doi: 10.1016/S00225371(73)800143 CrossRefGoogle Scholar
 Core Team, R. (2014). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Retrieved from http://www.Rproject.org/ Google Scholar
 Dehaene, S., Molko, N., Cohen, L., & Wilson, A. J. (2004). Arithmetic and the brain. Current Opinion in Neurobiology, 14(2), 218–224. doi: 10.1016/j.conb.2004.03.008
 Dehaene, S., Charles, L., King, J.R., & Marti, S. (2014). Toward a computational theory of conscious processing. Current Opinion in Neurobiology, 25, 76–84. doi: 10.1016/j.conb.2013.12.005 CrossRefPubMedGoogle Scholar
 Doyen, S., Klein, O., Simons, D. J., & Cleeremans, A. (2014). On the other side of the mirror: Priming in cognitive and social psychology. Social Cognition, 32(Supplement), 12–32. doi: 10.1521/soco.2014.32.supp.12 CrossRefGoogle Scholar
 Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility project: Psychology. PLoS ONE, 11(2), e0149794. doi: 10.1371/journal.pone.0149794 CrossRefPubMedPubMedCentralGoogle Scholar
 Fogelson, S. V., Kohler, P. J., Miller, K. J., Granger, R., & Tse, P. U. (2014). Unconscious neural processing differs with method used to render stimuli invisible. Frontiers in Psychology, 5, 601. doi: 10.3389/fpsyg.2014.00601 CrossRefPubMedPubMedCentralGoogle Scholar
 Franz, V. H., & von Luxburg, U. (2014). Unconscious lie detection as an example of a widespread fallacy in the neurosciences. arXiv:1407.4240. Retrieved from http://arxiv.org/abs/1407.4240
 Franz, V. H., & von Luxburg, U. (2015). No evidence for unconscious lie detection: A significant difference does not imply accurate classification. Psychological Science. doi: 10.1177/0956797615597333 PubMedGoogle Scholar
 Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331. doi: 10.1198/000313006X152649 CrossRefGoogle Scholar
 Goodman, S. N., Fanelli, D., & Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341). doi: 10.1126/scitranslmed.aaf5027
 Hassin, R. R. (2013). Yes it can: On the functional abilities of the human unconscious. Perspectives on Psychological Science, 8(2), 195–207. doi: 10.1177/1745691612460684 CrossRefPubMedGoogle Scholar
 Hedger, N., Adams, W. J., & Garner, M. (2015). Fearful faces have a sensory advantage in the competition for awareness. Journal of Experimental Psychology: Human Perception and Performance. doi: 10.1037/xhp0000127 Google Scholar
 Hesselmann, G., & Knops, A. (2014). No conclusive evidence for numerical priming under interocular suppression. Psychological Science. doi: 10.1177/0956797614548876
 Hesselmann, G., & Moors, P. (2015). Definitely maybe: Can unconscious processes perform the same functions as conscious processes? Frontiers in Psychology, 6, 584. doi: 10.3389/fpsyg.2015.00584
 Hesselmann, G., Darcy, N., Sterzer, P., & Knops, A. (2015). Exploring the boundary conditions of unconscious numerical priming effects with continuous flash suppression. Consciousness and Cognition, 31, 60–72. doi: 10.1016/j.concog.2014.10.009
 Ioannidis, J. P. A., Allison, D. B., Ball, C. A., Coulibaly, I., Cui, X., Culhane, A. C., & van Noort, V. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics, 41(2), 149–155. doi: 10.1038/ng.295
 Iqbal, S. A., Wallach, J. D., Khoury, M. J., Schully, S. D., & Ioannidis, J. P. A. (2016). Reproducible research practices and transparency across the biomedical literature. PLoS Biology, 14(1), e1002333. doi: 10.1371/journal.pbio.1002333
 Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54–69. doi: 10.1037/a0028347
 Karpinski, A., Yale, M., & Briggs, J. C. (2016). Unconscious arithmetic processing: A direct replication. European Journal of Social Psychology. doi: 10.1002/ejsp.2175
 Kim, C.-Y., & Blake, R. (2005). Psychophysical magic: Rendering the visible “invisible.” Trends in Cognitive Sciences, 9(8), 381–388. doi: 10.1016/j.tics.2005.06.012
 Knops, A. (2016). Probing the neural correlates of number processing. The Neuroscientist. doi: 10.1177/1073858416650153
 Koch, C., Massimini, M., Boly, M., & Tononi, G. (2016). Neural correlates of consciousness: Progress and problems. Nature Reviews Neuroscience, 17(5), 307–321. doi: 10.1038/nrn.2016.22
 Leek, J. T., & Jager, L. R. (2016). Is most published research really false? bioRxiv, 50575. doi: 10.1101/050575
 Ludwig, K., & Hesselmann, G. (2015). Weighing the evidence for a dorsal processing bias under continuous flash suppression. Consciousness and Cognition, 35, 251–259. doi: 10.1016/j.concog.2014.12.010
 Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90(2), 227–234.
 Moors, P., Boelens, D., van Overwalle, J., & Wagemans, J. (2016). Scene integration without awareness: No conclusive evidence for processing scene congruency during continuous flash suppression. Psychological Science, 27(7), 945–956. doi: 10.1177/0956797616642525
 Moors, P., Wagemans, J., & de Wit, L. (2016). Faces in commonly experienced configurations enter awareness faster due to their curvature relative to fixation. PeerJ, 4. doi: 10.7717/peerj.1565
 Morey, R. D., Rouder, J. N., Love, J., & Marwick, B. (2015). BayesFactor: 0.9.12-2 CRAN [Zenodo]. doi: 10.5281/zenodo.31202
 Morey, R. D., Romeijn, J.-W., & Rouder, J. N. (2016). The philosophy of Bayes factors and the quantification of statistical evidence. Journal of Mathematical Psychology, 72, 6–18. doi: 10.1016/j.jmp.2015.11.001
 Naccache, L. (2009). Priming. In T. Bayne, A. Cleeremans, & P. Wilken (Eds.), The Oxford companion to consciousness (pp. 533–536). Oxford: Oxford University Press.
 Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: A problem of significance. Nature Neuroscience, 14(9), 1105–1107. doi: 10.1038/nn.2886
 Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods. doi: 10.3758/s13428-015-0664-2
 Phillips, N. (2016). yarrr: A companion to the ebook YaRrr!: The Pirate’s Guide to R. R package version 0.0.5. Retrieved from www.thepiratesguidetor.com
 Reynvoet, B., Brysbaert, M., & Fias, W. (2002). Semantic priming in number naming. The Quarterly Journal of Experimental Psychology, 55(4), 1127–1139. doi: 10.1080/02724980244000116
 Ric, F., & Muller, D. (2012). Unconscious addition: When we unconsciously initiate and follow arithmetic rules. Journal of Experimental Psychology: General, 141(2), 222–226. doi: 10.1037/a0024608
 Roggeman, C., Verguts, T., & Fias, W. (2007). Priming reveals differential coding of symbolic and nonsymbolic quantities. Cognition, 105(2), 380–394. doi: 10.1016/j.cognition.2006.10.004
 Rouder, J. N., Engelhardt, C. R., McCabe, S., & Morey, R. D. (2016). Model comparison in ANOVA. Psychonomic Bulletin & Review, 23(6), 1779–1786. doi: 10.3758/s13423-016-1026-5
 RStudio Team. (2015). RStudio: Integrated development environment for R (Version 0.99.441). Boston: RStudio, Inc.
 Shanks, D. R. (2016). Regressive research: The pitfalls of post hoc data selection in the study of unconscious mental processes. Psychonomic Bulletin & Review, in press.
 Singmann, H., Bolker, B., Westfall, J., & Aust, F. (2016). afex: Analysis of factorial experiments. R package version 0.16-1. Retrieved from http://CRAN.R-project.org/package=afex
 Sklar, A. Y., Levy, N., Goldstein, A., Mandel, R., Maril, A., & Hassin, R. R. (2012). Reading and doing arithmetic nonconsciously. Proceedings of the National Academy of Sciences, 109(48), 19614–19619. doi: 10.1073/pnas.1211645109
 Soto, D., & Silvanto, J. (2014). Reappraising the relationship between working memory and conscious awareness. Trends in Cognitive Sciences, 18(10), 520–525. doi: 10.1016/j.tics.2014.06.005
 Stein, T., & Sterzer, P. (2014). Unconscious processing under interocular suppression: Getting the right measure. Frontiers in Psychology, 5, 387. doi: 10.3389/fpsyg.2014.00387
 Stein, T., Kaiser, D., & Peelen, M. V. (2015). Interobject grouping facilitates visual awareness. Journal of Vision, 15(8), 10. doi: 10.1167/15.8.10
 Sterzer, P., Stein, T., Ludwig, K., Rothkirch, M., & Hesselmann, G. (2014). Neural processing of visual information under interocular suppression: A critical review. Frontiers in Psychology, 5, 453. doi: 10.3389/fpsyg.2014.00453
 Tsuchiya, N., & Koch, C. (2005). Continuous flash suppression reduces negative afterimages. Nature Neuroscience, 8(8), 1096–1101. doi: 10.1038/nn1500
 Tsuchiya, N., Koch, C., Gilroy, L. A., & Blake, R. (2006). Depth of interocular suppression associated with continuous flash suppression, flash suppression, and binocular rivalry. Journal of Vision, 6(10), 1068–1078. doi: 10.1167/6.10.6
 Van Opstal, F., Gevers, W., De Moor, W., & Verguts, T. (2008). Dissecting the symbolic distance effect: Comparison and priming effects in numerical and nonnumerical orders. Psychonomic Bulletin & Review, 15(2), 419–425.
 Wolsiefer, K., Westfall, J., & Judd, C. M. (2016). Modeling stimulus variation in three common implicit attitude tasks. Behavior Research Methods.
 Yang, E., Brascamp, J., Kang, M.-S., & Blake, R. (2014). On the use of continuous flash suppression for the study of visual processing outside of awareness. Frontiers in Psychology, 5, 724. doi: 10.3389/fpsyg.2014.00724
 Yuval-Greenberg, S., & Heeger, D. J. (2013). Continuous flash suppression modulates cortical activity in early visual cortex. The Journal of Neuroscience, 33(23), 9635–9643. doi: 10.1523/JNEUROSCI.4612-12.2013