The idea that the human mind makes use of two distinct kinds of processing enjoys a long tradition, one that can be traced back at least two millennia. In 350 B.C.E., Aristotle claimed that the mind is composed of two separate systems, one supporting intuition and another supporting reasoning. Over the last few decades, experimental psychology has seen a wide proliferation of theories and models that mirror this age-old view. These dual-process accounts have been extensively reviewed elsewhere (Evans, 2008; Keren & Schul, 2010). Often, these accounts propose the tandem operation of one process that is relatively fast, nonconscious, automatic, and coarse and a separate process that is relatively slow, conscious, deliberate, and fine-grained (Evans, 2008).
In the domain of social cognition, for example, it is theorized that people initially evaluate others in a nonconscious, automatic fashion. This rapid evaluation may be subsequently modified by a more conscious and deliberate assessment, which takes more time. For instance, an individual’s prejudice may lead to a rapid negative reaction to another person, but this may be controlled by a more deliberate motivation to be nonprejudicial (Devine, 1989). In memory research, dual-process accounts argue that an item’s recognition is the result of two separate processes, familiarity and recollection. Familiarity is relatively fast, involving a coarse assessment of whether an item has been previously encountered and lacking access to specific details. Recollection, on the other hand, is relatively slow, involving a more fine-grained assessment and explicit retrieval of an item’s details (Atkinson & Juola, 1973; Jacoby, 1991). In language research, some accounts, such as the unrestricted-race account, propose that an initial syntactic structure is selected on the basis of evidence accumulated during sentence processing. When syntax is substantially ambiguous, this account posits that a separate, slower reanalysis process may then intervene on the initial, quick commitment to a syntactic structure if it turns out to be inappropriate (van Gompel, Pickering, Pearson, & Liversedge, 2005; van Gompel, Pickering, & Traxler, 2001).
Common to all of these accounts is the presence of two processes that work on different temporal scales. A consequence of this is the prediction that a participant’s set of responses, in certain conditions, is being drawn from two separate populations. This is because, during some trials within an experimental condition, the second, slow process may be in agreement with the first, quick process, but on other trials, the two processes may be in disagreement. For example, according to an unrestricted-race account, on some trials in a sentence processing experiment an initially selected syntactic structure will turn out to be inappropriate and therefore need to be corrected by subsequent reanalysis. On other trials in the same condition, the initially selected structure will in fact be correct and need no intervention (van Gompel et al., 2005). As such, the unrestricted-race account predicts that a participant’s response distribution derives from two separate trial populations, one involving zero intervention and a second involving intervention. Thus, this account, like many others hypothesizing two independent processes, predicts that the distributions of behavioral measures based on these responses will exhibit bimodality.
Whereas dual-process accounts predict bimodality in certain response distributions, single-process accounts predict unimodal distributions and therefore seek to show that no bimodality is present. For example, constraint-based accounts of sentence processing argue that the selection of a syntactic structure is accomplished by a single process involving dynamic competition, rather than two independent processes involving reanalysis. Thus, constraint-based accounts predict that certain response distributions will be unimodal. In particular, when syntax is ambiguous, these accounts predict that trials will reveal a single, continuous range of competition between the possible interpretations of the ambiguity, thereby giving rise to a unimodal distribution over behavioral responses (Farmer, Anderson, & Spivey, 2007). A similar difference in predictions occurs in social categorization research. When categorizing the sex of a sex-atypical male face (e.g., one containing slight feminine features), for example, discrete stage-based approaches to social categorization predict a bimodal response distribution. This is because these approaches assume that an initial categorization is made on the basis of coarse perceptual features (e.g., female), which sometimes may need to be intervened on by a more fine-grained reanalysis (e.g., male) if the initial categorization turns out to be incorrect. At other times, the initial categorization would be correct and need no intervention. Thus, on some trials there is intervention, and on other trials there is none, forming a bimodal distribution. A dynamic interactive approach, on the other hand, predicts a unimodal distribution. This is because it assumes that such sex-atypical faces will always trigger the same single process involving dynamic competition between sex categories (Freeman & Ambady, 2011a), and that the competition among the interpretations available to this system will give rise to a normal distribution over the strength of the competition (Freeman, Ambady, Rule, & Johnson, 2008).
Examining a response distribution’s characteristics in order to distinguish between competing theoretical accounts has a long history with reaction time measurements (Ratcliff, 1979). Recently, distributional analyses have become increasingly important with the advent of more continuous, temporally fine-grained measures that index participants’ tentative commitments to various response alternatives during online processing. For example, in studies recording hand movement trajectories (via a computer mouse, wireless remote, or electromagnetic position tracker), analysis of a response distribution’s modality has been crucial in distinguishing between accounts of categorization (Dale, Kehoe, & Spivey, 2007; Freeman & Ambady, 2009, 2011b; Freeman et al., 2008; Freeman, Pauker, Apfelbaum, & Ambady, 2010), language processing (Dale & Duran, 2011; Farmer et al., 2007; Spivey, Grosjean, & Knoblich, 2005), decision making (McKinstry, Dale, & Spivey, 2008), learning (Dale, Roche, Snyder, & McCall, 2008), visual search (Song & Nakayama, 2008), and attentional control (Song & Nakayama, 2006). We take this particular methodology as one especially ripe for investigating measures of bimodality and for distinguishing between single-process and dual-process phenomena.
For example, in one series of studies, participants categorized faces’ sex by moving the computer mouse from the bottom center of the screen to either the top-left or top-right corner, which were marked “male” and “female” (Freeman et al., 2008). When categorizing sex-atypical faces, participants’ mean mouse trajectories showed a continuous attraction to the opposite sex-category response (on the opposite side of the screen), relative to sex-typical faces. This mean continuous-attraction effect could reflect either a single-process phenomenon involving dynamic competition or, alternatively, a dual-process phenomenon involving an initial analysis and subsequent reanalysis. If the effect has a unimodal distribution, in which some trials involve strong attraction, some medium attraction, and some weak attraction, it would suggest that sex-atypical faces triggered dynamic competition between parallel, partially active sex categories. However, if the effect has a bimodal distribution, in which some trials involve zero attraction and others involve extremely strong attraction, it would suggest that categorization of sex-atypical faces involved dual processes that sometimes agreed and sometimes conflicted. On some trials, an initial perceptual analysis and subsequent fine-grained reanalysis would agree, resulting in zero attraction (a discrete movement straight to the correct category). On other trials, the initial analysis (e.g., female) would turn out to be incorrect and require intervention from later reanalysis (e.g., male), resulting in extremely strong attraction (an initial discrete movement to the incorrect category, which would have to be redirected midflight by a corrective movement straight to the correct category; Freeman et al., 2008).
Thus, in accounts hypothesizing two independent processes, sometimes the two processes can agree with each other. For the sake of presentation here, we will call this kind of response “Mode 1.” At other times the two processes can be in conflict, and we will call this “Mode 2.” A schematic illustration appears in Fig. 1, which shows how a dual process introduces bimodal features using the two-choice mouse-tracking paradigm (although it applies to any behavioral measure—e.g., reaction times). In this paradigm, typically a stimulus is presented and participants move the mouse from the bottom center of the screen to the top-left or top-right corner (Freeman & Ambady, 2010; Spivey et al., 2005). In this figure, the top-left corner represents the correct response and the top-right corner represents the incorrect response. Each panel is a depiction of one experimental condition. The top panel shows one unimodal population of trajectories that all show an attraction toward the incorrect response (sometimes strong, sometimes medium, sometimes weak), which is often interpreted by single-process accounts as dynamic competition. The lower panels show bimodal populations of trajectories, in which dynamic competition is not present. Instead, some proportion (1 – p; Mode 1) of trials involve a discrete movement toward the correct category, and the rest of the trials (p; Mode 2) involve an initial discrete movement toward the incorrect category, which is then redirected in midflight by a discrete movement to the correct category. The middle panel depicts a population of trajectories with a recognizable amount of separation between Mode 1 and Mode 2 responding, whereas the bottom panel depicts a population with more extreme separation. Both panels depict a pattern of results consistent with dual-process accounts, where Mode 1 responses occur when an earlier and a later process agree and Mode 2 responses occur when the two processes conflict. 
Importantly, when all trajectories in each panel are averaged together into a mean trajectory for the experimental condition, the three mean trajectories would look quite similar, resembling something like the top panel’s trajectory (see Freeman et al., 2008, Study 3). This highlights the importance of examining distributional characteristics, as the underlying pattern of responses may be quite different, although the mean effects look virtually the same.
As is shown in Fig. 1, a response distribution’s shape would be strongly affected by single versus dual modes of responding, with dual modes introducing bimodal features. Specifically, two parameters are likely to affect the distributional shape. One of these parameters is the distance between mean responses in Mode 1 and Mode 2. In a mouse-tracking paradigm, this could be the difference in the trajectories’ deviations toward the incorrect response between Mode 1 and Mode 2 responses. As Mode 2 responses become more extreme, the distance increases between the two peaks of the bimodal distribution, as illustrated in Fig. 1. This is not limited to a mouse-tracking paradigm; for example, this distance could refer to a difference in reaction times (e.g., Atkinson & Juola, 1973; Ratcliff, 1979). The other parameter is the proportion of responses in Mode 2. If the likelihood of a Mode 2 response is 25 % (as in Fig. 1), it is easy to superficially observe bimodality in the response distribution. However, if the likelihood of a Mode 2 response is only 5 %, for example, observing bimodality is likely to be substantially more difficult, because the Mode 2 population could be obscured by the considerably larger Mode 1 population, thereby feigning unimodality.
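The role of these two parameters can be made concrete with a small simulation. The Python sketch below is illustrative only: the function names `simulate_responses` and `share_near`, the Gaussian shape of each mode, the shared standard deviation, and the distance of 6 units are all assumptions introduced here, not values taken from the studies cited.

```python
import random
import statistics


def simulate_responses(n_trials, p_mode2, distance, sd=1.0, seed=0):
    """Draw responses from a two-mode mixture.

    Mode 1 is centered at 0 and Mode 2 at `distance`; each trial falls in
    Mode 2 with probability `p_mode2`. Gaussian modes with a shared `sd`
    are an illustrative assumption, not something the text specifies.
    """
    rng = random.Random(seed)
    responses = []
    for _ in range(n_trials):
        center = distance if rng.random() < p_mode2 else 0.0
        responses.append(rng.gauss(center, sd))
    return responses


def share_near(responses, center, half_width=2.0):
    """Proportion of responses within `half_width` of `center`."""
    hits = sum(abs(r - center) < half_width for r in responses)
    return hits / len(responses)


# Same distance between modes, different Mode 2 proportions.
easy = simulate_responses(2000, p_mode2=0.25, distance=6.0)
hard = simulate_responses(2000, p_mode2=0.05, distance=6.0)

for label, data in (("25% Mode 2", easy), ("5% Mode 2", hard)):
    print(label,
          "mean:", round(statistics.mean(data), 2),
          "share near Mode 2:", round(share_near(data, 6.0), 3))
```

Under these assumptions, the condition with the smaller Mode 2 proportion leaves only a thin cluster of responses around the second peak, which is the situation in which a bimodal distribution can pass for a unimodal one.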
Distinguishing between unimodality and bimodality
Researchers have used several measures to distinguish between unimodality and bimodality, including the bimodality coefficient (BC; SAS Institute, 1989), Hartigan’s dip statistic (HDS; Hartigan & Hartigan, 1985), and the difference in Akaike’s information criterion (AIC; Akaike, 1974) between one-component and two-component Gaussian mixture distribution models (McLachlan & Peel, 2000). An extensive discussion of these measures is beyond the scope of this article, but we provide a brief description of each. In the present work, we focus on utilizing these measures “out of the box”—that is, on how a researcher’s estimation of bimodality may be done with readily available scripts and other sources. The measures that we employ have this property of accessibility and ease of application (see the Appendix for our code).
The BC is based on an empirical relationship between bimodality and the third and fourth statistical moments of a distribution (skewness and kurtosis). It is proportional to the ratio of squared skewness plus one to uncorrected kurtosis, $\mathrm{BC} \propto (s^{2} + 1)/k$, where $s$ is skewness and $k$ is kurtosis. The underlying logic is that a bimodal distribution will have very low kurtosis, an asymmetric character, or both; all of these conditions increase BC. The values range from 0 to 1, with those exceeding .555 (the value representing a uniform distribution) suggesting bimodality (SAS Institute, 1989).
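As a rough illustration of this relationship, the following Python sketch computes a simplified BC directly from the moments. It assumes population (uncorrected) moments and Pearson kurtosis, and it omits any finite-sample correction; the function name `bimodality_coefficient` is hypothetical and this is not the code from the Appendix.

```python
def bimodality_coefficient(data):
    """Simplified BC = (skewness**2 + 1) / kurtosis.

    Uses population moments and Pearson kurtosis (a normal distribution
    scores 3). Finite-sample corrections are deliberately left out; that
    simplification is an assumption of this sketch.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (skewness ** 2 + 1) / kurtosis


# An evenly spaced grid approximates a uniform distribution.
uniform_like = [i / 1000 for i in range(1001)]
# Two tight, well-separated clusters are an extreme bimodal case.
two_clusters = [-1.0] * 50 + [1.0] * 50

print(bimodality_coefficient(uniform_like))
print(bimodality_coefficient(two_clusters))
```

The uniform-like grid lands near the .555 benchmark (5/9 in the limit), whereas the two-cluster sample, with zero skewness and a kurtosis of 1, reaches the upper bound.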
The HDS is a statistic calculated by taking the maximum difference between the observed distribution of data and a uniform distribution that is chosen to minimize this maximum difference. The idea is that repeated sampling from the uniform (with the sample size of the original data) produces a sampling distribution over these differences; a bimodal (or n-modal) distribution is one in which the HDS is at or greater than the 95th percentile among all sampled values. In other words, as compared to the uniform distribution (which Hartigan & Hartigan, 1985, argued to be the best choice for testing unimodality), a multimodal distribution has statistically significant disparities in its distribution function. Thus, the HDS is given to null-hypothesis logic and is inferential; if p < .05, the distribution is considered to be bimodal or multimodal (Hartigan & Hartigan, 1985).
Finally, the AIC is a well-known information-theoretic goodness-of-fit measure for an estimated statistical model, with lower AIC values indicating better fit. To assess modality, one can fit the observed data using one-component (i.e., unimodal) and two-component (i.e., bimodal) Gaussian mixture distribution models to determine which of the two models minimizes AIC (McLachlan & Peel, 2000). If the one-component model minimizes AIC, the distribution is better described as unimodal; if the two-component model minimizes AIC, the distribution is better described as bimodal. Importantly, the AIC weighs the likelihood score of a model against the number of parameters used to construct the model. If the AIC for a bimodal mixture model is smaller than that of a unimodal model, it suggests that the goodness of fit exceeds the cost of having an additional component in the model.
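The comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure from the Appendix: the two-component model is fitted with a basic expectation-maximization loop initialized from the lower and upper halves of the sorted data, the parameter counts are taken to be 2 (mean, standard deviation) and 5 (two means, two standard deviations, one mixing weight), and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weights, mus, sds):
    total = 0.0
    for x in data:
        density = sum(w * normal_pdf(x, m, s)
                      for w, m, s in zip(weights, mus, sds))
        total += math.log(max(density, 1e-300))  # floor avoids log(0)
    return total


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def fit_one_component(data):
    """Closed-form maximum-likelihood Gaussian; returns log-likelihood."""
    n = len(data)
    mu = sum(data) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return log_likelihood(data, [1.0], [mu], [sd])


def fit_two_components(data, n_iter=200, min_sd=1e-3):
    """Basic EM for a two-component mixture; returns log-likelihood."""
    xs = sorted(data)
    n, half = len(xs), len(xs) // 2
    grand_mean = sum(xs) / n
    spread = max(math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n), min_sd)
    weights = [0.5, 0.5]
    mus = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    sds = [spread, spread]
    for _ in range(n_iter):
        # E-step: responsibility of the first component for each point.
        resp0 = []
        for x in xs:
            a = weights[0] * normal_pdf(x, mus[0], sds[0])
            b = weights[1] * normal_pdf(x, mus[1], sds[1])
            resp0.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: re-estimate weight, mean, and sd of each component.
        for k, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            nk = max(sum(resp), 1e-12)
            weights[k] = nk / n
            mus[k] = sum(r * x for r, x in zip(resp, xs)) / nk
            var = sum(r * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor prevents collapse
    return log_likelihood(xs, weights, mus, sds)


def preferred_components(data):
    """Return 1 or 2, whichever mixture size has the lower AIC."""
    aic_one = aic(fit_one_component(data), n_params=2)
    aic_two = aic(fit_two_components(data), n_params=5)
    return 1 if aic_one <= aic_two else 2


# Deterministic demo data: normal quantiles, and two shifted copies.
quantiles = [NormalDist().inv_cdf((i + 0.5) / 150) for i in range(150)]
unimodal = quantiles
bimodal = quantiles + [q + 6.0 for q in quantiles]
print(preferred_components(unimodal), preferred_components(bimodal))
```

Because EM can settle on a local optimum, a more careful analysis would typically restart the fit from several initializations; the single initialization here keeps the sketch short.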
As described earlier and exemplified by Fig. 1, the presence of a dual process can affect a response distribution by introducing bimodal features. The degree of bimodality would be influenced by two important factors, the distance in mean responses between Modes 1 and 2, and the proportion of responses in Mode 2 (vs. Mode 1). To examine how these factors affect the distributional shape and the detection of bimodality using BC, HDS, and AIC measures, we systematically manipulated them in a number of simulations. We also manipulated the degree of positive skewness in the response distribution. It is quite common for reaction time distributions to exhibit positive skewness, and indices of spatial attraction, curvature, or deviation in hand-trajectory data (see Freeman & Ambady, 2010) commonly exhibit positive skewness as well. Given that such distributions often feature positive skewness, it is important to understand how skew might influence the detection of bimodality.
The HDS measure initially proposed in Hartigan and Hartigan (1985) was meant to test the null hypothesis of unimodality against the alternative hypothesis of multimodality, with the null of an asymptotic uniform distribution. Though it is widely utilized in the bimodal context, the test was intended to explore departures from unimodality of a kind that may have more than two modes. Essentially, it tests the departure of an observed density function from a unimodal one (assumed to have a single inflection point between convex and concave segments). This means that HDS is relatively more robust to skew: Regardless of the location of the center of the observed function, HDS tests the observed function against the presence of a single inflection point. In both BC and AIC, high skew may significantly impact the test for the presence of more than one mode, so that both may sometimes find a spurious second mode in long-tailed, skewed distributions (a point originally made by Hartigan & Hartigan, 1985). Nevertheless, researchers have often used measures such as BC or AIC with distributions containing high skew.
Another important, yet often underappreciated, issue in judging the modality of distributions in psychological experiments is sample size. For example, in the original mouse-tracking study (Spivey et al., 2005), the authors noted that having too few trials within an individual subject posed problems for assessing bimodality on a per-subject basis; instead, they opted for assessing it at the group level. This has now become the norm in mouse-tracking research (see, e.g., Freeman & Ambady, 2010). How many trials, then, are too few? In general, the sampling errors of skewness and kurtosis are high at smaller sample sizes (10 or fewer), suggesting that the BC, which is computed from these parameters, may be unstable at smaller sample sizes. HDS is based on an empirical resampling from a uniform distribution, and thus may naturally correct for smaller sample sizes, as the simulated distribution will accommodate this. Examining the performance of all three measures (BC, AIC, and HDS) with unimodal and bimodal distributions of varying skew and sample size, among other factors that are known to drive modality, has therefore long been needed.
The present work
Given that the BC and AIC measures (and not the HDS measure) may be substantially biased by skew and sample size, we hypothesized that HDS may be an overall more robust measure for judging the modality of a distribution. As we have discussed, adjudicating between unimodality and bimodality has become an increasingly important issue in psychological experiments, with unimodal and bimodal outcomes often leading to opposite interpretations of cognitive dynamics (e.g., Dale & Duran, 2011; Farmer et al., 2007; Freeman et al., 2008). Here, we provide a comprehensive analysis of three modality measures with the aim of making a recommendation for measure selection in future research. To this end, we have taken a two-pronged approach. First, we systematically examined the measures’ performance in simulations in which the factors described above were tightly controlled and varied. Then, to validate the results and increase their generality, we examined the measures’ performance with previously published experimental data, which contained distributions theoretically known to be either unimodal or bimodal.