A critical question in the development of models of word processing concerns the extent to which orthographic and phonological information are automatically extracted from a stimulus during processing. A traditional view of automaticity holds that, with extensive practice, processing may occur with little or no attention. However, research indicates that a more complex definition of automaticity is needed. In their review of past work, Kahan, Hengen, and Mathis (2011) described criteria for characterizing automaticity, including processes that (a) occur quickly, (b) require no capacity, (c) occur without intention, (d) do not involve conscious awareness, (e) are initiated immediately upon the presentation of a stimulus, and (f) cannot be interrupted by other cognitive processes, once begun (see Ansari & Besner, 2005; Besner & Care, 2003; Besner & Risko, 2005; Neely & Kahan, 2001; Posner & Snyder, 1975). Thus, any examination of automaticity in the context of word processing needs to specify which criteria relevant to automaticity are being addressed. In Kahan et al.’s study and the present article, the focus was on two criteria: Automatic processes are capacity-free, and they are initiated upon stimulus presentation. However, unlike Kahan et al., who made comparisons between word and nonword stimuli in order to make claims about the automaticity of word processing per se, here we focused exclusively on whether prelexical processing occurs in a capacity-free manner upon stimulus presentation when the task requires participants to extract orthographic or phonological codes.

A number of previous studies (Ansari & Besner, 2005; Besner & Care, 2003; Besner & Risko, 2005; Kahan, Hengen, & Mathis, 2011) examining the automaticity of processes during word recognition have used the “task-set” procedure originated by Besner and Care. In this procedure, participants perform two separate tasks, where the particular task to be engaged is cued at the start of each trial. Trial-by-trial cueing is preferred in order to avoid the development of a “mental set” formed when participants are aware of the task in advance. Ansari and Besner (2005; see also Besner & Care, 2003; Besner & Risko, 2005) have found that such advance knowledge of the task may create an impression (inappropriately) that the processes involved are automatic. Conversely, changing the task on a trial-by-trial basis may eliminate results consistent with automatic processing. Thus, providing the task cue on a trial-by-trial basis offers the most conservative test of the automaticity of processes. In addition, the task cue may be presented simultaneously with or before the stimulus. Finally, the stimulus may be presented clearly or degraded.

The key to examining data from the task-set procedure is to determine whether the effect of stimulus clarity (i.e., faster response times to clear than to degraded stimuli) disappears when the cue is presented simultaneously with the stimulus. Elimination of the response time difference between clear and degraded stimuli would suggest capacity-free processing, because such perceptual processing would be occurring in parallel with other cognitive processes (e.g., cue interpretation). If the effect of stimulus clarity were to remain, this would suggest that processing of the dimension of the target that participants were trying to extract was delayed until the cue had been processed. Thus, a comparison of response times to clear versus degraded stimuli at simultaneous versus longer stimulus onset asynchronies (SOAs) indicates whether the processing of that aspect of the target occurs simultaneously with cue decoding. When processing does not occur simultaneously with interpreting the task cue, the relationship between clear and degraded stimuli will not change, and responses will be faster to clear than to degraded stimuli (as is illustrated in the top panel of Fig. 1). This is because the processing of the stimulus takes up resources and, as such, is added to the time to identify the task on the basis of the cue. On the other hand, when the processing of the target occurs simultaneously with interpreting the task cue, the effect of stimulus clarity gets “absorbed” into the time it takes to identify the task on the basis of the cue, and clear and degraded stimuli will have comparable response times (as is illustrated in the bottom panel of Fig. 1). In addition, if the processes involved in clarifying a degraded stimulus are the same irrespective of whether a person is working to extract orthographic or phonological codes from the degraded target, then the pattern of results should be the same across tasks. However, if the processes involved in clarifying a degraded target differ with the type of information that the person is trying to extract (i.e., letter- or sound-based codes), then the patterns of results may differ across tasks. In the present experiment, we examined whether prelexical codes are extracted in a capacity-free manner upon stimulus presentation, and whether this differs depending on the codes that participants are trying to extract and the modality of presentation.

Fig. 1 Hypothetical patterns of response time data to degraded and clear targets at a simultaneous (0-ms) and a longer (750-ms) stimulus onset asynchrony (SOA) if perceptual processing does not (top panel) or does (bottom panel) occur simultaneously with interpreting a task cue.
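To make this logic concrete, the sketch below generates the two hypothetical patterns of Fig. 1 from a simple stage-duration assumption; the durations (and the Python rendering itself) are illustrative values, not data or modeling from the experiment.

```python
# Illustrative sketch of the additive vs. "absorbed" (underadditive) patterns in Fig. 1.
# All durations are arbitrary example values, not data from the experiment.

CUE_TIME = 300      # hypothetical time to interpret the task cue (ms)
BASE_RT = 900       # hypothetical time for all remaining processes (ms)
CLARITY_COST = 80   # hypothetical extra time to clarify a degraded target (ms)

def predicted_rt(soa, degraded, parallel_with_cue):
    """Predicted RT measured from target onset.

    parallel_with_cue: if True, clarifying the target proceeds while the cue
    is being interpreted, so at a 0-ms SOA the clarity cost is "absorbed"
    into cue interpretation; if False, clarification waits for the cue.
    """
    clarity = CLARITY_COST if degraded else 0
    # Cue interpretation that has not finished by target onset adds to the RT.
    cue_remaining = max(CUE_TIME - soa, 0)
    if parallel_with_cue:
        return BASE_RT + max(cue_remaining, clarity)
    return BASE_RT + cue_remaining + clarity

for parallel in (False, True):
    print("parallel (absorbed)" if parallel else "serial (additive)")
    for soa in (0, 750):
        clear = predicted_rt(soa, degraded=False, parallel_with_cue=parallel)
        degraded = predicted_rt(soa, degraded=True, parallel_with_cue=parallel)
        print(f"  SOA {soa:>3} ms: clear {clear} ms, degraded {degraded} ms, "
              f"clarity effect {degraded - clear} ms")
```

In the serial case the clarity effect is constant across SOAs (additive); in the parallel case it is eliminated at the 0-ms SOA and reappears at the 750-ms SOA (underadditive).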

Previous work using the task-set procedure has provided a preliminary understanding of the automaticity of processes. Besner and Care (2003) examined whether accessing phonological information from nonwords is capacity-free and stimulus-driven. Participants completed either a task involving the accessing of orthographic and phonological information, in which they pronounced visually presented nonwords, or a task focused on visual features, in which they determined whether the nonword was presented in upper- or lowercase letters. Nonwords were presented either clearly or degraded. In the pronunciation task, Besner and Care found that the stimulus clarity effect was not absorbed into the cue-processing time, suggesting that prelexical processing was not accessed in a capacity-free manner. Similarly, with the case determination task, it appeared that feature-level processing was not capacity-free. In a related study, Paulitzki, Risko, O’Malley, Stolz, and Besner (2009) examined word stimuli and obtained some evidence of underadditivity in the reading-aloud task, and greater evidence of underadditivity in the feature-level task. Data from this latter study suggested that under some circumstances, the processing of prelexical codes can occur automatically and in conjunction with interpreting the task cue. As was suggested by Kahan et al. (2011), the differences in results between Besner and Care and Paulitzki et al. could be due to methodological differences between the studies: Whereas Besner and Care surrounded the visual target with a colored box, Paulitzki et al. provided an auditory cue to indicate the task to be performed. Paulitzki et al. changed to an auditory cue after noting that the visual system may have had difficulty processing the visual target and the visual task cue in the Besner and Care study. These two studies also differed in their use of nonword (Besner & Care, 2003) and word (Paulitzki et al., 2009) stimuli. This distinction is particularly important, because it raises questions about the potential role of top-down lexical connections in the extraction of phonological and orthographic codes from word stimuli. Although some researchers might argue that such connections are not available in the processing of nonwords (see, e.g., Kahan et al., 2011), the issue is far from settled in the literature. For example, Reynolds and Besner (2002), in a series of simulations, provided evidence of an interaction between the orthographic lexicon and the letter units that served to facilitate letter identification. They argued that such facilitation could ultimately have an effect on nonword processing.

To address these differences, Kahan et al. (2011) conducted two visual task-set experiments using both word (Exp. 1) and nonword (Exp. 2) stimuli with an auditorily presented task cue. Kahan et al. also adopted two tasks that were more fully focused on phonological and orthographic information. In their phonological task, participants determined whether the letter “a” in a visual stimulus was pronounced /e/ (as in “flame”) or /æ/ (as in “band”). In the orthographic task, participants determined whether the visual stimulus contained an “l” or an “n.” Their results showed that concurrent processing of the stimulus and cue occurred for word stimuli, but not for nonword stimuli, in both the phonological and orthographic tasks. These findings led Kahan et al. to argue that words do show some capacity-free processing, suggesting that phonological and orthographic information are accessed automatically during visual word recognition.

Although Kahan et al. (2011) were able to address capacity-free processing in visual word recognition, their results did not indicate whether such automatic processing might be observed for auditorily presented stimuli as well. Moreover, closer examination of the Kahan et al. procedures reveals a potential confound. In their phonological task, the spelling of the critical letter “a” was held constant while the pronunciation of the letter was varied (i.e., /e/ as in “flame” vs. /æ/ as in “band”). However, in their orthographic task, the pronunciation of the letter varied with the spelling (i.e., participants looked for “l” versus “n,” which are spelled and also pronounced differently). Thus, in their orthographic task any effects of orthography were confounded with phonology.

These issues are addressed in the present study, which extends the work of Kahan et al. (2011). Our participants engaged in an auditory or a visual multitasking experiment. In each experiment, they were instructed to perform either a phonological or an orthographic task on each trial. In the phonological task, participants determined whether the letter “a” in the stimulus was pronounced /e/ (as in “face”) or /æ/ (as in “cast”). In the orthographic task, they determined whether the sound /s/ was spelled with an “s” (as in “waste”) or with a “c” (as in “trace”). The particular task in which they engaged was cued on a trial-by-trial basis. In the auditory experiment, the cue was a red or a green square; in the visual experiment, the cue was a high or a low tone. The stimuli were modified from those used by Kahan et al., to ensure that the phonological task tested only phonological information and the orthographic task tested only orthographic information. The presented (auditory or visual) stimuli were clear or degraded. One question concerned whether the phonological and orthographic tasks would lead to different results across the two modalities. It is possible that in the auditory modality, automaticity of processing would be seen for the phonological task, but in the visual modality, automaticity of processing would be seen for the orthographic task. Such congruency between modality and linguistic information would not be unheard of in the literature. Tanenhaus, Flanigan, and Seidenberg (1980) referred to a potential influence of congruency when discussing the mechanisms that might have led to their finding that primes sharing phonological and orthographic information with subsequent targets interfered with the color naming of the target. They argued that one explanation for their cross-modal priming effect could be that it occurs after a word is recognized “through the code related to the modality in which the word is presented” (Tanenhaus et al., 1980, p. 519). In such a system, lexical access is achieved through the code congruent with the modality of presentation, and only after lexical access is the alternate code made available. Assuming such a mechanism in which congruent codes and modalities are processed first, the facilitation of responses may be seen in the modality consistent with the cued task (i.e., in the auditory modality for the phonological task, and in the visual modality in the orthographic task; see also Lafontaine, Chetail, Colin, Kolinsky, & Pattamadilok, 2012, for similar arguments).

A second question concerns whether the processing of prelexical codes will occur in parallel with interpreting the task cue, and whether this processing differs depending on the task demands and the modality of presentation. As in Kahan et al. (2011), if processing does occur automatically, then we should observe an underadditive pattern in which the stimulus clarity effect is reduced or eliminated in the simultaneous (0-ms) cue–target SOA condition.

Method

Participants

A group of 64 students from Bowdoin College participated in the study for partial credit toward a course research requirement or were entered into a raffle to win one of four $10 gift cards to a local gelato store. All participants were native English speakers with no known history of speech or hearing disorders. Half of the participants (32) were tested in the auditory session, and half (32) were tested in the visual session.

Materials

A total of 160 words that contained the letter “a” and the sound /s/ were selected for use in the study. Each word fit into one of four conditions, created by crossing two different pronunciations of the letter “a” (/e/ as in “brace” and /æ/ as in “dance”) and two different spellings of the sound /s/ (“c” as in “brace” and “s” as in “waste”), resulting in 40 words per condition. A complete list of the 160 target words grouped into the four conditions is presented in the Appendix. In addition to the 160 experimental stimuli, eight practice stimuli were selected, such that two practice stimuli were consistent with each of the four conditions being tested. The same stimulus words were used in the auditory and visual sessions.

Auditory stimuli were digitally recorded with 16-bit resolution at a sampling rate of 44,100 samples per second in stereo using Cool Edit Pro Version 2 software. The stimuli were recorded by a female speaker in lists of five words each, digitally edited, saved in separate waveform files, and equalized on average root-mean-square (RMS) intensity. A waveform file of brown noise was also generated and then mixed with each auditory waveform file to create degraded versions of each auditory stimulus.
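The equalization and degradation steps can be approximated as in the following sketch, which assumes single-channel NumPy arrays, a random-walk approximation of brown noise, and a hypothetical signal-to-noise level; it is not the Cool Edit Pro pipeline the authors actually used.

```python
import numpy as np

SR = 44100  # sampling rate of the recordings (samples per second)

def rms(x):
    """Root-mean-square amplitude of a waveform."""
    return np.sqrt(np.mean(np.square(x)))

def equalize_rms(signal, target_rms):
    """Scale a waveform so its average RMS intensity matches target_rms."""
    return signal * (target_rms / rms(signal))

def brown_noise(n_samples, rng=None):
    """Approximate brown (Brownian) noise as a cumulative sum of white noise."""
    rng = np.random.default_rng() if rng is None else rng
    noise = np.cumsum(rng.standard_normal(n_samples))
    noise -= noise.mean()                  # remove DC drift
    return noise / np.max(np.abs(noise))   # normalize to [-1, 1]

def degrade(signal, snr_db=0.0, rng=None):
    """Mix a word waveform with brown noise at a chosen signal-to-noise ratio.

    The SNR value is a hypothetical parameter; the paper does not report one.
    """
    noise = brown_noise(len(signal), rng)
    noise = equalize_rms(noise, rms(signal) / (10 ** (snr_db / 20)))
    mixed = signal + noise
    return mixed / np.max(np.abs(mixed))   # rescale to prevent clipping

# Hypothetical usage: equalize the recorded words, then build degraded versions.
# words = {name: load_waveform(name) for name in word_list}   # loader not shown
# target = np.mean([rms(w) for w in words.values()])
# clear_versions = {name: equalize_rms(w, target) for name, w in words.items()}
# degraded_versions = {name: degrade(w) for name, w in clear_versions.items()}
```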

The presentation of cues and stimuli, timing of the experiment, collection of responses, and measurement of response times were controlled by E-Prime software (Schneider, Eschman, & Zuccolotto, 2002) running on Dell Optiplex 755 personal computers. In the auditory sessions, participants heard stimuli over Shure SRH 440 Professional studio headphones.

Procedure

Participants read instructions indicating that they would be performing one of two tasks—either a phonological or an orthographic task—on each trial. The particular task to be performed was indicated by a cue at the beginning of the trial. In the auditory sessions, the task cue was a red or a green square, subtending 1.15 deg of visual angle horizontally and vertically, presented in the center of the computer monitor. In the visual sessions, the task cue was a high (1000-Hz) or a low (500-Hz) tone presented for 750 ms. For the phonological task, participants needed to extract sound codes from the words to determine whether the letter “a” in the word was pronounced /e/ or /æ/. For the orthographic task, participants needed to extract letters from the words to determine whether the sound /s/ in the word was spelled with an “s” or a “c.”

In the auditory sessions, stimuli were presented over headphones at approximately 80 dB; in the visual sessions, the target appeared in 14-point Courier New font inside a centrally presented black rectangle that subtended 8.00 deg horizontally and 2.58 deg vertically.

For a particular participant, half of the 160 presented words required a phonological decision and half required an orthographic decision. In the phonological task, half of the words required an /e/ response and half required an /æ/ response. Similarly, in the orthographic task, half of the words required an “s” response and half required a “c” response. Moreover, half of the stimuli in each condition were presented clearly (i.e., auditory words without noise and visual words presented white on a black background; luminance values of 255, 255, 255), and half were degraded (i.e., auditory words mixed with brown noise and visual words presented in dark gray on a black background; luminance values of 15, 15, 15). Finally, on each trial the auditory or visual task cue appeared either simultaneously with the target or 750 ms before the target. The particular condition presented on a trial was fully randomized. The factorial combination of task (phonological or orthographic), stimulus clarity (clear or degraded), and cue–target SOA (0 or 750 ms) resulted in eight critical conditions. Twenty trials were presented in each of the critical conditions. In order to counterbalance each of the relevant variables along with the mapping of response keys to the different responses, 16 lists were generated, such that every target word appeared in every critical condition across lists. Participants were randomly assigned to a list upon visiting the laboratory. Thus, although each participant saw a given target word (e.g., “brace”) only once, across participants, each target word was presented in each of the relevant conditions. As a result, participants responded that the “a” in “brace” was pronounced /e/ in the phonological task and that the /s/ in “brace” was spelled with a “c,” whether “brace” was presented clearly or degraded by auditory or visual noise, and whether the cue appeared simultaneously with “brace” (0-ms cue–target SOA) or 750 ms before it (750-ms cue–target SOA).
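The counterbalancing constraints described above can be illustrated by rotating each word through the eight critical cells and crossing the rotations with two response-key mappings; the exact list-construction algorithm is not reported, so the sketch below is only one way to realize the stated constraints, and its function and variable names are hypothetical.

```python
from itertools import product

# The eight critical cells: task x stimulus clarity x cue-target SOA.
CONDITIONS = list(product(["phonological", "orthographic"],
                          ["clear", "degraded"],
                          [0, 750]))  # 8 cells

def build_lists(words, n_key_mappings=2):
    """Rotate each word through the eight critical cells across lists, then
    cross the rotations with response-key mappings (8 x 2 = 16 lists).

    `words` is the pool of 160 target words; the rotation scheme and the two
    key mappings are illustrative assumptions, not the authors' algorithm.
    """
    n_cells = len(CONDITIONS)  # 8
    rotations = []
    for rotation in range(n_cells):
        trials = []
        for i, word in enumerate(words):
            task, clarity, soa = CONDITIONS[(i + rotation) % n_cells]
            trials.append({"word": word, "task": task,
                           "clarity": clarity, "soa": soa})
        rotations.append(trials)
    # With 160 words and 8 cells, each list has 20 trials per critical cell,
    # and across the 8 rotations every word appears once in every cell.
    return [(trials, mapping)
            for trials in rotations
            for mapping in range(n_key_mappings)]

# Hypothetical usage:
# lists = build_lists(word_pool)   # word_pool holds the 160 target words
# assert len(lists) == 16
```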

The sequence of events for a particular trial is presented pictorially in Fig. 2. At the start of each trial, participants saw a blank screen for 1,500 ms. Following this intertrial interval (ITI), participants (a) waited for 750 ms before getting the cue simultaneously with the target (0-ms cue–target SOA) or (b) were presented with the cue for 250 ms, followed by a 500-ms blank screen before getting the target (750-ms cue–target SOA). In the auditory sessions, the cue was a red or a green square centrally presented on the screen. In the visual sessions, a high or low tone was played over the headphones to indicate which task the participants should perform. For half of the participants, a red square (high tone) cued the phonological task, and a green square (low tone) cued the orthographic task. For the other half of the participants, the opposite assignment of cues to tasks was made. Once the target was presented, participants determined whether the “a” in the word was pronounced /e/ or /æ/ (on phonological-task trials) or whether the /s/ in the word was spelled with a “c” or an “s” (on orthographic-task trials). Participants responded by pressing the “Q,” “W,” “8,” or “9” keys on the computer keyboard to indicate their responses. The mapping of computer key to responses—/e/, /æ/, “s,” and “c”—was counterbalanced across participants. Response times were measured from the onset of the auditory or visual target until the participant’s keypress response. In the visual sessions, the target remained visible until the participant responded. Response times and errors were recorded by the computer. Participants completed eight practice trials and 160 experimental trials.

Fig. 2 Sequence of events for a single trial in the auditory modality with (a) a cue–target stimulus onset asynchrony (SOA) of 0 ms or (b) a cue–target SOA of 750 ms, and in the visual modality with (c) a cue–target SOA of 0 ms or (d) a cue–target SOA of 750 ms. ITI, intertrial interval
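As a schematic of the event sequence in Fig. 2, the sketch below lays out an auditory-session trial for each SOA condition, following the durations given above for the colored-square cue (per the Method, the tone cue in the visual sessions played for 750 ms); it is a plain-Python outline with illustrative event labels, not the E-Prime implementation.

```python
def auditory_session_trial(soa_ms):
    """Ordered (event, duration_ms) schedule for one auditory-session trial.

    soa_ms is the cue-target SOA (0 or 750); None means "until response".
    The cue is a red or green square on the monitor; the target is a spoken
    word (clear, or mixed with brown noise) played over headphones.
    """
    events = [("blank screen (ITI)", 1500)]
    if soa_ms == 0:
        events += [("blank screen", 750),
                   ("square cue + spoken target (simultaneous)", None)]
    else:  # 750-ms cue-target SOA
        events += [("square cue", 250),
                   ("blank screen", 500),
                   ("spoken target", None)]
    events.append(("collect keypress: Q / W / 8 / 9", None))
    return events

for soa in (0, 750):
    print(f"--- {soa}-ms cue-target SOA ---")
    for event, duration in auditory_session_trial(soa):
        label = f"{duration} ms" if duration is not None else "until response"
        print(f"  {event}: {label}")
```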

Results

Following Kahan et al. (2011), trials that immediately followed an error and had a response time (RT) greater than 4,500 ms (see Footnote 1) were eliminated from the analyses, resulting in the elimination of 0.05 % of the trials. Posterror slowing of this sort has been documented numerous times elsewhere (see Carter & van Veen, 2007; Gehring, Goss, Coles, Meyer, & Donchin, 1993; Kleiter & Schwarzenbacher, 1989; Notebaert et al., 2009; Rabbitt, 1966). In addition, the mean RT and standard deviation for each participant in each condition were calculated, and RTs that were more than 2.5 standard deviations away from the participant’s mean in each condition were discarded. This resulted in the removal of less than 2 % of the trials per participant. Mean RTs for trials on which participants responded correctly were computed and submitted to two separate 2 (modality: auditory or visual) × 2 (task: phonological or orthographic) × 2 (cue–target SOA: 0 or 750 ms) × 2 (stimulus clarity: clear or degraded) repeated measures analyses of variance (ANOVAs) treating subjects (F1) and items (F2) as random factors (see Footnote 2). The ANOVA by subjects (F1) had a mixed design with task, cue–target SOA, and stimulus clarity as within-subjects variables and modality as a between-subjects variable. The ANOVA by items (F2) was a fully within-subjects repeated measures design. Mean response times and error rates are provided in Table 1.
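The trial-exclusion steps can be written out as in the following sketch, which assumes a long-format trial table with hypothetical column names (subject, condition, rt, error) rather than the authors' actual analysis code.

```python
import pandas as pd

def trim_trials(trials: pd.DataFrame) -> pd.DataFrame:
    """Apply the two exclusion steps described in the Results.

    Assumes one row per trial, in presentation order within each subject,
    with columns 'subject', 'condition', 'rt' (ms), and 'error' (bool).
    """
    df = trials.copy()

    # 1. Drop trials that immediately follow an error and have RT > 4,500 ms
    #    (post-error slowing).
    follows_error = df.groupby("subject")["error"].shift(fill_value=False)
    df = df[~(follows_error & (df["rt"] > 4500))]

    # 2. Within each subject x condition cell, drop RTs more than 2.5 SDs
    #    from that cell's mean.
    cell = df.groupby(["subject", "condition"])["rt"]
    z = (df["rt"] - cell.transform("mean")) / cell.transform("std")
    df = df[z.abs() <= 2.5]

    # Only correct trials are carried forward to the mean-RT analyses.
    return df[~df["error"]]
```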

Table 1 Mean response times (RTs, in milliseconds) and error rates to clear and degraded targets in phonological and orthographic tasks across two cue–target stimulus onset asynchronies (SOAs) in the auditory and visual modalities

The analysis yielded a main effect of stimulus clarity in both the subject and item analyses [F1(1, 62) = 8.26, MSE = 62,246, p = .006; F2(1, 157) = 8.21, MSE = 189,133, p = .005]. RTs to clear stimuli averaged 1,720 ms, and those to degraded stimuli averaged 1,784 ms. A main effect of cue–target SOA was also found in the subject and item analyses [F1(1, 62) = 143.58, MSE = 57,500, p = .0001; F2(1, 157) = 140.68, MSE = 287,728, p = .0001], such that the average RT in the simultaneous (0-ms) condition was 1,879 ms, whereas the average RT in the delayed (750-ms) condition was 1,625 ms. An interaction of cue–target SOA and modality (auditory or visual) was also found in both analyses [F1(1, 62) = 23.38, MSE = 57,500, p = .0001; F2(1, 157) = 31.554, MSE = 216,529, p = .0001]. An interaction of stimulus clarity and cue–target SOA was found in the subject analysis [F1(1, 62) = 3.96, MSE = 37,968, p = .051], but not in the item analysis [F2(1, 157) = 2.75, MSE = 298,876, p = .099]. Most importantly, as is illustrated in Fig. 3, an interaction was found between stimulus clarity, cue–target SOA, and task. Although this effect was only marginally significant in the subject analysis [F1(1, 62) = 3.544, MSE = 49,964, p = .064], it was robust in the item analysis [F2(1, 157) = 4.726, MSE = 222,912, p = .031]. Because effects of stimulus modality did not moderate this three-way interaction (i.e., the pattern depicted in Fig. 3 held true for both the visual and auditory modalities), subsequent analyses were collapsed across modality.

Fig. 3 Response times and error rates to clear and degraded targets at a simultaneous (0-ms) and a longer (750-ms) stimulus onset asynchrony (SOA) for a phonological task (top graphs) and an orthographic task (bottom graphs) for auditory targets, for visual targets, and collapsed across the auditory and visual modalities. Error bars represent one standard error of the mean.

To better understand the Stimulus Clarity × Cue–Target SOA × Task interaction, follow-up tests were conducted examining cue–target SOA and stimulus clarity for the phonological and orthographic tasks separately. A 2 (stimulus clarity) × 2 (cue–target SOA) ANOVA on the data from the phonological task revealed a main effect of cue–target SOA in both the subject and item analyses [F1(1, 63) = 38.29, MSE = 88,317, p = .0001; F2(1, 158) = 76.423, MSE = 111,846, p = .0001]. Similarly, an effect of stimulus clarity was obtained in both the subject and item analyses [F1(1, 63) = 11.348, MSE = 33,647, p = .001; F2(1, 158) = 3.97, MSE = 129,354, p = .048]. Critically, the interaction of cue–target SOA and stimulus clarity was not significant for the phonological task [F1(1, 63) = .013, MSE = 43,262, p = .910; F2(1, 158) = .047, MSE = 116,107, p = .828] (see the top portion of Fig. 3). For the orthographic task, 2 (stimulus clarity) × 2 (cue–target SOA) ANOVAs by subjects and items revealed a main effect of cue–target SOA [F1(1, 63) = 92.330, MSE = 53,599, p = .0001; F2(1, 158) = 81.610, MSE = 146,510, p = .0001]. Unlike in the phonological task, the Stimulus Clarity × Cue–Target SOA interaction was significant for the orthographic task [F1(1, 63) = 7.458, MSE = 43,798, p = .008; F2(1, 158) = 5.872, MSE = 145,430, p = .017] (see the bottom portion of Fig. 3).
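For reference, a follow-up 2 (stimulus clarity) × 2 (cue–target SOA) subjects ANOVA of this kind could be computed as in the sketch below, using statsmodels' AnovaRM on per-subject condition means; the column names are hypothetical, and the software the authors actually used is not specified here.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def clarity_by_soa_anova(trials: pd.DataFrame, task: str):
    """2 (clarity) x 2 (SOA) repeated measures ANOVA by subjects for one task.

    Assumes correct-trial rows with hypothetical columns 'subject', 'task',
    'clarity' ('clear'/'degraded'), 'soa' (0/750), and 'rt' (ms).
    """
    cell_means = (trials[trials["task"] == task]
                  .groupby(["subject", "clarity", "soa"], as_index=False)["rt"]
                  .mean())
    return AnovaRM(cell_means, depvar="rt", subject="subject",
                   within=["clarity", "soa"]).fit()

# Hypothetical usage, given a data frame `correct_trials`:
# print(clarity_by_soa_anova(correct_trials, "phonological"))
# print(clarity_by_soa_anova(correct_trials, "orthographic"))
```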

Two 2 (modality: auditory or visual) × 2 (task: phonological or orthographic) × 2 (cue–target SOA: 0 or 750 ms) × 2 (stimulus clarity: clear or degraded) ANOVAs by subjects and items on the accuracy data revealed a main effect of stimulus clarity [F1(1, 62) = 9.682, MSE = .004, p = .003; F2(1, 159) = 7.757, MSE = .024, p = .006]: Responses in the clear condition (.90) were more accurate than those in the degraded condition (.88). In addition, a main effect of modality was obtained in the item analysis only [F1(1, 62) = 0.513, MSE = .053, p = .475; F2(1, 159) = 4.281, MSE = .032, p = .040]. The interaction of stimulus clarity with modality was significant in both analyses [F1(1, 62) = 10.593, MSE = .004, p = .002; F2(1, 159) = 8.154, MSE = .025, p = .005]. For the auditory modality, accuracy was higher when responding to clear stimuli (.90) than to degraded stimuli (.86), but such a difference in accuracy across stimulus clarity was not obtained for the visual modality (.89). This pattern can be seen in Fig. 3.

Discussion

On each trial, participants performed either a phonological task (i.e., determining whether the letter “a” in a word was pronounced /e/ or /æ/) or an orthographic task (i.e., determining whether the sound /s/ in a word was spelled with an “s” or a “c”). The cue indicating the task to be performed was presented simultaneously with or 750 ms before the target. The target was presented either clearly or degraded. Not surprisingly, participants responded faster when the cue was presented before the target rather than simultaneously with it, and when the target was presented clearly rather than degraded.

Although we predicted that prelexical processing might occur in parallel with interpreting the task cue in the visual modality when the task involved the extraction of orthographic codes, and that such parallel processing might occur in the auditory modality when the task involved the extraction of phonological codes—and as such, that an underadditive pattern might be found only in each of these respective situations—the present findings did not support this position. Instead, the patterns of results did not differ for the auditory and visual modalities.

Most importantly, participants responded differently in the phonological and orthographic tasks when stimulus clarity and cue–target SOA were taken into account. Specifically, for the orthographic task, RTs revealed a stimulus clarity effect (i.e., faster RTs to clear than to degraded targets) at the 750-ms SOA, but not at the 0-ms SOA. This interaction of stimulus clarity and SOA is consistent with the findings obtained in earlier work examining the processing of words. Specifically, Paulitzki et al. (2009) found that under some circumstances the processing of their stimuli occurred automatically and in conjunction with interpreting the task cue. Kahan et al. (2011) similarly found that concurrent processing of the stimulus and cue occurred for word stimuli. Unlike in the present study, however, Kahan et al. found an interaction between stimulus clarity and cue–target SOA for both phonological and orthographic tasks. Although in the present study we found underadditive effects for the orthographic task with word stimuli, it should be noted that Besner and Care (2003) found additive effects for the pronunciation of nonword stimuli. This difference in effects may well be due to specific methodological differences between the present study and the study conducted by Besner and Care. Specifically, Besner and Care used a pronunciation task and examined nonword stimuli, whereas for the present study we used a letter/sound identification task and word stimuli. As such, the differences in our results may reflect differences in processing between word and nonword stimuli, as was suggested by Kahan et al. However, another possibility is that because our procedure decoupled orthographic and phonological effects, whereas the procedure used by Besner and Care did not, we were better able to isolate two types of prelexical processes: those that occur when participants extract orthographic codes from degraded words and those that occur when they extract phonological codes from degraded words. If prelexical processing occurs in parallel with deciphering the task cue when orthographic, but not phonological, information is being sought, this may explain why Besner and Care failed to find underadditivity in their pronunciation task, since that task required the extraction of phonological information.

Given the methodological modifications made in the present study, these data provide stronger evidence than earlier studies have of capacity-free processing of stimulus information during the orthographic task. The elimination of the stimulus clarity effect (“absorption,” as it was characterized by Kahan et al., 2011) suggests that the perceptual processing of a stimulus was occurring in parallel with the interpretation of the presented task cue. It is less clear from these data whether this “underadditivity” results from automatic processing of the stimulus word (i.e., a lexical effect) facilitating, in a top-down manner, prelexical processes, or whether it is unrelated to our use of word stimuli. Although past work (e.g., Kahan et al., 2011) had suggested that absorption is indicative of automatic processing of the stimulus, it is equally possible that these results pinpoint early stages of processing in which stimulus quality may (in the case of an orthographic task) or may not (in the case of a phonological task) be processed in parallel with cue interpretation. What is clear is that during the orthographic task, prelexical processing takes place in a capacity-free manner upon stimulus presentation.

Interestingly, this elimination of the stimulus clarity effect in the simultaneous cue–target presentation condition, which was evident in both modalities for the orthographic task, was not obtained for the phonological task. Thus, unlike in Kahan et al. (2011), the findings for the phonological task in the present study appear to be “additive.” These findings raise the question of why processing should be different for the phonological than for the orthographic task. One possibility is that priority may be given to orthographic processes. Support for this suggestion has come from a study by Breznitz (2003), in which behavioral and electrophysiological (ERP) latencies were measured while dyslexic and normal-reading adults performed phonological tasks (requiring homophone and rhyme decisions) and orthographic tasks (requiring homograph decisions). Breznitz found that P2 and N2 latencies were longer for phonological processing than for orthographic processing for both dyslexic and normal readers. Breznitz went on to suggest that discriminating acoustic elements in the brain takes more time than does discriminating visual features, because phonological processing is sequential and orthographic processing is more holistic. Such a characterization of phonological processing as being more protracted than orthographic processing might be related to the present finding of capacity-free processing in the orthographic but not the phonological task.

In the present study, the phonological task required participants to respond to the vowel “a” in the presented word, whereas the orthographic task required them to respond to the consonant /s/ in the presented word. Thus, another possible explanation for the difference between the phonological and orthographic tasks in the present study may be that consonant-based processing is qualitatively different from vowel-based processing. Numerous studies in the literature have provided evidence for differences between consonants and vowels. Research suggests that consonants and vowels differ in their functions (Toro, Nespor, Mehler, & Bonatti, 2008), in how they are processed in categorical perception and delayed-recognition tasks (Ades, 1977), in how quickly participants respond to them in detection tasks (Cutler, 1997), and in their roles in reading (New, Araujo, & Nazzi, 2008). Lee, Rayner, and Pollatsek (2002) found temporal differences between consonants and vowels in reading English, such that at short (30-ms) prime durations, gaze durations on target words were shorter for targets that shared consonants with a high-frequency prime than for targets that shared vowels with a high-frequency prime. Moreover, recent research has examined differences between consonants and vowels using electrophysiological measures. Carreiras, Gillon-Dowens, Vergara, and Perea (2009) found that consonantal information was more critical in accessing whole-word forms during reading. Vergara-Martinez, Perea, Marin, and Carreiras (2011) showed that primes and targets related on the basis of the relative positions of their consonants lead to responses similar to those for identical primes and targets, whereas primes and targets related on the basis of the relative positions of their vowels lead to responses similar to those for unrelated primes and targets. Additionally, a study of two aphasic patients by Caramazza, Chialant, Capasso, and Miceli (2000) indicated that the categorical distinction between consonants and vowels is a brain-based one, given that consonants and vowels appear to be processed by different neural mechanisms.

Irrespective of the reasons for these processing differences, the present research was able to replicate the results of Kahan et al. (2011) when participants performed an orthographic task, but not when they performed a phonological task. Although Kahan et al. suggested that participants are able to automatically access information for either task, the present data unequivocally show such automaticity only when participants are trying to extract orthographic codes. On the basis of these findings, and in light of past research, it appears that concurrent processing can occur in both the auditory and visual modalities when the extraction of orthographic information is required, even with stimuli that unconfound orthographic and phonological codes.