A nonspatial sound modulates processing of visual distractors in a flanker task

Salagovic, Cailey A.; Leonard, Carly J.

doi:10.3758/s13414-020-02161-5

Published: 20 October 2020

Volume 83, pages 800–809, (2021)
Attention, Perception, & Psychophysics

Abstract

Successful navigation of information-rich, multimodal environments involves processing both auditory and visual information. The extent to which information within each modality is processed varies because of many factors, but the influence of auditory stimuli on the processing of visual stimuli in these multimodal environments is not well understood. Previous research has shown that a preceding sound leads to decreased reaction times in visual tasks (Bertelson, Quarterly Journal of Experimental Psychology 19(3), 272–279, 1967). The current study examines whether a nonspatial, task-irrelevant sound additionally alters processing of visual distractors that flank a central target. We used a version of a flanker task in which participants responded to a central letter surrounded by two irrelevant flanker letters. When these flankers are associated with a conflicting response, a congruency effect occurs such that reaction time to the target is slowed (Eriksen & Eriksen, Perception & Psychophysics, 16(1), 143–149, 1974). In two experiments using this task, results showed that a preceding tone caused general speeding of reaction time across flanker types, consistent with alerting. The tone also caused decreased variation in response time. Critically, the tone modulated the congruency effect, with a greater speeding for congruent flankers than for incongruent flankers. This suggests that the influence of flanker identity was more intense after tone presentation, consistent with a nonspatial sound increasing perceptual and/or response-association processing of flanking stimuli.

Processing sensory information is vital for normal functioning; however, the immense amount of input that the brain receives at any one time cannot all be processed to the same degree. Generally, attention functions to prioritize a subset of stimuli from the environment to be processed further, but information from task-irrelevant stimuli also has a strong influence on the system. Studies have shown evidence against a strict model of discrete processing stages gated by an early locus of attentional selection (e.g., Moore & Egeth, 1997; Vogel, Luck, & Shapiro, 1998). Early support for this idea came from C. W. Eriksen and colleagues, who proposed a “continuous flow” model of processing in which information accumulation in the visual system about the target may occur concurrently with response associations about distractors (C. W. Eriksen & Schultz, 1979). Given that humans are adapted to live in environments with simultaneous input from multiple sensory modalities, it is of particular interest how this continuous flow of processing is influenced by information integrated across these different sources.

Although a great deal of research has examined how visual attention influences information processing, the extent to which auditory stimuli can interact with this information processing remains unclear. Despite the relative lack of research, several fundamental audiovisual interactions are generally accepted. In real-world environments, sound commonly accompanies visual events and has an effect on visual attention. Perhaps the most robust interaction is the orienting effect of spatially localized sounds. When a sound can be discerned as having come from a specific location, such as the beeping of a car horn on the road, attention may be allocated to that area (Hillyard, Störmer, Feng, Martinez, & McDonald, 2016), enhancing processing of visual input (Frassinetti, Bolognini, & Làdavas, 2002). As with visual objects, attention also tends to be directed to sounds that provide information related to a current goal (Fritz, Elhilali, David, & Shamma, 2007). Additionally, sounds can direct visual attention to semantically congruent stimuli (Mastroberardino, Santangelo, & Macaluso, 2015). For example, the whistling of a kettle, signifying the need to turn off the heat, can attract attention to the stovetop. However, not all sounds are spatial or task-relevant. With the ubiquity of technology such as earbuds and music streaming services, experiencing sounds which are nonspatial and not relevant to current-task performance is becoming increasingly common. Furthermore, these sounds are likely completely independent of the visual scene.

Previous research has shown that the onset of a nonspatial, irrelevant auditory stimulus decreases reaction time (RT) to subsequent visual stimuli (Bertelson, 1967). These results are consistent with the activation of an alerting network, which may serve to prime the motor system in a generalized manner to execute responses faster (e.g., Niemi & Näätänen, 1981). Posner and Petersen (1990) proposed that alerting does not influence the build-up of visual information, but rather allows more rapid response selection. However, in addition to providing this type of alerting effect, sound may also alter the continuous flow of how visual stimuli are processed during perception or response-selection processes. Experiments have demonstrated that a task-irrelevant, nonspatial sound occurring temporally near the second target in a rapid serial visual presentation (RSVP) stream significantly increased identification of that target (Chen & Yeh, 2008). In particular, in support of the view that sound alters the spatial allocation of visual attention, Kusnir, Chica, Mitsumasu, and Bartolomeo (2011) found that a preceding sound improved detection of a near-threshold visual target presented at one of two possible peripheral locations. However, none of these studies evaluated the effect of auditory stimuli on the processing of visual stimuli at task-irrelevant locations.

One method that has traditionally been used to assess the processing of irrelevant information in the visual field is the flanker task developed by B. A. Eriksen and Eriksen (1974). This task presents a central target flanked on each side by one or more additional distractors. Participants are asked to respond to the identity of the central target while disregarding the task-irrelevant flankers. When the response assigned to the central target does not match that of the flankers, RT is slowed due to processing of this incompatible response. This indicates that despite the target always occurring at fixation, information from flanking distractors is processed to the level of response to some degree. This flanker compatibility effect can be modulated by various factors including flanker eccentricity, attentional focus, and perceptual load (Miller, 1991).

Previous research has shown that the influence of flankers is modulated by the degree to which attention is allocated to them. Generally, when a perceptually demanding stimulus requires focused attention at a location, sensitivity to more peripheral stimuli is decreased (Carmel, Thorne, Rees, & Lavie, 2011). In contrast, increasing attentional allocation to flanking stimuli can enhance the degree to which they are integrated into perceptual processing (Freeman, Sagi, & Driver, 2001). More specifically, response-congruency effects from flankers have been shown to increase under conditions that promote attention to flankers (Gaspelin, Ruthruff, & Jung, 2014). Thus, the increase in attention to peripheral locations caused by sound, as purported by Kusnir et al. (2011), may be predicted to cause a similar increase in congruency effects.

In the current study, we used a flanker task with letter stimuli to specifically examine how a nonspatial auditory tone may change the processing of task-irrelevant distractors. Consistent with the literature on alerting (Bertelson, 1967), we expected that the tone would cause general speeding of RT to subsequently presented visual stimuli. Given the literature discussed above showing that a task-irrelevant nonspatial sound may change attentional allocation, we hypothesized that such a sound may alter attention to the flankers. The zoom lens model proposed by C. W. Eriksen and St. James (1986) is a useful framework through which to consider how a nonspatial sound may change attention. They propose that the scope of spatial attention is flexible, such that it can range from narrowly focused to more broadly distributed. In the case of a flanker task, a narrow focus of attention might be more optimal. However, if the spatial distribution of attention were to be expanded, flankers may be processed to a greater extent leading to an increased effect of congruency. Although this attentional theory is motivated by previous literature, it must also be noted that increased interference from incongruent flankers would also be consistent with the tone influencing later stages of the continuous flow of information processing. This is further considered in Experiment 2.

Experiment 1

Method

Participants

A total of 18 participants completed Experiment 1 and received course credit, and three were excluded due to technical difficulties with task presentation or eye-tracking data collection. Fifteen participants (mean age 19.7 years, 14 females) were included in the analysis reported below, which was the planned N chosen to be comparable with that used in previous investigations of flanker effect modulation (Chen & Yeh, 2008; Eriksen & Eriksen, 1974; Weinbach & Henik, 2012). All participants were recruited through the University of Colorado Denver participant pool and earned course credit for participation. Eligible participants reported having normal or corrected-to-normal vision and hearing, and no neurological impairments. Before beginning the experiment, all participants gave written informed consent. Institutional Review Board approval for this study was obtained from the University of Colorado Denver COMIRB.

Apparatus and stimuli

Participants were seated at a desk in a dimly lit room and instructed to place their chin in a chin rest situated 80 cm away from a 24-inch monitor. During the experiment, eye movements were recorded by an SR Research EyeLink 1000 Plus desk-mounted eye tracker linked to the experimental computer via MATLAB and Psychtoolbox software (Brainard, 1997). The eye tracker was calibrated to the participant’s right eye before beginning the experiment and during the task as necessary. Eye tracking was done only as a means to ensure central fixation at the beginning of each trial, and there was no a priori plan to investigate eye movements (Footnote 1). Participants wore Audio-Technica ATH-ANC20 headphones and used a Logitech videogame controller to make responses. The participant was overseen by an investigator from behind one-way mirrored glass during the experiment.

The task employed was a classic flanker task consisting of three letters (see Fig. 1). The central letter was the target and was always presented with a flanker letter on each side. The identity of the flanker letters could match or differ from the identity of the central letter. Both left and right flanker letters had the same identity on a given trial. The flankers were 2° away from the central letter in half the trials and 4° away in the other half. All letters were approximately 0.93° × 1.15° and appeared in black on a midgray background. The letter display remained on the screen until response. In 50% of the trials (sound-present), a 20-ms sine tone (500 Hz) was played through both sides of the headphones at 65 dB, beginning 100 ms before the onset of the letter display. This timing was chosen according to previous research showing auditory tones to be most effective when presented between 100 and 500 ms prior to the onset of the visual stimuli (e.g., Fuentes & Campoy, 2008). There was no tone presented in the remaining 50% of trials (sound-absent).

Design

Each trial display consisted of some combination of the letters S, C, E, and H. The target letters S and C were assigned to the top-left game-pad button and E and H to the top-right game-pad button. There were three congruency conditions. The stimuli-congruent condition was when the target had the same identity and same button response as the flanker letters (e.g., S S S). The response-congruent condition was when the target had a different identity from the flanker letters but the same button response (e.g., C S C). The incongruent condition was when the target had a different identity and button response from the flanker letters (e.g., E S E). These three conditions were crossed with the sound-present versus sound-absent manipulation. This yielded 24 trials of each congruency condition coupled with the tone and 24 trials with no tone per block of 144 trials. In half of these 24 trials, flankers appeared 2° from the target letter and in the other half they appeared at 4° from the target letter. All trial types were randomly intermixed. There were four blocks, resulting in a total of 96 trials for each critical condition. Each block also included two breaks when the participant was prompted to relax their eyes.
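The trial counts in this design can be checked with a short sketch. The code below is illustrative only and is not the authors' MATLAB/Psychtoolbox implementation; the function names, the dictionary layout, and in particular the random choice of letter pairs within each cell are assumptions, since the text does not say how letters were sampled.

```python
import random
from collections import Counter
from itertools import product

# Letter-to-button mapping as described in the Design section.
RESPONSE = {"S": "left", "C": "left", "E": "right", "H": "right"}
CONGRUENCY = ("stimuli-congruent", "response-congruent", "incongruent")


def flanker_type(target, flanker):
    """Classify a display into one of the three congruency conditions."""
    if target == flanker:
        return "stimuli-congruent"
    if RESPONSE[target] == RESPONSE[flanker]:
        return "response-congruent"
    return "incongruent"


def build_block(rng, reps_per_cell=12):
    """Build one block: 3 congruency x 2 sound x 2 eccentricity x 12 = 144 trials."""
    letters = sorted(RESPONSE)
    trials = []
    for congruency, sound, ecc in product(CONGRUENCY, (True, False), (2, 4)):
        # All target/flanker letter pairs that belong to this congruency condition.
        pairs = [(t, f) for t in letters for f in letters
                 if flanker_type(t, f) == congruency]
        for _ in range(reps_per_cell):
            # ASSUMPTION: letter pairs are drawn at random within a cell.
            target, flanker = rng.choice(pairs)
            trials.append({"target": target, "flanker": flanker,
                           "congruency": congruency,
                           "sound_present": sound,
                           "eccentricity_deg": ecc})
    rng.shuffle(trials)  # all trial types randomly intermixed
    return trials


block = build_block(random.Random(0))
per_cell = Counter((t["congruency"], t["sound_present"]) for t in block)
print(len(block))                           # 144 trials per block
print(per_cell[("incongruent", True)] * 4)  # 96 trials per critical condition over four blocks
```

Each congruency-by-sound cell contains 24 trials per block (12 at each eccentricity), which over four blocks gives the 96 trials per critical condition stated above.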

Procedure

Before starting the experiment, participants received verbal instructions about the task accompanied by a printout showing example stimuli. Participants were instructed to respond only to the middle letter identity and ignore the other letters. Participants were also instructed that sounds were irrelevant and to focus on the letter task. Making responses as quickly and accurately as possible was emphasized. Figure 1 shows an example of trial events. On each trial, a fixation cross appeared, and participants were required to fixate within a 0.5° radius for 400 ms before the trial progressed to ensure central fixation. Then, 300 ms after fixation was achieved, the tone occurred on sound-present trials. Finally, 100 ms after the onset of the tone, the letter display was presented, with the central letter replacing the fixation cross. On trials without a tone, the timing was identical except that no tone was presented. After completing the experiment, participants were debriefed and granted course credit.
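The event timing within a trial can be summarized as a small sketch. The function name, the event labels, and the choice to express onsets relative to the moment stable fixation begins are assumptions made for illustration; only the 400-ms fixation requirement and the 100-ms tone lead come from the text.

```python
def trial_timeline(sound_present, fixation_hold_ms=400, tone_lead_ms=100):
    """Return (event, onset_ms) pairs, timed from the start of stable fixation.

    The letter display appears once fixation has been held for
    fixation_hold_ms; on sound-present trials the tone starts
    tone_lead_ms before the display.
    """
    display_onset = fixation_hold_ms
    events = []
    if sound_present:
        events.append(("tone_onset", display_onset - tone_lead_ms))
    events.append(("letter_display_onset", display_onset))
    return events


print(trial_timeline(True))   # [('tone_onset', 300), ('letter_display_onset', 400)]
print(trial_timeline(False))  # [('letter_display_onset', 400)]
```

Under these assumptions the letter display appears at the same moment on both trial types, so any RT difference between sound-present and sound-absent trials cannot be attributed to a difference in visual timing.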

Analysis

For each participant, trials with reaction times more than three standard deviations from the mean for that condition were removed. This resulted in the removal of 1.5% of trials overall. Performance as a function of eccentricity is shown in Table 1. There was no main effect of the flanker eccentricity manipulation on reaction time, F(1, 14) = 3.43, p = .09, η_p² = 0.19. RTs seemed to be reduced with increased eccentricity mainly for the incongruent conditions, although this interaction of eccentricity and flanker type did not reach significance, F(2, 28) = 3.09, p = .06, η_p² = 0.18. Previous literature has shown varying compatibility effects on RT to a central target when distractors are at eccentricities between 1° and 5° (e.g., Egeth, 1977; Gatti & Egeth, 1978). On the other hand, B. A. Eriksen and Eriksen (1974) found no changes for distractors further than 1° in the periphery. While our results suggest that separation may have some effect on flanker processing, critically, there were no significant interactions involving eccentricity and sound presence (all Fs < 1). Since these null results prevent drawing further conclusions about sound, all further analyses collapse over this eccentricity manipulation.
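The trimming rule can be sketched as follows. This is a minimal illustration for one participant's data, not the authors' analysis code; the function name, the (condition, RT) tuple layout, the use of the sample standard deviation, and a single non-iterative pass are all assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_outliers(trials, n_sd=3.0):
    """Drop trials whose RT lies more than n_sd SDs from the condition mean.

    trials: list of (condition, rt_ms) tuples for ONE participant.
    """
    by_condition = defaultdict(list)
    for condition, rt in trials:
        by_condition[condition].append(rt)

    bounds = {}
    for condition, rts in by_condition.items():
        m = mean(rts)
        # ASSUMPTION: sample SD (n - 1 denominator), computed once per condition.
        s = stdev(rts) if len(rts) > 1 else 0.0
        bounds[condition] = (m - n_sd * s, m + n_sd * s)

    return [(c, rt) for c, rt in trials
            if bounds[c][0] <= rt <= bounds[c][1]]


# Hypothetical data: 29 ordinary RTs and one very slow response.
example = [("incongruent", 500 + (i % 5)) for i in range(29)]
example.append(("incongruent", 2000))
kept = trim_outliers(example)
print(len(example) - len(kept))  # number of trials removed
```

Because the bounds are computed separately for each condition, a slow response is judged against the distribution of its own condition rather than against the participant's overall RT distribution.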

Table 1. Experiment 1 mean RT and accuracies across flanker eccentricity conditions

Results

Accuracy

Accuracy was high, with participants completing 95% of trials correctly overall (see Fig. 2a). There was a significant effect of flanker type, F(2, 28) = 13.7, p < .001, η_p² = 0.49. Consistent with previous flanker studies, this was driven by more errors in the incongruent condition compared with the other conditions (p < .001, Bonferroni corrected). However, there was no main effect of sound presence, F(1, 14) = 2.14, p = .17, η_p² = 0.13, or interaction of flanker type and sound, F(2, 28) = 1.48, p = .25, η_p² = 0.10. Only correct trials were further analyzed.

Reaction time

RT is reported for each flanker type and sound presence condition in Fig. 2b. The tone acted as an alert that decreased RT across conditions, supported by a significant main effect of sound presence, F(1, 14) = 181.4, p < .001, η_p² = 0.93. The congruency manipulation resulted in the expected pattern, with incongruent trials having slower RTs. Consistent with this, there was a main effect of flanker type, F(2, 28) = 20.9, p < .001, η_p² = 0.60. Importantly for the purposes of this experiment, the flanker type effect was qualified by an interaction with sound presence, F(2, 28) = 8.67, p = .001, η_p² = 0.38.
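The reported effect sizes can be cross-checked against the F statistics using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the three RT effects; the function name is an assumption, and the text does not state which software produced the original values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# RT effects reported for Experiment 1: (label, F, df_effect, df_error)
effects = [
    ("sound presence", 181.4, 1, 14),
    ("flanker type", 20.9, 2, 28),
    ("flanker type x sound", 8.67, 2, 28),
]
for label, f_value, df1, df2 in effects:
    print(label, round(partial_eta_squared(f_value, df1, df2), 2))
# sound presence 0.93
# flanker type 0.6
# flanker type x sound 0.38
```

The recomputed values agree with the reported η_p² of 0.93, 0.60, and 0.38.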

Planned comparisons were done to examine the differential speeding by sound, with a Bonferroni-corrected alpha value of 0.017 used to assess significance. The two types of congruent flanker conditions were both speeded by the presence of a tone (67 ms for congruent-stimulus and 60 ms for congruent-response), although the magnitude of this facilitation did not differ significantly, t(14) = 1.12, p = .28, d = 0.29. In contrast, the incongruent flanker condition was speeded by the tone by only 40 ms. This facilitation was significantly smaller than that of the congruent-stimulus condition, t(14) = 3.34, p = .005, d = 0.85, and also the congruent-response condition, t(14) = 3.31, p = .005, d = 0.86.
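A planned comparison of this kind can be sketched as a paired t test on each participant's tone-related speeding (sound-absent RT minus sound-present RT) in two flanker conditions. The code below is illustrative only: the function name and the example numbers are hypothetical, and the effect size shown (mean difference divided by the SD of the differences) is one common convention for paired designs, which the text does not confirm as the one used here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(speeding_a, speeding_b):
    """Paired t test on per-participant speeding scores from two conditions.

    Returns (t, df, d_z), where d_z = mean difference / SD of differences.
    """
    diffs = [a - b for a, b in zip(speeding_a, speeding_b)]
    n = len(diffs)
    sd = stdev(diffs)
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1, mean(diffs) / sd


# Three planned comparisons, so alpha is divided by three.
alpha_corrected = 0.05 / 3
print(round(alpha_corrected, 3))  # 0.017

# HYPOTHETICAL speeding scores (ms) for five participants, not the study's data.
congruent_speeding = [62, 70, 66, 75, 71]
incongruent_speeding = [40, 45, 38, 50, 44]
t, df, d_z = paired_t(congruent_speeding, incongruent_speeding)
print(df)  # 4
```

With this convention d_z equals t divided by the square root of the number of participants, so a comparison is evaluated by converting t and df to a p value and comparing it with the corrected alpha.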

To better understand the influence of sound on response time, standard deviations of response times in each condition were calculated. For the sound-present condition, RT standard deviations were 104.5, 103.8, and 116.6 ms for congruent-stimulus, congruent-response, and incongruent flanker conditions, respectively. On trials with no sound, these standard deviations were 114.5, 110.6, and 116.1 ms. An analysis of variance (ANOVA) of this measure showed a significant main effect of sound, such that its presence reduced RT variability, F(1, 14) = 5.3, p = .04, η_p² = 0.28. There was also a main effect of flanker type, F(2, 28) = 5.1, p = .01, η_p² = 0.27, driven by significantly higher variability for the incongruent condition than for the congruent-stimulus condition (p = .01, Bonferroni corrected). However, the interaction was not significant, F(2, 28) = 1.37, p = .27, η_p² = 0.09.

Discussion

The results indicate that the auditory tone presented before the flanker display led to faster RT overall and effectively served as an alert. To better understand effects on RT, we examined their variability within each condition. As previously found by Wu et al. (2011), there was higher variance in RT for incongruent flanker conditions. Moreover, the presence of a sound reduced reaction time variability, which may suggest it better enabled participants to coordinate their response with the onset of the stimulus display. Critically, the tone also differentially affected RTs with regard to flanker type, such that RTs were speeded less for the incongruent condition compared with the congruent conditions. These results are consistent with the idea that flanking visual distractors are processed more following an auditory tone.

The design of Experiment 1 produced two types of congruent conditions due to the assignment of multiple letters to each response button. Central and flanking letters could have the same identity and same response (stimuli-congruent) or have different identities but still have the same associated response (response-congruent). Although the effect of sound did not differ significantly between these two conditions, the results do clearly show a difference between the congruent and incongruent conditions. In a second experiment, we sought to further investigate how response association may relate to this sound facilitation.

Experiment 2

Experiment 2 aimed to further investigate whether response conflict caused by the incongruent flankers was the factor that reduced response speeding by the tone. To do so, a neutral condition using a flanker letter that never appeared as the target and had no assigned response was included. Furthermore, the congruent condition was exclusively composed of displays in which the flanker letters matched the central letter identity (previously the stimuli-congruent condition). These design changes allowed for analysis of the effect of the tone across three clearly defined, discrete flanker conditions.

Neutral flankers do not cause RT slowing compared with a no-flanker condition when sufficiently spaced from the central target (C. W. Eriksen, 1995), and thus could help isolate effects of the sound. If a nonspatial sound generally increases the integration of flanker response association, RT for the congruent flankers should be faster than RT for the neutral flankers. This would represent alerting benefits plus facilitation from increased activation of congruent flanker response association. Likewise, if a nonspatial sound generally increases the integration of flanker response association, trials with incongruent flankers would be expected to show the least benefit. This would represent alerting benefits partly offset by increased response interference from the incongruent flankers. Such findings of facilitation for congruent relative to neutral and slowing for incongruent relative to neutral would support an account of greater attention to flankers (Kusnir et al., 2011), as well as an account of greater flanker response-association activation (Fischer, Plessow, & Kiesel, 2012). However, if the reduced speeding by a nonspatial sound on incongruent trials is specifically due to the increased presence of response conflict, we might expect that facilitation by the sound would be the same for congruent and neutral flankers, and smaller only for the incongruent condition. This would be consistent with previous work showing that flanker effects are often largely due to interference from incongruent flankers, and not facilitation from congruent flankers (Schaffer & La Berge, 1979).

Experiment 2 provided an opportunity to replicate our previous finding that interference from the incongruent flanker is increased by an auditory tone compared with the congruent condition, resulting in less RT benefit. Furthermore, this would enable us to replicate the finding that the auditory tone led to reduced RT variability. Thus, we predicted that RT would be speeded overall and be less variable after hearing an auditory tone due to generalized alerting, although less so for incongruent trials due to enhanced processing of conflicting response-mapping in the flanking visual distractors.