Simulation and acoustic measurement of anti-phase attenuation
The screening test relies on the attenuation of the anti-phase tone when played in free-field conditions. We thus first evaluated the extent of the attenuation produced by anti-phase tones. We used simulations to choose an appropriate test frequency and then made measurements to assess the degree of attenuation in realistic listening conditions.
Figure 2A shows the expected attenuation over space in ideal free-field conditions (see Supplemental Materials). In simulations, the test frequency used in the screening test (200 Hz) produces consistent attenuation over a broad region of space, making the attenuation effect robust to variations in head position. Higher frequencies produce attenuation that depends sensitively on head position and are thus not ideal for our screening task. Figure 2B shows measurements of attenuation of a 200-Hz anti-phase tone using a head-and-torso simulator placed at various locations relative to the speakers. The anti-phase tone is attenuated by more than 20 dB in every case, substantially exceeding the 6 dB required for the screening test.
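The dependence of attenuation on frequency and head position can be illustrated with a minimal free-field sketch. This is not the simulation described in the Supplemental Materials: it assumes two ideal point sources, a hypothetical speaker separation of 0.5 m, and a speed of sound of 343 m/s, and the function name `antiphase_attenuation_db` is introduced here purely for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def antiphase_attenuation_db(freq_hz, listener_xy, speaker_sep_m=0.5):
    """Level of an anti-phase tone relative to the in-phase tone, in dB.

    Models two ideal point sources on the x-axis, separated by
    speaker_sep_m (an assumed value) and centred on the origin,
    radiating into free field. Negative values mean the anti-phase
    tone is quieter than the in-phase tone at the listener position.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber, rad/m
    x, y = listener_xy
    pressures = []
    for speaker_x in (-speaker_sep_m / 2.0, speaker_sep_m / 2.0):
        r = math.hypot(x - speaker_x, y)  # distance from speaker to listener
        pressures.append(cmath.exp(-1j * k * r) / r)  # spherical spreading
    in_phase = abs(pressures[0] + pressures[1])
    anti_phase = abs(pressures[0] - pressures[1])
    if anti_phase == 0.0:
        return -math.inf  # perfect cancellation on the midline
    if in_phase == 0.0:
        return math.inf  # in-phase tone cancels instead
    return 20.0 * math.log10(anti_phase / in_phase)


# Hypothetical listener position: 1 m in front of the speakers, 10 cm off-centre.
for freq in (200.0, 2000.0):
    level = antiphase_attenuation_db(freq, (0.1, 1.0))
    print(f"{freq:.0f} Hz: {level:.1f} dB")
```

Under these assumptions, the 200-Hz anti-phase tone falls well below the −6 dB criterion at the off-centre position, whereas at 2000 Hz the same small displacement is enough to remove the attenuation entirely.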
In-lab experiments
To validate the task, we ran it in the lab, with participants either wearing headphones or listening over loudspeakers (Experiment 1; Fig. 3). Each participant completed six trials, as in the online experiment. The results show that our screening task was effective at distinguishing participants who were listening over headphones from those listening over loudspeakers: 20 of 20 participants wearing headphones passed the test, whereas 19 of 20 participants listening over loudspeakers did not. Critically, the task achieves good discrimination between headphone and loudspeaker listening with just a small number of trials. The short duration of this screening task is intended to facilitate its use online, where it might be desirable to run relatively brief experiments.
To test our screening task in more arduous and varied conditions, we asked a second set of participants to use the speakers on their own laptops in several locations within the Brain and Cognitive Sciences Building at MIT (see Methods). Unlike the online task and the previous in-lab experiment, these participants ran the task four times in a row, rather than just once, to enable testing the robustness of the results across four different testing rooms. Because practice effects (due to, e.g., familiarity with the stimulus, or setting the volume differently) could have produced a performance advantage for this experiment relative to the experiment over desktop speakers, we examined the results for just the first run for each participant (Fig. 4A) in addition to that for all four runs combined (Fig. 4B). We additionally examined the mean score across all four runs for each participant (Fig. 4C) to get an indication of whether certain participants were consistently able to perform the task without headphones.
Administering the test over laptop speakers (Fig. 4) again produced substantially worse performance than when participants were wearing headphones (Fig. 3, in blue), although it elicited a different pattern of responses than our test with desktop speakers (K-S test between distributions of Figs. 3 and 4B in red, p < 0.05, D = 0.37), with a greater proportion passing our threshold (>4 correct). The screening test thus failed to detect 4 of 22 participants using laptop speakers, a modest but nonnegligible subset of our sample. The distribution of participants’ mean scores (Fig. 4C) indicates that some participants performed poorly in all rooms (mean scores in the range 0 to 1) while some performed well in all rooms (mean scores in the range 5 to 6). Examining scores obtained in each room (Figure S2) also suggests that the testing space had little impact on performance. Instead, the difference in performance could have arisen from variation in laptop speaker designs or variation in distance from the ears to the speakers due to user behavior (e.g., leaning in). Some participants (3/22) even reported using vibrations felt on the surface of the laptop to perform the task. Because 200 Hz is within the range of vibrotactile stimulation, and because phase cancellation could also occur in surface vibrations, using touch instead of free-field hearing might not necessarily alter the expected results. However, this strategy could improve performance if vibrations in the laptop case fail to attenuate to the same degree they would in the air, for instance if a participant placed their hand close to a speaker.
Figures 3 and 4 suggest that our screening task is more effective (i.e., produces lower scores absent headphones) when desktop speakers, rather than laptops, are used. This might be expected if desktop speakers generally sit farther from the listener, because anti-phase attenuation with low-frequency tones becomes more reliable as distance to the listener increases (Figure S1B).
The dependence of test effectiveness on hardware raises the question of what sort of listening setup online participants will tend to have. To address this issue, for a portion of our online experiments (described below), we asked participants on our online demographics page which type of device they were using. We found them split rather evenly between desktops and laptops. In the brief experiment run with this question added, 97 participants said they were using desktops while 107 said they were using laptops (45.8% and 50.5%, respectively). The remaining 8 participants (3.8%) said they were using other devices (e.g., tablet, smartphone).
Online experiments
The cumulative pass rate (with passing defined as at least 5 of 6 correct trials) for headphone screening tasks we have run online is 64.7% (3,335 of 5,154 participants). The distribution of scores for these participants (Fig. 5) contains modes at 0 and 6 trials correct; confidence intervals (95%) obtained by bootstrapping indicate that the mode at zero is reliable. Given that chance performance on this task produces two of six trials correct on average, the obtained distribution of scores is difficult to explain by merely supposing that some participants are unmotivated or guessing. Instead, the systematic below-chance performance suggests that some participants were not wearing headphones: participants attempting in earnest to perform the task over stereo loudspeakers might be expected to score below chance, because the sound heard as quietest under those conditions (the anti-phase tone) is always the incorrect response.
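The chance-level argument can be made concrete with a short binomial calculation. The sketch below assumes that a guessing participant answers each of the six three-alternative trials independently with probability 1/3 of being correct; the variable names are illustrative only.

```python
from math import comb

N_TRIALS = 6
P_CHANCE = 1.0 / 3.0   # three tones per trial, one correct answer
PASS_THRESHOLD = 5     # at least 5 of 6 correct to pass


def binomial_pmf(k, n=N_TRIALS, p=P_CHANCE):
    """Probability of exactly k correct trials out of n under pure guessing."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Mean number of correct trials for a guessing participant.
expected_correct = sum(k * binomial_pmf(k) for k in range(N_TRIALS + 1))

# Probability that a guessing participant reaches the pass threshold.
p_pass_by_chance = sum(
    binomial_pmf(k) for k in range(PASS_THRESHOLD, N_TRIALS + 1)
)

# Probability that a guessing participant gets every trial wrong.
p_zero_by_chance = binomial_pmf(0)

print(f"expected correct: {expected_correct:.2f}")
print(f"P(pass by guessing): {p_pass_by_chance:.4f}")
print(f"P(zero correct by guessing): {p_zero_by_chance:.4f}")
```

Under this model a guessing participant passes with probability 13/729 (under 2%), and the guessing distribution peaks at two correct trials, with zero correct occurring in only 64/729 (about 9%) of cases. Guessing alone would therefore be expected to yield a single mode near two rather than a mode at zero.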
Another explanation for below-chance performance on the screening task is that participants tend to confuse the instructions in a way that leads to consistently incorrect responses (for example, attempting to select the loudest rather than softest of the 3 tones). To evaluate this possibility, we ran a control version of the screening task (conducted online) in which no tones were in anti-phase (i.e., all three tones had starting phases in the L/R channels of 0°/0°), such that listening over speakers should not produce below-chance performance if participants were otherwise following the instructions correctly. The screening task was otherwise identical to the previous experiments. Results from 150 online participants are shown in Fig. 6. As before, chance performance should yield two trials correct on average.
The scores obtained from this control version of the screening task are distributed differently from the scores from our standard task (K-S test, p < 0.0001, D = 0.24). In particular, there are far fewer below-chance scores. This result suggests that the preponderance of below-chance scores observed in the standard task (i.e., when anti-phase tones are used; Fig. 5) is not due to confusion of instructions. The control task results also reveal that some proportion of online participants are screened out for poor performance even without anti-phase tones: given a pass threshold of 5 or more trials correct, 18 of 150 participants (12.0%) in this control task would have failed to pass screening (compared with 35.3% who fail in the standard task with anti-phase tones). In contrast, none of the 20 participants who performed the task in the lab over headphones would have been screened out (Fig. 3). Our procedure thus appears to screen out a subset of online participants who perform poorly (e.g., due to low motivation, difficulty understanding the instructions, or adverse environmental listening conditions), in addition to screening out those attempting the task over loudspeakers.
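For readers unfamiliar with the K-S statistic D reported above, the following minimal sketch computes the two-sample statistic as the largest absolute gap between two empirical cumulative distributions. The score lists are hypothetical and are not the data from Figs. 5 and 6; the function name is introduced here for illustration, and no p-value is computed because the standard K-S approximation is only approximate for discrete scores such as these.

```python
def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    values = sorted(set(sample_a) | set(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    d_max = 0.0
    for v in values:
        # Fraction of each sample with a score at or below v.
        cdf_a = sum(1 for s in sample_a if s <= v) / n_a
        cdf_b = sum(1 for s in sample_b if s <= v) / n_b
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


# Hypothetical scores (number of trials correct, 0 to 6), made up for illustration.
standard_scores = [0, 0, 0, 1, 2, 5, 6, 6, 6, 6]  # excess of below-chance scores
control_scores = [2, 3, 4, 5, 5, 6, 6, 6, 6, 6]   # few below-chance scores

d_statistic = ks_two_sample_statistic(standard_scores, control_scores)
print(f"D = {d_statistic:.2f}")
```

A larger D indicates that the two score distributions diverge more strongly at some score; here the divergence arises at the low end of the scale, mirroring the pattern described for the standard and control tasks.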